| 03:08.05 | Maloeran | Erik or anyone else : is there a portable solution for detecting the size of pointers at preprocessor time? I'm not sure __WORDSIZE is really portable |
| 03:25.24 | brlcad | Maloeran: I'm not aware of a truly portable means to do that |
| 03:26.15 | brlcad | probably the closest that comes to mind would be some hack-trick of defining some internal struct with a pointer in it and using the offsetof() macro |
| 03:45.16 | Maloeran | I need to know at preprocessor time, sizeof() and offsetof() are not available then |
| 03:45.54 | Maloeran | If there's no standard cpp macro available, the only thing I can think of is sticking some test in configure.ac to dump the result in config.h and use that |
| 03:46.20 | brlcad | offsetof is a preprocessor macro |
| 03:46.51 | brlcad | though it probably still won't evaluate to something like a numeral .. hmm |
| 03:48.16 | Maloeran | I just need to know if I'm dealing with 32- or 64-bit pointers at compile time when doing some messy SSE operations on pointers |
| 03:49.19 | Twingy | write a little test program that returns the size |
| 03:49.38 | Twingy | return sizeof(ptr_t); |
| 03:50.14 | Twingy | put the test in configure.ac |
| 03:50.28 | Twingy | either way you gotta do it |
| 03:51.04 | brlcad | ooh, for *that* |
| 03:51.19 | brlcad | yeah, just make a little program .. that gets shoved into the configure script rather trivially |
| 03:51.25 | brlcad | then you have your symbol |
| 03:51.40 | brlcad | for that matter, there are predefined autoconf macros to do exactly that for you already |
| 03:51.40 | Twingy | why not use the HAVE_64bits |
| 03:52.04 | Twingy | AC_LONG_64_BITS |
| 03:52.08 | brlcad | # figure out what size pointers the compiler is actually generating |
| 03:52.09 | brlcad | AC_CHECK_SIZEOF(int) |
| 03:52.09 | brlcad | AC_CHECK_SIZEOF(long) |
| 03:52.09 | brlcad | AC_CHECK_SIZEOF(long long) |
| 03:52.09 | brlcad | AC_CHECK_SIZEOF(void *, 4) |
| 03:52.18 | Twingy | AC_LONG_64_BITS is shorter :) |
| 03:52.32 | Maloeran | I just put these things in configure.ac? |
| 03:52.33 | brlcad | that just checks if longs are 64 bits |
| 03:52.43 | brlcad | depends what it is exactly that you want to know |
| 03:52.59 | Maloeran | I want to know the size of pointers, integer data types are already all defined in limits.h |
| 03:53.04 | Twingy | all depends on what OS/Arch you're supporting |
| 03:53.19 | brlcad | then probably just that last one, using void * or char * etc |
| 03:53.30 | brlcad | it'll give you a preprocessor symbol for the result |
| 03:54.04 | Twingy | I haven't set up a mail server since before I joined ARL |
| 03:54.20 | Twingy | doing postfix + imap + dns + web + roundcube is making me type too much |
| 03:55.20 | Maloeran | Neat, thanks. I got my SIZEOF_VOID_P symbol |
| 03:56.50 | Twingy | I've almost saved a bulgogi lunch worth of energy |
| 03:57.08 | Twingy | in terms of cost |
| 03:57.36 | Twingy | 1 bulgogi lunch buys you 50kWh of electricity, heh |
| 03:58.55 | Maloeran | Yes, that's a much more standard unit |
| 03:59.28 | Twingy | ramen is a good one |
| 03:59.36 | Twingy | I might have to do kiloramen though |
| 03:59.41 | Twingy | otherwise the number gets too big |
| 04:02.06 | Twingy | 794 960 joules |
| 04:03.00 | Twingy | 4.5285 ramen per kilowatt hour |
| 04:03.09 | Twingy | a hair more than a twinkie |
| 04:04.14 | Twingy | my roof reminds me of the space race in civilization |
| 04:04.15 | brlcad | how much are you using for ramen cost? |
| 04:04.26 | Twingy | 190 calories = 4.5285 kWh |
| 04:05.29 | Twingy | the inverse of that rather |
| 04:05.54 | Twingy | humans require 2.0 - 2.3 kWh a day to live |
| 04:11.22 | brlcad | erm RDA is 2000 calories average, most eat 3k .. i think i average about 4k with the workouts |
| 04:16.29 | Twingy | I try to stay around 2000 with a 2 mile run every other day |
| 04:16.49 | brlcad | that's a pretty sure way to lose weight |
| 04:17.55 | brlcad | probably burning about 200-500 for the immediate run, plus a few hundred more residual through the day |
| 04:18.33 | Twingy | I run after work |
| 04:18.43 | Twingy | which makes it difficult |
| 04:19.07 | Twingy | been doing this for the last 3 years or so |
| 04:19.53 | Twingy | hrm |
| 04:19.59 | Twingy | imap is not letting me log in |
| 04:20.23 | Twingy | problem to solve tomorrow, bed time |
| 05:10.36 | *** join/#brlcad dragonlake (n=dragonla@221.221.238.208) | |
| 08:07.32 | *** join/#brlcad clock_ (n=clock@zux221-122-143.adsl.green.ch) | |
| 15:15.05 | Maloeran | It's really hard to believe no one taught compilers how to manage registers properly yet |
| 15:15.27 | clock_ | Maloeran: that's the theory of register allocation |
| 15:15.48 | clock_ | You make a variable lifetime analysis |
| 15:16.31 | clock_ | and then colour the DAG with the same number of colours as you have registers |
| 15:18.02 | Maloeran | That's some nice theory, all the implementations are terribly broken in practice |
| 15:18.28 | clock_ | you mean like gcc producing |
| 15:18.30 | clock_ | mov ax,cx |
| 15:18.32 | clock_ | mov ax,bx |
| 15:18.33 | clock_ | ? |
| 15:19.11 | Maloeran | The list of horrors goes on and on... Typical nonsense : Load A into xmm0, load B into xmm1, move xmm0 to xmm2, load C into xmm0 |
| 15:19.48 | clock_ | It could have loaded A right into xmm2, right? |
| 15:20.08 | clock_ | Maloeran: but then the algorithm is wrong |
| 15:20.22 | clock_ | it should have figured out it's a single variable and given it a single register |
| 15:20.27 | clock_ | and not smear it all around |
| 15:21.04 | Maloeran | These are not even "variables", just internal temporaries |
| 15:21.34 | clock_ | well, it then needs internal temporary dependence graph colouring :) |
| 15:21.49 | Maloeran | It's really a mess, and it's saturated with such inefficient use of registers |
| 15:22.03 | clock_ | it's easier for the ivory tower kooks from gcc when they live in an illusion they are good than if they actually did something that is really good |
| 15:23.51 | Maloeran | Last time I quickly rewrote a big chunk of code in assembly, it was 30% faster just because of the half-decent register management |
| 15:23.51 | clock_ | Maloeran: tell them about the problem - and they will ignore you. Insist on solution of the problem - they will mark you as their enemy |
| 15:24.11 | Maloeran | GCC has got to understand that the variables in inner loops are _more_ important, and to keep the good stuff in registers instead of hitting the stack constantly |
| 15:24.25 | clock_ | Maloeran: yes but everyone will tell you that today it doesn't pay off to write assembly code because today's compilers produce better code than a human assembly writer |
| 15:24.41 | Maloeran | I heard that many times, compilers are pathetic |
| 15:25.02 | clock_ | Maloeran: mov ax, cx mov ax, bx should have been caught at least by the peephole optimization! |
| 15:25.12 | clock_ | But this shows that even such a trivial peephole is not programmed in |
| 15:25.19 | clock_ | mov r1, r2 |
| 15:25.24 | clock_ | mov r1, r3 translates into |
| 15:25.27 | clock_ | mov r1, r3 |
| 15:25.57 | clock_ | Maloeran: but there are worse problems in the world than bad compiler output |
| 15:25.59 | Maloeran | That happens frequently when trying to shift by a variable count of bits: the value must be in %rcx |
| 15:26.01 | clock_ | for example a lack of sex |
| 15:26.13 | Maloeran | But it will never load the value in that register directly, just move it around |
| 15:26.47 | clock_ | Maloeran: why do you care? What kind of code do you write that you need speed? |
| 15:26.58 | Maloeran | High-performance ray-tracing code ;) |
| 15:27.04 | clock_ | BRL-CAD? |
| 15:27.13 | Maloeran | Yes, the next raytracer of BRL-CAD |
| 15:27.23 | clock_ | are you paid for that? |
| 15:27.26 | Maloeran | Sure |
| 15:27.34 | clock_ | I want to be paid for such things :) |
| 15:28.07 | Maloeran | :) These are interesting problems to play with, it's great to be able to do that full-time |
| 15:28.22 | clock_ | Maloeran: I can do fast programs even without assembly |
| 15:28.42 | clock_ | Maloeran: for example run the Links browser. Display some big fat JPEG so that it is rescaled in the process. |
| 15:29.19 | clock_ | Then relax and realize that the rescaling is performed in linear photometric space with 48bits per pixel and there is gamma correction and dithering applied after, even on 24bpp display |
| 15:29.27 | clock_ | And it's not even in assembly. |
| 15:29.40 | clock_ | But people tend to say my dither.c is hard to understand |
| 15:29.41 | Maloeran | Using mmx there? |
| 15:29.45 | clock_ | no mmx |
| 15:29.49 | clock_ | just ordinary C compiler output |
| 15:30.14 | Maloeran | Nice, though the problem is rather simple |
| 15:30.23 | clock_ | Maloeran: I realized Linux people don't like self-modifying code |
| 15:30.33 | clock_ | so I found out how to work around this limitation |
| 15:30.53 | clock_ | I generate a separate routine for every memory organization using a #define template :) |
| 15:30.56 | Maloeran | Processors generally don't like it much, but it's worth it if you modify once and execute million times |
| 15:31.16 | Maloeran | Would you have an amd64 opcode emitter at hand? |
| 15:31.25 | clock_ | Well - all the Linux folks achieved with their anti-self-modifying-code stance is that the code has to be bigger |
| 15:31.30 | clock_ | but is as fast :) |
| 15:31.46 | clock_ | what is an opcode emitter? |
| 15:32.07 | Maloeran | To be able to generate binary encoding of instructions at runtime from code, to be able to run it |
| 15:32.52 | clock_ | you mean to link an assembler into the program and then the program compiles parts of itself on the fly? |
| 15:33.09 | Maloeran | More or less, the program generates optimized pipelines for the task at hand and executes them |
| 15:33.09 | clock_ | I don't have amd64 assembler at hand. |
| 15:33.43 | clock_ | Maloeran: what computer did you start with? |
| 15:33.58 | Maloeran | I would prefer to do that instead of fixed assembly pipelines, once I get too tired of compiler incompetence |
| 15:34.14 | Maloeran | I began coding on a 486 |
| 15:34.36 | clock_ | I began basically with assembly on ZX Spectrum when I was 13. |
| 15:34.44 | Maloeran | Trying to do fancy graphics on the thing, 2d and 3d, I learned assembly back then |
| 15:34.58 | clock_ | I did fancy graphics too |
| 15:35.05 | Maloeran | Eheh nice. I was 12-13 as well |
| 15:35.13 | clock_ | for example I wrote a doom engine where there was a bathroom where there was 10 cm of blood on the floor |
| 15:35.28 | clock_ | when you walked there, it did real waves and circles like on water which reflected off the walls |
| 15:35.55 | Maloeran | Impressive, I struggled for a while to understand the basics of 3d rendering back then, quaternions especially |
| 15:35.56 | clock_ | and when you killed an enemy, blood sprayed around the screen and then the drops slowly moved down |
| 15:36.59 | Maloeran | Doing any work on or related to BRL-CAD lately? |
| 15:37.09 | clock_ | no but I would like to |
| 15:37.18 | clock_ | now I work as a C/ASM programmer on an embedded 186 platform |
| 15:37.31 | Maloeran | Eheh, neat |
| 15:37.57 | clock_ | but we are using Borland C compiler where the optimizations don't work at all even if there are flags for it. I find this compiler a big turnoff |
| 15:38.01 | clock_ | It's buggy too |
| 15:38.06 | clock_ | and it's a fossil. |
| 15:38.25 | clock_ | and the CPU is buggy |
| 15:38.51 | Maloeran | What are the chips used for? |
| 15:38.59 | clock_ | for an MP3 player |
| 15:39.05 | clock_ | or Internet radio |
| 15:41.18 | clock_ | Maloeran: that's normal with today's products |
| 15:41.41 | clock_ | Maloeran: if it happens more than once in 5 minutes it's suspicious, but 1 per day is normal today |
| 15:42.29 | clock_ | unfortunately. |
| 15:43.10 | Maloeran | Microsoft really managed to get the masses used to dealing with crappy software |
| 15:44.13 | Maloeran | Another "detail" : GCC never understood that movlps only takes 2 cycles instead of the 3 cycles of movss on amd64/Opterons for the same result in most cases |
| 15:44.24 | clock_ | Maloeran: I have two penguin plush dolls, one 60cm high, another 15cm high |
| 15:44.43 | Maloeran | movss for memory load that is |
| 15:45.29 | clock_ | Maloeran: you can't really expect me to understand movlps by heart when I am working on a 186 platform and the last time I wrote assembly for fun, the latest processor was Pentium |
| 15:47.07 | Maloeran | Eheh, sorry. In a context of scalar operations, movlps loads 64 bits from memory into xmm register and leaves the upper 64 bits untouched, movss loads 32 bits from memory and clears the upper 96 bits to zero |
| 15:47.54 | Maloeran | Especially when the load is followed by a shuffle to replicate the float 4 times in the register, as is often the case |
| 15:49.07 | clock_ | does it calculate correctly? |
| 15:49.13 | clock_ | Or does it divide like Pentium? |
| 15:49.49 | Maloeran | Sure it's correct, and they fixed most of the "rounding mode" and denormals mess |
| 15:50.12 | clock_ | wow! |
| 15:50.19 | clock_ | Correct floating point implementation! |
| 15:50.23 | Maloeran | The instruction set is still a mess and the instruction encoding is atrociously long because all the short opcodes are used for legacy 8086 |
| 15:50.33 | clock_ | Like I worked with some arm920t from Cirrus Logic and they had crappy FPU |
| 15:50.40 | clock_ | sometimes it produced opposite sign etc. :) |
| 15:50.46 | Maloeran | Woohoo :) |
| 15:51.08 | clock_ | sometimes you had to wait a bit so it wouldn't make a mistake etc. :) |
| 15:51.14 | Maloeran | It's nowhere near the elegancy and efficiency of Altivec, but... it's usable, unlike mmx |
| 15:51.23 | clock_ | what is altivec? |
| 15:51.54 | Maloeran | Apple's SIMD instructions on their IBM processors, G3/G4/G5 |
| 15:52.05 | clock_ | it was crappy, but it had a bold-sounding name MaverickCrunch(TM) |
| 15:52.20 | clock_ | You know, today it doesn't matter if it works right or wrong - all that matters is the marketing. |
| 15:52.53 | clock_ | If your engineers cannot fix it, one addition (TM) will do. |
| 15:52.57 | Maloeran | That's mostly true, unfortunately |
| 15:53.08 | clock_ | And that's also why I am doing http://ronja.twibright.com |
| 15:53.16 | clock_ | and why I bought an old 8-bit computer yesterday. |
| 15:53.25 | clock_ | I want to have at least one BugFree(TM) computer at home |
| 15:53.35 | clock_ | It's the same model I had as a kid. |
| 15:54.58 | Maloeran | Sounds nice. I grew up with a 486 and a Pentium 133 |
| 15:55.20 | clock_ | You never rode a healthy silicon horse :) |
| 15:55.43 | clock_ | healthy pony better than a sick stallion |
| 15:57.35 | clock_ | But Federico Faggin was at least able to do it right on the first try |
| 15:57.38 | Maloeran | The stallion doesn't run straight and occasionally crashes in stuff on the way, but it's still better |
| 15:58.08 | Maloeran | Not a name I'm familiar with, not finding much on google |
| 16:00.42 | clock_ | The guy who designed Z80 |
| 16:03.37 | archivist | Z80 was slow |
| 16:04.49 | clock_ | yes Pentium 4 @ 3GHz is faster |
| 16:04.54 | archivist | 2meg 65C02 is da man in those days |
| 16:05.04 | clock_ | 6502 was buggy |
| 16:06.01 | clock_ | omg the old discussion what was better, whether a buggy 6502 virtually without registers that took few cycles per instruction or BugFree(TM) Z80 with tons of registers that took at least 4 ticks per insn |
| 16:06.26 | clock_ | "and Z80 didn't have the CRS instruction!" |
| 16:06.40 | clock_ | CRS = CRash System |
| 16:27.42 | brlcad | yay, ponies |
| 22:21.55 | *** join/#brlcad Twingy (n=justin@74.92.144.217) | |
| 23:10.35 | ``Erik | o.O |
| 23:19.06 | ``Erik | /nick quanzaclause |
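
A note on the pointer-size question from the start of this log: once `AC_CHECK_SIZEOF(void *)` is in configure.ac, the generated config.h defines `SIZEOF_VOID_P`, and ordinary `#if` tests can branch on it. The following is only a rough sketch of how that might look on the C side. The names `ptr_bits_t`, `PTR_BITS`, `pointer_bits` and `ptr_size_check` are hypothetical and not from the conversation, and the `UINTPTR_MAX` fallback is an assumption for builds that skip configure (it reflects `uintptr_t`, which usually but not necessarily matches `void *`).

```c
#include <stdint.h>

/* Normally SIZEOF_VOID_P arrives via config.h, produced by
 * AC_CHECK_SIZEOF(void *). The fallback below is an assumption for
 * builds without configure: it guesses from UINTPTR_MAX (C99). */
#ifndef SIZEOF_VOID_P
#  if defined(UINTPTR_MAX) && UINTPTR_MAX == 0xffffffffffffffffu
#    define SIZEOF_VOID_P 8
#  elif defined(UINTPTR_MAX) && UINTPTR_MAX == 0xffffffffu
#    define SIZEOF_VOID_P 4
#  else
#    error "cannot determine pointer size; run configure"
#  endif
#endif

/* Preprocessor-time branch: pick an integer type wide enough to hold
 * a pointer's bits (hypothetical names, for illustration only). */
#if SIZEOF_VOID_P == 8
typedef uint64_t ptr_bits_t;
#  define PTR_BITS 64
#elif SIZEOF_VOID_P == 4
typedef uint32_t ptr_bits_t;
#  define PTR_BITS 32
#else
#  error "unsupported pointer size"
#endif

/* Compile-time cross-check: the array size is negative, and the build
 * fails, if the macro disagrees with what the compiler really uses. */
typedef char ptr_size_check[(sizeof(void *) == SIZEOF_VOID_P) ? 1 : -1];

/* Pointer width in bits, as selected by the preprocessor. */
int pointer_bits(void)
{
    return PTR_BITS;
}
```

The `ptr_size_check` typedef is there because a value baked into config.h can go stale, for example when cross-compiling or when switching between `-m32` and `-m64` without re-running configure.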