irclog2html for #brlcad on 20061122

03:08.05 Maloeran Erik or anyone else : is there a portable solution for detecting the size of pointers at preprocessor time? I'm not sure __WORDSIZE is really portable
03:25.24 brlcad Maloeran: I'm not aware of a truly portable means to do that
03:26.15 brlcad probably the closest that comes to mind would be some hack-trick of defining some internal struct with a pointer in it and using the offsetof() macro
03:45.16 Maloeran I need to know at preprocessor time, sizeof() and offsetof() are not available then
03:45.54 Maloeran If there's no standard cpp macro available, the only thing I can think of is sticking some test in to dump the result in config.h and use that
03:46.20 brlcad offsetof is a preprocessor macro
03:46.51 brlcad though it probably still won't evaluate to something like a numeral .. hmm
03:48.16 Maloeran I just need to know if I'm dealing with 32- or 64-bit pointers at compile time when doing some messy SSE operations on pointers
03:49.19 Twingy write a little test program that returns the size
03:49.38 Twingy return sizeof(ptr_t);
03:50.14 Twingy put the test in
03:50.28 Twingy either way you gotta do it
03:51.04 brlcad ooh, for *that*
03:51.19 brlcad yeah, just make a little program .. that gets shoved into the configure script rather trivially
03:51.25 brlcad then you have your symbol
03:51.40 brlcad for that matter, there are predefined autoconf macros to do exactly that for you already
03:51.40 Twingy why not use the HAVE_64bits
03:52.04 Twingy AC_LONG_64_BITS
03:52.08 brlcad # figure out what size pointers the compiler is actually generating
03:52.09 brlcad AC_CHECK_SIZEOF(int)
03:52.09 brlcad AC_CHECK_SIZEOF(long)
03:52.09 brlcad AC_CHECK_SIZEOF(long long)
03:52.09 brlcad AC_CHECK_SIZEOF(void *, 4)
03:52.18 Twingy AC_LONG_64_BITS is shorter :)
03:52.32 Maloeran I just put these things in
03:52.33 brlcad that just checks if longs are 64 bits
03:52.43 brlcad depends what it is exactly that you want to know
03:52.59 Maloeran I want to know the size of pointers, integer data types are already all defined in limits.h
03:53.04 Twingy all depends on what OS/Arch you're supporting
03:53.19 brlcad then probably just that last one, using void * or char * etc
03:53.30 brlcad it'll give you a preprocessor symbol for the result
03:54.04 Twingy I haven't setup a mail server since before I joined ARL
03:54.20 Twingy doing postfix + imap + dns + web + roundcube is making me type too much
03:55.20 Maloeran Neat, thanks. I got my SIZEOF_VOID_P symbol
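A minimal sketch of how the SIZEOF_VOID_P symbol from the exchange above might be consumed in C. It assumes the symbol arrives through the config.h that configure generates from AC_CHECK_SIZEOF(void *). The UINTPTR_MAX fallback, the ptr_bits_t type, and the ptr_to_bits helper are illustrative assumptions and do not come from the log.

```c
#include <assert.h>
#include <stdint.h>   /* C99: uintptr_t, UINTPTR_MAX */

/* SIZEOF_VOID_P normally comes from config.h (AC_CHECK_SIZEOF(void *)).
 * Assumption: when it is missing, guess it from UINTPTR_MAX, which matches
 * the pointer width on common flat-memory platforms but is not guaranteed
 * to do so by the C standard. */
#ifndef SIZEOF_VOID_P
#  if UINTPTR_MAX == 0xFFFFFFFFu
#    define SIZEOF_VOID_P 4
#  elif UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
#    define SIZEOF_VOID_P 8
#  else
#    error "unsupported pointer size"
#  endif
#endif

/* The preprocessor can now branch on the pointer width. */
#if SIZEOF_VOID_P == 8
typedef uint64_t ptr_bits_t;   /* hypothetical name */
#else
typedef uint32_t ptr_bits_t;
#endif

/* Compile-time cross-check: the array size is negative, and compilation
 * fails, if the symbol disagrees with the compiler. */
typedef char ptr_size_check[(sizeof(void *) == SIZEOF_VOID_P) ? 1 : -1];

/* Hypothetical helper: view a pointer as an integer of matching width. */
static ptr_bits_t ptr_to_bits(const void *p)
{
    return (ptr_bits_t)(uintptr_t)p;
}
```

The typedef check costs nothing at run time and catches a stale or mismatched config.h early.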
03:56.50 Twingy I've almost saved a bulgogi lunch worth of energy
03:57.08 Twingy in terms of cost
03:57.36 Twingy 1 bulgogi lunch buys you 50kWh of electricity, heh
03:58.55 Maloeran Yes, that's a much more standard unit
03:59.28 Twingy ramen is a good one
03:59.36 Twingy I might have to do kiloramen though
03:59.41 Twingy otherwise the number gets too big
04:02.06 Twingy 794 960 joules
04:03.00 Twingy 4.5285 ramen per kilowatt hour
04:03.09 Twingy a hair more than a twinkie
04:04.14 Twingy my roof reminds me of the space race in civilization
04:04.15 brlcad how much are you using for ramen cost?
04:04.26 Twingy 190 calories = 4.5285 kWh
04:05.29 Twingy the inverse of that rather
04:05.54 Twingy humans require 2.0 - 2.3 kWh a day to live
04:11.22 brlcad erm DRA is 2000 calories average, most eat 3k .. i think i average about 4k with the workouts
04:16.29 Twingy I try to stay around 2000 with a 2 mile run every other day
04:16.49 brlcad that's a pretty sure way to lose weight
04:17.55 brlcad probably burning about 200-500 for the immediate run, plus a few hundred more residual through the day
04:18.33 Twingy I run after work
04:18.43 Twingy which makes it difficult
04:19.07 Twingy been doing this for the last 3 years or so
04:19.53 Twingy hrm
04:19.59 Twingy imap is not letting me log in
04:20.23 Twingy problem to solve tomorrow, bed time
05:10.36 *** join/#brlcad dragonlake (n=dragonla@
08:07.32 *** join/#brlcad clock_ (
15:15.05 Maloeran It's really hard to believe no one taught compilers how to manage registers properly yet
15:15.27 clock_ Maloeran: that's the theory of register allocation
15:15.48 clock_ You make a variable lifetime analysis
15:16.31 clock_ and then colour the DAG with the same number of colours as you have registers
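A rough sketch of the colouring step clock_ describes. In the usual formulation the graph being coloured is an interference graph, where two variables are connected when their lifetimes overlap. The greedy strategy, the fixed-size matrix, and all names here are assumptions chosen for brevity; production allocators use more elaborate heuristics and spill-cost estimates.

```c
#include <assert.h>
#include <string.h>

#define MAX_VARS 16

/* interf[i][j] != 0 means variables i and j are live at the same time,
 * so they must not share a register.
 * On return reg[v] holds a register index in [0, nregs) or -1 if v was
 * spilled. The return value is the number of spilled variables. */
static int colour_registers(int nvars, int nregs,
                            unsigned char interf[MAX_VARS][MAX_VARS],
                            int reg[MAX_VARS])
{
    int spilled = 0;

    assert(nvars <= MAX_VARS && nregs <= 32);
    for (int v = 0; v < nvars; v++) {
        unsigned used = 0;   /* bitmask of registers taken by neighbours */

        for (int u = 0; u < v; u++)
            if (interf[v][u] && reg[u] >= 0)
                used |= 1u << reg[u];

        reg[v] = -1;
        for (int r = 0; r < nregs; r++) {
            if (!(used & (1u << r))) {
                reg[v] = r;   /* lowest register no neighbour holds */
                break;
            }
        }
        if (reg[v] < 0)
            spilled++;        /* no colour left: value lives on the stack */
    }
    return spilled;
}
```

With three variables where only a/b and b/c overlap, two registers suffice because a and c can share one. Making all three overlap forces a spill.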
15:18.02 Maloeran That's some nice theory, all the implementations are terribly broken in practice
15:18.28 clock_ you mean like gcc producing
15:18.30 clock_ mov ax,cx
15:18.32 clock_ mov ax,bx
15:18.33 clock_ ?
15:19.11 Maloeran The list of horrors goes on and on... Typical non-sense : Load A into xmm0, load B into xmm1, move xmm0 to xmm2, load C into xmm0
15:19.48 clock_ It could have loaded A right into xmm2, right?
15:20.08 clock_ Maloeran: but then the algorithm is wrong
15:20.22 clock_ it should have figured out it's a single variable and give it a single register
15:20.27 clock_ and not smear it all around
15:21.04 Maloeran These are not even "variables", just internal temporaries
15:21.34 clock_ well, it then needs internal temporary dependence graph colouring :)
15:21.49 Maloeran It's really a mess, and it's saturated of such inefficient use of registers
15:22.03 clock_ it's easier for the ivory tower kooks from gcc when they live in an illusion they are good than if they actually did something that is really good
15:23.51 Maloeran Last time I quickly rewrote a big chunk of code in assembly, it was 30% faster just because of the half-decent register management
15:23.51 clock_ Maloeran: tell them about the problem - and they will ignore you. Insist on solution of the problem - they will mark you as their enemy
15:24.11 Maloeran GCC has got to understand that the variables in inner loops are _more_ important, and to keep the good stuff in registers instead of hitting the stack constantly
15:24.25 clock_ Maloeran: yes but everyone will tell you that today it doesn't pay off to write assembly code because today's compilers produce better code than a human assembly writer
15:24.41 Maloeran I heard that many times, compilers are pathetic
15:25.02 clock_ Maloeran: mov ax, cx mov ax, bx should have been caught at least by the peephole optimization!
15:25.12 clock_ But this shows that even such a trivial peephole is not programmed in
15:25.19 clock_ mov r1, r2
15:25.24 clock_ mov r1, r3 translates into
15:25.27 clock_ mov r1, r3
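The peephole rule clock_ spells out can be sketched over a toy instruction list. The struct layout and names are hypothetical, and the sketch assumes plain register-to-register moves with no side effects. Note the extra condition: the first move is only dead if the second one does not read the register the first one wrote.

```c
#include <assert.h>
#include <stddef.h>

enum op { OP_MOV, OP_OTHER };

/* Toy instruction: "mov dst, src" on register numbers. */
struct insn {
    enum op op;
    int dst;
    int src;
};

/* Drops "mov rX, rA" when it is immediately followed by "mov rX, rB"
 * with rB != rX, because the first result is overwritten before any use.
 * Compacts the array in place and returns the new length. */
static size_t peephole_dead_mov(struct insn *code, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        int dead = code[i].op == OP_MOV
                && i + 1 < n
                && code[i + 1].op == OP_MOV
                && code[i + 1].dst == code[i].dst
                && code[i + 1].src != code[i].dst;

        if (!dead)
            code[out++] = code[i];   /* out <= i, so code[i + 1] is intact */
    }
    return out;
}
```

Running it on `mov r1, r2` followed by `mov r1, r3` leaves only the second instruction, while `mov r1, r2` followed by `mov r1, r1` is left alone because the second move still reads r1.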
15:25.57 clock_ Maloeran: but there are worse problems in the world than bad compiler output
15:25.59 Maloeran It goes that frequently when trying to shift by a variable count of bits, value must be in %rcx
15:26.01 clock_ for example a lack of sex
15:26.13 Maloeran But it will never load the value in that register directly, just move it around
15:26.47 clock_ Maloeran: why do you care? What kind of code do you write that you need speed?
15:26.58 Maloeran High-performance ray-tracing code ;)
15:27.04 clock_ BRL-CAD?
15:27.13 Maloeran Yes, the next raytracer of BRL-CAD
15:27.23 clock_ are you paid for that?
15:27.26 Maloeran Sure
15:27.34 clock_ I want to be paid for such things :)
15:28.07 Maloeran :) These are interesting problems to play with, it's great to be able to do that full-time
15:28.22 clock_ Maloeran: I can do fast programs even without assembly
15:28.42 clock_ Maloeran: for example run the Links browser. Display some big fat JPEG so that it is rescaled in the process.
15:29.19 clock_ Then relax and realize that the rescaling is performed in linear photometric space with 48bits per pixel and there is gamma correction and dithering applied after, even on 24bpp display
15:29.27 clock_ And it's not even in assembly.
15:29.40 clock_ But people tend to say my dither.c is hard to understand
15:29.41 Maloeran Using mmx there?
15:29.45 clock_ no mmx
15:29.49 clock_ just ordinary C compiler output
15:30.14 Maloeran Nice, though the problem is rather simple
15:30.23 clock_ Maloeran: I realized Linux people don't like self-modifying code
15:30.33 clock_ so I found out how to work around this limitation
15:30.53 clock_ I generate a separate routine for every memory organization using a #define template :)
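One way the "#define template" approach might look in C. This is not taken from Links or its dither.c. The macro, the function names, and the two pixel layouts are assumptions picked to illustrate generating one specialised routine per memory organization at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Template macro: each expansion defines one fill routine specialised
 * for a pixel layout.
 *   NAME  function name to generate
 *   TYPE  storage type of one pixel
 *   PACK  expression packing 8-bit r, g, b into TYPE */
#define DEFINE_FILL(NAME, TYPE, PACK)                        \
    static void NAME(TYPE *dst, size_t n,                    \
                     unsigned r, unsigned g, unsigned b)     \
    {                                                        \
        const TYPE px = (TYPE)(PACK);                        \
        for (size_t i = 0; i < n; i++)                       \
            dst[i] = px;                                     \
    }

/* 16 bpp, 5-6-5 layout */
DEFINE_FILL(fill_rgb565, uint16_t,
            ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3))

/* 32 bpp, 8-8-8 layout with an unused top byte */
DEFINE_FILL(fill_xrgb8888, uint32_t,
            ((uint32_t)r << 16) | ((uint32_t)g << 8) | b)
```

Each routine has its layout baked in, so the inner loop contains no format branches. The cost is one copy of the loop per layout, which matches the "bigger but as fast" trade-off mentioned in the log.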
15:30.56 Maloeran Processors generally don't like it much, but it's worth it if you modify once and execute million times
15:31.16 Maloeran Would you have an amd64 opcode emitter at hand?
15:31.25 clock_ Well - all the linux folks reached with their anti-self-modifying-code stance is that the code has to be bigger
15:31.30 clock_ but is as fast :)
15:31.46 clock_ what is an opcode emitter?
15:32.07 Maloeran To be able to generate binary encoding of instructions at runtime from code, to be able to run it
15:32.52 clock_ you mean to link an assembler into the program and then the program compiles parts of itself on the fly?
15:33.09 Maloeran More or less, the program generates optimized pipelines for the task at hand and executes them
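A very small sketch of what an opcode emitter does: it appends the binary encoding of instructions to a buffer. Only two x86 encodings are shown (`mov eax, imm32` is B8 followed by a little-endian 32-bit immediate, `ret` is C3). The struct and function names are hypothetical. Actually running the bytes would also need memory mapped with execute permission, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical emitter state: a caller-supplied byte buffer. */
struct emitter {
    uint8_t *buf;
    size_t len;
    size_t cap;
};

static void emit_u8(struct emitter *e, uint8_t b)
{
    assert(e->len < e->cap);   /* sketch only: no growth strategy */
    e->buf[e->len++] = b;
}

static void emit_u32le(struct emitter *e, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        emit_u8(e, (uint8_t)(v >> (8 * i)));
}

/* mov eax, imm32 */
static void emit_mov_eax_imm32(struct emitter *e, uint32_t imm)
{
    emit_u8(e, 0xB8);
    emit_u32le(e, imm);
}

/* ret */
static void emit_ret(struct emitter *e)
{
    emit_u8(e, 0xC3);
}
```

Emitting `mov eax, 42` and `ret` produces the six bytes B8 2A 00 00 00 C3, the body of a function that returns 42. A real emitter for the pipelines Maloeran has in mind would also need to cover REX prefixes, ModRM bytes, and the SSE opcodes.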
15:33.09 clock_ I don't have amd64 assembler at hand.
15:33.43 clock_ Maloeran: what computer did you start with?
15:33.58 Maloeran I would prefer to do that instead of fixed assembly pipelines, once I get too tired of compiler incompetence
15:34.14 Maloeran I began coding on a 486
15:34.36 clock_ I began basically with assembly on ZX Spectrum when I was 13.
15:34.44 Maloeran Trying to do fancy graphics on the thing, 2d and 3d, I learned assembly back then
15:34.58 clock_ I did fancy graphics too
15:35.05 Maloeran Eheh nice. I was 12-13 as well
15:35.13 clock_ for example I wrote a doom engine where there was a bathroom where there was 10 cm of blood on the floor
15:35.28 clock_ when you walked there, it did real waves and circles like on water which reflected off the walls
15:35.55 Maloeran Impressive, I struggled for a while to understand the basics of 3d rendering back then, quaternions especially
15:35.56 clock_ and when you killed an enemy, blood sprayed around the screen and then the drops slowly moved down
15:36.59 Maloeran Doing any work on or related to BRL-CAD lately?
15:37.09 clock_ no but I would like to
15:37.18 clock_ now I work as a C/ASM programmer on an embedded 186 platform
15:37.31 Maloeran Eheh, neat
15:37.57 clock_ but we are using Borland C compiler where the optimizations don't work at all even if there are flags for it. I find this compiler a big turnoff
15:38.01 clock_ It's buggy too
15:38.06 clock_ and it's a fossil.
15:38.25 clock_ and the CPU is buggy
15:38.51 Maloeran What are the chips used for?
15:38.59 clock_ for a MP3 player
15:39.05 clock_ or Internet radio
15:41.18 clock_ Maloeran: that's normal with today's products
15:41.41 clock_ Maloeran: if it happens more than once in 5 minutes it's suspicious, but 1 per day is normal today
15:42.29 clock_ unfortunately.
15:43.10 Maloeran Microsoft really managed to get the masses used to dealing with crappy software
15:44.13 Maloeran Another "detail" : GCC never understood that movlps only takes 2 cycles instead of the 3 cycles of movss on amd64/Opterons for the same result in most cases
15:44.24 clock_ Maloeran: I have two penguin plush dolls, one 60cm high, another 15cm high
15:44.43 Maloeran movss for memory load that is
15:45.29 clock_ Maloeran: you can't really expect me to understand movlps by heart when I am working on a 186 platform and the last time I wrote assembly for fun, the latest processor was Pentium
15:47.07 Maloeran Eheh, sorry. In a context of scalar operations, movlps loads 64 bits from memory into xmm register and leaves the upper 64 bits untouched, movss loads 32 bits from memory and clears the upper 96 bits to zero
15:47.54 Maloeran Especially when the load is followed by a shuffle to replicate the float 4 times in the register, as it's often the case
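The two load styles Maloeran compares can be written with SSE intrinsics. This is a sketch under assumptions: the function names are made up, the fallback branch exists only so the code builds on targets without SSE, and the compiler is free to choose instructions other than movss or movlps for these intrinsics. The cycle counts quoted in the log are not checked here.

```c
#include <assert.h>
#include <string.h>

#if defined(__SSE__)
#include <xmmintrin.h>

/* movss-style load: reads 32 bits, zeroes the upper 96 bits,
 * then a shuffle replicates lane 0 into all four lanes. */
static void splat_via_movss(const float *src, float out[4])
{
    __m128 v = _mm_load_ss(src);
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    _mm_storeu_ps(out, v);
}

/* movlps-style load: reads 64 bits into the low half and leaves the
 * upper half as it was. It touches src[0] and src[1], so 8 bytes must
 * be readable, which is why it is only a substitute "in most cases". */
static void splat_via_movlps(const float src[2], float out[4])
{
    __m128 v = _mm_setzero_ps();   /* stand-in for whatever was in the register */
    v = _mm_loadl_pi(v, (const __m64 *)src);
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    _mm_storeu_ps(out, v);
}
#else
/* Portable fallback with the same observable result. */
static void splat_via_movss(const float *src, float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = src[0];
}

static void splat_via_movlps(const float src[2], float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = src[0];
}
#endif
```

After the shuffle both variants leave the same four floats in the register, because the broadcast only reads lane 0 and the extra data movlps brought in is discarded.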
15:49.07 clock_ does it calculate correctly?
15:49.13 clock_ Or does it divide like Pentium?
15:49.49 Maloeran Sure it's correct, and they fixed most of the "rounding mode" and denormals mess
15:50.12 clock_ wow!
15:50.19 clock_ Correct floating point implementation!
15:50.23 Maloeran The instruction set is still a mess and the instruction encoding is atrociously long because all the short opcodes are used for legacy 8086
15:50.33 clock_ Like I worked with some arm920t from Cirrus Logic and they had crappy FPU
15:50.40 clock_ sometimes it produced opposite sign etc. :)
15:50.46 Maloeran Woohoo :)
15:51.08 clock_ sometimes you had to wait a bit so it wouldn't make mistake etc. :)
15:51.14 Maloeran It's nowhere near the elegance and efficiency of Altivec, but... it's usable, unlike mmx
15:51.23 clock_ what is altivec?
15:51.54 Maloeran Apple's SIMD instructions on their IBM processors, G3/G4/G5
15:52.05 clock_ it was crappy, but it had a bold-sounding name MaverickCrunch(TM)
15:52.20 clock_ You know, today it doesn't matter if it works right or wrong - all that matters is the marketing.
15:52.53 clock_ If your engineers cannot fix it, one addition (TM) will do.
15:52.57 Maloeran That's mostly true, unfortunately
15:53.08 clock_ And that's also why I am doing
15:53.16 clock_ and why I bought an old 8-bit computer yesterday.
15:53.25 clock_ I want to have at least one BugFree(TM) computer at home
15:53.35 clock_ It's the same model I had as a kid.
15:54.58 Maloeran Sounds nice. I grew up with a 486 and a Pentium 133
15:55.20 clock_ You never rode a healthy silicon horse :)
15:55.43 clock_ healthy pony better than a sick stallion
15:57.35 clock_ But Federico Faggin was at least able to do it right on the first try
15:57.38 Maloeran The stallion doesn't run straight and occasionally crashes in stuff on the way, but it's still better
15:58.08 Maloeran Not a name I'm familiar with, not finding much on google
16:00.42 clock_ The guy who designed Z80
16:03.37 archivist Z80 was slow
16:04.49 clock_ yes Pentium 4 @ 3GHz is faster
16:04.54 archivist 2meg 65C02 is da man in those days
16:05.04 clock_ 6502 was buggy
16:06.01 clock_ omg the old discussion what was better, whether a buggy 6502 virtually without registers that took few cycles per instruction or BugFree(TM) Z80 with tons of registers that took at least 4 ticks per instruction
16:06.26 clock_ "and Z80 didn't have the CRS instruction!"
16:06.40 clock_ CRS = CRash System
16:27.42 brlcad yay, ponies
22:21.55 *** join/#brlcad Twingy (n=justin@
23:10.35 ``Erik o.O
23:19.06 ``Erik /nick quanzaclause
