03:08.05 |
Maloeran |
Erik or anyone else : is there a portable
solution for detecting the size of pointers at preprocessor time?
I'm not sure __WORDSIZE is really portable |
03:25.24 |
brlcad |
Maloeran: I'm not aware of a truly portable
means to do that |
03:26.15 |
brlcad |
probably the closest that comes to mind would
be some hack-trick of defining some internal struct with a pointer
in it and using the offsetof() macro |
03:45.16 |
Maloeran |
I need to know at preprocessor time, sizeof()
and offsetof() are not available then |
03:45.54 |
Maloeran |
If there's no standard cpp macro available,
the only thing I can think of is sticking some test in configure.ac
to dump the result in config.h and use that |
03:46.20 |
brlcad |
offsetof is a preprocessor macro |
03:46.51 |
brlcad |
though it probably still won't evaluate to
something like a numeral .. hmm |
03:48.16 |
Maloeran |
I just need to know if I'm dealing with 32- or
64-bit pointers at compile time when doing some messy SSE
operations on pointers |
03:49.19 |
Twingy |
write a little test program that returns the
size |
03:49.38 |
Twingy |
return sizeof(ptr_t); |
03:50.14 |
Twingy |
put the test in configure.ac |
03:50.28 |
Twingy |
either way you gotta do it |
03:51.04 |
brlcad |
ooh, for *that* |
03:51.19 |
brlcad |
yeah, just make a little program .. that gets
shoved into the configure script rather trivially |
03:51.25 |
brlcad |
then you have your symbol |
03:51.40 |
brlcad |
for that matter, there are predefined autoconf
macros to do exactly that for you already |
03:51.40 |
Twingy |
why not use the HAVE_64bits |
03:52.04 |
Twingy |
AC_LONG_64_BITS |
03:52.08 |
brlcad |
# figure out what size pointers the compiler
is actually generating |
03:52.09 |
brlcad |
AC_CHECK_SIZEOF(int) |
03:52.09 |
brlcad |
AC_CHECK_SIZEOF(long) |
03:52.09 |
brlcad |
AC_CHECK_SIZEOF(long long) |
03:52.09 |
brlcad |
AC_CHECK_SIZEOF(void *, 4) |
03:52.18 |
Twingy |
AC_LONG_64_BITS is shorter :) |
03:52.32 |
Maloeran |
I just put these things in
configure.ac? |
03:52.33 |
brlcad |
that just checks if longs are 64
bits |
03:52.43 |
brlcad |
depends what it is exactly that you want to
know |
03:52.59 |
Maloeran |
I want to know the size of pointers, integer
data types are already all defined in limits.h |
03:53.04 |
Twingy |
all depends on what OS/Arch you're
supporting |
03:53.19 |
brlcad |
then probably just that last one, using void *
or char * etc |
03:53.30 |
brlcad |
it'll give you a preprocessor symbol for the
result |
03:54.04 |
Twingy |
I haven't set up a mail server since before I
joined ARL |
03:54.20 |
Twingy |
doing postfix + imap + dns + web + roundcube
is making me type too much |
03:55.20 |
Maloeran |
Neat, thanks. I got my SIZEOF_VOID_P
symbol |
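The approach settled on above can be sketched in C as follows. This is a minimal sketch, assuming config.h is generated by configure with AC_CHECK_SIZEOF(void *) and that the build defines HAVE_CONFIG_H. The UINTPTR_MAX fallback, the ptr_bits_t typedef and pointer_bits() are illustrative names that do not come from the discussion, and the fallback assumes uintptr_t is the same width as a pointer.

```c
#include <stdint.h>

/* SIZEOF_VOID_P normally comes from config.h, written by configure when
 * configure.ac contains AC_CHECK_SIZEOF(void *). */
#ifdef HAVE_CONFIG_H
#  include "config.h"
#endif

/* Assumed fallback for builds without config.h: derive the size from
 * UINTPTR_MAX, which is usable in #if (sizeof and offsetof are not). */
#ifndef SIZEOF_VOID_P
#  if defined(UINTPTR_MAX) && UINTPTR_MAX == 0xffffffffu
#    define SIZEOF_VOID_P 4
#  elif defined(UINTPTR_MAX) && UINTPTR_MAX == 0xffffffffffffffffu
#    define SIZEOF_VOID_P 8
#  else
#    error "cannot determine pointer size; run configure"
#  endif
#endif

/* Pick an integer type wide enough to hold a pointer's bits (hypothetical name). */
#if SIZEOF_VOID_P == 8
typedef uint64_t ptr_bits_t;
#elif SIZEOF_VOID_P == 4
typedef uint32_t ptr_bits_t;
#else
#  error "unsupported pointer size"
#endif

/* Compile-time cross-check: the array size is negative, and compilation
 * fails, if the preprocessor symbol disagrees with the compiler. */
typedef char ptr_size_check[(sizeof(void *) == SIZEOF_VOID_P) ? 1 : -1];

/* Pointer width in bits, as known to the preprocessor. */
int pointer_bits(void)
{
    return SIZEOF_VOID_P * 8;
}
```

Code that needs different SSE paths for 32- and 64-bit pointers can then branch with `#if SIZEOF_VOID_P == 8`.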
03:56.50 |
Twingy |
I've almost saved a bulgogi lunch worth of
energy |
03:57.08 |
Twingy |
in terms of cost |
03:57.36 |
Twingy |
1 bulgogi lunch buys you 50kWh of electricity,
heh |
03:58.55 |
Maloeran |
Yes, that's a much more standard
unit |
03:59.28 |
Twingy |
ramen is a good one |
03:59.36 |
Twingy |
I might have to do kiloramen though |
03:59.41 |
Twingy |
otherwise the number gets too big |
04:02.06 |
Twingy |
794 960 joules |
04:03.00 |
Twingy |
4.5285 ramen per kilowatt hour |
04:03.09 |
Twingy |
a hair more than a twinkie |
04:04.14 |
Twingy |
my roof reminds me of the space race in
civilization |
04:04.15 |
brlcad |
how much are you using for ramen
cost? |
04:04.26 |
Twingy |
190 calories = 4.5285 kWh |
04:05.29 |
Twingy |
the inverse of that rather |
04:05.54 |
Twingy |
humans require 2.0 - 2.3 kWh a day to
live |
04:11.22 |
brlcad |
erm RDA is 2000 calories average, most eat 3k
.. i think i average about 4k with the workouts |
04:16.29 |
Twingy |
I try to stay around 2000 with a 2 mile run
every other day |
04:16.49 |
brlcad |
that's a pretty sure way to lose
weight |
04:17.55 |
brlcad |
probably burning about 200-500 for the
immediate run, plus a few hundred more residual through the
day |
04:18.33 |
Twingy |
I run after work |
04:18.43 |
Twingy |
which makes it difficult |
04:19.07 |
Twingy |
been doing this for the last 3 years or
so |
04:19.53 |
Twingy |
hrm |
04:19.59 |
Twingy |
imap is not letting me log in |
04:20.23 |
Twingy |
problem to solve tomorrow, bed time |
05:10.36 |
*** join/#brlcad dragonlake
(n=dragonla@221.221.238.208) |
08:07.32 |
*** join/#brlcad clock_
(n=clock@zux221-122-143.adsl.green.ch) |
15:15.05 |
Maloeran |
It's really hard to believe no one taught
compilers how to manage registers properly yet |
15:15.27 |
clock_ |
Maloeran: that's the theory of register
allocation |
15:15.48 |
clock_ |
You make a variable lifetime
analysis |
15:16.31 |
clock_ |
and then colour the DAG with the same number
of colours as you have registers |
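The colouring idea described above can be sketched in C. This is a minimal sketch, assuming the graph being coloured is an interference graph in which two variables are connected when their lifetimes overlap. It uses a simple greedy colouring, not the allocator of any particular compiler, and colour_registers, MAX_VARS and MAX_REGS are hypothetical names.

```c
#include <stdbool.h>

#define MAX_VARS 16
#define MAX_REGS 16

/* Greedy colouring of an interference graph.
 * interferes[a][b] is true when variables a and b are live at the same time
 * and so cannot share a register. On return reg_of[v] holds the register
 * index given to variable v, or -1 if v must be spilled to the stack.
 * Returns the number of spilled variables. */
int colour_registers(int nvars, bool interferes[MAX_VARS][MAX_VARS],
                     int nregs, int reg_of[MAX_VARS])
{
    int spills = 0;

    if (nregs > MAX_REGS)
        nregs = MAX_REGS;

    for (int v = 0; v < nvars; v++) {
        bool taken[MAX_REGS] = { false };

        /* Mark registers already held by interfering, earlier variables. */
        for (int u = 0; u < v; u++)
            if (interferes[v][u] && reg_of[u] >= 0)
                taken[reg_of[u]] = true;

        /* Take the lowest-numbered free register, if there is one. */
        reg_of[v] = -1;
        for (int r = 0; r < nregs; r++) {
            if (!taken[r]) {
                reg_of[v] = r;
                break;
            }
        }
        if (reg_of[v] < 0)
            spills++;
    }
    return spills;
}
```

Greedy colouring is not optimal in general; the order in which variables are visited changes how many end up spilled.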
15:18.02 |
Maloeran |
That's some nice theory, all the
implementations are terribly broken in practice |
15:18.28 |
clock_ |
you mean like gcc producing |
15:18.30 |
clock_ |
mov ax,cx |
15:18.32 |
clock_ |
mov ax,bx |
15:18.33 |
clock_ |
? |
15:19.11 |
Maloeran |
The list of horrors goes on and on... Typical
nonsense: Load A into xmm0, load B into xmm1, move xmm0 to xmm2,
load C into xmm0 |
15:19.48 |
clock_ |
It could have loaded A right into xmm2,
right? |
15:20.08 |
clock_ |
Maloeran: but then the algorithm is
wrong |
15:20.22 |
clock_ |
it should have figured out it's a single
variable and give it a single register |
15:20.27 |
clock_ |
and not smear it all around |
15:21.04 |
Maloeran |
These are not even "variables", just internal
temporaries |
15:21.34 |
clock_ |
well, it then needs internal temporary
dependence graph colouring :) |
15:21.49 |
Maloeran |
It's really a mess, and it's saturated with such
inefficient use of registers |
15:22.03 |
clock_ |
it's easier for the ivory tower kooks from gcc
when they live in an illusion they are good than if they actually
did something that is really good |
15:23.51 |
Maloeran |
Last time I quickly rewrote a big chunk of
code in assembly, it was 30% faster just because of the half-decent
register management |
15:23.51 |
clock_ |
Maloeran: tell them about the problem - and
they will ignore you. Insist on a solution to the problem - they will
mark you as their enemy |
15:24.11 |
Maloeran |
GCC has got to understand that the variables
in inner loops are _more_ important, and to keep the good stuff in
registers instead of hitting the stack constantly |
15:24.25 |
clock_ |
Maloeran: yes but everyone will tell you that
today it doesn't pay off to write assembly code because today's
compilers produce better code than a human assembly
writer |
15:24.41 |
Maloeran |
I heard that many times, compilers are
pathetic |
15:25.02 |
clock_ |
Maloeran: mov ax, cx mov ax, bx should have
been caught at least by the peephole optimization! |
15:25.12 |
clock_ |
But this shows that even such a trivial
peephole is not programmed in |
15:25.19 |
clock_ |
mov r1, r2 |
15:25.24 |
clock_ |
mov r1, r3 translates into |
15:25.27 |
clock_ |
mov r1, r3 |
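The peephole rule described above can be sketched in C. This is a minimal sketch, assuming a toy instruction list that contains only register-to-register moves in "mov dst, src" form; the mov_insn type and peephole_dead_moves are hypothetical names, and a real pass would also have to respect flags, memory operands and branch targets.

```c
#include <stddef.h>

/* One register-to-register move: mov dst, src (registers are small integers). */
typedef struct {
    int dst;
    int src;
} mov_insn;

/* Remove a move whose destination is overwritten by the very next move
 * before anything can read it. The next move must not use that register
 * as its source, otherwise the first move is still needed.
 * Compacts the array in place and returns the new instruction count. */
size_t peephole_dead_moves(mov_insn *code, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n &&
            code[i + 1].dst == code[i].dst &&
            code[i + 1].src != code[i].dst)
            continue;               /* dead store: drop it */
        code[out++] = code[i];
    }
    return out;
}
```

With the pair from the discussion, `mov r1, r2` followed by `mov r1, r3`, only `mov r1, r3` survives.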
15:25.57 |
clock_ |
Maloeran: but there are worse problems in the
world than bad compiler output |
15:25.59 |
Maloeran |
It does that frequently when shifting by a
variable count of bits, where the count must be in %rcx |
15:26.01 |
clock_ |
for example a lack of sex |
15:26.13 |
Maloeran |
But it will never load the value in that
register directly, just move it around |
15:26.47 |
clock_ |
Maloeran: why do you care? What kind of code
do you write that you need speed? |
15:26.58 |
Maloeran |
High-performance ray-tracing code ;) |
15:27.04 |
clock_ |
BRL-CAD? |
15:27.13 |
Maloeran |
Yes, the next raytracer of BRL-CAD |
15:27.23 |
clock_ |
are you paid for that? |
15:27.26 |
Maloeran |
Sure |
15:27.34 |
clock_ |
I want to be paid for such things :) |
15:28.07 |
Maloeran |
:) These are interesting problems to play
with, it's great to be able to do that full-time |
15:28.22 |
clock_ |
Maloeran: I can do fast programs even without
assembly |
15:28.42 |
clock_ |
Maloeran: for example run the Links browser.
Display some big fat JPEG so that it is rescaled in the
process. |
15:29.19 |
clock_ |
Then relax and realize that the rescaling is
performed in linear photometric space with 48bits per pixel and
there is gamma correction and dithering applied after, even on
24bpp display |
15:29.27 |
clock_ |
And it's not even in assembly. |
15:29.40 |
clock_ |
But people tend to say my dither.c is hard to
understand |
15:29.41 |
Maloeran |
Using mmx there? |
15:29.45 |
clock_ |
no mmx |
15:29.49 |
clock_ |
just ordinary C compiler output |
15:30.14 |
Maloeran |
Nice, though the problem is rather
simple |
15:30.23 |
clock_ |
Maloeran: I realized Linux people don't like
self-modifying code |
15:30.33 |
clock_ |
so I found out how to work around this
limitation |
15:30.53 |
clock_ |
I generate a separate routine for every memory
organization using a #define template :) |
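The "#define template" trick mentioned above can be sketched in C. This is a minimal sketch, assuming two example pixel layouts (RGB565 and XRGB8888); it is not the actual Links dither.c code, and DEFINE_STORE_ROW, the PACK_* macros and the store_row_* functions are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into one pixel for each assumed memory organization. */
#define PACK_RGB565(r, g, b) \
    ((uint16_t)((((r) >> 3) << 11) | (((g) >> 2) << 5) | ((b) >> 3)))
#define PACK_XRGB8888(r, g, b) \
    (((uint32_t)(r) << 16) | ((uint32_t)(g) << 8) | (uint32_t)(b))

/* Template: expands to one complete row-conversion routine per layout, so
 * the inner loop holds no runtime test of the pixel format. */
#define DEFINE_STORE_ROW(name, pixel_t, PACK)                       \
    void name(pixel_t *dst, const uint8_t *rgb, size_t npixels)     \
    {                                                               \
        for (size_t i = 0; i < npixels; i++) {                      \
            uint8_t r = rgb[3 * i];                                 \
            uint8_t g = rgb[3 * i + 1];                             \
            uint8_t b = rgb[3 * i + 2];                             \
            dst[i] = PACK(r, g, b);                                 \
        }                                                           \
    }

/* Instantiate a separate routine for every supported layout. */
DEFINE_STORE_ROW(store_row_rgb565, uint16_t, PACK_RGB565)
DEFINE_STORE_ROW(store_row_xrgb8888, uint32_t, PACK_XRGB8888)
```

The cost is the one named in the discussion: the binary carries one copy of the loop per layout, in exchange for a loop specialized at compile time.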
15:30.56 |
Maloeran |
Processors generally don't like it much, but
it's worth it if you modify once and execute a million
times |
15:31.16 |
Maloeran |
Would you have an amd64 opcode emitter at
hand? |
15:31.25 |
clock_ |
Well - all the Linux folks achieved with their
anti-self-modifying-code stance is that the code has to be
bigger |
15:31.30 |
clock_ |
but is as fast :) |
15:31.46 |
clock_ |
what is an opcode emitter? |
15:32.07 |
Maloeran |
To be able to generate binary encoding of
instructions at runtime from code, to be able to run it |
15:32.52 |
clock_ |
you mean to link an assembler into the program
and then the program compiles parts of itself on the fly? |
15:33.09 |
Maloeran |
More or less, the program generates optimized
pipelines for the task at hand and executes them |
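What an opcode emitter does can be sketched in C. This is a minimal sketch, assuming only two x86/amd64 instructions, `mov eax, imm32` (encoded as B8 followed by a little-endian 32-bit immediate) and `ret` (encoded as C3). The emitter struct and the emit_* functions are hypothetical names. Running the generated bytes would additionally need executable memory, which is platform-specific and left out here.

```c
#include <stddef.h>
#include <stdint.h>

/* Growing cursor over a caller-supplied code buffer. */
typedef struct {
    uint8_t *buf;
    size_t len;
    size_t cap;
} emitter;

/* Append one byte; returns 0 on success, -1 if the buffer is full. */
static int emit_byte(emitter *e, uint8_t b)
{
    if (e->len >= e->cap)
        return -1;
    e->buf[e->len++] = b;
    return 0;
}

/* mov eax, imm32  ->  B8 ib ib ib ib (immediate in little-endian order) */
int emit_mov_eax_imm32(emitter *e, uint32_t imm)
{
    if (e->cap - e->len < 5)
        return -1;
    emit_byte(e, 0xB8);
    for (int i = 0; i < 4; i++)
        emit_byte(e, (uint8_t)(imm >> (8 * i)));
    return 0;
}

/* ret  ->  C3 */
int emit_ret(emitter *e)
{
    return emit_byte(e, 0xC3);
}
```

Emitting `mov eax, 42` and then `ret` produces the six bytes B8 2A 00 00 00 C3, which is the body of a function returning 42.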
15:33.09 |
clock_ |
I don't have amd64 assembler at
hand. |
15:33.43 |
clock_ |
Maloeran: what computer did you start
with? |
15:33.58 |
Maloeran |
I would prefer to do that instead of fixed
assembly pipelines, once I get too tired of compiler
incompetence |
15:34.14 |
Maloeran |
I began coding on a 486 |
15:34.36 |
clock_ |
I began basically with assembly on ZX Spectrum
when I was 13. |
15:34.44 |
Maloeran |
Trying to do fancy graphics on the thing, 2d
and 3d, I learned assembly back then |
15:34.58 |
clock_ |
I did fancy graphics too |
15:35.05 |
Maloeran |
Eheh nice. I was 12-13 as well |
15:35.13 |
clock_ |
for example I wrote a doom engine where there
was a bathroom where there was 10 cm of blood on the
floor |
15:35.28 |
clock_ |
when you walked there, it did real waves and
circles like on water which reflected off the walls |
15:35.55 |
Maloeran |
Impressive, I struggled for a while to
understand the basics of 3d rendering back then, quaternions
especially |
15:35.56 |
clock_ |
and when you killed an enemy, blood sprayed
around the screen and then the drops slowly moved down |
15:36.59 |
Maloeran |
Doing any work on or related to BRL-CAD
lately? |
15:37.09 |
clock_ |
no but I would like to |
15:37.18 |
clock_ |
now I work as a C/ASM programmer on an
embedded 186 platform |
15:37.31 |
Maloeran |
Eheh, neat |
15:37.57 |
clock_ |
but we are using Borland C compiler where the
optimizations don't work at all even if there are flags for it. I
find this compiler a big turnoff |
15:38.01 |
clock_ |
It's buggy too |
15:38.06 |
clock_ |
and it's a fossil. |
15:38.25 |
clock_ |
and the CPU is buggy |
15:38.51 |
Maloeran |
What are the chips used for? |
15:38.59 |
clock_ |
for a MP3 player |
15:39.05 |
clock_ |
or Internet radio |
15:41.18 |
clock_ |
Maloeran: that's normal with today's
products |
15:41.41 |
clock_ |
Maloeran: if it happens more than once in 5
minutes it's suspicious, but 1 per day is normal today |
15:42.29 |
clock_ |
unfortunately. |
15:43.10 |
Maloeran |
Microsoft really managed to get the masses
used to dealing with crappy software |
15:44.13 |
Maloeran |
Another "detail" : GCC never understood that
movlps only takes 2 cycles instead of the 3 cycles of movss on
amd64/Opterons for the same result in most cases |
15:44.24 |
clock_ |
Maloeran: I have two penguin plush dolls, one
60cm high, another 15cm high |
15:44.43 |
Maloeran |
movss for memory load that is |
15:45.29 |
clock_ |
Maloeran: you can't really expect me to
understand movlps by heart when I am working on a 186 platform and
the last time I wrote assembly for fun, the latest processor was
Pentium |
15:47.07 |
Maloeran |
Eheh, sorry. In a context of scalar
operations, movlps loads 64 bits from memory into an xmm register and
leaves the upper 64 bits untouched, movss loads 32 bits from memory
and clears the upper 96 bits to zero |
15:47.54 |
Maloeran |
Especially when the load is followed by a
shuffle to replicate the float 4 times in the register, as is
often the case |
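The load-then-shuffle pattern described above can be sketched with SSE intrinsics in C. This is a minimal sketch, assuming an x86 target with SSE and the standard `<xmmintrin.h>` intrinsics, where `_mm_load_ss` corresponds to movss and `_mm_loadl_pi` to movlps. The compiler is free to pick other instructions, and broadcast_movss and broadcast_movlps are hypothetical names.

```c
#include <xmmintrin.h>

/* movss-style load: reads 32 bits and zeroes the upper 96 bits, then
 * replicates lane 0 into all four lanes with a shuffle. */
void broadcast_movss(const float *p, float out[4])
{
    __m128 v = _mm_load_ss(p);
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    _mm_storeu_ps(out, v);
}

/* movlps-style load: reads 64 bits into the low half and keeps the upper
 * half of its first argument. Note that it touches 8 bytes, so p must point
 * to at least two readable floats. The shuffle then gives the same result
 * because only lane 0 is replicated. */
void broadcast_movlps(const float *p, float out[4])
{
    __m128 v = _mm_loadl_pi(_mm_setzero_ps(), (const __m64 *)p);
    v = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
    _mm_storeu_ps(out, v);
}
```

Both functions fill out[] with four copies of p[0]; whether the movlps form is actually faster depends on the processor, as claimed in the discussion for Opterons.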
15:49.07 |
clock_ |
does it calculate correctly? |
15:49.13 |
clock_ |
Or does it divide like Pentium? |
15:49.49 |
Maloeran |
Sure it's correct, and they fixed most of the
"rounding mode" and denormals mess |
15:50.12 |
clock_ |
wow! |
15:50.19 |
clock_ |
Correct floating point
implementation! |
15:50.23 |
Maloeran |
The instruction set is still a mess and the
instruction encoding is atrociously long because all the short
opcodes are used for legacy 8086 |
15:50.33 |
clock_ |
Like I worked with some arm920t from Cirrus
Logic and they had crappy FPU |
15:50.40 |
clock_ |
sometimes it produced opposite sign etc.
:) |
15:50.46 |
Maloeran |
Woohoo :) |
15:51.08 |
clock_ |
sometimes you had to wait a bit so it wouldn't
make a mistake etc. :) |
15:51.14 |
Maloeran |
It's nowhere near the elegance and efficiency
of Altivec, but... it's usable, unlike mmx |
15:51.23 |
clock_ |
what is altivec? |
15:51.54 |
Maloeran |
The SIMD instructions on Apple's PowerPC
processors, G4/G5 |
15:52.05 |
clock_ |
it was crappy, but it had a bold-sounding name
MaverickCrunch(TM) |
15:52.20 |
clock_ |
You know today it doesn't matter if it works
right or wrong - all that matters is the marketing. |
15:52.53 |
clock_ |
If your engineers cannot fix it, one addition
(TM) will do. |
15:52.57 |
Maloeran |
That's mostly true, unfortunately |
15:53.08 |
clock_ |
And that's also why I am doing http://ronja.twibright.com |
15:53.16 |
clock_ |
and why I bought an old 8-bit computer
yesterday. |
15:53.25 |
clock_ |
I want to have at least one BugFree(TM)
computer at home |
15:53.35 |
clock_ |
It's the same model I had as a kid. |
15:54.58 |
Maloeran |
Sounds nice. I grew up with a 486 and a
Pentium 133 |
15:55.20 |
clock_ |
You never rode a healthy silicon horse
:) |
15:55.43 |
clock_ |
healthy pony better than a sick
stallion |
15:57.35 |
clock_ |
But Federico Faggin was at least able to do
it right on the first try |
15:57.38 |
Maloeran |
The stallion doesn't run straight and
occasionally crashes in stuff on the way, but it's still
better |
15:58.08 |
Maloeran |
Not a name I'm familiar with, not finding much
on google |
16:00.42 |
clock_ |
The guy who designed Z80 |
16:03.37 |
archivist |
Z80 was slow |
16:04.49 |
clock_ |
yes Pentium 4 @ 3GHz is faster |
16:04.54 |
archivist |
2meg 65C02 is da man in those days |
16:05.04 |
clock_ |
6502 was buggy |
16:06.01 |
clock_ |
omg the old discussion what was better,
whether a buggy 6502 virtually without registers that took few
cycles per instruction or BugFree(TM) Z80 with tons of registers
that took at least 4 ticks per insn |
16:06.26 |
clock_ |
"and Z80 didn't have the CRS
instruction!" |
16:06.40 |
clock_ |
CRS = CRash System |
16:27.42 |
brlcad |
yay, ponies |
22:21.55 |
*** join/#brlcad Twingy
(n=justin@74.92.144.217) |
23:10.35 |
``Erik |
o.O |
23:19.06 |
``Erik |
/nick quanzaclause |