00:06.57 |
Notify |
BRL-CAD Wiki:Bhollister * 9118
/wiki/User:Bhollister/DevLogJuly2015: /* Mon, July 27, 2015: Start
of Week 10 (of 14) */ |
00:14.56 |
starseeker |
bhollister: unfortunately, a quick scan
through the code suggests there isn't an nmg_visit_*
example |
00:15.14 |
starseeker |
bhollister: I'd suggest writing a small test
program to exercise the various functions |
00:31.10 |
vasc |
weird |
00:31.41 |
vasc |
my code worked TOO WELL |
00:33.07 |
vasc |
yeah i knew it |
00:33.17 |
vasc |
it isn't calling the segment i just
wrote |
00:40.37 |
vasc |
that's more like it |
00:49.40 |
vasc |
uhoh |
01:06.36 |
Notify |
BRL-CAD:starseeker * 65709
brlcad/trunk/src/libged/shape_recognition.cpp: The wmember list
seems to be volatile - take another approach to collecting the
finalize comb info. This needs a lot of cleanup, but at least the
hierarchy does get generated... |
01:10.07 |
*** join/#brlcad vasc__
(~vasc@bl13-114-172.dsl.telepac.pt) |
01:16.14 |
vasc__ |
back to the drawing board. this way of storing
data doesn't work because opencl vectorized loads must be aligned
to the type size. great. |
01:16.33 |
vasc__ |
a week to the trash it is |
01:16.35 |
vasc__ |
hmm |
01:16.38 |
vasc__ |
lets see |
01:16.45 |
vasc__ |
how i can reuse this |
01:30.04 |
Notify |
BRL-CAD:starseeker * 65710
brlcad/trunk/src/libged/shape_recognition.cpp: Set up for a
different approach - create the combs, then edit them after they are
created. |
01:49.49 |
vasc__ |
later |
01:52.36 |
Notify |
BRL-CAD:starseeker * 65711
(brlcad/trunk/include/brep.h
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp
brlcad/trunk/src/libged/shape_recognition.cpp): Start getting set
up for ray shooting. |
02:10.01 |
Notify |
BRL-CAD:starseeker * 65712
(brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp
brlcad/trunk/src/libanalyze/util.cpp
brlcad/trunk/src/libged/shape_recognition.cpp): Go non-parallel for
debugging. |
02:20.18 |
Notify |
BRL-CAD:starseeker * 65713
(brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp
brlcad/trunk/src/libanalyze/util.cpp): back off the rays some more
- got another problem somewhere. |
02:26.29 |
Notify |
BRL-CAD:starseeker * 65714
brlcad/trunk/src/libanalyze/util.cpp: fix initialization when prep
is coming from outside. |
03:57.30 |
*** join/#brlcad gurwinder
(~chatzilla@117.214.205.207) |
04:32.50 |
*** join/#brlcad bhollister2
(~brad@2601:647:cb02:7a00:f04d:35ac:f0ba:5880) |
07:08.18 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9119
/wiki/User:Vasco.costa/GSoC15/logs: |
07:10.59 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9120
/wiki/User:Vasco.costa/GSoC15/logs: |
07:12.46 |
*** join/#brlcad ries
(~ries@D979C47E.cm-3-2d.dynamic.ziggo.nl) |
07:17.59 |
Notify |
BRL-CAD Wiki:MeShubham99 * 9121
/wiki/User:MeShubham99/GSoc15/log_developmen: /* Week 9
*/ |
07:18.48 |
Notify |
BRL-CAD Wiki:MeShubham99 * 9122
/wiki/User:MeShubham99/GSoc15/log_developmen: /* Week 9
*/ |
07:31.36 |
*** join/#brlcad teepee--
(bc5c2134@gateway/web/freenode/ip.188.92.33.52) |
08:07.25 |
*** join/#brlcad dracarys983
(dracarys98@nat/iiit/x-eruahluooryxrrfd) |
08:16.25 |
*** join/#brlcad luca79
(~luca@host129-17-dynamic.4-87-r.retail.telecomitalia.it) |
08:17.29 |
*** join/#brlcad shaina
(~shaina@59.89.100.105) |
08:52.40 |
*** join/#brlcad merzo
(~merzo@user-94-45-58-141.skif.com.ua) |
10:39.14 |
*** join/#brlcad packrat
(~packrator@c-71-231-32-234.hsd1.wa.comcast.net) |
11:08.37 |
*** join/#brlcad jordisayol
(~jordisayo@unaffiliated/jordisayol) |
11:09.00 |
jordisayol |
hello all |
11:10.47 |
jordisayol |
I don't have file upload permission on the brlcad
sourceforge project. Is this a temporary maintenance issue? |
11:30.41 |
*** join/#brlcad luca79
(~luca@host130-19-dynamic.4-87-r.retail.telecomitalia.it) |
11:37.02 |
*** join/#brlcad konrado
(~konro@41.205.22.27) |
11:40.10 |
jordisayol |
Yes, sourceforge file upload is
offline |
11:40.11 |
jordisayol |
http://sourceforge.net/blog/sourceforge-infrastructure-and-service-restoration-update-for-724/ |
12:23.09 |
*** join/#brlcad sofat
(~sofat@202.164.45.204) |
12:32.51 |
*** join/#brlcad andrei_il
(~andrei@109.100.128.78) |
12:46.46 |
*** join/#brlcad sofat
(~sofat@202.164.45.204) |
13:25.54 |
Notify |
BRL-CAD:starseeker * 65715
(brlcad/trunk/include/analyze.h
brlcad/trunk/src/libanalyze/analyze_private.h and 5 others): Pass
in the cpu count. |
13:58.49 |
Notify |
BRL-CAD:carlmoore * 65716
(brlcad/trunk/db/nist/NIST_MBE_PMI_11.stp
brlcad/trunk/db/nist/NIST_MBE_PMI_6.stp): remove trailing white
space |
14:14.57 |
*** join/#brlcad gurwinder
(~chatzilla@117.214.205.207) |
14:18.01 |
*** join/#brlcad ih8sum3r
(~deepak@122.173.163.248) |
14:58.47 |
*** join/#brlcad bhollister2
(~brad@2601:647:cb02:7a00:f04d:35ac:f0ba:5880) |
15:14.10 |
*** join/#brlcad merzo
(~merzo@user-94-45-58-138-1.skif.com.ua) |
15:37.02 |
*** join/#brlcad sofat
(~sofat@202.164.45.204) |
15:38.54 |
Notify |
BRL-CAD:carlmoore * 65717
(brlcad/trunk/db/nist/NIST_MBE_PMI_11.stp
brlcad/trunk/db/nist/NIST_MBE_PMI_6.stp): ----------- |
15:46.41 |
*** join/#brlcad konrado
(~konro@41.205.22.16) |
15:49.33 |
Notify |
BRL-CAD:ejno * 65718
brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp: fix
get_unioned() returning pointers to memory that may later be
freed |
15:54.14 |
Notify |
BRL-CAD:carlmoore * 65719
(brlcad/trunk/src/conv/3dm/3dm-g.cpp
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp and 2
others): fix spellings; and, in 3dm-g.cpp, implement '?' as
option |
16:26.02 |
Notify |
BRL-CAD:carlmoore * 65720
(brlcad/trunk/src/util/bw-ps.c brlcad/trunk/src/util/pix-ps.c):
cosmetic changes for bw-ps.c and pix-ps.c to look more alike
(bw-ps.c has had the placement of 2 routines shifted) |
16:26.05 |
*** part/#brlcad gurwinder
(~chatzilla@117.214.205.207) |
16:26.34 |
*** join/#brlcad gurwinder
(~chatzilla@117.214.205.207) |
16:32.48 |
Notify |
BRL-CAD:carlmoore * 65721
brlcad/trunk/src/util/pix-ps.c: shift location of 'char Stdin' to
make pix-ps.c resemble bw-ps.c that more closely |
16:46.22 |
*** join/#brlcad vasc
(~vasc@bl13-114-172.dsl.telepac.pt) |
16:49.06 |
vasc |
http://www.cnet.com/news/insane-flying-semi-truck-sets-jump-record-nearly-takes-out-building/ |
17:14.02 |
*** join/#brlcad sofat
(~sofat@202.164.45.212) |
17:36.10 |
*** join/#brlcad sofat
(~sofat@202.164.45.204) |
17:51.49 |
Notify |
BRL-CAD:starseeker * 65722
(brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp
brlcad/trunk/src/libanalyze/util.cpp): Add some debug
printing. |
18:24.26 |
Notify |
BRL-CAD:starseeker * 65723
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: off by one
errors don't help plotting any... |
18:30.21 |
Notify |
BRL-CAD:starseeker * 65724
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: plot gaps
while we're at it. |
18:30.45 |
Notify |
BRL-CAD:ejno * 65725
(brlcad/trunk/src/libgcv/conv/fastgen4/NOTES
brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): update
notes |
18:57.55 |
*** join/#brlcad bhollister
(~behollis@dhcp-59-221.cse.ucsc.edu) |
19:14.33 |
Notify |
BRL-CAD:starseeker * 65726
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: Start
looking for missing gaps. |
19:17.11 |
Notify |
BRL-CAD:starseeker * 65727
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: Move to the
next hit in that case rather than breaking out of the
loop... |
19:43.54 |
Notify |
BRL-CAD:starseeker * 65728
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: plot the
missing gaps. |
19:55.47 |
Notify |
BRL-CAD:brlcad * 65729
brlcad/trunk/src/librt/primitives/datum/datum.c: slightly bigger
points |
20:08.49 |
Notify |
BRL-CAD Wiki:Terry.e.wen * 9123
/wiki/User:Terry.e.wen/log: |
20:09.07 |
Notify |
BRL-CAD Wiki:Terry.e.wen * 9124
/wiki/User:Terry.e.wen/log: |
20:25.36 |
Notify |
BRL-CAD:starseeker * 65730
(brlcad/trunk/include/analyze.h
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp
brlcad/trunk/src/libged/shape_recognition.cpp): Need to create
candidates to raytrace, but we don't seem to have everything ready.
Needs more investigation. |
20:51.31 |
Notify |
BRL-CAD Wiki:Deekaysharma * 9125
/wiki/User:Deekaysharma/logs: |
20:52.38 |
Notify |
BRL-CAD Wiki:Deekaysharma * 9126
/wiki/User:Deekaysharma/logs: |
20:53.27 |
*** part/#brlcad ih8sum3r
(~deepak@122.173.163.248) |
21:06.30 |
Notify |
BRL-CAD:brlcad * 65731
brlcad/trunk/src/libdm/dm-ogl.c: draw smooth points (circles
instead of squares) |
21:08.04 |
Notify |
BRL-CAD:brlcad * 65732
brlcad/trunk/src/libdm/dm-X.c: draw circles instead of a rectangle
when plotting points. this requires a little creativity as there
are limitations with X11 not wanting to draw small circles without
drawing both the exterior and the interior. |
21:13.50 |
Notify |
BRL-CAD:brlcad * 65733
(brlcad/trunk/src/libdm/dm-ogl.c brlcad/trunk/src/libdm/dm-osgl.cpp
and 2 others): oof, too many duplicate opengl callers. make them
all draw smooth points. might pose an issue for large point clouds
and rtgl. |
21:27.00 |
Notify |
BRL-CAD:brlcad * 65734
brlcad/trunk/src/libdm/dm-rtgl.c: remove unused functions |
21:49.59 |
*** join/#brlcad __monty__
(~toonn@d51A5489B.access.telenet.be) |
21:51.53 |
Notify |
BRL-CAD Wiki:202.164.45.204 * 9127
/wiki/User:Hiteshsofat/GSoc15/log_developmen: |
21:53.29 |
*** part/#brlcad __monty__
(~toonn@d51A5489B.access.telenet.be) |
22:17.47 |
dracarys983 |
brlcad: I have initialized a new struct bu_vls
using BU_GET() first and then bu_vls_init(). But using it doesn't
print to the MGED window. What might be the problem? |
22:35.42 |
Notify |
BRL-CAD:starseeker * 65735
(brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp
brlcad/trunk/src/libged/shape_recognition.cpp): getting crashes
with the raytracing now... |
22:39.49 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9128
/wiki/User:Vasco.costa/GSoC15/logs: |
22:43.19 |
Notify |
BRL-CAD:starseeker * 65736
brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: rt_clean
causes things to hang - must not be using it right |
22:44.27 |
starseeker |
grrrr |
22:54.16 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9129
/wiki/User:Vasco.costa/GSoC15/logs: |
22:55.24 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9130
/wiki/User:Vasco.costa/GSoC15/logs: /* Development Status
*/ |
22:56.30 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9131
/wiki/User:Vasco.costa/GSoC15/logs: |
22:56.59 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9132
/wiki/User:Vasco.costa/GSoC15/logs: |
23:00.27 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9133
/wiki/User:Vasco.costa/GSoC15/logs: |
23:01.34 |
vasc |
time to work on tor and bot i guess |
23:01.45 |
vasc |
hmmm. dinner first. |
23:02.24 |
Stragus |
How is it going vasc? You had some issues with
OpenCL alignment requirements?... |
23:02.39 |
vasc |
well the vector load instructions require size
alignment |
23:02.48 |
vasc |
so my previous plan to use AoS was a
bust |
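A minimal sketch of the alignment problem vasc describes, in plain C rather than OpenCL. The record layout (a 4-byte id followed by a 16-byte vector) and the names `is_aligned`, `align_up`, and `vector_offset_after_id` are assumptions made for illustration; the log does not show the layout vasc actually used.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero when `offset` is a multiple of `size`; `size` must be a power of two. */
int is_aligned(size_t offset, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (offset & (size - 1)) == 0;
}

/* Round `offset` up to the next multiple of `size` (a power of two). */
size_t align_up(size_t offset, size_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (offset + size - 1) & ~(size - 1);
}

/*
 * Hypothetical packed record: a 4-byte id followed by a 16-byte vector.
 * Returns the byte offset at which the vector starts, with or without
 * padding inserted to satisfy the vector's size alignment.
 */
size_t vector_offset_after_id(int pad_for_alignment)
{
    size_t offset = 4;            /* bytes taken by the hypothetical id field */
    size_t vector_size = 16;      /* e.g. four 4-byte floats */

    return pad_for_alignment ? align_up(offset, vector_size) : offset;
}
```

Packed tightly, the vector lands at offset 4, which is not a multiple of 16; padding moves it to offset 16 at the cost of 12 wasted bytes per record.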
23:03.00 |
vasc |
anyway its done |
23:03.15 |
Stragus |
Told you so :p |
23:03.33 |
vasc |
well i thought they would have relaxed that by
now |
23:03.37 |
vasc |
but they didn't |
23:03.49 |
Stragus |
It's still a lot slower even when the hardware
allows it |
23:03.55 |
vasc |
its like SPARC RISC programming all over
again... |
23:04.25 |
vasc |
sure. but the data is packed more
tightly. |
23:04.41 |
Stragus |
On CUDA hardware, have the whole warp fetch 32
consecutive floats: it's either 1 or 4 memory
transactions |
23:04.53 |
vasc |
not that it probably matters in this app since
the number of objects seems to be real low |
23:05.00 |
Stragus |
If AoS, you get 32 memory transactions and you
don't have enough vmem bandwidth to feed all the cores
properly |
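Stragus's transaction counts can be reproduced with a small host-side model. This is only a sketch: it assumes each of the 32 lanes reads one 4-byte float at `base + lane * stride`, that `base` is 4-byte aligned, and that memory is fetched in fixed-size segments. The function name `count_segments` and the segment sizes used are illustrative, not taken from any vendor documentation.

```c
#include <assert.h>
#include <stddef.h>

#define WARP_SIZE 32  /* lanes per warp, as in the discussion above */

/*
 * Count how many distinct memory segments of `segment` bytes are touched
 * when every lane of a warp reads one float at base + lane * stride.
 */
unsigned count_segments(size_t base, size_t stride, size_t segment)
{
    size_t seen[WARP_SIZE];
    unsigned count = 0;

    assert(segment > 0);
    for (unsigned lane = 0; lane < WARP_SIZE; lane++) {
        size_t seg = (base + lane * stride) / segment;
        unsigned i;

        /* Linear search is fine for 32 lanes. */
        for (i = 0; i < count; i++)
            if (seen[i] == seg)
                break;
        if (i == count)
            seen[count++] = seg;
    }
    return count;
}
```

With consecutive floats (stride 4) the model gives 1 segment at 128 bytes per segment and 4 at 32 bytes per segment; with one float taken from each 128-byte record (an AoS-style stride of 128) it gives 32.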
23:05.01 |
vasc |
you could prolly fit all the objects in L1
cache |
23:05.14 |
Stragus |
It's still a lot slower |
23:05.38 |
vasc |
well i'm using a mix fwiw |
23:05.42 |
Stragus |
Incoherent access in a warp is only fast
within CUDA shared memory (I think OpenCL calls it local
memory?...) |
23:05.53 |
vasc |
yeah its the local memory |
23:06.00 |
Stragus |
But then you better watch for shared memory
bank conflicts |
23:06.12 |
vasc |
but you know the latest GPUs aren't as picky
about that |
23:06.32 |
Stragus |
As picky about what? Incoherent
access? |
23:06.41 |
vasc |
the caches behave more like CPU
caches |
23:06.54 |
Stragus |
Yes yes... but if you have 32 incoherent
accesses, it's still really slow |
23:07.08 |
vasc |
well so far i'm having other issues |
23:07.18 |
Stragus |
CPUs are even worse. In AVX2's vgatherdps
instruction, the loads are *serialized* |
23:07.22 |
vasc |
like two or three orders of magnitude slowdown
from all these bus transfers |
23:07.46 |
Stragus |
Oh, and on Xeon Phi... vgatherdps is not only
serialized, but you have to loop over the instruction until it
tells you it's done. Words fail me to describe how absurd that
is |
23:07.46 |
vasc |
maybe worse for all i know |
23:07.52 |
Stragus |
kicks Intel in the
tibia |
23:08.10 |
Stragus |
Bus transfers? CPU<->GPU? |
23:08.14 |
vasc |
yes |
23:08.15 |
vasc |
so |
23:08.40 |
vasc |
i keep calling a kernel every time i compute a
solid intersection |
23:08.43 |
Stragus |
That'll be resolved when they fix their code
to consume raytraced data right in GPU memory |
23:08.49 |
Stragus |
Ew... |
23:08.56 |
vasc |
well the solid data is in the gpu
now |
23:09.18 |
vasc |
the problem is storing the results and things
like that |
23:09.26 |
vasc |
the dynamic lists of temporaries and shit like
that |
23:09.43 |
vasc |
as i said yesterday |
23:09.46 |
Stragus |
thought the idea of a giant
static buffer allocated dynamically through atomics was a good
idea |
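The "static buffer allocated dynamically through atomics" idea can be sketched on the host with C11 atomics. This is an analogue, not the OpenCL code under discussion: `struct hit_pool`, `pool_init`, `pool_alloc`, and `POOL_CAPACITY` are hypothetical names, and in a kernel the counter would presumably be bumped with the device's own atomic add.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define POOL_CAPACITY 1024  /* hypothetical fixed size of the static buffer */

struct hit_pool {
    atomic_size_t next;             /* index of the first unclaimed slot */
    float slots[POOL_CAPACITY];     /* storage handed out to callers */
};

void pool_init(struct hit_pool *pool)
{
    assert(pool != NULL);
    atomic_init(&pool->next, 0);
}

/*
 * Claim `count` consecutive slots. The atomic fetch-and-add gives each
 * caller a private range without locks. Returns NULL when the range would
 * not fit; note that a failed request still advances the counter, which a
 * real implementation would have to account for.
 */
float *pool_alloc(struct hit_pool *pool, size_t count)
{
    size_t start;

    assert(pool != NULL);
    start = atomic_fetch_add(&pool->next, count);
    if (start > POOL_CAPACITY || count > POOL_CAPACITY - start)
        return NULL;
    return &pool->slots[start];
}
```

Two successive requests for 4 and 8 slots receive the ranges starting at index 0 and index 4.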
23:09.51 |
vasc |
it is |
23:09.58 |
vasc |
but i still need to do a lot of shit
first |
23:10.04 |
Stragus |
Right |
23:10.21 |
Stragus |
I feel I would have fun helping you with
this |
23:10.36 |
vasc |
its getting to a point where its easier to
dive into it |
23:10.38 |
Stragus |
has no idea how that GSoC
stuff works |
23:11.19 |
vasc |
i propose a workplan and if the project leads
accept it, it gets funded by google |
23:11.27 |
Stragus |
I still feel the first step would be to
implement a "hit" callback without any kind of hit
buffering |
23:11.44 |
Stragus |
Then someone can complete the job by putting
fancy buffering with atomics into that callback |
23:11.47 |
vasc |
well |
23:12.02 |
vasc |
the thing is the csg |
23:12.09 |
Stragus |
(Might not be what brlcad told you, and he
certainly has authority on the matter) |
23:12.29 |
vasc |
i think he said i could just do first hit
intersection and ignore the csg as a first approach |
23:12.40 |
Stragus |
Eh well, that also works |
23:12.55 |
Stragus |
If you make it an inlined callback, return 0
to terminate the ray, return 1 to continue |
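A rough sketch of the callback convention Stragus proposes: return 0 to terminate the ray, 1 to continue. Everything here is hypothetical (`struct hit`, `hit_callback`, `walk_hits`, and both example callbacks) and is not BRL-CAD's API. It uses a function pointer for brevity on the host, whereas the "inlined" callback he mentions would be resolved at compile time in a kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hit record: distance along the ray and the solid that was hit. */
struct hit {
    double dist;
    int solid_id;
};

/* Callback convention from the discussion: 0 terminates the ray, 1 continues. */
typedef int (*hit_callback)(const struct hit *h, void *ctx);

/*
 * Deliver hits (assumed sorted by distance) to `cb` until it returns 0.
 * Returns the number of hits delivered.
 */
size_t walk_hits(const struct hit *hits, size_t n, hit_callback cb, void *ctx)
{
    size_t delivered = 0;

    assert(cb != NULL);
    for (size_t i = 0; i < n; i++) {
        delivered++;
        if (cb(&hits[i], ctx) == 0)
            break;
    }
    return delivered;
}

/* First-hit policy: store the hit in *ctx and stop the ray. */
int keep_first_hit(const struct hit *h, void *ctx)
{
    *(struct hit *)ctx = *h;
    return 0;
}

/* Visit-everything policy: count hits in *ctx and keep going. */
int count_hits(const struct hit *h, void *ctx)
{
    (void)h;
    (*(size_t *)ctx)++;
    return 1;
}
```

The traversal code stays the same for both policies; buffering for CSG evaluation could later be added as another callback without touching `walk_hits`.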
23:14.42 |
Notify |
BRL-CAD Wiki:85.246.114.172 * 9134
/wiki/User:Vasco.costa/GSoC15/logs: |
23:15.51 |
vasc |
i think i'll do the TOR and TGC
first |
23:16.02 |
vasc |
so i can get a better grasp of the problem
domain here |
23:16.29 |
vasc |
right now all the solids i implemented on the
GPU can have 2 intersection points max one in and another
out |
23:16.47 |
Stragus |
If it's a callback, you don't have to worry so
much about that |
23:16.55 |
Stragus |
Whatever the inlined callback does with the
hit is not your problem |
23:17.05 |
vasc |
sure but the problem is i don't know how the
boolean weaving of the csg works |
23:17.08 |
Stragus |
Then you make a simple callback that returns
the first hit and terminates the ray, or so |
23:17.14 |
Stragus |
Ah yes, right |
23:17.29 |
vasc |
well i saw a simple raytracer once |
23:17.32 |
vasc |
with CSG |
23:18.19 |
vasc |
but i don't quite get how BRL-CAD does its
thing yet |
23:18.46 |
vasc |
this is basically the problem i was interested
in working on in the first place |
23:19.02 |
vasc |
sean suggested it and i thought it was an
interesting problem |
23:19.16 |
vasc |
the thing is we needed to do a LOT of ground
work first... |
23:19.22 |
Stragus |
Right |
23:20.39 |
vasc |
only got 4 primitives working now |
23:20.45 |
vasc |
next i'll add another 2 |
23:21.23 |
Stragus |
If the overall structure is sound, I feel it
would be easy for someone to add support for more
primitives |
23:21.31 |
Stragus |
So that shouldn't be too critical |
23:22.14 |
vasc |
it isn't transferring the solids data from the
cpu anymore. the data is stored on the gpu now. |
23:22.27 |
vasc |
next i'll implement a couple more
solids |
23:23.03 |
vasc |
then i'll probably work on doing the ray
generation on the gpu |
23:23.28 |
vasc |
dunno what i'll do about the shading yet
though |
23:23.37 |
Stragus |
Ray generation, shading? |
23:23.38 |
vasc |
i'll prolly need to send more data |
23:23.48 |
Stragus |
I thought BRL-CAD's raytracer always received
vectors through its API |
23:23.55 |
vasc |
well |
23:24.04 |
vasc |
depends on where you sink your claws
into |
23:24.57 |
vasc |
i wanted to exploit ray parallelism so i want
to dig into the bit where it computes a whole image |
23:25.15 |
Stragus |
Of course, OpenCL is all about
parallelism |
23:25.30 |
Stragus |
Isn't there a batch/bundle API for the
raytracer? |
23:25.39 |
vasc |
it all starts with this do_run(int cur_pixel,
int last_pixel) |
23:26.27 |
vasc |
which then calls do_pixel() |
23:26.32 |
vasc |
for every pixel |
23:26.48 |
Stragus |
That sounds very high level for now |
23:26.50 |
vasc |
which generates the rays, traverses the scene,
and computes the shading |
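The per-pixel structure vasc describes can be pictured roughly as follows. This is not BRL-CAD's `do_run()` or `do_pixel()`; `render_range`, `pixel_fn`, `struct coord_sum`, and `sum_coords` are hypothetical stand-ins, and the row-major index mapping is an assumption. The point is only that each pixel index is handled independently, which is what makes the loop a candidate for one GPU work-item per pixel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixel hook: receives image coordinates and a user context. */
typedef void (*pixel_fn)(int x, int y, void *ctx);

/*
 * Visit every pixel index in [first_pixel, last_pixel], assuming a
 * row-major image that is `width` pixels wide. Returns the pixel count.
 */
int render_range(int first_pixel, int last_pixel, int width, pixel_fn fn, void *ctx)
{
    int visited = 0;

    assert(width > 0 && fn != NULL);
    for (int p = first_pixel; p <= last_pixel; p++) {
        /* No iteration depends on another, so the loop body could run in parallel. */
        fn(p % width, p / width, ctx);
        visited++;
    }
    return visited;
}

/* Example hook that stands in for "generate ray, traverse, shade". */
struct coord_sum {
    long x_sum;
    long y_sum;
};

void sum_coords(int x, int y, void *ctx)
{
    struct coord_sum *s = ctx;

    s->x_sum += x;
    s->y_sum += y;
}
```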
23:27.11 |
vasc |
that's how BRL-CAD works |
23:27.30 |
vasc |
of course to do what we want to do we need to
bulldoze this neat little construction |
23:27.31 |
Stragus |
To generate pictures yes, but they use
raytracing for a lot more stuff |
23:27.39 |
vasc |
sure |
23:27.50 |
vasc |
but this is my current concern |
23:28.04 |
vasc |
rt_shootray() is called elsewhere
but |
23:28.17 |
vasc |
its usually something like the user clicks a
point and wants to know something |
23:28.42 |
vasc |
its not like a bit of latency from doing it on
the CPU is gonna be a big issue there |
23:28.45 |
Stragus |
I believe they do a lot of intense analysis
with raytracing |
23:28.54 |
vasc |
right there's that too |
23:29.06 |
vasc |
in those cases we'll need to do things
differently |
23:30.10 |
vasc |
if you generate the rays on the gpu you can
save a shitton of bus traffic |
23:30.12 |
Stragus |
Hum... I thought there was a batch/bundle
shootray() function somewhere |
23:30.16 |
vasc |
i do that on my renderer as well |
23:30.26 |
vasc |
there is. it just isn't used.
ANYWHERE. |
23:30.31 |
Stragus |
Ahah! |
23:30.32 |
Stragus |
Cool. |
23:30.43 |
Stragus |
That is terrible |
23:31.18 |
Stragus |
On the plus side, that means you are free to
design your own bundle/batch API since nothing uses the current
one |
23:31.24 |
vasc |
it might have been used by some branch that
didn't live or something |
23:31.55 |
vasc |
yeah |
23:32.06 |
Stragus |
It's only good if you use SSE2/AVX, CUDA,
OpenCL... and BRL-CAD isn't very strong on that stuff |
23:32.07 |
vasc |
anyway that's a shitton of work |
23:32.11 |
Stragus |
Agreed |
23:32.50 |
vasc |
well the current code has a definite
emphasis on portability |
23:32.54 |
vasc |
and for good reason i think |
23:33.11 |
vasc |
that's why i'm not using CUDA |
23:33.20 |
Notify |
BRL-CAD Wiki:Bhollister * 9135
/wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015
*/ |
23:34.05 |
vasc |
although opencl has its own
issues... |
23:34.16 |
vasc |
it still hasn't caught on enough |
23:34.39 |
Notify |
BRL-CAD Wiki:Bhollister * 9136
/wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015
*/ |
23:35.02 |
Notify |
BRL-CAD Wiki:Bhollister * 9137
/wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015
*/ |
23:35.05 |
vasc |
except for the cpu implementations all the gpu
implementations have warts in them |
23:35.15 |
vasc |
the amd gpu compiler has a lot of bugs in
it |
23:35.29 |
vasc |
and the nvidia gpu compiler only compiles an
ancient version of opencl |
23:35.40 |
vasc |
and now i hear the apple gpu compiler is
broken too |
23:36.06 |
vasc |
its like java: code once, test
everywhere |
23:36.18 |
vasc |
well its worse than java |
23:36.28 |
vasc |
its gonna be like java once OpenCL 2.0 is
commonplace |
23:36.35 |
vasc |
IF it ever gets to be commonplace |
23:37.21 |
vasc |
right now you send program source code to the
graphics driver and it compiles it and runs it |
23:37.49 |
vasc |
with 2.0 you compile intermediate code and
send that to the graphics driver which recompiles it to the target
architecture and runs it |
23:39.22 |
Stragus |
CUDA gives you a lot more control over Nvidia
hardware, as expected |
23:39.41 |
Stragus |
And OpenCL might seem like a good idea, but
the truth is that you must write completely different code for each
platform *anyway* |
23:39.55 |
Stragus |
If you write the same code for both AMD and
Nvidia, it's going to be slow |
23:40.12 |
vasc |
well |
23:40.23 |
vasc |
i think it's a better idea |
23:40.42 |
vasc |
and some things are better and others
worse |
23:40.48 |
Stragus |
It would be a good idea if the core language
exposed a bunch of vendor-specific extensions, like
OpenGL |
23:41.02 |
Stragus |
So you could still write good code for a bunch
of platforms |
23:41.06 |
vasc |
it does. but there aren't a lot of extensions
available. |
23:41.30 |
vasc |
you can even use inline assembly. |
23:41.39 |
Stragus |
Only CUDA PTX inline assembly ;) |
23:42.04 |
vasc |
well i dunno about other OpenCL
compilers |
23:42.35 |
Stragus |
The hardware targets are so different, "one
code runs everywhere" isn't a good idea if you care about
performance |
23:43.05 |
Stragus |
Now, I know brlcad keeps saying he doesn't
care about performance... but for a lot of people out there, that
isn't a good compromise |
23:43.56 |
vasc |
yeah but the gpu architectures are too
different |
23:44.06 |
Stragus |
Exactly, so you need different codes
anyway |
23:44.31 |
vasc |
well its not interesting if the code becomes
unportable |
23:44.36 |
vasc |
unrunnable |
23:45.01 |
Stragus |
I didn't say that, you can put the
hardware-specific stuff under #if or such |
23:45.15 |
vasc |
so you use a higher level language and if you
really need to squeeze perf in some place you can use inline
asm |
23:45.16 |
Stragus |
But the code's entire design is optimized for
a specific hardware architecture |
23:45.17 |
vasc |
at least on nvidia |
23:45.26 |
vasc |
well |
23:45.33 |
vasc |
i'm going to optimize it for SIMT
basically |
23:45.39 |
Stragus |
Right |
23:46.11 |
vasc |
but the SIMT model maps out decently to SIMD
and MIMD |
23:46.35 |
vasc |
e.g. |
23:46.41 |
vasc |
i had my triangle ray tracer |
23:46.43 |
Stragus |
It's a lot more flexible. SIMT-designed code
can run on SIMD SSE/AVX, but it may be catastrophically slow
:p |
23:46.48 |
vasc |
and i rewrote it in opencl |
23:47.03 |
vasc |
i ran it on the cpu using amd opencl and it
was 4x faster |
23:47.06 |
vasc |
you can guess why |
23:47.16 |
Stragus |
SSE, eh |
23:47.40 |
vasc |
i think the gpu was 8x faster than that
one |
23:47.52 |
vasc |
the cpu opencl one |
23:47.55 |
Stragus |
That OpenCL CPU compiler was surprisingly
clever somehow |
23:47.58 |
vasc |
i only changed one line of code |
23:48.09 |
vasc |
so you see it's quite decent |
23:48.18 |
Stragus |
Compilers aren't good at emitting instructions
like movmaskps and everything above, which is essential for a
raytracer |
23:48.24 |
Stragus |
(or at least for mine) |
23:48.35 |
vasc |
well my raytracer was in ANSI C |
23:48.41 |
vasc |
with OpenMP |
23:48.54 |
Stragus |
can't stand
OpenMP |
23:49.08 |
vasc |
its kinda crappy but nearly any compiler can
use it |
23:49.21 |
vasc |
any compiler that matters supports
it |
23:49.36 |
Stragus |
Right, and everybody wants to use it, no
matter how crappy it is |
23:50.10 |
vasc |
i used pthreads at one point |
23:50.13 |
vasc |
the perf was the same |
23:50.17 |
vasc |
and it was unportable |
23:50.23 |
Stragus |
For a raytracer, probably |
23:50.34 |
Stragus |
For some problems, OpenMP really gets in the
way of doing things properly |
23:50.37 |
vasc |
sure |
23:51.10 |
vasc |
the thing is you can do it without using a lot
of synchronization |
23:51.20 |
Stragus |
Right |
23:51.25 |
vasc |
in fact i didn't use any synchronization
between threads at all |
23:51.41 |
Stragus |
remembers his atomic
NUMA-aware staged barriers written in assembly |
23:52.04 |
vasc |
anyway the thing is |
23:52.12 |
vasc |
opencl does use the sse perf |
23:52.19 |
vasc |
it might not get 100% of it but its
decent |
23:52.45 |
Stragus |
Right. But what is fast on GPU and what is
fast on CPU are sometimes radically opposed |
23:52.55 |
vasc |
yeah |
23:53.02 |
Stragus |
So if your code is designed for both, it's
going to be slow on both |
23:53.10 |
vasc |
but in my experience code optimized for SIMT
runs well on the cpu as well |
23:53.43 |
Stragus |
I would say you were lucky |
23:54.22 |
vasc |
i tried sse with intrinsics at one
point |
23:54.38 |
vasc |
the performance was so hit and miss it was
exasperating |
23:54.58 |
Stragus |
Yes, you really need to know what the compiler
and hardware are doing |
23:55.04 |
vasc |
let the damned compiler optimize it for my
cpu |
23:56.38 |
vasc |
there's room for hand written code but it
keeps getting harder as codebases get bigger |
23:56.50 |
vasc |
and the computer architectures more
complicated |
23:58.33 |
Stragus |
Compilers have a hard time with complex
architectures as well, partly because the code that's being fed to
them isn't designed for the actual architectures |
23:58.52 |
Stragus |
And parly because compilers are
stupid |
23:58.57 |
Stragus |
partly* |