IRC log for #brlcad on 20150728

00:06.57 Notify BRL-CAD Wiki:Bhollister * 9118 /wiki/User:Bhollister/DevLogJuly2015: /* Mon, July 27, 2015: Start of Week 10 (of 14) */
00:14.56 starseeker bhollister: unfortunately, a quick scan through the code suggests there isn't an nmg_visit_* example
00:15.14 starseeker bhollister: I'd suggest writing a small test program to exercise the various functions
00:31.10 vasc weird
00:31.41 vasc my code worked TOO WELL
00:33.07 vasc yeah i knew it
00:33.17 vasc it isn't calling the segment i just wrote
00:40.37 vasc that's more like it
00:49.40 vasc uhoh
01:06.36 Notify BRL-CAD:starseeker * 65709 brlcad/trunk/src/libged/shape_recognition.cpp: The wmember list seems to be volatile - take another approach to collecting the finalize comb info. This needs a lot of cleanup, but at least the hierarchy does get generated...
01:10.07 *** join/#brlcad vasc__ (~vasc@bl13-114-172.dsl.telepac.pt)
01:16.14 vasc__ back to the drawing board. this way of storing data doesn't work because opencl vectorized loads must be aligned to the type size. great.
01:16.33 vasc__ a week to the trash it is
01:16.35 vasc__ hmm
01:16.38 vasc__ lets see
01:16.45 vasc__ how i can reuse this
01:30.04 Notify BRL-CAD:starseeker * 65710 brlcad/trunk/src/libged/shape_recognition.cpp: Set up for a different approach - create the combs, then edit them after they are created.
01:49.49 vasc__ later
01:52.36 Notify BRL-CAD:starseeker * 65711 (brlcad/trunk/include/brep.h brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libged/shape_recognition.cpp): Start getting set up for ray shooting.
02:10.01 Notify BRL-CAD:starseeker * 65712 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libanalyze/util.cpp brlcad/trunk/src/libged/shape_recognition.cpp): Go non-parallel for debugging.
02:20.18 Notify BRL-CAD:starseeker * 65713 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libanalyze/util.cpp): back off the rays some more - got another problem somewhere.
02:26.29 Notify BRL-CAD:starseeker * 65714 brlcad/trunk/src/libanalyze/util.cpp: fix initialization when prep is coming from outside.
03:57.30 *** join/#brlcad gurwinder (~chatzilla@117.214.205.207)
04:32.50 *** join/#brlcad bhollister2 (~brad@2601:647:cb02:7a00:f04d:35ac:f0ba:5880)
07:08.18 Notify BRL-CAD Wiki:85.246.114.172 * 9119 /wiki/User:Vasco.costa/GSoC15/logs:
07:10.59 Notify BRL-CAD Wiki:85.246.114.172 * 9120 /wiki/User:Vasco.costa/GSoC15/logs:
07:12.46 *** join/#brlcad ries (~ries@D979C47E.cm-3-2d.dynamic.ziggo.nl)
07:17.59 Notify BRL-CAD Wiki:MeShubham99 * 9121 /wiki/User:MeShubham99/GSoc15/log_developmen: /* Week 9 */
07:18.48 Notify BRL-CAD Wiki:MeShubham99 * 9122 /wiki/User:MeShubham99/GSoc15/log_developmen: /* Week 9 */
07:31.36 *** join/#brlcad teepee-- (bc5c2134@gateway/web/freenode/ip.188.92.33.52)
08:07.25 *** join/#brlcad dracarys983 (dracarys98@nat/iiit/x-eruahluooryxrrfd)
08:16.25 *** join/#brlcad luca79 (~luca@host129-17-dynamic.4-87-r.retail.telecomitalia.it)
08:17.29 *** join/#brlcad shaina (~shaina@59.89.100.105)
08:52.40 *** join/#brlcad merzo (~merzo@user-94-45-58-141.skif.com.ua)
10:39.14 *** join/#brlcad packrat (~packrator@c-71-231-32-234.hsd1.wa.comcast.net)
11:08.37 *** join/#brlcad jordisayol (~jordisayo@unaffiliated/jordisayol)
11:09.00 jordisayol hello all
11:10.47 jordisayol I don't have file upload permission on the brlcad sourceforge project. Is this a temporary maintenance issue?
11:30.41 *** join/#brlcad luca79 (~luca@host130-19-dynamic.4-87-r.retail.telecomitalia.it)
11:37.02 *** join/#brlcad konrado (~konro@41.205.22.27)
11:40.10 jordisayol Yes, sourceforge file upload is offline
11:40.11 jordisayol http://sourceforge.net/blog/sourceforge-infrastructure-and-service-restoration-update-for-724/
12:23.09 *** join/#brlcad sofat (~sofat@202.164.45.204)
12:32.51 *** join/#brlcad andrei_il (~andrei@109.100.128.78)
12:46.46 *** join/#brlcad sofat (~sofat@202.164.45.204)
13:25.54 Notify BRL-CAD:starseeker * 65715 (brlcad/trunk/include/analyze.h brlcad/trunk/src/libanalyze/analyze_private.h and 5 others): Pass in the cpu count.
13:58.49 Notify BRL-CAD:carlmoore * 65716 (brlcad/trunk/db/nist/NIST_MBE_PMI_11.stp brlcad/trunk/db/nist/NIST_MBE_PMI_6.stp): remove trailing white space
14:14.57 *** join/#brlcad gurwinder (~chatzilla@117.214.205.207)
14:18.01 *** join/#brlcad ih8sum3r (~deepak@122.173.163.248)
14:58.47 *** join/#brlcad bhollister2 (~brad@2601:647:cb02:7a00:f04d:35ac:f0ba:5880)
15:14.10 *** join/#brlcad merzo (~merzo@user-94-45-58-138-1.skif.com.ua)
15:37.02 *** join/#brlcad sofat (~sofat@202.164.45.204)
15:38.54 Notify BRL-CAD:carlmoore * 65717 (brlcad/trunk/db/nist/NIST_MBE_PMI_11.stp brlcad/trunk/db/nist/NIST_MBE_PMI_6.stp): -----------
15:46.41 *** join/#brlcad konrado (~konro@41.205.22.16)
15:49.33 Notify BRL-CAD:ejno * 65718 brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp: fix get_unioned() returning pointers to memory that may later be freed
15:54.14 Notify BRL-CAD:carlmoore * 65719 (brlcad/trunk/src/conv/3dm/3dm-g.cpp brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp and 2 others): fix spellings; and, in 3dm-g.cpp, implement '?' as option
16:26.02 Notify BRL-CAD:carlmoore * 65720 (brlcad/trunk/src/util/bw-ps.c brlcad/trunk/src/util/pix-ps.c): cosmetic changes for bw-ps.c and pix-ps.c to look more alike (bw-ps.c has had the placement of 2 routines shifted)
16:26.05 *** part/#brlcad gurwinder (~chatzilla@117.214.205.207)
16:26.34 *** join/#brlcad gurwinder (~chatzilla@117.214.205.207)
16:32.48 Notify BRL-CAD:carlmoore * 65721 brlcad/trunk/src/util/pix-ps.c: shift location of 'char Stdin' to make pix-ps.c resemble bw-ps.c that more closely
16:46.22 *** join/#brlcad vasc (~vasc@bl13-114-172.dsl.telepac.pt)
16:49.06 vasc http://www.cnet.com/news/insane-flying-semi-truck-sets-jump-record-nearly-takes-out-building/
17:14.02 *** join/#brlcad sofat (~sofat@202.164.45.212)
17:36.10 *** join/#brlcad sofat (~sofat@202.164.45.204)
17:51.49 Notify BRL-CAD:starseeker * 65722 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libanalyze/util.cpp): Add some debug printing.
18:24.26 Notify BRL-CAD:starseeker * 65723 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: off by one errors don't help plotting any...
18:30.21 Notify BRL-CAD:starseeker * 65724 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: plot gaps while we're at it.
18:30.45 Notify BRL-CAD:ejno * 65725 (brlcad/trunk/src/libgcv/conv/fastgen4/NOTES brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): update notes
18:57.55 *** join/#brlcad bhollister (~behollis@dhcp-59-221.cse.ucsc.edu)
19:14.33 Notify BRL-CAD:starseeker * 65726 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: Start looking for missing gaps.
19:17.11 Notify BRL-CAD:starseeker * 65727 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: Move to the next hit in that case rather than breaking out of the loop...
19:43.54 Notify BRL-CAD:starseeker * 65728 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: plot the missing gaps.
19:55.47 Notify BRL-CAD:brlcad * 65729 brlcad/trunk/src/librt/primitives/datum/datum.c: slightly bigger points
20:08.49 Notify BRL-CAD Wiki:Terry.e.wen * 9123 /wiki/User:Terry.e.wen/log:
20:09.07 Notify BRL-CAD Wiki:Terry.e.wen * 9124 /wiki/User:Terry.e.wen/log:
20:25.36 Notify BRL-CAD:starseeker * 65730 (brlcad/trunk/include/analyze.h brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libged/shape_recognition.cpp): Need to create candidates to raytrace, but we don't seem to have everything ready. Needs more investigation.
20:51.31 Notify BRL-CAD Wiki:Deekaysharma * 9125 /wiki/User:Deekaysharma/logs:
20:52.38 Notify BRL-CAD Wiki:Deekaysharma * 9126 /wiki/User:Deekaysharma/logs:
20:53.27 *** part/#brlcad ih8sum3r (~deepak@122.173.163.248)
21:06.30 Notify BRL-CAD:brlcad * 65731 brlcad/trunk/src/libdm/dm-ogl.c: draw smooth points (circles instead of squares)
21:08.04 Notify BRL-CAD:brlcad * 65732 brlcad/trunk/src/libdm/dm-X.c: draw circles instead of a rectangle when plotting points. this requires a little creativity as there are limitations with X11 not wanting to draw small circles without drawing both the exterior and the interior.
21:13.50 Notify BRL-CAD:brlcad * 65733 (brlcad/trunk/src/libdm/dm-ogl.c brlcad/trunk/src/libdm/dm-osgl.cpp and 2 others): oof, too many duplicate opengl callers. make them all draw smooth points. might pose an issue for large point clouds and rtgl.
21:27.00 Notify BRL-CAD:brlcad * 65734 brlcad/trunk/src/libdm/dm-rtgl.c: remove unused functions
21:49.59 *** join/#brlcad __monty__ (~toonn@d51A5489B.access.telenet.be)
21:51.53 Notify BRL-CAD Wiki:202.164.45.204 * 9127 /wiki/User:Hiteshsofat/GSoc15/log_developmen:
21:53.29 *** part/#brlcad __monty__ (~toonn@d51A5489B.access.telenet.be)
22:17.47 dracarys983 brlcad: I have initialized a new struct bu_vls using BU_GET() first and then bu_vls_init(). But using it doesn't print to the MGED window. What might be the problem?
22:35.42 Notify BRL-CAD:starseeker * 65735 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libged/shape_recognition.cpp): getting crashes with the raytracing now...
22:39.49 Notify BRL-CAD Wiki:85.246.114.172 * 9128 /wiki/User:Vasco.costa/GSoC15/logs:
22:43.19 Notify BRL-CAD:starseeker * 65736 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: rt_clean causes things to hang - must not be using it right
22:44.27 starseeker grrrr
22:54.16 Notify BRL-CAD Wiki:85.246.114.172 * 9129 /wiki/User:Vasco.costa/GSoC15/logs:
22:55.24 Notify BRL-CAD Wiki:85.246.114.172 * 9130 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */
22:56.30 Notify BRL-CAD Wiki:85.246.114.172 * 9131 /wiki/User:Vasco.costa/GSoC15/logs:
22:56.59 Notify BRL-CAD Wiki:85.246.114.172 * 9132 /wiki/User:Vasco.costa/GSoC15/logs:
23:00.27 Notify BRL-CAD Wiki:85.246.114.172 * 9133 /wiki/User:Vasco.costa/GSoC15/logs:
23:01.34 vasc time to work on tor and bot i guess
23:01.45 vasc hmmm. dinner first.
23:02.24 Stragus How is it going vasc? You had some issues with OpenCL alignment requirements?...
23:02.39 vasc well the vector load instructions require size alignment
23:02.48 vasc so my previous plan to use AoS was a bust
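A rough sketch of the alignment problem vasc describes, written in Python only so the byte arithmetic can be checked. It assumes a 4-float vector load needs a 16-byte-aligned address; the record layout (an int32 tag followed by four floats) and the names `PACKED`, `PADDED`, `vector_offsets`, and `misaligned` are hypothetical and are not taken from vasc's code.

```python
import struct

VEC_ALIGN = 16  # bytes; assumed alignment required for a 4-float vector load

# Hypothetical tightly packed record: int32 tag + four float32 values.
PACKED = struct.Struct("<i4f")      # 20 bytes, no padding
# Same fields with 12 pad bytes after the tag, so the vector starts at offset 16.
PADDED = struct.Struct("<i12x4f")   # 32 bytes

def vector_offsets(layout, vec_offset, count):
    """Byte offset of the 4-float field in each of `count` back-to-back records."""
    return [i * layout.size + vec_offset for i in range(count)]

def misaligned(offsets, align=VEC_ALIGN):
    """Return the offsets that are not a multiple of `align`."""
    return [off for off in offsets if off % align != 0]

packed_bad = misaligned(vector_offsets(PACKED, 4, 8))
padded_bad = misaligned(vector_offsets(PADDED, 16, 8))
print("packed:", len(packed_bad), "of 8 vectors misaligned")
print("padded:", len(padded_bad), "of 8 vectors misaligned")
```

In this toy layout, padding fixes the alignment at the cost of 12 wasted bytes per record, which is the kind of trade-off between tight packing and aligned loads that comes up in this exchange.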
23:03.00 vasc anyway its done
23:03.15 Stragus Told you so :p
23:03.33 vasc well i thought they would have relaxed that by now
23:03.37 vasc but they didn't
23:03.49 Stragus It's still a lot slower even when the hardware allows it
23:03.55 vasc its like SPARC RISC programming all over again...
23:04.25 vasc sure. but the data is packed more tightly.
23:04.41 Stragus On CUDA hardware, have the whole warp fetch 32 consecutive floats: it's either 1 or 4 memory transactions
23:04.53 vasc not that it probably matters in this app since the number of objects seems to be real low
23:05.00 Stragus If AoS, you get 32 memory transactions and you don't have enough vmem bandwidth to feed all the cores properly
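The transaction counts Stragus quotes can be reproduced with a very simplified model: one warp of 32 lanes each loads one float, and every distinct memory segment touched costs one transaction. The segment sizes (128 and 32 bytes), the 32-float record size, and all function names below are assumptions chosen for illustration; real hardware behaviour depends on the GPU generation and cache configuration.

```python
FLOAT = 4   # bytes per float
WARP = 32   # lanes that issue the load together

def transactions(addresses, segment=128):
    """Count distinct `segment`-byte blocks touched by one warp-wide load."""
    return len({addr // segment for addr in addresses})

def soa_addresses(field, n_items, base=0):
    """SoA: each field is its own array of n_items floats; lane i reads item i."""
    return [base + (field * n_items + lane) * FLOAT for lane in range(WARP)]

def aos_addresses(field, floats_per_record, base=0):
    """AoS: lane i reads `field` from record i, so reads are a record apart."""
    return [base + (lane * floats_per_record + field) * FLOAT for lane in range(WARP)]

soa = soa_addresses(field=2, n_items=1024)
aos = aos_addresses(field=2, floats_per_record=32)

print("SoA, 128-byte segments:", transactions(soa, 128))
print("SoA,  32-byte segments:", transactions(soa, 32))
print("AoS, 128-byte segments:", transactions(aos, 128))
```

With smaller records several lanes would share a segment, so the AoS penalty in this model shrinks as the record gets smaller.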
23:05.01 vasc you could prolly fit all the objects in L1 cache
23:05.14 Stragus It's still a lot slower
23:05.38 vasc well i'm using a mix fwiw
23:05.42 Stragus Incoherent access in a warp is only fast within CUDA shared memory (I think OpenCL calls it local memory?...)
23:05.53 vasc yeah its the local memory
23:06.00 Stragus But then you better watch for shared memory bank conflicts
23:06.12 vasc but you know the latest GPUs aren't as picky about that
23:06.32 Stragus As picky about what? Incoherent access?
23:06.41 vasc the caches behave more like CPU caches
23:06.54 Stragus Yes yes... but if you have 32 incoherent accesses, it's still really slow
23:07.08 vasc well so far i'm having other issues
23:07.18 Stragus CPUs are even worse. In AVX2's vgatherdps instruction, the loads are *serialized*
23:07.22 vasc like two or three orders of magnitude slowness from all these bus transfers
23:07.46 Stragus Oh, and on Xeon Phi... vgatherdps is not only serialized, but you have to loop over the instruction until it tells you it's done. Words fail me to describe how absurd that is
23:07.46 vasc maybe worse for all i know
23:07.52 Stragus kicks Intel in the tibia
23:08.10 Stragus Bus transfers? CPU<->GPU?
23:08.14 vasc yes
23:08.15 vasc so
23:08.40 vasc i keep calling a kernel every time i compute a solid intersection
23:08.43 Stragus That'll be resolved when they fix their code to consume raytraced data right in GPU memory
23:08.49 Stragus Ew...
23:08.56 vasc well the solid data is in the gpu now
23:09.18 vasc the problem is storing the results and things like that
23:09.26 vasc the dynamic lists of temporaries and shit like that
23:09.43 vasc as i said yesterday
23:09.46 Stragus thought the idea of a giant static buffer allocated dynamically through atomics was a good idea
23:09.51 vasc it is
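The "giant static buffer allocated dynamically through atomics" idea can be sketched as a bump allocator: one preallocated array plus a cursor that each worker advances with an atomic fetch-and-add. The Python below only emulates this, with a lock standing in for the hardware atomic; `BumpArena`, `worker`, and the two-slots-per-hit layout are hypothetical names and choices, not code from the project.

```python
import threading

class BumpArena:
    """Fixed-size buffer handed out in slices by an atomically advanced cursor."""

    def __init__(self, capacity):
        self.storage = [None] * capacity
        self.capacity = capacity
        self._next = 0
        self._lock = threading.Lock()  # stands in for a hardware atomic add

    def _fetch_add(self, count):
        # Emulated atomic fetch-and-add: return the old cursor, then advance it.
        with self._lock:
            old = self._next
            self._next += count
            return old

    def alloc(self, count):
        """Reserve `count` slots; return the start index, or None when full."""
        start = self._fetch_add(count)
        if start + count > self.capacity:
            return None
        return start

def worker(arena, worker_id, n_hits, starts):
    for k in range(n_hits):
        start = arena.alloc(2)  # room for one in-hit and one out-hit
        if start is None:
            break
        arena.storage[start] = (worker_id, k, "in")
        arena.storage[start + 1] = (worker_id, k, "out")
        starts.append(start)

arena = BumpArena(capacity=1024)
per_worker = [[] for _ in range(8)]  # one private list per worker
threads = [threading.Thread(target=worker, args=(arena, w, 50, per_worker[w]))
           for w in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
all_starts = sorted(s for starts in per_worker for s in starts)
print("slices handed out:", len(all_starts))
```

Each slice is owned by exactly one worker once it is handed out, so the only shared state is the cursor itself.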
23:09.58 vasc but i still need to do a lot of shit first
23:10.04 Stragus Right
23:10.21 Stragus I feel I would have fun helping you with this
23:10.36 vasc its getting to a point where its easier to dive into it
23:10.38 Stragus has no idea how that GSoC stuff works
23:11.19 vasc i propose a workplan and if the project leads accept it, it gets funded by google
23:11.27 Stragus I still feel the first step would be to implement a "hit" callback without any kind of hit buffering
23:11.44 Stragus Then someone can complete the job by putting fancy buffering with atomics into that callback
23:11.47 vasc well
23:12.02 vasc the thing is the csg
23:12.09 Stragus (Might not be what brlcad told you, and he certainly has authority on the matter)
23:12.29 vasc i think he said i could just do first hit intersection and ignore the csg as a first approach
23:12.40 Stragus Eh well, that also works
23:12.55 Stragus If you make it an inlined callback, return 0 to terminate the ray, return 1 to continue
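A minimal sketch of the callback convention Stragus proposes: the traversal hands each hit to a callback as soon as it is found, and the callback returns 0 to terminate the ray or 1 to continue. It assumes the traversal already produces hits front to back; `traverse`, `FirstHit`, `CountHits`, and the sample hit data are made up for illustration.

```python
def traverse(hits_front_to_back, on_hit):
    """Hand each hit to on_hit as it is found; stop once on_hit returns 0."""
    delivered = 0
    for hit in hits_front_to_back:
        delivered += 1
        if on_hit(hit) == 0:
            break
    return delivered

class FirstHit:
    """Keep the first hit and terminate the ray."""
    def __init__(self):
        self.hit = None

    def __call__(self, hit):
        self.hit = hit
        return 0  # terminate

class CountHits:
    """Look at every hit without storing any of them."""
    def __init__(self):
        self.count = 0

    def __call__(self, hit):
        self.count += 1
        return 1  # continue

# Hypothetical (distance, solid name) pairs along one ray.
hits = [(1.5, "sph.s"), (2.0, "ell.s"), (4.25, "arb.s")]

first = FirstHit()
delivered_first = traverse(iter(hits), first)
counter = CountHits()
delivered_all = traverse(iter(hits), counter)
print(first.hit, delivered_first, counter.count)
```

The traversal itself never buffers hits; whether anything is stored, and how, is left entirely to the callback.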
23:14.42 Notify BRL-CAD Wiki:85.246.114.172 * 9134 /wiki/User:Vasco.costa/GSoC15/logs:
23:15.51 vasc i think i'll do the TOR and TGC first
23:16.02 vasc so i can get a better grasp of the problem domain here
23:16.29 vasc right now all the solids i implemented on the GPU can have 2 intersection points max one in and another out
23:16.47 Stragus If it's a callback, you don't have to worry so much about that
23:16.55 Stragus Whatever the inlined callback does with the hit is not your problem
23:17.05 vasc sure but the problem is i don't know how the boolean weaving of the csg works
23:17.08 Stragus Then you make a simple callback that returns the first hit and terminate the ray, or so
23:17.14 Stragus Ah yes, right
23:17.29 vasc well i saw a simple raytracer once
23:17.32 vasc with CSG
23:18.19 vasc but i don't quite get how BRL-CAD does its thing yet
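For reference, the approach used by many simple CSG ray tracers treats each solid's intersections as a list of (in, out) distance intervals along the ray and combines those lists with boolean operators. The sketch below is that generic textbook idea only; it is not BRL-CAD's boolean weaving code, and the function names and sample intervals are hypothetical.

```python
def union(a, b):
    """Merge two lists of (in, out) intervals along a ray."""
    merged = []
    for lo, hi in sorted(a + b):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def intersect(a, b):
    """Keep only the parts of the ray that are inside both a and b."""
    out = []
    for a_in, a_out in a:
        for b_in, b_out in b:
            lo, hi = max(a_in, b_in), min(a_out, b_out)
            if lo < hi:
                out.append((lo, hi))
    return sorted(out)

def subtract(a, b):
    """Keep the parts of a that are not inside b."""
    out = []
    for seg in a:
        pieces = [seg]
        for b_in, b_out in b:
            remaining = []
            for lo, hi in pieces:
                if b_out <= lo or b_in >= hi:
                    remaining.append((lo, hi))        # no overlap
                else:
                    if lo < b_in:
                        remaining.append((lo, b_in))  # part before b
                    if b_out < hi:
                        remaining.append((b_out, hi)) # part after b
            pieces = remaining
        out.extend(pieces)
    return sorted(out)

# Hypothetical solids, each hit once by the ray: one in point and one out point.
box = [(1.0, 5.0)]
sphere = [(3.0, 7.0)]
hole = [(2.0, 2.5)]

print(union(box, sphere))
print(intersect(box, sphere))
print(subtract(box, hole))
```

The first visible surface of a combination is then the in distance of the first interval in the combined list.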
23:18.46 vasc this is basically the problem i was interested in working on in the first place
23:19.02 vasc sean suggested it and i thought it was an interesting problem
23:19.16 vasc the thing is we needed to do a LOT of ground work first...
23:19.22 Stragus Right
23:20.39 vasc only got 4 primitives working now
23:20.45 vasc next i'll add another 2
23:21.23 Stragus If the overall structure is sound, I feel it would be easy for someone to add support for more primitives
23:21.31 Stragus So that shouldn't be too critical
23:22.14 vasc it isn't transferring the solids data from the cpu anymore. the data is stored on the gpu now.
23:22.27 vasc next i'll implement a couple more solids
23:23.03 vasc then i'll probably work on doing the ray generation on the gpu
23:23.28 vasc dunno how i'll do about the shading yet though
23:23.37 Stragus Ray generation, shading?
23:23.38 vasc i'll prolly need to send more data
23:23.48 Stragus I thought BRL-CAD's raytracer always received vectors through its API
23:23.55 vasc well
23:24.04 vasc depends on where you sink your claws into
23:24.57 vasc i wanted to exploit ray parallelism so i want to dig into the bit where it computes a whole image
23:25.15 Stragus Of course, OpenCL is all about parallelism
23:25.30 Stragus Isn't there a batch/bundle API for the raytracer?
23:25.39 vasc it all starts with this do_run(int cur_pixel, int last_pixel)
23:26.27 vasc which then calls do_pixel()
23:26.32 vasc for every pixel
23:26.48 Stragus That sounds very high level for now
23:26.50 vasc which generates the rays, traverses the scene, and computes the shading
23:27.11 vasc that's how BRL-CAD works
23:27.30 vasc of course to do what we want to do we need to bulldoze this neat little construction
23:27.31 Stragus To generate pictures yes, but they use raytracing for a lot more stuff
23:27.39 vasc sure
23:27.50 vasc but this is my current concern
23:28.04 vasc rt_shootray() is called elsewhere but
23:28.17 vasc its usually something like the user clicks a point and wants to know something
23:28.42 vasc its not like a bit of latency from doing it on the CPU is gonna be a big issue there
23:28.45 Stragus I believe they do a lot of intense analysis with raytracing
23:28.54 vasc right there's that too
23:29.06 vasc in those cases we'll need to do things differently
23:30.10 vasc if you generate the rays on the gpu you can save a shitton of bus traffic
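The saving comes from the fact that a primary ray can be derived from the pixel index alone, so only a handful of camera parameters has to cross the bus instead of one ray per pixel. A minimal sketch, assuming a pinhole camera at the origin looking down -Z and row-major pixel numbering; `pixel_ray` and its parameters are hypothetical and do not mirror BRL-CAD's view setup.

```python
import math

def pixel_ray(index, width, height, fov_deg=90.0):
    """Unit direction of the primary ray through the center of pixel `index`.

    Pinhole camera at the origin looking down -Z; pixels are numbered
    row by row starting from the top-left corner.
    """
    px, py = index % width, index // width
    half = math.tan(math.radians(fov_deg) / 2.0)
    aspect = width / height
    # Map pixel centers to [-1, 1] on both axes, then scale by the view cone.
    x = (2.0 * (px + 0.5) / width - 1.0) * half * aspect
    y = (1.0 - 2.0 * (py + 0.5) / height) * half
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, -1.0 / norm)

def generate_rays(cur_pixel, last_pixel, width, height):
    """Directions for an inclusive pixel range, each computed independently."""
    return [pixel_ray(i, width, height) for i in range(cur_pixel, last_pixel + 1)]

rays = generate_rays(0, 8, width=3, height=3)
print(rays[4])  # ray through the middle pixel of a 3x3 image
```

Because each direction depends only on its own index, the same function body could run as one work-item per pixel.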
23:30.12 Stragus Hum... I thought there was a batch/bundle shootray() function somewhere
23:30.16 vasc i do that on my renderer as well
23:30.26 vasc there is. it just isn't used. ANYWHERE.
23:30.31 Stragus Ahah!
23:30.32 Stragus Cool.
23:30.43 Stragus That is terrible
23:31.18 Stragus On the plus side, that means you are free to design your own bundle/batch API since nothing uses the current one
23:31.24 vasc it might have been used by some branch that didn't live or something
23:31.55 vasc yeah
23:32.06 Stragus It's only good if you use SSE2/AVX, CUDA, OpenCL... and BRL-CAD isn't very strong on that stuff
23:32.07 vasc anyway that's a shitton of work
23:32.11 Stragus Agreed
23:32.50 vasc well the current code has a definitive emphasis on portability
23:32.54 vasc and for good reason i think
23:33.11 vasc that's why i'm not using CUDA
23:33.20 Notify BRL-CAD Wiki:Bhollister * 9135 /wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015 */
23:34.05 vasc although opencl has its own issues...
23:34.16 vasc it still hasn't caught on enough
23:34.39 Notify BRL-CAD Wiki:Bhollister * 9136 /wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015 */
23:35.02 Notify BRL-CAD Wiki:Bhollister * 9137 /wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015 */
23:35.05 vasc except for the cpu implementations all the gpu implementations have warts in them
23:35.15 vasc the amd gpu compiler has a lot of bugs in it
23:35.29 vasc and the nvidia gpu compiler only compiles an ancient version of opencl
23:35.40 vasc and now i hear the apple gpu compiler is broken too
23:36.06 vasc its like java code once test everywhere
23:36.18 vasc well its worse than java
23:36.28 vasc its gonna be like java once OpenCL 2.0 is commonplace
23:36.35 vasc IF it ever gets to be commonplace
23:37.21 vasc right now you send program source code to the graphics driver and it compiles it and runs it
23:37.49 vasc with 2.0 you compile intermediate code and send that to the graphics driver which recompiles it to the target architecture and runs it
23:39.22 Stragus CUDA gives you a lot more control over Nvidia hardware, as expected
23:39.41 Stragus And OpenCL might seem like a good idea, but the truth is that you must write completely different code for each platform *anyway*
23:39.55 Stragus If you write the same code for both AMD and Nvidia, it's going to be slow
23:40.12 vasc well
23:40.23 vasc i think it's a better idea
23:40.42 vasc and some things are better and others worse
23:40.48 Stragus It would be a good idea if the core language exposed a bunch of vendor-specific extensions, like OpenGL
23:41.02 Stragus So you could still write good code for a bunch of platforms
23:41.06 vasc it does. but there aren't a lot of extensions available.
23:41.30 vasc you can even use inline assembly.
23:41.39 Stragus Only CUDA PTX inline assembly ;)
23:42.04 vasc well i dunno about other OpenCL compilers
23:42.35 Stragus The hardware targets are so different, "one code runs everywhere" isn't a good idea if you care about performance
23:43.05 Stragus Now, I know brlcad keeps saying he doesn't care about performance... but for a lot of people out there, that isn't a good compromise
23:43.56 vasc yeah but the gpu architectures are too different
23:44.06 Stragus Exactly, so you need different codes anyway
23:44.31 vasc well its not interesting if the code becomes unportable
23:44.36 vasc unrunnable
23:45.01 Stragus I didn't say that, you can put the hardware-specific stuff under #if or such
23:45.15 vasc so you use a higher level language and if you really need to squeeze perf in some place you can use inline asm
23:45.16 Stragus But the code's entire design is optimized for a specific hardware architecture
23:45.17 vasc at least on nvidia
23:45.26 vasc well
23:45.33 vasc i'm going to optimize it for SIMT basically
23:45.39 Stragus Right
23:46.11 vasc but the SIMT model maps out decently to SIMD and MIMD
23:46.35 vasc e.g.
23:46.41 vasc i had my triangle ray tracer
23:46.43 Stragus It's a lot more flexible. SIMT-designed code can run on SIMD SSE/AVX, but it may be catastrophically slow :p
23:46.48 vasc and i rewrote it in opencl
23:47.03 vasc i ran it on the cpu using amd opencl and it was 4x faster
23:47.06 vasc you can guess why
23:47.16 Stragus SSE, eh
23:47.40 vasc i think the gpu was 8x faster than that one
23:47.52 vasc the cpu opencl one
23:47.55 Stragus That OpenCL CPU compiler was surprisingly clever somehow
23:47.58 vasc i only changed one line of code
23:48.09 vasc so you see it's quite decent
23:48.18 Stragus Compilers aren't good at emitting instructions like movmaskps and everything above, which is essential for a raytracer
23:48.24 Stragus (or at least for mine)
23:48.35 vasc well my raytracer was in ANSI C
23:48.41 vasc with OpenMP
23:48.54 Stragus can't stand OpenMP
23:49.08 vasc its kinda crappy but nearly any compiler can use it
23:49.21 vasc any compiler that matters supports it
23:49.36 Stragus Right, and everybody wants to use it, no matter how crappy it is
23:50.10 vasc i used pthreads at one point
23:50.13 vasc the perf was the same
23:50.17 vasc and it was unportable
23:50.23 Stragus For a raytracer, probably
23:50.34 Stragus For some problems, OpenMP really gets in the way of doing things properly
23:50.37 vasc sure
23:51.10 vasc the thing is you can do it without using a lot of synchronization
23:51.20 Stragus Right
23:51.25 vasc in fact i didn't use any synchronization between threads at all
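The reason a ray tracer can get away with no synchronization is that every pixel depends only on its own ray, so each thread can own a disjoint block of rows and write its results without locks. The sketch below shows that structure only; Python threads will not give the speedup OpenMP or pthreads would, and `shade`, `render_rows`, and `render` are placeholder names rather than vasc's code.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(x, y):
    # Placeholder for tracing one primary ray; depends only on the pixel.
    return (x * 31 + y * 17) % 256

def render_rows(image, width, y_start, y_end):
    """Fill rows [y_start, y_end); no other worker touches these rows."""
    for y in range(y_start, y_end):
        row = image[y]
        for x in range(width):
            row[x] = shade(x, y)

def render(width, height, workers=4):
    image = [[0] * width for _ in range(height)]
    rows_per_worker = (height + workers - 1) // workers
    # Leaving the with-block waits for all submitted row blocks to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for y0 in range(0, height, rows_per_worker):
            y1 = min(y0 + rows_per_worker, height)
            pool.submit(render_rows, image, width, y0, y1)
    return image

parallel = render(16, 10, workers=4)
serial = render(16, 10, workers=1)
print(parallel == serial)
```

The only coordination point is the final wait for all workers, which plays the role of the implicit barrier at the end of an OpenMP parallel loop.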
23:51.41 Stragus remembers his atomic NUMA-aware staged barriers written in assembly
23:52.04 vasc anyway the thing is
23:52.12 vasc opencl does use the sse perf
23:52.19 vasc it might not get 100% of it but its decent
23:52.45 Stragus Right. But what is fast on GPU and what is fast on CPU are sometimes radically opposed
23:52.55 vasc yeah
23:53.02 Stragus So if your code is designed for both, it's going to be slow on both
23:53.10 vasc but in my experience code optimized for SIMT runs well on the cpu as well
23:53.43 Stragus I would say you were lucky
23:54.22 vasc i tried sse with intrinsics at one point
23:54.38 vasc the performance was so hit and miss it was exasperating
23:54.58 Stragus Yes, you really need to know what the compiler and hardware are doing
23:55.04 vasc let the damned compiler optimize it for my cpu
23:56.38 vasc there's room for hand written code but it keeps getting harder as codebases get bigger
23:56.50 vasc and the computer architectures more complicated
23:58.33 Stragus Compilers have a hard time with complex architectures as well, partly because the code that's being fed to them isn't designed for the actual architectures
23:58.52 Stragus And parly because compilers are stupid
23:58.57 Stragus partly*
