| 00:06.57 | Notify | BRL-CAD Wiki:Bhollister * 9118 /wiki/User:Bhollister/DevLogJuly2015: /* Mon, July 27, 2015: Start of Week 10 (of 14) */ |
| 00:14.56 | starseeker | bhollister: unfortunately, a quick scan through the code suggests there isn't an nmg_visit_* example |
| 00:15.14 | starseeker | bhollister: I'd suggest writing a small test program to exercise the various functions |
| 00:31.10 | vasc | weird |
| 00:31.41 | vasc | my code worked TOO WELL |
| 00:33.07 | vasc | yeah i knew it |
| 00:33.17 | vasc | it isn't calling the segment i just wrote |
| 00:40.37 | vasc | that's more like it |
| 00:49.40 | vasc | uhoh |
| 01:06.36 | Notify | BRL-CAD:starseeker * 65709 brlcad/trunk/src/libged/shape_recognition.cpp: The wmember list seems to be volatile - take another approach to collecting the finalize comb info. This needs a lot of cleanup, but at least the hierarchy does get generated... |
| 01:10.07 | *** join/#brlcad vasc__ (~vasc@bl13-114-172.dsl.telepac.pt) | |
| 01:16.14 | vasc__ | back to the drawing board. this way of storing data doesn't work because opencl vectorized loads must be aligned to the type size. great. |
| 01:16.33 | vasc__ | a week to the trash it is |
| 01:16.35 | vasc__ | hmm |
| 01:16.38 | vasc__ | lets see |
| 01:16.45 | vasc__ | how i can reuse this |
| 01:30.04 | Notify | BRL-CAD:starseeker * 65710 brlcad/trunk/src/libged/shape_recognition.cpp: Set up for a different approach - create the combs, then edit them after they are created. |
| 01:49.49 | vasc__ | later |
| 01:52.36 | Notify | BRL-CAD:starseeker * 65711 (brlcad/trunk/include/brep.h brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libged/shape_recognition.cpp): Start getting set up for ray shooting. |
| 02:10.01 | Notify | BRL-CAD:starseeker * 65712 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libanalyze/util.cpp brlcad/trunk/src/libged/shape_recognition.cpp): Go non-parallel for debugging. |
| 02:20.18 | Notify | BRL-CAD:starseeker * 65713 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libanalyze/util.cpp): back off the rays some more - got another problem somewhere. |
| 02:26.29 | Notify | BRL-CAD:starseeker * 65714 brlcad/trunk/src/libanalyze/util.cpp: fix initialization when prep is coming from outside. |
| 03:57.30 | *** join/#brlcad gurwinder (~chatzilla@117.214.205.207) | |
| 04:32.50 | *** join/#brlcad bhollister2 (~brad@2601:647:cb02:7a00:f04d:35ac:f0ba:5880) | |
| 07:08.18 | Notify | BRL-CAD Wiki:85.246.114.172 * 9119 /wiki/User:Vasco.costa/GSoC15/logs: |
| 07:10.59 | Notify | BRL-CAD Wiki:85.246.114.172 * 9120 /wiki/User:Vasco.costa/GSoC15/logs: |
| 07:12.46 | *** join/#brlcad ries (~ries@D979C47E.cm-3-2d.dynamic.ziggo.nl) | |
| 07:17.59 | Notify | BRL-CAD Wiki:MeShubham99 * 9121 /wiki/User:MeShubham99/GSoc15/log_developmen: /* Week 9 */ |
| 07:18.48 | Notify | BRL-CAD Wiki:MeShubham99 * 9122 /wiki/User:MeShubham99/GSoc15/log_developmen: /* Week 9 */ |
| 07:31.36 | *** join/#brlcad teepee-- (bc5c2134@gateway/web/freenode/ip.188.92.33.52) | |
| 08:07.25 | *** join/#brlcad dracarys983 (dracarys98@nat/iiit/x-eruahluooryxrrfd) | |
| 08:16.25 | *** join/#brlcad luca79 (~luca@host129-17-dynamic.4-87-r.retail.telecomitalia.it) | |
| 08:17.29 | *** join/#brlcad shaina (~shaina@59.89.100.105) | |
| 08:52.40 | *** join/#brlcad merzo (~merzo@user-94-45-58-141.skif.com.ua) | |
| 10:39.14 | *** join/#brlcad packrat (~packrator@c-71-231-32-234.hsd1.wa.comcast.net) | |
| 11:08.37 | *** join/#brlcad jordisayol (~jordisayo@unaffiliated/jordisayol) | |
| 11:09.00 | jordisayol | hello all |
| 11:10.47 | jordisayol | I don't have file upload permission on the brlcad sourceforge project. Is this a temporary maintenance issue? |
| 11:30.41 | *** join/#brlcad luca79 (~luca@host130-19-dynamic.4-87-r.retail.telecomitalia.it) | |
| 11:37.02 | *** join/#brlcad konrado (~konro@41.205.22.27) | |
| 11:40.10 | jordisayol | Yes, sourceforge file upload is offline |
| 11:40.11 | jordisayol | http://sourceforge.net/blog/sourceforge-infrastructure-and-service-restoration-update-for-724/ |
| 12:23.09 | *** join/#brlcad sofat (~sofat@202.164.45.204) | |
| 12:32.51 | *** join/#brlcad andrei_il (~andrei@109.100.128.78) | |
| 12:46.46 | *** join/#brlcad sofat (~sofat@202.164.45.204) | |
| 13:25.54 | Notify | BRL-CAD:starseeker * 65715 (brlcad/trunk/include/analyze.h brlcad/trunk/src/libanalyze/analyze_private.h and 5 others): Pass in the cpu count. |
| 13:58.49 | Notify | BRL-CAD:carlmoore * 65716 (brlcad/trunk/db/nist/NIST_MBE_PMI_11.stp brlcad/trunk/db/nist/NIST_MBE_PMI_6.stp): remove trailing white space |
| 14:14.57 | *** join/#brlcad gurwinder (~chatzilla@117.214.205.207) | |
| 14:18.01 | *** join/#brlcad ih8sum3r (~deepak@122.173.163.248) | |
| 14:58.47 | *** join/#brlcad bhollister2 (~brad@2601:647:cb02:7a00:f04d:35ac:f0ba:5880) | |
| 15:14.10 | *** join/#brlcad merzo (~merzo@user-94-45-58-138-1.skif.com.ua) | |
| 15:37.02 | *** join/#brlcad sofat (~sofat@202.164.45.204) | |
| 15:38.54 | Notify | BRL-CAD:carlmoore * 65717 (brlcad/trunk/db/nist/NIST_MBE_PMI_11.stp brlcad/trunk/db/nist/NIST_MBE_PMI_6.stp): ----------- |
| 15:46.41 | *** join/#brlcad konrado (~konro@41.205.22.16) | |
| 15:49.33 | Notify | BRL-CAD:ejno * 65718 brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp: fix get_unioned() returning pointers to memory that may later be freed |
| 15:54.14 | Notify | BRL-CAD:carlmoore * 65719 (brlcad/trunk/src/conv/3dm/3dm-g.cpp brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp and 2 others): fix spellings; and, in 3dm-g.cpp, implement '?' as option |
| 16:26.02 | Notify | BRL-CAD:carlmoore * 65720 (brlcad/trunk/src/util/bw-ps.c brlcad/trunk/src/util/pix-ps.c): cosmetic changes for bw-ps.c and pix-ps.c to look more alike (bw-ps.c has had the placement of 2 routines shifted) |
| 16:26.05 | *** part/#brlcad gurwinder (~chatzilla@117.214.205.207) | |
| 16:26.34 | *** join/#brlcad gurwinder (~chatzilla@117.214.205.207) | |
| 16:32.48 | Notify | BRL-CAD:carlmoore * 65721 brlcad/trunk/src/util/pix-ps.c: shift location of 'char Stdin' to make pix-ps.c resemble bw-ps.c that more closely |
| 16:46.22 | *** join/#brlcad vasc (~vasc@bl13-114-172.dsl.telepac.pt) | |
| 16:49.06 | vasc | http://www.cnet.com/news/insane-flying-semi-truck-sets-jump-record-nearly-takes-out-building/ |
| 17:14.02 | *** join/#brlcad sofat (~sofat@202.164.45.212) | |
| 17:36.10 | *** join/#brlcad sofat (~sofat@202.164.45.204) | |
| 17:51.49 | Notify | BRL-CAD:starseeker * 65722 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libanalyze/util.cpp): Add some debug printing. |
| 18:24.26 | Notify | BRL-CAD:starseeker * 65723 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: off by one errors don't help plotting any... |
| 18:30.21 | Notify | BRL-CAD:starseeker * 65724 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: plot gaps while we're at it. |
| 18:30.45 | Notify | BRL-CAD:ejno * 65725 (brlcad/trunk/src/libgcv/conv/fastgen4/NOTES brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): update notes |
| 18:57.55 | *** join/#brlcad bhollister (~behollis@dhcp-59-221.cse.ucsc.edu) | |
| 19:14.33 | Notify | BRL-CAD:starseeker * 65726 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: Start looking for missing gaps. |
| 19:17.11 | Notify | BRL-CAD:starseeker * 65727 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: Move to the next hit in that case rather than breaking out of the loop... |
| 19:43.54 | Notify | BRL-CAD:starseeker * 65728 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: plot the missing gaps. |
| 19:55.47 | Notify | BRL-CAD:brlcad * 65729 brlcad/trunk/src/librt/primitives/datum/datum.c: slightly bigger points |
| 20:08.49 | Notify | BRL-CAD Wiki:Terry.e.wen * 9123 /wiki/User:Terry.e.wen/log: |
| 20:09.07 | Notify | BRL-CAD Wiki:Terry.e.wen * 9124 /wiki/User:Terry.e.wen/log: |
| 20:25.36 | Notify | BRL-CAD:starseeker * 65730 (brlcad/trunk/include/analyze.h brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libged/shape_recognition.cpp): Need to create candidates to raytrace, but we don't seem to have everything ready. Needs more investigation. |
| 20:51.31 | Notify | BRL-CAD Wiki:Deekaysharma * 9125 /wiki/User:Deekaysharma/logs: |
| 20:52.38 | Notify | BRL-CAD Wiki:Deekaysharma * 9126 /wiki/User:Deekaysharma/logs: |
| 20:53.27 | *** part/#brlcad ih8sum3r (~deepak@122.173.163.248) | |
| 21:06.30 | Notify | BRL-CAD:brlcad * 65731 brlcad/trunk/src/libdm/dm-ogl.c: draw smooth points (circles instead of squares) |
| 21:08.04 | Notify | BRL-CAD:brlcad * 65732 brlcad/trunk/src/libdm/dm-X.c: draw circles instead of a rectangle when plotting points. this requires a little creativity as there are limitations with X11 not wanting to draw small circles without drawing both the exterior and the interior. |
| 21:13.50 | Notify | BRL-CAD:brlcad * 65733 (brlcad/trunk/src/libdm/dm-ogl.c brlcad/trunk/src/libdm/dm-osgl.cpp and 2 others): oof, too many duplicate opengl callers. make them all draw smooth points. might pose an issue for large point clouds and rtgl. |
| 21:27.00 | Notify | BRL-CAD:brlcad * 65734 brlcad/trunk/src/libdm/dm-rtgl.c: remove unused functions |
| 21:49.59 | *** join/#brlcad __monty__ (~toonn@d51A5489B.access.telenet.be) | |
| 21:51.53 | Notify | BRL-CAD Wiki:202.164.45.204 * 9127 /wiki/User:Hiteshsofat/GSoc15/log_developmen: |
| 21:53.29 | *** part/#brlcad __monty__ (~toonn@d51A5489B.access.telenet.be) | |
| 22:17.47 | dracarys983 | brlcad: I have initialized a new struct bu_vls using BU_GET() first and then bu_vls_init(). But using it doesn't print to the MGED window. What might be the problem? |
| 22:35.42 | Notify | BRL-CAD:starseeker * 65735 (brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp brlcad/trunk/src/libged/shape_recognition.cpp): getting crashes with the raytracing now... |
| 22:39.49 | Notify | BRL-CAD Wiki:85.246.114.172 * 9128 /wiki/User:Vasco.costa/GSoC15/logs: |
| 22:43.19 | Notify | BRL-CAD:starseeker * 65736 brlcad/trunk/src/libanalyze/find_subtracted_shapes.cpp: rt_clean causes things to hang - must not be using it right |
| 22:44.27 | starseeker | grrrr |
| 22:54.16 | Notify | BRL-CAD Wiki:85.246.114.172 * 9129 /wiki/User:Vasco.costa/GSoC15/logs: |
| 22:55.24 | Notify | BRL-CAD Wiki:85.246.114.172 * 9130 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */ |
| 22:56.30 | Notify | BRL-CAD Wiki:85.246.114.172 * 9131 /wiki/User:Vasco.costa/GSoC15/logs: |
| 22:56.59 | Notify | BRL-CAD Wiki:85.246.114.172 * 9132 /wiki/User:Vasco.costa/GSoC15/logs: |
| 23:00.27 | Notify | BRL-CAD Wiki:85.246.114.172 * 9133 /wiki/User:Vasco.costa/GSoC15/logs: |
| 23:01.34 | vasc | time to work on tor and bot i guess |
| 23:01.45 | vasc | hmmm. dinner first. |
| 23:02.24 | Stragus | How is it going vasc? You had some issues with OpenCL alignment requirements?... |
| 23:02.39 | vasc | well the vector load instructions require size alignment |
| 23:02.48 | vasc | so my previous plan to use AoS was a bust |
| 23:03.00 | vasc | anyway its done |
| 23:03.15 | Stragus | Told you so :p |
| 23:03.33 | vasc | well i thought they would have relaxed that by now |
| 23:03.37 | vasc | but they didn't |
| 23:03.49 | Stragus | It's still a lot slower even when the hardware allows it |
| 23:03.55 | vasc | its like SPARC RISC programming all over again... |
| 23:04.25 | vasc | sure. but the data is packed more tightly. |
| 23:04.41 | Stragus | On CUDA hardware, have the whole warp fetch 32 consecutive floats: it's either 1 or 4 memory transactions |
| 23:04.53 | vasc | not that it probably matters in this app since the number of objects seems to be real low |
| 23:05.00 | Stragus | If AoS, you get 32 memory transactions and you don't have enough vmem bandwidth to feed all the cores properly |
| 23:05.01 | vasc | you could prolly fit all the objects in L1 cache |
| 23:05.14 | Stragus | It's still a lot slower |
| 23:05.38 | vasc | well i'm using a mix fwiw |
| 23:05.42 | Stragus | Incoherent access in a warp is only fast within CUDA shared memory (I think OpenCL calls it local memory?...) |
| 23:05.53 | vasc | yeah its the local memory |
| 23:06.00 | Stragus | But then you better watch for shared memory bank conflicts |
| 23:06.12 | vasc | but you know the latest GPUs aren't as picky about that |
| 23:06.32 | Stragus | As picky about what? Incoherent access? |
| 23:06.41 | vasc | the caches behave more like CPU caches |
| 23:06.54 | Stragus | Yes yes... but if you have 32 incoherent accesses, it's still really slow |
| 23:07.08 | vasc | well so far i'm having other issues |
| 23:07.18 | Stragus | CPUs are even worse. In AVX2's vgatherdps instruction, the loads are *serialized* |
| 23:07.22 | vasc | like two or three orders of magnitude slowness from all these bus transfers |
| 23:07.46 | Stragus | Oh, and on Xeon Phi... vgatherdps is not only serialized, but you have to loop over the instruction until it tells you it's done. Words fail me to describe how absurd that is |
| 23:07.46 | vasc | maybe worse for all i know |
| 23:07.52 | Stragus | kicks Intel in the tibia |
| 23:08.10 | Stragus | Bus transfers? CPU<->GPU? |
| 23:08.14 | vasc | yes |
| 23:08.15 | vasc | so |
| 23:08.40 | vasc | i keep calling a kernel every time i compute a solid intersection |
| 23:08.43 | Stragus | That'll be resolved when they fix their code to consume raytraced data right in GPU memory |
| 23:08.49 | Stragus | Ew... |
| 23:08.56 | vasc | well the solid data is in the gpu now |
| 23:09.18 | vasc | the problem is storing the results and things like that |
| 23:09.26 | vasc | the dynamic lists of temporaries and shit like that |
| 23:09.43 | vasc | as i said yesterday |
| 23:09.46 | Stragus | thought the idea of a giant static buffer allocated dynamically through atomics was a good idea |
| 23:09.51 | vasc | it is |
| 23:09.58 | vasc | but i still need to do a lot of shit first |
| 23:10.04 | Stragus | Right |
| 23:10.21 | Stragus | I feel I would have fun helping you with this |
| 23:10.36 | vasc | its getting to a point where its easier to dive into it |
| 23:10.38 | Stragus | has no idea how that GSoC stuff works |
| 23:11.19 | vasc | i propose a workplan and if the project leads accept it gets funded by google |
| 23:11.27 | Stragus | I still feel the first step would be to implement a "hit" callback without any kind of hit buffering |
| 23:11.44 | Stragus | Then someone can complete the job by putting fancy buffering with atomics into that callback |
| 23:11.47 | vasc | well |
| 23:12.02 | vasc | the thing is the csg |
| 23:12.09 | Stragus | (Might not be what brlcad told you, and he certainly has authority on the matter) |
| 23:12.29 | vasc | i think he said i could just do first hit intersection and ignore the csg as a first approach |
| 23:12.40 | Stragus | Eh well, that also works |
| 23:12.55 | Stragus | If you make it an inlined callback, return 0 to terminate the ray, return 1 to continue |
| 23:14.42 | Notify | BRL-CAD Wiki:85.246.114.172 * 9134 /wiki/User:Vasco.costa/GSoC15/logs: |
| 23:15.51 | vasc | i think i'll do the TOR and TGC first |
| 23:16.02 | vasc | so i can get a better grasp of the problem domain here |
| 23:16.29 | vasc | right now all the solids i implemented on the GPU can have 2 intersection points max one in and another out |
| 23:16.47 | Stragus | If it's a callback, you don't have to worry so much about that |
| 23:16.55 | Stragus | Whatever the inlined callback does with the hit is not your problem |
| 23:17.05 | vasc | sure but the problem is i don't know how the boolean weaving of the csg works |
| 23:17.08 | Stragus | Then you make a simple callback that returns the first hit and terminate the ray, or so |
| 23:17.14 | Stragus | Ah yes, right |
| 23:17.29 | vasc | well i saw a simple raytracer once |
| 23:17.32 | vasc | with CSG |
| 23:18.19 | vasc | but i don't quite get how BRL-CAD does its thing yet |
| 23:18.46 | vasc | this is basically the problem i was interested in working on in the first place |
| 23:19.02 | vasc | sean suggested it and i thought it was an interesting problem |
| 23:19.16 | vasc | the thing is we needed to do a LOT of ground work first... |
| 23:19.22 | Stragus | Right |
| 23:20.39 | vasc | only got 4 primitives working now |
| 23:20.45 | vasc | next i'll add another 2 |
| 23:21.23 | Stragus | If the overall structure is sound, I feel it would be easy for someone to add support for more primitives |
| 23:21.31 | Stragus | So that shouldn't be too critical |
| 23:22.14 | vasc | it isn't transferring the solids data from the cpu anymore. the data is stored on the gpu now. |
| 23:22.27 | vasc | next i'll implement a couple more solids |
| 23:23.03 | vasc | then i'll probably work on doing the ray generation on the gpu |
| 23:23.28 | vasc | dunno how i'll do about the shading yet though |
| 23:23.37 | Stragus | Ray generation, shading? |
| 23:23.38 | vasc | i'll prolly need to send more data |
| 23:23.48 | Stragus | I thought BRL-CAD's raytracer always received vectors through its API |
| 23:23.55 | vasc | well |
| 23:24.04 | vasc | depends on where you sink your claws into |
| 23:24.57 | vasc | i wanted to exploit ray parallelism so i want to dig into the bit where it computes a whole image |
| 23:25.15 | Stragus | Of course, OpenCL is all about parallelism |
| 23:25.30 | Stragus | Isn't there a batch/bundle API for the raytracer? |
| 23:25.39 | vasc | it all starts with this do_run(int cur_pixel, int last_pixel) |
| 23:26.27 | vasc | which then calls do_pixel() |
| 23:26.32 | vasc | for every pixel |
| 23:26.48 | Stragus | That sounds very high level for now |
| 23:26.50 | vasc | which generates the rays, traverses the scene, and computes the shading |
| 23:27.11 | vasc | that's how BRL-CAD works |
| 23:27.30 | vasc | of course to do what we want to do we need to bulldoze this neat little construction |
| 23:27.31 | Stragus | To generate pictures yes, but they use raytracing for a lot more stuff |
| 23:27.39 | vasc | sure |
| 23:27.50 | vasc | but this is my current concern |
| 23:28.04 | vasc | rt_shootray() is called elsewhere but |
| 23:28.17 | vasc | its usually something like the user clicks a point and wants to know something |
| 23:28.42 | vasc | its not like a bit of latency from doing it on the CPU is gonna be a big issue there |
| 23:28.45 | Stragus | I believe they do a lot of intense analysis with raytracing |
| 23:28.54 | vasc | right there's that too |
| 23:29.06 | vasc | in those cases we'll need to do things differently |
| 23:30.10 | vasc | if you generate the rays on the gpu you can save a shitton of bus traffic |
| 23:30.12 | Stragus | Hum... I thought there was a batch/bundle shootray() function somewhere |
| 23:30.16 | vasc | i do that on my renderer as well |
| 23:30.26 | vasc | there is. it just isn't used. ANYWHERE. |
| 23:30.31 | Stragus | Ahah! |
| 23:30.32 | Stragus | Cool. |
| 23:30.43 | Stragus | That is terrible |
| 23:31.18 | Stragus | On the plus side, that means you are free to design your own bundle/batch API since nothing uses the current one |
| 23:31.24 | vasc | it might have been used by some branch that didn't live or something |
| 23:31.55 | vasc | yeah |
| 23:32.06 | Stragus | It's only good if you use SSE2/AVX, CUDA, OpenCL... and BRL-CAD isn't very strong on that stuff |
| 23:32.07 | vasc | anyway that's a shitton of work |
| 23:32.11 | Stragus | Agreed |
| 23:32.50 | vasc | well the current code has a definite emphasis on portability |
| 23:32.54 | vasc | and for good reason i think |
| 23:33.11 | vasc | that's why i'm not using CUDA |
| 23:33.20 | Notify | BRL-CAD Wiki:Bhollister * 9135 /wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015 */ |
| 23:34.05 | vasc | although opencl has its own issues... |
| 23:34.16 | vasc | it still hasn't caught on enough |
| 23:34.39 | Notify | BRL-CAD Wiki:Bhollister * 9136 /wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015 */ |
| 23:35.02 | Notify | BRL-CAD Wiki:Bhollister * 9137 /wiki/User:Bhollister/DevLogJuly2015: /* Tues, July 28, 2015 */ |
| 23:35.05 | vasc | except for the cpu implementations all the gpu implementations have warts in them |
| 23:35.15 | vasc | the amd gpu compiler has a lot of bugs in it |
| 23:35.29 | vasc | and the nvidia gpu compiler only compiles an ancient version of opencl |
| 23:35.40 | vasc | and now i hear the apple gpu compiler is broken too |
| 23:36.06 | vasc | its like java: code once, test everywhere |
| 23:36.18 | vasc | well its worse than java |
| 23:36.28 | vasc | its gonna be like java once OpenCL 2.0 is commonplace |
| 23:36.35 | vasc | IF it ever gets to be commonplace |
| 23:37.21 | vasc | right now you send program source code to the graphics driver and it compiles it and runs it |
| 23:37.49 | vasc | with 2.0 you compile intermediate code and send that to the graphics driver which recompiles it to the target architecture and runs it |
| 23:39.22 | Stragus | CUDA gives you a lot more control over Nvidia hardware, as expected |
| 23:39.41 | Stragus | And OpenCL might seem like a good idea, but the truth is that you must write completely different code for each platform *anyway* |
| 23:39.55 | Stragus | If you write the same code for both AMD and Nvidia, it's going to be slow |
| 23:40.12 | vasc | well |
| 23:40.23 | vasc | i think it's a better idea |
| 23:40.42 | vasc | and some things are better and others worse |
| 23:40.48 | Stragus | It would be a good idea if the core language exposed a bunch of vendor-specific extensions, like OpenGL |
| 23:41.02 | Stragus | So you could still write good code for a bunch of platforms |
| 23:41.06 | vasc | it does. but there aren't a lot of extensions available. |
| 23:41.30 | vasc | you can even use inline assembly. |
| 23:41.39 | Stragus | Only CUDA PTX inline assembly ;) |
| 23:42.04 | vasc | well i dunno about other OpenCL compilers |
| 23:42.35 | Stragus | The hardware targets are so different, "one code runs everywhere" isn't a good idea if you care about performance |
| 23:43.05 | Stragus | Now, I know brlcad keeps saying he doesn't care about performance... but for a lot of people out there, that isn't a good compromise |
| 23:43.56 | vasc | yeah but the gpu architectures are too different |
| 23:44.06 | Stragus | Exactly, so you need different codes anyway |
| 23:44.31 | vasc | well its not interesting if the code becomes unportable |
| 23:44.36 | vasc | unrunnable |
| 23:45.01 | Stragus | I didn't say that, you can put the hardware-specific stuff under #if or such |
| 23:45.15 | vasc | so you use a higher level language and if you really need to squeeze perf in some place you can use inline asm |
| 23:45.16 | Stragus | But the code's entire design is optimized for a specific hardware architecture |
| 23:45.17 | vasc | at least on nvidia |
| 23:45.26 | vasc | well |
| 23:45.33 | vasc | i'm going to optimize it for SIMT basically |
| 23:45.39 | Stragus | Right |
| 23:46.11 | vasc | but the SIMT model maps out decently to SIMD and MIMD |
| 23:46.35 | vasc | e.g. |
| 23:46.41 | vasc | i had my triangle ray tracer |
| 23:46.43 | Stragus | It's a lot more flexible. SIMT-designed code can run on SIMD SSE/AVX, but it may be catastrophically slow :p |
| 23:46.48 | vasc | and i rewrote it in opencl |
| 23:47.03 | vasc | i ran it on the cpu using amd opencl and it was 4x faster |
| 23:47.06 | vasc | you can guess why |
| 23:47.16 | Stragus | SSE, eh |
| 23:47.40 | vasc | i think the gpu was 8x faster than that one |
| 23:47.52 | vasc | the cpu opencl one |
| 23:47.55 | Stragus | That OpenCL CPU compiler was surprisingly clever somehow |
| 23:47.58 | vasc | i only changed one line of code |
| 23:48.09 | vasc | so you see it's quite decent |
| 23:48.18 | Stragus | Compilers aren't good at emitting instructions like movmaskps and everything above, which is essential for a raytracer |
| 23:48.24 | Stragus | (or at least for mine) |
| 23:48.35 | vasc | well my raytracer was in ANSI C |
| 23:48.41 | vasc | with OpenMP |
| 23:48.54 | Stragus | can't stand OpenMP |
| 23:49.08 | vasc | its kinda crappy but nearly any compiler can use it |
| 23:49.21 | vasc | any compiler that matters supports it |
| 23:49.36 | Stragus | Right, and everybody wants to use it, no matter how crappy it is |
| 23:50.10 | vasc | i used pthreads at one point |
| 23:50.13 | vasc | the perf was the same |
| 23:50.17 | vasc | and it was unportable |
| 23:50.23 | Stragus | For a raytracer, probably |
| 23:50.34 | Stragus | For some problems, OpenMP really gets in the way of doing things properly |
| 23:50.37 | vasc | sure |
| 23:51.10 | vasc | the thing is you can do it without using a lot of synchronization |
| 23:51.20 | Stragus | Right |
| 23:51.25 | vasc | in fact i didn't use any synchronization between threads at all |
| 23:51.41 | Stragus | remembers his atomic NUMA-aware staged barriers written in assembly |
| 23:52.04 | vasc | anyway the thing is |
| 23:52.12 | vasc | opencl does use the sse perf |
| 23:52.19 | vasc | it might not get 100% of it but its decent |
| 23:52.45 | Stragus | Right. But what is fast on GPU and what is fast on CPU are sometimes radically opposed |
| 23:52.55 | vasc | yeah |
| 23:53.02 | Stragus | So if your code is designed for both, it's going to be slow on both |
| 23:53.10 | vasc | but in my experience code optimized for SIMT runs well on the cpu as well |
| 23:53.43 | Stragus | I would say you were lucky |
| 23:54.22 | vasc | i tried sse with intrinsics at one point |
| 23:54.38 | vasc | the performance was so hit and miss it was exasperating |
| 23:54.58 | Stragus | Yes, you really need to know what the compiler and hardware are doing |
| 23:55.04 | vasc | let the damned compiler optimize it for my cpu |
| 23:56.38 | vasc | there's room for hand written code but it keeps getting harder as codebases get bigger |
| 23:56.50 | vasc | and the computer architectures more complicated |
| 23:58.33 | Stragus | Compilers have a hard time with complex architectures as well, partly because the code that's being fed to them isn't designed for the actual architectures |
| 23:58.52 | Stragus | And parly because compilers are stupid |
| 23:58.57 | Stragus | partly* |
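
The "hit" callback idea from the 23:11 to 23:17 exchange (return 0 to terminate the ray, return 1 to continue, with buffering through atomics added later) could look roughly like the sketch below. This is plain C11 host code for illustration only, not BRL-CAD or OpenCL code: every name (`demo_hit`, `demo_shoot`, `demo_first_hit`, `demo_buffer_hit`, `demo_pool`) is hypothetical, and `demo_shoot` merely walks a pre-sorted array where a real tracer would traverse the scene. An OpenCL kernel would presumably reserve slots with an atomic increment on a counter in global memory instead of `atomic_fetch_add`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hit record for this sketch; not BRL-CAD's struct hit. */
struct demo_hit {
    double dist;   /* distance along the ray */
    int solid_id;  /* which solid produced the hit */
};

/* Callback contract from the chat: return 0 to terminate the ray,
 * return 1 to keep going. */
typedef int (*demo_hit_cb)(const struct demo_hit *hit, void *ctx);

/* Stand-in for scene traversal: feeds hits (assumed sorted by distance)
 * to the callback and returns how many hits were delivered. */
static size_t demo_shoot(const struct demo_hit *hits, size_t n,
                         demo_hit_cb cb, void *ctx)
{
    size_t i;
    assert(cb != NULL);
    for (i = 0; i < n; i++) {
        if (!cb(&hits[i], ctx))
            return i + 1;  /* callback asked to stop after this hit */
    }
    return n;
}

/* Callback 1: keep the first hit and terminate the ray (no buffering). */
struct demo_first {
    int found;
    struct demo_hit hit;
};

static int demo_first_hit(const struct demo_hit *hit, void *ctx)
{
    struct demo_first *f = (struct demo_first *)ctx;
    f->hit = *hit;
    f->found = 1;
    return 0;
}

/* Callback 2: buffer every hit in a fixed-size pool. Slots are handed
 * out with an atomic counter, so concurrent rays never share a slot. */
#define DEMO_POOL_SIZE 4

struct demo_pool {
    atomic_size_t next;  /* index of the next free slot */
    struct demo_hit slots[DEMO_POOL_SIZE];
};

static int demo_buffer_hit(const struct demo_hit *hit, void *ctx)
{
    struct demo_pool *p = (struct demo_pool *)ctx;
    size_t slot = atomic_fetch_add(&p->next, 1);
    if (slot >= DEMO_POOL_SIZE)
        return 0;  /* pool exhausted: drop this hit and stop the ray */
    p->slots[slot] = *hit;
    return 1;
}
```

Passing `demo_first_hit` gives the first-hit-only behaviour that ignores CSG; passing `demo_buffer_hit` collects every hit for later boolean evaluation without changing the traversal code. Note that `next` can run past `DEMO_POOL_SIZE` once the pool overflows, so a consumer would clamp it before reading the slots.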