IRC log for #brlcad on 20150811

01:04.08 Notify 03BRL-CAD Wiki:Bhollister * 9267 /wiki/User:Bhollister/DevLogAug2015:
01:12.54 *** part/#brlcad Ch3ck_ (~Ch3ck@41.205.28.203)
01:49.36 Notify 03BRL-CAD Wiki:SideburnEtic * 0 /wiki/User:SideburnEtic:
01:55.21 Notify 03BRL-CAD Wiki:SideburnEtic * 9268 /wiki/ARL_Technical_Reports: removed spam
01:59.05 Notify 03BRL-CAD:vasco_costa * 65870 (brlcad/branches/opencl/src/librt/librt_private.h brlcad/branches/opencl/src/librt/primitives/primitive_util.c and 3 others): add device side solid database storage.
02:01.55 Notify 03BRL-CAD Wiki:Vasco.costa * 9269 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 12 : 10 Aug-16 Aug */
02:02.48 Notify 03BRL-CAD Wiki:Vasco.costa * 9270 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */
02:02.59 Notify 03BRL-CAD Wiki:Vasco.costa * 9271 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */
03:43.23 Notify 03BRL-CAD Wiki:Vasco.costa * 9272 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 12 : 10 Aug-16 Aug */
03:44.12 Notify 03BRL-CAD Wiki:Vasco.costa * 9273 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 12 : 10 Aug-16 Aug */
03:45.11 Notify 03BRL-CAD Wiki:Vasco.costa * 9274 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 12 : 10 Aug-16 Aug */
03:48.27 Notify 03BRL-CAD Wiki:Bhollister * 9275 /wiki/User:Bhollister/DevLogAug2015: /* Mon, August 10, 2015 Week 12 (of 14) */
04:28.40 *** join/#brlcad gurwinder (~chatzilla@117.199.101.198)
04:33.30 Notify 03BRL-CAD:vasco_costa * 65871 (brlcad/branches/opencl/include/rt/shoot.h brlcad/branches/opencl/src/librt/librt_private.h and 4 others): minor cleanup of opencl database shot code.
04:37.33 *** join/#brlcad shaina (~shaina@59.91.88.56)
04:57.35 Notify 03BRL-CAD Wiki:Gurwinder Singh * 9276 /wiki/Povray:
06:42.30 *** join/#brlcad roop (~roop@59.91.88.56)
07:02.03 *** join/#brlcad gurwinder (~chatzilla@117.199.101.198)
07:49.18 *** join/#brlcad d_rossberg (~rossberg@66-118-151-70.static.sagonet.net)
07:49.53 *** join/#brlcad teepee-- (bc5c2134@gateway/web/freenode/ip.188.92.33.52)
09:55.55 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
10:19.46 *** join/#brlcad Shubham (6719e766@gateway/web/freenode/ip.103.25.231.102)
10:22.31 Shubham brlcad: I request you to please update the google sheet (mentor review checklist) that was shared with all the GSoC students at the start of the coding period, for our review as well.
10:23.33 Shubham I mean in order for us to see where we stand, as far as our objectives as GSoC students are concerned.
10:38.47 *** join/#brlcad packrat (~packrator@c-71-231-32-234.hsd1.wa.comcast.net)
10:51.20 *** join/#brlcad dracarys983 (dracarys98@nat/iiit/x-hmvfyehekiwavadt)
11:08.39 *** join/#brlcad KimK (~Kim__@ip68-102-188-176.ks.ok.cox.net)
11:28.22 *** join/#brlcad konrado (~konro@41.205.22.24)
11:32.34 konrado d_rossberg: Hello
12:20.30 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
12:31.01 *** join/#brlcad ih8sum3r (~deepak@122.173.195.50)
12:31.25 *** part/#brlcad ih8sum3r (~deepak@122.173.195.50)
12:32.04 *** join/#brlcad D33pak (~D33pak@122.173.195.50)
12:57.40 *** join/#brlcad ih8sum3r_ (~ih8sum3r@122.173.163.145)
13:10.20 d_rossberg konrado: hi
13:20.27 *** join/#brlcad ih8sum3r (~ih8sum3r@122.173.163.145)
13:28.36 *** join/#brlcad ries_nicked (~ries@D979C47E.cm-3-2d.dynamic.ziggo.nl)
14:02.58 *** join/#brlcad konrado (~konro@41.205.22.42)
14:47.58 *** join/#brlcad konrado (~konro@41.205.27.94)
15:25.52 Notify 03BRL-CAD:carlmoore * 65872 brlcad/trunk/src/librt/primitives/primitive_util.c: remove a trailing white space character
15:49.34 Notify 03BRL-CAD Wiki:Rontheslow * 0 /wiki/User:Rontheslow:
16:22.21 *** join/#brlcad roop (~roop@106.78.67.189)
16:24.47 *** join/#brlcad roop (~roop@106.78.67.189)
16:37.41 *** join/#brlcad gurwinder (~chatzilla@117.199.101.198)
16:42.06 *** join/#brlcad konrado (~konro@41.205.27.94)
17:05.16 *** join/#brlcad smile (~smile@202.164.45.204)
17:12.55 *** join/#brlcad shaina (~shaina@117.214.242.21)
17:22.39 Notify 03BRL-CAD Wiki:Deekaysharma * 9277 /wiki/User:Deekaysharma/logs:
18:02.34 *** join/#brlcad sofat (~smile@202.164.45.204)
18:22.25 *** join/#brlcad konrado (~konro@41.205.22.19)
18:39.54 *** join/#brlcad sofat (~smile@202.164.45.204)
18:49.24 *** join/#brlcad ries_nicked (~ries@D979C47E.cm-3-2d.dynamic.ziggo.nl)
18:58.43 *** join/#brlcad Stragus (~alexis@modemcable090.29-19-135.mc.videotron.ca)
19:01.35 *** join/#brlcad bhollister (~brad@2601:647:cb01:9750:28ca:d514:9f8c:b4d3)
19:01.57 Notify 03BRL-CAD:vasco_costa * 65873 (brlcad/branches/opencl/src/librt/librt_private.h brlcad/branches/opencl/src/librt/primitives/arb8/arb8.c and 6 others): refactor opencl database storage.
19:02.34 *** join/#brlcad vasc (~vasc@bl7-127-135.dsl.telepac.pt)
19:24.45 *** join/#brlcad sofat (~smile@101.215.81.12)
19:27.07 Notify 03BRL-CAD:vasco_costa * 65874 (brlcad/trunk/include/rt/shoot.h brlcad/trunk/src/librt/librt_private.h and 8 others): backport solid database storage from opencl branch to trunk.
19:27.36 Notify 03BRL-CAD Wiki:Vasco.costa * 9278 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */
19:34.35 Notify 03BRL-CAD Wiki:Vasco.costa * 9279 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 12 : 10 Aug-16 Aug */
19:34.49 Notify 03BRL-CAD Wiki:Vasco.costa * 9280 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 12 : 10 Aug-16 Aug */
19:35.57 Notify 03BRL-CAD Wiki:Vasco.costa * 9281 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */
19:44.34 vasc man two months and a half and feel like i just broke the outer layer in porting BRL-CAD to opencl. without working on the meat of it.
19:45.02 Stragus It's a pretty massive amount of work
19:46.14 vasc so basically we now got half a dozen primitive intersection routines and we can store an array of primitives on the gpu
19:46.34 vasc opencl device or whatever
19:47.43 vasc the plan i had was to do grid acceleration next. i actually got it working in ANSI C and I got a device side grid builder.
19:48.20 vasc the problem is now i think its a mistake to use grids in this case and doing a gpu bvh builder would take ohhh so much time
19:48.33 vasc probably another 2 months
19:48.52 vasc or 3
19:48.56 Stragus Don't build on the GPU!
19:49.35 vasc if i code a cpu bvh builder which can be ported to the gpu it will probably take a month to code it and take all the kinks out.
19:49.45 vasc and that's without doing the traversal
19:49.54 vasc optimized
19:50.21 Stragus I think Sean always wanted an "incremental" process, little steps
19:50.35 ``Erik with frequent commits...
19:50.43 Stragus Which sounds weird to me, because it seems like a whole block to be written all at once
19:50.56 vasc well it would only take a couple of days to do the gpu grid traversal but i think its a waste of time
19:50.57 ``Erik mal: burger says you visited him? how crazy is he? :D
19:51.20 Stragus Oh, I went to Australia and met Burga like... 8 years ago?
19:51.36 Stragus Seemed pretty normal I guess :)
19:51.36 ``Erik hah, damn, I had no idea :)
19:52.21 Stragus He didn't seem like a true and dedicated geek
19:52.33 ``Erik <-- is sitting at a linux box running X, seems so weird (burning in a new laptop battery)
19:52.58 ``Erik he did a stint in the aussie military, that can rip the geekiness out of one, I'd think?
19:53.07 Stragus vasc, I fully agree with not doing stuff that will have to be rewritten anyway
19:53.09 *** join/#brlcad smile (~smile@202.164.45.212)
19:53.18 Stragus ``Erik, probably!
19:53.52 vasc the current code uses mailboxing. so it has this per ray bitset with the size of the number of primitives...
19:53.56 vasc now wait a minute
19:54.10 vasc now that i think about it the grids can probably work
19:54.39 ``Erik vasc: have you discussed what to work on with your mentor? "pencils down" is coming up
19:54.56 vasc coz this is a csg raytracer the number of primitives in a scene is usually kind of low. i mean the goliath has like 300 primitives
19:55.10 vasc a bitset per ray for that isn't that big and can probably be stored in shared memory
19:55.45 ``Erik 'real' geometries are typically something in the 1000's or 10000's range iirc
19:56.27 Stragus Darn no, don't use a bitset per ray
19:56.57 Stragus I have no weight whatsoever on what you should work on, but I have strong opinions regarding not doing work that someone will have to rewrite anyway
19:57.14 Stragus So, whatever is written should ideally be good code
19:57.34 vasc well if it was a triangle raytracer the mailboxing wouldn't work because the bitset would take too much memory
19:57.43 vasc you can have tens of millions of triangles
19:58.14 vasc are those geometries that size in number of solids alone or are you counting the pieces as well?
19:58.28 Stragus A proper partitioning strategy will never require tracking bitsets per ray
19:58.48 Stragus Please do that. <disclaimer>I have no say on the matter.</disclaimer>
19:58.56 Stragus Please *don't* do that. :)
19:59.09 vasc 10000*38/1024
19:59.16 vasc that's like 371K of RAM
19:59.25 vasc oh crap
19:59.30 vasc i forgot the workgroup size
19:59.46 Stragus waves a giant "Bad Idea" flag
19:59.48 vasc 371 MB
20:00.01 vasc wait a second
20:00.23 vasc so its like 10000/8*1024/1024 KBs
20:00.40 vasc 1.22 MBs
20:00.58 vasc its like 10000 solids, 8 bits per byte, workgroup size 1024, convert to KBs
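The mailbox sizing worked out above can be checked with a few lines of C. This is only a sketch of the arithmetic: one bit per solid per ray, one ray per work-item, and the function names are made up for illustration rather than taken from the BRL-CAD sources.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one ray's mailbox: one bit per solid, rounded up
 * to a whole byte. */
static size_t mailbox_bytes_per_ray(size_t nsolids)
{
    return (nsolids + 7) / 8;
}

/* Bytes needed for a whole workgroup, assuming one ray per work-item
 * (an assumption of this sketch, not something the log states). */
static size_t mailbox_bytes_per_workgroup(size_t nsolids, size_t wg_size)
{
    assert(wg_size > 0);
    return mailbox_bytes_per_ray(nsolids) * wg_size;
}
```

With 10000 solids and a workgroup of 1024 this gives 1250 bytes per ray and 1280000 bytes per workgroup, which is the 1.22 MB figure quoted in the chat. With the roughly 300 primitives mentioned for goliath it comes to 38 bytes per ray, or 38912 bytes per workgroup.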
20:01.23 vasc well that won't fit into shared memory
20:01.32 Stragus That's a huge amount of memory to *clear* before tracing every single ray
20:01.45 Stragus It implies a ton of extra loads/stores that should be unnecessary
20:01.54 vasc so we would just blow the 256KB or whatever GPU cache with the mailboxes in global memory
20:02.24 vasc yes that's why i thought it would be cheaper to redo the computations over
20:03.13 vasc even if we have to recompute intersections 3-4 times
20:03.21 vasc at least it doesn't trash the cache
20:03.34 Stragus The proper solution is object-based partitioning
20:03.43 vasc yes we had discussed that
20:03.54 Stragus If you don't have time to do that, I would suggest no partitioning whatsoever. Put all objects into a big list
20:03.58 vasc anyway i need to discuss this with brlcad
20:03.59 Stragus And let someone else do that part
20:04.11 vasc i already have that
20:04.15 Stragus Good
20:04.31 vasc the routines to store a list in memory and the routines to intersect a primitive in memory given its index
20:05.47 vasc i can check the amount of duplicate intersections on goliath and do some tests like that i guess
20:06.47 Stragus Do you have a single kernel launch to trace a bunch of rays, intersect all objects (from a single fat list), then return lists of hits?
20:06.56 Stragus I think that would be a very good intermediary step
20:07.28 Stragus Then someone can insert partitioning while keeping the rest of the code
20:07.49 vasc nah. that isn't coded yet. it seems like a reasonable approach. but i would need to compute intersections twice to know the size of the list of hits to allocate.
20:08.21 Stragus That's a reasonable intermediary step
20:08.24 vasc which is still going to be less duplicate intersections than not using the mailboxing
20:08.35 vasc i bet
20:08.54 vasc yeah it seems a good idea
20:09.36 Stragus Alternatively, just return hits in some "inlined callback" function, returns 0 to terminate rays, then someone can plug some buffering code in there
20:09.42 vasc i had a pseudo code for that
20:09.48 Stragus Cool
20:11.09 vasc compute the intersections for all opencl primitives once to determine the size of the list of hits, allocate that, compute the hits again and fill the list, copy the list to the cpu, then compute the intersections for the non-opencl primitives and merge the intersections of host and device primitives
20:11.49 Stragus I would drop the part about non-opencl primitives and merging intersections
20:11.52 vasc then everything else would be done in the cpu
20:11.59 Stragus It's all code that someone will have to throw away when all is ported to OpenCL
20:12.05 vasc well the boolean weaving is kind of complicated
20:12.39 vasc well
20:12.40 Stragus Fine, keep that on the CPU
20:13.41 vasc it's like this. if i only did a first shot intersector i could discard a lot of results and not need to figure out the size of the list of hits and so on
20:13.58 vasc coz you only need one hit per ray
20:14.31 vasc the thing is it doesn't solve the general problem in a good way. but it can probably render scenes quick
20:14.56 Stragus The idea is not to write a first hit code
20:15.04 Stragus But to use a callback function which may return 0 to terminate the ray
20:15.19 vasc i think Sean suggested that as an alternative at one point
20:15.20 Stragus And in your code, you terminate the ray right there, unless you have time to do the buffering part
20:15.36 Stragus These are good incremental steps, because no one will have to throw away code later on
20:15.37 vasc i can't use function pointers in opencl
20:15.50 vasc i can call some function but the thing is
20:15.56 Stragus It's a figure of speech, you don't want an actual function pointer
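The "inlined callback" idea can be sketched in plain C as a small static inline hook that the traversal loop calls for every hit, with a return value of 0 meaning "stop this ray". Everything here is hypothetical (the `hit` and `ray_state` structs, `on_hit`, `trace_flat_list`), and the per-solid distances are supplied as an array so the control flow stands on its own. Note that in an unsorted flat list the first hit found is not necessarily the nearest one; the sketch only shows where buffering code could later be plugged in.

```c
#include <stddef.h>

struct hit { size_t solid; double dist; };
struct ray_state { struct hit first; int have_hit; };

/* Hypothetical hit hook. Returning 0 asks the caller to end this ray.
 * A later version could append h to a buffer and return 1 instead. */
static inline int on_hit(struct ray_state *st, const struct hit *h)
{
    st->first = *h;
    st->have_hit = 1;
    return 0;
}

/* Walk a flat list of per-solid hit distances (negative means miss).
 * Returns how many solids were tested before the ray ended. */
static size_t trace_flat_list(const double *dist, size_t nsolids,
                              struct ray_state *st)
{
    size_t i;

    st->have_hit = 0;
    for (i = 0; i < nsolids; i++) {
        struct hit h;

        if (dist[i] < 0.0)
            continue;           /* this solid was missed */
        h.solid = i;
        h.dist = dist[i];
        if (!on_hit(st, &h))
            return i + 1;       /* hook asked to terminate the ray */
    }
    return nsolids;
}
```

Because the hook is an ordinary inline function rather than a function pointer, the same shape should carry over to OpenCL C, where function pointers are not available.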
20:16.11 vasc its like i said you need to do the intersections twice, the first time just to compute the size
20:16.20 vasc so i don't think it would add anything
20:16.48 Stragus You don't *need* to do that... but it's a reasonable temporary resolution, because it doesn't involve much extra code
20:16.48 vasc you would do it like this in a later version:
20:17.16 vasc you compute the intersections passing NULL as a buffer to store results. the kernel returns the number of intersections computed
20:17.21 Stragus Later on, someone will throw that away to instead buffer hits in some global static buffer allocated through atomics
20:17.39 vasc then you allocate the buffer with that size and call the intersection function again
20:18.17 vasc its a bit more complicated than that because its per ray
20:18.24 vasc so i would need a prefix sum somewhere
20:18.27 vasc but its as easy as that
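The count, prefix sum, fill scheme described here can be sketched on the CPU in C. This is a toy under stated assumptions: a "ray" is a point on a line, a "solid" is an interval, and `shoot_ray`, `exclusive_scan` and `shoot_all` are illustrative names, not BRL-CAD or OpenCL functions. Passing NULL as the output buffer makes the intersection routine count only, as in the chat.

```c
#include <stddef.h>
#include <stdlib.h>

struct interval { double lo, hi; };   /* toy stand-in for a solid */

/* Intersect one ray (toy: a point x) against every solid.
 * If out is NULL only count the hits, otherwise also store the
 * indices of the solids that were hit. Returns the hit count. */
static size_t shoot_ray(double x, const struct interval *solids,
                        size_t nsolids, size_t *out)
{
    size_t i, n = 0;

    for (i = 0; i < nsolids; i++) {
        if (x >= solids[i].lo && x <= solids[i].hi) {
            if (out)
                out[n] = i;
            n++;
        }
    }
    return n;
}

/* Exclusive prefix sum: offsets[r] = counts[0] + ... + counts[r-1].
 * Returns the total number of hits. */
static size_t exclusive_scan(const size_t *counts, size_t *offsets, size_t n)
{
    size_t r, total = 0;

    for (r = 0; r < n; r++) {
        offsets[r] = total;
        total += counts[r];
    }
    return total;
}

/* Two passes over all rays. offsets must have nrays + 1 entries.
 * Returns a malloc'd array of hits (caller frees), or NULL on
 * allocation failure. Ray r owns hits[offsets[r] .. offsets[r+1]). */
static size_t *shoot_all(const double *rays, size_t nrays,
                         const struct interval *solids, size_t nsolids,
                         size_t *offsets)
{
    size_t r, total, *hits;
    size_t *counts = malloc((nrays + 1) * sizeof *counts);

    if (!counts)
        return NULL;

    /* pass 1: count only, no output buffer */
    for (r = 0; r < nrays; r++)
        counts[r] = shoot_ray(rays[r], solids, nsolids, NULL);

    total = exclusive_scan(counts, offsets, nrays);
    offsets[nrays] = total;
    free(counts);

    /* single allocation of the exact size (+1 avoids malloc(0)) */
    hits = malloc((total + 1) * sizeof *hits);
    if (!hits)
        return NULL;

    /* pass 2: recompute the intersections and fill each ray's slice */
    for (r = 0; r < nrays; r++)
        shoot_ray(rays[r], solids, nsolids, hits + offsets[r]);

    return hits;
}
```

The cost is the one discussed in the chat: every intersection is computed twice, in exchange for a single exact-size allocation and no dynamic allocation while rays are in flight.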
20:19.09 Stragus Keep it simple, it's temporary code
20:19.39 vasc no this code could work for the general case
20:20.45 Stragus Yes, but it's twice as slow as it should be
20:20.50 Stragus Therefore, it's temporary code
20:21.06 vasc the callback would be a great idea if we could incrementally compute the boolean weaving somehow. but i don't see how to do that.
20:21.35 vasc yeah i know its twice as slow. but the advantage is no dynamic mallocs
20:21.57 vasc its only arithmetic intensive
20:22.12 Stragus Twice as slow plus the prefix sum
20:22.53 vasc those are kinda quick
20:23.21 vasc plus i already have opencl code for prefix sums in svn trunk
20:23.43 vasc calling that would be a line of code
20:24.00 Stragus Like I said, it's a viable way to demonstrate the code is working
20:24.10 Stragus But I expect this to be rewritten, it's not final code
20:24.26 vasc yeah and perhaps it wouldn't be slow as molasses
20:24.48 vasc well i don't see a way of doing the 'final code' differently unless the boolean weaving can be computed incrementally.
20:25.13 vasc and i think that's probably SIGGRAPH paper material
20:25.18 vasc probably
20:25.40 vasc i was actually reading about that the other day
20:27.50 vasc "CST: Constructive Solid Trimming for rendering BReps and CSG", John Hable and Jarek Rossignac
20:28.12 vasc so...
20:29.32 vasc i didn't read it that profoundly but i think they use the stencil buffer.
20:30.30 vasc i think it renders the scene in object order
20:30.39 vasc so its kinda like a rasterization technique
20:35.49 vasc i think there's no 100% future proof design we can do until we figure out how to do the boolean weaving on the gpu
20:36.03 Stragus Right, keep that on the CPU for now
20:36.25 vasc but it could be useful to have a proof of concept that shows how much faster the gpu side rendering can be
20:36.36 vasc for that a first hit intersector would do it
20:37.10 Stragus That will also require proper scene partitioning, which is a massive amount of work
20:37.24 Stragus Everybody knows GPUs are fast, no need to demonstrate that :p
20:38.03 vasc well it would probably provide more motivation for people to work on this if they saw something that had some kind of direct user impact
20:38.17 ``Erik pheer my intel g41 gpu, tremble before its might :D
20:39.00 vasc actually the current kernel call intensive code is probably faster running the opencl on the CPU than the GPU. coz it doesn't do as many bus transfers.
20:39.12 vasc whaka whaka
20:39.39 Stragus I use dual-GTX 590, 4 GPUs on two boards, yar!
20:40.15 Stragus Old GPUs, but it's the last architecture that Nvidia made for true compute. And it's still faster than any more recent hardware at double precision o.O
20:41.00 ``Erik http://www.videocardbenchmark.net/gpu.php?gpu=Intel+G41+Express+Chipset
20:41.01 Stragus 4 GPUs is also great to test code scalability
20:41.36 Stragus No idea what these numbers are, but it does not look good
20:42.00 vasc wikipedia says 2488.3 GFLOPS FMA for the GTX-590
20:42.16 vasc but i think that's SP FLOPS
20:42.25 vasc what's the DP FLOPS ratio?
20:43.08 Stragus 1 to 4, the best ratio
20:43.27 ``Erik plain gtx-590 is 4000 on that benchmark site, a little more than my 62 ;)
20:43.38 Stragus Or 1 to 2?
20:44.07 Stragus ``Erik, it's two GPUs on the same board, OpenGL can't properly use two GPUs at the same time
20:44.10 vasc in that case I think the Titan Z is faster.
20:44.18 vasc https://en.wikipedia.org/wiki/GeForce_700_series
20:44.26 vasc 2707 DP GFLOPS
20:44.42 vasc a lot of wasted potential but still faster
20:46.22 vasc i think that intel chipset doesn't have OpenCL support
20:46.37 Stragus Fermi was an amazing compute architecture
20:46.45 Stragus Kepler was pure gaming, and Maxwell... kind of half way
20:47.04 vasc no the Maxwell is the pure gaming
20:47.06 vasc the Kepler is ok
20:47.26 vasc I have a Kepler
20:47.27 vasc GK110
20:47.55 vasc maxwell has 1/32 the DP FLOPS
20:48.28 Stragus The Kepler's general cache is atrociously bad
20:48.31 vasc like the TITAN X: 192 DP GFLOPS and 6144 SP GFLOPS
20:48.49 *** join/#brlcad Guest64755 (~smile@101.208.40.51)
20:48.58 Stragus When I ported my code from accessing memory directly to using the texture cache, it became 2.3 times faster!
20:49.26 vasc compare with the Kepler TITAN Z: 8122 SP GFLOPS and 2707 DP GFLOPS
20:49.56 vasc the Kepler has a lot more DP FLOPS
20:50.02 vasc the Maxwell (the latest one) is crap at DP
20:50.20 Stragus At least the general cache is not utter garbage
20:50.46 vasc Fermi, Kepler, Maxwell
20:50.49 vasc I have the Kepler
20:51.03 vasc they just keep nerfing the DP
20:56.57 vasc oh neat. i found a bug in my code when i use AMD OpenCL.
20:57.42 vasc it doesn't search the current path for includes like the NVIDIA compiler
20:57.51 vasc brilliant
20:59.36 Stragus -I./ ?
21:01.13 vasc yeah -I. works
21:04.09 Notify 03BRL-CAD:vasco_costa * 65875 brlcad/trunk/src/librt/primitives/primitive_util.c: add current path for opencl includes or the AMD OpenCL won't find them.
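A fix along these lines usually comes down to adding an include path to the build options string. The helper below is a hedged sketch, not the code from the commit: `cl_build_options` is a made-up name, and it only composes the string that would then be passed as the `options` argument of `clBuildProgram`.

```c
#include <stdio.h>

/* Compose an OpenCL build options string: an include path first,
 * followed by any extra flags. The result is meant to be handed to
 * clBuildProgram as its options argument.
 * Returns 0 on success, -1 if buf is too small. */
static int cl_build_options(char *buf, size_t len,
                            const char *include_dir, const char *extra)
{
    int n = snprintf(buf, len, "-I%s%s%s",
                     include_dir,
                     (extra && *extra) ? " " : "",
                     extra ? extra : "");

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with `"."` and no extra flags yields `-I.`, the form reported as working in the chat.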
21:04.45 vasc elapsed = 58.6581 sec
21:04.56 vasc i think the GPU takes like ten times that
21:05.01 vasc coz of the bus transfers
21:05.14 vasc i wonder if its quicker than the non-opencl one
21:06.57 vasc it does use vector SSE
21:07.20 Stragus Don't worry about performance at this point
21:07.35 Stragus Performance will be terrible until all pieces of the puzzle have been written properly
21:09.52 vasc its just a matter of recompiling and testing it
21:11.01 vasc elapsed = 58.6581 sec
21:11.02 vasc lol
21:11.13 vasc elapsed = 1.24892 sec
21:11.41 vasc too much overhead
21:13.11 vasc it's an OpenCL decelerator now
21:14.14 Stragus Stop worrying :p
21:14.27 vasc its still good to know the scale of things
21:14.38 vasc i kinda expected to do more by now
21:14.55 vasc man i never thought those primitives had so much code to port
21:15.20 vasc most of it was straightforward but still i had to hunt it down all over the place
21:15.55 vasc <PROTECTED>
21:15.55 vasc <PROTECTED>
21:15.56 vasc <PROTECTED>
21:15.56 vasc <PROTECTED>
21:15.56 vasc <PROTECTED>
21:15.56 vasc <PROTECTED>
21:15.58 vasc <PROTECTED>
21:16.00 vasc <PROTECTED>
21:16.02 vasc <PROTECTED>
21:16.04 vasc <PROTECTED>
21:16.06 vasc <PROTECTED>
21:16.18 vasc the sph_shot.cl was the one that was done before for comparison.
21:17.08 vasc and that's not counting the glue code and serialization code and crap like that
21:17.17 vasc none of the ANSI C bits
21:21.15 vasc i'll just work on the first hit renderer then.
21:21.34 vasc at least until i hear more from brlcad
21:22.11 vasc that will port the entire rendering pipeline in a really simple way
21:22.19 vasc i won't do any spatial acceleration whatsoever.
21:22.53 vasc which reminds me we don't compute normals on the gpu side yet
21:22.56 vasc :-P
21:24.07 vasc oh well black will do for now
21:24.16 vasc or white
21:39.16 Notify 03BRL-CAD:carlmoore * 65876 brlcad/trunk/src/util/bwdiff.c: switch a line to make the files look more alike, although there are other differences which probably cannot be resolved
21:42.27 Notify 03BRL-CAD Wiki:101.208.40.51 * 9282 /wiki/User:Hiteshsofat/GSoc15/log_developmen:
21:46.04 Notify 03BRL-CAD:carlmoore * 65877 (brlcad/trunk/src/util/bwfilter.c brlcad/trunk/src/util/pixfilter.c): fix comment for sake of uniformity
21:54.31 Notify 03BRL-CAD Wiki:Konrado DJ * 9283 /wiki/User:Konrado_DJ/GSoc2015/logs: /* 11 AUGUST 2015 */
21:57.20 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
22:40.21 *** join/#brlcad bhollister2 (~behollis@dhcp-59-221.cse.ucsc.edu)
23:43.32 *** join/#brlcad bhollister3 (~behollis@dhcp-59-221.cse.ucsc.edu)
