IRC log for #brlcad on 20150806

00:31.39 Notify 03BRL-CAD Wiki:Bhollister * 9240 /wiki/User:Bhollister/DevLogAug2015:
00:32.22 Notify 03BRL-CAD:vasco_costa * 65844 (brlcad/trunk/src/librt/librt_private.h brlcad/trunk/src/librt/primitives/arb8/arb8.c and 12 others): pass struct with primitive data to opencl as an initial step to an AoS device primitive database. move constants into common.cl.
01:10.43 *** join/#brlcad vasc__ (~vasc@bl8-192-46.dsl.telepac.pt)
02:56.19 *** join/#brlcad sofat (~androirc@101.214.213.146)
03:17.59 *** join/#brlcad gurwinder (~chatzilla@117.212.50.212)
03:24.02 starseeker sofat?
03:24.05 starseeker nuts
04:06.25 gurwinder brlcad: Hi, I have exported ehy and epa, now moving on to rhc, rpc and bot.
04:07.26 Notify 03BRL-CAD:vasco_costa * 65845 (brlcad/trunk/src/librt/librt_private.h brlcad/trunk/src/librt/primitives/arb8/arb8.c and 14 others): generic opencl solid shot handler. refactored code to remove duplicates.
04:08.34 vasc__ that's that. i think i did all i could on trunk without changing the apis.
04:08.52 vasc__ i think i'll continue on the branch
04:11.52 Notify 03BRL-CAD Wiki:Vasco.costa * 9241 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug */
04:15.41 brlcad vasc__: looks pretty good
04:16.31 brlcad vasc__: please also assign your patches to yourself, mark them as accepted, and close them out too (denote the commit revision in a comment) as you commit them
04:24.35 vasc__ i think i did that to all the patches i had on the tracker
04:25.37 vasc__ that i committed
04:25.50 vasc__ i guess i can assign to myself the patches i didn't commit as well
04:27.31 vasc__ so basically the thing to do next is to store the scene database on the gpu
04:29.37 vasc__ i'm going to redo the database code.
04:29.54 Notify 03BRL-CAD Wiki:Vasco.costa * 9242 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */
04:29.55 vasc__ the patches i committed to trunk already did like half the work on that
04:31.05 vasc__ there's a generic shot callback that calls the primitive specific callback that uses a pointer to the memory region where the primitive data is
04:31.42 vasc__ so the only thing to do is to actually allocate, copy the data to device memory
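The generic shot handler described above (one entry point that locates a primitive's serialized record and hands it to a type-specific routine) might look roughly like the following host-side C sketch. Every name in it (`prim_index`, `generic_shot`, `plane_shot`, `box_shot`, the type codes and record layouts) is hypothetical and is not BRL-CAD's actual API. OpenCL C does not allow function pointers, so the sketch dispatches with a `switch`.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type codes and record layouts; BRL-CAD's real ones differ. */
enum prim_type { PRIM_PLANE = 0, PRIM_BOX = 1 };

struct plane_data { double n[3]; double d; };       /* points x with n . x = d */
struct box_data { double min[3]; double max[3]; };  /* axis-aligned box */

/* Index entry: primitive kind plus byte offset of its record in the buffer. */
struct prim_index { uint32_t type; uint32_t offset; };

static double dot3(const double a[3], const double b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Each specific routine copies its own record out of the raw bytes. */
static int plane_shot(const void *rec, const double org[3], const double dir[3], double *t)
{
    struct plane_data p;
    memcpy(&p, rec, sizeof p);
    double denom = dot3(p.n, dir);
    if (denom == 0.0) return 0;                  /* ray parallel to plane */
    double th = (p.d - dot3(p.n, org)) / denom;
    if (th < 0.0) return 0;                      /* plane is behind the ray */
    *t = th;
    return 1;
}

static int box_shot(const void *rec, const double org[3], const double dir[3], double *t)
{
    struct box_data b;
    memcpy(&b, rec, sizeof b);
    double tmin = -DBL_MAX, tmax = DBL_MAX;
    for (int i = 0; i < 3; i++) {
        if (dir[i] == 0.0) {                     /* parallel to this slab */
            if (org[i] < b.min[i] || org[i] > b.max[i]) return 0;
            continue;
        }
        double t0 = (b.min[i] - org[i]) / dir[i];
        double t1 = (b.max[i] - org[i]) / dir[i];
        if (t0 > t1) { double tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > tmin) tmin = t0;
        if (t1 < tmax) tmax = t1;
        if (tmin > tmax) return 0;
    }
    if (tmax < 0.0) return 0;                    /* box is behind the ray */
    *t = tmin >= 0.0 ? tmin : tmax;              /* exit distance if org is inside */
    return 1;
}

/* Generic entry point: find the record by offset, then switch on its type. */
static int generic_shot(const unsigned char *prims, const struct prim_index *ent,
                        const double org[3], const double dir[3], double *t)
{
    const void *rec = prims + ent->offset;
    switch (ent->type) {
    case PRIM_PLANE: return plane_shot(rec, org, dir, t);
    case PRIM_BOX:   return box_shot(rec, org, dir, t);
    default:         return 0;                   /* unknown type: treat as a miss */
    }
}
```

With a plane record at offset 0 and a box record right after it, calling `generic_shot` on each index entry with the same ray reports each primitive's entry distance. Reading records through `memcpy` keeps the sketch safe even if a record is not aligned for `double`.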
04:35.59 vasc__ i guess i could commit the scan code i have to trunk. but the thing is nothing will call it until i do the rest of the code
04:36.47 vasc__ anyway not today
04:39.34 vasc__ i also found out that the nvidia opencl compiler doesn't handle large .cl files very well...
04:39.56 vasc__ so i had to split them up and compile them separately and then link them
05:04.49 brlcad interesting -- any idea on what the limit is/was?
05:08.27 Stragus That's weird, I have compiled huge .cu (CUDA) files. Very large device functions or just files?
05:09.11 Stragus And what error or problem were you experiencing?
05:15.31 Notify 03BRL-CAD:vasco_costa * 65846 (brlcad/trunk/src/librt/primitives/ehy/ehy_shot.cl brlcad/trunk/src/librt/primitives/ell/ell_shot.cl and 3 others): load large opencl vectors on demand to reduce stack footprint per function call.
05:15.50 vasc__ it just gave me some ptxas function is being called with wrong number of arguments or something
05:16.11 vasc__ which usually means that the code is calling a function that isn't defined anywhere
05:16.32 Stragus Output the PTX assembly and inspect it
05:16.35 vasc__ nah
05:16.40 vasc__ it works this way
05:16.51 vasc__ and i know the AMD GPU compiler also creaks on large files so
05:17.19 Stragus It's probably more an issue of a single huge kernel rather than large files
05:17.19 vasc__ i tried concatenating it all into one file and it didn't work
05:17.30 vasc__ it probably tried inlining everything yes
05:17.35 vasc__ and then it croaked
05:17.45 Stragus Right. Which shouldn't happen
05:19.21 Notify 03BRL-CAD Wiki:Vasco.costa * 9243 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug */
05:19.39 vasc__ yeah. i could have inspected the assembly but...
05:19.44 vasc__ *snore*
05:20.04 vasc__ it actually makes more sense this way
05:20.33 vasc__ i was just including everything into a huge file
05:21.47 Stragus Actual function calls are slow on most GPU hardware
05:21.54 Stragus But yes, not a big issue at the moment
05:23.12 vasc__ i hope i don't have memory alignment issues anymore
05:24.06 vasc__ everything should be aligned on 8-byte boundaries
05:26.26 vasc__ damned huge doubles
05:27.22 Stragus I don't even see how that could be an issue in the first place
05:27.33 vasc__ ah
05:27.36 Stragus On CPU as well, you definitely want 8 bytes alignment for your doubles
05:27.48 Stragus In fact, you should want 32 bytes alignment for bundles of 4 doubles
05:28.17 vasc__ right. i considered that. there's just a teensy little issue with that and AoS
05:28.42 Stragus On GPU, it should be bundles of 32 doubles
05:28.58 vasc__ ah the triangle ray tracers were so much simpler
05:29.01 Stragus (Which is obviously also quite fine on CPU)
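Stragus's suggestion of 32-byte bundles of 4 doubles amounts to storing primitives component-wise in small fixed-size groups (a layout often called AoSoA). A minimal C11 sketch follows, using a hypothetical sphere record; `BUNDLE`, `sph`, `sph_bundle` and `pack_bundles` are illustrative names only. Fixed-width bundles suit primitives of one uniform type, which is presumably the friction with a variable-size AoS database mentioned here.

```c
#include <stddef.h>

/* Lanes per bundle: 4 doubles = 32 bytes (one AVX register) on a CPU.
 * For a GPU one would presumably use 32, one per thread of a warp. */
#define BUNDLE 4

/* Array-of-structures record: one sphere per element. */
struct sph { double cx, cy, cz, r; };

/* One bundle holds BUNDLE spheres component-wise: lane i of every array
 * belongs to the same sphere. _Alignas(32) on the first member makes the
 * whole struct 32-byte aligned, and each array is a multiple of 32 bytes. */
struct sph_bundle {
    _Alignas(32) double cx[BUNDLE];
    double cy[BUNDLE];
    double cz[BUNDLE];
    double r[BUNDLE];
};

/* Transpose n spheres into bundles and return the bundle count. Unused lanes
 * in the last bundle get r = -1.0 as an "empty lane" marker (a convention
 * chosen for this sketch only). */
static size_t pack_bundles(const struct sph *in, size_t n, struct sph_bundle *out)
{
    size_t nb = (n + BUNDLE - 1) / BUNDLE;
    for (size_t b = 0; b < nb; b++) {
        for (size_t lane = 0; lane < BUNDLE; lane++) {
            size_t i = b * BUNDLE + lane;
            int used = i < n;
            out[b].cx[lane] = used ? in[i].cx : 0.0;
            out[b].cy[lane] = used ? in[i].cy : 0.0;
            out[b].cz[lane] = used ? in[i].cz : 0.0;
            out[b].r[lane]  = used ? in[i].r  : -1.0;
        }
    }
    return nb;
}
```

A SIMD loop can then load `cx`, `cy`, `cz` and `r` of one bundle as four aligned 32-byte vectors and test one ray against four spheres at once.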
05:29.51 Stragus I thought space partitioning traversal would be the tricky part, and it doesn't matter what kind of primitives are there
05:30.14 vasc__ sure
05:30.16 Stragus Then you just call the intersection for whatever primitive encountered
05:30.24 vasc__ but remember each primitive has a different size
05:30.46 Stragus Does that make a big difference?
05:31.00 vasc__ i'm just going to allocate a contiguous memory block and stuff all that primitive data in there in serialized form
05:31.06 Stragus Good call
05:31.27 Stragus So, make sure all sizeof() are aligned, (sizeof(foo)+0xf)&~0xf
05:31.44 Stragus Probably better with some kind of macro, eh
05:31.47 vasc__ yeah that was my problem
05:31.58 vasc__ i hope it's magically working now
05:32.14 vasc__ if it isn't i'll use the thing you said
05:33.09 vasc__ so they'll all be multiples of 8 bytes
05:33.36 Stragus That's 16 byte alignment actually, typed instinctively for SSE
05:33.49 Stragus <PROTECTED>
05:33.57 vasc__ yeah
05:34.08 vasc__ so it's 0x7 then
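The rounding expression discussed here generalizes to a macro; with an alignment of 8 it gives the 8-byte rounding vasc__ settles on, and with 16 it matches the `0xf` form. The sketch also shows the contiguous, serialized block of variable-size primitive records, with every record starting on an aligned offset. `ALIGN_UP` and `pack_records` are hypothetical names, and the alignment is assumed to be a power of two.

```c
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of a (a must be a power of two).
 * ALIGN_UP(x, 8) equals (x + 0x7) & ~0x7; ALIGN_UP(x, 16) equals (x + 0xf) & ~0xf. */
#define ALIGN_UP(x, a) (((size_t)(x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Copy n variable-size records back to back into buf, starting each one on an
 * `align`-byte boundary, and store each record's start in offsets[i].
 * Padding bytes are zeroed so the buffer contents are deterministic.
 * Returns the number of bytes used, or 0 if buf is too small. */
static size_t pack_records(unsigned char *buf, size_t cap,
                           const void *const *recs, const size_t *sizes,
                           size_t n, size_t align, size_t *offsets)
{
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        size_t start = ALIGN_UP(off, align);
        if (start > cap || sizes[i] > cap - start) return 0;
        memset(buf + off, 0, start - off);       /* zero the padding gap */
        memcpy(buf + start, recs[i], sizes[i]);
        offsets[i] = start;
        off = start + sizes[i];
    }
    return off;
}
```

For example, packing a 24-byte, a 5-byte and an 8-byte record with `align = 8` places them at offsets 0, 24 and 32 and uses 40 bytes; the whole buffer can then be copied to device memory in one transfer.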
05:36.08 vasc__ the grid was a bad idea...
05:36.19 Stragus :(
05:36.28 vasc__ i forgot the primitives can be quite expensive to intersect
05:36.34 Stragus Yes
05:36.41 Stragus I didn't think it was a good idea either
05:36.50 vasc__ a bvh would be a lot better
05:36.59 Stragus Spatial partitioning is good for triangles because intersection is so cheap
05:37.06 Stragus But these NURBS and stuff are a different beast
05:37.35 vasc__ it would probably take weeks to do a modern bvh builder though
05:37.44 vasc__ a gpu one at least
05:38.06 Stragus Meh, it can be built on the CPU, then upload the big chunk of memory to the GPU
05:38.16 Stragus But yes, it's still a massive amount of work
05:38.17 vasc__ yeah that is probably a lot more doable
05:38.53 Stragus My CUDA raytracer was also building on the CPU. Everything was packed/interleaved into just one big chunk of memory. You could raytrace on the CPU with it, on the GPU, save it to disk, whatever
05:41.17 Stragus Since everything was packed into a big chunk of memory, you could have per-primitive "extra data" packed within the graph, and so on. That extra data could vary between primitives
05:41.22 vasc__ i actually know quite a lot about gpu bvh builders although i'm a grid guy
05:41.25 Stragus That sounds like a good approach for a CSG raytracer too
05:41.45 Stragus I'm a graph person, I don't like hierarchies :p
05:43.00 vasc__ i'll think if i'll use the grids or not
05:43.18 vasc__ i would like to use that goliath scene as a benchmark of sorts
05:43.26 vasc__ it ain't gonna cut it without some acceleration scheme
05:43.31 vasc__ i think it has like 200 primitives
05:43.56 vasc__ which is kinda low but
05:44.01 Stragus How much time do you have to implement this?
05:44.14 vasc__ i have the code done. i did it a couple of weeks back
05:44.15 vasc__ oh
05:44.20 vasc__ well until the end of this month
05:44.41 vasc__ that's why i went with the grids to begin with
05:44.43 Stragus My opinion is that any part of the whole task is better done very well and correctly, or left to someone else
05:44.44 vasc__ it's a lot simpler
05:44.56 Stragus (But my opinion has no weight whatsoever on this)
05:45.27 Stragus Half-good solutions have to be rewritten anyway
05:45.50 vasc__ i've never believed that a system was ever complete anyway
05:46.06 vasc__ even if i coded the currently best bvh in a couple of years it could be crap
05:46.36 Stragus It might then be suboptimal but it won't be crap :p
05:47.04 vasc__ a low-resolution grid is probably okay-ish
05:47.18 vasc__ i think my issue is i was using too fine subdivision
05:47.19 Stragus I wouldn't personally use a BVH, but this is complex and there's too little time to explore new ideas
05:47.34 Stragus Sure, it can work
05:47.48 vasc__ well its just that the current code uses mailboxing and crap like that
05:47.57 vasc__ if we used the bvh the mailboxing wouldn't be needed anymore
05:48.19 vasc__ not that i'll use mailboxing with the grid either
05:48.25 vasc__ i'll just multiple-intersect things
05:48.28 Stragus I agree it requires object partitioning rather than spatial partitioning
05:48.34 vasc__ ar ar
05:48.38 Stragus It's the whole "hierarchy" thing I disagree with
05:48.48 vasc__ well it is csg after all
05:49.03 Stragus My raytracer never writes a byte to any shared or global memory during traversal, until the hit callback is called
05:49.14 Stragus Any kind of hierarchy involves building a stack of some sort, and GPUs hate that
05:49.15 vasc__ kewl
05:49.38 vasc__ yeah. if you use a lot of stack space you reduce the amount of threads you can spawn
05:49.53 vasc__ coz you have limited L1 cache for registers and stack
05:50.07 Stragus The L1 cache and registers are independent
05:50.17 Stragus But the stack is stored in global memory and it is SLOW, even with that crappy L1 cache
05:50.20 vasc__ yeah it's split
05:50.29 vasc__ global?
05:50.44 vasc__ that's lame
05:50.48 Stragus No no, the L1 and shared memory shares the same chunk of on-chip "cache"
05:51.12 Stragus Registers are totally independent, and a whole lot faster
05:51.41 vasc__ i thought you could choose the amount that goes into registers and remaining L1 on driver loading or something
05:51.54 Stragus You choose how to split between L1 and shared memory
05:52.27 vasc__ ah no it's the shared memory yeah
05:52.34 vasc__ uhoh
05:52.53 Stragus Anyhow, experimenting with novel ideas takes more time than you have
05:52.55 vasc__ so that's why function calls are slow as heck
05:52.59 Stragus Indeed
05:53.03 Stragus It's terrible
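To make the stack cost concrete, here is a minimal sketch of a conventional stack-based BVH traversal in C. It is neither BRL-CAD code nor Stragus's stackless approach; the node layout, `STACK_MAX`, `ray_box` and `bvh_candidates` are assumptions for illustration. The point is the `stack` array: because it is indexed dynamically, GPU compilers generally cannot keep it in registers.

```c
#include <stddef.h>

/* Hypothetical flattened BVH node. count == 0: inner node whose children sit
 * at indices left and left + 1. count > 0: leaf holding primitives
 * first .. first + count - 1. */
struct bvh_node { double lo[3], hi[3]; int left, first, count; };

#define STACK_MAX 64

/* Slab test against a node's box for t in [0, tmax]; inv holds 1/dir per axis
 * (this sketch assumes no direction component is exactly zero). */
static int ray_box(const struct bvh_node *nd, const double org[3],
                   const double inv[3], double tmax)
{
    double t0 = 0.0, t1 = tmax;
    for (int i = 0; i < 3; i++) {
        double a = (nd->lo[i] - org[i]) * inv[i];
        double b = (nd->hi[i] - org[i]) * inv[i];
        if (a > b) { double tmp = a; a = b; b = tmp; }
        if (a > t0) t0 = a;
        if (b < t1) t1 = b;
        if (t0 > t1) return 0;
    }
    return 1;
}

/* Collect indices of primitives whose leaf boxes the ray touches. */
static size_t bvh_candidates(const struct bvh_node *nodes, const double org[3],
                             const double inv[3], double tmax,
                             int *out, size_t cap)
{
    int stack[STACK_MAX];  /* per-ray stack: dynamically indexed, so a GPU
                              compiler typically places it in local memory,
                              which lives off-chip */
    int sp = 0;
    size_t n = 0;
    stack[sp++] = 0;       /* start at the root */
    while (sp > 0) {
        const struct bvh_node *nd = &nodes[stack[--sp]];
        if (!ray_box(nd, org, inv, tmax)) continue;
        if (nd->count > 0) {
            for (int i = 0; i < nd->count && n < cap; i++)
                out[n++] = nd->first + i;
        } else if (sp + 2 <= STACK_MAX) {  /* a real builder must bound depth */
            stack[sp++] = nd->left + 1;
            stack[sp++] = nd->left;        /* left child is popped first */
        }
    }
    return n;
}
```

Every push and pop here is a memory write and read per ray, which is the traffic a register-only traversal avoids.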
05:53.28 *** join/#brlcad milamber (~devlin@2602:306:8094:9360:b941:e8cd:a8d8:db8d)
05:56.55 vasc__ the current code uses a shitton of temporaries
05:57.07 Stragus GPUs have tons of registers
05:57.17 Stragus Memory is slow, but registers are free :p
05:57.50 vasc__ yeah but if you use a lot of registers you can't spawn as many threads
05:58.26 Stragus Can you ask OpenCL about register usage? We can with CUDA
05:58.42 vasc__ yeah CUDA has some compiler flag
05:59.02 Stragus Hum... I meant a runtime thing on the kernel, but it's true I'm using the low-level driver API
05:59.07 vasc__ you can pass flags to the opencl compiler. i'm not sure if you can use the same flags as CUDA though.
05:59.35 vasc__ nvcc has some compiler flag that says how many registers a kernel uses
05:59.44 Stragus Well, that works
06:00.04 vasc__ but that's for cuda
06:00.17 vasc__ it's too early to think about that
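The register/thread trade-off is simple arithmetic: a multiprocessor has a fixed register file, so more registers per thread means fewer resident threads. The per-thread count can be read from nvcc's verbose ptxas output (`-Xptxas -v`). The sketch below assumes figures typical of Kepler/Maxwell-class NVIDIA GPUs (65536 32-bit registers and at most 2048 resident threads per SM); `threads_by_registers` is a hypothetical helper, not part of any CUDA or OpenCL API.

```c
/* Upper bound on resident threads per multiprocessor from register use alone.
 * This ignores register allocation granularity, shared memory, and per-block
 * limits, which real occupancy calculators also account for. */
static unsigned threads_by_registers(unsigned regs_per_thread,
                                     unsigned regs_per_sm, unsigned max_threads)
{
    if (regs_per_thread == 0) return max_threads;
    unsigned t = regs_per_sm / regs_per_thread;
    t -= t % 32;                   /* threads are scheduled in whole warps of 32 */
    return t < max_threads ? t : max_threads;
}
```

Under those assumptions, a kernel using 32 registers per thread can keep the full 2048 threads resident, 64 registers halves that to 1024, and 255 registers leaves only 256.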
06:06.49 vasc__ later
06:21.47 Notify 03BRL-CAD Wiki:Shaina7837 * 9244 /wiki/User:Shainasabarwal/GSoC15/logs: /* 27 July */
06:28.28 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
06:59.03 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
07:14.53 *** join/#brlcad milamber1 (~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165)
07:45.50 *** join/#brlcad teepee-- (bc5c2134@gateway/web/freenode/ip.188.92.33.52)
09:24.40 starseeker brlcad: http://www.cmake.org/pipermail/cmake/2011-June/045233.html
09:31.03 starseeker in fact, they caution in the docs not to list outputs of custom commands in multiple targets: http://www.cmake.org/cmake/help/v3.0/command/add_custom_command.html
09:31.39 starseeker and I see we are doing just that with the obj-g code
09:33.39 starseeker and I'm doing it in one of the step directories as well
09:33.56 starseeker OK, that's probably it then
09:34.31 starseeker I'll wade into fixing that ASAP
11:57.58 Notify 03BRL-CAD:carlmoore * 65847 (brlcad/trunk/AUTHORS brlcad/trunk/src/librt/primitives/arb8/arb8.c and 8 others): remove trailing white space, and fix spelling
12:21.42 *** join/#brlcad konrado (~konro@41.205.22.13)
12:36.12 *** join/#brlcad Ch3ck_ (~Ch3ck@154.70.99.98)
13:00.02 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
13:37.48 *** join/#brlcad sofat (~sofat@202.164.45.208)
13:51.27 sofat brlcad, I need your help with Google Custom Search
13:51.38 sofat please reply if you are free
13:53.06 *** join/#brlcad sofat_ (~androirc@49.138.113.71)
13:59.29 sofat starseeker, I have submitted the new patch for the build system. I also solved the problem you told me about. I made a presentation.xsl.in file to auto-generate the presentation.xsl file, so please review this patch. Patch no: 401
14:54.34 Notify 03BRL-CAD:ejno * 65848 brlcad/trunk/include/bu/opt.h: add parentheses around macro arguments
15:07.16 *** join/#brlcad sofat (~sofat@202.164.45.208)
15:17.34 *** join/#brlcad sofat (~sofat@202.164.45.208)
15:41.15 *** join/#brlcad bhollister2 (~brad@2601:647:cb01:9750:d5ba:1393:eae0:ec4b)
15:45.43 *** join/#brlcad sofat (~sofat@49.138.113.71)
16:03.48 *** join/#brlcad sofat (~sofat@101.215.79.175)
16:34.50 *** join/#brlcad sofat (~sofat@101.215.79.175)
16:58.59 *** join/#brlcad sofat (~sofat@101.215.79.175)
17:23.40 *** join/#brlcad sofat (~sofat@202.164.45.208)
17:44.08 *** join/#brlcad sofat (~sofat@202.164.45.204)
17:50.06 sofat brlcad, hello
17:50.32 sofat I want to discuss something, please reply
17:56.13 archivist methinks someone nags too much
18:27.33 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
18:29.00 *** join/#brlcad vasc (~VASC@bl8-192-46.dsl.telepac.pt)
18:33.50 *** join/#brlcad milamber (~devlin@104-9-73-54.lightspeed.cicril.sbcglobal.net)
18:41.19 *** join/#brlcad sofat (~sofat@202.164.45.212)
19:02.51 Notify 03BRL-CAD:dhoward * 65849 (brlcad/trunk/include/rt/misc.h brlcad/trunk/src/libged/facetize.c brlcad/trunk/src/librt/screened_poisson.cpp): Added edge sampling to SPR facetization code.
19:08.03 Notify 03BRL-CAD Wiki:Deekaysharma * 9245 /wiki/User:Deekaysharma/logs:
19:10.48 *** join/#brlcad dracarys983 (dracarys98@nat/iiit/x-xnzkponofzzwciso)
19:22.49 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
20:08.01 Notify 03BRL-CAD:ejno * 65850 (brlcad/trunk/include/bu/opt.h brlcad/trunk/include/gcv/api.h and 13 others): initial integration of libgcv plugin argument processing
20:10.22 *** join/#brlcad milamber (~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165)
20:16.34 *** part/#brlcad Ch3ck_ (~Ch3ck@154.70.99.98)
20:24.17 Notify 03BRL-CAD:ejno * 65851 (brlcad/trunk/src/conv/gcv/gcv.c brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): correct conversion mode of fastgen4_write
20:34.04 Notify 03BRL-CAD:ejno * 65852 brlcad/trunk/src/conv/gcv/gcv.c: correctly set options_data
21:47.06 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
22:03.11 *** join/#brlcad konrado (~konro@41.205.22.53)
22:07.21 Notify 03BRL-CAD Wiki:202.164.45.212 * 9246 /wiki/User:Hiteshsofat/GSoc15/log_developmen:
23:06.46 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
23:22.32 *** join/#brlcad vasc_ (~VASC@bl8-192-46.dsl.telepac.pt)

Generated by irclog2html.pl Modified by Tim Riker to work with infobot.