IRC log for #brlcad on 20150806

`00:31.39`	`Notify`	`BRL-CAD Wiki:Bhollister * 9240 /wiki/User:Bhollister/DevLogAug2015:`
`00:32.22`	`Notify`	`BRL-CAD:vasco_costa * 65844 (brlcad/trunk/src/librt/librt_private.h brlcad/trunk/src/librt/primitives/arb8/arb8.c and 12 others): pass struct with primitive data to opencl as an initial step to an AoS device primitive database. move constants into common.cl.`
`01:10.43`	`*** join/#brlcad vasc__ (~vasc@bl8-192-46.dsl.telepac.pt)`
`02:56.19`	`*** join/#brlcad sofat (~androirc@101.214.213.146)`
`03:17.59`	`*** join/#brlcad gurwinder (~chatzilla@117.212.50.212)`
`03:24.02`	`starseeker`	`sofat?`
`03:24.05`	`starseeker`	`nuts`
`04:06.25`	`gurwinder`	`brlcad: Hi, I have exported ehy and epa; now moving on to rhc, rpc, and bot.`
`04:07.26`	`Notify`	`BRL-CAD:vasco_costa * 65845 (brlcad/trunk/src/librt/librt_private.h brlcad/trunk/src/librt/primitives/arb8/arb8.c and 14 others): generic opencl solid shot handler. refactored code to remove duplicates.`
`04:08.34`	`vasc__`	`that's that. i think i did all i could on trunk without changing the apis.`
`04:08.52`	`vasc__`	`i think i'll continue on the branch`
`04:11.52`	`Notify`	`BRL-CAD Wiki:Vasco.costa * 9241 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug */`
`04:15.41`	`brlcad`	`vasc__: looks pretty good`
`04:16.31`	`brlcad`	`vasc__: please also assign your patches to yourself, mark them as accepted, and close them out too (denote the commit revision in a comment) as you commit them`
`04:24.35`	`vasc__`	`i think i did that to all the patches i had on the tracker`
`04:25.37`	`vasc__`	`that i committed`
`04:25.50`	`vasc__`	`i guess i can assign to myself the patches i didn't commit as well`
`04:27.31`	`vasc__`	`so basically the thing to do next is to store the scene database on the gpu`
`04:29.37`	`vasc__`	`i'm going to redo the database code.`
`04:29.54`	`Notify`	`BRL-CAD Wiki:Vasco.costa * 9242 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */`
`04:29.55`	`vasc__`	`the patches i committed to trunk already did like half the work on that`
`04:31.05`	`vasc__`	`there's a generic shot callback that calls the primitive specific callback that uses a pointer to the memory region where the primitive data is`
`04:31.42`	`vasc__`	`so the only thing to do is to actually allocate, copy the data to device memory`
`04:35.59`	`vasc__`	`i guess i could commit the scan code i have to trunk. but the thing is nothing will call it until i do the rest of the code`
`04:36.47`	`vasc__`	`anyway not today`
`04:39.34`	`vasc__`	`i also found out that the nvidia opencl compiler doesn't handle large .cl files very well...`
`04:39.56`	`vasc__`	`so i had to split them up and compile them separately and then link them`
`05:04.49`	`brlcad`	`interesting -- any idea on what the limit is/was?`
`05:08.27`	`Stragus`	`That's weird, I have compiled huge .cu (CUDA) files. Very large device functions or just files?`
`05:09.11`	`Stragus`	`And what error or problem were you experiencing?`
`05:15.31`	`Notify`	`BRL-CAD:vasco_costa * 65846 (brlcad/trunk/src/librt/primitives/ehy/ehy_shot.cl brlcad/trunk/src/librt/primitives/ell/ell_shot.cl and 3 others): load large opencl vectors on demand to reduce stack footprint per function call.`
`05:15.50`	`vasc__`	`it just gave me some ptxas error, "function is being called with wrong number of arguments" or something`
`05:16.11`	`vasc__`	`which usually means that the code is calling a function that isn't defined anywhere`
`05:16.32`	`Stragus`	`Output the PTX assembly and inspect it`
`05:16.35`	`vasc__`	`nah`
`05:16.40`	`vasc__`	`it works this way`
`05:16.51`	`vasc__`	`and i know the AMD GPU compiler also creaks on large files so`
`05:17.19`	`Stragus`	`It's probably more an issue of a single huge kernel rather than large files`
`05:17.19`	`vasc__`	`i tried concatenating it all into one file and it didn't work`
`05:17.30`	`vasc__`	`it probably tried inlining everything yes`
`05:17.35`	`vasc__`	`and then it croaked`
`05:17.45`	`Stragus`	`Right. Which shouldn't happen`
`05:19.21`	`Notify`	`BRL-CAD Wiki:Vasco.costa * 9243 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug */`
`05:19.39`	`vasc__`	`yeah. i could have inspected the assembly but...`
`05:19.44`	`vasc__`	`snore`
`05:20.04`	`vasc__`	`it actually makes more sense this way`
`05:20.33`	`vasc__`	`i was just including everything into a huge file`
`05:21.47`	`Stragus`	`Actual function calls are slow on most GPU hardware`
`05:21.54`	`Stragus`	`But yes, not a big issue at the moment`
`05:23.12`	`vasc__`	`i hope i don't have memory alignment issues anymore`
`05:24.06`	`vasc__`	`everything should be aligned in 8 byte boundaries`
`05:26.26`	`vasc__`	`damned huge doubles`
`05:27.22`	`Stragus`	`I don't even see how that could be an issue in the first place`
`05:27.33`	`vasc__`	`ah`
`05:27.36`	`Stragus`	`On CPU as well, you definitely want 8 bytes alignment for your doubles`
`05:27.48`	`Stragus`	`In fact, you should want 32 bytes alignment for bundles of 4 doubles`
`05:28.17`	`vasc__`	`right. i considered that. there's just a teensy little issue with that and AoS`
`05:28.42`	`Stragus`	`On GPU, it should be bundles of 32 doubles`
`05:28.58`	`vasc__`	`ah the triangle ray tracers were so much simpler`
`05:29.01`	`Stragus`	`(Which is obviously also quite fine on CPU)`
`05:29.51`	`Stragus`	`I thought space partitioning traversal would be the tricky part, and it doesn't matter what kind of primitives are there`
`05:30.14`	`vasc__`	`sure`
`05:30.16`	`Stragus`	`Then you just call the intersection for whatever primitive encountered`
`05:30.24`	`vasc__`	`but remember each primitive has a different size`
`05:30.46`	`Stragus`	`Does that make a big difference?`
`05:31.00`	`vasc__`	`i'm just going to allocate a contiguous memory block and stuff all that primitive data in there in serialized form`
`05:31.06`	`Stragus`	`Good call`
`05:31.27`	`Stragus`	`So, make sure all sizeof() are aligned, (sizeof(foo)+0xf)&~0xf`
`05:31.44`	`Stragus`	`Probably better with some kind of macro, eh`
`05:31.47`	`vasc__`	`yeah that was my problem`
`05:31.58`	`vasc__`	`i hope it's magically working now`
`05:32.14`	`vasc__`	`if it isn't i'll use the thing you said`
`05:33.09`	`vasc__`	`so they'll all be multiples of 8 bytes`
`05:33.36`	`Stragus`	`That's 16 byte alignment actually, typed instinctively for SSE`
`05:33.49`	`Stragus`	`<PROTECTED>`
`05:33.57`	`vasc__`	`yeah`
`05:34.08`	`vasc__`	`so its 0x7 then`
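
The round-up trick from the exchange above can be sketched as a small helper. This is only an illustration: `align_up` is a hypothetical name, not something from the BRL-CAD tree, and it assumes the alignment is a power of two (8 for doubles, 16 for SSE-style bundles).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from BRL-CAD): round n up to the next multiple
 * of align. The mask trick only works when align is a power of two. */
static size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* align == 8 gives (n + 0x7) & ~0x7; align == 16 gives (n + 0xf) & ~0xf */
    return (n + (align - 1)) & ~(align - 1);
}
```

With this, `align_up(sizeof(foo), 8)` plays the role of `(sizeof(foo)+0x7)&~0x7`, so every serialized record size comes out as a multiple of 8 bytes.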
`05:36.08`	`vasc__`	`the grid was a bad idea...`
`05:36.19`	`Stragus`	`:(`
`05:36.28`	`vasc__`	`i forgot the primitives can be quite expensive to intersect`
`05:36.34`	`Stragus`	`Yes`
`05:36.41`	`Stragus`	`I didn't think it was a good idea either`
`05:36.50`	`vasc__`	`a bvh would be a lot better`
`05:36.59`	`Stragus`	`Spatial partitioning is good for triangles because intersection is so cheap`
`05:37.06`	`Stragus`	`But these NURBS and stuff are a different beast`
`05:37.35`	`vasc__`	`it would probably take weeks to do a modern bvh builder though`
`05:37.44`	`vasc__`	`a gpu one at least`
`05:38.06`	`Stragus`	`Meh, it can be built on the CPU, then upload the big chunk of memory to the GPU`
`05:38.16`	`Stragus`	`But yes, it's still a massive amount of work`
`05:38.17`	`vasc__`	`yeah that is probably a lot more doable`
`05:38.53`	`Stragus`	`My CUDA raytracer was also building on the CPU. Everything was packed/interleaved into just one big chunk of memory. You could raytrace on the CPU with it, on the GPU, save it to disk, whatever`
`05:41.17`	`Stragus`	`Since everything was packed into a big chunk of memory, you could have per-primitive "extra data" packed within the graph, and so on. That extra data could vary between primitives`
`05:41.22`	`vasc__`	`i actually know quite a lot about gpu bvh builders although i'm grid guy`
`05:41.25`	`Stragus`	`That sounds like a good approach for a CSG raytracer too`
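
A minimal sketch of the "one big chunk of memory" idea discussed above, assuming a hypothetical record layout: a small header (type id and payload size) followed by the primitive's data, padded to 8 bytes. None of these names come from BRL-CAD or from Stragus's raytracer; the real layouts are not shown in the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define REC_ALIGN 8u            /* assumption: 8 bytes is enough for doubles */

struct rec_header {             /* hypothetical record header, 8 bytes */
    uint32_t type;              /* primitive type id */
    uint32_t size;              /* payload size in bytes, before padding */
};

struct packed_buf {             /* one contiguous, growable block */
    unsigned char *data;
    size_t used;
    size_t cap;
};

static size_t round_up(size_t n)
{
    return (n + (REC_ALIGN - 1)) & ~(size_t)(REC_ALIGN - 1);
}

/* Append one record; returns its byte offset, or (size_t)-1 if out of memory. */
static size_t packed_append(struct packed_buf *b, uint32_t type,
                            const void *payload, uint32_t size)
{
    assert(b != NULL && payload != NULL);
    size_t need = sizeof(struct rec_header) + round_up(size);
    if (b->used + need > b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 256;
        while (ncap < b->used + need)
            ncap *= 2;
        unsigned char *p = realloc(b->data, ncap);
        if (!p)
            return (size_t)-1;
        b->data = p;
        b->cap = ncap;
    }
    size_t off = b->used;
    struct rec_header h = { type, size };
    memcpy(b->data + off, &h, sizeof h);
    memcpy(b->data + off + sizeof h, payload, size);
    /* zero the padding so the block is reproducible when saved to disk */
    memset(b->data + off + sizeof h + size, 0, round_up(size) - size);
    b->used += need;
    return off;
}

/* Offset of the record that follows the one at off. */
static size_t packed_next(const struct packed_buf *b, size_t off)
{
    struct rec_header h;
    memcpy(&h, b->data + off, sizeof h);
    return off + sizeof h + round_up(h.size);
}
```

Because records refer to each other by byte offset rather than by pointer, the same block could in principle be copied to device memory in a single transfer, walked on the CPU, or written to disk unchanged.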
`05:41.45`	`Stragus`	`I'm a graph person, I don't like hierarchies :p`
`05:43.00`	`vasc__`	`i'll think if i'll use the grids or not`
`05:43.18`	`vasc__`	`i would like to use that goliath scene as a benchmark of sorts`
`05:43.26`	`vasc__`	`it ain't gonna cut it without some acceleration scheme`
`05:43.31`	`vasc__`	`i think it has like 200 primitives`
`05:43.56`	`vasc__`	`which is kinda low but`
`05:44.01`	`Stragus`	`How much time do you have to implement this?`
`05:44.14`	`vasc__`	`i have the code done. i did it a couple of weeks back`
`05:44.15`	`vasc__`	`oh`
`05:44.20`	`vasc__`	`well until the end of this month`
`05:44.41`	`vasc__`	`that's why i went with the grids to begin with`
`05:44.43`	`Stragus`	`My opinion is that any part of the whole task is better done very well and correctly, or left to someone else`
`05:44.44`	`vasc__`	`its a lot simpler`
`05:44.56`	`Stragus`	`(But my opinion has no weight whatsoever on this)`
`05:45.27`	`Stragus`	`Half-good solutions have to be rewritten anyway`
`05:45.50`	`vasc__`	`i've never believed that a system was ever complete anyway`
`05:46.06`	`vasc__`	`even if i coded the currently best bvh, in a couple of years it could be crap`
`05:46.36`	`Stragus`	`It might then be suboptimal but it won't be crap :p`
`05:47.04`	`vasc__`	`a low resolution grid is probably okay-ish`
`05:47.18`	`vasc__`	`i think my issue is i was using too fine a subdivision`
`05:47.19`	`Stragus`	`I wouldn't personally use a BVH, but this is complex and there's too little time to explore new ideas`
`05:47.34`	`Stragus`	`Sure, it can work`
`05:47.48`	`vasc__`	`well its just that the current code uses mailboxing and crap like that`
`05:47.57`	`vasc__`	`if we used the bvh the mailboxing wouldn't be needed anymore`
`05:48.19`	`vasc__`	`not that i'll use mailboxing with the grid either`
`05:48.25`	`vasc__`	`i'll just multiple-intersect things`
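
For readers unfamiliar with the term, here is a minimal sketch of mailboxing versus simply re-intersecting, assuming a hypothetical grid where a primitive can overlap several cells. The names are illustrative and are not taken from librt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-primitive mailbox: remembers the last ray tested. */
struct mailbox {
    uint32_t last_ray_id;   /* 0 means "never tested"; ray ids start at 1 */
};

/* Count the intersection tests one ray triggers while walking grid cells.
 * cells[i] lists the primitive indices overlapping the i-th visited cell. */
static size_t count_tests(const size_t *const *cells, const size_t *cell_len,
                          size_t ncells, struct mailbox *boxes,
                          uint32_t ray_id, int use_mailbox)
{
    size_t tests = 0;
    assert(boxes != NULL || !use_mailbox);
    for (size_t c = 0; c < ncells; c++) {
        for (size_t k = 0; k < cell_len[c]; k++) {
            size_t prim = cells[c][k];
            if (use_mailbox) {
                if (boxes[prim].last_ray_id == ray_id)
                    continue;           /* already tested for this ray */
                boxes[prim].last_ray_id = ray_id;
            }
            tests++;                    /* the real intersection test goes here */
        }
    }
    return tests;
}
```

The mailbox saves repeated tests at the cost of a write per primitive per ray. That shared mutable state is awkward when many rays are in flight at once, which is one plausible reason to accept the duplicate intersections instead.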
`05:48.28`	`Stragus`	`I agree it requires object partitioning rather than spatial partitioning`
`05:48.34`	`vasc__`	`ar ar`
`05:48.38`	`Stragus`	`It's the whole "hierarchy" thing I disagree with`
`05:48.48`	`vasc__`	`well it is csg after all`
`05:49.03`	`Stragus`	`My raytracer never writes a byte to any shared or global memory during traversal, until the hit callback is called`
`05:49.14`	`Stragus`	`Any kind of hierarchy involves building a stack of some sort, and GPUs hate that`
`05:49.15`	`vasc__`	`kewl`
`05:49.38`	`vasc__`	`yeah. if you use a lot of stack space you reduce the amount of threads you can spawn`
`05:49.53`	`vasc__`	`coz you have limited L1 cache for registers and stack`
`05:50.07`	`Stragus`	`The L1 cache and registers are independent`
`05:50.17`	`Stragus`	`But the stack is stored in global memory and it is SLOW, even with that crappy L1 cache`
`05:50.20`	`vasc__`	`yeah its split`
`05:50.29`	`vasc__`	`global?`
`05:50.44`	`vasc__`	`that's lame`
`05:50.48`	`Stragus`	`No no, the L1 and shared memory shares the same chunk of on-chip "cache"`
`05:51.12`	`Stragus`	`Registers are totally independent, and a whole lot faster`
`05:51.41`	`vasc__`	`i thought you could choose the amount that goes into registers and remaining L1 on driver loading or something`
`05:51.54`	`Stragus`	`You choose how to split between L1 and shared memory`
`05:52.27`	`vasc__`	`ah no its the shared memory yeah`
`05:52.34`	`vasc__`	`uhoh`
`05:52.53`	`Stragus`	`Anyhow, experimenting with novel ideas takes more time than you have`
`05:52.55`	`vasc__`	`so that's why function calls are slow as heck`
`05:52.59`	`Stragus`	`Indeed`
`05:53.03`	`Stragus`	`It's terrible`
`05:53.28`	`*** join/#brlcad milamber (~devlin@2602:306:8094:9360:b941:e8cd:a8d8:db8d)`
`05:56.55`	`vasc__`	`the current code uses a shitton of temporaries`
`05:57.07`	`Stragus`	`GPUs have tons of registers`
`05:57.17`	`Stragus`	`Memory is slow, but registers are free :p`
`05:57.50`	`vasc__`	`yeah but if you use a lot of registers you can't spawn as many threads`
`05:58.26`	`Stragus`	`Can you ask OpenCL about register usage? We can with CUDA`
`05:58.42`	`vasc__`	`yeah CUDA has some compiler flag`
`05:59.02`	`Stragus`	`Hum... I meant a runtime thing on the kernel, but it's true I'm using the low-level driver API`
`05:59.07`	`vasc__`	`you can pass flags to the opencl compiler. i'm not sure if you can use the same flags as CUDA though.`
`05:59.35`	`vasc__`	`nvcc has some compiler flag that says how many registers a kernel uses`
`05:59.44`	`Stragus`	`Well, that works`
`06:00.04`	`vasc__`	`but that's for cuda`
`06:00.17`	`vasc__`	`it's too early to think about that`
`06:06.49`	`vasc__`	`later`
`06:21.47`	`Notify`	`BRL-CAD Wiki:Shaina7837 * 9244 /wiki/User:Shainasabarwal/GSoC15/logs: /* 27 July */`
`06:28.28`	`*** join/#brlcad teepee (~teepee@unaffiliated/teepee)`
`06:59.03`	`*** join/#brlcad kintel (~kintel@unaffiliated/kintel)`
`07:14.53`	`*** join/#brlcad milamber1 (~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165)`
`07:45.50`	`*** join/#brlcad teepee-- (bc5c2134@gateway/web/freenode/ip.188.92.33.52)`
`09:24.40`	`starseeker`	`brlcad: http://www.cmake.org/pipermail/cmake/2011-June/045233.html`
`09:31.03`	`starseeker`	`in fact, they caution in the docs not to list outputs of custom commands in multiple targets: http://www.cmake.org/cmake/help/v3.0/command/add_custom_command.html`
`09:31.39`	`starseeker`	`and I see we are doing just that with the obj-g code`
`09:33.39`	`starseeker`	`and I'm doing it in one of the step directories as well`
`09:33.56`	`starseeker`	`OK, that's probably it then`
`09:34.31`	`starseeker`	`I'll wade into fixing that ASAP`
`11:57.58`	`Notify`	`BRL-CAD:carlmoore * 65847 (brlcad/trunk/AUTHORS brlcad/trunk/src/librt/primitives/arb8/arb8.c and 8 others): remove trailing white space, and fix spelling`
`12:21.42`	`*** join/#brlcad konrado (~konro@41.205.22.13)`
`12:36.12`	`*** join/#brlcad Ch3ck_ (~Ch3ck@154.70.99.98)`
`13:00.02`	`*** join/#brlcad kintel (~kintel@unaffiliated/kintel)`
`13:37.48`	`*** join/#brlcad sofat (~sofat@202.164.45.208)`
`13:51.27`	`sofat`	`brlcad, I need your help with google custom search`
`13:51.38`	`sofat`	`please reply to me if you are free`
`13:53.06`	`*** join/#brlcad sofat_ (~androirc@49.138.113.71)`
`13:59.29`	`sofat`	`starseeker, I have submitted a new patch for the build system. I also solved the problem you told me about: I made a presentation.xsl.in file to auto-generate the presentation.xsl file. Please review this patch, patch no. 401`
`14:54.34`	`Notify`	`BRL-CAD:ejno * 65848 brlcad/trunk/include/bu/opt.h: add parentheses around macro arguments`
`15:07.16`	`*** join/#brlcad sofat (~sofat@202.164.45.208)`
`15:17.34`	`*** join/#brlcad sofat (~sofat@202.164.45.208)`
`15:41.15`	`*** join/#brlcad bhollister2 (~brad@2601:647:cb01:9750:d5ba:1393:eae0:ec4b)`
`15:45.43`	`*** join/#brlcad sofat (~sofat@49.138.113.71)`
`16:03.48`	`*** join/#brlcad sofat (~sofat@101.215.79.175)`
`16:34.50`	`*** join/#brlcad sofat (~sofat@101.215.79.175)`
`16:58.59`	`*** join/#brlcad sofat (~sofat@101.215.79.175)`
`17:23.40`	`*** join/#brlcad sofat (~sofat@202.164.45.208)`
`17:44.08`	`*** join/#brlcad sofat (~sofat@202.164.45.204)`
`17:50.06`	`sofat`	`brlcad, hello`
`17:50.32`	`sofat`	`I would like to discuss something, please reply to me`
`17:56.13`	`archivist`	`methinks someone nags too much`
`18:27.33`	`*** join/#brlcad kintel (~kintel@unaffiliated/kintel)`
`18:29.00`	`*** join/#brlcad vasc (~VASC@bl8-192-46.dsl.telepac.pt)`
`18:33.50`	`*** join/#brlcad milamber (~devlin@104-9-73-54.lightspeed.cicril.sbcglobal.net)`
`18:41.19`	`*** join/#brlcad sofat (~sofat@202.164.45.212)`
`19:02.51`	`Notify`	`BRL-CAD:dhoward * 65849 (brlcad/trunk/include/rt/misc.h brlcad/trunk/src/libged/facetize.c brlcad/trunk/src/librt/screened_poisson.cpp): Added edge sampling to SPR facetization code.`
`19:08.03`	`Notify`	`BRL-CAD Wiki:Deekaysharma * 9245 /wiki/User:Deekaysharma/logs:`
`19:10.48`	`*** join/#brlcad dracarys983 (dracarys98@nat/iiit/x-xnzkponofzzwciso)`
`19:22.49`	`*** join/#brlcad kintel (~kintel@unaffiliated/kintel)`
`20:08.01`	`Notify`	`BRL-CAD:ejno * 65850 (brlcad/trunk/include/bu/opt.h brlcad/trunk/include/gcv/api.h and 13 others): initial integration of libgcv plugin argument processing`
`20:10.22`	`*** join/#brlcad milamber (~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165)`
`20:16.34`	`*** part/#brlcad Ch3ck_ (~Ch3ck@154.70.99.98)`
`20:24.17`	`Notify`	`BRL-CAD:ejno * 65851 (brlcad/trunk/src/conv/gcv/gcv.c brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): correct conversion mode of fastgen4_write`
`20:34.04`	`Notify`	`BRL-CAD:ejno * 65852 brlcad/trunk/src/conv/gcv/gcv.c: correctly set options_data`
`21:47.06`	`*** join/#brlcad kintel (~kintel@unaffiliated/kintel)`
`22:03.11`	`*** join/#brlcad konrado (~konro@41.205.22.53)`
`22:07.21`	`Notify`	`BRL-CAD Wiki:202.164.45.212 * 9246 /wiki/User:Hiteshsofat/GSoc15/log_developmen:`
`23:06.46`	`*** join/#brlcad kintel (~kintel@unaffiliated/kintel)`
`23:22.32`	`*** join/#brlcad vasc_ (~VASC@bl8-192-46.dsl.telepac.pt)`
