| 00:31.39 | Notify | BRL-CAD Wiki:Bhollister * 9240 /wiki/User:Bhollister/DevLogAug2015: |
| 00:32.22 | Notify | BRL-CAD:vasco_costa * 65844 (brlcad/trunk/src/librt/librt_private.h brlcad/trunk/src/librt/primitives/arb8/arb8.c and 12 others): pass struct with primitive data to opencl as an initial step to an AoS device primitive database. move constants into common.cl. |
| 01:10.43 | *** join/#brlcad vasc__ (~vasc@bl8-192-46.dsl.telepac.pt) | |
| 02:56.19 | *** join/#brlcad sofat (~androirc@101.214.213.146) | |
| 03:17.59 | *** join/#brlcad gurwinder (~chatzilla@117.212.50.212) | |
| 03:24.02 | starseeker | sofat? |
| 03:24.05 | starseeker | nuts |
| 04:06.25 | gurwinder | brlcad: Hi I have exported ehy and epa now moving towards rhc rpc and bot. |
| 04:07.26 | Notify | BRL-CAD:vasco_costa * 65845 (brlcad/trunk/src/librt/librt_private.h brlcad/trunk/src/librt/primitives/arb8/arb8.c and 14 others): generic opencl solid shot handler. refactored code to remove duplicates. |
| 04:08.34 | vasc__ | that's that. i think i did all i could on trunk without changing the apis. |
| 04:08.52 | vasc__ | i think i'll continue on the branch |
| 04:11.52 | Notify | BRL-CAD Wiki:Vasco.costa * 9241 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug */ |
| 04:15.41 | brlcad | vasc__: looks pretty good |
| 04:16.31 | brlcad | vasc__: please also assign your patches to yourself, mark them as accepted, and close them out too (denote the commit revision in a comment) as you commit them |
| 04:24.35 | vasc__ | i think i did that to all the patches i had on the tracker |
| 04:25.37 | vasc__ | that i committed |
| 04:25.50 | vasc__ | i guess i can assign to myself the patches i didn't commit as well |
| 04:27.31 | vasc__ | so basically the thing to do next is to store the scene database on the gpu |
| 04:29.37 | vasc__ | i'm going to redo the database code. |
| 04:29.54 | Notify | BRL-CAD Wiki:Vasco.costa * 9242 /wiki/User:Vasco.costa/GSoC15/logs: /* Development Status */ |
| 04:29.55 | vasc__ | the patches i committed to trunk already did like half the work on that |
| 04:31.05 | vasc__ | there's a generic shot callback that calls the primitive specific callback that uses a pointer to the memory region where the primitive data is |
| 04:31.42 | vasc__ | so the only thing to do is to actually allocate, copy the data to device memory |
| 04:35.59 | vasc__ | i guess i could commit the scan code i have to trunk. but the thing is nothing will call it until i do the rest of the code |
| 04:36.47 | vasc__ | anyway not today |
| 04:39.34 | vasc__ | i also found out that the nvidia opencl compiler doesn't handle large .cl files very well... |
| 04:39.56 | vasc__ | so i had to split them up and compile them separately and then link them |
| 05:04.49 | brlcad | interesting -- any idea on what the limit is/was? |
| 05:08.27 | Stragus | That's weird, I have compiled huge .cu (CUDA) files. Very large device functions or just files? |
| 05:09.11 | Stragus | And what error or problem were you experiencing? |
| 05:15.31 | Notify | BRL-CAD:vasco_costa * 65846 (brlcad/trunk/src/librt/primitives/ehy/ehy_shot.cl brlcad/trunk/src/librt/primitives/ell/ell_shot.cl and 3 others): load large opencl vectors on demand to reduce stack footprint per function call. |
| 05:15.50 | vasc__ | it just gave me some ptxas function is being called with wrong number of arguments or something |
| 05:16.11 | vasc__ | which usually means that the code is calling a function that isn't defined anywhere |
| 05:16.32 | Stragus | Output the PTX assembly and inspect it |
| 05:16.35 | vasc__ | nah |
| 05:16.40 | vasc__ | it works this way |
| 05:16.51 | vasc__ | and i know the AMD GPU compiler also creaks on large files so |
| 05:17.19 | Stragus | It's probably more an issue of a single huge kernel rather than large files |
| 05:17.19 | vasc__ | i tried concatenating it all into one file and it didn't work |
| 05:17.30 | vasc__ | it probably tried inlining everything yes |
| 05:17.35 | vasc__ | and then it croaked |
| 05:17.45 | Stragus | Right. Which shouldn't happen |
| 05:19.21 | Notify | BRL-CAD Wiki:Vasco.costa * 9243 /wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug */ |
| 05:19.39 | vasc__ | yeah. i could have inspected the assembly but... |
| 05:19.44 | vasc__ | *snore* |
| 05:20.04 | vasc__ | it actually makes more sense this way |
| 05:20.33 | vasc__ | i was just including everything into a huge file |
| 05:21.47 | Stragus | Actual function calls are slow on most GPU hardware |
| 05:21.54 | Stragus | But yes, not a big issue at the moment |
| 05:23.12 | vasc__ | i hope i don't have memory alignment issues anymore |
| 05:24.06 | vasc__ | everything should be aligned in 8 byte boundaries |
| 05:26.26 | vasc__ | damned huge doubles |
| 05:27.22 | Stragus | I don't even see how that could be an issue in the first place |
| 05:27.33 | vasc__ | ah |
| 05:27.36 | Stragus | On CPU as well, you definitely want 8 bytes alignment for your doubles |
| 05:27.48 | Stragus | In fact, you should want 32 bytes alignment for bundles of 4 doubles |
| 05:28.17 | vasc__ | right. i considered that. there's just a teensy little issue with that and AoS |
| 05:28.42 | Stragus | On GPU, it should be bundles of 32 doubles |
| 05:28.58 | vasc__ | ah the triangle ray tracers were so much simpler |
| 05:29.01 | Stragus | (Which is obviously also quite fine on CPU) |
| 05:29.51 | Stragus | I thought space partitioning traversal would be the tricky part, and it doesn't matter what kind of primitives are there |
| 05:30.14 | vasc__ | sure |
| 05:30.16 | Stragus | Then you just call the intersection for whatever primitive encountered |
| 05:30.24 | vasc__ | but remember each primitive has a different size |
| 05:30.46 | Stragus | Does that make a big difference? |
| 05:31.00 | vasc__ | i'm just going to allocate a contiguous memory block and stuff all that primitive data in there in serialized form |
| 05:31.06 | Stragus | Good call |
| 05:31.27 | Stragus | So, make sure all sizeof() are aligned, (sizeof(foo)+0xf)&~0xf |
| 05:31.44 | Stragus | Probably better with some kind of macro, eh |
| 05:31.47 | vasc__ | yeah that was my problem |
| 05:31.58 | vasc__ | i hope it's magically working now |
| 05:32.14 | vasc__ | if it isn't i'll use the thing you said |
| 05:33.09 | vasc__ | so they'll all be multiples of 8 bytes |
| 05:33.36 | Stragus | That's 16 byte alignment actually, typed instinctively for SSE |
| 05:33.49 | Stragus | <PROTECTED> |
| 05:33.57 | vasc__ | yeah |
| 05:34.08 | vasc__ | so its 0x7 then |
| 05:36.08 | vasc__ | the grid was a bad idea... |
| 05:36.19 | Stragus | :( |
| 05:36.28 | vasc__ | i forgot the primitives can be quite expensive to intersect |
| 05:36.34 | Stragus | Yes |
| 05:36.41 | Stragus | I didn't think it was a good idea either |
| 05:36.50 | vasc__ | a bvh would be a lot better |
| 05:36.59 | Stragus | Spatial partitioning is good for triangles because intersection is so cheap |
| 05:37.06 | Stragus | But these NURBS and stuff are a different beast |
| 05:37.35 | vasc__ | it would probably take weeks to do a modern bvh builder though |
| 05:37.44 | vasc__ | a gpu one at least |
| 05:38.06 | Stragus | Meh, it can be built on the CPU, then upload the big chunk of memory to the GPU |
| 05:38.16 | Stragus | But yes, it's still a massive amount of work |
| 05:38.17 | vasc__ | yeah that is probably a lot more doable |
| 05:38.53 | Stragus | My CUDA raytracer was also building on the CPU. Everything was packed/interleaved into just one big chunk of memory. You could raytrace on the CPU with it, on the GPU, save it to disk, whatever |
| 05:41.17 | Stragus | Since everything was packed into a big chunk of memory, you could have per-primitive "extra data" packed within the graph, and so on. That extra data could vary between primitives |
| 05:41.22 | vasc__ | i actually know quite a lot about gpu bvh builders although i'm grid guy |
| 05:41.25 | Stragus | That sounds like a good approach for a CSG raytracer too |
| 05:41.45 | Stragus | I'm a graph person, I don't like hierarchies :p |
| 05:43.00 | vasc__ | i'll think if i'll use the grids or not |
| 05:43.18 | vasc__ | i would like to use that goliath scene as a benchmark of sorts |
| 05:43.26 | vasc__ | it ain't gonna cut it without some acceleration scheme |
| 05:43.31 | vasc__ | i think it has like 200 primitives |
| 05:43.56 | vasc__ | which is kinda low but |
| 05:44.01 | Stragus | How much time do you have to implement this? |
| 05:44.14 | vasc__ | i have the code done. i did it a couple of weeks back |
| 05:44.15 | vasc__ | oh |
| 05:44.20 | vasc__ | well until the end of this month |
| 05:44.41 | vasc__ | that's why i went with the grids to begin with |
| 05:44.43 | Stragus | My opinion is that any part of the whole task is better done very well and correctly, or left to someone else |
| 05:44.44 | vasc__ | its a lot simpler |
| 05:44.56 | Stragus | (But my opinion has no weight whatsoever on this) |
| 05:45.27 | Stragus | Half-good solutions have to be rewritten anyway |
| 05:45.50 | vasc__ | i've never believed that a system was ever complete anyway |
| 05:46.06 | vasc__ | even if i coded the currently best bvh in a couple of years it could be crap |
| 05:46.36 | Stragus | It might then be suboptimal but it won't be crap :p |
| 05:47.04 | vasc__ | a low resolution grid is probably okay-ish |
| 05:47.18 | vasc__ | i think my issue is i was using too fine subdivision |
| 05:47.19 | Stragus | I wouldn't personally use a BVH, but this is complex and there's too little time to explore new ideas |
| 05:47.34 | Stragus | Sure, it can work |
| 05:47.48 | vasc__ | well its just that the current code uses mailboxing and crap like that |
| 05:47.57 | vasc__ | if we used the bvh the mailboxing wouldn't be needed anymore |
| 05:48.19 | vasc__ | not that i'll use mailboxing with the grid either |
| 05:48.25 | vasc__ | i'll just multiple-intersect things |
| 05:48.28 | Stragus | I agree it requires object partitioning rather than spatial partitioning |
| 05:48.34 | vasc__ | ar ar |
| 05:48.38 | Stragus | It's the whole "hierarchy" thing I disagree with |
| 05:48.48 | vasc__ | well it is csg after all |
| 05:49.03 | Stragus | My raytracer never writes a byte to any shared or global memory during traversal, until the hit callback is called |
| 05:49.14 | Stragus | Any kind of hierarchy involves building a stack of some sort, and GPUs hate that |
| 05:49.15 | vasc__ | kewl |
| 05:49.38 | vasc__ | yeah. if you use a lot of stack space you reduce the amount of threads you can spawn |
| 05:49.53 | vasc__ | coz you have limited L1 cache for registers and stack |
| 05:50.07 | Stragus | The L1 cache and registers are independent |
| 05:50.17 | Stragus | But the stack is stored in global memory and it is SLOW, even with that crappy L1 cache |
| 05:50.20 | vasc__ | yeah its split |
| 05:50.29 | vasc__ | global? |
| 05:50.44 | vasc__ | that's lame |
| 05:50.48 | Stragus | No no, the L1 and shared memory shares the same chunk of on-chip "cache" |
| 05:51.12 | Stragus | Registers are totally independent, and a whole lot faster |
| 05:51.41 | vasc__ | i thought you could choose the amount that goes into registers and remaining L1 on driver loading or something |
| 05:51.54 | Stragus | You choose how to split between L1 and shared memory |
| 05:52.27 | vasc__ | ah no its the shared memory yeah |
| 05:52.34 | vasc__ | uhoh |
| 05:52.53 | Stragus | Anyhow, experimenting with novel ideas takes more time than you have |
| 05:52.55 | vasc__ | so that's why function calls are slow as heck |
| 05:52.59 | Stragus | Indeed |
| 05:53.03 | Stragus | It's terrible |
| 05:53.28 | *** join/#brlcad milamber (~devlin@2602:306:8094:9360:b941:e8cd:a8d8:db8d) | |
| 05:56.55 | vasc__ | the current code uses a shitton of temporaries |
| 05:57.07 | Stragus | GPUs have tons of registers |
| 05:57.17 | Stragus | Memory is slow, but registers are free :p |
| 05:57.50 | vasc__ | yeah but if you use a lot of registers you can't spawn as many threads |
| 05:58.26 | Stragus | Can you ask OpenCL about register usage? We can with CUDA |
| 05:58.42 | vasc__ | yeah CUDA has some compiler flag |
| 05:59.02 | Stragus | Hum... I meant a runtime thing on the kernel, but it's true I'm using the low-level driver API |
| 05:59.07 | vasc__ | you can pass flags to the opencl compiler. i'm not sure if you can use the same flags as CUDA though. |
| 05:59.35 | vasc__ | nvcc has some compiler flag that says how many registers a kernel uses |
| 05:59.44 | Stragus | Well, that works |
| 06:00.04 | vasc__ | but that's for cuda |
| 06:00.17 | vasc__ | it's too early to think about that |
| 06:06.49 | vasc__ | later |
| 06:21.47 | Notify | BRL-CAD Wiki:Shaina7837 * 9244 /wiki/User:Shainasabarwal/GSoC15/logs: /* 27 July */ |
| 06:28.28 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 06:59.03 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 07:14.53 | *** join/#brlcad milamber1 (~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165) | |
| 07:45.50 | *** join/#brlcad teepee-- (bc5c2134@gateway/web/freenode/ip.188.92.33.52) | |
| 09:24.40 | starseeker | brlcad: http://www.cmake.org/pipermail/cmake/2011-June/045233.html |
| 09:31.03 | starseeker | in fact, they caution in the docs not to list outputs of custom commands in multiple targets: http://www.cmake.org/cmake/help/v3.0/command/add_custom_command.html |
| 09:31.39 | starseeker | and I see we are doing just that with the obj-g code |
| 09:33.39 | starseeker | and I'm doing it in one of the step directories as well |
| 09:33.56 | starseeker | OK, that's probably it then |
| 09:34.31 | starseeker | I'll wade into fixing that ASAP |
| 11:57.58 | Notify | BRL-CAD:carlmoore * 65847 (brlcad/trunk/AUTHORS brlcad/trunk/src/librt/primitives/arb8/arb8.c and 8 others): remove trailing white space, and fix spelling |
| 12:21.42 | *** join/#brlcad konrado (~konro@41.205.22.13) | |
| 12:36.12 | *** join/#brlcad Ch3ck_ (~Ch3ck@154.70.99.98) | |
| 13:00.02 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 13:37.48 | *** join/#brlcad sofat (~sofat@202.164.45.208) | |
| 13:51.27 | sofat | brlcad, I need your help in google custom search |
| 13:51.38 | sofat | please reply to me if you are free |
| 13:53.06 | *** join/#brlcad sofat_ (~androirc@49.138.113.71) | |
| 13:59.29 | sofat | starseeker, I have submitted the new patch for the build system. I also solved the problem you told me about. I have made a presentation.xsl.in file to auto-generate the presentation.xsl file, so please review this patch. patch no: 401 |
| 14:54.34 | Notify | BRL-CAD:ejno * 65848 brlcad/trunk/include/bu/opt.h: add parentheses around macro arguments |
| 15:07.16 | *** join/#brlcad sofat (~sofat@202.164.45.208) | |
| 15:17.34 | *** join/#brlcad sofat (~sofat@202.164.45.208) | |
| 15:41.15 | *** join/#brlcad bhollister2 (~brad@2601:647:cb01:9750:d5ba:1393:eae0:ec4b) | |
| 15:45.43 | *** join/#brlcad sofat (~sofat@49.138.113.71) | |
| 16:03.48 | *** join/#brlcad sofat (~sofat@101.215.79.175) | |
| 16:34.50 | *** join/#brlcad sofat (~sofat@101.215.79.175) | |
| 16:58.59 | *** join/#brlcad sofat (~sofat@101.215.79.175) | |
| 17:23.40 | *** join/#brlcad sofat (~sofat@202.164.45.208) | |
| 17:44.08 | *** join/#brlcad sofat (~sofat@202.164.45.204) | |
| 17:50.06 | sofat | brlcad, hello |
| 17:50.32 | sofat | I want some discussion please reply me |
| 17:56.13 | archivist | methinks someone nags too much |
| 18:27.33 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 18:29.00 | *** join/#brlcad vasc (~VASC@bl8-192-46.dsl.telepac.pt) | |
| 18:33.50 | *** join/#brlcad milamber (~devlin@104-9-73-54.lightspeed.cicril.sbcglobal.net) | |
| 18:41.19 | *** join/#brlcad sofat (~sofat@202.164.45.212) | |
| 19:02.51 | Notify | BRL-CAD:dhoward * 65849 (brlcad/trunk/include/rt/misc.h brlcad/trunk/src/libged/facetize.c brlcad/trunk/src/librt/screened_poisson.cpp): Added edge sampling to SPR facetization code. |
| 19:08.03 | Notify | BRL-CAD Wiki:Deekaysharma * 9245 /wiki/User:Deekaysharma/logs: |
| 19:10.48 | *** join/#brlcad dracarys983 (dracarys98@nat/iiit/x-xnzkponofzzwciso) | |
| 19:22.49 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 20:08.01 | Notify | BRL-CAD:ejno * 65850 (brlcad/trunk/include/bu/opt.h brlcad/trunk/include/gcv/api.h and 13 others): initial integration of libgcv plugin argument processing |
| 20:10.22 | *** join/#brlcad milamber (~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165) | |
| 20:16.34 | *** part/#brlcad Ch3ck_ (~Ch3ck@154.70.99.98) | |
| 20:24.17 | Notify | BRL-CAD:ejno * 65851 (brlcad/trunk/src/conv/gcv/gcv.c brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): correct conversion mode of fastgen4_write |
| 20:34.04 | Notify | BRL-CAD:ejno * 65852 brlcad/trunk/src/conv/gcv/gcv.c: correctly set options_data |
| 21:47.06 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 22:03.11 | *** join/#brlcad konrado (~konro@41.205.22.53) | |
| 22:07.21 | Notify | BRL-CAD Wiki:202.164.45.212 * 9246 /wiki/User:Hiteshsofat/GSoc15/log_developmen: |
| 23:06.46 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 23:22.32 | *** join/#brlcad vasc_ (~VASC@bl8-192-46.dsl.telepac.pt) | |
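
Note on the 05:31 to 05:34 exchange: vasc__ plans to serialize primitives of different sizes into one contiguous memory block, and Stragus suggests rounding each size up with `(sizeof(foo)+0xf)&~0xf` (16-byte alignment), or `0x7` for 8-byte boundaries. A minimal C sketch of that idea follows. The names `ALIGN_UP`, `PRIM_ALIGN`, `pack_prim`, `demo_sph` and `demo_vec3f` are hypothetical and are not taken from BRL-CAD; this only illustrates the rounding trick, not the code that was actually committed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of a; a must be a power of two.
 * With a == 16 this is (n + 0xf) & ~0xf, with a == 8 it is (n + 0x7) & ~0x7. */
#define ALIGN_UP(n, a) (((size_t)(n) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Assumed alignment: 8 bytes, enough for doubles. */
#define PRIM_ALIGN 8

/* Hypothetical primitive records of different sizes (not BRL-CAD structs). */
struct demo_sph   { double center[3]; double radius; }; /* 32 bytes */
struct demo_vec3f { float v[3]; };                      /* 12 bytes */

/* Copy one record into buf at *offset and pad it to PRIM_ALIGN.
 * Returns the record's start offset, or (size_t)-1 if it does not fit.
 * On success *offset advances by the padded size, so it stays aligned. */
static size_t pack_prim(unsigned char *buf, size_t cap, size_t *offset,
                        const void *prim, size_t size)
{
    size_t start = *offset;
    size_t padded = ALIGN_UP(size, PRIM_ALIGN);

    assert((PRIM_ALIGN & (PRIM_ALIGN - 1)) == 0); /* power of two */
    if (padded > cap - start)
        return (size_t)-1;

    memcpy(buf + start, prim, size);
    memset(buf + start + size, 0, padded - size); /* zero the padding */
    *offset = start + padded;
    return start;
}
```

Each start offset is a multiple of `PRIM_ALIGN` as long as the buffer itself begins on such a boundary, so a double stored at an aligned position inside a record stays aligned after the block is copied elsewhere. Setting `PRIM_ALIGN` to 16 gives the SSE-style variant Stragus typed first.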