| 00:13.13 | *** join/#brlcad scqdzsqsugpfvsyk (~armin@dslc-082-083-184-129.pools.arcor-ip.net) | |
| 00:21.08 | *** join/#brlcad infobot (ibot@rikers.org) | |
| 00:21.08 | *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;) | |
| 01:06.38 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 01:10.25 | *** join/#brlcad DaRock (~Thunderbi@mail.unitedinsong.com.au) | |
| 03:23.46 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 03:40.21 | *** join/#brlcad teepee_ (~teepee@unaffiliated/teepee) | |
| 06:50.13 | *** join/#brlcad inxirwtrrrpwmydy (~armin@dslc-082-083-184-129.pools.arcor-ip.net) | |
| 07:26.15 | *** join/#brlcad Caterpillar (~caterpill@unaffiliated/caterpillar) | |
| 10:43.42 | *** join/#brlcad DaRock (~Thunderbi@mail.unitedinsong.com.au) | |
| 13:08.07 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 13:28.21 | *** join/#brlcad yorik (~yorik@2804:431:f721:94ee:290:f5ff:fedc:3bb2) | |
| 14:04.11 | *** join/#brlcad ``Erik (~erik@pool-100-16-14-17.bltmmd.fios.verizon.net) | |
| 15:33.44 | *** join/#brlcad Caterpillar2 (~caterpill@unaffiliated/caterpillar) | |
| 16:26.18 | *** join/#brlcad d_rossberg (~rossberg@104.225.5.10) | |
| 17:19.19 | *** join/#brlcad KimK (~Kim__@2600:8803:7a81:7400:c1b9:9c23:aaf0:3cf0) | |
| 18:41.23 | Notify | BRL-CAD:Marco-domingues * 10018 /wiki/User:Marco-domingues/GSoC17/Log: 5 June |
| 19:23.56 | *** join/#brlcad yorik (~yorik@2804:431:f721:94ee:290:f5ff:fedc:3bb2) | |
| 19:36.42 | *** join/#brlcad LordOfBikes (~armin@dslc-082-083-184-129.pools.arcor-ip.net) | |
| 19:37.50 | *** join/#brlcad vasc (~vasc@bl13-101-248.dsl.telepac.pt) | |
| 19:38.25 | vasc | pokes at mdtwenty[m] |
| 19:50.44 | vasc | there was a site with IRC logs for this channel somewhere. someone got a link to the logs? |
| 20:07.18 | mdtwenty[m] | Hey.. After our conversation last week, i came up with a new solution to the weave_segs kernel. I tried first to implement it without allocating a fixed-size array to store the segments in each partition, and although the number of partitions per ray was correct, the segments in each partition were not and the code was a bit messy. So I decided to first implement it with a fixed array of segments in each partition (I started with an array of 100 elements because it was sufficient for the example) and the results seemed ok (i.e. the number of partitions and segments in each partition after boolean evaluation seemed correct for the example i am using to test) |
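A minimal sketch of the interim fixed-capacity layout mdtwenty[m] describes, in OpenCL C; the struct and field names (cl_partition, seg_ids, MAX_SEGS_PER_PART) are assumptions for illustration, not the actual BRL-CAD types:

    /* Hypothetical partition record with a fixed-capacity segment list.
     * MAX_SEGS_PER_PART mirrors the "array of 100 elements" above. */
    #define MAX_SEGS_PER_PART 100

    struct cl_partition {
        double inhit_dist;                  /* entry distance along the ray */
        double outhit_dist;                 /* exit distance along the ray */
        uint seg_ids[MAX_SEGS_PER_PART];    /* indices of contributing segments */
        uint nsegs;                         /* number of valid entries in seg_ids */
    };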
| 20:11.25 | vasc | i don't see how that makes things simpler since it was bounded... but do continue. |
| 20:12.09 | vasc | mdtwenty[m] |
| 20:13.58 | starseeker | Notify: irc |
| 20:14.03 | starseeker | hmm |
| 20:14.42 | starseeker | don't remember how that works |
| 20:15.20 | starseeker | vasc: I think you're looking for this? http://infobot.rikers.org/%23brlcad/ |
| 20:15.25 | gcibot_ | [ apt/ibot/infobot/purl logs for 2017 ] |
| 20:15.30 | vasc | yes. that's it! thanks! |
| 20:18.10 | mdtwenty[m] | I tried to make it bounded first so it would be easier to compare the results with a new solution, but if we were to alloc the array with the total number of segments, how could we do that? We only know that number after the count_hits kernel is executed |
| 20:19.26 | vasc | yeah but why would you need to know the size before calling count_hits anyway? |
| 20:19.59 | mdtwenty[m] | and when i tried to alloc the memory for that array before creating the opencl buffer, it took too much time to execute compared with the previous solution |
| 20:20.23 | vasc | eh? |
| 20:21.05 | vasc | in theory you'll only need to allocate buffers in the graphics card memory. |
| 20:22.02 | vasc | opencl buffers. |
| 20:22.52 | vasc | i don't see how allocating a smaller buffer will be slower than allocating a larger buffer, which is what will happen if you have 100 segments per pixel. |
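On the host side, the count-first flow vasc is describing might look roughly like this; the kernel flow and the names (nrays, ctx, queue, struct cl_seg) are assumptions, only the OpenCL API calls themselves are real:

    cl_int err;
    cl_uint *counts = (cl_uint *)malloc(nrays * sizeof(cl_uint));
    cl_mem counts_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                       nrays * sizeof(cl_uint), NULL, &err);
    /* pass 1: enqueue the counting kernel (same code path, stores nothing) */
    /* ... clSetKernelArg()/clEnqueueNDRangeKernel() for count_hits ... */
    clEnqueueReadBuffer(queue, counts_buf, CL_TRUE, 0,
                        nrays * sizeof(cl_uint), counts, 0, NULL, NULL);

    size_t total_segs = 0;
    for (size_t i = 0; i < nrays; i++)
        total_segs += counts[i] / 2;    /* two hits bound one segment */

    /* pass 2: an exactly-sized buffer, no 100-per-pixel overallocation */
    cl_mem segs_buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                     total_segs * sizeof(struct cl_seg),
                                     NULL, &err);
    /* ... re-run the same kernel with the "store results" branch enabled ... */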
| 20:27.27 | vasc | you're talking about this? |
| 20:27.28 | vasc | [pastes a host-side snippet; the logger redacted the indented lines as <PROTECTED>] |
| 20:27.36 | vasc | BU_ASSERT((counts[i-1] % 2) == 0); |
| 20:27.38 | vasc | h[i] = h[i-1] + counts[i-1]/2; /* number of segs is half the number of hits */ |
| 20:27.45 | vasc | that code is only there because we don't have opencl prefix sums implemented |
| 20:27.51 | Stragus | When the buffer runs out, you can return failure, realloc, then just try again with a bigger buffer and/or fewer rays? |
| 20:27.53 | vasc | it should be done all in the opencl side eventually. |
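Reconstructed from the two lines of the paste that survived redaction, the host-side stand-in is presumably an exclusive prefix sum over the per-ray hit counts; the surrounding loop here is an assumption:

    /* h[i] = offset of ray i's first segment in the output buffer */
    h[0] = 0;
    for (i = 1; i <= nrays; i++) {
        BU_ASSERT((counts[i-1] % 2) == 0);  /* hits come in entry/exit pairs */
        h[i] = h[i-1] + counts[i-1]/2;      /* number of segs is half the number of hits */
    }
    /* h[nrays] is then the total segment count across all rays */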
| 20:28.04 | vasc | you won't need to realloc |
| 20:28.11 | vasc | man |
| 20:28.55 | vasc | IIRC the max amount of partitions is 2x the amount of segments right? |
| 20:29.07 | vasc | so you just allocate that as the maximum buffer size. |
| 20:29.36 | vasc | and then you dynamically grow the virtual buffer, sure, but it will never go past the maximum buffer size. |
| 20:31.06 | Stragus | I seriously lack context here, but the maximum buffer size for buffering all hits through a complex scene can be astronomical. It's very practical to "return failure and try again"; it almost never happens in practice |
| 20:31.21 | vasc | no it's not. |
| 20:32.00 | Stragus | Counting hits first also isn't reliable, since optimization of different kernels will produce slightly different results... besides the whole problem of tracing rays twice |
| 20:32.24 | vasc | allocating memory is much slower than counting hits. |
| 20:32.40 | Stragus | You allocate once and reuse the same buffer over and over |
| 20:32.46 | vasc | hm |
| 20:32.57 | vasc | sure that would work. |
| 20:33.04 | vasc | but why bother. |
| 20:33.12 | Stragus | It's the most efficient solution? |
| 20:33.22 | Stragus | has done exactly that in another ray tracer... |
| 20:33.24 | vasc | i would be happy with something that actually works first. |
| 20:33.41 | Stragus | Counting hits isn't reliable |
| 20:33.49 | vasc | why isn't it reliable? |
| 20:34.25 | Stragus | Because the kernel to count hits and the kernel to record hits are different. They use the same function, but they will all be inlined by the compiler and optimized in different ways |
| 20:35.02 | vasc | so you're saying the opencl device won't produce the same results if you run the same code twice? that it isn't deterministic? |
| 20:35.21 | Stragus | It's not the same code, it's two different kernels: counting hits and recording hits |
| 20:35.33 | Stragus | Unless you actually have one kernel that does both with a branch. Less efficient though |
| 20:35.47 | vasc | it's the exact same code. except one stores the results and the other one doesn't. |
| 20:36.27 | Stragus | You do realize that everything OpenCL/CUDA is preferably inlined in the calling device function as one big fat function? |
| 20:36.41 | Stragus | And when you inline and optimize floating point math, results differ slightly |
| 20:36.50 | vasc | i don't see how that makes any difference in the code results. unless the compiler has a bug. |
| 20:36.56 | Stragus | (a+b)+c != a+(b+c) |
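A self-contained demonstration of that point, in plain C:

    #include <stdio.h>

    int main(void) {
        float a = 1e8f, b = -1e8f, c = 1.0f;
        /* 1.0f is below the rounding granularity of 1e8f, so the two
         * groupings round differently: this prints "1 vs 0" */
        printf("%g vs %g\n", (a + b) + c, a + (b + c));
        return 0;
    }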
| 20:37.05 | vasc | especially because they both call the same exact functions. |
| 20:37.21 | vasc | man, i have it running and it works(tm) |
| 20:37.37 | vasc | i count the hits, alloc a buffer for the hits, and then store them |
| 20:37.40 | vasc | it's in SVN |
| 20:37.42 | Stragus | Okay, but it's not reliable if you enable optimization |
| 20:37.50 | vasc | why shouldn't it be? |
| 20:38.24 | Stragus | _If_ you use two separate kernels, it's all inlined and optimized separately, with slightly different results |
| 20:38.33 | Stragus | If you use a branch in the same kernel, it's fine, but slower |
| 20:38.37 | vasc | no man. because it's THE SAME KERNEL |
| 20:38.49 | vasc | the only difference is a branch which either stores the result or not. |
| 20:39.12 | vasc | :-) |
| 20:39.33 | Stragus | Okay, and you end up tracing rays twice |
| 20:40.10 | vasc | sure. |
| 20:40.17 | vasc | which still beats re-allocating memory. |
| 20:40.25 | Stragus | You don't reallocate! |
| 20:40.49 | vasc | sure. then you replace this simple-to-solve problem with a more complex problem. |
| 20:40.50 | Stragus | When I did something similar/identical, I had a batch of 32 buffers to store results (one buffer per thread/lane), the offsets were incremented by atomics, and there was a special flag to denote "I ran out of memory" |
| 20:41.02 | vasc | so what size of buffer will you allocate that can use all the gpu compute units? |
| 20:41.24 | vasc | my solution doesn't require atomics either. |
| 20:41.26 | Stragus | If that flag was ever set, you would reallocate _or_ trace less rays. And you did that maybe once, if the heuristics were off for the scene's complexity |
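A sketch of the "overflow flag" scheme Stragus describes, written in OpenCL C for consistency with the rest of the code; every name here (record_hits, struct cl_hit, the argument list) is hypothetical:

    __kernel void record_hits(__global struct cl_hit *hits,
                              __global uint *hit_count,  /* global append cursor */
                              __global int *overflow,    /* "ran out of memory" flag */
                              const uint capacity)
    {
        struct cl_hit h = {0};
        /* ... trace this work-item's ray; for each intersection, fill h ... */
        uint idx = atomic_inc(hit_count);   /* reserve one output slot */
        if (idx < capacity)
            hits[idx] = h;                  /* normal append */
        else
            *overflow = 1;                  /* host grows the buffer and re-runs */
    }

After the launch, the host reads the flag back; if it was set, the buffer is grown (or the ray batch shrunk) and the launch repeated, which per the discussion above happens at most about once per scene.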
| 20:41.29 | vasc | it's lockless. |
| 20:42.11 | Stragus | Wait actually, it wasn't one buffer per thread/lane, I was doing one atomic for the whole warp after counting how much memory all of it required |
| 20:42.21 | Stragus | The hits being buffered in on-chip shared memory |
| 20:42.55 | vasc | atomics don't work on shared memory. they work on global memory. |
| 20:43.00 | vasc | at least in opencl it's like that. |
| 20:43.34 | vasc | plus if you're doing inter-warp computation you don't need atomics. |
| 20:43.59 | Stragus | You have both in CUDA... but the atomics were for the global buffer, shared memory was only for accumulating many hits before flushing to global (with one atomic operation to allocate and flush the results of all threads of the warp) |
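An OpenCL C approximation of that aggregated scheme (CUDA warp mapped to an OpenCL work-group); the names are again hypothetical, and the staging is simplified to exactly one hit per work-item so the barriers are reached uniformly:

    __kernel void record_hits_grouped(__global struct cl_hit *out,
                                      __global uint *out_count)
    {
        __local uint local_n;   /* hits produced by this work-group */
        __local uint base;      /* group's reserved offset in the global buffer */

        if (get_local_id(0) == 0)
            local_n = 0;
        barrier(CLK_LOCAL_MEM_FENCE);

        struct cl_hit h = {0};
        /* ... trace; suppose this work-item produced one hit in h ... */
        uint my_off = atomic_inc(&local_n);   /* cheap local-memory atomic */
        barrier(CLK_LOCAL_MEM_FENCE);

        if (get_local_id(0) == 0)             /* one global atomic per group */
            base = atomic_add(out_count, local_n);
        barrier(CLK_LOCAL_MEM_FENCE);

        out[base + my_off] = h;               /* flush to global memory */
    }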
| 20:44.06 | vasc | ah ok. |
| 20:44.22 | vasc | you're still replacing a simple problem with a more complex problem. |
| 20:44.38 | Stragus | It's a little complex, but it's much faster than tracing twice |
| 20:45.08 | vasc | meh. there are way worse inefficiencies in the code right now. |
| 20:45.26 | Stragus | Okay then. :) Perhaps keep all this in mind for a future iteration |
| 20:45.37 | vasc | sure. |
| 20:47.39 | vasc | anyway mdtwenty[m], the problem with the change you made is that you can't assume that buffer will be large enough to hold the results. |
| 20:48.17 | mdtwenty[m] | yes i'm aware |
| 20:49.00 | mdtwenty[m] | and perhaps i did something wrong when i tried to allocate the buffer for the array of segments in each partition |
| 20:49.25 | vasc | at one point we actually computed that prefix sum with opencl and did no memory transfers in that code, but the thing is the prefix sum code i was using was under the Apache License, so it needed to be ripped out. |
| 20:49.54 | vasc | we'll need to either find an MIT-licensed implementation or reimplement it eventually. |
| 20:50.07 | vasc | but for now it doesn't matter. |
| 20:50.15 | vasc | well |
| 20:50.23 | vasc | i wouldn't be surprised if there was a bug there. |
| 20:50.53 | vasc | it's okay to try things with a simpler piece of code for now, but eventually you need something that works properly. |
| 20:51.58 | mdtwenty[m] | yes, the idea of implementing this simpler code first was to have a base to compare the results of future solutions against |
| 20:52.44 | vasc | you could just make a mockup that returns white when there are intersections and black when there are none. |
| 20:52.52 | vasc | that would make debugging your results easier. |
| 20:53.47 | vasc | i.e. a replacement for clt_shade_segs_kernel |
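A minimal sketch of such a stand-in; the real clt_shade_segs signature isn't shown in the log, so this one (bw_shade, rgb, npartitions) is invented:

    __kernel void bw_shade(__global uchar *rgb,              /* 3 bytes per pixel */
                           __global const uint *npartitions) /* per-ray partition count */
    {
        const size_t id = get_global_id(0);
        const uchar c = (npartitions[id] > 0) ? 255 : 0;  /* white = hit, black = miss */
        rgb[3*id + 0] = c;
        rgb[3*id + 1] = c;
        rgb[3*id + 2] = c;
    }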
| 20:54.34 | mdtwenty[m] | yes thanks i will do that |
| 20:55.46 | vasc | eventually you'll need to get the material right as well. |
| 20:56.20 | vasc | where was that in the ANSI C code... |
| 21:03.42 | vasc | right |
| 21:03.58 | vasc | https://svn.code.sf.net/p/brlcad/code/brlcad/trunk/src/rt/view.c |
| 21:04.00 | vasc | colorview() |
| 21:05.32 | vasc | the opencl rt is basically a mix of ANSI C librt, liboptical and rt code... |
| 21:05.46 | vasc | with a lot of things deleted. |
| 21:06.11 | vasc | to make it something a human being can understand. |
| 21:06.45 | vasc | of course most of those features will probably need to be added back eventually. |
| 21:14.01 | mdtwenty[m] | sure :) |
| 21:14.55 | mdtwenty[m] | i will implement your suggestion of shading the intersections with a different color to see if there is still a problem with the weave_segs kernel |
| 21:15.37 | mdtwenty[m] | and will also work on replacing the bounded array |
| 21:19.09 | vasc | just make sure to keep backups |
| 21:19.43 | vasc | once you get a black/white shader working then message me. |
| 21:23.17 | vasc | i also started with the black & white shader before i got the shading working properly: |
| 21:23.19 | vasc | https://brlcad.org/w/images/thumb/8/87/Cl_havoc.png/512px-Cl_havoc.png |
| 21:25.23 | mdtwenty[m] | yes it is definitely a good idea |
| 21:26.07 | mdtwenty[m] | i will implement that and will give you a heads up when it's done :) |
| 21:38.02 | vasc | okay then |
| 21:38.32 | vasc | i notice you've been updating your blog, so keep at it |
| 21:43.38 | mdtwenty[m] | yes i will keep posting my daily progress on the blog! |
| 23:20.34 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 23:40.31 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |