IRC log for #brlcad on 20170605

00:13.13 *** join/#brlcad scqdzsqsugpfvsyk (~armin@dslc-082-083-184-129.pools.arcor-ip.net)
00:21.08 *** join/#brlcad infobot (ibot@rikers.org)
00:21.08 *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;)
01:06.38 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
01:10.25 *** join/#brlcad DaRock (~Thunderbi@mail.unitedinsong.com.au)
03:23.46 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:24.36 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:25.26 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:26.11 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:27.01 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:27.51 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:28.36 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:29.26 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:30.11 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
03:40.21 *** join/#brlcad teepee_ (~teepee@unaffiliated/teepee)
06:50.13 *** join/#brlcad inxirwtrrrpwmydy (~armin@dslc-082-083-184-129.pools.arcor-ip.net)
07:26.15 *** join/#brlcad Caterpillar (~caterpill@unaffiliated/caterpillar)
10:43.42 *** join/#brlcad DaRock (~Thunderbi@mail.unitedinsong.com.au)
13:08.07 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
13:28.21 *** join/#brlcad yorik (~yorik@2804:431:f721:94ee:290:f5ff:fedc:3bb2)
14:04.11 *** join/#brlcad ``Erik (~erik@pool-100-16-14-17.bltmmd.fios.verizon.net)
15:33.44 *** join/#brlcad Caterpillar2 (~caterpill@unaffiliated/caterpillar)
16:26.18 *** join/#brlcad d_rossberg (~rossberg@104.225.5.10)
17:19.19 *** join/#brlcad KimK (~Kim__@2600:8803:7a81:7400:c1b9:9c23:aaf0:3cf0)
18:41.23 Notify 03BRL-CAD:Marco-domingues * 10018 /wiki/User:Marco-domingues/GSoC17/Log: 5 June
19:23.56 *** join/#brlcad yorik (~yorik@2804:431:f721:94ee:290:f5ff:fedc:3bb2)
19:36.42 *** join/#brlcad LordOfBikes (~armin@dslc-082-083-184-129.pools.arcor-ip.net)
19:37.50 *** join/#brlcad vasc (~vasc@bl13-101-248.dsl.telepac.pt)
19:38.25 vasc pokes at mdtwenty[m]
19:50.44 vasc there was a site with IRC logs for this channel somewhere. someone got a link to the logs?
20:07.18 mdtwenty[m] Hey.. After our conversation last week, i came up with a new solution to the weave_segs kernel. I tried first to implement it without allocating a fixed size array to store the segments in each partition, and although the number of partitions per ray was correct, the segments in each partition were not and the code was a bit messy. So I decided to first implement it with a fixed array of segments in each partition (I
20:07.18 mdtwenty[m] started with an array of 100 elements because it was sufficient for the example) and the results seemed ok (i.e. the number of partitions and segments in each partition after boolean evaluation seemed correct for the example i am using to test)
20:11.25 vasc i don't see how that makes things simpler since it was bounded... but do continue.
20:12.09 vasc mdtwenty[m]
20:13.58 starseeker Notify: irc
20:14.03 starseeker hmm
20:14.42 starseeker don't remember how that works
20:15.20 starseeker vasc: I think you're looking for this? http://infobot.rikers.org/%23brlcad/
20:15.25 gcibot_ [ apt/ibot/infobot/purl logs for 2017 ]
20:15.30 vasc yes. that's it! thanks!
20:18.10 mdtwenty[m] I tried to make it bounded first so it would be easier to compare the results with a new solution, but if we were to alloc the array with the total number of segments, how could we do that? Since we only know that number after the count_hits kernel is executed
20:19.26 vasc yeah but why would you need to know the size before calling count_hits anyway?
20:19.59 mdtwenty[m] and when i tried to alloc the memory for that array before creating the opencl buffer, it would take too much time to execute compared with the previous solution
20:20.23 vasc eh?
20:21.05 vasc in theory you'll only need to allocate buffers in the graphics card memory.
20:22.02 vasc opencl buffers.
20:22.52 vasc i don't see how allocating a smaller buffer will be slower than allocating a larger buffer. which is what will happen if you have 100 segments per pixel.
20:27.27 vasc you're talking about this?
20:27.28 vasc <PROTECTED>
20:27.28 vasc <PROTECTED>
20:27.28 vasc <PROTECTED>
20:27.28 vasc <PROTECTED>
20:27.28 vasc <PROTECTED>
20:27.30 vasc <PROTECTED>
20:27.32 vasc <PROTECTED>
20:27.34 vasc <PROTECTED>
20:27.36 vasc BU_ASSERT((counts[i-1] % 2) == 0);
20:27.38 vasc h[i] = h[i-1] + counts[i-1]/2;/* number of segs is half the number of hits */
20:27.40 vasc <PROTECTED>
20:27.42 vasc <PROTECTED>
20:27.45 vasc that code is only there because we don't have opencl prefix sums implemented
20:27.51 Stragus When the buffer runs out, you can return failure, realloc, then just try again with a bigger buffer and/or fewer rays?
20:27.53 vasc it should be done all in the opencl side eventually.
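[Editor's sketch] The two unprotected lines of the snippet above build per-ray segment offsets from per-ray hit counts, which is an exclusive prefix sum. A minimal host-side version in C might look like the following; the function name `seg_offsets` and its signature are assumptions made for illustration and are not taken from the BRL-CAD sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the BRL-CAD code: for each ray i, h[i] receives
 * the index of that ray's first segment in a shared segment array.
 * Returns the total number of segments, i.e. the buffer size to allocate. */
size_t seg_offsets(const unsigned *counts, size_t nrays, size_t *h)
{
    size_t total = 0;
    for (size_t i = 0; i < nrays; i++) {
        assert(counts[i] % 2 == 0); /* hits come in entry/exit pairs */
        h[i] = total;               /* exclusive prefix sum */
        total += counts[i] / 2;     /* number of segs is half the number of hits */
    }
    return total;
}
```

The return value is what makes the exact-size allocation possible: it is only known once every ray has been counted.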
20:28.04 vasc you won't need to realloc
20:28.11 vasc man
20:28.55 vasc IIRC the max amount of partitions is 2x the amount of segments right?
20:29.07 vasc so you just allocate that as the maximum buffer size.
20:29.36 vasc and then you dynamically grow the virtual buffer, sure, but it will never go past the maximum buffer size.
20:31.06 Stragus I seriously lack context here, but the maximum buffer size for buffering all hits through a complex scene can be astronomical. It's very practical to "return failure and try again", it almost never happens in practice
20:31.21 vasc no it's not.
20:32.00 Stragus Counting hits first also isn't reliable since optimization of different kernels will produce slightly different results... besides the whole problem of tracing rays twice
20:32.24 vasc allocating memory is much slower than counting hits.
20:32.40 Stragus You allocate once and reuse the same buffer over and over
20:32.46 vasc hm
20:32.57 vasc sure that would work.
20:33.04 vasc but why bother.
20:33.12 Stragus It's the most efficient solution?
20:33.22 Stragus has done exactly that in another ray tracer...
20:33.24 vasc i would be happy with something that actually works first.
20:33.41 Stragus Counting hits isn't reliable
20:33.49 vasc why isn't it reliable?
20:34.25 Stragus Because the kernel to count hits and the kernel to record hits are different. They use the same function, but they will all be inlined by the compiler and optimized in different rays
20:34.29 Stragus Err, different ways*
20:35.02 vasc so you're saying the opencl device won't produce the same results if you run the same code twice? that it isn't deterministic?
20:35.21 Stragus It's not the same code, it's two different kernels: counting hits and recording hits
20:35.33 Stragus Unless you actually have one kernel that does both with a branch. Less efficient though
20:35.47 vasc it's the exact same code. except one stores the results and the other one doesn't.
20:36.27 Stragus You do realize that everything OpenCL/CUDA is preferably inlined in the calling device function as one big fat function?
20:36.41 Stragus And when you inline and optimize floating point math, results differ slightly
20:36.50 vasc i don't see how that makes any difference in the code results. unless the compiler has a bug.
20:36.56 Stragus (a+b)+c != a+(b+c)
20:37.05 vasc especially because they both call the same exact functions.
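[Editor's sketch] The inequality Stragus cites is ordinary floating-point non-associativity, and it can be reproduced in plain C with IEEE 754 doubles. The function name `sum_both_ways` is hypothetical. Whether an OpenCL compiler actually reorders such sums depends on the build options in use (for example `-cl-fast-relaxed-math`), so this only shows that the two groupings can differ, not that a given kernel build does so.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical demo: evaluate a+b+c with both groupings.
 * The volatile temporaries force each partial sum to be rounded
 * to double before it is used again. */
void sum_both_ways(double a, double b, double c, double *left, double *right)
{
    assert(left != NULL && right != NULL);
    volatile double ab = a + b;
    volatile double bc = b + c;
    *left = ab + c;   /* (a + b) + c */
    *right = a + bc;  /* a + (b + c) */
}
```

With a = 1e16, b = -1e16 and c = 1.0 the first grouping gives 1.0, while in the second the 1.0 is lost when it is added to -1e16 and the result is 0.0.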
20:37.21 vasc man, i have it running and it works(tm)
20:37.37 vasc i count the hits, alloc a buffer for the hits, and then store them
20:37.40 vasc it's in SVN
20:37.42 Stragus Okay, but it's not reliable if you enable optimization
20:37.50 vasc why shouldn't it be?
20:38.24 Stragus _If_ you use two separate kernels, it's all inlined and optimized separately, with slightly different results
20:38.33 Stragus If you use a branch in the same kernel, it's fine, but slower
20:38.37 vasc no man. because it's THE SAME KERNEL
20:38.49 vasc the only difference is a branch which either stores the result or not.
20:39.12 vasc :-)
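[Editor's sketch] The count-then-store scheme vasc describes can be illustrated with a single routine whose only difference between the two passes is whether it writes its results. Everything here (`struct hit`, `shoot`, the distance test) is a simplified stand-in assumed for illustration, not the actual OpenCL kernel.

```c
#include <assert.h>
#include <stddef.h>

struct hit { double dist; };

/* Hypothetical stand-in for the shot routine: accepts every candidate
 * distance up to max_dist. When out is NULL the hits are only counted;
 * otherwise they are also stored. Both passes run the same test code. */
size_t shoot(const double *dists, size_t n, double max_dist, struct hit *out)
{
    assert(dists != NULL || n == 0);
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (dists[i] <= max_dist) {
            if (out != NULL)
                out[k].dist = dists[i]; /* second pass: record the hit */
            k++;
        }
    }
    return k;
}
```

A caller would run `shoot(..., NULL)` first, allocate exactly that many hits, then call it again with the buffer. The cost is the second traversal that Stragus objects to.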
20:39.33 Stragus Okay, and you end up tracing rays twice
20:40.10 vasc sure.
20:40.17 vasc which still beats re-allocating memory.
20:40.25 Stragus You don't reallocate!
20:40.49 vasc sure. then you replace this simple to solve problem with a more complex problem.
20:40.50 Stragus When I did something similar/identical, I had a batch of 32 buffers to store results (one buffer per thread/lane), the offsets were incremented by atomics, and there was a special flag to denote "I ran out of memory"
20:41.02 vasc so which size of buffer will you allocate that can use all the gpu compute units?
20:41.24 vasc my solution doesn't require atomics either.
20:41.24 Stragus If that flag was ever set, you would reallocate _or_ trace fewer rays. And you did that maybe once, if the heuristics were off for the scene's complexity
20:41.29 vasc it's lockless.
20:42.11 Stragus Wait actually, it wasn't one buffer per thread/lane, I was doing one atomic for the whole warp after counting how much memory all of it required
20:42.21 Stragus The hits being buffered in on-chip shared memory
20:42.55 vasc atomics don't work on shared memory. they work on global memory.
20:43.00 vasc at least in opencl it's like that.
20:43.34 vasc plus if you're doing inter-warp computation you don't need atomics.
20:43.59 Stragus You have both in CUDA... but the atomics were for the global buffer, shared memory was only for accumulating many hits before flushing to global (with one atomic operation to allocate and flush the results of all threads of the warp)
20:44.06 vasc ah ok.
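[Editor's sketch] Stragus's alternative (one fixed buffer, space reserved with a single atomic add, and a flag meaning "I ran out of memory") can be sketched on the host with C11 atomics. The names `hit_pool` and `pool_reserve` are hypothetical, and this is neither his CUDA code nor an OpenCL kernel; it only shows the reservation logic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-capacity pool shared by all workers. */
struct hit_pool {
    atomic_size_t next;   /* index of the first unreserved slot */
    atomic_bool overflow; /* set when any reservation did not fit */
    size_t capacity;      /* total number of slots in the buffer */
};

/* Reserve n consecutive slots with one atomic add. Returns the start index,
 * or (size_t)-1 after raising the overflow flag. The caller is expected to
 * check the flag and retry with a larger buffer or fewer rays. */
size_t pool_reserve(struct hit_pool *p, size_t n)
{
    assert(p != NULL);
    size_t start = atomic_fetch_add(&p->next, n);
    if (start + n > p->capacity) {
        atomic_store(&p->overflow, true);
        return (size_t)-1;
    }
    return start;
}
```

Reserving once per group of threads rather than once per hit, as Stragus describes for a warp, keeps the number of atomic operations low.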
20:44.22 vasc you're still replacing a simple problem with a more complex problem.
20:44.38 Stragus It's a little complex, but it's much faster than tracing twice
20:45.08 vasc meh. there's way worse inefficiencies in the code right now.
20:45.26 Stragus Okay then. :) Perhaps keep all this in mind for a future iteration
20:45.37 vasc sure.
20:47.39 vasc anyway mdtwenty[m], the problem with the change you made is that you can't assume that buffer will be large enough to hold the results.
20:48.17 mdtwenty[m] yes i'm aware
20:49.00 mdtwenty[m] and perhaps i did something wrong when i tried to allocate the buffer for the array of segments in each partition
20:49.25 vasc at one point we actually computed that prefix sum with opencl and did no memory transfers in that code, but the thing is i was using prefix sum code under the Apache License so it needed to be ripped out.
20:49.54 vasc will need to either get an MIT licensed algorithm or reimplement it eventually.
20:50.07 vasc but for now it doesn't matter.
20:50.15 vasc well
20:50.23 vasc i wouldn't be surprised if there was a bug there.
20:50.53 vasc it's okay to try things with a simpler piece of code for now, but eventually you need something that works properly.
20:51.58 mdtwenty[m] yes the idea of implementing first this simpler code was to have a base to compare the results with future solutions
20:52.44 vasc you could just make a mockup that returns white when there are intersections and black when there are none.
20:52.52 vasc so it would make debugging your results easier.
20:53.47 vasc i.e. a replacement for clt_shade_segs_kernel
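[Editor's sketch] The debugging mockup vasc suggests reduces shading to a single test per pixel. A host-side C version, assuming the number of surviving segments per pixel is already known, could look like this; `shade_bw` and its signature are hypothetical and do not mirror the real clt_shade_segs_kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical black/white shader: white when the ray has at least one
 * segment left after boolean evaluation, black otherwise. */
void shade_bw(size_t nsegs, unsigned char rgb[3])
{
    assert(rgb != NULL);
    unsigned char v = (nsegs > 0) ? 255 : 0;
    rgb[0] = v;
    rgb[1] = v;
    rgb[2] = v;
}
```

Because the output depends only on whether anything was hit, a wrong silhouette points at the weaving or boolean evaluation stage rather than at material or lighting code.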
20:54.34 mdtwenty[m] yes thanks i will do that
20:55.46 vasc eventually you'll need to get the material right as well.
20:56.20 vasc where was that in the ANSI C code...
21:03.42 vasc right
21:03.58 vasc https://svn.code.sf.net/p/brlcad/code/brlcad/trunk/src/rt/view.c
21:04.00 vasc colorview()
21:05.32 vasc the opencl rt is basically a mix of ANSI C librt, liboptical and rt code...
21:05.46 vasc with a lot of things deleted.
21:06.11 vasc to make it something a human being can understand.
21:06.45 vasc of course most of those features will probably need to be added back eventually.
21:14.01 mdtwenty[m] sure :)
21:14.55 mdtwenty[m] i will implement your suggestion of shading the intersections with a different color to see if there is still a problem with the weave_segs kernel
21:15.37 mdtwenty[m] and will also work on replacing the bounded array
21:19.09 vasc just make sure to keep backups
21:19.43 vasc once you get a black/white shader working then message me.
21:23.17 vasc i also started with before i got the shading working properly:
21:23.19 vasc https://brlcad.org/w/images/thumb/8/87/Cl_havoc.png/512px-Cl_havoc.png
21:25.04 vasc s/started with/started with the black&white shader
21:25.23 mdtwenty[m] yes it is definitely a good idea
21:26.07 mdtwenty[m] i will implement that and will give you a heads up when it's done :)
21:38.02 vasc okay then
21:38.32 vasc i notice you've been updating your blog, so keep at it
21:43.38 mdtwenty[m] yes i will keep posting my daily progress on the blog!
23:20.34 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
23:40.31 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
