IRC log for #brlcad on 20170713

00:19.42 *** join/#brlcad infobot (~infobot@rikers.org)
00:19.42 *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;)
08:55.54 *** join/#brlcad infobot (~infobot@rikers.org)
08:55.54 *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;)
09:22.05 *** join/#brlcad mdtwenty[m] (mdtwentyma@gateway/shell/matrix.org/x-iwpdlhgermucyhhk)
09:23.32 *** join/#brlcad Caterpillar2 (~caterpill@unaffiliated/caterpillar)
09:59.54 *** join/#brlcad merzo (~merzo@252-22-132-95.pool.ukrtel.net)
10:35.58 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
11:15.02 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
11:31.33 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
11:37.21 Notify BRL-CAD:Amritpal singh * 10098 /wiki/User:Amritpal_singh/GSoC17/logs: /* Coding Period */
11:39.42 Notify BRL-CAD:Amritpal singh * 10099 /wiki/User:Amritpal_singh/GSoC17/logs: /* Coding Period */
12:01.06 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
12:44.16 *** join/#brlcad gabbar1947 (uid205515@gateway/web/irccloud.com/x-jhwmzfdcblkcoioz)
12:56.45 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
13:37.12 *** join/#brlcad deep-book-gk_ (~1wm_su@94.242.252.58)
13:37.42 *** part/#brlcad deep-book-gk_ (~1wm_su@94.242.252.58)
13:42.33 *** join/#brlcad yorik (~yorik@2804:431:f720:9892:290:f5ff:fedc:3bb2)
13:46.45 *** join/#brlcad teepee (~teepee@unaffiliated/teepee)
15:17.34 *** join/#brlcad vasc (~vasc@bl4-6-201.dsl.telepac.pt)
15:19.21 *** join/#brlcad vasc (~vasc@bl4-6-201.dsl.telepac.pt)
15:19.24 vasc hello mdtwenty[m]
15:20.28 mdtwenty[m] hello
15:20.38 mdtwenty[m] sorry yesterday, was having lunch
15:21.02 vasc no problem. i had to leave early in the afternoon as well.
15:21.14 vasc how's the code going?
15:23.24 mdtwenty[m] hm so after translating the bits of the boolean tree i could get the operators scene to work for some views (i found later that some views show strange behaviour)
15:23.53 vasc so what's the difference between translating the bits and not translating the bits, in terms of visual output?
15:23.59 Notify BRL-CAD Wiki:95.18.89.88 * 10100 /wiki/User:Mariomeissner/logs:
15:25.27 mdtwenty[m] if i don't translate the bits, the render always differs, because the boolean tree changes and so do the partitions evaluated
15:26.03 mdtwenty[m] but when i translate the tree the output is fixed
15:26.26 vasc ok
15:26.48 vasc well
15:27.22 vasc i think it's like this. we clean up the code a bit and try to put it into the SVN branch.
15:27.37 vasc there's still some bugs, but i think it's close to optimal.
15:28.02 vasc at least as far as an alpha release can go
15:28.51 vasc from now on, make your code against the opencl branch: https://svn.code.sf.net/p/brlcad/code/brlcad/branches/opencl/src/librt/
15:28.52 gcibot [ p/brlcad/code - Revision 69941: /brlcad/branches/opencl/src/librt ]
15:29.09 mdtwenty[m] yes, i think so!
15:29.17 vasc download that, and make a patch against that version
15:29.27 vasc then i'll review it and we'll apply it
15:29.55 vasc this isn't good enough to go in the trunk yet, but i think we need to keep it stored someplace.
15:30.50 *** join/#brlcad merzo (~merzo@136-3-133-95.pool.ukrtel.net)
15:31.12 mdtwenty[m] will do that!
15:31.31 mdtwenty[m] this is the one example of a bug that is happening
15:32.03 mdtwenty[m] uploaded an image: operators.png (138KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/QRtTVgTwEFAWVKOtedoSguQa>
15:32.03 mdtwenty[m] in some views these "holes" happen
15:32.53 vasc that's really weird.
15:33.27 mdtwenty[m] hm it is not that weird with the wireframe
15:33.31 mdtwenty[m] sec
15:34.05 vasc it's like it's evaluating the tree wrong?
15:34.28 vasc the segments seem ok
15:35.24 mdtwenty[m] uploaded an image: operators_wire.png (192KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/QEPwQVVQWceIapNElAnxOMfn>
15:35.37 mdtwenty[m] it seems like the geometry in yellow is interfering
15:35.57 vasc yeah that helps. like i thought, it's like it's evaluating the boolean csg wrong but the segments seem to be ok.
15:36.09 mdtwenty[m] this can be an error with the regiontable, or the lack of support for overlapping partitions
15:36.22 mdtwenty[m] i think, not sure yet
15:37.13 vasc well once we get this into svn, we need to fix that issue with the region's primitive id translation
15:37.41 vasc and we need to get some kind of translator so that we can compare the output of the intermediate steps in the opencl and ansi c code.
15:37.59 vasc or just fix the bugs.
15:38.38 vasc lack of support for overlapping partitions?
15:38.58 vasc in which part of the code is that supposed to be in?
15:39.36 mdtwenty[m] it is a part of the rt_boolfinal function
15:39.39 vasc oh
15:39.53 vasc but the boolweave is feature complete right?
15:40.51 mdtwenty[m] i didn't implement it yet because i was trying to understand the FASTGEN regions
15:41.18 vasc ah. THAT
15:41.23 mdtwenty[m] yeah i think boolweave is complete now
15:41.32 vasc ignore those. in fact just strip the FASTGEN code from the opencl port.
15:42.15 vasc FASTGEN is like legacy support for an older solid modelling system that the US military used from what I understand.
15:42.35 vasc fact is, i didn't even port the FASTGEN primitives.
15:42.48 vasc so it's kinda pointless to implement the FASTGEN csg code.
15:43.41 vasc if for whatever reason it's necessary to implement FASTGEN someday, then we'll think about it.
15:43.55 mdtwenty[m] hm i see. i was not sure if it was important for the ocl code so thanks for clarifying
15:44.24 vasc BRL-CAD has a FASTGEN import module that imports FASTGEN scenes.
15:44.44 vasc the database format for FASTGEN is kinda weird. it has like special primitives and the way the rendering works is also different.
15:46.15 Notify BRL-CAD Wiki:Mariomeissner * 10101 /wiki/User:Mariomeissner/logs:
15:46.32 mdtwenty[m] ok
15:46.55 mdtwenty[m] the other day you said something about storing the results of the boolean evaluation in the struct partition
15:47.48 mdtwenty[m] which i currently do
15:48.18 vasc that should probably be kept in a separate data structure
15:48.36 vasc or we should just make the evaluation work faster so that we don't need to cache that in the first place.
15:50.20 *** join/#brlcad skat00sh (uid103741@gateway/web/irccloud.com/x-owitdxmtukjpgbew)
15:51.35 vasc btw the opencl branch in svn already has the boolean tree code.
15:51.47 vasc so you might get merge conflicts because of that.
15:52.02 vasc you'll have to manually apply the patch.
15:53.11 mdtwenty[m] ok thanks for the heads up
15:53.19 mdtwenty[m] will apply it manually
16:01.33 mdtwenty[m] i will just remove some debug code and clean it up a bit before submitting the patch to the opencl branch
16:02.23 mdtwenty[m] or should i finish rt_boolfinal first? (overlapping partitions)
16:03.37 vasc well
16:04.09 vasc just show me what you have
16:04.29 vasc there should be some debug and log code in there
16:05.52 vasc some of those should be kept i think
16:15.01 mdtwenty[m] yeah it's probably a good idea to have some debug code in there
16:15.12 mdtwenty[m] posted a file: rt_bool_final.patch (62KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/uomSumphCREJqrcPmarkhjYX>
16:15.19 mdtwenty[m] this is what i got right now
16:15.50 mdtwenty[m] i'm already using a dynamic bitvector for the regiontable
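[Editor's note] The "dynamic bitvector" mentioned above is not shown in the log. As a rough illustration only, a bit vector sized at run time from the number of regions might look like the following host-side C sketch. The names `struct bitv`, `bitv_init`, `bitv_set`, `bitv_test` and `bitv_free` are hypothetical and are not taken from the patch or from BRL-CAD.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the code from the patch. */
#define BITV_WORD_BITS (sizeof(unsigned int) * CHAR_BIT)

struct bitv {
    unsigned int *words; /* one bit per region index */
    size_t nbits;        /* number of addressable bits */
};

/* Allocate a zeroed vector with room for nbits bits; returns 0 on success. */
static int bitv_init(struct bitv *bv, size_t nbits)
{
    size_t nwords = (nbits + BITV_WORD_BITS - 1) / BITV_WORD_BITS;
    bv->words = calloc(nwords ? nwords : 1, sizeof(unsigned int));
    bv->nbits = bv->words ? nbits : 0;
    return bv->words ? 0 : -1;
}

/* Mark one region as hit; out-of-range indices are ignored. */
static void bitv_set(struct bitv *bv, size_t bit)
{
    if (bit < bv->nbits)
        bv->words[bit / BITV_WORD_BITS] |= 1u << (bit % BITV_WORD_BITS);
}

/* Return 1 if the bit is set, 0 if it is clear or out of range. */
static int bitv_test(const struct bitv *bv, size_t bit)
{
    if (bit >= bv->nbits)
        return 0;
    return (int)((bv->words[bit / BITV_WORD_BITS] >> (bit % BITV_WORD_BITS)) & 1u);
}

/* Release the storage and reset the vector to empty. */
static void bitv_free(struct bitv *bv)
{
    free(bv->words);
    bv->words = NULL;
    bv->nbits = 0;
}
```

The point of sizing the table from the region count is that memory use scales with the scene instead of a compile-time maximum; how the OpenCL side lays this out is not stated in the conversation.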
16:16.30 vasc yeah but that's against trunk/ not branches/opencl right?
16:16.51 mdtwenty[m] ah yes it is not against the opencl branch yet
16:17.44 mdtwenty[m] i just checked out the opencl branch from the svn, so if you give me some time i can apply the patch manually and send it to you
16:18.16 vasc there also seems to be some noise
16:18.28 vasc like in the rendering function rt.cl
16:18.44 vasc because you indented some code differently some things are reported as changed even though the code is the same
16:20.43 mdtwenty[m] oh i see.. will fix that
16:30.08 vasc this code is missing in opencl boolweave:
16:30.16 vasc if (segp->seg_stp->st_aradius < INFINITY &&
16:30.16 vasc <PROTECTED>
16:30.16 vasc <PROTECTED>
16:30.18 vasc ...
16:32.01 vasc also this is kinda strange:
16:32.02 vasc <PROTECTED>
16:32.32 vasc <PROTECTED>
16:33.33 vasc is that the circular 'pointer' in the head and tail of the list again?
16:33.55 vasc shouldn't it be, like, 'j = head_pp' or whatever?
16:37.33 mdtwenty[m] hum yes, j = head_pp is equivalent, but shorter
16:38.00 mdtwenty[m] i already have it that way in rt_boolfinal
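[Editor's note] The lines under discussion are elided in the log, so the following is only a guess at the general pattern: a circular list with a sentinel head, where following the tail's forward link lands back on the head. It uses array indices in place of pointers, which is an assumption about how an OpenCL port might store the list. `struct part`, `list_init`, `list_append` and `list_count` are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical index-linked node; not the partition struct from the patch. */
struct part {
    unsigned int forw; /* index of the next node */
    unsigned int back; /* index of the previous node */
};

/* Make the sentinel at index head point to itself: an empty circular list. */
static void list_init(struct part *p, unsigned int head)
{
    p[head].forw = head;
    p[head].back = head;
}

/* Link node n in just before the sentinel, i.e. at the tail. */
static void list_append(struct part *p, unsigned int head, unsigned int n)
{
    unsigned int tail = p[head].back;
    p[n].forw = head;
    p[n].back = tail;
    p[tail].forw = n;
    p[head].back = n;
}

/* Walk forward from the first node until the walk wraps to the sentinel. */
static size_t list_count(const struct part *p, unsigned int head)
{
    size_t n = 0;
    unsigned int j;
    for (j = p[head].forw; j != head; j = p[j].forw)
        n++;
    return n;
}
```

In such a list `p[p[head].back].forw` always equals `head`, which is why reaching the head through the tail and writing the head directly are equivalent, with the direct form being easier to read.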
16:39.15 vasc also name those functions boolweave and boolfinal
16:39.22 vasc don't use different names than the ANSI C names
16:39.35 vasc it makes it harder to understand which is which
16:40.03 vasc and yeah eval_partitions/rt_boolfinal needs to be cleaned up
16:40.13 vasc and some things need to be refactored.
16:43.45 vasc yeah we'll need an overlap handler...
16:44.04 vasc ah well.
16:44.13 vasc fix the things i said and make a patch against branches/opencl
16:44.33 vasc i'll then apply the boolweave code, but the boolfinal code still needs some work
16:45.35 mdtwenty[m] sure will do that
17:08.19 Stragus Hrm... perhaps these INFINITY should be replaced with FLT_MAX or DBL_MAX
17:08.39 Stragus Some chips become up to 900 times slower when performing a floating point operation where an infinity or NaN is involved
17:08.51 Stragus (including comparisons)
17:10.01 vasc well, we want bug for bug compatibility with the ANSI C code though.
17:10.27 Stragus Right. That comment was mostly for the CPU side of things
17:10.31 vasc i mean if we wanted speed we wouldn't be using doubles on a GPU in the first place.
17:10.38 Stragus Eh, indeed
17:11.10 vasc still it's a reasonable argument, considering we have some defines for doubles as floats as an option.
17:11.13 Stragus But even modern CPU Intel chips are 200-300 times slower with infinities. AMD doesn't care about inf/NaN
17:11.47 vasc so you say #undef INFINITY and #define INFINITY DBL_MAX?
17:12.30 Stragus Basically yes, though I would personally prefer some custom foo_MAX macro rather than replacing INFINITY
17:13.39 vasc in opencl that would be MAXFLOAT it seems
17:14.09 vasc ah no
17:14.11 vasc that's SP
17:14.15 Stragus nods
17:15.04 vasc doesn't the compiler do those kinds of optimizations if you use ffast-math or something?
17:15.18 Stragus No, that would change the behavior of the code
17:16.28 vasc i think ffast-math disables those checks though
17:16.28 Stragus I'm not currently aware of the performance of Inf/NaN on GPUs, but it's a good idea to avoid these in any case
17:16.45 vasc https://gcc.gnu.org/wiki/FloatingPointMath
17:16.46 gcibot [ FloatingPointMath - GCC Wiki ]
17:17.20 vasc "In addition GCC offers the -ffast-math flag which is a shortcut for several options, presenting the least conforming but fastest math mode. It enables -fno-trapping-math, -funsafe-math-optimizations, -ffinite-math-only, -fno-errno-math, -fno-signaling-nans, -fno-rounding-math, -fcx-limited-range and -fno-signed-zeros."
17:17.20 vasc -ffinite-math-only
17:17.56 Stragus Hrm -ffinite-math-only, indeed
17:18.19 Stragus Though I'm aware of a performance gain when I removed infinity checks on code that was using ffast-math several years ago
17:19.16 Stragus And assuming you do want to check for overflows, a check against DBL_MAX is still a good idea, eh
17:19.19 vasc well it wouldn't be the first time a compiler wouldn't behave like it's supposed to.
17:21.53 mdtwenty[m] hm, should i use DBL_MAX then?
17:22.26 vasc keep it as is for now
17:22.26 vasc we don't want even more weird behavior right now.
17:22.26 vasc leave the optimizations for later.
17:22.38 vasc just make a note for it.
17:22.54 mdtwenty[m] ok :)
17:42.30 vasc given the amount of things which need to be optimized...
17:42.45 vasc we'll go for algorithmic improvements first.
17:46.54 vasc there's lots of O(N^2) things, spurious memory usage and access and things like that which need to be fixed first
17:48.00 vasc besides i'm not sure the compiler doesn't do that in the first place
17:48.21 vasc without looking at the assembly code output i wouldn't make changes like that.
17:51.12 Stragus Ah right... but I think perhaps that should all have been done before porting to GPUs?
17:52.47 Stragus Optimization and debugging on GPUs is more troublesome, it's easier to settle the algorithm and code on CPUs first
17:52.47 Stragus And I'm not entirely convinced about GPU performance considering the need for double precision, compared to CPU AVX2
17:53.18 vasc well. this is OpenCL. it runs on the CPU as well. in fact mdtwenty[m] has been running and testing it there.
17:54.03 vasc and i did prototype the boolean evaluator in ANSI C before mdtwenty[m] ported it over.
17:54.30 vasc the boolean weaving code is also a relatively straightforward port.
17:54.38 Stragus All right then
17:54.52 vasc the boolfinal might not be, because i suspect the current way of doing it isn't optimal. but mdtwenty[m]'s still working on that.
17:55.33 vasc also it's not that GPUs are slow at doubles. it's that NVIDIA cripples the budget GPUs.
17:56.43 vasc have you looked at the DP FLOPS of the V100?
17:57.08 Stragus Sure sure, it's all right in the $3k GPUs
17:57.09 vasc 7014 GFLOPS on the PCIe V100
17:57.16 vasc DP GFLOPS
17:58.10 Stragus On consumer GPUs, I have had better performance using dual-float math instead of doubles (for similar accuracy)
17:58.16 vasc how many GFLOPS do those Skylake server processors or the AMD Epyc have?
17:58.52 vasc it says in this article
17:58.56 vasc http://www.eetimes.com/document.asp?doc_id=1331988&page_number=2
17:58.57 gcibot [ Intel Skylake Counters AMD Epyc | EE Times ]
17:59.11 vasc 32 FLOPS/cycle
18:00.14 vasc DP FLOPS
18:00.27 vasc 28 cores
18:00.30 vasc 3.6 GHz
18:01.22 Stragus So about half of a $3k GPU
18:01.28 vasc 3225.6 DP GFLOPS/peak?
18:01.33 Stragus Right
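[Editor's note] The peak figure above is the product of the three numbers quoted from the article: FLOPS per cycle per core, core count, and clock in GHz. A small C helper makes the arithmetic explicit; `peak_gflops` is a hypothetical name, and the result is a theoretical peak, not a measured throughput.

```c
/* Theoretical peak in GFLOPS = FLOPS/cycle/core * cores * clock (GHz). */
static double peak_gflops(double flops_per_cycle, int cores, double ghz)
{
    return flops_per_cycle * cores * ghz;
}
```

With the values from the chat, `peak_gflops(32.0, 28, 3.6)` gives about 3225.6, which is roughly 46% of the 7014 DP GFLOPS quoted for the PCIe V100, consistent with "about half".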
18:02.30 vasc https://en.wikichip.org/wiki/intel/xeon_platinum/8180
18:02.31 gcibot [ Xeon Platinum 8180 - Intel - WikiChip ]
18:02.37 vasc Release Price $10009.00
18:02.49 vasc GPU wins that one.
18:03.20 vasc let's see how much the entry level costs.
18:03.21 Stragus Screw you Intel :), I'm waiting for dual-socket Epyc motherboards to upgrade my desktop
18:04.06 vasc https://en.wikichip.org/wiki/intel/xeon_bronze
18:04.07 gcibot [ Xeon Bronze - Intel - WikiChip ]
18:04.09 vasc those are cheaper.
18:04.47 vasc also half the clockspeed.
18:04.47 Stragus And I know professional grade GPUs are better at double precision. But on a typical desktop machine with a gaming GPU, it's not so clear
18:05.06 vasc yeah, it's a good question, what's better on a typical desktop.
18:05.29 vasc which is one reason why we went for opencl and not cuda, despite all the extra work in it.
18:05.35 vasc because of the crap libraries.
18:06.30 Stragus The best double-precision-like performance I had on gaming GPUs was a healthy mix of regular floats and dual-floats
18:06.59 Stragus Just in case you could use that, here's my code for double-float arithmetic: http://www.rayforce.net/ddm.h
18:09.21 vasc what's the license?
18:09.54 vasc oh it uses sse
18:09.54 Stragus "Do whatever you want with it", I should put a header
18:09.54 Stragus No no, that was just some optional optimization attempt
18:10.25 Stragus The double-double math is also useful when you need higher accuracy than double but with decent performance
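[Editor's note] The contents of ddm.h are not shown in the log, so the following is not Stragus's code. It is a generic sketch of the dual-float idea using the textbook error-free TwoSum transformation: a value is kept as an unevaluated sum `hi + lo` of two floats. `struct dfloat`, `two_sum`, `fast_two_sum` and `df_add` are hypothetical names. The sketch assumes float expressions are evaluated in float precision and that value-changing flags such as -ffast-math are off, since they may optimize the error terms away.

```c
/* A value stored as the unevaluated sum hi + lo of two floats. */
struct dfloat {
    float hi;
    float lo;
};

/* TwoSum: hi is the rounded sum a + b, lo is the rounding error of that sum. */
static struct dfloat two_sum(float a, float b)
{
    struct dfloat r;
    float bb;
    r.hi = a + b;
    bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);
    return r;
}

/* FastTwoSum: same result as two_sum, but requires |a| >= |b|. */
static struct dfloat fast_two_sum(float a, float b)
{
    struct dfloat r;
    r.hi = a + b;
    r.lo = b - (r.hi - a);
    return r;
}

/* Add two dual-float values, then renormalize so lo is small next to hi. */
static struct dfloat df_add(struct dfloat x, struct dfloat y)
{
    struct dfloat s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return fast_two_sum(s.hi, s.lo);
}
```

For example, adding `{1.0f, 0.0f}` and `{1e-8f, 0.0f}` keeps the small term in `lo`, whereas a plain float addition of `1.0f + 1e-8f` rounds back to `1.0f`.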
18:10.48 vasc ok i'll keep this under my hat
18:10.51 vasc :-)
18:10.56 vasc now really bbl
18:11.02 Stragus :) Okay
18:24.54 *** part/#brlcad mdtwenty[m] (mdtwentyma@gateway/shell/matrix.org/x-iwpdlhgermucyhhk)
20:13.44 *** join/#brlcad merzo (~merzo@136-3-133-95.pool.ukrtel.net)
21:28.54 *** join/#brlcad infobot (~infobot@rikers.org)
21:28.55 *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;)
21:34.44 *** join/#brlcad kintel (~kintel@unaffiliated/kintel)
21:45.21 Notify BRL-CAD:starseeker * 69942 (brlcad/trunk/misc/CMake/BRLCAD_Targets.cmake brlcad/trunk/src/libbu/CMakeLists.txt): Tweak astyle validation logic

Generated by irclog2html.pl Modified by Tim Riker to work with infobot.