| 00:19.42 | *** join/#brlcad infobot (~infobot@rikers.org) | |
| 00:19.42 | *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;) | |
| 08:55.54 | *** join/#brlcad infobot (~infobot@rikers.org) | |
| 08:55.54 | *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;) | |
| 09:22.05 | *** join/#brlcad mdtwenty[m] (mdtwentyma@gateway/shell/matrix.org/x-iwpdlhgermucyhhk) | |
| 09:23.32 | *** join/#brlcad Caterpillar2 (~caterpill@unaffiliated/caterpillar) | |
| 09:59.54 | *** join/#brlcad merzo (~merzo@252-22-132-95.pool.ukrtel.net) | |
| 10:35.58 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 11:15.02 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 11:31.33 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 11:37.21 | Notify | BRL-CAD:Amritpal singh * 10098 /wiki/User:Amritpal_singh/GSoC17/logs: /* Coding Period */ |
| 11:39.42 | Notify | BRL-CAD:Amritpal singh * 10099 /wiki/User:Amritpal_singh/GSoC17/logs: /* Coding Period */ |
| 12:01.06 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 12:44.16 | *** join/#brlcad gabbar1947 (uid205515@gateway/web/irccloud.com/x-jhwmzfdcblkcoioz) | |
| 12:56.45 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 13:37.12 | *** join/#brlcad deep-book-gk_ (~1wm_su@94.242.252.58) | |
| 13:37.42 | *** part/#brlcad deep-book-gk_ (~1wm_su@94.242.252.58) | |
| 13:42.33 | *** join/#brlcad yorik (~yorik@2804:431:f720:9892:290:f5ff:fedc:3bb2) | |
| 13:46.45 | *** join/#brlcad teepee (~teepee@unaffiliated/teepee) | |
| 15:17.34 | *** join/#brlcad vasc (~vasc@bl4-6-201.dsl.telepac.pt) | |
| 15:19.21 | *** join/#brlcad vasc (~vasc@bl4-6-201.dsl.telepac.pt) | |
| 15:19.24 | vasc | hello mdtwenty[m] |
| 15:20.28 | mdtwenty[m] | hello |
| 15:20.38 | mdtwenty[m] | sorry about yesterday, i was having lunch |
| 15:21.02 | vasc | no problem. i had to leave early in the afternoon as well. |
| 15:21.14 | vasc | how's the code going? |
| 15:23.24 | mdtwenty[m] | hm so after translating the bits of the boolean tree i could get the operators scene to work for some views (i found later that some views have a strange behaviour) |
| 15:23.53 | vasc | so what's the difference between translating the bits and not translating the bits, in terms of visual output? |
| 15:23.59 | Notify | BRL-CAD Wiki:95.18.89.88 * 10100 /wiki/User:Mariomeissner/logs: |
| 15:25.27 | mdtwenty[m] | if i don't translate the bits, the render always differs, because the boolean tree changes and so do the partitions evaluated |
| 15:26.03 | mdtwenty[m] | but when i translate the tree the output is fixed |
| 15:26.26 | vasc | ok |
| 15:26.48 | vasc | well |
| 15:27.22 | vasc | i think it's like this. we clean up the code a bit and try to put it into the SVN branch. |
| 15:27.37 | vasc | there's still some bugs, but i think it's close to optimal. |
| 15:28.02 | vasc | at least as far as an alpha release can go |
| 15:28.51 | vasc | from now on, make your code against the opencl branch: https://svn.code.sf.net/p/brlcad/code/brlcad/branches/opencl/src/librt/ |
| 15:28.52 | gcibot | [ p/brlcad/code - Revision 69941: /brlcad/branches/opencl/src/librt ] |
| 15:29.09 | mdtwenty[m] | yes, i think so! |
| 15:29.17 | vasc | download that, and make a patch against that version |
| 15:29.27 | vasc | then i'll review it and we'll apply it |
| 15:29.55 | vasc | this isn't good enough to go in the trunk yet, but i think we need to keep it stored someplace. |
| 15:30.50 | *** join/#brlcad merzo (~merzo@136-3-133-95.pool.ukrtel.net) | |
| 15:31.12 | mdtwenty[m] | will do that! |
| 15:31.31 | mdtwenty[m] | this is the one example of a bug that is happening |
| 15:32.03 | mdtwenty[m] | uploaded an image: operators.png (138KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/QRtTVgTwEFAWVKOtedoSguQa> |
| 15:32.03 | mdtwenty[m] | in some views these "holes" happen |
| 15:32.53 | vasc | that's really weird. |
| 15:33.27 | mdtwenty[m] | hm it is not that weird with the wireframe |
| 15:33.31 | mdtwenty[m] | sec |
| 15:34.05 | vasc | it's like it's evaluating the tree wrong? |
| 15:34.28 | vasc | the segments seem ok |
| 15:35.24 | mdtwenty[m] | uploaded an image: operators_wire.png (192KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/QEPwQVVQWceIapNElAnxOMfn> |
| 15:35.37 | mdtwenty[m] | it seems like the geometry in yellow is interfering |
| 15:35.57 | vasc | yeah that helps. like i thought, it's like it's evaluating the boolean csg wrong but the segments seem to be ok. |
| 15:36.09 | mdtwenty[m] | this can be an error with the regiontable, or the lack of support for overlapping partitions |
| 15:36.22 | mdtwenty[m] | i think, not sure yet |
| 15:37.13 | vasc | well once we get this into svn, we need to fix that issue with the region's primitive id translation |
| 15:37.41 | vasc | and we need to get some kind of translator so that we can compare the output of the intermediate steps in the opencl and ansi c code. |
| 15:37.59 | vasc | or just fix the bugs. |
| 15:38.38 | vasc | lack of support for overlapping partitions? |
| 15:38.58 | vasc | in which part of the code is that supposed to be in? |
| 15:39.36 | mdtwenty[m] | it's a part of the rt_boolfinal function |
| 15:39.39 | vasc | oh |
| 15:39.53 | vasc | but the boolweave is feature complete right? |
| 15:40.51 | mdtwenty[m] | i haven't implemented it yet because i was trying to understand the FASTGEN regions |
| 15:41.18 | vasc | ah. THAT |
| 15:41.23 | mdtwenty[m] | yeah i think boolweave is complete now |
| 15:41.32 | vasc | ignore those. in fact just strip the FASTGEN code from the opencl port. |
| 15:42.15 | vasc | FASTGEN is like legacy support for an older solid modelling system that the US military used from what I understand. |
| 15:42.35 | vasc | fact is, i didn't even port the FASTGEN primitives. |
| 15:42.48 | vasc | so it's kinda pointless to implement the FASTGEN csg code. |
| 15:43.41 | vasc | if for whatever reason it's necessary to implement FASTGEN someday, then we'll think about it. |
| 15:43.55 | mdtwenty[m] | hm i see. i was not sure if it was important for the ocl code so thanks for clarifying |
| 15:44.24 | vasc | BRL-CAD has a FASTGEN import module that imports FASTGEN scenes. |
| 15:44.44 | vasc | the database format for FASTGEN is kinda weird. it has like special primitives and the way the rendering works is also different. |
| 15:46.15 | Notify | BRL-CAD Wiki:Mariomeissner * 10101 /wiki/User:Mariomeissner/logs: |
| 15:46.32 | mdtwenty[m] | ok |
| 15:46.55 | mdtwenty[m] | the other day you said something about storing the results of the boolean evaluation in the struct partition |
| 15:47.48 | mdtwenty[m] | which i currently do |
| 15:48.18 | vasc | that should probably be kept in a separate data structure |
| 15:48.36 | vasc | or we should just make the evaluation work faster so that we don't need to cache that in the first place. |
| 15:50.20 | *** join/#brlcad skat00sh (uid103741@gateway/web/irccloud.com/x-owitdxmtukjpgbew) | |
| 15:51.35 | vasc | btw the opencl branch in svn already has the boolean tree code. |
| 15:51.47 | vasc | so you might get merge conflicts because of that. |
| 15:52.02 | vasc | you'll have to manually apply the patch. |
| 15:53.11 | mdtwenty[m] | ok thanks for the heads up |
| 15:53.19 | mdtwenty[m] | will apply it manually |
| 16:01.33 | mdtwenty[m] | i will just remove some debug code and clean it up a bit before submitting the patch to the opencl branch |
| 16:02.23 | mdtwenty[m] | or should i finish rt_boolfinal first? (overlapping partitions) |
| 16:03.37 | vasc | well |
| 16:04.09 | vasc | just show me what you have |
| 16:04.29 | vasc | there should be some debug and log code in there |
| 16:05.52 | vasc | some of those should be kept i think |
| 16:15.01 | mdtwenty[m] | yeah it's probably a good idea to have some debug code in there |
| 16:15.12 | mdtwenty[m] | posted a file: rt_bool_final.patch (62KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/uomSumphCREJqrcPmarkhjYX> |
| 16:15.19 | mdtwenty[m] | this is what i got right now |
| 16:15.50 | mdtwenty[m] | i'm already using a dynamic bitvector for the regiontable |
| 16:16.30 | vasc | yeah but that's against trunk/ not branches/opencl right? |
| 16:16.51 | mdtwenty[m] | ah yes it is not against the opencl branch yet |
| 16:17.44 | mdtwenty[m] | i just checked out the opencl branch from the svn, so if you give me some time i can apply the patch manually and send it to you |
| 16:18.16 | vasc | there also seems to be some noise |
| 16:18.28 | vasc | like in the rendering function rt.cl |
| 16:18.44 | vasc | because you indented some code differently some things are reported as changed even though the code is the same |
| 16:20.43 | mdtwenty[m] | oh i see.. will fix that |
| 16:30.08 | vasc | this code is missing in opencl boolweave: |
| 16:30.16 | vasc | if (segp->seg_stp->st_aradius < INFINITY && |
| 16:30.16 | vasc | <PROTECTED> |
| 16:30.16 | vasc | <PROTECTED> |
| 16:30.18 | vasc | ... |
| 16:32.01 | vasc | also this is kinda strange: |
| 16:32.02 | vasc | <PROTECTED> |
| 16:32.32 | vasc | <PROTECTED> |
| 16:33.33 | vasc | is that the circular 'pointer' in the head and tail of the list again? |
| 16:33.55 | vasc | shouldn't it be, like, 'j = head_pp' or whatever? |
| 16:37.33 | mdtwenty[m] | hum yes, j = head_pp is equivalent, but shorter |
| 16:38.00 | mdtwenty[m] | i already have it that way in rt_boolfinal |
| 16:39.15 | vasc | also name those functions boolweave and boolfinal |
| 16:39.22 | vasc | don't use different names than the ANSI C names |
| 16:39.35 | vasc | it makes it harder to understand which is which |
| 16:40.03 | vasc | and yeah eval_partitions/rt_boolfinal needs to be cleaned up |
| 16:40.13 | vasc | and some things need to be refactored. |
| 16:43.45 | vasc | yeah we'll need an overlap handler... |
| 16:44.04 | vasc | ah well. |
| 16:44.13 | vasc | fix the things i said and make a patch against branches/opencl |
| 16:44.33 | vasc | i'll then apply the boolweave code, but the boolfinal code still needs some work |
| 16:45.35 | mdtwenty[m] | sure will do that |
| 17:08.19 | Stragus | Hrm... perhaps these INFINITY should be replaced with FLT_MAX or DBL_MAX |
| 17:08.39 | Stragus | Some chips become up to 900 times slower when performing a floating point operation where an infinity or NaN is involved |
| 17:08.51 | Stragus | (including comparisons) |
| 17:10.01 | vasc | well, we want bug for bug compatibility with the ANSI C code though. |
| 17:10.27 | Stragus | Right. That comment was mostly for the CPU side of things |
| 17:10.31 | vasc | i mean if we wanted speed we wouldn't be using doubles on a GPU in the first place. |
| 17:10.38 | Stragus | Eh, indeed |
| 17:11.10 | vasc | still it's a reasonable argument, considering we have some defines for doubles as floats as an option. |
| 17:11.13 | Stragus | But even modern Intel CPUs are 200-300 times slower with infinities. AMD doesn't care about inf/NaN |
| 17:11.47 | vasc | so you say #undef INFINITY and #define INFINITY DBL_MAX? |
| 17:12.30 | Stragus | Basically yes, though I would personally prefer some custom foo_MAX macro rather than replacing INFINITY |
| 17:13.39 | vasc | in opencl that would be MAXFLOAT it seems |
| 17:14.09 | vasc | ah no |
| 17:14.11 | vasc | that's SP |
| 17:14.15 | Stragus | nods |
| 17:15.04 | vasc | doesn't the compiler do those kinds of optimizations you use ffast-math or something? |
| 17:15.09 | vasc | if you use |
| 17:15.18 | Stragus | No, that would change the behavior of the code |
| 17:16.28 | vasc | i think ffast-math disables those checks though |
| 17:16.28 | Stragus | I'm not currently aware of the performance of Inf/NaN on GPUs, but it's a good idea to avoid these in any case |
| 17:16.45 | vasc | https://gcc.gnu.org/wiki/FloatingPointMath |
| 17:16.46 | gcibot | [ FloatingPointMath - GCC Wiki ] |
| 17:17.20 | vasc | "In addition GCC offers the -ffast-math flag which is a shortcut for several options, presenting the least conforming but fastest math mode. It enables -fno-trapping-math, -funsafe-math-optimizations, -ffinite-math-only, -fno-errno-math, -fno-signaling-nans, -fno-rounding-math, -fcx-limited-range and -fno-signed-zeros." |
| 17:17.20 | vasc | -ffinite-math-only |
| 17:17.56 | Stragus | Hrm -ffinite-math-only, indeed |
| 17:18.19 | Stragus | Though I'm aware of a performance gain when I removed infinity checks on code that was using ffast-math several years ago |
| 17:19.16 | Stragus | And assuming you do want to check for overflows, a check against DBL_MAX is still a good idea, eh |
| 17:19.19 | vasc | well it wouldn't be the first time a compiler wouldn't behave like it's supposed to. |
| 17:21.53 | mdtwenty[m] | hm, should i use DBL_MAX then? |
| 17:22.26 | vasc | keep it as is for now |
| 17:22.26 | vasc | we don't want even more weird behavior right now. |
| 17:22.26 | vasc | leave the optimizations for later. |
| 17:22.38 | vasc | just make a note for it. |
| 17:22.54 | mdtwenty[m] | ok :) |
| 17:42.30 | vasc | given the amount of things which need to be optimized... |
| 17:42.45 | vasc | we'll go for algorithmic improvements first. |
| 17:46.54 | vasc | there's lots of O(N^2) things, spurious memory usage and access and things like that which need to be fixed first |
| 17:48.00 | vasc | besides i'm not sure the compiler doesn't do that in the first place |
| 17:48.21 | vasc | without looking at the assembly code output i wouldn't make changes like that. |
| 17:51.12 | Stragus | Ah right... but I think perhaps that should all have been done before porting to GPUs? |
| 17:52.47 | Stragus | Optimization and debugging on GPUs is more troublesome, it's easier to settle the algorithm and code on CPUs first |
| 17:52.47 | Stragus | And I'm not entirely convinced about GPU performance considering the need for double precision, compared to CPU AVX2 |
| 17:53.18 | vasc | well. this is OpenCL. it runs on the CPU as well. in fact mdtwenty[m] has been running and testing it there. |
| 17:54.03 | vasc | and i did prototype the boolean evaluator in ANSI C before mdtwenty[m] ported it over. |
| 17:54.30 | vasc | the boolean weaving code is also a relatively straightforward port. |
| 17:54.38 | Stragus | All right then |
| 17:54.52 | vasc | the boolfinal might not be, because i suspect the current way of doing it isn't optimal. but mdtwenty[m]'s still working on that. |
| 17:55.33 | vasc | also it's not that GPUs are slow at doubles. it's that NVIDIA cripples the budget GPUs. |
| 17:56.43 | vasc | have you looked at the DP FLOPS of the V100? |
| 17:57.08 | Stragus | Sure sure, it's all right in the $3k GPUs |
| 17:57.09 | vasc | 7014 GFLOPS on the PCIe V100 |
| 17:57.16 | vasc | DP GFLOPS |
| 17:58.10 | Stragus | On consumer GPUs, I have had better performance using dual-float math instead of doubles (for similar accuracy) |
| 17:58.16 | vasc | how many GFLOPS do those Skylake server processors or the AMD Epyc have? |
| 17:58.52 | vasc | it says in this article |
| 17:58.56 | vasc | http://www.eetimes.com/document.asp?doc_id=1331988&page_number=2 |
| 17:58.57 | gcibot | [ Intel Skylake Counters AMD Epyc | EE Times ] |
| 17:59.11 | vasc | 32 FLOPS/cycle |
| 18:00.14 | vasc | DP FLOPS |
| 18:00.27 | vasc | 28 cores |
| 18:00.30 | vasc | 3.6 GHz |
| 18:01.22 | Stragus | So about half of a $3k GPU |
| 18:01.28 | vasc | 3225.6 DP GFLOPS/peak? |
| 18:01.33 | Stragus | Right |
| 18:02.30 | vasc | https://en.wikichip.org/wiki/intel/xeon_platinum/8180 |
| 18:02.31 | gcibot | [ Xeon Platinum 8180 - Intel - WikiChip ] |
| 18:02.37 | vasc | Release Price: $10009.00 |
| 18:02.49 | vasc | GPU wins that one. |
| 18:03.20 | vasc | let's see how much the entry level costs. |
| 18:03.21 | Stragus | Screw you Intel :), I'm waiting for dual-socket Epyc motherboards to upgrade my desktop |
| 18:04.06 | vasc | https://en.wikichip.org/wiki/intel/xeon_bronze |
| 18:04.07 | gcibot | [ Xeon Bronze - Intel - WikiChip ] |
| 18:04.09 | vasc | those are cheaper. |
| 18:04.47 | vasc | also half the clockspeed. |
| 18:04.47 | Stragus | And I know professional grade GPUs are better at double precision. But on a typical desktop machine with a gaming GPU, it's not so clear |
| 18:05.06 | vasc | yeah, it's a good question, what's better on a typical desktop. |
| 18:05.29 | vasc | which is one reason why we went for opencl and not cuda, despite all the extra work in it. |
| 18:05.35 | vasc | because of the crap libraries. |
| 18:06.30 | Stragus | The best double-precision-like performance I had on gaming GPUs was a healthy mix of regular floats and dual-floats |
| 18:06.59 | Stragus | Just in case you could use that, here's my code for double-float arithmetics: http://www.rayforce.net/ddm.h |
| 18:09.21 | vasc | what's the license? |
| 18:09.54 | vasc | oh it uses sse |
| 18:09.54 | Stragus | "Do whatever you want with it", I should put a header |
| 18:09.54 | Stragus | No no, that was just some optional optimization attempt |
| 18:10.25 | Stragus | The double-double math is also useful when you need higher accuracy than double but with decent performance |
| 18:10.48 | vasc | ok i'll keep this under my hat |
| 18:10.51 | vasc | :-) |
| 18:10.56 | vasc | now really bbl |
| 18:11.02 | Stragus | :) Okay |
| 18:24.54 | *** part/#brlcad mdtwenty[m] (mdtwentyma@gateway/shell/matrix.org/x-iwpdlhgermucyhhk) | |
| 20:13.44 | *** join/#brlcad merzo (~merzo@136-3-133-95.pool.ukrtel.net) | |
| 21:28.54 | *** join/#brlcad infobot (~infobot@rikers.org) | |
| 21:28.55 | *** topic/#brlcad is GSoC students: if you have a question, ask and wait for an answer ... responses may take minutes or hours. Ask and WAIT. ;) | |
| 21:34.44 | *** join/#brlcad kintel (~kintel@unaffiliated/kintel) | |
| 21:45.21 | Notify | BRL-CAD:starseeker * 69942 (brlcad/trunk/misc/CMake/BRLCAD_Targets.cmake brlcad/trunk/src/libbu/CMakeLists.txt): Tweak astyle validation logic |