00:19.42 |
*** join/#brlcad infobot
(~infobot@rikers.org) |
00:19.42 |
*** topic/#brlcad is GSoC
students: if you have a question, ask and wait for an answer ...
responses may take minutes or hours. Ask and WAIT.
;) |
08:55.54 |
*** join/#brlcad infobot
(~infobot@rikers.org) |
08:55.54 |
*** topic/#brlcad is GSoC
students: if you have a question, ask and wait for an answer ...
responses may take minutes or hours. Ask and WAIT.
;) |
09:22.05 |
*** join/#brlcad mdtwenty[m]
(mdtwentyma@gateway/shell/matrix.org/x-iwpdlhgermucyhhk) |
09:23.32 |
*** join/#brlcad Caterpillar2
(~caterpill@unaffiliated/caterpillar) |
09:59.54 |
*** join/#brlcad merzo
(~merzo@252-22-132-95.pool.ukrtel.net) |
10:35.58 |
*** join/#brlcad teepee
(~teepee@unaffiliated/teepee) |
11:15.02 |
*** join/#brlcad teepee
(~teepee@unaffiliated/teepee) |
11:31.33 |
*** join/#brlcad teepee
(~teepee@unaffiliated/teepee) |
11:37.21 |
Notify |
BRL-CAD:Amritpal singh * 10098
/wiki/User:Amritpal_singh/GSoC17/logs: /* Coding Period
*/ |
11:39.42 |
Notify |
BRL-CAD:Amritpal singh * 10099
/wiki/User:Amritpal_singh/GSoC17/logs: /* Coding Period
*/ |
12:01.06 |
*** join/#brlcad teepee
(~teepee@unaffiliated/teepee) |
12:44.16 |
*** join/#brlcad gabbar1947
(uid205515@gateway/web/irccloud.com/x-jhwmzfdcblkcoioz) |
12:56.45 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
13:37.12 |
*** join/#brlcad deep-book-gk_
(~1wm_su@94.242.252.58) |
13:37.42 |
*** part/#brlcad deep-book-gk_
(~1wm_su@94.242.252.58) |
13:42.33 |
*** join/#brlcad yorik
(~yorik@2804:431:f720:9892:290:f5ff:fedc:3bb2) |
13:46.45 |
*** join/#brlcad teepee
(~teepee@unaffiliated/teepee) |
15:17.34 |
*** join/#brlcad vasc
(~vasc@bl4-6-201.dsl.telepac.pt) |
15:19.21 |
*** join/#brlcad vasc
(~vasc@bl4-6-201.dsl.telepac.pt) |
15:19.24 |
vasc |
hello mdtwenty[m] |
15:20.28 |
mdtwenty[m] |
hello |
15:20.38 |
mdtwenty[m] |
sorry about yesterday, i was having lunch |
15:21.02 |
vasc |
no problem. i had to leave early in the
afternoon as well. |
15:21.14 |
vasc |
how's the code going? |
15:23.24 |
mdtwenty[m] |
hm so after translating the bits of the
boolean tree i could get the operators scene to work for some views
(i found later that some views have a strange behaviour) |
15:23.53 |
vasc |
so what's the difference between translating
the bits and not translating the bits, in terms of visual
output? |
15:23.59 |
Notify |
BRL-CAD Wiki:95.18.89.88 * 10100
/wiki/User:Mariomeissner/logs: |
15:25.27 |
mdtwenty[m] |
if i don't translate the bits, the render
always differs, because the boolean tree changes and so do the
partitions evaluated |
15:26.03 |
mdtwenty[m] |
but when i translate the tree the output is
fixed |
15:26.26 |
vasc |
ok |
15:26.48 |
vasc |
well |
15:27.22 |
vasc |
i think it's like this. we clean up the code a
bit and try to put it into the SVN branch. |
15:27.37 |
vasc |
there's still some bugs, but i think it's
close to optimal. |
15:28.02 |
vasc |
at least as far as an alpha release can
go |
15:28.51 |
vasc |
from now on, make your code against the opencl
branch:
https://svn.code.sf.net/p/brlcad/code/brlcad/branches/opencl/src/librt/ |
15:28.52 |
gcibot |
[ p/brlcad/code - Revision 69941:
/brlcad/branches/opencl/src/librt ] |
15:29.09 |
mdtwenty[m] |
yes, i think so! |
15:29.17 |
vasc |
download that, and make a patch
against that version |
15:29.27 |
vasc |
then i'll review it and we'll apply
it |
15:29.55 |
vasc |
this isn't good enough to go in the trunk yet,
but i think we need to keep it stored someplace. |
15:30.50 |
*** join/#brlcad merzo
(~merzo@136-3-133-95.pool.ukrtel.net) |
15:31.12 |
mdtwenty[m] |
will do that! |
15:31.31 |
mdtwenty[m] |
this is one example of a bug that is
happening |
15:32.03 |
mdtwenty[m] |
uploaded an image:
operators.png (138KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/QRtTVgTwEFAWVKOtedoSguQa> |
15:32.03 |
mdtwenty[m] |
in some views these "holes" happen |
15:32.53 |
vasc |
that's really weird. |
15:33.27 |
mdtwenty[m] |
hm it is not that weird with the
wireframe |
15:33.31 |
mdtwenty[m] |
sec |
15:34.05 |
vasc |
it's like it's evaluating the tree
wrong? |
15:34.28 |
vasc |
the segments seem ok |
15:35.24 |
mdtwenty[m] |
uploaded an image:
operators_wire.png (192KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/QEPwQVVQWceIapNElAnxOMfn> |
15:35.37 |
mdtwenty[m] |
it seems like the geometry in yellow is
interfering |
15:35.57 |
vasc |
yeah that helps. like i thought, it's like it's
evaluating the boolean csg wrong but the segments seem to be
ok. |
15:36.09 |
mdtwenty[m] |
this can be an error with the regiontable, or
the lack of support for overlapping partitions |
15:36.22 |
mdtwenty[m] |
i think, not sure yet |
15:37.13 |
vasc |
well once we get this into svn, we need to fix
that issue with the region's primitive id translation |
15:37.41 |
vasc |
and we need to get some kind of translator so
that we can compare the output of the intermediate steps in the
opencl and ansi c code. |
15:37.59 |
vasc |
or just fix the bugs. |
15:38.38 |
vasc |
lack of support for overlapping
partitions? |
15:38.58 |
vasc |
in which part of the code is that supposed to
be in? |
15:39.36 |
mdtwenty[m] |
it's a part of the rt_boolfinal
function |
15:39.39 |
vasc |
oh |
15:39.53 |
vasc |
but the boolweave is feature complete
right? |
15:40.51 |
mdtwenty[m] |
i didn't implement it yet because i was trying
to understand the FASTGEN regions |
15:41.18 |
vasc |
ah. THAT |
15:41.23 |
mdtwenty[m] |
yeah i think boolweave is complete
now |
15:41.32 |
vasc |
ignore those. in fact just strip the FASTGEN
code from the opencl port. |
15:42.15 |
vasc |
FASTGEN is like legacy support for an older
solid modelling system that the US military used from what I
understand. |
15:42.35 |
vasc |
fact is, i didn't even port the FASTGEN
primitives. |
15:42.48 |
vasc |
so it's kinda pointless to implement the
FASTGEN csg code. |
15:43.41 |
vasc |
if for whatever reason it's necessary to
implement FASTGEN someday, then we'll think about it. |
15:43.55 |
mdtwenty[m] |
hm i see. i was not sure if it was important
for the ocl code so thanks for clarifying |
15:44.24 |
vasc |
BRL-CAD has a FASTGEN import module that
imports FASTGEN scenes. |
15:44.44 |
vasc |
the database format for FASTGEN is kinda
weird. it has like special primitives and the way the rendering
works is also different. |
15:46.15 |
Notify |
BRL-CAD Wiki:Mariomeissner * 10101
/wiki/User:Mariomeissner/logs: |
15:46.32 |
mdtwenty[m] |
ok |
15:46.55 |
mdtwenty[m] |
the other day you said something about storing
the results of the boolean evaluation in the struct
partition |
15:47.48 |
mdtwenty[m] |
which i currently do |
15:48.18 |
vasc |
that should probably be kept in a separate
data structure |
15:48.36 |
vasc |
or we should just make the evaluation work
faster so that we don't need to cache that in the first
place. |
15:50.20 |
*** join/#brlcad skat00sh
(uid103741@gateway/web/irccloud.com/x-owitdxmtukjpgbew) |
15:51.35 |
vasc |
btw the opencl branch in svn already has the
boolean tree code. |
15:51.47 |
vasc |
so you might get merge conflicts because of
that. |
15:52.02 |
vasc |
you'll have to manually apply the
patch. |
15:53.11 |
mdtwenty[m] |
ok thanks for the heads up |
15:53.19 |
mdtwenty[m] |
will apply it manually |
16:01.33 |
mdtwenty[m] |
i will just remove some debug code from the
code and will clean it a bit before submitting the patch to the
opencl branch |
16:02.23 |
mdtwenty[m] |
or should i finish rt_boolfinal first?
(overlapping partitions) |
16:03.37 |
vasc |
well |
16:04.09 |
vasc |
just show me what you have |
16:04.29 |
vasc |
there should be some debug and log code in
there |
16:05.52 |
vasc |
some of those should be kept i think |
16:15.01 |
mdtwenty[m] |
yeah it's probably a good idea to have some debug
code in there |
16:15.12 |
mdtwenty[m] |
posted a file:
rt_bool_final.patch (62KB) <https://matrix.org/_matrix/media/v1/download/matrix.org/uomSumphCREJqrcPmarkhjYX> |
16:15.19 |
mdtwenty[m] |
this is what i got right now |
16:15.50 |
mdtwenty[m] |
i'm already using a dynamic bitvector for the
regiontable |
16:16.30 |
vasc |
yeah but that's against trunk/ not
branches/opencl right? |
16:16.51 |
mdtwenty[m] |
ah yes it is not against the opencl branch
yet |
16:17.44 |
mdtwenty[m] |
i just checked out the opencl branch from the
svn, so if you give me some time i can apply the patch manually and
send it to you |
16:18.16 |
vasc |
there also seems to be some noise |
16:18.28 |
vasc |
like in the rendering function rt.cl |
16:18.44 |
vasc |
because you indented some code differently
some things are reported as changed even though the code is the
same |
16:20.43 |
mdtwenty[m] |
oh i see.. will fix that |
16:30.08 |
vasc |
this code is missing in opencl
boolweave: |
16:30.16 |
vasc |
if (segp->seg_stp->st_aradius <
INFINITY && |
16:30.16 |
vasc |
<PROTECTED> |
16:30.16 |
vasc |
<PROTECTED> |
16:30.18 |
vasc |
... |
16:32.01 |
vasc |
also this is kinda strange: |
16:32.02 |
vasc |
<PROTECTED> |
16:32.32 |
vasc |
<PROTECTED> |
16:33.33 |
vasc |
is that the circular 'pointer' in the head and
tail of the list again? |
16:33.55 |
vasc |
shouldn't it be, like, 'j = head_pp' or
whatever? |
16:37.33 |
mdtwenty[m] |
hum yes, j = head_pp is equivalent, but
shorter |
16:38.00 |
mdtwenty[m] |
i already have it that way in
rt_boolfinal |
16:39.15 |
vasc |
also name those functions boolweave and
boolfinal |
16:39.22 |
vasc |
don't use different names than the ANSI C
names |
16:39.35 |
vasc |
it makes it harder to understand which is
which |
16:40.03 |
vasc |
and yeah eval_partitions/rt_boolfinal needs to
be cleaned up |
16:40.13 |
vasc |
and some things need to be
refactored. |
16:43.45 |
vasc |
yeah we'll need an overlap
handler... |
16:44.04 |
vasc |
ah well. |
16:44.13 |
vasc |
fix the things i said and make a patch against
branches/opencl |
16:44.33 |
vasc |
i'll then apply the boolweave code, but the
boolfinal code still needs some work |
16:45.35 |
mdtwenty[m] |
sure will do that |
17:08.19 |
Stragus |
Hrm... perhaps these INFINITY should be
replaced with FLT_MAX or DBL_MAX |
17:08.39 |
Stragus |
Some chips become up to 900 times slower when
performing a floating point operation where an infinity or NaN is
involved |
17:08.51 |
Stragus |
(including comparisons) |
17:10.01 |
vasc |
well, we want bug for bug compatibility with
the ANSI C code though. |
17:10.27 |
Stragus |
Right. That comment was mostly for the CPU
side of things |
17:10.31 |
vasc |
i mean if we wanted speed we wouldn't be using
doubles on a GPU in the first place. |
17:10.38 |
Stragus |
Eh, indeed |
17:11.10 |
vasc |
still it's a reasonable argument, considering
we have some defines for doubles as floats as an option. |
17:11.13 |
Stragus |
But even modern CPU Intel chips are 200-300
times slower with infinities. AMD doesn't care about
inf/NaN |
17:11.47 |
vasc |
so you say #undef INFINITY and #define
INFINITY DBL_MAX? |
17:12.30 |
Stragus |
Basically yes, though I would personally
prefer some custom foo_MAX macro rather than replacing
INFINITY |
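A minimal sketch of the custom-macro idea suggested above, next to the existing style of check. The names RT_DIST_MAX, radius_below_infinity and radius_below_max are hypothetical, chosen for illustration only; they are not BRL-CAD or OpenCL identifiers. It also shows why the swap is not a drop-in replacement: the two comparisons disagree when the value is exactly DBL_MAX, which matters if bug-for-bug compatibility with the ANSI C code is wanted.

```c
#include <float.h>
#include <math.h>

/* Hypothetical sentinel (not an existing BRL-CAD macro): the largest
 * finite double, used instead of redefining INFINITY itself. */
#define RT_DIST_MAX DBL_MAX

/* Existing style of check: true for every finite radius, including DBL_MAX. */
static int radius_below_infinity(double aradius)
{
    return aradius < INFINITY;
}

/* Sentinel style of check: true only for radii strictly below DBL_MAX.
 * Returns 0 for DBL_MAX, +infinity and NaN. */
static int radius_below_max(double aradius)
{
    return aradius < RT_DIST_MAX;
}
```

For ordinary radii both helpers return 1; they differ only at the boundary value DBL_MAX. Whether either form is actually faster on a given chip would have to be measured.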
17:13.39 |
vasc |
in opencl that would be MAXFLOAT it
seems |
17:14.09 |
vasc |
ah no |
17:14.11 |
vasc |
that's SP |
17:14.15 |
Stragus |
nods |
17:15.04 |
vasc |
doesn't the compiler do those kinds of
optimizations you use ffast-math or something? |
17:15.09 |
vasc |
if you use |
17:15.18 |
Stragus |
No, that would change the behavior of the
code |
17:16.28 |
vasc |
i think ffast-math disables those checks
though |
17:16.28 |
Stragus |
I'm not currently aware of the performance of
Inf/NaN on GPUs, but it's a good idea to avoid these in any
case |
17:16.45 |
vasc |
https://gcc.gnu.org/wiki/FloatingPointMath |
17:16.46 |
gcibot |
[ FloatingPointMath - GCC Wiki ] |
17:17.20 |
vasc |
"In addition GCC offers the -ffast-math flag
which is a shortcut for several options, presenting the least
conforming but fastest math mode. It enables -fno-trapping-math,
-funsafe-math-optimizations, -ffinite-math-only, -fno-errno-math,
-fno-signaling-nans, -fno-rounding-math, -fcx-limited-range and
-fno-signed-zeros." |
17:17.20 |
vasc |
-ffinite-math-only |
17:17.56 |
Stragus |
Hrm -ffinite-math-only, indeed |
17:18.19 |
Stragus |
Though I'm aware of a performance gain when I
removed infinity checks on code that was using ffast-math several
years ago |
17:19.16 |
Stragus |
And assuming you do want to check for
overflows, a check against DBL_MAX is still a good idea,
eh |
17:19.19 |
vasc |
well it wouldn't be the first time a compiler
wouldn't behave like it's supposed to. |
17:21.53 |
mdtwenty[m] |
hm, should i use DBL_MAX then? |
17:22.26 |
vasc |
keep it as is for now |
17:22.26 |
vasc |
we don't want even more weird behavior right
now. |
17:22.26 |
vasc |
leave the optimizations for later. |
17:22.38 |
vasc |
just make a note for it. |
17:22.54 |
mdtwenty[m] |
ok :) |
17:42.30 |
vasc |
given the amount of things which need to be
optimized... |
17:42.45 |
vasc |
we'll go for algorithmic improvements
first. |
17:46.54 |
vasc |
there's lots of O(N^2) things, spurious memory
usage and access and things like that which need to be fixed
first |
17:48.00 |
vasc |
besides i'm not sure the compiler doesn't do
that in the first place |
17:48.21 |
vasc |
without looking at the assembly code output i
wouldn't make changes like that. |
17:51.12 |
Stragus |
Ah right... but I think perhaps that should
all have been done before porting to GPUs? |
17:52.47 |
Stragus |
Optimization and debugging on GPUs is more
troublesome, it's easier to settle the algorithm and code on CPUs
first |
17:52.47 |
Stragus |
And I'm not entirely convinced about GPU
performance considering the need for double precision, compared to
CPU AVX2 |
17:53.18 |
vasc |
well. this is OpenCL. it runs on the CPU as
well. in fact mdtwenty[m] has been running and testing it
there. |
17:54.03 |
vasc |
and i did prototype the boolean evaluator in
ANSI C before mdtwenty[m] ported it over. |
17:54.30 |
vasc |
the boolean weaving code is also a relatively
straightforward port. |
17:54.38 |
Stragus |
All right then |
17:54.52 |
vasc |
the boolfinal might not be, because i suspect
the current way of doing it isn't optimal. but mdtwenty[m]'s still
working on that. |
17:55.33 |
vasc |
also it's not that GPUs are slow at doubles.
it's that NVIDIA cripples the budget GPUs. |
17:56.43 |
vasc |
have you looked at the DP FLOPS of the
V100? |
17:57.08 |
Stragus |
Sure sure, it's all right in the $3k
GPUs |
17:57.09 |
vasc |
7014 GFLOPS on the PCIe V100 |
17:57.16 |
vasc |
DP GFLOPS |
17:58.10 |
Stragus |
On consumer GPUs, I have had better
performance using dual-float math instead of doubles (for similar
accuracy) |
17:58.16 |
vasc |
how many GFLOPS do those Skylake server
processors or the AMD Epyc have? |
17:58.52 |
vasc |
it says in this article |
17:58.56 |
vasc |
http://www.eetimes.com/document.asp?doc_id=1331988&page_number=2 |
17:58.57 |
gcibot |
[ Intel Skylake Counters AMD Epyc | EE Times
] |
17:59.11 |
vasc |
32 FLOPS/cycle |
18:00.14 |
vasc |
DP FLOPS |
18:00.27 |
vasc |
28 cores |
18:00.30 |
vasc |
3.6 GHz |
18:01.22 |
Stragus |
So about half of a $3k GPU |
18:01.28 |
vasc |
3225.6 DP GFLOPS/peak? |
18:01.33 |
Stragus |
Right |
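The peak figure above is just the product of the three numbers quoted in the chat (32 DP FLOPS/cycle, 28 cores, 3.6 GHz). A small sketch of that arithmetic, with a hypothetical helper name peak_dp_gflops; it is a theoretical upper bound under those quoted figures, not a measurement.

```c
/* Theoretical peak in GFLOPS:
 * FLOPs per cycle per core * number of cores * clock in GHz. */
static double peak_dp_gflops(double flops_per_cycle, int cores, double ghz)
{
    return flops_per_cycle * cores * ghz;
}
```

Calling peak_dp_gflops(32.0, 28, 3.6) gives about 3225.6, which is roughly 0.46 of the 7014 DP GFLOPS quoted for the PCIe V100, hence "about half".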
18:02.30 |
vasc |
https://en.wikichip.org/wiki/intel/xeon_platinum/8180 |
18:02.31 |
gcibot |
[ Xeon Platinum 8180 - Intel - WikiChip
] |
18:02.37 |
vasc |
Release Price $10009.00 |
18:02.49 |
vasc |
GPU wins that one. |
18:03.20 |
vasc |
let's see how much the entry level
costs. |
18:03.21 |
Stragus |
Screw you Intel :), I'm waiting for
dual-socket Epyc motherboards to upgrade my desktop |
18:04.06 |
vasc |
https://en.wikichip.org/wiki/intel/xeon_bronze |
18:04.07 |
gcibot |
[ Xeon Bronze - Intel - WikiChip ] |
18:04.09 |
vasc |
those are cheaper. |
18:04.47 |
vasc |
also half the clockspeed. |
18:04.47 |
Stragus |
And I know professional grade GPUs are better
at double precision. But on a typical desktop machine with a gaming
GPU, it's not so clear |
18:05.06 |
vasc |
yeah, it's a good question, what's better on a
typical desktop. |
18:05.29 |
vasc |
which is one reason why we went for opencl and
not cuda, despite all the extra work in it. |
18:05.35 |
vasc |
because of the crap libraries. |
18:06.30 |
Stragus |
The best double-precision-like performance I
had on gaming GPUs was a healthy mix of regular floats and
dual-floats |
18:06.59 |
Stragus |
Just in case you could use that, here's my
code for double-float arithmetics: http://www.rayforce.net/ddm.h |
18:09.21 |
vasc |
what's the license? |
18:09.54 |
vasc |
oh it uses sse |
18:09.54 |
Stragus |
"Do whatever you want with it", I should put a
header |
18:09.54 |
Stragus |
No no, that was just some optional
optimization attempt |
18:10.25 |
Stragus |
The double-double math is also useful when you
need higher accuracy than double but with decent
performance |
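For readers unfamiliar with the dual-float idea, here is a minimal sketch of one common building block: a value stored as the unevaluated sum of two floats, with addition built on Knuth's TwoSum. This is not taken from ddm.h; the names dfloat, two_sum and df_add are hypothetical. It assumes strict IEEE float evaluation, so options such as -ffast-math may break it by letting the compiler simplify the error terms away.

```c
/* A value represented as the unevaluated sum hi + lo of two floats. */
typedef struct {
    float hi;
    float lo;
} dfloat;

/* Knuth's TwoSum: hi is the rounded sum a + b, lo is the rounding
 * error of that sum, so hi + lo equals a + b exactly. */
static dfloat two_sum(float a, float b)
{
    dfloat r;
    float bb;
    r.hi = a + b;
    bb = r.hi - a;
    r.lo = (a - (r.hi - bb)) + (b - bb);
    return r;
}

/* Simple dual-float addition: add the high parts exactly, fold the low
 * parts into the error term, then renormalise with another TwoSum.
 * More careful variants exist with tighter error bounds. */
static dfloat df_add(dfloat x, dfloat y)
{
    dfloat s = two_sum(x.hi, y.hi);
    float lo = s.lo + (x.lo + y.lo);
    return two_sum(s.hi, lo);
}
```

Adding 1.0f and 1e-8f in plain float arithmetic gives 1.0f, because the small term is below half a unit in the last place; with df_add the small term survives in the lo field.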
18:10.48 |
vasc |
ok i'll keep this under my hat |
18:10.51 |
vasc |
:-) |
18:10.56 |
vasc |
now really bbl |
18:11.02 |
Stragus |
:) Okay |
18:24.54 |
*** part/#brlcad mdtwenty[m]
(mdtwentyma@gateway/shell/matrix.org/x-iwpdlhgermucyhhk) |
20:13.44 |
*** join/#brlcad merzo
(~merzo@136-3-133-95.pool.ukrtel.net) |
21:28.54 |
*** join/#brlcad infobot
(~infobot@rikers.org) |
21:28.55 |
*** topic/#brlcad is GSoC
students: if you have a question, ask and wait for an answer ...
responses may take minutes or hours. Ask and WAIT.
;) |
21:34.44 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
21:45.21 |
Notify |
BRL-CAD:starseeker * 69942
(brlcad/trunk/misc/CMake/BRLCAD_Targets.cmake
brlcad/trunk/src/libbu/CMakeLists.txt): Tweak astyle validation
logic |