00:31.39 |
Notify |
BRL-CAD Wiki:Bhollister * 9240
/wiki/User:Bhollister/DevLogAug2015: |
00:32.22 |
Notify |
BRL-CAD:vasco_costa * 65844
(brlcad/trunk/src/librt/librt_private.h
brlcad/trunk/src/librt/primitives/arb8/arb8.c and 12 others): pass
struct with primitive data to opencl as an initial step to an AoS
device primitive database. move constants into common.cl. |
01:10.43 |
*** join/#brlcad vasc__
(~vasc@bl8-192-46.dsl.telepac.pt) |
02:56.19 |
*** join/#brlcad sofat
(~androirc@101.214.213.146) |
03:17.59 |
*** join/#brlcad gurwinder
(~chatzilla@117.212.50.212) |
03:24.02 |
starseeker |
sofat? |
03:24.05 |
starseeker |
nuts |
04:06.25 |
gurwinder |
brlcad: Hi, I have exported ehy and epa; now
moving on to rhc, rpc and bot. |
04:07.26 |
Notify |
BRL-CAD:vasco_costa * 65845
(brlcad/trunk/src/librt/librt_private.h
brlcad/trunk/src/librt/primitives/arb8/arb8.c and 14 others):
generic opencl solid shot handler. refactored code to remove
duplicates. |
04:08.34 |
vasc__ |
that's that. i think i did all i could on
trunk without changing the apis. |
04:08.52 |
vasc__ |
i think i'll continue on the branch |
04:11.52 |
Notify |
BRL-CAD Wiki:Vasco.costa * 9241
/wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug
*/ |
04:15.41 |
brlcad |
vasc__: looks pretty good |
04:16.31 |
brlcad |
vasc__: please also assign your patches to
yourself, mark them as accepted, and close them out too (denote the
commit revision in a comment) as you commit them |
04:24.35 |
vasc__ |
i think i did that to all the patches i had on
the tracker |
04:25.37 |
vasc__ |
that i committed |
04:25.50 |
vasc__ |
i guess i can assign to myself the patches i
didn't commit as well |
04:27.31 |
vasc__ |
so basically the thing to do next is to store
the scene database on the gpu |
04:29.37 |
vasc__ |
i'm going to redo the database code. |
04:29.54 |
Notify |
BRL-CAD Wiki:Vasco.costa * 9242
/wiki/User:Vasco.costa/GSoC15/logs: /* Development Status
*/ |
04:29.55 |
vasc__ |
the patches i committed to trunk already did
about half the work on that |
04:31.05 |
vasc__ |
there's a generic shot callback that calls the
primitive-specific callback, which uses a pointer to the memory
region where the primitive data is |
04:31.42 |
vasc__ |
so the only thing left to do is to actually
allocate device memory and copy the data to it |
04:35.59 |
vasc__ |
i guess i could commit the scan code i have to
trunk. but the thing is nothing will call it until i do the rest of
the code |
04:36.47 |
vasc__ |
anyway not today |
04:39.34 |
vasc__ |
i also found out that the nvidia opencl
compiler doesn't handle large .cl files very well... |
04:39.56 |
vasc__ |
so i had to split them up and compile them
separately and then link them |
05:04.49 |
brlcad |
interesting -- any idea on what the limit
is/was? |
05:08.27 |
Stragus |
That's weird, I have compiled huge .cu (CUDA)
files. Very large device functions or just files? |
05:09.11 |
Stragus |
And what error or problem were you
experiencing? |
05:15.31 |
Notify |
BRL-CAD:vasco_costa * 65846
(brlcad/trunk/src/librt/primitives/ehy/ehy_shot.cl
brlcad/trunk/src/librt/primitives/ell/ell_shot.cl and 3 others):
load large opencl vectors on demand to reduce stack footprint per
function call. |
05:15.50 |
vasc__ |
it just gave me some ptxas error, "function is being
called with wrong number of arguments" or something |
05:16.11 |
vasc__ |
which usually means that the code is calling a
function that isn't defined anywhere |
05:16.32 |
Stragus |
Output the PTX assembly and inspect
it |
05:16.35 |
vasc__ |
nah |
05:16.40 |
vasc__ |
it works this way |
05:16.51 |
vasc__ |
and i know the AMD GPU compiler also croaks on
large files, so |
05:17.19 |
Stragus |
It's probably more an issue of a single huge
kernel rather than large files |
05:17.19 |
vasc__ |
i tried concatenating it all into one file and
it didn't work |
05:17.30 |
vasc__ |
it probably tried inlining everything
yes |
05:17.35 |
vasc__ |
and then it croaked |
05:17.45 |
Stragus |
Right. Which shouldn't happen |
05:19.21 |
Notify |
BRL-CAD Wiki:Vasco.costa * 9243
/wiki/User:Vasco.costa/GSoC15/logs: /* Week 11 : 3 Aug-9 Aug
*/ |
05:19.39 |
vasc__ |
yeah. i could have inspected the assembly
but... |
05:19.44 |
vasc__ |
*snore* |
05:20.04 |
vasc__ |
it actually makes more sense this
way |
05:20.33 |
vasc__ |
i was just including everything into a huge
file |
05:21.47 |
Stragus |
Actual function calls are slow on most GPU
hardware |
05:21.54 |
Stragus |
But yes, not a big issue at the
moment |
05:23.12 |
vasc__ |
i hope i don't have memory alignment issues
anymore |
05:24.06 |
vasc__ |
everything should be aligned on 8-byte
boundaries |
05:26.26 |
vasc__ |
damned huge doubles |
05:27.22 |
Stragus |
I don't even see how that could be an issue in
the first place |
05:27.33 |
vasc__ |
ah |
05:27.36 |
Stragus |
On CPU as well, you definitely want 8-byte
alignment for your doubles |
05:27.48 |
Stragus |
In fact, you should want 32-byte alignment
for bundles of 4 doubles |
05:28.17 |
vasc__ |
right. i considered that. there's just a
teensy little issue with that and AoS |
05:28.42 |
Stragus |
On GPU, it should be bundles of 32
doubles |
05:28.58 |
vasc__ |
ah the triangle ray tracers were so much
simpler |
05:29.01 |
Stragus |
(Which is obviously also quite fine on
CPU) |
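One way to read this exchange: a plain array of structs (AoS) stores x, y, z of one item side by side, while a 4-wide SIMD register (4 doubles, 32 bytes) or a 32-thread GPU warp wants the same field of consecutive items to be contiguous. The "teensy little issue" is presumably that AoS cannot give both. A hybrid array-of-structures-of-arrays (AoSoA) layout is a common compromise; the sketch below assumes a bundle width of 4 for the CPU case (a GPU version would use 32, as suggested above). `point_bundle`, `aos_to_aosoa` and `aosoa_x` are illustrative names only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Lanes per bundle: 4 doubles = 32 bytes (one AVX register) on a CPU;
 * a GPU version would use 32 to match a warp. */
#define BUNDLE 4

/* Array of structures: x, y, z of one point sit next to each other. */
struct point_aos {
    double x, y, z;
};

/* Array of structures of arrays: each bundle holds BUNDLE x values, then
 * BUNDLE y values, then BUNDLE z values, so one aligned vector load of
 * x[] fetches the x coordinate of BUNDLE consecutive points. */
struct point_bundle {
    alignas(32) double x[BUNDLE];
    double y[BUNDLE];
    double z[BUNDLE];
};

/* Repack n AoS points into (n + BUNDLE - 1) / BUNDLE bundles.
 * Lanes past the last point are zero-filled. */
void aos_to_aosoa(const struct point_aos *in, size_t n, struct point_bundle *out)
{
    size_t nbundles = (n + BUNDLE - 1) / BUNDLE;
    for (size_t b = 0; b < nbundles; b++) {
        for (size_t lane = 0; lane < BUNDLE; lane++) {
            size_t i = b * BUNDLE + lane;
            out[b].x[lane] = i < n ? in[i].x : 0.0;
            out[b].y[lane] = i < n ? in[i].y : 0.0;
            out[b].z[lane] = i < n ? in[i].z : 0.0;
        }
    }
}

/* Scalar access to point i's x coordinate in the AoSoA layout. */
double aosoa_x(const struct point_bundle *p, size_t i)
{
    assert(p != NULL);
    return p[i / BUNDLE].x[i % BUNDLE];
}
```

Each bundle is 96 bytes here, a multiple of 32, so consecutive bundles stay 32-byte aligned without extra padding.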
05:29.51 |
Stragus |
I thought space partitioning traversal would
be the tricky part, and it doesn't matter what kind of primitives
are there |
05:30.14 |
vasc__ |
sure |
05:30.16 |
Stragus |
Then you just call the intersection for
whatever primitive encountered |
05:30.24 |
vasc__ |
but remember each primitive has a different
size |
05:30.46 |
Stragus |
Does that make a big difference? |
05:31.00 |
vasc__ |
i'm just going to allocate a contiguous memory
block and stuff all that primitive data in there in serialized
form |
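A minimal sketch of that packing step, assuming each primitive's data is a plain-old-data record of known size. Records of different sizes are appended to one growable byte buffer, each start is rounded up to 8 bytes so doubles stay aligned, and an offset table records where each one begins. `prim_pack`, `pack_add` and `pack_free` are hypothetical helpers, not librt API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PACK_ALIGN 8  /* align each record start for doubles */

struct prim_pack {
    unsigned char *data;  /* one contiguous block, later copied to the device */
    size_t used, cap;
    size_t *offsets;      /* byte offset of each record within data */
    size_t count, offcap;
};

/* Append one record; returns its index, or (size_t)-1 if allocation fails. */
size_t pack_add(struct prim_pack *p, const void *rec, size_t size)
{
    assert(rec != NULL || size == 0);
    size_t start = (p->used + PACK_ALIGN - 1) & ~(size_t)(PACK_ALIGN - 1);
    size_t end = start + size;
    if (end > p->cap) {
        size_t ncap = p->cap ? p->cap : 64;
        while (ncap < end)
            ncap *= 2;
        unsigned char *nd = realloc(p->data, ncap);
        if (!nd)
            return (size_t)-1;
        memset(nd + p->cap, 0, ncap - p->cap);  /* padding bytes stay zero */
        p->data = nd;
        p->cap = ncap;
    }
    if (p->count == p->offcap) {
        size_t nocap = p->offcap ? p->offcap * 2 : 16;
        size_t *no = realloc(p->offsets, nocap * sizeof *no);
        if (!no)
            return (size_t)-1;
        p->offsets = no;
        p->offcap = nocap;
    }
    memcpy(p->data + start, rec, size);
    p->offsets[p->count] = start;
    p->used = end;
    return p->count++;
}

void pack_free(struct prim_pack *p)
{
    free(p->data);
    free(p->offsets);
    memset(p, 0, sizeof *p);
}
```

Because the block holds only offsets and never host pointers, it can be copied to device memory as-is (for example with OpenCL's `clEnqueueWriteBuffer`), and the device code finds each primitive by adding its offset to the buffer base.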
05:31.06 |
Stragus |
Good call |
05:31.27 |
Stragus |
So, make sure all sizeof() values are aligned:
(sizeof(foo)+0xf)&~0xf |
05:31.44 |
Stragus |
Probably better with some kind of macro,
eh |
05:31.47 |
vasc__ |
yeah that was my problem |
05:31.58 |
vasc__ |
i hope it's magically working now |
05:32.14 |
vasc__ |
if it isn't i'll use the thing you
said |
05:33.09 |
vasc__ |
so they'll all be multiples of 8
bytes |
05:33.36 |
Stragus |
That's 16-byte alignment, actually; I typed it
instinctively for SSE |
05:33.49 |
Stragus |
<PROTECTED> |
05:33.57 |
vasc__ |
yeah |
05:34.08 |
vasc__ |
so it's 0x7 then |
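The rounding trick wrapped in a macro, as suggested above: for a power-of-two alignment `a`, adding `a - 1` and clearing the low bits rounds up to the next multiple of `a`, so the mask is `0xf` for 16-byte (SSE) alignment and `0x7` for the 8-byte alignment doubles need. `ALIGN_UP`, `struct foo` and `foo_stride` are illustrative names, and the trick is only valid when the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of a, where a is a power of two:
 * ALIGN_UP(n, 16) is (n + 0xf) & ~0xf, ALIGN_UP(n, 8) is (n + 0x7) & ~0x7. */
#define ALIGN_UP(n, a) (((size_t)(n) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical primitive record: a double plus an odd-sized tail. */
struct foo {
    double radius;
    char flags[3];
};

/* Stride to use when packing struct foo records back to back with the
 * given alignment (which must be a power of two). */
size_t foo_stride(size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ALIGN_UP(sizeof(struct foo), align);
}
```

C already pads `sizeof(struct foo)` to the struct's own alignment, so the macro matters mostly when a stronger alignment is wanted (16 or 32 bytes) or when record sizes are computed by hand rather than with `sizeof`.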
05:36.08 |
vasc__ |
the grid was a bad idea... |
05:36.19 |
Stragus |
:( |
05:36.28 |
vasc__ |
i forgot the primitives can be quite expensive
to intersect |
05:36.34 |
Stragus |
Yes |
05:36.41 |
Stragus |
I didn't think it was a good idea
either |
05:36.50 |
vasc__ |
a bvh would be a lot better |
05:36.59 |
Stragus |
Spatial partitioning is good for triangles
because intersection is so cheap |
05:37.06 |
Stragus |
But these NURBS and stuff are a different
beast |
05:37.35 |
vasc__ |
it would probably take weeks to do a modern
bvh builder though |
05:37.44 |
vasc__ |
a gpu one at least |
05:38.06 |
Stragus |
Meh, it can be built on the CPU, then upload
the big chunk of memory to the GPU |
05:38.16 |
Stragus |
But yes, it's still a massive amount of
work |
05:38.17 |
vasc__ |
yeah that is probably a lot more
doable |
05:38.53 |
Stragus |
My CUDA raytracer was also building on the
CPU. Everything was packed/interleaved into just one big chunk of
memory. You could raytrace on the CPU with it, on the GPU, save it
to disk, whatever |
05:41.17 |
Stragus |
Since everything was packed into a big chunk
of memory, you could have per-primitive "extra data" packed within
the graph, and so on. That extra data could vary between
primitives |
05:41.22 |
vasc__ |
i actually know quite a lot about gpu bvh
builders, although i'm a grid guy |
05:41.25 |
Stragus |
That sounds like a good approach for a CSG
raytracer too |
05:41.45 |
Stragus |
I'm a graph person, I don't like hierarchies
:p |
05:43.00 |
vasc__ |
i'll think about whether to use the grids or
not |
05:43.18 |
vasc__ |
i would like to use that goliath scene as a
benchmark of sorts |
05:43.26 |
vasc__ |
it ain't gonna cut it without some
acceleration scheme |
05:43.31 |
vasc__ |
i think it has like 200 primitives |
05:43.56 |
vasc__ |
which is kinda low but |
05:44.01 |
Stragus |
How much time do you have to implement
this? |
05:44.14 |
vasc__ |
i have the code done. i did it a couple of
weeks back |
05:44.15 |
vasc__ |
oh |
05:44.20 |
vasc__ |
well until the end of this month |
05:44.41 |
vasc__ |
that's why i went with the grids to begin
with |
05:44.43 |
Stragus |
My opinion is that any part of the whole task
is better done very well and correctly, or left to someone
else |
05:44.44 |
vasc__ |
it's a lot simpler |
05:44.56 |
Stragus |
(But my opinion has no weight whatsoever on
this) |
05:45.27 |
Stragus |
Half-good solutions have to be rewritten
anyway |
05:45.50 |
vasc__ |
i've never believed that a system was ever
complete anyway |
05:46.06 |
vasc__ |
even if i coded the current best bvh, in a
couple of years it could be crap |
05:46.36 |
Stragus |
It might then be suboptimal but it won't be
crap :p |
05:47.04 |
vasc__ |
a low-resolution grid is probably
okay-ish |
05:47.18 |
vasc__ |
i think my issue is i was using too fine a
subdivision |
05:47.19 |
Stragus |
I wouldn't personally use a BVH, but this is
complex and there's too little time to explore new ideas |
05:47.34 |
Stragus |
Sure, it can work |
05:47.48 |
vasc__ |
well, it's just that the current code uses
mailboxing and crap like that |
05:47.57 |
vasc__ |
if we used the bvh the mailboxing wouldn't be
needed anymore |
05:48.19 |
vasc__ |
not that i'll use mailboxing with the grid
either |
05:48.25 |
vasc__ |
i'll just multiple-intersect things |
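For context, mailboxing in a grid traverser means remembering, per primitive, the ID of the last ray tested against it, so a primitive that straddles several cells is intersected only once per ray. Simply re-intersecting costs extra tests but needs no writable per-primitive state, which matters on a GPU where many rays run concurrently. The sketch only counts intersection tests to show the difference; the `cell` struct, `count_tests`, and the convention that ray IDs start at 1 with a zero-filled mailbox are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grid cell: the indices of the primitives overlapping it. */
struct cell {
    const int *prims;
    size_t nprims;
};

/* Walk the cells a ray visits, in order, and count the primitive
 * intersection tests performed. With a mailbox, mailbox[p] holds the id
 * of the last ray tested against primitive p (0 = never), so repeats are
 * skipped. Pass mailbox == NULL to re-test primitives in every cell. */
size_t count_tests(const struct cell *visited, size_t ncells,
                   unsigned *mailbox, unsigned ray_id)
{
    assert(ray_id != 0);  /* 0 is reserved for "never tested" */
    size_t tests = 0;
    for (size_t c = 0; c < ncells; c++) {
        for (size_t k = 0; k < visited[c].nprims; k++) {
            int p = visited[c].prims[k];
            if (mailbox) {
                if (mailbox[p] == ray_id)
                    continue;  /* already tested against this ray */
                mailbox[p] = ray_id;
            }
            tests++;  /* a real traverser would intersect primitive p here */
        }
    }
    return tests;
}
```

Either way, a hit found while processing one cell may lie inside a later cell, so grid traversers typically accept a hit only if its distance falls within the current cell before stopping early.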
05:48.28 |
Stragus |
I agree it requires object partitioning
rather than spatial partitioning |
05:48.34 |
vasc__ |
ar ar |
05:48.38 |
Stragus |
It's the whole "hierarchy" thing I disagree
with |
05:48.48 |
vasc__ |
well it is csg after all |
05:49.03 |
Stragus |
My raytracer never writes a byte to any shared
or global memory during traversal, until the hit callback is
called |
05:49.14 |
Stragus |
Any kind of hierarchy involves building a
stack of some sort, and GPUs hate that |
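One well-known way to walk a hierarchy without a per-ray stack (a different technique from the graph design described here, whose details the log does not give) is to flatten the BVH in depth-first pre-order and give every node a "skip" index: the node to visit next when the ray misses this node's box, or after a leaf is processed. Traversal state is then a single integer, and nothing is written to memory except hits. The `flat_node` layout and function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened BVH node in depth-first pre-order. On a box hit, an inner node
 * continues at index + 1 (its first child); a leaf reports its primitive
 * and continues at skip. On a miss, continue at skip. skip == -1 ends. */
struct flat_node {
    double lo[3], hi[3];  /* axis-aligned bounding box */
    int prim;             /* primitive index for leaves, -1 for inner nodes */
    int skip;             /* next node after this node's whole subtree */
};

/* Slab test for the ray org + t * dir, t >= 0, with inv[a] = 1 / dir[a]. */
static int hit_box(const struct flat_node *n, const double org[3], const double inv[3])
{
    double t0 = 0.0, t1 = 1e300;
    for (int a = 0; a < 3; a++) {
        double tn = (n->lo[a] - org[a]) * inv[a];
        double tf = (n->hi[a] - org[a]) * inv[a];
        if (tn > tf) { double tmp = tn; tn = tf; tf = tmp; }
        if (tn > t0) t0 = tn;
        if (tf < t1) t1 = tf;
        if (t0 > t1) return 0;
    }
    return 1;
}

/* Collect the primitives of every leaf box the ray hits, in pre-order.
 * Returns how many were written to out (at most max). Uses no stack. */
size_t traverse(const struct flat_node *nodes, const double org[3],
                const double inv[3], int *out, size_t max)
{
    size_t n = 0;
    int i = 0;
    while (i != -1) {
        const struct flat_node *node = &nodes[i];
        if (hit_box(node, org, inv)) {
            if (node->prim >= 0) {
                if (n < max) out[n++] = node->prim;
                i = node->skip;
            } else {
                i = i + 1;  /* first child follows its parent in pre-order */
            }
        } else {
            i = node->skip;  /* skip the whole subtree */
        }
    }
    return n;
}
```

The trade-off is a fixed visiting order: unlike a stack-based traverser, this cannot choose to visit the nearer child first, so a first-hit query may test more boxes.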
05:49.15 |
vasc__ |
kewl |
05:49.38 |
vasc__ |
yeah. if you use a lot of stack space you
reduce the number of threads you can spawn |
05:49.53 |
vasc__ |
coz you have limited L1 cache for registers
and stack |
05:50.07 |
Stragus |
The L1 cache and registers are
independent |
05:50.17 |
Stragus |
But the stack is stored in global memory and
it is SLOW, even with that crappy L1 cache |
05:50.20 |
vasc__ |
yeah it's split |
05:50.29 |
vasc__ |
global? |
05:50.44 |
vasc__ |
that's lame |
05:50.48 |
Stragus |
No no, the L1 and shared memory share the
same chunk of on-chip "cache" |
05:51.12 |
Stragus |
Registers are totally independent, and a whole
lot faster |
05:51.41 |
vasc__ |
i thought you could choose the amount that
goes into registers and remaining L1 on driver loading or
something |
05:51.54 |
Stragus |
You choose how to split between L1 and shared
memory |
05:52.27 |
vasc__ |
ah no, it's the shared memory, yeah |
05:52.34 |
vasc__ |
uhoh |
05:52.53 |
Stragus |
Anyhow, experimenting with novel ideas takes
more time than you have |
05:52.55 |
vasc__ |
so that's why function calls are slow as
heck |
05:52.59 |
Stragus |
Indeed |
05:53.03 |
Stragus |
It's terrible |
05:53.28 |
*** join/#brlcad milamber
(~devlin@2602:306:8094:9360:b941:e8cd:a8d8:db8d) |
05:56.55 |
vasc__ |
the current code uses a shitton of
temporaries |
05:57.07 |
Stragus |
GPUs have tons of registers |
05:57.17 |
Stragus |
Memory is slow, but registers are free
:p |
05:57.50 |
vasc__ |
yeah but if you use a lot of registers you
can't spawn as many threads |
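Back-of-the-envelope arithmetic for that trade-off, assuming an NVIDIA multiprocessor with a 65,536-entry 32-bit register file and a cap of 2,048 resident threads (the figures for Kepler/Maxwell-class parts; other GPUs differ). It ignores register allocation granularity, block size and shared-memory limits, so the result is only an upper bound. The constants and `max_resident_threads` are illustrative, not a vendor API.

```c
#include <assert.h>

/* Assumed per-multiprocessor limits for Kepler/Maxwell-class NVIDIA GPUs. */
#define REGS_PER_SM        65536  /* 32-bit registers in the register file */
#define MAX_THREADS_PER_SM 2048
#define WARP_SIZE          32

/* Upper bound on resident threads per multiprocessor when each thread
 * needs regs_per_thread 32-bit registers (a double occupies two).
 * Simplified: ignores allocation granularity, block size, shared memory. */
int max_resident_threads(int regs_per_thread)
{
    assert(regs_per_thread > 0);
    int threads = REGS_PER_SM / regs_per_thread;
    threads = (threads / WARP_SIZE) * WARP_SIZE;  /* whole warps only */
    return threads < MAX_THREADS_PER_SM ? threads : MAX_THREADS_PER_SM;
}
```

Under these assumptions, growing a kernel from 32 to 64 registers per thread halves the resident threads (2048 to 1024) available to hide memory latency. For CUDA code, `nvcc -Xptxas -v` prints the per-kernel register count.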
05:58.26 |
Stragus |
Can you ask OpenCL about register usage? We
can with CUDA |
05:58.42 |
vasc__ |
yeah CUDA has some compiler flag |
05:59.02 |
Stragus |
Hum... I meant a runtime thing on the kernel,
but it's true I'm using the low-level driver API |
05:59.07 |
vasc__ |
you can pass flags to the opencl compiler. i'm
not sure if you can use the same flags as CUDA though. |
05:59.35 |
vasc__ |
nvcc has some compiler flag that says how many
registers a kernel uses |
05:59.44 |
Stragus |
Well, that works |
06:00.04 |
vasc__ |
but that's for cuda |
06:00.17 |
vasc__ |
it's too early to think about that |
06:06.49 |
vasc__ |
later |
06:21.47 |
Notify |
BRL-CAD Wiki:Shaina7837 * 9244
/wiki/User:Shainasabarwal/GSoC15/logs: /* 27 July */ |
06:28.28 |
*** join/#brlcad teepee
(~teepee@unaffiliated/teepee) |
06:59.03 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
07:14.53 |
*** join/#brlcad milamber1
(~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165) |
07:45.50 |
*** join/#brlcad teepee--
(bc5c2134@gateway/web/freenode/ip.188.92.33.52) |
09:24.40 |
starseeker |
brlcad: http://www.cmake.org/pipermail/cmake/2011-June/045233.html |
09:31.03 |
starseeker |
in fact, they caution in the docs not to list
outputs of custom commands in multiple targets:
http://www.cmake.org/cmake/help/v3.0/command/add_custom_command.html |
09:31.39 |
starseeker |
and I see we are doing just that with the
obj-g code |
09:33.39 |
starseeker |
and I'm doing it in one of the step
directories as well |
09:33.56 |
starseeker |
OK, that's probably it then |
09:34.31 |
starseeker |
I'll wade into fixing that ASAP |
11:57.58 |
Notify |
BRL-CAD:carlmoore * 65847
(brlcad/trunk/AUTHORS brlcad/trunk/src/librt/primitives/arb8/arb8.c
and 8 others): remove trailing white space, and fix
spelling |
12:21.42 |
*** join/#brlcad konrado
(~konro@41.205.22.13) |
12:36.12 |
*** join/#brlcad Ch3ck_
(~Ch3ck@154.70.99.98) |
13:00.02 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
13:37.48 |
*** join/#brlcad sofat
(~sofat@202.164.45.208) |
13:51.27 |
sofat |
brlcad, I need your help with Google custom
search |
13:51.38 |
sofat |
please reply if you're free |
13:53.06 |
*** join/#brlcad sofat_
(~androirc@49.138.113.71) |
13:59.29 |
sofat |
starseeker, I have submitted a new patch for
the build system; it also solves the problem you told me about. I
made a presentation.xsl.in file to auto-generate the presentation.xsl
file, so please review this patch. Patch no: 401 |
14:54.34 |
Notify |
BRL-CAD:ejno * 65848
brlcad/trunk/include/bu/opt.h: add parentheses around macro
arguments |
15:07.16 |
*** join/#brlcad sofat
(~sofat@202.164.45.208) |
15:17.34 |
*** join/#brlcad sofat
(~sofat@202.164.45.208) |
15:41.15 |
*** join/#brlcad bhollister2
(~brad@2601:647:cb01:9750:d5ba:1393:eae0:ec4b) |
15:45.43 |
*** join/#brlcad sofat
(~sofat@49.138.113.71) |
16:03.48 |
*** join/#brlcad sofat
(~sofat@101.215.79.175) |
16:34.50 |
*** join/#brlcad sofat
(~sofat@101.215.79.175) |
16:58.59 |
*** join/#brlcad sofat
(~sofat@101.215.79.175) |
17:23.40 |
*** join/#brlcad sofat
(~sofat@202.164.45.208) |
17:44.08 |
*** join/#brlcad sofat
(~sofat@202.164.45.204) |
17:50.06 |
sofat |
brlcad, hello |
17:50.32 |
sofat |
I'd like to discuss something, please
reply |
17:56.13 |
archivist |
methinks someone nags too much |
18:27.33 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
18:29.00 |
*** join/#brlcad vasc
(~VASC@bl8-192-46.dsl.telepac.pt) |
18:33.50 |
*** join/#brlcad milamber
(~devlin@104-9-73-54.lightspeed.cicril.sbcglobal.net) |
18:41.19 |
*** join/#brlcad sofat
(~sofat@202.164.45.212) |
19:02.51 |
Notify |
BRL-CAD:dhoward * 65849
(brlcad/trunk/include/rt/misc.h brlcad/trunk/src/libged/facetize.c
brlcad/trunk/src/librt/screened_poisson.cpp): Added edge sampling
to SPR facetization code. |
19:08.03 |
Notify |
BRL-CAD Wiki:Deekaysharma * 9245
/wiki/User:Deekaysharma/logs: |
19:10.48 |
*** join/#brlcad dracarys983
(dracarys98@nat/iiit/x-xnzkponofzzwciso) |
19:22.49 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
20:08.01 |
Notify |
BRL-CAD:ejno * 65850
(brlcad/trunk/include/bu/opt.h brlcad/trunk/include/gcv/api.h and
13 others): initial integration of libgcv plugin argument
processing |
20:10.22 |
*** join/#brlcad milamber
(~devlin@2602:306:8094:9360:ed0a:f53f:4f21:2165) |
20:16.34 |
*** part/#brlcad Ch3ck_
(~Ch3ck@154.70.99.98) |
20:24.17 |
Notify |
BRL-CAD:ejno * 65851
(brlcad/trunk/src/conv/gcv/gcv.c
brlcad/trunk/src/libgcv/conv/fastgen4/fastgen4_write.cpp): correct
conversion mode of fastgen4_write |
20:34.04 |
Notify |
BRL-CAD:ejno * 65852
brlcad/trunk/src/conv/gcv/gcv.c: correctly set
options_data |
21:47.06 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
22:03.11 |
*** join/#brlcad konrado
(~konro@41.205.22.53) |
22:07.21 |
Notify |
BRL-CAD Wiki:202.164.45.212 * 9246
/wiki/User:Hiteshsofat/GSoc15/log_developmen: |
23:06.46 |
*** join/#brlcad kintel
(~kintel@unaffiliated/kintel) |
23:22.32 |
*** join/#brlcad vasc_
(~VASC@bl8-192-46.dsl.telepac.pt) |