i've done a first attempt of moving bu_mappedfile
to SoA. here's the diff. i'm looking for feedback
in struct bu_mapped_file
, i've added a pointer to the SoA of mapped files and the index of the mapped file in this SoA. i kept the old members so that functions working with a struct bu_mapped_file
keep working (so i didn't remove the "old" API). however, now, when modifying a member in a struct bu_mapped_file
, you also need to update it in the array (tbl[idx]
)
i don't know what the correct way of going about this is. new code still needs to be aware of the new API, and old code which writes to the array without using libbu's functions will become incorrect
the is_active member is there for "deleting" in O(1). on the one hand it doesn't reuse array slots (which consumes a lot of memory), but on the other, deletions are faster (but i don't think there are many deletions of bu_mapped_files, so maybe it's better the other way)
have_apbuf
is there because when realloc'ing the member arrays (e.g., apbuf
), the new memory is not zero'd, so the old check if (mp->apbuf)
does not work. so i'm using have_apbuf
to keep track of whether the apbuf
member is usable or not
i think the flags (such as is_active
and have_apbuf
, and in this case, maybe dont_restat
as well) can be put in an int of flags, although in this instance, the single chars occupy less memory
when i want to add a new item to the array, that's 14 lines of very similar code doing very little. it's a bit tiresome to write and i think error-prone. don't really know what to do to it though, or if you care about this :-? maybe generate some macros at build time using python?
i think the diff also includes changes to bu_temp_file_list
. hmm... they're very similar
also, indentation/white space is off. i didn't pass it through ws.sh
since i'm just looking for feedback at this stage
i've done a first attempt of moving
bu_mappedfile
to SoA. here's the diff. i'm looking for feedback
@Cezar what's the reason for duplicating the guts of bu_mapped_file in the _tbl instead of embedding an array of bu_mapped_file?
we don't need to accommodate callers that directly access struct elements -- that is obviously a problem for backwards-compatibility. a question to think through is how affected callers will be, whether they could/should be updated to go through accessor functions or if there's some good generalization etc. another question is whether we can fix memory management while we're at it reworking the struct. using the PIMPL pattern and accessing only via functions, it would be good to eliminate dynamic allocations (if at all possible).
if you were to re-design bu_mappedfile and completely hid the implementation, what might that header file look like?
@Cezar what's the reason for duplicating the guts of bu_mapped_file in the _tbl instead of embedding an array of bu_mapped_file?
i thought this is what moving to a structure of arrays meant, an array for each member of the struct. or do you mean having the member arrays inside bu_mapped_file and an array of bu_mapped_file in _tbl? i think i misunderstood something
we don't need to accommodate callers that directly access struct elements -- that is obviously a problem for backwards-compatibility. a question to think through is how affected callers will be, whether they could/should be updated to go through accessor functions or if there's some good generalization etc.
if i can replace code directly accessing members of bu_mapped_files with accessor functions, that would simplify the structs. i could make one bu_mapped_file be a pair of (pointer to tbl, idx in tbl) and it should be enough to implement the accessor functions. and i think the pair should be enough for all SoAs
another question is whether we can fix memory management while we're at it reworking the struct. using the PIMPL pattern and accessing only via functions, it would be good to eliminate dynamic allocations (if at all possible).
i'm not familiar with PIMPL, but from what i'm reading, it seems to decrease compilation times (since headers aren't modified so users of the header don't need to be recompiled) and help backwards-compatibility (even if we modify the implementation, the definitions the users see are the same). but i don't really see how i could eliminate dynamic allocations with this
if you were to re-design bu_mappedfile and completely hid the implementation, what might that header file look like?
to start, i think the pair structure above and PIMPL for backwards-compatibility would be good. and there could be other things specific to bu_mapped_file that might be changed/improved
but i'm still curious about this. seems like i misunderstood something about how to do SoA
@Cezar what's the reason for duplicating the guts of bu_mapped_file in the _tbl instead of embedding an array of bu_mapped_file?
i thought this is what moving to a structure of arrays meant, an array for each member of the struct. or do you mean having the member arrays inside bu_mapped_file and an array of bu_mapped_file in _tbl? i think i misunderstood something
i thought this is what moving to a structure of arrays meant, an array for each member of the struct. or do you mean having the member arrays inside bu_mapped_file and an array of bu_mapped_file in _tbl? i think i misunderstood something
It is both/either -- but what caught my eye is the duplication. Maybe I misread the patch but it looked like there was a bu_mapped_file and a _tbl with pointers/arrays for all the same elements.
there are a variety of ways to go about it, but one thing to be aware of is how memory is managed. ideally, we want to avoid dynamic allocation for at least the typical use cases -- that is nearly always faster. for something like a mapped file handle/table/list/set/etc where there are typically very few of them, we can avoid it very easily by creating a fixed size array, e.g., of size 8, and then falling back to dynamic allocation when that is exceeded.
if you treat the struct as opaque and only access through functions, then we won't need to keep two public structs (one for the handle, one for arrays of them
looking at the current mapped file public API, you should notice that it currently relies on some global state (e.g., notice how bu_free_mapped_files() somehow will free something it was not given) -- that's bad in itself but gets at the heart of how to convert bu_mappedfile to AoS or SoA or something else. for example, there's nowhere in the public API (include/bu/mapped_file.h) that indicates an inherent need for there to be a list or set of bu_mappedfiles -- that is entirely a construct for the backend implementation that iterates over it's list of bu_mappedfiles to look for a match, to free them, etc
that means in this specific instance, the "fix" for bu_list aliasing can be as simple as elimiate the bu_list in the struct and change the container in the implementation (bu_mapped_file_list in src/libbu/mappedfile.c)
at that point, we could just convert it to an std::map (converting impl to c++) or use a plain C array and dynaimcally allocate more bu_mappedfiles as needed, keeping track of how many allocated and how many used.
i'm not familiar with PIMPL, but from what i'm reading, it seems to decrease compilation times (since headers aren't modified so users of the header don't need to be recompiled) and help backwards-compatibility (even if we modify the implementation, the definitions the users see are the same). but i don't really see how i could eliminate dynamic allocations with this
pimpl doesn't eliminate dynamic allocation by itself, it just hides the implementation so you can change things more easily in the future. it's not strictly necessary for bu_mappedfile, but would simplify memory management on the backend since the immediate goal really is just to eliminate the bu_list aliasing
if you were to re-design bu_mappedfile and completely hid the implementation, what might that header file look like?
to start, i think the pair structure above and PIMPL for backwards-compatibility would be good. and there could be other things specific to bu_mapped_file that might be changed/improved
do you think you could write that header first? and any "hidden" structures that would live in the .c/.cpp implementation file?
if you treat the struct as opaque and only access through functions, then we won't need to keep two public structs (one for the handle, one for arrays of them
yeah, i disliked the duplication in the structs as well. but i thought that it was ok for bu_mapped_file to bypass the API/functions, so i tried to keep the same members. i see that that's not necessary, i will make the changes
you really can probably get away with just eliminating the bu_list in the public struct and then having a container of them in the implementation file
now i also see what you mean about memory allocation and pimpl
that addresses the immediate issue (aliasing of bu_list) ... and there are dozens of other bu_list to tackle next ;)
yep :D i'll make the changes after i'm done tracking the rtwizard bug
regarding the api, i was thinking of something similar to getsockopt/setsockopt, so instead of having lots of functions like
bu_mappedfile_set/get_dont_restat(struct mapped_file *[, int new_val])
for each member, we could just have two functions
bu_mappedfile_set/get(struct mapped_file *, member_enum_t which_member[, void *new_val])
with something like that, i think it can even be abstracted further, having a generic setter/getter, and enums + callbacks for specific structs
@Cezar that'd be great except it's punting on API design and would introduce a runtime crash potential (incorrect void *marshaling) -- they don't necessarily need access to every field we previous had in a struct mappedfile.
going down the list, 'l' can simply be eliminated (and thus we're done with this struct from an aliasing perspective) because the .c/.cpp file can simply hold a list/array of them, accessed via 'name'+'appl'. you'd hide all the implementation fields via pimpl with a backend struct, which is 'buf', 'buflen', 'is_mapped', 'modtime', 'uses', and 'dont_restart'. that leaves just 'apbuf' and 'abbuflen' which is user data that should be set/got via function.
for added measure, I'll note mapped_file doesn't conform with hacking on the LIB_GROUP_...() naming convention -- it's one of the older apis that needs to be cleaned up
regardless, I wouldn't get too wrapped around the axel on mapped_file API ... that was supposed to be just a quick one for addressing the bu_list aliasing. there are literally dozens of these and not time to spend days on each thinking about all the ways they could be improved... :)
the fix for struct bu_mapped_file is to just remove the bu_list and make the backend track them differently. I'd probably just make a plain array that holds N of them and then have a dynamic array that is realloced if we exceed N
i think most of what we discuss about mapoed_file can be carried over to the more central structures
i understand what you’re saying about just removing ‘l’, but i don‘t think that would carry over as well to the other structs
that is true
but so what? :)
we're not necessarily looking for a universal solution, at least if we were -- it would be how to replace bu_list
but that's specifically not possible in any meaningful way because of how bu_list was designed -- it's too abstract and used in very different ways throughout the code
that's why the original exercise was to just tackle one struct like mapped_file, then do another, and another, to see if there's a common pattern that might warrant API
with mapped_file, the best solution is elimination of the bu_list ... that doesn't imply more API
the common pattern might just be to eliminate bu_list everywhere and convert all the .c files to .cpp files with std:: containers
do you remember what problem is being tackled?
well, i thought that the point was eliminating bu_list altogether with the goal of improving cache behaviour
i think i was wrong about the first part though :D
the problem being tackled is automatic vectorization
by the compiler
automatic vectorization is disabled by certain things -- pointer aliasing is one of them
bu_list's are commonly aliased (hence why they are always the first struct element)
it's also a very old method that is unfamiliar to most C/C++ programmers coming right out of school, which makes API and code hard to understand
AND the code is potentially cache incoherent, but that depends on how bu_list was used
oh, ok, i see i did forget the initial goal
by a quick count, there are 139 bu_list instances in our public API...
i understand how bu_list aliasing works, and i did use it previously, it’s what happens with some sockets structs too, but i never thought about vectorization and their performance implications
rather, 153, but only 139 that matter
so @Cezar ... we're past the second evaluation. I think it's time for a step back regardless.
oh, and before going down that route -- just add that there look to be at least 53 of those 139 in libbu, at least per the original project scope ;)
okay, so taking a step back -- if you could do anything, work on anything -- what would that be?
forgetting about bu_lists for a moment
well, ideally on something that offers the most performance gains
i started on this because you said at some point that it would improve performance by an order of magnitute, so it was appealing
heh, well that's good to hear :)
so sticking with performance, the problem with getting that order is the workload
the gain cannot happen until some subset of the 139 (the ones involved in ray tracing) have aliasing eliminated
but it's undeniably a lot of work. at this rate, you'd need to do about one bu_list per hour, or at least 5-10 per day to get there before gsoc is over (~17 days)
and that seems unlikely given the first one has been about 2+ weeks :)
that's not saying anything negative about you @Cezar ... the first ones when going down a new development route will always take the longest. the question I have for you is whether now that you've learned so much about this is whether to keep running this route or change things up.
to stay the path, it's going to be a race against time and you'll probably want to install the Intel compiler (as it has outstanding vectorization/aliasing reporting)
alternatively, there is a performance gain to be had in a few other specific places like boolean evaluation, ray dispatching, and perceptively via display presentation
there's probably just enough time to make progress in just one area
that's not saying anything negative about you @Cezar ... the first ones when going down a new development route will always take the longest. the question I have for you is whether now that you've learned so much about this is whether to keep running this route or change things up.
@Sean well, i can't really say i've learnt much. i was aware of most of what i've done, but the reason it took so much was that i hit three bugs (the deadlock, the compilation issue, and the failing tests on the mac) unrelated to the SoA work which have taken a lot of time to fix
it may be my fault because i've managed the time incorrectly (i also spent a week reading something that didn't offer much return), but there wasn't much figuring out with regards to bu_mappedfile
to stay the path, it's going to be a race against time and you'll probably want to install the Intel compiler (as it has outstanding vectorization/aliasing reporting)
installing the intel compiler is not a hurdle (i think), but i did not know that it had outstanding vectorization reporting. when i started this part of the work, i did use vectorization reporting, albeit from clang. i recognized it would be useful, but i think i (mis)understood from one of your messages back then that i shouldn't (need to?) do that :-?
alternatively, there is a performance gain to be had in a few other specific places like boolean evaluation, ray dispatching, and perceptively via display presentation
if you think it would be better to switch focus, i would be fine. but i think it would be useful for me to have an outline of the work that i should do for these specific places. i can't tell you right now i would like to work on boolean evaluation because i don't know what exactly that would entail (and similarly for the other two areas)
another fault on my part would be that i should've asked for an outline (or how you would see it) when i first started, but i thought i could figure it out as i went along, so i've mostly proceeded without a plan/overview of where i should be heading
@Sean well, i can't really say i've learnt much. i was aware of most of what i've done, but the reason it took so much was that i hit three bugs (the deadlock, the compilation issue, and the failing tests on the mac) unrelated to the SoA work which have taken a lot of time to fix
hm, that is true. there have been a number of annoying issues. several still unresolved to bite another day. :(
was the deadlock the gga issue or something else? I know there's a workaround on the compilation (i.e., leave the turn-off inlining flag for now), and punting on the rtwizard bug until someone else can find the commit that caused it.
installing the intel compiler is not a hurdle (i think), but i did not know that it had outstanding vectorization reporting. when i started this part of the work, i did use vectorization reporting, albeit from clang. i recognized it would be useful, but i think i (mis)understood from one of your messages back then that i shouldn't (need to?) do that :-?
installing it is easy -- you can just say you're using it with BRL-CAD as open source use gets a free license key. you don't need to use the vectorization reporting until you've taken care of all the higher level issues OR if you're working on optimizing a very specific piece of code. In talking about making a major impact on ray tracing performance, you could for example find all of the files with the hotest functions and make sure just those files pass automatic vectorization. that would be FAR fewer of the 139 bu_list instances, for example -- a prioritization aid that would directly pertain to performance.
I don't really think you should switch focus -- I'm just trying to make sure the work stays interesting to you. could be wrong, but it just felt to me like you were getting tired of the bu_list work. the fact that you've said you're not learning much reflects that too. if it's still interesting and something you want to do, then you're still good to go from my perspective. we need bu_list to go away either way. :)
another fault on my part would be that i should've asked for an outline (or how you would see it) when i first started, but i thought i could figure it out as i went along, so i've mostly proceeded without a plan/overview of where i should be heading
to be honest, I was (and still am) fine with you proceeding without an outline or specific plan too because I know you're capable and there's so many ways you can make the code better. that of course is a double-edged sword because some places in the code are FAR more important than others -- for example, bu_mappedfile is completely unimportant in the big scheme of things other than as a learning struct for getting rid of bu_list. that's why I've been suggesting the shortest route possible -- just remove it and make the backend use a container. the important structs are going to be hard, which is why they really shouldn't be tackled first. we must all learn the issues and patterns involved from some of the 138 other instances.
you know what might help is seeing ray tracing performance at a much lower level. i'll send you a link for something you can try running.
was the deadlock the gga issue or something else?
not sure what a gga issue is :D
I don't really think you should switch focus -- I'm just trying to make sure the work stays interesting to you. could be wrong, but it just felt to me like you were getting tired of the bu_list work. the fact that you've said you're not learning much reflects that too. if it's still interesting and something you want to do, then you're still good to go from my perspective. we need bu_list to go away either way. :smiley:
it's not that i got tired of the bu_list work, but debugging those problems was interesting enough to give it a try and i thought it would take less time, so that's why i spent more time there
@Sean i generated a report using intel's compiler (bool.c.optrpt). it seems like a lot of the loops can't be vectorized because of the control flow (from what i gather, it means things like continue/goto/break)
and in terms of structures, i think struct partition
is the one involved in most of the report, so should i start looking at getting rid of bu_list there?
also, i had to disable Werror when compiling with icc (it complained about things like assert(a > b && "a is not bigger than b")
), and 10+ tests are failing. i don't know if icc is important though
actually, it's not 10+, i misremembered. tried it again, it's 3 (rt_dvec, regress-solids, and the rtwizard D one)
i committed the code eliminating bu_list from libbu/temp.c. if issues arise, i'll fix them, but the changes should be easier to follow this way
@Sean i generated a report using intel's compiler (bool.c.optrpt). it seems like a lot of the loops can't be vectorized because of the control flow (from what i gather, it means things like continue/goto/break)
Yep, there are a handful of other issues also affecting auto-vectorization, not just aliasing. Aliasing is just one of the more spread out issues that will take time to resolve. Some of the other issues are being tackled in different ways or require back-porting -- e.g., the control flow issues are already mostly fixed in the OpenCL code path.
and in terms of structures, i think
struct partition
is the one involved in most of the report, so should i start looking at getting rid of bu_list there?
I not without experience with a lot more bu_list patterns... partition is one of the ones that is going to be considerably harder for both technical and practical reasons (changing the structure without introducing API will break essentially all critical applications).
i committed the code eliminating bu_list from libbu/temp.c. if issues arise, i'll fix them, but the changes should be easier to follow this way
this is in my commit-queue to review, so I'll let you know if I see any issues! saw the other that followed as well. Keep at them. ;)
any modern popular compiler is worth fixing, if you want to tackle the issues. just beware of red herrings -- issues that seem like one thing but are really another. example, assert() warnings could very well be just because some other bit of CMake logic is wrong and not because there's anything actually wrong with the assert line. Be skeptical and confirm the source, not just patch symptoms, is the healthy perspective to have.
of course, it's secondary to the other work, so you're welcome to disable warnings and proceed if fixing them is not interesting
I not without experience with a lot more bu_list patterns... partition is one of the ones that is going to be considerably harder for both technical and practical reasons (changing the structure without introducing API will break essentially all critical applications).
oh, i wasn't sure if you wanted me to focus on the central structures. so should i just keep on eliminating bu_list from the files that are easier? regarding the API, i see what you mean. i'm thinking of maybe getting rid of bu_list in the internal code and exposing the old way to the users, but i don't really have experience with deprecating things (or even designing APIs, but i think i have some overall ideas about that)
this is in my commit-queue to review, so I'll let you know if I see any issues! saw the other that followed as well. Keep at them. ;)
i'm curious, is there a tool for keeping a commit queue, or do you just take note of them and go over them at a later time?
any modern popular compiler is worth fixing, if you want to tackle the issues. just beware of red herrings -- issues that seem like one thing but are really another. example, assert() warnings could very well be just because some other bit of CMake logic is wrong and not because there's anything actually wrong with the assert line. Be skeptical and confirm the source, not just patch symptoms, is the healthy perspective to have.
i see what you mean. i'll keep that in mind when facing issues like that. i think i should've done that with the gcc 6.3 bug as well
of course, it's secondary to the other work, so you're welcome to disable warnings and proceed if fixing them is not interesting
disabling Werror is what i've done so far in order to generate the reports
in libbu, i found 6 files exposing an API which uses bu_list
include/bu % grep -Rl "bu_list" . ./observer.h ./bitv.h ./hook.h ./ptbl.h ./redblack.h ./list.h ./cmd.h
@Sean i was considering removing bu_list from them going forward (and after that moving on to librt), but since they're user-facing structures, what i would do is leave the bu_list member in place, modify existing code to not use the bu_list, and mark it as deprecated somehow
i'm curious, is there a tool for keeping a commit queue, or do you just take note of them and go over them at a later time?
I have a tool I use for this, but it's very specific to my workflow needs -- not something generally reusable. For team review, we've talked about using either Crucible or Phabricator but nobody has gone through the effort to set it up.
that plan sounds good -- you can mark them deprecated in a doxygen comment, e.g., /***< DEPRECATED */
if there are no places in our code that actually iterate the list, it might even be possible to remove the element without deprecation but that will have to be a case-by-case judgement call -- ask me if you find any like that
@Cezar I looked through that list in libbu and looks like observer, hook, redblack, and cmd.h can easily just be changed/removed. bitv, ptbl, and list are definitely harder as they are prominently used by applications, but if you can change them such that the places where code calls them is only minimally impacted, then they could be changed too.
minimally impacted means you could fix the code that calls them with a simple stateless regular expression (typically fixing calling code with a search-and-replace)
that may or may not be doable with the three harder ones.
i have something which builds, but i can't run make test
because of the result
problem i mentioned previously. should i commit? submit a patch?
@Cezar you have commit, so you no longer need to submit to the patch tracker unless you just really really really want to ...
it's just as easy to check it post commit and fix or revert if needed -- if you want me or someone else to check it beforehand, just post up patch
i've committed the removal of bu_list from the simpler ones. i tried to be careful since i can't really run all the tests, hope everything is ok. should i look at bitv/ptbl next?
i'm trying to remove the bu_list in bu_ptbl. in some places, a list of bu_ptbl
s is still needed. should i implement a dynamically-expanded array like i did for the other replacements, or should i use std::array? the reason i'm asking is because i don't know if the plan is to switch to STL eventually, and if it is, i think i should use it now instead of postponing
@Cezar before going down that road, we need more unit tests...
what should they test for? that the structures using the new list behave the same as with the bu_list?
Found three separate bugs in bu_hook today from its conversion and there’s a couple higher level integration tests failing now, likely from the other conversions (memory corruption)
They should ideally test each of the public functions, testing that good inputs do what is expected and bad fails appropriate
after making the changes, i ran make test
and everything passed, except for the rtwizard ones. how do i run the high level integration tests?
That’s good, though the nature of container bugs can be super hard to trigger
With memory corruption, it’ll work fine with one compile and not another
That’s where cross platform testing helps, in general, as memory tends to be arranged quite different
One thing you can check easily is distcheck-full
That will at least check optimized which should disable zero-initialization
do you have any suggestions for what files i should write tests for?
do you have any suggestions for what files i should write tests for?
er, the ones you modified...
I'd suggest starting with bu_hook -- it should be dead easy. you'll notice a variety of testing patterns in src/libbu/tests -- feel free to pick your preference or come up with something on your own, just try to keep things as simple and obvious as possible. Clever tests can be a bear to debug.
then redblack, observers
at a glance, fb_obj looks like it may have a lot of the same problems as bu_hook had ... didn't check the others that closely yet
@Cezar it would also be good to carve out a solid day to write up a summary report of everything you worked on, improved, maybe one last round of profiles (even if just to comment how all this foundation stuff doesn't affect performance yet until all bu_lists are gone)
Last updated: Nov 15 2024 at 00:49 UTC