Conversation
Profiling did show allocations below all_dies_iterator::operator++(). This was due to the value being returned by copy, copying the m_stack vector along the way. Changing m_stack into a shared_ptr gave a ~10% performance win for my use case (large lib with ~24 million DIEs)
Destruction releases memory for zw_stack::m_values which takes time. It is a little faster to keep a few stacks around and just call m_values.clear(), which destroys the contained object without releasing the memory.
|
Hi. Sorry for not being able to get around to this patchset. I'm not against merging this in principle, but:
There are also smaller points:
Thanks for the contribution! |
|
And I see you address my third point in your commit message, so never mind. The trouble is that sharing the stack breaks the semantics, because the two iterators end up being entangled, and that breaks expectations. We'll need to figure out something else. Maybe returning the iterator as const, but that's also not how things should be. Would moving some of the methods to .hh (making them inline—the ctors, the operators) help things? The compiler might be able to see that the result is thrown away and simply elide the copy ctor altogether. |
|
Hi Petr, sorry, I was on vacation and am only seeing your comments now. Frankly, I was just picking some low hanging fruits performancewise while dabbeling a bit with dwgrep. I did not try to completely understand the semantics. If I can find some time in the next days, I'll evaluate your ideas of exchanging shared_ptr with unique_ptr and do a bit of inlining. Thanks for looking into my PR anyway, it's my first on github ;-) |
Hi Petr,
thanks for writing dwgrep! I'm currently experimenting a bit with it. The lib I'm testing with has 24 million DIEs which takes 9.3s to count.
My PR improves this to about 6.5s, mostly by doing fewer allocations.