Skip to content

Add a lockfree SPSC queue#8280

Open
sakertooth wants to merge 35 commits intoLMMS:masterfrom
sakertooth:add-lockfree-spsc-queue
Open

Add a lockfree SPSC queue#8280
sakertooth wants to merge 35 commits intoLMMS:masterfrom
sakertooth:add-lockfree-spsc-queue

Conversation

@sakertooth
Copy link
Contributor

@sakertooth sakertooth commented Feb 24, 2026

Adds a lockfree SPSC (single producer, single consumer) queue meant for sending data (e.g. audio or messages) in real time from one thread to another. It uses std::atomic with relaxed memory orderings where applicable as well as alignment via alignas(std::hardware_destructive_interference_size) for optimal performance.

That being said, I'm not too confident this (mainly the performance) is any better or worse than some of the more battletested SPSC queues like the ones from boost and moodycamel, but if we are not too interested in outsourcing another library for this, this should work and fit our needs well enough.

The queue replaces the old LocklessRingBuffer, and as such is currently being used in the AudioEngine for submitting new PlayHandle* objects, the currently unused Lv2Worker code, and in SpectrumAnalyzer and Vectorscope for sending audio data across two separate threads.

Changing the indicies while waiting to -1 does work but makes the queue unreusable
…ta in its own function, remove space available flag (unused)
…not calculate contiguous region size in reserveContiguous functions
@sakertooth sakertooth requested a review from messmerd February 24, 2026 17:44
@messmerd
Copy link
Member

messmerd commented Feb 24, 2026

To fix the macOS failure, we need to bump our x86_64 macOS minimum deployment target to macOS 11.0.

This should be done explicitly by setting the -DCMAKE_OSX_DEPLOYMENT_TARGET CMake flag rather than using our mac-os.env script. See #8118 for how it should be done.

We will also need to do this to enable std::filesystem support on x86_64 macOS.

@messmerd
Copy link
Member

messmerd commented Feb 24, 2026

For std::hardware_destructive_interference_size, check out what @rubiefawn did in #8132 with the new Hardware.h header. And just so you're aware, there might be a warning about potential ABI changes that needs to be disabled when using std::hardware_destructive_interference_size.

@rubiefawn
Copy link
Contributor

rubiefawn commented Feb 24, 2026

The work in #8132 should also be able to be generalized into a lock free MPSC ring buffer, which may be useful in other places as well.

Edit: Don't know why I said "may", I've already done this, it's just not tested outside of Lb302, and I haven't meaningfully worked on it in months. Only the MPMC queue is implemented at the moment, but the other types can be derived from it by removing things (such as the MPSC queue used in Lb302).

@sakertooth
Copy link
Contributor Author

sakertooth commented Feb 24, 2026

To fix the macOS failure, we need to bump our x86_64 macOS minimum deployment target to macOS 11.0.
This should be done explicitly by setting the -DCMAKE_OSX_DEPLOYMENT_TARGET CMake flag rather than using our mac-os.env script. See #8118 for how it should be done.
We will also need to do this to enable std::filesystem support on x86_64 macOS

Should this PR wait then?

For std::hardware_destructive_interference_size, check out what @rubiefawn did in #8132 with the new Hardware.h header. And just so you're aware, there might be a warning about potential ABI changes that needs to be disabled when using std::hardware_destructive_interference_size.

I guess I'll wait to avoid doing the same thing. I'm also not sure because MinGW complains about std::hardware_destructive_interference_size being unstable, rather than it being not defined, so in the event we eventually switch to using it, MinGW might still be broken (AFAICT). Though this complaint from MinGW is actually a warning, so disabling it might resolve it.

The work in #8132 should also be able to be generalized into a lock free MPSC ring buffer, which may be useful in other places as well.

The only place I can think of right now that might use a queue whereby there is multiple threads on either side is with the audio thread and its worker threads, but even then in that situation I think separate SPSC queues might scale better (though, a MPSC/SPMC/MPMC queue might work fine when using an atomic fetch_add with a slot based system, though I designed this queue with contiguous reservations of the buffer in mind, which I don't think you can do with those queues, could be wrong). That all being said though, if we have a use case for those queues, I would be fine adding them as needed.

I'm curious why a MPSC queue would be needed for 8132 though, (are the multiple audio worker threads each sending voices or something?)

@rubiefawn
Copy link
Contributor

are the multiple audio worker threads each sending voices or something?

This is the case. I had each thread print out its id when performing an enqueue operation and got several different ids.

@rubiefawn rubiefawn mentioned this pull request Feb 25, 2026
17 tasks
@JohannesLorenz
Copy link
Contributor

I have 2 questions:

The queue replaces the old LocklessRingBuffer

What was wrong with LocklessRingBuffer? What is the advantage that this PR introduces?

the currently unused Lv2Worker code

What code is unused in Lv2Worker?

@sakertooth
Copy link
Contributor Author

sakertooth commented Mar 9, 2026

@JohannesLorenz,

What was wrong with LocklessRingBuffer? What is the advantage that this PR introduces?

LocklessRingBuffer brought with it some concerns that discouraged my use of it, most notably in #7705 (see #7705 (comment)). From further inspection of the ringbuffer submodule however, it seems to be a lockfree SPMC queue, so it seems to be more real-time safe than I had initially thought.

In 7705 I was wary of the use of QWaitCondition, since in my mind its still a heavy OS construct, most likely heavier than a simple std::atomic_flag::wait, which is what my PR uses. I also didn't like the split reader/writer API, where reading has to be done with a separate class. For SPSC queue uses this seemed a bit much.

Also wasn't sure if we really had to outsource another library (the ringbuffer submodule). I initially was planning to slowly get rid of it, but come to see its actually a SPMC queue and solves somewhat of a different problem, it might be needed (though seeing that you wrote it I don't know if you would've been okay with that regardless).

What code is unused in Lv2Worker?

Correct me if I'm wrong, but the code in Lv2Worker.cpp doesn't seem to be in use anywhere else in the codebase currently. I assumed that this was because the Lv2 implementation was being worked in incrementally rather than being complete in one go.

@JohannesLorenz
Copy link
Contributor

Correct me if I'm wrong, but the code in Lv2Worker.cpp doesn't seem to be in use anywhere else in the codebase currently.

$ git grep Lv2Worker
include/Lv2Proc.h:#include "Lv2Worker.h"
include/Lv2Proc.h:      std::optional<Lv2Worker> m_worker;

You can continue with git grep m_worker -- src/core/lv2/Lv2Proc.cpp. I really wonder why you cannot see this.

I also didn't like the split reader/writer API, where reading has to be done with a separate class. For SPSC queue uses this seemed a bit much.

Indeed it is a bit much, but restricts you to SPSC (vs SPMC).

--

If I put it all together, for me it looks a bit like this:

Advantages of this PR:

  • Follows our style guide
  • No submodule (some users find them difficult)
  • No need for Reader AND writer (but no SPMC)
  • Uses atomic_flag (though LocklessRingbuffer could be extended by it)

Advandages of keeping ringbuffer:

I value any effort. However, this PR here looks to me to only have stylistic advantages, and has functional disadvantages (yet).

@sakertooth
Copy link
Contributor Author

sakertooth commented Mar 9, 2026

You can continue with git grep m_worker -- src/core/lv2/Lv2Proc.cpp. I really wonder why you cannot see this.

Ah, I see it now. I skipped over this by accident.

I value any effort. However, this PR here looks to me to only have stylistic advantages, and has functional disadvantages (yet).

If the API for LocklessRingBuffer was simpler instead of split, I would be more receptive of using it. I would also rename it to reflect if its a SPSC or SPMC queue, so we know in what contexts it can be used. And I was confused why we needed a LocklessRingBuffer because I thought the underlying structure was already a lockfree SPMC queue? If the appropriate changes are made I guess I can close this work out and use what's already there.

Maybe have a simple interface for SPSC uses, and a more general interface for SPMC uses, though both can use the underlying library.

I'm also not sure of its capabilities/how this library actually works. I wasn't even sure if ringbuffer was lockfree (I thought it was just a regular ring buffer 😵‍💫)

Note

I never really like these situations because of the attachment to ones work and the conflict that brings. I usually am willing to let my work be disbanded, but just need confirmation that the original solution will be improved to be more convenient and easy to use. Nevertheless, I should've asked more questions about this than starting from scratch.

TLDR: Didn't really understand what ringbuffer was, wrote my own SPSC solution, should've asked around, see what could've been done about the code already there.

@messmerd
Copy link
Member

messmerd commented Mar 9, 2026

I think I'm honestly more in favor of using a widely-used, extensively-tested, well-documented, high-performance 3rd party library than trying to roll our own homemade implementation. Concurrency is hard to do correctly, so I'd feel more at ease using something that thousands of other people depend on.

The problems with this PR as I see it:

  • Homemade (can I trust it to always work correctly and be free of UB? I have no idea.)
  • No testing
  • No performance benchmarks

The problems with the current ringbuffer as I see it:

  • Homemade (can I trust it to always work correctly and be free of UB? Definitely more than this PR, but I'm still not fully confident)
  • There are a couple tests, but not much
  • No performance benchmarks
  • Confusing API

The problems with a 3rd party library:

  • Requires a new dependency (or just a different dependency, since it would replace the ringbuffer submodule)
  • That's it

@sakertooth
Copy link
Contributor Author

sakertooth commented Mar 9, 2026

Homemade (can I trust it to always work correctly and be free of UB? I have no idea.)
No testing
No performance benchmarks

Unfortunately yes you're right, I think something like moodycamel would be a good bet, off the top of my head. Might not have enough time to commit to thorough testing and benchmarks for this PR, though it's in our best interest to have them.

I just need to make sure the library has capabilities we need, like reserving contiguous regions on either side for SPSC queues at a minimum (edit: it has bulk enqueue/dequeue, so it should be fine).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants