-
Notifications
You must be signed in to change notification settings - Fork 228
access.direct.demultiplexer: add ability for applets to override default USB transfer settings. #742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ult USB transfer settings.
Add four keyword arguments to DirectDemultiplexerInterface's constructor:
read_packets_per_xfer, read_xfers_per_queue,
write_packets_per_xfer, write_xfers_per_queue
An applet can set these values through device.demultiplexer.claim_interface's **kwargs.
…B settings on applets that require fixed throughput without overflows.
whitequark
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the investigation. It's been a while since I looked at this code in depth, and I'm unsure if this is really the best way to solve it, especially given the analyzer's many other deficiencies.
At this moment other commitments do not leave me enough time to discuss a better approach in depth. Are you in the Matrix channel? If so we should talk about it at some point when I do have more time and work out a way forward.
|
In particular, it's not clear to me that we can't just change the defaults so that any applet gets a buffer holding 5-10 ms of data. This does go against the DMA memory quotas in the Linux kernel, but:
|
|
(Also, I think the fixed-throughput applet should perhaps be a part of the benchmark applet instead? It does perform a benchmark, and I don't like adding more top-level names unless there's good justification.) |
|
Thanks for the fast response!
No problem! It was time well spent, I learned a number of new things about USB in the process.
No worries about not discussing more now. I'm not on matrix, but I am on the 1BitSquared Discord as "SamIAm", so I see Matrix messages via the bridge.
A potential default change sounds good to me. I think one tradeoff could be latency for applets that don't use auto flush / explicitly flush.
I'm sad about the 16 MB limit. Its the default on my system (Ubuntu 24.04, Kernel 6.8). Its easy enough to increase by running
Moving the fixed-throughput applet into the benchmark applet makes sense and I'm happy to make that change. |
|
Thank you for the responsiveness! I really value this kind of contribution.
My view is that any properly designed applet should be using either auto-flush or explicitly flush. Doing neither and just relying on data to 'fall off the FIFO' does happen to work but it is expected to introduce undesirable performance characteristics.
Ah, that's upsetting. I really don't want to get involved with LKML (especially not after the latest events) so I guess the default gets to stay what it is. Do you want to explore adding a warning asking for this limit to be raised? I think once you exceed a limit you get a certain error from libusb and we could show the warning once we hit that. I recall this error to be annoyingly ambiguous but we could do what we can, now that we collectively know that it is worth bumping against the limit. |
Sure I'll look into detecting the error/making a warning on exceeding the limit. I don't remember what the error was offhand, but I definitely ran into it a few times when I made a mistake calculating the request size. |
|
Thanks, perfect! If you slice your changes into individual PRs I will likely be able to review and merge them faster (the new benchmark for example could be merged right away as far as I can tell from a very quick glance). |
|
@Sazzach Have you had the chance to take another look at this PR? |
|
I have, but I also have been distracted/sidetracked and let this slide. I'm sorry. With respect to adding a fixed source mode to the benchmark applet, the existence of a fixed source mode, implies potential fixed sink/loopback modes. Instead of adding three additional modes, I thought it'd make sense instead to add an optional I'm still working on this. My status is There are also some minor arguments I'd like to add. These are To summarize, arguments before: Proposed arguments after: Let me know If this sounds good to you/if you would like me to do things differently. I'll prioritize finishing up my additions this week (hopefully today/tomorrow, but I don't want to over promise) I haven't looked into the libusb error for insufficient usbfs_memory_mb yet. I should be able to once I finish up my benchmark PR. I have thoughts about how to determine potential new defaults for |
|
Right, that sounds fine to me. The details seem a little complicated but it seems like the problem we have on our hands is a little complicated too, so that's fine. Happy to do a review once you're done. |
|
Have you had a chance to revisit this? In the meantime I completely redesigned the demultiplexer (it's now part of |
|
I've had a chance, I'm close to finishing up my benchmark applet changes. I'm sorry for being slow on this. My current goal is to finish up tomorrow. As an aside, I'm excited by the changes to the interface. There's stuff I've been interested in doing that will be more straightforward now. I'm also looking forward to using your new BSDL parser with the JTAG applet. Thanks! |
I'm very happy to hear that! I would also love to hear about your experiences using JTAG & BSDL parser. |
|
@Sazzach Have you been able to give this PR another look? |
|
I have! I'm close to being done with adding a fixed rate modifier to the benchmark applet + some other changes. A summary of the changes are:
What I (roughly) have left to do is:
One question I have is how does the I just pushed up my WIP changes here. If you have a moment could you take a look and let me know if I've gone in a direction you don't like anywhere. I'm sorry for taking so long on this. |
|
I'll take a look shortly, but to answer this:
This is quite possibly just a bug. I'm not sure it's been tested at all. |
|
@Sazzach I'll give it another look once I'm more awake, but my main comment so far is that the code is quite hard to follow. There are a lot of different modes that combine with each other, this results in a lot of conditionals sometimes updating similarly named variables, duplicated code ( I think we need to decide on which figures of merit we measure and structure the code towards that, without so much of a combinational explosion. Right now this is:
What are the additional figures of merit we're measuring? I think it's something like p99 latency, but expressed and measured in an indirect way. Can we do this better? |
|
Some examples of what I think is difficult to review and maintain:
There are also relatively minor things like overuse of tuples (there's even a nested tuple in one place; |
|
Thanks for the fast review! I'm sorry for not responding sooner.
That's a good point. My original goal was to have a simple modifier that applied to the existing modes. In retrospect, this didn't work out well.
Ignoring adding additional figures for now, I think that adding an interface to the benchmark applet would enable easier testing of how multiple applets running at the same time affect the overall throughput/latency. Some example uses like this could be: I think that this would be fairly straight forward and wouldn't require any gateware changes, but would require thought as to what the various modes return (A dataclass?). (I agree that my current use of tuples here is bad) Thinking about adding additional figures now: I've been interested in fixed throughput because it gives a rough idea of how much data an applet can produce without any data being dropped. I think that this isn't a concern for most applets so I understand if it doesn't make sense to get it added into the main project. Also, from my experiments it seems like the main factor for fixed throughput is the number of transfers queued/their size, along with how long garbage collection takes. This is trivial to measure and is represented a latency measurements with sufficient samples to cross the gc threshold. I could see a rate setting being used for experiments such as: I'm down to:
I'm also down to drop this if it's not something you want for the project. Thanks again! |
Let's do this, sounds good to me! |
The first commit in this PR adds a way for an applet to override the default USB transfer size / the number of transfers queued on a
DirectDemultiplexerInterfaceat a time.The motivation for this change is that while the default values
_packets_per_xferand_xfers_per_queuework well for most applets, they aren't ideal for applets where backpressure results in the loss of data, such as the analyzer applet. In this case, I found that requesting larger transfers sized to hold ~5-10 ms of data allowed for a higher throughput withoutin_fifooverflows.The second commit in this PR contains an applet called
fixed-throughput. This applet outputs data from the device at a fixed rate, and checks forin_fifooverflows. I used it to test the effect of different combinations of transfer size and transfers queued.The code block below shows the results of some measurements I took. With the default settings, and a rate of 100 Mbps, the fixed-throughput applet overflowed after ~6 seconds. By changing the settings, the fixed-throughput applet was able to run for ~40 minutes at a rate of 300 Mbps without overflows before I terminated it.