-
Notifications
You must be signed in to change notification settings - Fork 228
debug.superh.aud: Add SuperH AUD-II read support #985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9069581 to
5f658b5
Compare
|
Confirmed the docs build locally and the new applet shows up. Let me know what you think! |
|
I'll try to do a review today; was busy with other things, sorry! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I wrote over 2100 words of review comments, please don't take this to mean that I dislike your PR. On the contrary, I'm incredibly impressed: this is one of the nicest PRs from a first-time contributor I've seen for Glasgow. You've clearly taken the time to examine at existing code, understand its intent, and replicate it in your applet, even though there's basically no documentation, and without being explicitly directed by a reviewer. I would be very happy to have you as a repeat contributor, which is why I spent a few hours making sure we're on the same page going forward.
Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.
I'd say that the complexity is worth not having to dump half a megabyte over a slower-than-9600-baud interface alone (or maybe I just lack patience). Impatience aside, being able to do things like wait state polling on the FPGA is how Glasgow ends up being so much nicer than other tools, and is what the FPGA is there for. (Our probe-rs applet is one of the fastest, possibly the fastest, supported SWD debug probe, usually with a good margin too. This kind of thing is what I want the project to be known for in general.)
It is true that there is some increase in complexity due to having to replicate the nibble read state machine and its timeout handling, but you could manage that by refactoring the code a little, which I explain further in the comments below.
| :maxdepth: 3 | ||
|
|
||
| arm7 | ||
| superh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the appropriate taxon is program even though it is technically using an interface with "Debug" in its name:
The program taxon groups applets implementing interfaces to memory technology devices (volatile and non-volatile) that are directly connected to programmable logic, such as a microcontroller or a gate array.
I tried to figure out how you would make a debugger using AUD-II and it's not clear to me that you can. The built-in modes are tracing and arbitrary read/write. You have a breakpoint module (UBC) that seems programmable via monitoring mode which could support single-stepping and breakpoints. There is however no way to get an arbitrary CPU register via AUD-II (or execute an arbitrary instruction, or even just get the value of R15), so you'd need help from the firmware: an ISR that dumps the CPU context into some predefined RAM area that is used for communicating with the debugger. Even then, UBC has no way to tell which breakpoint has fired on any particular cycle, so even though it supports data breakpoints you can't use it for watchpoints: you wouldn't know whether a breakpoint or a watchpoint has fired, which breaks single-stepping. You could work around that by adding an SH-2A emulator and an architecture model and evaluating every configured breakpoint/watchpoint against the post-breakpoint architectural state, if you wanted to create an abomination and had a vast excess of free time on your hands.
I think it's safe to say nobody will ever contribute a debugger for this chip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to know a part number for one of those chips/ I'm curious.
| def add_build_arguments(cls, parser, access): | ||
| access.add_voltage_argument(parser) | ||
|
|
||
| access.add_pins_argument(parser, "audata", width=4, required=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In applets like this one, I suggest using required=True, default=True to avoid having to specify the pinout each time. This way, if you connect one of the harnesses to your DUT, you don't need to guess which pinout it is: it is the default one.
The order of the arguments that applets use is:
- reset(s)
- strap(s)
- clock(s)
- controls
- data
This ordering is done per functional block, e.g. for RGMII you would have PHY reset, then RX control/data, then TX control/data (inputs before outputs).
For this applet, I suggest:
- AUDRST#
- AUDMD
- AUDCK
- AUDSYNC
- AUDATA
|
|
||
| @synthesis_test | ||
| def test_build(self): | ||
| self.assertBuilds(self.hardware_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a replay-hardware-in-loop test using @applet_v2_hardware_test(mocks=["aud_iface"]. I don't have the (fairly esoteric) hardware you're using, so if I'm doing a codebase-wide refactoring, I would have little ability to check if I broke it.
The way this decorator works is that the first time you run the test, it talks to your hardware and records the function calls done on the mocked class. The second time you run the test, it ensures the same arguments are passed, and provides the same return values. Provided no actual algorithms change, this gives a quick and easy way to validate that the applet is still functional, and permit a (limited) degree of refactorings, avoiding fossilization of the codebase.
| with m.If(self.o_stream.ready): | ||
| m.next = "RECV-COMMAND" | ||
|
|
||
| return m |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general comments on the AUDComponent:
- You're using all of the pins synchronously; also, the bus itself is fully synchronous except for AUDRST# (and AUDMD). I think you can use
io.FFBufferand dropFFSynchronizer(since AUDATA isn't asynchronous to your sampling clock). - This protocol includes wait states ("Not Ready" output). Polling for the end of a wait state in software is one of the worst inefficiencies a tool like Glasgow could include. I think you should, at the very least, have a command that toggles AUDCK and polls AUDATA until it becomes non-zero and then returns that value to the host. The timeout would become an input to the component, which is tied to a register in the
AUDInterfaceclass. If you implement polling in gateware, you will be able to submit a sequence of reads as a single bulk operation, without spending USB roundtrips on bus synchronization. - I took a careful look at the entire §21 and it appears that both the tracing mode and the monitor mode were designed to allow a byte-oriented implementation: for tracing, last nibble of CMD1 and first nibble of CMD2 form a header byte, and for monitoring, command/direction nibble and zeroes form a header byte. This means that an implementation using octet-based communication with the host is feasible, provided that wait state polling is implemented in hardware. Being able to use fast primitives like bytearray concatenation,
int.{to,from}_bytes, etc without having to pack nibbles in Python is very good for performance; this applet deals with enough data that it starts to really matter. If you implement byte-packing in hardware, you will be able to retrieve the result of a sequence of reads as a singleawait self._pipe.read(size * count)operation, which is dramatically faster for large quantities of data. - I suggest at least making a small stream-based component that takes two commands: "read nibble with given sync" and "write nibble with given sync", and encapsulates toggling the clock. The
AUDComponentwill then only take care of command handling, byte-packing, as well as wait state polling. - Since it is so similar to reads and quite easy to implement and test, I think it would be nice if you added write support for completeness. Also, writes allow you to set up a predictable state in RAM for hardware-in-the-loop testing (so you don't have to modify your assertions if you have a different firmware image).
Of these suggestions, (1) and (2) are I'd say required for a high-quality implementation, while (3), (4), and (5) would be nice but I would not reject the applet if you choose not to implement them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Done
- Split everything into two components
AUDComponentandAUDBus. TheAUDComponentnow implements the wait for ready logic. I still need to implement error handing. Is this what you had in mind? - I've already implemented a
bulk_readfunction that schedules multiple transfers, and then waits for all of them at once. Removing the bit twiddling stuff (bytes([y << 4 | x for x, y in zip(data[0::2], data[1::2])])) does not seem to make a change in speed. However, since I'm only sending nibbles I'm wasting half the USB bandwidth. So I expect another 2x speedup if I switch to octet based communication. - See 2
- Will do!
|
Thanks for the thorough review! I'll need some time to address all your feedback, but I'll try to get it everything in a state where you can upstream and maintain it easily. I was very happy to get a dump of the memory at all, so didn't mind the speed. But you're right there should be no excuses for it being this slow. AUDCK can go up to the EXTAL frequency, which is something like 20 Mhz. I'll see how fast I can make this thing go. The experience of writing an applet for the first time was pretty decent as there are a bunch of nice applets to use as reference, and for the Keep up the good work! |
Oh yes, this is a known issue. It's actually hoarding the USB DMA memory, which is not even a limited resource in 2025. We may want to add a handler for that particular exception with the instructions to check other applications on the same host if it happens on Linux.
Thank you! I'm happy to see you were able to get started easily and that it was enjoyable. I do want to make the experience of writing a one-off Arduino based tool effectively obsolete :) |
AUD-II is an interesting protocol as it seems to be present on all SH-2A SuperH chips. Unlike JTAG (H-UDI) it looks like it cannot be turned off or password protected. The protocol is meant for real time tracing of data and branches, however it also has a second mode: "RAM Monitoring Mode". This is a simple mode where you can peek/poke at memory in 1, 2 or 4 byte chunks.
This applet only implements the "Read" command of the "RAM Monitoring Mode", as I was using it dump the flash of one of these SuperH chips from an automotive part.
Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.
This is my first time writing a glasgow applet, so feel free to let me know if there are things I could have done better or more efficient. Also happy to remove some comments if you think they are superfluous. Let me know if it's in the right place in the
debugmodule, it's a debug/tracing protocol but I'm currently only using it to read memory.PR is in Draft until I confirm that the docs still build properly.
Logic Analyzer trace of reading 4 bytes:

Reference from datasheet:
