Skip to content

Conversation

@pd0wm
Copy link

@pd0wm pd0wm commented Aug 1, 2025

AUD-II is an interesting protocol as it seems to be present on all SH-2A SuperH chips. Unlike JTAG (H-UDI) it looks like it cannot be turned off or password protected. The protocol is meant for real time tracing of data and branches, however it also has a second mode: "RAM Monitoring Mode". This is a simple mode where you can peek/poke at memory in 1, 2 or 4 byte chunks.

This applet only implements the "Read" command of the "RAM Monitoring Mode", as I was using it dump the flash of one of these SuperH chips from an automotive part.

Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.

This is my first time writing a glasgow applet, so feel free to let me know if there are things I could have done better or more efficient. Also happy to remove some comments if you think they are superfluous. Let me know if it's in the right place in the debugmodule, it's a debug/tracing protocol but I'm currently only using it to read memory.

PR is in Draft until I confirm that the docs still build properly.

Logic Analyzer trace of reading 4 bytes:
2025-08-01-105310_2361x1066_scrot

Reference from datasheet:
2025-08-01-105324_554x291_scrot

@pd0wm pd0wm requested a review from whitequark as a code owner August 1, 2025 08:56
@pd0wm pd0wm marked this pull request as draft August 1, 2025 08:57
@pd0wm pd0wm force-pushed the superh-aud branch 2 times, most recently from 9069581 to 5f658b5 Compare August 4, 2025 10:02
@pd0wm pd0wm marked this pull request as ready for review August 4, 2025 10:02
@pd0wm
Copy link
Author

pd0wm commented Aug 4, 2025

Confirmed the docs build locally and the new applet shows up. Let me know what you think!

@whitequark
Copy link
Member

I'll try to do a review today; was busy with other things, sorry!

Copy link
Member

@whitequark whitequark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I wrote over 2100 words of review comments, please don't take this to mean that I dislike your PR. On the contrary, I'm incredibly impressed: this is one of the nicest PRs from a first-time contributor I've seen for Glasgow. You've clearly taken the time to examine at existing code, understand its intent, and replicate it in your applet, even though there's basically no documentation, and without being explicitly directed by a reviewer. I would be very happy to have you as a repeat contributor, which is why I spent a few hours making sure we're on the same page going forward.

Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.

I'd say that the complexity is worth not having to dump half a megabyte over a slower-than-9600-baud interface alone (or maybe I just lack patience). Impatience aside, being able to do things like wait state polling on the FPGA is how Glasgow ends up being so much nicer than other tools, and is what the FPGA is there for. (Our probe-rs applet is one of the fastest, possibly the fastest, supported SWD debug probe, usually with a good margin too. This kind of thing is what I want the project to be known for in general.)

It is true that there is some increase in complexity due to having to replicate the nibble read state machine and its timeout handling, but you could manage that by refactoring the code a little, which I explain further in the comments below.

:maxdepth: 3

arm7
superh
Copy link
Member

@whitequark whitequark Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the appropriate taxon is program even though it is technically using an interface with "Debug" in its name:

The program taxon groups applets implementing interfaces to memory technology devices (volatile and non-volatile) that are directly connected to programmable logic, such as a microcontroller or a gate array.

I tried to figure out how you would make a debugger using AUD-II and it's not clear to me that you can. The built-in modes are tracing and arbitrary read/write. You have a breakpoint module (UBC) that seems programmable via monitoring mode which could support single-stepping and breakpoints. There is however no way to get an arbitrary CPU register via AUD-II (or execute an arbitrary instruction, or even just get the value of R15), so you'd need help from the firmware: an ISR that dumps the CPU context into some predefined RAM area that is used for communicating with the debugger. Even then, UBC has no way to tell which breakpoint has fired on any particular cycle, so even though it supports data breakpoints you can't use it for watchpoints: you wouldn't know whether a breakpoint or a watchpoint has fired, which breaks single-stepping. You could work around that by adding an SH-2A emulator and an architecture model and evaluating every configured breakpoint/watchpoint against the post-breakpoint architectural state, if you wanted to create an abomination and had a vast excess of free time on your hands.

I think it's safe to say nobody will ever contribute a debugger for this chip.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some chips don't have physical JTAG pins, and only rely on the AUD interface for flashing and debugging. They probably have some very cursed way of doing this, but I'm not spending €1800 to find out.

So indeed safe to say this won't be added anytime soon, at least not before H-UDI support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you happen to know a part number for one of those chips/ I'm curious.

def add_build_arguments(cls, parser, access):
access.add_voltage_argument(parser)

access.add_pins_argument(parser, "audata", width=4, required=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In applets like this one, I suggest using required=True, default=True to avoid having to specify the pinout each time. This way, if you connect one of the harnesses to your DUT, you don't need to guess which pinout it is: it is the default one.

The order of the arguments that applets use is:

  • reset(s)
  • strap(s)
  • clock(s)
  • controls
  • data

This ordering is done per functional block, e.g. for RGMII you would have PHY reset, then RX control/data, then TX control/data (inputs before outputs).

For this applet, I suggest:

  • AUDRST#
  • AUDMD
  • AUDCK
  • AUDSYNC
  • AUDATA


@synthesis_test
def test_build(self):
self.assertBuilds(self.hardware_args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a replay-hardware-in-loop test using @applet_v2_hardware_test(mocks=["aud_iface"]. I don't have the (fairly esoteric) hardware you're using, so if I'm doing a codebase-wide refactoring, I would have little ability to check if I broke it.

The way this decorator works is that the first time you run the test, it talks to your hardware and records the function calls done on the mocked class. The second time you run the test, it ensures the same arguments are passed, and provides the same return values. Provided no actual algorithms change, this gives a quick and easy way to validate that the applet is still functional, and permit a (limited) degree of refactorings, avoiding fossilization of the codebase.

with m.If(self.o_stream.ready):
m.next = "RECV-COMMAND"

return m
Copy link
Member

@whitequark whitequark Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general comments on the AUDComponent:

  1. You're using all of the pins synchronously; also, the bus itself is fully synchronous except for AUDRST# (and AUDMD). I think you can use io.FFBuffer and drop FFSynchronizer (since AUDATA isn't asynchronous to your sampling clock).
  2. This protocol includes wait states ("Not Ready" output). Polling for the end of a wait state in software is one of the worst inefficiencies a tool like Glasgow could include. I think you should, at the very least, have a command that toggles AUDCK and polls AUDATA until it becomes non-zero and then returns that value to the host. The timeout would become an input to the component, which is tied to a register in the AUDInterface class. If you implement polling in gateware, you will be able to submit a sequence of reads as a single bulk operation, without spending USB roundtrips on bus synchronization.
  3. I took a careful look at the entire §21 and it appears that both the tracing mode and the monitor mode were designed to allow a byte-oriented implementation: for tracing, last nibble of CMD1 and first nibble of CMD2 form a header byte, and for monitoring, command/direction nibble and zeroes form a header byte. This means that an implementation using octet-based communication with the host is feasible, provided that wait state polling is implemented in hardware. Being able to use fast primitives like bytearray concatenation, int.{to,from}_bytes, etc without having to pack nibbles in Python is very good for performance; this applet deals with enough data that it starts to really matter. If you implement byte-packing in hardware, you will be able to retrieve the result of a sequence of reads as a single await self._pipe.read(size * count) operation, which is dramatically faster for large quantities of data.
  4. I suggest at least making a small stream-based component that takes two commands: "read nibble with given sync" and "write nibble with given sync", and encapsulates toggling the clock. The AUDComponent will then only take care of command handling, byte-packing, as well as wait state polling.
  5. Since it is so similar to reads and quite easy to implement and test, I think it would be nice if you added write support for completeness. Also, writes allow you to set up a predictable state in RAM for hardware-in-the-loop testing (so you don't have to modify your assertions if you have a different firmware image).

Of these suggestions, (1) and (2) are I'd say required for a high-quality implementation, while (3), (4), and (5) would be nice but I would not reject the applet if you choose not to implement them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Done
  2. Split everything into two components AUDComponent and AUDBus. The AUDComponent now implements the wait for ready logic. I still need to implement error handing. Is this what you had in mind?
  3. I've already implemented a bulk_read function that schedules multiple transfers, and then waits for all of them at once. Removing the bit twiddling stuff (bytes([y << 4 | x for x, y in zip(data[0::2], data[1::2])])) does not seem to make a change in speed. However, since I'm only sending nibbles I'm wasting half the USB bandwidth. So I expect another 2x speedup if I switch to octet based communication.
  4. See 2
  5. Will do!

@pd0wm
Copy link
Author

pd0wm commented Aug 5, 2025

Thanks for the thorough review! I'll need some time to address all your feedback, but I'll try to get it everything in a state where you can upstream and maintain it easily.

I was very happy to get a dump of the memory at all, so didn't mind the speed. But you're right there should be no excuses for it being this slow. AUDCK can go up to the EXTAL frequency, which is something like 20 Mhz. I'll see how fast I can make this thing go.

The experience of writing an applet for the first time was pretty decent as there are a bunch of nice applets to use as reference, and for the amaranth stuff there is documentation available. After one evening I was able to get my first bytes out of the chip. Definitely a better experience than hacking together a one-time-use tool with an Arduino. I actually spent most of the time debugging an ErrorOutOfMemory error because my Saleae was plugged into the same USB hub and was hoarding all the bandwidth.

Keep up the good work!

@whitequark
Copy link
Member

I actually spent most of the time debugging an ErrorOutOfMemory error because my Saleae was plugged into the same USB hub and was hoarding all the bandwidth.

Oh yes, this is a known issue. It's actually hoarding the USB DMA memory, which is not even a limited resource in 2025. We may want to add a handler for that particular exception with the instructions to check other applications on the same host if it happens on Linux.

Keep up the good work!

Thank you! I'm happy to see you were able to get started easily and that it was enjoyable. I do want to make the experience of writing a one-off Arduino based tool effectively obsolete :)

@whitequark whitequark added the waiting-on-review Status: Waiting for review label Oct 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

waiting-on-review Status: Waiting for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants