debug.superh.aud: Add SuperH AUD-II read support #985

pd0wm · 2025-08-01T08:56:35Z

AUD-II is an interesting protocol as it seems to be present on all SH-2A SuperH chips. Unlike JTAG (H-UDI) it looks like it cannot be turned off or password protected. The protocol is meant for real time tracing of data and branches, however it also has a second mode: "RAM Monitoring Mode". This is a simple mode where you can peek/poke at memory in 1, 2 or 4 byte chunks.

This applet only implements the "Read" command of the "RAM Monitoring Mode", as I was using it to dump the flash of one of these SuperH chips from an automotive part.

Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.
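As a rough sanity check of the figures above, the following sketch works out the effective throughput and the time spent on data nibbles alone. It assumes that "512kB" means 512 × 1024 bytes and that every data nibble costs one 250 µs round trip; command, address, and polling nibbles are not counted, and all names are made up for illustration.

```python
# Back-of-the-envelope check; every input is a figure quoted in the comment above.
DUMP_BYTES = 512 * 1024      # assumption: "512kB" means 512 KiB
DUMP_SECONDS = 8 * 60        # "about 8 minutes"
NIBBLE_SECONDS = 250e-6      # "about 250us per nibble"


def effective_rate(num_bytes: int, seconds: float) -> float:
    """Average payload throughput in bytes per second."""
    return num_bytes / seconds


def data_nibble_time(num_bytes: int, per_nibble: float) -> float:
    """Time spent reading only the data nibbles (two per byte)."""
    return num_bytes * 2 * per_nibble


rate = effective_rate(DUMP_BYTES, DUMP_SECONDS)
floor = data_nibble_time(DUMP_BYTES, NIBBLE_SECONDS)
print(f"effective rate: {rate:.0f} bytes/s ({rate * 8:.0f} bit/s)")
print(f"data nibbles alone: {floor:.0f} s of {DUMP_SECONDS} s")
```

Under these assumptions the data nibbles account for a little over half of the total time, so per-nibble round trips are a plausible explanation for most of the dump duration.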

This is my first time writing a Glasgow applet, so feel free to let me know if there are things I could have done better or more efficiently. I'm also happy to remove some comments if you think they are superfluous. Let me know if it's in the right place in the debug module; it's a debug/tracing protocol, but I'm currently only using it to read memory.

PR is in Draft until I confirm that the docs still build properly.

Logic Analyzer trace of reading 4 bytes:

Reference from datasheet:

pd0wm · 2025-08-04T10:03:19Z

Confirmed the docs build locally and the new applet shows up. Let me know what you think!

whitequark · 2025-08-04T10:20:39Z

I'll try to do a review today; was busy with other things, sorry!

whitequark

Although I wrote over 2100 words of review comments, please don't take this to mean that I dislike your PR. On the contrary, I'm incredibly impressed: this is one of the nicest PRs from a first-time contributor I've seen for Glasgow. You've clearly taken the time to examine existing code, understand its intent, and replicate it in your applet, even though there's basically no documentation, and without being explicitly directed by a reviewer. I would be very happy to have you as a repeat contributor, which is why I spent a few hours making sure we're on the same page going forward.

Dumping 512kB takes about 8 minutes. I think most of the time is wasted due to USB round-trips waiting for the data to be ready and reading out each nibble individually. This seems to take about 250us per nibble. I could move this logic to the FPGA side, and return 4 bytes at once. However, not sure if that complexity is worth it.

I'd say that the complexity is worth not having to dump half a megabyte over a slower-than-9600-baud interface alone (or maybe I just lack patience). Impatience aside, being able to do things like wait state polling on the FPGA is how Glasgow ends up being so much nicer than other tools, and is what the FPGA is there for. (Our probe-rs applet is one of the fastest, possibly the fastest, supported SWD debug probe, usually with a good margin too. This kind of thing is what I want the project to be known for in general.)

It is true that there is some increase in complexity due to having to replicate the nibble read state machine and its timeout handling, but you could manage that by refactoring the code a little, which I explain further in the comments below.

whitequark · 2025-08-04T22:50:40Z

docs/manual/src/applets/debug/index.rst

    :maxdepth: 3

    arm7
+    superh


I think the appropriate taxon is program even though it is technically using an interface with "Debug" in its name:

The program taxon groups applets implementing interfaces to memory technology devices (volatile and non-volatile) that are directly connected to programmable logic, such as a microcontroller or a gate array.

I tried to figure out how you would make a debugger using AUD-II and it's not clear to me that you can. The built-in modes are tracing and arbitrary read/write. You have a breakpoint module (UBC) that seems programmable via monitoring mode which could support single-stepping and breakpoints. There is however no way to get an arbitrary CPU register via AUD-II (or execute an arbitrary instruction, or even just get the value of R15), so you'd need help from the firmware: an ISR that dumps the CPU context into some predefined RAM area that is used for communicating with the debugger. Even then, UBC has no way to tell which breakpoint has fired on any particular cycle, so even though it supports data breakpoints you can't use it for watchpoints: you wouldn't know whether a breakpoint or a watchpoint has fired, which breaks single-stepping. You could work around that by adding an SH-2A emulator and an architecture model and evaluating every configured breakpoint/watchpoint against the post-breakpoint architectural state, if you wanted to create an abomination and had a vast excess of free time on your hands.

I think it's safe to say nobody will ever contribute a debugger for this chip.

Some chips don't have physical JTAG pins, and only rely on the AUD interface for flashing and debugging. They probably have some very cursed way of doing this, but I'm not spending €1800 to find out.

So indeed safe to say this won't be added anytime soon, at least not before H-UDI support.

Do you happen to know a part number for one of those chips? I'm curious.

software/glasgow/applet/debug/superh/aud/__init__.py

whitequark · 2025-08-05T00:24:14Z

software/glasgow/applet/debug/superh/aud/__init__.py

+    def add_build_arguments(cls, parser, access):
+        access.add_voltage_argument(parser)
+
+        access.add_pins_argument(parser, "audata", width=4, required=True)


In applets like this one, I suggest using required=True, default=True to avoid having to specify the pinout each time. This way, if you connect one of the harnesses to your DUT, you don't need to guess which pinout it is: it is the default one.

The order of the arguments that applets use is:

1. reset(s)
2. strap(s)
3. clock(s)
4. controls
5. data

This ordering is done per functional block, e.g. for RGMII you would have PHY reset, then RX control/data, then TX control/data (inputs before outputs).

For this applet, I suggest:

1. AUDRST#
2. AUDMD
3. AUDCK
4. AUDSYNC
5. AUDATA
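The suggested ordering, combined with `required=True, default=True`, could look roughly like the sketch below. `RecordingAccess` is a hypothetical stand-in that only records calls so the example runs on its own; its `add_pins_argument` signature is an assumption and should be checked against the real `access` object. Only the `audata` pin name appears in the diff above; the other pin names are guesses.

```python
import argparse


class RecordingAccess:
    """Hypothetical stand-in for the applet's `access` object; it only records calls."""

    def __init__(self):
        self.pins = []

    def add_voltage_argument(self, parser):
        pass  # irrelevant for this sketch

    def add_pins_argument(self, parser, name, width=None, default=None, required=False):
        # Assumed signature; the real helper also registers a command-line option.
        self.pins.append((name, width, default, required))


def add_build_arguments(parser, access):
    access.add_voltage_argument(parser)
    # Order: reset, strap, clock, control, data.
    access.add_pins_argument(parser, "audrst", required=True, default=True)
    access.add_pins_argument(parser, "audmd", required=True, default=True)
    access.add_pins_argument(parser, "audck", required=True, default=True)
    access.add_pins_argument(parser, "audsync", required=True, default=True)
    access.add_pins_argument(parser, "audata", width=4, required=True, default=True)


access = RecordingAccess()
add_build_arguments(argparse.ArgumentParser(), access)
print([name for name, *_ in access.pins])
```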

whitequark · 2025-08-05T00:25:40Z

software/glasgow/applet/debug/superh/aud/test.py

+
+    @synthesis_test
+    def test_build(self):
+        self.assertBuilds(self.hardware_args)


Please add a replay-hardware-in-loop test using @applet_v2_hardware_test(mocks=["aud_iface"]). I don't have the (fairly esoteric) hardware you're using, so if I'm doing a codebase-wide refactoring, I would have little ability to check if I broke it.

The way this decorator works is that the first time you run the test, it talks to your hardware and records the function calls made on the mocked class. The second time you run the test, it ensures the same arguments are passed, and provides the same return values. Provided no actual algorithms change, this gives a quick and easy way to validate that the applet is still functional, and permits a (limited) degree of refactoring, avoiding fossilization of the codebase.
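The record-then-replay idea described above can be illustrated with a small, self-contained sketch. This is not how the Glasgow decorator is implemented; `RecordReplay`, `FakeHardware`, and `dump` are hypothetical names, and the sketch is synchronous and keeps the log in memory for brevity.

```python
class RecordReplay:
    """Records method calls made on a real object, or replays a previously recorded log."""

    def __init__(self, target=None, log=None):
        self._target = target                    # real object in record mode, None in replay mode
        self._log = [] if log is None else log   # list of (name, args, result)
        self._cursor = 0

    @property
    def log(self):
        return self._log

    def __getattr__(self, name):
        def call(*args):
            if self._target is not None:
                # Record mode: forward to the real object and remember the outcome.
                result = getattr(self._target, name)(*args)
                self._log.append((name, args, result))
                return result
            # Replay mode: the call must match the recording exactly.
            expected_name, expected_args, result = self._log[self._cursor]
            self._cursor += 1
            if (name, args) != (expected_name, expected_args):
                raise AssertionError(f"expected {expected_name}{expected_args}, got {name}{args}")
            return result
        return call


class FakeHardware:
    """Hypothetical device standing in for the real hardware."""

    def read(self, address, length):
        return bytes((address + i) & 0xFF for i in range(length))


def dump(iface):
    """Code under test: reads two consecutive 4-byte words."""
    return iface.read(0x100, 4) + iface.read(0x104, 4)


recorder = RecordReplay(target=FakeHardware())
first = dump(recorder)                        # first run: talks to the "hardware"
second = dump(RecordReplay(log=recorder.log))  # second run: no hardware needed
print(first == second)
```

A replayed run fails as soon as the code under test issues a call that differs from the recording, which is what makes such a test useful during refactoring.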

whitequark · 2025-08-05T00:41:11Z

software/glasgow/applet/debug/superh/aud/__init__.py

+                with m.If(self.o_stream.ready):
+                    m.next = "RECV-COMMAND"
+
+        return m


Some general comments on the AUDComponent:

1. You're using all of the pins synchronously; also, the bus itself is fully synchronous except for AUDRST# (and AUDMD). I think you can use io.FFBuffer and drop FFSynchronizer (since AUDATA isn't asynchronous to your sampling clock).

2. This protocol includes wait states ("Not Ready" output). Polling for the end of a wait state in software is one of the worst inefficiencies a tool like Glasgow could include. I think you should, at the very least, have a command that toggles AUDCK and polls AUDATA until it becomes non-zero and then returns that value to the host. The timeout would become an input to the component, which is tied to a register in the AUDInterface class. If you implement polling in gateware, you will be able to submit a sequence of reads as a single bulk operation, without spending USB roundtrips on bus synchronization.

3. I took a careful look at the entire §21 and it appears that both the tracing mode and the monitor mode were designed to allow a byte-oriented implementation: for tracing, last nibble of CMD1 and first nibble of CMD2 form a header byte, and for monitoring, command/direction nibble and zeroes form a header byte. This means that an implementation using octet-based communication with the host is feasible, provided that wait state polling is implemented in hardware. Being able to use fast primitives like bytearray concatenation, int.{to,from}_bytes, etc without having to pack nibbles in Python is very good for performance; this applet deals with enough data that it starts to really matter. If you implement byte-packing in hardware, you will be able to retrieve the result of a sequence of reads as a single await self._pipe.read(size * count) operation, which is dramatically faster for large quantities of data.

4. I suggest at least making a small stream-based component that takes two commands: "read nibble with given sync" and "write nibble with given sync", and encapsulates toggling the clock. The AUDComponent will then only take care of command handling, byte-packing, as well as wait state polling.

5. Since it is so similar to reads and quite easy to implement and test, I think it would be nice if you added write support for completeness. Also, writes allow you to set up a predictable state in RAM for hardware-in-the-loop testing (so you don't have to modify your assertions if you have a different firmware image).

Of these suggestions, (1) and (2) are, I'd say, required for a high-quality implementation, while (3), (4), and (5) would be nice, but I would not reject the applet if you choose not to implement them.
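The polling command suggested above (toggle AUDCK, sample AUDATA until it is non-zero, give up after a timeout) can be modelled behaviourally in a few lines. This is only a Python model of the intended behaviour, not gateware; `poll_until_ready`, `make_fake_bus`, and the cycle-count timeout are hypothetical, and "non-zero means ready" is taken from the comment above rather than from the datasheet.

```python
from typing import Callable, Iterable, Optional


def poll_until_ready(clock_and_sample: Callable[[], int], timeout_cycles: int) -> Optional[int]:
    """Clock the bus until a non-zero nibble is sampled; return None on timeout."""
    for _ in range(timeout_cycles):
        nibble = clock_and_sample() & 0xF   # one clock cycle, one sample of the data pins
        if nibble != 0:
            return nibble                   # target left the wait state
    return None                             # still "Not Ready" after timeout_cycles


def make_fake_bus(samples: Iterable[int]) -> Callable[[], int]:
    """Hypothetical bus model that returns the given samples one per clock cycle."""
    iterator = iter(samples)
    return lambda: next(iterator)


ready = poll_until_ready(make_fake_bus([0, 0, 0, 0b0001]), timeout_cycles=16)
stuck = poll_until_ready(make_fake_bus([0] * 8), timeout_cycles=4)
print(ready, stuck)
```

Running this loop next to the pins rather than on the host means a whole sequence of reads can be queued without a USB round trip per poll, which is the point of moving it into gateware.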

1. Done

2. Split everything into two components, AUDComponent and AUDBus. The AUDComponent now implements the wait-for-ready logic. I still need to implement error handling. Is this what you had in mind?

3. I've already implemented a bulk_read function that schedules multiple transfers, and then waits for all of them at once. Removing the bit twiddling stuff (bytes([y << 4 | x for x, y in zip(data[0::2], data[1::2])])) does not seem to make a change in speed. However, since I'm only sending nibbles I'm wasting half the USB bandwidth. So I expect another 2x speedup if I switch to octet-based communication.

4. See 2

5. Will do!
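For reference, the host-side nibble packing quoted in the reply behaves as in the sketch below. `pack_nibbles` is a hypothetical wrapper around that expression; the nibble order (first nibble of each pair in the low half) follows the expression itself, while the byte order passed to `int.from_bytes` is an arbitrary choice for the demonstration and says nothing about the order used on the AUD bus.

```python
def pack_nibbles(data):
    """Pack pairs of nibbles into bytes; the first nibble of each pair is the low half."""
    if len(data) % 2:
        raise ValueError("need an even number of nibbles")
    return bytes([y << 4 | x for x, y in zip(data[0::2], data[1::2])])


nibbles = [0x4, 0x3, 0x2, 0x1]
packed = pack_nibbles(nibbles)
# Assumption for illustration only: interpret the packed bytes as little-endian.
word = int.from_bytes(packed, "little")
print(packed.hex(), hex(word))
```

With byte-packing done in gateware, this step disappears on the host and each byte on the USB link carries two nibbles instead of one.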

software/glasgow/applet/debug/superh/aud/__init__.py

pd0wm · 2025-08-05T12:01:33Z

Thanks for the thorough review! I'll need some time to address all your feedback, but I'll try to get everything into a state where you can upstream and maintain it easily.

I was very happy to get a dump of the memory at all, so I didn't mind the speed. But you're right, there should be no excuses for it being this slow. AUDCK can go up to the EXTAL frequency, which is something like 20 MHz. I'll see how fast I can make this thing go.

The experience of writing an applet for the first time was pretty decent, as there are a bunch of nice applets to use as reference, and for the Amaranth stuff there is documentation available. After one evening I was able to get my first bytes out of the chip. Definitely a better experience than hacking together a one-time-use tool with an Arduino. I actually spent most of the time debugging an ErrorOutOfMemory error because my Saleae was plugged into the same USB hub and was hoarding all the bandwidth.

Keep up the good work!

whitequark · 2025-08-05T12:16:59Z

I actually spent most of the time debugging an ErrorOutOfMemory error because my Saleae was plugged into the same USB hub and was hoarding all the bandwidth.

Oh yes, this is a known issue. It's actually hoarding the USB DMA memory, which is not even a limited resource in 2025. We may want to add a handler for that particular exception with the instructions to check other applications on the same host if it happens on Linux.

Keep up the good work!

Thank you! I'm happy to see you were able to get started easily and that it was enjoyable. I do want to make the experience of writing a one-off Arduino based tool effectively obsolete :)


pd0wm requested a review from whitequark as a code owner August 1, 2025 08:56

pd0wm mentioned this pull request Aug 1, 2025

Add G00106: SuperH AUD-II protocol GlasgowEmbedded/archive#17

Merged

pd0wm marked this pull request as draft August 1, 2025 08:57

pd0wm force-pushed the superh-aud branch 2 times, most recently from 9069581 to 5f658b5 Compare August 4, 2025 10:02

pd0wm marked this pull request as ready for review August 4, 2025 10:02

whitequark requested changes Aug 5, 2025

pd0wm force-pushed the superh-aud branch from 314ce08 to 93cf962 Compare August 11, 2025 08:28

pd0wm added 11 commits August 11, 2025 11:10

debug.superh.aud: Add SuperH AUD-II read support

11aedfc

debug.superh.aud: add assertBuilds test case

98fe23e

debug.superh.aud: split arguments into build/run

4a614dc

debug.superh.aud: cleanup code around I/O Buffers

068426d

debug.superh.aud: use Switch/Case for command handling

73189e1

debug.superh.aud: use argparse.FileType for output filename

56b515b

debug.superh.aud: proper names for addr/size parse functions, check a…

fac408f

…lignment

debug.superh.aud: add read subcommand, match memory-24x arguments

6d8e849

debug.superh.aud: use AUDError instead of RuntimeError

78dd0fe

debug.superh.aud: add _AUDMonitorSize and _AUDMonitorCommand enums

77f8132

debug.superh.aud: add tracing to command/response stream

c58f97d

pd0wm force-pushed the superh-aud branch from c5530e5 to c58f97d Compare August 11, 2025 09:10

debug.superh.aud: use ClockDivisor

7050913

pd0wm force-pushed the superh-aud branch from ab0f5a6 to 7050913 Compare August 11, 2025 10:44

pd0wm commented Aug 11, 2025

pd0wm added 3 commits August 11, 2025 14:27

debug.superh.aud: refactor into AUDComponent and AUDBus

8af773f

debug.superh.aud: command to read N nibbles

2aafb1e

debug.superh.aud: queue up all reads before waiting for them

ce90eb5

pd0wm force-pushed the superh-aud branch from 05f1def to ce90eb5 Compare August 11, 2025 13:29

whitequark added the waiting-on-review Status: Waiting for review label Oct 12, 2025
