Skip to content
Open
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/manual/src/applets/debug/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,3 +9,4 @@ MCU debugging
:maxdepth: 3

arm7
superh
Copy link
Member

@whitequark whitequark Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the appropriate taxon is program even though it is technically using an interface with "Debug" in its name:

The program taxon groups applets implementing interfaces to memory technology devices (volatile and non-volatile) that are directly connected to programmable logic, such as a microcontroller or a gate array.

I tried to figure out how you would make a debugger using AUD-II and it's not clear to me that you can. The built-in modes are tracing and arbitrary read/write. You have a breakpoint module (UBC) that seems programmable via monitoring mode which could support single-stepping and breakpoints. There is however no way to get an arbitrary CPU register via AUD-II (or execute an arbitrary instruction, or even just get the value of R15), so you'd need help from the firmware: an ISR that dumps the CPU context into some predefined RAM area that is used for communicating with the debugger. Even then, UBC has no way to tell which breakpoint has fired on any particular cycle, so even though it supports data breakpoints you can't use it for watchpoints: you wouldn't know whether a breakpoint or a watchpoint has fired, which breaks single-stepping. You could work around that by adding an SH-2A emulator and an architecture model and evaluating every configured breakpoint/watchpoint against the post-breakpoint architectural state, if you wanted to create an abomination and had a vast excess of free time on your hands.

I think it's safe to say nobody will ever contribute a debugger for this chip.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some chips don't have physical JTAG pins, and only rely on the AUD interface for flashing and debugging. They probably have some very cursed way of doing this, but I'm not spending €1800 to find out.

So indeed safe to say this won't be added anytime soon, at least not before H-UDI support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you happen to know a part number for one of those chips/ I'm curious.

10 changes: 10 additions & 0 deletions docs/manual/src/applets/debug/superh.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
``debug-superh-aud``
====================

CLI reference
-------------

.. _applet.debug.superh.aud:

.. autoprogram:: glasgow.applet.debug.superh.aud:AUDApplet._get_argparser_for_sphinx("debug-superh-aud")
:prog: glasgow run debug-superh-aud
Empty file.
378 changes: 378 additions & 0 deletions software/glasgow/applet/debug/superh/aud/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,378 @@
# Ref: SH7254R Group User's Manual: Hardware §21.1 §21.4
# Document Number: R01UH0480EJ0400
# Accession: G00106
#
# This applet implements part of the SuperH AUD-II protocol. This protocol
# supports rich features such as branch and data tracing. However, it also
# supports a "RAM Monitoring Mode" wich has simple read and write command. This
# applet implements the RAM Monitoring Mode read command to be able to dump a
# target's memory.

import argparse
import logging
import sys
from typing import Literal

from amaranth import *
from amaranth.lib import io, wiring, stream, cdc, enum
from amaranth.lib.wiring import In, Out

from glasgow.abstract import AbstractAssembly, GlasgowPin, ClockDivisor
from glasgow.applet import GlasgowAppletV2, GlasgowAppletError


class AUDError(GlasgowAppletError):
pass


class _AUDMonitorSize(enum.Enum):
Byte = 0b00
Word = 0b01
LongWord = 0b10


class _AUDMonitorCommand(enum.Enum):
Read = 0b1000
Write = 0b1100


class AUDCommand(enum.Enum, shape=8):
Reset = 0x00
Run = 0x01
Sync = 0x02
Out = 0x03
Inp = 0x04


class AUDComponent(wiring.Component):
i_stream: In(stream.Signature(8))
o_stream: Out(stream.Signature(8))

divisor: In(16)

def __init__(self, ports):
self._ports = ports

super().__init__()

def elaborate(self, platform):
m = Module()

# Add IO buffers
m.submodules.audsync_buffer = audsync = io.Buffer("o", self._ports.audsync)
m.submodules.audmd_buffer = audmd = io.Buffer("o", self._ports.audmd)
m.submodules.audrst_buffer = audrst = io.Buffer("o", self._ports.audrst)
m.submodules.audck_buffer = audck = io.Buffer("o", self._ports.audck)
m.submodules.audata_buffer = audata = io.Buffer("io", self._ports.audata)

# Always in RAM monitoring mode
m.d.comb += audmd.o.eq(1)

# Create signal for reading AUDATA pins
audata_i = Signal(4)
m.submodules += cdc.FFSynchronizer(audata.i, audata_i)

# FSM related signals
timer = Signal.like(self.divisor)
data = Signal(4)

# Main State Machine
with m.FSM():
# Receive command and switch to the appropriate handler state
with m.State("RECV-COMMAND"):
with m.If(self.i_stream.valid):
m.d.comb += self.i_stream.ready.eq(1)
m.d.sync += data.eq(self.i_stream.payload >> 4)
with m.Switch(self.i_stream.payload & 0xF):
with m.Case(AUDCommand.Reset):
m.next = "RESET"
with m.Case(AUDCommand.Run):
m.next = "RUN"
with m.Case(AUDCommand.Sync):
m.next = "SYNC"
with m.Case(AUDCommand.Out):
m.next = "OUT"
with m.Case(AUDCommand.Inp):
m.next = "INP"

# Assert Reset and put pins into a known state
with m.State("RESET"):
# Put pins into known state
m.d.sync += audck.o.eq(0)
m.d.sync += audata.oe.eq(1)
m.d.sync += audata.o.eq(0b0000)
m.d.sync += audsync.o.eq(1)

# Put into reset
m.d.sync += audrst.o.eq(0)
m.next = "RECV-COMMAND"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a command queued immediately, you could easily have a pulse time violation on AUDRST, and a setup time violation on any of the other pins. The limiting factor here is the tAURSTMW of 5 tAUCKMcyc, which you wlil violate if you queue Reset followed by Run since that will be a 42 ns delay on revC and that will always be less than 5 EXTAL cycles. (You wouldn't get a setup time violation on revC with the code you have here, but if later revisions raise system clock frequency you might.)

Your software reset sequence in init does not allow this to happen, but you also make the reset and run methods public, which means that an end user of the applet could cause this condition to happen. This is undesirable. While you could make the reset and run methods private or remove them (which I recommend you do later), it is also a good practice to make sure that all timing requirements are met by design of gateware even if your software uses command sequences that enforce them, since the gateware is well positioned to do so and the software is not. Future changes to the software by people who don't realize their code must respect timing requirements may cause violations and races, and these same people will be worst positioned to debug them because they lack the context.

Since there is a minimum value of fEX I recommend adding a gateware reset sequence meeting worst-case tAURSTMW value.


# Release Reset
with m.State("RUN"):
# Release reset
m.d.sync += audrst.o.eq(1)
m.next = "RECV-COMMAND"

# Set Sync pin to provided value
with m.State("SYNC"):
m.d.sync += audsync.o.eq(data != 0b0000)
m.next = "RECV-COMMAND"

# Send 1 nibble of data on the bus, then strobe clock
with m.State("OUT"):
m.d.sync += audata.oe.eq(1) # Switch AUDATA pins to output
m.d.sync += audata.o.eq(data)

# Strobe clock
m.d.sync += audck.o.eq(0)
m.d.sync += timer.eq(0)
m.next = "OUT-CLOCK-0"

# Wait for clock period to pass, then set clock high
with m.State("OUT-CLOCK-0"):
with m.If(timer == self.divisor):
m.d.sync += audck.o.eq(1)
m.d.sync += timer.eq(0)
m.next = "OUT-CLOCK-1"
with m.Else():
m.d.sync += timer.eq(timer + 1)

# Wait for clock period to pass, then return to command reception
with m.State("OUT-CLOCK-1"):
with m.If(timer == self.divisor):
m.next = "RECV-COMMAND"
with m.Else():
m.d.sync += timer.eq(timer + 1)

# Strobe clock, then read data on the rising edge. Send to PC
with m.State("INP"):
m.d.sync += audata.oe.eq(0) # Switch AUDATA pins to input

# Strobe clock
m.d.sync += audck.o.eq(0)
m.d.sync += timer.eq(9)
m.next = "INP-CLOCK-0"

# Wait for clock period to pass, then sample data and set clock high
with m.State("INP-CLOCK-0"):
with m.If(timer == self.divisor):
m.d.sync += audck.o.eq(1)
m.d.sync += timer.eq(0)

# Sample data on rising edge
m.d.sync += data.eq(audata_i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is correct. Based on fig. 21.12, the half-cycle before the posedge here would be for turnaround, and the half-cycle after the posedge would be for setup. At the posedge (point 1) you don't have valid data yet; you have it on the next negedge (point 2).

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't understand how this diagram is supposed to demonstrate a byte read with the data in the wrong place. Classic Renesas...

Copy link
Author

@pd0wm pd0wm Aug 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How I'm interpreting the diagram, is that you should change the pins on the falling edge of the clock, and the chip will sample on the rising edge. I think that's how it's currently implemented.

I think we only see one nibble of the byte that's read. It's all the way on the right where AUDATA is already grey and AUDSYNC is high. But yeah, the "data" description doesn't make any sense.


m.next = "INP-CLOCK-1"
with m.Else():
m.d.sync += timer.eq(timer + 1)

# Wait for clock period to pass, then send data to PC
with m.State("INP-CLOCK-1"):
with m.If(timer == self.divisor):
m.next = "SEND-DATA"
with m.Else():
m.d.sync += timer.eq(timer + 1)

# Send the sampled data to the output stream, return to command reception
with m.State("SEND-DATA"):
m.d.comb += self.o_stream.valid.eq(1)
m.d.comb += self.o_stream.payload.eq(data)

with m.If(self.o_stream.ready):
m.next = "RECV-COMMAND"

return m
Copy link
Member

@whitequark whitequark Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general comments on the AUDComponent:

  1. You're using all of the pins synchronously; also, the bus itself is fully synchronous except for AUDRST# (and AUDMD). I think you can use io.FFBuffer and drop FFSynchronizer (since AUDATA isn't asynchronous to your sampling clock).
  2. This protocol includes wait states ("Not Ready" output). Polling for the end of a wait state in software is one of the worst inefficiencies a tool like Glasgow could include. I think you should, at the very least, have a command that toggles AUDCK and polls AUDATA until it becomes non-zero and then returns that value to the host. The timeout would become an input to the component, which is tied to a register in the AUDInterface class. If you implement polling in gateware, you will be able to submit a sequence of reads as a single bulk operation, without spending USB roundtrips on bus synchronization.
  3. I took a careful look at the entire §21 and it appears that both the tracing mode and the monitor mode were designed to allow a byte-oriented implementation: for tracing, last nibble of CMD1 and first nibble of CMD2 form a header byte, and for monitoring, command/direction nibble and zeroes form a header byte. This means that an implementation using octet-based communication with the host is feasible, provided that wait state polling is implemented in hardware. Being able to use fast primitives like bytearray concatenation, int.{to,from}_bytes, etc without having to pack nibbles in Python is very good for performance; this applet deals with enough data that it starts to really matter. If you implement byte-packing in hardware, you will be able to retrieve the result of a sequence of reads as a single await self._pipe.read(size * count) operation, which is dramatically faster for large quantities of data.
  4. I suggest at least making a small stream-based component that takes two commands: "read nibble with given sync" and "write nibble with given sync", and encapsulates toggling the clock. The AUDComponent will then only take care of command handling, byte-packing, as well as wait state polling.
  5. Since it is so similar to reads and quite easy to implement and test, I think it would be nice if you added write support for completeness. Also, writes allow you to set up a predictable state in RAM for hardware-in-the-loop testing (so you don't have to modify your assertions if you have a different firmware image).

Of these suggestions, (1) and (2) are I'd say required for a high-quality implementation, while (3), (4), and (5) would be nice but I would not reject the applet if you choose not to implement them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Done
  2. Split everything into two components AUDComponent and AUDBus. The AUDComponent now implements the wait for ready logic. I still need to implement error handing. Is this what you had in mind?
  3. I've already implemented a bulk_read function that schedules multiple transfers, and then waits for all of them at once. Removing the bit twiddling stuff (bytes([y << 4 | x for x, y in zip(data[0::2], data[1::2])])) does not seem to make a change in speed. However, since I'm only sending nibbles I'm wasting half the USB bandwidth. So I expect another 2x speedup if I switch to octet based communication.
  4. See 2
  5. Will do!



class AUDInterface:
def __init__(self, logger: logging.Logger, assembly: AbstractAssembly, *,
audata: GlasgowPin, audsync: GlasgowPin, audck: GlasgowPin, audmd: GlasgowPin, audrst: GlasgowPin):
self._logger = logger
self._level = logging.TRACE

ports = assembly.add_port_group(audata=audata, audsync=audsync, audck=audck, audmd=audmd, audrst=audrst)
component = assembly.add_submodule(AUDComponent(
ports,
))
self._pipe = assembly.add_inout_pipe(component.o_stream, component.i_stream)
self._clock = assembly.add_clock_divisor(component.divisor,
ref_period=assembly.sys_clk_period, name="audclk")

@property
def clock(self) -> ClockDivisor:
return self._clock

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a logging function. This logging function should trace the command/response stream of the applet such that you have, ideally, all of the context needed to diagnose an issue. (Timings will not be fully visible, of course.) There are many examples of such functions in the codebase.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added tracing to _cmd and inp (the functions that call pipe.send()/pipe.recv(). Is that what you meant?

async def _cmd(self, cmd: AUDCommand, val=0):
assert val <= 0xF, "Value must be less than 0xF"
self._logger.log(self._level, "AUD: write CMD: %s DATA: <%02x>", cmd.name, val)
await self._pipe.send([cmd.value | (val << 4)])

async def reset(self):
await self._cmd(AUDCommand.Reset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these functions should not be public (and, really, exist at all, since _cmd is just fine for internal use). The way API of "*Interface classes" should be designed is to allow someone familiar with the domain (in this context: someone who've skimmed the datasheet) but not intimately familiar with applet code to achieve a useful task.

Since the design of the AUDComponent allows violating timings it obviously fails this test, but more generally, esoteric knowledge (such as the exact bus sequences) that can be encapsulated into a class with a well-defined API, should be.

In this case, the API surface of the class would probably consist of:

async def reset(self): # does what's currently inside `init`
async def read(self, addr: int, *, size: Literal[1,2,4] = 4, timeout: int = 100) -> int:

or, given you have performance issues:

async def reset(self): # does what's currently inside `init`
async def read(self, addr: int, count: int = 1, *, size: Literal[1,2,4] = 4, timeout: int = 100) -> array[int]:

(The return type here could reasonably be either bytes/bytearray, or list[int], or array[int]. Using list[int] is clear but can be slow to process. Using bytes/bytearray can get pretty confusing because the CPU is big endian but the protocol is little endian; Renesas... Using array[int] is ideal for large amounts of data, but it uses host endianness when converting to/from byte sequences, so it's a bit annoying to use.)

Since there are fewer of these methods and they're higher-level, it is now easier to document them. All new applets should have documentation unless they have no public API (this only really happens when they're bridging to some other application and would never be used directly). It doesn't need to be exhaustive, but it should mention the invariants (e.g. "you need to call reset first") and errors (e.g. "on timeout raises <...>") carefully.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a bulk_read function, but I still need to do some more refactoring of the Interface class like you suggested in this comment.


async def run(self):
await self._cmd(AUDCommand.Run)

async def out(self, val):
await self._cmd(AUDCommand.Out, val)

async def inp(self):
await self._cmd(AUDCommand.Inp)

# This is the only place we need to flush, as we're going to wait for a response
await self._pipe.flush()

data = await self._pipe.recv(1)
self._logger.log(self._level, "AUD: read <%02x>", data[0])
return data[0]

async def sync(self, val):
await self._cmd(AUDCommand.Sync, val)

async def init(self):
await self.reset()

# Strobe clock a couple times
for i in range(10):
await self.out(0)

await self.run()

# Strobe clock a couple times
for i in range(10):
await self.out(0)

async def read(self, addr: int, sz: Literal[1,2,4 ] = 4, timeout: int = 100):
assert addr in range(0, 1<<32), "Address must be a 32-bit value"
assert sz in (1, 2, 4), "Size must be one of 1, 2, or 4 bytes"

await self.sync(0)
await self.out(0)

# Send the Read command
cmd = _AUDMonitorCommand.Read.value | \
{
1: _AUDMonitorSize.Byte,
2: _AUDMonitorSize.Word,
4: _AUDMonitorSize.LongWord,
}[sz].value
await self.out(cmd)

# Clock out Addr
for i in range(8):
await self.out((addr >> (i * 4)) & 0b1111)

# Wait for data ready
for _ in range(timeout):
data = await self.inp()
if data == 1:
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be if data != 0 and then you check if it's 1 (ok) or not 1 (error); right now all bus errors will just result in a timeout exception, which is misleading.

else:
raise AUDError(f"timeout waiting for data ready; got {data:#06b}")

# Set AUDSYNC high to indicate we're ready to read
await self.sync(1)
await self.inp()

# Clock in the data
out = 0
for i in range(2*sz):
out |= (await self.inp() << (i * 4))
return out.to_bytes(sz, byteorder='big')


class AUDApplet(GlasgowAppletV2):
logger = logging.getLogger(__name__)
help = "SuperH AUD-II Applet"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
help = "SuperH AUD-II Applet"
help = "program SuperH microcontrollers via AUD-II"

description = """
Read memory using the SuperH AUD-II protocol.
"""

@classmethod
def add_build_arguments(cls, parser, access):
access.add_voltage_argument(parser)

access.add_pins_argument(parser, "audata", width=4, required=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In applets like this one, I suggest using required=True, default=True to avoid having to specify the pinout each time. This way, if you connect one of the harnesses to your DUT, you don't need to guess which pinout it is: it is the default one.

The order of the arguments that applets use is:

  • reset(s)
  • strap(s)
  • clock(s)
  • controls
  • data

This ordering is done per functional block, e.g. for RGMII you would have PHY reset, then RX control/data, then TX control/data (inputs before outputs).

For this applet, I suggest:

  • AUDRST#
  • AUDMD
  • AUDCK
  • AUDSYNC
  • AUDATA

access.add_pins_argument(parser, "audsync", required=True)
access.add_pins_argument(parser, "audck", required=True)
access.add_pins_argument(parser, "audrst", required=True)
access.add_pins_argument(parser, "audmd", required=True)

@classmethod
def add_setup_arguments(cls, parser):
parser.add_argument(
"-f", "--frequency", metavar="FREQ", type=int, default=100,
help="set clock period to FREQ kHz (default: %(default)s)")

@classmethod
def add_run_arguments(cls, parser):
def address(x):
addr = int(x, 0)

# Check alignment
if addr % 4:
raise argparse.ArgumentTypeError(f"address {x} is not 4 byte aligned")

return addr

def length(x):
sz = int(x, 0)
# Check alignment
if sz % 4:
raise argparse.ArgumentTypeError(f"length {x} is not a multiple of 4")

return sz

p_operation = parser.add_subparsers(dest="operation", metavar="OPERATION", required=True)

p_read = p_operation.add_parser(
"read", help="read memory")
p_read.add_argument(
"address", metavar="ADDRESS", type=address,
help="read memory starting at address ADDRESS, needs to be 4 byte aligned")
p_read.add_argument(
"length", metavar="LENGTH", type=length,
help="read LENGTH bytes from memory, needs to be a multiple of 4")
p_read.add_argument(
"-f", "--file", metavar="FILENAME", type=argparse.FileType("wb"),
required=True,
help="write memory contents to FILENAME")

async def setup(self, args):
await self.aud_iface.clock.set_frequency(args.frequency * 1000)

def build(self, args):
with self.assembly.add_applet(self):
self.assembly.use_voltage(args.voltage)
self.aud_iface = AUDInterface(
self.logger,
self.assembly,
audata=args.audata,
audsync=args.audsync,
audck=args.audck,
audmd=args.audmd,
audrst=args.audrst)

@staticmethod
def _show_progress(done, total, status):
if sys.stdout.isatty():
sys.stdout.write("\r\033[0K")
if done < total:
sys.stdout.write(f"{done}/{total} bytes done ({done / total * 100:.2f}%)")
if status:
sys.stdout.write(f"; {status}")
sys.stdout.flush()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really have some sort of API for this. (No action is required from you here, this is a note for myself and an acknowledgement of a flaw.)


async def run(self, args):
self.logger.trace("Initializing AUD-II interface")
await self.aud_iface.init()

self.logger.trace("Reading data")
bs = 4

for i in range(args.address, args.address + args.length, bs):
data = await self.aud_iface.read(i, sz=bs)
args.file.write(data)
self._show_progress(i - args.address + bs, args.length, f"Read {data.hex()}")

self.logger.trace("Done")
Loading