Add ExtendedLength flag to the binary protocol #218

@glopesdev

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Summary

This proposal introduces a single flag bit ExtendedLength in the MessageType field of the Harp 8-bit binary protocol. When the flag is set, the Length field is interpreted as a 32-bit little-endian unsigned integer instead of U8, and the trailing Checksum field is interpreted as a CRC-32 (IEEE 802.3 polynomial) instead of a U8 byte sum. All other fields and conventions of the binary protocol remain unchanged.

This is the binary protocol portion of the work that supersedes #186. Device-operation conventions for registers using extended-length framing, and device.yml schema changes for variable-length registers, are addressed in separate proposals linked below.

Motivation

Several Harp use cases require variable-length payloads exceeding the 254-byte ceiling of the current U8 Length field:

The previous ExtendedLength v1 specification, a 16-bit length field, was unimplemented in any device or client and was removed in #209. The new design preserves the existing wire format unchanged for messages that fit within U8 Length, and adds a clean opt-in extension for messages that do not.

The proposal also addresses the integrity of large messages. A U8 sum checksum across a multi-megabyte payload has an undetected-error probability of approximately 1 in 256 for random corruption, and is insensitive to byte permutations. CRC-32 (IEEE 802.3 polynomial) detects all single-bit errors and all burst errors of length 32 bits or fewer; for arbitrary random corruption, the undetected-error probability is approximately 2^-32, or about 1 in 4 billion.

Detailed Design

ExtendedLength flag bit

The ExtendedLength flag occupies bit 4 (mask 0x10) of the MessageType field. The updated structure is:

Bit    7  6  5  4               3      2  1     0
Field  0  0  0  ExtendedLength  Error  0  Type  Type

Bits 5-6 remain reserved. Bit 7 is left untouched to avoid colliding with Flag32 in the future 32-bit binary protocol.

When ExtendedLength is clear, the message is encoded exactly as defined by the current 8-bit binary protocol: U8 Length, U8 Checksum. No other field is affected.

When ExtendedLength is set, two fields change their wire encoding as described below.
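As an illustration of the bit layout above, a minimal sketch of how a client might split the MessageType byte. The constant and function names are hypothetical and not part of the specification; only the mask values come from the table.

```python
# Bit masks for the MessageType byte, following the bit layout above.
# The names are illustrative only; the values come from the table.
TYPE_MASK = 0x03             # bits 0-1: Type
ERROR_FLAG = 0x08            # bit 3: Error
EXTENDED_LENGTH_FLAG = 0x10  # bit 4: ExtendedLength (this proposal)


def decode_message_type(message_type: int) -> tuple[int, bool, bool]:
    """Split a MessageType byte into (type, error, extended_length)."""
    return (
        message_type & TYPE_MASK,
        bool(message_type & ERROR_FLAG),
        bool(message_type & EXTENDED_LENGTH_FLAG),
    )
```

For example, `decode_message_type(0x12)` yields type 2 with the ExtendedLength flag set and the Error flag clear.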

Length field

When ExtendedLength is set, the Length field is encoded as a 32-bit little-endian unsigned integer. The maximum Length value is 2^32 - 1, allowing for messages of up to roughly 4 GiB.

The semantics of Length are unchanged from the current specification: it specifies the number of bytes after the Length field still required to complete the Harp message, including any optional Timestamp, Payload, and the trailing Checksum.
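A minimal sketch of decoding the Length field under both framings, assuming the frame is held in a byte buffer with MessageType at index 0. The function name is hypothetical.

```python
def decode_length(frame: bytes, extended: bool) -> tuple[int, int]:
    """Return (length, offset of the first byte after the Length field).

    frame[0] is MessageType; the Length field starts at frame[1].
    """
    length_size = 4 if extended else 1
    if len(frame) < 1 + length_size:
        raise ValueError("not enough bytes to decode Length")
    # For a single byte, little-endian decoding is just the byte value.
    length = int.from_bytes(frame[1:1 + length_size], "little")
    return length, 1 + length_size
```

Because Length counts every byte after the Length field, the total frame size is the returned offset plus the returned length.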

Checksum field

When ExtendedLength is set, the trailing Checksum field is encoded as a CRC-32 (IEEE 802.3 polynomial), specified by:

  • Polynomial: 0x04C11DB7
  • Reflected input and output: yes
  • Initial value: 0xFFFFFFFF
  • Final XOR value: 0xFFFFFFFF
  • Check value (CRC of the ASCII string 123456789): 0xCBF43926

The Checksum is computed over all preceding bytes of the message. The receiver MUST calculate the CRC-32 over all received bytes preceding the Checksum field and compare it against the value in this field. If the two values do not match, the Harp message SHOULD be discarded.

This is the same CRC-32 available in .NET as System.IO.Hashing.Crc32 in the System.IO.Hashing NuGet package and exposed as DMA sniffer mode 0x0 on RP2040 and RP2350 in the Raspberry Pi Pico SDK. The full parameter specification is catalogued under the name CRC-32/ISO-HDLC at the reveng CRC catalogue; the Wikipedia polynomial representations table is a useful entry point for further context.
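For reference, a bitwise sketch of the CRC-32 parameters listed above, checked against the published check value. It is written for clarity rather than speed, and the function name is hypothetical. Python's standard `zlib.crc32` implements the same CRC and is used here only as a cross-check.

```python
import zlib


def crc32_iso_hdlc(data: bytes) -> int:
    """Bitwise CRC-32 using the parameters listed above."""
    crc = 0xFFFFFFFF  # initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0xEDB88320 is the polynomial 0x04C11DB7 with its bits reversed,
            # which is how reflected input and output are implemented.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final XOR


assert crc32_iso_hdlc(b"123456789") == 0xCBF43926
assert zlib.crc32(b"123456789") == 0xCBF43926
```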

Parsing requirements

A parser implementing this extension MUST:

  • Recognize the ExtendedLength flag bit in the MessageType byte before parsing any subsequent field.
  • When the flag is set, decode Length as 32-bit little-endian and Checksum as CRC-32 with the parameters specified above.
  • When the flag is clear, decode Length as U8 and Checksum as U8 byte sum, exactly as in the current specification.
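The requirements above could be sketched as follows for a complete frame already held in memory. Two points are assumptions and not stated in this proposal: the function names are hypothetical, and the CRC-32 value is assumed to be transmitted little-endian like the Length field. The U8 checksum is taken to be the sum of all preceding bytes modulo 256.

```python
import zlib

EXTENDED_LENGTH_FLAG = 0x10  # bit 4 of MessageType


def _field_sizes(message_type: int) -> tuple[int, int]:
    """Return (Length size, Checksum size) in bytes for this framing."""
    return (4, 4) if message_type & EXTENDED_LENGTH_FLAG else (1, 1)


def _checksum(covered: bytes, extended: bool) -> int:
    return zlib.crc32(covered) if extended else sum(covered) & 0xFF


def build_frame(message_type: int, body: bytes) -> bytes:
    """Hypothetical helper: frame `body` (all bytes between Length and Checksum)."""
    length_size, checksum_size = _field_sizes(message_type)
    length = len(body) + checksum_size  # Length includes the trailing Checksum
    covered = bytes([message_type]) + length.to_bytes(length_size, "little") + body
    checksum = _checksum(covered, checksum_size == 4)
    # ASSUMPTION: CRC-32 is sent little-endian; the proposal does not say.
    return covered + checksum.to_bytes(checksum_size, "little")


def split_frame(frame: bytes) -> tuple[int, bytes, bool]:
    """Return (message_type, body, checksum_ok) for one complete frame."""
    if len(frame) < 2:
        raise ValueError("frame too short")
    message_type = frame[0]  # the flag is inspected before any other field
    length_size, checksum_size = _field_sizes(message_type)
    header_size = 1 + length_size
    length = int.from_bytes(frame[1:header_size], "little")
    if length < checksum_size or len(frame) != header_size + length:
        raise ValueError("frame size does not match the Length field")
    covered = frame[:-checksum_size]  # all bytes preceding Checksum
    received = int.from_bytes(frame[-checksum_size:], "little")
    ok = received == _checksum(covered, checksum_size == 4)
    return message_type, frame[header_size:-checksum_size], ok
```

A streaming parser would apply the same dispatch incrementally: read MessageType, then 1 or 4 Length bytes, then exactly Length further bytes.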

Behavior of devices that do not support extended-length messages, and conventions on which message types may set the flag, are specified in the related proposal at #221.

Drawbacks

The proposal introduces two distinct framings, regular and extended, into the binary protocol. Parsers must dispatch between them based on the ExtendedLength flag bit. The branching cost is single-cycle on every supported microcontroller. The additional code is a small flag check plus alternative Length and Checksum parsing paths.

CRC-32 computation has higher cost than a U8 byte sum. The cost is paid only by devices that opt into supporting extended-length messages; devices that never receive or send extended-length messages do not need to compute CRC-32 at all.

  • harp-tech/core.atxmega: software CRC-32 on the ATxmega family, 8-bit AVR at 32 MHz, with a 1 KB lookup table costs roughly 5-10 cycles per byte, or roughly 2.6 to 5.2 seconds of CPU time per 16 MB message. The ATxmega family includes a hardware CRC peripheral, but we were unable to confirm whether the AU variants used by this core (32A4U / 64A4U / 128A1U / 128A4U) support the IEEE 802.3 polynomial directly.
  • harp-tech/core.pico: the RP2040 and RP2350 both expose hardware-accelerated CRC-32 via the DMA sniffer at no additional CPU cost. A software fallback is also available via the Pico SDK at roughly 2 cycles per byte, putting a 16 MB message well under a second on either chip.
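The timing estimates above follow from cycles per byte times message size divided by clock frequency. A small sketch of that arithmetic, with two assumptions not stated in the proposal: 16 MB is read as 16 MiB, and the Pico figure uses a 125 MHz system clock.

```python
MESSAGE_BYTES = 16 * 1024 * 1024  # ASSUMPTION: "16 MB" read as 16 MiB


def crc_seconds(cycles_per_byte: float, clock_hz: float,
                message_bytes: int = MESSAGE_BYTES) -> float:
    """Estimated CPU time to checksum one message in software."""
    return cycles_per_byte * message_bytes / clock_hz


# ATxmega at 32 MHz, 5-10 cycles per byte (figures from the text above).
atxmega_low = crc_seconds(5, 32e6)
atxmega_high = crc_seconds(10, 32e6)

# Pico software fallback at about 2 cycles per byte.
# ASSUMPTION: 125 MHz system clock; the proposal does not state one.
pico = crc_seconds(2, 125e6)

print(f"ATxmega: {atxmega_low:.2f} s to {atxmega_high:.2f} s")
print(f"Pico software fallback: {pico:.2f} s")
```

Under these assumptions the ATxmega estimate spans roughly 2.6 to 5.2 seconds and the Pico software fallback comes to roughly 0.27 seconds.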

Alternatives

Multiple new MessageType values

Considered: extending Type to 3 bits and allocating new values for blob-flavored ops, e.g. BLOB_READ, BLOB_WRITE, BLOB_EVENT. Rejected in favor of a single flag bit because the flag preserves clean parser dispatch and the type-vs-flag distinction dissolves under the bit allocation: bit 4 acts as a mask over the existing Type, so older parsers masking Type & 0x03 extract the underlying op correctly while still being able to detect the flag.

24-bit Length via reuse of removed fields

The original framing in #186 reinterpreted Length, 1 byte, plus the now-removed ExtendedLength, 2 bytes, as a single U24 value. Rejected in favor of a clean contiguous U32 field. The cost difference is one byte per extended message, with aesthetic alignment toward a future 32-bit protocol.

U32 byte sum instead of CRC-32

Considered: a running U32 sum of bytes truncated to 32 bits. Rejected because U32 sum is qualitatively weak against byte permutations and burst errors that compensate, both common on serial transports. CRC-32 detects all single-bit errors, all burst errors of length 32 bits or fewer, and provides an undetected-error probability of approximately 2^-32 for arbitrary random corruption.
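A short illustration of the weakness described above, using a hypothetical four-byte payload. Both corruptions leave the U32 byte sum unchanged, while CRC-32 changes in each case; since both error patterns span fewer than 32 bits, CRC-32 is guaranteed to detect them.

```python
import zlib


def u32_byte_sum(data: bytes) -> int:
    """Running sum of bytes truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF


original = bytes([0x01, 0x02, 0x03, 0x04])
swapped = bytes([0x02, 0x01, 0x03, 0x04])      # first two bytes transposed
compensated = bytes([0x00, 0x03, 0x03, 0x04])  # one byte -1, its neighbor +1

# The byte sum cannot tell any of the three apart.
assert u32_byte_sum(swapped) == u32_byte_sum(original)
assert u32_byte_sum(compensated) == u32_byte_sum(original)

# CRC-32 distinguishes both corrupted payloads from the original.
assert zlib.crc32(swapped) != zlib.crc32(original)
assert zlib.crc32(compensated) != zlib.crc32(original)
```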

Mandatory cryptographic digest

Considered: requiring devices to compute a cryptographic digest, e.g. SHA-1 or SHA-256, of received messages for integrity. Rejected because it mandates crypto on every device. ATxmega cost is roughly 50 seconds for 16 MB software SHA-256, which fails the constraint that the protocol must remain implementable on small MCUs without hardware acceleration.

Per-register CRC-32 trailer as an OPTIONAL extension

Considered: keeping a U32 sum at the protocol level and offering a per-register CRC-32 opt-in for content fingerprinting. Rejected as redundant once CRC-32 is mandated at the protocol level for extended-length frames.

Unresolved Questions

None at the binary protocol level. All remaining questions are device-operation conventions or schema concerns and are tracked in the related issues below.

Related Issues

Design Meetings

To be populated as this proposal progresses through SRM.
