Summary
This proposal introduces a single flag bit ExtendedLength in the MessageType field of the Harp 8-bit binary protocol. When the flag is set, the Length field is interpreted as a 32-bit little-endian unsigned integer instead of U8, and the trailing Checksum field is interpreted as a CRC-32 (IEEE 802.3 polynomial) instead of a U8 byte sum. All other fields and conventions of the binary protocol remain unchanged.
This is the binary protocol portion of the work that supersedes #186. Device-operation conventions for registers using extended-length framing, and device.yml schema changes for variable-length registers, are addressed in separate proposals linked below.
Motivation
Several Harp use cases require variable-length payloads exceeding the 254-byte ceiling of the current U8 Length field:
The previous ExtendedLength v1 specification, a 16-bit length field, was unimplemented in any device or client and was removed in #209. The new design preserves the existing wire format unchanged for messages that fit within U8 Length, and adds a clean opt-in extension for messages that do not.
The proposal also addresses the integrity of large messages. A U8 sum checksum across a multi-megabyte payload has an undetected-error probability of approximately 1 in 256 for random corruption, and is insensitive to byte permutations. CRC-32 (IEEE 802.3 polynomial) detects all single-bit errors and all burst errors of length 32 bits or fewer; for arbitrary random corruption, the undetected-error probability is approximately 2^-32, or about 1 in 4 billion.
Detailed Design
ExtendedLength flag bit
The ExtendedLength flag occupies bit 4 (mask 0x10) of the MessageType field. The updated structure is:
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | ExtendedLength | Error | 0 | Type | Type |
Bits 5-6 remain reserved. Bit 7 is left untouched to avoid colliding with Flag32 in the future 32-bit binary protocol.
When ExtendedLength is clear, the message is encoded exactly as defined by the current 8-bit binary protocol: U8 Length, U8 Checksum. No other field is affected.
When ExtendedLength is set, two fields change their wire encoding as described below.
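As an illustration of the bit layout above, the following minimal sketch splits a MessageType byte into its fields. The masks are taken from the table; the names `MessageTypeBits` and `decode_message_type` are hypothetical and not part of any Harp library.

```python
from typing import NamedTuple

# Masks derived from the MessageType table in this proposal.
EXTENDED_LENGTH_MASK = 0x10  # bit 4
ERROR_MASK = 0x08            # bit 3
TYPE_MASK = 0x03             # bits 1-0


class MessageTypeBits(NamedTuple):
    """Hypothetical container for the decoded MessageType fields."""
    extended_length: bool
    error: bool
    type_code: int


def decode_message_type(byte: int) -> MessageTypeBits:
    """Split one MessageType byte into flag bits and the 2-bit Type code."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("MessageType must fit in one byte")
    return MessageTypeBits(
        extended_length=bool(byte & EXTENDED_LENGTH_MASK),
        error=bool(byte & ERROR_MASK),
        type_code=byte & TYPE_MASK,
    )
```

For example, `decode_message_type(0x12)` reports the flag as set with Type code 2, while `decode_message_type(0x02)` reports the same Type code with the flag clear.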
Length field
When ExtendedLength is set, the Length field is encoded as a 32-bit little-endian unsigned integer, allowing Length values up to 2^32 - 1 bytes (just under 4 GiB).
The semantics of Length are unchanged from the current specification: it specifies the number of bytes after the Length field still required to complete the Harp message, including any optional Timestamp, Payload, and the trailing Checksum.
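A minimal sketch of the 32-bit little-endian encoding, using Python's standard `struct` module. The function names are hypothetical; the value passed in is the count of bytes that follow the Length field, as defined above.

```python
import struct


def pack_extended_length(remaining: int) -> bytes:
    """Encode the number of bytes following Length as a little-endian U32."""
    if not 0 <= remaining <= 0xFFFFFFFF:
        raise ValueError("Length does not fit in 32 bits")
    return struct.pack("<I", remaining)


def unpack_extended_length(field: bytes) -> int:
    """Decode a 4-byte little-endian Length field."""
    if len(field) != 4:
        raise ValueError("extended Length field must be exactly 4 bytes")
    (remaining,) = struct.unpack("<I", field)
    return remaining
```

A Length of 70000 (0x00011170) would appear on the wire as the bytes `70 11 01 00`, least significant byte first.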
Checksum field
When ExtendedLength is set, the trailing Checksum field is encoded as a CRC-32 (IEEE 802.3 polynomial), specified by:
- Polynomial: 0x04C11DB7
- Reflected input and output: yes
- Initial value: 0xFFFFFFFF
- Final XOR value: 0xFFFFFFFF
- Check value (CRC of the ASCII string 123456789): 0xCBF43926
The Checksum is computed over all preceding bytes of the message, starting at MessageType. The receiver MUST calculate the CRC-32 over all received bytes that precede the Checksum field and compare the result against the value in this field. If the two values do not match, the Harp message SHOULD be discarded.
This is the same CRC-32 available in .NET as System.IO.Hashing.Crc32 in the System.IO.Hashing NuGet package and exposed as DMA sniffer mode 0x0 on RP2040 and RP2350 in the Raspberry Pi Pico SDK. The full parameter specification is catalogued under the name CRC-32/ISO-HDLC at the reveng CRC catalogue; the Wikipedia polynomial representations table is a useful entry point for further context.
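For reference, the parameters listed above can be exercised with a short bitwise implementation. This is a sketch for checking the parameter set, not an optimized routine; the function name is hypothetical. Because input and output are reflected, the loop uses 0xEDB88320, which is the bit-reversed form of 0x04C11DB7. Python's standard `zlib.crc32` implements the same CRC and serves as a cross-check.

```python
import zlib

REFLECTED_POLY = 0xEDB88320  # bit-reversed 0x04C11DB7


def crc32_iso_hdlc(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32 with init 0xFFFFFFFF, reflected I/O, final XOR 0xFFFFFFFF.

    Passing a previous result as `crc` continues the computation,
    mirroring the calling convention of zlib.crc32.
    """
    crc ^= 0xFFFFFFFF  # apply initial value (or undo the previous final XOR)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ REFLECTED_POLY
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR


# The check value from the parameter list, and agreement with zlib.
assert crc32_iso_hdlc(b"123456789") == 0xCBF43926
assert crc32_iso_hdlc(b"123456789") == zlib.crc32(b"123456789")
```

A table-driven variant trades 1 KB of memory for speed, which is the configuration assumed in the cost estimates under Drawbacks.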
Parsing requirements
A parser implementing this extension MUST:
- Recognize the ExtendedLength flag bit in the MessageType byte before parsing any subsequent field.
- When the flag is set, decode Length as 32-bit little-endian and Checksum as CRC-32 with the parameters specified above.
- When the flag is clear, decode Length as U8 and Checksum as U8 byte sum, exactly as in the current specification.
Behavior of devices that do not support extended-length messages, and conventions on which message types may set the flag, are specified in the related proposal at #221.
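The parsing requirements can be sketched as a single dispatch on the flag bit. The sketch below operates on one complete frame held in memory and makes two assumptions that this proposal does not state: the CRC-32 value is placed on the wire in little-endian byte order, and the regular U8 checksum is the sum of all preceding bytes modulo 256. The helper names `build_frame` and `split_frame` are hypothetical, and `body` stands for everything between the Length and Checksum fields.

```python
import zlib
from typing import Tuple

EXTENDED_LENGTH_MASK = 0x10


def _field_sizes(message_type: int) -> Tuple[int, int]:
    """Return (Length size, Checksum size) in bytes for this framing."""
    return (4, 4) if message_type & EXTENDED_LENGTH_MASK else (1, 1)


def _checksum(covered: bytes, size: int) -> int:
    """CRC-32 for extended frames, U8 byte sum for regular frames."""
    return zlib.crc32(covered) if size == 4 else sum(covered) & 0xFF


def build_frame(message_type: int, body: bytes) -> bytes:
    """Assemble MessageType, Length, body and Checksum into one frame."""
    length_size, checksum_size = _field_sizes(message_type)
    length = len(body) + checksum_size  # bytes remaining after Length
    covered = bytes([message_type]) + length.to_bytes(length_size, "little") + body
    # ASSUMPTION: checksum bytes are little-endian on the wire.
    return covered + _checksum(covered, checksum_size).to_bytes(checksum_size, "little")


def split_frame(frame: bytes) -> Tuple[int, bytes]:
    """Validate one frame and return (MessageType, body); raise on any error."""
    if not frame:
        raise ValueError("empty frame")
    message_type = frame[0]  # the flag is inspected before any other field
    length_size, checksum_size = _field_sizes(message_type)
    header_size = 1 + length_size
    if len(frame) < header_size:
        raise ValueError("truncated header")
    length = int.from_bytes(frame[1:header_size], "little")
    if length < checksum_size or len(frame) != header_size + length:
        raise ValueError("Length does not match frame size")
    covered = frame[:-checksum_size]
    received = int.from_bytes(frame[-checksum_size:], "little")
    if received != _checksum(covered, checksum_size):
        raise ValueError("checksum mismatch")
    return message_type, frame[header_size:-checksum_size]
```

A streaming parser on a microcontroller would make the same decision after reading the first byte, then read either 1 or 4 Length bytes before consuming the remainder of the message.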
Drawbacks
The proposal introduces two distinct framings, regular and extended, into the binary protocol. Parsers must dispatch between them based on the ExtendedLength flag bit. The branching cost is single-cycle on every supported microcontroller. The additional code is a small flag check plus alternative Length and Checksum parsing paths.
CRC-32 computation has higher cost than a U8 byte sum. The cost is paid only by devices that opt into supporting extended-length messages; devices that never receive or send extended-length messages do not need to compute CRC-32 at all.
- harp-tech/core.atxmega: software CRC-32 on the ATxmega family (8-bit AVR at 32 MHz) with a 1 KB lookup table costs roughly 5-10 cycles per byte, or roughly 2.5 to 5 seconds of CPU time per 16 MB message. The ATxmega family includes a hardware CRC peripheral, but we were unable to confirm whether the AU variants used by this core (32A4U / 64A4U / 128A1U / 128A4U) support the IEEE 802.3 polynomial directly.
- harp-tech/core.pico: the RP2040 and RP2350 both expose hardware-accelerated CRC-32 via the DMA sniffer at no additional CPU cost. A software fallback is also available via the Pico SDK at roughly 2 cycles per byte, putting a 16 MB message well under a second on either chip.
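The arithmetic behind these estimates is cycles per byte times message size, divided by clock frequency. The sketch below reproduces it, treating 16 MB as 16 MiB and assuming a 125 MHz system clock for the Pico case; that clock value is an assumption for illustration, not a figure from this proposal.

```python
def crc_seconds(message_bytes: int, cycles_per_byte: float, clock_hz: float) -> float:
    """Estimated CPU time for a software CRC over one message."""
    return message_bytes * cycles_per_byte / clock_hz


MESSAGE_BYTES = 16 * 1024 * 1024  # 16 MiB

# ATxmega at 32 MHz, 5 to 10 cycles per byte (figures from the text above).
atxmega_low = crc_seconds(MESSAGE_BYTES, 5, 32_000_000)
atxmega_high = crc_seconds(MESSAGE_BYTES, 10, 32_000_000)

# Pico software fallback at 2 cycles per byte.
# ASSUMPTION: 125 MHz system clock, chosen only for illustration.
pico = crc_seconds(MESSAGE_BYTES, 2, 125_000_000)

print(f"ATxmega: {atxmega_low:.1f} to {atxmega_high:.1f} s, Pico: {pico:.2f} s")
```

Under these assumptions the upper end of the ATxmega range lands near 5 seconds, and the Pico software path stays well under one second.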
Alternatives
Multiple new MessageType values
Considered: extending Type to 3 bits and allocating new values for blob-flavored ops, e.g. BLOB_READ, BLOB_WRITE, BLOB_EVENT. Rejected in favor of a single flag bit because the flag keeps parser dispatch simple: bit 4 lies outside the existing Type bits, so a parser that masks MessageType & 0x03 still extracts the underlying op correctly, and the flag can be tested independently with MessageType & 0x10.
24-bit Length via reuse of removed fields
The original framing in #186 reinterpreted Length (1 byte) plus the now-removed ExtendedLength (2 bytes) as a single U24 value. Rejected in favor of a clean contiguous U32 field. The cost difference is one byte per extended message, and the U32 field aligns better with a future 32-bit protocol.
U32 byte sum instead of CRC-32
Considered: a running U32 sum of bytes truncated to 32 bits. Rejected because U32 sum is qualitatively weak against byte permutations and burst errors that compensate, both common on serial transports. CRC-32 detects all single-bit errors, all burst errors of length 32 bits or fewer, and provides an undetected-error probability of approximately 2^-32 for arbitrary random corruption.
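The weakness of a byte sum can be shown with a small example. In the sketch below, one corrupted copy swaps two bytes and another applies compensating +1 and -1 changes; both leave the sum untouched, while CRC-32 changes in each case because the corruption spans fewer than 32 bits. The helper name `u32_byte_sum` is hypothetical.

```python
import zlib


def u32_byte_sum(data: bytes) -> int:
    """Running sum of bytes truncated to 32 bits (the rejected alternative)."""
    return sum(data) & 0xFFFFFFFF


original = bytes([0x10, 0x20, 0x30, 0x40])
swapped = bytes([0x20, 0x10, 0x30, 0x40])      # first two bytes exchanged
compensated = bytes([0x11, 0x1F, 0x30, 0x40])  # +1 on one byte, -1 on the next

# The byte sum cannot tell the three messages apart.
sums_equal = u32_byte_sum(original) == u32_byte_sum(swapped) == u32_byte_sum(compensated)

# CRC-32 distinguishes both corrupted copies from the original.
crc_detects_swap = zlib.crc32(original) != zlib.crc32(swapped)
crc_detects_compensation = zlib.crc32(original) != zlib.crc32(compensated)

print(sums_equal, crc_detects_swap, crc_detects_compensation)  # True True True
```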
Mandatory cryptographic digest
Considered: requiring devices to compute a cryptographic digest, e.g. SHA-1 or SHA-256, of received messages for integrity. Rejected because it mandates crypto on every device. ATxmega cost is roughly 50 seconds for 16 MB software SHA-256, which fails the constraint that the protocol must remain implementable on small MCUs without hardware acceleration.
Per-register CRC-32 trailer as an OPTIONAL extension
Considered: keeping a U32 sum at the protocol level and offering a per-register CRC-32 opt-in for content fingerprinting. Rejected as redundant once CRC-32 is mandated at the protocol level for extended-length frames.
Unresolved Questions
None at the binary protocol level. All remaining questions are device-operation conventions or schema concerns and are tracked in the related issues below.
Related Issues
Design Meetings
To be populated as this proposal progresses through SRM.