
Add a Domain Specific Language to decode packets (v2)#3516

Open
a-rahimi wants to merge 40 commits into merbanan:master from a-rahimi:compiler

Conversation

@a-rahimi
Contributor

@a-rahimi a-rahimi commented Apr 13, 2026

A compiler for protocol specifications

This is my second attempt at a DSL after I closed PR3500. The compiler itself is ~2k lines of Python. The rest of the code is protocol files written in the DSL, and their corresponding generated output (which we wouldn't normally check in).

The proto_compiler lets you describe rtl_433 decoders as Python classes, similar to how the flex decoder lets you describe a protocol as a declarative key-value spec in a .conf file. But unlike the flex decoder, which interprets that spec at runtime, the proto_compiler compiles the Python class into C source code. The compiler is itself a Python program that looks at the instances of the classes you've authored, and generates a complete r_device decoder.

The compiler is not a generic Python-to-C compiler. Instead, it converts a data structure, which happens to be an instance of a Python class, to C code. That means you get to use all the niceties of Python to populate your data structure, like inheritance, list comprehensions, etc.

For examples, see:

My goal with this review is to get feedback on the DSL and on the general direction. By default, I'd start porting more protocols and adjusting the DSL as I go. I want feedback about whether this is valuable to the project.

Protocol structure

In this language, a decoder is a subclass of Decoder (or Dispatcher for a multi-framing device). The class has five main pieces: a modulation_config() method that fills the r_device struct, a prepare() method that describes the bitbuffer pipeline, bit-field declarations as class annotations, methods that derive values from those fields, and an optional to_json that shapes the data_make call.

ModulationConfig

Radio parameters that belong in the r_device struct (modulation kind, pulse timing, limits) are returned from a modulation_config(self) method as a ModulationConfig named tuple.

from proto_compiler.dsl import Bits, Decoder, Literal, Modulation, ModulationConfig

class thermopro_tp211b(Decoder):
    def modulation_config(self):
        return ModulationConfig(
            device_name="ThermoPro TP211B Thermometer",
            modulation=Modulation.FSK_PULSE_PCM,
            short_width=105,
            long_width=105,
            reset_limit=1500,
        )
    …

prepare() and the bitbuffer pipeline

You define all the bitbuffer preprocessing (preamble search, Manchester decode, inversion, and so on) in a method called prepare. The method transforms the bitbuffer by applying the following operations in the given order, updating row and offset variables that describe from which row, and at which bit offset, parsing of the resulting bitbuffer starts:

  • invert() emits bitbuffer_invert(bitbuffer);, which flips every bit in every row.
  • reflect() applies per-row reflect_bytes, bit-reversing each byte.
  • find_repeated_row(min_repeats, min_bits, max_bits=None) locates a repeated row and pins the tip's row to it. With max_bits=None the length must match min_bits exactly; otherwise the length must be <= max_bits.
  • search_preamble(pattern, bit_length=None, scan_all_rows=False) searches for a bit pattern and advances the tip's offset to just after the match. bit_length defaults to pattern.bit_length(). Use scan_all_rows=True when the match may appear on any row. The compiler emits an outer row-scan loop and records which row matched.
  • skip_bits(n) adjusts the tip's offset by n bits. Positive values skip forward; negative values skip backward. The C runtime checks that the offset does not underflow.
  • manchester_decode(max_bits=0) decodes Manchester-encoded bits.
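To make the last step concrete, Manchester decoding halves the bit count by collapsing symbol pairs. Here is a minimal plain-Python sketch of the idea, using one common pairing convention ("01" → 0, "10" → 1); the actual convention and buffer handling in rtl_433's bitbuffer routines may differ, so treat this as illustrative only:

```python
def manchester_decode(bits: str) -> str:
    """Decode a '0'/'1' string two symbols at a time: '01' -> '0',
    '10' -> '1' (one common convention; rtl_433's may differ).
    Stops at the first invalid pair."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        pair = bits[i:i + 2]
        if pair == "01":
            out.append("0")
        elif pair == "10":
            out.append("1")
        else:
            break  # invalid pair ends the decode
    return "".join(out)

decoded = manchester_decode("01101001" + "11")  # decodes "0110", stops at "11"
```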

Here's an example:

class thermopro_tp211b(Decoder):
    …

    def prepare(self, buf):
        return buf.search_preamble(0x552DD4, bit_length=24).skip_bits(24)

    …
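The search_preamble(0x552DD4, …) step above behaves conceptually like the following plain-Python sketch. The helper name and the string-of-bits representation are illustrative, not the compiler's actual code; the generated C calls into rtl_433's bitbuffer search routines instead:

```python
def search_preamble(bits: str, pattern: int, bit_length: int) -> int:
    """Return the offset just past the first match of `pattern` (taken
    as `bit_length` bits, MSB first) in `bits`, a string of '0'/'1'
    characters; -1 if the pattern is absent. Illustrative only."""
    needle = format(pattern, f"0{bit_length}b")
    pos = bits.find(needle)
    return -1 if pos < 0 else pos + bit_length

# The thermopro example searches for the 24-bit preamble 0x552DD4;
# parsing then starts just past the match (here, offset 26).
row = "11" + format(0x552DD4, "024b") + "10101010"
offset = search_preamble(row, 0x552DD4, 24)
```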

Fields

Bit fields are parsed automatically. They're declared as fields in your Protocol class using type annotations built from the Bits, Literal, Repeat, and Rows type constructors:

class thermopro_tp211b(Decoder):
    …

    id: Bits[24]
    flags: Bits[4]
    temp_raw: Bits[12]
    fixed_aa: Literal[0xAA, 8]
    checksum: Bits[16]

The compiler converts these into bit-extraction code at the corresponding offsets. Bits[n] extracts an n-bit unsigned integer. Literal[value, n] extracts n bits and asserts that they equal value, returning DECODE_FAIL_SANITY on mismatch. Fields whose names start with _ are skipped.
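Conceptually, each Bits[n] declaration becomes an MSB-first extraction that advances a running bit_pos, much like rtl_433's bitrow_get_bits. The following plain-Python model is a sketch of that behavior (the helper name and variable layout are illustrative, not the generated code):

```python
def get_bits(row: bytes, bit_pos: int, n: int) -> int:
    """Extract n bits starting at bit_pos, MSB-first within each byte."""
    value = 0
    for i in range(bit_pos, bit_pos + n):
        byte, bit = divmod(i, 8)
        value = (value << 1) | ((row[byte] >> (7 - bit)) & 1)
    return value

# Walking the thermopro-style layout over a made-up row:
row = bytes([0x12, 0x34, 0x5A, 0x7C, 0x9A])
bit_pos = 0
id_ = get_bits(row, bit_pos, 24); bit_pos += 24       # 0x12345A
flags = get_bits(row, bit_pos, 4); bit_pos += 4       # 0x7
temp_raw = get_bits(row, bit_pos, 12); bit_pos += 12  # 0xC9A
```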

Methods

A method is a Python function defined in the decoder class whose body is a simple arithmetic expression, compiled directly to C. Methods act like computed counterparts of fields: they can be used like fields in most places, but they derive their value from the bit fields that were parsed, or from other methods.

For example:

class thermopro_tp211b(Decoder):
    …

    def temperature_c(self, temp_raw: int) -> float:
        return (temp_raw - 500) * 0.1

compiles to a function definition

static inline float thermopro_tp211b_temperature_c(int temp_raw) {
    return ((temp_raw - 0x1f4) * 0.1);
}

and, wherever self.temperature_c appears in to_json, a call like this is emitted:

"temperature_C", "", DATA_DOUBLE, (double)thermopro_tp211b_temperature_c(temp_raw)

TODO: introduce here the set of operators that an expression can use. you can say "in addition to the usual arithmetic operators (-,+, &, etc), you can use predefined operators like Reverse8, ..."

External methods

If the method is too complicated to express as a simple Python arithmetic expression, you can specify it as a C function in a companion header file. In that case, you still have to define the signature of the method in Python, but you can give it a bare body using the Python ellipsis literal .... We call such methods "external".

For example,

class thermopro_tp211b(Decoder):
    …
    def validate_checksum(self, id: int, flags: int, temp_raw: int, checksum: int,
                          fail_value=DecodeFail.MIC) -> bool: ...

The compiler writes #include "thermopro_tp211b.h" into the generated C. The header is expected to define static inline bool thermopro_tp211b_validate_checksum(int id, int flags, int temp_raw, int checksum) and is checked in under src/devices/<stem>.h.

validate_* hooks and DecodeFail

A method whose name starts with validate_ is a validation hook. Hooks share the method machinery, with two twists:

  1. The final parameter of a validation hook must be fail_value=DecodeFail.* with a default. The compiler reads that default to decide which DECODE_* token to return on failure. fail_value is not passed in the generated C call.
  2. The call is emitted automatically — the user never invokes a hook from to_json or anywhere else. As soon as every field declared in the hook's signature has been parsed, the compiler emits a guard that exits with the fail_value token if the hook returns false.

For example, given

    sensor_id: Bits[8]
    checksum: Bits[8]

    def validate_checksum(self, sensor_id, checksum,
                          fail_value=DecodeFail.MIC) -> bool:
        return ((sensor_id & 0xFF) + checksum) & 0xFF == 0

the compiler emits the hook definition near the top of the file and, inside the decode function, injects the guard right after both dependencies are in scope:

static inline bool prefix_validate_checksum(int sensor_id, int checksum) {
    return (((sensor_id & 0xff) + checksum) & 0xff) == 0;
}
/* … */
int sensor_id = bitrow_get_bits(b, bit_pos, 8);
bit_pos += 8;
int checksum = bitrow_get_bits(b, bit_pos, 8);
bit_pos += 8;
if (!prefix_validate_checksum(sensor_id, checksum))
    return DECODE_FAIL_MIC;

As with regular methods, replacing the body with ... turns the hook external: the compiler emits only a declaration and the user writes the implementation in the companion header. DecodeFail is re-exported from proto_compiler for use in hook signatures.

Repeat (variable-length sub-layouts in one row)

Some payloads contain a variable number of same-shape records back-to-back in the same row. The Repeat[count_expr, SubProtocol] field spec handles this. The resulting code parses SubProtocol's layout count_expr times in a row and stores the result in parallel arrays.

For example,

class sensor_reading(Repeatable):
    sensor_type: Bits[4]
    reading: Bits[12]

class lacrosse_tx31u(Decoder):
    …
    measurements: Bits[3]
    readings: Repeat[FieldRef("measurements"), sensor_reading]

Each sub-field becomes a C array sized by count_expr, populated inside a for loop that advances bit_pos by the sub-layout width per iteration:

int readings_sensor_type[measurements];
int readings_reading[measurements];
for (int _i = 0; _i < measurements; _i++) {
    readings_sensor_type[_i] = bitrow_get_bits(b, bit_pos, 4);
    bit_pos += 4;
    readings_reading[_i] = bitrow_get_bits(b, bit_pos, 12);
    bit_pos += 12;
}

count_expr may be an integer literal or the name of a field wrapped in FieldRef(…). The SubProtocol class must inherit from Repeatable and may declare only Bits[] and Literal[] fields. Nested Repeatables aren't supported.
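In plain Python terms, the loop the compiler emits behaves like this sketch (the get_bits helper models bitrow_get_bits; all names here are illustrative, not the compiler's):

```python
def get_bits(row: bytes, bit_pos: int, n: int) -> int:
    """MSB-first bit extraction, modeling bitrow_get_bits."""
    value = 0
    for i in range(bit_pos, bit_pos + n):
        byte, bit = divmod(i, 8)
        value = (value << 1) | ((row[byte] >> (7 - bit)) & 1)
    return value

def parse_readings(row: bytes, bit_pos: int, measurements: int):
    """Parse `measurements` back-to-back {sensor_type: 4 bits,
    reading: 12 bits} records into parallel arrays, mirroring the
    generated for loop that advances bit_pos by the sub-layout width."""
    sensor_type, reading = [], []
    for _ in range(measurements):
        sensor_type.append(get_bits(row, bit_pos, 4)); bit_pos += 4
        reading.append(get_bits(row, bit_pos, 12)); bit_pos += 12
    return sensor_type, reading, bit_pos

# Two 16-bit records back-to-back: 0x1ABC then 0x2DEF.
types, readings, _ = parse_readings(bytes([0x1A, 0xBC, 0x2D, 0xEF]), 0, 2)
```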

Rows (fields that live on specific bitbuffer rows)

Some protocols spread information across multiple rows of the raw bitbuffer rather than packing everything into one. The Rows[row_spec, SubProtocol] field spec handles these situations. It parses a sub-layout once for each row specified in row_spec:

class row_layout(Repeatable):
    b0: Bits[8]
    b1: Bits[8]
    b2: Bits[8]
    b3: Bits[8]
    b4: Bits[4]

class alectov1(Decoder):
    …
    cells: Rows[(1, 2, 3, 4, 5, 6), row_layout, ((1, 36),)]

row_spec can be either:

  • A tuple of row indices (row1, row2, …). The emitted code parses exactly those rows.
  • FirstValid(bits). The emitted code scans every row for the first one whose length equals bits and for which no validate_* hook fails.

In methods, you can reference the resulting fields as cells[1].b0.
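The FirstValid(bits) scan corresponds to a loop like this plain-Python sketch (the validator callback stands in for the decoder's validate_* hooks; names are illustrative):

```python
def first_valid(rows, want_bits, validate):
    """Return the index of the first row whose length equals `want_bits`
    and whose payload passes `validate`; None if no row qualifies."""
    for r, (payload, nbits) in enumerate(rows):
        if nbits == want_bits and validate(payload):
            return r
    return None

rows = [
    (bytes([0x00]), 7),   # wrong length: skipped
    (bytes([0xFF]), 8),   # right length, fails the check
    (bytes([0xAA]), 8),   # first row that passes
]
idx = first_valid(rows, 8, lambda p: p[0] == 0xAA)
```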

Variants

Some protocols change layout depending on a header field. For example, a weather sensor can have a message_id field that determines whether the rest of the packet carries humidity readings or temperature readings. You can express these variable packets in Python using Variant subclasses nested inside the parent Decoder. At decode time the compiler extracts the parent's fields first, then evaluates each variant's when() predicate to decide which variant to actually parse.

Here's an example:

class current_cost_base(Decoder):
    …
    msg_type: Bits[4]
    device_id: Bits[12]

    class meter(Variant):
        def when(self, msg_type) -> bool:
            return msg_type == 0

        ch0_valid: Bits[1]
        ch0_power: Bits[15]
        ch1_valid: Bits[1]
        ch1_power: Bits[15]

        def power0_W(self, ch0_valid, ch0_power) -> int:
            return ch0_valid * ch0_power

    class counter(Variant):
        def when(self, msg_type) -> bool:
            return msg_type == 4

        sensor_type: Bits[8]
        impulse: Bits[32]

compiles to something like

int msg_type = bitrow_get_bits(b, bit_pos, 4);  bit_pos += 4;
int device_id = bitrow_get_bits(b, bit_pos, 12); bit_pos += 12;
if ((msg_type == 0)) {
    int ch0_valid = bitrow_get_bits(b, bit_pos, 1);  bit_pos += 1;
    /* … meter body, data_make, return 1; */
} else if ((msg_type == 4)) {
    int sensor_type = bitrow_get_bits(b, bit_pos, 8); bit_pos += 8;
    /* … counter body, data_make, return 1; */
}
return DECODE_FAIL_SANITY;

Each variant must define when(self, …) -> bool. If this method returns true, the Variant is selected. If no variant's when matches, the decode function returns DECODE_FAIL_SANITY.

Variants may not contain nested Rows[] or Repeat[].

A variant inherits parent fields and parent methods for reference in its when, its methods, and its to_json.

to_json

By default the compiler emits a data_make call that outputs every non-private field and every method (except special ones like validate_* methods). You can override to_json to change this behavior. You can explicitly specify the key name, label, data type, and formatting to be emitted.

For example, your decoder might supply

    def to_json(self) -> list[JsonRecord]:
        return [
            JsonRecord("model",        "",          "Ford",           "DATA_STRING"),
            JsonRecord("id",           "",          self.sensor_id,   "DATA_STRING", fmt="%08x"),
            JsonRecord("pressure_PSI", "Pressure",  self.pressure_psi, "DATA_DOUBLE"),
            JsonRecord("mic",          "Integrity", "CHECKSUM",       "DATA_STRING"),
        ]

Each JsonRecord maps to one entry in the generated data_make(…) call. The fmt parameter emits a DATA_FORMAT directive.
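The JsonRecord-to-data_make mapping is mechanical; a rough Python sketch of it is below. JsonRecord is modeled here as a namedtuple, not the real class, and the assumption that DATA_FORMAT and its format string precede the type token follows rtl_433's usual data_make convention:

```python
from collections import namedtuple

# Hypothetical stand-in for proto_compiler's JsonRecord.
JsonRecord = namedtuple("JsonRecord", "key label value dtype fmt",
                        defaults=(None,))

def emit_args(rec: JsonRecord) -> str:
    """Render one data_make argument group; fmt adds a DATA_FORMAT pair."""
    parts = [f'"{rec.key}"', f'"{rec.label}"']
    if rec.fmt is not None:
        parts += ["DATA_FORMAT", f'"{rec.fmt}"']
    parts += [rec.dtype, str(rec.value)]
    return ", ".join(parts)

line = emit_args(JsonRecord("id", "", "sensor_id", "DATA_STRING", fmt="%08x"))
# '"id", "", DATA_FORMAT, "%08x", DATA_STRING, sensor_id'
```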

Code reuse through inheritance

Because protocols are plain Python classes, you can use class inheritance to reuse code between decoders.

The CurrentCost TX decoder in current_cost.py uses two different framings ("EnviR" and "Classic") that share the same payload structure and decoding logic but differ in their preamble, bit offset after the match, and the JSONs they emit. A base class captures everything they have in common:

from proto_compiler.dsl import Bits, Decoder, JsonRecord, Modulation, ModulationConfig, Variant

class current_cost_base(Decoder):
    def modulation_config(self):
        return ModulationConfig(
            device_name="CurrentCost Current Sensor",
            modulation=Modulation.FSK_PULSE_PCM,
            short_width=250,
            long_width=250,
            reset_limit=8000,
        )

    msg_type: Bits[4]
    device_id: Bits[12]

    ...

The two concrete decoders inherit from this base and supply their own prepare() pipelines and JSON overrides:

class current_cost_envir(current_cost_base):
    def prepare(self, buf):
        return (
            buf
            .invert()
            .search_preamble(0x55555555A457, bit_length=48)
            .skip_bits(47)
            .manchester_decode()
        )

    class meter(current_cost_base.meter):
        def to_json(self) -> list[JsonRecord]:
            return [ # …
            ]

    class counter(current_cost_base.counter):
        def to_json(self) -> list[JsonRecord]:
            return [ # …
            ]


class current_cost_classic(current_cost_base):
    def prepare(self, buf):
        return (
            buf
            .invert()
            .search_preamble(0xCCCCCCCE915D, bit_length=45)
            .skip_bits(45)
            .manchester_decode()
        )

    class meter(current_cost_base.meter):
        def to_json(self) -> list[JsonRecord]:
            return [ # …
            ]

    class counter(current_cost_base.counter):
        def to_json(self) -> list[JsonRecord]:
            return [ # …
            ]

Then a Dispatcher class chains them at runtime:

class current_cost(Dispatcher):
    def modulation_config(self):
        return ModulationConfig(
            device_name="CurrentCost Current Sensor",
            modulation=Modulation.FSK_PULSE_PCM,
            short_width=250,
            long_width=250,
            reset_limit=8000,
        )

    def dispatch(self):
        return (current_cost_envir(), current_cost_classic())

Building

To generate the C code, the CMake files run

python proto_compiler/build.py proto_compiler/protocols/thermopro_tp211b.py src/devices/thermopro_tp211b.c

This loads the protocol module, reads its module-level decoders tuple, and emits one concatenated C source file. Each protocol module ends with decoders = (foo(),) where foo is the Decoder or Dispatcher to compile.

Other ideas considered

Other tooling for automatically parsing bitstreams was evaluated before settling on this DSL:

Hammer is a C library for bit-level binary parsing. Its syntax is verbose, it duplicates much of the bit-parsing functionality already in rtl_433, and it is missing quite a bit of it.

Kaitai Struct is a YAML-based binary format DSL that compiles to C++. It lacks strong bit-level support. Notably, there is no obvious way to introduce preamble scans.

Spicy is a C++-like DSL for network protocol parsing that compiles to C++. It is good for parsing byte-level protocols, but lacks tidy semantics for parsing bitstreams.

DaeDaLus is a grammar-based DSL that compiles to C++. Its syntax was appealing and helped inform the syntax of this DSL. The repo itself is unfriendly and uninformative, which made it seem like a poor fit for the rtl_433 community.

bitmatch is a small C regexp library for matching and parsing over bitstreams. The regexps are not very readable and it cannot associate names to fields.

Construct is a binary parser library in Python with great semantics, but it is interpreted. Its recently added compiler targets Python, not C, so it is too slow for our purposes.

This DSL was instead designed to have exactly the semantics rtl_433 needs, and to compile directly to C.

arahimi and others added 30 commits April 12, 2026 18:10
Introduces a protocol compiler that generates rtl_433 C decoder
source from Python protocol specifications. Includes DSL, compiler,
build script, CMake integration, and four converted protocols
(current_cost, lacrosse_tx31u, thermopro_tp211b, tpms_ford).

Made-with: Cursor
In newer pythons, deferred annotations are no longer guaranteed to live
in cls.__dict__["__annotations__"] at class creation time. Work around this.
Replace the hand-written honeywell_wdb.c with two generated decoders
(OOK and FSK) that share a common Python base protocol.

Add repeat_min_count/repeat_row_bits to ProtocolConfig and emit
bitbuffer_find_repeated_row.
we never use them
The config now only includes the RF parameters. Bitbuffer processing
has moved to a method called prepare(). this method uses pipeline notation
to define how the bitbuffer must be processed before it's parsed.
like Repeated, which repeats a set of fields in a row, you can now define
a Row, which repeats fields over rows.
New test classes in test_codegen.py:
- TestPipelineStageOrder: verifies prepare() stages emit in declared
  order and that the bitbuffer alias is rebound after manchester_decode
- TestValidateOrder: verifies validate_* hooks interleave with field
  decoding at the correct checkpoints, including lex ordering and
  BitbufferPipeline-only validates

Also adds PLAN_cls_removal.md documenting the plan to remove cls/type()
introspection from codegen.py in favor of instance properties computed
from Python's MRO.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Replace monolithic compiler.py with:
- dsl.py: DSL class definitions and validation layer
- codegen.py: C code generator, trusts dsl.py output

Protocol definitions ported: abmt, acurite_01185m, akhan_100F14,
alectov1, ambient_weather, current_cost, honeywell_wdb (OOK + FSK),
lacrosse_tx31u, thermopro_tp211b, tpms_ford.

Adds SEMANTICS.md (DSL spec), CODING_RULES.md, COMPILER_BUGS.md,
CLAUDE.md, and agent definitions.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…cept

New rules to catch simplification issues earlier:
- No dead code (unused imports, aliases, methods)
- Extract rather than copy-paste
- One implementation per concept
- Don't silently accept invalid input

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Delete dead imports (_collect_fieldrefs), aliases (LiteralExpr,
  FieldRefExpr), methods (explicit_args, output_fields), and
  unreachable FirstValid branch in emit_fields
- Inline and delete trivial _get_callables wrapper
- Convert JsonRecord to @DataClass(frozen=True)
- Remove emit_expr silent scalar fallbacks (now raises TypeError)
- Extract _walk_expr generator, simplify _collect_row_field_accesses
- Extract _emit_literal_check, _emit_sub_fields, _emit_pending_validates,
  _emit_decode_tail to eliminate copy-pasted logic

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Regenerate all proto-compiled C files from simplified codegen.

Fix _get_annotations_own to use inspect.get_annotations (3.10+)
for PEP 749 compatibility — cls.__dict__['__annotations__'] is
empty on Python 3.14 with deferred annotations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Test that _get_annotations_own returns correct BitsSpec entries and
that Protocol._fields is populated — catches PEP 749 regressions
on Python 3.14+.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Agent now runs cmake --build first to regenerate C files and
recompile the binary, then unit tests, then e2e tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- CMakeLists.txt: C99 → C23 (all proto-generated decoders use static inline).
- .gitignore: ignore stale in-source CMake artifacts at project root so an
  accidental `cmake .` there can be cleaned up without tracking cruft.
- .claude/agents/rtl433-e2e-test-runner.md: point the agent at build/ explicitly.
- .claude/agent-memory/: check in the test-runner's project-scope notes.
…ugs 15-22)

Bugs 17, 18, 20, 21, 22 — generated C correctness:

- 17: emit callables as `static inline` instead of `static constexpr`.
  C23's `constexpr` is for variable declarations, not function definitions.
  Also removes the now-redundant per-file `-std=c23` override in
  src/CMakeLists.txt.
- 18: inside a FirstValid row loop, field references at callable-arg and
  data_make call sites must be subscripted with the loop index. Threaded a
  `row_index` / `indexed_fields` pair through the emission chain.
- 20: callable args that name other callables must be expanded into nested
  calls per SEMANTICS.md §5.2 (`validate_cmd(cmd(notb(b)))`), not emitted as
  bare identifiers. `_expand_callable_args` now recursively resolves names
  that live in the callables dict.
- 21: `scan_all_rows` preamble search was writing to `offset` inside a
  for-loop body without declaring it. Added `unsigned offset = 0;` at the
  top of that branch.
- 22: `_compile_dispatcher` skipped variant-level callables for each
  delegate. Each delegate's `_variants` is now iterated so variant method
  bodies are emitted alongside parent ones.

Bug 16 — DSL tightening:

SEMANTICS.md §5 requires every callable to declare its return type. The DSL
previously silently accepted untyped callables and tagged them `kind="method"`
(while typed ones were `kind="prop"`). Replaced the permissive `else` branch
with `raise TypeError` naming the attribute; collapsed the now-vestigial
`"prop"`/"method" split to just `"method"` everywhere.

Bug 19 — remove JsonRecord.when / DATA_COND:

No shipped protocol uses `when=` anymore (see the acurite restructure in the
next commit). Removed `JsonRecord.when` from dsl.py and the `DATA_COND`
branch from `emit_data_make`. Dropped the `when=`/`DATA_COND` docs from
SEMANTICS.md §7.4 and README.md, replacing with a pointer to variant
dispatch for conditional fields.

Bug 15 — variant default to_json:

When a Variant has no explicit `to_json`, the synthesized default used the
variant's class name as model and missed parent fields and methods entirely.
Fixed in three steps:

1. `Protocol.__init_subclass__` now sets `vcls._parent_decoder = cls` so the
   synthesizer can reach the parent's modulation_config / fields / callables.
2. `_default_json_records` is variant-aware: pulls model from parent's
   `device_name`, emits parent fields/methods before variant's own,
   de-duplicates by key.
3. At call sites inside a variant branch, parent-level callables must emit
   with no variant prefix (definitions are parent-prefixed). Threaded a
   `variant_owned_names` set through `emit_data_make`, `_value_to_c`,
   `_expand_callable_args`, `emit_validate_call`, `_emit_pending_validates`.

SEMANTICS.md §6.4 and §7.2 updated to spell out that the default `to_json`
methods list excludes `validate_*` hooks, and that a variant's model comes
from the parent's `device_name`.

test_codegen.py: 30+ new regression tests across TestCallableChainExpansion,
TestFirstValidWithVariants, TestVariantInheritance, TestDispatcherVariantMethodsEmitted,
TestScanAllRowsDeclaresOffset, TestCallableRequiresReturnType,
TestDefaultJsonVariant. Also fixed `sensor_with_validate.validate_checksum`
fixture to declare `-> bool` as the tightened DSL now requires.
Replaces the `when=self.temp1_ok` / `when=self.temp2_ok` conditions (which
relied on the now-removed `JsonRecord.when` / `DATA_COND` path) with four
Variant subclasses dispatching on the temperature raw-value ranges:

- BothProbes: both probes report valid readings.
- MeatOnly: meat probe valid, ambient probe invalid/unplugged.
- AmbientOnly: ambient probe valid, meat probe invalid/unplugged.
- NoProbes: neither probe reports a valid reading.

Each variant's `to_json` includes only the temperature fields for the probes
that are live, reproducing the original DATA_COND behaviour within the DSL
semantics. Shared `temp1_f` / `temp2_f` / `battery_ok` helpers live on a
module-level `_AcuriteProbeVariant` abstract base (with an always-false
`when`) so the per-probe formulas aren't duplicated across variants.
…d decoders

Each proto-compiled decoder whose Python definition declares one or more
external callables (`def foo(...) -> T: ...`) needs a companion `.h`
providing the C implementation, per SEMANTICS.md §5.1. The compiler emits
`#include "<stem>.h"` at the top of the generated `.c` when such a header
is needed.

- acurite_01185m.h (new): validate_checksum (add-with-carry over the first
  six bytes of the reconstructed row).
- akhan_100F14.h: data_str (command-code → display string).
- alectov1.h: validate_packet (nibble-sum checksum on rows 1 and 5) plus
  TODO stubs for Wind4 wind_max_m_s / wind_dir_deg that need bitbuffer
  access not yet surfaced by the DSL.
- honeywell_wdb.h / honeywell_wdb_fsk.h: validate_packet (parity + sanity),
  subtype (class_raw → string), alert (alert_raw → string).
- lacrosse_tx31u.h: stubbed validate_crc (raw bytes not yet threaded through
  the DSL), plus temperature_c / humidity extractors over the variable-
  length measurement array.
- thermopro_tp211b.h: xor_checksum helper + validate_checksum.
Output of `cmake --build .` with the preceding compiler fixes applied.
Visible deltas across the files:

- `static constexpr` → `static inline` at every callable emission (Bug 17).
- FirstValid arrays indexed with `[_r]` at every call site within the row
  loop (Bug 18) — visible in acurite_01185m.c.
- Callable args that name other callables are now expanded into nested
  `prefix_foo(prefix_bar(...))` calls (Bug 20) — visible in akhan_100F14.c.
- `scan_all_rows` preamble searches now declare `offset` before the
  scanning loop (Bug 21) — visible in ambient_weather.c and tpms_ford.c.
- Dispatcher delegate variant methods are emitted (Bug 22) — visible in
  current_cost.c.
- Acurite-01185M decode now emits a four-way variant if/elif chain on
  probe validity (from the protocol restructure).

All proto-compiled decoders build cleanly and pass the rtl_433_tests e2e
suite (1827/1827).
arahimi added 4 commits April 12, 2026 18:13
Reorders the document so open bugs (currently none) appear before the
resolved list, adds a top-level status line stating that the rtl_433_tests
e2e suite is green (1827/1827) and the proto_compiler unit tests pass
(93/93) with all fixes applied, and records per-bug resolution notes with
links to the regression test classes.

Bug resolution summary:

- 14: validate hooks with unsatisfiable deps → compile-time error via
  `_resolve_callable_deps` fixpoint and explicit check.
- 15: variant default `to_json` → parent-aware synthesis + call-site prefix
  disambiguation via `variant_owned_names`.
- 16: untyped callables → `dsl.py` rejects with TypeError citing §5.
- 17: `static constexpr` → `static inline`.
- 18: FirstValid field refs inside the row loop → subscripted with `[_r]`.
- 19: `JsonRecord.when` / `DATA_COND` → removed from DSL and codegen;
  acurite_01185m restructured to use variants instead.
- 20: callable-of-callable args → recursive nested-call expansion.
- 21: `scan_all_rows` `offset` → declared before scan loop.
- 22: dispatcher delegate variant methods → emitted per `_variants`.
…urrent semantics

Several README sections had drifted from the current DSL behaviour (most of
the drift was introduced or made visible by Bugs 15, 16, 19, and 20).  This
commit aligns the README with SEMANTICS.md:

- **Methods** (previously "Methods and properties"): drop the obsolete
  "props with return annotation become C locals, methods without go in the
  method phase" merge rule.  Every callable now requires an explicit return
  type annotation (Bug 16).  Every non-validate callable compiles to a
  `static inline` C function called at each use site, not a `float x = ...;`
  local.  Updated the `temperature_c` example accordingly, including the
  protocol-prefixed C function name and the correct `data_make` call form.
  Added a note that methods can reference other methods — the compiler
  expands nested calls (Bug 20).

- **External methods**: replaced the made-up `validate(b, checksum)` example
  (there is no bare `validate` method in the current DSL) with a real
  external `validate_checksum` example matching the thermopro_tp211b
  protocol, and explained the companion header convention.

- **`validate_*` hooks**: corrected the emission story.  Hooks are no longer
  "emitted after all ordinary methods"; their calls are emitted
  *automatically* at the earliest point where every declared-parameter
  dependency is in scope, with lexicographic tiebreaking.  Inline hooks can
  and typically do have field parameters — the old "signature is (self,
  fail_value=...) only" claim was wrong.  Added the `FirstValid` loop
  `continue` variant.

- **Variants**: clarified that variants inherit parent fields and methods;
  parent-level methods are emitted once (parent-prefixed) and called
  without the variant prefix from variant bodies, variant-level methods
  are emitted variant-prefixed.  Pointed readers to acurite_01185m.py as
  the example for conditional field output via variant dispatch (replacing
  the removed `when=` / `DATA_COND` path from Bug 19).

- **to_json default**: corrected "every property" → "every non-validate
  method".  Added the variant-aware behaviour: a variant's default
  `to_json` uses the parent's `device_name` and includes parent fields and
  methods alongside its own (Bug 15).

- **LfsrDigest8**: "(or properties)" → "or any method".
The previous pipeline description carried a stale notion that `invert()`
somehow operated on "the row buffer (or on decoded packet bits after
manchester_decode, depending on position in the chain)".  The DSL has no
such distinction — the pipeline carries a single *tip* `(bitbuffer, row,
offset)` and every step operates on the current tip, whatever that happens
to be.  `manchester_decode` just rebinds the tip to a freshly declared
`packet_bits`; downstream steps don't need to know or care.

Also:

- Made the tip state explicit at the top of the pipeline section.
- Corrected `find_repeated_row` to match the current default
  (`max_bits=None` + exact match, `max_bits=...` + `<=` match).
- Noted that `scan_all_rows=True` emits an outer row-scan loop.
- Reworded the top-of-file "protocol structure" paragraph to drop the
  leftover "methods that act like bitfields" phrasing.
@a-rahimi a-rahimi marked this pull request as draft April 13, 2026 01:26
arahimi added 2 commits April 12, 2026 19:00
Upstream CI was red on four jobs:
* Build (macOS-14, CMake+Ninja): -std=c23 unsupported by CI compilers.
  Revert CMakeLists.txt to -std=c99 / CMAKE_C_STANDARD 99 (matches
  master). Generated code only needs static inline which is C99.
* Check code style: style-check requires the r_device decode function
  opening brace on its own line. codegen.block() gains a
  brace_on_newline flag; emit_decode_fn and emit_dispatcher use it.
* Check symbol errors: symbolizer rejects model strings with spaces
  or commas, and flags decode functions that lack doc blocks or (for
  dispatchers) @sa links to their delegates. ModulationConfig gets a
  new optional `model` field (separate from the r_device.name string);
  emit_decode_fn emits `/** <device_name> decoder. */` before the
  signature; emit_dispatcher emits `@sa <delegate>_decode()` for each
  delegate. lacrosse_tx31u and thermopro_tp211b add explicit `model=`
  strings matching the symbolizer regex.
* Analyze with Clang: hand-written companion headers used bool/true/
  false without including <stdbool.h>. Add the include to each header
  and to codegen's _emit_file_header so generated .c files pull it in.

Regenerated all proto-compiled C files to reflect the codegen changes.
The e2e-test-runner now runs the same style-check / symbolizer /
clang-analyzer invocations that upstream CI runs, so that proto-
compiler changes that break those checks can be caught locally
before pushing. Also notes that the project is C99 — do not bump
CMakeLists.txt to C23 as some CI compilers don't support it.
@a-rahimi a-rahimi marked this pull request as ready for review April 13, 2026 02:11
arahimi and others added 4 commits April 12, 2026 19:31
The explicit to_json() overrides the default, so the model= field in
ModulationConfig is never consumed.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Untrack .claude/ (agent configs and memory) and proto_compiler/
COMPILER_BUGS.md, CODING_RULES.md, PLAN_cls_removal.md. Files remain on
disk but are now gitignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The original upstream file was alecto.c; renaming it to alectov1 during
the port was a mistake.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@a-rahimi a-rahimi changed the title Add a Domain Specific Language to decode packets Add a Domain Specific Language to decode packets (v2) Apr 13, 2026
@merbanan
Owner

Well while probably nice and all I'm not really sure what I am supposed to do with this?

@a-rahimi
Contributor Author

My goal with this review is to get feedback on the DSL and on the general direction. By default, I'd start porting more protocols and adjusting the DSL as I go. I want feedback about whether this is valuable to the project.

@merbanan
Owner

My goal with this review is to get feedback on the DSL and on the general direction. By default, I'd start porting more protocols and adjusting the DSL as I go. I want feedback about whether this is valuable to the project.

Can you write a summary in 5 sentences what the purpose with this is?

@a-rahimi
Contributor Author

In the same way the flex decoders make it easy to write decoders in a few lines, a DSL makes it easy to implement more complicated decoders in a few lines. It's less flexible than the C decoders and more flexible than the flex decoder; it's less error-prone and easier to write than C.
