Skip to content

Conversation

@CapnBry
Copy link
Contributor

@CapnBry CapnBry commented Dec 30, 2025

This PR addresses regressions from #4038 related to the handling of invalid CRSF data processing. The updated code provides more robustness against the most common type of error encountered and throws out significantly fewer packets, while also decreasing latency related to expecting invalidly-long packets.

1. Throwing out too much data

By far the most common error condition of rejecting a CRSF packet is a CRC error due to a byte being missing. The old resync code would look inside a packet which failed CRC check to see if a new CRSF packet is contained within it, and shift everything to start there. The current code discards the whole declared length of the packet, which will also include the start of the next packet, leading to dropping multiple packets for one missed byte, Example:

[XF] CRC error DUMP 28 bytes ...
C8 0D EA EE 10 00 00 9C 40 00 00 00 4A 99 C8 0C 14 D5 00 64 0C 00 1B 01 CA 64 0B 73 
HDR LEN TYPE!?

EA is not a valid type of packet, it is a source address and the TYPE is missing. The current code will throw out 0D bytes following the 0D, including the C8 of the next packet. Then, the existing code will search the remaining data for the start of a new packet, but it has been thrown out already, so both packets will be discarded.

I've updated the code to only advance the start of packet by 1 byte instead of pkt_len, and then start searching. This finds and processes the second packet as normal.

2. Allowing too much data, then throwing it out

The _lenIsSane() function was allowing packets that are too large to be CRSF packets due to using the incorrect defined value for comparison. All CRSF packets must be 64 bytes or less, the first 2 bytes (HDR and LEN) consume 2 bytes, leaving a max 62 byte declared packet len. The existing code allows up to 128 bytes total, leading the code to buffer multiple packets, then this will guaranteed fail CRC, and all the packets will be thrown out at once. Example:

[XF] CRC error DUMP 72 bytes ...
C8 3A EA EE 10 00 00 9C 40 00 00 00 1C D7 C8 0D 3A EA EE 10 00 00 9C 40 00 00 00 38 8D C8 0C 14 
D7 00 64 0C 00 1B 01 D2 64 0B 1D C8 0D 3A EA EE 10 00 00 9C 40 00 00 00 41 1A C8 0C 14 D6 00 64 
0C 00 1B 01 D1 64 0B EF

This packet is missing its length byte, leading to the packet type (0x3A) to be interpreted as the length. The parser then waits until it has gathered multiple frames to meet the incorrect length requirement, adding latency, and then throws out the entire data because the CRC is incorrect. In this example there are 5 total packets which are thrown out due to the improper length check, and the next packet could also be potentially thrown out as well if the length makes it stop in the middle of a packet.

To resolve this I've updated the _lenIsSane() to be clearer about what value it is supposed to operate on, as well as use defines for the constants, with comments.

The existing code also would throw out the entire buffer if the length wasn't [the old definition of] sane, even if the invalid length only comprised part of the buffer. Now it uses the same exhaustive search as 1.

@CapnBry
Copy link
Contributor Author

CapnBry commented Dec 30, 2025

I see that a test has failed, but somehow not in any code that I have changed (the test is incorrect unless this is a requirement) I will fix this as well but I am out of time for today.

@3djc 3djc requested a review from raphaelcoeffic December 30, 2025 16:20
@gagarinlg
Copy link
Member

I see that a test has failed, but somehow not in any code that I have changed (the test is incorrect unless this is a requirement) I will fix this as well but I am out of time for today.

Well it is an error at a test for the CRSF frame length. Maybe the tests sends packets longer than 64 bytes?

@gagarinlg
Copy link
Member

I see that a test has failed, but somehow not in any code that I have changed (the test is incorrect unless this is a requirement) I will fix this as well but I am out of time for today.

Well it is an error at a test for the CRSF frame length. Maybe the tests sends packets longer than 64 bytes?

https://github.com/EdgeTX/edgetx/blob/5f37969b27d5c4c717474390c3be58295680609d/radio/src/tests/crossfire.cpp#L166C1-L169C24

that test is failling

@CapnBry
Copy link
Contributor Author

CapnBry commented Dec 30, 2025

Well it is an error at a test for the CRSF frame length. Maybe the tests sends packets longer than 64 bytes?

I looked at the test and it makes incorrect assumptions based on how the old code operated. The code has always expected that no packet less than 3 bytes can be evaluated to determine if it is valid, and it should wait until more data comes in. However, the old code would throw out far too much when something went wrong, so it would throw away possibly good data as well.

The test tested that if something went wrong then everything was thrown out, which is not what you'd want to do if you want to try to recover packets from the buffer. It is a case of a test testing that the code is doing what the code does, not that the code is having the desired effect of getting packets from corrupt data.

I'll fix the test and also add more tests of the packet recovery mechanism, which is what you'd want to test.

@gagarinlg
Copy link
Member

Well it is an error at a test for the CRSF frame length. Maybe the tests sends packets longer than 64 bytes?

I looked at the test and it makes incorrect assumptions based on how the old code operated. The code has always expected that no packet less than 3 bytes can be evaluated to determine if it is valid, and it should wait until more data comes in. However, the old code would throw out far too much when something went wrong, so it would throw away possibly good data as well.

The test tested that if something went wrong then everything was thrown out, which is not what you'd want to do if you want to try to recover packets from the buffer. It is a case of a test testing that the code is doing what the code does, not that the code is having the desired effect of getting packets from corrupt data.

Th I would say the best way forward would be to also update the test to refelct the desired behaviour

@CapnBry
Copy link
Contributor Author

CapnBry commented Dec 31, 2025

I've updated the failed tests to validate that it follows the intention, which is to try to get as many valid packets as possible, not to just throw out all the data if some of the checks fail. I've also added tests for 3 different missing byte conditions that I've observed and also that the parser will process multiple packets following one missing byte. All of these new tests would fail on the current code despite being copy-pasted from real-world data. 😊

@pfeerick
Copy link
Member

pfeerick commented Jan 1, 2026

Thanks for that Capn... clearly not ideal when the tests are testing that faulty code is correctly faulty (try saying that three times fast! 🤣 ) ... not that I have been known to disable tests to resolve that issue. 😁

@CapnBry
Copy link
Contributor Author

CapnBry commented Jan 1, 2026

(try saying that three times fast! 🤣)

haha not so early on New Years Day I can't!

The test was correct if the design was to just make sure if there was bad data that it wouldn't get stuck and would move on to be able to process future packets, so no shade there. 😚

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants