Skip to content

zebra: Fix incorrect update of 'nhe_received' in route_entry_update_nhe()#21104

Merged
riw777 merged 1 commit intoFRRouting:masterfrom
GaladrielZhao:fix/nhe_received_update
Apr 27, 2026
Merged

zebra: Fix incorrect update of 'nhe_received' in route_entry_update_nhe()#21104
riw777 merged 1 commit intoFRRouting:masterfrom
GaladrielZhao:fix/nhe_received_update

Conversation

@GaladrielZhao
Copy link
Copy Markdown
Contributor

@GaladrielZhao GaladrielZhao commented Mar 12, 2026

The variable 'nhe_received' stores the NHG received from the upper-level protocol. It should not be unconditionally overwritten during route processing, as doing so loses the original NHG.

In route_entry_update_nhe(), replace the unconditional update of nhe_received with a conditional one: only clear nhe_received (and decrement its ref) when it points to the same NHG as nhe, which is being replaced.

In rib_handle_nhg_replace(), before old_entry is force-freed, check whether nhe_received still points to old_entry. If so, migrate it to new_entry to prevent a dangling pointer. If nhe_received already points to a different NHG (the true original received NHG), leave it untouched.

Let's fix it.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 12, 2026

Greptile Summary

This PR fixes a reference counting and pointer safety bug in route_entry_update_nhe(), where nhe_received (the NHG originally received from the upper-level protocol) was being unconditionally overwritten during route processing, discarding the true original NHG. The fix conditionally updates nhe_received only when it already points to the same NHG as nhe, and adds a pre-migration step in rib_handle_nhg_replace() to prevent dangling pointers before the old NHE is released. Test refcounts are correctly doubled to reflect that each route entry now holds two distinct refs to the NHG (one for re->nhe, one for re->nhe_received) when they coincide.

Confidence Score: 5/5

Safe to merge — the refcount arithmetic is correct across all code paths and the fix prevents the original NHG from being silently discarded.

No P0/P1 issues found. The conditional update of nhe_received is correctly scoped: same-NHG paths update it, diverged-NHG paths leave it untouched, and rib_re_nhg_free is a reliable safety net for the diverged case. Refcount algebra in both the NULL and non-NULL new_nhghe paths is balanced. Test expected values are updated consistently with the new 2-refs-per-route semantics.

No files require special attention.

Important Files Changed

Filename Overview
zebra/zebra_rib.c Core bug fix: conditional nhe_received update in route_entry_update_nhe() and pre-migration in rib_handle_nhg_replace() to prevent dangling pointers; refcount arithmetic appears correct across all paths.
tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py Refcount expectations updated to account for dual refs per route (nhe + nhe_received); new nexthop-change test added; **kwargs added to inner retry-decorated function for decorator compatibility.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant rib_handle_nhg_replace
    participant route_entry_update_nhe
    participant zebra_nhg

    Note over Caller: NHG replace: old_entry → new_entry

    Caller->>rib_handle_nhg_replace: (old_entry, new_entry)

    loop for each re where re->nhe == old_entry
        alt re->nhe_received == old_entry
            Note over rib_handle_nhg_replace: Pre-migrate nhe_received before update_nhe releases nhe ref
            rib_handle_nhg_replace->>zebra_nhg: decrement_ref(old_entry) [nhe_received ref]
            rib_handle_nhg_replace->>zebra_nhg: increment_ref(new_entry) [nhe_received ref]
            rib_handle_nhg_replace->>rib_handle_nhg_replace: re->nhe_received = new_entry
        else re->nhe_received != old_entry
            Note over rib_handle_nhg_replace: nhe_received points to true original NHG — leave untouched
        end
        rib_handle_nhg_replace->>route_entry_update_nhe: (re, new_entry)
        route_entry_update_nhe->>zebra_nhg: increment_ref(new_entry) [nhe ref]
        Note over route_entry_update_nhe: re->nhe_received now == new_entry != old_nhg → skip nhe_received update
        route_entry_update_nhe->>zebra_nhg: decrement_ref(old_entry) [nhe ref]
        route_entry_update_nhe-->>rib_handle_nhg_replace: ret (1 if old freed)
    end

    rib_handle_nhg_replace-->>Caller: ret
Loading

Reviews (3): Last reviewed commit: "zebra: Fix incorrect update of 'nhe_rece..." | Re-trigger Greptile

@GaladrielZhao GaladrielZhao force-pushed the fix/nhe_received_update branch from ef90d3d to e2f4044 Compare March 12, 2026 08:14
@github-actions github-actions Bot added size/L and removed size/XS labels Mar 12, 2026
@GaladrielZhao GaladrielZhao force-pushed the fix/nhe_received_update branch from e2f4044 to e2d5ca6 Compare March 12, 2026 12:02
Copy link
Copy Markdown
Member

@riw777 riw777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good

@riw777
Copy link
Copy Markdown
Member

riw777 commented Mar 18, 2026

 This indicates duplicate RMACs exist - the fix is not working!
assert 'Old RMAC 1e:7a:17:93:51:93 still present, waiting for new RMAC' is None
E   AssertionError: RMAC change verification failed: Old RMAC 1e:7a:17:93:51:93 still present, waiting for new RMAC
     This indicates duplicate RMACs exist - the fix is not working!
   assert 'Old RMAC 1e:7a:17:93:51:93 still present, waiting for new RMAC' is None```

:-(

@GaladrielZhao GaladrielZhao force-pushed the fix/nhe_received_update branch from e2d5ca6 to 23c32a9 Compare March 29, 2026 12:32
@GaladrielZhao
Copy link
Copy Markdown
Contributor Author

 This indicates duplicate RMACs exist - the fix is not working!
assert 'Old RMAC 1e:7a:17:93:51:93 still present, waiting for new RMAC' is None
E   AssertionError: RMAC change verification failed: Old RMAC 1e:7a:17:93:51:93 still present, waiting for new RMAC
     This indicates duplicate RMACs exist - the fix is not working!
   assert 'Old RMAC 1e:7a:17:93:51:93 still present, waiting for new RMAC' is None```

:-(

Thanks very much for your review. I'm working on it.

@GaladrielZhao
Copy link
Copy Markdown
Contributor Author

Hi @riw777 , I've fixed the CI issues. Could you please take another look?

@riw777
Copy link
Copy Markdown
Member

riw777 commented Apr 21, 2026

@greptile can you check this one again?

…he()

The variable 'nhe_received' stores the NHG received from the upper-level
protocol. It should not be unconditionally overwritten during route
processing, as doing so loses the original NHG.

In route_entry_update_nhe(), replace the unconditional update of
nhe_received with a conditional one: only clear nhe_received (and
decrement its ref) when it points to the same NHG as nhe, which
is being replaced.

In rib_handle_nhg_replace(), before old_entry is force-freed, check
whether nhe_received still points to old_entry. If so, migrate it to
new_entry to prevent a dangling pointer. If nhe_received already points
to a different NHG (the true original received NHG), leave it untouched.

Signed-off-by: Yuqing Zhao <galadriel.zyq@alibaba-inc.com>
@GaladrielZhao GaladrielZhao force-pushed the fix/nhe_received_update branch from 0934c96 to e4c9930 Compare April 22, 2026 14:25
@GaladrielZhao
Copy link
Copy Markdown
Contributor Author

GaladrielZhao commented Apr 23, 2026

@greptile can you check this one again?

Thanks @riw777 , I've addressed greptile's comments in opensourcerouting#231.

@riw777 riw777 merged commit 8094856 into FRRouting:master Apr 27, 2026
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants