zebra: Fix incorrect update of 'nhe_received' in route_entry_update_nhe() by mwinter-osr · Pull Request #231 · opensourcerouting/frr

mwinter-osr · 2026-04-14T13:45:59Z

The variable 'nhe_received' is used to store the NHG received from the upper-level protocol.
It should not be updated once it's set during early route processing, as we want to preserve the original one.

Updating it in 'route_entry_update_nhe()' causes the loss of the original NHG.
Let's fix it.

Original PR21104 by GaladrielZhao

…he() The variable 'nhe_received' is used to store the NHG received from the upper-level protocol. It should not be updated once it's set during early route processing, as we want to preserve the original one. Updating it in 'route_entry_update_nhe()' causes the loss of the original NHG. Remove the nhe_received update from route_entry_update_nhe(). Instead, handle it in rib_handle_nhg_replace() where old_entry is about to be force-freed: migrate nhe_received to new_entry only when it still points to old_entry, preventing a dangling pointer. When nhe_received already points to a different NHG (the true original received NHG), it is left untouched. Signed-off-by: Yuqing Zhao <galadriel.zyq@alibaba-inc.com>

greptile-apps · 2026-04-14T13:51:04Z

Greptile Summary

This PR fixes a refcount and pointer-correctness bug in route_entry_update_nhe() where re->nhe_received was unconditionally overwritten whenever re->nhe was updated, causing loss of the original received NHG and a ref-count leak on it. The fix guards the update behind if (re->nhe_received == old_nhg), adds a pre-migration step in rib_handle_nhg_replace() to avoid dangling pointers, and adds a safety cleanup in rib_re_nhg_free() for the case where nhe_received and nhe diverge. Topotest refcount expectations are doubled to account for the dual re->nhe / re->nhe_received refs now properly held, and a new test_zebra_received_nhe_kept_nexthop_change test verifies correct ref release on nexthop change.

Confidence Score: 5/5

Safe to merge; the C fix is correct and all remaining findings are P2 naming/style issues in the topotest.
The core ref-count fix in zebra_rib.c is logically sound across all three affected call paths (route_entry_update_nhe NULL-clear, non-NULL replace, and rib_handle_nhg_replace pre-migration). The topotest updates correctly double the expected refcounts to reflect the dual nhe/nhe_received refs now held. Only P2 findings remain: two helper functions have misleading suffixes that don't match the expected values they verify, and a redundant loop with a confusingly-worded comment was added. None of these affect correctness or CI outcomes.
tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py — helper function names check_nhg_refcount_2 and check_nhg_refcount_10 should be renamed to match their actual expected values (4 and 20).

Important Files Changed

Filename	Overview
zebra/zebra_rib.c	Core fix: restricts nhe_received updates in route_entry_update_nhe() to only when nhe_received == old_nhg; adds pre-migration in rib_handle_nhg_replace() and a safety cleanup in rib_re_nhg_free(). Ref-count arithmetic is correct in all three code paths.
tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py	Updates refcount expectations to 2× (5→10, 2→4, 10→20) to account for dual nhe/nhe_received refs; adds new nexthop-change test. Helper function names check_nhg_refcount_2 and check_nhg_refcount_10 are misleadingly named (actual expected values are 4 and 20 respectively). Logic is correct.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["route_entry_update_nhe(re, new_nhghe)"] --> B{new_nhghe == NULL?}

    B -- yes --> C["old_nhg = re->nhe"]
    C --> D{re->nhe_received\n== re->nhe?}
    D -- yes --> E["decrement_ref(nhe_received)\nnhe_received = NULL"]
    D -- no --> F["leave nhe_received untouched\n(different NHG preserved)"]
    E --> G["re->nhe = NULL, goto done"]
    F --> G

    B -- no --> H{re->nhe != new_nhghe\nand re->nhe != NULL?}
    H -- yes --> I["old_nhg = re->nhe\nattach_ref(re, new_nhghe)"]
    H -- no\n(first attach) --> J["attach_ref(re, new_nhghe)\nold_nhg = NULL"]

    I --> K["done: old_nhg != NULL"]
    G --> K

    K --> L{re->nhe_received\n== old_nhg?}
    L -- yes --> M["decrement_ref(old_nhg)\nif new_nhghe: increment_ref(new_nhghe)\nnhe_received = new_nhghe"]
    L -- no\n(original NHG preserved!) --> N["nhe_received unchanged"]
    M --> O["decrement_ref(old_nhg)\n(for re->nhe ref)"]
    N --> O

    J --> P["done: old_nhg == NULL\n(no cleanup needed)"]

    style F fill:#90EE90
    style N fill:#90EE90
    style M fill:#FFE4B5

Prompt To Fix All With AI

This is a comment left during a code review.
Path: tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py
Line: 173-178

Comment:
**Misleading helper function name**

The function is renamed to `check_nhg_refcount_2` but the expected value passed to `run_and_expect` is still `4`. The suffix `_2` implies the test is checking for a refcount of 2, which is incorrect. The old name `check_nhg_refcount_4` was accurate. Rename to `check_nhg_refcount_4` (or better, `check_nhg_refcount`) to avoid misleading future readers.

```suggestion
    def check_nhg_refcount_4():
        nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id))
        nhg_json = json.loads(nhg_info)
        return nhg_json.get(str(nhg_id), {}).get("refCount")

    _, result = topotest.run_and_expect(check_nhg_refcount_4, 4, count=30, wait=1)
    assert result == 4, "NHG refcount is not 4"
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py
Line: 239-244

Comment:
**Misleading helper function name**

Same pattern as the remove-routes test: the function is renamed to `check_nhg_refcount_10` but the expected refcount being checked is `20`. The suffix `_10` implies it checks for 10, but the comment above correctly states "10 routes × 2 refs = 20". The old name `check_nhg_refcount_20` was accurate.

```suggestion
    def check_nhg_refcount_20():
        nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id))
        nhg_json = json.loads(nhg_info)
        return nhg_json.get(str(nhg_id), {}).get("refCount")

    _, result = topotest.run_and_expect(check_nhg_refcount_20, 20, count=30, wait=1)
    assert result == 20, "NHG refcount is not 20"
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py
Line: 103-114

Comment:
**Redundant loop and contradictory comment**

The loop at lines 109–114 re-checks `receivedNexthopGroupId` existence for all 5 prefixes, but this is already asserted (more strictly) in the loop at lines 84–93 which checks both existence AND that the value equals `nhg_id`. The second loop adds no new coverage.

More importantly, the comment at lines 104–108 states `receivedNexthopGroupId may equal nexthopGroupId after resolution` and calls it "expected behavior." This appears to contradict the PR description, which states `nhe_received` should NOT be updated after initial route processing. It is worth clarifying: for routes whose nexthop is already directly connected (no recursive resolution), `nhe` and `nhe_received` legitimately share the same NHG at all times. The comment should say this explicitly to avoid implying the fix is weaker than it is.

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "zebra: Fix incorrect update of 'nhe_rece..." | Re-trigger Greptile}

greptile-apps · 2026-04-14T13:51:08Z

+    def check_nhg_refcount_2():
        nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id))
        nhg_json = json.loads(nhg_info)
        return nhg_json.get(str(nhg_id), {}).get("refCount")

-    _, result = topotest.run_and_expect(check_nhg_refcount_4, 4, count=30, wait=1)
+    _, result = topotest.run_and_expect(check_nhg_refcount_2, 4, count=30, wait=1)


Misleading helper function name

The function is renamed to check_nhg_refcount_2 but the expected value passed to run_and_expect is still 4. The suffix _2 implies the test is checking for a refcount of 2, which is incorrect. The old name check_nhg_refcount_4 was accurate. Rename to check_nhg_refcount_4 (or better, check_nhg_refcount) to avoid misleading future readers.

Suggested change

def check_nhg_refcount_2():

nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id))

nhg_json = json.loads(nhg_info)

return nhg_json.get(str(nhg_id), {}).get("refCount")

_, result = topotest.run_and_expect(check_nhg_refcount_4, 4, count=30, wait=1)

_, result = topotest.run_and_expect(check_nhg_refcount_2, 4, count=30, wait=1)

def check_nhg_refcount_4():

nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id))

nhg_json = json.loads(nhg_info)

return nhg_json.get(str(nhg_id), {}).get("refCount")

_, result = topotest.run_and_expect(check_nhg_refcount_4, 4, count=30, wait=1)

assert result == 4, "NHG refcount is not 4"

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py Line: 173-178 Comment: **Misleading helper function name** The function is renamed to `check_nhg_refcount_2` but the expected value passed to `run_and_expect` is still `4`. The suffix `_2` implies the test is checking for a refcount of 2, which is incorrect. The old name `check_nhg_refcount_4` was accurate. Rename to `check_nhg_refcount_4` (or better, `check_nhg_refcount`) to avoid misleading future readers. ```suggestion def check_nhg_refcount_4(): nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id)) nhg_json = json.loads(nhg_info) return nhg_json.get(str(nhg_id), {}).get("refCount") _, result = topotest.run_and_expect(check_nhg_refcount_4, 4, count=30, wait=1) assert result == 4, "NHG refcount is not 4" ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-14T13:51:09Z

            route["receivedNexthopGroupId"] == nhg_id
        ), f"Route {prefix} has different NHG ID"

-    step("Verify NHG {} has refcount of 10".format(nhg_id))
+    step("Verify NHG {} has refcount of 20".format(nhg_id))

    # Get the NHG information


Misleading helper function name

Same pattern as the remove-routes test: the function is renamed to check_nhg_refcount_10 but the expected refcount being checked is 20. The suffix _10 implies it checks for 10, but the comment above correctly states "10 routes × 2 refs = 20". The old name check_nhg_refcount_20 was accurate.

Suggested change

route["receivedNexthopGroupId"] == nhg_id

), f"Route {prefix} has different NHG ID"

step("Verify NHG {} has refcount of 10".format(nhg_id))

step("Verify NHG {} has refcount of 20".format(nhg_id))

# Get the NHG information

def check_nhg_refcount_20():

nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id))

nhg_json = json.loads(nhg_info)

return nhg_json.get(str(nhg_id), {}).get("refCount")

_, result = topotest.run_and_expect(check_nhg_refcount_20, 20, count=30, wait=1)

assert result == 20, "NHG refcount is not 20"

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py Line: 239-244 Comment: **Misleading helper function name** Same pattern as the remove-routes test: the function is renamed to `check_nhg_refcount_10` but the expected refcount being checked is `20`. The suffix `_10` implies it checks for 10, but the comment above correctly states "10 routes × 2 refs = 20". The old name `check_nhg_refcount_20` was accurate. ```suggestion def check_nhg_refcount_20(): nhg_info = r1.vtysh_cmd("show nexthop-group rib {} json".format(nhg_id)) nhg_json = json.loads(nhg_info) return nhg_json.get(str(nhg_id), {}).get("refCount") _, result = topotest.run_and_expect(check_nhg_refcount_20, 20, count=30, wait=1) assert result == 20, "NHG refcount is not 20" ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-04-14T13:51:10Z

+        # Verify that receivedNexthopGroupId exists for all routes.
+        #
+        # Note: The current implementation updates nhe_received during route
+        # modification (including route resolution), so receivedNexthopGroupId
+        # may equal nexthopGroupId after resolution. This is expected behavior.
+        # We only verify that receivedNexthopGroupId exists and is shared correctly.
+        for i in range(1, 6):
+            prefix = f"10.0.0.{i}/32"
+            route = route_json[prefix][0]
+            assert (
+                "receivedNexthopGroupId" in route
+            ), f"Route {prefix} missing receivedNexthopGroupId"


Redundant loop and contradictory comment

The loop at lines 109–114 re-checks receivedNexthopGroupId existence for all 5 prefixes, but this is already asserted (more strictly) in the loop at lines 84–93 which checks both existence AND that the value equals nhg_id. The second loop adds no new coverage.

More importantly, the comment at lines 104–108 states receivedNexthopGroupId may equal nexthopGroupId after resolution and calls it "expected behavior." This appears to contradict the PR description, which states nhe_received should NOT be updated after initial route processing. It is worth clarifying: for routes whose nexthop is already directly connected (no recursive resolution), nhe and nhe_received legitimately share the same NHG at all times. The comment should say this explicitly to avoid implying the fix is weaker than it is.

Prompt To Fix With AI

This is a comment left during a code review. Path: tests/topotests/zebra_received_nhe_kept/test_zebra_received_nhe_kept.py Line: 103-114 Comment: **Redundant loop and contradictory comment** The loop at lines 109–114 re-checks `receivedNexthopGroupId` existence for all 5 prefixes, but this is already asserted (more strictly) in the loop at lines 84–93 which checks both existence AND that the value equals `nhg_id`. The second loop adds no new coverage. More importantly, the comment at lines 104–108 states `receivedNexthopGroupId may equal nexthopGroupId after resolution` and calls it "expected behavior." This appears to contradict the PR description, which states `nhe_received` should NOT be updated after initial route processing. It is worth clarifying: for routes whose nexthop is already directly connected (no recursive resolution), `nhe` and `nhe_received` legitimately share the same NHG at all times. The comment should say this explicitly to avoid implying the fix is weaker than it is. How can I resolve this? If you propose a fix, please make it concise.

greptile-apps Bot reviewed Apr 14, 2026

View reviewed changes

mwinter-osr mentioned this pull request Apr 14, 2026

greptile: Add Greptile configuration for automated code review on PRs. FRRouting/frr#21482

Closed

ton31337 force-pushed the master branch from 4ab6783 to ec3e59c Compare April 14, 2026 16:11

GaladrielZhao mentioned this pull request Apr 23, 2026

zebra: Fix incorrect update of 'nhe_received' in route_entry_update_nhe() FRRouting/frr#21104

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

zebra: Fix incorrect update of 'nhe_received' in route_entry_update_nhe()#231

zebra: Fix incorrect update of 'nhe_received' in route_entry_update_nhe()#231
mwinter-osr wants to merge 1 commit intoopensourcerouting:masterfrom
mwinter-osr:PR21104

mwinter-osr commented Apr 14, 2026

Uh oh!

greptile-apps Bot commented Apr 14, 2026

Confidence Score: 5/5

Important Files Changed

Uh oh!

greptile-apps Bot Apr 14, 2026

Uh oh!

greptile-apps Bot Apr 14, 2026

Uh oh!

greptile-apps Bot Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mwinter-osr commented Apr 14, 2026

Uh oh!

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Apr 14, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants