Skip to content

zebra: clean up VRF handling by using dataplane provided vrf_id#234

Open
mwinter-osr wants to merge 3 commits intoopensourcerouting:masterfrom
mwinter-osr:PR20318
Open

zebra: clean up VRF handling by using dataplane provided vrf_id#234
mwinter-osr wants to merge 3 commits intoopensourcerouting:masterfrom
mwinter-osr:PR20318

Conversation

@mwinter-osr
Copy link
Copy Markdown
Member

Zebra was implicitly assuming that the VRF netdevice ifindex could be used as the VRF identifier, leading to casts from ifindex to vrf_id_t. While this works with the Linux kernel dataplane, it does not respect the dataplane API, which exposes these identifiers as distinct concepts.

The code is updated to use the VRF identifier explicitly provided by the dataplane context and removes the need for casting ifindex to vrf_id_t. In addition, the VRF delete path is clarified by no longer reading unused dataplane fields, avoiding confusion about which identifiers are actually required.

There is no behavior change for the Linux kernel dataplane.

Original PR20318 by maxime-leroy

zebra_if_dplane_ifp_handling() was reading dplane_ctx_get_ifp_table_id()
for VRF events.

In the netlink dplane backend, ifp_table_id is only set via
netlink_vrf_change(), which is invoked from netlink_link_change() for
RTM_NEWLINK VRF events. It is not set for RTM_DELLINK. This causes
interface_vrf_change() to receive table_id as 0 on VRF deletes.

For delete operations, save the VRF pointer before if_delete_update()
and retrieve table_id from vrf->data.l.table_id. For add/update
operations, continue using the dplane-provided table_id.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
interface_vrf_change() was implicitly assuming that the VRF netdevice
ifindex could be used as zebra's vrf_id.

This assumption holds true for the Linux kernel dataplane, where the VRF
ID is defined as the ifindex of the VRF interface, so this change does
not alter kernel behavior.

However, the dataplane API already exposes both concepts explicitly via
dplane_ctx_get_ifindex() and dplane_ctx_get_ifp_vrf_id(ctx). Using the
proper accessor avoids casting an ifindex to vrf_id_t and better respects
the dataplane abstraction.

On interface updates, the vrf_id provided by the dataplane is now used
directly. On interface deletion (DELLINK), where the dataplane context
may no longer carry vrf information, zebra relies on the existing ifp
state (ifp->vrf->vrf_id) before if_delete_update() is called. The table_id
is also retrieved from ifp->vrf->data.l.table_id since
dplane_ctx_get_ifp_table_id() is not set for RTM_DELLINK events.

This is required for non-kernel dataplanes such as Grout (DPDK), where
the VRF ID is not equal to the VRF interface ifindex. In that case, using
the correct vrf_id fixes VRF handling between zebra and the Grout
dataplane.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
In the delete path, we already hold a reference to the vrf structure
before calling if_delete_update(). Passing this pointer directly to
interface_vrf_change() avoids a redundant vrf_lookup_by_id() call and
allows accessing vrf->name, vrf->vrf_id, and vrf->data.l.table_id
directly.

For update operations, the vrf pointer is passed as NULL and the
function continues to use the vrf_id, name, and tableid parameters
from the dataplane context.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 14, 2026

Greptile Summary

This PR cleans up VRF handling in zebra by eliminating the implicit (vrf_id_t)ifindex cast that coupled the VRF identifier to the VRF netdevice's ifindex. The DELETE path now captures ifp->vrf before if_delete_update (safe since if_delete frees the interface struct but not the VRF struct), and the UPDATE path uses dplane_ctx_get_ifp_vrf_id() instead of dplane_ctx_get_ifindex(). For the Linux kernel dataplane the two values are identical (vrf_id = ifi->ifi_index per if_netlink.c line 1494), confirming no behavioral regression. The if_vrf_change tracepoint field is correctly renamed from ifindex to vrf_id.

  • No topotest is included. While existing VRF topotests (e.g. bgp_multi_vrf_topo1) exercise VRF bring-up/teardown paths, this PR has no dedicated test covering VRF device create/delete through the modified interface_vrf_change code path — required per project policy.

Confidence Score: 4/5

  • Safe to merge for Linux; blocked only by missing topotest per project policy.
  • The refactoring is logically correct — VRF pointer lifetime after if_delete_update is safe, and Linux behavior is unchanged because dplane_ctx_get_ifp_vrf_id returns ifi->ifi_index for VRF devices (identical to the old cast). The only blocker is the mandatory-topotest P0 per project policy. All other findings are P2 style suggestions.
  • zebra/interface.c — specifically the DELETE path ordering (vrf capture before if_delete_update) and the dual-mode interface_vrf_change signature.

Important Files Changed

Filename Overview
zebra/interface.c Core change: DELETE path captures ifp->vrf before if_delete_update, UPDATE path uses dplane_ctx_get_ifp_vrf_id. Logic is correct; vrf pointer stays live after if_delete_update because that call frees the interface struct, not the VRF struct. Function signature is now dual-mode (different parameter semantics per op) which reduces readability.
zebra/zebra_trace.h Tracepoint argument renamed from ifindex_t ifindex to vrf_id_t vrf_id. Both types alias uint32_t, so the CTF binary layout is unchanged; only the field name and declared type differ. Correct and intentional.

Sequence Diagram

sequenceDiagram
    participant KNL as Linux Kernel
    participant NL as if_netlink.c
    participant IFH as zebra_if_dplane_ifp_handling
    participant VRF as interface_vrf_change
    participant LIB as lib/vrf.c

    Note over KNL,LIB: VRF Device CREATE (RTM_NEWLINK)
    KNL->>NL: "RTM_NEWLINK (zif_type=VRF)"
    NL->>NL: netlink_vrf_change() → set_ifp_table_id
    NL->>NL: "vrf_id = ifi->ifi_index"
    NL->>NL: dplane_ctx_set_ifp_vrf_id(ctx, vrf_id)
    NL->>IFH: dplane_provider_enqueue_to_zebra(ctx)
    IFH->>IFH: "vrf_id = dplane_ctx_get_ifp_vrf_id(ctx)"
    IFH->>VRF: interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id)
    VRF->>LIB: vrf_update(vrf_id, name)
    VRF->>LIB: vrf_enable(vrf)

    Note over KNL,LIB: VRF Device DELETE (RTM_DELLINK)
    KNL->>NL: "RTM_DELLINK (zif_type=VRF)"
    NL->>IFH: dplane_provider_enqueue_to_zebra(ctx)
    IFH->>IFH: "vrf = ifp->vrf  (captured before if_delete_update)"
    IFH->>IFH: "if_delete_update(&ifp)"
    IFH->>VRF: interface_vrf_change(op, vrf, 0, NULL, 0, ns_id)
    VRF->>LIB: vrf_delete(vrf)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: zebra/interface.c
Line: 2003-2009

Comment:
**Missing topotest for VRF device create/delete path**

No topotest covers the modified code path. While `bgp_multi_vrf_topo1` and similar tests exercise VRF-based routing, none specifically validate that VRF device creation (`DPLANE_OP_INTF_INSTALL`) and deletion (`DPLANE_OP_INTF_DELETE`) through `interface_vrf_change` behave correctly with the new identifier sourcing. Per project policy, a dedicated topotest is required before merge — even for refactors — to guard against regressions on any future non-Linux dataplane that populates `vrf_id` differently from `ifindex`.

**Context Used:** You are a senior core maintainer of FreeRangeRouti... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: zebra/interface.c
Line: 1506-1507

Comment:
**Dual-mode function signature reduces readability**

`interface_vrf_change` now has mutually exclusive parameter semantics: `vrf` is only meaningful for `DPLANE_OP_INTF_DELETE`, while `vrf_id / name / tableid` are only meaningful for the update branch. Passing `NULL` / `0` for the unused set makes call sites ambiguous at a glance:

```c
// DELETE: vrf is live pointer; vrf_id/name/tableid are garbage sentinels
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);

// UPDATE: vrf is NULL; vrf_id/name/tableid carry the real data
interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id);
```

Consider splitting into `interface_vrf_delete(struct vrf *vrf)` and `interface_vrf_update(vrf_id_t vrf_id, const char *name, uint32_t tableid, ns_id_t ns_id)` so each call site is self-documenting and the compiler enforces that the caller supplies the right data for each operation.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "zebra: pass vrf pointer to interface_vrf..." | Re-trigger Greptile

Comment thread zebra/interface.c
Comment on lines +2003 to +2009
if (zif_type == ZEBRA_IF_VRF && !vrf_is_backend_netns())
vrf = ifp->vrf;

if_delete_update(&ifp);

if (zif_type == ZEBRA_IF_VRF && !vrf_is_backend_netns())
interface_vrf_change(op, ifindex, name, tableid, ns_id);
if (vrf)
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Missing topotest for VRF device create/delete path

No topotest covers the modified code path. While bgp_multi_vrf_topo1 and similar tests exercise VRF-based routing, none specifically validate that VRF device creation (DPLANE_OP_INTF_INSTALL) and deletion (DPLANE_OP_INTF_DELETE) through interface_vrf_change behave correctly with the new identifier sourcing. Per project policy, a dedicated topotest is required before merge — even for refactors — to guard against regressions on any future non-Linux dataplane that populates vrf_id differently from ifindex.

Context Used: You are a senior core maintainer of FreeRangeRouti... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: zebra/interface.c
Line: 2003-2009

Comment:
**Missing topotest for VRF device create/delete path**

No topotest covers the modified code path. While `bgp_multi_vrf_topo1` and similar tests exercise VRF-based routing, none specifically validate that VRF device creation (`DPLANE_OP_INTF_INSTALL`) and deletion (`DPLANE_OP_INTF_DELETE`) through `interface_vrf_change` behave correctly with the new identifier sourcing. Per project policy, a dedicated topotest is required before merge — even for refactors — to guard against regressions on any future non-Linux dataplane that populates `vrf_id` differently from `ifindex`.

**Context Used:** You are a senior core maintainer of FreeRangeRouti... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))

How can I resolve this? If you propose a fix, please make it concise.

Comment thread zebra/interface.c
Comment on lines +1506 to +1507
static void interface_vrf_change(enum dplane_op_e op, struct vrf *vrf, vrf_id_t vrf_id,
const char *name, uint32_t tableid, ns_id_t ns_id)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Dual-mode function signature reduces readability

interface_vrf_change now has mutually exclusive parameter semantics: vrf is only meaningful for DPLANE_OP_INTF_DELETE, while vrf_id / name / tableid are only meaningful for the update branch. Passing NULL / 0 for the unused set makes call sites ambiguous at a glance:

// DELETE: vrf is live pointer; vrf_id/name/tableid are garbage sentinels
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);

// UPDATE: vrf is NULL; vrf_id/name/tableid carry the real data
interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id);

Consider splitting into interface_vrf_delete(struct vrf *vrf) and interface_vrf_update(vrf_id_t vrf_id, const char *name, uint32_t tableid, ns_id_t ns_id) so each call site is self-documenting and the compiler enforces that the caller supplies the right data for each operation.

Prompt To Fix With AI
This is a comment left during a code review.
Path: zebra/interface.c
Line: 1506-1507

Comment:
**Dual-mode function signature reduces readability**

`interface_vrf_change` now has mutually exclusive parameter semantics: `vrf` is only meaningful for `DPLANE_OP_INTF_DELETE`, while `vrf_id / name / tableid` are only meaningful for the update branch. Passing `NULL` / `0` for the unused set makes call sites ambiguous at a glance:

```c
// DELETE: vrf is live pointer; vrf_id/name/tableid are garbage sentinels
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);

// UPDATE: vrf is NULL; vrf_id/name/tableid carry the real data
interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id);
```

Consider splitting into `interface_vrf_delete(struct vrf *vrf)` and `interface_vrf_update(vrf_id_t vrf_id, const char *name, uint32_t tableid, ns_id_t ns_id)` so each call site is self-documenting and the compiler enforces that the caller supplies the right data for each operation.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants