zebra: clean up VRF handling by using dataplane provided vrf_id#234
zebra: clean up VRF handling by using dataplane provided vrf_id#234mwinter-osr wants to merge 3 commits intoopensourcerouting:masterfrom
Conversation
zebra_if_dplane_ifp_handling() was reading dplane_ctx_get_ifp_table_id() for VRF events. In the netlink dplane backend, ifp_table_id is only set via netlink_vrf_change(), which is invoked from netlink_link_change() for RTM_NEWLINK VRF events. It is not set for RTM_DELLINK. This causes interface_vrf_change() to receive table_id as 0 on VRF deletes. For delete operations, save the VRF pointer before if_delete_update() and retrieve table_id from vrf->data.l.table_id. For add/update operations, continue using the dplane-provided table_id. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
interface_vrf_change() was implicitly assuming that the VRF netdevice ifindex could be used as zebra's vrf_id. This assumption holds true for the Linux kernel dataplane, where the VRF ID is defined as the ifindex of the VRF interface, so this change does not alter kernel behavior. However, the dataplane API already exposes both concepts explicitly via dplane_ctx_get_ifindex() and dplane_ctx_get_ifp_vrf_id(ctx). Using the proper accessor avoids casting an ifindex to vrf_id_t and better respects the dataplane abstraction. On interface updates, the vrf_id provided by the dataplane is now used directly. On interface deletion (DELLINK), where the dataplane context may no longer carry vrf information, zebra relies on the existing ifp state (ifp->vrf->vrf_id) before if_delete_update() is called. The table_id is also retrieved from ifp->vrf->data.l.table_id since dplane_ctx_get_ifp_table_id() is not set for RTM_DELLINK events. This is required for non-kernel dataplanes such as Grout (DPDK), where the VRF ID is not equal to the VRF interface ifindex. In that case, using the correct vrf_id fixes VRF handling between zebra and the Grout dataplane. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
In the delete path, we already hold a reference to the vrf structure before calling if_delete_update(). Passing this pointer directly to interface_vrf_change() avoids a redundant vrf_lookup_by_id() call and allows accessing vrf->name, vrf->vrf_id, and vrf->data.l.table_id directly. For update operations, the vrf pointer is passed as NULL and the function continues to use the vrf_id, name, and tableid parameters from the dataplane context. Signed-off-by: Maxime Leroy <maxime@leroys.fr>
Greptile SummaryThis PR cleans up VRF handling in zebra by eliminating the implicit
|
| Filename | Overview |
|---|---|
| zebra/interface.c | Core change: DELETE path captures ifp->vrf before if_delete_update, UPDATE path uses dplane_ctx_get_ifp_vrf_id. Logic is correct; vrf pointer stays live after if_delete_update because that call frees the interface struct, not the VRF struct. Function signature is now dual-mode (different parameter semantics per op) which reduces readability. |
| zebra/zebra_trace.h | Tracepoint argument renamed from ifindex_t ifindex to vrf_id_t vrf_id. Both types alias uint32_t, so the CTF binary layout is unchanged; only the field name and declared type differ. Correct and intentional. |
Sequence Diagram
sequenceDiagram
participant KNL as Linux Kernel
participant NL as if_netlink.c
participant IFH as zebra_if_dplane_ifp_handling
participant VRF as interface_vrf_change
participant LIB as lib/vrf.c
Note over KNL,LIB: VRF Device CREATE (RTM_NEWLINK)
KNL->>NL: "RTM_NEWLINK (zif_type=VRF)"
NL->>NL: netlink_vrf_change() → set_ifp_table_id
NL->>NL: "vrf_id = ifi->ifi_index"
NL->>NL: dplane_ctx_set_ifp_vrf_id(ctx, vrf_id)
NL->>IFH: dplane_provider_enqueue_to_zebra(ctx)
IFH->>IFH: "vrf_id = dplane_ctx_get_ifp_vrf_id(ctx)"
IFH->>VRF: interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id)
VRF->>LIB: vrf_update(vrf_id, name)
VRF->>LIB: vrf_enable(vrf)
Note over KNL,LIB: VRF Device DELETE (RTM_DELLINK)
KNL->>NL: "RTM_DELLINK (zif_type=VRF)"
NL->>IFH: dplane_provider_enqueue_to_zebra(ctx)
IFH->>IFH: "vrf = ifp->vrf (captured before if_delete_update)"
IFH->>IFH: "if_delete_update(&ifp)"
IFH->>VRF: interface_vrf_change(op, vrf, 0, NULL, 0, ns_id)
VRF->>LIB: vrf_delete(vrf)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: zebra/interface.c
Line: 2003-2009
Comment:
**Missing topotest for VRF device create/delete path**
No topotest covers the modified code path. While `bgp_multi_vrf_topo1` and similar tests exercise VRF-based routing, none specifically validate that VRF device creation (`DPLANE_OP_INTF_INSTALL`) and deletion (`DPLANE_OP_INTF_DELETE`) through `interface_vrf_change` behave correctly with the new identifier sourcing. Per project policy, a dedicated topotest is required before merge — even for refactors — to guard against regressions on any future non-Linux dataplane that populates `vrf_id` differently from `ifindex`.
**Context Used:** You are a senior core maintainer of FreeRangeRouti... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: zebra/interface.c
Line: 1506-1507
Comment:
**Dual-mode function signature reduces readability**
`interface_vrf_change` now has mutually exclusive parameter semantics: `vrf` is only meaningful for `DPLANE_OP_INTF_DELETE`, while `vrf_id / name / tableid` are only meaningful for the update branch. Passing `NULL` / `0` for the unused set makes call sites ambiguous at a glance:
```c
// DELETE: vrf is live pointer; vrf_id/name/tableid are garbage sentinels
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);
// UPDATE: vrf is NULL; vrf_id/name/tableid carry the real data
interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id);
```
Consider splitting into `interface_vrf_delete(struct vrf *vrf)` and `interface_vrf_update(vrf_id_t vrf_id, const char *name, uint32_t tableid, ns_id_t ns_id)` so each call site is self-documenting and the compiler enforces that the caller supplies the right data for each operation.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "zebra: pass vrf pointer to interface_vrf..." | Re-trigger Greptile
| if (zif_type == ZEBRA_IF_VRF && !vrf_is_backend_netns()) | ||
| vrf = ifp->vrf; | ||
|
|
||
| if_delete_update(&ifp); | ||
|
|
||
| if (zif_type == ZEBRA_IF_VRF && !vrf_is_backend_netns()) | ||
| interface_vrf_change(op, ifindex, name, tableid, ns_id); | ||
| if (vrf) | ||
| interface_vrf_change(op, vrf, 0, NULL, 0, ns_id); |
There was a problem hiding this comment.
Missing topotest for VRF device create/delete path
No topotest covers the modified code path. While bgp_multi_vrf_topo1 and similar tests exercise VRF-based routing, none specifically validate that VRF device creation (DPLANE_OP_INTF_INSTALL) and deletion (DPLANE_OP_INTF_DELETE) through interface_vrf_change behave correctly with the new identifier sourcing. Per project policy, a dedicated topotest is required before merge — even for refactors — to guard against regressions on any future non-Linux dataplane that populates vrf_id differently from ifindex.
Context Used: You are a senior core maintainer of FreeRangeRouti... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: zebra/interface.c
Line: 2003-2009
Comment:
**Missing topotest for VRF device create/delete path**
No topotest covers the modified code path. While `bgp_multi_vrf_topo1` and similar tests exercise VRF-based routing, none specifically validate that VRF device creation (`DPLANE_OP_INTF_INSTALL`) and deletion (`DPLANE_OP_INTF_DELETE`) through `interface_vrf_change` behave correctly with the new identifier sourcing. Per project policy, a dedicated topotest is required before merge — even for refactors — to guard against regressions on any future non-Linux dataplane that populates `vrf_id` differently from `ifindex`.
**Context Used:** You are a senior core maintainer of FreeRangeRouti... ([source](https://app.greptile.com/review/custom-context?memory=instruction-0))
How can I resolve this? If you propose a fix, please make it concise.| static void interface_vrf_change(enum dplane_op_e op, struct vrf *vrf, vrf_id_t vrf_id, | ||
| const char *name, uint32_t tableid, ns_id_t ns_id) |
There was a problem hiding this comment.
Dual-mode function signature reduces readability
interface_vrf_change now has mutually exclusive parameter semantics: vrf is only meaningful for DPLANE_OP_INTF_DELETE, while vrf_id / name / tableid are only meaningful for the update branch. Passing NULL / 0 for the unused set makes call sites ambiguous at a glance:
// DELETE: vrf is live pointer; vrf_id/name/tableid are garbage sentinels
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);
// UPDATE: vrf is NULL; vrf_id/name/tableid carry the real data
interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id);Consider splitting into interface_vrf_delete(struct vrf *vrf) and interface_vrf_update(vrf_id_t vrf_id, const char *name, uint32_t tableid, ns_id_t ns_id) so each call site is self-documenting and the compiler enforces that the caller supplies the right data for each operation.
Prompt To Fix With AI
This is a comment left during a code review.
Path: zebra/interface.c
Line: 1506-1507
Comment:
**Dual-mode function signature reduces readability**
`interface_vrf_change` now has mutually exclusive parameter semantics: `vrf` is only meaningful for `DPLANE_OP_INTF_DELETE`, while `vrf_id / name / tableid` are only meaningful for the update branch. Passing `NULL` / `0` for the unused set makes call sites ambiguous at a glance:
```c
// DELETE: vrf is live pointer; vrf_id/name/tableid are garbage sentinels
interface_vrf_change(op, vrf, 0, NULL, 0, ns_id);
// UPDATE: vrf is NULL; vrf_id/name/tableid carry the real data
interface_vrf_change(op, NULL, vrf_id, name, tableid, ns_id);
```
Consider splitting into `interface_vrf_delete(struct vrf *vrf)` and `interface_vrf_update(vrf_id_t vrf_id, const char *name, uint32_t tableid, ns_id_t ns_id)` so each call site is self-documenting and the compiler enforces that the caller supplies the right data for each operation.
How can I resolve this? If you propose a fix, please make it concise.
Zebra was implicitly assuming that the VRF netdevice ifindex could be used as the VRF identifier, leading to casts from ifindex to vrf_id_t. While this works with the Linux kernel dataplane, it does not respect the dataplane API, which exposes these identifiers as distinct concepts.
The code is updated to use the VRF identifier explicitly provided by the dataplane context and removes the need for casting ifindex to vrf_id_t. In addition, the VRF delete path is clarified by no longer reading unused dataplane fields, avoiding confusion about which identifiers are actually required.
There is no behavior change for the Linux kernel dataplane.
Original PR20318 by maxime-leroy