
fix(merge_entities): handle long descriptions, embedder failures, and storage wrappers#2916

Open
69blade69 wants to merge 2 commits intoHKUDS:mainfrom
69blade69:fix/merge-entities-embedder-truncate

Conversation


@69blade69 69blade69 commented Apr 8, 2026

Summary

This PR addresses three closely related failure modes in `_merge_entities_impl` that all surface as the same user-visible bug: a merge of a graph hub entity (one with many neighbours) proceeds partway, then aborts, leaving the graph in a half-migrated state.

Update: the original PR only had the description-truncate fix from the title. After deploying it I hit two more failure modes while merging hubs at scale and pushed an additional commit (8f0aea7) addressing them. Title and description rewritten to match the wider scope.

1. Truncate relation embed content (extends the original fix)

The original fix in this PR truncated `description` only for the entity VDB upsert at the end of `_merge_entities_impl`. But the loop that re-embeds each migrated relation right above (around line 1426) builds a `content` string from the relation's own description, which can also exceed the embedder's context window — relations that have been merged before may have descriptions accumulated from prior merges.

When merging Иисин (a hub with ~270 neighbours) the merge would crash mid-loop on a single relation whose accumulated description was too long:
```
ollama._types.ResponseError: the input length exceeds the context length (status code: 400)
File ".../lightrag/utils_graph.py", line 1441, in _merge_entities_impl
await relationships_vdb.upsert(relation_data_for_vdb)
```

This commit applies the same truncation (1500 chars for the description, 200 for keywords) to the embedded `content` for each relation. The full description and full keywords are still stored in the VDB row metadata; only the string fed to the embedder is shortened.
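A minimal sketch of that truncation, under stated assumptions: the helper and constant names (`_truncate`, `MAX_DESC_CHARS`, `MAX_KEYWORDS_CHARS`, `build_relation_embed_content`) and the exact `content` layout are illustrative, not the identifiers actually used in `lightrag/utils_graph.py`.

```python
# Illustrative limits: ~1500 chars is roughly ~500 tokens, comfortably
# under the 2048-token default context of nomic-embed-text.
MAX_DESC_CHARS = 1500
MAX_KEYWORDS_CHARS = 200

def _truncate(text: str, limit: int) -> str:
    # The ellipsis keeps a visible cue that more text exists.
    return text if len(text) <= limit else text[:limit] + "..."

def build_relation_embed_content(src: str, tgt: str,
                                 keywords: str, description: str) -> str:
    # Only the string fed to the embedder is shortened; the full
    # description/keywords still go into the VDB row metadata.
    return (f"{_truncate(keywords, MAX_KEYWORDS_CHARS)}\t{src}\n{tgt}\n"
            f"{_truncate(description, MAX_DESC_CHARS)}")
```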

2. Wrap entity and relation embed calls in try/except

Even with the truncation, embedders can fail for other transient reasons (a network blip, model context drift, an unusually pathological character sequence). Without the try/except, one failed embedding aborts the whole merge, leaving the graph in a partially migrated state where the target has been updated but the source entities have not been deleted (because we never reach the delete loop further down).

Wrap the entity VDB upsert and the per-relation VDB upsert in try/except: we log a warning and keep going. Relation chunk tracking is already updated separately above the upsert, so the graph stays consistent even if a single VDB row is missed; the next merge attempt or a subsequent reprocess will fix it.
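A sketch of that wrapping, assuming an illustrative helper name (`safe_upsert` is not the actual code in the PR, which inlines the try/except):

```python
import logging

logger = logging.getLogger(__name__)

async def safe_upsert(vdb, row: dict, label: str) -> bool:
    """Attempt a VDB upsert; on failure, warn and let the merge continue."""
    try:
        await vdb.upsert(row)
        return True
    except Exception as exc:
        # One bad embedding should not abort the whole merge; a later
        # merge attempt or reprocess can backfill the missed row.
        logger.warning(
            "Entity Merge: skipping vdb upsert for %s due to %s: %s",
            label, type(exc).__name__, exc,
        )
        return False
```

The trade-off stated above applies: a skipped upsert leaves that row out of the vector index until a subsequent reprocess backfills it.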

3. Coerce raw node_data through dict() before _merge_attributes

This is the most subtle of the three. Some graph backends (notably PostgreSQL + Apache AGE through asyncpg) return a wrapper object from `get_node` whose `.get()` method returns `None` for every key — the attribute access protocol is implemented but `dict.get` semantics are not. When we feed that wrapper straight into `_merge_attributes` below:

```python
values = [data.get(key) for data in data_list if data.get(key)]
if not values:
    continue  # ← key is dropped!
```

…every value comes back falsy, the loop drops every key, and `merged_entity_data` ends up as an empty dict. Then on the line `await chunk_entity_relation_graph.upsert_node(target_entity, merged_entity_data)` we overwrite the existing target's description with an empty one.

We hit this in production: after a merge attempt aborted on a downstream embedder failure (#2 above), a hub entity was left with `description=""` in the graph store while the VDB row still had the old 2.9 KB description. We had to write a one-shot recovery script that read the VDB content back and called `edit_entity` to restore the graph node. The fix here prevents the regression entirely: coerce the wrapper into a plain dict before `_merge_attributes` sees it.

Verification

Tested against LightRAG v1.4.13 with PGGraphStorage (Apache AGE 1.5.0) and Ollama nomic-embed-text. Smoke merges before and after this commit:

  • Vault Refs ← Vault Ref (small, 2 sources): worked before, still works.
  • OpenClaw ← Openclaw (medium, 53 relations): worked before, still works.
  • Иисин ← Iisin/main (hub, 271 relations): previously aborted with the input length exceeds the context length, now succeeds in ~20 minutes (slow only because of the embedder rebuild for that many edges), with no errors in the server log.

After the merge, Иисин graph node still has the full description and all the relations are correctly migrated.

Notes

  • One-file change. Two commits in this PR — happy to squash if maintainers prefer.
  • All three sub-fixes are in _merge_entities_impl. They are tightly coupled (the embedder failures and the silent description nuking are two halves of the same incident) and easier to review together than as three separate PRs.
  • No public API or schema changes.
  • Independent of #2915 (fix(utils_graph): coerce storage values to plain dict to fix `vars()` TypeError on all write endpoints) — those are completely different code paths and either PR can be merged first.

… window

When merging entities the default `concatenate` strategy joins the
descriptions of every source entity into one long string. For graph
"hub" entities (those that appear in many documents) the result
routinely exceeds the embedder's context window. With Ollama's
`nomic-embed-text` (default 2048-token context) this surfaces as

  ollama._types.ResponseError: the input length exceeds the context length
                              (status code: 400)

and the merge aborts before the VDB upsert.

The fix is to truncate the description that is fed into `content` for
the embedding call. The full, untruncated description is still stored
in `merged_entity_data["description"]` and therefore in the graph node
properties — only the `content` string used for embedding is shortened.

1500 characters ≈ ~500 tokens, which leaves comfortable headroom
for the entity name on top and stays well below the 2048-token
default. The ellipsis suffix preserves the cue that more text exists
for any human inspecting `content`.

This change is minimal and surgical: only the embedded `content` is
affected, not what gets persisted into graph storage or VDB metadata.
69blade69 added a commit to 69blade69/LightRAG that referenced this pull request Apr 8, 2026
…ars() TypeError

When the graph storage backend is Apache AGE on PostgreSQL, `POST /graph/entity/create`, `POST /graph/entities/merge`, `POST /graph/entity/edit`, and the relation equivalents return HTTP 500 even though the underlying database operation succeeds. The merge/create/edit actually completes — the entity is in the graph, the VDB row is in place, the relations are migrated — but the client cannot tell the operation went through, which makes the API look broken.

## Repro

1. LightRAG with `LIGHTRAG_GRAPH_STORAGE=PGGraphStorage` (Apache AGE on PostgreSQL).
2. `curl -X POST http://127.0.0.1:9621/graph/entity/create -H "X-API-Key: $KEY" -H 'Content-Type: application/json' -d '{"entity_name":"TestE","entity_data":{"entity_type":"Test","description":"smoke"}}'`
3. Server returns HTTP 500 "Internal Server Error".
4. Verify `GET /graph/entity/exists?name=TestE` → `true`. The create **did happen** — only the response crashed.

Server log:
```
INFO: Entity Create: 'TestE' successfully created
INFO: 127.0.0.1:41662 - "POST /graph/entity/create HTTP/1.1" 500
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File ".../fastapi/encoders.py", line 328, in jsonable_encoder
    data = dict(obj)
TypeError: cannot convert dictionary update sequence element #0 to a sequence
During handling of the above exception, another exception occurred:
  File ".../fastapi/encoders.py", line 333, in jsonable_encoder
    data = vars(obj)
TypeError: vars() argument must have __dict__ attribute
ValueError: [TypeError('cannot convert dictionary update sequence element #0 to a sequence'), TypeError('vars() argument must have __dict__ attribute')]
```

The same crash hits `/graph/entities/merge`, `/graph/entity/edit`, `/graph/relation/create`, `/graph/relation/edit` — basically every write endpoint that returns `get_entity_info(...)` / `get_relation_info(...)`.

## Root cause

`get_entity_info` and `get_relation_info` (in `lightrag/utils_graph.py`) embed the raw `node_data` / `edge_data` returned by the storage backend's `get_node` / `get_edge` straight into the response under `graph_data`. For Apache AGE through PostgreSQL via asyncpg, that value is a wrapper object whose properties can't be cleanly serialized — neither `dict(obj)` nor `vars(obj)` work, and `jsonable_encoder` raises.
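The failure can be reproduced without FastAPI, since `jsonable_encoder`'s fallback path just tries `dict(obj)` and then `vars(obj)` (see the traceback above). A minimal stand-in for the backend wrapper (the class name and shape are illustrative, not the real AGE result type):

```python
class AgeWrapperStandIn:
    # Stand-in for the asyncpg/AGE result object: it holds properties
    # internally, but exposes no __dict__ (because of __slots__) and is
    # not an iterable of key/value pairs, so both encoder fallbacks
    # raise TypeError.
    __slots__ = ("_props",)

    def __init__(self, props):
        self._props = props

w = AgeWrapperStandIn({"description": "smoke"})

for fallback in (dict, vars):
    try:
        fallback(w)
    except TypeError as exc:
        print(f"{fallback.__name__}(obj) failed: {exc}")
```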

Every write function in `utils_graph.py` ends with:

```python
return await get_entity_info(
    chunk_entity_relation_graph,
    entities_vdb,
    entity_name,
    include_vector_data=True,
)
```

…which means **all 5 write endpoints** propagate the crash:

- `acreate_entity` → `/graph/entity/create`
- `acreate_relation` → `/graph/relation/create`
- `aedit_entity` → `/graph/entity/edit`
- `aedit_relation` → `/graph/relation/edit`
- `_merge_entities_impl` → `/graph/entities/merge`

We hit this in production while trying to clean up duplicate entities at scale and spent significant time chasing it before realizing **every write was actually succeeding**.

## Fix

Add `_coerce_to_plain_dict(obj)` helper that recursively converts a value into a plain JSON-serializable dict, falling back to `str(...)` for unknown leaf values and to an empty dict on failure. `get_entity_info` and `get_relation_info` now coerce both `graph_data` and `vector_data` through it before returning.

This is a defensive fix at the seam between storage backends and the response layer. It does not touch storage code, does not touch the write functions, and does not change the public response schema — it just guarantees the response is always JSON-encodable. Backends that already return plain dicts (NetworkX, JSON, Neo4j with current asyncpg paths) are unaffected because `isinstance(obj, dict)` short-circuits the helper.

## Verification

Tested against LightRAG v1.4.13 with PGGraphStorage (Apache AGE 1.5.0). After the patch:

```
POST /graph/entity/create  → HTTP 200, full response with graph_data + vector_data
POST /graph/entities/merge → HTTP 200, full response with graph_data + vector_data
POST /graph/entity/edit    → HTTP 200
DELETE /documents/delete_entity → HTTP 200 (was already working — schema stays the same)
```

Smoke merge of OpenClaw ← Openclaw (53 relations transferred): HTTP 200 in ~5 minutes (slow only because of the embedder rebuild for an entity with many neighbours), no errors in the server log, source removed from both VDB and AGE graph, target preserved.

## Notes

- One-file change. Net +69/-12 lines.
- A separate PR (HKUDS#2916) addresses the orthogonal Ollama context-length crash on `merge_entities`. They are independent and can be reviewed separately.
- The cost of the helper is very small (a single `isinstance(obj, dict)` short-circuit when the storage already returns dicts), and it never raises — losing custom properties is strictly better than HTTP 500 from a successful write.
- Happy to narrow the helper to only `get_entity_info` / `get_relation_info` if maintainers prefer to keep it scoped, or to push it into the `PGGraphStorage.get_node` path instead. This PR went with the helper at the boundary because it covers all backends in one place.
@69blade69 69blade69 changed the title fix(merge_entities): truncate description for embedder to fit context window fix(merge_entities): handle long descriptions, embedder failures, and storage wrappers Apr 8, 2026
@69blade69 (Author)

Update — three fixes in one PR

After deploying the original description truncate from this PR I hit two more failure modes while merging hub entities at scale (entities with hundreds of neighbours). They all surface as the same user-visible bug — a merge of a graph hub aborts mid-flight and leaves the graph half-migrated — so I added them as a follow-up commit (8f0aea7) and rewrote the title and description to match the wider scope.

TL;DR of the new fixes:

  1. The same description truncation, but applied to relation embed content too (not just the entity's), because the relation upsert loop was also crashing on hub merges.
  2. try/except around entity and relation VDB upserts, so a single bad embedding does not abort the whole merge.
  3. Coerce the value returned by `get_node` into a plain dict before `_merge_attributes` sees it. PostgreSQL+AGE returns a wrapper whose `.get()` always returns `None`, so `_merge_attributes` silently dropped every key, and `upsert_node` then overwrote the existing target's description with an empty one. We had to write a recovery script for a real entity in production after this happened. Worth fixing.

Verification covers small/medium/hub merges against PostgreSQL+AGE+Ollama. Hub merge (Иисин ← Iisin/main, 271 relations transferred) previously aborted with the embedder error from the original description and now succeeds with no errors. Two commits in the PR — happy to squash on request.

@danielaskdd (Collaborator)

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8f0aea7891


Comment thread: `lightrag/utils_graph.py`, lines +1541 to +1545

```python
except Exception as _ent_upsert_exc:
    logger.warning(
        f"Entity Merge: skipping entity vdb upsert for "
        f"`{target_entity}` due to {type(_ent_upsert_exc).__name__}: "
        f"{_ent_upsert_exc}"
```

P1: Fail merge when target entity vector upsert is skipped

Catching and suppressing entities_vdb.upsert(...) errors here can silently drop all entity vectors for the merge result: this function still proceeds to step 10 and deletes each source entity's vector row, so a transient embedding/VDB failure leaves neither source nor target entity searchable in the vector index while reporting a successful merge. This should remain a hard failure (or be retried/compensated) unless source-vector deletion is deferred until target upsert is confirmed.


Comment thread: `lightrag/utils_graph.py`, lines +1491 to +1495

```python
except Exception as _rel_upsert_exc:
    logger.warning(
        f"Entity Merge: skipping vdb upsert for "
        f"`{normalized_src}`~`{normalized_tgt}` due to "
        f"{type(_rel_upsert_exc).__name__}: {_rel_upsert_exc}"
```

P2: Do not swallow relation upsert failures after VDB delete

This except turns relation embedding failures into silent data loss: the function deletes old relation vectors earlier (relationships_vdb.delete(...)) and then continues when a replacement upsert fails, so affected relations are left without vector entries even though the merge completes successfully. In environments with transient embedder/network errors, this permanently degrades relation retrieval until a manual reindex/backfill runs.

