
Add FrozenDocumentLoader for offline / vetted JSON-LD contexts #250

Open
anatoly-scherbakov wants to merge 21 commits into master from frozen-document-loader


anatoly-scherbakov (Collaborator) commented Apr 26, 2026

Why?

For deployments with tight security requirements, it might be necessary to:

  • Ensure that the system does not require network access and does not download arbitrary data from the Web,
  • But still make certain allowlisted contexts and remote documents available to JSON-LD processing.

Summary

  • Adds pyld.DocumentLoader (an ABC) and pyld.RemoteDocument (a TypedDict), the first class-based loader contract in PyLD; existing function-based loaders remain valid.
  • Adds pyld.FrozenDocumentLoader: a class-based loader that serves only the URLs in its documents allowlist and rejects every other URL by raising JsonLdError(code='loading document failed'). It is suitable for air-gapped runs, reproducible builds, and security-hardened deployments, and it follows the W3C JSON-LD Best Practices recommendation that clients SHOULD attempt to use a locally cached version of contexts (§ Cache JSON-LD Contexts).
  • Adds pyld.BUNDLED_CONTEXTS: 8 vendored W3C / W3ID JSON-LD contexts (ActivityStreams, DID v1, VC v1/v2, Linked Data Security v1/v2, Ed25519-2020, JWS-2020), refreshable with make download-bundled-contexts. Called with no arguments, FrozenDocumentLoader() serves these; pass dict(BUNDLED_CONTEXTS, **extras) to extend the set.
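To make the contract above concrete, here is a minimal sketch of how the three pieces could fit together. It assumes the loader is called as loader(url, options), like PyLD's function-based loaders, and that RemoteDocument carries the contentType / contextUrl / documentUrl / document keys those loaders already return. The class names mirror the summary, but the bodies are illustrative and not the code in this PR; JsonLdError is a hypothetical stand-in so the snippet runs without PyLD, and BUNDLED_CONTEXTS is an empty placeholder rather than the 8 vendored contexts.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, TypedDict


class JsonLdError(Exception):
    """Hypothetical stand-in for pyld's JsonLdError so this sketch runs on its own."""

    def __init__(self, message: str, type_: str, code: Optional[str] = None):
        super().__init__(message)
        self.type = type_
        self.code = code


class RemoteDocument(TypedDict):
    """Loader result; keys assumed to match PyLD's function-based loaders."""

    contentType: Optional[str]
    contextUrl: Optional[str]
    documentUrl: str
    document: Any


class DocumentLoader(ABC):
    """Class-based contract: an instance is called like a function-based loader."""

    @abstractmethod
    def __call__(self, url: str, options: Optional[dict] = None) -> RemoteDocument:
        ...


# Hypothetical placeholder: the PR vendors 8 contexts; left empty in this sketch.
BUNDLED_CONTEXTS: Dict[str, Any] = {}


class FrozenDocumentLoader(DocumentLoader):
    """Serve only allowlisted URLs from memory; never touch the network."""

    def __init__(self, documents: Optional[Dict[str, Any]] = None):
        # Default to the bundled set; copy so the caller's mapping is never mutated.
        self.documents = dict(BUNDLED_CONTEXTS if documents is None else documents)

    def __call__(self, url: str, options: Optional[dict] = None) -> RemoteDocument:
        if url not in self.documents:
            # Refuse anything outside the allowlist instead of fetching it.
            raise JsonLdError(
                f"URL is not in the frozen allowlist: {url}",
                "jsonld.LoadDocumentError",
                code="loading document failed",
            )
        return RemoteDocument(
            contentType="application/ld+json",
            contextUrl=None,
            documentUrl=url,
            document=self.documents[url],
        )
```

In this sketch, extending the bundled set looks like FrozenDocumentLoader(dict(BUNDLED_CONTEXTS, **extras)), where extras maps each additional URL (for example a made-up https://example.org/ctx.jsonld) to its parsed context.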

Design notes

  • documents: dict[str, dict | Path]. Path entries are read and parsed lazily on first request and cached in place. __post_init__ makes a defensive copy, so the caller's mapping is never mutated.
  • RemoteDocument is now a real TypedDict (previously it existed only as a word in docstrings). Subclasses get a typed __call__ return value.
  • Bundled *.jsonld files ship via setup.py's package_data and MANIFEST.in's recursive-include.
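The lazy-loading and defensive-copy behaviour from the design notes can be sketched with a dataclass. This assumes documents maps each URL either to an already-parsed dict or to a pathlib.Path pointing at a .jsonld file; the class name LazyDocuments and its get method are hypothetical, chosen for illustration, and this is not the PR's implementation.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Union


@dataclass
class LazyDocuments:
    """Hypothetical helper: URL -> parsed JSON, reading Path entries on demand."""

    documents: Dict[str, Union[dict, Path]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Defensive copy: caching parsed files below must not touch the caller's dict.
        self.documents = dict(self.documents)

    def get(self, url: str) -> dict:
        entry = self.documents[url]  # KeyError for URLs outside the allowlist
        if isinstance(entry, Path):
            # First request for this URL: read and parse the file...
            entry = json.loads(entry.read_text(encoding="utf-8"))
            # ...then cache the parsed dict in place of the Path.
            self.documents[url] = entry
        return entry
```

Because the parsed dict replaces the Path in the private copy, each file is read at most once per instance, and the mapping the caller passed in still holds the original Path objects.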

Test plan

  • pytest tests/test_frozen_document_loader.py -v — 8 new tests, all pass.
  • Full non-network suite: pytest -m "not network" — 1477 passed, 43 skipped, 14 xfailed.
  • make lint passes cleanly.
  • End-to-end smoke: jsonld.expand(doc_with_did_v1_context, options={'documentLoader': FrozenDocumentLoader()}) resolves with no network.
  • Refusal smoke: unknown URL raises JsonLdError(code='loading document failed', type='jsonld.LoadDocumentError').
  • Bundle refresh: make download-bundled-contexts succeeds and is idempotent.

github-actions Bot commented Apr 26, 2026

anatoly-scherbakov changed the base branch from master to fix-expand-base April 26, 2026 20:31
Base automatically changed from fix-expand-base to master April 27, 2026 07:28
mielvds (Collaborator) commented Apr 30, 2026

Hi @anatoly-scherbakov, thanks for this. This is a useful feature. Before I review, two questions:

  • Is it necessary to have these local contexts in this repo? This seems difficult to maintain. Couldn't make download-bundled-contexts be part of the install procedure?
  • Do we need to revisit the unit tests that require network access?

anatoly-scherbakov (Collaborator, Author) commented:

@mielvds thanks for the review!

Can't make download-bundled-contexts be part of the install procedure?

I would advise against that.

  • If pyld is being installed from a corporate PyPI mirror in a secure environment, Internet access might be restricted, and the installation would fail.
  • Or the network might simply be unreliable.

Developers who use the new FrozenDocumentLoader will vendor in the contexts they need; that is the whole point of it. Vendored contexts are expected to be:

  • vetted,
  • verified for security,
  • and will not change.

This is the most straightforward way to avoid security vulnerabilities introduced by context spoofing.

For future versions of JSON-LD, the Working Group is going to work on security features that should reduce the possibility of context spoofing attacks.

Regardless of these developments, an air-gapped DocumentLoader that does not allow network access and whitelists remote documents is a solution already available to current JSON-LD systems.
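As an illustration of the point above, a similar allowlisting loader can already be written in the function-based style that existing PyLD versions accept. The sketch below is an assumption-laden example, not part of this PR: make_allowlist_loader and LoadDocumentError are hypothetical names, and with PyLD installed you would raise pyld's JsonLdError instead and pass the function as options={'documentLoader': loader} to jsonld.expand (that wiring is not exercised here).

```python
import copy
from typing import Any, Callable, Dict, Optional


class LoadDocumentError(Exception):
    """Hypothetical stand-in; with PyLD you would raise its JsonLdError instead."""


def make_allowlist_loader(documents: Dict[str, Any]) -> Callable[..., dict]:
    """Build a function-based loader that serves only the given URLs, offline."""
    # Deep-copy once so later edits to `documents` cannot widen or alter the allowlist.
    snapshot = copy.deepcopy(documents)

    def loader(url: str, options: Optional[dict] = None) -> dict:
        try:
            document = snapshot[url]
        except KeyError:
            raise LoadDocumentError(f"refusing to load non-allowlisted URL: {url}") from None
        return {
            "contentType": "application/ld+json",
            "contextUrl": None,
            "documentUrl": url,
            # Hand out a copy so callers cannot mutate the frozen snapshot.
            "document": copy.deepcopy(document),
        }

    return loader
```

The class-based FrozenDocumentLoader packages the same idea behind a typed, reusable interface, while a closure like this one needs nothing beyond the standard library.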

cc @BigBlueHat what do you think?

Comment thread on lib/pyld/documentloader/frozen/__init__.py (outdated)
Comment thread on lib/pyld/documentloader/base.py
Comment thread on tests/test_frozen_document_loader.py
anatoly-scherbakov requested a review from mielvds May 4, 2026 09:20
mielvds (Collaborator) left a comment


Looks good to me! But let's give the original maintainers a couple of days to comment.
