Add FrozenDocumentLoader for offline / vetted JSON-LD contexts #250
anatoly-scherbakov wants to merge 21 commits into master
…`, `RemoteDocument` from `pyld`
Hi @anatoly-scherbakov, thanks for this. This is a useful feature. Before I review, two questions:
@mielvds thanks for the review!
I would advise against that.
Developers who employ the new
This is the most straightforward way to avoid security vulnerabilities introduced by context spoofing. For future versions of JSON-LD, the Working Group is going to work on security features that should reduce the possibility of context spoofing attacks. Regardless of these developments, an air-gapped
cc @BigBlueHat what do you think?
mielvds
left a comment
Looks good to me! But let's give the original maintainers a couple of days to comment.
Why?
For tight security deployments, it might be necessary to:
Summary
- `pyld.DocumentLoader` (`ABC`) + `pyld.RemoteDocument` (`TypedDict`): the first class-based loader contract in PyLD; existing function-based loaders remain valid.
- `pyld.FrozenDocumentLoader`: a class-based loader that serves only URLs in its `documents` allowlist and refuses everything else with `JsonLdError(code='loading document failed')`. Suitable for air-gapped runs, reproducible builds, and security-hardened deployments. Honors the W3C JSON-LD Best Practices recommendation that clients SHOULD attempt to use a locally cached version of contexts (§ Cache JSON-LD Contexts).
- `pyld.BUNDLED_CONTEXTS`: 8 vendored W3C / W3ID JSON-LD contexts (ActivityStreams, DID v1, VC v1/v2, Linked Data Security v1/v2, Ed25519-2020, JWS-2020), refreshable with `make download-bundled-contexts`. With no arguments `FrozenDocumentLoader()` serves these; pass `dict(BUNDLED_CONTEXTS, **extras)` to extend.

Design notes
- `documents: dict[str, dict | Path]`: `Path` entries are read & parsed lazily on first request and cached in place.
- `__post_init__` makes a defensive copy so the caller's mapping is never mutated.
- `RemoteDocument` is a real `TypedDict` (previously only a docstring word). Subclasses get a typed `__call__` return.
- `*.jsonld` files ship via `setup.py`'s `package_data` and `MANIFEST.in`'s `recursive-include`.

Test plan
- `pytest tests/test_frozen_document_loader.py -v`: 8 new tests, all pass.
- `pytest -m "not network"`: 1477 passed, 43 skipped, 14 xfailed.
- `make lint` clean.
- `jsonld.expand(doc_with_did_v1_context, options={'documentLoader': FrozenDocumentLoader()})` resolves with no network.
- `JsonLdError(code='loading document failed', type='jsonld.LoadDocumentError')`.
- `make download-bundled-contexts` succeeds and is idempotent.
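
For reviewers who want to see the allowlist behaviour in isolation, here is a minimal stdlib-only sketch of the pattern the design notes describe. It is not the code in this PR: `FrozenLoaderSketch`, `LoadingDocumentFailed`, and the `https://example.org/...` URLs are hypothetical names chosen for illustration, and the returned keys assume the result shape that PyLD's existing function-based loaders produce.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional, TypedDict, Union


class RemoteDocument(TypedDict):
    """Assumed shape of a loader result (mirrors PyLD's function-based loaders)."""
    contentType: str
    contextUrl: Optional[str]
    documentUrl: str
    document: Any


class LoadingDocumentFailed(Exception):
    """Hypothetical stand-in for JsonLdError(code='loading document failed')."""
    code = 'loading document failed'


@dataclass
class FrozenLoaderSketch:
    """Illustrative allowlist loader; not the PR's FrozenDocumentLoader."""
    documents: Dict[str, Union[dict, Path]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Defensive shallow copy: caching below never touches the caller's mapping.
        self.documents = dict(self.documents)

    def __call__(self, url: str, options: Optional[dict] = None) -> RemoteDocument:
        try:
            entry = self.documents[url]
        except KeyError:
            # Anything outside the allowlist is refused; no network fallback.
            raise LoadingDocumentFailed(f'{url} is not in the allowlist') from None
        if isinstance(entry, Path):
            # Lazy read on first request, then cache the parsed document in place.
            entry = json.loads(entry.read_text(encoding='utf-8'))
            self.documents[url] = entry
        return {
            'contentType': 'application/ld+json',
            'contextUrl': None,
            'documentUrl': url,
            'document': entry,
        }


EXAMPLE_URL = 'https://example.org/ctx'  # hypothetical context URL
EXAMPLE_CONTEXT = {'@context': {'name': 'http://schema.org/name'}}

with tempfile.TemporaryDirectory() as tmp:
    ctx_path = Path(tmp) / 'example.jsonld'
    ctx_path.write_text(json.dumps(EXAMPLE_CONTEXT), encoding='utf-8')
    caller_mapping = {EXAMPLE_URL: ctx_path}
    loader = FrozenLoaderSketch(caller_mapping)
    first = loader(EXAMPLE_URL)  # reads and parses the file

# The file no longer exists here, so this call is served from the in-place cache.
second = loader(EXAMPLE_URL)

try:
    loader('https://example.org/not-listed')
except LoadingDocumentFailed as error:
    refused = error.code
```

The second call succeeding after the temporary directory is gone shows the in-place cache at work, and `caller_mapping` still holding a `Path` shows why the defensive copy matters.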