DTS As Collections
Bridget Almas edited this page Jul 25, 2016 · 1 revision
Bridget's prose on DTS as Collections:
- If I've learned anything from CTS, it's that if it takes a lot of prose to explain and implement a model, the model is too complicated to be applied by a wide audience. If a goal of DTS is to make a flexible data model and API that will enable interoperable services for retrieval of citeable passages of text, then it has to be, to a large degree, self-explanatory.
- I believe we can think of digital documents as collections of data objects, where each object may itself be a collection. Each citeable passage of text in the document is itself either a collection or simply a low-level item in the collection, or both.
- There are various types of digital document collections, each of which might have a unique set of capabilities. A CTS work collection is one which contains CTS versions and requires that all CTS versions are collections which contain passages which adhere to the same identification scheme. A non-CTS type of work collection contains collections of documents which contain collections of diversely identified passages.
- Any given citeable item in a collection must be uniquely identifiable.
- A collection should declare what ontology it uses to describe the relationships between the items in it.
- Any given citeable item in a collection can have metadata which describes it.
- We need an API with operations to create, read, update, delete, query, list and traverse items in a collection.
- We need to create formal definitions (data types) of common models of document collections that describe their capabilities, so that a producer of a document collection can declare, in an unambiguous, machine-readable way, what can be expected from each item in it and what actions are available on it.
- How individual items in a collection, and a collection itself, are identified should be orthogonal to the model. CTS URNs should work, but so should ARKs, handles, URLs, etc. If a collection model requires a certain type of identifier, that should be declared in the collection capabilities.
- For citing into a collection item (i.e. a passage fragment) we should look closely at the W3C annotation text selector models and try to use them if possible. Ideally we should recommend/support a variety of different fragment selector models and try not to invent our own unless we really need to.
- FRBR should not drive the model, but it should be possible (and maybe recommended) to provide FRBR metadata about any given item in a text collection.
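
To make the points above a little more concrete, here is a minimal sketch in Python of a document as a collection of items, each of which may itself be a collection. It is only an illustration under stated assumptions, not a proposed DTS data type: the `Item` class, its field names, the `identifier_scheme` capability key and the example identifiers are all made up for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Item:
    """A citeable item; it also acts as a collection whenever it has members."""
    id: str  # opaque identifier: a CTS URN, ARK, handle, URL, etc.
    metadata: Dict[str, str] = field(default_factory=dict)
    members: List["Item"] = field(default_factory=list)
    # Hypothetical capability declaration, e.g. a required identifier scheme.
    capabilities: Dict[str, str] = field(default_factory=dict)

    def is_collection(self) -> bool:
        return bool(self.members)

    def traverse(self) -> Iterator["Item"]:
        """Yield this item, then every descendant, depth first."""
        yield self
        for member in self.members:
            yield from member.traverse()

    def find(self, item_id: str) -> Optional["Item"]:
        """Return the item with this identifier in this subtree, if any."""
        for item in self.traverse():
            if item.id == item_id:
                return item
        return None

    def add(self, child: "Item") -> None:
        """Add a member, rejecting an ID already used in this subtree."""
        if self.find(child.id) is not None:
            raise ValueError(f"duplicate identifier: {child.id}")
        self.members.append(child)


# Made-up identifiers, shaped like CTS URNs purely for illustration.
work = Item(
    "urn:cts:demo:tg.work",
    metadata={"title": "Example work"},
    capabilities={"identifier_scheme": "cts-urn"},
)
version = Item("urn:cts:demo:tg.work.ed1")
passage = Item("urn:cts:demo:tg.work.ed1:1.1", metadata={"label": "Book 1, line 1"})

version.add(passage)
work.add(version)

for item in work.traverse():
    print(item.id, "(collection)" if item.is_collection() else "(leaf item)")
```

The identifier is treated as an opaque string, so the same structure would hold ARKs, handles or URLs; a collection model that needs one particular scheme would say so in its declared capabilities rather than in the model itself.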