DTS As Collections

Bridget Almas edited this page Jul 25, 2016 · 1 revision

Bridget's prose on DTS as Collections:

  1. If I've learned anything from CTS, it's that if it takes a lot of prose to explain and implement the model, it is too complicated to be applied by a wide audience. If a goal of DTS is a flexible data model and API that enable interoperable services for retrieving citeable passages of text, then it has to be, to a large degree, self-explanatory.
  2. I believe we can think of digital documents as collections of data objects, where each object may itself be a collection. Each citeable passage of text in the document is either a collection, a low-level item in a collection, or both.
  3. There are various types of digital document collections, each of which might have a unique set of capabilities. A CTS work collection is one that contains CTS versions and requires that all CTS versions be collections whose passages adhere to the same identification scheme. A non-CTS type of work collection contains collections of documents, which in turn contain collections of diversely identified passages.
  4. Any given citeable item in a collection must be uniquely identifiable.
  5. A collection should declare what ontology it uses to describe the relationship between the items in it.
  6. Any given citeable item in a collection can have metadata which describes it.
  7. We need an API with operations to create, read, update, delete, query, list, and traverse items in a collection.
  8. We need to create formal definitions (data types) of common models of document collections that describe their capabilities, so that a producer of a document collection can declare, in an unambiguous, machine-readable way, what can be expected from each item in it and what actions are available on it.
  9. How individual items in a collection, and a collection itself, are identified should be orthogonal to the model. CTS URNs should work, but so should ARKs, Handles, URLs, etc. If a collection model requires a certain type of identifier, that should be declared in the collection capabilities.
  10. For citing into a collection item (i.e. a passage fragment) we should look closely at the W3C annotation text selector models and try to use them if possible. Ideally we should recommend/support a variety of different fragment selector models and try not to invent our own unless we really need to.
  11. FRBR should not drive the model, but it should be possible (and maybe recommended) to provide FRBR metadata about any given item in a text collection.
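
One way to read points 2, 4, 6, 8, and 9 together is the minimal Python sketch below. It is an illustration only and not part of any DTS specification: the `Item` class, its field names, the `capabilities` keys, and all example identifiers are hypothetical. It assumes that an item counts as a collection when it has members, that identifiers are opaque strings in any scheme, and that uniqueness only needs to hold within the tree being built.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Item:
    """A citeable item; it doubles as a collection when it has members."""
    identifier: str  # opaque string: CTS URN, ARK, Handle, URL, ...
    metadata: Dict[str, str] = field(default_factory=dict)
    # Hypothetical machine-readable declarations (ontology, identifier scheme).
    capabilities: Dict[str, str] = field(default_factory=dict)
    members: List["Item"] = field(default_factory=list)

    def is_collection(self) -> bool:
        # In this sketch, "collection" simply means "has at least one member".
        return bool(self.members)

    def traverse(self) -> Iterator["Item"]:
        """Yield this item, then every descendant, depth first."""
        yield self
        for member in self.members:
            yield from member.traverse()

    def find(self, identifier: str) -> Optional["Item"]:
        """Return the item with this identifier in this subtree, or None."""
        for item in self.traverse():
            if item.identifier == identifier:
                return item
        return None

    def add(self, child: "Item") -> None:
        """Attach a child, rejecting identifiers already used in this subtree.

        Uniqueness is only checked below the item on which add() is called.
        """
        existing = {item.identifier for item in self.traverse()}
        for item in child.traverse():
            if item.identifier in existing:
                raise ValueError(f"duplicate identifier: {item.identifier}")
        self.members.append(child)


# Hypothetical example mixing three identifier schemes in one collection.
work = Item(
    "urn:example:work-1",
    capabilities={"ontology": "https://example.org/ontology",
                  "identifier_scheme": "any"},
)
edition = Item("https://example.org/work-1/edition-a",
               metadata={"title": "Edition A"})
passage = Item("ark:/99999/example-passage-1.1")

edition.add(passage)
work.add(edition)

for item in work.traverse():
    kind = "collection" if item.is_collection() else "item"
    print(item.identifier, kind)
```

Keeping the identifier an opaque string is what lets the same structure hold URNs, ARKs, and URLs side by side; a collection model that needs one specific scheme would state that in its declared capabilities rather than in the data type.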