[NSF-2004562](https://nsf.gov/awardsearch/showAward?AWD_ID=2004562)
[NSF-2004815](https://nsf.gov/awardsearch/showAward?AWD_ID=2004815)
[NSF-2004839](https://nsf.gov/awardsearch/showAward?AWD_ID=2004839)
[NSF-2004642](https://nsf.gov/awardsearch/showAward?AWD_ID=2004642)

Defines the core metadata model for iSamples.

`src/schemas/isamples_core.yaml` defines the iSamples core model in LinkML. It references vocabularies contained in [`isamplesorg/vocabularies/vocabulary`](https://github.com/isamplesorg/vocabularies/tree/develop/vocabulary), which define terms for the Material Type, Sampled Feature, and Material Sample Object Type vocabularies.
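
For readers unfamiliar with LinkML, a schema of this kind is plain YAML describing classes and their attributes. The stub below is a hypothetical illustration of the general shape only — the class and attribute names are invented for this sketch and are not taken from `isamples_core.yaml`:

```yaml
# Hypothetical LinkML stub — illustrates the general shape of a schema
# like isamples_core.yaml; names here are invented for this example.
id: https://example.org/schemas/toy_core
name: toy_core
prefixes:
  linkml: https://w3id.org/linkml/
imports:
  - linkml:types
default_range: string

classes:
  ToySampleRecord:
    description: A minimal record describing a physical sample.
    attributes:
      label:
        required: true
      latitude:
        range: decimal
      longitude:
        range: decimal
```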

Documentation is available at https://isamplesorg.github.io/metadata/

## Repository Structure

```
metadata/
├── src/
│   └── schemas/               # LinkML schema definitions
│       └── isamples_core.yaml
├── background/                # Diagrams and information about existing models
│   ├── DataCite/
│   ├── ESS-DIVE/
│   ├── GEOME-TDWG/
│   ├── GeoScience/
│   ├── ODM-CUAHSI/
│   └── OpenContext-Archae-anthro/
├── examples/                  # Example metadata documents from different systems
│   ├── APItesting/
│   ├── GEOME/
│   ├── geoJSON/
│   ├── iSamples/
│   ├── OpenContext/
│   ├── script/
│   ├── SESAR/
│   └── smithonsonian/
├── vocabulary/                # Vocabulary-related files
├── tools/                     # Modified docgen tool and templates for Quarto
├── quarto/                    # Quarto configuration files
├── build/                     # Build output (intermediate docs)
│   └── docs/                  # Generated markdown documentation
├── tests/                     # Test files
└── notes/                     # Development notes
```

## Development

LinkML and associated tools require a Python environment (version 3.9 or newer) and use [Poetry](https://python-poetry.org/) for dependency management. Poetry can be installed with `pip install poetry`.

To work on the project contents and run the artifact generators, first grab the source and switch to the develop branch:

```bash
git clone https://github.com/isamplesorg/metadata.git
cd metadata
git checkout develop
git pull
```

Set up a virtual environment using Poetry:

```bash
poetry shell
poetry install
```

(To exit the Poetry shell, use `exit`.)

Artifacts are produced by running `make` or `make all`.

### Documentation Generation

Documentation is rendered with [Quarto](https://quarto.org/) rather than the default `mkdocs` or `Sphinx` (Quarto offers many additional features for including computed examples). To generate the documentation:

1. Install [Quarto >= 1.2](https://quarto.org/docs/get-started/)
2. Run `make`, `make all`, or `make gen-docs`

This will generate intermediate markdown files in the `build/docs` folder, then invoke `quarto render` to generate the HTML documentation.

Note that this project uses a modified version of the LinkML `docgen` tool and templates to render markdown for Quarto. The modified `docgen` tool and templates are located in the `tools/` folder.

## LinkML Schema Operations

### Convert YAML schema to JSON schema

```bash
gen-json-schema -t PhysicalSampleRecord --not-closed src/schemas/isamples_core.yaml > isamples_core.schema.json
```

The `-t PhysicalSampleRecord` option makes the `PhysicalSampleRecord` class the top-level class in the JSON schema; the properties of that class become the top-level properties of the generated schema.
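
As a rough illustration of that shape (the property names below are invented for this sketch, not the actual generator output), the generated document is a standard JSON Schema whose top-level `properties` come from the slots of the selected class:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "PhysicalSampleRecord",
  "type": "object",
  "properties": {
    "label": { "type": "string" },
    "latitude": { "type": "number" },
    "longitude": { "type": "number" }
  }
}
```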

### Generate JSON-LD context

```bash
gen-jsonld-context src/schemas/isamples_core.yaml > isamples_core.jsonld
```

After generating the JSON-LD context, the enumeration entries may need manual modification: for each enumeration, use `@type` to declare the enumeration type.

<details>
  <summary>Example modified JSON-LD context</summary>

```json
{
  "@context": {
    "dct": "http://purl.org/dc/terms/",
    "isam": "http://resource.isamples.org/schema/",
    "mat": "http://resource.isamples.org/vocabulary/material/",
    "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "spt": "http://resource.isamples.org/vocabulary/sampleobjecttype/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://resource.isamples.org/schema/",
    "hasContextCategory": {
      "@type": "contextcategory"
    },
    "hasMaterialCategory": {
      "@type": "materialtype"
    },
    "has_sample_object_type": {
      "@type": "specimencategory"
    },
    "id": "@id",
    "latitude": {
      "@type": "xsd:decimal"
    },
    "longitude": {
      "@type": "xsd:decimal"
    },
    "resultTime": {
      "@type": "xsd:date"
    }
  }
}
```
</details>
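
To see what the context buys you, the stdlib-only sketch below manually expands a couple of terms against a trimmed copy of an `@context` like the one above. A real consumer would use a JSON-LD processor such as `pyld`; this hand-rolled expansion is only illustrative:

```python
# A trimmed copy of the example @context (prefix map plus @vocab fallback).
context = {
    "isam": "http://resource.isamples.org/schema/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://resource.isamples.org/schema/",
    "hasMaterialCategory": {"@type": "materialtype"},
    "latitude": {"@type": "xsd:decimal"},
}

def expand_term(term: str) -> str:
    """Resolve a term to an IRI: prefixed names use the prefix map,
    bare terms fall back to @vocab."""
    if ":" in term:
        prefix, local = term.split(":", 1)
        if prefix in context and isinstance(context[prefix], str):
            return context[prefix] + local
    return context["@vocab"] + term

print(expand_term("latitude"))     # http://resource.isamples.org/schema/latitude
print(expand_term("xsd:decimal"))  # http://www.w3.org/2001/XMLSchema#decimal
```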

### Validate instance files

```bash
linkml-validate -s src/schemas/isamples_core.yaml instance.json
jsonschema -i instance.json isamples_core.schema.json
```

The first command validates an instance file against the LinkML YAML schema; the second validates it against the generated JSON schema.
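
The same kind of check can also be scripted. The sketch below uses only hand-rolled stdlib checks — the required fields and coordinate range are assumptions for this example, not the real iSamples rules; in practice rely on `linkml-validate` or the `jsonschema` package:

```python
def check_record(record: dict) -> list[str]:
    """Return a list of problems found in a sample record.
    The rules here are illustrative, not the actual iSamples schema."""
    problems = []
    for key in ("id", "label"):  # assumed-required fields for this sketch
        if key not in record:
            problems.append(f"missing required field: {key}")
    lat = record.get("latitude")
    if lat is not None and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    return problems

# Values echo the GEOME example record elsewhere in this repository.
record = {"id": "ark:/21547/Car2PIRE_0334", "label": "PIRE_0334", "latitude": 5.8943}
print(check_record(record))  # [] -> record passes these illustrative checks
```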

## Docker

The iSamples Metadata Docker container is based on the Docker container from the LinkML project ([https://hub.docker.com/r/monarchinitiative/linkml/tags](https://hub.docker.com/r/monarchinitiative/linkml/tags)).

Build the image:

```bash
docker build -t isamples_linkml .
```

Run the container (opens a bash shell with the repository mounted at `/work`):

```bash
docker run -a stdin -a stdout -i -t -v "$(pwd)":/work isamples_linkml
```