README.md: 77 additions & 32 deletions (+77 -32)
@@ -1,6 +1,18 @@
 # data-dictionary-cui-mapping
 
-This package allows you to load in a data dictionary and semi-automatically query appropriate UMLS concepts using either the UMLS API, MetaMap API, and/or Semantic Search through a custom Pinecone vector database .
+This package assists with mapping a user's data dictionary fields to [UMLS](https://www.nlm.nih.gov/research/umls/index.html) concepts. It is designed to be modular and flexible to allow for different configurations and use cases.
+
+Roughly, the high-level steps are as follows:
+- Configure yaml files
+- Load in data dictionary
+- Preprocess desired columns
+- Query for UMLS concepts using any or all of the following pipeline modules:
+  - **umls** (*UMLS API*)
+  - **metamap** (*MetaMap API*)
+  - **semantic_search** (*relies on access to a custom Pinecone vector database*)
+  - **hydra_search** (*combines any combination of the above three modules*)
+- Manually curate/select concepts in excel
+- Create data dictionary file with new UMLS concept fields
 
 ## Prerequisites
 
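The "Preprocess desired columns" step in the list above can be pictured with a minimal, standard-library sketch. It is not the package's implementation: the function name `preprocess_column`, the normalization rules (lowercasing, replacing punctuation with spaces, collapsing whitespace), and the sample rows are all assumptions made for illustration.

```python
import re


def preprocess_column(rows, column):
    """Return one cleaned query term per row for the given column.

    Hypothetical helper: the real preprocessing in ddcuimap is driven by
    its yaml configuration and may differ from these rules.
    """
    terms = []
    for row in rows:
        text = str(row.get(column, "")).lower()
        text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
        text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
        terms.append(text)
    return terms


# Made-up data dictionary rows, for illustration only
rows = [
    {"variable name": "AgeYrs", "title": "Age (in years)"},
    {"variable name": "BMI", "title": "Body-Mass  Index"},
]
print(preprocess_column(rows, "title"))
```

Terms cleaned this way would then be passed to whichever query module (umls, metamap, semantic_search, or hydra_search) is configured.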
@@ -9,7 +21,7 @@ This package allows you to load in a data dictionary and semi-automatically quer
 
 ## Installation
 
-Use the package manager [pip](https://pip.pypa.io/en/stable/) to install data-dictionary-cui-mapping or pip install from the GitHub repo.
+Use the package manager [pip](https://pip.pypa.io/en/stable/) to install [data-dictionary-cui-mapping](https://pypi.org/project/data-dictionary-cui-mapping/) from PyPI or pip install from the [GitHub repo](https://github.com/kevon217/data-dictionary-cui-mapping). The project uses [poetry](https://python-poetry.org/) for packaging and dependency management.
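After installing, one way to confirm that the distribution is visible to the current interpreter is to ask `importlib.metadata` for its version. This is only a sketch: the helper name `installed_version` is hypothetical, and the distribution name is taken from the PyPI link in the installation text.

```python
from importlib import metadata

# Distribution name as it appears in the PyPI URL
DIST_NAME = "data-dictionary-cui-mapping"


def installed_version(dist_name=DIST_NAME):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print(f"{DIST_NAME} is not installed; try: pip install {DIST_NAME}")
else:
    print(f"{DIST_NAME} {version} is installed")
```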
@@ -51,60 +63,93 @@ In order to run and customize these pipelines, you will need to create/edit yaml
 │ │ │ embeddings.yaml
 ```
 
-## UMLS API and MetaMap Batch Queries
+## CUI Batch Query Pipelines
 
-#### Import modules
+
+### STEP-1A: RUN BATCH QUERY PIPELINE
+###### IMPORT PACKAGES
 
 ```python
-# import batch_query_pipeline modules from metamap OR umls package
-from ddcuimap.metamap import batch_query_pipeline as mm_bqp
-from ddcuimap.umls import batch_query_pipeline as umls_bqp
+# from ddcuimap.umls import batch_query_pipeline as umls_bqp
+# from ddcuimap.metamap import batch_query_pipeline as mm_bqp
+# from ddcuimap.semantic_search import batch_hybrid_query_pipeline as ss_bqp
+from ddcuimap.hydra_search import batch_hydra_query_pipeline as hs_bqp
 
-# import helper functions for loading, viewing, composing configurations for pipeline run
 from ddcuimap.utils import helper
 from omegaconf import OmegaConf
-
-# import modules to create data dictionary with curated CUIs and check the file for missing mappings
-from ddcuimap.curation import create_dictionary_import_file
-from ddcuimap.curation import check_cuis
 ```
-#### Load/edit configuration files
+###### LOAD/EDIT CONFIGURATION FILES
 ```python
-cfg = helper.compose_config.fn(overrides=["custom=de", "apis=config_metamap_api"]) # custom config for MetaMap on data element 'title' column
-# cfg = helper.compose_config.fn(overrides=["custom=de", "apis=config_umls_api"]) # custom config for UMLS API on data element 'title' column
-# cfg = helper.compose_config.fn(overrides=["custom=pvd", "apis=config_metamap_api"]) # custom config for MetaMap on 'permissible value descriptions' column
-# cfg = helper.compose_config.fn(overrides=["custom=pvd", "apis=config_umls_api"]) # custom config for UMLS API on 'permissible value descriptions' column
 ### STEP-2B: CHECK CUIS IN DATA DICTIONARY IMPORT FILE
 
+###### CHECK CUIS
 ```python
-cfg = helper.load_config.fn(helper.choose_file.fn("Load config file from Step 2"))
-check_cuis.check_cuis(cfg)
+cfg_step2 = helper.load_config.fn(helper.choose_file("Load config file from Step 2"))
+df_check = check_cuis.check_cuis(cfg_step2)
+print(df_check.head())
 ```
 
 ## Output: Data Dictionary + CUIs
-Below is the final output of the data dictionary with curated CUIs.
+Below is a sample modified data dictionary with curated CUIs after:
+1. Running Steps 1-2 on **title** then taking the generated output dictionary file and;
+2. Running Steps 1-2 again on **permissible value descriptions** to get the final output dictionary file.
 
 | variable name | title | data element concept identifiers | data element concept names | data element terminology sources | permissible values | permissible value descriptions | permissible value output codes | permissible value concept identifiers | permissible value concept names | permissible value terminology sources |
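The idea behind checking an import file for missing mappings can be sketched with plain Python, using two column names from the table header above. This is an illustration only, not the logic of `check_cuis.check_cuis`: the helper `find_missing_cuis`, the sample rows, and the placeholder CUI value are all assumptions.

```python
def find_missing_cuis(rows, cui_column="data element concept identifiers"):
    """Return the variable names of rows whose CUI column is empty or absent.

    Hypothetical helper for illustration; the package's own check may
    report different information.
    """
    missing = []
    for row in rows:
        value = row.get(cui_column)
        if value is None or not str(value).strip():
            missing.append(row.get("variable name"))
    return missing


# Made-up rows; "C0000001" is a placeholder, not a curated mapping
rows = [
    {
        "variable name": "var_a",
        "title": "First example field",
        "data element concept identifiers": "C0000001",
    },
    {
        "variable name": "var_b",
        "title": "Second example field",
        "data element concept identifiers": "",
    },
]
print(find_missing_cuis(rows))
```

Passing `cui_column="permissible value concept identifiers"` would apply the same check to the permissible-value mappings.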
print(f"The following columns were not found and will be excluded: {cols_excl}")
+    return cols
+
+
 @task(name="Manual override of column values")
 def override_cols(df, override: dict):
     """Custom function to accommodate current bug in BRICS examples dictionary import process that wants multi-CUI concepts to have a single source terminology