Releases: impresso/impresso-essentials
Bbox-visualization extractor and preparation for integration of radio
This release corresponds to a major version increment, going from v0.3.0 to v1.0.0.
This increment is justified by the merging of two PRs, #27 and #29, which introduce new features and modify some variables meant for shared use across the Impresso project.
Changelog:
- Introduction of a new module: the bbox visualizer JSON extractor
  - General documentation here, PR #27
  - This module generates, for any given issue, page or content-item ID, a JSON file compatible with the bbox-viewer tool. This tool is hosted locally and makes it possible to visualize all the bounding boxes of a given canonical element, enabling much more efficient debugging at early stages of the processing pipeline.
- First modifications meant to enable the integration of radio data in the impresso pipeline
  - Details in PR #29
  - Some variable names or values defined for project-wide use were modified and/or updated to accommodate upcoming changes in the pipeline's scope:
    - `KNOWN_JOURNALS` and `KNOWN_JOURNALS_DICT` were renamed to `ALL_MEDIA` and `PARTNER_TO_MEDIA`, respectively. Additionally, the British Library (BL) titles that will be ingested soon were added.
    - `SourceTypeEnum` was created, allowing us to differentiate between the various source types encountered in Impresso.
    - `PARTNER_TO_SOURCE_TYPES` was defined accordingly; it will probably be updated in the next version.
    - `DataStageEnum` was updated to make the data stage values consistent with the naming conventions we agreed upon.
- The manifest is now only pushed to the staging branch of impresso-data-release, following the newly defined Impresso Corpus and Enrichments Release Protocol. Hence, the `is_staging` parameter of the `DataManifest` config is now deprecated and unused.
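To give a rough idea of what the bbox visualizer extractor does, here is a minimal sketch that flattens every bounding box of one page into a single JSON-serializable structure. It assumes a simplified canonical page in which regions (`r`) contain paragraphs (`p`), lines (`l`) and tokens (`t`), each box stored as `[x, y, w, h]` under `c` and token text under `tx`. These keys, the output layout and the helper names `extract_page_bboxes` / `write_bbox_json` are all hypothetical; the actual module and the format expected by the bbox-viewer tool are described in PR #27.

```python
from __future__ import annotations

import json
from typing import Any


def extract_page_bboxes(page_id: str, page: dict[str, Any]) -> dict[str, Any]:
    """Collect every bounding box of a simplified canonical page, outermost first.

    Hypothetical helper: the input keys ("r", "p", "l", "t", "c", "tx") and the
    output layout are assumptions, not the schema used by impresso-essentials
    or by the bbox-viewer tool.
    """
    bboxes: list[dict[str, Any]] = []
    for region in page.get("r", []):
        bboxes.append({"level": "region", "coords": region["c"]})
        for paragraph in region.get("p", []):
            for line in paragraph.get("l", []):
                bboxes.append({"level": "line", "coords": line["c"]})
                for token in line.get("t", []):
                    bboxes.append(
                        {"level": "token", "coords": token["c"], "text": token.get("tx", "")}
                    )
    return {"page_id": page_id, "bboxes": bboxes}


def write_bbox_json(page_id: str, page: dict[str, Any], path: str) -> None:
    """Write the extracted boxes to a JSON file that a local viewer could load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(extract_page_bboxes(page_id, page), f, ensure_ascii=False, indent=2)
```

Keeping every level (region, line, token) in one flat list with a `level` tag lets a viewer toggle each level independently, which is the kind of view that makes segmentation or OCR errors easy to spot early in the pipeline.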
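The pairing of `SourceTypeEnum` with `PARTNER_TO_SOURCE_TYPES` can be illustrated with a small sketch. The enum members (`NEWSPAPER`, `RADIO`), their string values, the `RADIO_PARTNER` key and the `partners_with_source_type` helper are hypothetical placeholders; only the names `SourceTypeEnum`, `PARTNER_TO_SOURCE_TYPES` and the `BL` partner come from these release notes, and the real definitions may differ.

```python
from __future__ import annotations

from enum import Enum


class SourceTypeEnum(str, Enum):
    """Illustrative source types; the real enum's members and values may differ."""

    NEWSPAPER = "newspaper"
    RADIO = "radio"


# Hypothetical partner -> source types mapping. "BL" comes from the release
# notes; "RADIO_PARTNER" is a placeholder, not a real Impresso partner ID.
PARTNER_TO_SOURCE_TYPES: dict[str, list[SourceTypeEnum]] = {
    "BL": [SourceTypeEnum.NEWSPAPER],
    "RADIO_PARTNER": [SourceTypeEnum.RADIO],
}


def partners_with_source_type(source_type: SourceTypeEnum) -> list[str]:
    """Return, sorted, the partners providing at least one source of this type."""
    return sorted(
        partner
        for partner, types in PARTNER_TO_SOURCE_TYPES.items()
        if source_type in types
    )
```

Mixing `str` into the enum lets members be compared with, and serialized as, plain strings, so values read from a manifest or config file can be converted with `SourceTypeEnum("radio")`.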
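One common way to deprecate a parameter such as `is_staging` without breaking existing callers is to keep accepting it, emit a `DeprecationWarning`, and ignore its value. The sketch below assumes this pattern; `ManifestPushConfig`, its `data_stage` field and the literal branch name `"staging"` are hypothetical stand-ins, not the actual `DataManifest` implementation.

```python
import warnings
from dataclasses import dataclass
from typing import Optional

# Assumption: the staging branch of impresso-data-release is literally named "staging".
STAGING_BRANCH = "staging"


@dataclass
class ManifestPushConfig:
    """Hypothetical stand-in for the DataManifest config.

    Only `is_staging` comes from the release notes; `data_stage` is an
    illustrative field.
    """

    data_stage: str
    is_staging: Optional[bool] = None  # deprecated: still accepted, but ignored

    def __post_init__(self) -> None:
        if self.is_staging is not None:
            # stacklevel=3 skips __post_init__ and the generated __init__,
            # so the warning points at the caller's line.
            warnings.warn(
                "`is_staging` is deprecated and ignored; manifests are always "
                "pushed to the staging branch.",
                DeprecationWarning,
                stacklevel=3,
            )

    @property
    def target_branch(self) -> str:
        # The push target no longer depends on `is_staging`.
        return STAGING_BRANCH
```

For example, `ManifestPushConfig("canonical", is_staging=False).target_branch` still returns `"staging"`, while warning the caller that the argument no longer has any effect (the `"canonical"` value is illustrative).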