
Releases: impresso/impresso-essentials

Bbox-visualization extractor and preparation for integration of radio

02 Apr 15:35


This release corresponds to a major version increment, going from v0.3.0 to v1.0.0.
This increment is justified by the merging of two PRs, #27 and #29, which introduce new features and modify some variables meant for shared use across the Impresso project.

Changelog:

  • Introduction of a new module: the bbox visualizer JSON extractor
    • General documentation here, PR #27
    • This module generates a JSON file, compatible with the bbox-viewer tool, from any issue, page or content-item ID. The tool is hosted locally and displays all the bounding boxes for a given canonical element, which makes debugging at early stages of the processing pipeline much more efficient.
  • First modifications meant to enable the integration of radio data in the impresso pipeline
    • Details in PR #29
    • Some variable names or values defined for project-wide use were modified and/or updated to accommodate upcoming changes in the pipeline's scope:
      • KNOWN_JOURNALS and KNOWN_JOURNALS_DICT were respectively renamed to ALL_MEDIA and PARTNER_TO_MEDIA. Additionally, British Library (BL) titles which will be ingested soon were added.
      • SourceType Enum was created, allowing us to differentiate between the various source types encountered in Impresso
      • PARTNER_TO_SOURCE_TYPES was defined accordingly; it will probably be updated in the next version.
      • DataStage Enum was updated to align the data stage values with the naming conventions we agreed upon.
      • The manifest is now only pushed to the staging branch of impresso-data-release, in accordance with the newly defined Impresso Corpus and Enrichments Release Protocol. Hence, the is_staging parameter of the DataManifest config is now deprecated and no longer used.
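
For downstream code that still refers to the old names, the renames above could be bridged roughly as in the sketch below. It is illustrative only and does not import the package: the SourceType members, the partner names and the media titles are placeholders rather than the actual values, and deriving ALL_MEDIA from PARTNER_TO_MEDIA as well as the helper media_for_source_type are assumptions made for the example.

```python
from enum import Enum


class SourceType(str, Enum):
    """Stand-in for the new SourceType Enum (member names and values are assumed)."""

    NEWSPAPER = "newspaper"
    RADIO_BROADCAST = "radio_broadcast"


# Placeholder partner -> media titles mapping, playing the role of PARTNER_TO_MEDIA
# (formerly KNOWN_JOURNALS_DICT). Partner and title names are invented.
PARTNER_TO_MEDIA = {
    "PartnerA": ["titleA1", "titleA2"],
    "PartnerB": ["titleB1"],
}

# Flat list of all titles, playing the role of ALL_MEDIA (formerly KNOWN_JOURNALS).
# Building it from the mapping is an assumption of this sketch.
ALL_MEDIA = [title for titles in PARTNER_TO_MEDIA.values() for title in titles]

# Placeholder partner -> source types mapping, playing the role of PARTNER_TO_SOURCE_TYPES.
PARTNER_TO_SOURCE_TYPES = {
    "PartnerA": [SourceType.NEWSPAPER],
    "PartnerB": [SourceType.RADIO_BROADCAST],
}

# Temporary aliases so that code written against the pre-v1.0.0 names keeps working
# while it is being migrated.
KNOWN_JOURNALS = ALL_MEDIA
KNOWN_JOURNALS_DICT = PARTNER_TO_MEDIA


def media_for_source_type(source_type: SourceType) -> list[str]:
    """Return the titles of every partner that provides the given source type."""
    return [
        title
        for partner, titles in PARTNER_TO_MEDIA.items()
        if source_type in PARTNER_TO_SOURCE_TYPES.get(partner, [])
        for title in titles
    ]
```

In a real project the placeholder definitions would be replaced by imports of the corresponding names from the package, leaving only the two alias lines until the old names are no longer referenced.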