Missing Mapping Dictionaries in Saved Model (CSV Training) #289

@Jmartinezc-rgb

Description:

I am encountering an issue during inference with a model trained from a CSV file. The saved model does not include the mapping dictionaries (entity_to_idx and relation_to_idx), which are typically stored in a DB file. As a result, many triples are skipped during inference because they contain invalid keys.

Steps to Reproduce:

  1. Train an AmpliGraph model (e.g., ComplEx, TransE) using data loaded from a CSV file.
  2. Save the trained model using ampligraph.utils.save_model().
  3. Load the saved model using ampligraph.utils.restore_model().
  4. Attempt to perform inference using the loaded model on a new set of triples.
  5. Observe a significant number of triples being skipped due to "invalid keys".
  6. Attempt to use ScoringBasedEmbeddingModel.get_invalid_keys() for filtering. Note that this function does not operate correctly without the mapping dictionaries.

Expected Behavior:

The saved model should include the mapping dictionaries (entity_to_idx and relation_to_idx) either within the model file itself or in a separate associated file (e.g., a DB file) to ensure proper inference on new data. Functions like get_invalid_keys() should be able to effectively identify and filter triples with unknown entities or relations using these saved mappings.
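
To make the expected behavior concrete, here is a minimal sketch of the filtering that saved mappings would allow. It is plain Python and does not call AmpliGraph; the helper name split_known_triples and the sample dictionaries are hypothetical and exist only for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def split_known_triples(
    triples: Iterable[Triple],
    entity_to_idx: Dict[str, int],
    relation_to_idx: Dict[str, int],
) -> Tuple[List[Triple], List[Triple]]:
    """Separate triples whose subject, predicate and object are all known
    from triples that contain at least one unseen entity or relation.

    Hypothetical helper: it only checks key membership in the two
    dictionaries and is not part of the AmpliGraph API.
    """
    known: List[Triple] = []
    unknown: List[Triple] = []
    for s, p, o in triples:
        if s in entity_to_idx and o in entity_to_idx and p in relation_to_idx:
            known.append((s, p, o))
        else:
            unknown.append((s, p, o))
    return known, unknown


# Illustrative data only.
entity_to_idx = {"alice": 0, "bob": 1}
relation_to_idx = {"knows": 0}
candidates = [("alice", "knows", "bob"), ("alice", "knows", "carol")]
known, unknown = split_known_triples(candidates, entity_to_idx, relation_to_idx)
```

With mappings restored alongside the model, only the triples in known would be passed to inference, and the ones in unknown could be reported instead of being silently skipped.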

Actual Behavior:

The saved model does not contain the mapping dictionaries. Consequently, during inference, a large number of triples are flagged as containing invalid keys and skipped. The get_invalid_keys() function is ineffective in this scenario.

Additional Information:

  • AmpliGraph Version: 2.1.0 (latest)
  • Training Data Format: CSV file
  • The ampligraph.utils.save_model() function does not appear to have a parameter to explicitly save the mapping dictionaries when training from CSV data.

Questions:

  1. What is the recommended way to ensure that the entity and relation mapping dictionaries are saved along with the model when training with CSV data?
  2. Is there a specific procedure or parameter that needs to be used during training or saving to include these mappings?
  3. If the DB file is not automatically saved, how can we manually save or access these dictionaries after training so they can be used for inference with the saved model?
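
For question 3, one possible stopgap is to rebuild the vocabularies from the training CSV and store them next to the saved model. The sketch below is an assumption-laden illustration, not AmpliGraph API: it assumes a header-less CSV with three columns (subject, predicate, object), and the functions build_mappings, save_mappings and load_mappings are hypothetical. The indices it assigns come from sorted order and are not guaranteed to match the indices used inside the trained model, so the result is suitable only for checking whether an entity or relation was seen during training.

```python
import csv
import json
from pathlib import Path
from typing import Dict, Tuple

Mapping = Dict[str, int]


def build_mappings(csv_path: str, delimiter: str = ",") -> Tuple[Mapping, Mapping]:
    """Collect entities and relations from a 3-column CSV of triples.

    Assumption: no header row; columns are subject, predicate, object.
    Indices follow sorted order and may differ from the model's own.
    """
    entities, relations = set(), set()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if len(row) != 3:
                continue  # ignore blank or malformed rows
            s, p, o = (cell.strip() for cell in row)
            entities.update((s, o))
            relations.add(p)
    entity_to_idx = {e: i for i, e in enumerate(sorted(entities))}
    relation_to_idx = {r: i for i, r in enumerate(sorted(relations))}
    return entity_to_idx, relation_to_idx


def save_mappings(path: str, entity_to_idx: Mapping, relation_to_idx: Mapping) -> None:
    """Write both dictionaries to a single JSON file."""
    payload = {"entity_to_idx": entity_to_idx, "relation_to_idx": relation_to_idx}
    Path(path).write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )


def load_mappings(path: str) -> Tuple[Mapping, Mapping]:
    """Read the dictionaries back from the JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return data["entity_to_idx"], data["relation_to_idx"]
```

JSON is used here only because it is human-readable and needs no extra dependencies; it does not replace the DB file mentioned above.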

Thank you for your time and assistance with this issue.

Reported by: Javier Martínez
