Description:
I am encountering an issue during inference with a model trained using a CSV file. The saved model does not include the necessary mapping dictionaries (entity_to_idx and relation_to_idx), which are typically stored in a DB file. This absence leads to a large number of triples being skipped during inference due to invalid keys.
Steps to Reproduce:
- Train an AmpliGraph model (e.g., ComplEx, TransE) using data loaded from a CSV file.
- Save the trained model using ampligraph.utils.save_model().
- Load the saved model using ampligraph.utils.restore_model().
- Attempt to perform inference using the loaded model on a new set of triples.
- Observe a significant number of triples being skipped due to "invalid keys".
- Attempt to use ScoringBasedEmbeddingModel.get_invalid_keys() for filtering. Note that this function does not operate correctly without the mapping dictionaries.
Expected Behavior:
The saved model should include the mapping dictionaries (entity_to_idx and relation_to_idx) either within the model file itself or in a separate associated file (e.g., a DB file) to ensure proper inference on new data. Functions like get_invalid_keys() should be able to effectively identify and filter triples with unknown entities or relations using these saved mappings.
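To make the expected behavior concrete, here is a minimal sketch of the "separate associated file" idea in plain Python. It does not use AmpliGraph's API: build_mappings, save_mappings and load_mappings are hypothetical helpers, and the JSON layout is an assumption of mine. The indices it assigns are not guaranteed to match the ones the model uses internally, so the dictionaries are only suitable for membership checks (is this entity or relation known?), not for looking up embeddings.

```python
import json
from pathlib import Path


def build_mappings(triples):
    """Assign an integer index to every entity and relation seen in the training triples.

    NOTE: hypothetical helper; these indices may differ from the model's internal ones.
    """
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    relations = sorted({t[1] for t in triples})
    entity_to_idx = {e: i for i, e in enumerate(entities)}
    relation_to_idx = {r: i for i, r in enumerate(relations)}
    return entity_to_idx, relation_to_idx


def save_mappings(path, entity_to_idx, relation_to_idx):
    """Write both dictionaries to a JSON file stored next to the saved model."""
    payload = {"entity_to_idx": entity_to_idx, "relation_to_idx": relation_to_idx}
    Path(path).write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")


def load_mappings(path):
    """Read the two dictionaries back from the JSON file."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    return payload["entity_to_idx"], payload["relation_to_idx"]
```

Calling save_mappings() right after training, and load_mappings() right after restoring the model, would keep the vocabulary available at inference time even when no DB file is present.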
Actual Behavior:
The saved model does not contain the mapping dictionaries. Consequently, during inference, a large number of triples are flagged as containing invalid keys and skipped. The get_invalid_keys() function is ineffective in this scenario.
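For reference, this is roughly the filtering I would expect to be possible once the mappings are available. It is a plain-Python sketch under my own assumptions, not AmpliGraph code: split_by_known_keys is a hypothetical helper, and the two dictionaries and the triples are made-up toy data.

```python
def split_by_known_keys(triples, entity_to_idx, relation_to_idx):
    """Separate triples whose subject, predicate and object are all known from the rest."""
    valid, skipped = [], []
    for s, p, o in triples:
        if s in entity_to_idx and o in entity_to_idx and p in relation_to_idx:
            valid.append((s, p, o))
        else:
            skipped.append((s, p, o))
    return valid, skipped


# Toy vocabulary standing in for the mappings learned during training.
entity_to_idx = {"a": 0, "b": 1}
relation_to_idx = {"likes": 0}

# New triples: the second has an unseen entity, the third an unseen relation.
new_triples = [("a", "likes", "b"), ("a", "likes", "z"), ("b", "hates", "a")]

valid, skipped = split_by_known_keys(new_triples, entity_to_idx, relation_to_idx)
print(len(valid), "kept;", len(skipped), "skipped")  # prints: 1 kept; 2 skipped
```

With the mappings missing, there is no vocabulary to check against, which matches what I see: nearly everything ends up in the skipped group.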
Additional Information:
- AmpliGraph Version: 2.1.0 (latest)
- Training Data Format: CSV file
- The ampligraph.utils.save_model() function does not appear to have a parameter to explicitly save the mapping dictionaries when training from CSV data.
Questions:
- What is the recommended way to ensure that the entity and relation mapping dictionaries are saved along with the model when training with CSV data?
- Is there a specific procedure or parameter that needs to be used during training or saving to include these mappings?
- If the DB file is not automatically saved, how can we manually save or access these dictionaries after training so they can be used for inference with the saved model?
Thank you for your time and assistance with this issue.
Reported by: Javier Martínez