refactor#112

Merged
SeanLee97 merged 10 commits into main from upgrade
Oct 19, 2025

SeanLee97 (Owner) commented Oct 18, 2025

  • Use uv to manage dependencies
  • Simplify the implementation
  • Remove all imports of AngleDataTokenizer
  • Remove all imports of DatasetFormats
  • Remove all .map(AngleDataTokenizer(...)) calls
  • Update dataset field names (text → query for Formats B/C) OR use --column_rename_mapping
  • Add is_llm=True to LLM model initialization
  • Replace --prompt_template with --text_prompt, --query_prompt, or --doc_prompt
  • Update training scripts to use accelerate launch
  • Update evaluation code if using the return value
  • Support input data as a list of strings. New data formats:
    • A: {"text1": str | List[str], "text2": str | List[str], "label": float}
    • B: {"query": str | List[str], "positive": str | List[str]}
    • C: {"query": str | List[str], "positive": str | List[str], "negative": str | List[str]}
  • Support FSDP training
  • Update docs
  • Test code with the new version
  • Add a migration guide
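
To make the new data formats listed above concrete, here is a minimal sketch in plain Python of how rows in formats A, B, and C might be checked, and how an old-style row could be migrated by renaming `text` to `query`. The helper names (`detect_format`, `validate_row`, `rename_text_to_query`) are hypothetical and are not part of this library's API. The strict key matching is also an assumption made for illustration; the library itself may tolerate extra columns.

```python
from typing import Any, Dict

# Required keys per format, as listed in the PR description.
# Everything below is a hypothetical illustration, not library code.
FORMAT_KEYS = {
    "A": {"text1", "text2", "label"},
    "B": {"query", "positive"},
    "C": {"query", "positive", "negative"},
}


def detect_format(row: Dict[str, Any]) -> str:
    """Return "A", "B", or "C" based on the row's keys (exact match)."""
    keys = set(row)
    for name, required in FORMAT_KEYS.items():
        if keys == required:
            return name
    raise ValueError(f"Unrecognized keys: {sorted(keys)}")


def _is_text(value: Any) -> bool:
    """True for a string or a list of strings (str | List[str])."""
    if isinstance(value, str):
        return True
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_row(row: Dict[str, Any]) -> str:
    """Check field types against the detected format and return its name."""
    fmt = detect_format(row)
    for key, value in row.items():
        if key == "label":
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError("label must be a number")
        elif not _is_text(value):
            raise TypeError(f"{key} must be str or List[str]")
    return fmt


def rename_text_to_query(row: Dict[str, Any]) -> Dict[str, Any]:
    """Migrate an old-style Format B/C row by renaming 'text' to 'query'."""
    return {("query" if k == "text" else k): v for k, v in row.items()}


if __name__ == "__main__":
    old_row = {"text": "what is uv?", "positive": ["a package manager"]}
    print(validate_row(rename_text_to_query(old_row)))
```

The same rename can presumably be done without touching the data by passing `--column_rename_mapping`, as the checklist notes; the exact argument syntax is not shown in this PR, so it is left out here.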

@SeanLee97 SeanLee97 merged commit 1af7261 into main Oct 19, 2025
6 checks passed