
Switch diarization model in Emilia to pyannote community-1 #487

Open
gonzalo-cordova-pou wants to merge 4 commits into open-mmlab:main from gonzalo-cordova-pou:feat/emilia-diarization-community-1

@gonzalo-cordova-pou gonzalo-cordova-pou commented Apr 17, 2026

✨ Description

This PR updates Emilia diarization to use pyannote/speaker-diarization-community-1 and adds compatibility handling so the pipeline works across pyannote API variants.

Changes:

  • preprocessors/Emilia/main.py
    • Switch diarization model from pyannote/speaker-diarization-3.1 to pyannote/speaker-diarization-community-1.
    • Support both auth argument styles:
      • token=... (pyannote 4.x)
      • use_auth_token=... fallback (older versions)
    • Normalize diarization output handling:
      • use out.speaker_diarization when present
      • fallback to legacy Annotation output
  • preprocessors/Emilia/README.md
    • Update model references from 3.1 to community-1.
    • Update environment example from Python 3.9 to 3.10.
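The auth-argument fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `preprocessors/Emilia/main.py`: the helper name `load_pipeline` and the two stand-in loaders are hypothetical, and the sketch assumes that an older loader rejects the unknown `token` keyword with a `TypeError`. In real use, the `loader` argument would be something like pyannote's `Pipeline.from_pretrained`.

```python
from typing import Any, Callable

MODEL_ID = "pyannote/speaker-diarization-community-1"


def load_pipeline(loader: Callable[..., Any], model_id: str, hf_token: str) -> Any:
    """Try the newer `token=` keyword first, then fall back to `use_auth_token=`.

    `loader` is any callable with a from_pretrained-like signature
    (hypothetical helper; assumed behaviour, not taken from the PR diff).
    """
    try:
        return loader(model_id, token=hf_token)
    except TypeError:
        # Assumption: older loaders raise TypeError on the unknown `token` keyword.
        return loader(model_id, use_auth_token=hf_token)


# Stand-in loaders that only mimic the two keyword styles.
def new_style_loader(model_id, token=None):
    return ("new", model_id, token)


def old_style_loader(model_id, use_auth_token=None):
    return ("old", model_id, use_auth_token)


new_result = load_pipeline(new_style_loader, MODEL_ID, "hf_xxx")
old_result = load_pipeline(old_style_loader, MODEL_ID, "hf_xxx")
```

One trade-off of this approach: a `TypeError` raised for an unrelated reason inside the loader would also trigger the fallback, so a stricter variant might inspect the loader's signature instead of catching the exception.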

Rationale:

  • pyannote reports community-1 as improved over legacy 3.1 on most published benchmarks (with dataset-specific variance).

🚧 Related Issues

Closes #486

👨‍💻 Changes Proposed

  • Migrate Emilia diarization model ID to community-1
  • Add cross-version pyannote auth/output compatibility handling
  • Update README setup/model references

🧑‍🤝‍🧑 Who Can Review?

@yuantuo666
@HarryHe11

✅ Checklist

  • Code has been reviewed
  • Code complies with the project's code standards and best practices
  • Code has passed all tests
  • Code does not affect the normal use of existing features
  • Code has been commented properly
  • Documentation has been updated (if applicable)
  • Demo/checkpoint has been attached (if applicable)

Test Notes

Manual smoke test executed with a real audio sample and HF token:

  • pipeline loads community-1
  • diarization runs successfully
  • output conversion path works (speaker_diarization/legacy fallback)
  • first segment produced correctly
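The output conversion path checked above (`speaker_diarization` when present, legacy `Annotation` otherwise) might look something like the sketch below. The helper names `to_annotation` and `to_segments` and the `Fake*` classes are hypothetical stand-ins so the example runs without pyannote installed. The sketch assumes the legacy object exposes `itertracks(yield_label=True)` yielding `(segment, track, label)` triples, as pyannote's `Annotation` does.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


def to_annotation(out: Any) -> Any:
    """Return `out.speaker_diarization` if it exists, else `out` itself (legacy)."""
    return getattr(out, "speaker_diarization", out)


def to_segments(annotation: Any) -> List[Dict[str, Any]]:
    """Flatten an Annotation-like object into plain start/end/speaker dicts."""
    return [
        {"start": seg.start, "end": seg.end, "speaker": speaker}
        for seg, _track, speaker in annotation.itertracks(yield_label=True)
    ]


# Stand-ins that imitate only the attributes used above.
@dataclass
class FakeSegment:
    start: float
    end: float


class FakeAnnotation:
    def __init__(self, tracks):
        self._tracks = tracks  # list of (FakeSegment, track_name, label)

    def itertracks(self, yield_label=False):
        for seg, track, label in self._tracks:
            yield (seg, track, label) if yield_label else (seg, track)


@dataclass
class FakeNewOutput:
    speaker_diarization: FakeAnnotation  # newer-style wrapper object


legacy = FakeAnnotation([(FakeSegment(0.0, 1.5), "A", "SPEAKER_00")])
wrapped = FakeNewOutput(speaker_diarization=legacy)

legacy_segments = to_segments(to_annotation(legacy))
wrapped_segments = to_segments(to_annotation(wrapped))
```

Both call paths should yield the same segment list, which is the property the smoke test's "first segment produced correctly" check relies on.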

@gonzalo-cordova-pou changed the title from "Feat/emilia diarization community 1" to "Switch diarization model in Emilia to pyannote community-1" on Apr 17, 2026
Development

Successfully merging this pull request may close these issues.

[Feature]: Update Emilia diarization model to pyannote/speaker-diarization-community-1
