6 changes: 3 additions & 3 deletions preprocessors/Emilia/README.md
@@ -67,15 +67,15 @@ The Emilia-Pipe includes the following major steps:
2. Run the following commands to install the required packages:

```bash
-conda create -y -n AudioPipeline python=3.9
+conda create -y -n AudioPipeline python=3.10
conda activate AudioPipeline

bash env.sh
```

3. Download the model files from the third-party repositories.
- Manually download the checkpoints of UVR-MDX-NET-Inst_HQ_3 ([UVR-MDX-NET-Inst_3.onnx](https://github.com/TRvlvr/model_repo/releases/download/all_public_uvr_models/UVR-MDX-NET-Inst_HQ_3.onnx)) and DNSMOS P.835 ([sig_bak_ovr.onnx](https://github.com/microsoft/DNS-Challenge/blob/master/DNSMOS/DNSMOS/sig_bak_ovr.onnx)), then save their path for the next step configuration (i.e. #2 and #3 TODO).
-- Creat the access token to pyannote/speaker-diarization-3.1 following [the guide](https://huggingface.co/pyannote/speaker-diarization-3.1#requirements), then save it for the next step configuration (i.e. #4 TODO).
+- Create the access token to pyannote/speaker-diarization-community-1 following [the guide](https://huggingface.co/pyannote/speaker-diarization-community-1), then save it for the next step configuration (i.e. #4 TODO).
- Make sure you have stable connection to GitHub and HuggingFace. The checkpoints of Silero and Whisperx-medium will be downloaded automatically on the pipeline's first run.


@@ -175,7 +175,7 @@ Here are some potential improvements for the Emilia-Pipe pipeline:
We acknowledge the wonderful work by these excellent developers!
- Source Separation: [UVR-MDX-NET-Inst_HQ_3](https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models)
- VAD: [snakers4/silero-vad](https://github.com/snakers4/silero-vad)
-- Speaker Diarization: [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
+- Speaker Diarization: [pyannote/speaker-diarization-community-1](https://huggingface.co/pyannote/speaker-diarization-community-1)
- ASR: [m-bain/whisperX](https://github.com/m-bain/whisperX), using [faster-whisper](https://github.com/guillaumekln/faster-whisper) and [CTranslate2](https://github.com/OpenNMT/CTranslate2) backend.
- DNSMOS Prediction: [DNSMOS P.835](https://github.com/microsoft/DNS-Challenge)

13 changes: 11 additions & 2 deletions preprocessors/Emilia/main.py
@@ -4,6 +4,7 @@
# LICENSE file in the root directory of this source tree.

import argparse
+import inspect
import json
import librosa
import numpy as np
@@ -151,6 +152,10 @@ def speaker_diarization(audio):
             "channel": 0,
         }
     )
+    # pyannote.audio 4.x returns a rich output object with speaker_diarization.
+    # Legacy versions return Annotation directly.
+    if hasattr(segments, "speaker_diarization"):
+        segments = segments.speaker_diarization

     diarize_df = pd.DataFrame(
         segments.itertracks(yield_label=True),
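
Reviewer note: the unwrapping step in this hunk can be tried in isolation. The sketch below is only an illustration under stated assumptions: `FakeAnnotation`, `FakeDiarizeOutput`, and `unwrap_diarization` are hypothetical stand-ins written for this note, not pyannote classes or functions, and the only behavior taken from the diff is "use the `speaker_diarization` attribute when it exists, otherwise use the object itself".

```python
from dataclasses import dataclass


@dataclass
class FakeAnnotation:
    """Hypothetical stand-in for a legacy Annotation-like result."""
    tracks: list

    def itertracks(self, yield_label=False):
        # Yield (segment, track, label) triples, or pairs without the label.
        for segment, track, label in self.tracks:
            yield (segment, track, label) if yield_label else (segment, track)


@dataclass
class FakeDiarizeOutput:
    """Hypothetical stand-in for a richer output object wrapping the annotation."""
    speaker_diarization: FakeAnnotation


def unwrap_diarization(result):
    """Return the wrapped annotation if present, else the result unchanged."""
    return getattr(result, "speaker_diarization", result)


legacy = FakeAnnotation([((0.0, 1.5), "A", "SPEAKER_00")])
wrapped = FakeDiarizeOutput(legacy)

# Both shapes end up as the same annotation object.
rows_legacy = list(unwrap_diarization(legacy).itertracks(yield_label=True))
rows_wrapped = list(unwrap_diarization(wrapped).itertracks(yield_label=True))
```

`getattr` with a default is equivalent to the `hasattr` check in the diff; either form keeps the downstream `itertracks` call unchanged for both result shapes.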
@@ -523,9 +528,13 @@ def main_process(audio_path, save_path=None, audio_name=None):
             "You can get the token at https://huggingface.co/settings/tokens. "
             "Remeber grant access following https://github.com/pyannote/pyannote-audio?tab=readme-ov-file#tldr"
         )
+    from_pretrained_params = inspect.signature(Pipeline.from_pretrained).parameters
+    auth_kwarg = (
+        "token" if "token" in from_pretrained_params else "use_auth_token"
+    )
     dia_pipeline = Pipeline.from_pretrained(
-        "pyannote/speaker-diarization-3.1",
-        use_auth_token=cfg["huggingface_token"],
+        "pyannote/speaker-diarization-community-1",
+        **{auth_kwarg: cfg["huggingface_token"]},
     )
     dia_pipeline.to(device)

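
Reviewer note: the keyword selection in this hunk relies on `inspect.signature`, which can be exercised without pyannote installed. The following is a minimal sketch under assumptions: `load_new`, `load_old`, `pick_auth_kwarg`, and `load_with_token` are hypothetical helpers invented for this note, and only the parameter names `token` and `use_auth_token` come from the diff.

```python
import inspect


def pick_auth_kwarg(func, preferred="token", fallback="use_auth_token"):
    """Pick the auth keyword that func declares, preferring the newer name."""
    params = inspect.signature(func).parameters
    return preferred if preferred in params else fallback


def load_new(name, token=None):
    """Hypothetical newer-style loader that accepts `token`."""
    return ("new", name, token)


def load_old(name, use_auth_token=None):
    """Hypothetical legacy loader that accepts `use_auth_token`."""
    return ("old", name, use_auth_token)


def load_with_token(loader, name, secret):
    """Call loader with the secret under whichever keyword it declares."""
    kwarg = pick_auth_kwarg(loader)
    return loader(name, **{kwarg: secret})


new_result = load_with_token(load_new, "some-model", "hf_example")
old_result = load_with_token(load_old, "some-model", "hf_example")
```

One caveat of this approach: a callable that accepts the token only through `**kwargs` does not list `token` as a named parameter, so the check would fall back to `use_auth_token`.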