Magpie TTS duplicates audio at end of generation #15300

**Describe the bug**

Magpie TTS intermittently glitches and generates duplicated audio at the end of a generation. This happens repeatedly, with no apparent pattern. The repetitions always have a different intonation from the original. For example, the text "Hello" might produce "Hello! ello!", "Hello! HELLOU!" (a different, shouting intonation), or "Helouloulouloulalala" (gibberish pronunciation, hallucinated words).

I initially thought this was a problem with very short utterances, but it also shows up for longer ones, such as sentences of five words or more. As the text gets longer the issue occurs less frequently, but it still occurs. In very rare cases the repeated audio is not the last segment but the second or third to last, still very close to the end.

Tests were conducted in the official `nvcr.io/nvidia/nemo` containers, versions `25.11.01` and `25.09`, as you suggested in another [issue](https://github.com/NVIDIA-NeMo/NeMo/issues/15285) that I opened.

**Steps/Code to reproduce bug**

Here's a minimal script for testing. I'm dropping into `pdb` to generate examples on demand.

```
import asyncio

from nemo.collections.tts.models import MagpieTTSModel
from loguru import logger
import soundfile as sf
import torch
import os
import time

async def load_model(model_id: str = "nvidia/magpie_tts_multilingual_357m"):
    """Load Magpie TTS model."""

    logger.info(f"Loading Magpie TTS model: {model_id}")

    def _load():

        hf_token = os.environ.get("HUGGINGFACE_ACCESS_TOKEN") or os.environ.get("HF_TOKEN")

        if hf_token:
            os.environ["HF_TOKEN"] = hf_token

        model = MagpieTTSModel.from_pretrained(model_id)
        model = model.cuda()
        model.eval()
        return model

    start = time.time()
    _model = await asyncio.to_thread(_load)
    elapsed = time.time() - start
    logger.info(f"Magpie TTS model loaded in {elapsed:.1f}s")

    return _model


async def main():
    model = await load_model()

    text = 'Hello! Hi! How are you?'

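    def generate(text: str):
        # Synthesize `text` and overwrite /app/debug/test.wav with the result.
        with torch.no_grad():
            audio, audio_len = model.do_tts(
                text,
                language="en",
                speaker_index=2,
                apply_TN=False,
            )

        # Move the waveform to the CPU and drop singleton dimensions before writing.
        samples = audio.float().detach().cpu().numpy().squeeze()
        sf.write("/app/debug/test.wav", samples, 22000)

    # Drop into the debugger so generate(...) can be called interactively.
    import pdb; pdb.set_trace()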



if __name__ == "__main__":
    asyncio.run(main())
```

Then:
```
generate('Hi!')
# produces: Hi! HI! (second Hi different intonation, a bit cut off)
generate('Wow that\'s crazy!')
# produces: Wow that's crazy zy!
# or: Wow that's crazy crazy!
generate('Sure! Here are some interesting facts about GPUs')
# produces: Sure! Here are some interesting facts about GPUs about GPUsZszs (repetition, gibberish)
```
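To put a number on how often this happens without going through `pdb` by hand, a small loop like the one below could be used. This is only a sketch: `run_trials` and the `synthesize` callable are hypothetical names I made up for illustration, not part of NeMo, and the default sample rate simply mirrors the value used in the script above.

```
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def run_trials(
    synthesize: Callable[[str], Sequence[float]],
    texts: Sequence[str],
    n_trials: int = 5,
    sample_rate: int = 22000,  # assumption: same rate as in the script above
) -> Dict[str, List[float]]:
    """Call `synthesize` n_trials times per text and record each clip's duration in seconds."""
    durations: Dict[str, List[float]] = defaultdict(list)
    for text in texts:
        for _ in range(n_trials):
            samples = synthesize(text)  # expected to return a 1-D sequence of samples
            durations[text].append(len(samples) / sample_rate)
    return dict(durations)
```

With the model from the script, `synthesize` would wrap the same `model.do_tts(...)` call and return the squeezed waveform. Passing the synthesizer in as a callable keeps the loop independent of the model, so it can be checked with a stub first.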

**Expected behavior**

Magpie TTS produces audio with no repetitions for short to medium utterances.
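Since a repeated tail makes a clip longer than a clean rendering of the same text, one rough way to check this expectation automatically is to compare durations across several runs of the same text. The snippet below is a heuristic sketch, not a real duplication detector: `flag_long_outputs` is a hypothetical helper and the `tolerance` of 1.3 is an arbitrary assumption. It will also miss cases where most runs are affected, because the median itself is then inflated.

```
from statistics import median
from typing import List


def flag_long_outputs(durations: List[float], tolerance: float = 1.3) -> List[int]:
    """Return indices of runs whose duration exceeds tolerance * median duration."""
    if not durations:
        return []
    typical = median(durations)
    # A run counts as suspicious when it is noticeably longer than the typical run.
    return [i for i, d in enumerate(durations) if d > tolerance * typical]
```

Flagged runs still need to be listened to; the check only narrows down which files are worth opening.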

**Environment overview (please complete the following information)**

 - Environment location: Docker
 - Method of NeMo install: from source, with the NeMo repository mounted inside the container. Tried `master` (`527b8c4`), `v2.6.0`, and `v2.6.1`, with no difference.
 - `docker pull` commands used:
```
docker pull nvcr.io/nvidia/nemo:25.11.01
docker pull nvcr.io/nvidia/nemo:25.09
```

**Environment details**

Not applicable: the official NVIDIA Docker images listed above were used.

**Additional context**

GPU model: RTX 3090

