@karpnv
Collaborator

@karpnv karpnv commented Jan 28, 2026

S3 support

Collection: ASR

Changelog

  • Accept an input manifest stored in S3 object storage (e.g., s3://abc/sharded_manifests/manifest_0.jsonl).
  • Add --s3cfg: path to the S3 credentials file, followed by the section name in brackets (e.g., ~/.s3cfg[default]). Set to "" to disable S3 support. Default is "".
  • Add --tar-base-path: S3 path to a tar archive of audio files (e.g., s3://ASR/tarred/audio_0.tar).
    When specified, audio_filepath values in the manifest are treated as filenames within this tar archive.
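
A minimal sketch of how a value such as ~/.s3cfg[default] could be split into a file path and a section name, and how credentials might then be read from it. This is not the code from this PR: the helper names parse_s3cfg_spec and read_s3_credentials are hypothetical, falling back to the "default" section when no brackets are given is an assumption, and the access_key / secret_key option names follow the s3cmd configuration format.

```python
import configparser
import os
import re
from typing import Dict, Optional, Tuple

# Matches "path" optionally followed by "[section]" at the very end.
_SPEC_RE = re.compile(r"^(?P<path>.+?)(?:\[(?P<section>[^\[\]]+)\])?$")


def parse_s3cfg_spec(spec: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: split 'path[section]' into (path, section).

    An empty string means S3 support is disabled, so None is returned.
    """
    if not spec:
        return None
    match = _SPEC_RE.match(spec)
    if match is None:
        raise ValueError(f"cannot parse s3cfg value: {spec!r}")
    path = os.path.expanduser(match.group("path"))
    # Assumption: use the "default" section when none is given.
    section = match.group("section") or "default"
    return path, section


def read_s3_credentials(spec: str) -> Optional[Dict[str, str]]:
    """Hypothetical helper: read access_key and secret_key from the file."""
    parsed = parse_s3cfg_spec(spec)
    if parsed is None:
        return None
    path, section = parsed
    # Interpolation is off because secret keys may contain '%' characters.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        raise FileNotFoundError(path)
    if not config.has_section(section):
        raise KeyError(f"section [{section}] not found in {path}")
    return {
        "access_key": config.get(section, "access_key"),
        "secret_key": config.get(section, "secret_key"),
    }
```

With this sketch, parse_s3cfg_spec("") returns None, which mirrors the "set to "" to disable" behavior described in the changelog.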

Usage

  • Explore a manifest stored in S3 whose audio files are packed in an S3 tar archive:
python tools/speech_data_explorer/data_explorer.py s3://abc/sharded_manifests/manifest_0.json --tar-base-path s3://abc/tarred/audio_0.tar --s3cfg ~/.s3cfg[default]
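
To illustrate what the two S3 arguments in the command above imply, here is a rough sketch, not the PR's implementation. It splits an s3:// URI into bucket and key, and looks up a manifest audio_filepath as a member name inside a tar archive that has already been downloaded into memory. The function names split_s3_uri and read_tar_member are hypothetical, and the download step itself is left out.

```python
import io
import tarfile
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def read_tar_member(tar_bytes: bytes, audio_filepath: str) -> bytes:
    """Hypothetical helper: return the bytes of one file inside a tar.

    audio_filepath is used as the member name, matching the description
    that manifest paths are treated as filenames within the archive.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as archive:
        # extractfile raises KeyError if the name is not in the archive
        # and returns None for members that are not regular files.
        member = archive.extractfile(audio_filepath)
        if member is None:
            raise FileNotFoundError(audio_filepath)
        return member.read()
```

For example, split_s3_uri("s3://abc/tarred/audio_0.tar") yields the bucket "abc" and the key "tarred/audio_0.tar", which is the form most S3 clients expect.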

GitHub Actions CI

PR Type:

  • [x] New Feature
  • [ ] Bugfix
  • [ ] Documentation

karpnv and others added 2 commits January 27, 2026 18:13
@karpnv karpnv marked this pull request as ready for review January 31, 2026 00:23
@karpnv karpnv requested a review from Jorjeous January 31, 2026 00:23
@karpnv karpnv requested a review from vsl9 January 31, 2026 00:31
@github-actions github-actions bot removed the Run CICD label Jan 31, 2026
@github-actions
Contributor

[🤖]: Hi @karpnv 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc
