
[docker] feat: add Ascend A3 Dockerfile and docs#659

Merged
FoolPlayer merged 1 commit into ByteDance-Seed:main from phdddd:update-A3-Dockerfile
Apr 20, 2026


phdddd commented Apr 15, 2026

What does this PR do?

This PR adds a Dockerfile for Ascend A3 and documents how to build the image. It also fixes configuration errors in the mixed precision training code.

Checklist Before Starting

  • Search for related PRs/issues and link here: ...
  • PR title follows [{modules}] {type}: {description} format (enforced by check_pr_title.yml)
    • Allowed modules: agent, ci, ckpt, config, data, dist, docker, docs, logging, misc, model, omni, optim, ops, parallel, perf, release, task, trainer
    • Allowed types: feat, fix, refactor, chore, test
    • Breaking changes: prepend [BREAKING] — e.g. [BREAKING][parallel, model] feat: dynamic batching

Test

Validation results (training curves, eval metrics) for changes not covered by CI.

API and Usage Example

Show API changes and usage examples if applicable.

Design & Code Changes

High-level design description and specific change list.

Checklist Before Submitting

  • Read the Contribute Guide
  • Applied pre-commit checks
  • Added/updated documentation
  • If tasks/ training scripts were moved or renamed: updated docs/ examples and verified python3 scripts/ci/check_doc_task_paths.py passes (also enforced by the Check doc task paths CI workflow)
  • Added tests to CI workflow (or explained why not feasible)

github-actions bot added the ascend (everything about Ascend support), doc (improvements or additions to documentation), and docker labels on Apr 15, 2026

gemini-code-assist bot left a comment


Code Review

This pull request introduces support for Ascend A3 hardware by adding a specialized Dockerfile and a comprehensive build and usage guide. It also refactors several training scripts to utilize a centralized FSDP mixed precision configuration. However, the review identified several critical issues in the implementation: the build_parallelize_model function is being called with an incorrect keyword argument, and several other functions are receiving a configuration object where a boolean value is expected, which will lead to logic errors or runtime failures. Additionally, the new Dockerfile contains redundant layers and references an undefined build variable.
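The two call-site problems described above behave differently in Python: an unexpected keyword argument fails loudly with a TypeError, while a configuration object passed where a boolean is expected fails silently, because an ordinary object is truthy regardless of its fields. A minimal sketch of both, using hypothetical names throughout (MixedPrecisionConfig, pick_dtype, and build_model are illustrative stand-ins, not this repository's API):

```python
from dataclasses import dataclass


@dataclass
class MixedPrecisionConfig:
    """Hypothetical stand-in for a mixed precision config object."""
    enable: bool = True
    param_dtype: str = "bfloat16"


def pick_dtype(enable_mixed_precision: bool) -> str:
    """Hypothetical callee that expects a plain boolean flag."""
    return "bfloat16" if enable_mixed_precision else "float32"


def build_model(name: str, mixed_precision: MixedPrecisionConfig) -> str:
    """Hypothetical builder that accepts the config under one specific keyword."""
    return f"{name}:{pick_dtype(mixed_precision.enable)}"


cfg = MixedPrecisionConfig(enable=False)

# Silent logic error: the config instance is truthy even though cfg.enable is False.
buggy = pick_dtype(cfg)

# Correct: pass the boolean field the callee actually expects.
fixed = pick_dtype(cfg.enable)

# Loud failure: a keyword the function does not define raises TypeError.
try:
    build_model("demo", enable_mixed_precision=cfg)
    wrong_kwarg_error = None
except TypeError as exc:
    wrong_kwarg_error = str(exc)

print(buggy, fixed)
print(wrong_kwarg_error is not None)
```

The silent case is the harder one to catch in review, since the script runs to completion with mixed precision effectively always on; a static type checker would flag the mismatch if the callee is annotated as taking a bool.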

Comment thread tasks/deprecated_task/train_flux.py Outdated
Comment thread tasks/deprecated_task/train_qwen_vl.py Outdated
Comment thread tasks/deprecated_task/train_torch.py Outdated
Comment thread tasks/deprecated_task/train_torch.py Outdated
Comment thread tasks/deprecated_task/train_wan.py Outdated
Comment thread docker/ascend/Dockerfile.ascend_8.3rc2_a3
Comment thread tasks/deprecated_task/train_flux.py Outdated
Comment thread tasks/deprecated_task/train_wan.py Outdated
Comment thread docker/ascend/Dockerfile.ascend_8.3rc2_a3
Comment thread tasks/deprecated_task/train_flux.py Outdated
phdddd force-pushed the update-A3-Dockerfile branch from 29ba4f6 to 445bae6 on April 20, 2026 02:47
FoolPlayer merged commit 8b8398d into ByteDance-Seed:main on Apr 20, 2026
6 checks passed