Open
Labels: enhancement, major (Major issue to solve in current milestone.)
Description
TensorFlow v1.7 will be the last version that supports our current CUDA 8.0 + cuDNN 6.0 build chain.
From TensorFlow v1.8 onward, we need to upgrade CUDA.
Currently we build the images on a dedicated physical machine, which has only a single CUDA version installed.
For better stability and automation, it would be nice to run our builds on spot p2.xlarge/p3.2xlarge instances (P3 has no plain xlarge size) with an appropriate AWS Deep Learning Base AMI (v4.0 or v6.0).
Let's write scripts to do this.
- Create a cloud build configuration that supports:
  - Automatically triggering the build process on git pushes to this repository and on kernel runner releases on PyPI
  - Building only modified Dockerfiles, with dependency checks against their base images
  - Force-rebuilding specific images (manual trigger)
  - Pushing rebuilt images to Docker Hub and to designated private Docker registries (for enterprise customers)
- Optional but good to have:
  - Saving/loading tarballed Docker images to warm the Docker build cache (maybe from/to S3, or using EFS) -> comparison test required
  - Automatically running basic code execution tests against newly built images
- Maybe using Ansible, Puppet, Vagrant, etc. on temporary p2/p3 instances
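
As a rough starting point for the "build only modified Dockerfiles, with dependency checks" and "force-rebuild" items, here is a minimal sketch of how the rebuild set could be computed. Everything in it is an assumption for illustration: the function names `parse_base_images` and `images_to_rebuild`, the idea that each image is identified by the exact name other Dockerfiles use in their `FROM` lines, and the image names in the usage example. It only decides what to build and in which order; triggering, building, and pushing are out of scope.

```python
import re
from collections import defaultdict, deque

# Matches "FROM [--platform=...] <image> [AS <stage>]" and captures <image>.
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE | re.MULTILINE
)


def parse_base_images(dockerfile_text):
    """Return the image references named in FROM lines (multi-stage aware)."""
    return [m.group(1) for m in FROM_RE.finditer(dockerfile_text)]


def images_to_rebuild(dockerfiles, changed, forced=()):
    """Return the images to rebuild, base images before their dependents.

    dockerfiles: mapping of image name -> Dockerfile text (hypothetical layout)
    changed:     image names whose Dockerfile was modified (e.g. from a git diff)
    forced:      image names to rebuild regardless (manual trigger)
    """
    # children[base] = images in this repository that are built FROM base.
    children = defaultdict(set)
    for image, text in dockerfiles.items():
        for base in parse_base_images(text):
            if base in dockerfiles and base != image:
                children[base].add(image)

    # Breadth-first walk: a changed or forced image dirties all its dependents.
    dirty = set()
    queue = deque(set(changed) | set(forced))
    while queue:
        image = queue.popleft()
        if image in dirty or image not in dockerfiles:
            continue
        dirty.add(image)
        queue.extend(children[image])

    # Kahn's algorithm restricted to the dirty set gives a valid build order.
    indegree = {image: 0 for image in dirty}
    for base in dirty:
        for child in children[base]:
            if child in dirty:
                indegree[child] += 1
    ready = sorted(image for image, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        image = ready.pop(0)
        order.append(image)
        for child in sorted(children[image]):
            if child in dirty:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return order


if __name__ == "__main__":
    # Hypothetical image names, for illustration only.
    example = {
        "base": "FROM ubuntu:16.04\n",
        "python-tf": "FROM base\nRUN pip install tensorflow\n",
        "python-torch": "FROM base\n",
        "julia": "FROM ubuntu:16.04\n",
    }
    print(images_to_rebuild(example, changed=["base"], forced=["julia"]))
```

External bases such as `ubuntu:16.04` are ignored here because they are not built from this repository; detecting upstream updates to them would need a separate check.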