
Build fully automated build chain for TensorFlow and all other kernels #74

@achimnol


TensorFlow v1.7 will be the last version that supports our current CUDA 8.0 + cuDNN 6.0 build chain.
Starting with TensorFlow v1.8, we need to upgrade CUDA.

Currently we build the images on a dedicated physical machine, which has only a single CUDA version installed.
For maximum stability and automation, it would be better to run our builds on p2.xlarge/p3.xlarge spot instances using an appropriate Amazon Deep Learning Base AMI (v4.0 or v6.0).

Let's write scripts to do this.

  • Create a cloud build configuration that supports:
    • Automatically trigger builds on git pushes to this repository and on new kernel runner releases on PyPI
    • Build only modified Dockerfiles, but with dependency checks against base images (images built on a changed base image must be rebuilt as well)
    • Ability to force-rebuild specific images (manual trigger)
    • Push rebuilt images to Docker Hub and to designated private Docker registries (for enterprise customers)
  • Optional but nice to have:
    • Save/load tarballed Docker images to warm the cache for Docker builds (e.g., from/to S3, or using EFS); a comparison test is required
    • Automatically run basic code execution tests against newly built images
      • e.g., using Ansible, Puppet, Vagrant, etc. on temporary p2/p3 instances
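One way to implement the "build only modified Dockerfiles with dependency checks" and "force-rebuild" items is to read each Dockerfile's `FROM` lines, build a base-to-dependent graph, mark changed or manually forced images plus everything downstream as dirty, and then order them so that base images are built first. The Python sketch below assumes a hypothetical layout in which the build script already has a mapping from image tag to Dockerfile contents and knows which tags changed (for example, derived from `git diff --name-only` on the pushed commits). The names `base_images` and `plan_rebuild` are made up for illustration, and `FROM` lines that depend on build `ARG`s are not resolved.

```python
import heapq
import re
from collections import defaultdict, deque

# Matches the image reference in each FROM instruction, skipping an optional
# --platform flag; a trailing "AS <stage>" alias is ignored.
FROM_RE = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def base_images(dockerfile_text):
    """Return the image references named in a Dockerfile's FROM lines."""
    return FROM_RE.findall(dockerfile_text)


def plan_rebuild(dockerfiles, changed, forced=()):
    """Return the image tags to rebuild, with base images before dependents.

    dockerfiles: {image tag: Dockerfile text} for every image in the repo
                 (hypothetical layout)
    changed:     tags whose Dockerfile was modified by the push
    forced:      tags requested through a manual force-rebuild trigger
    """
    # Reverse edges: base tag -> tags of repo images built FROM it.
    # Bases outside the repo (e.g. ubuntu:16.04) are not tracked.
    dependents = defaultdict(set)
    for image, text in dockerfiles.items():
        for base in base_images(text):
            if base in dockerfiles:
                dependents[base].add(image)

    # Dirty set = changed/forced images plus everything downstream of them.
    dirty = set()
    queue = deque(tag for tag in (*changed, *forced) if tag in dockerfiles)
    while queue:
        image = queue.popleft()
        if image not in dirty:
            dirty.add(image)
            queue.extend(dependents[image])

    # Kahn's topological sort over the dirty subgraph; the heap keeps the
    # order deterministic when several images are ready at the same time.
    indegree = dict.fromkeys(dirty, 0)
    for image in dirty:
        for dep in dependents[image]:
            indegree[dep] += 1
    ready = [image for image, degree in indegree.items() if degree == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        image = heapq.heappop(ready)
        order.append(image)
        for dep in dependents[image]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                heapq.heappush(ready, dep)

    if len(order) != len(dirty):
        raise ValueError("cyclic FROM dependencies among: %s"
                         % sorted(dirty - set(order)))
    return order
```

For example, with a hypothetical `example/tensorflow:1.7-gpu` image built on `example/python:3.6`, a change to the Python image's Dockerfile yields `["example/python:3.6", "example/tensorflow:1.7-gpu"]`. Images whose base comes from outside the repository (such as `ubuntu:16.04`) are treated as roots, so this sketch never schedules a rebuild of the external base itself.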
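Pushing one rebuilt image to Docker Hub and to the enterprise customers' private registries comes down to re-tagging it with each registry's host prefix and pushing every tag. Below is a minimal Python sketch, assuming the build host is already authenticated (`docker login`) to every target registry. `push_commands` and the `registry.example.com` host are hypothetical names used only for illustration; `docker tag` and `docker push` are the standard Docker CLI subcommands.

```python
import shlex


def push_commands(image, registries=()):
    """Return docker CLI argument lists that publish `image`.

    The image is pushed under its own name (Docker Hub for a "user/repo:tag"
    name), then re-tagged with each extra registry host as a prefix and
    pushed there as well.
    """
    commands = [["docker", "push", image]]
    for registry in registries:
        target = f"{registry.rstrip('/')}/{image}"
        commands.append(["docker", "tag", image, target])
        commands.append(["docker", "push", target])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them. A real build
    # script could pass each list to subprocess.run(cmd, check=True).
    for cmd in push_commands("example/python:3.6", ["registry.example.com"]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Returning argument lists rather than shell strings keeps the sketch easy to test and avoids quoting problems when the commands are eventually executed.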

Labels: enhancement, major (Major issue to solve in current milestone.)