| 1 | +--- |
| 2 | +id: overview |
| 3 | +title: Overview |
| 4 | +slug: / |
| 5 | +--- |
| 6 | + |
| 7 | +# Tangle |
| 8 | + |
| 9 | +Tangle is a service and web app that lets users build and run machine learning pipelines using drag and drop, without having to set up a development environment. |
| 10 | + |
| 11 | +[Tangle UI](https://tangleml.com/tangle-ui) |
| 12 | + |
| 13 | +## What does a pipeline system do in a nutshell? |
| 14 | + |
| 15 | +A pipeline system like Tangle (also Cloud Pipelines/Vertex Pipelines/Kubeflow Pipelines): |
| 16 | + |
| 17 | +- **Orchestrates** (distributed execution, scheduling, data passing, caching) |
| 18 | +- **Containerized** (isolated inside containers) |
| 19 | +- **Command-line** (the true interface with user code is command line) |
| 20 | +- **Programs** (i.e. not functions passing shared in-memory objects) |
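
The four points above can be illustrated with a minimal Python sketch. It is not Tangle's implementation: the `{input}`/`{output}` placeholders, `run_task`, and the upper-casing program are hypothetical, and a local subprocess stands in for a container. The point is only that the system hands the user program a command line and file paths, never in-memory objects.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical user program: reads a file and writes an upper-cased copy.
# It knows nothing about the pipeline system; it only sees argv and files.
USER_PROGRAM = """
import sys
src, dst = sys.argv[1], sys.argv[2]
with open(src) as f:
    text = f.read()
with open(dst, "w") as f:
    f.write(text.upper())
"""


def run_task(command, input_path, output_path):
    """Replace placeholders in the command line with concrete paths, then run it."""
    substitutions = {"{input}": str(input_path), "{output}": str(output_path)}
    argv = [substitutions.get(arg, arg) for arg in command]
    # A real system would start a container here (assumption); we use a subprocess.
    subprocess.run(argv, check=True)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "in.txt"
        dst = Path(tmp) / "out.txt"
        src.write_text("hello")
        command = [sys.executable, "-c", USER_PROGRAM, "{input}", "{output}"]
        run_task(command, src, dst)
        return dst.read_text()
```

Because the contract is just a command line plus files, the same wrapper would work for a program written in any language.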
| 21 | + |
| 22 | +## What is a pipeline? |
| 23 | + |
| 24 | +A pipeline is a graph of tasks connected to each other (task outputs are connected to task inputs). |
| 25 | +Tasks are instances of components. |
| 26 | +Components have a name, inputs/outputs and an implementation (a program). |
| 27 | + |
| 28 | +A pipeline can be submitted for execution as a pipeline run. |
| 29 | + |
| 30 | +When tasks are executed, they read input data, process it, and produce output data. |
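
As a rough illustration of these terms, the sketch below models components, tasks and a pipeline as plain Python data classes and derives a valid execution order from the connections. All class names, field names and the example commands are hypothetical and are not Tangle's actual data model.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Component:
    name: str
    inputs: tuple
    outputs: tuple
    command: tuple  # implementation: a command-line program (hypothetical values below)


@dataclass
class Task:
    component: Component
    # input name -> (upstream task id, upstream output name)
    arguments: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    tasks: dict  # task id -> Task

    def execution_order(self):
        """Return task ids so that every task comes after the tasks it reads from."""
        graph = {
            task_id: {upstream_id for upstream_id, _ in task.arguments.values()}
            for task_id, task in self.tasks.items()
        }
        return list(TopologicalSorter(graph).static_order())


download = Component("Download data", (), ("data",), ("download.sh",))
train = Component("Train model", ("data",), ("model",), ("train.sh",))
evaluate = Component("Evaluate model", ("model",), ("metrics",), ("evaluate.sh",))

pipeline = Pipeline(
    tasks={
        "evaluate": Task(evaluate, {"model": ("train", "model")}),
        "train": Task(train, {"data": ("download", "data")}),
        "download": Task(download),
    }
)
```

Calling `pipeline.execution_order()` lists `download` before `train` and `train` before `evaluate`, regardless of the order in which the tasks were declared.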
| 31 | + |
| 32 | +## Demo |
| 33 | + |
| 34 | +[Demo](https://tangleml-tangle.hf.space/#/quick-start) |
| 35 | + |
| 36 | +The experimental new version of the Tangle app is now available at [https://tangleml-tangle.hf.space/#/quick-start](https://tangleml-tangle.hf.space/#/quick-start). No registration is required to experiment with building pipelines. To be able to execute the pipelines, follow the [installation instructions](#installation). |
| 37 | + |
| 38 | +Please check it out and report any bugs you find using [GitHub Issues](https://github.com/tangleml/tangle/issues). |
| 39 | + |
| 40 | +The app is under active development, so expect some breakage as we work on it, and do not rely on it for production. |
| 41 | + |
| 42 | +## Why should a company use Tangle? |
| 43 | + |
| 44 | +- Tracking and Reproducibility |
| 45 | +  - Pipeline runs are recorded: graph, logs, artifact metadata (size) and small values like metrics. |
| 46 | + - Intermediate data is immutable, never overwritten. This de-risks experimentation & sharing. |
| 47 | +  - Each pipeline run can be cloned and re-submitted, producing the same results and the same models. |
| 48 | + - All components are strictly versioned. |
| 49 | +- Time and compute savings due to execution caching |
| 50 | + - Pipeline tasks that were previously executed are reused, saving time and compute. |
| 51 | +- Sharing |
| 52 | +  - Team members can easily share pipeline runs. A user can easily investigate a teammate's pipeline issue. A user can clone a teammate's pipeline, modify it and submit it. |
| 53 | +- Component library |
| 54 | +  - A team can create a library of reusable components that can be used by all team members. |
| 55 | +- Ease of onboarding. Can be used by non-engineers. |
| 56 | +  - Users can create and run pipelines without writing code. No need to set up a dev environment. |
| 57 | + - PMs can examine pipeline runs, track metrics and even run their own experiments. |
| 58 | + |
| 59 | +## Why should a single ML engineer use Tangle? |
| 60 | + |
| 61 | +- Tracking and Reproducibility |
| 62 | +  - Even if you are on your own, automatic tracking and version control are useful. |
| 63 | +- Data passing, execution caching |
| 64 | + - No need to tinker with manual data caching between data transformations. |
| 65 | +- Non-intrusive |
| 66 | + - Components wrap what you already have. Any CLI program, any language, any container. |
| 67 | +- Components as re-usable bits of knowledge |
| 68 | +  - Like Lego pieces, components are self-contained and easy to reuse. Each component is like a simple-to-use function, not a complex framework/library full of classes. |
| 69 | + - Can share components between multiple pipelines. Components are independent. No dependency hell. Can use different versions together if needed (e.g. to compare results). |
| 70 | +  - Connect different languages and frameworks together: Python, Java, Shell, Ruby, C++, JS/TS. |
| 71 | +  - Forgot how to write a TensorFlow training loop? Just look at a 50-line “Train TensorFlow model” component, not a 1000-line end-to-end tutorial. Or just use that component as-is. |
| 72 | + |
| 73 | +## Comparison: Tangle vs other systems |
| 74 | + |
| 75 | +| Feature | Tangle | Kubeflow Pipelines | Vertex Pipelines | Airflow | |
| 76 | +| --------------------------------- | ------------------------------------------------ | --------------------------------------------- | ----------------------------------------------- | ------------- | |
| 77 | +| Code & license | Open-source | Open-source | Proprietary | Open-source | |
| 78 | +| Cloud support | Any cloud/local | Any Kubernetes | Google Cloud only | Local, hosted | |
| 79 | +| Data passing | Good | Good | Yes. But Artifacts vs Properties friction. | Rudimentary | |
| 80 | +| Execution caching | Content-based,<br/>Global,<br/>succeeded/running | Lineage-based,<br/>Global,<br/>succeeded only | Lineage-based, per-pipeline,<br/>succeeded only | No | |
| 81 | +| No-code visual pipeline editor UI | Yes | No | No | No | |
| 82 | + |
| 83 | +## Tangle vs Kubeflow Pipelines/Vertex Pipelines |
| 84 | + |
| 85 | +- Same idea. |
| 86 | +- Uses the same `ComponentSpec`/`component.yaml` format introduced in KFP v1. |
| 87 | +  This means that the components can be reused. The format has been stable since its inception in 2018. |
| 88 | + (Warning: KFP v2 went through many cross-incompatible component formats.) |
| 89 | +- Tangle has better execution caching: content-based, global, can reuse running executions. (vs. lineage-based, per-pipeline, succeeded executions only) |
| 90 | +- Tangle can support different execution systems (different clouds). |
| 91 | +  Kubeflow Pipelines can support any Kubernetes cluster. |
| 92 | + Vertex Pipelines only supports Google Cloud Vertex AI. |
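
To make the caching terms concrete, here is a minimal Python sketch of one way content-based, global caching with reuse of running executions could work. It is an assumption-laden illustration, not Tangle's actual implementation: `content_hash`, `cache_key`, `ExecutionCache` and the `"sha256:abc"` component digest are all hypothetical.

```python
import hashlib
import json


def content_hash(data: bytes) -> str:
    """Identify an artifact by its bytes, not by the task that produced it."""
    return hashlib.sha256(data).hexdigest()


def cache_key(component_digest: str, input_hashes: dict) -> str:
    """Derive the key from the component version and the content of its inputs."""
    payload = json.dumps(
        {"component": component_digest, "inputs": input_hashes}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()


class ExecutionCache:
    """A single (global) lookup table shared by all pipelines."""

    def __init__(self):
        self._executions = {}  # cache key -> execution record

    def get_or_start(self, key):
        """Return (execution, reused). Running executions are reused too."""
        if key in self._executions:
            return self._executions[key], True
        execution = {"key": key, "status": "running"}
        self._executions[key] = execution
        return execution, False
```

Under this scheme, two tasks in different pipelines that run the same component version on byte-identical inputs map to the same key, so the second one attaches to the first execution even if that execution has not finished yet. A lineage-based key would instead depend on which upstream tasks produced the inputs.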
| 93 | + |
| 94 | +## Comparison: Tangle vs Airflow |
| 95 | + |
| 96 | +| Feature | Tangle | Airflow | |
| 97 | +| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | |
| 98 | +| Component specification | Declarative. Can describe arbitrary CLI program. | Python operator class.<br/>Airflow-specific. | |
| 99 | +| Component code runs | Inside a container. <br/>Usually: remote, distributed. | Local Python. <br/>Local (but can call remote services) | |
| 100 | +| Data passing | Defined Inputs/Outputs. <br/>Explicit data connections. <br/>Arbitrary files/directories. Big data.<br/>System-managed data storage and passing. | No defined Inputs/Outputs. <br/>No task data connections. <br/>XComs. Small bag of JSON data. <br/>No managed data storage. | |
| 101 | +| Execution caching | Content-based. <br/>Global. <br/>Succeeded/running. | No | |
| 102 | + |
| 103 | +## App features |
| 104 | + |
| 105 | +- Start building pipelines right away |
| 106 | + - Intuitive visual drag and drop interface |
| 107 | + - No registration required to build. You own your data. |
| 108 | +- Execute pipelines on your local machine or in the cloud |
| 109 | +  - Easily install the app on a local machine or deploy it to the cloud |
| 110 | + - Submit pipelines for execution with a single click. |
| 111 | + - Easily monitor all pipeline task executions, view the artifacts, read the logs. |
| 112 | +- Fast iteration |
| 113 | + - Clone any pipeline run and get a new editable pipeline |
| 114 | + - Create pipeline -> Submit run -> Monitor run -> Clone run -> Edit pipeline -> Submit run ... |
| 115 | +- Automatic execution caching and reuse |
| 116 | + - Save time and compute. Don't re-do what's done |
| 117 | + - Successful and even running executions are re-used from cache |
| 118 | +- Reproducibility |
| 119 | + - All your runs are kept forever (on your machine) - graph, logs, metadata |
| 120 | + - Re-run an old pipeline run with just two clicks (Clone pipeline, Submit run) |
| 121 | + - Containers and strict component versioning ensure reproducibility |
| 122 | +- Pipeline Components |
| 123 | + - Time-proven `ComponentSpec`/`component.yaml` format |
| 124 | + - A library of preloaded components |
| 125 | + - Fast-growing public component ecosystem |
| 126 | + - Add your own components (public or private) |
| 127 | + - Easy to create your own components manually or using the Cloud Pipelines SDK |
| 128 | + - Components can be written in [any language](https://github.com/Ark-kun/pipeline_components/tree/master/components/sample) (Python, Shell, R, Java, C#, etc). |
| 129 | + - Compatible with [Google Cloud Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) and [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/introduction/) |
| 130 | + - Lots of pre-built components on GitHub: [Ark-kun/pipeline_components](https://github.com/Ark-kun/pipeline_components/tree/master/components). |
| 131 | + |
| 132 | +We have many exciting features planned, but we want to prioritize them based on user feedback. |
| 133 | + |
| 134 | +## Credits |
| 135 | + |
| 136 | +The Tangle app is based on the [Pipeline Editor](https://cloud-pipelines.net/pipeline-editor) app created by [Alexey Volkov](https://github.com/Ark-kun) as part of the [Cloud Pipelines](https://github.com/Cloud-Pipelines) project. |