Commit 17e1f44

Content fixes (#16)

1 parent 5d7db42 commit 17e1f44

File tree

9 files changed: +874 -123 lines changed

docs/component-development/creating-components-generic.mdx

Lines changed: 590 additions & 0 deletions
Large diffs are not rendered by default.

docs/component-development/creating-components.mdx

Lines changed: 2 additions & 1 deletion

````diff
@@ -13,7 +13,8 @@ Learn more about the Oasis CLI tool in the [Oasis CLI Manual](/docs/component-de

 ## Lightweight Python Components

-Instead of rebuilding containers for every code change, the Python code goes **in the command line**, outside the container:
+As you learn in the [Creating Components](/docs/component-development/creating-components-generic) guide, you can create a component by writing code and containerizing it. But this approach can be time-consuming.
+For Python functions, you can use the Oasis CLI tool to generate a component specification from your function. Instead of rebuilding containers for every code change, the Python code goes **in the command line**, outside the container:

 ```yaml
 implementation:
````
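The idea above, code traveling on the command line rather than baked into the image, can be sketched with plain `subprocess`. This is a hypothetical stand-alone illustration, not the Oasis CLI itself; the CLI generates a full component specification around this mechanism:

```python
import subprocess
import sys

# The function body travels as a command-line argument, so changing the code
# requires no container rebuild. (Illustrative sketch only.)
code = """
import sys
total = sum(int(x) for x in sys.argv[1:])
print(total)
"""

# Run the embedded code the way a container entrypoint would: the code string
# is passed via `-c`, and the remaining argv entries are the component inputs.
result = subprocess.run(
    [sys.executable, "-c", code, "1", "2", "3"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # → 6
```

Editing the `code` string and re-running is the whole iteration loop; the container image never changes.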

docs/core-concepts/caching.mdx

Lines changed: 54 additions & 14 deletions

````diff
@@ -4,6 +4,9 @@ sidebar_label: Caching
 description: Learn how TangleML's sophisticated caching system saves time and resources
 ---

+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";
+
 # Understanding Caching in TangleML

 TangleML's caching system is one of its most powerful features, designed to dramatically reduce compute time and accelerate your ML pipeline iterations. Unlike traditional pipeline systems, TangleML implements sophisticated caching strategies that can save hours or even days of computation time.
@@ -131,35 +134,72 @@ After purging, you can still see:

 When you need fresh data despite caching:

-### 1. Date Parameters
+### 1. Cache Breaker Input (Nonce)

-For database queries, include cutoff dates:
+The nonce is used to introduce non-determinism to the component, ensuring that the component is re-executed even if the inputs haven't changed.
+The nonce can be a random string or the current time from the caller.

 ```python
 @component
-def fetch_user_data(end_date: str) -> Dataset:
-    # Changing end_date naturally breaks cache
+def search_web(
+    query: str,
+    nonce=None,  # type: str | None
+) -> Results:
+    # Pass a timestamp or random value to the nonce parameter
     ...
 ```

-### 2. Cache Breaker Input
+For scheduled runs, pipelines may use a placeholder for the incoming timestamp.

-For volatile sources without date control:
+<Tabs>
+<TabItem value="Template with placeholder">
+```yaml
+...
+inputs:
+  - name: Pipeline Creation Time
+    type: String
+    default: '{{CreationTime}}'
+...
+```
+</TabItem>

-```python
-@component
-def search_web(query: str, cache_breaker: str = "") -> Results:
-    # Pass timestamp or random value to cache_breaker
-    ...
+<TabItem value="Pipeline with actual value">
+```yaml
+...
+inputs:
+  - name: Pipeline Creation Time
+    type: String
+    default: '2025-11-07 05:30:07.837468+00:00'
+...
+```
+</TabItem>
+</Tabs>
+
+This substituted input can be used to pass the cache-breaker value to the component:
+
+```yaml
+...
+arguments:
+  nonce:
+    graphInput:
+      inputName: Pipeline Creation Time
 ```

-### 3. Disable Caching
+### 2. Disable Caching via `maxCacheStaleness`

 For components that should never cache:

 ```yaml
-caching_strategy:
-  max_cache_staleness: "P0D"  # Disables caching
+...
+implementation:
+  graph:
+    tasks:
+      ...
+      Fill all missing values using Pandas on CSV data:
+        executionOptions:
+          cachingStrategy:
+            maxCacheStaleness: P0D
 ```

 <img src={require("./assets/Caching_DisableCache.png").default} alt="Disable Caching" />
````
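On the caller side, a nonce is just any value that differs between runs. A minimal sketch of the two choices the diff above mentions (the `make_nonce` helper is hypothetical, not part of TangleML's API):

```python
import uuid
from datetime import datetime, timezone

def make_nonce(use_time: bool = True) -> str:
    """Produce a cache-breaking value: a timestamp or a random string."""
    if use_time:
        # Current time: distinct on every scheduled run
        return datetime.now(timezone.utc).isoformat()
    # Random value: distinct on every call
    return uuid.uuid4().hex

a = make_nonce(use_time=False)
b = make_nonce(use_time=False)
print(a != b)  # distinct nonces force re-execution despite identical inputs
```

Either flavor works; the only requirement is that the value changes whenever a fresh execution is wanted.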

docs/index.md

Lines changed: 0 additions & 105 deletions
This file was deleted.

docs/index.mdx

Lines changed: 136 additions & 0 deletions

---
id: overview
title: Overview
slug: /
---
# Tangle

Tangle is a service and a Web app that allows users to build and run Machine Learning pipelines using drag and drop, without having to set up a development environment.

[![image](https://github.com/user-attachments/assets/0ce7ccc0-dad7-4f6a-8677-f2adcd83f558)](https://tangleml.com/tangle-ui)
## What does a pipeline system do in a nutshell?

A pipeline system like Tangle (also Cloud Pipelines/Vertex Pipelines/Kubeflow Pipelines):

- **Orchestrates** (distributed execution, scheduling, data passing, caching)
- **Containerized** (user code runs isolated inside containers)
- **Command-line** (the true interface with user code is the command line)
- **Programs** (e.g. not functions passing shared in-memory objects)
## What is a pipeline?

A pipeline is a graph of tasks connected to each other (task outputs connected to task inputs).
Tasks are instances of components.
Components have a name, inputs/outputs, and an implementation (a program).

A pipeline can be submitted for execution as a pipeline run.

When tasks are executed, they read input data, process it, and produce output data.
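The concepts above can be pictured with a toy sketch. The classes and the `execute` helper are illustrative only, not Tangle's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A component: a name plus an implementation (a program)."""
    name: str
    run: callable

@dataclass
class Task:
    """A task is an instance of a component; `upstream` lists the tasks
    whose outputs feed this task's inputs."""
    component: Component
    upstream: list = field(default_factory=list)

def execute(pipeline_tasks):
    """Run tasks in dependency order, passing outputs to inputs."""
    results = {}
    def run(task):
        if id(task) not in results:
            inputs = [run(dep) for dep in task.upstream]   # read input data
            results[id(task)] = task.component.run(*inputs)  # produce output
        return results[id(task)]
    return [run(t) for t in pipeline_tasks]

# A three-task pipeline: Load -> Double -> Sum
load = Task(Component("Load", lambda: [1, 2, 3]))
double = Task(Component("Double", lambda xs: [x * 2 for x in xs]), upstream=[load])
total = Task(Component("Sum", lambda xs: sum(xs)), upstream=[double])
print(execute([total]))  # → [12]
```

A real pipeline system adds what the sketch omits: containers, distributed execution, persisted artifacts, and caching.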
## Demo

[Demo](https://tangleml-tangle.hf.space/#/quick-start)

The experimental new version of the Tangle app is now available at [https://tangleml-tangle.hf.space/#/quick-start](https://tangleml-tangle.hf.space/#/quick-start). No registration is required to experiment with building pipelines. To be able to execute the pipelines, follow the [installation instructions](#installation).

Please check it out and report any bugs you find using [GitHub Issues](https://github.com/tangleml/tangle/issues).

The app is under active development, so expect some breakage as we work on it, and do not rely on the app for production.
## Why should a company use Tangle?

- Tracking and Reproducibility
  - Pipeline runs are recorded: graph, logs, artifact metadata (size), and small values like metrics.
  - Intermediate data is immutable and never overwritten. This de-risks experimentation and sharing.
  - Each pipeline run can be cloned and re-submitted, producing the same results and the same models.
  - All components are strictly versioned.
- Time and compute savings due to execution caching
  - Pipeline tasks that were previously executed are reused, saving time and compute.
- Sharing
  - Team members can easily share pipeline runs. A user can easily investigate a teammate's pipeline issue, or clone a teammate's pipeline, modify it, and submit it.
- Component library
  - The team can create a library of reusable components that can be used by all team members.
- Ease of onboarding. Can be used by non-engineers.
  - Users can create and run pipelines without writing code. No need to set up a dev environment.
  - PMs can examine pipeline runs, track metrics, and even run their own experiments.
## Why should a single ML engineer use Tangle?

- Tracking and Reproducibility
  - Even if you are on your own, automatic tracking and version control are useful.
- Data passing, execution caching
  - No need to tinker with manual data caching between data transformations.
- Non-intrusive
  - Components wrap what you already have: any CLI program, any language, any container.
- Components as re-usable bits of knowledge
  - Like Lego pieces, components are self-contained and easy to reuse. Each component is like a simple-to-use function, not a complex framework/library full of classes.
  - Components can be shared between multiple pipelines. Components are independent: no dependency hell. Different versions can be used together if needed (e.g. to compare results).
  - Connect different languages and frameworks together: Python, Java, Shell, Ruby, C++, JS/TS.
  - Forgot how to write a Tensorflow training loop? Just look at a 50-line "Train Tensorflow model" component, not a 1000-line end-to-end tutorial. Or just use that component as-is.
## Comparison: Tangle vs other systems

| Feature | Tangle | Kubeflow Pipelines | Vertex Pipelines | Airflow |
| --- | --- | --- | --- | --- |
| Code & license | Open-source | Open-source | Proprietary | Open-source |
| Cloud support | Any cloud/local | Any Kubernetes | Google Cloud only | Local, hosted |
| Data passing | Good | Good | Yes, but Artifacts vs Properties friction | Rudimentary |
| Execution caching | Content-based,<br/>global,<br/>succeeded/running | Lineage-based,<br/>global,<br/>succeeded only | Lineage-based, per-pipeline,<br/>succeeded only | No |
| No-code visual pipeline editor UI | Yes | No | No | No |
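For intuition, "content-based" caching keys on the content of the work itself (the component definition plus its input values) rather than on the pipeline's lineage, which is why identical tasks in different pipelines can share cache entries globally. A minimal sketch; the exact key derivation Tangle uses is not specified here, and `cache_key` is illustrative:

```python
import hashlib
import json

def cache_key(component_spec: dict, input_values: dict) -> str:
    """Hash the component definition together with its inputs.
    Same content anywhere -> same key -> cache hit."""
    payload = json.dumps(
        {"component": component_spec, "inputs": input_values},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

spec = {"name": "Train model", "image": "python:3.11"}  # made-up example
k1 = cache_key(spec, {"epochs": 10})
k2 = cache_key(spec, {"epochs": 10})  # identical content, identical key
k3 = cache_key(spec, {"epochs": 20})  # changed input, new key
print(k1 == k2, k1 == k3)  # → True False
```

Lineage-based caching, by contrast, keys on where a task sits in a particular pipeline's history, so identical work in another pipeline misses the cache.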
## Tangle vs. Kubeflow Pipelines/Vertex Pipelines

- Same idea.
- Uses the same `ComponentSpec`/`component.yaml` format introduced in KFP v1.
  This means that components can be reused. The format has been stable since its inception in 2018.
  (Warning: KFP v2 went through many cross-incompatible component formats.)
- Tangle has better execution caching: content-based, global, and able to reuse running executions (vs. lineage-based, per-pipeline, succeeded executions only).
- Tangle can support different execution systems (different clouds).
  Kubeflow Pipelines can support any Kubernetes.
  Vertex Pipelines only supports Google Cloud Vertex AI.
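For reference, here is a minimal `ComponentSpec`, shown as the Python dict you would get after loading a `component.yaml`. The field names follow the KFP v1 format referenced above; the component itself is a made-up example, and the command is elided:

```python
# A minimal ComponentSpec as a plain dict (illustrative example).
component_spec = {
    "name": "Filter text",
    "inputs": [
        {"name": "Text", "type": "String"},
        {"name": "Pattern", "type": "String", "default": ".*"},
    ],
    "outputs": [
        {"name": "Filtered text", "type": "String"},
    ],
    "implementation": {
        "container": {
            "image": "python:3.11",
            "command": ["python3", "-u", "-c", "..."],
            # Placeholders tell the system what to substitute at run time:
            "args": [
                {"inputValue": "Pattern"},
                {"outputPath": "Filtered text"},
            ],
        }
    },
}
print(sorted(component_spec))  # → ['implementation', 'inputs', 'name', 'outputs']
```

The stability of this small surface (name, inputs/outputs, a containerized command) is what lets components written in 2018 still run today.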
## Comparison: Tangle vs Airflow

| Feature | Tangle | Airflow |
| --- | --- | --- |
| Component specification | Declarative. Can describe an arbitrary CLI program. | Python operator class.<br/>Airflow-specific. |
| Component code runs | Inside a container.<br/>Usually remote, distributed. | Local Python<br/>(but can call remote services). |
| Data passing | Defined inputs/outputs.<br/>Explicit data connections.<br/>Arbitrary files/directories. Big data.<br/>System-managed data storage and passing. | No defined inputs/outputs.<br/>No task data connections.<br/>XComs: a small bag of JSON data.<br/>No managed data storage. |
| Execution caching | Content-based.<br/>Global.<br/>Succeeded/running. | No |
## App features

- Start building pipelines right away
  - Intuitive visual drag-and-drop interface
  - No registration required to build. You own your data.
- Execute pipelines on your local machine or in the Cloud
  - Easily install the app on a local machine or deploy it to the cloud
  - Submit pipelines for execution with a single click.
  - Easily monitor all pipeline task executions, view the artifacts, and read the logs.
- Fast iteration
  - Clone any pipeline run and get a new editable pipeline
  - Create pipeline -> Submit run -> Monitor run -> Clone run -> Edit pipeline -> Submit run ...
- Automatic execution caching and reuse
  - Save time and compute. Don't re-do what's done.
  - Successful and even still-running executions are re-used from cache
- Reproducibility
  - All your runs are kept forever (on your machine): graph, logs, metadata
  - Re-run an old pipeline run with just two clicks (Clone pipeline, Submit run)
  - Containers and strict component versioning ensure reproducibility
- Pipeline Components
  - Time-proven `ComponentSpec`/`component.yaml` format
  - A library of preloaded components
  - Fast-growing public component ecosystem
  - Add your own components (public or private)
  - Easy to create your own components manually or using the Cloud Pipelines SDK
  - Components can be written in [any language](https://github.com/Ark-kun/pipeline_components/tree/master/components/sample) (Python, Shell, R, Java, C#, etc.).
  - Compatible with [Google Cloud Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) and [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/introduction/)
  - Lots of pre-built components on GitHub: [Ark-kun/pipeline_components](https://github.com/Ark-kun/pipeline_components/tree/master/components).

We have many exciting features planned, but we want to prioritize features based on user feedback.
## Credits

The Tangle app is based on the [Pipeline Editor](https://cloud-pipelines.net/pipeline-editor) app created by [Alexey Volkov](https://github.com/Ark-kun) as part of the [Cloud Pipelines](https://github.com/Cloud-Pipelines) project.
