Commit 17e1f44

Content fixes (#16)

1 parent 5d7db42 commit 17e1f44

File tree

9 files changed: +874 -123 lines changed

docs/component-development/creating-components-generic.mdx

Lines changed: 590 additions & 0 deletions
Large diffs are not rendered by default.

docs/component-development/creating-components.mdx

Lines changed: 2 additions & 1 deletion

````diff
@@ -13,7 +13,8 @@ Learn more about the Oasis CLI tool in the [Oasis CLI Manual](/docs/component-de

 ## Lightweight Python Components

-Instead of rebuilding containers for every code change, the Python code goes **in the command line**, outside the container:
+As you learn in the [Creating Components](/docs/component-development/creating-components-generic) guide, you can create a component by writing code and containerizing it. But this approach can be time-consuming.
+For Python functions, you can use the Oasis CLI tool to generate a component specification from your function. Instead of rebuilding containers for every code change, the Python code goes **in the command line**, outside the container:

 ```yaml
 implementation:
````
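The idea above, code traveling on the command line rather than baked into the image, can be sketched with plain `subprocess`. This is a hypothetical stand-alone illustration, not the Oasis CLI itself; the CLI generates a full component specification around this mechanism:

```python
import subprocess
import sys

# The function body travels as a command-line argument, so changing the code
# requires no container rebuild. (Illustrative sketch only.)
code = """
import sys
total = sum(int(x) for x in sys.argv[1:])
print(total)
"""

# Run the embedded code the way a container entrypoint would: the code string
# is passed via `-c`, and the remaining argv entries are the component inputs.
result = subprocess.run(
    [sys.executable, "-c", code, "1", "2", "3"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # → 6
```

Editing the `code` string and re-running is the whole iteration loop; the container image never changes.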

docs/core-concepts/caching.mdx

Lines changed: 54 additions & 14 deletions

````diff
@@ -4,6 +4,9 @@ sidebar_label: Caching
 description: Learn how TangleML's sophisticated caching system saves time and resources
 ---

+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";
+
 # Understanding Caching in TangleML

 TangleML's caching system is one of its most powerful features, designed to dramatically reduce compute time and accelerate your ML pipeline iterations. Unlike traditional pipeline systems, TangleML implements sophisticated caching strategies that can save hours or even days of computation time.
@@ -131,35 +134,72 @@ After purging, you can still see:

 When you need fresh data despite caching:

-### 1. Date Parameters
+### 1. Cache Breaker Input (Nonce)

-For database queries, include cutoff dates:
+The nonce is used to introduce non-determinism to the component, ensuring that the component is re-executed even if the inputs haven't changed.
+The nonce can be a random string or the current time from the caller.

 ```python
 @component
-def fetch_user_data(end_date: str) -> Dataset:
-    # Changing end_date naturally breaks cache
+def search_web(
+    query: str,
+    nonce=None,  # type: str | None
+) -> Results:
+    # Pass a timestamp or random value to the nonce parameter
     ...
 ```

-### 2. Cache Breaker Input
+For scheduled runs, pipelines may use a placeholder for the incoming timestamp.

-For volatile sources without date control:
+<Tabs>
+<TabItem value="Template with placeholder">
+```yaml
+...
+inputs:
+  - name: Pipeline Creation Time
+    type: String
+    default: '{{CreationTime}}'
+...
+```
+</TabItem>

-```python
-@component
-def search_web(query: str, cache_breaker: str = "") -> Results:
-    # Pass timestamp or random value to cache_breaker
-    ...
+<TabItem value="Pipeline with actual value">
+```yaml
+...
+inputs:
+  - name: Pipeline Creation Time
+    type: String
+    default: '2025-11-07 05:30:07.837468+00:00'
+...
+```
+</TabItem>
+</Tabs>
+
+This substituted input can be used to pass the cache-breaker value to the component:
+
+```yaml
+...
+arguments:
+  nonce:
+    graphInput:
+      inputName: Pipeline Creation Time
 ```

-### 3. Disable Caching
+### 2. Disable Caching via `maxCacheStaleness`

 For components that should never cache:

 ```yaml
-caching_strategy:
-  max_cache_staleness: "P0D"  # Disables caching
+...
+implementation:
+  graph:
+    tasks:
+      ...
+      Fill all missing values using Pandas on CSV data:
+        executionOptions:
+          cachingStrategy:
+            maxCacheStaleness: P0D
 ```

 <img src={require("./assets/Caching_DisableCache.png").default} alt="Disable Caching" />
````
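On the caller side, a nonce is just any value that differs between runs. A minimal sketch of the two choices the diff above mentions (the `make_nonce` helper is hypothetical, not part of TangleML's API):

```python
import uuid
from datetime import datetime, timezone

def make_nonce(use_time: bool = True) -> str:
    """Produce a cache-breaking value: a timestamp or a random string."""
    if use_time:
        # Current time: distinct on every scheduled run
        return datetime.now(timezone.utc).isoformat()
    # Random value: distinct on every call
    return uuid.uuid4().hex

a = make_nonce(use_time=False)
b = make_nonce(use_time=False)
print(a != b)  # distinct nonces force re-execution despite identical inputs
```

Either flavor works; the only requirement is that the value changes whenever a fresh execution is wanted.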

docs/index.md

Lines changed: 0 additions & 105 deletions
This file was deleted.

docs/index.mdx

Lines changed: 136 additions & 0 deletions

---
id: overview
title: Overview
slug: /
---
# Tangle

Tangle is a service and a Web app that allows users to build and run Machine Learning pipelines using drag and drop, without having to set up a development environment.

[![image](https://github.com/user-attachments/assets/0ce7ccc0-dad7-4f6a-8677-f2adcd83f558)](https://tangleml.com/tangle-ui)
## What does a pipeline system do in a nutshell?

A pipeline system like Tangle (also Cloud Pipelines/Vertex Pipelines/Kubeflow Pipelines):

- **Orchestrates** (distributed execution, scheduling, data passing, caching)
- **Containerized** (user code runs isolated inside containers)
- **Command-line** (the true interface with user code is the command line)
- **Programs** (e.g. not functions passing shared in-memory objects)
## What is a pipeline?

A pipeline is a graph of tasks connected to each other (task outputs connected to task inputs).
Tasks are instances of components.
Components have a name, inputs/outputs, and an implementation (a program).

A pipeline can be submitted for execution as a pipeline run.

When tasks are executed, they read input data, process it, and produce output data.
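The concepts above can be pictured with a toy sketch. The classes and the `execute` helper are illustrative only, not Tangle's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """A component: a name plus an implementation (a program)."""
    name: str
    run: callable

@dataclass
class Task:
    """A task is an instance of a component; `upstream` lists the tasks
    whose outputs feed this task's inputs."""
    component: Component
    upstream: list = field(default_factory=list)

def execute(pipeline_tasks):
    """Run tasks in dependency order, passing outputs to inputs."""
    results = {}
    def run(task):
        if id(task) not in results:
            inputs = [run(dep) for dep in task.upstream]   # read input data
            results[id(task)] = task.component.run(*inputs)  # produce output
        return results[id(task)]
    return [run(t) for t in pipeline_tasks]

# A three-task pipeline: Load -> Double -> Sum
load = Task(Component("Load", lambda: [1, 2, 3]))
double = Task(Component("Double", lambda xs: [x * 2 for x in xs]), upstream=[load])
total = Task(Component("Sum", lambda xs: sum(xs)), upstream=[double])
print(execute([total]))  # → [12]
```

A real pipeline system adds what the sketch omits: containers, distributed execution, persisted artifacts, and caching.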
## Demo

[Demo](https://tangleml-tangle.hf.space/#/quick-start)

The experimental new version of the Tangle app is now available at [https://tangleml-tangle.hf.space/#/quick-start](https://tangleml-tangle.hf.space/#/quick-start). No registration is required to experiment with building pipelines. To be able to execute the pipelines, follow the [installation instructions](#installation).

Please check it out and report any bugs you find using [GitHub Issues](https://github.com/tangleml/tangle/issues).

The app is under active development, so expect some breakage as we work on it, and do not rely on the app for production.
## Why should a company use Tangle?

- Tracking and Reproducibility
  - Pipeline runs are recorded: graph, logs, artifact metadata (size), and small values like metrics.
  - Intermediate data is immutable and never overwritten. This de-risks experimentation and sharing.
  - Each pipeline run can be cloned and re-submitted, producing the same results and the same models.
  - All components are strictly versioned.
- Time and compute savings due to execution caching
  - Pipeline tasks that were previously executed are reused, saving time and compute.
- Sharing
  - Team members can easily share pipeline runs. A user can easily investigate a teammate's pipeline issue, or clone a teammate's pipeline, modify it, and submit it.
- Component library
  - The team can create a library of reusable components that can be used by all team members.
- Ease of onboarding. Can be used by non-engineers.
  - Users can create and run pipelines without writing code. No need to set up a dev environment.
  - PMs can examine pipeline runs, track metrics, and even run their own experiments.
## Why should a single ML engineer use Tangle?

- Tracking and Reproducibility
  - Even if you are on your own, automatic tracking and version control are useful.
- Data passing, execution caching
  - No need to tinker with manual data caching between data transformations.
- Non-intrusive
  - Components wrap what you already have: any CLI program, any language, any container.
- Components as re-usable bits of knowledge
  - Like Lego pieces, components are self-contained and easy to reuse. Each component is like a simple-to-use function, not a complex framework/library full of classes.
  - Components can be shared between multiple pipelines. Components are independent: no dependency hell. Different versions can be used together if needed (e.g. to compare results).
  - Connect different languages and frameworks together: Python, Java, Shell, Ruby, C++, JS/TS.
  - Forgot how to write a Tensorflow training loop? Just look at a 50-line "Train Tensorflow model" component, not a 1000-line end-to-end tutorial. Or just use that component as-is.
## Comparison: Tangle vs other systems

| Feature | Tangle | Kubeflow Pipelines | Vertex Pipelines | Airflow |
| --- | --- | --- | --- | --- |
| Code & license | Open-source | Open-source | Proprietary | Open-source |
| Cloud support | Any cloud/local | Any Kubernetes | Google Cloud only | Local, hosted |
| Data passing | Good | Good | Yes, but Artifacts vs Properties friction | Rudimentary |
| Execution caching | Content-based,<br/>global,<br/>succeeded/running | Lineage-based,<br/>global,<br/>succeeded only | Lineage-based, per-pipeline,<br/>succeeded only | No |
| No-code visual pipeline editor UI | Yes | No | No | No |
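For intuition, "content-based" caching keys on the content of the work itself (the component definition plus its input values) rather than on the pipeline's lineage, which is why identical tasks in different pipelines can share cache entries globally. A minimal sketch; the exact key derivation Tangle uses is not specified here, and `cache_key` is illustrative:

```python
import hashlib
import json

def cache_key(component_spec: dict, input_values: dict) -> str:
    """Hash the component definition together with its inputs.
    Same content anywhere -> same key -> cache hit."""
    payload = json.dumps(
        {"component": component_spec, "inputs": input_values},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

spec = {"name": "Train model", "image": "python:3.11"}  # made-up example
k1 = cache_key(spec, {"epochs": 10})
k2 = cache_key(spec, {"epochs": 10})  # identical content, identical key
k3 = cache_key(spec, {"epochs": 20})  # changed input, new key
print(k1 == k2, k1 == k3)  # → True False
```

Lineage-based caching, by contrast, keys on where a task sits in a particular pipeline's history, so identical work in another pipeline misses the cache.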
## Tangle vs. Kubeflow Pipelines/Vertex Pipelines

- Same idea.
- Uses the same `ComponentSpec`/`component.yaml` format introduced in KFP v1.
  This means that components can be reused. The format has been stable since its inception in 2018.
  (Warning: KFP v2 went through many cross-incompatible component formats.)
- Tangle has better execution caching: content-based, global, and able to reuse running executions (vs. lineage-based, per-pipeline, succeeded executions only).
- Tangle can support different execution systems (different clouds).
  Kubeflow Pipelines can support any Kubernetes.
  Vertex Pipelines only supports Google Cloud Vertex AI.
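For reference, here is a minimal `ComponentSpec`, shown as the Python dict you would get after loading a `component.yaml`. The field names follow the KFP v1 format referenced above; the component itself is a made-up example, and the command is elided:

```python
# A minimal ComponentSpec as a plain dict (illustrative example).
component_spec = {
    "name": "Filter text",
    "inputs": [
        {"name": "Text", "type": "String"},
        {"name": "Pattern", "type": "String", "default": ".*"},
    ],
    "outputs": [
        {"name": "Filtered text", "type": "String"},
    ],
    "implementation": {
        "container": {
            "image": "python:3.11",
            "command": ["python3", "-u", "-c", "..."],
            # Placeholders tell the system what to substitute at run time:
            "args": [
                {"inputValue": "Pattern"},
                {"outputPath": "Filtered text"},
            ],
        }
    },
}
print(sorted(component_spec))  # → ['implementation', 'inputs', 'name', 'outputs']
```

The stability of this small surface (name, inputs/outputs, a containerized command) is what lets components written in 2018 still run today.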
## Comparison: Tangle vs Airflow

| Feature | Tangle | Airflow |
| --- | --- | --- |
| Component specification | Declarative. Can describe an arbitrary CLI program. | Python operator class.<br/>Airflow-specific. |
| Component code runs | Inside a container.<br/>Usually remote, distributed. | Local Python<br/>(but can call remote services). |
| Data passing | Defined inputs/outputs.<br/>Explicit data connections.<br/>Arbitrary files/directories. Big data.<br/>System-managed data storage and passing. | No defined inputs/outputs.<br/>No task data connections.<br/>XComs: a small bag of JSON data.<br/>No managed data storage. |
| Execution caching | Content-based.<br/>Global.<br/>Succeeded/running. | No |
## App features

- Start building pipelines right away
  - Intuitive visual drag-and-drop interface
  - No registration required to build. You own your data.
- Execute pipelines on your local machine or in the Cloud
  - Easily install the app on a local machine or deploy it to the cloud
  - Submit pipelines for execution with a single click.
  - Easily monitor all pipeline task executions, view the artifacts, and read the logs.
- Fast iteration
  - Clone any pipeline run and get a new editable pipeline
  - Create pipeline -> Submit run -> Monitor run -> Clone run -> Edit pipeline -> Submit run ...
- Automatic execution caching and reuse
  - Save time and compute. Don't re-do what's done.
  - Successful and even still-running executions are re-used from cache
- Reproducibility
  - All your runs are kept forever (on your machine): graph, logs, metadata
  - Re-run an old pipeline run with just two clicks (Clone pipeline, Submit run)
  - Containers and strict component versioning ensure reproducibility
- Pipeline Components
  - Time-proven `ComponentSpec`/`component.yaml` format
  - A library of preloaded components
  - Fast-growing public component ecosystem
  - Add your own components (public or private)
  - Easy to create your own components manually or using the Cloud Pipelines SDK
  - Components can be written in [any language](https://github.com/Ark-kun/pipeline_components/tree/master/components/sample) (Python, Shell, R, Java, C#, etc.).
  - Compatible with [Google Cloud Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) and [Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/introduction/)
  - Lots of pre-built components on GitHub: [Ark-kun/pipeline_components](https://github.com/Ark-kun/pipeline_components/tree/master/components).

We have many exciting features planned, but we want to prioritize features based on user feedback.
## Credits

The Tangle app is based on the [Pipeline Editor](https://cloud-pipelines.net/pipeline-editor) app created by [Alexey Volkov](https://github.com/Ark-kun) as part of the [Cloud Pipelines](https://github.com/Cloud-Pipelines) project.
