## Lightweight Python Components
Instead of [the conventional approach](/docs/component-development/creating-components-generic), where you write code, containerize it, publish the image to a registry, and manually write YAML configuration files, you can let the Oasis CLI tool automate the entire process.

You write your Python code and run a single command. The system generates a YAML specification that runs your code as a command, so there are no Docker images or registries to manage. Because the Python code lives in the command line rather than inside the container image, you don't have to rebuild a container for every code change.

This approach is called "Lightweight Python Components". It removes the image build-and-publish steps between writing code and having a component, so you can iterate faster and focus on the code logic rather than the infrastructure.
```yaml
implementation:
# Generated wrapper code handles I/O
```
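To make "running code as a command" concrete, here is a minimal sketch in plain Python. It is not the Oasis CLI's generator: the `build_command` helper and the `shout` function are hypothetical, and a real generated specification also includes wrapper code for typed inputs and outputs. The sketch only shows that the function's source travels inside the command line, so changing the code does not require rebuilding an image.

```python
import subprocess
import sys

# Hypothetical user function, kept as source text so it can be embedded
# in a command line (a generator would extract this from your file).
FUNC_SOURCE = '''
def shout(text: str) -> str:
    return text.upper()
'''


def build_command(func_source: str, func_name: str, args: list[str]) -> list[str]:
    """Hypothetical helper: embed a function's source in a `python -c` command."""
    # The program is the function source plus a tiny wrapper that calls it with
    # the command-line arguments and prints the result. Real generated wrappers
    # also parse typed inputs and write outputs; this sketch only handles strings.
    program = func_source + (
        "\nimport sys\n"
        f"print({func_name}(*sys.argv[1:]))\n"
    )
    # With `python -c`, sys.argv[0] is "-c" and the extra list items follow it.
    return [sys.executable, "-c", program, *args]


if __name__ == "__main__":
    command = build_command(FUNC_SOURCE, "shout", ["hello"])
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    print(result.stdout.strip())  # HELLO
```

Editing `FUNC_SOURCE` changes what the command runs, while the interpreter (the "container") stays the same.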
## Tutorial: Creating a Lightweight Python Component
This guide walks you through creating a TangleML component that performs regex-based text replacement. The component reads an input text file, replaces all substrings matching a given regex pattern, and writes the result to an output file.

To combine flags, add their values:

```python
flags = 2 + 8  # IGNORECASE + MULTILINE = 10
```
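These values are Python's `re` flag constants, each of which occupies its own bit, so adding distinct flags gives the same result as combining them with `|`. The check below assumes the component passes the integer straight through to a function such as `re.sub`; the sample text and pattern are hypothetical. Adding only works when each flag is listed once, which is why `|` is the safer habit in Python code.

```python
import re

# Each regex flag is a distinct bit, so adding distinct flags equals OR-ing them.
print(int(re.IGNORECASE), int(re.MULTILINE))  # 2 8

flags = 2 + 8  # IGNORECASE + MULTILINE = 10
assert flags == int(re.IGNORECASE | re.MULTILINE)

# `^` now matches at every line start (MULTILINE), ignoring case (IGNORECASE).
text = "Hello world\nhello again"
print(re.sub(r"^hello", "Bye", text, flags=flags))
```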
</details>

## Afterthoughts
### The `InputPath` and `OutputPath` annotations
The `text_path: InputPath()` annotation tells the system to place the data for the `text` input into a file and pass that file's path to the function as the value of the `text_path` parameter. (The `_path` suffix is dropped from the input name.)

The `filtered_text_path: OutputPath()` annotation tells the system to generate a path and pass it to the function through the `filtered_text_path` parameter; the function writes its output data to that path. After the function finishes executing, the system takes the data the function wrote, puts it into storage, and makes it available to other components.
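The sketch below imitates, in plain Python, what these two annotations ask of the system. Everything except the `text_path` and `filtered_text_path` names is an assumption for illustration: `filter_text`, its `pattern`/`replacement` parameters, and `run_like_the_system` are hypothetical, and `InputPath`/`OutputPath` are local stand-ins so the snippet runs without the real tooling.

```python
import tempfile
from pathlib import Path


# Placeholders so this sketch runs on its own. In a real component, use the
# InputPath/OutputPath annotations provided by your component tooling instead.
def InputPath(type_name=None):
    return str


def OutputPath(type_name=None):
    return str


def filter_text(
    text_path: InputPath(),
    filtered_text_path: OutputPath(),
    pattern: str,
    replacement: str = "",
):
    # Imports inside the function keep it self-contained.
    import re
    from pathlib import Path

    # Read the input file the system placed for us...
    text = Path(text_path).read_text()
    # ...and write the result to the path the system gave us.
    Path(filtered_text_path).write_text(re.sub(pattern, replacement, text))


def run_like_the_system(func, input_text: str, **params) -> str:
    """Rough imitation of what the system does around a path-based function."""
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp, "inputs", "text")
        out_path = Path(tmp, "outputs", "filtered_text")
        in_path.parent.mkdir()
        out_path.parent.mkdir()
        in_path.write_text(input_text)  # 1. place the input data in a file
        func(text_path=str(in_path), filtered_text_path=str(out_path), **params)
        return out_path.read_text()  # 2. collect what the function wrote


print(run_like_the_system(filter_text, "a1b22c333", pattern=r"\d+"))  # abc
```

The function never sees storage or containers; it only reads one path and writes another, and the surrounding machinery moves the data.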
### Why do we need the `InputPath` parameter annotation?
Not all data can be passed or received as a simple string: binary data, large data, and directories are common examples. In these cases, the code should read the data from a file or directory pointed to by a path. This is why the function has a `text_path: InputPath()` parameter rather than a `text: str` parameter (although the latter could still work for short texts).

The annotation is also needed because the component function runs inside a hermetic container. The text file has to be placed inside that container, and only the system can do that.

Similarly, the `filtered_text_path: OutputPath()` annotation tells the system that it must get the output data out of the container when the function finishes executing.
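A small plain-Python illustration of the first point (the `is_valid_text` and `count_bytes` functions are hypothetical): arbitrary bytes are often not valid text, so they cannot travel as a `str` value, while reading them through a path works for any content.

```python
import tempfile
from pathlib import Path

raw = bytes([0xFF, 0xFE, 0x00, 0x80])  # arbitrary binary data


def is_valid_text(data: bytes) -> bool:
    """True if the bytes could be represented as a UTF-8 string value."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def count_bytes(data_path: str) -> int:
    # Hypothetical path-based function: reading bytes from a path works
    # regardless of encoding (a real function might stream large files).
    from pathlib import Path
    return len(Path(data_path).read_bytes())


print(is_valid_text(raw))  # False

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp, "data.bin")
    data_file.write_bytes(raw)
    size = count_bytes(str(data_file))

print(size)  # 4
```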
### Default parameter values
`create_component_from_func` supports functions with default parameter values. Parameters with defaults become optional component inputs.

Path parameters annotated with `InputPath()` can have a default value of `None`, which makes those file inputs optional.

Default values can use any Python built-in type. Only built-in types are allowed because the function needs to remain self-contained.
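```python
# `InputPath` comes from the component tooling (import not shown).

def some_func(
    some_int: int = 3,              # optional input with a default value
    some_path: InputPath() = None,  # optional file input
):
    # Imports go inside the function so it stays self-contained.
    from pathlib import Path

    if some_path:
        text = Path(some_path).read_text()
    ...
```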
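One way to picture how default values turn into optional inputs is to inspect the function signature, as a generator could. This is a hypothetical sketch built on Python's standard `inspect` module, not the Oasis CLI's actual code; `describe_inputs` and `example` are illustrative names.

```python
import inspect


def describe_inputs(func) -> list[dict]:
    """Hypothetical: derive input specs from a signature; defaults mark inputs optional."""
    specs = []
    for name, param in inspect.signature(func).parameters.items():
        spec = {"name": name}
        if param.annotation is not inspect.Parameter.empty:
            # Use the annotation's name (e.g. "int") when it has one.
            spec["type"] = getattr(param.annotation, "__name__", str(param.annotation))
        if param.default is not inspect.Parameter.empty:
            spec["optional"] = True
            spec["default"] = param.default
        specs.append(spec)
    return specs


def example(count: int, some_int: int = 3, label: str = "none"):
    ...


for spec in describe_inputs(example):
    print(spec)
```

`count` has no default, so it stays required; `some_int` and `label` come out as optional with their defaults attached.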
### Input and Output Types
While low-level TangleML does not enforce any types, the Oasis CLI generator (`components.create_component_from_func`) supports six basic Python types:

| Python type | Component type | Handling |
|---|---|---|
| `str` | `String` | Passed as-is |
| `int` | `Integer` | String to integer conversion |
| `float` | `Float` | String to float conversion |
| `bool` | `Boolean` | String to boolean conversion |
| `list` | `JsonArray` | JSON serialization |
| `dict` | `JsonObject` | JSON serialization |
+
534
+
:::tip
**Beyond the Basics**: You can use any type annotation (like `XGBoostModel`), but unsupported types will be passed as strings. The generator only adds serialization/deserialization for the six basic types.
:::
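The conversions in the table can be pictured as a lookup from annotation to parser. The sketch below is an approximation, not the generator's real wrapper code: the strings accepted for `Boolean` are a guess, and `deserialize` is a hypothetical name. An unsupported annotation such as `XGBoostModel` falls back to the raw string, matching the tip above.

```python
import json


def _parse_bool(value: str) -> bool:
    # Assumed spellings; the real generated wrapper may accept a different set.
    normalized = value.strip().lower()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0"):
        return False
    raise ValueError(f"Not a boolean: {value!r}")


# Command-line values always arrive as strings; these turn them into Python values.
DESERIALIZERS = {
    str: str,
    int: int,
    float: float,
    bool: _parse_bool,
    list: json.loads,  # JsonArray
    dict: json.loads,  # JsonObject
}


def deserialize(raw: str, annotation):
    # Unsupported annotations (e.g. "XGBoostModel") are passed through as str.
    return DESERIALIZERS.get(annotation, str)(raw)


print(deserialize("42", int), deserialize("true", bool), deserialize("[1, 2]", list))
print(repr(deserialize("model.bst", "XGBoostModel")))
```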
Function parameters (their names and type annotations) and the return annotation are mapped to component inputs and outputs. The following example demonstrates every aspect of this mapping:
```python
import typing

# InputPath and OutputPath come from the component tooling (import not shown).


def my_func(
    # Directly supported types:
    # Mapped to input with name "some_string" and type "String"
    some_string: str,
    # Mapped to input with name "some_integer" and type "Integer"
    some_integer: int,
    # Mapped to input with name "some_float" and type "Float"
    some_float: float,
    # Mapped to input with name "some_boolean" and type "Boolean"
    some_boolean: bool,
    # Mapped to input with name "some_list" and type "JsonArray"
    some_list: list,
    # Mapped to input with name "some_dict" and type "JsonObject"
    some_dict: dict,

    # Mapped to input with name "any_thing" and no type (compatible with any type. Will receive a string value at runtime!)
    any_thing,

    # Other types:
    # Mapped to input with name "some_uri" and type "Uri" (Will receive a string value at runtime!)
    some_uri: "Uri",
    # Mapped to input with name "some_bigint" and type "BigInt" (Will receive a string value at runtime!)
    some_bigint: "BigInt",

    # Paths:
    # Mapped to input with name "input1" (the "_path" suffix is removed)
    input1_path: InputPath(""),
    # Mapped to output with name "output1" and type "CSV" (the "_path" suffix is removed)
    output1_path: OutputPath("CSV"),
) -> typing.NamedTuple("Outputs", [
    # Mapped to output with name "output_string" and type "String"
    ("output_string", str),
    # Mapped to output with name "output_uri" and type "Uri" (function needs to return a string)
    ("output_uri", "Uri"),
]):
    ...
    return ("Some string", "some-uri://...")
```
| **Large artifacts** | 30 days (Shopify) | Metadata only (size, hash) |
| **Small values** | Permanent | Full value in database |

:::warning Data Retention
At Shopify, artifacts containing merchant or PII data are automatically deleted after 30 days due to compliance requirements. After deletion, you'll see metadata but get 404 errors when accessing the actual data.

:::info
Large artifacts are purged after 30 days due to [data retention policies](/docs/core-concepts/artifacts#data-retention-and-purging).