Commit 2ef619b

Add Pypaimon 0.2.0 release note (#14)
1 parent 494a75c commit 2ef619b

12 files changed (+201 -10 lines changed)


README.md

Lines changed: 3 additions & 0 deletions
@@ -126,8 +126,11 @@ The release notes are maintained in the `community/docs/releases` directory.
 title: "Release 0.9"
 type: release
 version: 0.9.0
+weight: 90
 ---
 ```
+The `weight` field is used to sort the release notes on the website. The higher the number, the earlier the release note is displayed.
+
 If you'd like to use some pictures in the markdown, you can save them in the `public/img` directory and use a relative path, such as `![image](./img/xxx.png)`.
 3. Update the latest version in `community/docs/downloads.md`.
 4. Commit the changes and push them to the repository.
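The website code that consumes `weight` is not part of this commit. As a plain-Python sketch of the ordering rule stated in the new README text, the hypothetical helpers `parse_front_matter` and `sort_release_notes` below parse a minimal `key: value` front matter (a simplification; real front matter is YAML) and list versions from highest to lowest weight, using values added in this commit.

```python
# Hypothetical sketch: order release notes by their front-matter `weight`,
# highest first. Not the website's actual implementation.
from typing import Dict, List


def parse_front_matter(text: str) -> Dict[str, str]:
    """Parse a minimal `key: value` block between two `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != '---':
        raise ValueError('missing front matter')
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == '---':
            break
        key, _, value = line.partition(':')
        meta[key.strip()] = value.strip().strip('"')
    return meta


def sort_release_notes(notes: List[str]) -> List[str]:
    """Return versions ordered by descending weight (higher is shown first)."""
    metas = [parse_front_matter(note) for note in notes]
    metas.sort(key=lambda meta: int(meta['weight']), reverse=True)
    return [meta['version'] for meta in metas]


notes = [
    '---\ntitle: "Release 0.8.1"\ntype: release\nversion: 0.8.1\nweight: 81\n---',
    '---\ntitle: "Release 0.9"\ntype: release\nversion: 0.9.0\nweight: 90\n---',
    '---\ntitle: "PyPaimon Release 0.2.0"\ntype: release\nversion: pypaimon-0.2.0\nweight: 91\n---',
    '---\ntitle: "Release 0.4"\ntype: release\nversion: 0.4.0\nweight: 40\n---',
]
print(sort_release_notes(notes))
# ['pypaimon-0.2.0', '0.9.0', '0.8.1', '0.4.0']
```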

community/docs/releases/release-0.4.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.4"
 type: release
 version: 0.4.0
+weight: 40
 ---

 # Apache Paimon 0.4 Available

community/docs/releases/release-0.5.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.5"
 type: release
 version: 0.5.0
+weight: 50
 ---

 # Apache Paimon 0.5 Available

community/docs/releases/release-0.6.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.6"
 type: release
 version: 0.6.0
+weight: 60
 ---

 # Apache Paimon 0.6 Available

community/docs/releases/release-0.7.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.7"
 type: release
 version: 0.7.0
+weight: 70
 ---

 # Apache Paimon 0.7 Available

community/docs/releases/release-0.8.1.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.8.1"
 type: release
 version: 0.8.1
+weight: 81
 ---

 # Apache Paimon 0.8.1 Available

community/docs/releases/release-0.8.2.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.8.2"
 type: release
 version: 0.8.2
+weight: 82
 ---

 # Apache Paimon 0.8.2 Available

community/docs/releases/release-0.8.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.8"
 type: release
 version: 0.8.0
+weight: 80
 ---

 # Apache Paimon 0.8 Available

community/docs/releases/release-0.9.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
 title: "Release 0.9"
 type: release
 version: 0.9.0
+weight: 90
 ---

 # Apache Paimon 0.9 Available
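Across the Paimon release notes in this commit, the weights appear to follow `minor * 10 + patch` (0.8.1 gets 81, 0.9.0 gets 90), and PyPaimon 0.2.0's weight of 91 seems chosen so that it sorts just above Release 0.9. This is an observation from these files, not a documented rule. The hypothetical `weight_for` function below checks the pattern against every weight added here and refuses versions the pattern cannot order (a major version other than 0, or a patch number of 10 or more).

```python
# Sketch of the numbering pattern the weights in this commit appear to follow
# for Paimon 0.x.y releases: weight = minor * 10 + patch.
# This is an observation about these files, not a documented rule.


def weight_for(version: str) -> int:
    major, minor, patch = (int(part) for part in version.split('.'))
    if major != 0 or patch >= 10:
        # e.g. 1.0.0 would map to 0 and 0.8.10 would collide with 0.9.0 (90)
        raise ValueError(f'pattern does not cover {version}')
    return minor * 10 + patch


observed = {
    '0.4.0': 40, '0.5.0': 50, '0.6.0': 60, '0.7.0': 70,
    '0.8.0': 80, '0.8.1': 81, '0.8.2': 82, '0.9.0': 90,
}
for version, weight in observed.items():
    assert weight_for(version) == weight, version
print('all observed weights match')
```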
Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@

---
title: "PyPaimon Release 0.2.0"
type: release
version: pypaimon-0.2.0
weight: 91
---

# PyPaimon 0.2.0 Available

Dec 19, 2024 - Zelin Yu (yuzelin.yzl@gmail.com)

The Apache Paimon PMC officially announces the release of PyPaimon 0.2.0. Since 0.1.0 was never released, this is the first version of PyPaimon.

## What is PyPaimon?

[PyPaimon](https://github.com/apache/paimon-python) is the Python SDK of Apache Paimon. It lets users read data from Paimon tables into Python for data analysis and write data back to Paimon tables.

## Version Overview

The first version of PyPaimon supports the following features:

1. Connect to a `Catalog`.
2. Get or create a table.
3. Batch read: filter and projection pushdown, and parallel reading of data as Apache Arrow, Pandas, DuckDB and Ray formats.
4. Batch write: insert into or overwrite a table with Apache Arrow and Pandas data.

The detailed documentation can be found at https://paimon.apache.org/docs/master/program-api/python-api/.

### Connect to Catalog

You can create a `Catalog` with options, just like in SQL:

```python
from pypaimon.py4j import Catalog

catalog_options = {
    'warehouse': 'path/to/warehouse',
    'metastore': 'filesystem',
    # other options
}

catalog = Catalog.create(catalog_options)
```
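To illustrate the "just like in SQL" correspondence, assuming Flink SQL as the SQL side, the same options dict can be rendered as a `CREATE CATALOG ... WITH (...)` statement, where Flink selects the Paimon catalog through `'type' = 'paimon'`. The helper `to_create_catalog_sql` is hypothetical and not part of PyPaimon; it only shows that the keys and values carry over unchanged.

```python
# Hypothetical helper (not part of PyPaimon): render catalog options as the
# equivalent Flink SQL CREATE CATALOG statement.
from typing import Dict


def to_create_catalog_sql(name: str, options: Dict[str, str]) -> str:
    """Render catalog options as a Flink SQL CREATE CATALOG statement."""

    def quote(text: str) -> str:
        # SQL string literal: wrap in single quotes, doubling embedded quotes
        return "'" + text.replace("'", "''") + "'"

    # Flink selects the Paimon catalog via 'type' = 'paimon'; every other
    # entry is passed through unchanged as a catalog option.
    entries = {'type': 'paimon', **options}
    body = ',\n'.join(f'    {quote(k)} = {quote(v)}' for k, v in entries.items())
    return f'CREATE CATALOG {name} WITH (\n{body}\n);'


catalog_options = {
    'warehouse': 'path/to/warehouse',
    'metastore': 'filesystem',
}
print(to_create_catalog_sql('my_catalog', catalog_options))
```

For the options above, this prints a four-line `WITH` clause containing `'type' = 'paimon'`, `'warehouse' = 'path/to/warehouse'` and `'metastore' = 'filesystem'`.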

You can connect to any catalog supported by Paimon's Java implementation. PyPaimon has built-in support for the `filesystem`, `jdbc` and `hive` catalogs.
To connect to a self-defined catalog, add its dependency jars to the classpath as follows:

```python
import os
from pypaimon.py4j import constants

# add every jar under /path/to/jars/ to the Java classpath
os.environ[constants.PYPAIMON_JAVA_CLASSPATH] = '/path/to/jars/*'
```

### Get or create table

You can get an existing table from the `Catalog` by its identifier:

```python
table = catalog.get_table('database_name.table_name')
```

You can also create a new table. The table fields are defined by a `pyarrow.Schema`, and you can also set primary keys, partition keys, table options and a comment.

```python
import pyarrow as pa
from pypaimon import Schema

# field definitions
pa_schema = pa.schema([
    ('dt', pa.string()),
    ('hh', pa.string()),
    ('pk', pa.int64()),
    ('value', pa.string())
])

# table schema
schema = Schema(
    pa_schema=pa_schema,
    partition_keys=['dt', 'hh'],
    primary_keys=['dt', 'hh', 'pk'],
    options={'bucket': '2'},
    comment='my test table'
)

# create table
catalog.create_table(identifier='default.test_table', schema=schema, ignore_if_exists=False)
```

You can then obtain the read and write interfaces from the table.
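In the `Schema` example, the primary keys `['dt', 'hh', 'pk']` include both partition keys. Paimon's documentation describes cross-partition upserts (primary keys that do not contain all partition fields) as a dynamic-bucket feature (`bucket = -1`), so with a fixed `'bucket': '2'` the primary keys are expected to cover `dt` and `hh`. The hypothetical `check_keys` function below is a plain-Python pre-flight sketch of that rule over field names; it is not a PyPaimon API, and Paimon performs its own validation when the table is created.

```python
# Hypothetical pre-flight check (not a PyPaimon API) for the key settings.
from typing import List


def check_keys(fields: List[str], partition_keys: List[str],
               primary_keys: List[str], bucket: int) -> None:
    """Raise ValueError if the key settings look inconsistent."""
    unknown = [k for k in partition_keys + primary_keys if k not in fields]
    if unknown:
        raise ValueError(f'unknown key fields: {unknown}')
    missing = [k for k in partition_keys if k not in primary_keys]
    # With a fixed bucket count, primary keys must cover all partition keys;
    # cross-partition upserts need dynamic bucket mode (bucket = -1).
    if primary_keys and missing and bucket != -1:
        raise ValueError(f'primary keys must include partition keys: {missing}')


fields = ['dt', 'hh', 'pk', 'value']
check_keys(fields, ['dt', 'hh'], ['dt', 'hh', 'pk'], bucket=2)  # passes
try:
    check_keys(fields, ['dt', 'hh'], ['pk'], bucket=2)
except ValueError as err:
    print(err)  # primary keys must include partition keys: ['dt', 'hh']
```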

### Batch read

Assume that you already have the table `default.test_table` described in the previous section. Let's see how to read data from it.

```python
from pypaimon.py4j import Catalog

# set 'max-workers' (number of threads) for parallel reading
catalog_options = {
    'warehouse': 'path/to/warehouse',
    'metastore': 'filesystem',
    'max-workers': '4'
}
catalog = Catalog.create(catalog_options)
table = catalog.get_table('default.test_table')

# use ReadBuilder to perform filter and projection pushdown
read_builder = table.new_read_builder()

# select partition: dt='2024-12-01', hh='12'
predicate_builder = read_builder.new_predicate_builder()
dt_predicate = predicate_builder.equal('dt', '2024-12-01')
hh_predicate = predicate_builder.equal('hh', '12')
partition_predicate = predicate_builder.and_([dt_predicate, hh_predicate])
read_builder = read_builder.with_filter(partition_predicate)

# select pk and value
read_builder = read_builder.with_projection(['pk', 'value'])

# plan splits
table_scan = read_builder.new_scan()
splits = table_scan.plan().splits()

# read data to pandas.DataFrame
table_read = read_builder.new_read()
df = table_read.to_pandas(splits)
```

You can then analyze the DataFrame with Python.
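PyPaimon performs the filtering, projection and parallel reading itself, and in Paimon a partition predicate can prune whole partitions during scan planning rather than checking rows one by one. The toy model below only shows what result the ReadBuilder settings request and how `max-workers` threads can share the splits. It is plain Python with hypothetical `read_split`/`read_all` helpers, and in-memory lists of rows stand in for splits.

```python
# Toy model (not PyPaimon internals): each split is read on a worker thread,
# the predicate keeps matching rows, and the projection keeps chosen fields.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]


def read_split(split: List[Row], predicate: Callable[[Row], bool],
               projection: List[str]) -> List[Row]:
    """Keep rows matching the predicate, then keep only projected fields."""
    return [{name: row[name] for name in projection}
            for row in split if predicate(row)]


def read_all(splits: List[List[Row]], predicate: Callable[[Row], bool],
             projection: List[str], max_workers: int = 4) -> List[Row]:
    """Read splits on a thread pool and concatenate results in split order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = list(pool.map(
            lambda split: read_split(split, predicate, projection), splits))
    return [row for part in parts for row in part]


def in_partition(row: Row) -> bool:
    # equivalent of dt = '2024-12-01' AND hh = '12'
    return row['dt'] == '2024-12-01' and row['hh'] == '12'


splits = [
    [{'dt': '2024-12-01', 'hh': '12', 'pk': 1, 'value': 'a'},
     {'dt': '2024-12-01', 'hh': '11', 'pk': 2, 'value': 'b'}],
    [{'dt': '2024-12-01', 'hh': '12', 'pk': 3, 'value': 'c'}],
]
print(read_all(splits, in_partition, ['pk', 'value']))
# [{'pk': 1, 'value': 'a'}, {'pk': 3, 'value': 'c'}]
```

`pool.map` preserves input order, so the concatenated result is deterministic regardless of which thread finishes first.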

### Batch write

Assume that you already have the table `default.test_table` described in the previous section. Let's see how to write to it or overwrite it.

First, assume that you have a DataFrame with data for 2024-12-02, 12 o'clock (partition `dt='2024-12-02', hh='12'`), and you want to write it into the table.

```python
write_builder = table.new_batch_write_builder()
table_write = write_builder.new_write()
table_commit = write_builder.new_commit()

# you can write data many times before committing
dataframe = ...  # a pandas.DataFrame matching the table schema
table_write.write_pandas(dataframe)

commit_messages = table_write.prepare_commit()
table_commit.commit(commit_messages)

table_write.close()
table_commit.close()
```
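The split between writing, `prepare_commit()` and `commit()` can be pictured with the toy in-memory model below. It is a hypothetical plain-Python sketch (the `Toy*` classes are not PyPaimon classes) that assumes written data stays invisible to readers until the commit, which is the purpose of a separate commit step; it does not model files, snapshots or failure handling.

```python
# Toy in-memory model (hypothetical, not PyPaimon) of a two-phase batch write:
# write() buffers data, prepare_commit() returns commit messages, and only
# commit() makes the rows visible in the table.
from typing import Dict, List

Row = Dict[str, object]


class ToyTable:
    def __init__(self) -> None:
        self.rows: List[Row] = []  # committed, visible rows


class ToyWrite:
    def __init__(self) -> None:
        self._buffer: List[Row] = []

    def write(self, rows: List[Row]) -> None:
        # may be called many times; nothing becomes visible yet
        self._buffer.extend(rows)

    def prepare_commit(self) -> List[List[Row]]:
        # hand the buffered data over as commit messages and start a new buffer
        messages, self._buffer = [self._buffer], []
        return messages


class ToyCommit:
    def __init__(self, table: ToyTable) -> None:
        self._table = table

    def commit(self, messages: List[List[Row]]) -> None:
        # only at this point do the written rows become visible to readers
        for message in messages:
            self._table.rows.extend(message)


toy_table = ToyTable()
toy_write, toy_commit = ToyWrite(), ToyCommit(toy_table)
toy_write.write([{'dt': '2024-12-02', 'hh': '12', 'pk': 1, 'value': 'a'}])
toy_write.write([{'dt': '2024-12-02', 'hh': '12', 'pk': 2, 'value': 'b'}])
print(len(toy_table.rows))  # 0: nothing is visible before the commit
toy_commit.commit(toy_write.prepare_commit())
print(len(toy_table.rows))  # 2
```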

Next, let's see how to overwrite the partition `dt='2024-12-02', hh='12'` with new data:

```python
write_builder = table.new_batch_write_builder()
# set the partition to overwrite
write_builder = write_builder.overwrite({'dt': '2024-12-02', 'hh': '12'})

table_write = write_builder.new_write()
table_commit = write_builder.new_commit()

# then write data
dataframe = ...  # new data for the overwritten partition
table_write.write_pandas(dataframe)

commit_messages = table_write.prepare_commit()
table_commit.commit(commit_messages)

table_write.close()
table_commit.close()
```
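Semantically, a static-partition overwrite replaces only the rows of the named partition and leaves every other partition untouched. The toy function below sketches that effect in plain Python, using a hypothetical `overwrite_partition` helper over in-memory rows; it does not model how PyPaimon implements the overwrite, nor what happens to new rows that fall outside the named partition.

```python
# Toy model (hypothetical, not PyPaimon) of a static-partition overwrite.
from typing import Dict, List

Row = Dict[str, object]


def overwrite_partition(existing: List[Row], partition: Dict[str, str],
                        new_rows: List[Row]) -> List[Row]:
    """Replace the rows of one static partition; keep all other rows."""
    def in_partition(row: Row) -> bool:
        return all(row[key] == value for key, value in partition.items())

    kept = [row for row in existing if not in_partition(row)]
    return kept + new_rows


existing = [
    {'dt': '2024-12-01', 'hh': '12', 'pk': 1, 'value': 'old'},
    {'dt': '2024-12-02', 'hh': '12', 'pk': 1, 'value': 'old'},
    {'dt': '2024-12-02', 'hh': '12', 'pk': 2, 'value': 'old'},
]
new_rows = [{'dt': '2024-12-02', 'hh': '12', 'pk': 3, 'value': 'new'}]
result = overwrite_partition(existing, {'dt': '2024-12-02', 'hh': '12'}, new_rows)
print([(row['dt'], row['pk'], row['value']) for row in result])
# [('2024-12-01', 1, 'old'), ('2024-12-02', 3, 'new')]
```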

### Various data formats

PyPaimon supports reading data in the following formats: Pandas, Apache Arrow and DuckDB, and writing data in the following formats: Pandas and Apache Arrow. Please refer to the [documentation](https://paimon.apache.org/docs/master/program-api/python-api/) for details.
