SQLAnyWhere is a high-performance, cloud-native SQL query engine designed to process structured data efficiently and at scale. Built with Rust, it offers seamless cloud integration, enabling direct SQL queries on data stored in Amazon S3, local Parquet files, and other storage backends.
- Cloud-Native SQL Engine - Query data directly from Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage. (Note: only S3 is currently supported.)
- Blazing-Fast Performance - Optimized columnar execution using Apache Arrow & DataFusion.
- Flexible Data Storage - Supports Parquet, CSV, JSON, and Iceberg tables.
- Python & Rust Integration - Exposes query results via Arrow IPC for zero-copy transfer to Pandas/Spark (see the sketch after this list).
- Distributed Execution Ready - Designed for parallel execution in distributed environments.
- Pluggable Storage & Compute - Extendable to support custom data sources & compute backends.
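The zero-copy Pandas hand-off rests on the Arrow IPC stream format rather than on any SQLAnyWhere-specific API. Below is a minimal sketch, using only PyArrow and a synthetic table standing in for a query result, of the round trip the Python interface builds on: serialize to an IPC stream, read it back, and hand the columns to Pandas without re-parsing.

```python
import pyarrow as pa
import pyarrow.ipc as ipc

# A small Arrow table standing in for a query result.
table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Serialize to the Arrow IPC stream format (the wire format the
# engine emits for query results).
sink = pa.BufferOutputStream()
with ipc.new_stream(sink, table.schema) as writer:
    writer.write_table(table)
ipc_bytes = sink.getvalue()

# Read the stream back and convert to Pandas. Arrow reuses the
# underlying column buffers where dtypes allow, avoiding a copy.
df = ipc.open_stream(ipc_bytes).read_all().to_pandas()
print(df)
```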
Make sure Rust is already installed.

Create a conda env with Python 3.12:

```bash
conda create -n sql_anywhere python=3.12 -y
conda activate sql_anywhere
```

Install dependencies:

```bash
pip install -r interface/requirements.txt
```

Query your file:
```python
import asyncio

import pandas as pd
import pyarrow.ipc as ipc
from pyarrow import Table
from pyarrow.ipc import RecordBatchStreamReader

import sa_rust


async def run():
    stm = """
    SELECT
        *
    FROM
        "<file_protocol>://<absolute_file_path>"
    """
    # execute_sql returns the query result as Arrow IPC stream bytes
    binary_data: list = await sa_rust.execute_sql(stm)
    binary_bytes: bytes = bytes(binary_data)
    reader: RecordBatchStreamReader = ipc.open_stream(binary_bytes)
    # Read all Arrow data (without unnecessary deserialization)
    table: Table = reader.read_all()
    # Convert to a Pandas DataFrame
    df: pd.DataFrame = table.to_pandas()
    print(df)


asyncio.run(run())
```

File protocols:
- Local file: `file`, i.e. `file://<absolute_file_path>`.
- S3: `s3`, i.e. `s3://<s3_source_key>/<s3_file>` (a usage sketch follows this list).
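A hypothetical usage sketch for the `s3` protocol, reusing the `execute_sql` call from the example above. The `preview_s3` helper is illustrative, not part of the library; the bucket/key placeholders are unchanged from the list above; and it assumes AWS credentials are already available to the process (e.g. via the standard AWS environment variables):

```python
import asyncio

import pandas as pd
import pyarrow.ipc as ipc

import sa_rust


async def preview_s3(n: int = 10) -> pd.DataFrame:
    # LIMIT keeps the preview cheap; DataFusion's SQL dialect supports it.
    stm = f"""
    SELECT
        *
    FROM
        "s3://<s3_source_key>/<s3_file>"
    LIMIT {n}
    """
    binary_data = await sa_rust.execute_sql(stm)
    return ipc.open_stream(bytes(binary_data)).read_all().to_pandas()


print(asyncio.run(preview_s3()))
```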
See interface/example_py.py for a complete, runnable example.