Skip to content

Add Native Vector Data Type Support #29214

@navidnasyry

Description

@navidnasyry

Feature Request / Improvement

Background

Modern data architectures increasingly rely on vector embeddings for AI/ML workflows, semantic search, and recommendation systems. Major databases now support vector types:

  • PostgreSQL (pgvector extension)
  • Oracle 23c (VECTOR type)
  • Elasticsearch (dense_vector field type)
  • MySQL 9.0 (VECTOR type)

Problem Statement

Trino currently cannot:

  1. Read vector columns from sources that support them (Elasticsearch dense_vector, Oracle 23c VECTOR)
  2. Store or pass vector data between connectors
  3. Perform basic vector operations needed for ML pipelines

This forces users to maintain separate query paths for vector operations, breaking Trino's "query anything, anywhere" value proposition.

Proposed Solution

Add a native VECTOR or ARRAY<DOUBLE> data type with:

Phase 1: Basic Support

  • Read vector columns from Elasticsearch connector
  • Read vector columns from Oracle connector (23c+)
  • Pass-through vector data in SELECT statements
  • Support CAST between ARRAY and VECTOR type

Phase 2: Basic Operations

  • Vector similarity functions:
    • vector_cosine_similarity(vector1, vector2)
    • vector_euclidean_distance(vector1, vector2)
    • vector_dot_product(vector1, vector2)

Phase 3: Advanced (Future)

  • Approximate Nearest Neighbor (ANN) search push-down to sources
  • Vector indexing hints

Use Cases

  1. Unified Analytics: Join vector search results from Elasticsearch with business data in PostgreSQL
  2. ML Feature Engineering: Transform and aggregate embedding vectors across multiple data sources
  3. Cross-Source Vector Search: Compare embeddings stored in different systems
  4. Migration Simplification: Query legacy vector columns during database migrations

Example Query (Desired)

-- Join vector search results with business data
SELECT 
    e.product_id,
    e.embedding,
    p.product_name,
    vector_cosine_similarity(e.embedding, CAST([0.1, 0.2, ...] AS VECTOR)) AS similarity
FROM elasticsearch.products.embeddings e
JOIN postgresql.public.products p ON e.product_id = p.id
WHERE similarity > 0.8
ORDER BY similarity DESC
LIMIT 10;

Metadata

Metadata

Assignees

No one assigned

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions