Apache Paimon Vector Index
Pure Rust IVF-PQ implementation for Apache Paimon. It is designed for data lake storage (S3/HDFS/OSS) with seek-based I/O and supports both 8-bit and 4-bit PQ codes with SIMD acceleration.
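To make the 8-bit versus 4-bit distinction concrete: an 8-bit PQ code selects one of 256 centroids per subvector and occupies a full byte, while a 4-bit code selects one of 16, so two codes can share a byte. The sketch below is illustrative only. The nibble order and the helper names pack_4bit and unpack_4bit are assumptions, not this project's on-disk layout.

```python
def pack_4bit(codes):
    """Pack 4-bit PQ codes (values 0..15) two per byte, low nibble first.

    The nibble order is an assumption made for illustration only.
    """
    if len(codes) % 2 != 0:
        raise ValueError("expected an even number of sub-quantizer codes")
    if any(not 0 <= c <= 15 for c in codes):
        raise ValueError("4-bit codes must be in the range 0..15")
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_4bit(packed):
    """Inverse of pack_4bit: split each byte back into two 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)  # low nibble holds the first code
        codes.append(byte >> 4)    # high nibble holds the second code
    return codes


codes = [3, 15, 0, 9, 7, 1, 12, 4]  # 8 sub-quantizers, one code each
packed = pack_4bit(codes)
```

With 8 sub-quantizers, an 8-bit encoding takes 8 bytes per vector, while the packed 4-bit encoding above takes 4.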
Metadata Filter Pushdown
The vector index accepts a serialized 64-bit Roaring bitmap of allowed row IDs during reader search. This lets the Paimon query layer evaluate metadata predicates with table/scalar indexes first, then pass the matching row-id set into IVF-PQ as an ANN prefilter.
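The effect of prefiltering can be sketched with an exact brute-force search standing in for IVF-PQ and a plain Python set standing in for the Roaring bitmap. The function name prefiltered_top_k and the sample data are hypothetical. The point is only that disallowed rows are skipped before ranking, so the result can still contain up to top_k allowed rows.

```python
import heapq


def prefiltered_top_k(vectors, query, top_k, allowed_row_ids):
    """Return the top_k nearest (distance, row_id) pairs among allowed rows.

    `vectors` maps row_id -> vector. Exact squared-L2 search stands in for
    the approximate IVF-PQ scan; a set stands in for the Roaring bitmap.
    """
    candidates = []
    for row_id, vec in vectors.items():
        if row_id not in allowed_row_ids:
            continue  # prefilter: rejected rows never enter the ranking
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        candidates.append((dist, row_id))
    return heapq.nsmallest(top_k, candidates)


vectors = {
    0: [0.0, 0.0],
    1: [0.1, 0.0],
    2: [1.0, 1.0],
    3: [2.0, 2.0],
}
# Suppose a metadata predicate matched only rows 2 and 3.
hits = prefiltered_top_k(vectors, query=[0.0, 0.0], top_k=2, allowed_row_ids={2, 3})
```

Filtering after the search instead would have ranked rows 0 and 1 first and then discarded both, returning nothing for this query.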
Bindings expose the same wire format:
- Rust core: search_with_reader_roaring_filter and search_batch_reader_roaring_filter
- Java/JNI: IVFPQReader.search(..., byte[]) and IVFPQReader.searchBatch(..., byte[])
- Python: IVFPQReader.search(..., filter_bytes=...) and IVFPQReader.search_batch(..., filter_bytes=...)
Row IDs must be non-negative to map directly into RoaringTreemap's u64 domain.
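Because the filter's domain is unsigned 64-bit, a caller holding signed IDs (for example, values that originated as a Java long) may want to validate them before building the bitmap. A minimal sketch, with check_row_ids as a hypothetical helper that is not part of any binding:

```python
U64_MAX = 2**64 - 1


def check_row_ids(row_ids):
    """Return sorted, de-duplicated row IDs, rejecting values outside u64."""
    checked = set()
    for row_id in row_ids:
        if row_id < 0:
            raise ValueError(f"negative row ID {row_id} has no u64 equivalent")
        if row_id > U64_MAX:
            raise ValueError(f"row ID {row_id} exceeds the u64 range")
        checked.add(row_id)
    return sorted(checked)


valid = check_row_ids([42, 7, 7, 0])
```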
Language Bindings
The Java binding provides small lifecycle-safe facades over the JNI symbols:
IVFPQWriter builds and writes an index, IVFPQReader opens an index and runs
single-query or batch search, and result containers expose defensive copies of
IDs and distances.
The Python binding mirrors that flow with IVFPQWriter and IVFPQReader.
search returns one-dimensional NumPy arrays for a single query, while
search_batch accepts a two-dimensional query array and returns two-dimensional
NumPy arrays shaped as (query_count, top_k).
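The shape contract can be illustrated without the binding itself. In the sketch below, nested lists stand in for NumPy arrays, an exact search stands in for the index, and search / search_batch are hypothetical stand-alone functions, not the IVFPQReader methods or their signatures.

```python
def search(vectors, query, top_k):
    """Single query: return (ids, distances) as flat lists of up to top_k items."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), row_id)
        for row_id, vec in enumerate(vectors)
    )[:top_k]
    ids = [row_id for _, row_id in scored]
    distances = [dist for dist, _ in scored]
    return ids, distances


def search_batch(vectors, queries, top_k):
    """Batch: return (ids, distances), each with one row of results per query."""
    results = [search(vectors, q, top_k) for q in queries]
    ids = [r[0] for r in results]
    distances = [r[1] for r in results]
    return ids, distances


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
queries = [[0.0, 0.0], [3.0, 3.0], [1.0, 0.0]]

ids, distances = search(vectors, queries[0], top_k=2)
batch_ids, batch_distances = search_batch(vectors, queries, top_k=2)
```

Here ids is a flat list of 2 entries, while batch_ids has 3 rows of 2 entries each, matching the (query_count, top_k) layout described above.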
Contributing
Apache Paimon Vector Index is under active development. Whether you want to use it in your projects or contribute to it, there are several ways to get involved:
- Follow the Contributing Guide to contribute.
- Create a new issue for a bug report or feature request.
- Start a discussion thread on the dev mailing list (subscribe / unsubscribe / archives).
- Talk to the community directly in the Slack #paimon channel.
Getting Help
Submit an issue to report a bug, or ask questions in the discussions.
License
Licensed under <a href="./LICENSE">Apache License, Version 2.0</a>.
