Enable batch iterator in IVF#247
Conversation
added in the cluster. Added comprhensive tests for IVF
There was a problem hiding this comment.
Pull request overview
This PR enables batch iterator functionality for IVF (Inverted File) indexes in the SVS library. The batch iterator allows incremental retrieval of nearest neighbors in configurable batch sizes, expanding the search space progressively across multiple iterations.
Key changes:
- Introduces
BatchIteratorclass for both static and dynamic IVF indexes that maintains internal state for efficient iterative searching - Adds single-query search capability to IVF indexes via
search()method with scratchspace - Updates Python bindings to reflect that
sizenow returns total vectors instead of cluster count
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| include/svs/index/ivf/iterator.h | Core BatchIterator implementation with state management and progressive search expansion |
| include/svs/index/ivf/index.h | Adds scratchspace types, single-query search, and batch iterator factory for static IVF |
| include/svs/index/ivf/dynamic_ivf.h | Adds scratchspace types, single-query search, and batch iterator factory for dynamic IVF |
| include/svs/index/ivf/extensions.h | Implements single_search customization point for single-query operations |
| include/svs/orchestrators/ivf_iterator.h | Type-erased wrapper for IVFIterator to support polymorphic usage |
| include/svs/orchestrators/ivf.h | Adds batch_iterator() method to IVF orchestrator |
| include/svs/orchestrators/dynamic_ivf.h | Adds batch_iterator() method to DynamicIVF orchestrator |
| tests/svs/index/ivf/iterator.cpp | Comprehensive unit tests for batch iterator functionality |
| tests/svs/index/ivf/index.cpp | Unit tests for static IVF single-query search and scratchspace |
| tests/svs/index/ivf/dynamic_ivf.cpp | Unit tests for dynamic IVF single-query search and scratchspace |
| tests/integration/ivf/iterator.cpp | Integration tests validating iterator behavior in realistic scenarios |
| tests/CMakeLists.txt | Registers new test files in build system |
| bindings/python/tests/test_ivf.py | Updates Python test to use correct size semantics |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| query, | ||
| [&parent]() { | ||
| // Start with a reasonable initial n_probes for iteration | ||
| // Use 10% of clusters or at least 5, whichever gives more coverage |
There was a problem hiding this comment.
The comment states 'whichever gives more coverage' but the logic actually takes the maximum of the two values (5 or 10% of clusters), then limits it to the total number of clusters. Consider clarifying that it uses 'whichever is larger' rather than 'more coverage'.
| // Use 10% of clusters or at least 5, whichever gives more coverage | |
| // Use 10% of clusters or at least 5, whichever is larger (capped at num_clusters) |
| /// Calling this method signals the iterator to abandon its cached state. | ||
| /// | ||
| /// This can be helpful for measuring performance and verifying recall values. | ||
| void restart_next_search() const { impl_->restart_next_search(); } |
There was a problem hiding this comment.
The method is marked const but modifies internal state through impl_->restart_next_search(). Either remove the const qualifier or document why this const method modifies state.
| void restart_next_search() const { impl_->restart_next_search(); } | |
| void restart_next_search() { impl_->restart_next_search(); } |
No description provided.