standard one-level clustering or faster two-level hierarchical clustering. The clustering algorithm uses AMX on supported Intel® Xeon® systems. Also adds support for the BF16 data type.
    }

    struct CreateDenseCluster {
        using This = CreateDenseCluster;
These are used in the extensions for compressed datasets (LVQ/LeanVec).
    , distance_{std::move(distance_function)}
    , threadpool_{threads::as_threadpool(std::move(threadpool_proto))}
    , n_inner_threads_{n_inner_threads} {
        // Initialize threadpools for intra-query parallelism
We need to be careful about nested parallelism. From this constructor, it's not obvious that threadpool_ and n_inner_threads_ will be nested (i.e., two-level parallelism). Could we use clearer naming? Also, how do we handle a custom threadpool in this case?
I updated and clarified the inter-query (outer) and intra-query (inner) parallelism, and the threadpools used for each, in this commit. @dian-lun-lin, please have another look at the IVFIndex class. Thanks.
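For readers following the thread, the concern about two-level parallelism can be sketched as below. This is a minimal illustration that assumes a plain std::thread fan-out, not the SVS threadpools or the IVFIndex code; `parallel_for` and `search_batch` are hypothetical names, and the per-cluster work is a stand-in computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not an SVS API): run f(i) for i in [0, n) on up to
// `num_threads` threads, using a strided static partition of the index range.
template <typename F>
void parallel_for(std::size_t n, std::size_t num_threads, F f) {
    num_threads = std::max<std::size_t>(1, std::min(num_threads, n));
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        workers.emplace_back([=, &f] {
            for (std::size_t i = t; i < n; i += num_threads) {
                f(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}

// Outer (inter-query) level: queries are spread over n_outer_threads.
// Inner (intra-query) level: each query scans its clusters on n_inner_threads.
// Peak concurrency is therefore up to n_outer_threads * n_inner_threads.
std::vector<long> search_batch(
    std::size_t n_queries,
    std::size_t n_clusters,
    std::size_t n_outer_threads,
    std::size_t n_inner_threads
) {
    assert(n_outer_threads > 0 && n_inner_threads > 0);
    std::vector<long> results(n_queries, 0);
    parallel_for(n_queries, n_outer_threads, [&](std::size_t q) {
        // Each inner task writes only to its own slot, so no locking is needed.
        std::vector<long> partial(n_clusters, 0);
        parallel_for(n_clusters, n_inner_threads, [&](std::size_t c) {
            // Stand-in for "scan cluster c for query q".
            partial[c] = static_cast<long>(q * n_clusters + c);
        });
        long sum = 0;
        for (long p : partial) {
            sum += p;
        }
        results[q] = sum;
    });
    return results;
}
```

Because the two levels multiply, a name that only says "threads" hides the real thread count, which is the motivation for distinguishing the outer and inner pools explicitly.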
    Clustering clustering(std::move(centroids), std::move(clusters));
    auto build_time = svs::lib::time_difference(tic);
    fmt::print("IVF build time: {} seconds\n", build_time);
This print should be moved to the logger.
Thanks for pointing that out. I will move all of these prints into the logger with appropriate verbosity levels.
    final_assignments_time.finish();
    kmeans_timer.finish();
    svs::logging::debug("{}", timer);
    fmt::print("kmeans clustering time: {}\n", lib::as_seconds(timer.elapsed()));
This print should be moved to the logger as well.
    // On GCC, we need to add this attribute so that BFloat16 members can appear inside
    // packed structs.
    class __attribute__((packed)) BFloat16 {
What's the difference between Float16 and BFloat16?
Here is a short write-up explaining these formats and the differences: https://nhigham.com/2018/12/03/half-precision-arithmetic-fp16-versus-bfloat16/
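In short, both formats are 16 bits wide: IEEE half precision (Float16) uses 1 sign, 5 exponent, and 10 mantissa bits, while bfloat16 uses 1 sign, 8 exponent, and 7 mantissa bits, so it keeps the float32 exponent range at lower precision. The sketch below illustrates this with a conversion that simply keeps the top 16 bits of a float32. It is an illustration only, not the BFloat16 class from this PR; the function names are hypothetical, and truncation is assumed here for simplicity where a production conversion would typically round to nearest.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit IEEE-754 float");

// Hypothetical helper: float32 -> bfloat16 bit pattern by truncation.
// Keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits.
inline std::uint16_t float_to_bf16_truncate(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: bfloat16 bit pattern -> float32.
// The 16 discarded mantissa bits are filled with zeros.
inline float bf16_to_float(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}
```

For example, 65536.0f survives the round trip unchanged even though it exceeds the largest finite Float16 value (65504), whereas 1.00390625f (1 + 2^-8) collapses to 1.0f because only 7 mantissa bits remain.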
IVFIndex class 2. Add docstring in the high level class and methods 3. Use logging instead of fmt::print
        typename Distance,
        threads::ThreadPool Pool,
        std::integral I = uint32_t>
    auto hierarchical_kmeans_clustering_impl(
This function is quite large. Is it possible to break it into multiple functions, or at least add some comments explaining the implementation?
I tried to simplify this function by extracting some helper methods and adding comments.
This update introduces IVF (Inverted File) index support in SVS, allowing for index construction using either standard one-level clustering or a faster two-level hierarchical clustering approach. The clustering algorithm is optimized to utilize AMX (Advanced Matrix Extensions) on supported Intel® Xeon® systems, enhancing performance on compatible hardware. Additionally, support for the BF16 (bfloat16) data type has been introduced, broadening the range of data formats that can be efficiently processed.
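As background on what an IVF index does, here is a toy sketch of the general technique: vectors are grouped by nearest centroid into inverted lists, and a query scans only the lists of its closest centroids. This is not the SVS implementation. All names (`ToyIVF`, `build_toy_ivf`, `search_toy_ivf`) are hypothetical, the centroids are assumed to be given (the k-means training step, the hierarchical variant, AMX, and BF16 are all omitted), and the distance is plain squared Euclidean.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// Squared Euclidean distance between two vectors of equal length.
inline float l2_sq(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Toy index: lists[c] holds the ids of the data vectors assigned to centroid c.
struct ToyIVF {
    std::vector<Vec> data;
    std::vector<Vec> centroids;
    std::vector<std::vector<std::size_t>> lists;
};

// Index of the centroid closest to x.
inline std::size_t nearest_centroid(const std::vector<Vec>& centroids, const Vec& x) {
    assert(!centroids.empty());
    std::size_t best = 0;
    float best_dist = l2_sq(centroids[0], x);
    for (std::size_t c = 1; c < centroids.size(); ++c) {
        const float d = l2_sq(centroids[c], x);
        if (d < best_dist) {
            best_dist = d;
            best = c;
        }
    }
    return best;
}

// Build the inverted lists from already-trained centroids.
inline ToyIVF build_toy_ivf(std::vector<Vec> data, std::vector<Vec> centroids) {
    ToyIVF index{std::move(data), std::move(centroids), {}};
    index.lists.resize(index.centroids.size());
    for (std::size_t i = 0; i < index.data.size(); ++i) {
        index.lists[nearest_centroid(index.centroids, index.data[i])].push_back(i);
    }
    return index;
}

// Scan only the n_probes lists whose centroids are closest to the query.
// Returns the id of the nearest vector found, or data.size() if none was scanned.
inline std::size_t search_toy_ivf(const ToyIVF& index, const Vec& query, std::size_t n_probes) {
    std::vector<std::size_t> order(index.centroids.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    n_probes = std::min(n_probes, order.size());
    std::partial_sort(
        order.begin(),
        order.begin() + static_cast<std::ptrdiff_t>(n_probes),
        order.end(),
        [&](std::size_t a, std::size_t b) {
            return l2_sq(index.centroids[a], query) < l2_sq(index.centroids[b], query);
        }
    );
    std::size_t best = index.data.size();
    float best_dist = std::numeric_limits<float>::max();
    for (std::size_t p = 0; p < n_probes; ++p) {
        for (std::size_t id : index.lists[order[p]]) {
            const float d = l2_sq(index.data[id], query);
            if (d < best_dist) {
                best_dist = d;
                best = id;
            }
        }
    }
    return best;
}
```

Raising `n_probes` trades search time for recall: probing every list degenerates to an exhaustive scan, while probing one list is fastest but can miss neighbors that fall just across a cluster boundary.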