Nmslib

Latest version: v2.1.1



1.7

This release mostly focuses on bug fixes and documentation improvements.

1.6

Here is the list of changes for version 1.6 (the manual **isn't updated yet**):

We especially thank the following people for the fixes:
- Bileg Naidan (bileg)
- Bob Poekert (bobpoekert)
- orgoro
1. We simplified the build by excluding the code that required third-party code from the core library. In other words, the core library does not have any third-party dependencies (not even boost). To build the full version of the library, run cmake as follows: `cmake . -DWITH_EXTRAS=1`
2. It should now be possible to build on Mac.
3. We improved the Python bindings (thanks to bileg) and their installation process (thanks to bobpoekert):
    1. We merged our generic and vector bindings into a single module. We upgraded to a more standard installation process via `distutils`. You can run: `python setup.py build` and then `sudo python setup.py install`.
    2. We improved our support for sparse spaces: you can pass data in the form of a numpy sparse array!
    3. Batch **multi-threaded** querying and addition of data are now available.
    4. The `addDataPoint*` functions return the position of the inserted entry. This can be useful if you use the function `getDataPoint`.
    5. For examples of using the Python API, please see the `*.py` files in the folder `python_bindings`.
    6. Note that to execute unit tests you need: python-numpy, python-scipy, and python-pandas.
4. Because we got rid of boost, we, unfortunately, **do not support command-line options WITHOUT arguments**. Instead, you have to pass the value 0 or 1.
5. However, the utility `experiment` (`experiment.exe`) now accepts the option `recallOnly`. If this option is set to 1, then the only effectiveness metric computed is recall. This is useful for evaluation of HNSW, because (for efficiency reasons) HNSW does not return proper distance values (e.g., for L2 it's a squared distance, not the original one). This makes it impossible to compute effectiveness metrics other than recall (returning wrong distance values would also lead to `experiment` terminating with an error message).
6. Additional spaces:
    1. `negdotprod_sparse`: negative inner (dot) product. This is a `sparse` space.
    2. `querynorm_negdotprod_sparse`: query-normalized inner (dot) product, which is the dot product divided by the query norm.
    3. `renyi_diverg`: [Renyi divergence](https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy#R.C3.A9nyi_divergence). It has the parameter `alpha`.
    4. `ab_diverg`: [α-β-divergence](http://reports-archive.adm.cs.cmu.edu/anon/2016/CMU-CS-16-128.pdf). It has two parameters: `alpha` and `beta`.
7. Additional search methods:
    1. `simple_invindx`: A classical inverted index with document-at-a-time processing (via a priority queue). It doesn't have parameters, but works only with the sparse space `negdotprod_sparse`.
    2. `falconn`: we ported (created a wrapper for) a June 2016 version of the [FALCONN library](https://github.com/FALCONN-LIB/FALCONN).
        1. Unlike the original implementation, our wrapper works directly with **sparse** vector spaces as well as with dense vector spaces.
        2. However, our wrapper has to **duplicate** data twice: so this method is useful mostly as a benchmark.
        3. Our wrapper directly supports a data centering trick, which can boost performance sometimes.
        4. Most parameters (`hash_family`, `cross_polytope`, `hyperplane`, `storage_hash_table`, `num_hash_bits`, `num_hash_tables`, `num_probes`, `num_rotations`, `seed`, `feature_hashing_dimension`) merely map to FALCONN parameters.
        5. Setting the additional parameters `norm_data` and `center_data` tells the wrapper to normalize and center the data. Our implementation of the centering (which is done unfortunately before the hashing trick is applied) for **sparse** data is horribly inefficient, so we wouldn't recommend using it. Besides, it doesn't seem to improve results. Just in case, the number of sparse dimensions used for centering is controlled by the parameter `max_sparse_dim_to_center`.
        6. Our FALCONN wrapper would normally use the distance provided by NMSLIB, but you can force using FALCONN's distance function implementation by setting `use_falconn_dist` to 1.
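
The reasoning behind the `recallOnly` option (item 5 above) can be illustrated with a small pure-Python sketch. It does not call NMSLIB: the helper names (`l2`, `l2_squared`, `knn_ids`, `recall`) and the toy data are assumptions made up for this illustration. The point is that squared L2 preserves the order of neighbors, so the set of returned IDs (and hence recall) is unaffected, while the distance values themselves differ from the true L2 distances.

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_squared(a, b):
    """Squared L2 distance: skips the square root, so it is cheaper."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_ids(data, query, k, dist):
    """Brute-force k-NN: IDs of the k points closest to `query` under `dist`."""
    order = sorted(range(len(data)), key=lambda i: dist(data[i], query))
    return order[:k]

def recall(found_ids, true_ids):
    """Fraction of the true neighbors that appear in the returned result."""
    return len(set(found_ids) & set(true_ids)) / len(true_ids)

# Hypothetical toy data set and query (not taken from NMSLIB).
data = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5), (5.0, 5.0), (0.5, 2.0)]
query = (0.8, 0.9)

true_ids = knn_ids(data, query, 3, l2)           # gold standard with true L2
found_ids = knn_ids(data, query, 3, l2_squared)  # ranking by squared L2

# Same neighbors, hence perfect recall, although the distance values differ.
print(found_ids, recall(found_ids, true_ids))
print(l2(data[2], query), l2_squared(data[2], query))
```

Because the square root is monotone on non-negative numbers, ranking by squared distance returns the same neighbors as ranking by the true distance. Any metric that compares the returned distance values against the gold standard would, however, see a mismatch.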

1.5.3

1. Releasing GIL to enable Python threading
2. A slightly faster VP-tree (#52)
3. New scalar-product spaces (#110)

1.5.2

Performance improvement.

1.5.1

This is a bugfix release to address issue 98.

1.5

1. A new efficient method: a hierarchical (navigable) small-world graph (HNSW), contributed by Yury Malkov (yurymalkov). Works with g++, Visual Studio, Intel Compiler, but doesn't work with Clang yet.
2. A query server, which can have clients in C++, Java, Python, and other languages supported by Apache Thrift
3. Python bindings for vector and non-vector spaces
4. Improved performance of two core methods: SW-graph and NAPP
5. Better handling of the gold standard data in the benchmarking utility `experiment`
6. Updated API that permits search methods to serialize indices
7. Improved documentation (e.g., we added tuning guidelines for best methods)

