Header-only C++ HNSW implementation with python bindings.

**NEWS:**
* **Thanks to Apoorv Sharma [@apoorv-sharma](https://github.com/apoorv-sharma), hnswlib now supports true element updates (the interface remains the same, but performance and memory usage should no longer degrade as you update element embeddings).**

* **Thanks to Dmitry [@2ooom](https://github.com/2ooom), hnswlib got a boost in performance for vector dimensions that are not a multiple of 4.**

* **Thanks to Louis Abraham ([@louisabraham](https://github.com/louisabraham)), hnswlib can now be installed via pip!**

...

For other spaces use the nmslib library https://github.com/nmslib/nmslib.
#### Short API description
* `hnswlib.Index(space, dim)` creates a non-initialized HNSW index in space `space` with integer dimension `dim`.

`hnswlib.Index` methods:
* `init_index(max_elements, ef_construction = 200, M = 16, random_seed = 100)` initializes the index with no elements.
    * `max_elements` defines the maximum number of elements that can be stored in the structure (can be increased/shrunk).
    * `ef_construction` defines a construction time/accuracy trade-off (see [ALGO_PARAMS.md](ALGO_PARAMS.md)).
* ...
    * `data_labels` specifies the labels for the data. If the index already has elements with the same labels, their features will be updated. Note that the update procedure is slower than inserting a new element, but more memory- and query-efficient.
    * Thread-safe with other `add_items` calls, but not with `knn_query`.

* `mark_deleted(data_label)` - marks the element as deleted, so it will be omitted from search results.

* `resize_index(new_size)` - changes the maximum capacity of the index. Not thread-safe with `add_items` and `knn_query`.

* `set_ef(ef)` - sets the query time accuracy/speed trade-off, defined by the `ef` parameter ([ALGO_PARAMS.md](ALGO_PARAMS.md)). Note that the parameter is currently not saved along with the index, so you need to set it manually after loading.

* `knn_query(data, k = 1, num_threads = -1)` makes a batch query for the `k` closest elements for each element of `data` (shape: `N*dim`). Returns two numpy arrays, labels and distances, each of shape `N*k`.
    * `num_threads` sets the number of CPU threads to use (-1 means use the default).
    * Thread-safe with other `knn_query` calls, but not with `add_items`.

* ...

* `get_current_count()` - returns the current number of elements stored in the index.
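As an illustration of the semantics described in the method list (not of the HNSW algorithm itself), here is a minimal exact-search sketch in plain Python. It assumes only what the list states: adding an existing label updates its vector, `mark_deleted` hides an element from results, and `knn_query` returns labels and distances for each query. The class `BruteForceIndex` and the `recall` helper are hypothetical names, not part of hnswlib; distances are squared L2, which is what the `"l2"` space computes. An exact reference like this can serve as ground truth when checking how `set_ef` affects the recall of a real index.

```python
import heapq


class BruteForceIndex:
    """Hypothetical exact-search stand-in that mimics documented hnswlib.Index behavior."""

    def __init__(self, dim):
        self.dim = dim
        self._vectors = {}     # label -> vector (list of floats)
        self._deleted = set()  # labels hidden by mark_deleted

    def add_items(self, data, data_labels):
        for vec, label in zip(data, data_labels):
            if len(vec) != self.dim:
                raise ValueError(f"expected {self.dim} components, got {len(vec)}")
            # Re-adding an existing label overwrites its vector (an update, not a duplicate).
            self._vectors[label] = list(vec)

    def mark_deleted(self, data_label):
        # The vector stays stored but is excluded from query results.
        self._deleted.add(data_label)

    def get_current_count(self):
        return len(self._vectors)

    def knn_query(self, data, k=1):
        labels, distances = [], []
        for query in data:
            # Squared L2 distance to every non-deleted element.
            candidates = (
                (sum((a - b) ** 2 for a, b in zip(query, vec)), label)
                for label, vec in self._vectors.items()
                if label not in self._deleted
            )
            nearest = heapq.nsmallest(k, candidates)
            labels.append([label for _, label in nearest])
            distances.append([dist for dist, _ in nearest])
        return labels, distances


def recall(approx_labels, exact_labels):
    """Fraction of exact neighbors that also appear in the approximate result."""
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_labels, exact_labels))
    total = sum(len(e) for e in exact_labels)
    return hits / total
```

Unlike this sketch, a real `hnswlib.Index` returns numpy arrays and searches an approximate graph, so its labels can differ from the exact ones; `recall` measures that gap.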
Read-only properties of the `hnswlib.Index` class:

* `space` - name of the space (can be one of "l2", "ip", or "cosine").

* `dim` - dimensionality of the space.

* `M` - parameter that defines the maximum number of outgoing connections in the graph.

* `ef_construction` - parameter that controls the speed/accuracy trade-off during index construction.

* `max_elements` - current capacity of the index. Equivalent to `p.get_max_elements()`.

* `element_count` - number of items in the index. Equivalent to `p.get_current_count()`.
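The `space` names correspond to distance functions. Assuming the definitions given in hnswlib's documentation (squared Euclidean distance for `"l2"`, one minus the inner product for `"ip"`, and one minus the cosine similarity for `"cosine"`), they can be sketched as follows; the `distance` function is a hypothetical illustration, not library code.

```python
import math


def distance(space, a, b):
    """Hypothetical helper: the distance each hnswlib space name denotes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    if space == "l2":
        # Squared Euclidean distance: no square root is taken.
        return sum((x - y) ** 2 for x, y in zip(a, b))
    if space == "ip":
        # Larger dot products give smaller distances.
        return 1.0 - dot
    if space == "cosine":
        # One minus cosine similarity.
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        return 1.0 - dot / norm
    raise ValueError(f"unknown space: {space!r}")
```

Because `"ip"` is one minus a raw dot product, it can be negative and is not a true metric; for unit-length vectors, `"ip"` and `"cosine"` give the same values.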
Properties of `hnswlib.Index` that support reading and writing:

* `num_threads` - default number of threads to use in `add_items` or `knn_query`. Note that calling `p.set_num_threads(3)` is equivalent to `p.num_threads=3`.
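The equivalence between `p.set_num_threads(3)` and `p.num_threads=3`, combined with the `num_threads = -1` default of `knn_query`, follows a common read/write-property pattern. The sketch below shows that pattern in plain Python, assuming `-1` means "use the stored default"; the class `ThreadedQueries` and its `map_queries` method are hypothetical, and Python's `concurrent.futures` stands in for hnswlib's native threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class ThreadedQueries:
    """Hypothetical sketch of a default thread count exposed two equivalent ways."""

    def __init__(self):
        # Assumed default: one worker per available CPU.
        self._num_threads = os.cpu_count() or 1

    @property
    def num_threads(self):
        return self._num_threads

    @num_threads.setter
    def num_threads(self, value):
        if value < 1:
            raise ValueError("num_threads must be at least 1")
        self._num_threads = value

    def set_num_threads(self, value):
        # Equivalent to assigning `self.num_threads = value`.
        self.num_threads = value

    def map_queries(self, fn, queries, num_threads=-1):
        # -1 means "fall back to the stored default".
        workers = self._num_threads if num_threads == -1 else num_threads
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map returns results in input order.
            return list(pool.map(fn, queries))
```

Routing both the setter method and the property through one validated attribute keeps the two spellings from drifting apart.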
#### Python bindings examples
```python
import hnswlib
import numpy as np
import pickle

dim = 128
num_elements = 10000

# Generating sample data
data = np.float32(np.random.random((num_elements, dim)))
ids = np.arange(num_elements)

# Declaring index
p = hnswlib.Index(space='l2', dim=dim)  # possible options are l2, cosine or ip

# Initializing index - the maximum number of elements should be known beforehand
p.init_index(max_elements=num_elements, ef_construction=200, M=16)

# Element insertion (can be called several times)
p.add_items(data, ids)

# Controlling the recall by setting ef:
p.set_ef(50)  # ef should always be > k

# Query dataset, k - number of closest elements (returns 2 numpy arrays)
labels, distances = p.knn_query(data, k=1)

# Index objects support pickling
# WARNING: serialization via pickle.dumps(p) or p.__getstate__() is NOT thread-safe with p.add_items method!
# Note: ef parameter is included in serialization; random number generator is initialized with random_seed on Index load
p_copy = pickle.loads(pickle.dumps(p))  # creates a copy of index p using pickle round-trip

### Index parameters are exposed as class properties:
print(f"Parameters passed to constructor: space={p_copy.space}, dim={p_copy.dim}")
```