vsmlib.model
Model module that implements loading of embeddings.
Functions
load_from_dir(path) – Automatically detects the embeddings format and loads the embeddings.

Classes
Model() – Basic model class that defines the interface.
ModelDense() – Stores dense embeddings.
ModelLevy() – This is deprecated and will be removed soon.
ModelNumbered() – Extends the dense model by numbering dimensions.
ModelSparse() – Sparse (usually count-based) embeddings.
ModelW2V() – Extends ModelDense to support loading of the original binary format from Mikolov’s word2vec (w2v).
Model_svd_scipy(original, …)

class vsmlib.model.Model
Bases: object
Basic model class that defines the interface.
Usually you would not use this class directly, but rather one of the classes that inherit from Model.

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities
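The ranking that get_most_similar_words describes can be sketched with NumPy as below. This is a minimal illustration, not vsmlib's implementation: the helper name most_similar, the layout (one embedding per matrix row plus a dict mapping each word to its row), and the choice to exclude the target word from its own neighbours are all assumptions.

```python
import numpy as np


def most_similar(matrix, vocab, w, cnt=10):
    """Rank words by cosine similarity to the target word `w`.

    Hypothetical helper: `matrix` holds one embedding per row and
    `vocab` maps each word to its row index.
    """
    # Normalise every row to unit length so a dot product equals cosine.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    unit = matrix / np.where(norms == 0, 1, norms)
    sims = unit @ unit[vocab[w]]
    # Sort row indices by decreasing similarity; skip the target itself.
    order = np.argsort(-sims)
    id2word = {i: word for word, i in vocab.items()}
    ranked = [(id2word[i], float(sims[i])) for i in order if i != vocab[w]]
    return ranked[:cnt]


vocab = {"cat": 0, "dog": 1, "car": 2}
matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
neighbours = most_similar(matrix, vocab, "cat", cnt=2)  # "dog" ranks above "car"
```

Normalising once and taking a single matrix-vector product scores every word in one pass, which is the usual way to compute cosine neighbours over a dense embedding matrix.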

class vsmlib.model.ModelDense
Bases: vsmlib.model.Model
Stores dense embeddings.

filter_by_vocab(words)
Reduces the embeddings to the provided list of words (which can be empty).
Parameters: words – set or list of words to keep
Returns: instance of the Dense class
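A minimal sketch of the reduction filter_by_vocab performs, under the same assumed row-per-word layout. The helper name and the decision to silently skip words that are absent from the vocabulary are hypothetical, not taken from vsmlib.

```python
import numpy as np


def filter_by_vocab(matrix, vocab, words):
    """Keep only the rows of `matrix` that belong to `words`.

    Hypothetical helper: `words` may be a set or a list and may be empty;
    words missing from `vocab` are skipped (an assumption).
    """
    # dict.fromkeys removes duplicates while preserving the input order.
    kept = [w for w in dict.fromkeys(words) if w in vocab]
    rows = [vocab[w] for w in kept]
    if rows:
        new_matrix = matrix[rows]
    else:
        # An empty vocabulary still yields a matrix with the original width.
        new_matrix = np.empty((0, matrix.shape[1]), dtype=matrix.dtype)
    new_vocab = {w: i for i, w in enumerate(kept)}
    return new_matrix, new_vocab


small, small_vocab = filter_by_vocab(
    np.arange(6.0).reshape(3, 2), {"a": 0, "b": 1, "c": 2}, ["c", "a", "zzz"]
)
```

Re-indexing the kept words from zero keeps the new vocabulary consistent with the rows of the reduced matrix.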

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities

load_hdf5(path)
Loads embeddings from HDF5 format.

load_npy(path)
Loads embeddings from NumPy format.
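For the NumPy format, one plausible on-disk layout is a matrix written with numpy.save next to a plain-text word list. The file names vectors.npy and vocab.txt and both helper functions below are hypothetical; the layout that load_npy actually expects is not stated in this reference.

```python
import os
import tempfile

import numpy as np


def save_npy(path, matrix, words):
    # Hypothetical layout: vectors.npy holds the matrix, vocab.txt one word per line.
    os.makedirs(path, exist_ok=True)
    np.save(os.path.join(path, "vectors.npy"), matrix)
    with open(os.path.join(path, "vocab.txt"), "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")


def load_npy(path):
    # Read the matrix and map each word to its row index.
    matrix = np.load(os.path.join(path, "vectors.npy"))
    with open(os.path.join(path, "vocab.txt"), encoding="utf-8") as f:
        words = [line.rstrip("\n") for line in f]
    vocab = {w: i for i, w in enumerate(words)}
    return matrix, vocab


with tempfile.TemporaryDirectory() as tmp:
    save_npy(tmp, np.eye(2, dtype=np.float32), ["cat", "dog"])
    matrix, vocab = load_npy(tmp)
```

Storing the vocabulary separately keeps the .npy file a single homogeneous array, which numpy.load can read without allow_pickle.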

class vsmlib.model.ModelLevy
Bases: vsmlib.model.ModelNumbered
This is deprecated and will be removed soon.

filter_by_vocab(words)
Reduces the embeddings to the provided list of words (which can be empty).
Parameters: words – set or list of words to keep
Returns: instance of the Dense class

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities

load_hdf5(path)
Loads embeddings from HDF5 format.

load_npy(path)
Loads embeddings from NumPy format.

class vsmlib.model.ModelNumbered
Bases: vsmlib.model.ModelDense
Extends the dense model by numbering dimensions.

filter_by_vocab(words)
Reduces the embeddings to the provided list of words (which can be empty).
Parameters: words – set or list of words to keep
Returns: instance of the Dense class

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities

load_hdf5(path)
Loads embeddings from HDF5 format.

load_npy(path)
Loads embeddings from NumPy format.

class vsmlib.model.ModelSparse
Bases: vsmlib.model.Model
Sparse (usually count-based) embeddings.

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities

load_from_hdf5(path)
Loads the model in compressed sparse row (CSR) format from an HDF5 file.
The HDF5 file should contain the row_ptr, col_ind and data arrays.
Parameters: path – path to the embeddings folder
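The three arrays named above are the standard CSR components: data holds the non-zero values, col_ind their column indices, and row_ptr the offsets at which each row starts, so row i spans data[row_ptr[i]:row_ptr[i+1]]. The sketch below rebuilds the matrix with scipy.sparse.csr_matrix; the helper name csr_from_arrays is hypothetical, and it takes the arrays as arguments so it does not depend on h5py. With h5py they could presumably be read as, e.g., f["row_ptr"][:], assuming the datasets sit at the top level of the file.

```python
import numpy as np
from scipy.sparse import csr_matrix


def csr_from_arrays(row_ptr, col_ind, data, shape=None):
    """Rebuild a CSR matrix from its row_ptr, col_ind and data arrays.

    If `shape` is None, SciPy infers it from row_ptr and the largest column index.
    """
    return csr_matrix((data, col_ind, row_ptr), shape=shape)


# Two words over a three-word context vocabulary:
# row 0 has counts {col 0: 2, col 2: 1}, row 1 has counts {col 1: 3}.
row_ptr = np.array([0, 2, 3])
col_ind = np.array([0, 2, 1])
data = np.array([2.0, 1.0, 3.0])
counts = csr_from_arrays(row_ptr, col_ind, data, shape=(2, 3))
```

Passing the shape explicitly matters when the last context columns are all zero, since inference would otherwise drop them.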

class vsmlib.model.ModelW2V
Bases: vsmlib.model.ModelNumbered
Extends ModelDense to support loading of the original binary format from Mikolov’s word2vec (w2v).
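The original word2vec binary format starts with an ASCII header line giving the vocabulary size and the vector dimension; each entry is then the word, a space, and dim raw 32-bit floats. The original tool also writes a newline after each vector, which the reader below skips when present. This is a sketch of the file format, not ModelW2V's code; the function name is hypothetical, and it assumes little-endian floats and UTF-8 words.

```python
import io
import struct

import numpy as np


def read_w2v_binary(stream):
    """Read word2vec's binary format from a binary file-like object."""
    header = stream.readline().decode("utf-8").split()
    n_words, dim = int(header[0]), int(header[1])
    vocab = {}
    matrix = np.empty((n_words, dim), dtype=np.float32)
    for i in range(n_words):
        # Collect bytes up to the separating space, ignoring the newline
        # left over from the previous vector.
        chars = []
        while True:
            ch = stream.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("unexpected end of file while reading a word")
            if ch != b"\n":
                chars.append(ch)
        vocab[b"".join(chars).decode("utf-8")] = i
        # Assumption: floats are stored little-endian ("<f4").
        matrix[i] = np.frombuffer(stream.read(4 * dim), dtype="<f4")
    return matrix, vocab


# Build a tiny file in memory in the same layout and read it back.
buf = io.BytesIO()
buf.write(b"2 3\n")
for word, vec in [("cat", [1.0, 2.0, 3.0]), ("dog", [4.0, 5.0, 6.0])]:
    buf.write(word.encode("utf-8") + b" " + struct.pack("<3f", *vec) + b"\n")
buf.seek(0)
matrix, vocab = read_w2v_binary(buf)
```

Reading each vector as exactly 4 * dim bytes is what makes the format safe even when the float bytes happen to contain a space or newline.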

filter_by_vocab(words)
Reduces the embeddings to the provided list of words (which can be empty).
Parameters: words – set or list of words to keep
Returns: instance of the Dense class

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities

load_hdf5(path)
Loads embeddings from HDF5 format.

load_npy(path)
Loads embeddings from NumPy format.

class vsmlib.model.Model_svd_scipy(original, cnt_singular_vectors, power)
Bases: vsmlib.model.ModelNumbered
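The constructor arguments suggest a truncated SVD of an original (typically count-based) matrix. One common construction, assumed here rather than confirmed by this reference, keeps the top cnt_singular_vectors components and weights them as U_k multiplied by S_k raised to power: power=0 drops the singular values entirely and power=1 gives the plain U_k S_k projection. The helper name is hypothetical, and the sketch uses numpy.linalg.svd on a dense array; for a large sparse matrix, scipy.sparse.linalg.svds computes only the k largest singular triplets.

```python
import numpy as np


def svd_embeddings(original, cnt_singular_vectors, power):
    """Dense embeddings from a truncated SVD of `original`.

    Assumption: rows are U_k * S_k**power ("eigenvalue weighting").
    """
    # numpy returns singular values in descending order.
    u, s, _ = np.linalg.svd(original, full_matrices=False)
    k = cnt_singular_vectors
    # Broadcasting scales each of the k columns of U by its weighted singular value.
    return u[:, :k] * (s[:k] ** power)


counts = np.array(
    [[3.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 3.0], [2.0, 1.0, 0.0]]
)
emb = svd_embeddings(counts, cnt_singular_vectors=2, power=0.5)
```

Exponents below 1 shrink the gap between dominant and minor components, which is why fractional powers are often preferred for word similarity tasks.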

filter_by_vocab(words)
Reduces the embeddings to the provided list of words (which can be empty).
Parameters: words – set or list of words to keep
Returns: instance of the Dense class

get_most_similar_words(w, cnt=10)
Returns a list of words sorted by cosine proximity to a target word.
Parameters:
- w – target word
- cnt – number of similar words to return
Returns: list of words and their corresponding similarities

load_hdf5(path)
Loads embeddings from HDF5 format.

load_npy(path)
Loads embeddings from NumPy format.

vsmlib.model.load_from_dir(path)
Automatically detects the embeddings format and loads the embeddings.
Parameters: path – directory where the embeddings are stored
Returns: instance of the appropriate Model-based class
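Format detection of this kind can be sketched as a scan over the directory's file names. The function name, the extensions, and the returned labels below are hypothetical, chosen for illustration; they are not vsmlib's actual rules. The first matching rule wins, so the order of the checks matters when a directory contains several kinds of files.

```python
import os
import tempfile


def detect_format(path):
    """Guess the embeddings format from the file names in a directory.

    Hypothetical rules for illustration only.
    """
    names = os.listdir(path)
    if any(n.endswith(".npy") for n in names):
        return "dense_npy"
    if any(n.endswith((".h5", ".hdf5")) for n in names):
        return "hdf5"
    if any(n.endswith(".bin") for n in names):
        return "w2v_binary"
    raise ValueError(f"cannot detect embeddings format in {path!r}")


with tempfile.TemporaryDirectory() as tmp:
    # An empty file is enough, since only the name is inspected.
    open(os.path.join(tmp, "vectors.bin"), "wb").close()
    fmt = detect_format(tmp)
```

A loader built on this would then dispatch on the returned label to the matching class, which is the role the Returns line above gives to load_from_dir.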