vsmlib.model

The model module implements embedding loading.

Functions

load_from_dir(path)	Automatically detects the embeddings format and loads the model.

Classes

`Model`()	Basic model class to define interface.
`ModelDense`()	Stores dense embeddings.
`ModelLevy`()	This is deprecated and will be removed soon.
`ModelNumbered`()	Extends the dense model by numbering dimensions.
`ModelSparse`()	Stores sparse (usually count-based) embeddings.
`ModelW2V`()	Extends ModelDense to support loading the original binary format from Mikolov’s word2vec (w2v).
`Model_svd_scipy`(original, …)

class vsmlib.model.Model

Bases: object

Basic model class to define interface.

Usually you would not use this class directly, but rather one of the classes that inherit from Model.

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities
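
To make the ranking concrete, here is a minimal pure-Python sketch of sorting words by cosine similarity. It is not vsmlib's implementation: the `embeddings` dict, the helper `cosine`, and the function `most_similar_words` are hypothetical names introduced for illustration. This sketch also keeps the target word in its own ranking; whether the library does the same is not stated in this reference.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar_words(embeddings, w, cnt=10):
    """Rank every word in `embeddings` by cosine similarity to `w`.

    `embeddings` is a hypothetical dict mapping word -> list of floats.
    Returns a list of (word, similarity) pairs, most similar first.
    """
    target = embeddings[w]
    scored = [(word, cosine(target, vec)) for word, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:cnt]


# Toy two-dimensional embeddings, made up for this example.
toy = {
    "cat": [1.0, 0.0],
    "kitten": [0.9, 0.1],
    "car": [0.0, 1.0],
}
result = most_similar_words(toy, "cat", cnt=2)
```

Here `result` holds the two highest-scoring (word, similarity) pairs, with "cat" first and "kitten" second.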

class vsmlib.model.ModelDense

Bases: vsmlib.model.Model

Stores dense embeddings.

filter_by_vocab(words)

Reduces the embeddings to the provided list of words (which can be empty).

Parameters:	words – set or list of words to keep
Returns:	Instance of Dense class
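
The filtering step can be sketched in plain Python as follows. This is an illustration under assumptions, not the library's code: the parallel `vocabulary` list and `matrix` of rows are hypothetical stand-ins for however the model stores its data, and the sketch returns a pair of lists rather than a model instance.

```python
def filter_by_vocab(vocabulary, matrix, words):
    """Keep only the rows whose word is in `words`.

    `vocabulary` is a list of words and `matrix` a parallel list of row
    vectors; both are hypothetical stand-ins for the model's storage.
    Returns a new (vocabulary, matrix) pair; the inputs are not modified.
    """
    keep = set(words)
    kept_vocab = []
    kept_rows = []
    for word, row in zip(vocabulary, matrix):
        if word in keep:
            kept_vocab.append(word)
            kept_rows.append(list(row))
    return kept_vocab, kept_rows


vocab = ["apple", "banana", "cherry"]
rows = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Words missing from the vocabulary ("durian") are simply ignored.
small_vocab, small_rows = filter_by_vocab(vocab, rows, {"cherry", "apple", "durian"})

# An empty word list yields an empty result.
empty_vocab, empty_rows = filter_by_vocab(vocab, rows, [])
```

The sketch preserves the original row order and ignores requested words that are not in the vocabulary; the library may handle either case differently.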

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities

load_hdf5(path)

Loads embeddings from HDF5 format.

load_npy(path)

Loads embeddings from NumPy format.

class vsmlib.model.ModelLevy

Bases: vsmlib.model.ModelNumbered

This is deprecated and will be removed soon.

filter_by_vocab(words)

Reduces the embeddings to the provided list of words (which can be empty).

Parameters:	words – set or list of words to keep
Returns:	Instance of Dense class

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities

load_hdf5(path)

Loads embeddings from HDF5 format.

load_npy(path)

Loads embeddings from NumPy format.

class vsmlib.model.ModelNumbered

Bases: vsmlib.model.ModelDense

Extends the dense model by numbering dimensions.

filter_by_vocab(words)

Reduces the embeddings to the provided list of words (which can be empty).

Parameters:	words – set or list of words to keep
Returns:	Instance of Dense class

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities

load_hdf5(path)

Loads embeddings from HDF5 format.

load_npy(path)

Loads embeddings from NumPy format.

class vsmlib.model.ModelSparse

Bases: vsmlib.model.Model

Stores sparse (usually count-based) embeddings.

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities

load_from_hdf5(path)

Loads a model in compressed sparse row (CSR) format from an HDF5 file.

The HDF5 file should contain the row_ptr, col_ind and data arrays.

Parameters:	path – path to the embeddings folder
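
For readers unfamiliar with compressed sparse row storage, the sketch below shows how the three arrays named above encode a matrix under the standard CSR layout. It uses plain Python lists and a hypothetical helper `csr_row`; it does not read an HDF5 file, and how vsmlib lays out the datasets inside the file beyond these three names is not covered here.

```python
def csr_row(row_ptr, col_ind, data, i):
    """Return row `i` of a CSR matrix as a {column: value} dict.

    Entries for row i live at positions row_ptr[i] .. row_ptr[i + 1] - 1
    of both `col_ind` (column indices) and `data` (non-zero values).
    """
    start, end = row_ptr[i], row_ptr[i + 1]
    return {col_ind[k]: data[k] for k in range(start, end)}


# CSR encoding of the 3 x 4 matrix
# [[0, 2, 0, 1],
#  [0, 0, 0, 0],
#  [5, 0, 3, 0]]
row_ptr = [0, 2, 2, 4]        # one more entry than there are rows
col_ind = [1, 3, 0, 2]        # column of each stored value
data = [2.0, 1.0, 5.0, 3.0]   # the stored (non-zero) values

sparse_rows = [csr_row(row_ptr, col_ind, data, i) for i in range(len(row_ptr) - 1)]
```

An all-zero row, such as the middle row here, shows up as two equal consecutive entries in `row_ptr` and decodes to an empty dict.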

class vsmlib.model.ModelW2V

Bases: vsmlib.model.ModelNumbered

Extends ModelDense to support loading the original binary format from Mikolov’s word2vec (w2v).

filter_by_vocab(words)

Reduces the embeddings to the provided list of words (which can be empty).

Parameters:	words – set or list of words to keep
Returns:	Instance of Dense class

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities

load_hdf5(path)

Loads embeddings from HDF5 format.

load_npy(path)

Loads embeddings from NumPy format.

class vsmlib.model.Model_svd_scipy(original, cnt_singular_vectors, power)

Bases: vsmlib.model.ModelNumbered

filter_by_vocab(words)

Reduces the embeddings to the provided list of words (which can be empty).

Parameters:	words – set or list of words to keep
Returns:	Instance of Dense class

get_most_similar_words(w, cnt=10)

Returns a list of words sorted by cosine proximity to a target word.

Parameters:	w – target word
	cnt – how many similar words are needed
Returns:	list of words and corresponding similarities

load_hdf5(path)

Loads embeddings from HDF5 format.

load_npy(path)

Loads embeddings from NumPy format.

vsmlib.model.load_from_dir(path)

Automatically detects the embeddings format and loads the model.

Parameters:	path – directory where the embeddings are stored
Returns:	Instance of the appropriate Model-based class
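
This reference does not say how the format is detected. Purely as an illustration of the general idea, the sketch below picks a loader by file extension. Everything in it is an assumption: the extension table, the file name `vectors.npy`, and the helper `detect_loader` are hypothetical and should not be read as vsmlib's actual rules. Only the loader names `load_npy` and `load_hdf5` come from the class descriptions above.

```python
import os
import tempfile

# Hypothetical mapping from file extension to the loader that would handle it.
LOADER_BY_EXTENSION = {
    ".npy": "load_npy",
    ".hdf5": "load_hdf5",
}


def detect_loader(path):
    """Return the loader name for the first recognised file in `path`.

    Files are examined in sorted order. Raises ValueError when no file
    in the directory has a known extension.
    """
    for name in sorted(os.listdir(path)):
        extension = os.path.splitext(name)[1].lower()
        if extension in LOADER_BY_EXTENSION:
            return LOADER_BY_EXTENSION[extension]
    raise ValueError("no recognised embeddings file in " + path)


# Build a throwaway directory containing one empty, hypothetically named file.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "vectors.npy"), "wb").close()
    detected = detect_loader(folder)
```

A real loader would go on to call the selected method on a model instance; the sketch stops at choosing it.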