tmnt.data_loading

This module contains routines for loading text documents into sparse matrix representations for efficient training of neural variational models.

Functions

get_llm(model_name)
get_llm_dataloader(data, bow_vectorizer, ...)
get_llm_model(model_name)
get_llm_paired_dataloader(data_a, data_b, ...)
get_llm_tokenizer(model_name)
get_unwrapped_llm_dataloader(data, ...[, ...])
load_vocab(vocab_file[, encoding]) Load a pre-derived vocabulary; assumes a format with a single word on each line.
sparse_batch_collate(batch) Collate function to transform a SciPy COO matrix into a PyTorch sparse tensor.
sparse_coo_to_tensor(coo) Transform a SciPy COO matrix into a PyTorch sparse tensor.
to_label_matrix(yvs[, num_labels]) Convert [(id1, id2, …), (id1, id2, …), …] to a NumPy matrix with multi-labels.

Classes

PairedDataLoader(data_loader1, data_loader2)
RoundRobinDataLoader(data_loaders)
SingletonWrapperLoader(data_loader)
SparseDataLoader(X, y[, shuffle, drop_last, ...])
SparseDataset(data, targets) Custom dataset class for a SciPy sparse matrix.
StratifiedDualBatchSampler(y_a, y_b, ...[, ...]) Stratified batch sampling. Provides equal representation of target classes in each batch.
StratifiedPairedLLMLoader(data_a, data_b, ...)
to_label_matrix(yvs, num_labels=0)

Convert a list of label-id tuples [(id1, id2, …), (id1, id2, …), …] to a NumPy matrix with multi-labels.
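A minimal sketch of the multi-label (multi-hot) conversion, assuming label ids are zero-based integers and that num_labels=0 means "infer the width from the largest id seen". Both points are assumptions, and the name to_label_matrix_sketch is hypothetical; it is not the library's implementation.

```python
import numpy as np


def to_label_matrix_sketch(yvs, num_labels=0):
    """Multi-hot matrix: row i has 1.0 in column j iff label id j is in yvs[i]."""
    # Assumption: num_labels == 0 means infer the width from the data.
    if num_labels == 0:
        num_labels = max((max(ids) for ids in yvs if len(ids) > 0), default=-1) + 1
    mat = np.zeros((len(yvs), num_labels), dtype=np.float32)
    for i, ids in enumerate(yvs):
        for j in ids:
            mat[i, j] = 1.0
    return mat
```

For example, to_label_matrix_sketch([(0, 2), (1,), ()]) yields a 3 x 3 matrix whose last row is all zeros, since the third document has no labels.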

class SparseDataset(data, targets)

Bases: object

Custom dataset class for a SciPy sparse matrix.

Parameters:
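Since SparseDataset derives from object rather than a PyTorch class, a dataset over a sparse matrix only needs __len__ and __getitem__. The sketch below assumes each item is a (1 x vocabulary-size sparse row, target) pair and converts the input to CSR for cheap row slicing; the class name SparseDatasetSketch and this item layout are hypothetical, not taken from the library.

```python
import numpy as np
import scipy.sparse as sp


class SparseDatasetSketch:
    """Indexable dataset over the rows of a SciPy sparse matrix (illustrative)."""

    def __init__(self, data, targets):
        # CSR format supports efficient row slicing; COO does not.
        self.data = sp.csr_matrix(data)
        self.targets = targets

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, index):
        # Integer indexing a csr_matrix returns a 1 x n_cols sparse row.
        return self.data[index], self.targets[index]
```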
sparse_coo_to_tensor(coo)

Transform a SciPy COO matrix into a PyTorch sparse tensor.

Parameters: coo (coo_matrix) – the SciPy COO matrix to convert.
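The conversion maps directly onto torch.sparse_coo_tensor: the COO row and column arrays become a 2 x nnz index tensor, and the stored values become the value tensor. This is a minimal sketch; the float32 value dtype and the name sparse_coo_to_tensor_sketch are assumptions, not the library's code.

```python
import numpy as np
import scipy.sparse as sp
import torch


def sparse_coo_to_tensor_sketch(coo):
    """Convert a SciPy sparse matrix to a torch sparse COO tensor (illustrative)."""
    coo = sp.coo_matrix(coo)  # also accepts CSR/CSC input
    # Stack row and column ids into the (2, nnz) int64 index tensor torch expects.
    indices = torch.from_numpy(np.vstack((coo.row, coo.col)).astype(np.int64))
    # Assumption: float32 values, the usual dtype for model inputs.
    values = torch.from_numpy(coo.data.astype(np.float32))
    return torch.sparse_coo_tensor(indices, values, size=coo.shape)
```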
sparse_batch_collate(batch)

Collate function to transform a SciPy COO matrix into a PyTorch sparse tensor.
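A collate function receives a list of dataset items and returns one batch. The sketch below assumes each item is a (1 x V sparse row, integer label) pair, stacks the rows with scipy.sparse.vstack into a B x V COO matrix, and converts that to a torch sparse tensor. The item layout and the name sparse_batch_collate_sketch are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
import torch


def sparse_batch_collate_sketch(batch):
    """Collate (sparse_row, label) pairs into (sparse tensor, label tensor)."""
    rows, labels = zip(*batch)
    # Stack B sparse rows of shape (1, V) into one (B, V) COO matrix.
    stacked = sp.vstack(rows, format="coo")
    indices = torch.from_numpy(np.vstack((stacked.row, stacked.col)).astype(np.int64))
    values = torch.from_numpy(stacked.data.astype(np.float32))
    data = torch.sparse_coo_tensor(indices, values, size=stacked.shape)
    return data, torch.tensor(labels)
```

Such a function is passed as the collate_fn argument of torch.utils.data.DataLoader, so batching stays sparse from the SciPy matrix through to the model input.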

class SparseDataLoader(X, y, shuffle=False, drop_last=False, batch_size=1024, device='cpu')

Bases: DataLoader

Parameters:
load_vocab(vocab_file, encoding='utf-8')

Load a pre-derived vocabulary; assumes a format with a single word on each line. Note: this is a bit of a hack that uses a counter to sort the vocabulary items in the order they are found in the file.
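One way such a counter trick can work: give earlier words strictly larger counts, so that any vocabulary builder which sorts by frequency reproduces file order. The stdlib sketch below illustrates that idea; the name load_vocab_sketch, the (token-to-index dict, Counter) return value, and the assumption of no duplicate lines are all hypothetical and not taken from the library.

```python
from collections import Counter


def load_vocab_sketch(vocab_file, encoding="utf-8"):
    """Read one token per line and index tokens in file order (illustrative)."""
    with open(vocab_file, "r", encoding=encoding) as f:
        words = [line.strip() for line in f if line.strip()]
    n = len(words)
    # Earlier words get larger counts: n, n-1, ..., 1 (assumes no duplicates).
    counter = Counter({w: n - i for i, w in enumerate(words)})
    # most_common() sorts by descending count, which here is file order.
    token_to_idx = {w: i for i, (w, _) in enumerate(counter.most_common())}
    return token_to_idx, counter
```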

class StratifiedDualBatchSampler(y_a, y_b, batch_size, num_batches, shuffle=True)

Bases: object

Stratified batch sampling. Provides equal representation of target classes in each batch.
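The core idea of stratified batch sampling, shown for a single label array: group indices by class, then fill each batch with an equal share of indices from every class. How the dual sampler pairs y_a with y_b is not described here, so this sketch covers only one label sequence. Sampling with replacement (so small classes can fill their quota), the seed parameter, and the name StratifiedBatchSamplerSketch are assumptions.

```python
import random
from collections import defaultdict


class StratifiedBatchSamplerSketch:
    """Yield num_batches index batches with (near-)equal counts per class."""

    def __init__(self, y, batch_size, num_batches, shuffle=True, seed=0):
        self.by_class = defaultdict(list)
        for idx, label in enumerate(y):
            self.by_class[label].append(idx)
        self.classes = sorted(self.by_class)
        self.batch_size = batch_size
        self.num_batches = num_batches
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __len__(self):
        return self.num_batches

    def __iter__(self):
        k = len(self.classes)
        for _ in range(self.num_batches):
            batch = []
            for pos, c in enumerate(self.classes):
                # Split batch_size across k classes; the first
                # batch_size % k classes receive one extra slot.
                n_c = self.batch_size // k + (1 if pos < self.batch_size % k else 0)
                # Sample with replacement so rare classes can fill their share.
                batch.extend(self.rng.choices(self.by_class[c], k=n_c))
            if self.shuffle:
                self.rng.shuffle(batch)  # mix classes within the batch
            yield batch
```

Because it yields lists of indices, an object like this can be passed as the batch_sampler argument of torch.utils.data.DataLoader.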