tmnt.data_loading
This module contains routines for loading text documents into sparse matrix representations for efficient training of neural variational models.
Functions
get_llm(model_name) |
get_llm_dataloader(data, bow_vectorizer, ...) |
get_llm_model(model_name) |
get_llm_paired_dataloader(data_a, data_b, ...) |
get_llm_tokenizer(model_name) |
get_unwrapped_llm_dataloader(data, ...[, ...]) |
load_vocab(vocab_file[, encoding]) | Load a pre-derived vocabulary; assumes a format consisting of a single word on each line.
sparse_batch_collate(batch) | Collate function that transforms a scipy COO matrix into a PyTorch sparse tensor.
sparse_coo_to_tensor(coo) | Transform a scipy COO matrix into a PyTorch sparse tensor.
to_label_matrix(yvs[, num_labels]) | Convert [(id1, id2, …), (id1, id2, …), …] to a NumPy matrix with multi-labels.
Classes
PairedDataLoader(data_loader1, data_loader2) |
RoundRobinDataLoader(data_loaders) |
SingletonWrapperLoader(data_loader) |
SparseDataLoader(X, y[, shuffle, drop_last, ...]) |
SparseDataset(data, targets) | Custom Dataset class for scipy sparse matrices.
StratifiedDualBatchSampler(y_a, y_b, ...[, ...]) | Stratified batch sampling. Provides equal representation of target classes in each batch.
StratifiedPairedLLMLoader(data_a, data_b, ...) |
-
to_label_matrix(yvs, num_labels=0)
Convert [(id1, id2, …), (id1, id2, …), …] to a NumPy matrix with multi-labels.
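The conversion described above can be illustrated with a minimal sketch. This is not the library's implementation: `label_matrix_sketch` is a hypothetical stand-in. It assumes that each entry of `yvs` is a sequence of integer label ids, that the result is a binary (multi-hot) matrix, and that `num_labels=0` means the width is inferred from the largest id seen.

```python
import numpy as np

def label_matrix_sketch(yvs, num_labels=0):
    """Hypothetical stand-in for to_label_matrix: multi-hot encode label ids."""
    # Assumption: num_labels == 0 means "infer the width from the largest id".
    if num_labels <= 0:
        num_labels = max((max(ids) for ids in yvs if len(ids) > 0), default=-1) + 1
    mat = np.zeros((len(yvs), num_labels), dtype=np.float32)
    for row, ids in enumerate(yvs):
        for label_id in ids:
            # Mark every label id attached to this document.
            mat[row, label_id] = 1.0
    return mat

# Three documents: two labels, one label, and no labels.
labels = label_matrix_sketch([(0, 2), (1,), ()])
```

Under these assumptions, each row of `labels` has a 1 in every column whose index appears in the corresponding tuple.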
-
class SparseDataset(data, targets)
Bases: object
Custom Dataset class for scipy sparse matrix
Parameters:
- data (Union[ndarray, coo_matrix, csr_matrix]) –
- targets (Optional[Union[ndarray, coo_matrix, csr_matrix]]) –
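As a rough illustration of how a dataset over a scipy sparse matrix can work, the sketch below defines a hypothetical `SparseDatasetSketch` class. It is not the class documented here. It assumes that items are (row, target) pairs and that COO input is converted to CSR, since COO matrices do not support row indexing.

```python
import numpy as np
import scipy.sparse as sp

class SparseDatasetSketch:
    """Hypothetical map-style dataset over a scipy sparse matrix."""

    def __init__(self, data, targets=None):
        # Assumption: convert sparse input to CSR so single rows can be sliced.
        self.data = data.tocsr() if sp.issparse(data) else data
        self.targets = targets

    def __getitem__(self, index):
        target = None if self.targets is None else self.targets[index]
        return self.data[index], target

    def __len__(self):
        return self.data.shape[0]

# A toy 3-document, 2-term count matrix with one label per document.
dataset = SparseDatasetSketch(
    sp.coo_matrix(np.array([[1, 0], [0, 2], [3, 0]])),
    np.array([0, 1, 0]),
)
row, target = dataset[1]
```

Each item keeps its row in sparse form, so densification can be deferred to a collate step.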
-
sparse_coo_to_tensor(coo)
Transform scipy coo matrix to pytorch sparse tensor
Parameters:
- coo (coo_matrix) –
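One common way to perform this kind of conversion is sketched below. The helper name `coo_to_tensor_sketch` and the float32 value type are assumptions made for illustration. The library's own function may differ in dtype or device handling.

```python
import numpy as np
import scipy.sparse as sp
import torch

def coo_to_tensor_sketch(coo):
    """Hypothetical helper: scipy COO matrix -> torch sparse COO tensor."""
    # torch expects indices as a 2 x nnz integer tensor (row ids, then column ids).
    indices = torch.from_numpy(np.vstack((coo.row, coo.col)).astype(np.int64))
    # Assumption: values are stored as float32.
    values = torch.from_numpy(coo.data.astype(np.float32))
    return torch.sparse_coo_tensor(indices, values, size=coo.shape)

# A toy 2-document, 3-term count matrix.
counts = sp.coo_matrix(np.array([[0, 2, 0], [1, 0, 3]], dtype=np.float32))
tensor = coo_to_tensor_sketch(counts)
dense = tensor.to_dense()
```

Calling `to_dense()` on the result recovers the original array, which is a convenient sanity check on small inputs.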
-
sparse_batch_collate(batch)
Collate function that transforms a scipy COO matrix into a PyTorch sparse tensor.
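A collate function of this kind might look like the following sketch. The batch layout is an assumption: `collate_sketch` is a hypothetical function that expects a list of (row, target) pairs, where each row is a 1 x V scipy sparse matrix and each target is an integer label. The documented function may accept other shapes or target types.

```python
import numpy as np
import scipy.sparse as sp
import torch

def collate_sketch(batch):
    """Hypothetical collate function for (sparse_row, int_target) pairs."""
    rows, targets = zip(*batch)
    # Stack the per-document rows into one sparse batch matrix in COO form.
    stacked = sp.vstack(rows).tocoo()
    indices = torch.from_numpy(np.vstack((stacked.row, stacked.col)).astype(np.int64))
    values = torch.from_numpy(stacked.data.astype(np.float32))
    data = torch.sparse_coo_tensor(indices, values, size=stacked.shape)
    return data, torch.tensor(targets)

# Build a toy batch by slicing rows out of a CSR matrix.
X = sp.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]], dtype=np.float32))
batch = [(X[i], label) for i, label in enumerate([0, 1])]
data, targets = collate_sketch(batch)
```

A function with this shape can be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`, so that batches stay sparse until the model needs them.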
-
class SparseDataLoader(X, y, shuffle=False, drop_last=False, batch_size=1024, device='cpu')
Bases: DataLoader
Parameters:
- X (Union[csr_matrix, coo_matrix]) –
- y (array) –
- drop_last (bool) –
- batch_size (Optional[int]) –