Configuration/Hyperparameter Options
The following configuration/hyperparameter options are available in TMNT:
| Option | Type | Description |
|---|---|---|
| epochs | integer | Number of training epochs (should be fixed to a single value for hyperband) |
| lr | real | Learning rate |
| batch_size | integer | Batch size to use during training |
| latent_distribution | subconfig | Sub-configuration with dist_type chosen from vmf, gaussian or logistic_gaussian; kappa applies to vmf and alpha to logistic_gaussian |
| optimizer | categorical | MXNet optimizer (adam, sgd, etc.) |
| n_latent | integer | Number of latent topics |
| enc_hidden_dim | integer | Number of dimensions for the encoding layer |
| num_enc_layers | integer | Number of fully connected encoder layers |
| enc_dr | real | Dropout rate used for encoder layers |
| coherence_loss_wt | real | Coefficient weighting the coherence loss term |
| redundancy_loss_wt | real | Coefficient weighting the redundancy loss term |
| embedding | subconfig | Sub-configuration with a categorical source and an optional size option when source is random |
Sub-configurations are used to define sub-spaces for the `latent_distribution` and `embedding` configuration options, as illustrated in the sketch below.
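For illustration, a full configuration drawing on the options above might look like the following. This is a hypothetical sketch expressed as a Python dictionary: the values are placeholders, and the concrete schema and serialization format (e.g. YAML) are defined by TMNT itself.

```python
# Hypothetical sketch of a TMNT configuration; values are placeholders
# and the real schema/serialization format is defined by TMNT.
config = {
    "epochs": 40,               # fix to a single value for hyperband
    "lr": 0.005,
    "batch_size": 200,
    "optimizer": "adam",        # any MXNet optimizer name
    "n_latent": 20,             # number of latent topics
    "enc_hidden_dim": 150,
    "num_enc_layers": 1,
    "enc_dr": 0.1,
    "coherence_loss_wt": 0.0,
    "redundancy_loss_wt": 0.0,
    # sub-configurations (detailed below)
    "latent_distribution": {"dist_type": "vmf", "kappa": 64.0},
    "embedding": {"source": "glove:glove.6B.300d"},
}
```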
Some details on these options follow.
Pre-trained Word Embeddings
Word embeddings are used within TMNT to initialize the first fully connected layer of the encoder (this is equivalent to averaging the word embeddings for all in-vocabulary tokens). A pre-trained embedding can be used within a configuration by simply including the GluonNLP registered name for the embedding as the value of the `embedding_source` configuration option. All embedding names have the form `source:name`, where source is the type of embedding. There are four possible sources: `glove`, `fasttext`, `word2vec` and `file`.
So, for example, `glove:glove.42B.300d` refers to GloVe embeddings with 300 dimensions trained on 42 billion tokens. The available GloVe embeddings can be listed via:
```python
>>> import gluonnlp as nlp
>>> nlp.embedding.list_sources('GloVe')
['glove.42B.300d', 'glove.6B.100d', 'glove.6B.200d', 'glove.6B.300d', 'glove.6B.50d', 'glove.840B.300d', 'glove.twitter.27B.100d', 'glove.twitter.27B.200d', 'glove.twitter.27B.25d', 'glove.twitter.27B.50d']
>>> nlp.embedding.list_sources()  # lists ALL sources (glove, word2vec and fasttext)
```
See https://gluon-nlp.mxnet.io/api/modules/embedding.html#gluonnlp.embedding.TokenEmbedding for the other available embeddings.
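A registered embedding name can be sanity-checked before using it in a configuration by loading it directly through GluonNLP (the vectors are downloaded on first use):

```python
>>> import gluonnlp as nlp
>>> emb = nlp.embedding.create('glove', source='glove.6B.50d')
>>> emb['topic'].shape  # one 50-dimensional vector per in-vocabulary token
(50,)
```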
It is also possible to use custom user-pretrained embeddings via the `file` source. These embeddings should be in a compressed `.npz` file as generated by the `train_embeddings.py` script.
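If you want to verify the contents of such a file before training, NumPy can inspect the archive directly. This is just a sketch: `my_embeddings.npz` is a placeholder path, and the array names stored inside are whatever `train_embeddings.py` writes.

```python
>>> import numpy as np
>>> with np.load('my_embeddings.npz') as data:  # placeholder path
...     print(data.files)  # names of the arrays stored in the archive
```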
Latent Distributions
TMNT provides three latent distributions: `gaussian`, `logistic_gaussian` and `vmf` (von Mises-Fisher). After hundreds of experiments across many datasets, we have found that the `vmf` distribution generally works best. Besides providing generally better coherence and perplexity, the `vmf` distribution allows much greater flexibility to trade off coherence for perplexity, or vice-versa. The `logistic_gaussian` distribution, however, tends to work as well as or better than `vmf` with larger numbers of topics (e.g., over 80). The `gaussian` distribution is not recommended under most circumstances and is retained here for comparison.
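As a concrete illustration, the `latent_distribution` sub-configuration from the table above might take one of the following forms, one per distribution type. This is a sketch: the `kappa` and `alpha` values are placeholders, not tuned recommendations.

```python
# Illustrative latent_distribution sub-configurations; parameter
# values are placeholders, not recommendations.
vmf_dist   = {"dist_type": "vmf", "kappa": 64.0}               # kappa: vMF concentration
lg_dist    = {"dist_type": "logistic_gaussian", "alpha": 1.0}  # alpha: prior parameter
gauss_dist = {"dist_type": "gaussian"}                         # no extra parameters
```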