Common Formats

Input Documents

TMNT aims for flexibility to handle data and documents in any one of a number of data formats. However, many of the examples and tutorials use a “JSON list” format consisting of one or more files with each file containing a separate serialized JSON object on each line. Each json object is assumed to have a single field that contains the text of each document. Additional meta-data, if available, is typically contained in other (flat) fields. Of particular note, a label field is required for supervised or semi-supervised models. Below is an example:

{"id": "3664", "text": "This is the text of a document about science.", "label": "science"}
{"id": "3665", "text": "This is the text of a document about politics.", "label": "politics"}
...

Configuration Files

Models and their associated hyperparameters are represented through a simple JSON file that contains various model attributes and sub-attributes. Configuration files are described in more detail here: Configuration/Hyperparameter Options

Model Space Files

Model selection is a first-class feature of TMNT. The space of possible models to consider during model selection is specified using a YAML file format with various conventions described in detail here: model-selection-label