The configuration files
Users interact with daart through a set of configuration (yaml) files. These files point to the data directories, define the type of model to fit, and specify a wide range of hyperparameters.
An example set of configuration files can be found here. When training a model on a new dataset, copy these templates to your local machine and update the arguments to match your data.
There are three configuration files:
data: where data is stored and model input type
model: model class and various network hyperparameters
train: training epochs, batch size, etc.
The sections below describe the most important parameters in each file; see the example configs for all possible options.
Data
input_type: name of directory containing input data: ‘markers’ | ‘features’ | …
expt_ids: list of experiment ids used for training the model
ignore_class: index of the column in the hand/heuristic label files that should be ignored when computing the loss function. A 1 in this column means “this frame has not been scored”; if every frame has been scored, set this to a negative value such as -100.
weight_classes: false to weight each class equally in loss function; true to weight each class inversely proportional to its frequency
data_dir: absolute path to directory that contains the data
results_dir: absolute path to directory that stores model fitting results
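The interaction between ignore_class and weight_classes can be sketched as follows. This is a minimal illustration, not daart's implementation: the function name class_weights is hypothetical, labels are assumed to be given as per-frame class indices (the position of the 1 in each row of the label file), and the weight for each class is assumed to be the total number of scored frames divided by that class's count.

```python
from collections import Counter


def class_weights(labels, n_classes, ignore_class=0, weight_classes=True):
    """Return one loss weight per class index (hypothetical helper).

    labels: per-frame class indices; frames equal to ignore_class are
    treated as unscored and contribute nothing to the loss.
    """
    if not weight_classes:
        # Equal weighting: every scored class gets 1, the ignored class gets 0.
        return [0.0 if c == ignore_class else 1.0 for c in range(n_classes)]

    # Count only scored frames.
    counts = Counter(label for label in labels if label != ignore_class)
    total = sum(counts.values())

    weights = []
    for c in range(n_classes):
        if c == ignore_class or counts[c] == 0:
            weights.append(0.0)
        else:
            # Inverse frequency: rare classes receive larger weights.
            weights.append(total / counts[c])
    return weights


# Column 0 marks unscored frames; classes 1 and 2 are real behaviors.
labels = [0, 0, 1, 1, 1, 2]
weights = class_weights(labels, n_classes=3, ignore_class=0)

# If every frame is scored, a negative ignore_class matches no column.
weights_all_scored = class_weights(labels, n_classes=3, ignore_class=-100)
```

With ignore_class set to a negative value, no class index matches it, so every column is counted and weighted.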
Model
lambda_weak: weight on heuristic/pseudo label classification loss
lambda_strong: weight on hand label classification loss (can always leave this as 1)
lambda_recon: weight on input reconstruction loss
lambda_pred: weight on next-step-ahead prediction loss
So, for example, to fit a fully supervised classification model, set lambda_strong: 1 and
all other “lambda” options to 0.
To fit a model that uses heuristic labels, set lambda_strong: 1, lambda_weak: 1, and
all other “lambda” options to 0. You can try several values of lambda_weak to see what works
best for your data.
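One way to read the “lambda” options is as coefficients in a weighted sum of the individual loss terms. The sketch below assumes exactly that; the function total_loss, the dictionary keys, and the loss values are hypothetical and chosen only for illustration, and the real model may combine its losses differently.

```python
def total_loss(losses, lambdas):
    """Weighted sum of loss terms; a missing lambda is treated as 0."""
    return sum(lambdas.get(name, 0.0) * value for name, value in losses.items())


# Made-up loss values for a single batch.
losses = {"strong": 0.5, "weak": 0.8, "recon": 0.25, "pred": 0.4}

# Fully supervised: only the hand-label classification loss contributes.
supervised = {"strong": 1, "weak": 0, "recon": 0, "pred": 0}

# Heuristic labels added: hand-label and heuristic-label losses contribute.
heuristic = {"strong": 1, "weak": 1, "recon": 0, "pred": 0}

loss_supervised = total_loss(losses, supervised)
loss_heuristic = total_loss(losses, heuristic)
```

Under this reading, setting a lambda to 0 removes that term from the objective entirely, which is why the fully supervised case needs only lambda_strong to be nonzero.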
Train
min_epochs/max_epochs: minimum and maximum number of training epochs; together these control the length of training
enable_early_stop: exit training early if validation loss begins to increase
trial_splits: fraction of data to use for train;val;test;gap; you can always set “gap” to 0 as long as you validate your model on completely held-out videos
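The trial_splits value can be pictured as four relative proportions in the order train;val;test;gap. The sketch below assumes the value is written as a semicolon-separated string such as "8;1;1;0"; both that string form and the helper parse_trial_splits are assumptions made for illustration, so check the example configs for the exact format daart expects.

```python
def parse_trial_splits(spec):
    """Convert a 'train;val;test;gap' string into normalized fractions."""
    names = ("train", "val", "test", "gap")
    parts = [float(p) for p in spec.split(";")]
    if len(parts) != len(names):
        raise ValueError("expected four values: train;val;test;gap")
    total = sum(parts)
    if total <= 0:
        raise ValueError("split values must sum to a positive number")
    # Normalize so the four fractions sum to 1.
    return {name: part / total for name, part in zip(names, parts)}


# 80% train, 10% validation, 10% test, and no gap between splits.
splits = parse_trial_splits("8;1;1;0")
```

A gap of 0 here corresponds to the case described above, where the model is validated on completely held-out videos.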