Configuration is divided into fine-grained, reusable modules:
- `base`: basic configuration
- `logger`: logger settings
- `model_manager`: loading and saving model parameters
- `accelerator`: whether to enable multi-GPU
- `dataset`: dataset management
- `evaluator`: evaluation and metric settings
- `tokenizer`: tokenizer initialization and tokenization settings
- `optimizer`: optimizer initialization settings
- `scheduler`: scheduler initialization settings
- `model`: model construction settings
From Sec. 2 to Sec. 11, we describe the configuration in detail. Alternatively, see Examples for a quick start.
NOTE: `_*_` config fields (for example `_model_target_`) are reserved fields in OpenSLU.
OpenSLU configuration supports a simple calculation script in each configuration item. For example, `{dataset.dataset_name}` gets the value of `dataset_name` and fills it into the Python script `'LightChen2333/agif-slu-' + '*'`, where `*` stands for that value. Without the surrounding quotes, the value of `{dataset.dataset_name}` is treated as a variable.
NOTE: every item containing `{}` is treated as a Python script.
tokenizer:
  _from_pretrained_: "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'" # Support simple calculation script
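The substitution rule above can be illustrated with a minimal Python sketch. It assumes two steps:

- each `{a.b.c}` placeholder is replaced by the string form of the value at that dotted path;
- the resulting string is then evaluated as a Python expression.

`lookup` and `resolve_item` are hypothetical helpers written for illustration. They are not OpenSLU's actual config parser.

```python
import re
from typing import Any, Dict

# Matches placeholders such as {dataset.dataset_name}
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")

def lookup(config: Dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'dataset.dataset_name' through nested dicts."""
    node: Any = config
    for key in dotted.split("."):
        node = node[key]
    return node

def resolve_item(config: Dict[str, Any], value: Any) -> Any:
    """Hypothetical resolver: substitute {a.b} placeholders, then evaluate the script."""
    if not isinstance(value, str) or not PLACEHOLDER.search(value):
        return value  # items without {} are returned unchanged
    script = PLACEHOLDER.sub(lambda m: str(lookup(config, m.group(1))), value)
    # Evaluate with no builtins available; a sketch, not a hardened sandbox.
    return eval(script, {"__builtins__": {}}, {})

config = {"dataset": {"dataset_name": "atis"}, "base": {"multi_intent": False}}
print(resolve_item(config, "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'"))
# LightChen2333/agif-slu-atis
```

Dropping the quotes around the placeholder yields `'LightChen2333/agif-slu-' + atis`. Here `eval` treats `atis` as a variable name and raises `NameError`. Under the same assumptions, `use_multi: "{base.multi_intent}"` in the `model` section resolves to the boolean `False`.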
# `start_time` is generated automatically when any config script starts; it does not need to be assigned.
# start_time: xxxxxxxx
base:
  name: "OpenSLU" # project/logger name
  multi_intent: false # whether to enable the multi-intent setting
  train: True # train if true; otherwise run zero-shot
  test: True # enable testing during training
  device: cuda # device: cuda/cpu
  seed: 42 # random seed
  best_key: EMA # metric used to select the saved model [intent_acc/slot_f1/EMA]
  tokenizer_name: word_tokenizer # word_tokenizer when no pretrained model is used, otherwise an [AutoTokenizer] tokenizer name
  add_special_tokens: false # whether to add [CLS], [SEP] special tokens
  epoch_num: 300 # number of training epochs
  # eval_step: 280 # if eval_by_epoch = false and eval_step > 0, the model is evaluated every eval_step steps
  eval_by_epoch: true # evaluate the model after each epoch
  batch_size: 16 # batch size
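The comments on `eval_by_epoch` and `eval_step` describe a small decision rule, sketched below. `should_evaluate` is a hypothetical helper that assumes steps are counted from 1. The source does not say what happens when `eval_by_epoch` is false and `eval_step` is unset; returning `False` in that case is an assumption.

```python
from typing import Optional

def should_evaluate(step: int, end_of_epoch: bool,
                    eval_by_epoch: bool, eval_step: Optional[int] = None) -> bool:
    """Hypothetical helper: decide whether to run evaluation at this point."""
    if eval_by_epoch:
        # Epoch mode: evaluate once at the end of every epoch.
        return end_of_epoch
    if eval_step is not None and eval_step > 0:
        # Step mode: evaluate every `eval_step` steps.
        return step % eval_step == 0
    # Assumption: no evaluation if neither mode is configured.
    return False

# With eval_by_epoch: false and eval_step: 280, step 560 triggers an evaluation.
print(should_evaluate(560, end_of_epoch=False, eval_by_epoch=False, eval_step=280))  # True
```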
logger:
  # `wandb` is supported in both single- and multi-GPU mode,
  # `tensorboard` is only supported in multi-GPU mode,
  # and `fitlog` is only supported in single-GPU mode.
  logger_type: wandb
model_manager:
  # if load_dir != `null`, OpenSLU tries to load the checkpoint and continue training;
  # if load_dir == `null`, OpenSLU starts training from scratch.
  load_dir: null
  # directory path for saving the model and training state.
  # if save_dir == `null`, the model is saved to `save/{start_time}`
  save_dir: save/stack
  # save_mode can be selected from [save-by-step, save-by-eval]
  # `save-by-step` saves the model every {save_step} steps without evaluation.
  # `save-by-eval` saves the model with the best validation performance.
  save_mode: save-by-eval
  # save_step: 100 # only used when save_mode == `save-by-step`
  max_save_num: 1 # number of best models to keep
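Under `save-by-eval`, `max_save_num` amounts to keeping the top-k checkpoints by validation score. The sketch below is a hypothetical illustration and also shows the `save/{start_time}` fallback for `save_dir`. Three things in it are assumptions, not taken from OpenSLU:

- the class name `BestCheckpointKeeper`;
- that a higher score is better;
- that ties keep the older checkpoint.

```python
import heapq
import os
from typing import List, Optional, Tuple

def resolve_save_dir(save_dir: Optional[str], start_time: str) -> str:
    """save_dir == null falls back to save/{start_time}."""
    return save_dir if save_dir is not None else os.path.join("save", start_time)

class BestCheckpointKeeper:
    """Hypothetical top-k tracker for save-by-eval (higher score assumed better)."""

    def __init__(self, max_save_num: int = 1):
        self.max_save_num = max_save_num
        # Min-heap of (score, arrival order, name): the worst kept checkpoint is on top.
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = 0

    def offer(self, score: float, name: str) -> Tuple[bool, Optional[str]]:
        """Return (should_save, checkpoint_to_delete) for a new validation score."""
        self._counter += 1
        entry = (score, self._counter, name)
        if len(self._heap) < self.max_save_num:
            heapq.heappush(self._heap, entry)
            return True, None
        if score > self._heap[0][0]:
            evicted = heapq.heapreplace(self._heap, entry)  # drop the current worst
            return True, evicted[2]
        return False, None

    def best(self) -> str:
        return max(self._heap, key=lambda e: e[0])[2]

keeper = BestCheckpointKeeper(max_save_num=1)
for epoch, ema in enumerate([0.80, 0.85, 0.83], start=1):
    print(epoch, *keeper.offer(ema, f"epoch-{epoch}"))
# 1 True None
# 2 True epoch-1
# 3 False None
```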
accelerator:
  use_accelerator: false # will enable `accelerator` if use_accelerator is `true`
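The logger support rules from the `logger` section can be encoded as a compatibility check. This is a hypothetical validation helper, not something OpenSLU provides. It assumes that `use_accelerator: true` is what puts OpenSLU into multi-GPU mode, as the module list describes `accelerator`.

```python
# Backend support per GPU mode, transcribed from the comments in the `logger` section.
LOGGER_SUPPORT = {
    "wandb": {"single", "multi"},
    "tensorboard": {"multi"},
    "fitlog": {"single"},
}

def check_logger(logger_type: str, use_accelerator: bool) -> None:
    """Hypothetical check: raise ValueError if the logger does not fit the GPU mode."""
    mode = "multi" if use_accelerator else "single"  # assumption: accelerator == multi-GPU
    supported = LOGGER_SUPPORT.get(logger_type)
    if supported is None:
        raise ValueError(f"unknown logger_type: {logger_type!r}")
    if mode not in supported:
        raise ValueError(f"{logger_type} is not supported in {mode}-GPU mode")

check_logger("wandb", use_accelerator=False)  # the defaults shown above pass
```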
dataset:
  # supports loading datasets from Hugging Face.
  # dataset_name can be selected from [atis, snips, mix-atis, mix-snips]
  dataset_name: atis
  # you can assign a path for any individual split; unassigned splits use the corresponding split of `dataset_name`
  # train: atis # supports loading from Hugging Face or from an assigned local data path.
  # validation: {root}/ATIS/dev.jsonl
  # test: {root}/ATIS/test.jsonl
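The split fallback rule (assign a path for any split; the rest come from `dataset_name`) can be sketched as follows. `resolve_splits` is a hypothetical helper, and the returned mapping format is an assumption. OpenSLU's dataset manager may represent sources differently.

```python
from typing import Dict, Optional

SPLITS = ("train", "validation", "test")

def resolve_splits(dataset_name: str, **overrides: Optional[str]) -> Dict[str, str]:
    """Hypothetical: each split uses its override if given, else the named dataset."""
    unknown = set(overrides) - set(SPLITS)
    if unknown:
        raise ValueError(f"unknown split(s): {sorted(unknown)}")
    return {split: overrides.get(split) or dataset_name for split in SPLITS}

# Only the test split is overridden with a local file (the path is illustrative).
print(resolve_splits("atis", test="data/ATIS/test.jsonl"))
# {'train': 'atis', 'validation': 'atis', 'test': 'data/ATIS/test.jsonl'}
```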
evaluator:
  best_key: EMA # the metric used to judge the best model
  eval_by_epoch: true # evaluate after each epoch if `true`.
  # evaluate every {eval_step} steps if eval_by_epoch == `false`.
  # eval_step: 1800
  # supported metrics:
  # - intent_acc
  # - slot_f1
  # - EMA
  # - intent_f1
  # - macro_intent_f1
  # - micro_intent_f1
  # NOTE: [intent_f1, macro_intent_f1, micro_intent_f1] are only supported in the multi-intent setting. intent_f1 and macro_intent_f1 are the same metric.
  metric:
    - intent_acc
    - slot_f1
    - EMA
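For reference, the single-intent metrics can be sketched in plain Python. The sketch makes two assumptions:

- `EMA` means exact-match accuracy: an utterance counts only if its intent and every slot label are correct.
- `slot_f1` is span-level micro F1 over BIO tags, in the spirit of the CoNLL evaluation script.

OpenSLU's own metric code may differ in details, such as how an `I-` tag without a preceding `B-` is handled.

```python
from typing import Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, slot_type)

def bio_spans(tags: Sequence[str]) -> Set[Span]:
    """Extract labelled spans from BIO tags; a stray I- tag starts a new span."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last span
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    return spans

def intent_acc(gold: Sequence[str], pred: Sequence[str]) -> float:
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def slot_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Micro F1 over exact span matches, pooled across all utterances."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_spans(g), bio_spans(p)
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def ema(gold_intents, pred_intents, gold_slots, pred_slots) -> float:
    """Assumed exact-match accuracy: intent and full slot sequence must both match."""
    hits = sum(
        gi == pi and list(gs) == list(ps)
        for gi, pi, gs, ps in zip(gold_intents, pred_intents, gold_slots, pred_slots)
    )
    return hits / len(gold_intents)

gold_i, pred_i = ["flight", "airfare"], ["flight", "flight"]
gold_s = [["O", "B-city", "I-city"], ["O", "B-date"]]
pred_s = [["O", "B-city", "I-city"], ["O", "O"]]
print(intent_acc(gold_i, pred_i), round(slot_f1(gold_s, pred_s), 3), ema(gold_i, pred_i, gold_s, pred_s))
# 0.5 0.667 0.5
```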
tokenizer:
  # tokenizer initialization: supports `word_tokenizer` and other Hugging Face tokenizers.
  _tokenizer_name_: word_tokenizer
  # if `_tokenizer_name_` is not assigned, you can load a pretrained tokenizer from Hugging Face.
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis
  _padding_side_: right # padding side of the tokenizer, supports [left/ right]
  # alignment mode between text and slots, supports [fast/ general]:
  # `general` works with most tokenizers, `fast` only with a small portion of tokenizers.
  _align_mode_: fast
  _to_lower_case_: true
  add_special_tokens: false # other tokenizer args: any arg not in `_*_` format is passed to tokenizer initialization
  max_length: 512
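A subword tokenizer may split one word into several tokens, so word-level slot labels have to be mapped onto tokens. The sketch below shows one common strategy, together with left/right padding:

- the first subword of each word gets the word's label;
- continuation subwords and special tokens get `ignore_index` (-100, as in the `model` section).

It takes a precomputed `word_ids` list, similar to what Hugging Face fast tokenizers return from `word_ids()`. This is a generic illustration, not OpenSLU's `fast` or `general` implementation, and the helper names are hypothetical.

```python
from typing import List, Optional, Sequence

IGNORE_INDEX = -100  # same value as ignore_index in the classifier configs

def align_labels(word_ids: Sequence[Optional[int]], word_labels: Sequence[int]) -> List[int]:
    """Hypothetical: map word-level label ids onto tokens, labelling only first subwords."""
    aligned: List[int] = []
    previous: Optional[int] = None
    for wid in word_ids:
        if wid is None or wid == previous:
            aligned.append(IGNORE_INDEX)  # special token or continuation subword
        else:
            aligned.append(word_labels[wid])
        previous = wid
    return aligned

def pad(seq: Sequence[int], max_length: int, pad_value: int, side: str = "right") -> List[int]:
    """Hypothetical: truncate to max_length, then pad on the configured side."""
    seq = list(seq)[:max_length]
    padding = [pad_value] * (max_length - len(seq))
    if side == "right":
        return seq + padding
    if side == "left":
        return padding + seq
    raise ValueError(f"unsupported padding side: {side!r}")

# Hypothetical word_ids: [CLS], word 0, word 1 split into two subwords, [SEP].
labels = align_labels([None, 0, 1, 1, None], word_labels=[7, 8])
print(labels)  # [-100, 7, 8, -100, -100]
```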
optimizer:
  _model_target_: torch.optim.Adam # optimizer class, or a function returning an Optimizer object
  _model_partial_: true # partial configuration: model.parameters() is added later to complete the Optimizer arguments
  lr: 0.001 # learning rate
  weight_decay: 1e-6 # weight decay
scheduler:
  _model_target_: transformers.get_scheduler
  _model_partial_: true # partial configuration: optimizer and num_training_steps are added later to complete the scheduler arguments
  name: "linear"
  num_warmup_steps: 0
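The `_model_target_` / `_model_partial_` pattern can be sketched with `importlib` and `functools.partial`:

- the dotted path is imported;
- reserved `_*_` keys are stripped;
- the remaining keys become keyword arguments.

With `_model_partial_: true`, the call is deferred so that the missing arguments can be supplied later. For the optimizer that is `model.parameters()`; for the scheduler it is `optimizer` and `num_training_steps`. `instantiate` is a hypothetical helper, not OpenSLU's loader, and it does not recurse into nested modules.

```python
import functools
import importlib
from typing import Any, Dict

def locate(path: str) -> Any:
    """Import 'package.module.attr' and return attr."""
    module_name, _, attr = path.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)

def instantiate(cfg: Dict[str, Any]) -> Any:
    """Hypothetical loader for one config section using _model_target_/_model_partial_."""
    target = locate(cfg["_model_target_"])
    # Reserved `_*_` keys configure the loader itself and are not forwarded.
    kwargs = {k: v for k, v in cfg.items() if not (k.startswith("_") and k.endswith("_"))}
    if cfg.get("_model_partial_", False):
        return functools.partial(target, **kwargs)  # remaining args supplied later
    return target(**kwargs)

# Stdlib stand-in for the optimizer case: bind some arguments now, the rest later.
make_delta = instantiate({"_model_target_": "datetime.timedelta", "_model_partial_": True, "minutes": 5})
print(make_delta(hours=1))  # 1:05:00

# With PyTorch and transformers installed, the sections above would be used roughly as:
#   make_optimizer = instantiate(optimizer_cfg)  # partial(torch.optim.Adam, lr=..., weight_decay=...)
#   optimizer = make_optimizer(model.parameters())
#   make_scheduler = instantiate(scheduler_cfg)  # partial(transformers.get_scheduler, name="linear", ...)
#   scheduler = make_scheduler(optimizer=optimizer, num_training_steps=total_steps)
```

The scheduler arguments are passed by keyword in the last line. `name` is already bound as the first parameter of `get_scheduler`, so passing the optimizer positionally would clash with it.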
model:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis # load the model from Hugging Face; no parameters below need to be assigned.
  _model_target_: model.OpenSLUModel # the general model class, which builds the model automatically from the configuration.
  encoder:
    _model_target_: model.encoder.AutoEncoder # automatically loads the provided encoder model
    encoder_name: self-attention-lstm # supports [lstm/ self-attention-lstm] and other pretrained models supported by Hugging Face
    embedding: # word embedding layer
      # load_embedding_name: glove.6B.300d.txt # supports autoloading GloVe embeddings.
      embedding_dim: 256 # embedding dim
      dropout_rate: 0.5 # dropout ratio after embedding
    lstm:
      layer_num: 1 # lstm configuration
      bidirectional: true
      output_dim: 256 # each module should set output_dim so that input_dim of the next module is autoloaded. You can also set input_dim manually.
      dropout_rate: 0.5
    attention: # self-attention configuration
      hidden_dim: 1024
      output_dim: 128
      dropout_rate: 0.5
    return_with_input: true # pass input information, such as attention_mask, to the decoder module.
    return_sentence_level_hidden: false # whether to return the sentence representation to the decoder module
  decoder:
    _model_target_: model.decoder.StackPropagationDecoder # decoder name
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction # interaction module name
      differentiable: false # interaction module config
    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier # intent classifier module name
      layer_num: 1
      bidirectional: false
      hidden_dim: 64
      force_ratio: 0.9 # teacher-forcing ratio
      embedding_dim: 8 # intent embedding dim
      ignore_index: -100 # index ignored when computing loss and metrics
      dropout_rate: 0.5
      mode: "token-level-intent" # decode mode, supports [token-level-intent, intent, slot]
      use_multi: "{base.multi_intent}"
      return_sentence_level: true # whether to return the sentence-level prediction as decoded input
    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      layer_num: 1
      bidirectional: false
      force_ratio: 0.9
      hidden_dim: 64
      embedding_dim: 32
      ignore_index: -100
      dropout_rate: 0.5
      mode: "slot"
      use_multi: false
      return_sentence_level: false
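The `output_dim` comment in the `lstm` block describes a chaining rule: a module that does not set `input_dim` takes it from the previous module's output size. Below is a minimal sketch of that rule over an ordered list of module configs. Three parts of it are assumptions for illustration:

- treating `embedding_dim` as the embedding layer's output size;
- the purely sequential chain, a simplification (the `self-attention-lstm` encoder may combine its sub-modules differently);
- the helper name `fill_input_dims`.

```python
from typing import Any, Dict, List, Optional

def fill_input_dims(modules: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical: copy each module's output size into the next module's input_dim."""
    filled: List[Dict[str, Any]] = []
    previous_out: Optional[int] = None
    for module in modules:
        cfg = dict(module)  # do not mutate the caller's config
        if "input_dim" not in cfg and previous_out is not None:
            cfg["input_dim"] = previous_out  # a manually set input_dim always wins
        # Assumption: output_dim, else embedding_dim, is this module's output size.
        previous_out = cfg.get("output_dim", cfg.get("embedding_dim", previous_out))
        filled.append(cfg)
    return filled

chain = fill_input_dims([
    {"name": "embedding", "embedding_dim": 256},
    {"name": "lstm", "output_dim": 256},
    {"name": "classifier", "input_dim": 64},  # set manually, so it is kept
])
print([m.get("input_dim") for m in chain])  # [None, 256, 64]
```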
Here we take DCA-Net as an example. In most cases, you only need to rewrite the `Interaction` module:
from common.utils import HiddenData
from model.decoder.interaction import BaseInteraction

class DCANetInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)  # BaseInteraction keeps the keyword config in self.config
        # I_S_Block is defined elsewhere in the DCA-Net implementation (not shown here)
        self.T_block1 = I_S_Block(self.config["output_dim"], self.config["attention_dropout"], self.config["num_attention_heads"])
        ...

    def forward(self, encode_hidden: HiddenData, **kwargs):
        ...
Then configure your module:
base:
  ...
optimizer:
  ...
scheduler:
  ...
model:
  _model_target_: model.OpenSLUModel
  encoder:
    _model_target_: model.encoder.AutoEncoder
    encoder_name: lstm
    embedding:
      load_embedding_name: glove.6B.300d.txt
      embedding_dim: 300
      dropout_rate: 0.5
    lstm:
      dropout_rate: 0.5
      output_dim: 128
      layer_num: 2
      bidirectional: true
    output_dim: "{model.encoder.lstm.output_dim}"
    return_with_input: true
    return_sentence_level_hidden: false
  decoder:
    _model_target_: model.decoder.DCANetDecoder
    interaction:
      _model_target_: model.decoder.interaction.DCANetInteraction
      output_dim: "{model.encoder.output_dim}"
      attention_dropout: 0.5
      num_attention_heads: 8
    intent_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "intent"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100
    slot_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "slot"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100
Done: the model construction is complete. You can train the model with the following script:
python run.py -cp config/dca_net.yaml [-ds atis]
Sometimes the default order, interaction followed by classification, does not meet your needs. In that case, simply rewrite the decoder to get a flexible interaction order.
Here, we take stack-propagation as an example:
- Rewrite the interaction module for stack-propagation:
from common.utils import ClassifierOutputData, HiddenData
from model.decoder.interaction.base_interaction import BaseInteraction

class StackInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        ...

    def forward(self, intent_output: ClassifierOutputData, encode_hidden: HiddenData):
        ...
- Rewrite `StackPropagationDecoder` for the stack-propagation interaction order:
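from common.utils import HiddenData, OutputData
from model.decoder.base_decoder import BaseDecoder

class StackPropagationDecoder(BaseDecoder):
    def forward(self, hidden: HiddenData):
        # 1. predict intents from the encoder hidden states
        pred_intent = self.intent_classifier(hidden)
        # 2. feed the intent prediction into the hidden states
        hidden = self.interaction(pred_intent, hidden)
        # 3. predict slots from the intent-aware hidden states
        pred_slot = self.slot_classifier(hidden)
        return OutputData(pred_intent, pred_slot)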
- Then assemble the general model with the `config/stack-propagation.yaml` configuration file:
base:
  ...
...
model:
  _model_target_: model.OpenSLUModel
  encoder:
    ...
  decoder:
    _model_target_: model.decoder.StackPropagationDecoder
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction
      differentiable: false
    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__(*)
      mode: "token-level-intent"
      use_multi: false
      return_sentence_level: true
    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # parameters needed by __init__(*)
      mode: "slot"
      use_multi: false
      return_sentence_level: false
- Run the following script to train the model:
python run.py -cp config/stack-propagation.yaml
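To see how the two decoding orders differ, the sketch below uses toy stand-ins that only record the order in which they are called. Everything here is hypothetical: plain functions replace the OpenSLU classifier and interaction modules, and strings replace `HiddenData`. `stack_propagation` mirrors the body of `StackPropagationDecoder.forward`.

```python
from typing import Callable, List, Tuple

def make_stage(name: str, log: List[str]) -> Callable[..., str]:
    """Return a stand-in module that logs its name and wraps its inputs in a string."""
    def stage(*inputs: str) -> str:
        log.append(name)
        return f"{name}({', '.join(inputs)})"
    return stage

def interaction_then_classify(hidden: str, log: List[str]) -> Tuple[str, str]:
    # Default order: interaction first, then both classifiers read its output.
    hidden = make_stage("interaction", log)(hidden)
    return make_stage("intent", log)(hidden), make_stage("slot", log)(hidden)

def stack_propagation(hidden: str, log: List[str]) -> Tuple[str, str]:
    # Stack-propagation order: intent -> interaction -> slot.
    pred_intent = make_stage("intent", log)(hidden)
    hidden = make_stage("interaction", log)(pred_intent, hidden)
    return pred_intent, make_stage("slot", log)(hidden)

log: List[str] = []
print(stack_propagation("h", log))  # ('intent(h)', 'slot(interaction(intent(h), h))')
print(log)                          # ['intent', 'interaction', 'slot']
```

The nested strings make the data flow visible: in stack-propagation, the slot classifier sees the intent prediction, while in the default order both classifiers see the same interacted hidden states.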