Configuration

1. Introduction

Configuration is divided into fine-grained reusable modules:

  • base: basic configuration
  • logger: logger settings
  • model_manager: loading and saving model parameters
  • accelerator: whether to enable multi-GPU
  • dataset: dataset management
  • evaluator: evaluation and metrics settings
  • tokenizer: tokenizer initialization and tokenization settings
  • optimizer: optimizer initialization settings
  • scheduler: scheduler initialization settings
  • model: model construction settings

Sec. 2 through Sec. 11 describe each part of the configuration in detail. For a quick start, see Examples.

NOTE: configuration keys of the form _*_ are reserved fields in OpenSLU.

Configuration Item Script

OpenSLU supports a simple calculation script in each configuration item. For example, {dataset.dataset_name} is replaced by the value of dataset_name, and that value is filled into the Python script 'LightChen2333/agif-slu-' + '{dataset.dataset_name}'. (Without the surrounding '', the value of {dataset.dataset_name} would be treated as a variable name rather than a string.)

NOTE: each item containing {} is treated as a Python script.

tokenizer:
  _from_pretrained_: "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'"  # Support simple calculation script
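
The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not OpenSLU's actual implementation: the helper names `lookup` and `resolve_item` are hypothetical, and the sketch assumes a placeholder is a dotted key path that is replaced textually before the whole string is evaluated.

```python
import re

# Hypothetical helpers for illustration; OpenSLU's real resolver may differ.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][\w.]*)\}")


def lookup(config, dotted_key):
    """Follow a dotted path such as 'dataset.dataset_name' through nested dicts."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def resolve_item(config, value):
    """Substitute {a.b} placeholders, then evaluate the result as a Python expression."""
    if not isinstance(value, str) or not PLACEHOLDER.search(value):
        return value  # items without {} are passed through untouched
    script = PLACEHOLDER.sub(lambda m: str(lookup(config, m.group(1))), value)
    # eval is only acceptable for trusted config files; builtins are disabled here.
    return eval(script, {"__builtins__": {}}, {})


config = {
    "dataset": {"dataset_name": "atis"},
    "base": {"multi_intent": False},
    "tokenizer": {
        "_from_pretrained_": "'LightChen2333/agif-slu-' + '{dataset.dataset_name}'"
    },
}

print(resolve_item(config, config["tokenizer"]["_from_pretrained_"]))
print(resolve_item(config, "{base.multi_intent}"))
```

The quotes around the placeholder matter in this sketch for the same reason as in the text: after substitution, `'atis'` is a string literal, whereas a bare `atis` would be looked up as a variable.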

base Config

# `start_time` is generated automatically when any config script starts; it does not need to be assigned.
# start_time: xxxxxxxx               
base:
  name: "OpenSLU"                  # project/logger name
  multi_intent: false              # whether to enable multi-intent setting
  train: True                      # enable training; otherwise run zero-shot
  test: True                       # enable testing during training
  device: cuda                     # device to use [cuda/cpu]
  seed: 42                         # random seed
  best_key: EMA                    # metric used to select the model to save [intent_acc/slot_f1/EMA]
  tokenizer_name: word_tokenizer   # `word_tokenizer` when no pretrained model is used; otherwise an [AutoTokenizer] tokenizer name
  add_special_tokens: false        # whether to add the [CLS] and [SEP] special tokens
  epoch_num: 300                   # number of training epochs
#  eval_step: 280                  # if eval_by_epoch is false and eval_step > 0, evaluate the model every eval_step steps
  eval_by_epoch: true              # evaluate model by epoch
  batch_size: 16                   # batch size
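
How `eval_by_epoch` and `eval_step` interact can be sketched as a small predicate. This is an illustrative assumption about the schedule, not code taken from OpenSLU; the function name `should_evaluate` and the `steps_per_epoch` argument are hypothetical.

```python
def should_evaluate(step, steps_per_epoch, eval_by_epoch=True, eval_step=0):
    """Decide whether to evaluate after finishing 1-based training step `step`.

    Hypothetical helper: evaluates at epoch boundaries when eval_by_epoch is
    true, otherwise every `eval_step` steps (and never if eval_step <= 0).
    """
    if eval_by_epoch:
        return step % steps_per_epoch == 0  # last step of an epoch
    return eval_step > 0 and step % eval_step == 0


# With eval_by_epoch: false and eval_step: 280, evaluation runs at steps 280, 560, ...
print([s for s in range(1, 601) if should_evaluate(s, 300, False, 280)])
```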

logger Config

logger:
  # `wandb` is supported in both single- and multi-GPU settings,
  # `tensorboard` is only supported in multi-GPU settings,
  # and `fitlog` is only supported in single-GPU settings.
  logger_type: wandb 

model_manager Config

model_manager:
  # if load_dir != `null`, OpenSLU tries to load the checkpoint and continue training;
  # if load_dir == `null`, OpenSLU starts training from scratch.
  load_dir: null
  # The directory in which to save the model and training state.
  # if save_dir == `null`, the model is saved to `save/{start_time}`
  save_dir: save/stack
  # save_mode is one of [save-by-step, save-by-eval]
  # `save-by-step` saves the model every {save_step} steps, without evaluation.
  # `save-by-eval` saves the model with the best validation performance.
  save_mode: save-by-eval 
  # save_step: 100         # only used when save_mode == `save-by-step`
  max_save_num: 1          # the number of best models to keep
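
The combination of `save-by-eval` and `max_save_num` amounts to keeping only the top-scoring checkpoints. The sketch below shows one way to track this with a min-heap; the class `BestCheckpointTracker` is hypothetical and is not part of OpenSLU, and it only does the bookkeeping (writing and deleting files is left to the caller).

```python
import heapq


class BestCheckpointTracker:
    """Keeps the `max_save_num` highest-scoring checkpoints (hypothetical helper)."""

    def __init__(self, max_save_num=1):
        self.max_save_num = max_save_num
        self._heap = []  # min-heap of (score, path); the worst kept entry is at index 0

    def update(self, score, path):
        """Return (saved, evicted_path); `saved` tells the caller whether to write `path`."""
        if len(self._heap) < self.max_save_num:
            heapq.heappush(self._heap, (score, path))
            return True, None
        if score <= self._heap[0][0]:
            return False, None  # not better than the worst kept checkpoint
        _, evicted = heapq.heapreplace(self._heap, (score, path))
        return True, evicted  # caller should delete the evicted checkpoint

    def kept(self):
        """Paths of the kept checkpoints, best first."""
        return [path for _, path in sorted(self._heap, reverse=True)]


tracker = BestCheckpointTracker(max_save_num=2)
for score, path in [(0.80, "save/stack/1"), (0.85, "save/stack/2"),
                    (0.70, "save/stack/3"), (0.90, "save/stack/4")]:
    print(path, tracker.update(score, path))
```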

accelerator Config

accelerator:
  use_accelerator: false   # will enable `accelerator` if use_accelerator is `true`

dataset Config

dataset:
  # supports loading datasets from Hugging Face.
  # dataset_name is one of [atis, snips, mix-atis, mix-snips]
  dataset_name: atis
  # any single split can be given its own path; the remaining splits are taken from `dataset_name`
  # train: atis # a Hugging Face dataset name or a local data path.
  # validation: {root}/ATIS/dev.jsonl 
  # test: {root}/ATIS/test.jsonl

evaluator Config

evaluator:
  best_key: EMA        # the metric to judge the best model
  eval_by_epoch: true   # Evaluate after an epoch if `true`.
  # Evaluate after {eval_step} steps if eval_by_epoch == `false`.
  # eval_step: 1800
  # the following metrics are supported:
  # - intent_acc
  # - slot_f1
  # - EMA
  # - intent_f1
  # - macro_intent_f1
  # - micro_intent_f1
  # NOTE: [intent_f1, macro_intent_f1, micro_intent_f1] are only supported in the multi-intent setting. intent_f1 and macro_intent_f1 are the same metric.
  metric:
    - intent_acc
    - slot_f1
    - EMA
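
To make two of these metrics concrete, here is a minimal sketch for the single-intent setting. It assumes that EMA denotes exact-match accuracy, where an utterance counts as correct only if its intent and every slot label are predicted correctly; the function names are hypothetical and this is not OpenSLU's evaluator. slot_f1 is omitted because it requires span extraction from the slot tags.

```python
def intent_acc(pred_intents, gold_intents):
    """Fraction of utterances whose predicted intent equals the gold intent."""
    correct = sum(p == g for p, g in zip(pred_intents, gold_intents))
    return correct / len(gold_intents)


def exact_match_acc(pred_intents, gold_intents, pred_slots, gold_slots):
    """Fraction of utterances with a correct intent AND a fully correct slot sequence."""
    correct = sum(
        pi == gi and ps == gs
        for pi, gi, ps, gs in zip(pred_intents, gold_intents, pred_slots, gold_slots)
    )
    return correct / len(gold_intents)


gold_intents = ["atis_flight", "atis_airfare"]
pred_intents = ["atis_flight", "atis_airfare"]
gold_slots = [["O", "B-toloc.city_name"], ["O", "O"]]
pred_slots = [["O", "B-toloc.city_name"], ["O", "B-toloc.city_name"]]

print(intent_acc(pred_intents, gold_intents))
print(exact_match_acc(pred_intents, gold_intents, pred_slots, gold_slots))
```

In this toy example both intents are right but one slot sequence is wrong, so the exact-match score is lower than the intent accuracy.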

tokenizer Config

tokenizer:
  # Initialize the tokenizer. Supports `word_tokenizer` and other Hugging Face tokenizers.
  _tokenizer_name_: word_tokenizer
  # if `_tokenizer_name_` is not assigned, you can load a pretrained tokenizer from Hugging Face.
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis
  _padding_side_: right            # padding side of the tokenizer [left/right]
  # Alignment mode between text and slots [fast/general]:
  # `general` is supported by most tokenizers; `fast` is supported by only a small portion of tokenizers.
  _align_mode_: fast
  _to_lower_case_: true
  add_special_tokens: false        # other tokenizer arguments; any argument not in `_*_` format is passed to tokenizer initialization
  max_length: 512

optimizer Config

optimizer:
  _model_target_: torch.optim.Adam # Optimizer class, or a function that returns an Optimizer object
  _model_partial_: true            # partial instantiation: model.parameters() is added later to complete the Optimizer arguments
  lr: 0.001                        # learning rate
  weight_decay: 1e-6               # weight decay

scheduler Config

scheduler:
  _model_target_: transformers.get_scheduler
  _model_partial_: true     # partial instantiation: optimizer and num_training_steps are added later to complete the scheduler arguments
  name: "linear"
  num_warmup_steps: 0
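
The `_model_target_` / `_model_partial_` pattern used by the optimizer and scheduler can be sketched with `importlib` and `functools.partial`. This is a simplified illustration under stated assumptions, not OpenSLU's loader: the function `instantiate` is hypothetical, nested sub-configs are not handled, and `datetime.timedelta` stands in for `torch.optim.Adam` so that the example runs with the standard library only.

```python
import importlib
from functools import partial


def instantiate(config):
    """Build an object from a dict with `_model_target_` and optional `_model_partial_`.

    Hypothetical helper: reserved `_*_` keys are consumed here, and every other
    key is forwarded to the target as a keyword argument.
    """
    module_name, _, attr = config["_model_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    kwargs = {
        key: value
        for key, value in config.items()
        if not (key.startswith("_") and key.endswith("_"))
    }
    if config.get("_model_partial_", False):
        # Return a factory; the caller supplies the missing arguments later,
        # e.g. model.parameters() for an optimizer.
        return partial(target, **kwargs)
    return target(**kwargs)


# Stand-in target from the standard library (an assumption for runnability).
factory = instantiate(
    {"_model_target_": "datetime.timedelta", "_model_partial_": True, "hours": 1}
)
print(factory(minutes=30))  # arguments from the config and the caller are combined
```

With a real optimizer config, the returned factory would be called as `factory(model.parameters())`, which matches the comment on `_model_partial_` above.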

model Config

model:
  # _from_pretrained_: LightChen2333/stack-propagation-slu-atis # load the model from Hugging Face; none of the parameters below need to be assigned.
  _model_target_: model.OpenSLUModel # the general model class, which builds the model automatically from the configuration.

  encoder:
    _model_target_: model.encoder.AutoEncoder # AutoEncoder automatically loads the specified encoder model
    encoder_name: self-attention-lstm         # supports [lstm/self-attention-lstm] and other pretrained models supported by Hugging Face

    embedding:                                # word embedding layer
#      load_embedding_name: glove.6B.300d.txt  # support autoload glove embedding.  
      embedding_dim: 256                      # embedding dim
      dropout_rate: 0.5                       # dropout ratio after embedding

    lstm:
      layer_num: 1                           # lstm configuration
      bidirectional: true
      output_dim: 256                        # each module should set output_dim so that the next module's input_dim is filled in automatically. You can also set input_dim manually.
      dropout_rate: 0.5

    attention:                              # self-attention configuration
      hidden_dim: 1024
      output_dim: 128
      dropout_rate: 0.5

    return_with_input: true                # pass input information, such as attention_mask, to the decoder module.
    return_sentence_level_hidden: false    # whether to return the sentence representation to the decoder module

  decoder:
    _model_target_: model.decoder.StackPropagationDecoder  # decoder name
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction # interaction module name
      differentiable: false                                      # interaction module config

    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier # intent classifier module name
      layer_num: 1
      bidirectional: false
      hidden_dim: 64
      force_ratio: 0.9                                        # teacher-force ratio
      embedding_dim: 8                                        # intent embedding dim
      ignore_index: -100                                      # ignore index to compute loss and metric
      dropout_rate: 0.5
      mode: "token-level-intent"                              # decode mode, support [token-level-intent, intent, slot]
      use_multi: "{base.multi_intent}"
      return_sentence_level: true                             # whether to return sentence level prediction as decoded input

    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      layer_num: 1
      bidirectional: false
      force_ratio: 0.9
      hidden_dim: 64
      embedding_dim: 32
      ignore_index: -100
      dropout_rate: 0.5
      mode: "slot"
      use_multi: false
      return_sentence_level: false

Implementing a New Model

1. Interaction Re-Implement

Here we take DCA-Net as an example:

In most cases, you only need to rewrite the Interaction module:

from common.utils import HiddenData
from model.decoder.interaction import BaseInteraction

# NOTE: I_S_Block is not imported in this snippet; it must be defined or imported in this module.


class DCANetInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        # The values come from the `interaction` section of the configuration file.
        self.T_block1 = I_S_Block(self.config["output_dim"], self.config["attention_dropout"], self.config["num_attention_heads"])
        ...

    def forward(self, encode_hidden: HiddenData, **kwargs):
        ...

Then configure your module:

base:
  ...

optimizer:
  ...

scheduler:
  ...

model:
  _model_target_: model.OpenSLUModel
  encoder:
    _model_target_: model.encoder.AutoEncoder
    encoder_name: lstm

    embedding:
      load_embedding_name: glove.6B.300d.txt
      embedding_dim: 300
      dropout_rate: 0.5

    lstm:
      dropout_rate: 0.5
      output_dim: 128
      layer_num: 2
      bidirectional: true
    output_dim: "{model.encoder.lstm.output_dim}"
    return_with_input: true
    return_sentence_level_hidden: false

  decoder:
    _model_target_: model.decoder.DCANetDecoder
    interaction:
      _model_target_: model.decoder.interaction.DCANetInteraction
      output_dim: "{model.encoder.output_dim}"
      attention_dropout: 0.5
      num_attention_heads: 8

    intent_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "intent"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100

    slot_classifier:
      _model_target_: model.decoder.classifier.LinearClassifier
      mode: "slot"
      input_dim: "{model.decoder.interaction.output_dim}"
      ignore_index: -100

That completes the model construction. You can train the model with the following command:

python run.py -cp config/dca_net.yaml [-ds atis]

2. Decoder Re-Implement

Sometimes the default order of interaction followed by classification does not meet your needs. In that case, rewrite the decoder to control the interaction order:

Here, we take stack-propagation as an example:

  1. First, rewrite the interaction module for stack-propagation:
from common.utils import ClassifierOutputData, HiddenData
from model.decoder.interaction.base_interaction import BaseInteraction
class StackInteraction(BaseInteraction):
    def __init__(self, **config):
        super().__init__(**config)
        ...

    def forward(self, intent_output: ClassifierOutputData, encode_hidden: HiddenData):
        ...
  2. Next, rewrite StackPropagationDecoder to implement the stack-propagation interaction order:
from common.utils import HiddenData, OutputData
from model.decoder import BaseDecoder  # assumed import path; adjust if BaseDecoder lives elsewhere


class StackPropagationDecoder(BaseDecoder):

    def forward(self, hidden: HiddenData):
        pred_intent = self.intent_classifier(hidden)
        hidden = self.interaction(pred_intent, hidden)
        pred_slot = self.slot_classifier(hidden)
        return OutputData(pred_intent, pred_slot)
  3. Then assemble the general model through the config/stack-propagation.yaml configuration file:
base:
  ...

...

model:
  _model_target_: model.OpenSLUModel

  encoder:
    ...

  decoder:
    _model_target_: model.decoder.StackPropagationDecoder
    interaction:
      _model_target_: model.decoder.interaction.StackInteraction
      differentiable: false

    intent_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # other parameters required by __init__(*)
      mode: "token-level-intent"
      use_multi: false
      return_sentence_level: true

    slot_classifier:
      _model_target_: model.decoder.classifier.AutoregressiveLSTMClassifier
      ... # other parameters required by __init__(*)
      mode: "slot"
      use_multi: false
      return_sentence_level: false
  4. Finally, train the model with the following command:
python run.py -cp config/stack-propagation.yaml
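
The difference between the two decoding orders discussed in this section can be seen with a toy example that records the order of module calls. The classes below are plain-Python stand-ins, not OpenSLU classes, and `ToyBaseDecoder` only assumes that the default decoder runs interaction first and classification afterwards, as described above.

```python
class RecordingModule:
    """Stand-in for a classifier or interaction module; it only records call order."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __call__(self, *inputs):
        self.log.append(self.name)
        return f"{self.name}_out"


class ToyBaseDecoder:
    """Assumed default order: interaction first, then both classifiers."""

    def __init__(self, log):
        self.interaction = RecordingModule("interaction", log)
        self.intent_classifier = RecordingModule("intent", log)
        self.slot_classifier = RecordingModule("slot", log)

    def forward(self, hidden):
        hidden = self.interaction(hidden)
        return self.intent_classifier(hidden), self.slot_classifier(hidden)


class ToyStackDecoder(ToyBaseDecoder):
    """Stack-propagation order: intent first, then interaction, then slot."""

    def forward(self, hidden):
        pred_intent = self.intent_classifier(hidden)
        hidden = self.interaction(pred_intent, hidden)
        return pred_intent, self.slot_classifier(hidden)


base_log, stack_log = [], []
ToyBaseDecoder(base_log).forward("hidden")
ToyStackDecoder(stack_log).forward("hidden")
print(base_log)
print(stack_log)
```

Only `forward` changes between the two classes; the submodules and their configuration stay the same, which is why overriding the decoder is enough to change the interaction order.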