
Training HERON Compatible ARMAs in RAVEN

Paul Talbot edited this page Apr 30, 2021 · 1 revision

Introduction

One of the principal tools HERON leverages for stochastic technoeconomic analysis is the synthetic histories generated by RAVEN. These synthetic histories provide the stochastic boundary conditions for dispatch optimization.

However, HERON has been built around specific ARMA structures. This guide walks you through setting up a RAVEN workflow that trains ARMA ROMs with the structure HERON expects.

ARMA Structure

The synthetic histories produced by RAVEN and used in HERON currently come from an ARMA ROM trained in a RAVEN training workflow. Examples of these training workflows can be found in HERON at HERON/tests/integration_tests/ARMA.

Synthetic histories for use in HERON are expected to be three-dimensional:

  • Time, or the micro-step time evolution variable (e.g. hourly),
  • Cluster, or the clustering identification variable (e.g. 20 clusters to represent a year of data),
  • Year, or the macro-step time evolution variable (e.g. year).

Both Time and Year can be renamed in the HERON input in the <Case> node. In this guide, we will refer to them as Time and Year for convenience.
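As a rough picture of the three dimensions listed above, the sketch below builds a toy container indexed as history[year][cluster][time]. It is purely illustrative: the names, the sizes, and the nesting order are assumptions made for this example and do not reflect how RAVEN or HERON store synthetic histories internally.

```python
# Illustrative only: a toy container indexed as history[year][cluster][time].
# This is NOT RAVEN's internal data structure; all sizes are example values.
years = [2020, 2021]   # macro-step values (Year)
n_clusters = 20        # cluster IDs 0..19 (Cluster)
n_steps = 24           # micro-steps per cluster, e.g. hourly over one day (Time)

history = {
    year: [[0.0 for _ in range(n_steps)] for _ in range(n_clusters)]
    for year in years
}


def shape(hist):
    """Return (number of years, clusters per year, time steps per cluster)."""
    first = next(iter(hist.values()))
    return (len(hist), len(first), len(first[0]))


print(shape(history))  # (2, 20, 24)
```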

RAVEN Training Input File

This structure comes from an ARMA ROM that is trained using the <Segment grouping='interpolate'> node. For example, in the RAVEN workflow for training the ROM (based on train_sine.xml):

  <Models>
    <ROM name="arma" subType="ARMA">
      <Target>Signal, Time</Target>
      <Features>scaling</Features>
      <pivotParameter>Time</pivotParameter>
      <P>0</P>
      <Q>0</Q>
      <Fourier>10</Fourier>
      <Segment grouping='interpolate'> <!-- note this node specifically -->
        <macroParameter>Year</macroParameter>
        <Classifier class='Models' type='PostProcessor'>classifier</Classifier>
        <subspace divisions='365'>Time</subspace>
      </Segment>
      <reseedCopies>False</reseedCopies>
      <seed>42</seed>
    </ROM>
    <!-- ... -->
  </Models>

The <subspace> node above divides each year of the training data into 365 segments along Time, which are then clustered. Note also that a <Classifier> is specified; it determines how the segments are clustered for analysis. The classifier for this case is as follows:

    <PostProcessor name="classifier" subType="DataMining">
      <KDD labelFeature="labels" lib="SciKitLearn">
        <Features>Signal</Features>
        <SKLtype>cluster|KMeans</SKLtype>
        <n_clusters>20</n_clusters> <!-- note this node specifically -->
      </KDD>
    </PostProcessor>

Note that the <n_clusters> node is requesting 20 clusters for our data. Other clustering strategies can be used and are enumerated in the RAVEN documentation.
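To make the effect of divisions='365' concrete, here is a minimal sketch of splitting one year of data into equal, contiguous segments. It assumes hourly data for a non-leap year (8760 samples), so each segment covers 24 hours; the function name and the placeholder signal are hypothetical, and RAVEN's own segmenting logic (for example, how it treats lengths that do not divide evenly) may differ.

```python
# Sketch: split one year of hourly samples into equal segments, mirroring the
# idea behind <subspace divisions='365'>. Assumes 8760 hourly values per year.
HOURS_PER_YEAR = 8760
DIVISIONS = 365

# Placeholder data, not a real signal.
signal = [float(i % 24) for i in range(HOURS_PER_YEAR)]


def split_into_segments(values, divisions):
    """Split values into `divisions` equal-length, contiguous segments."""
    if len(values) % divisions != 0:
        raise ValueError("length must be divisible by the number of divisions")
    size = len(values) // divisions
    return [values[i * size:(i + 1) * size] for i in range(divisions)]


segments = split_into_segments(signal, DIVISIONS)
print(len(segments), len(segments[0]))  # 365 24
```

With the classifier shown above, each of these 365 segments would then be assigned to one of the 20 clusters.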

Input Data for Training

The input CSV for training an interpolated ARMA is a RAVEN-style HistorySet CSV, which consists of a "header" CSV with corresponding "auxiliary" CSVs. For example, for head.csv:

scaling,Year,filename
1,2020,data_2020.csv
1,2025,data_2025.csv

and corresponding data_2020.csv:

Time,Signal
0,1.01
1,1.005
...

and similarly for data_2025.csv. Note that head.csv maps each Year to the file containing the signal data for that year. RAVEN trains ROMs directly on 2020 and 2025 and interpolates the ARMA ROM for the years in between. Many different years can be included, and RAVEN will interpolate between all of them. See the RAVEN manual for more information.
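If the yearly data is generated programmatically, the header and auxiliary files can be written together. The following is a minimal sketch using only the Python standard library; the function name write_history_set, the 24-step length, and the sine-wave placeholder signal are assumptions for illustration, not part of RAVEN.

```python
import csv
import math
import os
import tempfile


def write_history_set(directory, years, n_steps=24):
    """Write head.csv plus one data_<year>.csv per year; return the header path."""
    head_path = os.path.join(directory, "head.csv")
    with open(head_path, "w", newline="") as head:
        writer = csv.writer(head)
        writer.writerow(["scaling", "Year", "filename"])
        for year in years:
            filename = f"data_{year}.csv"
            writer.writerow([1, year, filename])
            with open(os.path.join(directory, filename), "w", newline="") as data:
                data_writer = csv.writer(data)
                data_writer.writerow(["Time", "Signal"])
                for t in range(n_steps):
                    # Placeholder signal (a sine wave), not real data.
                    value = 1.0 + 0.01 * math.sin(2 * math.pi * t / n_steps)
                    data_writer.writerow([t, value])
    return head_path


out_dir = tempfile.mkdtemp()
head_csv = write_history_set(out_dir, [2020, 2025])
with open(head_csv, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

The auxiliary files are written next to head.csv, matching the relative filenames used in the example above.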

Training no-cluster no-year ARMAs for use in HERON

What if you only want to run a single year without clustering? It is possible to "trick" the RAVEN ARMA into doing that with some minor modifications to the training data and the RAVEN training input file.

Modifying the Training Data

First, we assume you have a header CSV (for example, head.csv) that looks something like:

scaling,filename
1,mydata.csv

Modify this by inserting a Year column, with whatever value you want, for example 1:

scaling,Year,filename
1,1,mydata.csv

Then copy the first data row (the line after the header) and paste it at the end, changing the Year value to the next integer:

scaling,Year,filename
1,1,mydata.csv
1,2,mydata.csv

Note that the filename stays the same; this will convince RAVEN that the two years are fundamentally the same, but give you the Year structure we're looking for.
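The two manual edits above can also be scripted. This is a small sketch under the assumption that the header CSV has exactly one data row and a filename column; the helper name add_dummy_years is hypothetical and not part of RAVEN or HERON.

```python
import csv
import io


def add_dummy_years(head_text, first_year=1):
    """Insert a Year column before 'filename' and duplicate the single data row."""
    rows = list(csv.reader(io.StringIO(head_text)))
    header, data = rows[0], rows[1:]
    if len(data) != 1:
        raise ValueError("expected exactly one data row")
    if "Year" in header:
        raise ValueError("header already has a Year column")
    pos = header.index("filename")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header[:pos] + ["Year"] + header[pos:])
    # Write the same row twice, with consecutive Year values.
    for offset in range(2):
        row = data[0]
        writer.writerow(row[:pos] + [str(first_year + offset)] + row[pos:])
    return out.getvalue()


original = "scaling,filename\n1,mydata.csv\n"
print(add_dummy_years(original))
```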

Modifying the RAVEN Training Input

We assume you already have a ROM training input for this data. It needs to be extended to include interpolated segmenting and clustering; however, just as we tricked it into being multiyear, we will also trick it into using a single cluster, which is identical to having no clustering. For example, from the train_sine.xml example:

    <ROM name="arma" subType="ARMA">
      <Target>Signal, Time</Target>
      <Features>scaling</Features>
      <pivotParameter>Time</pivotParameter>
      <P>0</P>
      <Q>0</Q>
      <Fourier>10</Fourier>
      <Segment grouping='interpolate'>
        <macroParameter>Year</macroParameter>
        <Classifier class='Models' type='PostProcessor'>classifier</Classifier>
        <subspace divisions='1'>Time</subspace>
      </Segment>
      <reseedCopies>False</reseedCopies>
      <seed>42</seed>
    </ROM>

    <PostProcessor name="classifier" subType="DataMining">
      <KDD labelFeature="labels" lib="SciKitLearn">
        <Features>Signal</Features>
        <SKLtype>cluster|KMeans</SKLtype>
        <n_clusters>1</n_clusters>
      </KDD>
    </PostProcessor>

Note that we use divisions='1' and <n_clusters>1</n_clusters> to get the clustering format correct. The trained ARMA will now have the correct (Time, Cluster, Year) structure for use in HERON. Note that <ProjectTime> in HERON should still be at least 2 to run without issues.
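Before training, it can help to confirm that both settings are in place. The sketch below parses a trimmed-down copy of the input with Python's standard xml.etree.ElementTree and reports anything that deviates from the single-cluster setup. It assumes the ROM and the classifier PostProcessor sit under a common <Models> parent; the function name check_single_cluster is hypothetical, and this is not a RAVEN utility.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the training input, for illustration only.
SNIPPET = """
<Models>
  <ROM name="arma" subType="ARMA">
    <Segment grouping='interpolate'>
      <macroParameter>Year</macroParameter>
      <Classifier class='Models' type='PostProcessor'>classifier</Classifier>
      <subspace divisions='1'>Time</subspace>
    </Segment>
  </ROM>
  <PostProcessor name="classifier" subType="DataMining">
    <KDD labelFeature="labels" lib="SciKitLearn">
      <n_clusters>1</n_clusters>
    </KDD>
  </PostProcessor>
</Models>
"""


def check_single_cluster(models):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    segment = models.find("./ROM/Segment")
    if segment is None or segment.get("grouping") != "interpolate":
        problems.append("ROM needs <Segment grouping='interpolate'>")
        return problems
    subspace = segment.find("subspace")
    if subspace is None or subspace.get("divisions") != "1":
        problems.append("<subspace> should have divisions='1'")
    # Look up the PostProcessor named by the <Classifier> node.
    name = (segment.findtext("Classifier") or "").strip()
    n_clusters = models.findtext(f"./PostProcessor[@name='{name}']/KDD/n_clusters")
    if n_clusters is None or n_clusters.strip() != "1":
        problems.append("<n_clusters> should be 1")
    return problems


print(check_single_cluster(ET.fromstring(SNIPPET)))  # []
```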