README.Rmd

---
title: 
output: 
  github_document
---

# motifr <a href='https://marioangst.github.io/motifr/'><img src='man/figures/logo.svg' align="right" height="139" /></a>

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  dev = "png"
)
```

<!-- badges: start -->
[![Build Status](https://travis-ci.org/marioangst/motifr.svg?branch=master)](https://travis-ci.org/marioangst/motifr)
[![CRAN\_Release\_Badge](https://www.r-pkg.org/badges/version-ago/motifr)](https://CRAN.R-project.org/package=motifr)
[![Total downloads badge](https://cranlogs.r-pkg.org/badges/grand-total/motifr?color=blue)](https://CRAN.R-project.org/package=motifr)
<!-- badges: end -->

This package provides tools to analyse multi-level networks in terms of _motifs_. 

Multi-level networks combine multiple networks in one representation, e.g. social-ecological networks, which connect a social network (e.g. interactions among fishermen) with an ecological network (e.g. interactions between fish species) and the ties in between (e.g. fishers who fish specific species).

[Motifs](https://en.wikipedia.org/wiki/Network_motif) are small configurations of nodes and edges (subgraphs) within an overall network.

Package features include:

  - Visualization: The package provides functions to visualize multi-level networks, based on [ggraph](https://CRAN.R-project.org/package=ggraph).
  
  - Motif counts: The package is in many parts a R wrapper for the excellent [SESMotifAnalyser](https://gitlab.com/t.seppelt/sesmotifanalyser) Python framework written by Tim Seppelt to count multi-level network motifs, compare them to a baseline and much more. Only parts of of SESMotifAnalyser are yet wrapped, so consult the python framework for additional functionality. 
  
- Contributions of edges to motifs: motifr further identifies and visualizes functional gaps and critical edges in multi-level networks based on contributions of existing or potential edges to given motifs (this is theoretically motivated by network theories of functional fit and misfit).


## Installation

Due to the package’s tight integration with the Python framework SESMotifAnalyser, we recommend explicitly installing the associated sma module through reticulate.

```{r eval=FALSE}
reticulate::py_install("sma", pip = TRUE)
```

You can then install motifr from CRAN:

```{r eval=FALSE}
install.packages("motifr")
```

To install the development version from github, using devtools:

```{r eval=FALSE}
devtools::install_github("marioangst/motifr")
```

Please report any issues that occur when using the package by creating an issue in the [issue tracker on github](https://github.com/marioangst/motifr/issues).

If you use motifr, please cite it when publishing results. To check how, use:

```{r}
citation("motifr")
```


## Input

motifr currently can handle unweighted directed and undirected networks. The package supports motifs distributed across a maximum of three levels currently, while the total number of levels in the network is theoretically unrestricted.

Network data should be prepared as statnet network objects or igraph/ tidygraph graph objects with a numeric vertex attribute to specify a level for each node (named e.g. "lvl") for best results.

## Introduction and key functionality

First, we load the package.

```{r}
library(motifr)
```

### Visualize a multi-level network

The following network is an example network from an empirical analysis of wetlands management in Switzerland. It consists of two levels - one level specifies a network of relations between actors. A second level specifies a network of relations between different activities occurring in the wetland, based on causal interdependence among activities. Links between the levels specify which actors carry out which activities.

It is possible to specify layouts for every network level separately. Below, one level is plotted based on a circle layout, the second one based on Kamada-Kawai.

```{r}
plot_mnet(
  net = ml_net,
  lvl_attr = "sesType",
  layouts = list("kk", "circle"),
  directed = FALSE
)
```

motifr provides a reliable starting point for multi-level network visualization but is focused on motif analyis at its core. For advanced visualization of multi-level networks we recommend pairing [ggraph](https://CRAN.R-project.org/package=ggraph) and [graphlayouts](https://CRAN.R-project.org/package=graphlayouts). [This blog post](http://blog.schochastics.net/post/visualizing-multilevel-networks-with-graphlayouts/) provides an excellent introduction.

### Selecting motifs

See the vignette on the motif zoo (``vignette("motif_zoo")``) for details on nomenclature for motifs (motif identifier strings). We highly recommend the use of two helper functions implemented in motifr to ensure that the software interprets the motif identifier provided as intended by the analyst.

- use ``explore_motifs()`` to launch a shiny app where all motifs implemented for analysis with motifr can be displayed. You can pass your own network to ``explore_motifs()`` to see what motifs mean exactly for your data. For example, if your network is stored in a object named ``my_net`` with a level attribute ``lvl`` you can explore motifs within it interactively using ``explore_motifs(net = my_net, lvl_attr = "lvl")``. Be aware that if your network does not contain a specific motif, it cannot be displayed.

- check a specific motif of interest using ``show_motif()``, which will either illustrate the motif in a dummy example network or, if you pass a network object to the function, in your network. ``show_motif()`` is specifically helpful to explore the impact of position matching (see ``vignette("motif_zoo")`` for more details).

### Count motifs

Motifs can be counted using the versatile function ``count_motifs()``. It takes as parameters a statnet network or igraph graph object (use ``ml_net`` or ``dummy_net`` provided by this package as examples) and a list of motif identifiers (see below) specifying the motifs. 

Let's quickly check out two classic examples of three-node, two-level motifs (open and closed triangles) in the wetlands management network introduced above:

```{r  out.width="300px"}
show_motif(motif = "1,2[I.C]", net = ml_net, label = TRUE, directed = FALSE) # open ('1,2[I.C]') triangle
show_motif(motif = "1,2[II.C]", net = ml_net, label = TRUE, directed = FALSE) # closed ('1,2[II.C]') triangle
```


Let's count the number of of these motifs in the entire network.

```{r}
motifs <- list("1,2[I.C]", "1,2[II.C]") # open and closed triangle

count_motifs(ml_net, motifs, directed = FALSE)
```

An exploratory approach can be taken by calling ``motif_summary()``. This function counts the occurrences of a couple of basic motifs. Furthermore it computes expectations and variances for the occurrence of these motifs in a modified Erdős-Rényi or so-called "Actor's choice" model. See the package ``vignette("random_baselines")`` for details.

```{r}
motif_summary(ml_net)
```

### Identify gaps and critical edges

motifr makes it possible to identify gaps and critical edges in multi-level networks. This is motivated by theories of functional fit and misfit in networks, which posit that certain motifs are especially valuable for network outcomes (depending on the context). 

In relation to gaps, we can therefore try to identify potential edges that would create a large number of a given closed motif if they were to exist ("activated" or "flipped"). The number of such motifs created by an edge is their contribution. For example, we can get all edges that would create closed triangles (``"1,2[II.C]"``), including the information about how many such triangles they would create for the wetlands case study network:

```{r}
gaps <- identify_gaps(ml_net, motif = "1,2[II.C]")
head(gaps)
```

We can also plot these gaps in various ways in our network, including the option to only look at gaps above a certain weight (contribution) and different levels of focus to only show nodes involved in such gaps. Here again for the wetlands management network, only showing gaps with a weight above 5 and subsetting the level where we analyze gaps to only contain nodes involved in gaps.

```{r}
plot_gaps(ml_net,
  "1,2[II.C]",
  level = -1,
  subset_graph = "partial",
  cutoff = 5, label = TRUE
)
```

``identify_gaps`` has a sibling in ``critical_dyads``. Critical_dyads works in reverse to identifying gaps - it analyses for every existing edge how many instances of a given open motif would appear if the edge was to be removed. Below an example showing critical dyads in a plot of the full wetlands management example network.

```{r}
plot_critical_dyads(ml_net,
  "1,2[I.C]",
  level = -1,
  subset_graph = "none",
  cutoff = 3, label = FALSE
)
```

### Comparing motif occurrence to a baseline model

Motifr can be used to simulate a baseline of networks to compare against. Motif counts in an empirical network can then be compared to the distribution of motif counts in the networks simulated from the baseline model. Four different ways of specifying models for baseline distributions are implemented in motifr, from a basic Erdős–Rényi model to the possiblity of supplying an exponential random graph model (ERGM) fit to draw simulations from. See the ``vignette("random_baselines")`` for details.

As an illustration, we simulate networks from a "Actor's choice" baseline model here as a baseline to compare counts of open and closed triangles in the wetland management network against. This model keeps all ties fixed except ties on a specifc level. On this level (here set by setting level to 1, which is the actor level in this network), ties are allowed to vary based on a fixed probability (Erdős-Rényi) model.

We find that open triangles occur much less frequently and closed triangles much more often than in the baseline model. 

This is an unsurprising result - everything else would have been concerning. It indicates that actors tend to close triangles across levels to other actors working on the same wetland management tasks much more often compared to what would be expected if they just chose random collaboration partners. We would expect such "fit to task" in a network of professional organizations working in wetland management. We highlight this interpretation because we want to stress that baseline models need to be judged very carefully for what they represent substantially. This is why motifr allows for a variety of baseline model configurations, (including fitted ergm objects).

```{r}
motifs <- list("1,2[I.C]", "1,2[II.C]") # open ('1,2[I.C]') and closed ('1,2[II.C]') triangles

compare_to_baseline(ml_net,
                    model = "actors_choice",
                    level = 1,
                    motifs = motifs, 
                    n = 50, 
                    directed = FALSE)
```