LTER_allometry.Rmd

---
title: "LTER_allometry"
author: "CML & CAK"
date: "2022-09-22"
output: 
  pdf_document: 
    latex_engine: xelatex
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction

We selected the and_vertebrates database from the LTER site ( https://lter.github.io/lterdatasampler/reference/and_vertebrates.html):

The dataset includes count and size data for cutthroat trout and salamanders in 
clear cut or old growth sections of Mack Creek, Andrews Forest LTER.

```{r load libraries and import and open data, echo=FALSE}

library(here) #refiere la ruta a la carpeta del proyecto
library(tidyverse) #incluye la librería ggplot2
library(ggplot2) #VISUALIZACIÓN DE DATOS
library(RColorBrewer) #paletas de color
library(plotly) #hacer los gráficos interactivos
library(patchwork) #combinar gráficos de ggplot
library(tidyverse)
library(readr) #leer archivos
library(readxl) #leer archivos excel
library(dplyr) #manipular datos
library(tidyr) #ordenar y trasformar datasets
library(stringr) #manipular caracteres
library(forcats) #manipular factores
library(lubridate) #manipular fechas
library("easystats")
library("effects")
library("equatiomatic")
library("parameters")
library("report")
library("visreg")
library("performance")
library("DHARMa")
library("modelbased")
library(remotes)

# remotes::install_github("lter/lterdatasampler")  #Solo necesario para la descarga de internet
# ourdata <- lterdatasampler::and_vertebrates ## Esto se hace solo una vez, descarga de internet
# write.csv(ourdata, file = ("data/ourdata.csv"))
ourdata <- read_csv("data/ourdata.csv")
head(ourdata)

```

## Plot the raw data

In this case, we are interested in modeling  the length-mass relationships for cutthroat
trout and salamanders in Mack Creek:

```{r plot raw data, echo=FALSE}

ourdata$species <- as.factor(ourdata$species)
ggplot(ourdata, aes(x = length_1_mm,
                    y = weight_g,
                    color = species)) +
  geom_point()
```

There are three species sampled but the Cascade torrent salamander is almost
absent from the dataset, so we decided to ignore this species from our analysis.

First we subsampled the dataset and then plotted it:

```{r subsample the dataset}

data_species2 <- ourdata %>%
  subset(species != "Cascade torrent salamander")

data_species2$species <- as.factor(data_species2$species)
ggplot(data_species2, aes(x = length_1_mm,
                    y = weight_g,
                    color = species)) +
  geom_point()

```

The relationship between length and weight of these species is exponential,
thus we decided to transform our variables into logarithm (we checked log2
for length and log10 for biomass)

The reason for log10 transform the biomass is because we have many small values 
(near zero) and wanted to enlarge the distribution "scale"


```{r transform x,y data and plot it}
data_species_log <- data_species2
data_species_log$log_length1 <- log10(data_species_log$length_1_mm)
data_species_log$log_weight <- log10(data_species_log$weight_g)

ggplot(data_species_log, aes(x = log_length1,
                             y = log_weight,
                             color = species)) +
  geom_point()

```

Result: It worked. The relationship is almost linear.
We now model the allometric relationship between the length and biomass of these species and check
whether the curves per species are different. We will check if the models 
"follow" the assumptions : 

```{r first model}
lm_log_species <- lm(log_weight ~ log_length1 + species, data=data_species_log)
summary(lm_log_species)
check_model(lm_log_species)
```

[COMENTARLO CARLOS]
Hola Cristina!!! Ahí va mi comentario:
A la vista de los resultados del summary y de la comprobación de las asunciones del modelo con 'checkmodel', podemos decir que el modelo lineal propuesto es válido.
Encontramos una clara relación positiva entre la longitud y el peso en ambas especies.

[NOTA para Cristina: he hecho para probar el modelo lineal sin incluir el factor especie, así:
lm_log_species_only <- lm(log_weight ~ log_length1,  data=data_species_log)
Efectivamente sale peor, el R cuadrado sale 0.778, por lo que explica peor los datos.

```{r}

lm_log_species_i <- lm(log_weight ~ log_length1 * species, data=data_species_log)
summary(lm_log_species_i)
check_model(lm_log_species_i)

```


[COMENTARLO CARLOS]
A la vista del resultado del modelo con iteracción, podemos decir que efectivamente existe cierta interacción con la especie, lo que indica que la pendiente de la recta del modelo varía según la especie, aunque esta variación debe ser pequeña porque el R cuadrado tiene sólo una pequeña mejoría respecto al modelo anterior.

[¡¡Paper listo para envío a revista!!]