Merge pull request #55 from jakobdanel/results/methods
Write subsection about analysis of distributions
jakobdanel authored Jan 16, 2024
2 parents f1c5ead + 3a5abaa commit ac3e1d0
Showing 7 changed files with 222 additions and 123 deletions.
6 changes: 3 additions & 3 deletions results/_freeze/report/execute-results/html.json

4 changes: 2 additions & 2 deletions results/_freeze/report/execute-results/tex.json

Binary file modified results/_freeze/report/figure-pdf/fig-patches-nrw-1.pdf
278 changes: 163 additions & 115 deletions results/appendix/package-docs/docs.qmd
@@ -126,6 +126,81 @@ lfa_check_flag(flag_name)



### `lfa_create_stacked_distributions_plot`

Create a stacked distribution plot for tree detections, visualizing the distribution
of a specified variable on the x-axis, differentiated by another variable.


#### Arguments

Argument |Description
------------- |----------------
`trees` | A data frame containing tree detection data.
`x_value` | A character string specifying the name of the column whose values are plotted on the x-axis of the histogram.
`fill_value` | A character string specifying the column name by which the data are differentiated in the plot.
`bin` | An integer specifying the number of bins for the histogram. Default is 100.
`ylab` | A character string specifying the y-axis label. Default is "Amount trees".
`xlim` | A numeric vector of length 2 specifying the x-axis limits. Default is c(0, 100).
`ylim` | A numeric vector of length 2 specifying the y-axis limits. Default is c(0, 1000).
`title` | The title of the plot.


#### Description

This function generates a stacked distribution plot using the ggplot2 package,
visualizing the distribution of a specified variable (`x_value`) on the x-axis,
differentiated by another variable (`fill_value`). The plot data are taken from
the provided `trees` data frame.


#### Keyword

data


#### Seealso

[`ggplot2::geom_histogram`](#ggplot2::geomhistogram) , [`ggplot2::facet_wrap`](#ggplot2::facetwrap) ,
[`ggplot2::ylab`](#ggplot2::ylab) , [`ggplot2::scale_fill_brewer`](#ggplot2::scalefillbrewer) ,
[`ggplot2::coord_cartesian`](#ggplot2::coordcartesian)


#### Value

A ggplot object representing the stacked distribution plot.


#### Examples

```{r}
#| eval: false
# Create a stacked distribution plot for variable "Z", differentiated by "area"
trees <- lfa_get_detections()
lfa_create_stacked_distributions_plot(trees, "Z", "area")
```


#### Usage

```{r}
#| eval: false
lfa_create_stacked_distributions_plot(
  trees,
  x_value,
  fill_value,
  bin = 100,
  ylab = "Amount trees",
  xlim = c(0, 100),
  ylim = c(0, 1000),
  title =
    "Histograms of height distributions between species 'beech', 'oak', 'pine' and 'spruce' divided by the different areas of Interest"
)
```



### `lfa_create_tile_location_objects`

Create tile location objects
@@ -318,6 +393,53 @@ lfa_download(species, name, location)



### `lfa_get_all_areas`

Retrieve a data frame containing all species and corresponding areas.


#### Description

This function scans the "data" directory within the current working directory to
obtain a list of species. It then iterates through each species to retrieve the list
of areas associated with that species. The resulting data frame contains two columns:
"specie" representing the species and "area" representing the corresponding area.


#### Keyword

data


#### Seealso

[`list.dirs`](#list.dirs)


#### Value

A data frame with columns "specie" and "area" containing information about
all species and their associated areas.
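
The directory scan described above can be illustrated with a minimal base-R sketch. It assumes a layout of `data/<species>/<area>/` and uses a hypothetical helper, `get_all_areas_sketch()`, whose `data_dir` argument is not part of the package; it is an illustration of the documented behaviour, not the source of `lfa_get_all_areas()`.

```r
# Hypothetical illustration only; the package's lfa_get_all_areas() may differ.
# Assumes the directory layout data/<species>/<area>/.
get_all_areas_sketch <- function(data_dir = file.path(getwd(), "data")) {
  # Immediate subdirectories of data_dir are treated as species
  species <- list.dirs(data_dir, full.names = FALSE, recursive = FALSE)
  rows <- lapply(species, function(sp) {
    # Immediate subdirectories of each species folder are treated as areas
    areas <- list.dirs(file.path(data_dir, sp), full.names = FALSE, recursive = FALSE)
    if (length(areas) == 0) return(NULL)
    data.frame(specie = sp, area = areas, stringsAsFactors = FALSE)
  })
  result <- do.call(rbind, rows)
  # rbind over an empty list (or only NULLs) yields NULL, so return an empty frame
  if (is.null(result)) {
    result <- data.frame(specie = character(0), area = character(0))
  }
  result
}
```

Called without arguments, the sketch scans `data/` in the current working directory, mirroring the behaviour described for `lfa_get_all_areas()`.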


#### Examples

```{r}
#| eval: false
# Retrieve a data frame with information about all species and areas
all_areas_df <- lfa_get_all_areas()
```


#### Usage

```{r}
#| eval: false
lfa_get_all_areas()
```



### `lfa_get_detection_area`

Get detections for an area
@@ -436,64 +558,6 @@ lfa_get_detections_species(species)



### `lfa_get_detections`

Retrieve aggregated detection data for multiple species.


#### Concept

data retrieval functions


#### Description

This function obtains aggregated detection data for multiple species by iterating
through the list of species obtained from [`lfa_get_species`](#lfagetspecies) . For each
species, it calls [`lfa_get_detections_species`](#lfagetdetectionsspecies) to retrieve the
corresponding detection data and aggregates the results into a single data frame.
The resulting data frame includes columns for the species, tree detection data,
and the area in which the detections occurred.


#### Keyword

aggregation


#### Seealso

[`lfa_get_species`](#lfagetspecies) , [`lfa_get_detections_species`](#lfagetdetectionsspecies)

Other data retrieval functions:
[`lfa_get_species`](#lfagetspecies)


#### Value

A data frame containing aggregated detection data for multiple species.


#### Examples

```{r}
#| eval: false
# Retrieve aggregated detection data for multiple species
detections_data <- lfa_get_detections()
```


#### Usage

```{r}
#| eval: false
lfa_get_detections()
```



### `lfa_get_flag_path`

Get the path to a flag file indicating the completion of a specific process.
@@ -534,63 +598,6 @@ lfa_get_flag_path(flag_name)



### `lfa_get_species`

Get a list of species from the data directory.


#### Concept

data retrieval functions


#### Description

This function retrieves a list of species by scanning the "data" directory
located in the current working directory.


#### Keyword

data


#### References

This function relies on the [`list.dirs`](#list.dirs) function for directory listing.


#### Seealso

[`list.dirs`](#list.dirs)

Other data retrieval functions:
[`lfa_get_detections`](#lfagetdetections)


#### Value

A character vector containing the names of species found in the "data" directory.


#### Examples

```{r}
#| eval: false
# Retrieve the list of species
species_list <- lfa_get_species()
```


#### Usage

```{r}
#| eval: false
lfa_get_species()
```



### `lfa_ground_correction`

Correct the point clouds for correct ground imagery
@@ -1145,6 +1152,47 @@ lfa_rd_to_results()



### `lfa_read_area_as_catalog`

Read LiDAR data from a specified species and location as a catalog.


#### Arguments

Argument |Description
------------- |----------------
`specie` | A character string specifying the species of interest.
`location_name` | A character string specifying the name of the location.


#### Description

This function constructs the file path from the specified `specie` and `location_name`,
lists the directories at that path, and reads the LiDAR data into a `lidR::LAScatalog`.


#### Value

A `lidR::LAScatalog` object containing the LiDAR data from the specified location and species.


#### Examples

```{r}
#| eval: false
lfa_read_area_as_catalog("beech", "location1")
```


#### Usage

```{r}
#| eval: false
lfa_read_area_as_catalog(specie, location_name)
```



### `lfa_segmentation`

Segment the elements of a point cloud by trees
20 changes: 20 additions & 0 deletions results/methods/distribution-analysis.qmd
@@ -0,0 +1,20 @@
## Analysis of different distributions

Analyzing data distributions, and in particular comparing two or more of them, is a central part of our research. We are interested not only in differences between species but also in differences within a species. To understand the data comprehensively, we use several visualization techniques, including histograms, density plots, and box plots.

Alongside the visualizations, descriptive statistics such as means, standard errors, and quantiles summarize the central tendency and variability of the data.
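
A minimal base-R sketch of such a per-group summary computes the mean, the standard error of the mean (`sd(x) / sqrt(n)`), and the quartiles of tree heights for each area. The data frame `trees` with columns `Z` and `area` is simulated for illustration, and the helper names `std_error()` and `summarise_heights()` are hypothetical.

```r
# Simulated example data (not project data): tree heights Z in two areas
set.seed(1)
trees <- data.frame(
  Z = c(rnorm(200, mean = 25, sd = 4), rnorm(200, mean = 28, sd = 5)),
  area = rep(c("area_a", "area_b"), each = 200)
)

# Standard error of the mean: sample standard deviation divided by sqrt(n)
std_error <- function(x) sd(x) / sqrt(length(x))

# Mean, standard error and quartiles of one numeric vector
summarise_heights <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  c(mean = mean(x), se = std_error(x), q25 = q[1], median = q[2], q75 = q[3])
}

# One row of statistics per area
stats_by_area <- do.call(rbind, lapply(split(trees$Z, trees$area), summarise_heights))
print(round(stats_by_area, 2))
```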

For a more quantitative assessment of how strongly distributions differ, we use divergence measures and statistical tests. The Kullback-Leibler (KL) divergence quantifies how much one probability distribution differs from another. To compute it, each sample is first converted into a density function, with its standard error serving as the kernel bandwidth. Because the KL divergence is asymmetric, it is calculated for both orderings of each pair of distributions. For two discrete distributions $P$ and $Q$, the KL divergence is defined as follows [@kullback1951kullback]:

$$
D_{KL}(P \, \| \, Q) = \sum_i P(i) \log\left(\frac{P(i)}{Q(i)}\right)
$$
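
The density conversion and the discrete KL sum can be sketched in base R. Both samples are evaluated with `stats::density()` on a shared grid, using each sample's standard error as the bandwidth, and normalized so that the grid values form $P(i)$ and $Q(i)$. The small constant `eps`, the grid of 512 points, the simulated samples, and the helper names are assumptions of this sketch rather than part of the described method; the constant keeps $\log(P(i)/Q(i))$ finite where one density is numerically zero.

```r
# Standard error of the mean, used here as the kernel bandwidth
std_error <- function(x) sd(x) / sqrt(length(x))

# Kernel density of x on a shared grid, turned into a discrete distribution
density_on_grid <- function(x, grid, eps = 1e-12) {
  d <- stats::density(x, bw = std_error(x),
                      from = min(grid), to = max(grid), n = length(grid))
  p <- d$y + eps  # assumption: small constant so no grid point has probability 0
  p / sum(p)      # normalize so the values sum to 1
}

# Discrete KL divergence: sum_i P(i) * log(P(i) / Q(i))
kl_divergence <- function(p, q) sum(p * log(p / q))

# Example with simulated heights (not project data)
set.seed(42)
x <- rnorm(300, mean = 25, sd = 4)
y <- rnorm(300, mean = 28, sd = 5)
grid <- seq(min(x, y), max(x, y), length.out = 512)
p <- density_on_grid(x, grid)
q <- density_on_grid(y, grid)

# The divergence is asymmetric, so both directions are computed
kl_xy <- kl_divergence(p, q)
kl_yx <- kl_divergence(q, p)
```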

To obtain a symmetric score, we use the Jensen-Shannon divergence (JSD) [@grosse2002analysis], defined as:

$$
\mathrm{JSD}(P \, \| \, Q) = \frac{1}{2} D_{KL}(P \, \| \, M) + \frac{1}{2} D_{KL}(Q \, \| \, M)
$$
Here, $M = \frac{1}{2}(P + Q)$ is the mixture of the two distributions. Because the JSD is symmetric and bounded, it provides a balanced measure of dissimilarity between distributions [@Brownlee2019Calculate]. To compare the resulting scores with each other, we use their averages.
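
Given two discrete distributions over the same bins (for example, normalized grid densities), the JSD follows directly from the formula above. This base-R sketch uses the convention $0 \log 0 = 0$. The helper names and the three example distributions are hypothetical, and averaging over all pairs is only one possible reading of how the scores are averaged.

```r
# Discrete KL divergence with the convention 0 * log(0 / q) = 0
kl_divergence <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Jensen-Shannon divergence via the mixture M = (P + Q) / 2
js_divergence <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
}

# Hypothetical binned distributions for three groups
dists <- list(
  a = c(0.2, 0.5, 0.3),
  b = c(0.3, 0.4, 0.3),
  c = c(0.6, 0.3, 0.1)
)

# JSD for every unordered pair (symmetric, so one direction suffices), then the mean
pairs <- combn(names(dists), 2)
scores <- apply(pairs, 2, function(pr) js_divergence(dists[[pr[1]]], dists[[pr[2]]]))
names(scores) <- apply(pairs, 2, paste, collapse = "-")
mean_jsd <- mean(scores)
```

With natural logarithms the JSD lies between 0 and $\log 2$, which makes averaged scores easier to compare than raw KL values.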

Additionally, a two-sample Kolmogorov-Smirnov test is used to assess whether two distributions differ significantly. Its test statistic is the maximum absolute distance between the two empirical cumulative distribution functions, which gives a formal evaluation of the dissimilarity between the samples.
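
In R, the two-sample test is available as `stats::ks.test()`. The sketch below applies it to two simulated height samples (not project data) and, for illustration, recomputes the statistic $D$ as the largest absolute gap between the two empirical cumulative distribution functions built with `stats::ecdf()`.

```r
# Simulated height samples (not project data)
set.seed(3)
heights_a <- rnorm(150, mean = 25, sd = 4)
heights_b <- rnorm(150, mean = 27, sd = 4)

# Two-sample Kolmogorov-Smirnov test
ks <- stats::ks.test(heights_a, heights_b)
ks$statistic  # D: maximum distance between the two ECDFs
ks$p.value

# Recompute D by hand: both ECDFs are step functions that only change at data
# points, so the maximum over the pooled sample equals the supremum
ecdf_a <- stats::ecdf(heights_a)
ecdf_b <- stats::ecdf(heights_b)
pooled <- sort(c(heights_a, heights_b))
d_manual <- max(abs(ecdf_a(pooled) - ecdf_b(pooled)))
```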
17 changes: 16 additions & 1 deletion results/references.bib
@@ -34,4 +34,19 @@ @article{popescu2004
doi = {10.14358/PERS.70.5.589}
}
@misc{Blickensdoerfer2022, title={Dominant tree species for Germany (2017/2018)}, url={https://atlas.thuenen.de/layers/geonode:Dominant_Species_Class}, journal={Waldatlas- Wald und Waldnutzung}, publisher={Thünen Atlas}, author={Blickensdoerfer, Lukas}, year={2022}, month={Dec}}

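@article{kullback1951kullback,
title={On Information and Sufficiency},
author={Kullback, Solomon and Leibler, Richard A.},
journal={The Annals of Mathematical Statistics},
volume={22},
number={1},
pages={79--86},
year={1951}
}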
@article{grosse2002analysis,
title={Analysis of symbolic sequences using the Jensen-Shannon divergence},
author={Grosse, Ivo and Bernaola-Galv{\'a}n, Pedro and Carpena, Pedro and Rom{\'a}n-Rold{\'a}n, Ram{\'o}n and Oliver, Jose and Stanley, H Eugene},
journal={Physical Review E},
volume={65},
number={4},
pages={041905},
year={2002},
publisher={APS}
}
@misc{Brownlee2019Calculate, title={How to calculate the KL divergence for Machine Learning}, url={https://machinelearningmastery.com/divergence-between-probability-distributions/}, journal={MachineLearningMastery.com}, author={Brownlee, Jason}, year={2019}, month={Oct}}
20 changes: 18 additions & 2 deletions results/report.qmd
@@ -9,7 +9,21 @@ toc-title: Contents
number-sections: true
number-depth: 3
date: today
author: Jakob Danel and Frederick Bruch
author:
- name: Jakob Danel
email: [email protected]
url: https://github.com/jakobdanel
affiliations:
- name: Universität Münster
city: Münster
country: Germany
- name: Frederick Bruch
email: [email protected]
url: https://www.uni-muenster.de/Geoinformatics/institute/staff/index.php/351/Frederick_Bruch
affiliations:
- name: Universität Münster
city: Münster
country: Germany
bibliography: references.bib
execute-dir: ..
prefer-html: true
@@ -23,7 +37,7 @@ This report documents the analysis of forest data for different tree species.

{{< include methods/data-aquisition.qmd >}}
{{< include methods/preprocessing.qmd >}}

{{< include methods/distribution-analysis.qmd >}}
# Results
{{< include results/researched-areas.qmd >}}

@@ -43,6 +57,8 @@ This report documents the analysis of forest data for different tree species.
|spruce |oberhundem | 0.0162678|
|spruce |osterwald | 0.0129892|



# References

::: {#refs}