Merge pull request #55 from jakobdanel/results/methods

Write subsection about analysis of distributions
jakobdanel · Jan 16, 2024 · ac3e1d0
2 parents f1c5ead + 3a5abaa

Showing 7 changed files with 222 additions and 123 deletions.
diff --git a/results/_freeze/report/execute-results/html.json b/results/_freeze/report/execute-results/html.json
diff --git a/results/_freeze/report/execute-results/tex.json b/results/_freeze/report/execute-results/tex.json
diff --git a/results/_freeze/report/figure-pdf/fig-patches-nrw-1.pdf b/results/_freeze/report/figure-pdf/fig-patches-nrw-1.pdf
diff --git a/results/appendix/package-docs/docs.qmd b/results/appendix/package-docs/docs.qmd
@@ -126,6 +126,81 @@ lfa_check_flag(flag_name)
 
 
 
+### `lfa_create_stacked_distributions_plot`
+
+Create a stacked distribution plot for tree detections, visualizing the distribution
+ of a specified variable on the x-axis, differentiated by another variable.
+
+
+#### Arguments
+
+Argument      |Description
+------------- |----------------
+`trees`     |     A data frame containing tree detection data.
+`x_value`     |     A character string specifying the column whose values are plotted on the x-axis of the histogram.
+`fill_value`     |     A character string specifying the column by which the data are differentiated in the plot.
+`bin`     |     An integer specifying the number of bins for the histogram. Default is 100.
+`ylab`     |     A character string specifying the y-axis label. Default is "Amount trees".
+`xlim`     |     A numeric vector of length 2 specifying the x-axis limits. Default is c(0, 100).
+`ylim`     |     A numeric vector of length 2 specifying the y-axis limits. Default is c(0, 1000).
+`title`     |     The title of the plot.
+
+
+#### Description
+
+This function generates a stacked distribution plot using the ggplot2 package,
+ providing a visual representation of the distribution of a specified variable
+ ( `x_value` ) on the x-axis, with differentiation based on another variable
+ ( `fill_value` ). The data for the plot are derived from the provided `trees` 
+ data frame.
+
+
+#### Keyword
+
+data
+
+
+#### Seealso
+
+[`ggplot2::geom_histogram`](#ggplot2::geomhistogram) , [`ggplot2::facet_wrap`](#ggplot2::facetwrap) ,
+ [`ggplot2::ylab`](#ggplot2::ylab) , [`ggplot2::scale_fill_brewer`](#ggplot2::scalefillbrewer) ,
+ [`ggplot2::coord_cartesian`](#ggplot2::coordcartesian)
+
+
+#### Value
+
+A ggplot object representing the stacked distribution plot.
+
+
+#### Examples
+
+```{r}
+#| eval: false
+# Create a stacked distribution plot for variable "Z", differentiated by "area"
+trees <- lfa_get_detections()
+lfa_create_stacked_distributions_plot(trees, "Z", "area")
+```
+
+
+#### Usage
+
+```{r}
+#| eval: false
+lfa_create_stacked_distributions_plot(
+  trees,
+  x_value,
+  fill_value,
+  bin = 100,
+  ylab = "Amount trees",
+  xlim = c(0, 100),
+  ylim = c(0, 1000),
+  title =
+    "Histograms of height distributions between species 'beech', 'oak', 'pine' and 'spruce' divided by the different areas of Interest"
+)
+```
+
+
+
 ### `lfa_create_tile_location_objects`
 
 Create tile location objects
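For the `lfa_create_stacked_distributions_plot` entry added in the hunk above, here is a minimal sketch of how such a plot could be assembled from the ggplot2 functions listed under Seealso. It is not the package's implementation: the function name `stacked_distribution_sketch`, the synthetic data, the `Set2` palette, and the faceting by a `specie` column are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)
# Synthetic detections (not project data); the column names Z, area and
# specie mirror the names used in the documented examples.
trees <- data.frame(
  Z = c(rnorm(300, 25, 4), rnorm(300, 30, 5)),
  area = rep(c("oberhundem", "osterwald"), each = 300),
  specie = "spruce"
)

# Hypothetical stand-in with the documented argument names and defaults.
stacked_distribution_sketch <- function(trees, x_value, fill_value, bin = 100,
                                        ylab = "Amount trees",
                                        xlim = c(0, 100), ylim = c(0, 1000),
                                        title = NULL) {
  ggplot(trees, aes(x = .data[[x_value]], fill = .data[[fill_value]])) +
    geom_histogram(bins = bin, position = "stack") +  # stacked bars per bin
    facet_wrap(~specie) +                             # assumption: one panel per species
    scale_fill_brewer(palette = "Set2") +             # assumption: palette choice
    coord_cartesian(xlim = xlim, ylim = ylim) +       # zoom without dropping data
    ylab(ylab) +
    ggtitle(title)
}

p <- stacked_distribution_sketch(trees, "Z", "area",
                                 xlim = c(0, 50), ylim = c(0, 60))
```

Using `coord_cartesian()` for the limits only zooms the view, so bins outside the window are still counted rather than removed before binning.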
@@ -318,6 +393,53 @@ lfa_download(species, name, location)
 
 
 
+### `lfa_get_all_areas`
+
+Retrieve a data frame containing all species and corresponding areas.
+
+
+#### Description
+
+This function scans the "data" directory within the current working directory to
+ obtain a list of species. It then iterates through each species to retrieve the list
+ of areas associated with that species. The resulting data frame contains two columns:
+ "specie" representing the species and "area" representing the corresponding area.
+
+
+#### Keyword
+
+data
+
+
+#### Seealso
+
+[`list.dirs`](#list.dirs)
+
+
+#### Value
+
+A data frame with columns "specie" and "area" containing information about
+ all species and their associated areas.
+
+
+#### Examples
+
+```{r}
+#| eval: false
+# Retrieve a data frame with information about all species and areas
+all_areas_df <- lfa_get_all_areas()
+```
+
+
+#### Usage
+
+```{r}
+#| eval: false
+lfa_get_all_areas()
+```
+
+
+
 ### `lfa_get_detection_area`
 
 Get Detection for an area
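For the `lfa_get_all_areas` entry added in the hunk above, the directory scan it describes can be sketched in base R. This is an illustrative reconstruction, not the package source: the function name `get_all_areas_sketch`, the `root` argument, and the layout `<root>/<species>/<area>` are assumptions, and the example runs against a throwaway directory tree.

```r
# Hypothetical re-implementation of the described scan.
get_all_areas_sketch <- function(root = file.path(getwd(), "data")) {
  # Immediate subdirectories of the root are treated as species
  species <- list.dirs(root, full.names = FALSE, recursive = FALSE)
  rows <- lapply(species, function(sp) {
    # Immediate subdirectories of a species folder are treated as areas
    areas <- list.dirs(file.path(root, sp), full.names = FALSE, recursive = FALSE)
    data.frame(specie = rep(sp, length(areas)), area = areas,
               stringsAsFactors = FALSE)
  })
  if (length(rows) == 0) {
    return(data.frame(specie = character(0), area = character(0)))
  }
  do.call(rbind, rows)
}

# Usage on a temporary directory tree; "example_area" is a made-up name
root <- tempfile("lfa-sketch-")
dir.create(file.path(root, "beech", "example_area"), recursive = TRUE)
dir.create(file.path(root, "spruce", "oberhundem"), recursive = TRUE)
dir.create(file.path(root, "spruce", "osterwald"), recursive = TRUE)

all_areas <- get_all_areas_sketch(root)
```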
@@ -436,64 +558,6 @@ lfa_get_detections_species(species)
 
 
 
-### `lfa_get_detections`
-
-Retrieve aggregated detection data for multiple species.
-
-
-#### Concept
-
-data retrieval functions
-
-
-#### Description
-
-This function obtains aggregated detection data for multiple species by iterating
- through the list of species obtained from [`lfa_get_species`](#lfagetspecies) . For each
- species, it calls [`lfa_get_detections_species`](#lfagetdetectionsspecies) to retrieve the
- corresponding detection data and aggregates the results into a single data frame.
- The resulting data frame includes columns for the species, tree detection data,
- and the area in which the detections occurred.
-
-
-#### Keyword
-
-aggregation
-
-
-#### Seealso
-
-[`lfa_get_species`](#lfagetspecies) , [`lfa_get_detections_species`](#lfagetdetectionsspecies) 
-
- Other data retrieval functions:
- [`lfa_get_species`](#lfagetspecies)
-
-
-#### Value
-
-A data frame containing aggregated detection data for multiple species.
-
-
-#### Examples
-
-```{r}
-#| eval: false
-lfa_get_detections()
-
-# Retrieve aggregated detection data for multiple species
-detections_data <- lfa_get_detections()
-```
-
-
-#### Usage
-
-```{r}
-#| eval: false
-lfa_get_detections()
-```
-
-
-
 ### `lfa_get_flag_path`
 
 Get the path to a flag file indicating the completion of a specific process.
@@ -534,63 +598,6 @@ lfa_get_flag_path(flag_name)
 
 
 
-### `lfa_get_species`
-
-Get a list of species from the data directory.
-
-
-#### Concept
-
-data retrieval functions
-
-
-#### Description
-
-This function retrieves a list of species by scanning the "data" directory
- located in the current working directory.
-
-
-#### Keyword
-
-data
-
-
-#### References
-
-This function relies on the [`list.dirs`](#list.dirs) function for directory listing.
-
-
-#### Seealso
-
-[`list.dirs`](#list.dirs) 
-
- Other data retrieval functions:
- [`lfa_get_detections`](#lfagetdetections)
-
-
-#### Value
-
-A character vector containing the names of species found in the "data" directory.
-
-
-#### Examples
-
-```{r}
-#| eval: false
-# Retrieve the list of species
-species_list <- lfa_get_species()
-```
-
-
-#### Usage
-
-```{r}
-#| eval: false
-lfa_get_species()
-```
-
-
-
 ### `lfa_ground_correction`
 
 Correct the point clouds for correct ground imagery
@@ -1145,6 +1152,47 @@ lfa_rd_to_results()
 
 
 
+### `lfa_read_area_as_catalog`
+
+Read LiDAR data from a specified species and location as a catalog.
+
+
+#### Arguments
+
+Argument      |Description
+------------- |----------------
+`specie`     |     A character string specifying the species of interest.
+`location_name`     |     A character string specifying the name of the location.
+
+
+#### Description
+
+This function constructs the file path based on the specified `specie` and `location_name` ,
+ lists the directories at that path, and reads the LiDAR data into a `lidR::LAScatalog` .
+
+
+#### Value
+
+A `lidR::LAScatalog` object containing the LiDAR data from the specified location and species.
+
+
+#### Examples
+
+```{r}
+#| eval: false
+lfa_read_area_as_catalog("beech", "location1")
+```
+
+
+#### Usage
+
+```{r}
+#| eval: false
+lfa_read_area_as_catalog(specie, location_name)
+```
+
+
+
 ### `lfa_segmentation`
 
 Segment the elements of an point cloud by trees
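For the `lfa_read_area_as_catalog` entry added in the hunk above, a rough sketch of the path construction and catalog read might look as follows. The helper names, the `root` argument, and the layout `<root>/<specie>/<location_name>` are assumptions for illustration. The sketch also lists LAS/LAZ files rather than directories, which may differ from what the package does. `lidR::readLAScatalog()` is only called when files are actually found.

```r
# Hypothetical helper: folder for one species/location pair.
# The layout <root>/<specie>/<location_name> is an assumption.
area_path_sketch <- function(specie, location_name, root = "data") {
  file.path(root, specie, location_name)
}

# Hypothetical reader: collects LAS/LAZ files below that folder and passes
# the file paths to lidR::readLAScatalog().
read_area_as_catalog_sketch <- function(specie, location_name, root = "data") {
  files <- list.files(
    area_path_sketch(specie, location_name, root),
    pattern = "\\.la[sz]$", full.names = TRUE, recursive = TRUE,
    ignore.case = TRUE
  )
  if (length(files) == 0) {
    stop("No LAS/LAZ files found for ", specie, "/", location_name)
  }
  lidR::readLAScatalog(files)
}

# Path construction alone needs neither lidR nor any data on disk
example_path <- area_path_sketch("beech", "location1")
```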

diff --git a/results/methods/distribution-analysis.qmd b/results/methods/distribution-analysis.qmd
@@ -0,0 +1,20 @@
+## Analysis of different distributions
+
+Comparing two or more distributions is central to our analysis. We evaluate differences between species and also differences within a single species. To explore the data visually, we use histograms, density functions, and box plots.
+
+Alongside the visualizations, we report descriptive statistics (means, standard errors, and quantiles) to summarize the central tendency and variability of the data.
+
+To quantify the dissimilarity between distributions, we use divergence measures and a statistical test. The Kullback-Leibler (KL) divergence measures how much one distribution differs from another. To compute it, we convert each distribution into a density function, using the standard error as the bandwidth. Because the KL divergence is asymmetric, we calculate it for each ordered pair of distributions. For two distributions $P$ and $Q$, the KL divergence is defined as follows [@kullback1951kullback]:
+
+$$
+D_{KL}(P \, \| \, Q) = \sum_i P(i) \log\left(\frac{P(i)}{Q(i)}\right)
+$$
+
+To obtain a symmetric score, we use the Jensen-Shannon divergence (JSD) [@grosse2002analysis], defined as:
+
+$$
+D_{JS}(P \, \| \, Q) = \frac{1}{2} D_{KL}(P \, \| \, M) + \frac{1}{2} D_{KL}(Q \, \| \, M)
+$$
+Here, $M = \frac{1}{2}(P + Q)$. The JSD provides a symmetric measure of dissimilarity between distributions [@Brownlee2019Calculate]. To compare the resulting scores with each other, we use their averages.
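The two measures defined above can be sketched in base R as follows. This is an illustration under stated assumptions, not the project's code: the data are synthetic, the helper names are made up, "standard error" is read as the standard error of the mean of each sample, and both densities are evaluated on one shared grid and normalised so that they sum to 1.

```r
set.seed(42)
# Synthetic stand-ins for tree heights in two areas (not project data)
heights_a <- rnorm(500, mean = 25, sd = 4)
heights_b <- rnorm(500, mean = 28, sd = 5)

# Assumption: "standard error" means the standard error of the mean
standard_error <- function(x) sd(x) / sqrt(length(x))

# Kernel density estimate on a fixed grid, normalised to sum to 1
to_probabilities <- function(x, from, to, n = 512) {
  d <- density(x, bw = standard_error(x), from = from, to = to, n = n)$y
  d / sum(d)
}

kl_divergence <- function(p, q) {
  keep <- p > 0  # terms with P(i) = 0 contribute nothing to the sum
  sum(p[keep] * log(p[keep] / q[keep]))
}

js_divergence <- function(p, q) {
  m <- 0.5 * (p + q)
  0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
}

grid_range <- range(c(heights_a, heights_b))
p <- to_probabilities(heights_a, grid_range[1], grid_range[2])
q <- to_probabilities(heights_b, grid_range[1], grid_range[2])

kl_pq <- kl_divergence(p, q)  # generally differs from kl_qp
kl_qp <- kl_divergence(q, p)
jsd   <- js_divergence(p, q)  # same value as js_divergence(q, p)
```

With a narrow bandwidth, the KL value can become infinite wherever `q` is zero but `p` is positive. The JSD stays finite in that case and, with the natural logarithm, is bounded by `log(2)`.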
+
+Additionally, we apply the two-sample Kolmogorov-Smirnov test to assess whether two distributions differ significantly. The test compares the empirical distribution functions of the two samples.
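A short sketch of the Kolmogorov-Smirnov comparison described above, using `ks.test()` from R's stats package. The two samples are synthetic stand-ins, not project data, and the variable names are chosen for illustration only.

```r
set.seed(7)
# Synthetic stand-ins for detected tree heights in two areas
heights_a <- rnorm(400, mean = 25, sd = 4)
heights_b <- rnorm(400, mean = 27, sd = 4)

# Two-sample Kolmogorov-Smirnov test
ks <- ks.test(heights_a, heights_b)

ks$statistic  # largest vertical distance between the two empirical CDFs
ks$p.value    # small values speak against a common underlying distribution
```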
diff --git a/results/references.bib b/results/references.bib
@@ -34,4 +34,19 @@ @article{popescu2004
 doi = {10.14358/PERS.70.5.589}
 }
 @misc{Blickensdoerfer2022, title={Dominant tree species for Germany (2017/2018)}, url={https://atlas.thuenen.de/layers/geonode:Dominant_Species_Class}, journal={Waldatlas- Wald und Waldnutzung}, publisher={Thünen Atlas}, author={Blickensdoerfer, Lukas}, year={2022}, month={Dec}}
-
+@misc{kullback1951kullback,
+  title={Kullback-Leibler divergence},
+  author={Kullback, Solomon},
+  year={1951}
+}
+@article{grosse2002analysis,
+  title={Analysis of symbolic sequences using the Jensen-Shannon divergence},
+  author={Grosse, Ivo and Bernaola-Galv{\'a}n, Pedro and Carpena, Pedro and Rom{\'a}n-Rold{\'a}n, Ram{\'o}n and Oliver, Jose and Stanley, H Eugene},
+  journal={Physical Review E},
+  volume={65},
+  number={4},
+  pages={041905},
+  year={2002},
+  publisher={APS}
+}
+@misc{Brownlee2019Calculate, title={How to calculate the KL divergence for Machine Learning}, url={https://machinelearningmastery.com/divergence-between-probability-distributions/}, journal={MachineLearningMastery.com}, author={Brownlee, Jason}, year={2019}, month={Oct}}
diff --git a/results/report.qmd b/results/report.qmd
@@ -9,7 +9,21 @@ toc-title: Contents
 number-sections: true
 number-depth: 3
 date: today
-author: Jakob Danel and Frederick Bruch
+author:
+    - name: Jakob Danel
+      email: [email protected]
+      url: https://github.com/jakobdanel
+      affiliations:
+      - name: Universität Münster
+        city: Münster
+        country: Germany
+    - name: Frederick Bruch
+      email: [email protected]
+      url: https://www.uni-muenster.de/Geoinformatics/institute/staff/index.php/351/Frederick_Bruch
+      affiliations:
+      - name: Universität Münster
+        city: Münster
+        country: Germany
 bibliography: references.bib
 execute-dir: .. 
 prefer-html: true
@@ -23,7 +37,7 @@ This report documents the analysis of forest data for different tree species.
 
 {{< include methods/data-aquisition.qmd >}}
 {{< include methods/preprocessing.qmd >}}
-
+{{< include methods/distribution-analysis.qmd >}}
 # Results
 {{< include results/researched-areas.qmd >}}
 
@@ -43,6 +57,8 @@ This report documents the analysis of forest data for different tree species.
 |spruce |oberhundem          | 0.0162678|
 |spruce |osterwald           | 0.0129892|
 
+
+
 # References
 
 ::: {#refs}