-
Notifications
You must be signed in to change notification settings - Fork 2
/
ggplot2.Rmd
1033 lines (656 loc) · 30.4 KB
/
ggplot2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Introduction to ggplot2"
author: "Thomas Carroll"
date: "MRC-CSC/Shared-bioinformatics-training"
output:
html_document:
toc: true # table of content true
toc_float: yes
depth: 3 # upto three depths of headings (specified by #, ## and ###)
number_sections: false ## if you want number sections at each table header
theme: united # many options for theme, this one is my favorite.
highlight: tango # specifies the syntax highlighting style
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Graphics in R
The R language has extensive graphical capabilities.
Graphics in R may be created by many different methods including base graphics and more advanced plotting packages such as lattice.
![](images/plotsinR.jpg)
The ggplot2 package was created by Hadley Wickham and provides a intuitive plotting system to rapidly generate publication quality graphics.
ggplot2 builds on the concept of the "Grammar of Graphics" (Wilkinson 2005, Bertin 1983) which describes a consistent syntax for the construction of a wide range of complex graphics by a concise description of their components.
# Why use ggplot2
The structured syntax and high level of abstraction used by ggplot2 should allow for the user to concentrate on the visualisations instead of creating the underlying code.
On top of this central philosophy ggplot2 has:
- Increased flexible over many plotting systems.
- An advanced theme system for professional/publication level graphics.
- Large developer base -- Many libraries extending its flexibility.
- Large user base -- Great documentation and active mailing list.
# Grammar of Graphics
## How to build a plot from its components.
![](images/Slide1.jpg)
## How ggplot2 builds a plot.
![](images/Slide2.jpg)
Overview of example code for the ggplot2 scatter plot.
```{r complex,eval=F}
ggplot(data = <default data set>,
aes(x = <default x axis variable>,
y = <default y axis variable>,
... <other default aesthetic mappings>),
... <other plot defaults>) +
geom_scatter(aes(size = <size variable for this geom>,
... <other aesthetic mappings>),
data = <data for this point geom>,
stat = <statistic string or function>,
position = <position string or function>,
color = <"fixed color specification">,
<other arguments, possibly passed to the _stat_ function) +
scale_<aesthetic>_<type>(name = <"scale label">,
breaks = <where to put tick marks>,
labels = <labels for tick marks>,
... <other options for the scale>) +
ggtitle("Graphics/Plot")+
xlab("Weight")+
ylab("Height")+
theme(plot.title = element_text(colour = "gray"),
... <other theme elements>)
```
## What users are required to specify in ggplot2 to build a plot.
![](images/Slide3.jpg)
Actual code for the ggplot2 scatter plot.
```{r simple,eval=F}
ggplot(data=patients_clean,
aes(y=Weight,x=Height,colour=Sex,
size=BMI,shape=Pet))
+geom_point()
```
# Getting started with ggplot2
Earlier we have been working with the patients' dataset to create a "clean and tidy" dataset.
Now we will use a cleaned dataset to demonstrate some of the plotting capabilities of ggplot2.
```{r load_packages, echo=FALSE, eval=TRUE,warning=F,message=F}
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(stringr))
suppressPackageStartupMessages(library(lubridate))
```
```{r present_clean}
library(tidyr)
library(ggplot2)
library(dplyr)
library(stringr)
library(lubridate)
patients_clean <- read.delim("patient-data-cleaned.txt",sep="\t")
```
## Our first ggplot2 graph
As seen above, in order to produce a ggplot2 graph we need a minimum of:-
- Data to be used in graph
- Mappings of data to the graph (aesthetic mapping)
- What type of graph we want to use (The geom to use).
In the code below we define the data as our cleaned patients data frame.
```{r ggplot_only}
pcPlot <- ggplot(data=patients_clean)
class(pcPlot)
pcPlot$data[1:4,]
```
Now we can see that we have gg/ggplot object (pcPlot) and in this the data has been defined.
Important information on how to map the data to the visual properties (aesthetics) of the plot as well as what type of plot to use (geom) have however yet to specified.
```{r missing_rest}
pcPlot$mapping
pcPlot$theme
pcPlot$layers
```
The information to map the data to the plot can be added now using the aes() function.
```{r ggplot_aes}
pcPlot <- ggplot(data=patients_clean)
pcPlot <- pcPlot+aes(x=Height,y=Weight)
pcPlot$mapping
pcPlot$theme
pcPlot$layers
```
But we are still missing the final component of our plot, the type of plot to use (geom).
Below the geom_point function is used to specify a point plot, a scatter plot of Height values on the x-axis versus Weight values on the y values.
```{r ggplot_aes_geom}
pcPlot <- ggplot(data=patients_clean)
pcPlot <- pcPlot+aes(x=Height,y=Weight)
pcPlot <- pcPlot+geom_point()
pcPlot
pcPlot$mapping
pcPlot$theme
pcPlot$layers
```
Now we have all the components of our plot, we need we can display the results.
```{r ggplot_aes_geom_display}
pcPlot
```
More typically, the data and aesthetics are defined within ggplot function and geoms applied afterwards.
```{r ggplot_simple_geom_point}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot+geom_point()
```
# Geoms - Plot types
As we have seen, an important element of a ggplot is the geom used. Following the specification of data, the geom describes the type of plot used.
Several geoms are available in ggplot2:-
* geom_point() - Scatter plots
* geom_line() - Line plots
* geom_smooth() - Fitted line plots
* geom_bar() - Bar plots
* geom_boxplot() - Boxplots
* geom_jitter() - Jitter to plots
* geom_histogram() - Histogram plots
* geom_density() - Density plots
* geom_text() - Text to plots
* geom_errorbar() - Errorbars to plots
* geom_violin() - Violin plots
## Geoms - Line plots
```{r, line_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot_line <- pcPlot+geom_line()
pcPlot_line
```
```{r, smooth_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot_smooth <- pcPlot+geom_smooth()
pcPlot_smooth
```
## Geoms - Bar and frequency plots
```{r, bar_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex))
pcPlot_bar <- pcPlot+geom_bar()
pcPlot_bar
```
```{r, histogram_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height))
pcPlot_hist <- pcPlot+geom_histogram()
pcPlot_hist
```
```{r, density_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height))
pcPlot_density <- pcPlot+geom_density()
pcPlot_density
```
## Geoms - Box and violin plots
```{r, boxplot_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height))
pcPlot_boxplot <- pcPlot+geom_boxplot()
pcPlot_boxplot
```
```{r, violin_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height))
pcPlot_violin <- pcPlot+geom_violin()
pcPlot_violin
```
An overview of geoms and thier arguments can be found at ggplot2 documentation or within the ggplot2 cheatsheet.
-[ggplot2 documentation](http://docs.ggplot2.org/current/)
-[ggplot2 Cheatsheet](http://sape.inf.usi.ch/quick-reference/ggplot2/geom)
# Aesthetics
In order to change the property on an aesthetic of a plot into a constant value (e.g. set colour of all points to red) we can supply the colour argument to the geom_point() function.
```{r, scatter_coloured}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot+geom_point(colour="red")
```
As we discussed earlier however, ggplot2 makes use of aesthetic mappings to assign variables in the data to the properties/aesthetics of the plot. This allows the properties of the plot to reflect variables in the data dynamically.
In these examples we supply additional information to the aes() function to define what information to display and how it is represented in the plot.
First we can recreate the plot we saw earlier.
```{r, scatter_simple}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))
pcPlot+geom_point()
```
Now we can adjust the aes mapping by supplying an argument to the colour parameter in the aes function. (Note that ggplot2 accepts "color" or "colour" as parameter name)
This simple adjustment allows for identifaction of the separation between male and female measurements for height and weight.
```{r, scatter_aes_sexColour}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight,colour=Sex))
pcPlot+geom_point()
```
Similarly the shape of points may be adjusted.
```{r, scatter_aes_sexShape}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight,shape=Sex))
pcPlot+geom_point()
```
The aesthetic mappings may be set directly in the geom_points() function as previously when specifying red. This can allow the same ggplot object to be used by different aesethetic mappings and varying geoms
```{r, aes_in_geom}
pcPlot <- ggplot(data=patients_clean)
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Sex))
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes))
pcPlot+geom_point(aes(x=Height,y=Weight,colour=Smokes,shape=Sex))
pcPlot+geom_violin(aes(x=Sex,y=Height,fill=Smokes))
```
Again, for a comprehensive list of parameters and aesthetic mappings used in geom_*type* functions see the ggplot2 documentation for individual geoms by using ?geom_*type*
```{r, helpforArguments}
?geom_point
```
or visit the ggplot2 documentations pages and cheatsheet
- [ggplot2 documentation](http://docs.ggplot2.org/current/)
- [Cheatsheet](http://sape.inf.usi.ch/quick-reference/ggplot2/geom)
# Facets
One very useful feature of ggplot is faceting.
This allows you to produce plots subset by variables in your data.
To facet our data into multiple plots we can use the *facet_wrap* or *facet_grid* function specifying the variable we split by.
The *facet_grid* function is well suited to splitting the data by two factors.
Here we can plot the data with the Smokes variable as rows and Sex variable as columns.
<div align="center">
facet_grid(Rows~Columns)
</div>
```{r, facet_grid_SmokesBySex}
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_grid(Smokes~Sex)
```
To split by one factor we can apply the facet_grid() function ommiting the variable before the "~"" to facet along columns in plot.
<div align="center">
facet_grid(~Columns)
</div>
```{r, facet_grid_BySex}
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_grid(~Sex)
```
To split along rows in plot, the variable is placed before the "~.".
<div align="center">
facet_grid(Rows~.)
</div>
```{r, facet_grid_SexBy}
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_grid(Sex~.)
```
The *facet_wrap()* function offers a less grid based structure but is well suited to faceting data by one variable.
For *facet_wrap()* we follow as similar syntax to *facet_grid()*
```{r, facet_Wrap_BySmokes}
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_wrap(~Smokes)
```
For more complex faceting both *facet_grid* and *facet_wrap* can accept combinations of variables.
Using *facet_wrap*
```{r, facet_wrap_smokesBySexandPet}
pcPlot <- ggplot(data=patients_clean,aes(x=Height,y=Weight,colour=Sex))+geom_point()
pcPlot + facet_wrap(~Pet+Smokes+Sex)
```
Or in a nice grid format using facet_grid() and the Smokes variable against a combination of Gender and Pet.
```{r, facet_grid_smokesBySexandPet}
pcPlot + facet_grid(Smokes~Sex+Pet)
```
# Change the plotting order in a boxplot
We will shortly discuss how to change various aspects of the plot layout and appearance. However, a common-asked question is how to change the order in which R plots a categorical variable. Consider the boxplot to compare weights of males and females:-
```{r}
ggplot(patients_clean, aes(x=Sex, y=Weight)) + geom_boxplot()
```
Here, R decides the order to arrange the boxes according to the `levels` of the categorical variable. By default this is the alphabetical order. i.e. Female before Male.
```{r}
patients_clean$Sex
```
Depending on the message we want the plot to convey, we might want control over the order of boxes. The `factor` functions allows us to explictly change the order of the levels.
```{r}
factor(patients_clean$Sex,levels=c("Male","Female"))
```
With the pipe syntax we just learnt about, we can change the variable on-the-fly and generate the plot
```{r}
patients_clean %>%
mutate(Sex = factor(Sex,levels=c("Male","Female"))) %>%
ggplot(aes(x=Sex, y=Weight)) + geom_boxplot()
```
# Exercise set 1
[Link_to_exercises](ggplot2-exercises1.html)
[Link_to_exercises_with_images](ggplot2-exercises1-images.html)
[Link_to_Rmarkdown template](ggplot2-exercises1.Rmd)
[Link_to_answers](ggplot2-solutions1.html)
# Scales
Scales and their legends have so far been handled using ggplot2 defaults.
ggplot2 offers functionality to have finer control over scales and legends using the *scale* methods.
Scale methods are divided into functions by combinations of
* the aesthetics they control.
* the type of data mapped to scale.
scale_**aesthetic**_**type**
Try typing in scale_ then *tab* to autocomplete. This will provide some examples of the scale functions available in ggplot2.
Although different *scale* functions accept some variety in their arguments, common arguments to scale functions include -
- name - The axis or legend title
- limits - Minimum and maximum of the scale
- breaks - Label/tick positions along an axis
- labels - Label names at each break
## Controlling the X and Y scale.
Both continous and discrete X/Y scales can be controlled in ggplot2 using the
scale_**(x/y)**_**(continous/discrete)**
In this example we control the continuous sale on the x-axis by providing a name, X-axis limits, the positions of breaks (ticks/labels) and the labels to place at breaks.
```{r, facet_grid_smokesBySex_scalex}
pcPlot +
geom_point() +
facet_grid(Smokes~Sex)+
scale_x_continuous(name="height ('cm')",
limits = c(100,200),
breaks=c(125,150,175),
labels=c("small","justright","tall"))
```
Similary control over discrete scales is shown below.
```{r, facet_grid_smokesBySex_scaleDisceteX}
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height))
pcPlot +
geom_violin(aes(x=Sex,y=Height)) +
scale_x_discrete(labels=c("Women", "Men"))
```
Multiple X/Y scales can be combined to give full control of axis marks.
```{r, facet_grid_smokesBySex_scaleDisceteXContinuousY}
pcPlot <- ggplot(data=patients_clean,aes(x=Sex,y=Height,fill=Smokes))
pcPlot +
geom_violin(aes(x=Sex,y=Height)) +
scale_x_discrete(labels=c("Women", "Men"))+
scale_y_continuous(breaks=c(160,180),labels=c("Short", "Tall"))
```
## Controlling other scales.
When using fill,colour,linetype, shape, size or alpha aesthetic mappings the scales are automatically selected for you and the appropriate legends created.
```{r, facet_grid_height_weight}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4)
```
In the above example the discrete colours for the Sex variable was selected by default.
### Manual discrete colour scale
Manual control of discrete variables can be performed using scale_*aes_Of_Interest*_**manual** with the *values* parameter.
Additionally in this example an updated name for the legend is provided.
```{r, facet_grid_height_weight_manualScale}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=Sex))
pcPlot + geom_point(size=4) +
scale_color_manual(values = c("Green","Purple"),
name="Gender")
```
### Colorbrewer for discrete colour scale
Here we have specified the colours to be used (hence the manual) but when the number of levels to a variable are high this may be impractical and often we would like ggplot2 to choose colours from a scale of our choice.
The brewer set of scale functions allow the user to make use of a range of palettes available from colorbrewer.
- **Diverging**
*BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral*
- **Qualitative**
*Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3*
- **Sequential**
*Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd*
```{r, facet_grid_height_weight_brewerScale}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=Pet))
pcPlot + geom_point(size=4) +
scale_color_brewer(palette = "Set2")
```
For more details on palette sizes and styles visit the colorbrewer website and ggplot2 reference page.
- [colorbrewer](http://colorbrewer2.org/)
- [ggplot2 colour scales](http://docs.ggplot2.org/current/scale_brewer.html)
### Continuous colour scales
So far we have looked a qualitative scales but ggplot2 offers much functionality for continuous scales such as for size, alpha (transparancy), colour and fill.
- scale_alpha_continuous() - For Transparancy
- scale_size_continuous() - For control of size.
Both these functions accept the range of alpha/size to be used in plotting.
Below the range of alpha to be used in plot is limited to between 0.5 and 1
```{r, facet_grid_height_weight_BMIalpha}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,alpha=BMI))
pcPlot + geom_point(size=4) +
scale_alpha_continuous(range = c(0.5,1))
```
Below the range of sizes to be used in plot is limited to between 3 and 6
```{r, facet_grid_height_weight_BMIsize}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point(alpha=0.8) +
scale_size_continuous(range = c(3,6))
```
The limits of the scale can also be controlled but it is important to note data outside of scale is removed from plot.
```{r, facet_grid_height_weight_BMIsizeLimits}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() + scale_size_continuous(range = c(3,6), limits = c(25,40))
```
What points of scale to be labeled and labels text can also be controlled.
```{r, facet_grid_height_weight_BMIsizewithBreaks}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,size=BMI))
pcPlot + geom_point() +
scale_size_continuous(range = c(3,6),
breaks=c(25,30),
labels=c("Good","Good but not 25"))
```
Control of colour/fill scales can be best achieved through the **gradient** subfunctions of scale.
- scale_(colour/fill)_*gradient* - 2 colour gradient (eg. low to high BMI)
- scale_(colour/fill)_*gradient2* - Diverging colour scale with a midpoint colour (e.g. Down, No Change, Up)
Both functions take a common set of arguments:-
- low - Colour for low end of gradient scale
- high - Colour for high end of gradient scale.
- na.value - Colour for any NA values.
An example using scale_colour_gradient below sets the low and high end colours to White and Red respectively
```{r, facet_grid_height_weight_BMIgradient}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
scale_colour_gradient(low = "White",high="Red")
```
Similarly we can use the scale_colour_gradient2 function which allows for the specification of a midpoint value and its associated colour.
```{r, facet_grid_height_weight_BMIgradient2}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
scale_colour_gradient2(low = "Blue",mid="Black", high="Red", midpoint = median(patients_clean$BMI))
```
As with previous continous scales, limits and custom labels in scale legend can be added.
```{r, facet_grid_height_weight_BMIgradient2plus}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI))
pcPlot + geom_point(size=4,alpha=0.8) +
scale_colour_gradient2(low = "Blue",
mid="Black",
high="Red",
midpoint = median(patients_clean$BMI),
breaks=c(25,30),labels=c("Low","High"),
name="Body Mass Index")
```
Multiple scales may be combined to create high customisable plots and scales
```{r, facet_grid_smokesBySex_scaleDisceteXContinuouswY}
pcPlot <- ggplot(data=patients_clean,
aes(x=Height,y=Weight,colour=BMI,shape=Sex))
pcPlot + geom_point(size=4,alpha=0.8)+
scale_shape_discrete(name="Gender") +
scale_colour_gradient2(low = "Blue",mid="Black",high="Red",
midpoint = median(patients_clean$BMI),
breaks=c(25,30),labels=c("Low","High"),
name="Body Mass Index")
```
# Statistical transformations.
In ggplot2 many of the statistical transformations are performed without any direct specification e.g. geom_histogram() will use stat_bin() function to generate bin counts to be used in plot.
An example of statistical methods in ggplot2 which are very useful include the stat_smooth() and stat_summary() functions.
The stat_smooth() function can be used to fit a line to the data being displayed.
```{r, stat_smooth}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth()
```
By default a "loess" smooth line is plotted by stat_smooth. Other methods available include lm, glm,gam,rlm.
```{r, stat_smoothlm}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))
pcPlot+geom_point()+stat_smooth(method="lm")
```
A useful feature of ggplot2 is that it uses previously defined grouping when performing smoothing.
If colour by Sex is an aesthetic mapping then two smooth lines are drawn, one for each sex.
```{r, stat_smoothlmgroups}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(method="lm")
```
This behaviour can be overridden by specifying an aes within the stat_smooth() function and setting inherit.aes to FALSE.
```{r, stat_smoothlmgroupsOverridden}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height,colour=Sex))
pcPlot+geom_point()+stat_smooth(aes(x=Weight,y=Height),method="lm",inherit.aes = F)
```
Another useful method is stat_summary() which allows for a custom statistical function to be performed and then visualised.
The fun.y parameter specifies a function to apply to the y variables for every value of x.
In this example we use it to plot the quantiles of the Female and Male Height data
```{r, stat_summary}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Sex,y=Height))+geom_jitter()
pcPlot+stat_summary(fun.y=quantile,geom="point",colour="purple",size=8)
```
# Themes
Themes specify the details of data independent elements of the plot.
This includes titles, background colour, text fonts etc.
The graphs created so far have all used the default themes, `theme_grey()`,
but ggplot2 allows for the specification of theme used.
## Predefined themes
Predefined themes can be applied to a ggplot2 object using a family of functions theme_*style*()
In the example below the minimal theme is applied to the scatter plot seen earlier.
```{r, theme_minimal}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme_minimal()
```
Several predifined themes are available within ggplot2 including:
- theme_bw
- theme_classic
- theme_dark
- theme_gray
- theme_light
- theme_linedraw
- theme_minimal
Packages such as [ggthemes](https://github.com/jrnold/ggthemes) also contain many useful collections of predined theme_*style* functions.
## Creating your themes
As well as making use of predifened theme styles, ggplot2 allows for control over the attributes and elements within a plot through a collection of related functions and attributes.
**theme()** is the global function used to set attributes for the collections of elements/components making up the current plot.
Within the theme functions there are 4 general graphic elements which may be controlled:-
- rect
- line
- text
- title
and 5 groups of related elements:-
- axis
- legend
- strip
- panel (plot panel)
- plot (Global plot parameters)
These elements may be specified by the use of their appropriate element functions including:
- element_line()
- element_text()
- element_rect()
and additionally element_blank() to set an element to "blank"
A detailed description of controlling elements within a theme can be seen at the ggplot2 vignette and by typing *?theme* into the console.
- [ggplot2 themes](http://docs.ggplot2.org/dev/vignettes/themes.html)
To demonstrate customising a theme, in the example below we alter one element of theme. Here we will change the text colour for the plot.
```{r, theme_custom}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme(
text = element_text(colour="red"),
axis.text = element_text(colour="red")
)
```
- Note because we are changing a *text* element we use the *element_text()* function.
A detailed description of which elements are available and their associated element functions can be found by typing *?theme*.
If we wished to set the y-axis label to be at an angle we can adjust that as well.
```{r, theme_custom1}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+theme(
text = element_text(colour="red"),
axis.text = element_text(colour="red"),
axis.title.y = element_text(angle=0)
)
```
Finally we may wish to remove axis line, set the background of plot panels to be white and give the strips (title above facet) a cyan background colour.
```{r, theme_custom2}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+
geom_point()+
facet_grid(Sex~Smokes)
pcPlot+theme(
text = element_text(colour="red"),
axis.text = element_text(colour="red"),
axis.title.y = element_text(angle=0),
axis.line = element_line(linetype = 0),
panel.background=element_rect(fill="white"),
strip.background=element_rect(fill="cyan")
)
```
## + and %+replace%
When altering themes we have been using the **+** operator to add themes as we would adding geoms,scales and stats.
When using the **+** operator
- Themes elements specified in new scheme replace elements in old theme
- Theme elements in the old theme which have not been specified in new theme are maintained.
This makes the **+** operator useful for building up from old themes.
In the example below, we maintain all elements set by theme_bw() but overwrite the theme element attribute of the colour of text.
```{r, theme_custom8}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+
theme_bw()+
theme(text = element_text(colour="red"))
```
The consequence can be seen comparing the effect of theme() on a plot with a default theme or theme_minimal.
Since the default theme, theme_grey() contains a specification for axis.text colour, i will not replace it with "+" operator.
```{r, theme_customMiniVsBW,echo=T,eval=F,collapse=TRUE}
pcPlot+
theme(text = element_text(colour="red"))
pcPlot+
theme_minimal()+
theme(text = element_text(colour="red"))
```
```{r, theme_customMiniVsBWDuck,echo=F,eval=T,collapse=TRUE}
pcPlot+
theme(text = element_text(colour="red"))
pcPlot+
theme_minimal()+
theme(text = element_text(colour="red"))
```
In contrast **%+replace%** replaces all elements within a theme regardless of whether they have been previously specfied in old theme.
When using the **%+replace%** operator
- Theme elements specified in new scheme replace elements in old theme
- Theme elements in the old theme which have not been specified in new theme are also replaced by blank theme elements.
```{r, theme_custom84}
oldTheme <- theme_bw()
newTheme_Plus <- theme_bw() +
theme(text = element_text(colour="red"))
newTheme_Replace <- theme_bw() %+replace%
theme(text = element_text(colour="red"))
oldTheme$text
newTheme_Plus$text
newTheme_Replace$text
```
This means that %+replace% is most useful when creating new themes.
theme_get and theme_set
# Adding titles for plot and labels.
So far no plot titles have been specified. Plot titles can be specified using the labs functions.
```{r, theme_labs}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Weight,y=Height))+geom_point()
pcPlot+labs(title="Weight vs Height",y="Height (cm)")
```
or specified using the ggtitle and xlab/ylab functions.
```{r, theme_ggtitle}
pcPlot <- ggplot(data=patients_clean,
mapping=aes(x=Height,y=Weight))+geom_point()
pcPlot+ggtitle("Weight vs Height")+ylab("Height (cm)")
```
## Saving plots
Plots produced by ggplot can be saved from the interactive viewer as with standard plots.
The ggsave() function allows for additional arguments to be specified including the type, resolution and size of plot.
By default ggsave() will use the size of your current graphics window when saving plots so it may be important to specify width and height arguments desired.
```{r, ggsaving, eval=FALSE}