It contains a lot of formatting options displayed in a WHAT YOU CODE IS WHAT YOU GET (WYCIWYG) fashion.
Two elements are key in your learning process:
That’s all you need to make progress.
Don’t forget: it’s all about storytelling. Always ask yourself: does my graph convey what I have in mind?
The gg in ggplot means grammar of graphics. It was coined by Leland Wilkinson (see book) and coded in depth by Hadley Wickham, the creator of ggplot.
The power of ggplot is that it decomposes the way graphs are built into simple elements that characterize the graph.
Detailed intros:
https://metricsf20.classes.ryansafner.com/slides/1.3-slides#1
https://github.com/thomasp85/ggplot2_workshop/blob/master/presentation.pdf
https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/
There are currently dozens of extensions of ggplot:
https://exts.ggplot2.tidyverse.org
See also the cool list: https://www.r-graph-gallery.com/index.html
You should also download the ggplot cheatsheet!
https://posit.co/resources/cheatsheets/
+ https://www.bigbookofr.com/data-visualization.html#ggplot2-elegant-graphics-for-data-analysis
Lots of resources online (type “chart chooser” on Google).
Or simply ask the packages:
The ggplot cheat sheet is a great place to find inspiration.
Thanks again Allison Horst!
Plots are most of the time 2D objects.
The syntax is strange at first. The aes() wrapper is key: get used to it!
diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point()
Two blocks:
- the main function where the x and y axis are defined and
- the graph type.
In aes(…), arguments are column names!
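A minimal sketch of the difference between mapping a column inside aes() and setting a constant outside it. The object names p_mapped and p_fixed are only illustrative, and the data is passed directly to ggplot() rather than through the pipe so that ggplot2 is the only package needed.

```r
library(ggplot2)

# Mapping: 'clarity' is a column of diamonds, so it goes inside aes()
p_mapped <- ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point()

# Setting: "red" is a constant, not a column, so it goes outside aes()
p_fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "red")

p_mapped   # one color per clarity level, with a legend
p_fixed    # all points red, no legend
```

Rule of thumb: if the value changes from row to row, it belongs in aes(); if it is the same for every point, it does not.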
You can add many ‘layers’ and/or options (font size, axis limits, axis scale, etc.).
Have a look at: https://ggplot2-book.org
Have a look at: https://evamaerey.github.io/ggplot2_grammar_guide/about
ggplot works even in empty mode (with no specified geom)!
gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp))
Many formatting options exist: that’s the power of tidy data (args = variables).
diamonds %>%
  ggplot(aes(x = carat, y = price, color = clarity, size = cut)) +
  geom_point(alpha = 0.3)   # alpha = opacity: 1 = fully opaque, 0 = fully transparent
Many ‘geoms’ are available. For simple bars: don’t specify the y axis, the bars display n(). The x variable is categorical here.
\(\rightarrow\) fill = inside color; color = outside color of bar.
diamonds %>% ggplot(aes(x = clarity, fill = cut)) + geom_bar() + theme_minimal()
Let’s add other formatting options (ugly!).
Be careful: the symbol is ‘+’, not the pipe!
But like the pipe, it comes at the end of a line!
diamonds %>%
  ggplot(aes(x = carat, y = price, color = clarity, shape = cut)) +
  geom_point(size = 1.8) +
  xlim(0.3, 1.53) +
  ylim(500, 4000) +
  labs(title = "Plot of diamonds",
       x = "Size of the diamond",
       y = "Price of the diamond",
       caption = "based on the diams DB") +
  theme(text = element_text(size = 14),
        axis.text.x = element_text(angle = 70, size = 16, hjust = 1, color = "red"),
        axis.text.y = element_text(angle = 90, size = 13, hjust = 1, color = "blue"))
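On the '+' placement: a short sketch of what works and what does not. The failing version is kept as a comment so the block runs as is; the data is passed directly to ggplot() here, which is equivalent to piping it in.

```r
library(ggplot2)

# Correct: each '+' ends a line, so R knows the expression continues
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  theme_minimal()

# Incorrect (do not run): the first line is already a complete statement,
# so R draws an empty plot and then stops with an error on the second line.
# ggplot(diamonds, aes(x = carat, y = price))
#   + geom_point()
```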
Source: Steve Wexler. The Big Book of Dashboards
Hex (RGB) color codes: https://htmlcolorcodes.com. Don’t forget the # before the color code!
diamonds %>%
  ggplot(aes(x = carat, y = price, color = clarity)) +
  geom_point(size = 1.5) +
  theme_minimal() +
  theme(text = element_text(size = 14, color = "#ED18ED"),          # Pink
        axis.text.y = element_text(angle = 90, color = "#33B8FF"))  # Light blue
diamonds %>% ggplot(aes(x = color, fill = cut)) + geom_bar() + theme_minimal() + scale_fill_manual(values = c("#FF4536", "#FDB42C", "#61BC4D", "#5195D8", "#B064C2"))
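The hex codes used above are just red/green/blue intensities written in base 16. A small base R sketch (no extra package assumed) converts in both directions, using the pink and light blue from the earlier example:

```r
# Build a hex code from red/green/blue intensities on the 0-255 scale
pink <- rgb(237, 24, 237, maxColorValue = 255)
pink                      # "#ED18ED"

# Decode a hex code back into its three channels
light_blue <- col2rgb("#33B8FF")
light_blue                # red = 51, green = 184, blue = 255
```

This can help when a palette is given as RGB triplets but ggplot options such as scale_fill_manual() are fed hex strings.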
Colors can impress, have a look: https://blog.datawrapper.de/beautifulcolors/
diamonds %>% ggplot(aes(x = carat, y = price, color = clarity)) + geom_point() + scale_color_brewer(palette = "Spectral") + theme_minimal()
Scales can be used for other purposes: https://ggplot2tor.com/scales/
diamonds %>% ggplot(aes(x = color, fill = clarity)) + geom_bar() + scale_fill_brewer(palette = "RdBu") + theme_minimal() # RdBu => Red to Blue
Other palettes: https://awesomeopensource.com/project/EmilHvitfeldt/r-color-palettes
https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html
diamonds |> ggplot(aes(x = carat, y = price, color = clarity)) + geom_point() + scale_color_viridis_d(option = "magma")
gapminder |> filter(year == 2007, continent == "Europe") |> mutate(fill_col = country == "France") |> ggplot(aes(x = gdpPercap, y = reorder(country, gdpPercap), fill = fill_col)) + geom_col() + theme_classic() + theme(axis.title.y = element_blank(), legend.position = "none") + scale_fill_manual(values = c("#CCCCCC", "#11BB44"))
Annotations can help! vjust = 0 means on top of the point.
gapminder %>% filter(continent == "Americas", year == 2007) %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point() + geom_text(aes(label = country), vjust = 0, nudge_y = 0.5) # nudge increases the offset
ggrepel to improve the location of labels :)
library(ggrepel)   # Don't forget to activate this new package!
gapminder %>%
  filter(continent == "Americas", year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  theme_classic() +
  geom_text_repel(aes(label = country), vjust = 0, nudge_y = 0.5)   # nudge increases the offset
Annotate anything…
gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point() + annotate("rect", xmin = 90000, xmax = 120000, ymin = 52, ymax = 70, alpha = 0.3, fill = "blue") + annotate("text", x = 105000, y = 47, label = "This is Kuwait") + theme_minimal()
Layout features are awesome. Here: facets split one plot into panels, one per value of a variable (works with categorical variables only).
diamonds %>% ggplot(aes(x = carat)) + geom_histogram() + facet_grid(rows = vars(color), cols = vars(cut))
facets on gapminder! Too bad the output is not dynamic! (more on that later)
gapminder %>% filter(year == 2007) %>% ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) + geom_point() + facet_grid(rows = vars(continent), scales = "free")
facet_wrap uses only 1 dimension and the result may not be “rectangular”.
gapminder %>% filter(year == 2007) %>% ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) + geom_point() + facet_wrap(vars(continent), ncol = 2) + theme_minimal()
gapminder %>% filter(year == 2007) %>% ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) + scale_x_continuous(breaks = seq(0, 50000, by = 5000)) + geom_point() + facet_grid(continent ~ . ) + theme_minimal()
Violins give a visual representation of distributions (y axis) across categories (x axis).
diamonds %>% ggplot(aes(x = clarity, y = price, fill = clarity)) + geom_violin() + ylim(300, 8000)
Same spirit as violins (for small data), but the syntax is a bit strange (binaxis).
gapminder %>% filter(year == 2007) %>% ggplot(aes(x = continent, y = lifeExp, fill = continent)) + geom_dotplot(binaxis = "y")
More stats-oriented: shows the median, the quartiles and the outliers (points lying more than 1.5 × IQR beyond the quartiles).
diamonds %>% ggplot(aes(x = clarity, y = carat, fill = clarity)) + geom_boxplot()
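To make the 1.5 × IQR rule concrete, here is a sketch that computes the fences by hand in base R. The vector x is made-up data chosen to keep the arithmetic simple, and the variable names are illustrative; geom_boxplot() is documented as using the same rule to decide which points are drawn individually.

```r
# Hypothetical data: nine regular values and one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

q <- quantile(x, probs = c(0.25, 0.75))   # first and third quartiles
iqr <- unname(q[2] - q[1])                # interquartile range

lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[2]) + 1.5 * iqr

# Points outside the fences are the ones a boxplot shows as outliers
outliers <- x[x < lower_fence | x > upper_fence]
outliers
```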
Fuzzy points with small random variations in their positions.
diamonds %>% ggplot(aes(x = clarity, y = carat, fill = clarity)) + geom_jitter(size = 0.3) + geom_boxplot(alpha = 0.7) + theme_minimal() # Adding a boxplot layer for fun.
diamonds %>% group_by(clarity, cut) %>% summarise(avg_carat = mean(carat), avg_price = mean(price)) %>% ggplot(aes(x = avg_carat, y = avg_price, color = clarity, shape = cut)) + geom_point(size = 5)
geom_col() is more flexible than geom_bar(). Remember: fill = inside color.
gapminder %>% group_by(continent) %>% summarise(avg_pop = mean(pop)) %>% ggplot(aes(x = continent, y = avg_pop)) + geom_col(fill = "#427590", alpha = 0.5) + theme_classic()
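One way to see the difference between the two geoms: geom_bar() counts rows for you, while geom_col() plots whatever number you hand it. A sketch on diamonds (the names counts, p_bar and p_col are illustrative) that should produce the same bars both ways:

```r
library(ggplot2)
library(dplyr)

# geom_bar() counts the rows of each category itself
p_bar <- diamonds %>%
  ggplot(aes(x = clarity)) +
  geom_bar()

# geom_col() plots a value computed beforehand (here: the same counts)
counts <- diamonds %>% count(clarity)
p_col <- counts %>%
  ggplot(aes(x = clarity, y = n)) +
  geom_col()
```

Replace count() with any summarise() (mean, sum, median, ...) and geom_col() still works, which is where the extra flexibility comes from.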
reorder and labs (labels):
gapminder %>%
  group_by(continent) %>%
  summarise(total_pop = sum(pop)) %>%
  ggplot(aes(y = reorder(continent, total_pop), x = total_pop)) +
  geom_col(fill = "#4275F0", alpha = 0.5) +
  xlab("Total population") +   # the plotted variable is a sum, not an average
  ylab("") +
  theme_classic()   # Beware of the flip!
gapminder %>% group_by(continent, year) %>% summarise(avg_lifeexp = mean(lifeExp)) %>% ggplot(aes(x = year, y = avg_lifeexp, color = continent)) + geom_line() + geom_point() + theme_classic()
gapminder %>% group_by(continent, year) %>% summarise(total_pop = sum(pop)) %>% ggplot(aes(x = year, y = total_pop, fill = continent)) + geom_area() + theme_classic()
plotly! See: https://plot.ly/r/, best option for 3D plots. Below: mind the label option!
library(plotly)   # Important: don't forget!
g <- gapminder %>%
  filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, label = country)) +
  geom_point()
ggplotly(g)
Two pure R alternatives are
https://echarts4r.john-coene.com/index.html
https://davidgohel.github.io/ggiraph/
gganimate: https://gganimate.com/articles/gganimate.html
Animations are built on top of ggplot. Here, the transition is coded through transition_reveal() and the year variable.
library(gganimate)   # Important: don't forget the package!
gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_exp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_exp, color = continent)) +
  geom_line() +
  geom_point() +
  transition_reveal(year)
Here, there is no chronology:
we use transition_states()
and the variable is clarity.
library(gganimate)   # Don't forget the package!
diamonds %>%
  group_by(cut, clarity) %>%
  summarise(avg_price = mean(price)) %>%
  ggplot(aes(x = clarity, y = avg_price, fill = cut)) +
  theme_minimal() +
  theme(text = element_text(size = 6),
        axis.text.y = element_text(angle = 90, size = 6, hjust = 1)) +
  geom_col() +
  transition_states(clarity) +
  shadow_mark() +
  enter_fade()
The usual format is .gif; the file is written with anim_save().
anim <- gapminder %>%   # Stores the animation in the anim variable
  group_by(continent, year) %>%
  summarise(avg_exp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_exp, color = continent)) +
  geom_line() +
  geom_point() +
  theme_classic() +
  transition_reveal(year)
animate(anim, renderer = gifski_renderer(), height = 400, width = 700)
anim_save(filename = "anim.gif",              # File name on computer
          animation = animate(anim,           # Animation to save
                              duration = 15,  # In seconds
                              rewind = TRUE)) # Go backwards or not?
# duration and rewind are rendering options, so they belong inside animate()
geom_smooth() fits a smooth trend through the points (a kind of local average). Grey zone = uncertainty (95% confidence interval).
diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(size = 0.3) + theme_minimal() + geom_smooth() + scale_y_log10() # LOG SCALE for y-axis!!! +theme_grey() ?
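The defaults of geom_smooth() can be made explicit. A sketch, assuming a thinned-out copy of diamonds (every 100th row, stored under the illustrative name small) so that the local smoother runs quickly:

```r
library(ggplot2)

# Keep every 100th row so that loess stays fast (an assumption for speed only)
small <- diamonds[seq(1, nrow(diamonds), by = 100), ]

p <- ggplot(small, aes(x = carat, y = price)) +
  geom_point(size = 0.3) +
  # Local smoother with its confidence band; level sets the band's coverage
  geom_smooth(method = "loess", formula = y ~ x, level = 0.95) +
  # Straight regression line, band switched off with se = FALSE
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red")
p
```

method picks the model, level changes the width of the grey zone, and se = FALSE removes it.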
diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(size = 0.3) + geom_smooth(color = "red") + geom_smooth(method = "lm") + ylim(300, 15000) + xlim(0,2.5) + theme_void()
\(\rightarrow\) layers like in Photoshop! Below, the red points are the IF diamonds.
diamonds %>% ggplot(aes(x = carat, y = price)) + geom_point(size = 0.3) + geom_point(data = diamonds %>% filter(clarity == "IF"), color = "red") + geom_smooth(data = diamonds %>% filter(clarity == "IF")) + theme_minimal()
diamonds %>%
  ggplot(aes(x = carat, y = price)) +
  geom_point() +
  theme(panel.background = element_rect(fill = "#FEDFA8",   # light orange
                                        colour = "black",
                                        linewidth = 1.5,
                                        linetype = "solid"),
        plot.background = element_rect(fill = "#C0E7F5"))   # light blue
# install.packages('devtools')
# devtools::install_github('bbc/bbplot')
library(bbplot)
diamonds %>%
  ggplot(aes(x = color, fill = clarity)) +
  geom_bar() +
  bbc_style()
Combining plots with the cowplot pkg,
histograms & dodging! (without theme_grey())
library(cowplot)                            # Don't forget the package!
g1 <- diamonds %>%                          # Create first graph
  filter(carat < 3) %>%
  ggplot(aes(x = carat, fill = cut)) +
  geom_histogram() +
  theme_classic() +                         # Complete theme first...
  theme(legend.position = c(0.89, 0.55))    # ...otherwise it resets the legend position
g2 <- diamonds %>%                          # Create second graph
  filter(carat < 3) %>%
  ggplot(aes(x = carat, fill = cut)) +
  geom_histogram(position = "dodge", bins = 15) +   # Nb of rectangles
  theme(legend.position = c(0.89, 0.55))
plot_grid(g1, g2,                           # This comes from cowplot
          nrow = 2,
          labels = c("No dodge", "Dodge"),
          label_size = 7,
          hjust = -0.9,
          vjust = 0.9)
(Allison Horst again!)
https://patchwork.data-imaginist.com
library(patchwork)
library(gridExtra)
(g1 + g2) + tableGrob(diamonds[1:15, c('carat', 'clarity', 'price')], rows = NULL, theme = ttheme_minimal(base_size = 9))
ggrough! https://xvrdm.github.io/ggrough/index.html - More fun, though not always easy to read!
library(ggrough)
p <- diamonds %>%
  ggplot(aes(x = clarity, fill = cut)) +
  geom_bar() +
  scale_fill_brewer(palette = "Spectral") +
  theme(text = element_text(size = 20))
options <- list(GeomBar = list(fill_style = "hachure", angle_noise = 0.1, fill_weight = 1, gap = 2, roughness = 1))
get_rough_chart(p, options)
seq(-4, 4, by = .012) %>%   # Code from Antonio Sánchez Chinchón
  expand.grid(x = ., y = .) %>%
  ggplot(aes(x = (1 - abs(x) - sin(y^2)), y = (1 + y - cos(x^2)))) +
  geom_point(alpha = .03, shape = 20, size = 0) +
  theme_void() +
  coord_polar()
To save a plot, use ggsave(): by default it saves the last plot displayed. You can also store a plot in a variable with <- and pass it to ggsave().
gapminder %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
ggsave("life.png")                  # Name on computer
# or
graph_1 <- gapminder %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) +
  geom_point()
ggsave(plot = graph_1,              # Plot name
       filename = "life.png")       # Name on computer
Ten formats are available: https://ggplot2.tidyverse.org/reference/ggsave.html
The most usual are: pdf, png and jpeg.
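ggsave() picks the format from the file extension, and the size of the output can be set explicitly. A sketch that writes a pdf to a temporary folder; the file name diamonds.pdf and the dimensions are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.3)

# Hypothetical output location: a temporary folder, to avoid cluttering the project
out <- file.path(tempdir(), "diamonds.pdf")

# The .pdf extension selects the format; width and height are in inches here
ggsave(filename = out, plot = p, width = 7, height = 4, units = "in")

file.exists(out)
```

For raster formats such as png or jpeg, the dpi argument additionally controls the resolution.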
https://datavizuniverse.substack.com/p/whats-wrong-with-pie-charts
With 2-3 colors, pie charts can be ok…
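ggplot has no dedicated pie geom; a common approach is a single stacked bar bent into a circle with coord_polar(). A sketch with only two slices, assuming an illustrative split of diamonds into "Ideal" cut versus everything else (the names shares and p_pie are made up for the example):

```r
library(ggplot2)
library(dplyr)

# Two groups only: "Ideal" cut versus all other cuts
shares <- diamonds %>%
  mutate(group = if_else(cut == "Ideal", "Ideal", "Other")) %>%
  count(group)

p_pie <- shares %>%
  ggplot(aes(x = "", y = n, fill = group)) +
  geom_col(width = 1) +          # one stacked bar...
  coord_polar(theta = "y") +     # ...wrapped around a circle
  theme_void()
p_pie
```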
The gurus:
http://albertocairo.com
https://www.visualisingdata.com/blog/
https://informationisbeautiful.net
https://flowingdata.com
https://www.cedricscherer.com
https://github.com/Z3tt/TidyTuesday
Other sources of inspiration / academics
https://www.williamrchase.com/slides/
https://rkabacoff.github.io/datavis/
http://www.machlis.com
https://visualthinking.psych.northwestern.edu
BONUS: http://viz.wtf
Lots of formatting options \(\rightarrow\) know where to find them, then adapt them to your particular problem/task.
Always ask yourself:
\(\blacktriangleright\) What does the graph try to say?
\(\blacktriangleright\) Is the message easy to understand?
Honestly, some of the graphs today were quite ugly! Good-looking graphs capture attention!
\(\rightarrow\) you are now ready for the next step!