Who has found a project topic?

Did you practice since the last session?

About this deck

It contains a lot of formatting options displayed in a WHAT YOU CODE IS WHAT YOU GET (WYCIWYG) fashion.
Two elements are key in your learning process:

  • memory: remember that you saw a particular feature so that you can use it later on when you need it;
  • adaptation: be able to adapt code that you see (mine, or on the web) to your particular problem.

That’s all you need to make progress.

Don’t forget: it’s all about storytelling. Always ask yourself: does my graph convey what I have in mind?

Simple plots

Introduction: ggplot

The gg in ggplot means grammar of graphics. It was coined by Leland Wilkinson (see book) and coded in depth by Hadley Wickham, the creator of ggplot.

The power of ggplot is that it decomposes the way graphs are built into simple elements that characterize the graph.

Detailed intros:
https://metricsf20.classes.ryansafner.com/slides/1.3-slides#1
https://github.com/thomasp85/ggplot2_workshop/blob/master/presentation.pdf
https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/

There are currently dozens of extensions of ggplot:
https://exts.ggplot2.tidyverse.org

See also the cool list: https://www.r-graph-gallery.com/index.html

You should also download the ggplot cheatsheet!
https://posit.co/resources/cheatsheets/.
+ https://www.bigbookofr.com/data-visualization.html#ggplot2-elegant-graphics-for-data-analysis

Introduction: graph types (1/3)

Introduction: graph types (2/3)

Introduction: graph types (3/3)

The ggplot cheat sheet is a great place to find inspiration.

ggplot()

Thanks again Allison Horst!

Introduction: basics

Plots are most of the time 2D objects.

  1. The two dimensions are (often) split along an x-axis and a y-axis.
  2. There are many plot types: points, lines, bars (/histograms), areas, etc.
  3. Advanced features exist: colors, sizes, shapes, animations, etc.

A realm of layers

The “simplicity” of ggplot :)

The syntax is strange at first. The aes() wrapper is key: get used to it!

diamonds %>% 
  ggplot(aes(x = carat, y = price)) + geom_point()

Explanation

Two blocks:
- the main function, where the x and y axes are defined, and
- the graph type.

In aes(…), arguments are column names!

You can add many ‘layers’ and/or options (font size, axis limits, axis scale, etc.).

Have a look at: https://ggplot2-book.org
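
To make the "two blocks plus options" idea concrete, here is a minimal sketch that builds the same scatter plot step by step. The object names (base, scatter, scatter_log) and the axis labels are illustrative choices, not part of the original slides.

```r
library(ggplot2)

# Block 1: the data and the aesthetic mappings (column names inside aes()).
base <- ggplot(diamonds, aes(x = carat, y = price))

# Block 2: the graph type, added as a layer with `+`.
scatter <- base + geom_point(alpha = 0.3)

# Options are stacked the same way; they are not layers themselves.
scatter_log <- scatter +
  scale_y_log10() +
  labs(x = "Carat", y = "Price (log scale)")

n_layers <- length(scatter_log$layers)  # counts geoms/stats only
scatter_log                             # printing the object draws the plot
```

Storing intermediate objects like this makes it easy to reuse one base plot with several geoms.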

Decomposing the layers

The magic of ggplot

Many formatting options exist: that’s the power of tidy data (args = variables).

diamonds %>% ggplot(aes(x = carat, y = price, color = clarity, size = cut)) + 
    geom_point(alpha = 0.3)  # alpha: transparency. 1 = fully opaque, 0 = fully transparent

The magic of ggplot - many plot types!

Many ‘geoms’ are available. For simple bars: don’t specify the y axis, the bars display n(). The x variable is categorical here.
\(\rightarrow\) fill = inside color; color = outside color of bar.

diamonds %>% ggplot(aes(x = clarity, fill = cut)) + geom_bar() + theme_minimal()
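
As a rough illustration of what "the bars display n()" means, the sketch below computes the counts explicitly and plots them with geom_col(), which should give the same picture as geom_bar(). The names counts, p_bar and p_col are hypothetical.

```r
library(ggplot2)
library(dplyr)

# Explicit counts: one row per (clarity, cut) pair, stored in column n.
counts <- diamonds %>% count(clarity, cut)

# geom_bar() counts the rows itself ...
p_bar <- diamonds %>%
  ggplot(aes(x = clarity, fill = cut)) + geom_bar() + theme_minimal()

# ... while geom_col() expects the bar heights in a y variable.
p_col <- counts %>%
  ggplot(aes(x = clarity, y = n, fill = cut)) + geom_col() + theme_minimal()
```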

The magic of ggplot - lots of options!

Let’s add other formatting options (ugly!).
Be careful: the symbol is ‘+’, not the pipe!
But like the pipe, it comes at the end of a line!

diamonds %>% 
  ggplot(aes(x = carat, 
             y = price, 
             color = clarity,
             shape = cut)) +
  geom_point(size = 1.8) + 
  xlim(0.3,1.53) + ylim(500, 4000) +
  labs(title = "Plot of diamonds",
       x = "Size of the diamond",
       y = "Price of the diamond",
       caption = "based on the diams DB") +
  theme(text = element_text(size = 14),
        axis.text.x = element_text(angle = 70, 
                                   size = 16,
                                   hjust = 1,
                                   color = "red"),
        axis.text.y = element_text(angle = 90, 
                                   size = 13,
                                   hjust = 1,
                                   color = "blue")
        )

Using colors: IMPORTANT!

Source: Steve Wexler. The Big Book of Dashboards

Color management: HTML coding

Hex (RGB) color codes: https://htmlcolorcodes.com. Don't forget the # before the color code!

diamonds %>% 
  ggplot(aes(x = carat, y = price, color = clarity)) +
  geom_point(size = 1.5) + theme_minimal() + 
  theme(text = element_text(size = 14, color = "#ED18ED"),          # Pink
        axis.text.y = element_text(angle = 90, color = "#33B8FF"))  # Light blue
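
If you only know the red/green/blue values of a color, base R can convert them to the hex string that ggplot expects (and back). A small sketch using the two colors from the plot above; the variable names are illustrative.

```r
# From RGB components (0-255) to a hex string.
pink <- rgb(237, 24, 237, maxColorValue = 255)

# From a hex string back to its RGB components.
light_blue_rgb <- col2rgb("#33B8FF")[, 1]

pink            # "#ED18ED"
light_blue_rgb  # red = 51, green = 184, blue = 255
```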

Color management: fully manual codes

diamonds %>%
  ggplot(aes(x = color, fill = cut)) + geom_bar() + theme_minimal() +
  scale_fill_manual(values = c("#FF4536", "#FDB42C", "#61BC4D", "#5195D8", "#B064C2"))

Colors: benchmark codes

Color management: palettes (via RColorBrewer)

Color management: integrating palettes (color=)

diamonds %>%
  ggplot(aes(x = carat, y = price, color = clarity)) + geom_point() + 
  scale_color_brewer(palette = "Spectral") + theme_minimal()

Scales can be used for other purposes: https://ggplot2tor.com/scales/

Color management: integrating palettes (fill=)

Color management: viridis palette

Color use: relevant!

gapminder |>
  filter(year == 2007, continent == "Europe") |>
  mutate(fill_col = country == "France") |>
  ggplot(aes(x = gdpPercap,  y = reorder(country, gdpPercap),
             fill = fill_col)) +  geom_col() + theme_classic() +
  theme(axis.title.y = element_blank(), legend.position = "none") +
  scale_fill_manual(values = c("#CCCCCC", "#11BB44"))
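
The example above relies on the order FALSE, TRUE to assign grey and green. One possible way to make the mapping explicit is a named vector, sketched here on the diamonds data so that it runs without gapminder. The names highlight_cols and is_ideal are made up for the illustration.

```r
library(ggplot2)
library(dplyr)

# Named values: the color no longer depends on the order of the levels.
highlight_cols <- c("FALSE" = "#CCCCCC", "TRUE" = "#11BB44")

p <- diamonds %>%
  count(cut) %>%
  mutate(is_ideal = cut == "Ideal") %>%       # flag the bar to highlight
  ggplot(aes(x = n, y = cut, fill = is_ideal)) +
  geom_col() + theme_classic() +
  theme(axis.title.y = element_blank(), legend.position = "none") +
  scale_fill_manual(values = highlight_cols)
```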

The magic of ggplot - annotations (1/3)

Annotations can help! vjust = 0 means on top of the point.

gapminder %>%
  filter(continent == "Americas", year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point() +
  geom_text(aes(label = country), vjust = 0, nudge_y = 0.5) # nudge increases the offset 

The magic of ggplot - annotations (2/3)

ggrepel to improve the location of labels :)

library(ggrepel)     # Don't forget to activate this new package!
gapminder %>%
  filter(continent == "Americas", year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point() + theme_classic() + 
  geom_text_repel(aes(label = country), vjust = 0, nudge_y = 0.5) # nudge increases the offset 

The magic of ggplot - annotations (3/3)

Annotate anything…

gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point() +
  annotate("rect", xmin = 90000, xmax = 120000, ymin = 52, ymax = 70, alpha = 0.3, fill = "blue") + 
  annotate("text", x = 105000, y = 47, label = "This is Kuwait") + theme_minimal()

The magic of ggplot - facets (1/3)

Layout features are awesome. Here: facets to see impacts. (works with categories only)

diamonds %>% ggplot(aes(x = carat)) + geom_histogram() + 
    facet_grid(rows = vars(color), cols = vars(cut)) 

The magic of ggplot - facets (2/3)

facets on gapminder! Too bad the output is not dynamic! (more on that later)

gapminder %>% filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) + 
  geom_point() + facet_grid(rows = vars(continent), scales = "free") 

The magic of ggplot - facets (3/3)

facet_wrap uses only 1 dimension and the result may not be “rectangular”.

gapminder %>% filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) + 
  geom_point() + facet_wrap(vars(continent), ncol = 2) + theme_minimal()

Custom ticks

gapminder %>% filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) + 
  scale_x_continuous(breaks = seq(0, 50000, by = 5000)) +
  geom_point() + facet_grid(continent ~ . ) + theme_minimal()

More exotic geoms

Violins

Violins give a visual representation of distributions (y axis) through categories (x axis).

diamonds %>%
  ggplot(aes(x = clarity, y = price, fill = clarity)) + geom_violin() + ylim(300, 8000)

Dotplots

Same spirit as violins (for small data), but the syntax is a bit strange (binaxis).

gapminder %>% filter(year == 2007) %>%
  ggplot(aes(x = continent, y = lifeExp, fill = continent)) + geom_dotplot(binaxis = "y")

Boxplots

More stats-oriented: shows the median and quartiles; points lying more than 1.5 × IQR beyond the quartiles are drawn as outliers.

diamonds %>%
  ggplot(aes(x = clarity, y = carat, fill = clarity)) + geom_boxplot()
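
To see where the box and the outliers come from, the quantities can be computed by hand for one category. This is a sketch for the IF diamonds only, using the usual 1.5 × IQR rule; the variable names are illustrative.

```r
library(ggplot2)  # only needed for the diamonds dataset

x <- diamonds$carat[diamonds$clarity == "IF"]

q   <- quantile(x, c(0.25, 0.5, 0.75))  # lower quartile, median, upper quartile
iqr <- q[[3]] - q[[1]]                  # interquartile range = height of the box

lower_fence <- q[[1]] - 1.5 * iqr
upper_fence <- q[[3]] + 1.5 * iqr

# Points outside the fences are the ones drawn individually as outliers.
outliers <- x[x < lower_fence | x > upper_fence]
length(outliers)
```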

Jitter

Fuzzy points with small random variations in their positions.

diamonds %>%
  ggplot(aes(x = clarity, y = carat, fill = clarity)) + geom_jitter(size = 0.3) +
  geom_boxplot(alpha = 0.7) + theme_minimal() # Adding a boxplot layer for fun.
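
Because the jitter is random, the plot changes slightly at each run. If you need a reproducible figure, one option is to fix the seed through position_jitter(). The sketch below uses the first 500 rows only to keep it fast; the width and seed values are arbitrary.

```r
library(ggplot2)

small <- diamonds[1:500, ]

p <- ggplot(small, aes(x = clarity, y = carat)) +
  geom_point(size = 0.3,
             position = position_jitter(width = 0.2, height = 0, seed = 42)) +
  theme_minimal()

# With a fixed seed, the jittered x positions are the same at every rendering.
x_first  <- layer_data(p, 1)$x
x_second <- layer_data(p, 1)$x
```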

Cocktail: pivot tables & ggplot

Combining plots with pivot tables (1/5)

diamonds %>%
  group_by(clarity, cut) %>%
  summarise(avg_carat = mean(carat),
            avg_price = mean(price)) %>%
  ggplot(aes(x = avg_carat, y = avg_price, color = clarity, shape = cut)) + geom_point(size = 5)
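
When adapting this pattern, it can help to store the pivot table first and look at it before plotting. A minimal sketch: the name pivot is arbitrary, and .groups = "drop" is added here only to silence the grouping message from summarise().

```r
library(ggplot2)
library(dplyr)

# Step 1: the pivot table, one row per (clarity, cut) pair.
pivot <- diamonds %>%
  group_by(clarity, cut) %>%
  summarise(avg_carat = mean(carat),
            avg_price = mean(price),
            .groups = "drop")

head(pivot)  # check the table before plotting it

# Step 2: the plot, built on the summarised data.
p <- pivot %>%
  ggplot(aes(x = avg_carat, y = avg_price, color = clarity, shape = cut)) +
  geom_point(size = 5)
```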

Combining plots with pivot tables (2/5)

geom_col() is more flexible than geom_bar(). Remember: fill = inside color.

gapminder %>%
  group_by(continent) %>%
  summarise(avg_pop = mean(pop)) %>%
  ggplot(aes(x = continent, y = avg_pop)) + geom_col(fill = "#427590", alpha = 0.5) + theme_classic()

Combining plots with pivot tables (3/5)

reorder and labs (labels):

gapminder %>%
  group_by(continent) %>%
  summarise(total_pop = sum(pop)) %>%
  ggplot(aes(y = reorder(continent, total_pop), x = total_pop)) + geom_col(fill = "#4275F0", alpha = 0.5) +
  xlab("Total population (summed over all years)") + ylab("") + theme_classic() # Beware of the flip!

Combining plots with pivot tables (4/5)

gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_lifeexp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_lifeexp, color = continent)) + geom_line() + geom_point() + theme_classic()  

Combining plots with pivot tables (5/5)

gapminder %>%
  group_by(continent, year) %>%
  summarise(total_pop = sum(pop)) %>%
  ggplot(aes(x = year, y = total_pop, fill = continent)) + geom_area() + theme_classic()

Dynamic features

Interactive plots

plotly! See: https://plot.ly/r/, best option for 3D plots. Below: mind the label option!

library(plotly)  # Important: don't forget!
g <- gapminder %>% filter(year == 2007) %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, label = country)) + geom_point() 
ggplotly(g)

Two pure R alternatives are
https://echarts4r.john-coene.com/index.html
https://davidgohel.github.io/ggiraph/

Animations (1/3)

Animations (2/3)

Animations are built on top of ggplot. Here the transition is coded through transition_reveal(), driven by the year variable.

library(gganimate)  # Important: don't forget the package!
gapminder %>%
  group_by(continent, year) %>%
  summarise(avg_exp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_exp, color = continent)) + geom_line() + geom_point() +
  transition_reveal(year)  

Animations (3/3)

Here, there is no chronology:
we use transition_states()
and the variable is clarity.

library(gganimate)  
# Don't forget the package !
diamonds %>%
  group_by(cut, clarity) %>%
  summarise(avg_price = mean(price)) %>%
  ggplot(aes(x = clarity, 
             y = avg_price, 
             fill = cut)) + theme_minimal() +
  theme(text = element_text(size = 6),
        axis.text.y = element_text(angle = 90, 
                                   size = 6,
                                   hjust = 1)
        ) + 
  geom_col() +  
  transition_states(clarity) +
  shadow_mark() + 
  enter_fade()   

Save animations

The usual format is .gif; save it via anim_save().

anim <- gapminder %>%              # Stores the animation in the anim variable
  group_by(continent, year) %>%
  summarise(avg_exp = mean(lifeExp)) %>%
  ggplot(aes(x = year, y = avg_exp, color = continent)) + geom_line() + geom_point() +
  theme_classic() + transition_reveal(year) 
animate(anim, renderer = gifski_renderer(), height = 400, width = 700)

anim_save(animation = anim,           # Unrendered animation, so the options below are applied
          filename = "anim.gif",      # File name on computer
          duration = 15,              # In seconds
          rewind = TRUE)              # Go backwards or not?

Extensions

Patterns: smooth fitting

geom_smooth() fits a smooth curve through the points (a kind of local average). Grey zone = uncertainty (95% confidence interval).

diamonds %>%
  ggplot(aes(x = carat, y = price)) + geom_point(size = 0.3) + theme_minimal() + 
  geom_smooth() + scale_y_log10()    # Log scale for the y-axis!

Patterns: linear models

diamonds %>%
  ggplot(aes(x = carat, y = price)) + geom_point(size = 0.3) +
  geom_smooth(color = "red") + 
  geom_smooth(method = "lm") + 
  ylim(300, 15000) + xlim(0,2.5) + theme_void()
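
The straight line drawn by geom_smooth(method = "lm") is an ordinary linear regression of price on carat. The sketch below fits that regression by hand on the full dataset and draws it with geom_abline(); the names fit and p are illustrative. Note that xlim()/ylim() drop the points outside the limits before the fit, so the line in the slide above is estimated on a subset and can differ from this one.

```r
library(ggplot2)

# Linear model on the full data: price = intercept + slope * carat.
fit <- lm(price ~ carat, data = diamonds)
coef(fit)

p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(size = 0.3) +
  geom_abline(intercept = coef(fit)[[1]],
              slope     = coef(fit)[[2]],
              color     = "blue") +
  theme_minimal()
```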

Plots on subsets

\(\rightarrow\) layers like in Photoshop! Below, the red points are the IF diamonds.

diamonds  %>%
  ggplot(aes(x = carat, y = price)) + geom_point(size = 0.3) +
  geom_point(data = diamonds %>% filter(clarity == "IF"), color = "red") +
  geom_smooth(data = diamonds %>% filter(clarity == "IF")) + theme_minimal()

Some further organization details

Example

diamonds %>% 
  ggplot(aes(x = carat, y = price)) + 
  geom_point() +
  theme(panel.background = element_rect(fill = "#FEDFA8", # light orange
                                colour = "black",
                                linewidth = 1.5, linetype = "solid"),
        plot.background = element_rect(fill = "#C0E7F5")) # light blue

Do it like the BBC

# install.packages('devtools')
# devtools::install_github('bbc/bbplot')
library(bbplot)
diamonds %>% ggplot(aes(x = color, fill = clarity)) + geom_bar() +
  bbc_style()

A last one for the road

Combining plots with the cowplot pkg,
histograms & dodging! (without theme_grey())

library(cowplot)
# Don't forget the package!
g1 <- diamonds %>%  # Create first graph
  filter(carat < 3) %>%
  ggplot(aes(x = carat, 
             fill = cut)) + 
  geom_histogram() +
  theme_classic() +                        # Complete theme first...
  theme(legend.position = c(0.89,0.55))    # ...otherwise it resets the legend position
g2 <- diamonds %>%  # Create second graph
  filter(carat < 3) %>%
  ggplot(aes(x = carat, 
             fill = cut)) + 
  geom_histogram(position = "dodge", 
                 bins = 15) +  # Nb of rectangles
  theme(legend.position = c(0.89,0.55))
plot_grid(g1,g2,      # This comes from cowplot
          nrow = 2,
          labels = c("No dodge", "Dodge"),
          label_size = 7,
          hjust = -0.9, vjust = 0.9)

Or… use patchwork!

(Allison Horst again!)

Structure your plots!

https://patchwork.data-imaginist.com

library(patchwork)
library(gridExtra)
(g1 + g2) + tableGrob(diamonds[1:15, c('carat', 'clarity', 'price')], 
                      rows = NULL, theme=ttheme_minimal(base_size = 9))
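
Beyond +, patchwork offers | (side by side) and / (stacked) to control the layout. A small self-contained sketch, with two simple plots (p1, p2) created only for the illustration:

```r
library(ggplot2)
library(patchwork)

p1 <- ggplot(diamonds, aes(x = carat)) + geom_histogram(bins = 30)
p2 <- ggplot(diamonds, aes(x = cut, fill = cut)) + geom_bar()

side_by_side <- p1 | p2   # two columns
stacked      <- p1 / p2   # two rows

# Operators can be nested: two plots on top, one below, plus a global title.
combined <- (p1 | p2) / p1 +
  plot_annotation(title = "Diamonds at a glance")
```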

Just kidding: this one is the last one!

ggrough! https://xvrdm.github.io/ggrough/index.html - More fun, though not always easy to read!

library(ggrough)
p <- diamonds %>%
  ggplot(aes(x = clarity, fill = cut)) + geom_bar() + scale_fill_brewer(palette = "Spectral") +
  theme(text = element_text(size=20))
options <- list(GeomBar=list(fill_style="hachure", angle_noise=0.1, fill_weight=1, gap=2, roughness=1))
get_rough_chart(p, options)

Generative art (Wow!!!)

seq(-4, 4 , by = .012) %>%     # Code from  Antonio Sánchez Chinchón
expand.grid(x = ., y = .)%>%
ggplot(aes(x = (1-abs(x)-sin(y^2)), y = (1+y-cos(x^2))))+
geom_point(alpha = .03, shape = 20, size = 0) + theme_void() + coord_polar()

Saving (at last!)

For plots, use ggsave(). By default it saves the last plot displayed; alternatively, store a plot in a variable (via <-) and pass it to ggsave().

gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point()
ggsave("life.png")  # Name on computer

#or

graph_1 <- gapminder %>% ggplot(aes(x = gdpPercap, y = lifeExp)) + geom_point()
ggsave(plot = graph_1,         # Plot name
       filename = "life.png")  # Name on computer

Ten formats are available: https://ggplot2.tidyverse.org/reference/ggsave.html
The most usual are: pdf, png and jpeg.
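
In practice you will often want to control the size and resolution of the saved file. A minimal sketch, writing to a temporary folder so that nothing is left in your project; the dimensions and dpi are arbitrary example values.

```r
library(ggplot2)

graph_1 <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point(size = 0.3)

out_file <- file.path(tempdir(), "diamonds.png")  # format is guessed from the extension

ggsave(filename = out_file,
       plot     = graph_1,
       width    = 7,      # in inches (see units)
       height   = 4,
       units    = "in",
       dpi      = 150)    # resolution, relevant for png/jpeg
```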

One last thing: beware of pie charts!

Cool people to follow & read

Takeaways

Lots of formatting options \(\rightarrow\) know where to find them, then adapt them to your particular problem/task.


Always ask yourself:
\(\blacktriangleright\) What does the graph try to say?
\(\blacktriangleright\) Is the message easy to understand?

Honestly, some of the graphs today were quite ugly! Good looking graphs capture attention!



Questions?

One step back: a tidyverse of functions

\(\rightarrow\) you are now ready for the next step!

Your turn!