Setup

First, install some new packages.

#install.packages("Rtools", version = "4.2")
#install.packages("Rcpp", version = "1.0.10")
if(!require(tmap)){install.packages("tmap", options = options(timeout = 900))}
if(!require(sf)){install.packages("sf", options = options(timeout = 900))}
if(!require(maps)){install.packages("maps")}
if(!require(mapsf)){install.packages("mapsf")}

Then, activate the ones we need.

library(tidyverse)
library(readxl)
library(tmap)               # Geographical plotting
library(gapminder)          # Country level data
library(maps)               # Geographical datasets
library(sf)                 # Shape files (sf) are a common geographical format 
library(plotly)             # Dynamic graphs
library(mapsf)              # A cool new package!

Now, load the data (included in the packages). The World dataset contains geographical data.

load("World.RData")        # From tmap!
names(World)[2] <- "country"                    # Change the column name
str(World)           # structure of World
Classes ‘sf’ and 'data.frame':  177 obs. of  16 variables:
 $ iso_a3      : Factor w/ 177 levels "AFG","AGO","ALB",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ country     : Factor w/ 177 levels "Afghanistan",..: 1 4 2 166 6 7 5 56 8 9 ...
 $ sovereignt  : Factor w/ 171 levels "Afghanistan",..: 1 4 2 159 6 7 5 52 8 9 ...
 $ continent   : Factor w/ 8 levels "Africa","Antarctica",..: 3 1 4 3 8 3 2 7 6 4 ...
 $ area        : Units: [km^2] num  652860 1246700 27400 71252 2736690 ...
 $ pop_est     : num  28400000 12799293 3639453 4798491 40913584 ...
 $ pop_est_dens: num  43.5 10.3 132.8 67.3 15 ...
 $ economy     : Factor w/ 7 levels "1. Developed region: G7",..: 7 7 6 6 5 6 6 6 2 2 ...
 $ income_grp  : Factor w/ 5 levels "1. High income: OECD",..: 5 3 4 2 3 4 2 2 1 1 ...
 $ gdp_cap_est : num  784 8618 5993 38408 14027 ...
 $ life_exp    : num  59.7 NA 77.3 NA 75.9 ...
 $ well_being  : num  3.8 NA 5.5 NA 6.5 4.3 NA NA 7.2 7.4 ...
 $ footprint   : num  0.79 NA 2.21 NA 3.14 2.23 NA NA 9.31 6.06 ...
 $ inequality  : num  0.427 NA 0.165 NA 0.164 ...
 $ HPI         : num  20.2 NA 36.8 NA 35.2 ...
 $ geometry    :sfc_MULTIPOLYGON of length 177; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:69, 1:2] 61.2 62.2 63 63.2 64 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
  ..- attr(*, "names")= chr [1:15] "iso_a3" "name" "sovereignt" "continent" ...
data("gapminder")
head(gapminder)

The world data is very rich: it has many attributes (population, economy, etc.). The one column we are interested in is geometry: it contains the data for the maps!

NOTE: there are 2 very useful packages that allow to retrieve data from the World Bank: WID and wbstats: have a look!

We see that a few points are missing for life expectancy, so let’s see if we can complete with the gapminder data (spoiler: not so much!).

This exercise (see below) of grouping two datasets is paramount when plotting geographical content because usually, the geometric properties come from one file and the items we wish to plot come from another file. Thus, the joining step is KEY.

Areas

Maps in which areas are coloured to show some numerical attribute are called choropleths.

With the tmap package

The idea below is to merge the two sources to combine geographical forms and life expectancy. To do that, we need left_join(). But before, we need to create a common variable (i.e., a key) between the two. In World, the term “name” is not so great so let’s change it into “country” (just like in gapminder!). Then we can merge!

gapminder %>% 
  left_join(World, by = "country") %>%          # Join the two datasets
  select(country, lifeExp, pop, geometry)
nrow(gapminder)                                   # Number of instances
[1] 1704

There are missing points. As a rough approximation, let’s just remove them and store the result in data (more on them later).

data <- gapminder %>%                     # Join and save via the arrow operator <- 
  left_join(World, by = "country") %>%
  na.omit()                               # Remove rows with missing data
nrow(data)                                # Number of instances in dataset
[1] 1296

The loss is large because of missing points! We then use the tmap package to create a choropleth.

Some countries are missing => data problem because gapminder is incomplete. Let’s complete it via: https://www.gapminder.org/data/ This yields a much bigger dataset, called “gap2.RData”. Let’s see:

load("gap2.RData")
head(gap2)

Again, we resort to left_join to merge the two data sources.

data <- World %>% 
  left_join(gap2, by = "country") #%>%
#na.omit() 

With the mapsf package

References:
https://riatelab.github.io/mapsf/articles/mapsf.html
https://github.com/riatelab/mapsf

It’s super easy: you just need a shape file with the attribute you want to plot!

sf::sf_use_s2(FALSE)
data %>% 
  filter(year == 2050) %>%    # Keep only one year (problems otherwise)
  sf::st_sf() %>%
  mf_map(x = ., var = "population", type = "choro")

The package does not really use pipes/layers.

map <- data %>% 
  filter(year == 2018) %>%    # Keep only one year (problems otherwise)
  sf::st_sf()

my_theme <- list(
  name = "mytheme", 
  bg = "#113366",        # Light blue
  fg = "#FFFFFF",        # Dark grey
  mar = c(0, 0, 0, 0), 
  tab = TRUE, 
  pos = "center", 
  inner = TRUE, 
  line = 1, 
  cex = .9, 
  font = 3
)

mf_theme(my_theme)
mf_map(x = map, type = "base")
mf_map(x = map,
       add = T,             # After you created a map with mf_map, you need this!
       var = c("inequality", "well_being"),      # Two variables! BUT BE CAREFUL!
       type = "prop_choro",
       inches = 0.1,                             # Scale of circles
       lwd = 1,                                  # Line width
       breaks = "equal", 
       nbreaks = 6,                              # Number of colors breaks
       leg_title_cex = c(1,1),                   # Size of legend title
       leg_val_cex = c(0.5,0.8),                 # Size of legend text
       pal = "Spectral",                         # Palette
       leg_pos = c("bottomleft", "left"),        # Legend position
       leg_title= c("Inequality","Well being"),  # Legend titles
       leg_val_rnd = c(2, 2),                    # Number of decimals in legends
       leg_frame = c(TRUE, TRUE))                # Frames around legend?
mf_layout(title = "Inequality & well-being in the world", 
          credits = paste0("Sources: MISC"), 
          frame = TRUE)

With the leaflet package

Next, we move towards interactive maps with the leaflet package. The advantage of leaflet lies in its multiple options that create dynamic plots. The example below is strongly inspired from the reference page: https://rstudio.github.io/leaflet/choropleths.html Thus, looking at the difference between the two codes helps understand how they work.

We are going two use two features:
- a customized color scale;
- labels that appear when the cursor is on the map.

For the scale, we define it manually, both for the colors and for the separation points (bins).
The labels are defined using html functions.

One word on missing data: in the World data, “Czech Rep.” stands for “Czech Republic”, so we recode manually it to fill in a hole. Since it’s a factor in World, we need to use the recode_factor function. For simple text, recode would be sufficient.

if(!require(leaflet)){install.packages("leaflet")}                  # Install package
library(leaflet)                                                    # Load package
datamap <- World %>% 
  mutate(country = country %>% recode_factor(`Czech Rep.` = "Czech Republic")) %>%
  left_join(gap2, by = "country") %>%
  filter(year == 2018)                                # Keep only one year (recent)

palet <- colorBin("YlGnBu",                                         # Yellow-Green-Blue palette
                  domain = datamap %>% pull(lifeExp), # LifeExp     # Domain of labels: lifeExp
                  bins = c(50, 55, 60, 65, 70, 75, 80, 85))         # Age categories

labels <- sprintf(                                                  # Below we define the labels
  "<strong>%s</strong><br/>%g Years",                               # Adding text to label
  datamap$country,                                                  # We show the country name...
  datamap$lifeExp                                                  # ... and the life expectancy
) %>% lapply(htmltools::HTML)                                       # Embedded all into html language

We can then move forward with the plot.

datamap %>% 
  data.frame() %>%                                # Turn into dataframe (technical)
  sf::st_sf() %>%                                 # Format in sf
  #st_transform("+init=epsg:4326") %>%             # Convert in particular coordinate reference 
  leaflet() %>%                                   # Call leaflet
  addPolygons(fillColor = ~palet(lifeExp),        # Create the map (colored polygons)
              weight = 2,                         # Width of separation line
              opacity = 1,                        # Opacity of separation line
              color = "white",                    # Color of separation line
              dashArray = "3",                    # Dash size of separation line
              fillOpacity = 0.7,                  # Opacity of polygon colors
              highlight = highlightOptions(       # 5 lines below control the cursor impact
                weight = 2,                       # Width of line
                color = "#333333",                # Color of line
                dashArray = "",                   # No dash
                fillOpacity = 0.9,                # Opacity
                bringToFront = TRUE),
              label = labels,                     # LABEL! Defined above!
              labelOptions = labelOptions(        # Label options below...
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")
  ) %>%
  addLegend(pal = palet,                    # Legend: comes from palet colors defined above
            values = ~lifeExp,              # Values come from lifeExp variable
            opacity = 0.9,                  # Opacity of legend
            title = "Map Legend",           # Title of legend
            position = "bottomright")       # Position of legend

Better, but some African countries are still missing… Data problem!

It is possible to integrate leaflet into shiny: https://rstudio.github.io/leaflet/shiny.html You need renderLeaflet() in the server and leafletOutput() in the UI.

Points (with ggplot!)

Points are more easy to handle because they require less information (two coordinates at least plus other attributes like size or color). This is exactly the kind of data type that fits in the ggplot syntax! Below, we show how to combine ggplot with geographical data.

Country level data

While areas must be defined at the country level (complex!), points are simply characterized by two numbers: latitude & longitude. So a list of these two features suffices to plot points. For countries, it is possible to find data online, ex: https://developers.google.com/public-data/docs/canonical/countries_csv

The latitudes and longitudes correspond to the center of the countries.

Below, we create a new dataset with longitudes & latitutes combined with gapminder (updated version).

if(!require(readxl)){install.packages("readxl")}      # Install package to read excel files
library(readxl)                                       # Activate package
options(scipen=999)                                   # Remove scientific notation
country_LL <- read_excel("country_LL.xlsx")           # Read longitude/latitude data
data2 <- left_join(gap2, country_LL) %>%              # Merge two datasets
  #na.omit() %>%                                    # Remove missing points
  mutate(latitude = as.numeric(latitude),           # 'Numerize' latitude
         longitude = as.numeric(longitude),         # 'Numerize' longitude
         population = round(population / 10^6,1))   # Scaling population into millions
data2 %>% head()

Be careful, the code below is computationally demanding (be patient). We use the viridis color palette. In ggplot, a small list of maps is available: world, France, Italy, New Zealand, US (and states).

if(!require(scales)){install.packages("scales")}
library(scales) 
world <- map_data("world")                                # From ggplot
g <- ggplot(data = world) +                               # Data source
  geom_polygon(data = world, aes(x = long,                # Dark background of the world
                                 y = lat,
                                 group = group)) +   
  geom_point(data = data2 %>% filter(year == 2018),       # Most recent data points
             aes(x = longitude,                           # Classical ggplot syntax!
                 y = latitude, 
                 size = gdpPercap, 
                 color = lifeExp, 
                 label = country)) +                      # Label (for plotly)
  scale_color_viridis_c(option = "magma") + ylim(-70,80)    # Color scale
ggplotly(g)                                               # Plotly integration

City level data

You can find longitudes & latitudes by searching on Google. Here’s one example for France: https://simplemaps.com/data/fr-cities Worldwide data can be accessed here: https://simplemaps.com/data/world-cities

cities <- read_excel("worldcities.xlsx")

That’s too big. Let’s focus on Germany, Italy, Austria, Slovenia and Switzerland (arbitrarily) - or other choices.

cities_short <- cities %>% 
  filter(#country %in% c("Germany", "Italy", "Switzerland", "Austria", "Slovenia"),
    country %in% c("China", "India", "Nepal", "Bangladesh", "Thailand"),
    population > 350000) %>%
  na.omit()
head(cities_short)
world <- map_data("world")
g <- ggplot(data = world) +                                     # Data source
  geom_polygon(data = world,                                    # Dark background of the world
               aes(x = long, y = lat, group = group),
               size = 0.3, color = "grey") +                    # Outer lines of polygons
  geom_point(data = cities_short,                               # The points
             aes(x = lng, 
                 y = lat, 
                 size = population, 
                 color = country, 
                 label = city)) +
  #scale_colour_gradient(low = "#FFFB00", high = "#FF1B00") +  # This line is for color = population
  scale_colour_manual(values = c("#FFEC00", "#66E234", "#1BCBF7", "#CC44F8", "#FC2361")) +
  coord_sf(xlim = c(70, 125), ylim = c(6, 52), expand = FALSE) + theme_bw() # Zoom on Europe
# coord_sf(xlim = c(5, 18), ylim = c(36, 57), expand = FALSE) + theme_bw() # Zoom on Europe
ggplotly(g)

Other maps

For points, it’s easy: just use the same coordinate system via longitude & latitude. It’s much more tricky for areas.

The key is to find shape files (one alternative format is GeoJSON, but it is not treated here).
Example: http://datapages.com/gis-map-publishing-program/gis-open-files/global-framework/ Be very careful: some maps are EXTREMELY precise (up to 1m precision). Thus, the files are heavy and the plotting takes a LOT of time. Below, we use 100m precision, which is largely enough.

The full description of all shape file items can be accessed on wikipedia:

  • .shp — shape format; the feature geometry itself
  • .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly
  • .dbf — attribute format; columnar attributes for each shape, in dBase IV format

For instance, it’s possible to fetch data for French regions at: https://www.data.gouv.fr/fr/datasets/contours-geographiques-des-nouvelles-regions-metropole/

regions <- sf::st_read("regions-20140306-100m.shp")
Reading layer `regions-20140306-100m' from data source 
  `/Users/coqueret/Documents/IT/COURS/R4DS/S7_extensions/Geocomputing/regions-20140306-100m.shp' using driver `ESRI Shapefile'
Simple feature collection with 27 features and 10 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -61.80976 ymin: -21.38973 xmax: 55.83665 ymax: 51.08984
Geodetic CRS:  WGS 84
str(regions)    # Structure of the variable region
Classes ‘sf’ and 'data.frame':  27 obs. of  11 variables:
 $ code_insee: chr  "42" "72" "83" "25" ...
 $ nom       : chr  "Alsace" "Aquitaine" "Auvergne" "Basse-Normandie" ...
 $ nom_cl    : chr  "Strasbourg" "Bordeaux" "Clermont-Ferrand" "Caen" ...
 $ insee_cl  : chr  "67482" "33063" "63113" "14118" ...
 $ nuts2     : chr  "FR42" "FR61" "FR72" "FR25" ...
 $ iso3166_2 : chr  NA NA NA NA ...
 $ wikipedia : chr  "fr:Alsace" "fr:Aquitaine" "fr:Auvergne" "fr:Basse-Normandie" ...
 $ nb_dep    : num  2 5 4 3 4 4 6 4 2 4 ...
 $ nb_comm   : num  904 2296 1310 1812 2046 ...
 $ surf_km2  : num  8328 41818 26172 17786 31752 ...
 $ geometry  :sfc_MULTIPOLYGON of length 27; first list element: List of 1
  ..$ :List of 1
  .. ..$ : num [1:760, 1:2] 7.43 7.43 7.42 7.42 7.4 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:10] "code_insee" "nom" "nom_cl" "insee_cl" ...

That’s complicated. The only important column for now is the name of the region which is easy to recognize. Below, we plot a choropleth where the color is link to the area of the region (bigger = darker).

regions %>% 
  tm_shape() +
  tm_polygons("surf_km2")


regions %>% 
  sf::st_sf() %>%
  mf_map(x = ., var = "surf_km2", type = "choro")

That’s prety ugly. Let’s remove the regions outside the mainland.

regions %>%
  filter(! (nom %in% c("Guyane", "Mayotte", "Martinique", "Guadeloupe", "La Réunion"))) %>% 
  tm_shape() +
  tm_polygons("surf_km2", border.col = "white") # With the border of polygons


regions %>% 
  filter(! (nom %in% c("Guyane", "Mayotte", "Martinique", "Guadeloupe", "La Réunion"))) %>% 
  sf::st_sf() %>%
  mf_map(x = ., var = "surf_km2", type = "choro", pal = "Spectral")

Below, we change the color palette and plot the number of French “départements” inside each region.

regions %>%
  filter(! (nom %in% c("Guyane", "Mayotte", "Martinique", "Guadeloupe", "La Réunion"))) %>% 
  tm_shape() +
  tm_fill("nb_dep", palette = "Spectral")  # Without the border

This is static: leaflet integration would be better.

Highcharts

The material below is inspired from the vignette: https://jkunst.com/highcharter/

The highcharter package creates good looking plots. Let’s install it (and activate it).

if(!require(highcharter)){install.packages("highcharter")}
library(highcharter)

The package has a large library of maps. The list is given here: https://code.highcharts.com/mapdata/

If you click on a link, you get the address of the map:

The last part is what you specify to get access to a map:

mapdata <- get_data_from_map(download_map_data("countries/fr/custom/fr-all-mainland"))
trying URL 'https://code.highcharts.com/mapdata/countries/fr/custom/fr-all-mainland.js'
Content type 'text/javascript' length 23677 bytes (23 KB)
==================================================
downloaded 23 KB
head(mapdata)

The information that will be plotted comes from this file. We need to add the value that we want to use for the plot. There are 21 regions in the variable map_data. This bypasses the merging of external data via the left_join() function.

mapdata <- mapdata %>% mutate(value = 1:nrow(mapdata)) # Completely random values
head(mapdata, 20)

Below, we plot the regions of France and the color code is driven by the column “value” defined above.

hcmap("countries/fr/custom/fr-all-mainland",                      # Source for the map
      data = mapdata,                                             # Source for the color code
      value = "value",                                            # Column for color code
      joinBy = c("name"),                                         # Common name & label
      name = "Random Plot",                                       # Name of plot
      dataLabels = list(enabled = TRUE, format = "{point.name}"), # Allow labels
      borderColor = "#FAFAFA", borderWidth = 0.5,                 # Border info
      tooltip = list(valueDecimals = 1,                           # Label info
                     valuePrefix = "Value = ",                    # Before label
                     valueSuffix = " Units")) %>%                 # After label
  hc_colorAxis(
    stops = color_stops(colors = viridisLite::inferno(10, begin = 0.1)),
    type = "logarithmic"
  ) 
trying URL 'https://code.highcharts.com/mapdata/countries/fr/custom/fr-all-mainland.js'
Content type 'text/javascript' length 23677 bytes (23 KB)
==================================================
downloaded 23 KB

Fine grain local maps

Now let’s turn to smaller scale maps. The data comes from Kaggle, scrapped from tripadvisor. The useful columns: Longitude & Latitude, the MuseumName, the Rating and the LengthOfVisit.

tripadvisor <- read_excel("tripadvisor_museum_US.xlsx") %>%    # Importing the data
  mutate(Longitude = as.numeric(Longitude),                    # Getting numbers back
         Latitude = as.numeric(Latitude),
         Rating = as.numeric(Rating),
         LengthOfVisit = as.factor(LengthOfVisit)) %>%
  filter(Latitude > 40, Latitude < 41, Longitude > (-74.5), Longitude < (-73.5)) # NY museums only
Warning: There were 2 warnings in `mutate()`.
The first warning was:
ℹ In argument: `Longitude = as.numeric(Longitude)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run ]8;;ide:run:dplyr::last_dplyr_warnings()dplyr::last_dplyr_warnings()]8;; to see the 1 remaining warning.
tripadvisor

With leaflet, it’s easy to obtain empty maps as long as you know the longitude & latitude of a particular location. Below: Lyon.

leaflet() %>%                                                  # An empty map: a good start!
  setView(lng = 4.85, lat = 45.75, zoom = 12) %>% 
  addTiles()
pal <- colorFactor(palette = "Spectral", domain = tripadvisor$LengthOfVisit)
tripadvisor %>%
  leaflet() %>%                                                   # A blank sheet...
  setView(lng = -73.9772, lat = 40.7808, zoom = 11) %>% 
  addTiles() %>%
  addCircles(lng = ~Longitude, lat = ~Latitude, 
             radius = ~Rating^4, label =  ~MuseumName,
             fillColor = ~pal(LengthOfVisit), fillOpacity = 0.9,  # Circle colors
             stroke = TRUE, color = "black", weight = 2) %>%      # Circle stroke
  addLegend(pal = pal,                      # Legend: comes from palet colors defined above
            values = ~LengthOfVisit,        # Values come from lifeExp variable
            opacity = 0.7,                  # Opacity of legend
            title = "Map Legend",           # Title of legend
            position = "bottomright")       # Position of legend

Below, you need to specify lon and lat in the source dataframe. Or you can use the hcaes() function.

mapdata2 <- get_data_from_map(download_map_data("countries/us/custom/us-ny-congress-113"))
trying URL 'https://code.highcharts.com/mapdata/countries/us/custom/us-ny-congress-113.js'
Content type 'text/javascript' length 23597 bytes (23 KB)
==================================================
downloaded 23 KB
hcmap("countries/us/custom/us-ny-congress-113") %>%
  hc_mapNavigation(enabled = TRUE,
                   buttonOptions = list(
                     align = "right",
                     verticalAlign = "bottom")) %>%
  hc_add_series(data = tripadvisor %>% mutate(lon = Longitude, 
                                              lat = Latitude, 
                                              name = MuseumName),  # Add correct names
                type = "mappoint", name = "Museums") 
trying URL 'https://code.highcharts.com/mapdata/countries/us/custom/us-ny-congress-113.js'
Content type 'text/javascript' length 23597 bytes (23 KB)
==================================================
downloaded 23 KB

Not as impressive as leaflet, but the map could probably be improved. NOTE: in the example above, the geographical background and the points are unrelated.
Other map, same result below…

mapdata3 <- get_data_from_map(download_map_data("countries/us/us-ny-all"))
trying URL 'https://code.highcharts.com/mapdata/countries/us/us-ny-all.js'
Content type 'text/javascript' length 34989 bytes (34 KB)
==================================================
downloaded 34 KB
hcmap("countries/us/us-ny-all") %>%
  hc_mapNavigation(enabled = TRUE,
                   buttonOptions = list(
                     align = "right",
                     verticalAlign = "bottom")) %>%
  hc_add_series(data = tripadvisor %>% mutate(lon = Longitude, 
                                              lat = Latitude, 
                                              name = MuseumName),  # Add correct names
                type = "mappoint", name = "Museums") 
trying URL 'https://code.highcharts.com/mapdata/countries/us/us-ny-all.js'
Content type 'text/javascript' length 34989 bytes (34 KB)
==================================================
downloaded 34 KB

A test with GeoJSON

GeoJSON are, with shapefiles the other main standard in geocomputing.
It’s easy to obtain files, e.g., via https://geojson-maps.ash.ms Also, you will need the rgdal package.

#library(rgdal)
library(geojsonsf)
#africa <- rgdal::readOGR("africa.geo.json")
#africa <- gdal_read("custom.geo.json")
africa <- geojson_sf("africa.geo.json")

labels <- sprintf(
  "<strong>%s</strong><br/>%g people",
  africa$name, africa$pop_est
) %>% lapply(htmltools::HTML)
pal <- colorBin("YlOrRd", domain = africa$pop_est, 
                bins = c(0, 100, 500, 1000, 5000, 10000, Inf))

leaflet(africa) %>%
  addTiles() %>%
  addPolygons(fillOpacity = 0.7,
              fillColor = ~pal(pop_est),
              weight = 2,
              opacity = 1,
              color = "white",
              # highlight = highlightOptions(       # 5 lines below control the cursor impact
              #   weight = 2,                       # Width of line
              #   color = "#333333",                # Color of line
              #   dashArray = "",                   # No dash
              #   fillOpacity = 0.9,                # Opacity
              #   bringToFront = TRUE),
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>%
  addLegend(pal = pal, values = ~pop_est, opacity = 0.7, title = NULL,
            position = "bottomleft")
Warning: Some values were outside the color scale and will be treated as NA
---
title: "Geocomputing with R"
output:
  html_document:
    toc: yes
    df_print: paged
  html_notebook:
    toc: yes
    toc_float: yes
---


# Setup

First, install some new packages.


```{r, message = FALSE, warning = FALSE}
#install.packages("Rtools", version = "4.2")
#install.packages("Rcpp", version = "1.0.10")
if(!require(tmap)){install.packages("tmap", options = options(timeout = 900))}
if(!require(sf)){install.packages("sf", options = options(timeout = 900))}
if(!require(maps)){install.packages("maps")}
if(!require(mapsf)){install.packages("mapsf")}
```


Then, activate the ones we need.

```{r, message = FALSE, warning = FALSE}
library(tidyverse)
library(readxl)
library(tmap)               # Geographical plotting
library(gapminder)          # Country level data
library(maps)               # Geographical datasets
library(sf)                 # Shape files (sf) are a common geographical format 
library(plotly)             # Dynamic graphs
library(mapsf)              # A cool new package!
```


Now, load the data (included in the packages). The World dataset contains geographical data. 

```{r}
load("World.RData")        # From tmap!
names(World)[2] <- "country"                    # Change the column name
str(World)           # structure of World
data("gapminder")
head(gapminder)
```

The world data is very rich: it has many attributes (population, economy, etc.). The one column we are interested in is **geometry**: it contains the data for the maps!

**NOTE**: there are 2 very useful packages that allow to retrieve data from the World Bank: **WID** and **wbstats**: have a look!

We see that a few points are missing for life expectancy, so let's see if we can complete with the gapminder data (spoiler: not so much!).

**This exercise (see below) of grouping two datasets is paramount when plotting geographical content because usually, the geometric properties come from one file and the items we wish to plot come from another file. Thus, the joining step is KEY.**

# Areas

Maps in which areas are coloured to show some numerical attribute are called *choropleths*.

## With the tmap package

The idea below is to merge the two sources to combine geographical forms and life expectancy. To do that, we need **left_join**(). But before, we need to create a common variable (i.e., a *key*) between the two. In World, the term "name" is not so great so let's change it into "country" (just like in gapminder!). Then we can merge!

```{r, message = FALSE, warning = FALSE}
gapminder %>% 
  left_join(World, by = "country") %>%          # Join the two datasets
  select(country, lifeExp, pop, geometry)
nrow(gapminder)                                   # Number of instances
```

There are missing points. As a rough approximation, let's just remove them and store the result in **data** (more on them later). 

```{r, message = FALSE, warning = FALSE}
data <- gapminder %>%                     # Join and save via the arrow operator <- 
  left_join(World, by = "country") %>%
  na.omit()                               # Remove rows with missing data
nrow(data)                                # Number of instances in dataset
```

The loss is large because of missing points! We then use the *tmap* package to create a **choropleth**.


```{r, fig.width=9}
data %>% 
  filter(year == 2007) %>%    # Keep only one year (problems otherwise)
  sf::st_sf() %>%             # Transform data into shape file (sf) format
  tm_shape() +                # Tranforms the sf into a tmap object
  tm_polygons("lifeExp",      # Plot!
              alpha = 0.9,
              palette = "Spectral")    
```


Some countries are missing => data problem because gapminder is incomplete.
Let's complete it via: https://www.gapminder.org/data/
This yields a much bigger dataset, called "gap2.RData". Let's see:

```{r}
load("gap2.RData")
head(gap2)
```


Again, we resort to left_join to merge the two data sources.

```{r, message = FALSE, warning = FALSE}
data <- World %>% 
  left_join(gap2, by = "country") #%>%
#na.omit() 
```


## With the **mapsf** package

**References**:  
https://riatelab.github.io/mapsf/articles/mapsf.html  
https://github.com/riatelab/mapsf  

It's super easy: you just need a shape file with the attribute you want to plot!

```{r, message = F, warning = F, fig.width=8}
sf::sf_use_s2(FALSE)
data %>% 
  filter(year == 2050) %>%    # Keep only one year (problems otherwise)
  sf::st_sf() %>%
  mf_map(x = ., var = "population", type = "choro")
```

The package does not really use pipes/layers.

```{r, message = F, warning = F, fig.width=9}
map <- data %>% 
  filter(year == 2018) %>%    # Keep only one year (problems otherwise)
  sf::st_sf()

my_theme <- list(
  name = "mytheme", 
  bg = "#113366",        # Light blue
  fg = "#FFFFFF",        # Dark grey
  mar = c(0, 0, 0, 0), 
  tab = TRUE, 
  pos = "center", 
  inner = TRUE, 
  line = 1, 
  cex = .9, 
  font = 3
)

mf_theme(my_theme)
mf_map(x = map, type = "base")
mf_map(x = map,
       add = T,             # After you created a map with mf_map, you need this!
       var = c("inequality", "well_being"),      # Two variables! BUT BE CAREFUL!
       type = "prop_choro",
       inches = 0.1,                             # Scale of circles
       lwd = 1,                                  # Line width
       breaks = "equal", 
       nbreaks = 6,                              # Number of colors breaks
       leg_title_cex = c(1,1),                   # Size of legend title
       leg_val_cex = c(0.5,0.8),                 # Size of legend text
       pal = "Spectral",                         # Palette
       leg_pos = c("bottomleft", "left"),        # Legend position
       leg_title= c("Inequality","Well being"),  # Legend titles
       leg_val_rnd = c(2, 2),                    # Number of decimals in legends
       leg_frame = c(TRUE, TRUE))                # Frames around legend?
mf_layout(title = "Inequality & well-being in the world", 
          credits = paste0("Sources: MISC"), 
          frame = TRUE)
```


## With the **leaflet** package

Next, we move towards interactive maps with the *leaflet* package. The advantage of *leaflet* lies in its multiple options that create dynamic plots. The example below is strongly inspired from the reference page: https://rstudio.github.io/leaflet/choropleths.html
Thus, looking at the difference between the two codes helps understand how they work. 

We are going two use two features:   
- a customized color scale;    
- labels that appear when the cursor is on the map.

For the scale, we define it manually, both for the colors and for the separation points (bins).   
The labels are defined using html functions.

One word on missing data: in the *World* data, "Czech Rep." stands for "Czech Republic", so we recode manually it to fill in a hole. Since it's a factor in *World*, we need to use the **recode_factor** function. For simple text, **recode** would be sufficient.

```{r, message = FALSE, warning = FALSE}
if(!require(leaflet)){install.packages("leaflet")}                  # Install package
library(leaflet)                                                    # Load package
datamap <- World %>% 
  mutate(country = country %>% recode_factor(`Czech Rep.` = "Czech Republic")) %>%
  left_join(gap2, by = "country") %>%
  filter(year == 2018)                                # Keep only one year (recent)

palet <- colorBin("YlGnBu",                                         # Yellow-Green-Blue palette
                  domain = datamap %>% pull(lifeExp), # LifeExp     # Domain of labels: lifeExp
                  bins = c(50, 55, 60, 65, 70, 75, 80, 85))         # Age categories

labels <- sprintf(                                                  # Below we define the labels
  "<strong>%s</strong><br/>%g Years",                               # Adding text to label
  datamap$country,                                                  # We show the country name...
  datamap$lifeExp                                                  # ... and the life expectancy
) %>% lapply(htmltools::HTML)                                       # Embedded all into html language
```

We can then move forward with the plot.

```{r, message = FALSE, warning = FALSE}
datamap %>% 
  data.frame() %>%                                # Turn into dataframe (technical)
  sf::st_sf() %>%                                 # Format in sf
  #st_transform("+init=epsg:4326") %>%             # Convert in particular coordinate reference 
  leaflet() %>%                                   # Call leaflet
  addPolygons(fillColor = ~palet(lifeExp),        # Create the map (colored polygons)
              weight = 2,                         # Width of separation line
              opacity = 1,                        # Opacity of separation line
              color = "white",                    # Color of separation line
              dashArray = "3",                    # Dash size of separation line
              fillOpacity = 0.7,                  # Opacity of polygon colors
              highlight = highlightOptions(       # 5 lines below control the cursor impact
                weight = 2,                       # Width of line
                color = "#333333",                # Color of line
                dashArray = "",                   # No dash
                fillOpacity = 0.9,                # Opacity
                bringToFront = TRUE),
              label = labels,                     # LABEL! Defined above!
              labelOptions = labelOptions(        # Label options below...
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")
  ) %>%
  addLegend(pal = palet,                    # Legend: comes from palet colors defined above
            values = ~lifeExp,              # Values come from lifeExp variable
            opacity = 0.9,                  # Opacity of legend
            title = "Map Legend",           # Title of legend
            position = "bottomright")       # Position of legend
```


Better, but some African countries are still missing... Data problem!

It is possible to integrate *leaflet* into *shiny*: https://rstudio.github.io/leaflet/shiny.html
You need **renderLeaflet**() in the server and **leafletOutput**() in the UI.


# Points (with ggplot!)

Points are more easy to handle because they require less information (two coordinates at least plus other attributes like size or color). This is exactly the kind of data type that fits in the **ggplot** syntax! Below, we show how to combine ggplot with geographical data.

## Country level data

While areas must be defined at the country level (complex!), points are simply characterized by two numbers: **latitude** & **longitude**. So a list of these two features suffices to plot points. For countries, it is possible to find data online, ex: https://developers.google.com/public-data/docs/canonical/countries_csv

The latitudes and longitudes correspond to the center of the countries.

Below, we create a new dataset with longitudes & latitutes combined with *gapminder* (updated version).
```{r, message = FALSE, warning = FALSE}
if(!require(readxl)){install.packages("readxl")}      # Install package to read excel files
library(readxl)                                       # Activate package
options(scipen=999)                                   # Remove scientific notation
country_LL <- read_excel("country_LL.xlsx")           # Read longitude/latitude data
data2 <- left_join(gap2, country_LL) %>%              # Merge two datasets
  #na.omit() %>%                                    # Remove missing points
  mutate(latitude = as.numeric(latitude),           # 'Numerize' latitude
         longitude = as.numeric(longitude),         # 'Numerize' longitude
         population = round(population / 10^6,1))   # Scaling population into millions
data2 %>% head()
```

Be careful, the code below is computationally demanding (be patient). We use the *viridis* color palette. 
In *ggplot*, a small list of maps is available: world, France, Italy, New Zealand, US (and states). 

```{r, message = FALSE, warning = FALSE, fig.width=8}
if(!require(scales)){install.packages("scales")}
library(scales) 
world <- map_data("world")                                # From ggplot
g <- ggplot(data = world) +                               # Data source
  geom_polygon(data = world, aes(x = long,                # Dark background of the world
                                 y = lat,
                                 group = group)) +   
  geom_point(data = data2 %>% filter(year == 2018),       # Most recent data points
             aes(x = longitude,                           # Classical ggplot syntax!
                 y = latitude, 
                 size = gdpPercap, 
                 color = lifeExp, 
                 label = country)) +                      # Label (for plotly)
  scale_color_viridis_c(option = "magma") + ylim(-70,80)    # Color scale
ggplotly(g)                                               # Plotly integration
```

## City level data

You can find longitudes & latitudes by searching on Google. Here's one example for France:
https://simplemaps.com/data/fr-cities
Worldwide data can be accessed here: https://simplemaps.com/data/world-cities

```{r, message = FALSE, warning = FALSE}
cities <- read_excel("worldcities.xlsx")
```

That's too big. Let's focus on Germany, Italy, Austria, Slovenia and Switzerland (arbitrarily) - or other choices.

```{r}
cities_short <- cities %>% 
  filter(#country %in% c("Germany", "Italy", "Switzerland", "Austria", "Slovenia"),
    country %in% c("China", "India", "Nepal", "Bangladesh", "Thailand"),
    population > 350000) %>%
  na.omit()
head(cities_short)
```

```{r, message = FALSE, warning = FALSE, fig.width=8}
world <- map_data("world")
g <- ggplot(data = world) +                                     # Data source
  geom_polygon(data = world,                                    # Dark background of the world
               aes(x = long, y = lat, group = group),
               size = 0.3, color = "grey") +                    # Outer lines of polygons
  geom_point(data = cities_short,                               # The points
             aes(x = lng, 
                 y = lat, 
                 size = population, 
                 color = country, 
                 label = city)) +
  #scale_colour_gradient(low = "#FFFB00", high = "#FF1B00") +  # This line is for color = population
  scale_colour_manual(values = c("#FFEC00", "#66E234", "#1BCBF7", "#CC44F8", "#FC2361")) +
  coord_sf(xlim = c(70, 125), ylim = c(6, 52), expand = FALSE) + theme_bw() # Zoom on Europe
# coord_sf(xlim = c(5, 18), ylim = c(36, 57), expand = FALSE) + theme_bw() # Zoom on Europe
ggplotly(g)
```


# Other maps

For points, it's easy: just use the same coordinate system via longitude & latitude. It's much more tricky for areas.

The key is to find **shape files** (one alternative format is GeoJSON, but it is not treated here).  
Example: http://datapages.com/gis-map-publishing-program/gis-open-files/global-framework/ 
Be very careful: some maps are *EXTREMELY* precise (up to 1m precision). Thus, the files are heavy and the plotting takes a *LOT* of time. Below, we use 100m precision, which is largely enough.

The full description of all shape file items can be accessed on wikipedia:

* .shp — shape format; the feature geometry itself   
* .shx — shape index format; a positional index of the feature geometry to allow seeking forwards and backwards quickly   
* .dbf — attribute format; columnar attributes for each shape, in dBase IV format

For instance, it's possible to fetch data for French regions at:
https://www.data.gouv.fr/fr/datasets/contours-geographiques-des-nouvelles-regions-metropole/


```{r, message = FALSE, warning = FALSE}
regions <- sf::st_read("regions-20140306-100m.shp")
str(regions)    # Structure of the variable region
```

That's complicated. The only important column for now is the *name* of the region which is easy to recognize. Below, we plot a choropleth where the color is link to the area of the region (bigger = darker).

```{r}
regions %>% 
  tm_shape() +
  tm_polygons("surf_km2")

regions %>% 
  sf::st_sf() %>%
  mf_map(x = ., var = "surf_km2", type = "choro")
```

That's prety ugly. Let's remove the regions outside the mainland. 

```{r}
regions %>%
  filter(! (nom %in% c("Guyane", "Mayotte", "Martinique", "Guadeloupe", "La Réunion"))) %>% 
  tm_shape() +
  tm_polygons("surf_km2", border.col = "white") # With the border of polygons

regions %>% 
  filter(! (nom %in% c("Guyane", "Mayotte", "Martinique", "Guadeloupe", "La Réunion"))) %>% 
  sf::st_sf() %>%
  mf_map(x = ., var = "surf_km2", type = "choro", pal = "Spectral")
```

Below, we change the color palette and plot the number of French "départements" inside each region.

```{r}
regions %>%
  filter(! (nom %in% c("Guyane", "Mayotte", "Martinique", "Guadeloupe", "La Réunion"))) %>% 
  tm_shape() +
  tm_fill("nb_dep", palette = "Spectral")  # Without the border
```

This is static: **leaflet** integration would be better.

# Highcharts

The material below is inspired from the vignette: https://jkunst.com/highcharter/

The *highcharter* package creates good looking plots. Let's install it (and activate it).

```{r, message = FALSE, warning = TRUE}
if(!require(highcharter)){install.packages("highcharter")}
library(highcharter)
```

The package has a large library of maps. The list is given here:
https://code.highcharts.com/mapdata/


If you click on a link, you get the address of the map:

![](high_list.png)

The last part is what you specify to get access to a map:

```{r, message = FALSE, warning = FALSE}
mapdata <- get_data_from_map(download_map_data("countries/fr/custom/fr-all-mainland"))
head(mapdata)
```

The information that will be plotted comes from this file. We need to add the value that we want to use for the plot. There are 21 regions in the variable *map_data*. This bypasses the merging of external data via the **left_join**() function.

```{r, message = FALSE, warning = FALSE}
mapdata <- mapdata %>% mutate(value = 1:nrow(mapdata)) # Completely random values
head(mapdata, 20)
```

Below, we plot the regions of France and the color code is driven by the column "value" defined above. 

```{r, message = FALSE, warning = FALSE}
hcmap("countries/fr/custom/fr-all-mainland",                      # Source for the map
      data = mapdata,                                             # Source for the color code
      value = "value",                                            # Column for color code
      joinBy = c("name"),                                         # Common name & label
      name = "Random Plot",                                       # Name of plot
      dataLabels = list(enabled = TRUE, format = "{point.name}"), # Allow labels
      borderColor = "#FAFAFA", borderWidth = 0.5,                 # Border info
      tooltip = list(valueDecimals = 1,                           # Label info
                     valuePrefix = "Value = ",                    # Before label
                     valueSuffix = " Units")) %>%                 # After label
  hc_colorAxis(
    stops = color_stops(colors = viridisLite::inferno(10, begin = 0.1)),
    type = "logarithmic"
  ) 
```


# Fine grain local maps

Now let's turn to smaller scale maps. The data comes from Kaggle, scrapped from tripadvisor. The useful columns: *Longitude* & *Latitude*, the *MuseumName*, the *Rating* and the *LengthOfVisit*. 

```{r}
tripadvisor <- read_excel("tripadvisor_museum_US.xlsx") %>%    # Importing the data
  mutate(Longitude = as.numeric(Longitude),                    # Getting numbers back
         Latitude = as.numeric(Latitude),
         Rating = as.numeric(Rating),
         LengthOfVisit = as.factor(LengthOfVisit)) %>%
  filter(Latitude > 40, Latitude < 41, Longitude > (-74.5), Longitude < (-73.5)) # NY museums only
tripadvisor
```


With leaflet, it's easy to obtain empty maps as long as you know the longitude & latitude of a particular location. Below: **Lyon**.

```{r, fig.width=8}
leaflet() %>%                                                  # An empty map: a good start!
  setView(lng = 4.85, lat = 45.75, zoom = 12) %>% 
  addTiles()
```

```{r}
pal <- colorFactor(palette = "Spectral", domain = tripadvisor$LengthOfVisit)
tripadvisor %>%
  leaflet() %>%                                                   # A blank sheet...
  setView(lng = -73.9772, lat = 40.7808, zoom = 11) %>% 
  addTiles() %>%
  addCircles(lng = ~Longitude, lat = ~Latitude, 
             radius = ~Rating^4, label =  ~MuseumName,
             fillColor = ~pal(LengthOfVisit), fillOpacity = 0.9,  # Circle colors
             stroke = TRUE, color = "black", weight = 2) %>%      # Circle stroke
  addLegend(pal = pal,                      # Legend: comes from palet colors defined above
            values = ~LengthOfVisit,        # Values come from lifeExp variable
            opacity = 0.7,                  # Opacity of legend
            title = "Map Legend",           # Title of legend
            position = "bottomright")       # Position of legend
```

Below, you need to specify **lon** and **lat** in the source dataframe. Or you can use the *hcaes*() function.

```{r}
mapdata2 <- get_data_from_map(download_map_data("countries/us/custom/us-ny-congress-113"))
hcmap("countries/us/custom/us-ny-congress-113") %>%
  hc_mapNavigation(enabled = TRUE,
                   buttonOptions = list(
                     align = "right",
                     verticalAlign = "bottom")) %>%
  hc_add_series(data = tripadvisor %>% mutate(lon = Longitude, 
                                              lat = Latitude, 
                                              name = MuseumName),  # Add correct names
                type = "mappoint", name = "Museums") 

```

Not as impressive as leaflet, but the map could probably be improved. 
**NOTE**: in the example above, the geographical background and the points are unrelated.  
Other map, same result below...

```{r, message = F, warning = F}
mapdata3 <- get_data_from_map(download_map_data("countries/us/us-ny-all"))
hcmap("countries/us/us-ny-all") %>%
  hc_mapNavigation(enabled = TRUE,
                   buttonOptions = list(
                     align = "right",
                     verticalAlign = "bottom")) %>%
  hc_add_series(data = tripadvisor %>% mutate(lon = Longitude, 
                                              lat = Latitude, 
                                              name = MuseumName),  # Add correct names
                type = "mappoint", name = "Museums") 
```


# Resources

Below, a link of great pages & tutorials:

**Geocomputation with R** (online book): https://geocompr.robinlovelace.net    
**ggplot**!: https://ggplot2.tidyverse.org/reference/map_data.htmlPure     
**sf** (shape files): https://cran.r-project.org/web/packages/sf/vignettes/sf1.html   
**sf & R**: https://r-spatial.github.io/sf/articles/sf1.html   
**sf & ggplot**: https://www.r-spatial.org/r/2018/10/25/ggplot2-sf-2.html   
**list of sf**: http://datapages.com/gis-map-publishing-program/gis-open-files/global-framework/global-heat-flow-database/shapefiles-list  
**tmap**: https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html   
**ggmap**: https://github.com/dkahle/ggmap    
**leaflet**: https://rstudio.github.io/leaflet/    
**Oregon** (tutorial): http://geog.uoregon.edu/bartlein/courses/geog495/lec06.html    




# A test with GeoJSON

GeoJSON are, with shapefiles the other main standard in geocomputing.   
It's easy to obtain files, e.g., via https://geojson-maps.ash.ms
Also, you will need the **rgdal** package.


```{r, message = F, warning = F}
#library(rgdal)
library(geojsonsf)
#africa <- rgdal::readOGR("africa.geo.json")
#africa <- gdal_read("custom.geo.json")
africa <- geojson_sf("africa.geo.json")
africa <- africa |> mutate(pop_est = round(pop_est/10^6,1)) 

labels <- sprintf(
  "<strong>%s</strong><br/>%g M inhabitants",
  africa$name, africa$pop_est
) %>% lapply(htmltools::HTML)
```

```{r}
pal <- colorBin("YlOrRd", domain = africa$pop_est, 
                bins = c(0, 100, 500, 1000, 5000, 10000, Inf))

leaflet(africa) %>%
  addTiles() %>%
  addPolygons(fillOpacity = 0.7,
              fillColor = ~pal(pop_est),
              weight = 2,
              opacity = 1,
              color = "white",
              # highlight = highlightOptions(       # 5 lines below control the cursor impact
              #   weight = 2,                       # Width of line
              #   color = "#333333",                # Color of line
              #   dashArray = "",                   # No dash
              #   fillOpacity = 0.9,                # Opacity
              #   bringToFront = TRUE),
              label = labels,
              labelOptions = labelOptions(
                style = list("font-weight" = "normal", padding = "3px 8px"),
                textsize = "15px",
                direction = "auto")) %>%
  addLegend(pal = pal, values = ~pop_est, opacity = 0.7, title = NULL,
            position = "bottomleft")
```


```{r}

```

```{r}

```

