Data Visualization in R with ggplot2 - Global Oil Reserves, Production, and Consumption Rates

This post shows you how to import the global oil production and consumption data between 1980 and 2017 from the BP statistical review of world energy 2018 report and create different types of plots using ggplot2 package. In addition, you will see how to create an animated map in R with gganimate package using US and Canada annual oil consumption data.

The project consists of three parts:

1. Installing and loading tidyverse (a collection of very usefull R packages), gganimate, and readxl packages.
2. Importing and cleaning data using dplyr, tidyr, and stringr packages (from tidyverse collection).
3. Creating different types of plots with ggplot2 (from tidyverse collection) and gganimate packages.

1. Installing and loading required packages

pkg_list <- c("tidyverse", "gganimate", "readxl", "maps")
# only install a package from the pkg_list if it is not already installed.
for (package in pkg_list) {
  if (!(package %in% installed.packages()[, "Package"])) {
    install.packages(package)
  }
}
# only load a package from the pkg_list if it is not already loaded.
for (package in pkg_list) {
  if (!(require(package,character.only = TRUE))) {
    require(package,character.only = TRUE)
  }
}

2. Importing and cleaning data

The original data are downloded in Excel format from BP website and are stored as bp_stats_review_2018_all_data.xlsx file in the working directory of my R project.

In the first step, I read the oil proved reserves data for different countries between 1980 and 2017 from the Excel spreadsheet.

oil_proved_reserves <- as.data.frame(read_excel("bp_stats_review_2018_all_data.xlsx", sheet = "Oil - Proved reserves history", range = "A3:AM73"))
names(oil_proved_reserves)[1] <- "country"
oil_proved_reserves <- select(oil_proved_reserves,1:(ncol(oil_proved_reserves)))
oil_proved_reserves <- oil_proved_reserves[complete.cases(oil_proved_reserves),]

In the first line of code, all the data in the “Oil - Proved reserves history” tab from row 3 to 73 are read first and then the format of data is changed to data frame. In the second line of the code, name of the first column is changed to “country”. All the columns in the data frame except the last three ones are selected in the third line of the code. Finally, in the last line of the code all the rows with missing values (NAs) are removed from the data frame using complete.cases() function.

In order to create a new column with assigned regions/continents to the countries listed in the data frame first column (“country”), we first need to extract all the reported regions/continents in the first column of the data frame and then create a new column “Region_Continent” with the regions/continents corresponding to the countries in the first column.

Region_index <- which(str_detect(oil_proved_reserves[,1] ,pattern = "Total"))
Region_name <- as.character(str_split(oil_proved_reserves[Region_index,1],pattern = " ", n = 2, simplify = TRUE)[,2])
Region_info <- bind_cols(as.data.frame(Region_index),as.data.frame(Region_name,stringsAsFactors = FALSE))
counter <- 1
temp <- Region_index[1]
for (i in (1:tail(Region_index,1))) {
  if (i < temp) {
    oil_proved_reserves$Region_Continent[i] <- as.character(Region_info[Region_index == temp,][2])
  } else if (i == temp) {
    counter <- counter  + 1
    temp <- Region_index[counter]
    oil_proved_reserves$Region_Continent[i] <- ""
  } else {
  }
}

In the next step, I continue with one more step cleaning to prepare a wide-format data frame.

# wide format
oil_proved_reserves <- oil_proved_reserves[-Region_index,]
oil_proved_reserves <- select(oil_proved_reserves,country,Region_Continent,as.character(1980:2017))
oil_proved_reserves[oil_proved_reserves == "n/a"] <- NA
oil_proved_reserves[3:ncol(oil_proved_reserves)] <- as.numeric(unlist(oil_proved_reserves[3:ncol(oil_proved_reserves)]))

Finally, the wide-format is converted to a long-format in order to generate useful timeseries plots.

# long format
oil_proved_reserves_long <- oil_proved_reserves %>% gather(key = year, value = oil_reserve, -c("country","Region_Continent")) 
oil_proved_reserves_long$year <- as.integer(oil_proved_reserves_long$year)

3. Creating different types of plots with ggplot2

3.1. Oil reserves plots

Plotting oil reserves data on a world map requires world map information including country names, latitude, and longitude.

world_map <- ggplot2::map_data("world")
tail(world_map)
##            long      lat group  order  region subregion
## 100959 12.43916 41.89839  1627 100959 Vatican   enclave
## 100960 12.43838 41.90620  1627 100960 Vatican   enclave
## 100961 12.43057 41.90547  1627 100961 Vatican   enclave
## 100962 12.42754 41.90073  1627 100962 Vatican   enclave
## 100963 12.43057 41.89756  1627 100963 Vatican   enclave
## 100964 12.43916 41.89839  1627 100964 Vatican   enclave

Before binding the “oil_proved_reserves_long” and “world_map” data frames with matching country names, pre-screening country names in both files helps us detect any difference between reported names in the two data frames.

unique(oil_proved_reserves_long$country[!(oil_proved_reserves_long$country %in% world_map$region)])
##  [1] "US"                       "Trinidad & Tobago"       
##  [3] "Other S. & Cent. America" "United Kingdom"          
##  [5] "Other Europe"             "Russian Federation"      
##  [7] "USSR"                     "Other CIS"               
##  [9] "Other Middle East"        "Other Africa"            
## [11] "Other Asia Pacific"

The above line of code shows that US, Trinidad & Tobago, Russian Federation, and United Kingdom are labeled differently in both data frames. Therefore, the names are changed in the “oil_proved_reserves_long” data frame to have identical names in both data frames.

oil_proved_reserves_long[oil_proved_reserves_long$country == "US",]$country <- "USA"
oil_proved_reserves_long[oil_proved_reserves_long$country == "United Kingdom",]$country <- "UK"
oil_proved_reserves_long[oil_proved_reserves_long$country == "Russian Federation",]$country <- "Russia"
oil_proved_reserves_long[oil_proved_reserves_long$country == "Trinidad & Tobago",]$country <- "Trinidad"

Now, I create a single data frame.

world_reserves <- left_join(world_map, oil_proved_reserves_long, by = c('region' = 'country'))
world_reserves_2017 <- world_reserves %>% filter(year %in% c(NA,2017))
colnames(world_reserves_2017)[9] <- "Reserves"

In the above code, “NA” is considered in the filter() function in order to have a complete map of the worlds (indivisual countries with no reported reserves data are also shown on the world map with gray color).

ggplot() +
  geom_map(data = world_reserves_2017, map = world_reserves_2017, aes(x = long, y = lat, map_id = region, fill = Reserves)) + 
  scale_fill_distiller(palette = "Spectral", na.value = "gray94") +
  ggtitle("Source: BP Statistical Review of World Energy 2018. ") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.ticks = element_blank(), 
        panel.grid  = element_blank(),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 8, face = "bold")) 
World map of oil reserves at the end of 2017. There is either no data or lumped data for gray-colored countries.

Figure 1: World map of oil reserves at the end of 2017. There is either no data or lumped data for gray-colored countries.

By sorting the reserves data in descending order, We can show the 20 countries with the largest oil reserves in the world at the end of 2017 on a bar plot.

largest_reserves_2017 <- oil_proved_reserves_long %>% filter(year == 2017) %>% select(country,oil_reserve) %>% arrange(desc(oil_reserve))
country_ordered <- factor(largest_reserves_2017$country, ordered = TRUE, 
                                levels = largest_reserves_2017$country)
largest_reserves_2017$country <- country_ordered
ggplot(largest_reserves_2017[c(1:20),],aes(x = country, y = oil_reserve, fill = country)) +
  geom_col() + 
  scale_x_discrete(name = "") +
  scale_y_continuous(name = "Oil Reserves (Thousand million barrels)", breaks = c(0, 50, 100, 150, 200, 250, 300)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(angle = 90, hjust = 1),
        axis.ticks.x = element_blank(), 
        panel.grid  = element_blank(),
        plot.title = element_text(size = 8, face = "bold"),
        legend.position = "none") 
The 20 countries with the largest oil proved reserves at the end of 2017.

Figure 2: The 20 countries with the largest oil proved reserves at the end of 2017.

The following plot shows the oil reserves estimates in all the continents/regions of the world since 1980.

oil_reserves_1980_2017 <- oil_proved_reserves_long %>% group_by(Region_Continent, year) %>% summarise(continent_reserves = sum(oil_reserve, na.rm = TRUE)) 
colbrew2 <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33", "#a65628")
ggplot(oil_reserves_1980_2017, aes(x = year, y = continent_reserves, color = Region_Continent)) +
  geom_line(size = 2) + 
  scale_color_manual(values = colbrew2) +
  scale_x_continuous(name = "year", expand = c(0, 0), limits = c(1980, 2018), breaks = c(1980,1985,1990,1995,2000,2005,2010,2015,2018)) +
  scale_y_continuous(name = "Oil reserves (Thousand million barrels)", limits = c(0, 900), expand = c(0, 0), breaks = c(0,100,200,300,400,500,600,700,800,900)) + 
  ggtitle("Source: BP Statistical Review of World Energy 2018.
*CIS: Commonwealth of Independent States.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        panel.grid.major =  element_line(colour = "grey50"), 
        axis.text.x = element_text(angle = 90, hjust = 1),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 8, face = "bold")) 
Oil proved reserves. Color codes from [colorbrewer2.org](http://colorbrewer2.org/#type=qualitative&scheme=Set1&n=7)

Figure 3: Oil proved reserves. Color codes from colorbrewer2.org

3.2. Oil production plots

The oil production long format data can be generated in the same way as oil reserves long format data.

oil_production <- as.data.frame(read_excel("bp_stats_review_2018_all_data.xlsx", sheet = "Oil Production - Barrels", range =  "A3:BB73"))
names(oil_production)[1] <- "country"
oil_production <- select(oil_production,1:(ncol(oil_production)))
oil_production <- oil_production[complete.cases(oil_production),]
Region_index <- which(str_detect(oil_production[,1] ,pattern = "Total"))
Region_name <- as.character(str_split(oil_production[Region_index,1],pattern = " ", n = 2, simplify = TRUE)[,2])
Region_info <- bind_cols(as.data.frame(Region_index),as.data.frame(Region_name,stringsAsFactors = FALSE))
counter <- 1
temp <- Region_index[1]
for (i in (1:tail(Region_index,1))) {
  if (i < temp) {
    oil_production$Region_Continent[i] <- as.character(Region_info[Region_index == temp,][2])
  } else if (i == temp) {
    counter <- counter  + 1
    temp <- Region_index[counter]
    oil_production$Region_Continent[i] <- ""
  } else {
  }
}
# Wide format
oil_production <- oil_production[-Region_index,]
oil_production <- select(oil_production,country,Region_Continent,as.character(1980:2017))
oil_production[oil_production == "n/a"] <- NA
oil_production[3:ncol(oil_production)] <- as.numeric(unlist(oil_production[3:ncol(oil_production)]))
# Long format
oil_production_long <- oil_production %>% gather(key = year, value = oil_rate, -c("country","Region_Continent")) 
oil_production_long$year <- as.integer(oil_production_long$year)

The world map plot is generated below.

world_map <- ggplot2::map_data("world")
unique(oil_production_long$country[!(oil_production_long$country %in% world_map$region)])
##  [1] "US"                       "Trinidad & Tobago"       
##  [3] "Other S. & Cent. America" "United Kingdom"          
##  [5] "Other Europe"             "Russian Federation"      
##  [7] "USSR"                     "Other CIS"               
##  [9] "Other Middle East"        "Other Africa"            
## [11] "Other Asia Pacific"
oil_production_long[oil_production_long$country == "US",]$country <- "USA"
oil_production_long[oil_production_long$country == "United Kingdom",]$country <- "UK"
oil_production_long[oil_production_long$country == "Russian Federation",]$country <- "Russia"
oil_production_long[oil_production_long$country == "Trinidad & Tobago",]$country <- "Trinidad"
world_production <- left_join(world_map, oil_production_long, by = c('region' = 'country'))
world_production_2017 <- world_production %>% filter(year %in% c(NA,2017))
colnames(world_production_2017)[9] <- "Rates"

ggplot() +
  geom_polygon(data = world_production_2017, aes(x = long, y = lat, group = group, fill = Rates)) +
  scale_fill_viridis_c(option = "C", na.value = "grey94") + 
  ggtitle("Source: BP Statistical Review of World Energy 2018.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.ticks = element_blank(), 
        panel.grid  = element_blank(),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 8, face = "bold")) 
World map of daily oil production rates in thousand of barrels per day in 2017. There is either no data or lumped data for gray-colored countries.

Figure 4: World map of daily oil production rates in thousand of barrels per day in 2017. There is either no data or lumped data for gray-colored countries.

highest_production_2017 <- oil_production_long %>% filter(year == 2017) %>% select(country,oil_rate) %>% arrange(desc(oil_rate))
country_ordered <- factor(highest_production_2017$country, ordered = TRUE, 
                                levels = highest_production_2017$country)
highest_production_2017$country <- country_ordered
ggplot(highest_production_2017[c(1:20),],aes(x = country,y = oil_rate, fill = country)) +
  geom_col() + 
  scale_x_discrete(name = "") +
  scale_y_continuous(name = "Oil production rate (Thousand barrels per day)", breaks = c(0, 1000,2000,3000,4000,5000,6000,7000,8000 ,9000,10000,11000,12000,13000)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(angle = 90, hjust = 1),
        axis.ticks.x = element_blank(), 
        panel.grid  = element_blank(),
        plot.title = element_text(size = 8, face = "bold"),
        legend.position = "none") 
The 20 countries with the highest oil production rates at the end of 2017.

Figure 5: The 20 countries with the highest oil production rates at the end of 2017.

oil_production_long_region <- oil_production_long %>% group_by(Region_Continent, year) %>% summarise(total_prod = sum(oil_rate, na.rm = TRUE)) 
colbrew2 <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33", "#a65628")
ggplot(oil_production_long_region, aes(x = year, y = total_prod/1000, fill = Region_Continent)) +
  geom_area() + 
  scale_color_manual(values = colbrew2) +
  scale_x_continuous(name = "year", expand = c(0, 0), limits = c(1980, 2018), breaks = c(1980,1985,1990,1995,2000,2005,2010,2015,2018)) +
  scale_y_continuous(name = "Oil production rate (Million barrels per day)", limits = c(0, 100), expand = c(0, 0), breaks = c(10,20,30,40,50,60,70,80,90,100)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.
*CIS: Commonwealth of Independent States.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        legend.title = element_text(size = 12, face = "bold"),
        panel.grid  = element_blank(),
        plot.title = element_text(size = 8, face = "bold")) 
Oil production rates of different regions/continents.

Figure 6: Oil production rates of different regions/continents.

3.3. Oil consumption plots

oil_consumption <- as.data.frame(read_excel("bp_stats_review_2018_all_data.xlsx", sheet = "Oil Consumption - Barrels", range = "A3:BB109"))
names(oil_consumption)[1] <- "country"
oil_consumption <- select(oil_consumption,1:(ncol(oil_consumption)))
oil_consumption <- oil_consumption[complete.cases(oil_consumption),]
Region_index <- which(str_detect(oil_consumption[,1] ,pattern = "Total"))
Region_name <- as.character(str_split(oil_consumption[Region_index,1],pattern = " ", n = 2, simplify = TRUE)[,2])
Region_info <- bind_cols(as.data.frame(Region_index),as.data.frame(Region_name,stringsAsFactors = FALSE))
counter <- 1
temp <- Region_index[1]
for (i in (1:tail(Region_index,1))) {
  if (i < temp) {
    oil_consumption$Region_Continent[i] <- as.character(Region_info[Region_index == temp,][2])
  } else if (i == temp) {
    counter <- counter  + 1
    temp <- Region_index[counter]
    oil_consumption$Region_Continent[i] <- ""
  } else {
  }
}
# Wide format
oil_consumption <- oil_consumption[-Region_index,]
oil_consumption <- select(oil_consumption,country,Region_Continent,as.character(1980:2017))
oil_consumption[oil_consumption == "n/a"] <- NA
oil_consumption[3:ncol(oil_consumption)] <- as.numeric(unlist(oil_consumption[3:ncol(oil_consumption)]))
# Long format
oil_consumption_long <- oil_consumption %>% gather(key = year, value = oil_rate, -c("country","Region_Continent")) 
oil_consumption_long$year <- as.integer(oil_consumption_long$year)
world_map <- ggplot2::map_data("world")
unique(oil_consumption_long$country[!(oil_consumption_long$country %in% world_map$region)])
##  [1] "US"                    "Trinidad & Tobago"    
##  [3] "Central America"       "Other Caribbean"      
##  [5] "Other South America"   "United Kingdom"       
##  [7] "Other Europe"          "Russian Federation"   
##  [9] "USSR"                  "Other CIS"            
## [11] "Other Middle East"     "Eastern Africa"       
## [13] "Middle Africa"         "Western Africa"       
## [15] "Other Northern Africa" "Other Southern Africa"
## [17] "China Hong Kong SAR"   "Other Asia Pacific"
oil_consumption_long[oil_consumption_long$country == "US",]$country <- "USA"
oil_consumption_long[oil_consumption_long$country == "United Kingdom",]$country <- "UK"
oil_consumption_long[oil_consumption_long$country == "Russian Federation",]$country <- "Russia"
oil_consumption_long[oil_consumption_long$country == "Trinidad & Tobago",]$country <- "Trinidad"
world_consumption <- left_join(world_map, oil_consumption_long, by = c('region' = 'country'))
world_consumption_2017 <- world_consumption %>% filter(year %in% c(NA,2017))
colnames(world_consumption_2017)[9] <- "Rates"
ggplot() +
  geom_map(data = world_consumption_2017, map = world_consumption_2017, aes(x = long, y = lat, map_id=region, fill = Rates)) + 
  scale_fill_continuous(low = "lightgreen", high = "darkgreen", guide = "colorbar",na.value = "gray93", limits = c(0,20000), breaks = c(0,5000,10000,15000,20000)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.ticks = element_blank(), 
        panel.grid  = element_blank(),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 8, face = "bold"))
World map of daily oil consumption rates in thousand of barrels per day in 2017. There is either no data or lumped data for gray-colored countries.

Figure 7: World map of daily oil consumption rates in thousand of barrels per day in 2017. There is either no data or lumped data for gray-colored countries.

highest_consumption_2017 <- oil_consumption_long %>% filter(year == 2017) %>% select(country,oil_rate) %>% arrange(desc(oil_rate))
country_ordered <- factor(highest_consumption_2017$country, ordered = TRUE, 
                                levels = highest_consumption_2017$country)
highest_consumption_2017$country <- country_ordered
ggplot(highest_consumption_2017[c(1:20),],aes(x = country,y = oil_rate, fill = country)) +
  geom_col() + 
  scale_x_discrete(name = "") +
  scale_y_continuous(name = "Oil consumption rate (Thousand barrels per day)", breaks = c(0, 1000,2000,3000,4000,5000,6000,7000,8000 ,9000,10000,11000,12000,13000,14000,15000,16000,17000,18000,19000,20000)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(angle = 90, hjust = 1),
        axis.ticks.x = element_blank(), 
        panel.grid  = element_blank(),
        plot.title = element_text(size = 8, face = "bold"),
        legend.position = "none") 
The 20 countries with the highest oil consumption rates at the end of 2017.

Figure 8: The 20 countries with the highest oil consumption rates at the end of 2017.

oil_consumption_long_region <- oil_consumption_long %>% group_by(Region_Continent, year) %>% summarise(total_prod = sum(oil_rate, na.rm = TRUE)) 
colbrew2 <- c("#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33", "#a65628")
ggplot(oil_consumption_long_region, aes(x = year, y = total_prod/1000, fill = Region_Continent)) +
  geom_area() + 
  scale_color_manual(values = colbrew2) +
  scale_x_continuous(name = "year", expand = c(0, 0), limits = c(1980, 2018), breaks = c(1980,1985,1990,1995,2000,2005,2010,2015,2018)) +
  scale_y_continuous(name = "Oil consumption rate (Million barrels per day)", limits = c(0, 100), expand = c(0, 0), breaks = c(10,20,30,40,50,60,70,80,90,100)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.
*CIS: Commonwealth of Independent States.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        legend.title = element_text(size = 12, face = "bold"),
        panel.grid  = element_blank(),
        plot.title = element_text(size = 8, face = "bold")) 
Oil consumption rates of different regions/continents.

Figure 9: Oil consumption rates of different regions/continents.

An animated plot for US and Canada oil consumption data visualization is generated below using gganimate package.

USA_CAN_consumption <- world_consumption %>% filter(region %in% c("USA", "Canada"))
colnames(USA_CAN_consumption)[9] <- "Rates"
ggplot(USA_CAN_consumption, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = Rates)) +
  scale_fill_continuous(type = "viridis", limits = c(1000,21000), breaks = c(1000,3000, 5000,7000,9000,11000,13000,15000,17000,19000,21000)) + 
  scale_y_continuous(breaks = (-2:2) * 30) +
  scale_x_continuous(breaks = (-4:4) * 45) +
  coord_map("ortho", orientation = c(40, -75, 0)) +
  ggtitle("Source: BP Statistical Review of World Energy 2018.") +
  theme_bw() +
  theme(axis.text = element_text(size = 10, face = "bold"),
        axis.title = element_text(size = 10, face = "bold"),
        axis.ticks = element_blank(),
        panel.grid = element_line(colour = "gray60"),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 8, face = "bold")) +
  transition_time(year) +
  labs(title = 'year: {frame_time}') +
  ease_aes('linear')
USA and Canada daily oil consumption rates in thousand of barrels per day from 1980 to 2017.

Figure 10: USA and Canada daily oil consumption rates in thousand of barrels per day from 1980 to 2017.