Basic Data Visualization Operations with R

Master essential R plotting techniques to turn data into clear, insightful visuals with ease.

Ceyhun Enki Aksan, Entrepreneur, Maker

In our previous post, we cleaned and merged data collected from multiple sources. Now that we have a properly structured data table, we can proceed to the visualization stage.

By changing the metrics used in the post titled “R with Google Analytics Reporting API Access”, you can generate different tables and compare them with the data presented here, either in a combined table or in separate ones.

[Figure: R language basic chart examples]

R: Data Visualization

Functions such as plot, barplot, hist, pie, and dotchart each create the type of statistical graphic their name suggests. The most basic usage is shown below. For additional examples, please refer to the links under the Further Reading section at the end of the article.

plot(x, y, col = "<color>", axes = TRUE)
title(main = "<title>", col.main = "<color>")
box()
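
To make the template above concrete, here is a minimal sketch that runs as-is. The data is made up for illustration (the names days and views are my own assumptions, not columns from the Google Analytics table), and it only uses functions from base R graphics.

```r
# Made-up data for illustration only; none of these values come from the post.
set.seed(42)
days  <- 1:28
views <- round(200 + 3 * days + rnorm(28, sd = 20))

# Points connected by lines, then the title and the frame as separate calls
plot(days, views, type = "b", col = "steelblue", axes = TRUE,
     xlab = "Day", ylab = "Page views")
title(main = "Daily page views", col.main = "gray20")
box()

# hist() draws the histogram and also returns the binning it used
h <- hist(views, col = "gray80",
          main = "Distribution of daily page views", xlab = "Page views")
```

Each call adds to or replaces the current plot, which is why the title and the box are issued after plot() rather than passed into it.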

As requirements grow, plots built from these basic functions can become complex and hard to manage and customize. For this reason, ggplot2 has emerged as one of the most widely used packages [1]. Other options such as Plotly [2], tufte [3], and lattice [4] can also be considered depending on specific requirements.

I will provide a more detailed article on ggplot2. However, for now, I will keep the scope of this post focused and concise.

R: ggplot2 Package

We now have data such as page URLs, daily view counts, session durations, bounce rates, page types, the categories each page belongs to, and publication dates. Setting the page URL aside, we can also aggregate the dates into weekly, monthly, and yearly intervals. Based on these variables, we can explore the following visualizations as needed.
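
As a rough sketch of the weekly, monthly, and yearly grouping mentioned above, the example below uses dplyr and lubridate on a small made-up data frame. The data frame daily and its columns date and pageviews are assumptions for illustration; substitute the columns of your own table.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily data; the column names are assumptions, not the post's table
daily <- data.frame(
  date      = seq(ymd("2020-01-01"), ymd("2020-03-31"), by = "day"),
  pageviews = rep(c(120, 80, 100), length.out = 91)
)

# Derive one grouping column per interval
daily <- daily %>%
  mutate(week  = floor_date(date, unit = "week", week_start = 1),  # weeks start on Monday
         month = floor_date(date, unit = "month"),
         year  = year(date))

# Aggregate page views per month; swap `month` for `week` or `year` as needed
monthly <- daily %>%
  group_by(month) %>%
  summarise(pageviews = sum(pageviews), .groups = "drop")

monthly
```

Keeping the interval columns as dates (rather than text labels) means they can go straight onto the x axis of a plot.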

First, let’s examine how page view counts change over time.

ggplot(data = <data>, mapping = aes(x = <x>, y = <y>)) + geom_line()

The line above creates a simple line plot. The same call can be written more concisely by relying on positional arguments:

ggplot(data, aes(x, y)) + geom_line()

The part of the code that determines the plot type is geom_line(). By swapping this function for another geom, we can easily switch between plot types. We can also stack multiple layers in the same expression using the + operator; each layer can then draw on the same data and the mapping defined in ggplot().

ggplot(data = gaData, mapping = aes(x = date)) +
  geom_line(aes(y = pageviews), col = "red") +
  geom_line(aes(y = sessions), col = "blue")

From left to right, the outputs of the code above look as follows.

[Figure: ggplot2 Graph]
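
To illustrate how swapping the geom changes the plot type, here is a small self-contained sketch. The data frame demo and its columns day and views are made up for this example.

```r
library(ggplot2)

# Made-up data; `demo`, `day`, and `views` are illustrative names only
demo <- data.frame(day   = 1:10,
                   views = c(12, 15, 11, 18, 21, 19, 25, 24, 28, 30))

# Data and mapping only; no geometry has been chosen yet
base <- ggplot(demo, aes(day, views))

p_line   <- base + geom_line()                 # line chart
p_col    <- base + geom_col()                  # bar chart from the same mapping
p_layers <- base + geom_line() + geom_point()  # two layers sharing one mapping

p_layers
```

Because the data and mapping live in the base object, each variant differs only in the geom that is added to it.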

In the next step, we can group the data, facet the plots horizontally or vertically, add trend lines, and make other adjustments so that the plots can be read in light of our objectives.

For example, let’s compare page view counts obtained from organic and paid traffic.

# Packages used by the pipeline below
library(dplyr)      # %>%, filter(), select()
library(lubridate)  # ymd()
library(ggplot2)

gaPageData.bsc %>%
  # Keep February 2020 only
  filter(date >= ymd("2020-02-01"), date < ymd("2020-03-01")) %>%
  select(date, organicPageviews, paidPageviews, organicbounceRate, paidbounceRate) %>%
  ggplot(aes(date)) +
  geom_line(aes(y = organicPageviews), color = "red") +   # organic traffic
  geom_line(aes(y = paidPageviews), color = "blue") +     # paid traffic
  ggtitle("Page Views by Traffic Source") +
  labs(x = "February 2020",
       y = "Page Views",
       subtitle = "Organic vs Paid Traffic")

[Figure: ggplot]
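
The grouping, faceting, and trend lines mentioned earlier in this section can be sketched roughly as follows. The traffic data frame is simulated, and its long layout (one row per date and source) is an assumption; the wide table used above would need to be reshaped first to be plotted this way.

```r
library(ggplot2)

# Simulated data in long format; all names and values are illustrative
set.seed(1)
feb <- seq(as.Date("2020-02-01"), as.Date("2020-02-29"), by = "day")
traffic <- data.frame(
  date      = rep(feb, times = 2),
  source    = rep(c("organic", "paid"), each = length(feb)),
  pageviews = c(round(300 + 2 * (1:29) + rnorm(29, sd = 15)),
                round(120 + 1 * (1:29) + rnorm(29, sd = 10)))
)

p <- ggplot(traffic, aes(date, pageviews, colour = source)) +  # grouping via colour
  geom_line() +
  # Linear trend line per group, without the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, linetype = "dashed") +
  # One panel per source, stacked vertically; use nrow = 1 for side by side
  facet_wrap(~ source, ncol = 1, scales = "free_y")

p
```

Mapping source to colour inside aes() also produces a legend automatically, which the fixed color = "red" and color = "blue" arguments above do not.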

R: ggplot2 Theme Customizations

Finally, I’d like to mention the theme() function. The theme() layer lets us customize the non-data text and graphical elements of a plot (plot area, panel, grid lines, and so on), so we can easily change how a plot looks.

[Figure: ggplot theme]

You can see the corresponding ggplot code below. First, I define the color variables; I recommend the hihayk/scale tool for color palettes [5]. Then I filter my gaData dataset and start constructing the ggplot layers [6], carrying the ggplot object forward from step to step. In the second step, I customize the theme. I generally prefer to work from the most structural element (plot) down to the most basic (legend), so the plot-level customizations appear in the first few lines. In the final step, I use labs to specify the plot’s titles and labels.

tcolor <- c("#474e4b",
            "#7a8785",
            "#acbec1",
            "#c5d1c3",
            "#607494",
            "#f3f0f3",
            "#ffffff") # https://hihayk.github.io/scale/

gg <- gaData %>%
  filter(date >= ymd("2020-02-01"), date < ymd("2020-03-01")) %>%
  ggplot(aes(date, pageviews, col = factor(source))) +
  geom_point() +
  stat_smooth(method = "lm", se = FALSE)

# Store the themed plot so that the labels below are added to the same object
gg <- gg +
  theme(plot.margin = unit(c(1, 1, 1, 1), "cm"),
        plot.background = element_rect(fill = tcolor[7]),
        panel.background = element_rect(fill = tcolor[6]),
        # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
        panel.grid.major = element_line(colour = tcolor[4], linetype = "dotted", linewidth = 0.4),
        panel.grid.minor = element_line(colour = tcolor[4]),
        plot.title = element_text(color = tcolor[1], vjust = -7),
        plot.subtitle = element_text(color = tcolor[2], vjust = -10),
        plot.caption = element_text(color = tcolor[4], vjust = -5),
        axis.title.x = element_text(color = tcolor[3], vjust = -3),
        axis.title.y = element_text(color = tcolor[3], vjust = 5),
        axis.text.x = element_text(color = tcolor[5], angle = 90, vjust = 5),
        axis.text.y = element_text(color = tcolor[5]),
        legend.title = element_text(size = 10, color = tcolor[2]),
        legend.background = element_blank(),
        legend.key = element_blank(),
        legend.position = "top",
        legend.box = "horizontal",
        legend.justification = c(1, 0))

# Titles and labels go on top of the themed plot
gg + labs(color = 'Traffic Source',
          x = 'February 2020',
          y = 'Page View Count',
          title = 'Monthly Page View Change by Page',
          subtitle = '/landing-page-uri',
          caption = 'Data source: Google Analytics')

You can store a set of theme() settings in a variable and reuse it across plots without redefining it each time. Alternatively, you can wrap the theme elements in a new theme function and apply it to plots by calling that function.
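
Both approaches can be sketched as follows. The names my_theme and theme_blog are hypothetical, the colors are taken from the palette above, and the built-in mtcars dataset stands in for the Google Analytics data so the example runs on its own.

```r
library(ggplot2)

# Option 1: keep a set of theme() settings in a variable
my_theme <- theme(
  plot.background  = element_rect(fill = "#ffffff"),
  panel.background = element_rect(fill = "#f3f0f3"),
  legend.position  = "top"
)

# Option 2: wrap the settings in a function that extends a built-in theme
theme_blog <- function(base_size = 11) {
  theme_minimal(base_size = base_size) +
    theme(plot.title      = element_text(colour = "#474e4b"),
          legend.position = "top")
}

# Any plot can now pick up either theme
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point()

p + my_theme
p + theme_blog()
```

If most plots in a session should share one look, theme_set(theme_blog()) makes the function's theme the default so it no longer has to be added to each plot.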

That’s all for now. In the next post, I’ll cover how to perform directory operations, and how to save data and images in R. Following the basic operations, I’ll go through all these concepts again with a real-world case study. To stay updated with future posts, please use the newsletter sign-up form located at the bottom of the page.

Further Reading

Footnotes

  1. The R Graphics Package
  2. ggplot2 extensions
  3. Plotly R
  4. Tufte in R
  5. hihayk/scale. GitHub
  6. abhilasha. (2017) Understanding different visualization layers of ggplot. SkillGaze