Introduction

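1. the {graphics} package (base graphics in R, loaded by default)
2. the {lattice} package, which adds more functionality to the base package
3. the {ggplot2} package (which needs to be installed and loaded beforehand)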

The {graphics} package comes with a large choice of plots (such as plot, hist, barplot, boxplot, pie, mosaicplot, etc.) and additional related features (e.g., abline, lines, legend, mtext, rect, etc.). It is often the preferred way to draw plots for most R users, and in particular for beginner to intermediate users.

Data

To illustrate plots with the {ggplot2} package we will use the mpg dataset available in the package. The dataset contains observations collected by the US Environmental Protection Agency on fuel economy from 1999 to 2008 for 38 popular models of cars (run ?mpg for more information about the data):

library(ggplot2)
dat <- ggplot2::mpg

Before going further, let’s transform the cyl, drv, fl, year and class variables into factors with the transform() function:

dat <- transform(dat,
cyl = factor(cyl),
drv = factor(drv),
fl = factor(fl),
year = factor(year),
class = factor(class)
)
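
As a quick sanity check (a minimal sketch, not a required step), the conversion above can be verified by testing each transformed column with is.factor(). The names factor_vars and is_factor_col are illustrative helpers introduced only for this example.

```r
library(ggplot2)

# Recreate the transformed dataset so this block runs on its own
dat <- transform(ggplot2::mpg,
  cyl = factor(cyl),
  drv = factor(drv),
  fl = factor(fl),
  year = factor(year),
  class = factor(class)
)

# Names of the columns that should now be factors (illustrative helper)
factor_vars <- c("cyl", "drv", "fl", "year", "class")

# TRUE for every column that was converted successfully
is_factor_col <- sapply(dat[factor_vars], is.factor)
print(is_factor_col)

# Inspect the levels of one of the new factors
levels(dat$drv)
```

If any entry of is_factor_col is FALSE, the corresponding variable is still stored as a number or a character string and will be treated as such in the plots.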

Basic principles of {ggplot2}

The {ggplot2} package is based on the principles of “The Grammar of Graphics” (hence “gg” in the name of {ggplot2}), that is, a coherent system for describing and building graphs. The main idea is to design a graphic as a succession of layers. The main layers are:

1. The dataset that contains the variables that we want to represent. This is done with the ggplot() function and comes first.
2. The variable(s) to represent on the x and/or y-axis, and the aesthetic elements (such as color, size, fill, shape and transparency) of the objects to be represented. This is done with the aes() function (abbreviation of aesthetic).
3. The type of graphical representation (scatter plot, line plot, barplot, histogram, boxplot, etc.). This is done with the functions geom_point(), geom_line(), geom_bar(), geom_histogram(), geom_boxplot(), etc.
4. If needed, additional layers (such as labels, annotations, scales, axis ticks, legends, themes, facets, etc.) can be added to personalize the plot.

To create a plot, we thus first need to specify the data in the ggplot() function and then add the required layers such as the variables, the aesthetic elements and the type of plot:

ggplot(data) +
aes(x = var_x, y = var_y) +
geom_x()
• data in ggplot() is the name of the data frame which contains the variables var_x and var_y.
• The + symbol is used to indicate the different layers that will be added to the plot. Make sure to write the + symbol at the end of the line of code and not at the beginning of the line, otherwise R will throw an error.
• The layer aes() indicates which variables will be used in the plot and, more generally, the aesthetic elements of the plot.
• Finally, x in geom_x() represents the type of plot.
• Other layers are usually not needed unless we want to personalize the plot further.
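
The template above can be sketched with real variables as follows. This is only an illustration under the assumption that the mpg dataset shipped with {ggplot2} is used directly; the object name p_template is hypothetical.

```r
library(ggplot2)

# Data first, then the variables, then the type of plot;
# each "+" sits at the END of a line so R keeps reading the expression
p_template <- ggplot(mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# Printing the object draws the plot
print(p_template)

# layer_data() returns the data computed for a layer: one row per point here
nrow(layer_data(p_template))
```

Storing the plot in an object is optional, but it makes it easy to inspect the plot or to add further layers later.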

Creating plots with {ggplot2}

• scatter plot
• line plot
• histogram
• density plot
• boxplot
• barplot

In order to focus on the construction of the different plots and the use of {ggplot2}, we will restrict ourselves to drawing basic (yet beautiful) plots without unnecessary layers. For the sake of completeness, we will briefly discuss and illustrate different layers to further personalize a plot at the end of the article (see this section).

Note that if you still struggle to create plots with {ggplot2} after reading this tutorial, you may find the {esquisse} addin useful. This addin allows you to interactively (that is, by dragging and dropping variables) create plots with the {ggplot2} package. Give it a try!

Scatter plot

1. We first specify the data:
ggplot(dat) # data

2. Then we add the variables to be represented with the aes() function:
ggplot(dat) + # data
aes(x = displ, y = hwy) # variables

3. Finally, we indicate the type of plot:
ggplot(dat) + # data
aes(x = displ, y = hwy) + # variables
geom_point() # type of plot

You will also sometimes see the aesthetic elements (aes() with the variables) inside the ggplot() function in addition to the dataset:

ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()

Line plot

ggplot(dat) +
aes(x = displ, y = hwy) +
geom_line()

Combination of line and points

An advantage of {ggplot2} is the ability to combine several types of plots and the flexibility it offers in designing them. For instance, we can add a line to a scatter plot by simply adding a layer to the initial scatter plot:

ggplot(dat) +
aes(x = displ, y = hwy) +
geom_point() +
geom_line() # add line

Histogram

A histogram (useful to visualize distributions and detect potential outliers) can be plotted using geom_histogram():

ggplot(dat) +
aes(x = hwy) +
geom_histogram()

By default, the number of bins is equal to 30. You can change this value using the bins argument inside the geom_histogram() function:

ggplot(dat) +
aes(x = hwy) +
geom_histogram(bins = sqrt(nrow(dat)))
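
The square root of the number of observations used above is one common rule of thumb, not the only possible choice. Since sqrt(234) is about 15.3, the sketch below rounds it so that the number of bins is an explicit integer, and also shows the binwidth argument as an alternative. The name n_bins and the bin width of 2 are illustrative assumptions.

```r
library(ggplot2)

# Square-root rule of thumb, rounded to a whole number of bins
n_bins <- round(sqrt(nrow(mpg)))
n_bins

ggplot(mpg) +
  aes(x = hwy) +
  geom_histogram(bins = n_bins)

# Alternative: fix the width of each bin (in units of hwy) instead of their number
ggplot(mpg) +
  aes(x = hwy) +
  geom_histogram(binwidth = 2)
```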

Density plot

ggplot(dat) +
aes(x = hwy) +
geom_density()

Combination of histogram and density

ggplot(dat) +
aes(x = hwy, y = after_stat(density)) + # ..density.. is deprecated in recent versions of {ggplot2}
geom_histogram() +
geom_density()

ggplot(dat) +
aes(x = hwy, color = drv, fill = drv) +
geom_density(alpha = 0.25) # add transparency

Boxplot

A boxplot (also very useful to visualize distributions and detect potential outliers) can be plotted using geom_boxplot():

# Boxplot for one variable
ggplot(dat) +
aes(x = "", y = hwy) +
geom_boxplot()

# Boxplot by factor
ggplot(dat) +
aes(x = drv, y = hwy) +
geom_boxplot()

It is also possible to plot the points on the boxplot with geom_jitter(), and to vary the width of the boxes according to the size (i.e., the number of observations) of each level with varwidth = TRUE:

ggplot(dat) +
aes(x = drv, y = hwy) +
geom_boxplot(varwidth = TRUE) + # vary boxes width according to n obs.
geom_jitter(alpha = 0.25, width = 0.2) # adds random noise and limit its width

The geom_jitter() layer adds some random variation to each point in order to prevent them from overlapping (an issue known as overplotting).[1] Moreover, the alpha argument adds some transparency to the points (see more in this section) to keep the focus on the boxes and not on the points.
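
Because the noise is random, the points land in slightly different places each time the plot is drawn. A minimal sketch of one way to make the jitter reproducible is given below: it uses geom_point() with position_jitter(), which accepts a seed argument, and sets height = 0 so that only the horizontal position is perturbed. The seed value 123 and the object name p_jitter are arbitrary choices for this illustration.

```r
library(ggplot2)

# Jitter horizontally only (height = 0 leaves the hwy values untouched)
# and fix the seed so the same noise is drawn each time
p_jitter <- ggplot(mpg) +
  aes(x = drv, y = hwy) +
  geom_boxplot(varwidth = TRUE) +
  geom_point(
    position = position_jitter(width = 0.2, height = 0, seed = 123),
    alpha = 0.25
  )
p_jitter
```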

ggplot(dat) +
aes(x = drv, y = hwy) +
geom_boxplot(varwidth = TRUE) + # vary boxes width according to n obs.
geom_jitter(alpha = 0.25, width = 0.2) + # adds random noise and limit its width
facet_wrap(~year) # divide into 2 panels

ggplot(dat) +
aes(x = drv, y = hwy, fill = drv) + # add color to boxes with fill
geom_boxplot(varwidth = TRUE) + # vary boxes width according to n obs.
geom_jitter(alpha = 0.25, width = 0.2) + # adds random noise and limit its width
facet_wrap(~year) + # divide into 2 panels
theme(legend.position = "none") # remove legend

If you are unhappy with the default colors provided in {ggplot2}, you can change them manually with the scale_fill_manual() layer:

ggplot(dat) +
aes(x = drv, y = hwy, fill = drv) + # add color to boxes with fill
geom_boxplot(varwidth = TRUE) + # vary boxes width according to n obs.
geom_jitter(alpha = 0.25, width = 0.2) + # adds random noise and limit its width
facet_wrap(~year) + # divide into 2 panels
theme(legend.position = "none") + # remove legend
scale_fill_manual(values = c("darkred", "darkgreen", "steelblue")) # change fill color manually
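
When the colors are given as an unnamed vector, they are matched to the levels in order. A hedged alternative is to pass a named vector so that each level is tied to its color explicitly; the sketch below assumes the three levels of drv ("4", "f" and "r") and uses the hypothetical names drv_colors and p_fill.

```r
library(ggplot2)

# Names are the levels of drv, values are the fill colors
drv_colors <- c("4" = "darkred", "f" = "darkgreen", "r" = "steelblue")

p_fill <- ggplot(mpg) +
  aes(x = drv, y = hwy, fill = drv) +
  geom_boxplot() +
  scale_fill_manual(values = drv_colors) +
  theme(legend.position = "none")
p_fill
```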

Barplot

A barplot (useful to visualize qualitative variables) can be plotted using geom_bar():

ggplot(dat) +
aes(x = drv) +
geom_bar()

By default, the heights of the bars correspond to the observed frequencies for each level of the variable of interest (drv in our case).

Again, for a more appealing plot, we can add some colors to the bars with the fill argument:

ggplot(dat) +
aes(x = drv, fill = drv) + # add colors to bars
geom_bar() +
theme(legend.position = "none") # remove legend

ggplot(dat) +
aes(x = drv, fill = year) + # fill by years
geom_bar()

In order to compare proportions across groups, it is best to make each bar the same height using position = "fill":

ggplot(dat) +
geom_bar(aes(x = drv, fill = year), position = "fill")

To draw the bars next to each other for each group, use position = "dodge":

ggplot(dat) +
geom_bar(aes(x = drv, fill = year), position = "dodge")
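
To see the numbers behind these barplots, the counts and the within-group proportions can be computed with base R. This is only a sketch for checking the plots by hand; the names counts and props are illustrative.

```r
library(ggplot2)

# Cross-tabulate drive train and year: these counts are the bar heights
# shown with position = "dodge"
counts <- table(drv = mpg$drv, year = mpg$year)
counts

# Proportions within each level of drv: these are the segment heights
# shown with position = "fill" (each row sums to 1)
props <- prop.table(counts, margin = 1)
round(props, 2)
```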

Further personalization

Title and axis labels

p <- ggplot(dat) +
aes(x = displ, y = hwy) +
geom_point()

p + labs(
title= "Fuel efficiency for 38 popular models of car",
subtitle= "Period 1999-2008",
caption = "Data: ggplot2::mpg. See more at statsandr.com",
x = "Engine displacement (litres)",
y = "Highway miles per gallon (mpg)"
)

As you can see in the above code, you can save one or more layers of the plot in an object for later use. This way, you can save your “main” plot, and add more layers of personalization until you get the desired output. Here we saved the main scatter plot in an object called p and we will refer to it for the subsequent personalizations.

You can also edit the alignment, the size and the shape of the title and subtitle via the theme() layer and the element_text() function:

p + labs(
title= "Fuel efficiency for 38 popular models of car",
subtitle= "Period 1999-2008",
caption = "Data: ggplot2::mpg. See more at statsandr.com",
x = "Engine displacement (litres)",
y = "Highway miles per gallon (mpg)"
) +
theme(
plot.title= element_text(
hjust = 0.5, # center
size = 12,
color = "steelblue",
face = "bold"
),
plot.subtitle= element_text(
hjust = 0.5, # center
size = 10,
color = "gray",
face = "italic"
)
)

If the title or subtitle is long and you want to divide it into multiple lines, use \n:

p + labs(
title= "Fuel efficiency for 38 popular \n models of car",
subtitle= "Period 1999-2008",
caption = "Data: ggplot2::mpg. See more at statsandr.com",
x = "Engine displacement (litres)",
y = "Highway miles per gallon (mpg)"
) +
theme(
plot.title= element_text(
hjust = 0.5, # center
size = 12,
color = "steelblue",
face = "bold"
),
plot.subtitle= element_text(
hjust = 0.5, # center
size = 10,
color = "gray",
face = "italic"
)
)

Axis ticks

# Adjust ticks
p + scale_x_continuous(breaks = seq(from = 1, to = 7, by = 0.5)) + # x-axis
scale_y_continuous(breaks = seq(from = 10, to = 45, by = 5)) # y-axis

Log transformations

In some cases, it is useful to plot the log transformation of the variables. This can be done with the scale_x_log10() and scale_y_log10() functions:

p + scale_x_log10() +
scale_y_log10()

Limits

p + scale_x_continuous(limits = c(3, 6)) +
scale_y_continuous(limits = c(20, 30))

It is also possible to simply take a subset of the dataset with the subset() or filter() function. See how to subset a dataset if you need a reminder.
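
A minimal sketch of the subsetting approach is given below, using the same ranges as the limits above. Note that limits set through a scale discard the observations falling outside the range, whereas coord_cartesian() (shown as an alternative) only zooms in on the region and keeps all observations. The name dat_sub is illustrative.

```r
library(ggplot2)

# Keep only the observations inside the ranges used for the limits above
dat_sub <- subset(mpg, displ >= 3 & displ <= 6 & hwy >= 20 & hwy <= 30)

ggplot(dat_sub) +
  aes(x = displ, y = hwy) +
  geom_point()

# Alternative: zoom without dropping any observation
ggplot(mpg) +
  aes(x = displ, y = hwy) +
  geom_point() +
  coord_cartesian(xlim = c(3, 6), ylim = c(20, 30))
```

The difference matters mostly when a layer computes something from the data (for example a smooth line or a boxplot), since the computation then uses either the restricted or the full set of observations.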

Scales for better axis formats

Depending on your data, it is possible to format axes in a certain way with the {scales} package. The formats I use the most are comma and label_number_si(), which format large numbers in a more readable way.

ggplot(dat) +
aes(x = displ * 1000, y = hwy * 1000) +
geom_point() +
scale_y_continuous(labels = scales::label_number_si()) + # format y-axis
scale_x_continuous(labels = scales::comma) # format x-axis
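
In recent versions of {scales} (1.2.0 or later, to the best of my knowledge), label_number_si() is deprecated. A sketch of an equivalent under that assumption uses label_number() with scale_cut = cut_short_scale() for the y-axis and label_comma() for the x-axis; check the documentation of your installed version before relying on it.

```r
library(ggplot2)
library(scales)

p_scales <- ggplot(mpg) +
  aes(x = displ * 1000, y = hwy * 1000) +
  geom_point() +
  # short-scale suffixes (K, M, ...) on the y-axis
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale())) +
  # thousands separator on the x-axis
  scale_x_continuous(labels = label_comma())
p_scales
```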

Legend

By default, the legend is located on the right side of the plot (when there is a legend to be displayed, of course). To control the position of the legend, we use the theme() layer with the legend.position argument:

p + aes(color = class) +
theme(legend.position = "top")

Replace "top" by "left" or "bottom" to change its position and by "none" to remove it.

p + aes(color = class) +
labs(color = "Car's class")

Note that the argument inside labs() must match the one inside the aes() layer (in this case: color).
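
The same rule applies when several aesthetics are mapped: each legend title is set through the labs() argument named after the corresponding aesthetic. The sketch below is an illustration only; the legend titles and the object name p_legend are made up for the example.

```r
library(ggplot2)

# Two aesthetics produce two legends; each title is set through
# the labs() argument named after the corresponding aesthetic
p_legend <- ggplot(mpg) +
  aes(x = displ, y = hwy, color = class, shape = drv) +
  geom_point() +
  labs(color = "Car's class", shape = "Drive train")
p_legend
```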

p + aes(color = class) +
theme(
legend.title= element_blank(),
legend.position = "bottom"
)

Shape, color, size and transparency

• shape,
• size,
• color, and
• alpha (transparency).

We can for instance change the shape of all points in a scatter plot by adding shape to geom_point(), or vary the shape according to the values taken by another variable (in that case, the shape argument must be inside aes()):[2]

# Change shape of all points
ggplot(dat) +
aes(x = displ, y = hwy) +
geom_point(shape = 4)

# Change shape of points based on a categorical variable
ggplot(dat) +
aes(x = displ, y = hwy, shape = drv) +
geom_point()

p <- ggplot(dat) +
aes(x = displ, y = hwy) +
geom_point()

# Change color for all points
p + geom_point(color = "steelblue")

# Change color based on a qualitative variable
p + aes(color = drv)

# Change color based on a quantitative variable
p + aes(color = cty)

# Change color based on a criterion (median of cty variable)
p + aes(color = cty > median(cty))

# Change size of all points
p + geom_point(size = 4)

# Change size of points based on a quantitative variable
p + aes(size = cty)

# Change transparency based on a quantitative variable
p + aes(alpha = cty)

p + geom_point(size = 0.5) +
aes(color = drv, shape = year, alpha = cty)

If you are unhappy with the default colors, you can change them manually with the scale_colour_manual() layer (for qualitative variables) and the scale_colour_gradient2() layer (for quantitative variables):

# Change color based on a qualitative variable
p + aes(color = drv) +
scale_colour_manual(values = c("red", "blue", "green"))

# Change color based on a quantitative variable
p + aes(color = cty) +
scale_colour_gradient2(
low = "green",
mid = "gray",
high = "red",
midpoint = median(dat$cty)
)

Text and labels

To add a label on a point (for example the row number), we can use the geom_text() and aes() functions:

p + geom_text(aes(label = rownames(dat)),
check_overlap = TRUE,
size = 2,
vjust = -1
)

To add text on the plot, we use the annotate() function:

p + annotate("text",
x = 6,
y = 40,
label = "hwy and displ are \n negatively correlated \n (rho = -0.77, p-value < 0.001)",
size = 3
)

Read the article on correlation coefficient and correlation test in R to see how to compute the correlation coefficient (rho) and the p-value of the correlation test.

Smooth and regression lines

In a scatter plot, it is possible to add a smooth line fitted to the data:

p + geom_smooth()

In the context of simple linear regression, it is often the case that the regression line is displayed on the plot. This can be done by adding method = lm (lm stands for linear model) in the geom_smooth() layer:

p + geom_smooth(method = lm)

It is also possible to draw a regression line for each level of a categorical variable:

p + aes(color = drv, shape = drv) +
geom_smooth(method = lm, se = FALSE)

The se = FALSE argument removes the confidence interval around the regression lines.

Facets

facet_grid() allows you to divide the same graphic into several panels according to the values of one or two qualitative variables:

# According to one variable
p + facet_grid(. ~ drv)

# According to 2 variables
p + facet_grid(drv ~ year)

It is then possible to add a regression line to each facet:

p + facet_grid(. ~ drv) +
geom_smooth(method = lm)

facet_wrap() can also be used, as illustrated in the boxplot section.

Themes

Several functions are available in the {ggplot2} package to change the theme of the plot. The most common themes after the default theme (i.e., theme_gray()) are the black and white (theme_bw()), minimal (theme_minimal()) and classic (theme_classic()) themes:

# Black and white theme
p + theme_bw()

# Minimal theme
p + theme_minimal()

# Classic theme
p + theme_classic()

I tend to use the minimal theme for most of my R Markdown reports because it brings out the patterns and points rather than the layout of the plot, but again this is a matter of personal taste. See more themes at ggplot2.tidyverse.org/reference/ggtheme.html and in the {ggthemes} package.
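
A complete theme can be combined with individual adjustments made through theme(). The sketch below is an illustration under the assumption that a larger base font and a legend at the bottom are wanted; the values and the object name p_theme are arbitrary. The order matters: a complete theme such as theme_minimal() resets all theme elements, so the theme() adjustments should come after it.

```r
library(ggplot2)

p_theme <- ggplot(mpg) +
  aes(x = displ, y = hwy, color = drv) +
  geom_point() +
  theme_minimal(base_size = 14) +     # complete theme first
  theme(legend.position = "bottom")  # individual tweaks afterwards
p_theme
```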

In order to avoid having to change the theme for each plot you create, you can change the theme for the current R session using the theme_set() function as follows:

theme_set(theme_minimal())

Interactive plot with {plotly}

You can easily make your plots created with {ggplot2} interactive with the {plotly} package:

library(plotly)
ggplotly(p + aes(color = year))

You can now hover over a point to display more information about that point. There is also the possibility to zoom in and out, to download the plot, to select some observations, etc. More information about {plotly} for R can be found here.

Combine plots with {patchwork}

There are several ways to combine plots made in {ggplot2}. In my opinion, the most convenient way is with the {patchwork} package using symbols such as +, / and parentheses.

We first need to create some plots and save them:

p_a <- ggplot(dat) +
aes(x = displ, y = hwy) +
geom_point()

p_b <- ggplot(dat) +
aes(x = hwy) +
geom_histogram()

p_c <- ggplot(dat) +
aes(x = drv, y = hwy) +
geom_boxplot()

Now that we have 3 plots saved in our environment, we can combine them. To have plots next to each other simply use the + symbol:

library(patchwork)
p_a + p_b + p_c

To display them above each other simply use the / symbol:

p_a / p_b / p_c

And finally, to combine them above and next to each other, mix +, / and parentheses:

p_a + p_b / p_c

(p_a + p_b) / p_c

Check more ways to combine plots:

• grid.arrange() from the {gridExtra} package
• plot_grid() from the {cowplot} package

Flip coordinates

Flipping the coordinates of a plot is useful to create horizontal boxplots, or when the labels of a variable are so long that they overlap each other on the x-axis. See with and without flipping coordinates below:

# without flipping coordinates
p1 <- ggplot(dat) +
aes(x = class, y = hwy) +
geom_boxplot()

# with flipping coordinates
p2 <- ggplot(dat) +
aes(x = class, y = hwy) +
geom_boxplot() +
coord_flip()

library(patchwork)
p1 + p2 # left: without flipping, right: with flipping

This can be done with many types of plot, not only with boxplots. For instance, if a categorical variable has many levels or the labels are long, it is usually best to flip the coordinates for a better visual:

ggplot(dat) +
aes(x = class) +
geom_bar() +
coord_flip()

Save plot

The ggsave() function will save the most recent plot in your current working directory unless you specify a path to another folder:

ggplot(dat) +
aes(x = displ, y = hwy) +
geom_point()

ggsave("plot1.pdf")

You can also specify the width, height and resolution (dpi) as follows:

ggsave("plot1.pdf",
width = 12,
height = 12,
units = "cm",
dpi = 300
)

Managing dates

If the time variable in your dataset is in date format, the {ggplot2} package recognizes the date format and automatically uses a specific type for the axis ticks.

There is no time variable with a date format in our dataset, so let’s create a new variable of this type thanks to the as.Date() function:

dat$date <- as.Date("2020-08-21") - 0:(nrow(dat) - 1)

head(dat$date)
## [1] "2020-08-21" "2020-08-20" "2020-08-19" "2020-08-18" "2020-08-17"
## [6] "2020-08-16"

str(dat$date)
##  Date[1:234], format: "2020-08-21" "2020-08-20" "2020-08-19" "2020-08-18" "2020-08-17" ...

p <- ggplot(dat) +
aes(x = date, y = hwy) +
geom_line()
p

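As soon as the time variable is recognized as a date, we can use the scale_x_date() layer to change the format displayed on the X-axis. The most frequent date formats are:

• %d: day as a number (01-31)
• %a: abbreviated weekday (e.g., Mon)
• %A: unabbreviated weekday (e.g., Monday)
• %m: month as a number (01-12)
• %b: abbreviated month (e.g., Jan)
• %B: unabbreviated month (e.g., January)
• %y: 2-digit year (e.g., 20)
• %Y: 4-digit year (e.g., 2020)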

Run ?strptime() to see many more date formats available in R.

p + scale_x_date(date_labels = "%B %Y")

It is also possible to control the breaks to display on the X-axis with the date_breaks argument. For this example, let’s say we want to display the day as number and the abbreviated month for each interval of 10 days:

p + scale_x_date(date_breaks = "10 days", date_labels = "%d %b")
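
Before putting a format string in date_labels, it can be previewed on a single date with format(), which, to my understanding, interprets the same conversion specifications. The sketch below is only a quick check; the object name d is illustrative, and month names depend on the locale of your R session.

```r
# Preview how a format string renders before using it in date_labels
d <- as.Date("2020-08-21")

format(d, "%d/%m/%Y") # numeric day, month and 4-digit year
format(d, "%d %b")    # day and abbreviated month (month name depends on the locale)
format(d, "%B %Y")    # full month name and 4-digit year (locale-dependent)
```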

If labels displayed on the X-axis are unreadable because they overlap each other, you can rotate them with the theme() layer and the angle argument:

p + scale_x_date(date_breaks = "10 days", date_labels = "%d %b") +
theme(axis.text.x = element_text(angle = 60, hjust = 1))

Tip

I recently learned a very useful tip for drawing plots with {ggplot2}. If, like me, you often comment and uncomment some lines of code in your plot, you know that you cannot transform the last line into a comment without removing the + sign in the line just above.

Adding a line NULL at the end of your plots will avoid an error if you forget to remove the + sign in the last line of your code. See with this basic example:

ggplot(dat) +
aes(x = date, y = hwy) +
geom_line() + # I do not have to remove the + sign
# theme_minimal() + # this line is a comment
NULL # adding this line doesn't change anything to the plot

This trick saves me a lot of time as I do not need to worry about making sure to remove the last + sign after commenting some lines of code in my plots.

To go further

By now you have seen that {ggplot2} is a very powerful and complete package to create plots in R. This article illustrated only the tip of the iceberg, and you will find many tutorials on how to create more advanced plots and visualizations with {ggplot2} online. If you want to learn more than what is described in the present article, I highly recommend starting with:

Thanks for reading. I hope this article helped you to create your first plots with the {ggplot2} package. As a reminder, for simple graphs, it is sometimes easier to draw them via the {esquisse} addin. After some time, you will quickly learn how to create them by yourself, and in no time you will be able to build complex and advanced data visualizations.

1. Use the geom_jitter() layer with caution because, although it makes a plot more revealing at large scales, it also makes it slightly less accurate at small scales since some randomness is added to the points.

2. There are (at the time of writing) 26 shapes accepted in the shape argument. See this documentation for all available shapes.