Getting started in R Markdown

Antoine Soetewey 2020-02-18 15 minute read

If you have spent some time writing code in R, you have probably heard of generating dynamic reports that incorporate R code, R outputs (results) and text or comments. In this article, I explain how R Markdown works and give you the basic elements you need to get started easily in the production of these dynamic reports.

R Markdown: what, why and how?

R Markdown allows you to generate a report (most of the time in PDF, HTML, Word or as a beamer presentation) automatically from a file written within RStudio. The generated documents can serve as a neat record of your analysis that can be shared and published in a detailed and complete report. Even if you never expect to present the results to someone else, it can also be used as a personal notebook to look back on, so you can see what you did at that time. An R Markdown file has the extension .Rmd, while an R script file has the extension .R.

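The first main advantage of using R Markdown is that, in an R Markdown document, you can combine three important parts of any statistical analysis:

  • R code to show how the analyses have been done. For instance, the data and the functions you used. This allows readers to follow your code and to check that the analyses were performed correctly.
  • Results of the code, that is, the output of your analyses. For example, the output of your linear model, plots, or results of the hypothesis test you just coded. This allows readers to see the results of your analyses.
  • Text, comments and interpretations of the results. For instance, after computing the main descriptive statistics and plotting some graphs, you can interpret them in the context of your problem and highlight important findings. This enables readers to understand your results thanks to your interpretations and your comments, as if you had written a document explaining your work.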

Another advantage of R Markdown is that the reports are dynamic and reproducible by anyone who has access to the .Rmd file (and to the data, if external data are used), making it perfectly suited to collaboration and dissemination of results. By dynamic, we mean that if your data changes, the results reported in the document update accordingly when you knit it again, without any manual work on your side.

The production of the reports is done in two stages:

  1. The .Rmd file, which contains blocks of R code (called chunks) and text, is provided to the {knitr} package, which executes the R code to get the output and creates a document in markdown (.md) format. This document then contains the R code, the results (or outputs), and the text.
  2. This .md file is then converted to the desired format (HTML, PDF or Word) by the {rmarkdown} package, which relies on pandoc (i.e., a document conversion tool).
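The two stages can be run by hand to see what each one produces. The following is only a sketch: the file name two_stage_demo.Rmd and its content are made up for the illustration, and in everyday use the Knit button (or rmarkdown::render()) chains both stages for you.

```r
library(knitr)
library(rmarkdown)

# Hypothetical input: a tiny .Rmd file written to a temporary folder
fence <- strrep("`", 3)  # three backticks, used to delimit the code chunk
rmd_file <- file.path(tempdir(), "two_stage_demo.Rmd")
writeLines(c(
  "Some text before the chunk.",
  "",
  paste0(fence, "{r}"),
  "mean(c(1, 7, 11))",
  fence
), rmd_file)

# Stage 1: {knitr} runs the chunk and writes a markdown (.md) file
md_file <- knit(rmd_file, output = sub("\\.Rmd$", ".md", rmd_file), quiet = TRUE)

# Stage 2: pandoc converts the .md file to the desired format (HTML here);
# skipped when pandoc cannot be found on the machine
html_file <- sub("\\.md$", ".html", md_file)
if (pandoc_available()) {
  pandoc_convert(md_file, to = "html", output = html_file)
}
```

Opening the intermediate .md file shows the text, the R code and its output side by side, which is exactly what the second stage then turns into the final report.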

在你开始之前

To create a new R Markdown document (.Rmd), you first need to install and load the following packages:

# Install the packages (only needed once)
install.packages(c("knitr", "rmarkdown", "markdown"))

# Load the packages
library(knitr)
library(rmarkdown)
library(markdown)

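Then click on File -> New File -> R Markdown, or click on the small white sheet with a green cross in the top left corner and select R Markdown:

Create a new R Markdown document

A window will open; choose the title and the author, then click on OK. The default output format is HTML. It can be changed later to PDF or Word.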

After you have clicked on OK, a new .Rmd file which serves as an example is created. We are going to use this file as the starting point for our more complex and more personalized file.

To compile your R Markdown document into an HTML document, click on the Knit button located at the top:

Knit an R Markdown document

A preview of the HTML report appears, and the report is also saved in your working directory (see a reminder of what a working directory is).

Components of a .Rmd file

The main components of an R Markdown document (the YAML header, code chunks and text) are detailed in the following sections.

YAML header

A .Rmd file starts with the YAML header, enclosed by two series of ---. By default, this includes the title, author, date and the format of the report. If you want to generate the report as a PDF document, replace output: html_document with output: pdf_document. The information from the YAML header will appear at the top of the generated report after you compile it (i.e., after knitting the document).
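As an illustration, a minimal YAML header could look like the sketch below. The title, author and date are placeholder values chosen for the example, not taken from a real file.

```yaml
---
title: "My first report"    # placeholder title
author: "Your name"         # placeholder author
date: "2020-02-18"          # placeholder date
output: html_document       # switch to pdf_document for a PDF report
---
```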

To add a table of contents to your documents, replace output: html_document by

output:
  html_document:
    toc: true

Here are my usual settings regarding the format of an HTML document (remove everything after number_sections: true if you render the document in PDF, as PDF documents do not accept these options in the YAML header):

output:
  html_document:
    toc: true
    toc_depth: 6
    number_sections: true
    toc_float: true
    code_folding: hide
    theme: flatly
    code_download: true

In addition to adding a table of contents, these settings set its depth, number the sections, make the table of contents float while you scroll down the document, hide the code by default, apply the flatly theme, and add the possibility to download the .Rmd document.
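For a PDF report, the same header trimmed down as described above might look like this sketch, keeping only the options up to number_sections: true:

```yaml
output:
  pdf_document:
    toc: true               # add a table of contents
    toc_depth: 6            # depth of the headings listed in it
    number_sections: true   # number the sections
```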

You can visualize your table of contents even before knitting the document, or go directly to a specific section, by clicking on the small icon in the top right corner. Your table of contents will appear; click on a section to go to that section in your .Rmd document:

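Visualize your table of contents

In addition to this enhanced table of contents, I usually set a dynamic date in the YAML header:

Dynamic date in R Markdown

This piece of code writes the current date without you having to change it yourself. This is very convenient for projects that last several weeks or months, as the top of the document always shows an up-to-date date.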
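The screenshot of this setting is not reproduced here. One common way to obtain such a dynamic date, given as an assumption rather than as the exact line from the screenshot, is to put inline R code in the date field:

```yaml
# The inline R code is evaluated each time the document is knitted
date: "`r format(Sys.time(), '%d %B, %Y')`"
```

The format string '%d %B, %Y' gives the day, the full month name and the year; any other format accepted by format() can be used instead.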

Code chunks

Below the YAML header, there is a first code chunk which is used for the setup options of your entire document. It is best to leave it as it is for the moment; we can change it later if needed.

Code chunks in R Markdown documents are used to write R code. Every time you want to include R code, you need to enclose it between two lines of three backticks (backward apostrophes). For instance, to compute the mean of the values 1, 7 and 11, we first need to insert an R code chunk by clicking on the Insert button located at the top and selecting R (see the picture below), then we need to write the corresponding code inside the code chunk we just inserted:

Insert an R code chunk in R Markdown

Example of a code chunk
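In the .Rmd file itself, a chunk computing this mean could be written as in the sketch below (the comment line is optional):

````markdown
```{r}
# Mean of the values 1, 7 and 11
mean(c(1, 7, 11))
```
````

After knitting, the output [1] 6.333333 is displayed just below the code.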

In the example file, you can see that the first R code chunk (except the setup code chunk) applies the function summary() to the preloaded dataset cars: summary(cars). If you look at the HTML document that is generated from this example file, you will see that the summary measures are displayed just after the code chunk.

The next code chunk in this example file is plot(pressure), which will produce a plot. Try writing other R code and knit the document (i.e., compile it by clicking on the Knit button) to see whether your code runs correctly.

If you have already written code in an R script and want to reuse it in your R Markdown document, you can simply copy and paste it inside code chunks. Do not forget to always put your code inside code chunks, or R will throw an error when compiling your document.

As you can see, there are two additional arguments in the code chunk of the plot compared to my code chunk of the mean presented above. The first argument following the letter r (without a comma between the two) is used to set the name of the chunk. In general, do not bother with this; it is mainly used to refer to a specific code chunk. You can remove the name of the chunk, but do not remove the letter r between the {} as it indicates that the code that follows is R code (yes, you read that right, it also means you can include code from another programming language, e.g., Python, SQL, etc.).

After the name of the chunk (after pressure in the example file), you can see that there is an additional argument: echo = FALSE. This argument, called an option, indicates that you want to hide the code, and display only the output of the code. Try removing it (or change it to echo = TRUE), and you will see that after knitting the document, both the code AND the output will appear, while only the results appeared previously.
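For reference, the plot chunk described here should look similar to the following in the example file, with the chunk name pressure right after the letter r and the option echo = FALSE after a comma:

````markdown
```{r pressure, echo=FALSE}
plot(pressure)
```
````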

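You can specify whether to hide or display the code alongside its output for each code chunk separately, for instance if you want to show the code for some code chunks but not for others. Alternatively, if you want to always hide or display the code together with the output for the entire document, you can specify it in the setup code chunk located just after the YAML header. The options passed to this setup code chunk apply to all code chunks, except those for which an option has been specifically modified.

By default, the only setup option when you open a new R Markdown file is knitr::opts_chunk$set(echo = TRUE), meaning that all outputs will be accompanied by their corresponding code. If you want to display only the results without the code for the whole document, replace it with knitr::opts_chunk$set(echo = FALSE). Two other options often passed to this setup code chunk are warning = FALSE and message = FALSE, which prevent warnings and messages from being displayed in the report. If you want to pass several options, do not forget to separate them with a comma:

Several options for a code chunk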
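A setup chunk passing these three options, separated by commas, could be sketched as follows:

````markdown
```{r setup, include=FALSE}
# Show the code, but hide warnings and messages, for every chunk in the document
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
````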

You can also choose to display the code, but not the result. For this, pass the option results = "hide". Alternatively, with the option include = FALSE, you can prevent code and results from appearing in the finished file while R still runs the code in order to use it at a later stage. If you want to display the code without R running it (so that no results are produced), use eval = FALSE; combine it with echo = FALSE if the code should not appear either. To edit the width and height of figures, use the options fig.width and fig.height. Another very interesting option is tidy = 'styler', which automatically reformats the R code shown in the output.

See all options and their descriptions in the {knitr} chunk options documentation, or see the list of the default options by running str(knitr::opts_chunk$get()).
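To make these options more concrete, here is a sketch of four chunks, each using one of them. The code inside the chunks is only illustrative and reuses the datasets from the example file.

````markdown
```{r, results = "hide"}
# The code is shown, the results are hidden
summary(cars)
```

```{r, include = FALSE}
# The code runs, but neither the code nor its results appear in the report
x <- mean(c(1, 7, 11))
```

```{r, eval = FALSE}
# The code is shown but not run
plot(pressure)
```

```{r, echo = FALSE, fig.width = 6, fig.height = 4}
# Only the plot appears, 6 inches wide and 4 inches high
plot(pressure)
```
````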

Tip: When writing in R Markdown, you will very often need to insert new R code chunks. To insert a new R code chunk more rapidly, press CTRL + ALT + I on Windows or command + option + I on Mac. If you are interested in such shortcuts to make you more efficient, see other tips and tricks in R Markdown.

Note that a code chunk can be run without compiling the entire document, for instance if you want to check the results of a specific code chunk. To run a specific code chunk, select the code and run it as you would in an R script (.R), by clicking on Run or by pressing CTRL + Enter on Windows or command + Enter on Mac. The results of this code chunk will be displayed directly in the R Markdown document, just below the code chunk.

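Text

Text can be added anywhere outside code chunks. R Markdown documents use the Markdown syntax for the formatting of the text. In our example file, some text has been inserted just below the setup code chunk. To insert text, you simply write it without enclosing it in anything. Try adding some sentences and knit the document to see how they appear in the HTML document.

Example of text below an example of an R code chunk

Markdown syntax can be used to change the formatting of the text appearing in the output file, for example to format some text in italics, in bold, etc. Below are some common formatting commands:

  • Title: # Title
  • Subtitle: ## Subtitle
  • Subsubtitle: ### Subsubtitle. These headings will automatically be included in the table of contents if you included one.
  • Italics: *italics* or _italics_
  • Bold: **bold** or __bold__
  • Link: [link](//eltasik.com/) (do not forget // or http:// if it is an external URL)
  • Equations: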

Enclose your equation (written in LaTeX) with one $ to have it in the text:

$A = \pi*r^{2}$

This is a well-known equation \(A = \pi*r^{2}\).

Enclose your LaTeX equation with two $$ to have it centered on a new line:

$$E = mc^{2}$$

This is another well-known equation:

\[E = mc^{2}\]

Lists:

  • Unordered list, item 1: * Unordered list, item 1
  • Unordered list, item 2: * Unordered list, item 2
  1. Ordered list, item 1: 1. Ordered list, item 1
  2. Ordered list, item 2: 2. Ordered list, item 2

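Code inside text

Before going further, I would like to introduce an important feature of R Markdown. It is often the case that, when writing interpretations or detailing an analysis, we would like to refer to a result directly in our text. For instance, suppose we work on the iris dataset (preloaded in R). We may want to explain in words that the mean of the length of the petal is a certain value, while the median is another value.

Without R Markdown, the user would need to compute the mean and the median, and then report them manually. Thanks to R Markdown, it is possible to report these two descriptive statistics directly in the text, without typing the values by hand. Even better, if the dataset happens to change because we removed some observations, the mean and median reported in the generated document will change automatically, without any change in the text on our side.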

We can insert results directly in the interpretations (i.e., in the text) by placing a backward apostrophe, the letter r, a space, the code, and then close it with another backward apostrophe:

Inline code in R Markdown

Here is an illustration with the mean and standard deviation of the length of the sepal for the iris dataset integrated in a sentence:

Example of inline code and text
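Since the screenshots are not reproduced here, the following is a sketch of how such a sentence could be written in the .Rmd file, assuming the mean and standard deviation are computed on the Sepal.Length column of iris:

````markdown
The mean of the length of the sepal is `r mean(iris$Sepal.Length)` and the standard deviation is `r sd(iris$Sepal.Length)`.
````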

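This combination of text and code will give the following output in the generated report:

The mean of the length of the sepal is 5.8433333 and the standard deviation is 0.8280661.

This technique, referred to as inline code, allows you to insert results directly into the text of an R Markdown document. As mentioned, if the dataset changes, the results incorporated inside the text (the mean and standard deviation in our case) will automatically be adjusted to the new dataset, exactly like the output of a code chunk is updated if the dataset changes.

This technique of inline code, together with the fact that it is possible to combine code, outputs of code, and text commenting on the outputs, makes R Markdown my favorite tool when it comes to statistical analyses. Since I discovered the power of R Markdown (and I am still learning, as it has a huge amount of possibilities and features), I almost never write R code in scripts anymore. Every piece of R code I write is supplemented by text and inline code in an R Markdown document, resulting in a professional and complete final document ready to be shared, published, or stored for future use. If you are unfamiliar with this type of document, I invite you to learn more about it and to try it with your next analysis; you will most likely not go back to R scripts.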

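Highlight text like it is code

As an alternative to the inline code technique, you may want to make some text appear as if it were a piece of code in the generated report, without actually running it.

For this, surround your text with backticks (the same backward apostrophe used for inline code) without the letter r. Writing the following:

For example, in this sentence I would like to highlight the variable name `Species` from the dataframe `iris` as if it is a piece of code.

will produce this in the generated report:

For example, in this sentence I would like to highlight the variable name Species from the dataframe iris as if it is a piece of code.

The words "Species" and "iris" appear highlighted, as if they were pieces of code.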

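Images

In addition to code, results and text, you can also insert images in your final document. To insert an image, place it in your current working directory and, outside a code chunk, write: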

![](path_to_your_image.jpg)

Note that the file/url path is NOT quoted. To add an alt text to your image, add it between the square brackets []:

![alt text here](path_to_your_image.jpg)

Tables

There are two options to insert tables in R Markdown documents:

  1. the kable() function from the {knitr} package
  2. the pander() function from the {pander} package

Here is an example of a table without any formatting, followed by the same code with each of the two functions applied to the iris dataset:

# without formatting
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
# with kable()
library(knitr)
kable(summary(iris))

# with pander()
library(pander)
pander(summary(iris))

The advantage of pander() over kable() is that it can be used for many more types of output than tables. Try it on your own code, with the results of a linear regression or a simple vector for example.
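As a starting point for such a test, here is a minimal sketch. The regression of Sepal.Length on Petal.Length is an arbitrary choice made for the illustration, and the code assumes that the {knitr} and {pander} packages are installed.

```r
library(knitr)
library(pander)

# A simple linear regression on the iris dataset (arbitrary example model)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)

# pander() can format the model object directly
pander(fit)

# kable() expects a table-like object, so pass it the coefficient matrix
kable(coef(summary(fit)), digits = 3)

# pander() also handles a simple vector
pander(c(1, 7, 11))
```

Placed in a code chunk, each call prints markdown that is turned into a formatted table or sentence when the document is knitted.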

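Additional notes and useful resources

For more advanced users, R Markdown files can also be used to create Shiny apps, websites (this website is built thanks to R Markdown and the {blogdown} package), to write scientific papers based on templates from several international journals (with the {rticles} package), or even to write books (with the {bookdown} package).

To continue learning about R Markdown, see the two complete cheat sheets from the RStudio team and the more complete guide written by Yihui Xie, J. J. Allaire and Garrett Grolemund.

Thanks for reading. I hope this article convinced you to use R Markdown for your future projects. See more tips and tricks in R Markdown to further improve your efficiency in R Markdown.

As always, if you have a question or a suggestion related to the topic covered in this article, please add it as a comment so other readers can benefit from the discussion.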


