# Data

# display first 5 observations
head(mtcars, 5)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2

# remove vs and am variables
library(tidyverse)
dat <- mtcars %>%
  select(-vs, -am)

# display first 5 observations of the new dataset
head(dat, 5)
##                    mpg cyl disp  hp drat    wt  qsec gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02    3    2

# Correlation coefficient

## Between two variables

# Pearson correlation between 2 variables
cor(dat$hp, dat$mpg)
## [1] -0.7761684

# Spearman correlation between 2 variables
cor(dat$hp, dat$mpg,
  method = "spearman"
)
## [1] -0.8946646

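• Pearson correlation is often used for quantitative continuous variables that have a linear relationship
• Spearman correlation (which is actually similar to Pearson but based on the ranked values of each variable rather than on the raw data) is often used to evaluate relationships involving qualitative ordinal variables, or quantitative variables if the link is only partially linear
• Kendall correlation, which is computed from the number of concordant and discordant pairs, is often used for qualitative ordinal variables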
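
To make the link between the Pearson and Spearman coefficients concrete, the short sketch below checks that the Spearman correlation equals the Pearson correlation computed on ranks, and shows how to request the Kendall coefficient with the same cor() function. It is only an illustration: the object names are ours, and the built-in mtcars data is used directly so that the snippet runs on its own.

```r
# hp and mpg taken directly from the built-in mtcars dataset
x <- mtcars$hp
y <- mtcars$mpg

# Spearman correlation as returned by cor()
spearman <- cor(x, y, method = "spearman")

# Pearson correlation computed on the ranks (ties receive average ranks)
pearson_on_ranks <- cor(rank(x), rank(y), method = "pearson")

# both approaches should agree up to floating-point error
all.equal(spearman, pearson_on_ranks)

# Kendall correlation, based on concordant and discordant pairs
kendall <- cor(x, y, method = "kendall")
kendall
```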

## Correlation matrix: correlations for all variables

Suppose now that we want to compute correlations for several pairs of variables. We can easily do so for all possible pairs of variables in the dataset, again with the cor() function:

# correlation for all variables
round(cor(dat),
  digits = 2 # rounded to 2 decimals
)
##        mpg   cyl  disp    hp  drat    wt  qsec  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00 -0.21 -0.66
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66  0.27  1.00
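
Once the matrix is stored in an object, individual coefficients can be looked up by variable name, which is handy when the dataset has many columns. The sketch below is only an illustration: the names cor_mat and mpg_cors are introduced here, and dat is rebuilt with base R so that the snippet runs without {tidyverse}.

```r
# rebuild dat without vs and am (base R equivalent of the select() call above)
dat <- mtcars[, !(names(mtcars) %in% c("vs", "am"))]

# store the correlation matrix
cor_mat <- cor(dat)

# correlation for one specific pair of variables
round(cor_mat["mpg", "qsec"], 2)

# correlations of all other variables with mpg,
# sorted from the most negative to the most positive
mpg_cors <- sort(cor_mat["mpg", colnames(cor_mat) != "mpg"])
round(mpg_cors, 2)
```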

## Interpretation of a correlation coefficient

As an illustration, the Pearson correlation between horsepower (hp) and miles per gallon (mpg) found above is -0.78, meaning that the 2 variables vary in opposite directions. This makes sense: cars with more horsepower tend to consume more fuel (and thus have a lower mileage per gallon). Conversely, from the correlation matrix we see that the correlation between miles per gallon (mpg) and the time to drive 1/4 of a mile (qsec) is 0.42, meaning that fast cars (low qsec) tend to have a worse mileage per gallon (low mpg). This again makes sense, as fast cars tend to consume more fuel.

# Visualizations

## A scatterplot for 2 variables

A good way to visualize a correlation between 2 variables is to draw a scatterplot of the two variables of interest. Suppose we want to examine the relationship between horsepower (hp) and miles per gallon (mpg):

# scatterplot
library(ggplot2)

ggplot(dat) +
  aes(x = hp, y = mpg) +
  geom_point(colour = "#0c4c8a") +
  theme_minimal()

# same scatterplot with base R graphics
plot(dat$hp, dat$mpg)

## Scatterplots for several pairs of variables

Suppose that instead of visualizing the relationship between only 2 variables, we want to visualize the relationship for several pairs of variables. This is possible thanks to the pairs() function. For this illustration, we focus only on miles per gallon (mpg), horsepower (hp) and weight (wt):

# multiple scatterplots
pairs(dat[, c(1, 4, 6)])

## Another simple correlation matrix

# improved correlation matrix
library(corrplot)

corrplot(cor(dat),
  method = "number",
  type = "upper" # show only upper side
)

# Correlation test

## For 2 variables

• \(H_0\): \(\rho = 0\)
• \(H_1\): \(\rho \ne 0\)

Suppose that we want to test whether the rear axle ratio (drat) is correlated with the time to drive a quarter of a mile (qsec):

# Pearson correlation test
test <- cor.test(dat$drat, dat$qsec)
test
##
##  Pearson's product-moment correlation
##
## data:  dat$drat and dat$qsec
## t = 0.50164, df = 30, p-value = 0.6196
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.265947  0.426340
## sample estimates:
##        cor
## 0.09120476

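The p-value of the correlation test between these 2 variables is 0.62. At the 5% significance level, we do not reject the null hypothesis of no correlation. We therefore conclude that we do not reject the hypothesis that there is no linear relationship between the 2 variables.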
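
For readers curious about where these numbers come from, the statistic of the Pearson correlation test is \(t = r \sqrt{n - 2} / \sqrt{1 - r^2}\), which follows a Student distribution with \(n - 2\) degrees of freedom under the null hypothesis. The sketch below recomputes the statistic and the two-sided p-value by hand and compares them with the output of cor.test(). The object names are ours and mtcars is used directly so that the snippet is self-contained.

```r
x <- mtcars$drat
y <- mtcars$qsec

r <- cor(x, y)  # sample Pearson correlation
n <- length(x)  # number of observations

# test statistic and two-sided p-value computed by hand
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# same quantities extracted from the cor.test() result
test <- cor.test(x, y)
c(manual = t_stat, cor.test = unname(test$statistic))
c(manual = p_value, cor.test = test$p.value)
```

The elements of the test object (for example test$estimate, test$conf.int and test$p.value) can be reused in the same way whenever a single number from the output is needed.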

A nice and easy way to report results of a correlation test in R is with the report() function from the {report} package:

# install.packages("remotes")
# remotes::install_github("easystats/report") # You only need to do that once
library("report") # Load the package every time you start R
report(test)
## Effect sizes were labelled following Funder's (2019) recommendations.
##
## The Pearson's product-moment correlation between dat$drat and dat$qsec is positive, not significant and very small (r = 0.09, 95% CI [-0.27, 0.43], t(30) = 0.50, p = 0.620)

## 对于几对变量

Similar to the correlation matrix used to compute correlations for several pairs of variables, the rcorr() function (from the {Hmisc} package) computes the p-values of the correlation test for several pairs of variables at once. Applied to our dataset, we have:

# correlation tests for whole dataset
library(Hmisc)
res <- rcorr(as.matrix(dat)) # rcorr() accepts matrices only

# display p-values (rounded to 3 decimals)
round(res$P, 3)
##        mpg   cyl  disp    hp  drat    wt  qsec  gear  carb
## mpg     NA 0.000 0.000 0.000 0.000 0.000 0.017 0.005 0.001
## cyl  0.000    NA 0.000 0.000 0.000 0.000 0.000 0.004 0.002
## disp 0.000 0.000    NA 0.000 0.000 0.000 0.013 0.001 0.025
## hp   0.000 0.000 0.000    NA 0.010 0.000 0.000 0.493 0.000
## drat 0.000 0.000 0.000 0.010    NA 0.000 0.620 0.000 0.621
## wt   0.000 0.000 0.000 0.000 0.000    NA 0.339 0.000 0.015
## qsec 0.017 0.000 0.013 0.000 0.620 0.339    NA 0.243 0.000
## gear 0.005 0.004 0.001 0.493 0.000 0.000 0.243    NA 0.129
## carb 0.001 0.002 0.025 0.000 0.621 0.015 0.000 0.129    NA

Only correlations with p-values smaller than the significance level (usually \(\alpha = 0.05\)) should be interpreted.

# Combination of correlation coefficients and correlation tests

Now that we have covered the concepts of correlation coefficients and correlation tests, let's see if we can combine these two concepts.

The correlation() function from the easystats {correlation} package allows us to combine correlation coefficients and correlation tests in a single table (thanks to Krzysiektr for pointing it out to me):

library(correlation)

correlation::correlation(dat,
  include_factors = TRUE, method = "auto"
)
## Parameter1 | Parameter2 |     r |         95% CI | t(30) |      p |                               Method | n_Obs
## ----------------------------------------------------------------------------------------------------------------
## mpg        |        cyl | -0.85 | [-0.93, -0.72] | -8.92 | < .001 | Pearson's product-moment correlation |    32
## mpg        |       disp | -0.85 | [-0.92, -0.71] | -8.75 | < .001 | Pearson's product-moment correlation |    32
## mpg        |         hp | -0.78 | [-0.89, -0.59] | -6.74 | < .001 | Pearson's product-moment correlation |    32
## mpg        |       drat |  0.68 | [ 0.44,  0.83] |  5.10 | < .001 | Pearson's product-moment correlation |    32
## mpg        |         wt | -0.87 | [-0.93, -0.74] | -9.56 | < .001 | Pearson's product-moment correlation |    32
## mpg        |       qsec |  0.42 | [ 0.08,  0.67] |  2.53 |  0.137 | Pearson's product-moment correlation |    32
## mpg        |       gear |  0.48 | [ 0.16,  0.71] |  3.00 |  0.065 | Pearson's product-moment correlation |    32
## mpg        |       carb | -0.55 | [-0.75, -0.25] | -3.62 |  0.016 | Pearson's product-moment correlation |    32
## cyl        |       disp |  0.90 | [ 0.81,  0.95] | 11.45 | < .001 | Pearson's product-moment correlation |    32
## cyl        |         hp |  0.83 | [ 0.68,  0.92] |  8.23 | < .001 | Pearson's product-moment correlation |    32
## cyl        |       drat | -0.70 | [-0.84, -0.46] | -5.37 | < .001 | Pearson's product-moment correlation |    32
## cyl        |         wt |  0.78 | [ 0.60,  0.89] |  6.88 | < .001 | Pearson's product-moment correlation |    32
## cyl        |       qsec | -0.59 | [-0.78, -0.31] | -4.02 |  0.007 | Pearson's product-moment correlation |    32
## cyl        |       gear | -0.49 | [-0.72, -0.17] | -3.10 |  0.054 | Pearson's product-moment correlation |    32
## cyl        |       carb |  0.53 | [ 0.22,  0.74] |  3.40 |  0.027 | Pearson's product-moment correlation |    32
## disp       |         hp |  0.79 | [ 0.61,  0.89] |  7.08 | < .001 | Pearson's product-moment correlation |    32
## disp       |       drat | -0.71 | [-0.85, -0.48] | -5.53 | < .001 | Pearson's product-moment correlation |    32
## disp       |         wt |  0.89 | [ 0.78,  0.94] | 10.58 | < .001 | Pearson's product-moment correlation |    32
## disp       |       qsec | -0.43 | [-0.68, -0.10] | -2.64 |  0.131 | Pearson's product-moment correlation |    32
## disp       |       gear | -0.56 | [-0.76, -0.26] | -3.66 |  0.015 | Pearson's product-moment correlation |    32
## disp       |       carb |  0.39 | [ 0.05,  0.65] |  2.35 |  0.177 | Pearson's product-moment correlation |    32
## hp         |       drat | -0.45 | [-0.69, -0.12] | -2.75 |  0.110 | Pearson's product-moment correlation |    32
## hp         |         wt |  0.66 | [ 0.40,  0.82] |  4.80 | < .001 | Pearson's product-moment correlation |    32
## hp         |       qsec | -0.71 | [-0.85, -0.48] | -5.49 | < .001 | Pearson's product-moment correlation |    32
## hp         |       gear | -0.13 | [-0.45,  0.23] | -0.69 | > .999 | Pearson's product-moment correlation |    32
## hp         |       carb |  0.75 | [ 0.54,  0.87] |  6.21 | < .001 | Pearson's product-moment correlation |    32
## drat       |         wt | -0.71 | [-0.85, -0.48] | -5.56 | < .001 | Pearson's product-moment correlation |    32
## drat       |       qsec |  0.09 | [-0.27,  0.43] |  0.50 | > .999 | Pearson's product-moment correlation |    32
## drat       |       gear |  0.70 | [ 0.46,  0.84] |  5.36 | < .001 | Pearson's product-moment correlation |    32
## drat       |       carb | -0.09 | [-0.43,  0.27] | -0.50 | > .999 | Pearson's product-moment correlation |    32
## wt         |       qsec | -0.17 | [-0.49,  0.19] | -0.97 | > .999 | Pearson's product-moment correlation |    32
## wt         |       gear | -0.58 | [-0.77, -0.29] | -3.93 |  0.008 | Pearson's product-moment correlation |    32
## wt         |       carb |  0.43 | [ 0.09,  0.68] |  2.59 |  0.132 | Pearson's product-moment correlation |    32
## qsec       |       gear | -0.21 | [-0.52,  0.15] | -1.19 | > .999 | Pearson's product-moment correlation |    32
## qsec       |       carb | -0.66 | [-0.82, -0.40] | -4.76 | < .001 | Pearson's product-moment correlation |    32
## gear       |       carb |  0.27 | [-0.08,  0.57] |  1.56 |  0.774 | Pearson's product-moment correlation |    32
##
## p-value adjustment method: Holm (1979)

As you can see, it gives, among other useful information, the correlation coefficients (column r) and the result of the correlation test (column 95% CI for the confidence interval or p for the \(p\)-value) for all pairs of variables. This table is very useful and informative, but it would be even better to combine the concepts of correlation coefficients and correlation tests in a single visualization, one that is easy to read and interpret.

Ideally, we would like a concise overview of the correlations between all possible pairs of variables present in a dataset, with a clear distinction for the correlations that are significantly different from 0.

The figure below, known as a correlogram and adapted from the corrplot() function, does precisely this:

corrplot2 <- function(data,
                      method = "pearson",
                      sig.level = 0.05,
                      order = "original",
                      diag = FALSE,
                      type = "upper",
                      tl.srt = 90,
                      number.font = 1,
                      number.cex = 1,
                      mar = c(0, 0, 0, 0)) {
  library(corrplot)
  data_incomplete <- data
  # keep only complete observations
  data <- data[complete.cases(data), ]
  mat <- cor(data, method = method)
  # matrix of p-values from pairwise correlation tests
  cor.mtest <- function(mat, method) {
    mat <- as.matrix(mat)
    n <- ncol(mat)
    p.mat <- matrix(NA, n, n)
    diag(p.mat) <- 0
    for (i in 1:(n - 1)) {
      for (j in (i + 1):n) {
        tmp <- cor.test(mat[, i], mat[, j], method = method)
        p.mat[i, j] <- p.mat[j, i] <- tmp$p.value
      }
    }
    colnames(p.mat) <- rownames(p.mat) <- colnames(mat)
    p.mat
  }
  p.mat <- cor.mtest(data, method = method)
  col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
  corrplot(mat,
    method = "color", col = col(200), number.font = number.font,
    mar = mar, number.cex = number.cex,
    type = type, order = order,
    tl.col = "black", tl.srt = tl.srt, # rotation of text labels
    # combine with significance level
    p.mat = p.mat, sig.level = sig.level, insig = "blank",
    # hide correlation coefficients on the diagonal
    diag = diag
  )
}

corrplot2(
  data = dat,
  method = "pearson",
  sig.level = 0.05,
  order = "original",
  diag = FALSE,
  type = "upper",
  tl.srt = 75
)
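
Note that corrplot2() relies on unadjusted p-values, whereas the table produced by correlation() above reports p-values adjusted with the Holm method. As a rough sketch of how such an adjustment could be reproduced with base R's p.adjust() (this is not part of the original function; the names vars, pair_idx, p_raw and p_holm are ours, and dat is rebuilt so that the snippet runs on its own):

```r
# rebuild dat without vs and am
dat <- mtcars[, !(names(mtcars) %in% c("vs", "am"))]

# all unique pairs of variables (one pair per column)
vars <- names(dat)
pair_idx <- combn(length(vars), 2)

# unadjusted p-value of the Pearson correlation test for each pair
p_raw <- apply(pair_idx, 2, function(idx) {
  cor.test(dat[[idx[1]]], dat[[idx[2]]])$p.value
})
names(p_raw) <- paste(vars[pair_idx[1, ]], vars[pair_idx[2, ]], sep = "-")

# Holm adjustment for multiple testing
p_holm <- p.adjust(p_raw, method = "holm")

# compare raw and adjusted p-values for two pairs
round(cbind(raw = p_raw, holm = p_holm)[c("mpg-qsec", "drat-qsec"), ], 3)
```

The adjusted values are never smaller than the raw ones, so a correlogram built on adjusted p-values would blank out at least as many cells as the one above.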