ggplot2 for Academic Figures: Color Palettes, Themes, and Multi-Panel Layouts


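ggplot2's default output falls short of what academic journals expect: gray background, default colors, small fonts, and a default legend position. This post covers the key adjustments that take a default plot to publication quality: journal-style color palettes (in the vein of Cell, Nature, and Science, CNS for short), statistical annotation with ggpubr, multi-panel layouts with patchwork, and export as vector graphics or 300 dpi bitmaps. Every step comes with runnable code.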

Test environment: Debian 13, R 4.3.2, ggplot2 3.5.0.

1. Data preparation: a simulated differential expression table

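library(tidyverse)  # attaches ggplot2, dplyr, tibble, and friends

# Simulated differential expression results (5000 genes)
set.seed(42)
degs <- tibble(
  gene_id = paste0("ENSG", sprintf("%08d", 1:5000)),
  log2FC = rnorm(5000, mean = 0, sd = 1.2),
  # p-values are tied to the effect size plus noise; independent uniform
  # p-values would leave almost nothing below padj 0.05 after BH correction
  pvalue = 2 * pnorm(-abs(log2FC / 0.5 + rnorm(5000))),
  padj = p.adjust(pvalue, method = "BH"),
  baseMean = 10^rnorm(5000, mean = 3, sd = 1.5)
) %>%
  mutate(
    direction = case_when(
      log2FC > 1 & padj < 0.05 ~ "Up",
      log2FC < -1 & padj < 0.05 ~ "Down",
      TRUE ~ "NS"
    )
  )

# Count genes per category
degs %>% count(direction)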

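2. Default plot vs. academic plot: where they differ

Start with the default output and remember what it looks like:

# Default version (ugly, but it runs)
p_default <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point()
ggsave("default.png", p_default, width = 8, height = 6, dpi = 72)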

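Problems with the default plot:

  1. Gray background with a gray grid: most journals expect a white background
  2. Default palette: the default hues have similar lightness, so they are hard to tell apart in grayscale print, and the red/green pair is not colorblind-safe
  3. Fonts too small: axis labels become unreadable once the figure is scaled to page width
  4. The legend title "direction" is a raw column name that tells the reader nothing
  5. No point transparency: with thousands of points, everything smears together

The fixes follow, one step at a time.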

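3. Color: from the defaults to journal-style palettes

3.1 Color principles

Academic color choices have three core constraints: print-friendly, colorblind-friendly, and within journal restrictions.

Journals such as Nature, Cell, and Science accept color figures, but some journals charge extra for color in print, so many authors submit with palettes that still work in grayscale. Three options are worth recommending: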

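Package       | Scale function           | Good for
viridis       | scale_color_viridis_d()  | Discrete data (scale_color_viridis_c() for continuous); colorblind-friendly
ggsci         | scale_color_npg()        | Palettes modeled on those common in CNS journals
RColorBrewer  | scale_color_brewer()     | Classic academic palettes
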
library(viridis)
library(ggsci)

# Option A: viridis (colorblind-friendly; the continuous version suits heatmaps)
p_viridis <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point(alpha = 0.4, size = 0.6) +
  scale_color_viridis_d(option = "D", end = 0.85)

# Option B: the NPG palette from ggsci (Nature Publishing Group style)
p_npg <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point(alpha = 0.4, size = 0.6) +
  scale_color_npg()

# Option C: manual colors, which gives the most control
p_custom <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point(alpha = 0.4, size = 0.6) +
  scale_color_manual(
    values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB"),
    labels = c("Up" = "Up-regulated", "Down" = "Down-regulated", "NS" = "Not significant"),
    name = ""  # blank out the legend title
  )

Pro tip: specify manual colors as hex codes and pick them on colorbrewer2.org. A red/blue pair is the de facto standard for volcano plots in bioinformatics.
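One quick way to check the "print-friendly" constraint is to convert a palette to gray levels and see whether the categories stay apart. The sketch below does this in base R for the manual palette used above; the Rec. 601 luma weights are a common approximation of perceived brightness and are an assumption of this example, not something ggplot2 uses internally.

```r
# Manual palette from the volcano plot
palette_hex <- c(Up = "#E64B35", Down = "#4DBBD5", NS = "#BBBBBB")

# col2rgb() returns a 3 x n matrix (rows: red, green, blue; values 0-255)
rgb_mat <- col2rgb(palette_hex)

# Weighted sum of the channels gives an approximate gray level per color
luma <- colSums(rgb_mat * c(0.299, 0.587, 0.114))

# Gray equivalents, usable as a preview palette
gray_hex <- gray(luma / 255)

round(luma)
```

If two categories end up with nearly the same gray level, they will be hard to distinguish in a grayscale printout, and it is worth adjusting one of them or adding a shape or linetype mapping.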

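3.2 Continuous palettes: expression heatmaps

Heatmaps use a continuous gradient. The default runs from dark blue to light blue, which tends to irritate reviewers.

# heatmap_data: simulated long-format table (gene, sample, Z-score),
# used here as a stand-in for a real expression matrix
heatmap_data <- expand.grid(gene = paste0("gene", 1:20), sample = paste0("S", 1:6))
heatmap_data$expression <- rnorm(nrow(heatmap_data))

# Classic blue-white-red (low expression blue, midpoint white, high expression red)
ggplot(heatmap_data, aes(x = sample, y = gene, fill = expression)) +
  geom_tile() +
  scale_fill_gradient2(
    low = "#2166AC",   # dark blue (low expression)
    mid = "#F7F7F7",   # near-white (midpoint)
    high = "#B2182B",  # dark red (high expression)
    midpoint = 0,
    name = "Z-score"
  )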

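The key is the midpoint argument: if the data are not symmetric around zero (for example, only up-regulation and no down-regulation), set midpoint to the median rather than 0.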
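As a rough illustration of the midpoint advice, the sketch below builds a small one-sided dataset and anchors the white point at its median. The data frame hm and its values are hypothetical and exist only for this example.

```r
library(ggplot2)

set.seed(1)
# Hypothetical one-sided data: all expression values are positive
hm <- expand.grid(gene = paste0("g", 1:10), sample = paste0("S", 1:4))
hm$expression <- rexp(nrow(hm), rate = 0.5)

# Anchor the neutral color at the median instead of 0
mid <- median(hm$expression)

p_mid <- ggplot(hm, aes(x = sample, y = gene, fill = expression)) +
  geom_tile() +
  scale_fill_gradient2(
    low = "#2166AC", mid = "#F7F7F7", high = "#B2182B",
    midpoint = mid
  )
```

With midpoint = 0 on this data, every tile would fall on the red half of the scale and the blue half would go unused.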

4. Themes: goodbye gray background

# Shared base plot for the theme variants
p_base <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point(alpha = 0.4, size = 0.6) +
  scale_color_manual(
    values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")
  )

# Option A: theme_bw() plus tweaks (the most common choice)
p_bw <- p_base + theme_bw(base_size = 12) +
  theme(
    panel.grid.minor = element_blank(),      # drop minor grid lines
    panel.grid.major = element_line(linewidth = 0.3, color = "grey90"),
    legend.position = "inside",              # ggplot2 >= 3.5.0; a numeric legend.position is deprecated
    legend.position.inside = c(0.9, 0.85),   # top-right corner of the panel
    legend.background = element_rect(fill = "white", color = "grey80"),
    legend.key.size = unit(0.4, "cm")
  )

# Option B: theme_minimal() plus a border (Nature style)
p_nature <- p_base + theme_minimal(base_size = 12) +
  theme(
    panel.border = element_rect(fill = NA, color = "black", linewidth = 0.8),
    panel.grid = element_blank(),
    axis.line = element_blank(),
    legend.position = "right"
  )

# Option C: theme_classic() (the simplest; looks like an Excel chart, but clean)
p_classic <- p_base + theme_classic(base_size = 12) +
  theme(
    axis.line = element_line(color = "black", linewidth = 0.5),
    legend.position = "right"
  )

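What base_size does: every font size in the plot scales from this value. At 12, the exported PDF drops into a manuscript at about the right size. For a figure sized to a single journal column (roughly 80 mm), use base_size = 10.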
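To see the scaling concretely, the following sketch inspects two theme objects. It relies on calc_element(), which ggplot2 exports for resolving an element's inherited properties; the variable names are only for illustration, and the behavior is what I would expect on ggplot2 3.5.x.

```r
library(ggplot2)

th10 <- theme_bw(base_size = 10)
th12 <- theme_bw(base_size = 12)

# The root text element carries base_size directly
th10$text$size
th12$text$size

# axis.text is stored as a relative size; calc_element() resolves it
# against the root text size
axis10 <- calc_element("axis.text", th10)$size
axis12 <- calc_element("axis.text", th12)$size
c(axis10 = axis10, axis12 = axis12)
```

Because the axis text is defined relative to the root size, changing base_size alone keeps the proportions between titles, labels, and tick text intact.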

4.1 A custom theme: define once, reuse everywhere

# Define your own academic theme
theme_academic <- function(base_size = 12) {
  # `+` merges these settings into theme_bw(); %+replace% would reset the
  # unspecified properties (margins, relative sizes) of each listed element
  theme_bw(base_size = base_size) +
    theme(
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(linewidth = 0.3, color = "grey92"),
      panel.border = element_rect(fill = NA, color = "black", linewidth = 0.8),
      strip.background = element_rect(fill = "grey95", color = "black"),
      strip.text = element_text(size = base_size - 1, face = "bold"),
      axis.text = element_text(color = "black"),
      axis.title = element_text(size = base_size),
      legend.position = "bottom",
      legend.key.size = unit(0.5, "cm"),
      plot.title = element_text(size = base_size + 2, face = "bold", hjust = 0.5),
      plot.subtitle = element_text(size = base_size, hjust = 0.5)
    )
}

# Usage
p_final <- p_base + theme_academic(base_size = 12)
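For reuse across a whole script, a custom theme can also be made the session default with theme_set(), so individual plots no longer need the explicit + theme call. This is a minimal sketch; theme_demo is a hypothetical, shortened stand-in for your own theme function.

```r
library(ggplot2)

# theme_demo: hypothetical stand-in for a custom theme function
theme_demo <- function(base_size = 12) {
  theme_bw(base_size = base_size) +
    theme(panel.grid.minor = element_blank(), legend.position = "bottom")
}

# theme_set() installs the new default and returns the previous one
old_theme <- theme_set(theme_demo(base_size = 11))

# Plots created from here on pick up theme_demo without an explicit call
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

# Restore the previous default when done
theme_set(old_theme)
```

Keeping the returned old_theme around makes the change easy to undo, which matters when the same session also produces figures in a different style.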

5. Statistical annotation: p-values in one call with ggpubr

library(ggpubr)

# Box plot with significance annotation
data(mtcars)
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(outlier.shape = NA, width = 0.6) +
  geom_jitter(width = 0.1, alpha = 0.5, size = 1.5) +
  stat_compare_means(
    comparisons = list(c("4", "6"), c("6", "8"), c("4", "8")),
    method = "t.test",
    label = "p.signif",     # show significance symbols only
    # label = "p.format",   # show the numeric p-value instead
    step.increase = 0.08    # vertical spacing between stacked brackets
  ) +
  scale_fill_manual(values = c("#4DBBD5", "#E64B35", "#00A087")) +
  theme_academic(base_size = 12) +
  labs(x = "Cylinders", y = "Miles per Gallon", fill = "")

label = "p.signif" shows symbols (*, **, ***), while label = "p.format" shows the numeric value. Journals generally want exact p-values rather than stars, unless the figure legend defines the thresholds.
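If stars are used, the thresholds behind them should be stated in the figure legend. The sketch below reproduces the mapping in base R using the cutpoints and symbols that ggpubr documents as its default symnum.args; treat those values as an assumption and check ?stat_compare_means for your installed version. Behavior exactly at a cutpoint may also differ slightly from ggpubr's own implementation.

```r
# Cutpoints and symbols as documented for ggpubr's default symnum.args
cutpoints <- c(0, 1e-4, 0.001, 0.01, 0.05, Inf)
symbols <- c("****", "***", "**", "*", "ns")

# Example p-values, one per significance band
p_values <- c(0.2, 0.03, 0.004, 0.0002, 1e-6)

# cut() assigns each p-value to the interval it falls in
stars <- as.character(
  cut(p_values, breaks = cutpoints, labels = symbols, include.lowest = TRUE)
)

data.frame(p = p_values, label = stars)
```

A legend line such as "ns: p > 0.05; *: p ≤ 0.05; **: p ≤ 0.01; ***: p ≤ 0.001; ****: p ≤ 0.0001" then makes the symbols unambiguous.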

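The definition behind the reported p-value (two-sided test):

$p = P(|T| \geq |t| \mid H_0)$

When annotating, note that $p < 2.2 \times 10^{-16}$ is what R prints once a p-value falls below the double-precision machine epsilon. Do not report it as $p = 0$, which is wrong.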
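Base R's format.pval() applies this convention automatically: values below its eps argument, which defaults to the machine epsilon, are shown as an inequality instead of a number. A short sketch with made-up p-values:

```r
# Made-up p-values; the last one is far below machine epsilon
p_values <- c(0.0312, 4.7e-05, 1e-20)

# Values below eps are reported as "<eps" rather than rounded to zero
p_labels <- format.pval(p_values, digits = 2)
p_labels

# The threshold itself, roughly 2.2e-16 on standard double precision
.Machine$double.eps
```

The same strings can be passed to annotation layers when exact values are wanted on the figure.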

6. Multi-panel layouts with patchwork

library(patchwork)

# Build four panels
p1 <- ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point(alpha = 0.3, size = 0.5) +
  scale_color_manual(values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")) +
  theme_academic() +
  labs(title = "Volcano plot")

p2 <- ggplot(degs, aes(x = log2FC)) +
  geom_histogram(fill = "#4DBBD5", alpha = 0.7, bins = 60) +
  theme_academic() +
  labs(title = "log2FC distribution", x = "log2 Fold Change")

p3 <- ggplot(degs, aes(x = direction, fill = direction)) +
  geom_bar() +
  scale_fill_manual(values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")) +
  theme_academic() +
  labs(title = "DEG counts", x = "") +
  theme(legend.position = "none")

p4 <- ggplot(degs, aes(x = baseMean, y = log2FC)) +
  geom_point(alpha = 0.3, size = 0.5, color = "grey40") +
  scale_x_log10() +
  geom_hline(yintercept = c(-1, 1), linetype = "dashed", color = "red", alpha = 0.5) +
  theme_academic() +
  labs(title = "MA plot", x = "Mean expression (log10)")

# Assemble: two on top, two below (the classic 2x2 grid)
combined <- (p1 | p2) / (p3 | p4) +
  plot_annotation(
    title = "RNA-seq Differential Expression Analysis",
    tag_levels = "A"  # add A/B/C/D panel tags automatically
  ) &
  theme(plot.tag = element_text(face = "bold", size = 14))

# Export
ggsave("figure_panel.pdf", combined, width = 14, height = 10, device = cairo_pdf)

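The core patchwork operators:

  • |: place plots side by side
  • /: stack plots vertically
  • + plot_annotation(): add an overall title and panel tags
  • & theme(...): apply a theme to every panel
  • plot_layout(guides = "collect"): merge identical legends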
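The operators above also nest, which covers layouts that are not a plain grid. The sketch below puts one wide panel next to two stacked ones and sets the column ratio with plot_layout(widths = ...); the three mtcars plots are placeholders chosen only to keep the example self-contained.

```r
library(ggplot2)
library(patchwork)

# Placeholder panels built from mtcars
p_a <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p_b <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()
p_c <- ggplot(mtcars, aes(x = hp)) + geom_histogram(bins = 15)

# One wide panel on the left, two stacked panels on the right;
# widths = c(2, 1) makes the left column twice as wide as the right
layout_demo <- (p_a | (p_b / p_c)) +
  plot_layout(widths = c(2, 1))
```

Parentheses matter here: R evaluates / before |, so grouping the stacked pair explicitly keeps the intent readable.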

7. Export: pixel-level control

# PDF (vector; the preferred format for most journals)
ggsave("figure.pdf", p_final, width = 8, height = 6, device = cairo_pdf)

# TIFF (300 dpi bitmap; requested by some journals, including Cell and Nature titles)
ggsave("figure.tiff", p_final, width = 8, height = 6, dpi = 300,
       device = "tiff", compression = "lzw")

# PNG (for previews)
ggsave("figure_preview.png", p_final, width = 8, height = 6, dpi = 150)

# SVG (vector; convenient for further editing in Illustrator; needs the svglite package)
ggsave("figure.svg", p_final, width = 8, height = 6)

Journal width conversion:

  • Single column: 80-90 mm ≈ 3.15-3.54 inches
  • Double column / full page: 170-180 mm ≈ 6.7-7.1 inches

# Fit single-column width
ggsave("figure_single_col.pdf", p_final, width = 3.5, height = 3, device = cairo_pdf)
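The conversion is easy to wrap in a small helper so that figure sizes are always derived from the journal's millimeter specification. The function names mm_to_in and fig_size, and the default aspect ratio, are hypothetical choices for this sketch.

```r
# Convert millimeters to inches (1 inch = 25.4 mm)
mm_to_in <- function(mm) mm / 25.4

# Width and height in inches from a width in mm and a height/width ratio
fig_size <- function(width_mm, aspect = 0.75) {
  w <- mm_to_in(width_mm)
  c(width = w, height = w * aspect)
}

single_col <- fig_size(85)                  # single-column figure
double_col <- fig_size(175, aspect = 0.6)   # double-column figure

round(single_col, 2)
round(double_col, 2)
```

The resulting values can be passed straight to ggsave(), for example width = single_col[["width"]] and height = single_col[["height"]].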

8. Two advanced techniques

8.1 Gene labels on a volcano plot (ggrepel)

library(ggrepel)

# Label the top 10 genes by absolute fold change
top_genes <- degs %>%
  filter(padj < 0.05) %>%
  slice_max(order_by = abs(log2FC), n = 10)

ggplot(degs, aes(x = log2FC, y = -log10(padj), color = direction)) +
  geom_point(alpha = 0.4, size = 0.6) +
  geom_text_repel(
    data = top_genes,
    aes(label = gene_id),
    size = 3,
    max.overlaps = 15,
    box.padding = 0.5,
    force = 2
  ) +
  scale_color_manual(
    values = c("Up" = "#E64B35", "Down" = "#4DBBD5", "NS" = "#BBBBBB")
  ) +
  theme_academic()

8.2 Facets: split one plot into a grid automatically

# Facet by chromosome (chromosome assignment is simulated)
degs_chr <- degs %>%
  mutate(chr = sample(paste0("chr", 1:22), nrow(degs), replace = TRUE)) %>%
  filter(chr %in% paste0("chr", 1:6))

ggplot(degs_chr, aes(x = baseMean, y = log2FC)) +
  geom_point(alpha = 0.3, size = 0.3, color = "grey40") +
  facet_wrap(~ chr, ncol = 3, scales = "free_x") +
  theme_academic(base_size = 9) +
  labs(x = "Mean expression", y = "log2 Fold Change")

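9. Pitfalls

Pitfall 1: Chinese text in the PDF shows up as boxes

Symptom: open the file written by ggsave("figure.pdf") and every Chinese title has turned into □□□.

Fix: use the cairo_pdf device, which can locate the system's Chinese fonts through fontconfig.

ggsave("figure.pdf", p, device = cairo_pdf)
# For bitmap devices (PNG, TIFF), make cairo the default backend as well;
# this option does not affect PDF output
options(bitmapType = "cairo")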

If that still fails, check whether the system has any Chinese fonts:

fc-list :lang=zh | head -5

If there are none, install them:

sudo apt install fonts-noto-cjk -y
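On the R side, cairo_pdf only works if R itself was built with cairo support, which can be checked with capabilities("cairo"). The sketch below picks a device accordingly; the fallback to the plain pdf device is an assumption of this example and will generally not render CJK glyphs without extra font setup.

```r
# TRUE if this R build can use cairo-based devices
has_cairo <- unname(capabilities("cairo"))

# Prefer cairo_pdf; fall back to the standard pdf device otherwise
pdf_device <- if (isTRUE(has_cairo)) cairo_pdf else pdf

has_cairo
# Then export with, for example:
# ggsave("figure.pdf", p, device = pdf_device)
```

If has_cairo is FALSE, installing the font package alone will not help; R needs to be installed or rebuilt with cairo enabled.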

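Pitfall 2: wrong units for ggsave dimensions

Symptom: ggsave("fig.png", p, width = 80, height = 60) asks for a figure 80 inches wide. By default ggsave() stops with a "Dimensions exceed 50 inches" error; with limitsize = FALSE it writes a huge file.

Cause: width and height default to inches, not millimeters.

Conversion: $\text{inch} = \frac{\text{mm}}{25.4}$

80 mm ≈ 3.15 inches, so do not pass 80 directly. Alternatively, keep the millimeter values and add units = "mm".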

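Pitfall 3: stat_compare_means: Can't compute p-value

# Example that fails
stat_compare_means(comparisons = list(c("A", "B")), method = "t.test")
# Error: not enough 'x' observations

Cause: one group has too few samples (for example n = 1); a t-test needs at least 2 observations per group.

Check:

table(metadata$group)  # metadata: your sample annotation table
# If a group has n = 1, switch to Wilcoxon (nonparametric) or pick another visualization;
# the test will run, but with a single sample it has almost no power
stat_compare_means(method = "wilcox.test")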
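The group-size check can be turned into a filter that builds the comparisons list only from groups with enough samples. This is a base R sketch on a hypothetical metadata table; the column names group and value are made up for the example.

```r
# Hypothetical sample table: group C has a single sample
metadata <- data.frame(
  group = c("A", "A", "A", "B", "B", "C"),
  value = c(5.1, 4.8, 5.5, 6.2, 6.0, 7.3)
)

# Samples per group
n_per_group <- table(metadata$group)

# Keep only groups with at least 2 samples
ok_groups <- names(n_per_group)[n_per_group >= 2]

# All pairwise comparisons among the remaining groups
# (combn() needs at least two groups to be left)
comparisons <- combn(ok_groups, 2, simplify = FALSE)
comparisons
```

The resulting list has the shape that stat_compare_means(comparisons = ...) expects, so the undersized group is simply left out of the tests while still being drawn in the plot.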

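Pitfall 4: duplicated legends after combining with patchwork

Symptom: in (p1 | p2) / (p3 | p4), each of the four panels draws its own legend, and together they take up a third of the figure.

Fix: collect the legends from all panels:

combined <- (p1 | p2) / (p3 | p4) +
  plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

plot_layout(guides = "collect") merges identical legends automatically, and & theme(legend.position = "bottom") moves the merged legend to the bottom.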

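Pitfall 5: the plot does not show up on screen after ggsave

Symptom: after running ggsave("x.pdf", p), the RStudio Plots pane is blank.

Cause: ggsave() draws into its own file device and closes that device when done; it never draws on the screen device. A plot that was only assigned to a variable has not been displayed yet. The export itself is unaffected, but it can look as if the code did not run.

Fix: print the plot after exporting:

ggsave("x.pdf", p)  # export
print(p)            # draw on the current screen device

Separately, ggsave(..., create.dir = TRUE) (ggplot2 3.5.0 and later) creates the output directory if it is missing. A good habit is to end the script with print(p_final), since a bare p_final only auto-prints at the top level of an interactive session, not inside source() or a function.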


Tested on 2025-07-22 with Debian 13 + R 4.3.2. All code can be copied and run directly.

Source: https://fg.ink/posts/ggplot2-academic-plotting/
Author: 风观
Published: 2025-12-15
License: CC BY-NC-SA 4.0