序列分析

DESeq2 差异分析

DESeq2 分析流程

1. 安装与加载

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")

library(DESeq2)
library(tidyverse)

2. 准备数据

# 读取 counts 矩阵
counts <- read.csv("counts.csv", row.names = 1)

# 准备样本信息
coldata <- data.frame(
  row.names = colnames(counts),
  condition = factor(c(rep("Control", 3), rep("Treatment", 3)))
)

# 检查顺序一致性
all(rownames(coldata) == colnames(counts))

3. 创建 DESeq2 对象

dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData = coldata,
  design = ~ condition
)

# 过滤低表达基因
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

4. 差异分析

# 运行 DESeq2
dds <- DESeq(dds)

# 获取结果
res <- results(dds, contrast = c("condition", "Treatment", "Control"))

# 按 p-value 排序
res_ordered <- res[order(res$padj),]

# 查看结果
summary(res)

5. 结果筛选

# 设置阈值
padj_cutoff <- 0.05
log2fc_cutoff <- 1

# 筛选差异基因
sig_genes <- res_ordered %>%
  as.data.frame() %>%
  filter(padj < padj_cutoff, abs(log2FoldChange) > log2fc_cutoff)

# 分类
sig_genes$regulation <- ifelse(sig_genes$log2FoldChange > 0, "Up", "Down")
table(sig_genes$regulation)

6. 导出结果

# 导出所有结果
write.csv(as.data.frame(res_ordered), "DESeq2_all_results.csv")

# 导出显著差异基因
write.csv(sig_genes, "DESeq2_significant_genes.csv")

可视化

MA Plot

plotMA(res, ylim = c(-5, 5))

Volcano Plot

library(EnhancedVolcano)

EnhancedVolcano(res,
  lab = rownames(res),
  x = 'log2FoldChange',
  y = 'pvalue',
  pCutoff = 0.05,
  FCcutoff = 1
)

常见问题

问题解决方案
样本量太少使用 lfcShrink 进行 log2FC 收缩
批次效应design = ~ batch + condition
零值过多考虑使用零膨胀模型