序列分析
DESeq2 差异分析
DESeq2 分析流程
1. 安装与加载
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("DESeq2")
library(DESeq2)
library(tidyverse)
2. 准备数据
# 读取 counts 矩阵
counts <- read.csv("counts.csv", row.names = 1)
# 准备样本信息
coldata <- data.frame(
row.names = colnames(counts),
condition = factor(c(rep("Control", 3), rep("Treatment", 3)))
)
# 检查顺序一致性
all(rownames(coldata) == colnames(counts))
3. 创建 DESeq2 对象
dds <- DESeqDataSetFromMatrix(
countData = counts,
colData = coldata,
design = ~ condition
)
# 过滤低表达基因
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]
4. 差异分析
# 运行 DESeq2
dds <- DESeq(dds)
# 获取结果
res <- results(dds, contrast = c("condition", "Treatment", "Control"))
# 按 p-value 排序
res_ordered <- res[order(res$padj),]
# 查看结果
summary(res)
5. 结果筛选
# 设置阈值
padj_cutoff <- 0.05
log2fc_cutoff <- 1
# 筛选差异基因
sig_genes <- res_ordered %>%
as.data.frame() %>%
filter(padj < padj_cutoff, abs(log2FoldChange) > log2fc_cutoff)
# 分类
sig_genes$regulation <- ifelse(sig_genes$log2FoldChange > 0, "Up", "Down")
table(sig_genes$regulation)
6. 导出结果
# 导出所有结果
write.csv(as.data.frame(res_ordered), "DESeq2_all_results.csv")
# 导出显著差异基因
write.csv(sig_genes, "DESeq2_significant_genes.csv")
可视化
MA Plot
plotMA(res, ylim = c(-5, 5))
Volcano Plot
library(EnhancedVolcano)
EnhancedVolcano(res,
lab = rownames(res),
x = 'log2FoldChange',
y = 'pvalue',
pCutoff = 0.05,
FCcutoff = 1
)
常见问题
| 问题 | 解决方案 |
|---|---|
| 样本量太少 | 使用 lfcShrink 进行 log2FC 收缩 |
| 批次效应 | design = ~ batch + condition |
| 零值过多 | 考虑使用零膨胀模型 |