其实还有另外一个策略,有点类似于人工选择啦,通常是可以往热点靠,比如肿瘤免疫,相当于你不需要全部的两万多个基因的表达量矩阵进行后续分析, 仅仅是拿着几千个免疫相关基因的表达矩阵即可。
但是关于这一点,就有很多粉丝问我,为什么看了很多文献,大家的 免疫相关基因集的数量都不一样,希望我给出一个可靠的数据源!
随便列举几个免疫基因集数据挖掘文献:
2020-Development and Validation of a Novel Immune-Related Prognostic Model in Hepatocellular Carcinoma
A total of 329 differentially expressed immune-related genes were detected.
64 immune-related genes were identified to be markedly related
to overall survival in HCC patients using univariate Cox regression
analysis.
2020-A novel immune-related genes prognosis biomarker for melanoma: associated with tumor microenvironment
63 immune-related genes of the total 1039 unique IRGs retrieved were associated with overall survival of melanoma.
2020-Immune-related gene signature for predicting the prognosis of head and neck squamous cell carcinoma
Among 1073 immune genes measured on all platforms, 915 IRGs
were selected after filtering median absolute deviation (MAD) > 0.5.
By 1000 times resampling, 81 IRGs were robustly associated with individual patients’ OS.
2020-Novel Immune-Related Gene Signature for Risk Stratification and Prognosis of Survival in Lower-Grade Glioma
277 immune-related DEGs were identified
6 immune genes ( CANX , HSPA1B , KLRC2 , PSMC6 , RFXAP , and TAP1 ) were identified as risk signature
你可以迅速解读一波,因为都大同小异,仅仅是癌症不一样,图表没啥子区别。差异分析,火山图,热图等等标准流程,基本上读一下我在生信技能树的表达芯片的公共数据库挖掘系列推文 就明白了;
免疫基因集数据库
其实可以看到, 大多就是来源于 ImmPort(The Immunology Database and Analysis Portal)数据库的:
A predictive immune-related signature was constructed by
concentrating on immune-related genes (IRGs) obtained from the
Immunology Database and Analysis Portal (ImmPort)
(https://immport.niaid.nih.gov).
A comprehensive list of IRGs was downloaded from the Immunology
Database and Analysis Portal (ImmPort) database2. The list comprised a
total of 2,498 IRGs, covering 17 immune categories(Bhattacharya et al., 2014).
如果你是初出茅庐,就选择它好了,反正数据库都提供了列表:https://www.immport.org/shared/genelists
或者你去KEGG和GO等数据库人工筛选免疫相关基因集,然后去冗余也行,再或者其它数据库,比如:
Immunogenetic Related Information Source
Immunome Database
InnateDB: Systems Biology of the Innate Immune Response
我一直觉得, 这样的挑选其实是引入了人工偏差,但是这样的策略文章却屡见不鲜。比如几年前我总结的TCGA泛癌研究策略,其中一类就都是集中于某生物学功能基因集:
100篇泛癌研究文献解读之上皮细胞-间充质细胞转化
100篇泛癌研究文献解读之同源重组修复通路
100篇泛癌研究文献解读之微卫星不稳定性
100篇泛癌研究文献解读之合成致死事件
100篇泛癌研究文献解读之核受体基因家族探索
100篇泛癌研究文献解读之激酶相关基因融合事件
100篇泛癌研究文献解读之磷酸化相关点突变
100篇泛癌研究文献解读之APOBECs家族基因突变及表达量异常
这个完全是取决于大家的生物学背景啦,很多人的课题组,实验室祖传就是研究某个通路,某个基因的,那么你就有先天优势。