Proteomics experiments produce large, complex datasets, and effective visualization is essential for interpreting them. Visualization lets us assess the uniformity within sample groups, the differences between groups, and the trends in protein expression. Techniques such as Principal Component Analysis (PCA), volcano plots, Venn diagrams, and heatmaps reveal how expression patterns change, and interaction network analysis shows how the changing proteins relate to one another. Together, these tools give a systematic overview of the experimental results and a foundation for further study of the structure and function of biological systems.
After obtaining proteomics data, our first step is to gain a general understanding of the data, including the uniformity within groups, differences between groups, and trends in variations. What analyses can quickly visualize these aspects of the data?
Principal Component Analysis (PCA)
PCA is an unsupervised multivariate statistical method that reduces high-dimensional data to a small number of principal components, which summarize the protein expression profiles of the research subjects. A PCA plot gives an overall view of the protein differences between groups and the variability within each group.
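To make the dimensionality reduction concrete, here is a minimal sketch that finds the first principal component of a small samples-by-proteins matrix using power iteration on the covariance matrix. The function name and the intensity values are hypothetical, and real analyses would normally rely on a numerical library. The sketch also assumes the data are not constant, so the covariance matrix is non-zero.

```python
import math

def first_principal_component(matrix, iterations=200):
    """Return (scores, explained_ratio) for PC1 of a samples x proteins matrix."""
    n, p = len(matrix), len(matrix[0])
    # Center every protein (column) on its mean across samples.
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix between proteins (p x p).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    vec = [float(j + 1) for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient: variance captured by PC1 (vec has unit length).
    cov_vec = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
    eigenvalue = sum(v * c for v, c in zip(vec, cov_vec))
    total_variance = sum(cov[a][a] for a in range(p))
    # Project each centered sample onto PC1 to get its score.
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return scores, eigenvalue / total_variance

# Hypothetical log-scale intensities: rows are samples, columns are proteins.
demo = [
    [10.0, 5.0, 7.0],   # control 1
    [10.2, 5.1, 7.1],   # control 2
    [12.0, 3.0, 7.0],   # treated 1
    [12.1, 2.9, 7.2],   # treated 2
]
scores, explained = first_principal_component(demo)
print(scores, explained)
```

Samples from the same group should receive similar PC1 scores, while the two groups should fall on opposite sides of zero. The sign of a principal component is arbitrary, so only the relative positions matter.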
Correlation Analysis
Correlation analysis measures how closely two or more variables are related. It is meaningful only when there is some actual link between the variables being compared. Using Pearson's correlation coefficient, the degree of correlation between the samples within a group can be measured to assess the parallelism (reproducibility) of those samples.
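As an illustration, Pearson's correlation coefficient between every pair of samples can be computed as in the sketch below. The sample names and intensity values are hypothetical, and each sample is assumed to be a list of intensities for the same proteins in the same order.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def correlation_matrix(samples):
    """Pairwise Pearson correlations for a dict of sample name -> intensities."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}

# Hypothetical replicate intensities for four proteins.
replicates = {
    "rep1": [10.0, 12.5, 8.1, 15.2],
    "rep2": [10.3, 12.1, 8.4, 15.0],
    "rep3": [9.8, 12.9, 7.9, 15.5],
}
for name, row in correlation_matrix(replicates).items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Coefficients close to 1 between replicates of the same group indicate good parallelism.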
Volcano plot
The volcano plot visualizes the significance of the differences for all detected proteins. The horizontal axis shows the fold change of each protein between the two samples, i.e., the expression in Sample 2 divided by the expression in Sample 1, on a logarithmic scale (typically log2). The vertical axis shows the p-value from a t-test on the difference in expression, usually plotted as -log10(p), so that smaller p-values, meaning more significant differences, appear higher on the plot. The purple dots are the significantly different proteins, and the black dots are the non-significant ones. Dots on the left are down-regulated proteins and dots on the right are up-regulated proteins; the farther a dot lies toward the left or right edge and toward the top of the plot, the larger and more significant the difference in expression.
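The coordinates of a single point on a volcano plot can be sketched as follows. The cutoffs used here (a two-fold change and p < 0.05) are common choices but are assumptions, not values taken from this article, and the function name is hypothetical. The p-value is taken as an input, since computing the t-test itself is outside this sketch.

```python
import math

def volcano_point(expr_sample1, expr_sample2, p_value,
                  fc_cutoff=2.0, p_cutoff=0.05):
    """Return (log2 fold change, -log10 p, status) for one protein."""
    # Horizontal axis: Sample 2 divided by Sample 1, on a log2 scale.
    log2_fc = math.log2(expr_sample2 / expr_sample1)
    # Vertical axis: smaller p-values give larger values, so they plot higher.
    neg_log10_p = -math.log10(p_value)
    threshold = math.log2(fc_cutoff)
    if p_value < p_cutoff and log2_fc >= threshold:
        status = "up"
    elif p_value < p_cutoff and log2_fc <= -threshold:
        status = "down"
    else:
        status = "not significant"
    return log2_fc, neg_log10_p, status

# Hypothetical proteins: (expression in Sample 1, expression in Sample 2, p-value).
for row in [(100.0, 400.0, 0.001), (400.0, 100.0, 0.001), (100.0, 400.0, 0.5)]:
    print(volcano_point(*row))
```

Proteins labeled "up" land in the upper right of the plot, proteins labeled "down" land in the upper left, and the rest stay near the bottom or center.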
Venn Diagram
The Venn diagram of differential proteins shows how the differential proteins are distributed among the comparison groups. Each color represents the set of differential proteins identified in one comparison.
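The numbers shown in each region of a Venn diagram come from simple set membership. The sketch below counts how many proteins belong to each combination of comparison groups; the group names and protein identifiers are hypothetical.

```python
from collections import Counter

def venn_counts(groups):
    """Count proteins for every combination of groups they appear in."""
    membership = {}
    for name, proteins in groups.items():
        for protein in proteins:
            membership.setdefault(protein, set()).add(name)
    # Each key is the exact set of groups that a protein belongs to.
    return Counter(frozenset(names) for names in membership.values())

# Hypothetical differential proteins from three comparisons.
comparisons = {
    "A_vs_B": {"P1", "P2", "P3"},
    "A_vs_C": {"P2", "P3", "P4"},
    "B_vs_C": {"P3", "P5"},
}
for region, count in venn_counts(comparisons).items():
    print(sorted(region), count)
```

A protein found in all three comparisons is counted once, in the central region, rather than once per group.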
Heatmap
A heatmap visualizes the expression trend of proteins across samples and clusters them based on that trend. Each column represents a sample, each row represents a protein, and the color indicates the relative expression of the protein across the samples, with red representing higher expression and green representing lower expression. The dendrogram on the left shows the clustering of proteins: the closer two protein branches are, the more similar their expression. The dendrogram at the top shows the clustering of samples: the closer two sample branches are, the more similar the expression patterns of all proteins in those two samples, i.e., the more similar their trends in protein expression.
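The "relative expression" shown by the colors is often obtained by scaling each protein's row before plotting. The sketch below uses row-wise z-scores, which is a common choice but an assumption here, since the article does not state which scaling was used. The function name and the values are hypothetical.

```python
import statistics

def zscore_rows(matrix):
    """Scale each protein (row) to mean 0 and standard deviation 1."""
    scaled = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.stdev(row)
        if sd == 0:
            # A protein with constant expression carries no trend.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - mean) / sd for value in row])
    return scaled

# Hypothetical intensities: rows are proteins, columns are samples.
expression = [
    [1.0, 2.0, 3.0],
    [300.0, 200.0, 100.0],
    [5.0, 5.0, 5.0],
]
for row in zscore_rows(expression):
    print([round(v, 2) for v in row])
```

After scaling, positive values would be drawn toward the red end of the color scale and negative values toward the green end, regardless of each protein's absolute abundance.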
Cluster
Each line in the graph represents a protein. Each graph demonstrates one type of expression pattern, i.e., a trend that reflects the change in expression of this group of proteins.
After these analyses we can visualize the parallelism between the samples, as well as the differences between the groups, and can quickly find some proteins whose differences are extremely significant or whose pattern of change meets our expectations. How can we determine whether these proteins are relevant to our study, or what functions these proteins perform and how they work?
GO Analysis
GO stands for Gene Ontology, an international standard classification system for gene function. It aims to establish a repository of annotated information on gene and protein functions applicable to a wide range of species. It divides gene functions into three categories: cellular component (CC), molecular function (MF), and biological process (BP).
KEGG Analysis
The Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway database (www.kegg.jp/kegg/pathway.html) stores functional information on genes and genomes, including illustrated cellular biochemical processes such as metabolism, membrane transport, signaling, and the cell cycle, as well as information on conserved subpathways within the same lineage. By identifying the pathways that are significantly enriched in differentially expressed proteins, it is possible to understand which pathways are significantly and systematically altered under different experimental conditions.
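Whether a GO term or KEGG pathway is "significantly enriched" is commonly judged with a hypergeometric test (equivalent to a one-sided Fisher's exact test). The article does not state which test was used, so the sketch below is an assumption, and the function name and counts are hypothetical.

```python
from math import comb

def enrichment_p_value(total, in_pathway, selected, selected_in_pathway):
    """Probability of seeing at least `selected_in_pathway` pathway members
    among `selected` proteins drawn at random from `total` proteins, of which
    `in_pathway` belong to the pathway."""
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(selected_in_pathway, upper + 1)
    )
    return tail / comb(total, selected)

# Hypothetical counts: 2000 identified proteins, 40 in the pathway,
# 100 differentially expressed proteins, 8 of them in the pathway.
print(enrichment_p_value(2000, 40, 100, 8))
```

A small p-value means the pathway contains more differentially expressed proteins than would be expected by chance. When many pathways are tested, the p-values are usually corrected for multiple testing.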
PPI Analysis
Protein-Protein Interaction Network Analysis (PPI Network Analysis) is a crucial aspect of proteomics research. Proteins, in executing biological functions, form PPI networks to maintain temporal and spatial coordination. By constructing interaction networks for differentially expressed proteins, PPI Network Analysis allows us to identify trends in the changes of differentially expressed proteins at the proteomic level. This further aids in pinpointing key nodes within the network of differentially expressed proteins.
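One simple way to look for key nodes is to rank proteins by their degree, i.e., their number of interaction partners in the network. The sketch below does this for a list of interaction pairs. The protein names are hypothetical, and degree is only one of several possible measures of node importance.

```python
from collections import defaultdict

def rank_by_degree(edges):
    """Rank proteins by number of distinct interaction partners."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Sort by descending degree, then by name for a stable order.
    return sorted(
        ((protein, len(partners)) for protein, partners in neighbors.items()),
        key=lambda item: (-item[1], item[0]),
    )

# Hypothetical interactions among differentially expressed proteins.
interactions = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
for protein, degree in rank_by_degree(interactions):
    print(protein, degree)
```

Proteins at the top of the ranking are candidate hubs that may merit closer inspection.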