Glycosylation is an important and widespread post-translational modification of proteins. More than half of all proteins in living organisms are glycosylated, and glycosylation plays a crucial role in maintaining normal physiological function in the human body. Based on how the glycan is linked to the protein, protein glycosylation is mainly classified into N-glycosylation and O-glycosylation. O-GalNAc modification is one of the most common types of O-glycosylation; a typical example is the cluster of O-GalNAc sites in the hinge region of immunoglobulins. A growing body of research links O-GalNAc modification to inflammation, immune evasion, viral infection, cell adhesion, metastasis, and apoptosis, making it one of the most actively studied topics in glycosylation research.
Because of the substrate specificity of glycosylation enzymes and the structural complexity of O-glycans, determining the O-glycan structure at each individual site remains technically difficult. As a result, O-glycoproteins in vivo are still incompletely characterized, and it is hard to link glycan structure to function. Establishing practical methods for site-specific, quantitative profiling of O-glycoproteins in complex tissue extracts and biological fluids is therefore a crucial step toward understanding their roles in disease.
Case: Tissue-Specific O-Glycoprotein Regulation in Mice
A research team led by Lawrence A. Tabak at the U.S. National Institutes of Health published a study online in PNAS titled "Quantitative mapping of the in vivo O-GalNAc glycoproteome in mouse tissues identifies GalNAc-T2 O-glycosites in metabolic disorder." Using quantitative glycoproteomics and proteomics, the team built a comprehensive map of the mouse O-GalNAc glycoproteome (O-GalNAc glycosylation is referred to simply as O-glycosylation hereafter) and established a practical method for quantitatively mapping O-glycoproteins in complex samples such as tissue extracts and biological fluids. Using Galnt2-deficient mice as a model, the researchers then explored why patients with a GALNT2-related congenital disorder of glycosylation develop lipid and metabolic abnormalities.
The researchers first combined the EXoO method for extracting O-linked glycopeptides, HCD-pd-EThcD mass spectrometry, and several database search software packages, and after iterative optimization arrived at an integrated workflow for quantifying O-glycopeptides. They then collected the brain, heart, lungs, liver, spleen, kidneys, colon, muscle, submandibular glands, and blood from six healthy mice (three males and three females). After glycopeptide extraction and liquid chromatography-mass spectrometry (LC-MS) analysis, the data were searched with MSFragger-Glyco, pGlyco3, and O-Pair, which returned many glycopeptide sequences shared between tools as well as many unique to a single tool. Taking the intersection of these results, the researchers obtained 2,154 O-glycosylation sites, 2,834 glycopeptide sequences, 38 distinct glycan compositions, and 4,020 glycopeptides from 595 glycoproteins. Together, these steps established an integrated EXoO, HCD-pd-EThcD, and multi-software approach that localizes O-glycosylation sites with high confidence.
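The intersection step can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: it assumes each tool's output has already been parsed into a hypothetical `GlycoID` record (protein, peptide sequence, localized site, glycan composition) with identical naming conventions across tools, which real MSFragger-Glyco, pGlyco3, and O-Pair exports do not provide out of the box. The toy identifiers are invented.

```python
from typing import NamedTuple


class GlycoID(NamedTuple):
    """Hypothetical, harmonized record for one glycopeptide identification."""
    protein: str      # protein identifier
    sequence: str     # peptide sequence
    site: int         # localized O-glycosite, 1-based position in the protein
    composition: str  # glycan composition label, e.g. "HexNAc(1)Hex(1)"


def consensus(*tool_results: set) -> set:
    """Keep only identifications reported by every search tool (set intersection)."""
    if not tool_results:
        return set()
    shared = set(tool_results[0])
    for ids in tool_results[1:]:
        shared &= ids
    return shared


def summarize(ids: set) -> dict:
    """Count distinct sites, sequences, glycan compositions, glycopeptides, and proteins."""
    return {
        "sites": len({(i.protein, i.site) for i in ids}),
        "sequences": len({i.sequence for i in ids}),
        "compositions": len({i.composition for i in ids}),
        # here a "glycopeptide" is a sequence + site + glycan combination (an assumption)
        "glycopeptides": len({(i.sequence, i.site, i.composition) for i in ids}),
        "proteins": len({i.protein for i in ids}),
    }


if __name__ == "__main__":
    # Toy identifiers, invented for illustration only
    a = GlycoID("ProtA", "TAPSEK", 12, "HexNAc(1)Hex(1)")
    b = GlycoID("ProtA", "TAPSEK", 12, "HexNAc(1)Hex(1)NeuAc(1)")
    c = GlycoID("ProtB", "SSPVK", 40, "HexNAc(1)")
    d = GlycoID("ProtC", "GTTPK", 7, "HexNAc(1)")
    tool_1, tool_2, tool_3 = {a, b, c}, {a, b}, {a, b, d}
    print(summarize(consensus(tool_1, tool_2, tool_3)))
    # {'sites': 1, 'sequences': 1, 'compositions': 2, 'glycopeptides': 2, 'proteins': 1}
```

Requiring agreement from all three tools trades sensitivity for confidence; a looser variant would keep identifications reported by at least two tools.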
To look for conserved sequence motifs around the modified residues, the researchers analyzed 2,150 O-glycosylation sites (excluding four sites too close to the N- or C-terminus). Serine and threonine accounted for approximately 70.1% and 29.9% of these sites, respectively. When compared with NetOGlyc4.0 predictions, approximately 80.7% of the experimentally identified sites had been predicted. Across the 595 experimentally identified glycoproteins, NetOGlyc4.0 predicted 22,608 O-glycosylation sites among 68,802 serine and threonine residues. Gene Ontology (GO) enrichment analysis of these 595 glycoproteins showed enrichment in biological processes including extracellular structure organization, blood coagulation, regulation of synapse structure and organization, receptor-mediated endocytosis, cell-substrate adhesion, response to wounding, vasculature development, and axon development.
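The residue-composition and prediction-coverage figures above are simple ratios. A minimal Python sketch, assuming sites are stored as hypothetical `(protein_id, position)` tuples and that motif analysis needs a full flanking window (the default of ±7 residues is an assumed width; the source does not state the window the authors used), which is why sites too close to a terminus drop out:

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Site = Tuple[str, int]  # (protein ID, 1-based residue position); hypothetical format


def flanking_window(seq: str, pos: int, width: int = 7) -> Optional[str]:
    """Return the 2*width+1 residues centred on a 1-based site, or None near a terminus."""
    start, end = pos - 1 - width, pos + width
    if start < 0 or end > len(seq):
        return None  # not enough flanking sequence; excluded from motif analysis
    return seq[start:end]


def residue_fractions(sites: Set[Site], proteins: Dict[str, str]) -> Dict[str, float]:
    """Fraction of glycosites falling on each residue type (e.g. 'S' vs 'T')."""
    counts = Counter(proteins[pid][pos - 1] for pid, pos in sites)
    total = sum(counts.values())
    return {res: n / total for res, n in counts.items()}


def prediction_recall(observed: Set[Site], predicted: Set[Site]) -> float:
    """Share of experimentally observed sites that a predictor also calls."""
    return len(observed & predicted) / len(observed)


if __name__ == "__main__":
    proteins = {"ProtA": "MSTAPSEKTTPK"}  # invented toy sequence
    observed = {("ProtA", 2), ("ProtA", 6), ("ProtA", 9)}
    predicted = {("ProtA", 2), ("ProtA", 9), ("ProtA", 10)}
    print(residue_fractions(observed, proteins))   # S about 0.667, T about 0.333
    print(prediction_recall(observed, predicted))  # 2 of 3 observed sites predicted
    print(flanking_window(proteins["ProtA"], 6, width=2))  # APSEK
```

Applied to the study's own counts, the same ratio shows that NetOGlyc4.0 flagged 22,608 of 68,802 S/T residues, about 32.9%, as likely O-glycosites in the 595 identified glycoproteins.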
Next, the researchers compared the 2,154 identified O-GalNAc sites with a mouse O-GlcNAc site database and with the phosphorylation sites in PhosphoSitePlus (100,795 sites). At the C-terminus of nucleobindin-2 (Nucb2), serine 408 could be modified by O-GalNAc, O-GlcNAc, or phosphorylation, showing that O-GalNAc sites can overlap with both O-GlcNAc and phosphorylation sites.
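Cross-referencing sites from different PTM resources amounts to inverting several site sets into a per-site record of which modifications were reported. The Python sketch below assumes every source has already been mapped to a common, hypothetical `(gene, residue, position)` key on the same protein isoform; in practice that mapping is the hard part. Apart from Nucb2 S408, which comes from the study, the entries are invented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

SiteKey = Tuple[str, str, int]  # (gene, residue, position); assumed common numbering


def modifications_by_site(datasets: Dict[str, Set[SiteKey]]) -> Dict[SiteKey, Set[str]]:
    """Invert {ptm_type: sites} into {site: PTM types reported at that site}."""
    by_site: Dict[SiteKey, Set[str]] = defaultdict(set)
    for ptm, sites in datasets.items():
        for site in sites:
            by_site[site].add(ptm)
    return dict(by_site)


def shared_sites(datasets: Dict[str, Set[SiteKey]], required: Set[str]) -> List[SiteKey]:
    """Sites reported for every PTM type in `required`, sorted for stable output."""
    return sorted(
        site for site, ptms in modifications_by_site(datasets).items() if required <= ptms
    )


if __name__ == "__main__":
    datasets = {
        "O-GalNAc": {("Nucb2", "S", 408), ("ProtX", "T", 55)},
        "O-GlcNAc": {("Nucb2", "S", 408)},
        "phospho": {("Nucb2", "S", 408), ("ProtX", "T", 55)},
    }
    print(shared_sites(datasets, {"O-GalNAc", "O-GlcNAc", "phospho"}))
    # [('Nucb2', 'S', 408)]
```

A site that appears in several resources is only a candidate for crosstalk; the resources record that each modification was observed at that residue, not that the modifications coexist or compete in the same tissue.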
To investigate tissue-specific regulation of O-glycosylation, the researchers integrated quantitative proteomic and glycoproteomic data from eight mouse tissues to assess the abundance of glycoproteins and of their O-glycosylation sites. For 95 glycoproteins, correlation scores were high across tissues but significantly lower within individual tissues, suggesting that O-glycosylation is regulated independently of glycoprotein abundance. Hierarchical clustering of the tissue-specific O-glycosite profiles then produced clustering patterns similar to those of the proteomic data. Further analysis suggested that differences in the relative abundance of glycosylation enzymes across tissues may contribute to the tissue-specific features of O-glycosylation.
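One common way to ask whether a glycosite simply tracks its protein is to correlate the two abundance profiles across tissues. The Python sketch below computes a Pearson correlation on log2 abundances; it is a generic illustration, not the scoring scheme the authors used (the source does not describe it), and the tissue values are invented.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; NaN if either profile is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with at least two points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def site_protein_correlation(site: Dict[str, float], protein: Dict[str, float]) -> float:
    """Correlate log2 glycosite and protein abundance over tissues quantified in both."""
    tissues = sorted(site.keys() & protein.keys())
    return pearson(
        [math.log2(site[t]) for t in tissues],
        [math.log2(protein[t]) for t in tissues],
    )


if __name__ == "__main__":
    protein = {"liver": 4.0, "kidney": 8.0, "brain": 16.0}
    tracking = {"liver": 2.0, "kidney": 4.0, "brain": 8.0}   # rises with the protein
    decoupled = {"liver": 8.0, "kidney": 4.0, "brain": 2.0}  # falls as the protein rises
    print(site_protein_correlation(tracking, protein))   # 1.0
    print(site_protein_correlation(decoupled, protein))  # -1.0
```

A glycosite with a low correlation while its protein is well quantified is a candidate for site-level regulation, for example by tissue-specific enzyme levels; with only eight tissues, however, such coefficients are noisy and should be read cautiously.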
Patients with congenital disorders of glycosylation involving GalNAc transferase 2 (GalNAc-T2) show abnormal lipid, energy, and metabolic phenotypes, which may stem from defective O-glycosylation of GalNAc-T2 substrates in the liver. To elucidate the underlying mechanisms, the researchers performed quantitative glycoproteomic and proteomic analyses of livers from wild-type and Galnt2-deficient mice. In Galnt2−/− samples, 82 O-glycosylation sites on 62 glycoproteins were significantly reduced, marking them as candidate GalNAc-T2 substrates. These included previously reported sites such as serine 483 of phospholipid transfer protein (Pltp), threonine 267 of alpha-2-HS-glycoprotein, serine 48 of gelsolin (Gsn), and threonine 104 of Igh3, supporting the reliability of this approach for mapping GalNAc-T2-specific O-glycosylation sites. STRING network analysis showed that glycoproteins with altered O-glycosite abundance were involved in plasma lipoprotein particle remodeling, regulation of catalytic activity, and system development, whereas proteins whose overall abundance changed were involved in lipid and fatty acid metabolism, oxidoreductase activity, and general metabolic processes. Finally, the researchers showed that the ISOGlyP algorithm accurately predicted the experimentally identified GalNAc-T2 substrates.
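Flagging glycosites that drop in knockout tissue is, at its core, a fold-change and variance filter. The Python sketch below uses the log2 fold change plus a Welch t statistic with arbitrary cutoffs; the thresholds, site labels, and intensities are hypothetical, it does not convert t to a p-value or correct for multiple testing, and it is not the statistical procedure reported in the study.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for mean(a) - mean(b); needs at least two values per group."""
    diff = mean(a) - mean(b)
    se2 = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    if se2 == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def reduced_sites(
    wt: Dict[str, List[float]],
    ko: Dict[str, List[float]],
    min_log2_drop: float = 1.0,  # hypothetical cutoff: at least 2-fold lower in KO
    min_abs_t: float = 3.0,      # hypothetical cutoff on the Welch t statistic
) -> List[Tuple[str, float]]:
    """Sites with lower KO than WT intensity, as (site, log2 fold change), most reduced first."""
    hits = []
    for site in wt.keys() & ko.keys():  # sites lacking values in either group are skipped
        log_wt = [math.log2(v) for v in wt[site]]
        log_ko = [math.log2(v) for v in ko[site]]
        log2fc = mean(log_ko) - mean(log_wt)
        if log2fc <= -min_log2_drop and welch_t(log_ko, log_wt) <= -min_abs_t:
            hits.append((site, log2fc))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    wt = {"ProtA-T12": [100.0, 110.0, 95.0], "ProtB-S40": [50.0, 52.0, 49.0]}
    ko = {"ProtA-T12": [20.0, 25.0, 18.0], "ProtB-S40": [51.0, 48.0, 50.0]}
    print(reduced_sites(wt, ko))  # only ProtA-T12 passes both cutoffs
```

Because the knockout can also change protein levels, a fuller analysis would compare each glycosite change against the matching protein-level change from the proteomics data before calling the site a GalNAc-T2 substrate.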
In summary, by mapping O-glycosylation and proteomic profiles across nine tissues and blood in mice, this study revealed tissue-specific regulation of O-glycoproteins. Comparing this dataset with glycoproteomic data from a mouse model of a congenital disorder of glycosylation confirmed the reliability of mapping specific O-glycosylation sites and demonstrated the value of this approach for investigating the mechanisms of metabolic disorders. The study provides a rich dataset and worked examples for O-glycosylation research, and it suggests that building proteomic and PTM-omic profiles is an effective way to comprehensively explore the mechanisms underlying biological processes.
Reference
- Yang, Weiming, et al. "Quantitative mapping of the in vivo O-GalNAc glycoproteome in mouse tissues identifies GalNAc-T2 O-glycosites in metabolic disorder." Proceedings of the National Academy of Sciences 120.43 (2023): e2303703120.