Analysis of N-terminus and C-terminus in Protein: A Comprehensive Guide
Proteins are composed of linear chains of amino acids, and each polypeptide chain has two distinct ends: the N-terminus, which carries a free amino group, and the C-terminus, which carries a free carboxyl group. These termini are essential for the protein's function, stability, and interaction with other molecules. The analysis of these termini is crucial in protein engineering, drug development, and quality control of biopharmaceuticals.
Research on Protein Physiological Functions
Analyzing the N-terminus and C-terminus of proteins is crucial due to the significant roles these terminal regions play in protein functionality, localization, and stability. The diversity of protein forms generated from a single gene through mechanisms such as co- or post-translational modifications, alternative splicing, and alternative translation initiation makes terminal analysis indispensable for understanding protein biology.
Verification of Protein Terminal Modifications
Precise chemical modification of proteins can improve their physicochemical properties and endow them with new physiological functions, such as extending protein half-life and regulating protein interactions. The N-terminus of a protein or peptide drug is important because it can influence the site and efficacy of the drug's action. Changes at the N-terminus, including alteration and modification of the N-terminal amino acids, are common during drug manufacturing and storage. Compared with the modification techniques available for protein side chains and N-termini, precise strategies for C-terminal modification are still significantly lacking. When modifying a protein's N-terminus or C-terminus, it is essential to monitor and verify the terminal sequence to confirm that the sample has been modified correctly.
Terminomics Analysis (High Throughput)
With the rise of proteogenomics, re-annotating genomes using proteomics data has become a research hotspot. Employing proteomics methods for large-scale identification and analysis of protein termini helps verify and correct annotated genes and even discover new genes. Over the past 20 years, mass spectrometry technology has developed rapidly, enabling more in-depth proteomic studies of protein termini. As terminal enrichment techniques have been established and refined, large-scale sequencing of protein termini has gradually matured.
Quality Control of Biopharmaceutical Products
Analyzing the N- and C-termini is vital across a broad spectrum of biological products, including proteins, antibodies, vaccines, polypeptides, and recombinant collagen. Such analysis ensures the structural integrity and biological activity of these products. For instance, in monoclonal antibody (mAb) quality control, identifying and quantifying terminal amino acid sequences in light and heavy chains is crucial. Multiple terminal sequences must be accurately determined to ensure product consistency and efficacy.
Advances in proteomics have introduced diverse technologies for terminal analysis, each offering unique benefits and insights.
Currently, there are two primary methods for N-terminal sequencing of proteins.
The first method is Edman degradation, which sequentially cleaves and analyzes a limited number of amino acids from the N-terminus. The amino acid released in each cycle is identified by comparing its chromatographic retention time with those of amino acid standards, which yields the N-terminal sequence residue by residue.
The second method is mass spectrometry, where the protein is denatured, reduced, and enzymatically digested into peptides of varying sizes. These peptides are then separated by liquid chromatography and enter a high-resolution mass spectrometer as they elute at different retention times. The measured m/z values of the peptides are compared against the values predicted from a proposed sequence.
Depending on the sample conditions and how much sequence information is required, the two methods are used individually or in combination.
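The mass spectrometry comparison described above can be illustrated with a minimal sketch. It assumes trypsin as the digestion enzyme (cleavage after K or R, but not before P), unmodified residues, and a made-up proposed sequence; the function names and the 10 ppm tolerance are illustrative choices, not part of any particular software package.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # added once per positive charge


def tryptic_digest(sequence):
    """Split after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """Theoretical m/z of an unmodified peptide at the given charge."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def within_ppm(observed, theoretical, tol_ppm=10.0):
    """True if the observed m/z agrees with theory within tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


# Hypothetical proposed sequence, used only for illustration.
proposed = "DIQMTQSPSSLSASVGDRVTITAK"
n_term_peptide = tryptic_digest(proposed)[0]   # first peptide = N-terminal
expected_mz = peptide_mz(n_term_peptide, charge=2)
```

An observed precursor m/z would then be checked with `within_ppm(observed, expected_mz)`. A real workflow would also account for alkylated cysteines, missed cleavages, and terminal modifications, which this sketch leaves out.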
Principle
Edman degradation determines the amino acid sequence of a peptide starting from the free N-terminus. Under mildly alkaline conditions, phenylisothiocyanate (PITC) reacts with the N-terminal amino group of the protein or peptide to form a phenylthiocarbamoyl (PTC) derivative. Treatment with anhydrous acid then causes cyclization and selective cleavage of the N-terminal residue, releasing it as an anilinothiazolinone (ATZ) derivative and leaving a peptide shortened by one residue, which enters the next cycle. The ATZ derivative is extracted with an organic solvent and converted under aqueous acidic conditions into a stable phenylthiohydantoin (PTH) derivative. This PTH derivative is analyzed by HPLC to identify the amino acid. Each cycle identifies one amino acid.
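The per-cycle identification step can be sketched as a retention-time lookup. The retention times below are invented placeholders, not values from any real column or sequencer, and the tolerance is an arbitrary assumption; the sketch only shows how one HPLC peak per cycle translates into one residue per cycle.

```python
# Hypothetical retention times (minutes) of PTH-amino acid standards.
# Real values depend on the column, gradient, and instrument.
PTH_STANDARD_RT = {
    "D": 4.2, "E": 5.1, "S": 6.0, "G": 7.3,
    "A": 9.0, "V": 13.4, "L": 16.8, "K": 18.1,
}


def call_residue(observed_rt, standards=PTH_STANDARD_RT, tolerance=0.15):
    """Return the standard whose retention time is closest to the peak,
    or "X" if no standard lies within the tolerance."""
    best = min(standards, key=lambda aa: abs(standards[aa] - observed_rt))
    if abs(standards[best] - observed_rt) <= tolerance:
        return best
    return "X"


def read_cycles(cycle_rts):
    """One main PTH peak per Edman cycle gives one residue per cycle."""
    return "".join(call_residue(rt) for rt in cycle_rts)


# Main peak retention time observed in each of five cycles (made up).
cycles = [4.25, 13.35, 9.05, 16.9, 11.0]
sequence = read_cycles(cycles)
```

Here the fifth cycle has no standard within tolerance, so it is reported as the unknown residue `X` rather than forced onto the nearest standard.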
Procedure
Figure 1. Combination of Edman degradation with LC-MS workflow. (Anna A. et al., 2013)
Instrumentation
Based on the steps of Edman amino acid sequencing, instrument companies have developed protein sequencers, which include two main modules: a sample processing module and a liquid chromatography module. The sample processing module automates coupling, cleavage, extraction, and conversion, while the liquid chromatography module automatically analyzes the obtained PTH-amino acids, identifying the amino acid types.
Advantages
Disadvantages
Principle
The most widely used method for protein sequencing currently is mass spectrometry (MS). MS-based protein sequencing strategies are broadly categorized into two main approaches: Top-Down and Bottom-Up.
Top-Down Strategy: This approach analyzes intact proteins directly using liquid chromatography-mass spectrometry (LC-MS) without prior enzymatic digestion. The intact protein ions are fragmented inside the mass spectrometer, and the sequence is determined from the fragment ions observed in the mass spectra.
Bottom-Up Strategy: In this approach, proteins are first hydrolyzed into peptides. These peptides are then analyzed using LC-MS, where they are sequenced de novo and their sequences are pieced together to reconstruct the complete protein sequence.
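Both strategies ultimately read sequence from fragment ions. As a rough sketch, the code below computes singly charged b- and y-ion m/z values for an unmodified peptide and then reads residues back from the mass gaps between consecutive b ions, which is the basic idea behind de novo sequencing. The peptide is a made-up example, the function names are illustrative, and leucine and isoleucine are reported together as `L` because they have identical masses.

```python
# Monoisotopic residue masses (Da); "L" stands for both L and I here.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged b ions (N-terminal fragments), b1 to b(n-1)."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def y_ions(peptide):
    """Singly charged y ions (C-terminal fragments), y1 to y(n-1)."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def read_b_ladder(b_mz, tol=0.01):
    """Read residues from the gaps between consecutive b ions."""
    residues, previous = [], PROTON
    for mz in sorted(b_mz):
        gap = mz - previous
        match = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        residues.append(match[0] if match else "?")
        previous = mz
    return "".join(residues)


peptide = "DVQMTK"                       # hypothetical tryptic peptide
ladder = read_b_ladder(b_ions(peptide))  # N-terminal residues read from b ions
```

A complete b-ion series recovers every residue except the last one, which is read from the y-ion series or the precursor mass. Real spectra have missing and noisy peaks, so practical de novo tools score many candidate ladders rather than reading a single clean one.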
Procedure
Top-Down Approach
Bottom-Up Approach
Figure 2. CID MS/MS spectra of the N-terminal tryptic peptide of the light chain for mAb1. (Malgorzata Monika et al., 2019)
Instrumentation
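High-performance liquid chromatography-tandem mass spectrometry (LC-MS/MS) consists of two main components: a liquid chromatography system that separates the proteins or peptides, and a tandem mass spectrometer that measures and fragments them.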
Advantages
Disadvantages
Principle
Carboxypeptidases are exopeptidases that specifically hydrolyze peptide bonds at the C-terminus of proteins, releasing amino acids one at a time. The number and type of amino acids released vary with reaction time. By analyzing the amount of amino acids released over time, the sequence of amino acids at the C-terminus can be determined.
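The time-course reasoning can be sketched as follows: residues that reach a given release level earliest are assumed to lie closest to the C-terminus. The data, the 0.5 mol/mol threshold, and the function name are all hypothetical, and the sketch assumes each residue type occurs only once near the C-terminus and that the enzyme removes all residues at comparable rates, which real carboxypeptidases often do not.

```python
def c_terminal_order(times, released, threshold=0.5):
    """Rank residues by the first time point at which their release
    (mol amino acid per mol protein) reaches the threshold.
    Returns residues ordered from the C-terminus inward."""
    def first_crossing(series):
        for t, amount in zip(times, series):
            if amount >= threshold:
                return t
        return float("inf")  # never reached the threshold

    ranked = sorted(
        released,
        key=lambda aa: (first_crossing(released[aa]), -released[aa][-1]),
    )
    return [aa for aa in ranked if first_crossing(released[aa]) != float("inf")]


# Hypothetical digestion time course (minutes) and release amounts.
times = [1, 5, 15, 30]
released = {
    "K": [0.6, 0.9, 1.0, 1.0],
    "G": [0.1, 0.6, 0.9, 1.0],
    "P": [0.0, 0.1, 0.55, 0.8],
    "S": [0.0, 0.0, 0.1, 0.3],
}

order = c_terminal_order(times, released)   # C-terminus first
c_term_sequence = "".join(reversed(order))  # written N-to-C
```

Residues that never reach the threshold, such as `S` in this made-up data set, are left unassigned rather than guessed.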
Procedure
Instrumentation
The analysis method depends on the approach used:
Advantages
Disadvantages
Principle
MS-based methods for C-terminal sequencing of proteins follow two main approaches: top-down and bottom-up. The bottom-up approach is widely used due to its practical advantages, including higher sensitivity for low-abundance proteins, easier handling of complex protein mixtures, and compatibility with existing proteomic workflows. In the bottom-up approach, proteins are enzymatically digested into peptides, which are then analyzed by mass spectrometry (MS). The bottom-up approach can be further divided into two main strategies:
Labeling Strategies
Labeling strategies involve chemically modifying peptides to introduce tags or labels that aid in their detection and sequencing by MS. These strategies enhance the ionization efficiency and distinguish C-terminal peptides from other peptide fragments in the mixture. Examples include:
Enrichment Strategies
Enrichment strategies focus on isolating C-terminal peptides from complex peptide mixtures before MS analysis. This selective enrichment improves sensitivity and specificity in identifying C-terminal sequences. Common enrichment methods include:
Procedure
Labeling Strategies and Enrichment Strategies each encompass a highly diverse set of methods. Here, one straightforward approach from each category is outlined:
Labeling Strategies
Figure 3. Schematic representation of methods for C-terminal labeling. (A) Protease-assisted 18O-labeling. (B) Method to differentiate C-terminal peptides in cyanogen bromide digests. (C) Isotopic arginine labeling based on the oxazolone chemistry. (Sebastian et al., 2015)
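For protease-assisted 18O-labeling, one common way to read the data is that peptides produced by cleavage take up 18O at their new C-terminal carboxyl group, while the protein's original C-terminal peptide does not. When digestion is performed in a mixture of normal and 18O-enriched water, labeled peptides therefore appear as mass pairs and the C-terminal peptide as a single mass. The sketch below flags such unpaired masses. It assumes a list of already deisotoped monoisotopic neutral masses; the masses, tolerance, and function name are hypothetical.

```python
# Mass difference between 18O and 16O (Da).
DELTA_18O = 2.004246


def unlabeled_singlets(masses, tol=0.005):
    """Return masses with no partner shifted by one or two 18O atoms.
    Such unpaired peptides are candidates for the original C-terminus."""
    singlets = []
    for i, mass in enumerate(masses):
        paired = any(
            abs(abs(other - mass) - n * DELTA_18O) <= tol
            for j, other in enumerate(masses) if j != i
            for n in (1, 2)
        )
        if not paired:
            singlets.append(mass)
    return singlets


# Hypothetical deisotoped peptide masses (Da) from a 1:1 16O/18O digest.
observed = [
    1000.500, 1000.500 + DELTA_18O,        # internal peptide, one 18O
    1500.700, 1500.700 + 2 * DELTA_18O,    # internal peptide, two 18O
    1200.600,                              # no partner
]
candidates = unlabeled_singlets(observed)
```

Unpaired masses are only candidates: incomplete labeling or a missing partner peak can also produce singlets, so the assignment would normally be confirmed by MS/MS.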
Enrichment Strategies
The COFRADIC technology exploits peptide chromatography to specifically isolate peptides of interest from complex peptide mixtures.
Figure 4. Schematic representation of C-terminal enrichment strategies. C-terminal COFRADIC (A) and C-TAILS (B) represent negative selection strategies for C-terminal peptide enrichment. On the other hand, ProC-TEL (C) applies a positive selection to isolate C-termini. (Sebastian et al., 2015)
Instrumentation
MS-based C-terminal sequencing requires advanced instrumentation:
Advantages
Disadvantages
Understanding the N-terminus and C-terminus is fundamental in the field of protein biochemistry. At Creative Proteomics, we leverage this knowledge to explore innovative solutions in biotechnology, enhancing protein function, stability, and interaction capabilities. The distinct roles and structural characteristics of these termini underscore their importance in protein biology, providing a foundation for advanced research and application in various scientific and industrial domains.
References