Clustering analysis is a widely used statistical method that divides observed datasets into a small number of subclasses, or clusters, on the basis of a selected statistical distance function. Clustering algorithms can be divided into two categories: hierarchical and non-hierarchical methods. Both types of algorithm partition the observed datasets into subgroups so that datasets with similar metabolomic profiles are placed in the same subgroup.
Hierarchical clustering (HCL) aligns datasets by generating dendrograms using the following procedures:
- Calculate the similarity between each pair of samples using a specific metric, such as Pearson correlation, Euclidean distance, mutual information, or covariance;
- Align the most similar samples as neighbors or pair them as a single cluster;
- Repeat steps 1 and 2 until all samples are aligned.
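The three steps above can be sketched in plain Python. This is only an illustrative sketch, not the procedure of any particular software package: the function names `pearson_distance` and `hierarchical_clustering` are hypothetical, the distance is taken as 1 minus the Pearson correlation, and average linkage is assumed for comparing clusters that already contain several samples (the text above does not specify a linkage rule). The sketch also assumes that no sample profile is constant, since the correlation is undefined in that case.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation, so similar profiles get a small distance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def hierarchical_clustering(samples):
    """Agglomerative clustering; returns the merge history (a flat dendrogram)."""
    # Step 1: distance between every pair of samples, computed once.
    dist = {
        frozenset((i, j)): pearson_distance(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }
    clusters = [(i,) for i in range(len(samples))]
    merges = []

    def linkage(a, b):
        # Average linkage (an assumption): mean distance over all cross pairs.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    # Step 3: repeat until everything is joined into a single cluster.
    while len(clusters) > 1:
        # Step 2: find the two most similar clusters and merge them.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


if __name__ == "__main__":
    # Four made-up profiles: two rising and two falling.
    profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4.5, 2]]
    for left, right, height in hierarchical_clustering(profiles):
        print(left, right, round(height, 4))
```

Each entry of the returned list records which two clusters were joined and at what distance, which is the information a dendrogram displays. For real datasets an established library routine would normally be preferred over this quadratic-memory sketch.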
Non-hierarchical clustering (non-HCL) also divides data into subclasses but without any hierarchical organization. The K-means and fuzzy c-means approaches are typical examples of non-HCL. In the K-means method, k data points are initially chosen at random to serve as the cluster means, and each pattern is assigned to the cluster with the nearest mean. A new mean is then computed for each cluster and the patterns are reassigned to the new means. This process is repeated until the cluster means stabilize, that is, until no pattern moves from one cluster to another. The K-means method assigns each data point to exactly one cluster, whereas the fuzzy c-means method allows a data point to belong to several clusters and calculates a degree of membership (often interpreted as a probability) of the data point in each cluster. These analyses are widely applied when the grouping of the samples is not known in advance, and can be applied to single-snapshot profiling as well as time-course data.
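The K-means loop and the fuzzy membership calculation can be illustrated with a short plain-Python sketch. Several things here are assumptions rather than details from the text: the function names `k_means` and `fuzzy_memberships` are hypothetical, Euclidean distance is used, and the fuzzifier `m` defaults to 2. The second function only computes the standard fuzzy c-means membership degrees of one point for fixed cluster centers; a full fuzzy c-means run would alternate this step with re-estimating the centers.

```python
import random
from math import dist  # Euclidean distance, available from Python 3.8


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    """Basic K-means (Lloyd's algorithm); returns (means, labels)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # k random data points as initial means
    labels = None
    for _ in range(max_iter):
        # Assign every point to the cluster with the nearest mean.
        new_labels = [min(range(k), key=lambda c: dist(p, means[c])) for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Recompute each mean; keep the old one if a cluster became empty.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                means[c] = mean_point(members)
    return means, labels


def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership degrees of one point; the degrees sum to 1."""
    d = [dist(point, c) for c in centers]
    if 0.0 in d:  # the point coincides with one or more centers
        hits = d.count(0.0)
        return [1.0 / hits if x == 0.0 else 0.0 for x in d]
    # Weight each cluster by distance^(-2/(m-1)), then normalize.
    weights = [x ** (-2.0 / (m - 1.0)) for x in d]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Made-up two-feature samples forming two groups.
    data = [(1.0, 1.0), (1.2, 1.2), (0.8, 0.8), (5.0, 5.0), (5.2, 5.2), (4.8, 4.8)]
    means, labels = k_means(data, k=2)
    print(labels)
    print(fuzzy_memberships((2.0, 2.0), means))
```

Note that `k` must be supplied by the caller, and that the result of K-means can depend on the random initial means, so in practice the algorithm is often run several times with different seeds.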
Clustering analysis services provided by Creative Proteomics include:
How to place an order:
*If your organization requires a signed confidentiality agreement, please contact us by email.