Glossary

Accuracy

Measures the overall classification performance. For example, in a sample set consisting of X samples of phenotype 1 and Y samples of phenotype 2, if the classification correctly predicts A samples of phenotype 1 and B samples of phenotype 2, accuracy is defined as (A+B)/(X+Y). See also Sensitivity and Specificity.
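
The ratio above can be computed directly. The following is a minimal sketch; the function name and the example counts are illustrative assumptions, not values from any real data set.

```python
def accuracy(a: int, b: int, x: int, y: int) -> float:
    """Return (A + B) / (X + Y): the fraction of all samples classified correctly.

    a, b: correctly predicted samples of phenotype 1 and phenotype 2
    x, y: total samples of phenotype 1 and phenotype 2
    """
    total = x + y
    if total == 0:
        raise ValueError("the sample set is empty")
    return (a + b) / total


# Hypothetical counts: 45 of 50 phenotype-1 samples and 40 of 50
# phenotype-2 samples are predicted correctly.
example_accuracy = accuracy(a=45, b=40, x=50, y=50)
print(example_accuracy)
```

Because accuracy pools both phenotypes, it can hide poor performance on the smaller class when X and Y differ greatly.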

Algorithm

A set of mathematical rules for solving complex problems with the aid of computer technology. Correlogic develops algorithms as computational tools to understand complex biological data.

Bayesian Net

Bayesian nets consist of a collection of Bayesian classifiers connected in a manner resembling a neural net. A Bayesian net uses adjusted probabilities to arrive at an answer, whereas neural nets use non-linearly transformed dot products.

Bioinformatics

The scientific discipline that encompasses all aspects of biological information acquisition, processing, storage, distribution, analysis, and interpretation. It combines the tools of mathematics, computer science, and biology with the aim of understanding the biological significance of a variety of data.

Biomarker

A specific biochemical in the body that has a particular molecular feature making it useful for measuring the progress of disease or the effects of treatment.

Centroid

The center of a cluster.

Cluster Homogeneity

In a perfect classification, each cluster would be composed of only a single phenotype. In practice no classification is perfect, and each cluster is a mix of phenotypes, usually with one phenotype predominating. The cluster homogeneity is the percentage of the dominant phenotype in a cluster.
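
As a minimal sketch of this definition, the percentage of the dominant phenotype can be computed from the phenotype labels of a cluster's members. The function name and the labels "disease" and "healthy" are illustrative assumptions.

```python
from collections import Counter


def cluster_homogeneity(phenotypes):
    """Return (dominant phenotype, its percentage) for one cluster's members."""
    if not phenotypes:
        raise ValueError("the cluster has no members")
    counts = Counter(phenotypes)
    dominant, count = counts.most_common(1)[0]
    return dominant, 100.0 * count / len(phenotypes)


# Hypothetical cluster: eight samples of one phenotype and two of another.
members = ["disease"] * 8 + ["healthy"] * 2
print(cluster_homogeneity(members))
```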

Decision Boundary

The decision boundary defines the edge of a cluster. If the cluster is a spherical one, the decision boundary would be the set of points at a fixed distance (radius) from the centroid.
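
For the spherical case, membership reduces to a distance comparison. The sketch below assumes Euclidean distance, which the definition does not specify, and treats points exactly on the boundary as inside; the function name is illustrative.

```python
import math


def inside_boundary(point, centroid, radius):
    """Return True if the point lies on or within the spherical decision boundary.

    Euclidean distance is an assumption here; another metric could be used.
    """
    return math.dist(point, centroid) <= radius


# Hypothetical two-feature cluster centered at the origin with radius 1.5.
print(inside_boundary((1.0, 1.0), (0.0, 0.0), 1.5))  # distance is about 1.41
print(inside_boundary((2.0, 2.0), (0.0, 0.0), 1.5))  # distance is about 2.83
```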

Feature

The name given to the index value of a datastream. For example, in a mass spectrometer, the features are the m/z values; in NMR, the features are the chemical shift values; in an expression array, the features are the gene names.

Genetic Algorithm (GA)

A genetic algorithm (GA) is a technique used to search through large, complex data sets to rapidly identify near-optimal solutions. Genetic algorithms are most effective in high-dimensional spaces, where linear statistical analyses lose their power.

Genomics

The study of the human genome—the entire genetic composition of each individual. This discipline and its sibling, “Proteomics,” are the cutting edge of the biotechnology revolution, allowing scientists to delve further than ever before into the nature and origin of human physiology and disease.

GC-FAIMS

A separation technique that uses a serial combination of gas chromatography and High-Field Asymmetric Waveform Ion Mobility Spectrometry.

GC-MS

A separation technique that uses a serial combination of gas chromatography and mass spectrometry.

Heuristics

A learning method employing experimentation, evaluation, and trial and error to learn, discover, understand, or solve problems. A heuristic is a rule. The KDE is a rule-finding algorithm.

KDE

The Knowledge Discovery Engine®: Correlogic’s patented algorithm for efficient pattern discovery in highly complex systems.

kNN

K-nearest neighbor analysis. A kNN model is a map of a corpus of known data vectors. When an unknown data vector is presented to the model, a score is produced based on the k known vectors nearest to the unknown vector. For example, if k=7 and the seven vectors nearest to an unknown vector consist of three in state 0 and four in state 1, the score returned is 4/7, or 0.5714.
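
The scoring step can be sketched as follows. This is a minimal illustration, assuming Euclidean distance, states coded as 0 and 1, and an unweighted vote; the function name and the data are hypothetical.

```python
import math


def knn_score(unknown, known_vectors, known_states, k):
    """Return the fraction of the k nearest known vectors that are in state 1."""
    if k < 1 or k > len(known_vectors):
        raise ValueError("k must be between 1 and the number of known vectors")
    # Rank the known vectors by distance to the unknown vector.
    order = sorted(
        range(len(known_vectors)),
        key=lambda i: math.dist(unknown, known_vectors[i]),
    )
    nearest = order[:k]
    return sum(known_states[i] for i in nearest) / k


# Hypothetical one-feature corpus: the seven vectors nearest to 0.0
# hold three state-0 and four state-1 samples.
vectors = [(0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.6,), (0.7,), (5.0,), (6.0,)]
states = [0, 1, 0, 1, 0, 1, 1, 0, 0]
print(round(knn_score((0.0,), vectors, states, k=7), 4))
```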

LC-MS

A separation technique that uses a serial combination of liquid chromatography and mass spectrometry.

Lead Cluster Map

A fast clustering technology employed during a KDE modeling process.

Logical Chromosome

A unique combination of features.

Mass Spectrometer

An analytical instrument that ionizes samples and separates the resulting ions based on their mass-to-charge ratio.

Metabolomics

An extension of proteomics in that proteins catalyze biochemical reactions that either produce or consume small molecules, or metabolites. Disease processes affect metabolites in ways characteristic of the disease, e.g., hepatitis, myocardial disease. A global understanding of metabolite dynamics could lead to better diagnosis and treatment of disease.

Model

A KDE model is a collection of clusters and their decision radii in N dimensions, where N is the number of features in the chromosome forming the map.
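
One way to picture such a model is as a list of centroids, each with a decision radius. The sketch below is an illustration only and is not Correlogic's implementation: the `Cluster` class, the `classify` function, the Euclidean metric, and the rule of choosing the nearest enclosing cluster are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cluster:
    centroid: Sequence[float]  # one coordinate per feature (N values)
    radius: float              # decision radius around the centroid
    phenotype: str             # hypothetical label for the dominant phenotype


def classify(model: Sequence[Cluster], vector: Sequence[float]) -> Optional[str]:
    """Return the phenotype of the nearest cluster whose radius encloses the vector.

    Returns None when the vector falls outside every cluster.
    """
    best = None
    best_distance = math.inf
    for cluster in model:
        distance = math.dist(vector, cluster.centroid)
        if distance <= cluster.radius and distance < best_distance:
            best, best_distance = cluster, distance
    return best.phenotype if best is not None else None


# Hypothetical model with N = 2 features and two clusters.
model = [
    Cluster(centroid=(0.0, 0.0), radius=1.0, phenotype="phenotype 1"),
    Cluster(centroid=(5.0, 5.0), radius=1.5, phenotype="phenotype 2"),
]
print(classify(model, (0.5, 0.2)))
print(classify(model, (10.0, 10.0)))
```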

Neural Net

An artificial neural net is a supervised non-linear modeling algorithm based on a theoretical conception of biological memory and learning. Artificial neural nets are considered to be universal function approximators.

Normalize

A method of scaling data to a common dynamic range.
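
Min-max scaling is one common way to bring data to a shared range. The sketch below assumes that method and a target range of 0 to 1; the definition above does not say which scaling is used, and the function name is illustrative.

```python
def min_max_normalize(values):
    """Scale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    if not values:
        raise ValueError("no values to normalize")
    low, high = min(values), max(values)
    if high == low:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical intensities from one data stream.
print(min_max_normalize([2.0, 4.0, 6.0]))
```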

NMR

Nuclear Magnetic Resonance. An NMR spectrum provides specific qualitative information regarding a chemical or mixture of chemicals.

Phenotype

The classification state given to a sample. A phenotype is a reflection of the expression of one or more genes.

PCA

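Principal Component Analysis is a statistical method that re-expresses data as a small number of uncorrelated components capturing most of its variance. It is a linear technique, although it is often applied to non-linear, complex data.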

Protein Separation and Sequencing Equipment

The type of specialized equipment that generates protein data for analysis by the algorithms forming the basis of Correlogic’s proprietary software.

Proteomics

The study of proteins and their interactions. Much as the goal of genomics was structural and functional knowledge of the entire set of human genes, the goal of proteomics is the identification and characterization of all human proteins. However, the number of proteins may be indeterminate. Proteins undergo post-synthesis modifications before they become functional. The nature of these modifications and their products increases the complexity of proteomics far beyond that encountered in genomics.

Proteome Quest

A software realization of the KDE algorithm used to analyze data streams generated from protein profiles.

PSA Test

Prostate-Specific Antigen test, the most widely used test for the detection of prostate cancer.

Robustness

Classification robustness is a measure of how accurate a classification scheme generated on one data set will be on a completely independent data set. Classifications that retain accuracy when challenged by additional independent data are considered robust. Classifications that lose accuracy when challenged by additional independent data sets lack robustness.

Self-Organizing Map (SOM)

An unsupervised learning method that uses the data in a training set to define a two-dimensional surface on which dense data is spread out to reveal hidden detail while sparse data retains its identity.

Sensitivity and Specificity

Measures of classification performance. In a binary classification these values measure the accuracy of prediction of each phenotype. For example, in a sample set consisting of X samples of phenotype 1 and Y samples of phenotype 2, if the classification correctly predicts A samples of phenotype 1 and B samples of phenotype 2, sensitivity may be defined as A/X and specificity as B/Y. See also Accuracy.
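
Both ratios can be computed from the same four counts. This is a minimal sketch; the function name and the example counts are illustrative assumptions, and phenotype 1 is treated as the class whose detection rate is the sensitivity, as in the definition above.

```python
def sensitivity_specificity(a: int, b: int, x: int, y: int):
    """Return (A / X, B / Y) for a binary classification.

    a, b: correctly predicted samples of phenotype 1 and phenotype 2
    x, y: total samples of phenotype 1 and phenotype 2
    """
    if x == 0 or y == 0:
        raise ValueError("both phenotypes must have at least one sample")
    return a / x, b / y


# Hypothetical counts: 45 of 50 phenotype-1 samples and 40 of 50
# phenotype-2 samples are predicted correctly.
sensitivity, specificity = sensitivity_specificity(a=45, b=40, x=50, y=50)
print(sensitivity, specificity)
```

Reporting the two values separately shows which phenotype the classification handles less well, which a single pooled figure cannot.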
