Developmental Therapeutics Program (DTP)
Last Updated: 06/21/23

Milestone (1981)

Hodes model for ranking small-molecule structures

The Hodes clustering model revolutionized the selection of compounds of interest: it measures the novelty of a chemical structure by comparing it with known compounds.

In 1981, the selection of compounds of interest for DTP’s repository was greatly enhanced by a clustering model developed by Louis Hodes. The model measures the novelty of a chemical structure by comparing it with the known compounds in DTP’s structural database [1].

Clustering involves four steps: generating descriptors for each compound in the data set, selecting a similarity measure, applying a clustering method, and analyzing the results. DTP’s clustering model used a nonhierarchical, single-pass method to decide which cluster each compound should be assigned to. The descriptors were structural fragments, each weighted by its number of occurrences in the compound, the size of the fragment, and its frequency of occurrence in the data set. Unlike most clustering models, the Hodes model used an asymmetric coefficient to determine similarity.
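The steps above can be illustrated with a minimal Python sketch. It is not Hodes's published formulation: this page does not give the exact weighting formula or the asymmetric coefficient, so the weight (fragment size times a log inverse-frequency term), the coverage-style coefficient, the threshold, and all function names here are assumptions chosen only to show the general shape of a single-pass, nonhierarchical method.

```python
import math
from collections import Counter


def fragment_weights(compounds, sizes):
    """Hypothetical weighting: larger and rarer fragments count for more.

    compounds: list of dicts mapping fragment name -> occurrences in that compound
    sizes:     dict mapping fragment name -> fragment size (e.g. atom count)
    """
    n = len(compounds)
    # Number of compounds in the data set that contain each fragment.
    doc_freq = Counter(frag for compound in compounds for frag in compound)
    return {frag: sizes[frag] * math.log(1 + n / df) for frag, df in doc_freq.items()}


def asymmetric_similarity(query, reference, weights):
    """Weighted share of the query's fragments that the reference also has.

    Assumed stand-in coefficient. It is asymmetric because it is normalized
    by the query alone, so swapping the arguments can change the result.
    """
    total = sum(weights[f] * count for f, count in query.items())
    if total == 0:
        return 0.0
    shared = sum(weights[f] * min(count, reference.get(f, 0))
                 for f, count in query.items())
    return shared / total


def single_pass_cluster(compounds, weights, threshold):
    """Visit each compound once; join the first cluster whose leader is similar
    enough, otherwise start a new cluster with this compound as its leader."""
    leaders = []  # index of the founding compound of each cluster
    labels = []   # cluster number assigned to each compound
    for i, compound in enumerate(compounds):
        for k, lead in enumerate(leaders):
            if asymmetric_similarity(compound, compounds[lead], weights) >= threshold:
                labels.append(k)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return labels, leaders
```

Because each compound is compared only with existing cluster leaders and is never revisited, the method needs one pass over the data set, which matters for a database of several hundred thousand structures. The result depends on the order in which compounds are visited, a general property of single-pass methods.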

The Hodes model was used not only to determine the novelty of a compound offered for evaluation, but also to determine a representative sample of the entire database, which, at the time, contained structures for more than 350,000 compounds. Representative samples are useful for multiple purposes, including the selection of samples for small screening sets representing the "chemical space" of the entire compound set.
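Both uses can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the DTP implementation: compounds are reduced to unweighted sets of fragment names, and the `coverage` coefficient, the `novelty` score, the greedy selection rule, and the threshold are all hypothetical stand-ins.

```python
def coverage(query, reference):
    """Asymmetric overlap: share of the query's fragments found in the reference."""
    if not query:
        return 0.0
    return len(query & reference) / len(query)


def novelty(candidate, database):
    """1 minus the best coverage of the candidate by any known compound.

    A score near 1 means no known compound shares many of the candidate's
    fragments; a score of 0 means some known compound contains them all.
    """
    best = max((coverage(candidate, known) for known in database), default=0.0)
    return 1.0 - best


def representative_sample(database, threshold):
    """Greedy selection: keep a compound only if no earlier pick covers it."""
    picks = []
    for compound in database:
        if all(coverage(compound, pick) < threshold for pick in picks):
            picks.append(compound)
    return picks
```

In this sketch, a compound offered for evaluation would be scored with `novelty` against the existing database, while `representative_sample` reduces the database to a smaller set in which no retained compound is largely covered by an earlier one.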

[1] Hodes L. Clustering a large number of compounds. 1. Establishing the method on an initial sample. J Chem Inf Comput Sci 1989;29:66–71.

Links:

Clustering a large number of compounds. 1. Establishing the method on an initial sample.

Clustering a large number of compounds. 2. Using the Connection Machine.

Clustering a large number of compounds. 3. The limits of classification.