A Protein Classification Benchmark collection for machine learning

What you can do:
Use standard datasets to compare the protein classification performance of different machine learning methods.
Highlights:
  • This database was created to provide standard datasets for comparing the performance of machine learning methods.
  • The collection contains datasets of sequences and structures, and each dataset is subdivided into positive/negative and training/test sets in several ways.
  • There are 6405 classification tasks in total: 3297 on protein sequences, 3095 on protein structures, and 10 on protein coding regions in DNA.
  • Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems.
  • For hierarchical classification schemes, tasks can be defined at different levels of the hierarchy (for example, classes, folds, or superfamilies).
  • Each dataset comes with distance matrices holding all-vs-all comparisons of its members, computed with various sequence or structure comparison methods. Each dataset also includes a set of classification performance measures obtained with various classifier algorithms.
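The record does not describe the file formats or the exact evaluation protocol. The following is therefore only a rough sketch of how the pieces listed above fit together. It assumes that a task consists of a square all-vs-all distance matrix indexed by item position, binary positive/negative labels, and a training/test split. A positive class is defined at one level of a SCOP-style hierarchy, and the test items are scored with a 1-nearest-neighbour rule, summarized by ROC AUC. The nearest-neighbour classifier and the AUC measure are illustrative choices and may differ from the collection's own classifiers and measures. The function names (`hierarchy_label`, `nn_scores`, `roc_auc`), the identifiers, and the distance values are all hypothetical.

```python
"""Hypothetical sketch: one binary benchmark task evaluated with a 1-NN
classifier on a precomputed all-vs-all distance matrix (toy data)."""
from typing import List, Sequence


def hierarchy_label(sccs: str, level: int) -> str:
    """Truncate a SCOP-style id such as 'a.1.1.2' to a hierarchy level:
    1 = class, 2 = fold, 3 = superfamily, 4 = family."""
    return ".".join(sccs.split(".")[:level])


def nn_scores(dist: Sequence[Sequence[float]], train: Sequence[int],
              train_labels: Sequence[int], test: Sequence[int]) -> List[float]:
    """Score each test item as (distance to nearest negative training item)
    minus (distance to nearest positive training item); higher = more positive."""
    scores = []
    for t in test:
        d_pos = min(dist[t][i] for i, y in zip(train, train_labels) if y == 1)
        d_neg = min(dist[t][i] for i, y in zip(train, train_labels) if y == 0)
        scores.append(d_neg - d_pos)
    return scores


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of positive/negative pairs in which the
    positive scores higher (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): six domains with SCOP-style ids and a symmetric
# all-vs-all distance matrix (0 = identical).
ids = ["a.1.1.1", "a.1.1.2", "a.1.1.3", "b.1.1.1", "b.1.2.1", "c.2.1.1"]
D = [
    [0.00, 0.20, 0.30, 0.90, 0.80, 0.70],
    [0.20, 0.00, 0.25, 0.85, 0.90, 0.75],
    [0.30, 0.25, 0.00, 0.80, 0.95, 0.60],
    [0.90, 0.85, 0.80, 0.00, 0.40, 0.50],
    [0.80, 0.90, 0.95, 0.40, 0.00, 0.45],
    [0.70, 0.75, 0.60, 0.50, 0.45, 0.00],
]

# Task defined at the superfamily level (level 3): members of 'a.1.1' are positive.
labels = [1 if hierarchy_label(s, 3) == "a.1.1" else 0 for s in ids]

# One possible training/test split of the positive and negative sets.
train, test = [0, 1, 3, 4], [2, 5]
scores = nn_scores(D, train, [labels[i] for i in train], test)
auc = roc_auc(scores, [labels[i] for i in test])
print(f"ROC AUC on the toy task: {auc:.2f}")
```

Working from a precomputed matrix means the same comparison results can be reused across many tasks and splits. Only the labels and the index lists change, which is one reason all-vs-all distances suit a benchmark of this size.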
Keywords:
  • protein classification
  • protein structural domain classification
  • protein classification evaluation
  • protein structure classification
  • protein family classification
This record last updated: 03-21-2007

The Health Sciences Library System supports the Health Sciences at the University of Pittsburgh.

© 1996 - 2023 Health Sciences Library System, University of Pittsburgh. All rights reserved.