A Protein Classification Benchmark collection for machine learning
What you can do:
Use standard datasets to compare how different machine learning methods perform on protein classification.
Highlights:
- The database provides standard datasets on which the performance of machine learning methods can be compared.
- The collection contains datasets of sequences and structures, and each dataset is subdivided in several ways into positive/negative and training/test sets.
- There is a total of 6405 classification tasks, 3297 on protein sequences, 3095 on protein structures and 10 on protein coding regions in DNA.
- Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems.
- In the case of hierarchical classification schemes, the classification tasks can be defined at various levels of the hierarchy (such as classes, folds, superfamilies, etc.).
- For each dataset, distance matrices are available that contain all-vs-all comparisons of the data, based on various sequence or structure comparison methods, together with a set of classification performance measures computed with various classifier algorithms.
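The following is a minimal sketch of how an all-vs-all distance matrix and a positive/negative, training/test subdivision of the kind described above might be combined to score a classifier. Everything in it is an assumption made for illustration: the toy coordinates, the labels, the function names `nearest_neighbor_scores` and `roc_auc`, the choice of a nearest-neighbour rule, and the choice of ROC AUC as the performance measure. It does not reproduce the collection's file formats or its evaluation protocol.

```python
# Hypothetical toy data: 8 "proteins" placed on a line so that a symmetric
# all-vs-all distance matrix can be derived. All values are invented.
coords = [0.0, 1.0, 0.5, 3.0, 5.0, 6.0, 5.5, 2.4]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = positive (in the group), 0 = negative

# All-vs-all distance matrix: dist[i][j] is the distance between items i and j.
dist = [[abs(a - b) for b in coords] for a in coords]

# One assumed subdivision into training and test sets (indices into the matrix).
train_idx = [0, 1, 4, 5]
test_idx = [2, 3, 6, 7]


def nearest_neighbor_scores(dist, labels, train_idx, test_idx):
    """Score each test item as (distance to nearest training negative)
    minus (distance to nearest training positive); higher means more positive."""
    scores = []
    for t in test_idx:
        d_pos = min(dist[t][i] for i in train_idx if labels[i] == 1)
        d_neg = min(dist[t][i] for i in train_idx if labels[i] == 0)
        scores.append(d_neg - d_pos)
    return scores


def roc_auc(scores, truth):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for s, y in zip(scores, truth) if y == 1]
    neg = [s for s, y in zip(scores, truth) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = nearest_neighbor_scores(dist, labels, train_idx, test_idx)
auc = roc_auc(scores, [labels[t] for t in test_idx])
print(f"ROC AUC on the toy test set: {auc:.2f}")
```

Because the classifier only reads entries of `dist`, the same sketch could in principle be applied to a matrix produced by any sequence or structure comparison method, which is what makes precomputed matrices convenient for comparing classifiers on identical data.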
Keywords:
- protein classification
- protein structural domain classification
- protein classification evaluation
- protein structure classification
- protein family classification
This record last updated: 03-21-2007