CoCoPyE - feature-based prediction of genome quality indices

CoCoPyE is a fast tool for quality assessment of microbial genomes. It reliably predicts the completeness and contamination of bacterial and archaeal genomes. It can also provide a taxonomic classification of the input.

Background: The classical approach to estimating quality indices relies solely on a relatively small number of universal single-copy genes. Because these classical markers cover only a small fraction of the whole genome, the resulting quality assessment can be unreliable. Our method is based on a novel two-stage feature extraction and transformation scheme. It first performs a flexible extraction of genomic markers and then refines the marker-based estimates with a machine learning approach based on count-ratio histograms. In our simulation studies, CoCoPyE predicted quality indices more accurately than existing tools.
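
To make the classical marker-based idea concrete, here is a minimal sketch in Python. It is a deliberately simplified illustration, not CoCoPyE's method and not the exact formula of any existing tool. It assumes completeness is the fraction of expected single-copy markers found at least once, and contamination is the fraction found more than once. The function name `marker_quality` and the marker IDs are hypothetical.

```python
from collections import Counter


def marker_quality(found_markers, universal_markers):
    """Simplified marker-based quality estimate (illustrative assumption only).

    found_markers: iterable of marker IDs, one entry per hit in the genome.
    universal_markers: set of marker IDs expected exactly once per genome.
    Returns (completeness, contamination) as fractions in [0, 1].
    """
    if not universal_markers:
        raise ValueError("universal_markers must not be empty")

    # Count hits per marker, ignoring anything outside the expected set.
    counts = Counter(m for m in found_markers if m in universal_markers)
    n = len(universal_markers)

    # A marker seen at least once counts toward completeness.
    present = sum(1 for m in universal_markers if counts[m] >= 1)
    # A single-copy marker seen more than once hints at contamination.
    duplicated = sum(1 for m in universal_markers if counts[m] >= 2)

    return present / n, duplicated / n


# Hypothetical example: four expected markers, one missing, one duplicated.
markers = {"M1", "M2", "M3", "M4"}
hits = ["M1", "M2", "M2", "M3"]
completeness, contamination = marker_quality(hits, markers)
print(completeness, contamination)  # 0.75 0.25
```

With only a handful of markers, a single missing or duplicated gene shifts the estimate by a large step, which illustrates why estimates based on a small marker set can be unreliable.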

Citing CoCoPyE

N. Birth, N. Leppich, J. Schirmacher, N. Andreae, R. Steinkamp, M. Blanke, P. Meinicke. "CoCoPyE: feature engineering for learning and prediction of genome quality indices". GigaScience, Volume 13, 2024

Demo

Upload a FASTA file and let CoCoPyE calculate completeness and contamination.
Upload limit: 50MB

Results

Completeness estimate -
Contamination estimate -
Prediction method -
Taxonomy prediction -