Automatic Prediction and Annotation
Abstract
In the last few decades, the explosion of genomic projects has produced huge sets of predicted genes and annotated sequences. Predicting a gene structure means determining the start and stop of the gene, as well as the positions of any introns. Despite the number of high-performing gene prediction programs that combine ab initio and homology-based approaches (Mathe et al., 2002; Hoff and Stanke, 2015), the rate of mispredicted genes is not negligible, and several factors contribute to it (Scalzitti et al., 2020). For example, unusually long introns, short exons or long genes can yield incomplete or partially predicted gene structures; short intergenic regions can lead to gene fusions; DNA sequencing errors (nucleotide deletions or insertions) that introduce frameshifts can affect predictions; and non-canonical splice sites, overlapping genes and genes located within introns are further sources of erroneous predictions. [...]
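Two of the error sources listed above, non-canonical splice sites and frameshifts caused by a single-nucleotide deletion, can be illustrated with a toy Python sketch. The sequence, the coordinates and every function name here are hypothetical and chosen only for illustration. This is not the method of any of the cited programs, and it assumes that a canonical intron simply starts with GT and ends with AG.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open (start, end) on the genomic sequence

STOP_CODONS = {"TAA", "TAG", "TGA"}


def introns_from_exons(exons: List[Exon]) -> List[Tuple[int, int]]:
    """Return the intron intervals lying between consecutive, sorted exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


def non_canonical_introns(genome: str, exons: List[Exon]) -> List[Tuple[int, int]]:
    """List introns that do not follow the GT...AG pattern (toy criterion)."""
    flagged = []
    for start, end in introns_from_exons(exons):
        intron = genome[start:end]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            flagged.append((start, end))
    return flagged


def spliced_cds(genome: str, exons: List[Exon]) -> str:
    """Concatenate the exon sequences into a coding sequence."""
    return "".join(genome[start:end] for start, end in exons)


def first_stop_codon(cds: str) -> int:
    """Index of the first in-frame stop codon, or -1 if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return -1


# Hypothetical toy locus: exon 1, a GT...AG intron, exon 2.
genome = "ATGGCC" + "GTCCCAG" + "AAAGGGTAA"
exons = [(0, 6), (13, 22)]

cds = spliced_cds(genome, exons)   # ATG GCC AAA GGG TAA
shifted = cds[:4] + cds[5:]        # simulate a one-nucleotide deletion

print(non_canonical_introns(genome, exons))               # correct boundaries
print(non_canonical_introns(genome, [(0, 5), (13, 22)]))  # exon end off by one
print(first_stop_codon(cds), first_stop_codon(shifted))
```

With the correct exon boundaries no intron is flagged, while moving the first exon end by one base makes the intron appear non-canonical. The simulated deletion shifts the reading frame so that the original stop codon is no longer read in frame, which is one way a sequencing error can lead to a wrong or incomplete gene structure.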
Domains
Plant biology
Origin: Publisher files allowed on an open archive