An Improved Exon-Intron Recognition via a Committee of Machines

Published in: J.P. Pabico, E.R.E. Mojica and J.R.L. Micor. 2008. Transactions of the National Academy of Science and Technology of the Philippines 30(1):117.

Abstract

The human genome consists of sequences of nucleotide base pairs. Within a gene, the segments that are translated into protein are called exons. Exons are separated by intervening subsequences, called introns, that are spliced out of the RNA transcript prior to translation. To recognize the boundaries at which RNA splicing occurs, researchers currently follow the GU-AG heuristic, which describes the motif exon/GU-intron-AG/exon: an intron begins with GU and ends with AG. However, these dinucleotides occur so frequently that a typical intron contains several GUs and AGs within it, so the heuristic recognizes many false boundaries. Other researchers have automated the recognition of these sites with methods such as support vector machines, hidden Markov models, and artificial neural networks (ANNs), but the maximum reported recognition accuracy on a production set is only 81%. A production set is a set of DNA sequences whose intron-exon boundaries are known but which were not used in developing the model. A committee of machines is a computational methodology in which the outputs of multiple models are combined into a single output, using methods such as averaging, boosting, bagging, and simple majority voting. It has been shown, both theoretically and empirically, that the output of a committee machine is generally superior to those of its individual member models. In this work, we developed a committee of neural network classifiers trained to classify whether a given 60-bp DNA sequence contains an intron-exon (IE) boundary (acceptor site), an exon-intron (EI) boundary (donor site), or neither (N). On the same production set used by other researchers, our committee machine correctly recognized 84% of the DNA sequences, improving the recognition rate by 3 percentage points.
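To see why the GU-AG heuristic alone yields many false boundaries, the minimal Python sketch below lists every GU (candidate donor) and AG (candidate acceptor) dinucleotide in a sequence. The helper name `candidate_splice_sites` and the toy sequence are hypothetical illustrations, not part of the authors' method; DNA input is accepted by mapping T to U.

```python
def candidate_splice_sites(seq: str) -> tuple[list[int], list[int]]:
    """List every position matching the GU-AG rule (hypothetical helper).

    Returns (donors, acceptors):
      donors    -- indices i where seq[i:i+2] == "GU" (an intron could start at i)
      acceptors -- indices i where seq[i:i+2] == "AG" (an intron could end at i+1)
    """
    rna = seq.upper().replace("T", "U")  # accept DNA input by mapping T -> U
    donors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "GU"]
    acceptors = [i for i in range(len(rna) - 1) if rna[i:i + 2] == "AG"]
    return donors, acceptors


if __name__ == "__main__":
    toy = "AAGGUAAGUCCAGGUAG"  # made-up sequence, not data from the paper
    donors, acceptors = candidate_splice_sites(toy)
    print("GU candidates:", donors)     # GU candidates: [3, 7, 13]
    print("AG candidates:", acceptors)  # AG candidates: [1, 6, 11, 15]
```

Even this 17-nt toy sequence yields three donor and four acceptor candidates. As a rough estimate, if the four nucleotides were uniformly and independently distributed, any given dinucleotide would appear at about 1 in 16 positions, so a 60-bp window would contain roughly 59/16 ≈ 3.7 GUs and 3.7 AGs by chance alone.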
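Averaging and simple majority voting are two of the ways a committee can merge its members' outputs. The Python sketch below shows both for the three classes used here (EI, IE, N), assuming each member network emits a probability vector over those classes in a fixed order. The class order, tie-breaking rule, function names, and member outputs are hypothetical and chosen only for illustration; the abstract does not state which combination rule the authors' committee used.

```python
from collections import Counter

# Assumed class order for every member's output vector.
CLASSES = ("EI", "IE", "N")  # donor site, acceptor site, neither


def argmax_class(probs: list[float]) -> str:
    """Class with the highest probability (the first one wins on an exact tie)."""
    return CLASSES[max(range(len(CLASSES)), key=lambda k: probs[k])]


def average_vote(member_probs: list[list[float]]) -> str:
    """Combine by averaging the members' probability vectors, then take the argmax."""
    n = len(member_probs)
    mean = [sum(p[k] for p in member_probs) / n for k in range(len(CLASSES))]
    return argmax_class(mean)


def majority_vote(member_probs: list[list[float]]) -> str:
    """Combine by simple majority: each member votes for its own argmax class.

    Ties are broken in favour of the class listed first in CLASSES (an arbitrary choice).
    """
    votes = Counter(argmax_class(p) for p in member_probs)
    best = max(votes.values())
    return next(c for c in CLASSES if votes[c] == best)


if __name__ == "__main__":
    # Hypothetical outputs of three member networks for one 60-bp window.
    members = [
        [0.40, 0.35, 0.25],  # votes EI
        [0.45, 0.40, 0.15],  # votes EI
        [0.05, 0.90, 0.05],  # votes IE, with high confidence
    ]
    print("majority vote:", majority_vote(members))  # majority vote: EI
    print("averaging:", average_vote(members))       # averaging: IE
```

Here the two rules disagree: two members lean weakly toward EI while the third is confident in IE, so majority voting returns EI, whereas averaging, which takes each member's confidence into account, returns IE. Boosting and bagging mainly differ in how the member models are trained and weighted, so they are not shown.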

Keywords: intron, exon, committee machines, machine recognition

Elmer-Rico E. Mojica and Jose Rene L. Micor are Assistant Professors at the Institute of Chemistry.

Submitted 09 February 2008; Received 26 February 2008; Accepted 19 June 2008.

Suggested citation for this online article:

_______. An Improved Exon-Intron Recognition via a Committee of Machines. Accessed 21 November 2008. UPLB-ICS webpage (http://www.ics.uplb.edu.ph/node/282).