brazerzkidaieo.blogg.se - Protein sequence analysis

#Protein sequence analysis software

#Protein sequence analysis software

Lots of alignment algorithms have been proposed in the past few years to identify the class of the unseen protein sequence based on comparing it with some known protein sequences and calculating their similarities, such as iPro-Class ( ), SAM (SAM: Sequence Alignment and Modeling Software System, Baskin Center for Computer Engineering and Science, ), and MEME (MEME: Multiple Expectation Maximization for Motif Elicitation UCSD Computer Science and Engineering, ). Generally, two protein sequences are classified into the same category if their feature patterns extracted by sequence alignment algorithms show high homology. Precisely classifying a member protein sequence into a superfamily protein would show the benefit that it only needs to carry out some molecular analysis within a particular superfamily instead of the analysis on all the individual member protein sequences. As mentioned by Baldi and Brunak, protein sequence classification plays an important role in protein sequence analysis on the account of those protein sequence members consisting of a same protein superfamily are evolutionally related and functionally and structurally relevant to each other. Hence, it becomes an important and challenging task to efficiently exploit useful information from the large protein sequence dataset for both computer scientists and biologists. Hence, a number of protein sequence databases have been established in the past decades, such as Protein Information Resource (PIR) ( ), Protein Data Bank (PDB) ( ), and Universal Protein Resource (UniProt) ( ). Recent research has shown that the comparative analysis of the protein sequences is more sensitive than directly comparing DNA. Protein sequence analysis generally helps to characterize protein sequences in silico and allows the prediction of protein structures and functions. The DNA sequence of a gene encodes the amino acid sequence of a protein.ĭue to the wide applications in clinical proteomics and protein bioinformatics, protein sequence analyses have been comprehensively studied in recent years, such as the work presented by Barve et al. Typically, a protein such as an enzyme initiates one specific action. As shown in Figure 1, a gene is any given segment along the deoxyribonucleic acid (DNA) that encodes instructions which allow a cell to produce a specific product. The sequence of amino acids in a protein is defined by the sequence of a gene, which is encoded in the genetic code. The amino acids in a polymer chain are joined together by the peptide bonds between the carboxyl and amino groups of adjacent amino acid residues. Protein sequences (also known as polypeptides) are organic compounds made of amino acids arranged in a linear chain and folded into a globular form. The experimental results show the priority of the proposed algorithms. The performance is analyzed and compared with several existing methods using datasets obtained from the Protein Information Resource center. Two approaches, namely, the basic ELM and the OP-ELM, are adopted for the ensemble based SLFNs. The final category index is derived using the majority voting method. For each ensemble, the same training algorithm is adopted. To further enhance the performance, the ensemble based SLFNs structure is constructed where multiple SLFNs with the same number of hidden nodes and the same activation function are used as ensembles. The optimal pruned ELM is first employed for protein sequence classification in this paper. The recent efficient extreme learning machine (ELM) and its invariants are utilized as the training algorithms. In this paper, we study the performance of protein sequence classification using SLFNs. Therefore, it is urgent and necessary to build an efficient protein sequence classification system. Comparing the unseen sequence with all the identified protein sequences and returning the category index with the highest similarity scored protein, conventional methods are usually time-consuming. Precisely classifying a protein sequence from a large biological protein sequences database plays an important role for developing competitive pharmacological products.