Истраживање образаца у одређивању карактеристика протеина

Marovac, Ulfeta A.

Mining sequential patterns for determination of protein characteristics

dc.contributor.advisor	Mitić, Nenad
dc.contributor.other	Pavlović Lažetić, Gordana
dc.contributor.other	Pavlović, Mirjana
dc.creator	Marovac, Ulfeta A.
dc.date.accessioned	2016-07-16T12:54:27Z
dc.date.available	2016-07-16T12:54:27Z
dc.date.available	2020-07-03T08:39:22Z
dc.date.issued	2015-07-13
dc.identifier.uri	http://eteze.bg.ac.rs/application/showtheses?thesesId=3185
dc.identifier.uri	https://nardus.mpn.gov.rs/handle/123456789/5798
dc.identifier.uri	https://fedorabg.bg.ac.rs/fedora/get/o:11546/bdef:Content/download
dc.identifier.uri	http://vbs.rs/scripts/cobiss?command=DISPLAY&base=70036&RID=47621135
dc.description.abstract	Беланчевине или протеини су важни биолошки макромолекули полимерне природе (полипептиди), који се састоје од амино киселина и представљају основну градивну јединицу сваке ћелије...	sr
dc.description.abstract	Proteins are significant biological macromolecules of polymeric nature (polypeptides), which consist of amino acids and are the basic structural units of every cell. They are built from 20+3 amino acids and, as a consequence, are represented in biological databases as sequences over 23 different characters. Proteins can be classified by their primary structure, secondary structure, function, etc. One possible classification of proteins by function is based on their membership in a particular cluster of orthologous groups (COG). This classification relies on a prior comparison of proteins by the similarity of their primary structures, which is most often a result of homology, i.e. their common (evolutionary) origin. The COG database is obtained by comparing the known and predicted proteins encoded in completely sequenced prokaryotic (archaea and bacteria) genomes and classifying them by orthology. The proteins are classified into 25 categories, which can be arranged into three basic functional groups (proteins responsible for: (1) information storage and processing; (2) cellular processes and signaling; and (3) metabolism), or into a group of poorly characterized proteins. Classification of proteins by their membership in a certain COG category (or KOG, euKaryotic Orthologous Groups, for eukaryotic organisms) is significant for a better understanding of biological processes and various pathological conditions in humans and other organisms. The dissertation proposes a model for classifying proteins into COG categories based on amino acid n-grams (sequences of length n). The data set contains protein sequences from genomes of 8 different taxonomic classes [TKL97] of bacteria (Aquificales, Bacteroidia, Chlamydiales, Chlorobia, Chloroflexia, Cytophagia, Deinococci, Prochlorales), which have already been classified by COG category.
A new method is presented, based on generalized systems of Boolean equations, for isolating the n-grams characteristic of proteins in the corresponding COG categories. The method significantly reduces the number of n-grams processed compared with previously used methods of n-gram analysis, so less memory and less time are needed to process proteins. Previously known methods for classifying proteins into functional categories compared each new protein (whose function had to be determined) with the set of all proteins already classified by function, in order to find the group containing the proteins most similar to it. Compared with these, the advantage of the new method is that it avoids sequence-to-sequence comparison and instead searches a protein for patterns (n-grams up to length 10) that are characteristic of the corresponding COG category. The selected patterns are assigned to a COG category and describe sequences of a certain length that have previously appeared only in that COG category and not in proteins of other COG categories. On the basis of the proposed method, a predictor that determines the corresponding COG category for a new protein is implemented. The minimal precision of the prediction is one of the predictor's arguments. During the test phase the constructed predictor showed excellent results, with a maximal precision of 99% reached for some proteins. Owing to its properties and relatively simple construction, the proposed method can be applied in similar domains where the solution of a problem is based on n-gram sequence analysis.	en
dc.format	application/pdf
dc.language	sr
dc.publisher	Универзитет у Београду, Математички факултет	sr
dc.relation	info:eu-repo/grantAgreement/MESTD/Integrated and Interdisciplinary Research (IIR or III)/44007/RS//
dc.rights	openAccess	en
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.source	Универзитет у Београду	sr
dc.subject	карактеристике протеина	sr
dc.subject	characteristics of proteins	en
dc.subject	класификација	sr
dc.subject	истраживање секвенцијалних образаца	sr
dc.subject	n-грам	sr
dc.subject	Булова алгебра	sr
dc.subject	classification	en
dc.subject	mining sequential patterns	en
dc.subject	n-gram	en
dc.subject	Boolean algebra	en
dc.title	Истраживање образаца у одређивању карактеристика протеина	sr
dc.title	Mining sequential patterns for determination of protein characteristics	en
dc.type	doctoralThesis	en
dc.rights.license	BY-NC-ND
dcterms.abstract	Митић, Ненад; Павловић, Мирјана; Павловић Лажетић, Гордана; Маровац, Улфета A.; Istraživanje obrazaca u određivanju karakteristika proteina;
dc.identifier.fulltext	http://nardus.mpn.gov.rs/bitstream/id/6778/Disertacija3746.pdf
dc.identifier.fulltext	http://nardus.mpn.gov.rs/bitstream/id/6779/Marovac_Ulfeta_A.pdf
dc.identifier.fulltext	https://nardus.mpn.gov.rs/bitstream/id/6779/Marovac_Ulfeta_A.pdf
dc.identifier.fulltext	https://nardus.mpn.gov.rs/bitstream/id/6778/Disertacija3746.pdf
dc.identifier.rcub	https://hdl.handle.net/21.15107/rcub_nardus_5798

Documents for the doctoral dissertation

Name: Disertacija3746.pdf
Size: 7.586 MB
Format: PDF

Name: Marovac_Ulfeta_A.pdf
Size: 84.35 KB
Format: PDF

This dissertation appears in the following collections

Faculty of Mathematics