Modeli konačnih stanja u ekstrakciji informacija

Pajić, Vesna

Finite state models in information extraction

dc.contributor.advisor	Pavlović-Lažetić, Gordana
dc.contributor.other	Vitas, Duško
dc.contributor.other	Obradović, Ivan
dc.creator	Pajić, Vesna
dc.date.accessioned	2016-01-05T12:39:38Z
dc.date.available	2016-01-05T12:39:38Z
dc.date.available	2020-07-03T08:38:58Z
dc.date.issued	2012-11-08
dc.identifier.uri	http://eteze.bg.ac.rs/application/showtheses?thesesId=1713
dc.identifier.uri	https://nardus.mpn.gov.rs/handle/123456789/2848
dc.identifier.uri	https://fedorabg.bg.ac.rs/fedora/get/o:9290/bdef:Content/download
dc.identifier.uri	http://vbs.rs/scripts/cobiss?command=DISPLAY&base=70036&RID=45438223
dc.description.abstract	Disertacija je posvećena istraživanju naučne oblasti nazvane ekstrakcija informacija (engl. information extraction), koja predstavlja podoblast veštačke inteligencije, a u sebi kombinuje i koristi tehnike i dostignuća više različitih oblasti računarstva. Termin "ekstrakcija informacija" će biti korišćen u dva različita konteksta. U jednom od njih misli se na ekstrakciju informacija kao naučnu oblast i tada će se koristiti skraćenica IE, preuzeta iz anglosaksonske literature u značenju "Information Extraction". U drugom slučaju, kada se bude mislilo na sam proces i postupak izdvajanja informacija iz teksta, koristiće se oblik "ekstrakcija informacija". Ova disertacija predstavlja, pored pregleda postojećih metoda iz ove oblasti, i jedan originalni pristup i metod za ekstrakciju informacija baziran na konačnim transduktorima. Tokom istraživanja i rada na disertaciji, a primenom pomenutog metoda, kao rezultat formirana je baza podataka o mikroorganizmima koja sadrži fenotipske i genotipske karakteristike za 2412 vrsta i 873 roda, namenjena za istraživanja iz oblasti bioinformatike i genetike. Baza i korišćeni metod su detaljno prikazani u nekoliko radova, publikovanih u časopisima ili izlaganih na međunarodnim konferencijama (Pajić, 2011; Pajić i sar. 2011a; Pajić i sar. 2011b). U glavi 1 dat je uvod u oblast ekstrakcije informacija, unutar koga je opisan istorijat i razvoj metoda ove oblasti. Dalje je opisana klasifikacija tekstualnih resursa nad kojima se vrši ekstrakcija informacija, kao i klasifikacija samih informacija. Na kraju glave 1 oblast ekstrakcije informacija je upoređena sa drugim srodnim disciplinama računarstva. Glava 2 je posvećena prikazu teorijskih osnova na kojima su zasnovana istraživanja ove disertacije. Razmatrana je teorija formalnih jezika i modela konačnih stanja, kao i njihova uzajamna veza i veza sa ekstrakcijom informacija. 
Akcenat je stavljen na konačne modele i metode koji su zasnovani na modelima konačnih stanja. Ovi metodi pokazuju veću preciznost od drugih metoda za ekstrakciju informacije, te su nezamenljivi u situacijama kada je tačnost izdvojenih podataka iz teksta od presudnog značaja. Pojedini pojmovi ekstrakcije informacija - jezik relevantnih informacija, jezik izdvojenih informacija, pravila ekstrakcije, definisani su iz ugla teorije formalnih jezika. Formulisano je i dokazano osnovno svojstvo relacije transdukcije za zadato pravilo ekstrakcije. Definisan je i pojam jezika konteksta informacija i dokazano je njegovo svojstvo regularnosti...	sr
dc.description.abstract	This dissertation studies the scientific field of information extraction, a subfield of artificial intelligence that combines techniques and results from several areas of computer science. The term "information extraction" is used in two different contexts. In the first, it refers to the scientific field, and the acronym IE is used. In the second, it refers to the process of extracting information from text itself. Besides a survey of the state of the art in IE, the dissertation presents an original approach and method for information extraction based on finite state transducers. As a result of the research and the work on the dissertation, a database of phenotypic and genotypic characteristics of microorganisms, covering 2412 species and 873 genera, was created. The database is intended for research in bioinformatics and genetics. The database and the method used to create it are described in detail in several papers published in journals and conference proceedings (Pajić, 2011; Pajić et al. 2011a; Pajić et al. 2011b). Chapter 1 introduces IE, together with the history of the development of methods in this area. It then describes a classification of the textual resources from which information is extracted, as well as a classification of the information itself. At the end of Chapter 1, IE is compared with other related disciplines of computer science. Chapter 2 presents the theoretical foundations on which the dissertation is based, namely formal language theory and finite state models, and describes their mutual relationship and their connection with IE. The emphasis is on finite state models and methods based on them. These methods show higher precision than other information extraction methods and are indispensable in situations where the accuracy of the data extracted from text is of crucial importance. 
Several notions of information extraction (the language of relevant information, the language of extracted information, and extraction rules) are defined from the perspective of formal language theory. The basic property of the transduction relation for a given extraction rule is formulated and proved. The language of the information context is defined, and its regularity is proved...	en
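The abstract describes extraction rules as finite state transducers that map strings of the language of relevant information onto strings of the language of extracted information. The sketch below illustrates that idea only in miniature and under stated assumptions: the `FST` class, the `extract` helper, the `RULES` transducer, its state numbers, the `<NUM>` token class and the `$` copy marker are all hypothetical, and none of them are the transducers, rules or tools used in the dissertation. Matching is leftmost-longest over regex tokens, which is one common strategy among several.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


# Hypothetical toy model, not the dissertation's actual transducers or rules.
@dataclass
class FST:
    start: int
    finals: set[int]
    # (state, input symbol) -> (next state, output string).
    # The symbol "<NUM>" matches any all-digit token; the output "$" copies the input token.
    delta: dict[tuple[int, str], tuple[int, str]]

    def step(self, state: int, token: str) -> tuple[int, str] | None:
        """Follow one transition, trying the literal token before the <NUM> class."""
        if (state, token) in self.delta:
            return self.delta[(state, token)]
        if token.isdigit() and (state, "<NUM>") in self.delta:
            return self.delta[(state, "<NUM>")]
        return None

    def longest_match(self, tokens: list[str], i: int) -> tuple[int, str] | None:
        """Run from position i; return (end index, output) of the longest accepted prefix."""
        state, out, best = self.start, [], None
        for j in range(i, len(tokens)):
            move = self.step(state, tokens[j])
            if move is None:
                break
            state, emitted = move
            out.append(tokens[j] if emitted == "$" else emitted)
            if state in self.finals:
                best = (j + 1, "".join(out))
        return best


def extract(fst: FST, text: str) -> list[str]:
    """Leftmost-longest scan: emit the transducer output for each recognized span."""
    tokens = re.findall(r"\d+|\w+|[^\w\s]", text.lower())
    results, i = [], 0
    while i < len(tokens):
        match = fst.longest_match(tokens, i)
        if match is None:
            i += 1  # no rule starts here; move to the next token
        else:
            i, value = match  # resume after the matched span
            results.append(value)
    return results


# Hypothetical rules: "Gram(-)positive/negative" and "at <NUM> °C".
RULES = FST(
    start=0,
    finals={3, 13},
    delta={
        (0, "gram"): (1, "gram_stain="),
        (1, "-"): (2, ""),
        (1, "positive"): (3, "positive"),
        (1, "negative"): (3, "negative"),
        (2, "positive"): (3, "positive"),
        (2, "negative"): (3, "negative"),
        (0, "at"): (10, "temp="),
        (10, "<NUM>"): (11, "$"),
        (11, "°"): (12, ""),
        (12, "c"): (13, " C"),
    },
)

if __name__ == "__main__":
    print(extract(RULES, "Cells are Gram-negative rods that grow at 37 °C."))
```

For the sample sentence, `extract` returns `["gram_stain=negative", "temp=37 C"]`: each accepted input span (a string of the input language) is rewritten into a normalized attribute value (a string of the output language). The toy transducer is deterministic; a realistic rule set would typically also need to handle ambiguity between overlapping rules.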
dc.format	application/pdf
dc.language	sr
dc.publisher	Универзитет у Београду, Математички факултет (University of Belgrade, Faculty of Mathematics)	sr
dc.relation	info:eu-repo/grantAgreement/MESTD/Basic Research (BR or ON)/178006/RS//
dc.rights	openAccess	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.source	Универзитет у Београду (University of Belgrade)	sr
dc.subject	ekstrakcija informacija	sr
dc.subject	information extraction	en
dc.subject	natural language processing	en
dc.subject	finite state automata	en
dc.subject	finite state transducers	en
dc.subject	recursive transition networks	en
dc.subject	obrada prirodnih jezika	sr
dc.subject	konačni automati	sr
dc.subject	konačni transduktori	sr
dc.subject	rekurzivne mreže prelaza	sr
dc.title	Modeli konačnih stanja u ekstrakciji informacija	sr
dc.title	Finite state models in information extraction	en
dc.type	doctoralThesis	en
dc.rights.license	BY
dcterms.abstract	Павловић-Лажетић, Гордана; Обрадовић, Иван; Витас, Душко; Пајић, Весна; Модели коначних стања у екстракцији информација;
dc.identifier.fulltext	https://nardus.mpn.gov.rs/bitstream/id/6643/Disertacija.pdf
dc.identifier.rcub	https://hdl.handle.net/21.15107/rcub_nardus_2848

Documents for the doctoral dissertation

Name: Disertacija.pdf
Size: 3.651Mb
Format: PDF

This dissertation appears in the following collections

Математички факултет (Faculty of Mathematics)
