A Methodology for Solving Semantic Tasks in the Processing of Short Texts Written in Natural Languages with Limited Resources

Batanović, Vuk

dc.contributor.advisor	Nikolić, Boško
dc.contributor.other	Cvetanović, Miloš
dc.contributor.other	Bojić, Dragan
dc.contributor.other	Ševarac, Zoran
dc.contributor.other	Drašković, Dražen
dc.creator	Batanović, Vuk
dc.date.accessioned	2021-02-09T15:14:43Z
dc.date.available	2021-02-09T15:14:43Z
dc.date.issued	2020-12-23
dc.identifier.uri	http://eteze.bg.ac.rs/application/showtheses?thesesId=7878
dc.identifier.uri	https://fedorabg.bg.ac.rs/fedora/get/o:23189/bdef:Content/download
dc.identifier.uri	http://vbs.rs/scripts/cobiss?command=DISPLAY&base=70036&RID=31036425
dc.identifier.uri	https://nardus.mpn.gov.rs/handle/123456789/17783
dc.description.abstract	Statistički pristupi obradi prirodnih jezika tipično zahtevaju značajne količine anotiranih podataka, a često i različite pomoćne jezičke alate, što ograničava njihovu primenu u resursno ograničenim situacijama. U ovoj disertaciji predstavljena je metodologija razvoja statističkih rešenja u semantičkoj obradi prirodnih jezika sa ograničenim resursima. Ovakvi jezici se odlikuju ne samo limitiranim brojem postojećih jezičkih resursa, već i ograničenim mogućnostima za razvoj novih skupova podataka i namenskih alata i algoritama. Predložena metodologija je usredsređena na kratke tekstove zbog njihove rasprostranjenosti u digitalnoj komunikaciji i zbog veće složenosti njihove semantičke obrade. Metodologija obuhvata sve faze izrade statističkih rešenja, od prikupljanja tekstualnog sadržaja, preko anotacije podataka, do formulisanja, obučavanja i evaluacije modela mašinskog učenja. Njena upotreba je detaljno ilustrovana na dva semantička problema – analizi sentimenta i određivanju semantičke sličnosti. Kao primer jezika sa ograničenim resursima korišćen je srpski jezik, ali se predložena metodologija može primeniti i na druge jezike iz ove kategorije. Pored opšte metodologije, u doprinose ove disertacije spada razvoj novog, fleksibilnog sistema označavanja sentimenta kratkih tekstova, nove metrike za utvrđivanje ekonomičnosti anotacije, kao i nekoliko novih modela za određivanje semantičke sličnosti kratkih tekstova. Rezultati disertacije uključuju i kreiranje prvih javno dostupnih anotiranih skupova podataka za probleme analize sentimenta i određivanja semantičke sličnosti kratkih tekstova na srpskom jeziku, razvoj i evaluaciju većeg broja modela na ovim problemima, i prvu komparativnu evaluaciju većeg broja alata za morfološku normalizaciju na kratkim tekstovima na srpskom jeziku.	sr
dc.description.abstract	Statistical approaches to natural language processing typically require considerable amounts of labeled data, and often various auxiliary language tools as well, limiting their applicability in resource-limited settings. This thesis presents a methodology for developing statistical solutions in the semantic processing of natural languages with limited resources. In these languages, not only are existing language resources limited, but so are the capabilities for developing new datasets and dedicated tools and algorithms. The proposed methodology focuses on short texts due to their prevalence in digital communication, as well as the greater complexity regarding their semantic processing. The methodology encompasses all phases in the creation of statistical solutions, from the collection of textual content, to data annotation, to the formulation, training, and evaluation of machine learning models. Its use is illustrated in detail on two semantic tasks – sentiment analysis and semantic textual similarity. The Serbian language is utilized as an example of a language with limited resources, but the proposed methodology can also be applied to other languages in this category. In addition to the general methodology, the contributions of this thesis consist of the development of a new, flexible short-text sentiment annotation system, a new annotation cost-effectiveness metric, as well as several new semantic textual similarity models. The thesis results also include the creation of the first publicly available annotated datasets of short texts in Serbian for the tasks of sentiment analysis and semantic textual similarity, the development and evaluation of numerous models on these tasks, and the first comparative evaluation of multiple morphological normalization tools on short texts in Serbian.	en
dc.format	application/pdf
dc.language	sr
dc.publisher	Универзитет у Београду, Електротехнички факултет	sr
dc.rights	openAccess	en
dc.rights.uri	https://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source	Универзитет у Београду	sr
dc.subject	obrada prirodnih jezika	sr
dc.subject	natural language processing	en
dc.subject	računarska lingvistika	sr
dc.subject	semantička sličnost tekstova	sr
dc.subject	analiza sentimenta	sr
dc.subject	morfološka normalizacija	sr
dc.subject	lingvistička anotacija	sr
dc.subject	mašinsko učenje	sr
dc.subject	computational linguistics	en
dc.subject	semantic textual similarity	en
dc.subject	morphological normalization	en
dc.subject	linguistic annotation	en
dc.subject	machine learning	en
dc.subject	sentiment analysis	en
dc.title	Metodologija rešavanja semantičkih problema u obradi kratkih tekstova napisanih na prirodnim jezicima sa ograničenim resursima	sr
dc.type	doctoralThesis	en
dc.rights.license	BY-NC-SA
dcterms.abstract	Николић, Бошко; Шеварац, Зоран; Бојић, Драган; Драшковић, Дражен; Цветановић, Милош; Батановић, Вук; Методологија решавања семантичких проблема у обради кратких текстова написаних на природним језицима са ограниченим ресурсима;
dc.identifier.fulltext	https://nardus.mpn.gov.rs/bitstream/id/67921/IzvestajKomisije23600.pdf
dc.identifier.fulltext	https://nardus.mpn.gov.rs/bitstream/id/67920/Disertacija.pdf
dc.identifier.rcub	https://hdl.handle.net/21.15107/rcub_nardus_17783

Files in this item

Name: Disertacija.pdf
Size: 2.209Mb
Format: PDF

Name: IzvestajKomisije23600.pdf
Size: 1.292Mb
Format: PDF

This item appears in the following Collection(s)

Електротехнички факултет