
Models of the Serbian language and their application in speech and language technologies

dc.contributor.advisor: Sečujski, Milan
dc.contributor.other: Delić, Vlado
dc.contributor.other: Bajić, Dragana
dc.contributor.other: Gudurić, Snežana
dc.contributor.other: Grbić, Tatjana
dc.contributor.other: Nikolić, Jelena
dc.contributor.other: Sečujski, Milan
dc.creator: Ostrogonac, Stevan
dc.date.accessioned: 2018-12-26T13:20:51Z
dc.date.available: 2018-12-26T13:20:51Z
dc.date.available: 2020-07-03T14:06:21Z
dc.date.issued: 2018-12-21
dc.identifier.uri: http://nardus.mpn.gov.rs/handle/123456789/10445
dc.identifier.uri: https://www.cris.uns.ac.rs/DownloadFileServlet/Disertacija153796019798434.pdf?controlNumber=(BISIS)107812&fileName=153796019798434.pdf&id=12013&source=NaRDuS&language=srsr
dc.identifier.uri: https://www.cris.uns.ac.rs/record.jsf?recordId=107812&source=NaRDuS&language=srsr
dc.identifier.uri: https://www.cris.uns.ac.rs/DownloadFileServlet/IzvestajKomisije153796021102819.pdf?controlNumber=(BISIS)107812&fileName=153796021102819.pdf&id=12014&source=NaRDuS&language=srsr
dc.description.abstract: Statistički jezički model, u teoriji, predstavlja raspodelu verovatnoća nad skupom svih mogućih sekvenci reči nekog jezika. U praksi, to je mehanizam kojim se estimiraju verovatnoće sekvenci, koje su od interesa. Matematički aparat vezan za modele jezika je uglavnom nezavisan od jezika. Međutim, kvalitet obučenih modela ne zavisi samo od algoritama obuke, već prvenstveno od količine i kvaliteta podataka koji su na raspolaganju za obuku. Za jezike sa kompleksnom morfologijom, kao što je srpski, tekstualni korpus za obuku modela mora biti daleko obimniji od korpusa koji bi se koristio kod nekog od jezika sa relativno jednostavnom morfologijom, poput engleskog. Ovo istraživanje obuhvata razvoj jezičkih modela za srpski jezik, počevši od prikupljanja i inicijalne obrade tekstualnih sadržaja, preko adaptacije algoritama i razvoja metoda za rešavanje problema nedovoljne količine podataka za obuku, pa do prilagođavanja i primene modela u različitim tehnologijama, kao što su sinteza govora na osnovu teksta, automatsko prepoznavanje govora, automatska detekcija i korekcija gramatičkih i semantičkih grešaka u tekstovima, a postavljaju se i osnove za primenu jezičkih modela u automatskoj klasifikaciji dokumenata i drugim tehnologijama. Jezgro razvoja jezičkih modela za srpski predstavlja definisanje morfoloških klasa reči na osnovu informacija koje su sadržane u morfološkom rečniku, koji je nastao kao rezultat jednog od ranijih istraživanja. [sr]
dc.description.abstract: A statistical language model is, in theory, a probability distribution over all possible word sequences of a language. In practice, it is a mechanism for estimating the probabilities of the word sequences of interest. The mathematical apparatus behind language models is largely language independent. However, the quality of trained models depends not only on the training algorithms but also, and primarily, on the amount and quality of the available training data. For languages with complex morphology, such as Serbian, the text corpus used for training must be far larger than the corpus that would be needed for a language with relatively simple morphology, such as English. This research covers the development of language models for Serbian, from the collection and initial processing of text, through the adaptation of algorithms and the development of methods for addressing insufficient training data, to the adaptation and application of the models in various technologies, such as text-to-speech synthesis, automatic speech recognition, and automatic detection and correction of grammatical and semantic errors in text. It also lays the groundwork for applying language models to automatic document classification and other technologies. The core of the development of language models for Serbian is the definition of morphological word classes based on information contained in a morphological dictionary of Serbian, which was produced by earlier research. [en]
dc.language: sr (latin script)
dc.publisher: Универзитет у Новом Саду, Факултет техничких наука [sr]
dc.rights: openAccess [en]
dc.source: Универзитет у Новом Саду [sr]
dc.subject: Obrada prirodnog jezika [sr]
dc.subject: Natural language processing [en]
dc.subject: Computational linguistics [en]
dc.subject: računarska lingvistika [sr]
dc.title: Modeli srpskog jezika i njihova primena u govornim i jezičkim tehnologijama [sr]
dc.title.alternative: Models of the Serbian language and their application in speech and language technologies [en]
dc.type: doctoralThesis [sr]
dc.rights.license: BY
dc.identifier.fulltext: http://nardus.mpn.gov.rs/bitstream/id/41393/IzvestajKomisije.pdf
dc.identifier.fulltext: http://nardus.mpn.gov.rs/bitstream/id/41392/Disertacija.pdf

