National Repository of Dissertations in Serbia

Automatsko određivanje vrsta riječi u morfološki složenom jeziku

Automatic parts of speech determination in a morphologically complex language

2015
Disertacija9354.pdf (1.806Mb)
IzvestajKomisije9354.pdf (232.2Kb)
Author
Димитријевић, Страхиња
Mentor
Milin, Petar
Committee members
Filipović-Đurđević, Dušica
Kostić, Aleksandar
Abstract
Istraživanje je imalo za cilj da provjeri u kojoj mjeri se naš kognitivni sistem može osloniti na fonotaktičke informacije, tj. moguće/dozvoljene kombinacije fonema/grafema, u zadacima automatske percepcije i produkcije riječi u jezicima sa bogatom infleksionom morfologijom. Da bi se dobio odgovor na to pitanje, sprovedene su tri studije. U prvoj studiji, uz pomoć mašina sa vektorima podrške (SVM), obavljena je diskriminacija promjenljivih vrsta riječi. U drugoj studiji, produkcija infleksionih oblika riječi izvedena je pomoću učenja zasnovanog na memoriji (MBL). Na osnovu rezultata iz druge studije, izveden je eksperiment u kojem se tražila potvrda kognitivne vjerodostojnosti modela i korišćenih informacija. Diskriminacija promjenljivih vrsta riječi obavljena je na osnovu dozvoljenih sekvenci dva i tri grafema/fonema (tzv. bigrama i trigrama), čije su frekvencije javljanja unutar pojedinačnih gramatičkih tipova izračunate u zavisnosti od njihovog položaja u riječima: na početku, na kraju, unutar riječi, svi zajedno. Maksimalna tačnost se kretala oko 95% i dobijena je na svim bigramima, uz pomoć RBF jezgrene funkcije. Ovako visok procenat tačne diskriminacije ukazuje da postoje karakteristične distribucije bigrama za različite vrste promjenljivih riječi. S druge strane, najmanje informativnim su se pokazali bigrami na kraju i na početku riječi. MBL model iskorišćen je u zadatku automatske infleksione produkcije, tako što je za zadatu riječ, na osnovu fonotaktičkih informacija iz posljednja četiri sloga, generisan traženi infleksioni oblik. Na uzorku od 89024 promjenljivih riječi uzetih iz Frekvencijskog rečnika dnevne štampe srpskog jezika, koristeći metod izostavljanja jednog primjera i konstantnu veličinu skupa susjeda (k = 7), ostvarena je tačnost oko 92%.
Identifikovano je nekoliko faktora koji su uticali na ovu tačnost, kao što su: vrsta riječi, gramatički tip, način tvorbe i broj primjera u okviru jednog gramatičkog tipa, broj izuzetaka, broj fonoloških alternacija itd. U istraživanju na subjektima, u zadatku leksičke odluke, za riječi koje je MBL pogrešno obradio utvrđeno je duže vrijeme obrade. Ovo ukazuje na kognitivnu vjerodostojnost učenja zasnovanog na memoriji. Osim toga, potvrđena je i kognitivna vjerodostojnost fonotaktičkih informacija, ovaj put u zadatku razumijevanja jezika. Sveukupno, nalazi dobijeni u ove tri studije govore u prilog teze o značajnoj ulozi fonotaktičkih informacija u percepciji i produkciji morfološki složenih riječi. Rezultati, takođe, ukazuju na potrebu da se ove informacije uzmu u obzir kada se diskutuje pojavljivanje većih jezičkih jedinica i obrazaca.

The study tested the extent to which our cognitive system can rely on phonotactic information, i.e., possible/permissible combinations of phonemes/graphemes, in the automatic processing and production of words in languages with rich inflectional morphology. To answer this question, three studies were conducted. In the first study, support vector machines (SVM) were used to discriminate parts of speech (PoS) with more than one possible meaning (i.e., ambiguous PoS). In the second study, inflected word forms were produced with memory-based learning (MBL). Based on the results of the second study, a behavioral experiment was conducted as the third study to test the cognitive plausibility of the MBL performance. The discrimination of ambiguous PoS used permissible sequences of two and three characters/sounds (i.e., bigrams and trigrams), whose frequency of occurrence within individual grammatical types was calculated depending on their position in a word: at the beginning, at the end, and irrespective of position in a word. The maximum accuracy was approximately 95%, obtained with bigrams irrespective of their position in a word and an SVM with an RBF kernel function. Such high accuracy suggests that the probability distribution of bigrams is informative about the types of flective words. Interestingly, the least informative were bigrams at the end and at the beginning of words. The MBL model was used for the automatic production of inflected forms, utilizing phonotactic information from the last four syllables. On a sample of 89024 flective words taken from the Frequency dictionary of Serbian language (daily press), the achieved accuracy was 92%. For this result, the MBL used the leave-one-out method and a nearest-neighbor set size of 7 (k = 7).
We identified several factors that contributed to the accuracy, in particular: part of speech, grammatical type, formation method and number of examples within one grammatical type, number of exceptions, number of phonological alternations, etc. The visual lexical decision experiment revealed that words that the MBL model produced incorrectly also elicited longer reaction times. Thus, we concluded that the MBL model might be cognitively plausible. In addition, we reconfirmed the informativeness of phonotactic information, this time in a human comprehension task. Overall, the findings from the three studies support a role for phonotactic information in both the processing and production of morphologically complex words. The results also suggest that this information needs to be taken into account when discussing the emergence of larger units and language patterns.
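
As a rough illustration of two steps the abstract describes, the sketch below counts character bigrams by position within a word and estimates leave-one-out accuracy for a nearest-neighbour classifier with k = 7. It is not the dissertation's code. The function names, the toy words and feature tuples, the overlap distance, and the choice of plain "k nearest instances" with majority voting are all assumptions made for illustration; the actual SVM and MBL setups may differ in every one of these respects.

```python
from collections import Counter


def positional_bigrams(word):
    """Group a word's character bigrams by position (hypothetical helper)."""
    grams = [word[i:i + 2] for i in range(len(word) - 1)]
    return {
        "initial": grams[:1],     # bigram at the beginning of the word
        "final": grams[-1:],      # bigram at the end of the word
        "internal": grams[1:-1],  # bigrams strictly inside the word
        "all": grams,             # every bigram, irrespective of position
    }


def bigram_frequencies(words_by_type, position="all"):
    """Count bigrams per grammatical type for one position group."""
    return {
        gtype: Counter(
            gram for word in words for gram in positional_bigrams(word)[position]
        )
        for gtype, words in words_by_type.items()
    }


def overlap_distance(a, b):
    """Number of feature slots on which two instances disagree (assumed metric)."""
    return sum(x != y for x, y in zip(a, b))


def leave_one_out_accuracy(instances, k=7):
    """Classify each (features, label) pair from all the others; return accuracy."""
    correct = 0
    for i, (features, label) in enumerate(instances):
        others = instances[:i] + instances[i + 1:]
        # Simplification: take the k nearest instances and use a majority vote.
        nearest = sorted(
            others, key=lambda inst: overlap_distance(features, inst[0])
        )[:k]
        votes = Counter(lbl for _, lbl in nearest)
        correct += votes.most_common(1)[0][0] == label
    return correct / len(instances)


if __name__ == "__main__":
    # Toy data only; the labels and feature values are placeholders.
    print(bigram_frequencies({"noun": ["riba", "baba"]}, position="all"))
    toy = [(("a", "a"), "A")] * 8 + [(("b", "b"), "B")] * 8
    print(leave_one_out_accuracy(toy, k=7))
```

In a setup closer to the one described, the per-type bigram counts would be turned into feature vectors for an SVM with an RBF kernel, and the feature tuples passed to the nearest-neighbour step would encode phonotactic information from the last four syllables of each word.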

Faculty:
Универзитет у Новом Саду, Филозофски факултет
Date:
24-07-2015
Keywords:
Fonotaktičke informacije / Phonotactic information / inflectional morphology / support vector machines / memory based learning / cognitive plausibility / infleksiona morfologija / mašine sa vektorima podrške / učenje zasnovano na memoriji / kognitivna vjerodostojnost
Handle
https://hdl.handle.net/21.15107/rcub_nardus_8139
URI
http://www.cris.uns.ac.rs/DownloadFileServlet/Disertacija143132921542657.pdf?controlNumber=(BISIS)94868&fileName=143132921542657.pdf&id=3666&source=NaRDuS&language=sr
https://nardus.mpn.gov.rs/handle/123456789/8139
http://www.cris.uns.ac.rs/record.jsf?recordId=94868&source=NaRDuS&language=sr
http://www.cris.uns.ac.rs/DownloadFileServlet/IzvestajKomisije143132920523017.pdf?controlNumber=(BISIS)94868&fileName=143132920523017.pdf&id=3665&source=NaRDuS&language=sr

DSpace software copyright © 2002-2015  DuraSpace
About NaRDus | Contact us
