Impact of text classification on natural language processing applications
Uticaj klasifikacije teksta na primene u obradi prirodnih jezika
Author
Šandrih, BranislavaMentor
Kartelj, AleksandarCommittee members
Pavlović-Lažetić, Gordana
Filipović, Vladimir

Krstev, Cvetana

Mitkov, Ruslan
Metadata
Show full item recordAbstract
The main goal of this dissertation is to put different text classification tasks in
the same frame, by mapping the input data into the common vector space of linguistic
attributes. Subsequently, several classification problems of great importance for natural
language processing are solved by applying the appropriate classification algorithms.
The dissertation deals with the problem of validation of bilingual translation pairs, so
that the final goal is to construct a classifier which provides a substitute for human evaluation
and which decides whether the pair is a proper translation between the appropriate
languages by means of applying a variety of linguistic information and methods.
In dictionaries it is useful to have a sentence that demonstrates use for a particular dictionary
entry. This task is called the classification of good dictionary examples. In this thesis,
a method is developed which automatically estimates whether an example is good or bad
for a specific dictionary entr...y.
Two cases of short message classification are also discussed in this dissertation. In the
first case, classes are the authors of the messages, and the task is to assign each message
to its author from that fixed set. This task is called authorship identification. The other
observed classification of short messages is called opinion mining, or sentiment analysis.
Starting from the assumption that a short message carries a positive or negative attitude
about a thing, or is purely informative, classes can be: positive, negative and neutral.
These tasks are of great importance in the field of natural language processing and the
proposed solutions are language-independent, based on machine learning methods: support
vector machines, decision trees and gradient boosting. For all of these tasks, a
demonstration of the effectiveness of the proposed methods is shown on for the Serbian
language.
Osnovni cilj disertacije je stavljanje različitih zadataka klasifikacije teksta u
isti okvir, preslikavanjem ulaznih podataka u isti vektorski prostor lingvističkih atributa...