Identifying the appropriate features in the text to resolve semantic ambiguity

Number of pages: 87 File Format: word File Code: 31079
Year: 2013 University Degree: Master's degree Category: Computer Engineering
  • Summary of Identifying the appropriate features in the text to resolve semantic ambiguity

    Master's Thesis in Computer-Software Engineering

    Abstract

    Identifying appropriate features in the text to resolve semantic ambiguity

    It can be confidently claimed that the present age is the age of the information explosion, and language is perhaps the most important obstacle to the transmission of information. Using machines to process and translate texts has therefore become an undeniable necessity. However, the problems that machine translators face prevent this task from being carried out with sufficient quality and accuracy.

    One of the issues that most influences the accuracy and quality of machine translation is semantic ambiguity resolution: improving its accuracy improves the accuracy of the entire translation process. The purpose of semantic ambiguity resolution is to choose, for a word that has several different meanings, the meaning that is appropriate to the text. In this research, we have therefore examined different methods and ideas and taken a step in this direction by presenting a different method.

    The method presented in this thesis is a knowledge-based method that resolves ambiguity by using additional information about an ambiguous word in the text together with a scoring scheme. On the one hand, using WordNet and other resources that complement WordNet, we prepare a list of words related to each meaning of the ambiguous word; on the other hand, we extract from the chosen corpus the words that accompany the ambiguous word in the text. Then, using a scoring formula, we select the meaning that receives the highest score and therefore appears most relevant. Finally, we measure the accuracy of the presented method and compare it with the accuracy of other methods.
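
    The general idea of scoring candidate meanings against the words surrounding an ambiguous word can be sketched as follows. This is only a minimal illustration, not the scoring formula of the thesis, which is not given in this excerpt. The related-word lists are hand-written stand-ins for what would be collected from WordNet and its complementary resources, and the names `SENSE_RELATED_WORDS`, `score_senses` and `pick_sense` are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-written related-word lists. In a real system these would
# be gathered from WordNet and complementary resources for each meaning.
SENSE_RELATED_WORDS = {
    "bank#finance": {"money", "deposit", "loan", "account", "institution"},
    "bank#river": {"river", "water", "shore", "slope", "stream"},
}


def score_senses(context_words, sense_related_words):
    """Score each meaning by how many context tokens appear in its related-word list."""
    context_counts = Counter(word.lower() for word in context_words)
    return {
        sense: sum(count for word, count in context_counts.items() if word in related)
        for sense, related in sense_related_words.items()
    }


def pick_sense(context_words, sense_related_words):
    """Return the meaning with the highest score (the first one wins on a tie)."""
    scores = score_senses(context_words, sense_related_words)
    return max(scores, key=scores.get)


# Words accompanying the ambiguous word "bank" in an example sentence.
context = "he opened an account at the bank to deposit money".split()
scores = score_senses(context, SENSE_RELATED_WORDS)
best = pick_sense(context, SENSE_RELATED_WORDS)
print(scores, best)
```

    A plain overlap count is the simplest possible choice here; a weighted score (for example, giving more weight to synonyms than to distant hypernyms) would follow the same structure.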

    Keywords: semantic ambiguity resolution, knowledge-based approach, WordNet, Extended WordNet, machine translation

    Chapter 1

    Introduction

    1-1- Introduction

    The production of a huge volume of articles and documents prompted the scientific community to turn to the field of natural language processing[1] in order to take advantage of the capabilities of automatic methods for processing these texts. Also, given that lists of the meanings of words and phrases (dictionaries) exist, and that some countries even have institutions that determine how a language should be used, it seems possible to mechanize the understanding of a language by a computer [1]. The languages we use in our daily life to communicate with others may seem simple. In reality, human languages have many complexities, which have led to the formation of many sub-fields within natural language processing, such as machine translation[2], information retrieval[3], text processing[4], speech recognition[5], grammar analysis[6], and semantic ambiguity resolution[7]. This thesis deals with semantic ambiguity resolution, a complex and important topic that is also discussed in fields such as machine translation and information retrieval, where it is a valuable and integral part of the systems.

    In fact, this topic originates from the ambiguity inherent in natural languages, although humans do not notice these ambiguities most of the time. What resolves ambiguities for native speakers is their linguistic ability, their knowledge of the world around them, the possibility of asking again when something is or feels ambiguous, and, in general, the body of linguistic and non-linguistic information that native speakers possess [40]. Semantic ambiguity resolution has been studied and applied in many branches of natural language processing, and its main and most obvious application is machine translation. Therefore, in this chapter, we first give a brief overview of natural language processing and its sub-branches, and then briefly explain the concept of machine translation and its methods.

    1-2- Natural language processing

    Natural language processing, usually abbreviated as NLP, is one of the needs of the technological era for making optimal use of information resources. Today, with the growth in the volume of documents produced and the need to store, categorize, retrieve and process them quickly and automatically, this branch is receiving more attention.

    Natural language is the language that we use to write and speak in our daily social interactions. There are many different natural languages, which may have different spoken and written forms and are independent of each other. Since computer technology entered human life, the processing of natural languages and conversations has attracted the attention of many scientists. Even Alan Turing's idea [9] of an intelligent machine and his definition of artificial intelligence [10] were related, in the first instance, to the processing of natural languages. Many other efforts have been made in this direction; the ELIZA program is one of their products. ELIZA conversed with a human through typed text, processing the person's sentences and giving an appropriate answer. It can therefore be said that natural language processing is one of the important branches of the wide field of artificial intelligence, to the extent that many experts in artificial intelligence believe NLP is the most important task the field should address. The reason they give is that natural language processing opens the way to direct human-computer communication through conversation, so that conventional programming and the conventions of operating systems could be set aside. Also, if a computer could understand and speak a human language, many tasks that currently have to be designed by software engineers would no longer be needed. But the scale and complexity of human languages have made it difficult to fully achieve this capability.

    Natural language processing tries to give the computer the ability to understand commands written in standard human languages. The goal is a computer that can analyze and understand human language and can even produce natural language. Obviously, achieving this goal requires broad knowledge of the language, so the knowledge of linguists is needed in addition to that of computer science researchers. In natural language processing, the answers to the following four questions should be studied:

    What words does a language consist of?

    How are words combined to form language sentences?

    What is the meaning of language words?

    How are the meanings of words used to create the meaning of sentences?

    In fact, the main goal of NLP is to mechanize the process of understanding concepts expressed in a natural human language. More precisely, natural language processing is the use of computers to process spoken and written language in such a way that computers take natural language as input and produce it as output. This makes it possible to translate between languages, to use web pages and written databases to answer questions, or to talk to devices, for example to get advice.

    Generally, this branch works by imitating natural human languages. Human complexity, seen from the perspective of psychology, also affects interactive communication. Natural language processing is therefore considered a very attractive approach to communication between humans and machines, and if it is fully realized, it can lead to remarkable changes. The following figure shows a general outline of natural language processing architecture:

    Natural language processing is usually considered an AI-complete problem, because solving it requires the machine to have a high level of understanding of the outside world and of human states. One of the basic obstacles in this field is the need for the computer to understand meaning. For the computer to understand a sentence correctly and grasp the information hidden in it, it is sometimes necessary to understand the meaning of the words in the sentence; familiarity with grammar alone is not enough. For example, the sentence "Hasan did not eat the apple because it was ugly." and the sentence "Hasan did not eat the apple because he was full." have the same grammatical structure.

  • Contents & References of Identifying the appropriate features in the text to resolve semantic ambiguity

    List:

    Chapter One: Introduction

    1-1- Introduction

    1-2- Natural language processing

    1-3- Machine translation

    1-3-1- Machine translation methods

    1-3-1-1- Rule-based methods

    1-3-1-2- Corpus-based methods

    1-3-2- Factors affecting the quality of translation

    1-4- Thesis structure

    Chapter Two: Resolving Semantic Ambiguity

    2-1- Introduction

    2-2- Types of knowledge sources

    2-2-1- Structured knowledge sources

    2-2-2- Unstructured knowledge sources

    2-2-2-1- Another classification of corpora

    2-3- Different approaches in resolving semantic ambiguity

    2-3-1- Corpus-based approach

    2-3-1-1- Supervised systems

    2-3-1-2- Unsupervised systems

    2-3-2- Knowledge-based approach

    2-3-3- Combined and creative approach

    2-4- Evaluation factors

    2-4-1- Coverage

    2-4-2- Accuracy

    2-4-3- Precision and recall

    2-4-4- F-score

    Chapter Three: Review of Previous Related Work

    3-1- Introduction

    3-2- Supervised methods

    3-3- Unsupervised methods

    3-4- Knowledge-based methods

    3-5- Combined and creative methods

    Chapter Four: Proposed Method

    4-1- Introduction

    4-2- Introducing the tools and resources used

    4-2-1- Stemmer

    4-2-2- Part-of-speech tagging

    4-2-3- WordNet

    4-2-4- Extended WordNet

    4-2-5- Domain WordNet

    4-3- Steps of the proposed method

    4-3-1- Extraction of associated words

    4-3-1-1- Preprocessing

    4-3-2- Word list extraction

    4-3-2-1- Synonyms and definitions

    4-3-2-2- All semantic relations

    4-3-2-3- Hypernyms on several levels

    4-3-2-4- Word domains

    4-3-2-5- Scoring

    Chapter Five: Implementation and Evaluation

    5-1- Introduction

    5-2- Results

    Chapter Six: Summary and Conclusion

    6-1- Conclusion

    6-2- Future work

    List of sources

     

    Source:

    [1] D. Martinez Iraolak, “Supervised Word Sense Disambiguation: Facing Current Challenges,” University of the Basque Country, 2004.

    [2] A. H. Rasekh, “Word Sense Disambiguation Based on Conceptual and Morphological Analysis of Words,” Shiraz University, 2012.

    [3] A. R. Rezapour, “An Investigation into Knowledge-Based Machine Translation in Persian Language,” Shiraz University, 2011. Disambiguation,” Shiraz University, 2013.

    [6] http://people.lett.unitn.it/baroni/tp/materials/WSD-HLT-2011.pdf

    [7] A. E. Agirre and P. G. Edmonds, "Word Sense Disambiguation: Algorithms and Applications," Springer Science + Business Media, Vol. 33, 2006.

    [8] R. Navigli, "Word Sense Disambiguation: A Survey," ACM Computing Surveys, Vol. 41, No. 2, Article 10, 2009.

    [9] http://icame.uib.no/brown/bcm.html#n1

    [10] H. T. Ng and H. B. Lee, "Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach," Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pp. 40-47, 1996.

    [11] A. R. Rezapour, S. M. Fakhrahmad and M. H. Sadreddini, "Applying Weighted KNN to Word Sense Disambiguation," Proceedings of the World Congress on Engineering 2011, Vol. III, 2011.

    [12] M. Nameh, S.M. Jahromi, "A New Approach to Word Sense Disambiguation," Proceedings of the World Congress on Engineering 2011, Vol. I, 2011.

    [13] T. Pedersen, "A Simple Approach to Building Ensembles of Naïve Bayesian Classifiers for Word Sense Disambiguation," Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, pp. 63-69, 2000.

    [14] S. Elmougy, T. Hamza and H. M. Noaman, “Naïve Bayes Classifier for Arabic Word Sense Disambiguation,” Proceedings of the INFOS2008, pp. 16-21, 2008.

    [15] P. F. Brown, S. A. DellaPietra, V. J. DellaPietra and R. L. Mercer, “Word Sense Disambiguation Using Statistical Methods,” Annual Meeting of the Association for Computational Linguistics, pp. 264-70, 1991.

    [16] I. Dagan and A. Itai, “Word sense disambiguation using a second language monolingual corpus,” Association for Computational Linguistics, 20(4): 563-96, 1994.

    [17] M. Soltani and H. Faili, "A Statistical Approach on Persian Word Sense Disambiguation," The 7th International Conference on Informatics and Systems (INFOS), pp. 1-6, 2010.

    [18] D. Yarowsky, "Unsupervised word sense disambiguation rivaling supervised methods," Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics (ACL '95), pp. 189-196, 1995.

    [19] X. Wang, "Automatic acquisition of English topic signatures based on a second language," Proceedings of the ACL 2004 workshop on Student research, 2004.

    [20] M. Lesk, “Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone From an Ice Cream Cone,” Proceedings of the 5th Annual International Conference on Systems Documentation, New York, pp. 24-26, 1986.

    [21] S. Banerjee and T. Pedersen, "An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet," Proceedings of the Third Computational linguistics and intelligent text processing, pp. 136-145, 2002.

    [22] A. Kilgarriff and J. Rosenzweig, "English SENSEVAL: Report and results," Proceedings of the 2nd International Conference on Language Resources and Evaluation, 2000.

    [23] S. Banerjee and T. Pedersen, "Extended gloss overlaps as a measure of semantic relatedness," International Joint Conference on Artificial Intelligence, vol. 18, pp. 805-810, 2003.

    [24] F. Vasilescu, P. Langlais and G. Lapalme, “Evaluating Variants of the Lesk Approach for Disambiguating Words,” Proceedings of the Conference of Language Resources and Evaluations, pp. 633-636, 2004.

    [25] S. Kumar Naskar and S. Bandyopadhyay, "JU-SKNSB: extended WordNet based WSD on the English all-words task at SemEval-1," Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 203-206, 2007.

    [26] D. Yarowsky, "Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora," Proceedings of 15th International Conference on Computational Linguistics, pp.454-60, 1992.

    [27] R. Mihalcea and D. Moldovan, "An Iterative Approach to Word Sense Disambiguation," Proceedings of FLAIRS, pp. 219-223, 2000.

    [28] A. H. Rasekh and M. H. Sadreddini, "Word Sense Disambiguation Algorithms Based on the Context, Structure and Meaning," International Journal of Signal and Data Processing, Vol. 2, pp. 40-47, 2013.

    [29] S.M. Fakhrahmad, A.R. Rezapour, M. Zolghadri Jahromi and M.H. Sadreddini, "A New Word Sense Disambiguation System Based on Deduction," Proceedings of the World Congress on Engineering, vol. II, pp. 1276-1281, 2011.

    [30] M. F. Porter, “An Algorithm for Suffix Stripping,” in Program: Electronic Library and Information Systems, Vol. 14 Issue: 3, pp. 130 – 137, 1980.

    [31] R. Krovetz, “Viewing Morphology as an Inference Process,” in R. Korfhage et al., Proc 16th ACM SIGIR Conference, Pittsburgh, pp. 191-202, June 27-July 1,1993.

    [32] K. Glass and S. Bangay, “Evaluating Parts-of-Speech Taggers for Use in a Text-to-Scene Conversion System,” Proceedings of SAICSIT, pp.1-9, 2005.

    [33] http://nlp.stanford.
