
Identifying the appropriate features in the text to resolve semantic ambiguity

Number of pages: 87 File Format: word File Code: 31079
Year: 2013 University Degree: Master's degree Category: Computer Engineering

Tags/Keywords: A knowledge-based perspective - artificial intelligence - Language processing - Natural language processing - Resolving semantic ambiguity - Wordnet

Summary of Identifying the appropriate features in the text to resolve semantic ambiguity

Master's Thesis in Computer-Software Engineering

Abstract

Identifying appropriate features in the text to resolve semantic ambiguity

It can be boldly claimed that the present age is the age of the information explosion, and language can perhaps be considered the most important obstacle to the transmission of information. The use of machines to process and translate texts has therefore become an undeniable necessity. However, the problems that machine translators face prevent them from achieving sufficient quality and accuracy.

One of the factors that most influences the accuracy and quality of machine translation is the resolution of semantic ambiguity; improving its accuracy improves the accuracy of the entire translation process. The purpose of semantic ambiguity resolution is to choose, for a word that has several different meanings, the meaning that fits the text. In this research we have therefore examined different methods and ideas and taken a step in this direction by presenting a different method.

The method presented in this thesis is a knowledge-based method that resolves ambiguity by using additional information about an ambiguous word in the text together with a scoring method. On the one hand, using WordNet and other sources that complement WordNet, we prepare a list of words related to each meaning of the ambiguous word; on the other hand, we extract from the corpus the words that accompany the ambiguous word in the text. Then, using a scoring formula, we select the meaning that receives the highest score and therefore seems most relevant. Finally, we measure the accuracy of the presented method and compare it with the accuracy of other methods.
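The scoring idea described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual scoring formula: the related-word lists, the stopword list, the window size, and all function names are hypothetical stand-ins, and a real system would derive the related words from WordNet rather than from a hand-written dictionary. Each sense is scored by how many of its related words occur near the ambiguous word.

```python
from collections import Counter

# Hypothetical related-word lists per sense; in the thesis these would be
# derived from WordNet and complementary resources, not written by hand.
SENSE_RELATED_WORDS = {
    "bank": {
        "financial_institution": {"money", "deposit", "loan", "account", "cash"},
        "river_side": {"river", "water", "shore", "slope", "stream"},
    },
}

# Small illustrative stopword list (an assumption, not from the thesis).
STOPWORDS = {"a", "an", "the", "of", "on", "at", "to", "and", "he", "she"}


def context_words(tokens, target, window=4):
    """Return non-stopword tokens within `window` positions of the first occurrence of `target`."""
    lowered = [t.lower() for t in tokens]
    idx = lowered.index(target)  # raises ValueError if the target is absent
    neighbours = lowered[max(0, idx - window):idx] + lowered[idx + 1:idx + 1 + window]
    return [w for w in neighbours if w not in STOPWORDS]


def score_senses(tokens, target, window=4):
    """Score each sense by counting its related words among the context words."""
    counts = Counter(context_words(tokens, target, window))
    return {
        sense: sum(counts[w] for w in related)
        for sense, related in SENSE_RELATED_WORDS[target].items()
    }


def disambiguate(tokens, target, window=4):
    """Pick the highest-scoring sense; ties resolve to the first listed sense."""
    scores = score_senses(tokens, target, window)
    return max(scores, key=scores.get)


sentence = "He sat on the bank of the river and watched the water".split()
print(score_senses(sentence, "bank"))
print(disambiguate(sentence, "bank"))
```

A plain overlap count is the simplest possible scoring choice; a weighted variant (for example, giving synonyms more weight than distant hypernyms) would fit the same structure.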

Keywords: semantic ambiguity resolution, knowledge-based approach, WordNet, Extended WordNet, machine translation

Chapter 1

Introduction

1-1- Introduction

The production of a huge volume of articles and documents prompted the scientific community to turn to the field of natural language processing[1] in order to take advantage of the capabilities of automatic methods for processing these texts. Also, given the existence of dictionaries, which list the meanings of words and phrases, and even of institutions charged in some countries with determining how a language is to be used, it seems possible to mechanize the understanding of a language by a computer [1]. The languages we use in our daily lives to communicate with others may seem simple. In reality, these human languages have many complexities, which have led to the formation of many subfields within natural language processing, such as machine translation[2], information retrieval[3], text processing[4], speech recognition[5], grammar analysis[6], and semantic ambiguity resolution[7]. This thesis deals with the last of these. Semantic ambiguity resolution is a complex and important topic that also arises in fields such as machine translation and information retrieval, where it is valuable as an integral part of such systems.

This topic originates from the ambiguity inherent in natural languages, although these ambiguities are hidden from human sight most of the time. What resolves ambiguities among native speakers is their linguistic ability, their knowledge of the world around them, their asking again when something is or feels ambiguous, and in general the body of linguistic and non-linguistic information that native speakers are equipped with [40]. Semantic ambiguity resolution has been discussed and applied in many branches of natural language processing, among which the main and most obvious use case is machine translation. Therefore, in this chapter we first briefly review the scope of natural language processing and its sub-branches, and then briefly explain the concept of machine translation and its methods.

1-2- Natural language processing

Natural language processing, usually abbreviated as NLP, is one of the needs of the technological era for making optimal use of information resources. Today, with the growth in the volume of documents produced and the need to store, categorize, retrieve, and process them quickly and mechanically, attention to this branch has become more prominent.

Natural language is the language that we use in our daily social interactions to write and speak. There are many different natural languages, which may have different spoken and written forms and are independent of one another. The processing of natural languages and conversations has attracted the attention of many scientists since computer technology entered human life. Even Alan Turing's idea [9] of his intelligent machine and his definition of artificial intelligence [10] were related, in the first instance, to the processing of natural languages. Many other efforts have been made in pursuit of this goal; the ELIZA program, for example, is one product of these efforts. ELIZA processed the sentences that a person typed to it and returned an appropriate answer. It can therefore be said that natural language processing is one of the important branches of the wide field of artificial intelligence, to the extent that many experts in artificial intelligence believe that NLP is the most important task the field should address. The reason they give is that natural language processing opens the way to direct human-computer communication through conversation. In this way, conventional programming and the conventions related to operating systems would be set aside. Also, if a computer could understand and speak a human language, many tasks that software engineers must now design would no longer be needed. But the dimensions and complexities of human languages have made it difficult to achieve this capability fully.

Natural language processing tries to give the computer the ability to understand commands written in standard human languages. This means having a computer that can analyze and understand human language and can even produce natural language. Obviously, achieving this goal requires wide knowledge of the language. Therefore, in addition to computer science researchers, the knowledge of linguists is also necessary. In the context of natural language processing, the answers to the following four questions should be studied:

What words does a language consist of?

How are words combined to form language sentences?

What is the meaning of language words?

How are the meanings of words used to create the meaning of sentences?

In fact, the main goal of NLP is to mechanize the process of understanding the concepts expressed in a natural human language. More precisely, natural language processing is the use of computers to process spoken and written language in such a way that computers take natural language as input and produce it as output. This makes it possible to translate between languages, to use web pages and written databases to answer questions, or to talk to devices, for example to get advice.

Generally, this branch works by imitating natural human languages. Here, viewed from the perspective of psychology, human complexity affects interactive communication. Natural language processing is therefore considered a very attractive approach to communication between humans and machines, and if it is fully realized it can lead to remarkable changes. The following figure shows a general outline of natural language processing architecture:

The problem of natural language processing is usually considered an AI-complete problem, because solving it requires the machine to have a high level of understanding of the outside world and of human states. One of the basic obstacles in this field is the need for the computer to understand meaning. For the computer to understand a sentence correctly and to grasp the information hidden in it, it is sometimes necessary to understand the meaning of the words in the sentence; familiarity with grammar alone is not enough. For example, the sentences "Hasan did not eat the apple because it was ugly." and "Hasan did not eat the apple because he was full." have the same grammatical structure, yet deciding whether the final clause describes the apple or Hasan requires knowing what the words mean.
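The point of the two example sentences can be made concrete with a small sketch. The part-of-speech tags below are assigned by hand for illustration (they are an assumption, not the output of any tagger used in the thesis), and the function names are hypothetical. The sketch shows that the two sentences produce an identical tag sequence, so a purely grammatical view cannot separate them; only the words themselves differ.

```python
# Hand-assigned coarse part-of-speech tags (an assumption for illustration,
# not the output of a real tagger).
SENTENCE_A = [
    ("Hasan", "PROPN"), ("did", "AUX"), ("not", "PART"), ("eat", "VERB"),
    ("the", "DET"), ("apple", "NOUN"), ("because", "SCONJ"),
    ("it", "PRON"), ("was", "AUX"), ("ugly", "ADJ"),
]
SENTENCE_B = [
    ("Hasan", "PROPN"), ("did", "AUX"), ("not", "PART"), ("eat", "VERB"),
    ("the", "DET"), ("apple", "NOUN"), ("because", "SCONJ"),
    ("he", "PRON"), ("was", "AUX"), ("full", "ADJ"),
]


def tag_sequence(tagged_sentence):
    """Return only the grammatical categories, discarding the words."""
    return [tag for _, tag in tagged_sentence]


def differing_words(tagged_a, tagged_b):
    """Return the word pairs at positions where the two sentences differ."""
    return [(a, b) for (a, _), (b, _) in zip(tagged_a, tagged_b) if a != b]


same_structure = tag_sequence(SENTENCE_A) == tag_sequence(SENTENCE_B)
print(same_structure)
print(differing_words(SENTENCE_A, SENTENCE_B))
```

Because the tag sequences match, any decision about what the final clause refers to has to come from word meaning and world knowledge rather than from the grammatical pattern.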
Contents & References of Identifying the appropriate features in the text to resolve semantic ambiguity

Table of contents:

Chapter One: Introduction

1-1- Introduction

1-2- Natural language processing

1-3- Machine translation

1-3-1- Machine translation methods

1-3-1-1- Rule-based methods

1-3-1-2- Corpus-based methods

1-3-2- Factors affecting the quality of translation

1-4- Thesis structure

Chapter Two: Resolving Semantic Ambiguity

2-1- Introduction

2-2- Types of knowledge sources

2-2-1- Structured knowledge sources

2-2-2- Unstructured knowledge sources

2-2-2-1- Another classification of corpora

2-3- Different approaches to resolving semantic ambiguity

2-3-1- Corpus-based approach

2-3-1-1- Supervised systems

2-3-1-2- Unsupervised systems

2-3-2- Knowledge-based approach

2-3-3- Hybrid and innovative approaches

2-4- Evaluation factors

2-4-1- Coverage

2-4-2- Accuracy

2-4-3- Precision and recall

2-4-4- F-score

Chapter Three: Review of Previous Related Work

3-1- Introduction

3-2- Supervised methods

3-3- Unsupervised methods

3-4- Knowledge-based methods

3-5- Hybrid and innovative methods

Chapter Four: Proposed Method

4-1- Introduction

4-2- Introducing the tools and resources used

4-2-1- Stemmer

4-2-2- Part-of-speech tagging

4-2-3- WordNet

4-2-4- Extended WordNet

4-2-5- Domain WordNet

4-3- Steps of the proposed method

4-3-1- Extraction of associated words

4-3-1-1- Preprocessing

4-3-2- Word list extraction

4-3-2-1- Synonyms and definitions

4-3-2-2- All semantic relations

4-3-2-3- Hypernyms on several levels

4-3-2-4- Domain of words

4-3-2-5- Scoring

Chapter Five: Implementation and Evaluation

5-1- Introduction

5-2- Results

Chapter Six: Summary and Conclusion

6-1- Conclusion

6-2- Future work

List of sources

References:

[1] D. Martinez Iraolak, "Supervised Word Sense Disambiguation: Facing Current Challenges," University of the Basque Country, 2004.

[2] A. H. Rasekh, "Word Sense Disambiguation Based on Conceptual and Morphological Analysis of Words," Shiraz University, 2012.

[3] A. R. Rezapour, "An Investigation into Knowledge-Based Machine Translation in Persian Language," Shiraz University, 2011.

… Disambiguation," Shiraz University, 2013.

[6] http://people.lett.unitn.it/baroni/tp/materials/WSD-HLT-2011.pdf

[7] A. E. Agirre and P. G. Edmonds, "Word Sense Disambiguation: Algorithms and Applications," Springer Science + Business Media, Vol. 33, 2006.

[8] R. Navigli, "Word Sense Disambiguation: A Survey," ACM Computing Surveys, Vol. 41, No. 2, Article 10, 2009.

[9] http://icame.uib.no/brown/bcm.html#n1

[10] H. T. Ng and H. B. Lee, "Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach," Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pp. 40-47, 1996.

[11] A. R. Rezapour, S. M. Fakhrahmad and M. H. Sadreddini, "Applying Weighted KNN to Word Sense Disambiguation," Proceedings of the World Congress on Engineering 2011, Vol. III, 2011.

[12] M. Nameh, S. M. Jahromi, "A New Approach to Word Sense Disambiguation," Proceedings of the World Congress on Engineering 2011, Vol. I, 2011.

[13] T. Pedersen, "A Simple Approach to Building Ensembles of Naïve Bayesian Classifiers for Word Sense Disambiguation," Proceedings of the 1st Annual Meeting of the North American Chapter of the Association for Computational Linguistics, Seattle, pp. 63-69, 2000.

[14] S. Elmougy, T. Hamza and H. M. Noaman, "Naïve Bayes Classifier for Arabic Word Sense Disambiguation," Proceedings of the INFOS2008, pp. 16-21, 2008.

[15] P. F. Brown, S. A. DellaPietra, V. J. DellaPietra and R. L. Mercer, "Word Sense Disambiguation Using Statistical Methods," Annual Meeting of the Association for Computational Linguistics, pp. 264-270, 1991.

[16] I. Dagan and A. Itai, "Word Sense Disambiguation Using a Second Language Monolingual Corpus," Computational Linguistics, 20(4): 563-596, 1994.

[17] M. Soltani and H. Faili, "A Statistical Approach on Persian Word Sense Disambiguation," The 7th International Conference on Informatics and Systems (INFOS), pp. 1-6, 2010.

[18] D. Yarowsky, "Unsupervised Word Sense Disambiguation Rivaling Supervised Methods," Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189-196, 1995.

[19] X. Wang, "Automatic Acquisition of English Topic Signatures Based on a Second Language," Proceedings of the ACL 2004 Workshop on Student Research, 2004.

[20] M. Lesk, "Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone From an Ice Cream Cone," Proceedings of the 5th Annual International Conference on Systems Documentation, New York, pp. 24-26, 1986.

[21] S. Banerjee and T. Pedersen, "An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet," Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing, pp. 136-145, 2002.

[22] A. Kilgarriff and J. Rosenzweig, "English SENSEVAL: Report and Results," Proceedings of the 2nd International Conference on Language Resources and Evaluation, 2000.

[23] S. Banerjee and T. Pedersen, "Extended Gloss Overlaps as a Measure of Semantic Relatedness," International Joint Conference on Artificial Intelligence, Vol. 18, pp. 805-810, 2003.

[24] F. Vasilescu, P. Langlais and G. Lapalme, "Evaluating Variants of the Lesk Approach for Disambiguating Words," Proceedings of the Conference of Language Resources and Evaluations, pp. 633-636, 2004.

[25] S. Kumar Naskar and S. Bandyopadhyay, "JU-SKNSB: Extended WordNet Based WSD on the English All-Words Task at SemEval-1," Proceedings of the 4th International Workshop on Semantic Evaluations, pp. 203-206, 2007.

[26] D. Yarowsky, "Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora," Proceedings of the 15th International Conference on Computational Linguistics, pp. 454-460, 1992.

[27] R. Mihalcea and D. Moldovan, "An Iterative Approach to Word Sense Disambiguation," Proceedings of FLAIRS, pp. 219-223, 2000.

[28] A. H. Rasekh and M. H. Sadreddini, "Word Sense Disambiguation Algorithms Based on the Context, Structure and Meaning," International Journal of Signal and Data Processing, Vol. 2, pp. 40-47, 2013.

[29] S. M. Fakhrahmad, A. R. Rezapour, M. Zolghadri Jahromi and M. H. Sadreddini, "A New Word Sense Disambiguation System Based on Deduction," Proceedings of the World Congress on Engineering, Vol. II, pp. 1276-1281, 2011.

[30] M. F. Porter, "An Algorithm for Suffix Stripping," Program: Electronic Library and Information Systems, Vol. 14, Issue 3, pp. 130-137, 1980.

[31] R. Krovetz, "Viewing Morphology as an Inference Process," in R. Korfhage et al., Proceedings of the 16th ACM SIGIR Conference, Pittsburgh, pp. 191-202, June 27-July 1, 1993.

[32] K. Glass and S. Bangay, "Evaluating Parts-of-Speech Taggers for Use in a Text-to-Scene Conversion System," Proceedings of SAICSIT, pp. 1-9, 2005.

[33] http://nlp.stanford.
