Himanshu Sharma
GLA University
[email protected]

Abstract: Machine learning plays a vital role in many applications such as data mining, natural language processing, image recognition and expert systems. In the development of natural language systems, corpus-based machine learning techniques are widely applied. This paper discusses machine learning methods such as classifiers, structured models and unsupervised learning methods that are applied to natural language processing tasks such as document classification, disambiguation, parsing, tagging and extraction. It also covers the different levels of linguistic analysis: lexical analysis, parsing, semantic analysis, part-of-speech tagging and discourse knowledge. The aim is to provide valuable information for further research.

Index Terms: Machine Learning, Corpus, Tagging, Parsing, Discourse.

I. INTRODUCTION

Machine learning is the science of giving computers the ability to learn without being explicitly programmed. We use machine learning many times a day without knowing it. It brings together computer science and statistics to exploit predictive power. Natural Language Processing (NLP), on the other hand, is an artificial intelligence method for communicating with computers using natural language. Speech and text serve as the input and output of an NLP system. We can broadly divide an NLP system into two parts: Natural Language Understanding (NLU) and Natural Language Generation (NLG). NLU refers to transforming a given input into some useful representation and analyzing different aspects of the language. NLG refers to generating meaningful sentence structures in some natural language from an internal representation.

Most existing approaches to NLP rely on machine learning, a type of artificial intelligence that examines patterns in data and uses them to develop a program's own understanding.
In the early 1990s, the application of machine learning techniques to natural language learning problems drew the attention of NLP researchers. As human beings, we understand our natural language easily and do not really think about what understanding actually involves. The corpus-based language acquisition techniques used by NLP researchers are based on statistics and information theory. In this paper, we discuss machine learning techniques that are applied to major NLP tasks such as POS tagging, parsing, semantic analysis, word sense disambiguation, text categorization, text summarization and information extraction. We first cover the linguistic levels that are open to machine learning approaches, and then explain the different types of machine learning approaches that are applied to natural language tasks.

II. LEVELS OF LANGUAGE ANALYSIS

The following are the different forms of knowledge relevant to natural language understanding.

Fig. 1 Linguistic levels: Lexical Analysis, Syntactic Analysis, Semantic Analysis, Discourse Integration, Pragmatic Analysis

A. Lexical Analysis

A lexicon is a collection of information about the words of a language and the lexical categories to which each word belongs. It is usually structured as a collection of lexical entries, such as ("bank" N). Lexical analysis identifies and analyzes the structure of words. It divides the whole text into paragraphs, sentences and words. A morphological analyzer identifies each word in a sentence and calls it a token. The tokens are then classified according to their use, that is, each is assigned its lexical category (grammatical class).

B. Syntactic Parsing

The main goal of parsing is to construct a parse tree for a sentence from a given grammar. We check whether a sentence is well formed or not: we analyze the words in a sentence and arrange them in a manner that shows the relationships between them. For example, "Ram goes to school" is a valid sentence, while "Ram school the goes" is rejected by a syntactic analyzer. Consider a grammar G:

S → NP VP
NP → ART N
NP → ART ADJ N
VP → V
VP → V NP

Here S is a sentence, NP is a noun phrase, VP is a verb phrase, ART is an article, ADJ is an adjective, N is a noun and V is a verb.

Parsing can be broadly classified into two types: (1) top-down parsing and (2) bottom-up parsing. In top-down parsing, the parser starts with the start symbol S of the grammar and applies production rules to rewrite it into a sequence of terminal symbols that matches the input sentence. If the sequence matches the input sentence, parsing is successful. If not, the process starts over and other production rules are applied. This is repeated until a combination of rules is found that explains the structure of the sentence. In bottom-up parsing, we start with the sentence and try to reach the start symbol S by replacing the right-hand side of a rule with its corresponding left-hand side.

C. Semantic Analysis

The main purpose of semantic analysis is to derive the (partial) meaning of a sentence from its syntactic structure. The sentence is analyzed for its meaningfulness. There are various approaches to semantic analysis. The first is the syntax-driven approach, in which the meaning of a sentence is composed from the meanings of its parts. The second is semantic grammar, in which the grammar is augmented with domain-specific semantics; it was developed mainly for dialogue systems.

D. Discourse Analysis

In the real world, the meaning of a sentence may depend on the meaning of the preceding sentence. This is known as discourse knowledge. Examples include interpreting pronouns and interpreting the temporal aspects of information. Consider "Ram hits a bike with a stone. It bounces back." Here "it" refers to the stone. Discourse structure depends on the application: monologue, dialogue or human-computer interaction.
Thus it deals with the study of the relationship between language and its context of use.

E. Pragmatic Analysis

In pragmatic analysis, we analyze how different situations affect the use of a sentence and how this use affects its meaning. It deals with those aspects of a sentence that require real-world knowledge, and focuses on the communicative use of language realized as intentional human action. Thus it concerns the practical usage of a sentence: what a sentence means in practice.

III. DIFFICULTIES IN NATURAL LANGUAGE UNDERSTANDING

A. Lexical Ambiguity

A word may belong to more than one lexical category. For example, the word "bank" can be a noun as well as a verb. This is an example of lexical ambiguity.

B. Syntactic Ambiguity

A sentence that has more than one parse tree is syntactically ambiguous. For example, "Tell all trains on Monday." admits two interpretations, as shown in Fig. 2.

Fig. 2 Two structural representations of "Tell all trains on Monday"

C. Referential Ambiguity

This type of ambiguity arises when a word or phrase can refer to two or more properties or things. The use of pronouns, for example, creates referential ambiguity. Consider the sentence "John killed the girl with the knife." One interpretation is "killed the girl who has the knife" and the other is "killed the girl using the knife".

Fig. 3 Two interpretations of "John killed the girl with the knife"

IV. MACHINE LEARNING BASED APPROACHES FOR NATURAL LANGUAGE PROCESSING

A. Naive Bayes Classifier

Naive Bayes is a simple and powerful technique for predictive modeling. Training is very fast because only the probability of each class and the conditional probabilities relating each class to the different input values (x) need to be calculated. No coefficients need to be fitted by optimization procedures.
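As a minimal sketch of why Naive Bayes training needs only counts, the following Python code estimates class priors and per-class word probabilities from a handful of toy sentences and then classifies a new one. The toy data, the function names train_naive_bayes and predict, and the use of add-one smoothing are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class; no optimization is involved."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class maximizing log P(class) + sum of log P(word | class)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing: an assumption of this sketch.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, invented for this illustration.
TOY_DOCS = [
    ("i love this phone".split(), "positive"),
    ("what a great day".split(), "positive"),
    ("i hate this traffic".split(), "negative"),
    ("what a terrible day".split(), "negative"),
]

model = train_naive_bayes(TOY_DOCS)
print(predict(model, "i love this great day".split()))
```

Training here is a single pass over the data that only increments counters, which is the reason the method is fast; a practical system would add tokenization and feature selection on top.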
It is widely used by machine learning and NLP researchers with great success. The algorithm is applied to many NLP disambiguation tasks such as part-of-speech tagging, word sense disambiguation and text categorization. As an example, consider sentiment analysis of tweets on Twitter. The tweets are collected using the Twitter streaming API and preprocessed using Natural Language Toolkit (NLTK) methods. Features of the tweets are then selected using the chi-square test, and a Naïve Bayes classifier is used to classify the tweets as positive or negative.

Fig. 4 Naïve Bayes classifier builder

B. N-gram and Hidden Markov Model

An N-gram model uses the previous N-1 words in a sequence to predict the N-th word, that is, it approximates the probability of each word in terms of its context. Suppose a language has N word types in its lexicon: how likely is word a to follow word b? For simplicity, we can use the following models:

Unigram model: Prob(pen)
Bigram model: Prob(pen | black)
Trigram model: Prob(pen | your black)

A Markov model (or Markov chain) assumes that the future behavior of a system depends only on a limited history. In an i-th order Markov model, the next state depends only on the i most recent states; hence an N-gram model is an (N-1)-order Markov model.

Hidden Markov Models (HMMs) are variations of Markov models. In an HMM we consider two layers of states: a visible layer that corresponds to the input symbols and a hidden layer, learned by the system, that corresponds to broader categories. HMMs are widely used in language disambiguation tasks such as POS tagging and named entity recognition and classification, and extensions of HMMs can also be used for word sense disambiguation. Consider the sentence "flies like a sand". Here "flies" can be a noun (N), a verb (V) and so on. We calculate the probability of "flies" as a verb, as a noun and so on. Similarly, we calculate the probability of "like" as a verb, a noun and so on.
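The tagging procedure in this example can be illustrated with a small Viterbi decoder over the tags N, V, P and ART. This is only a sketch: every probability below is a hypothetical toy value chosen by hand rather than an estimate from a corpus, and the names START, TRANS, EMIT and viterbi are introduced here for illustration.

```python
import math

TAGS = ["N", "V", "P", "ART"]

# Hypothetical toy parameters (assumptions, not corpus estimates).
START = {"N": 0.5, "V": 0.2, "P": 0.1, "ART": 0.2}
TRANS = {  # TRANS[previous_tag][next_tag]
    "N": {"N": 0.2, "V": 0.5, "P": 0.2, "ART": 0.1},
    "V": {"N": 0.3, "V": 0.1, "P": 0.3, "ART": 0.3},
    "P": {"N": 0.3, "V": 0.1, "P": 0.1, "ART": 0.5},
    "ART": {"N": 0.9, "V": 0.05, "P": 0.03, "ART": 0.02},
}
EMIT = {  # EMIT[tag][word]; unlisted words have probability 0
    "N": {"flies": 0.4, "like": 0.1, "sand": 0.5},
    "V": {"flies": 0.5, "like": 0.5},
    "P": {"like": 1.0},
    "ART": {"a": 1.0},
}


def logp(p):
    """Log of a probability, mapping 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words):
    """Return the most probable tag sequence under a first-order HMM."""
    best = [{t: logp(START[t]) + logp(EMIT[t].get(words[0], 0.0)) for t in TAGS}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in TAGS:
            # Best previous tag for reaching tag t at this position.
            prev = max(TAGS, key=lambda p: best[-1][p] + logp(TRANS[p][t]))
            scores[t] = (best[-1][prev] + logp(TRANS[prev][t])
                         + logp(EMIT[t].get(word, 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return list(reversed(path))


print(viterbi("flies like a sand".split()))
```

With these toy numbers the decoder tags the sentence as N P ART N; different parameter values would of course change the outcome, which is exactly what training on a tagged corpus determines.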
Depending on the resulting probabilities, we assign the most appropriate part of speech to each word in the sentence.

Fig. 5 Encoding the possible sequences using the Markov assumption (Start, then flies/V, flies/N, flies/P, flies/ART, then like/V, like/N, like/P, like/ART)

C. Log-Linear Models

Log-linear models have wide applications in NLP classification tasks. Logistic regression, the log-linear model most commonly used for binary classification, is also used to classify verbs for machine translation purposes. Log-linear models are also used for POS tagging.

We have some input domain $X$ and a finite label set $Y$. Our aim is to provide a conditional probability $P(y \mid x)$ for any $x \in X$ and $y \in Y$. A feature is a function $\phi_k : X \times Y \to \mathbb{R}$. Say we have $m$ features $\phi_k$ for $k = 1, \ldots, m$, collected into a feature vector $\phi(x, y) \in \mathbb{R}^m$. We also have a parameter vector $W \in \mathbb{R}^m$. We define

$$P(y \mid x, W) = \frac{e^{W \cdot \phi(x, y)}}{\sum_{y' \in Y} e^{W \cdot \phi(x, y')}}$$

D. Transformation Based Learning

Transformation-based learning is widely used for corpus-based natural language learning. The algorithm follows a mistake-driven greedy approach that generates a set of rules: we iteratively add the rule that best repairs the current errors. It is widely used in natural language problems such as POS tagging, word sense disambiguation and parsing. Samuel, Carberry and Vijay-Shanker describe a Monte Carlo version of the transformation-based learning algorithm that addresses the large-scale search problem of selecting rules from all possible combinations of a pre-selected set of conditions. Every condition is made up of a feature and a distance. The feature is a property of an utterance that may be relevant for dialogue act tagging, and the distance gives the relative position of the utterance to which the feature should be applied.

E. Decision Trees

Decision trees are widely used for classification problems. They belong to the family of machine learning methods and allow a hierarchical partitioning of a dataset.
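To make the idea of a hierarchical structure concrete, here is a minimal sketch of a hand-written decision tree and a generic routine that walks it. The questions follow the fit/unfit example of Fig. 6; the exact branch outcomes, the dictionary layout and the names FITNESS_TREE and classify are assumptions of this sketch, and a real decision tree algorithm would learn the tree from training data instead of having it written by hand.

```python
# Internal nodes are dicts holding a yes/no question and two subtrees;
# leaves are plain strings naming the predicted class.
FITNESS_TREE = {
    "question": lambda person: person["age"] < 35,
    "yes": {
        "question": lambda person: person["eats_junk_food"],
        "yes": "unfit",
        "no": "fit",
    },
    "no": {
        "question": lambda person: person["exercises_daily"],
        "yes": "fit",
        "no": "unfit",
    },
}


def classify(tree, instance):
    """Walk from the root to a leaf, answering one question per internal node."""
    node = tree
    while isinstance(node, dict):
        branch = "yes" if node["question"](instance) else "no"
        node = node[branch]
    return node


# Hypothetical instance, invented for this illustration.
person = {"age": 28, "eats_junk_food": True, "exercises_daily": False}
print(classify(FITNESS_TREE, person))
```

Each instance follows exactly one root-to-leaf path, so classification cost grows with the depth of the tree rather than with the size of the training set.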
Using a decision tree algorithm, knowledge is generated in the form of a hierarchical tree structure that can be used to classify instances based on a training dataset. An example of a tree structure used in natural language processing is the syntactic tree generated with a grammar. Parts of speech are very important in morphology because they give us a large amount of information about a word, its neighbours and the way the word is pronounced. The problem of assigning parts of speech to words (part-of-speech tagging) is therefore very important in speech and language processing. Decision trees can also be used in pragmatics: interpreting a dialogue act assumes that the system must decide whether a given input is a statement, a question, a directive or an acknowledgement. We also use decision trees to resolve ambiguity in natural language in tasks such as text summarization, text categorization and word sense disambiguation.

Fig. 6 Decision tree to find whether a person is fit or not

Suppose we would like to predict whether a person is fit, given information such as age, eating habits and physical activity. The decision (internal) nodes are questions such as "What is his/her age?", "Does he/she exercise daily?" and "Does he/she eat a lot of junk food?", and the leaves represent the outcomes "fit" or "unfit". This example is a binary classification problem (a yes/no type problem).

F. Clustering Algorithms

These are unsupervised machine learning algorithms and are used in various NLP tasks such as semantic classification, syntactic classification, document retrieval and machine translation. Modified versions of these algorithms are used to handle noun and pronoun phrase co-reference resolution in information extraction. Clustering techniques can be combined with classification, machine learning, tagging, stemming and parsing.
This can also improve the probability estimate for a whole sentence by looking beyond traditional n-gram models and developing measures of key, linked pairs of words in the sentence.

G. Support Vector Machines

Support Vector Machines (SVMs) work on the principle of structural risk minimization. SVMs are widely and efficiently used in pattern recognition problems. In natural language processing, SVMs are used in text categorization tasks. The SVM-based Recursive Feature Elimination algorithm is an important method for feature selection in natural language processing. SVMs also give good performance on chunking tasks.

H. Neural Networks

Neural networks are used in many natural language processing problems such as speech recognition and synthesis, optical character recognition, part-of-speech tagging, parsing, sentence analysis, PP-attachment disambiguation and text categorization.

In NLP, words and their nearby contexts are quite important: a word surrounded by related context is important, while a word surrounded by unrelated context is less so. Every word is mapped to a vector described in terms of its features (which in turn relate to the word's context), and neural networks can be used to learn which features are important and maximize a word vector's score.

A closely related model is the semantic network. In semantic networks, nodes represent concepts and connections represent semantically meaningful relationships between these concepts. These networks are better described as associative network models than as neural (brain) models. The activation rules that implement data retrieval in these associative networks, often referred to as spreading activation, typically produce a joint search. Therefore, they are also called "spreading activation" models.

I. Genetic Algorithms

Genetic algorithms are used in dialogue systems, which are computer-based systems that communicate with humans in natural language, in the form of text or speech. Genetic algorithms are also used in language generation, which aims to create natural language from a data representation such as a knowledge base. Such language generators are often used to create textual forms of natural language. Genetic algorithms are also used in story generation, and in recognizing lexical inference: given two terms a and b, predicting whether the meaning of b can be inferred from a.

J. Instance Based Learning

Instance-based learning (IBL) methods model real-valued or discrete-valued target functions. IBL methods initially just store the presented training data. These algorithms are widely used in many areas of artificial intelligence, and in many NLP problems such as information extraction, lexical and semantic disambiguation of complete sentences, chunking, context-sensitive parsing, text categorization and semantic interpretation. Instance-based learning with automatic feature selection is a widely used approach in word sense disambiguation.

V. CONCLUSION

In this paper, we described many machine learning methods that are used in natural language processing. NLP tasks such as POS tagging, syntactic parsing, semantic analysis, word sense disambiguation, text categorization, text summarization, information extraction, language generation and dialogue-based systems are addressed in terms of machine learning approaches. The paper has surveyed the main methods used at the different linguistic levels. In future work, disambiguation problems should be addressed in detail using the given machine learning methods.

REFERENCES

1 Leech, G., Garside, R. and Atwell, E. 1983. The automatic grammatical tagging of the LOB corpus. ICAME Journal of the International Computer Archive of Modern English, volume 7.
2 Samuel, K., Carberry, S. and Vijay-Shanker, K. 1998. Dialogue act tagging with transformation-based learning. In Proceedings of COLING/ACL-98, Montreal, volume 2, pages 1150-1156.
3 Brill, E. 1995. Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Computational Linguistics, volume 21(4), pages 543-566.
4 OpenNLP, http://opennlp.sourceforge.net
5 Charniak, E. 1993. Statistical Language Learning. The MIT Press, Cambridge, Massachusetts.
6 Bishop, C. 2006. Pattern Recognition and Machine Learning. Springer.
