Tuesday, March 19, 2019

Essay --

Chapter Four: Related Work. There are several works and studies on Arabic text categorization; each examines the problem from some angles and leaves out others, depending on the type of study. In [68], Arabic text was classified and the result was found to be very robust and reliable without morphological analysis. In [71], a comparative study was carried out using N-grams with two measures, the Manhattan measure and Dice's measure; experiments on four categories showed that N-grams with Dice's measure performed better than with the Manhattan measure. Another work [83], Text Classification from Labeled and Unlabeled Documents using EM, proposed an algorithm that uses expectation-maximization with the naive Bayes classifier to learn from both labeled and unlabeled documents. In the first step, a classifier is trained using the labeled documents and assigns probabilistic labels to the unlabeled documents. A new classifier is then trained using the labels for all the documents, and the process is repeated until convergence.

Many studies have been proposed and presented for the problem of Arabic text classification. In this section we name the main algorithms used in these studies, such as decision trees [36], KNN [37,38,39,40], NB [17,41,42], N-gram frequency [5,45], Rocchio [4], SVM [19,21,43], and distance-based classifiers [46,47,48].

Syiam et al. [40] presented an intelligent Arabic text categorization scheme that used the KNN and Rocchio profile-based [50] classifiers to classify a set of Arabic text documents collected from three Egyptian newspapers, Al Ahram, Al Gomhoria, and Al Akhbar, during the period from August 1998 to September 2004. The corpus contains 1132 documents with 39468 words and covers six topics. Three approaches were adopted as pre... ... Agency website. The corpus contains 1562 documents of different lengths belonging to six categories. The documents were normalized and preprocessed by removing digits, foreign words, punctuation marks, and stop-words.
The Chi-square method was used for feature selection, with the number of selected words ranging from 10 to 1000. The corpus was split so that 70% of the documents were used for training the classifier while the remaining 30% were used for testing. Three evaluation measures, precision, recall, and F-measure, were used to evaluate the performance of the NB classifier. Results showed that the NB classifier performs better as the number of selected words grows. The NB classifier reached its peak for precision and F-measure when 800 words were selected, while recall peaked at 700 words.
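
The feature selection and evaluation steps described above can be illustrated with a small sketch. It is not the code used in the study: the function names (`chi_square`, `top_terms`, `precision_recall_f1`) are made up for illustration, and it assumes the common document-frequency form of the Chi-square statistic computed from a 2x2 term/category table, since the text does not say which variant was used. The F-measure here is assumed to be the balanced F1.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/category contingency table.

    n11: documents in the category that contain the term
    n10: documents outside the category that contain the term
    n01: documents in the category that lack the term
    n00: documents outside the category that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms(docs, labels, category, k):
    """Rank terms by Chi-square against one category and keep the top k.

    docs is a list of token lists; labels holds one category per document.
    Ties are broken alphabetically so the result is deterministic.
    """
    in_cat = sum(1 for y in labels if y == category)
    out_cat = len(docs) - in_cat
    df_in, df_out = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each term once per document (document frequency).
        (df_in if y == category else df_out).update(set(tokens))
    scores = {}
    for term in set(df_in) | set(df_out):
        n11, n10 = df_in[term], df_out[term]
        scores[term] = chi_square(n11, n10, in_cat - n11, out_cat - n10)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def precision_recall_f1(y_true, y_pred, category):
    """Precision, recall, and F1 for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (invented for this example, not taken from the corpus).
docs = [["goal", "match"], ["goal", "team"], ["vote", "party"], ["vote", "match"]]
labels = ["sport", "sport", "politics", "politics"]
selected = top_terms(docs, labels, "sport", k=2)
```

In an experiment like the one described, `k` would be varied (for example from 10 to 1000), the classifier retrained on the 70% training portion with only the selected words, and the three measures recomputed on the 30% test portion for each value of `k`.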
