
ESPRIT Mining - Profiling complaints with RStudio

Text Comments (1)
Mohamed Heny SELMI (3 years ago)
# 1) Read the text file
filePath <- "C:\\Users\\Safa\\Desktop\\stage_ESPRIT\\all.txt"
text <- readLines(filePath)

# 2) Load the data as a corpus
library(tm)
library(NLP)
docs <- Corpus(VectorSource(text))
inspect(docs)

# 3) Cleaning
# Replace separator characters with a space
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove common French stopwords (lower-case "french" also resolves on case-sensitive file systems)
docs <- tm_map(docs, removeWords, stopwords("french"))
# Remove punctuation
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white space
docs <- tm_map(docs, stripWhitespace)
# Text stemming: assign the result back to docs so the later steps use it.
# language = "french" assumes the complaints are in French, as the stopword list implies;
# stemming requires the SnowballC package to be installed.
docs <- tm_map(docs, stemDocument, language = "french")

# 4) Build a term-document matrix (terms in rows, documents in columns)
tdm <- TermDocumentMatrix(docs)
m <- as.matrix(tdm)

# 5) Find the most frequent terms
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(v)
head(d, 10)

# 6) Word cloud (wordcloud loads RColorBrewer, which provides brewer.pal)
library(wordcloud)
set.seed(1234)  # fix the random layout so the plot is reproducible
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
          max.words = 200, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))
