The aim of this study was to determine which preprocessing method is most efficient for generating summaries with word-frequency text summarization. Three variants were analyzed: no normalization, lemmatization, and stemming. ROUGE metrics were used to score the results. Summaries generated with stemming proved the most suitable.
Summarization is the task of producing a shorter version of one or several documents that preserves most of the input’s meaning. The most important criterion for a summary is whether the meaning of the original text is maintained.
Word-frequency text summarization is one of the simplest approaches to text summarization: compute the frequency of each word, then extract the sentences that contain the most common words in the text.
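The idea can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function name `summarize`, the regex-based sentence splitting, and the choice to score a sentence by the sum of its word frequencies are all assumptions, and no stop-word removal is applied.

```python
import re
from collections import Counter

WORD_RE = r"[a-z']+"  # assumption: a word is a run of letters or apostrophes


def summarize(text, num_sentences=1):
    """Return the num_sentences highest-scoring sentences, in original order."""
    # Rough sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Frequency of every lowercase word across the whole text.
    freq = Counter(re.findall(WORD_RE, text.lower()))

    # A sentence's score is the sum of the frequencies of its words.
    def score(sentence):
        return sum(freq[w] for w in re.findall(WORD_RE, sentence.lower()))

    # Rank sentence indices by score (ties keep document order).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)

    # Emit the selected sentences in their original order.
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = "Cats sleep. Cats chase mice and cats purr. Dogs bark."
    print(summarize(sample, num_sentences=1))
```

Because the score is a plain sum, longer sentences tend to rank higher; dividing by sentence length is a common adjustment if that bias is unwanted.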
Lemmatization is a text normalization method used in Natural Language Processing (NLP). It reduces a word to its lemma, the form under which the word is listed in a dictionary, such as the singular form of a noun or the infinitive form of a verb. Stemming is the process of removing part of a word to reduce it to its stem or root (e.g. removing the suffix “ing” from “Flying” yields the root word “Fly”).
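The difference between the two can be illustrated with a toy sketch. Everything here is hypothetical and deliberately simplistic: `toy_stem` strips a handful of suffixes by rule, and `toy_lemmatize` looks words up in a tiny hand-written dictionary (`TOY_LEMMAS`). Real projects would normally rely on an NLP library instead.

```python
# Hypothetical suffix list for the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini-dictionary mapping inflected forms to their lemma.
TOY_LEMMAS = {"flying": "fly", "flew": "fly", "mice": "mouse", "better": "good"}


def toy_stem(word):
    """Rule-based stemming: chop a known suffix if at least 3 letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Dictionary-based lemmatization: unknown words are returned unchanged."""
    word = word.lower()
    return TOY_LEMMAS.get(word, word)


if __name__ == "__main__":
    for w in ("Flying", "flew", "summaries"):
        print(w, toy_stem(w), toy_lemmatize(w))
```

The stemmer handles "Flying" but cannot relate "flew" to "fly", and it turns "summaries" into the non-word "summari"; the lemmatizer resolves irregular forms but only for words its dictionary knows.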
The advantage of applying stemming or lemmatization is clear: vocabulary reduction and meaning abstraction, since inflected forms of the same word are counted together.
Obtaining metrics is crucial for determining the quality of machine-generated summaries. ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is the standard automatic evaluation measure for summarization tasks.
It is essentially a set of metrics for evaluating automatic summarization of texts as well as machine translations. It works by comparing an automatically produced summary or translation against a set of reference summaries (typically human-produced).
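As a rough illustration of the comparison described above, ROUGE-N can be computed from overlapping n-grams between a candidate summary and a single reference. This is a simplified sketch, not the official ROUGE implementation: the function name `rouge_n` is hypothetical, tokenization is a plain whitespace split, and no stemming or multi-reference handling is included.

```python
from collections import Counter


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N against one reference: recall, precision and F1."""

    def ngrams(text):
        tokens = text.lower().split()  # assumption: whitespace tokenization
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)

    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())

    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    scores = rouge_n("the cat sat on the mat", "the cat is on the mat", n=1)
    print(scores)
```

Recall measures how much of the reference is covered by the generated summary, which is why ROUGE is described as recall-oriented; precision and F1 are reported alongside it to penalize overly long summaries.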
For more details about ROUGE metrics, see this article (in Portuguese): https://clonageinvitro.wixsite.com/datascience-for-all/post/m%C3%A9tricas-rouge-para-mensura%C3%A7%C3%A3o-de-resumos-gerados-por-intelig%C3%AAncia-artificial