The News Article Classification Model is a machine learning project that uses Natural Language Processing (NLP) to classify news articles into five categories: business, tech, politics, sports and entertainment.
The following models are used:
- Logistic Regression
- Random Forest
- Multinomial Naive Bayes
- Support Vector Classifier
- Decision Tree Classifier
- K-Nearest Neighbours
- Gaussian Naive Bayes
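
As a rough sketch, the seven models listed above could be instantiated with scikit-learn as shown here. The helper name `build_models`, the `seed` parameter and the hyperparameters (such as `max_iter=1000`) are assumptions for illustration, not settings taken from the project.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return one untrained estimator per model named in the list.

    `seed` and all hyperparameters are illustrative assumptions.
    """
    return {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Multinomial Naive Bayes": MultinomialNB(),
        "Support Vector Classifier": SVC(),
        "Decision Tree Classifier": DecisionTreeClassifier(random_state=seed),
        "K Nearest Neighbour": KNeighborsClassifier(),
        "Gaussian Naive Bayes": GaussianNB(),
    }


models = build_models()
for name in models:
    print(name)
```

Note that `GaussianNB` expects dense input, so a sparse text-feature matrix would need converting (for example with `.toarray()`) before fitting that model.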
Install these libraries with pip if you haven't already:
- Natural Language Toolkit
- Regular Expressions (the `re` module ships with the Python standard library, so it needs no installation)
- Numpy
- Pandas
- Matplotlib
- Wordcloud
- Scikit-Learn
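
One way to check which of these libraries still need installing is sketched below. The helper `missing_packages` is hypothetical. The mapping pairs each import name with its usual pip package name.

```python
import importlib.util

# import name -> pip package name for the libraries listed above
REQUIRED = {
    "nltk": "nltk",
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "wordcloud": "wordcloud",
    "sklearn": "scikit-learn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found for import."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All listed libraries are available.")
```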

- The dataset News.csv comprises 1490 news articles.
- It contains three columns: ArticleId, Text and Category (business/tech/politics/sports/entertainment).
- 30% of the dataset is held out as the test set and the remaining 70% is used for training.
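
The 70/30 split described above could be reproduced roughly as follows with scikit-learn's `train_test_split`. The helper name `split_news` and the `seed` value are assumptions, and a tiny stand-in frame replaces News.csv so the sketch runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_news(df, test_size=0.3, seed=42):
    """Split the Text and Category columns into train and test parts.

    `seed` is an arbitrary choice made here for reproducibility.
    """
    return train_test_split(
        df["Text"], df["Category"], test_size=test_size, random_state=seed
    )


# With the real data this would start from: df = pd.read_csv("News.csv")
# The stand-in frame below only mimics the three documented columns.
df = pd.DataFrame({
    "ArticleId": range(10),
    "Text": [f"article {i}" for i in range(10)],
    "Category": ["business", "tech", "politics", "sports", "entertainment"] * 2,
})

X_train, X_test, y_train, y_test = split_news(df)
print(len(X_train), len(X_test))
```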
Word clouds (figures) for each category: business-related words, tech-related words, politics-related words, sports-related words and entertainment-related words.
The text is preprocessed in the following steps:
- Removal of tags.
- Removal of special characters.
- Conversion to lower case.
- Removal of stop-words.
- Lemmatization of the words.
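
The preprocessing steps above can be sketched as a single function. This is a minimal illustration, not the project's actual code: the name `preprocess`, the short `STOP_WORDS` set and the pluggable `lemmatize` argument are all assumptions. Treating digits and punctuation as "special characters" is also an assumption.

```python
import re

# Tiny illustrative stop-word set (assumption); a full list such as
# NLTK's English stop words would normally be used instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "in", "on", "of", "to", "and", "for"}


def preprocess(text, stop_words=STOP_WORDS, lemmatize=None):
    """Clean one article following the five listed steps."""
    text = re.sub(r"<[^>]+>", " ", text)       # 1. remove tags
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # 2. remove special characters
    text = text.lower()                        # 3. convert to lower case
    tokens = [t for t in text.split() if t not in stop_words]  # 4. stop-words
    if lemmatize is not None:                  # 5. lemmatize, if a function is given
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


print(preprocess("<p>The Markets are UP 5%!</p>"))
```

With NLTK installed and its `stopwords` and `wordnet` corpora downloaded, `stopwords.words("english")` and `WordNetLemmatizer().lemmatize` could be passed in as `stop_words` and `lemmatize`.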

- Accuracy, precision, recall and F1 score are displayed for each model.
- The best accuracy, 97.99%, is achieved by the Random Forest model.
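
The four metrics named above could be computed along these lines. This is a sketch under several assumptions: the source does not say how the text is vectorized or how per-class scores are averaged, so TF-IDF features and `average="weighted"` are choices made here. The helper `evaluate` and the toy sentences are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.naive_bayes import MultinomialNB


def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit `model` and return accuracy, precision, recall and F1 on the test set."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Weighted averaging across categories is an assumption, not from the source.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy stand-in data; the real project would use the split News.csv articles.
train_texts = ["stock market profit bank", "shares market bank earnings",
               "football match goal team", "team cricket match win"]
train_labels = ["business", "business", "sports", "sports"]
test_texts = ["bank profit market", "goal team match"]
test_labels = ["business", "sports"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary on training text only
X_test = vectorizer.transform(test_texts)

scores = evaluate(MultinomialNB(), X_train, train_labels, X_test, test_labels)
print(scores)
```

Calling `evaluate` once per model gives the side-by-side comparison from which a best-performing model can be picked.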