This is an experimental machine learning project that classifies messages as "spam" or "non-spam" using logistic regression. The experiment demonstrates a ground-up approach to machine learning classification. NumPy is used mainly to perform vector operations efficiently, while Pandas and scikit-learn are used to handle and preprocess the data. scikit-learn's TfidfVectorizer is used to vectorize the text data.
Main data set used: https://archive.ics.uci.edu/dataset/228/sms+spam+collection
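
As a rough illustration of the ground-up approach described above, here is a minimal sketch of logistic regression trained with batch gradient descent in NumPy. It is not the project's actual code: the function names (`sigmoid`, `train_logistic_regression`, `predict`), the learning rate, and the epoch count are assumptions chosen for the example, and the small matrix `X` is a hand-made stand-in for the TF-IDF features that TfidfVectorizer would produce from real messages.

```python
import numpy as np

def sigmoid(z):
    """Map raw scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)               # predicted spam probability
        error = p - y                        # gradient of log loss w.r.t. the score
        w -= lr * (X.T @ error) / n_samples  # average gradient for the weights
        b -= lr * error.mean()               # average gradient for the bias
    return w, b

def predict(X, w, b, threshold=0.5):
    """Return 1 (spam) where the predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

# Toy stand-in for TF-IDF features (hypothetical values, not real data).
# Columns play the role of term weights for three made-up vocabulary terms.
X = np.array([
    [0.9, 0.8, 0.0],
    [0.7, 0.0, 0.1],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.9],
    [0.1, 0.0, 0.8],
    [0.0, 0.1, 0.7],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = non-spam

w, b = train_logistic_regression(X, y)
preds = predict(X, w, b)
print(preds)
```

In a real run, `X` would presumably come from calling `fit_transform` on a `TfidfVectorizer` over the message texts, and the labels from the data set's spam/ham column. The vectorized form (`X @ w`, `X.T @ error`) is what lets NumPy replace explicit Python loops over samples and features.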