Author: Nikolas Petrou, MSc in Data Science
- The R implementation of the project is in the project.rmd and project.html files
- The extended technical report and analysis of the work are not included here (contact me if you are interested in them)
The project focuses on the Exploratory Data Analysis (EDA) of the provided Performance dataset, which contains the marks obtained by students in different subjects. The aim was to first clean the data (if necessary) and then perform a full EDA (summary statistics, plots and statistical hypothesis testing) to understand the variation within the variables. Additionally, the task was to identify correlations or other patterns between the variables, if any exist, and to investigate further questions raised during the analysis.
The data was obtained directly from Kaggle.
Key methods used throughout the work include:
- Data cleaning
- Visualization - EDA
- Summary statistics and statistical hypothesis testing
- Feature importance using Lasso linear regression and a Gradient Boosting Machine (GBM)
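The idea behind Lasso-based feature importance can be illustrated with a minimal sketch. This is a hypothetical Python/NumPy example, not the project's R code. It assumes the usual objective (1/(2n))·||y − Xβ||² + α·||β||₁, predictors standardized to mean 0 and variance 1, and a plain cyclic coordinate-descent solver. The names `lasso_importance` and `soft_threshold`, the value of `alpha` and the synthetic "marks" are all illustrative assumptions. Because the predictors are standardized, the absolute coefficients can be compared as a rough importance ranking, and coefficients driven exactly to zero indicate features the penalty discarded. The GBM part is not covered here.

```python
import numpy as np


def soft_threshold(z, a):
    """Soft-thresholding operator: sign(z) * max(|z| - a, 0)."""
    return np.sign(z) * max(abs(z) - a, 0.0)


def lasso_importance(X, y, alpha=0.1, n_iter=200):
    """Fit a Lasso by cyclic coordinate descent on standardized features.

    Hypothetical helper: returns coefficients on the standardized scale, so
    their absolute values can be compared as a rough feature importance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Standardize each column to mean 0 and (population) variance 1
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()  # centering y removes the need for an intercept
    beta = np.zeros(p)
    residual = yc.copy()  # invariant: residual == yc - Xs @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the partial residual
            # (the residual with feature j's own contribution added back)
            rho = Xs[:, j] @ residual / n + beta[j]
            new_bj = soft_threshold(rho, alpha)
            # Update the residual incrementally for the changed coefficient
            residual -= Xs[:, j] * (new_bj - beta[j])
            beta[j] = new_bj
    return beta


# Illustrative usage on synthetic data (not the Kaggle Performance dataset)
rng = np.random.default_rng(0)
n = 200
math = rng.normal(60, 10, n)
reading = rng.normal(65, 12, n)
noise = rng.normal(0, 1, n)  # an uninformative column
writing = 0.8 * reading + 0.3 * math + rng.normal(0, 5, n)

X = np.column_stack([math, reading, noise])
coef = lasso_importance(X, writing, alpha=0.5)
for name, c in zip(["math", "reading", "noise"], coef):
    print(f"{name:8s} {c:+.3f}")
```

Raising `alpha` shrinks more coefficients to exactly zero, which is what makes the Lasso useful as a feature-selection and importance tool. Setting `alpha=0` recovers ordinary least squares on the standardized predictors.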