This project uses Decision Tree models to evaluate car acceptability based on various features like safety, maintenance cost, and capacity. The repository provides insights into feature importance, hyperparameter tuning, and decision-making processes using Scikit-learn, Pandas, and visualization libraries.
The purpose of this project is to:
- Predict car acceptability based on attributes such as safety, maintenance cost, and capacity.
- Analyze feature importance to understand the impact of each attribute on predictions.
- Enhance skills in building, tuning, and visualizing Decision Tree models.
-
Description: Contains data about car evaluations focusing on various attributes like safety, maintenance costs, and capacity.
-
Use Cases: Classification tasks, feature importance analysis, and decision-making modeling.
-
Data Dictionary:
- Buying: Buying price (values: vhigh, high, med, low).
- Maint: Maintenance cost (values: vhigh, high, med, low).
- Doors: Number of doors (values: 2, 3, 4, 5more).
- Persons: Capacity in terms of persons to carry (values: 2, 4, more).
- Lug_boot: Size of luggage boot (values: small, med, big).
- Safety: Estimated safety of the car (values: low, med, high).
- Class: Overall evaluation of the car (values: unacc, acc, good, vgood).
-
File Reference: car_evaluation.csv
import pandas as pd url = 'https://github.com/vmahawar/data-science-datasets-collection/raw/main/car_evaluation.csv' df = pd.read_csv(url) print(df.head())
The analysis and model building process for car evaluation prediction using Decision Tree models can be found in the following notebook:
The following tools and libraries were utilized in this project:
- Pandas: For data manipulation, cleaning, and preprocessing.
- Scikit-learn: For building, training, and visualizing Decision Tree models.
- Matplotlib & Seaborn: For creating visualizations to explore feature relationships and data distributions.
- Jupyter Notebook: For interactive development and presentation of the project workflow.
-
Feature Importance:
- Gained insights into how attributes like safety and maintenance cost influence car evaluation.
- Used
feature_importances_
from Scikit-learn to identify key features driving predictions.
-
Decision Tree Tuning:
- Learned how hyperparameters such as
max_depth
,min_samples_split
, andcriterion
(e.g., Gini, Entropy) impact the model's accuracy and interpretability.
- Learned how hyperparameters such as
-
Evaluation Metrics:
- Explored metrics such as accuracy, precision, recall, and F1-score to assess model performance.
- Focused on achieving a balance between overfitting and underfitting by tuning model complexity.
-
Visualization:
- Developed skills in visualizing Decision Trees and understanding their splits.
- Enhanced the ability to represent categorical data relationships through pair plots, heatmaps, and other visualizations.
This project is licensed under the MIT License. You are free to use the code and data for educational and non-commercial purposes.
If you'd like to collaborate, provide feedback, or explore similar projects, feel free to connect:
- LinkedIn: Vijay Mahawar
- GitHub: vmahawar