Evaluation of Ensemble Methods for Binary Classification

Project Overview:

This project evaluates ensemble methods on their accuracy and time complexity when classifying an asteroid as hazardous or not, based on various features of the celestial object. The dataset used for this purpose contains information about 9083 unique asteroids and their characteristics. The features in the dataset are as follows:

id: Unique identifier of the asteroid.

name: Name of the asteroid.

est_diameter_min: Estimated minimum diameter of the asteroid (in kilometers).

est_diameter_max: Estimated maximum diameter of the asteroid (in kilometers).

relative_velocity: The relative velocity of the asteroid with respect to Earth (in kilometers per hour).

miss_distance: The distance at which the asteroid will pass by Earth (in kilometers).

orbiting_body: The celestial body recorded for the asteroid's close approach (in this dataset, the value is Earth for every entry).

sentry_object: A boolean value indicating whether the asteroid is on the Sentry Impact Risk Table (True if it is, False otherwise).

absolute_magnitude: The absolute magnitude of the asteroid, which is a measure of its intrinsic brightness.
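
As a rough illustration of how these columns might be narrowed down before modelling, the sketch below drops the identifier columns and any column that holds a single value (such as orbiting_body). It assumes pandas; the two rows are made-up values that only mimic the column layout, and the helper `select_model_features` is hypothetical rather than taken from the project's notebooks.

```python
import pandas as pd

# Two made-up rows that follow the column layout described above.
# The values are illustrative only and are not taken from the dataset.
raw = pd.DataFrame(
    {
        "id": [1001, 1002],
        "name": ["Example A", "Example B"],
        "est_diameter_min": [0.12, 0.45],
        "est_diameter_max": [0.27, 1.01],
        "relative_velocity": [48000.0, 73500.0],
        "miss_distance": [5.4e7, 2.1e7],
        "orbiting_body": ["Earth", "Earth"],
        "sentry_object": [False, False],
        "absolute_magnitude": [22.1, 19.3],
    }
)


def select_model_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop identifier columns and any column that holds a single value."""
    features = df.drop(columns=["id", "name"])
    # A column with one distinct value cannot help separate the two classes.
    constant = [col for col in features.columns if features[col].nunique() <= 1]
    return features.drop(columns=constant)


features = select_model_features(raw)
print(list(features.columns))
```

In this toy frame sentry_object is also dropped, but only because both made-up rows share the same value; whether it is constant in the real data would need to be checked.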

The dataset provides valuable insights into the physical properties and trajectory of each asteroid. By analyzing these features, the project aims to identify the machine learning model that can most accurately predict whether an asteroid poses a hazard to Earth. This information can be crucial for space agencies and researchers, enabling them to assess potential threats and develop mitigation strategies to ensure the safety of our planet. The project also allows a comparison between different ensemble methods and their advantages and disadvantages.

Repository Structure:

dataset: Contains the raw training data and unseen data for model evaluation.
data_processing_pipeline: Includes a Jupyter Notebook that outlines the data preprocessing and cleaning steps for the datasets.
cleaned_datasets: Stores the cleaned versions of the datasets, which have been processed using the data processing pipeline.
ensemble_methods: Comprises four separate Jupyter Notebooks, each dedicated to building a model using a distinct ensemble method for binary classification.
trained_models: Holds the serialized trained models (in pickle format) for each of the four ensemble methods.
evaluation_of_models: Features a Jupyter Notebook that generates predictions on unseen data using all four ensemble models, and performs a time analysis of each prediction method for comparison.

This structured repository provides a comprehensive overview of the process of evaluating various ensemble methods for binary classification tasks, from initial data processing to final model evaluation and comparison.
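
The workflow implied by the trained_models and evaluation_of_models folders (serialize a fitted model with pickle, reload it, then time its predictions on unseen data) could look roughly like the sketch below. It assumes scikit-learn, uses synthetic data from `make_classification` in place of the cleaned asteroid datasets, and keeps the pickle in memory rather than writing to a file; the helper `time_prediction` is hypothetical.

```python
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; the real notebooks use the cleaned asteroid datasets.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, y_train, X_unseen = X[:200], y[:200], X[200:]

model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(X_train, y_train)

# Serialize to bytes in memory; a notebook would write these bytes to a file.
serialized = pickle.dumps(model)
restored = pickle.loads(serialized)


def time_prediction(fitted_model, rows, repeats=5):
    """Return the predictions and the best wall-clock time over `repeats` runs."""
    best = float("inf")
    predictions = None
    for _ in range(repeats):
        start = time.perf_counter()
        predictions = fitted_model.predict(rows)
        best = min(best, time.perf_counter() - start)
    return predictions, best


predictions, seconds = time_prediction(restored, X_unseen)
print(f"{len(predictions)} predictions in {seconds:.6f} s")
```

Taking the best of several runs is one way to reduce noise from other processes; averaging the runs is an equally reasonable choice.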

Results:

The XGBoost Classifier had the fastest prediction process of the four methods. The Voting and Stacking Classifiers, which combine several ensemble methods including XGBoost, were slower than the other two methods, and their prediction speeds varied. This is because a voting or stacking classifier must obtain a prediction from each of its constituent models (e.g., decision tree, random forest) before producing its own output, whereas XGBoost runs a single gradient boosting model; the more models that are combined, the longer a prediction may take. The best accuracy score was recorded by the Voting Classifier, followed by the Stacking Classifier. This can be attributed to the fact that they combine several models and are therefore more complex than the XGBoost and Random Forest Classifiers on their own.
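
The comparison described above can be reproduced in miniature with the sketch below. It is not the project's evaluation notebook: it assumes scikit-learn, uses synthetic data, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that the example has no extra dependency. The choice of base models, soft voting, and a logistic regression final estimator are all assumptions made for illustration.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the asteroid dataset.
X, y = make_classification(n_samples=600, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def base_estimators():
    """Fresh base models; GradientBoostingClassifier stands in for XGBoost."""
    return [
        ("boosting", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
    ]


models = {
    "boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
    "forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "voting": VotingClassifier(estimators=base_estimators(), voting="soft"),
    "stacking": StackingClassifier(
        estimators=base_estimators(),
        final_estimator=LogisticRegression(),
        cv=3,
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Time only the prediction step, since that is what is being compared.
    start = time.perf_counter()
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start
    results[name] = (accuracy_score(y_test, predictions), elapsed)
    print(f"{name:9s} accuracy={results[name][0]:.3f} predict_time={elapsed:.5f} s")
```

Because the voting and stacking models call every base model at prediction time, their timings would be expected to be at least roughly the sum of the base models' timings. The exact figures depend on the machine and the data, so none are quoted here.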

acse-ra922/evaluation-of-ensemble-methods-for-binary-classification
