Project Details

The Challenge | Chasers of the Lost Data

Help find ways to improve the performance of machine learning and predictive models by filling in gaps in the datasets prior to model training. This entails finding methods to computationally recover or approximate data that is missing due to sensor issues or signal noise that compromises experimental data collection. This work is inspired by data collection during additive manufacturing (AM) processes where sensors capture build characteristics in-situ, but it has applications across many NASA domains.

MiniScript

This consists of a simple Python script that uses data-cleaning techniques from Python libraries such as pandas and NumPy to clean the data. We then train a decision tree/random forest classifier or regressor on it to compare accuracy.
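The clean-then-train flow described above could look roughly like the sketch below. It is not the project's script: the data is synthetic, the column names (`s1`, `s2`, `s3`), the 10% missing rate, and the model settings are all assumptions made for illustration, and it assumes scikit-learn is installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for sensor data (hypothetical columns, not the challenge files).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["s1", "s2", "s3"])
y = (X["s1"] + X["s2"] > 0).astype(int)

# Simulate sensor dropouts by blanking out roughly 10% of the readings.
dropout = rng.random(X.shape) < 0.1
X_missing = X.mask(dropout)

# Clean the data: replace each missing value with the mean of its column.
X_clean = X_missing.fillna(X_missing.mean())

# Train a random forest on the cleaned data and measure accuracy on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X_clean, y, test_size=0.25, random_state=0
)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy on cleaned data: {accuracy:.2f}")
```

For a regression target, `RandomForestRegressor` with a metric such as `r2_score` would take the place of the classifier and `accuracy_score`.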

The link redirects to the GitHub page, which hosts the Python script containing the project implementation. It does not have a proper UI, but it runs and prints its output when finished. When it is run on the uncleaned data, no output is produced; once the data is cleaned, the results appear. The model used is a random forest classifier or regressor, depending on the dataset.

In this project we were given 4 datasets and asked to fill in the empty values using pandas and imputation. We used pandas to fill the empty values and then applied a random forest model to predict the outcome. The script can be copied and pasted into a Jupyter notebook and it works; the only edits needed are to the file names, which I changed for my own convenience. The incomplete data files are named f1, f2, f3, f4 and the cleaned datasets are stored in f11, f22, f33, f44.

For numerical columns, the cleaning method was to take the mean of each column and fill the empty values with it. The text columns were left as they are, since we did not need to compute with them. After cleaning the 4 datasets, the ML model is used to predict. Since this project was done by an amateur, I would like to extend it as I learn more skills, and would like to build large, scalable applications.
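The column-mean filling described above can be sketched in pandas as follows. The helper name `fill_numeric_with_mean` and the small example frame are hypothetical; in the project the frames would come from the files f1 to f4, and leaving non-numeric columns untouched is an assumption about how the script treats them.

```python
import numpy as np
import pandas as pd


def fill_numeric_with_mean(df):
    """Return a copy of df with NaNs in numeric columns replaced by the column mean."""
    filled = df.copy()
    numeric_cols = filled.select_dtypes(include="number").columns
    # fillna with a Series of means fills each column with its own mean.
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())
    return filled


# Hypothetical stand-in for one incomplete dataset such as f1.
f1 = pd.DataFrame(
    {
        "temperature": [200.0, np.nan, 220.0],
        "pressure": [1.0, 3.0, np.nan],
        "label": ["ok", None, "fail"],  # non-numeric column, left as-is
    }
)

# The cleaned copy plays the role of f11.
f11 = fill_numeric_with_mean(f1)
print(f11)
```

Mean filling is simple and keeps every row, but it shrinks the variance of each column, so it is best treated as a baseline to compare other imputation methods against.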