Project Details

The Challenge | Chasers of the Lost Data

Help find ways to improve the performance of machine learning and predictive models by filling in gaps in the datasets prior to model training. This entails finding methods to computationally recover or approximate data that is missing due to sensor issues or signal noise that compromises experimental data collection. This work is inspired by data collection during additive manufacturing (AM) processes where sensors capture build characteristics in-situ, but it has applications across many NASA domains.

Finder

A very common problem faced by developers, programmers, or anyone who works with data analysis is a database with missing data.

Missing data arises when an error during data acquisition leaves some values unrecorded. Many factors lead to data loss, such as sensor errors or transmission interference. Our proposed solution is a machine learning algorithm designed to generate the missing data by learning patterns heuristically. Because of the complexity of some datasets, we are aware that some data cannot be inferred by an exact algorithm. To address this, we designed an artificial neural network that learns a pattern over the dataset and fills the gaps by changing which column serves as the class at the last layer of the network.

First, our method detects the columns that have missing data; these columns are candidates to become classes during the learning stage. Next, a fully connected neural network is created and trained on the complete data samples. Once the network has been trained on the complete data, we classify the new instances (those with missing data), filling in the gaps. To fine-tune the generated data, this gap-filling process is run two to five times, improving the data quality.

To test this method, we used the NASA Asteroids Classification dataset, which is publicly available online. Our algorithm was developed on a MacBook Pro running macOS Catalina with 8 GB of RAM and an Intel Core i5. The frameworks used were TensorFlow 2.0 with Python 3.7; the code was written in the Sublime text editor and run from the operating system terminal. We received help from collaborators with the presentation material and with tips for training our neural network. In the future, we intend to develop a tool that fully preprocesses the data by filling in the gaps, finding outliers, and normalizing the data.
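
The gap-filling steps described above (detect the columns with gaps, learn from the complete rows, predict the gaps, repeat) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: a nearest-neighbour lookup stands in for the fully connected TensorFlow network so that the example runs without dependencies, `None` marks a missing value, and the names `columns_with_missing`, `predict_nearest`, and `fill_gaps` are hypothetical. How the repeated passes interact is also an assumption: here each pass re-predicts every gap using the values filled in so far.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value (assumption)


def columns_with_missing(rows: Sequence[Row]) -> List[int]:
    """Indices of columns with at least one gap (the candidate class columns)."""
    return [c for c in range(len(rows[0])) if any(r[c] is None for r in rows)]


def predict_nearest(train: Sequence[Sequence[float]],
                    query: Sequence[float], target: int) -> float:
    """Stand-in for the trained network: copy the target value of the
    complete row closest to `query` on all non-target columns."""
    def distance(row: Sequence[float]) -> float:
        return sum((row[c] - query[c]) ** 2
                   for c in range(len(row)) if c != target)
    return min(train, key=distance)[target]


def fill_gaps(rows: Sequence[Row], passes: int = 2) -> List[List[float]]:
    """Fill every None in `rows`, revisiting the gaps `passes` times."""
    targets = columns_with_missing(rows)
    complete = [list(r) for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to learn from")

    # Initial guess for each gap: the column mean over the complete rows.
    n_cols = len(rows[0])
    means = [sum(r[c] for r in complete) / len(complete) for c in range(n_cols)]
    filled = [[means[c] if v is None else v for c, v in enumerate(r)]
              for r in rows]

    # Each pass treats one gap-bearing column at a time as the target and
    # re-predicts only the cells that were missing in the original data.
    for _ in range(passes):
        for target in targets:
            for original, current in zip(rows, filled):
                if original[target] is None:
                    current[target] = predict_nearest(complete, current, target)
    return filled


# Usage: two rows have one gap each; three rows are complete.
rows = [
    [1.0, 10.0, 0.0],
    [2.0, 20.0, 1.0],
    [3.0, 30.0, 1.0],
    [1.1, None, 0.0],
    [2.9, 30.0, None],
]
result = fill_gaps(rows, passes=2)
# result[3] is [1.1, 10.0, 0.0] and result[4] is [2.9, 30.0, 1.0]
```

The sketch compares raw, unscaled values, so columns with large ranges dominate the distance. Replacing `predict_nearest` with a trained classifier would bring it closer to the approach described above.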

#machinelearning #ML #python #artificialintelligence #AI #database #correction #missingdata #moreAIlessdata #cybermancy #problemsolving #programming