Project Details

The Challenge | Chasers of the Lost Data

Help find ways to improve the performance of machine learning and predictive models by filling in gaps in the datasets prior to model training. This entails finding methods to computationally recover or approximate data that is missing due to sensor issues or signal noise that compromises experimental data collection. This work is inspired by data collection during additive manufacturing (AM) processes where sensors capture build characteristics in-situ, but it has applications across many NASA domains.

DataFiller

DataFiller uses machine learning tools to cluster the data and train a set of models that predict a value for every cell of the dataset. The predicted values can be used to fill in missing values or to detect outliers.
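Once a predicted value exists for every cell, both uses can be sketched with NumPy. This is a minimal sketch that makes several assumptions:

- The data is a numeric matrix in which `NaN` marks a missing cell.
- `P` already holds one prediction per cell.
- The helper name `fill_and_flag` is hypothetical.
- The outlier rule is an illustrative choice, not DataFiller's documented method. Under that rule, an observed cell is flagged when its residual exceeds `z` standard deviations of its column's residuals.

```python
import numpy as np

def fill_and_flag(X, P, z=3.0):
    """Hypothetical helper: fill NaN cells of X from predictions P and flag outliers.

    X and P have the same shape; NaN in X marks a missing cell. The z-score
    residual rule used for flagging is an assumption made for illustration.
    """
    X = np.asarray(X, dtype=float)
    P = np.asarray(P, dtype=float)
    missing = np.isnan(X)

    # Missing cells take the model's prediction; observed cells are kept as-is.
    filled = np.where(missing, P, X)

    # Residuals of observed cells only; missing cells stay NaN and are ignored.
    resid = np.where(missing, np.nan, X - P)
    scale = np.nanstd(resid, axis=0)  # per-column spread of the residuals
    scale[scale == 0] = np.inf        # a column with zero spread flags nothing

    # NaN compares as False, so missing cells are never flagged.
    outliers = np.abs(resid) > z * scale
    return filled, outliers
```

The threshold `z` trades missed anomalies against false alarms. A large outlier also inflates the standard deviation it is compared against, so a robust spread estimate such as the median absolute deviation may suit real sensor data better.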

Data problems^2

When I read the description of this challenge, I got the idea to use existing Sudoku puzzle solver libraries to fill in the missing values. This turned out to be impossible. A dataset with missing values does not follow simple, well-defined rules the way a Sudoku grid does, so the correct values can't be deduced directly.

This got me thinking about training a machine learning model for every feature column in the dataset to work in place of the Sudoku ruleset. A dataset might contain multiple types of data, so I also added unsupervised clustering of the data before training the column models.
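The cluster-then-model idea can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the code in the repository. The text does not say which clustering algorithm or model type DataFiller uses, so this sketch substitutes the following:

- Plain k-means handles the clustering.
- An ordinary least-squares linear model is fitted for each (cluster, column) pair.
- Missing cells are temporarily filled with column means so they can serve as inputs.
- The names `kmeans` and `predict_every_cell` are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Basic k-means; X must contain no NaN. Returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

def predict_every_cell(X, k=2):
    """Hypothetical sketch: cluster rows, then fit one linear model per column.

    Returns P, the same shape as X, with a prediction for every cell.
    NaN in X marks a missing cell.
    """
    X = np.asarray(X, dtype=float)
    n_cols = X.shape[1]
    missing = np.isnan(X)

    # Temporary mean-fill so clustering and model inputs contain no NaN.
    col_means = np.nanmean(X, axis=0)
    X0 = np.where(missing, col_means, X)

    labels = kmeans(X0, k)
    P = np.empty_like(X0)
    for j in range(k):
        rows = labels == j
        for c in range(n_cols):
            # Inputs: all other columns of the row plus an intercept term.
            feats = np.delete(X0[rows], c, axis=1)
            A = np.column_stack([feats, np.ones(len(feats))])
            train = ~missing[rows, c]  # fit only on observed target values
            if train.sum() == 0:
                P[rows, c] = col_means[c]  # nothing to learn from: use column mean
                continue
            coef, *_ = np.linalg.lstsq(A[train], X[rows, c][train], rcond=None)
            P[rows, c] = A @ coef
    return P
```

Clustering first gives rows from different regimes their own column models. For example, take two groups of rows in which the same two columns follow different linear relations. A single global linear model per column could not fit both groups, while one model per cluster can.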

Code for the experiment is available at https://github.com/mikeful/nasa_2019_autofiller