Project Details

The Challenge | Chasers of the Lost Data

Help find ways to improve the performance of machine learning and predictive models by filling in gaps in the datasets prior to model training. This entails finding methods to computationally recover or approximate data that is missing due to sensor issues or signal noise that compromises experimental data collection. This work is inspired by data collection during additive manufacturing (AM) processes where sensors capture build characteristics in-situ, but it has applications across many NASA domains.

THE PILOT OF THE CHASERS

The app aims to locate the blank entries in a dataset and estimate plausible values for them.

IVSLAB

Introduction:

In the real world, data can go missing for many reasons, such as sensor issues or signal noise. Missing data strongly affects how well a dataset describes what it measures, which is why handling lost data has always been a hot topic in data science.

Traditionally, missing data is handled with statistical methods. Dropping the observations that contain missing data is the most intuitive approach, but it may discard some crucial values. Replacing NaNs (blanks) with the mean value, or with values predicted by a regression equation, are two other common methods. However, both have drawbacks in terms of smoothness or complexity. Thus, we chose to try data recovery with an ML method.
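The three traditional approaches mentioned above can be sketched in a few lines. This is only an illustration on a made-up table: the column names `power` and `temp` and all values are hypothetical and do not come from the NASA datasets.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor table: 'temp' has two missing readings (NaN).
df = pd.DataFrame({
    "power": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "temp":  [2.1, 3.9, np.nan, 8.2, np.nan, 12.0],
})

# 1) Listwise deletion: drop every row that contains a NaN.
dropped = df.dropna()

# 2) Mean imputation: replace each NaN with the mean of the observed
#    values in the same column.
mean_filled = df.fillna(df.mean())

# 3) Regression imputation: fit temp ~ power on the complete rows,
#    then predict 'temp' only where it is missing.
slope, intercept = np.polyfit(dropped["power"], dropped["temp"], deg=1)
missing = df["temp"].isna()
reg_filled = df.copy()
reg_filled.loc[missing, "temp"] = slope * df.loc[missing, "power"] + intercept

print(dropped)
print(mean_filled)
print(reg_filled)
```

Deletion shrinks the table, mean imputation puts the same value in every gap, and regression imputation needs a fitted model per column, which is one way to read the trade-offs described above.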

Method:

How good a model is depends on how much data we feed it, and the data available was insufficient for training one. As a result, producing training data was the first priority of our work. We then tried different DNN structures to sketch the distribution of the data. The fully connected network, the simplest neural network, surprisingly beat the other architectures in stability and precision.
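The write-up does not say how the training data was produced or how the network was laid out, so the following is only a minimal sketch under assumptions: complete rows are corrupted by hiding random cells, and a one-hidden-layer fully connected network (written in plain NumPy rather than TensorFlow) learns to reconstruct them. The helper `make_masked_pairs`, the synthetic data, the layer size and the learning rate are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_masked_pairs(complete, missing_rate=0.2):
    """Hide random entries of complete rows; return (inputs, targets, mask)."""
    mask = rng.random(complete.shape) < missing_rate   # True = value hidden
    corrupted = np.where(mask, 0.0, complete)          # zero-fill hidden cells
    # Append the mask so the network can tell a hidden cell from a real zero.
    inputs = np.concatenate([corrupted, mask.astype(float)], axis=1)
    return inputs, complete, mask

# Synthetic stand-in for sensor data: three correlated columns.
t = rng.uniform(-1.0, 1.0, size=(512, 1))
complete = np.hstack([t, 0.5 * t, -t]) + 0.01 * rng.normal(size=(512, 3))
X, Y, M = make_masked_pairs(complete)

# One-hidden-layer fully connected network.
hidden = 16
W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, Y.shape[1]))
b2 = np.zeros(Y.shape[1])

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

losses = []
lr = 0.05
for step in range(500):
    h, pred = forward(X)
    err = pred - Y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the gradient of the mean squared error.
    g_pred = 2.0 * err / err.size
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0)
    g_h = (g_pred @ W2.T) * (1.0 - h ** 2)
    g_W1 = X.T @ g_h
    g_b1 = g_h.sum(axis=0)
    W1 -= lr * g_W1
    b1 -= lr * g_b1
    W2 -= lr * g_W2
    b2 -= lr * g_b2

# Use the network's estimates only where a value was hidden.
_, estimates = forward(X)
recovered = np.where(M, estimates, complete)
print(f"MSE before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

Masking known values gives as many training pairs as needed from a small set of complete rows, because each new random mask yields a new example with a known answer.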

Our aim is to make an app that helps estimate missing data, so some additional steps are needed to integrate the model into a mobile phone. Applying Google's TensorFlow Lite and MTK NeuroPilot helped us reduce the complexity of the system significantly without losing too much precision.
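To give a feel for how a model can shrink with little loss of precision, here is a small NumPy illustration of 8-bit weight quantization, one technique that mobile toolchains commonly offer. It does not call TensorFlow Lite or NeuroPilot, and the write-up does not state which optimizations the team actually applied; the function names and the layer shape are hypothetical.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = np.max(np.abs(weights)) / 127.0            # float value of one int8 step
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(6, 16)).astype(np.float32)   # hypothetical layer
x = rng.uniform(-1.0, 1.0, size=(4, 6)).astype(np.float32)   # hypothetical inputs

q, scale = quantize_int8(W)
W_hat = dequantize(q, scale)

size_ratio = q.nbytes / W.nbytes                       # int8 vs float32 storage
max_weight_error = float(np.max(np.abs(W - W_hat)))
max_output_error = float(np.max(np.abs(x @ W - x @ W_hat)))

print(f"storage ratio: {size_ratio}")
print(f"largest weight error: {max_weight_error:.5f}")
print(f"largest output error: {max_output_error:.5f}")
```

Each weight moves by at most half a quantization step, so the layer output changes only slightly while the stored weights take a quarter of the space.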

Dataset:

https://data.nasa.gov/api/views/gh4g-9sfh/rows.csv?accessType=DOWNLOAD

https://data.nasa.gov/api/views/b67r-rgxc/rows.csv?accessType=DOWNLOAD

https://data.nasa.gov/api/views/mc52-syum/rows.csv?accessType=DOWNLOAD

https://data.nasa.gov/api/views/9ns5-uuif/rows.csv?accessType=DOWNLOAD

Environment:

Language - Python 3, Java

Libraries - Pandas, scikit-learn, IPython, TensorFlow, NumPy

API - NeuroPilot

Our link to GitHub:

https://github.com/howardlee1995/NASA-HACKERSON.gi...

Our Powerpoint file on the Cloud:

https://drive.google.com/open?id=1dy7Wkiv1U1CnLmVB...