Project Details

The Challenge | Chasers of the Lost Data

Help find ways to improve the performance of machine learning and predictive models by filling in gaps in the datasets prior to model training. This entails finding methods to computationally recover or approximate data that is missing due to sensor issues or signal noise that compromises experimental data collection. This work is inspired by data collection during additive manufacturing (AM) processes where sensors capture build characteristics in-situ, but it has applications across many NASA domains.

Math Methods in the Improvement of Machine Learning

Using statistical methods and matrix completion to minimize measurement errors.

The core of the project is the development of a method that uses mathematical techniques to improve the training sets drawn from NASA databases, and therefore to generate more precise predictive models.

One of the paths we followed was the study of applications of the matrix completion problem. It requires a good understanding of linear algebra and can readily be applied to "N" matrices, where "N" can be large. The method seeks to fill in the empty (lost) entries of the matrices being worked on.

To get a better view of the values found, we looked to statistics for methods that show the variation caused by errors, or the probability of errors occurring, and so give feedback on how accurate the data manipulation is.
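
In its usual textbook form (an assumption here, since the text does not state which formulation was used), the problem can be written as follows, where M is the data matrix, Ω is the set of positions that were actually measured, and X is the completed matrix:

$$
\min_{X} \ \operatorname{rank}(X) \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{for all } (i,j) \in \Omega
$$

Because minimizing rank directly is computationally hard, a common convex substitute is the nuclear norm, the sum of the singular values of X:

$$
\min_{X} \ \|X\|_{*} \quad \text{subject to} \quad X_{ij} = M_{ij} \ \text{for all } (i,j) \in \Omega
$$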

- SEPARATING DATA

Suppose that at some point, through more precise processes, we observe the data tending toward a value that we do not know and want to deduce computationally, much like a convergent series approaching its limit. For this, we use the matrix completion method to obtain accurate estimates for the empty entries (those lost to errors or deviations).

The method matters because it is so versatile. When the underlying data is close to low rank and enough entries are observed, it can recover values with a very small margin of error. It works by searching for a matrix with as few linearly independent columns as possible (that is, of the lowest possible rank) that fills in the empty and inconsistent entries of a matrix "M".
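
As an illustration only, the idea can be sketched in a few lines of Python. This is not the project's implementation: it swaps in alternating least squares with the rank fixed at 1 (the simplest low-rank model) rather than nuclear-norm minimization, and the function name `complete_rank1` and the small matrix are hypothetical, made up for the example. Missing entries are marked with `None`.

```python
def complete_rank1(M, iters=200):
    """Fill None entries of M with a rank-1 fit u[i] * v[j] (alternating least squares)."""
    rows, cols = len(M), len(M[0])
    u = [1.0] * rows
    v = [1.0] * cols
    for _ in range(iters):
        # Fix v and solve a one-variable least-squares problem for each u[i],
        # using only the entries of row i that were actually observed.
        for i in range(rows):
            obs = [j for j in range(cols) if M[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            if den > 0:
                u[i] = sum(M[i][j] * v[j] for j in obs) / den
        # Fix u and do the same for each v[j] over the observed entries of column j.
        for j in range(cols):
            obs = [i for i in range(rows) if M[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            if den > 0:
                v[j] = sum(M[i][j] * u[i] for i in obs) / den
    # Keep measured values; use the low-rank fit only where data is missing.
    return [[M[i][j] if M[i][j] is not None else u[i] * v[j]
             for j in range(cols)] for i in range(rows)]


# Synthetic example: an exactly rank-1 matrix with three entries removed.
M = [[2.0, 4.0, None],
     [None, 8.0, 12.0],
     [6.0, None, 18.0]]
filled = complete_rank1(M)
for row in filled:
    print([round(x, 4) for x in row])
```

Real sensor data is rarely exactly rank 1, so a practical version would allow a higher rank or use a nuclear-norm solver such as the MATLAB package listed in the references.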

First, we take each column of data on its own, for example a column matrix holding the masses of 2000 different meteors, so that the values missing because of measurement errors are specific to each column of information. Applying the matrix completion method to that column then gives values for its empty entries.

Second, once this has been done for every column (with calculation software such as MATLAB/Octave) and the resulting values have been saved for further analysis, we move to the second part of the process: applying the method to a general matrix that involves all the data. Comparing the values from this step with those from the first gives us a sense of the values obtained and of the range in which each parameter lies.
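
The comparison between the two passes could be sketched as below. This assumes each pass stores its estimates in a dictionary keyed by (row, column) position; that layout, the function name `compare_estimates`, and the numbers are hypothetical and only serve the example.

```python
def compare_estimates(per_column, full_matrix):
    """Return (position, a, b, relative gap) for every entry estimated by both passes."""
    report = []
    for pos in sorted(per_column.keys() & full_matrix.keys()):
        a, b = per_column[pos], full_matrix[pos]
        scale = max(abs(a), abs(b), 1e-12)  # guard against division by zero
        report.append((pos, a, b, abs(a - b) / scale))
    return report


# Synthetic estimates for two missing entries, one from each pass.
per_column = {(0, 2): 6.0, (1, 0): 4.2}
full_matrix = {(0, 2): 6.3, (1, 0): 4.2}
report = compare_estimates(per_column, full_matrix)
for pos, a, b, gap in report:
    print(pos, a, b, f"{gap:.3f}")
```

A large relative gap flags an entry where the two passes disagree, and the pair of values gives a rough range for the parameter.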

- CHECKING ERROR MARGIN

To check the degree of accuracy we use two tools: one from matrix completion itself (incoherence), and the margin of error used in statistics when there are many data points and measurements around the values found.
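
For the statistical check, a minimal sketch of the margin of error for a sample mean is shown here. It assumes the usual normal approximation with z = 1.96 (about 95% confidence), which the text does not specify, and the sample values are synthetic.

```python
import math
import statistics


def margin_of_error(values, z=1.96):
    """Half-width of an approximate confidence interval for the mean of values."""
    s = statistics.stdev(values)  # sample standard deviation
    return z * s / math.sqrt(len(values))


# Synthetic repeated estimates of the same quantity.
sample = [9.8, 10.1, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
mean = statistics.mean(sample)
moe = margin_of_error(sample)
print(f"{mean:.3f} +/- {moe:.3f}")
```

The margin shrinks with the square root of the number of measurements, which is why this check suits large data sets.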



References:

https://ssd.jpl.nasa.gov/sbdb_query.cgi?obj_group=neo;obj_kind=all;obj_numbered=all;OBJ_field=0;ORB_field=0;table_format=HTML;max_rows=500;format_option=comp;query=Gerar%20tabela;c_fields=AcAsBtApAh;c_sort=;.cgifields=format_option;.cgifields=ast_orbit_class;.cgifields=table_format;.cgifields=obj_kind;.cgifields=obj_group;.cgifields=obj_numbered;.cgifields=com_orbit_class&page=1

https://en.wikipedia.org/wiki/Foundations_of_Computational_Mathematics

https://www.mathworks.com/matlabcentral/fileexchange/50056-matrix-completion-using-nuclear-norm-spectral-norm-or-weighted-nuclear-norm-minimization

https://www.coursera.org/learn/machine-learning/supplement/YlEVx/model-representation-ii

https://en.wikipedia.org/wiki/Dimension_(vector_space)

https://2019.spaceappschallenge.org/challenges/planets-near-and-far/raiders-lost-data/details