We develop methods to approximate missing data and evaluate each method by training a machine learning model and comparing that model's performance before and after the data recovery method is applied.
To accomplish this objective, we used several machine learning models, such as KNN (K-Nearest Neighbors), to fill in geospatial data, using coordinates to establish similar features between geographic areas.
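A minimal sketch of coordinate-based KNN imputation, assuming each record has a pair of coordinates and a single numeric value that may be missing. The function name `knn_impute`, the choice of `k`, the plain Euclidean distance, and the sample data are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

def knn_impute(coords, values, k=3):
    """Fill NaN entries of `values` with the mean of the k nearest observed neighbours.

    Hypothetical helper: distance is plain Euclidean distance on the raw
    coordinates, which is only a rough proxy for geographic distance.
    """
    coords = np.asarray(coords, dtype=float)
    filled = np.asarray(values, dtype=float).copy()
    missing = np.isnan(filled)

    known_coords = coords[~missing]
    known_values = filled[~missing]
    k = min(k, len(known_values))  # cannot use more neighbours than we have

    for i in np.flatnonzero(missing):
        # Distance from the incomplete record to every complete record
        dists = np.linalg.norm(known_coords - coords[i], axis=1)
        nearest = np.argsort(dists)[:k]
        filled[i] = known_values[nearest].mean()
    return filled

# Made-up example: the last area has no measurement
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (0.5, 0.5)]
values = [10.0, 12.0, 14.0, 100.0, np.nan]
filled = knn_impute(coords, values, k=3)
print(filled)
```

Here the missing value is filled from the three nearby areas, and the distant area at (10, 10) is ignored. For real latitude/longitude data, a haversine distance would likely be a better choice than Euclidean distance.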
Another method is polynomial regression, used to estimate a specific value over a period of time.
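As a rough illustration of the polynomial regression idea, the sketch below fits a polynomial to the observed points of a series and evaluates it at the missing time steps. The function name `poly_impute`, the degree, and the sample series are assumptions made for this example only.

```python
import numpy as np

def poly_impute(t, y, degree=2):
    """Fill NaN entries of `y` using a least-squares polynomial fitted over time `t`.

    Hypothetical helper: the degree is a tuning choice and is not taken
    from the project.
    """
    t = np.asarray(t, dtype=float)
    filled = np.asarray(y, dtype=float).copy()
    missing = np.isnan(filled)

    # Fit only on the observed points, then predict the missing ones
    coeffs = np.polyfit(t[~missing], filled[~missing], deg=degree)
    filled[missing] = np.polyval(coeffs, t[missing])
    return filled

# Made-up series following y = t^2 + 1, with the value at t = 3 missing
t = np.arange(6)
y = np.array([1.0, 2.0, 5.0, np.nan, 17.0, 26.0])
filled = poly_impute(t, y, degree=2)
print(filled)
```

A low degree tends to be safer on short or noisy series, since high-degree polynomials can oscillate strongly between observed points.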
To estimate missing values within time series, two types of interpolation were used: linear interpolation and spline interpolation. This approach tries to find a curve that best fits the actual data.
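The two interpolation variants can be sketched as follows, assuming a series with sorted time stamps and NaN marking the gaps. The function name `interpolate_gaps` and the sample data are hypothetical. `np.interp` and SciPy's `CubicSpline` are used here as one possible way to do this, not necessarily what the project uses.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_gaps(t, y, method="linear"):
    """Fill NaN entries of `y` by interpolating over the observed points.

    Hypothetical helper: `t` must be strictly increasing.
    """
    t = np.asarray(t, dtype=float)
    filled = np.asarray(y, dtype=float).copy()
    missing = np.isnan(filled)

    if method == "linear":
        # Straight line between the two observed neighbours of each gap
        filled[missing] = np.interp(t[missing], t[~missing], filled[~missing])
    elif method == "spline":
        # Smooth piecewise cubic curve through all observed points
        spline = CubicSpline(t[~missing], filled[~missing])
        filled[missing] = spline(t[missing])
    else:
        raise ValueError(f"unknown method: {method}")
    return filled

# Made-up series following y = t^3, with the value at t = 3 missing
t = np.arange(7)
y = np.array([0.0, 1.0, 8.0, np.nan, 64.0, 125.0, 216.0])
linear = interpolate_gaps(t, y, method="linear")
spline = interpolate_gaps(t, y, method="spline")
print(linear[3], spline[3])
```

On this curved series the linear estimate is the midpoint of the neighbouring values, while the spline follows the curvature and lands much closer to the true value of 27. Note that `np.interp` holds the edge values constant outside the observed range, so gaps at the very start or end of a series are not extrapolated.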
Resources
- Imputation of missing longitudinal data: a comparison of methods. Jean Mundahl Engels, Paula Diehr. Departments of Biostatistics and Health Services, University of Washington, 1959 Northeast Pacific Avenue, Bo
- ST-MVL: Filling Missing Values in Geo-sensory Time Series Data
Project GitHub Repository: