Data recovery and imputation have traditionally been handled by centralized computing clusters. However, as the sensors we deploy become more numerous and more diverse, purely centralized processing no longer delivers the best results. We therefore want to validate a new approach to data recovery and imputation: every edge device recovers and completes its own data before sending it back to the computing cluster. This approach can greatly reduce the load on the computing cluster, and it also makes it possible to build a dedicated processing model tailored to the data characteristics of each sensor.
Data loss is a ubiquitous issue in data-driven modeling. We are interested in data loss recovery in the context of planet exploration. Planet exploration usually employs edge devices for data collection. Traditionally, these devices do not handle data loss: data is simply collected by them and uploaded to central devices, where data loss is taken care of. The very same scenario is also found in day-to-day data-driven applications.
Note that this scenario becomes inefficient, if not impractical, in the following situations: (1) when a large number of edge devices report to the same central device, and (2) when the collected data exceed the storage or transmission capacity of the edge device.
A possible example of the first case is a spacecraft that uses a sensor swarm (e.g., the NASA project OpGrav). As for the second case, the Mars Reconnaissance Orbiter (MRO) provides a good example: while its high-resolution camera HiRISE can generate an image of size 16.4 Gb, the size of its memory is only 28 Gb. In such a case, image compression is necessary.
We would like to verify the plausibility of letting each edge device deal with data loss on its own. We expect such an approach will greatly reduce the load on the central computing cluster and allow a processing model tailored to the data characteristics of each sensor.
We first downsample the data (algorithm: bilinear interpolation) so that high-frequency noise is removed, and then use super resolution (algorithm: ESPCN) to upscale the data back to its original size. Our method achieves a state-of-the-art PSNR (peak signal-to-noise ratio) in about 350 ms even on devices with limited computing power, and we also verified that a model trained on the recovered data reaches the same performance as a model trained on the original data.
Many common types of image glitches can be considered visual data loss; this includes noise, blur, missing or deteriorated areas, etc.
We are targeting image denoising via the following approach: first, perform downsampling with bilinear interpolation in order to filter out high-frequency noise; next, super resolution with ESPCN reconstructs the original image from the downsampled one. A state-of-the-art PSNR can be achieved in around 350 ms on a device with limited computing power.
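As a concrete illustration, below is a minimal sketch of this two-step pipeline. It assumes PyTorch; the layer widths follow the original ESPCN design, while the scale factor, tensor sizes, and the `psnr` helper are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Sketch of the downsample-then-super-resolve denoising pipeline (assumes PyTorch).
# Layer widths follow the ESPCN paper; scale factor and shapes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESPCN(nn.Module):
    """ESPCN-style network: feature extraction followed by sub-pixel upscaling."""
    def __init__(self, scale: int = 2, channels: int = 1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=5, padding=2), nn.Tanh(),
            nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv2d(32, channels * scale ** 2, kernel_size=3, padding=1),
        )
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into spatial pixels

    def forward(self, x):
        return self.shuffle(self.body(x))

def denoise(image: torch.Tensor, model: ESPCN, scale: int = 2) -> torch.Tensor:
    """Bilinear downsampling suppresses high-frequency noise; the super-resolution
    model then reconstructs the image at its original resolution."""
    low = F.interpolate(image, scale_factor=1 / scale, mode="bilinear",
                        align_corners=False)
    with torch.no_grad():
        return model(low)

def psnr(pred: torch.Tensor, ref: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio between a restored image and its reference."""
    mse = F.mse_loss(pred, ref)
    return 10 * torch.log10(max_val ** 2 / mse)

# Example usage with a random single-channel image (batch, channel, height, width).
model = ESPCN(scale=2, channels=1).eval()
noisy = torch.rand(1, 1, 256, 256)
restored = denoise(noisy, model, scale=2)
print(restored.shape, psnr(restored, noisy))
```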

We use a recurrent neural network with long short-term memory (RNN with LSTM). During training, we combine several columns of correlated signal data ordered by time stamps [t1, tn] with one column of the target signal into a matrix, which is fed into the network as input. After a certain number of epochs, we obtain one column of the imputed target signal ordered by time stamps [t1−k, tn−k], where the parameter k is a selectable offset of k time stamps into the past. We simulate missing data in two ways: the first is random missing data, and the second is block-wise missing data (fiber missing data). We consider the second pattern to better reflect the data loss caused by sensor issues or communication loss. Our experiments show that in both cases our method can impute the missing data while maintaining a reasonable accuracy.
We consider two data loss patterns: random missing values and consecutive missing values. The latter characterizes practical situations such as sensor issues and communication loss. The imputation is done by a Recurrent Neural Network with Long Short-Term Memory.
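The sketch below illustrates this setup, again assuming PyTorch. The `random_missing` and `fiber_missing` helpers, the network sizes, and the synthetic tensors are illustrative assumptions, not the exact configuration used in our experiments.

```python
# Sketch of the two missing-data patterns and the LSTM-based imputation (assumes PyTorch).
# Shapes, hyper-parameters, and the synthetic data are illustrative only.
import numpy as np
import torch
import torch.nn as nn

def random_missing(series: np.ndarray, ratio: float = 0.1) -> np.ndarray:
    """Randomly drop individual values (random missing pattern)."""
    out = series.copy()
    out[np.random.rand(*out.shape) < ratio] = np.nan
    return out

def fiber_missing(series: np.ndarray, start: int, length: int) -> np.ndarray:
    """Drop a consecutive block of values (fiber missing pattern),
    mimicking a sensor issue or a communication loss."""
    out = series.copy()
    out[start:start + length] = np.nan
    return out

class LSTMImputer(nn.Module):
    """RNN with LSTM: reads the correlated signal columns and predicts the target column."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                   # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)   # (batch, time) imputed target signal

# Simulate the two loss patterns on a toy target signal.
n_steps, n_features = 128, 3
signal = np.sin(np.linspace(0, 20, n_steps)).astype(np.float32)
corrupted = fiber_missing(random_missing(signal), start=50, length=20)  # NaNs mark values to impute

# Toy training loop: correlated signal columns [t1..tn] as input, target column as label
# (in practice the target is shifted by k time stamps).
inputs = torch.randn(8, n_steps, n_features)
target = torch.randn(8, n_steps)
model = LSTMImputer(n_features)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(5):                      # a handful of epochs for illustration
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), target)
    loss.backward()
    optimizer.step()
```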
We adopted MediaTek's NeuroPilot as our edge AI solution to verify whether our methodology meets these expectations. For the related experimental data, please refer to our GitHub.
We tested our solution on NeuroPilot, which is an edge AI solution designed by MediaTek. Please refer to our GitHub page for the experimental data.