REDDA: Reduced subspace in big data treatment: A new paradigm for efficient geophysical Data Assimilation

Big Data methods in geosciences
Objectives
The objective of REDDA is to contribute to the development of novel Big Data methods capable of efficiently treating a huge amount of data while extracting as much information as possible. The project is structured along two research lines (RL). RL1 will deliver reduced-order Bayesian methods sufficiently accurate and efficient for the needs of nonlinear high-dimensional geophysical systems. RL2 will develop a data assimilation method for the new sea ice model (neXtSIM) developed at NERSC. neXtSIM is a Lagrangian model and will assimilate Lagrangian observations of sea ice motion. In the long term, the fully Lagrangian sea ice data assimilation system will be integrated into both the short-term Arctic environment monitoring system (TOPAZ in Copernicus) and the decadal climate prediction system (NorCPM), both of which already use a Monte Carlo data assimilation framework. REDDA aims at producing methods that are, by construction, generic enough for future inclusion in complex coupled Earth System Models.
Project Summary
Environmental science has been a primary testing ground for Data Assimilation (DA). The huge dimension of the numerical models of the climate system, the vast amount of Earth observational data at our disposal, and the pressure to deliver timely and accurate forecasts have motivated an extraordinary research activity. This has led to enormous advances that have subsequently spread to other domains of science. At the same time, geophysical DA is an exemplar of a Big Data problem: model dimensions are of order O(10^9) and observational datasets of order O(10^8). Computationally efficient state estimation and uncertainty quantification must be carried out using massive datasets and huge dynamical models. Increasing computational power alone will not suffice, since the problem complexity grows commensurately with both the data volume and the model size, making continuous development of advanced DA procedures necessary. REDDA's aim is to contribute to the development of novel Big Data methods capable of efficiently treating a huge amount of data while extracting as much information as possible. REDDA is an interdisciplinary project between geoscientists and mathematicians with two research lines (RL) that have their origin in climate science but will be investigated from a mathematical perspective:
RL1. Reduced order fully Bayesian DA methods for nonlinear systems
RL2. DA methods for Lagrangian sea-ice models
REDDA employs two postdoctoral scientists:
- Postdoc 1 - Colin Grudzien on RL1
- Postdoc 2 - Position on RL2 to open soon
Peer-Reviewed Publications
- Degenerate Kalman filter error covariances and their convergence onto the unstable subspace. SIAM/ASA Journal on Uncertainty Quantification (JUQ). 2017;5(1).
- Rank Deficiency of Kalman Error Covariance Matrices in Linear Time-Varying System With Deterministic Evolution. SIAM Journal on Control and Optimization. 2017;55(2).
- Four-dimensional ensemble variational data assimilation and the unstable subspace. Tellus A: Dynamic Meteorology and Oceanography. 2017;69(1).
- Scientific challenges of convective-scale numerical weather prediction. Bulletin of the American Meteorological Society (BAMS). 2018.
- Data assimilation in the geosciences - An overview of methods, issues and perspectives. WIREs Climate Change. 2018.
- Chaotic dynamics and the role of covariance inflation for reduced rank Kalman filters with model error. Nonlinear Processes in Geophysics. 2018;25(3).
- Asymptotic Forecast Uncertainty and the Unstable Subspace in the Presence of Additive Model Error. SIAM/ASA Journal on Uncertainty Quantification (JUQ). 2018;6(4).
- Stochastic parameterization identification using ensemble Kalman filtering combined with maximum likelihood methods. Tellus A: Dynamic Meteorology and Oceanography. 2018;70(1).
- Improving weather and climate predictions by training of supermodels. Earth System Dynamics. 2019.
- Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization. Foundations of Data Science (FoDS). 2020;2(1).
- Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model. Journal of Computational Science. 2020;44.
- Combining data assimilation and machine learning to infer unresolved scale parametrization. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 2021;379(2194).