Release of global monthly land surface temperature data

Release of the first version of the global monthly land surface temperature databank - the first key output of the International Surface Temperature Initiative.

Peter Thorne, senior scientist at NERSC, chairs the International Surface Temperature Initiative, an international and multi-disciplinary effort that aims to create a suite of open, transparent, rigorously assessed and understood land surface air temperature products to meet the 21st-century needs of science, policy makers, industry and society.

Holdings used in current NOAA and NASA global surface temperature datasets (left) and those in the new data holdings release (right). Color denotes record length, and longer records overplot shorter records. The release consists of just over 32,000 stations, over a four-fold increase.

Today marks the official release of the first version of the monthly land surface air temperature holdings prepared by the databank working group of the International Surface Temperature Initiative, led by Jay Lawrimore of NOAA's National Climatic Data Center. These holdings consist of basic environmental data from land meteorological stations, arising from 50 data sources collated by members of the working group and by other scientists from every continent. The sources have been ranked and merged in an open and transparent manner by Jared Rennie of CICS-NC and colleagues, with provenance traced as far back as possible – in several cases to the original hardcopy or an image thereof. The paper describing the methods underlying the merge has been published in Geosciences Data Journal and appears alongside the release. The final merge consists of just over 32,000 station records, a significant advance on the 7,000 or so used in many current global estimates.

The databank is testament to the substantial efforts of the many scientists in the working group. Their work has uncovered several new data sources that until now have not been publicly available; for example, data from the Argentinian agriculture ministry were recovered by South American colleagues. In addition, the databank has benefitted from several national open-access policies for climatic data, such as that of the Norwegian Meteorological Institute, and from long-term international data sharing, access and exchange efforts such as ECA&D. Jared Rennie and colleagues, with guidance from working group members and the steering committee, undertook a novel merge of the various sources to create a set of station records that should not include duplicates. Feedback from scientists and the public on various beta releases further improved the merge quality.
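
The merge procedure itself is described in the Geosciences Data Journal paper. Purely as an illustration of the underlying idea, the Python sketch below performs a rank-ordered merge that skips a candidate station when it sits close to one already accepted. It is not the Rennie et al. algorithm (which also weighs station names, elevations and the overlap of the data values themselves); the function names, data layout and 10 km threshold are all assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def merge_sources(sources, max_km=10.0):
    """Merge station lists from several sources in rank order.

    `sources` is a list of station lists, highest-priority first; each
    station is a dict with at least 'lat' and 'lon' keys (hypothetical
    layout). A candidate is treated as a duplicate if it lies within
    `max_km` of an already accepted station.
    """
    merged = []
    for source in sources:                    # highest-ranked source first
        for station in source:
            is_dup = any(
                haversine_km(station["lat"], station["lon"],
                             kept["lat"], kept["lon"]) < max_km
                for kept in merged
            )
            if not is_dup:
                merged.append(station)        # accept as a new station
            # else: a fuller merge would splice the candidate's values
            # into the kept record wherever they extend it
    return merged
```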

Timeseries of station count on a monthly basis for GHCNMv3 (black) and the recommended merge (red).

Compared to GHCNMv3 – the current land surface air temperature product used by NOAA NCDC and NASA GISS to create global mean surface temperature estimates – the new holdings are more complete in terms of both the number of records and global coverage.

This databank release is a substantial advance, but it is most definitely not the end. Firstly, we know there are many data out there that are either not shared or not yet digitized. We will continue to strive to persuade rights holders of the utility and value of openly sharing data. We will also continue to pursue, with numerous partners, innovative means of digitizing the many millions of images of data never before digitized, and to rescue hard-copy data before it is too late. Here, crowdsourcing portal-type solutions akin to oldweather.org for marine data are perhaps the most plausible affordable route to success, and we will continue to pursue such opportunities as they arise.

Even if we could access and rescue all the data ever taken, we would still not be finished. These basic data holdings contain a myriad of artifacts arising from causes such as changes of instrumentation and observing practices, station moves, and changes in the local environment. It is therefore necessary to analyze the data and adjust for such effects, and there are many potential ways to do so.
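
As a purely illustrative sketch of the simplest such approach – comparing a candidate station against a neighbour and looking for a step change in the difference series – consider the following Python fragment. Real homogenization methods (pairwise comparison, penalized-likelihood changepoint detection and others) are far more sophisticated; the one-year buffer and the single-break assumption here are hypothetical.

```python
import numpy as np

def largest_shift(diff_series, buffer=12):
    """Locate the single largest mean shift in a candidate-minus-neighbour
    difference series (a crude stand-in for a full homogenization method).

    Returns (index, shift): the split point at which the difference between
    the means of the two segments is largest. NaNs must be removed first,
    and `buffer` months are required on each side of a break.
    """
    x = np.asarray(diff_series, dtype=float)
    best_i, best_shift = None, 0.0
    for i in range(buffer, len(x) - buffer):
        shift = x[i:].mean() - x[:i].mean()
        if abs(shift) > abs(best_shift):
            best_i, best_shift = i, shift
    return best_i, best_shift
```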

Timeseries of global sampling completeness over time for GHCNMv3 (black) and the databank release (red). The globe has been split into 5-degree boxes, and a single station reporting in a timestep counts as data for that box. Boxes are defined as land if they contain any land, so reaching 100% would require sampling every such box – all small islands and all of Greenland, Antarctica, the Sahara, etc. – at least once.
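
The box-counting convention in the caption translates directly into code. The sketch below is a hypothetical Python rendering of that computation, assuming a precomputed set of land-containing 5-degree boxes; it is not the script used to produce the figure.

```python
def coverage_fraction(station_lats, station_lons, land_boxes):
    """Fraction of 5-degree land boxes containing at least one station.

    `land_boxes` is assumed to be a precomputed set of (row, col) indices
    on the 36 x 72 grid of 5-degree boxes that contain any land, matching
    the convention in the caption above.
    """
    occupied = set()
    for lat, lon in zip(station_lats, station_lons):
        row = min(int((lat + 90.0) // 5), 35)   # 0..35, south to north
        col = int((lon + 180.0) // 5) % 72      # 0..71, west to east
        occupied.add((row, col))
    return len(occupied & land_boxes) / len(land_boxes)
```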

Therefore, if we are to truly capitalize upon this substantive databank advance, we need multiple groups to assess and quantify the artifacts in the records. The databank release provides a platform from which to achieve this next step, but success now depends fundamentally upon whether interested investigators, or teams of investigators, take up the challenge.

The final piece of the puzzle is how to assess in a rigorous fashion how well different groups succeed in finding and accounting for data artifacts. In the real world we do not have the luxury of knowing the truth, and therefore of knowing which approach is better (and whether different approaches may be better for different purposes or situations). What we can do is assess how well the approaches cope in plausible synthetic test cases. The benchmarking and assessment working group, led by Kate Willett of the Met Office Hadley Centre, will deliver later in 2014 a suite of such test cases that exactly mimic the space and time sampling of the databank's first-version release. After a period in which analysts create and publish analyses of the real-world data and apply the same methods to the test cases, their performance against the benchmarks will be assessed.
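
How such an assessment might be scored is still open; as one purely hypothetical example, a benchmark metric could be as simple as the root-mean-square error of a homogenized series against the known clean series of a synthetic case. The working group's actual criteria may well differ.

```python
import numpy as np

def benchmark_score(adjusted, truth):
    """Root-mean-square error of a homogenized series against the known
    clean series of a synthetic benchmark case, scored over the months
    present in both. One simple candidate metric only.
    """
    a = np.asarray(adjusted, dtype=float)
    t = np.asarray(truth, dtype=float)
    ok = ~(np.isnan(a) | np.isnan(t))           # score only shared months
    return float(np.sqrt(np.mean((a[ok] - t[ok]) ** 2)))
```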

Even then we are not done. It is envisaged that there will be regular updates to the databank, including month-on-month updates to enable monitoring of recent changes and less frequent updates to incorporate new sources or extended records. The benchmarking cycle will repeat over time to encourage development. Further, this first effort concentrates on monthly series, which pose a simpler problem than daily or sub-daily records, on the principle that to learn to run one must first learn to walk. But if we are serious about providing data products to meet societal needs, we had better eventually take up the challenge and learn how to run.


Attachment: gdj38.pdf (6.6 MB)