G5.01 Vast number of data portals serving data under distinct data policies in multiple formats for fiducial reference-quality data inhibits their discovery, access, and usage for applications, such as satellite Cal/Val

Gap abstract: 

Presently, access to high-quality reference network data and satellite data is obtained through a variety of portals, using a broad range of access protocols, and the data files are available in an array of native data formats that lack interoperability (see Gap 1.06). There also exists a broad range of data policies, from open access through to delayed-mode restricted access. Making effective use of the full range of reference-quality measurements, e.g., for the characterisation of satellite data, therefore requires a substantial investment of time and resources to set up and maintain a large number of data-access protocols and data read/write routines, as well as to fully understand and adhere to a broad range of data policies and timeliness constraints. This is a substantial impediment to the effective usage of data for applications such as the GAIA-CLIM Virtual Observatory or similar application areas.

Part I Gap description

Primary gap type: 
  • Technical (missing tools, formats etc.)
Secondary gap type: 
  • Parameter (missing auxiliary data etc.)
  • Governance (missing documentation, cooperation etc.)
ECVs impacted: 
  • Temperature, Water vapour, Ozone, Aerosols, Carbon Dioxide, Methane
User category/Application area impacted: 
  • Operational services and service development (meteorological services, environmental services, Copernicus Climate Change Service (C3S) and Copernicus Atmosphere Monitoring Service (CAMS), operational data assimilation development, etc.)
  • International (collaborative) frameworks and bodies (space agencies, EU institutions, WMO programmes/frameworks etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Non-satellite instrument techniques involved: 
  • Independent of instrument technique
Related gaps: 
  • Gap 1.06 pertains to unifying metadata format and discovery metadata, which would naturally form a component of resolving the current gap. This critical dependent gap should be addressed together with the current gap.

Detailed description: 

Characterising satellite measurements by comparison with reference measurements requires consistent and reliable access to data and documentation from various fiducial reference measurement sources, in order to analyse the quality of the satellite measurements and/or derived geophysical data products. This task can be greatly complicated and time-consuming because data must be collected from multiple locations, which often offer the data through different types of user interface with which a user needs to become familiar. In many cases, data downloads do not follow specific data exchange standards, which makes it difficult to automate access to them. In addition, the available bandwidth at the provider side might be too small to serve many customers, which can result in extended waiting times for the data. This applies even more when co-located ground-based and satellite data are to be offered to the user. The range of data policies that a user needs to adhere to, including the timeliness of the data exchange, further complicates the issue.

A common source that integrates several reference-data networks with satellite data, taking traceable uncertainty into account, does not exist but is needed according to the GAIA-CLIM user survey. A key first step towards this is consistent access to reference-quality measurement systems in a harmonised data format that contains the requisite discovery metadata and for which the data usage policy and restrictions are clearly articulated. Existing data policies can differ widely, e.g.,

  • Completely open access for all users including commercial users;

  • Open access for research purposes only;

  • Open access after a set time delay;

  • Access only upon request to PI.
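
As an illustration only, the four policy types listed above could be encoded by a client so that access checks are automated rather than handled portal by portal. The following is a minimal sketch; the names `Policy`, `DataPolicy`, `may_access`, and the `embargo_days` parameter are hypothetical and are not taken from any existing portal or from GAIA-CLIM.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Policy(Enum):
    """Hypothetical labels for the four policy types listed above."""
    OPEN = "open"                    # open to all users, including commercial
    RESEARCH_ONLY = "research_only"  # open for research purposes only
    DELAYED = "delayed"              # open after a set time delay
    ON_REQUEST = "on_request"        # access only upon request to the PI


@dataclass(frozen=True)
class DataPolicy:
    kind: Policy
    embargo_days: int = 0  # assumed parameter, only used for Policy.DELAYED


def may_access(policy: DataPolicy, observed_on: date, today: date,
               research_use: bool, pi_approved: bool = False) -> bool:
    """Return True if a user may access data observed on `observed_on`."""
    if policy.kind is Policy.OPEN:
        return True
    if policy.kind is Policy.RESEARCH_ONLY:
        return research_use
    if policy.kind is Policy.DELAYED:
        # Data become available once the embargo period has elapsed.
        return today >= observed_on + timedelta(days=policy.embargo_days)
    # Policy.ON_REQUEST: access depends on explicit approval by the PI.
    return pi_approved


if __name__ == "__main__":
    delayed = DataPolicy(Policy.DELAYED, embargo_days=180)
    print(may_access(delayed, date(2018, 1, 1), date(2018, 3, 1), research_use=True))
    print(may_access(delayed, date(2018, 1, 1), date(2018, 7, 1), research_use=True))
```

Real policies typically carry further conditions (attribution, redistribution limits) that such a simple encoding does not capture.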

Several sources of co-located datasets exist, but most of them are specialised for very particular use cases. Most do not fully utilise the potentially available information on uncertainty, nor do they include the uncertainty arising from spatiotemporal mismatch of the compared data streams. Some of the existing datasets are publicly available via the internet, while others are run internally by organisations such as space agencies to monitor data quality in real time. While many validation activities are performed, they do not use the available uncertainty information in an optimal way, which affects the quality of the research and the robustness of any conclusions drawn from such validation exercises.

In summary, the issues around data discovery and access are pervasive and inhibit the effective usage of these data in a broad range of application areas, including satellite Cal/Val activities. The recently instigated Copernicus Climate Change Service contract C3S 311a Lot 3, which is concerned with access to data from baseline and reference networks, may go a considerable way towards addressing this gap for non-satellite reference measurements and is discussed under remedy G5.01(R1).

Operational space missions or space instruments impacted: 
  • Independent of specific space mission or space instruments
Validation aspects addressed: 
  • Radiance (Level 1 product)
  • Geophysical product (Level 2 product)
  • Gridded product (Level 3)
  • Assimilated product (Level 4)
  • Time series and trends
  • Representativity (spatial, temporal)
  • Calibration (relative, absolute)
Gap status after GAIA-CLIM: 
  • GAIA-CLIM explored and demonstrated potential solutions to close this gap in the future

Some of the work within GAIA-CLIM WP1 and WP5 will provide unified access to a range of reference-quality data products via the Virtual Observatory (VO) facility. However, this access will not be operational, and substantial further work would be required. It also will not permit universal access to the integrated holdings for other applications.

Part II Benefits to resolution and risks to non-resolution

Identified benefit: 
  • Access to reference measurements organised via a brokering system service as envisioned by Copernicus makes discovery and access easier.
User category/Application area benefitted: 
  • All users and application areas will benefit from it
Probability of benefit being realised: 
  • High
  • Medium
Impacts: 
  • The one-stop-shop for the described data would become the central platform where several scientific and service-oriented communities would search for such data.
  • This can lead to significant cost reductions for research and development activities that count on the availability of such data.

Identified benefit: 
  • Access to reference measurements co-located to satellite measurements through the GAIA-CLIM Virtual Observatory in operational mode, in particular at level 1, could boost satellite-retrieval development and comparison.
User category/Application area benefitted: 
  • International (collaboration) frameworks (SDGs, space agency, EU institutions, WMO programmes/frameworks etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Probability of benefit being realised: 
  • Medium
Impacts: 
  • Individual satellite retrieval developers, international retrieval round robin activities for retrieval analysis and selection, as well as climate data record quality assessments, as performed by WCRP, would save significant effort in setting up databases like the ones contained in the Virtual Observatory.

Identified benefit: 
  • An operational Virtual Observatory could be exploited as a real-time Cal/Val facility for new satellite instruments at space agencies.
User category/Application area benefitted: 
  • International (collaboration) frameworks (SDGs, space agency, EU institutions, WMO programmes/frameworks etc.)
Probability of benefit being realised: 
  • Medium
Impacts: 
  • The Virtual Observatory may provide a basic structure for real-time satellite data Cal/Val that can be reused and further developed with new programmes.
  • This system would for the first time consider the full uncertainty budget involved in such a data comparison at the operational level.

Identified risk: 
  • The use of multiple locations with different set-ups for data access continues to complicate work on data comparison and increases the cost of delivery and of analysis/exploitation of data.
User category/Application area at risk: 
  • Operational services and service development (meteorological services, environmental services, Copernicus services C3S & CAMS, operational data assimilation development, etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Probability of risk being realised: 
  • High
Impacts: 
  • The limited number of users who are able to fully exploit available observations to undertake activities, such as satellite Cal/Val, reduces the intrinsic value of these data and related investments into infrastructure.

Identified risk: 
  • Non-satellite reference measurements will have limited value for the characterisation of satellite measurements.
User category/Application area at risk: 
  • Operational services and service development (meteorological services, environmental services, Copernicus services C3S & CAMS, operational data assimilation development, etc.)
Probability of risk being realised: 
  • High
Impacts: 
  • Negative impacts on funding support for non-satellite measurements.
  • Poorer quality assessments of satellite measurement programmes.

Part III Gap remedies

Gap remedies: 

Remedy 1: Successful implementation of the Copernicus Climate Change Service activity on baseline and reference network data access via the Climate Data Store

Primary gap remedy type: 
  • Deployment
Secondary gap remedy type: 
  • Technical
  • Governance
Proposed remedy description: 

The C3S 311a Lot 3 contract, concerned with access to baseline and reference network data, is expected to make considerable strides towards harmonised access to reference- and baseline-network data under a common data model, with a clear articulation of data policies that enables appropriate and seamless usage. Work is envisaged to cover aspects of data access brokering, data harmonisation, and data provision, and builds upon aspects of work within GAIA-CLIM. Data shall be served via the Climate Data Store (CDS) facility of C3S. However, the contract is limited to accessing data from a subset of atmospheric networks and ECVs, so in the longer term, extension to the remaining atmospheric ECVs and to oceanic and terrestrial ECVs would be required were these to be used for satellite Cal/Val.
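
To make the notion of harmonisation under a common data model more concrete, the sketch below maps two differently formatted inputs onto one record type. It is purely illustrative and does not represent the C3S or CDS data model: the `Observation` fields, the two network formats, their column and key names, and the policy labels are all hypothetical assumptions.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Observation:
    """Hypothetical common data model: one record per measurement."""
    network: str
    station: str
    time: datetime      # always UTC
    variable: str
    value: float        # temperature in kelvin
    uncertainty: float  # standard uncertainty in kelvin
    policy: str         # data usage policy label


def read_network_a_csv(text: str) -> List[Observation]:
    """Hypothetical network A: CSV, ISO timestamps, degrees Celsius."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        t = datetime.strptime(row["utc"], "%Y-%m-%dT%H:%M:%S")
        out.append(Observation(
            network="A", station=row["site"],
            time=t.replace(tzinfo=timezone.utc), variable="temperature",
            value=float(row["t_degC"]) + 273.15,  # convert to kelvin
            uncertainty=float(row["u_degC"]),     # same size in kelvin
            policy="open"))
    return out


def read_network_b_json(text: str) -> List[Observation]:
    """Hypothetical network B: JSON, epoch seconds, kelvin."""
    return [Observation(
        network="B", station=rec["station_id"],
        time=datetime.fromtimestamp(rec["epoch_s"], tz=timezone.utc),
        variable="temperature", value=float(rec["T_K"]),
        uncertainty=float(rec["sigma_K"]), policy="research_only")
        for rec in json.loads(text)]


# One reader per native format; supporting a new network means adding
# one entry here instead of changing every downstream application.
READERS: Dict[str, Callable[[str], List[Observation]]] = {
    "network_a_csv": read_network_a_csv,
    "network_b_json": read_network_b_json,
}


def harmonise(fmt: str, text: str) -> List[Observation]:
    """Convert native-format text into common-model records."""
    return READERS[fmt](text)


if __name__ == "__main__":
    a = harmonise("network_a_csv",
                  "site,utc,t_degC,u_degC\nXYZ,2018-01-01T12:00:00,10.0,0.2\n")
    b = harmonise("network_b_json",
                  '[{"station_id": "S1", "epoch_s": 1514808000,'
                  ' "T_K": 283.15, "sigma_K": 0.1}]')
    print(a[0])
    print(b[0])
```

The design point is that units, time conventions, and policy labels are normalised once, at ingestion, so that applications only ever see one record type.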

Relevance: 

The remedy would provide a single point of access to harmonised data products served under a common data model. Note that rapid access, e.g. for satellite validation in the commissioning phase, is not addressed by this remedy.

Measurable outcome of success: 

Data available via the CDS and used in applications such as the GAIA-CLIM Virtual Observatory

Expected viability for the outcome of success: 
  • High
Scale of work: 
  • Programmatic multi-year, multi-institution activity
Time bound to remedy: 
  • Less than 5 years
Indicative cost estimate (investment): 
  • Medium cost (< 5 million)
Indicative cost estimate (exploitation): 
  • Yes
Potential actors: 
  • Copernicus funding

Remedy 2: Operationalisation and extension of the Virtual Observatory facility developed within GAIA-CLIM

Primary gap remedy type: 
  • Deployment
Secondary gap remedy type: 
  • Technical
Proposed remedy description: 

The diverse sources of reference-quality data could be integrated with data made available through operational exploitation platforms, which could be developed for different user communities. GAIA-CLIM provides this as part of the Virtual Observatory for a set of atmospheric ECVs and the specific application of characterising satellite measurements. As a major part of the Virtual Observatory, a co-location database has been developed. The first step is to identify all pertinent satellite and non-satellite reference datasets that are of interest for comparison with data from a given satellite sensor. The comparison could be made via a forward-modelling approach that derives an estimate of the satellite-sensor data, via geophysical variables derived from the satellite data, or both. The provided data need to be complemented by metadata that are as complete as possible and by traceable uncertainty information, including comparison mismatch uncertainties, which need to be derived from the comparison setting and the variability of the geophysical variable to be compared.
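
The co-location and uncertainty handling described above can be sketched in a few lines. This is not the Virtual Observatory implementation: the `Sample` type, the 100 km and 3 hour co-location criteria, and the function names are hypothetical, and the uncertainty terms are assumed to be uncorrelated so that they can be combined in quadrature.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    lat: float      # degrees north
    lon: float      # degrees east
    time: datetime
    value: float    # geophysical variable, e.g. temperature in kelvin
    u: float        # standard uncertainty, same unit as value


def distance_km(a: Sample, b: Sample) -> float:
    """Great-circle distance (haversine) on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def colocate(sat: List[Sample], ref: List[Sample],
             max_km: float = 100.0,                    # assumed criterion
             max_dt: timedelta = timedelta(hours=3)    # assumed criterion
             ) -> List[Tuple[Sample, Sample]]:
    """Pair each satellite sample with reference samples close in space and time."""
    return [(s, r) for s in sat for r in ref
            if abs(s.time - r.time) <= max_dt and distance_km(s, r) <= max_km]


def compare(s: Sample, r: Sample, u_mismatch: float) -> Tuple[float, float, float]:
    """Return (difference, combined uncertainty, normalised difference).

    Assumes the satellite, reference, and mismatch uncertainties are
    uncorrelated, so they add in quadrature.
    """
    diff = s.value - r.value
    u_total = math.sqrt(s.u ** 2 + r.u ** 2 + u_mismatch ** 2)
    return diff, u_total, diff / u_total


if __name__ == "__main__":
    t0 = datetime(2018, 1, 1, 12, 0, 0)
    sat = [Sample(0.0, 0.0, t0, 284.0, 0.3)]
    ref = [Sample(0.0, 0.0, t0 + timedelta(hours=1), 283.0, 0.4),
           Sample(0.0, 1.0, t0, 283.5, 0.4)]  # about 111 km away, rejected
    for s, r in colocate(sat, ref):
        print(compare(s, r, u_mismatch=0.2))
```

In practice the mismatch term depends on the separation in space and time and on the variability of the geophysical variable, so a single fixed `u_mismatch` is a simplification.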

The Virtual Observatory has been developed to demonstrate the use of non-satellite reference data and NWP model data for the characterisation of satellite data. The Virtual Observatory integrates the different measurements, their metadata, the quantified uncertainty of the measurements, and the uncertainty arising from the comparison process. Many other combinations of ECV reference measurements and satellite data, e.g., for terrestrial and oceanic ECVs, are outside the scope of the GAIA-CLIM project and have not been addressed by it. However, these could be accommodated via operationalisation and extension of the service in the future. Such an operational service should involve unified access to the underlying reference-quality non-satellite measurements used, benefitting from proposed Remedy 1 to this gap.

Relevance: 

An operational and extended Virtual Observatory facility would provide unified access to non-satellite reference-quality measurements and specific co-located data under its purview via the Copernicus CDS. 

Measurable outcome of success: 

Operational access to relevant measurements and co-locations

Expected viability for the outcome of success: 
  • High
Scale of work: 
  • Single institution
  • Programmatic multi-year, multi-institution activity
Time bound to remedy: 
  • Less than 5 years
Indicative cost estimate (investment): 
  • Medium cost (< 5 million)
Indicative cost estimate (exploitation): 
  • Yes
Potential actors: 
  • ESA, EUMETSAT or other space agency