G1.06 Currently heterogeneous metadata standards hinder data discoverability and usability

Gap abstract: 

Extensive and accurate metadata, which enable large-scale, distributed management of resources, are increasingly needed in both research and operations. Recent years have seen a growth in interaction between previously relatively isolated communities, driven by a need for cross-domain collaboration and exchange of data and products. However, metadata standards have generally not been able to meet the needs of interoperability between independent standardization communities. Observations without usable metadata are of very limited use, because the metadata provide key context such as the time, location and modality of the measurements. Several efforts have been undertaken to improve the harmonization of metadata across networks and international programmes, but these are currently still insufficient. 

Part I Gap description

Primary gap type: 
  • Technical (missing tools, formats etc.)
Secondary gap type: 
  • Governance (missing documentation, cooperation etc.)
ECVs impacted: 
  • Temperature, Water vapour, Ozone, Aerosols, Carbon Dioxide, Methane
User category/Application area impacted: 
  • Operational services and service development (meteorological services, environmental services, Copernicus Climate Change Service (C3S) and Atmospheric Monitoring Service (CAMS), operational data assimilation development, etc.)
  • International (collaborative) frameworks and bodies (space agencies, EU institutions, WMO programmes/frameworks etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Non-satellite instrument techniques involved: 
  • Independent of instrument technique
Related gaps: 
  • G1.03 and G1.04 should be addressed after G1.06

    Resolving G1.06 would greatly support the resolution of G1.03 and G1.04 by facilitating the review of existing capabilities, starting from rich, standardized information, and by enabling a more accurate classification of measurement maturity.

  • G5.01 should be addressed with this gap

    Metadata harmonization across multiple data providers will also improve interoperability among different data repositories, with clear benefits for addressing gap G5.01.

Detailed description: 

Metadata are an increasingly essential tool, enabling large-scale, distributed management of resources. Recent years have seen a growth in interaction between previously relatively isolated communities across observing domains and techniques, driven by a need for interdisciplinary research and understanding. However, metadata standards have not been able to meet the needs of interoperability between these hitherto largely independent communities and networks. Observations without metadata are of very limited use: only when they are accompanied by adequate metadata (data describing the data) can the full potential of the observations be realized. Format conversions always risk destroying information, particularly in the accompanying metadata, which usually receive less attention.

Several efforts have been undertaken to improve the harmonization of metadata across numerous networks and international programmes, but this is still not sufficient. In the atmospheric science community, harmonization is starting to be addressed by, amongst others, the emerging WIGOS standards, currently being developed and subsequently implemented at WMO, and the ESA Climate Change Initiative (CCI). Activities of the Copernicus Climate Change Service Data Store are also highly relevant to this gap. Further challenges arise from the need for interoperability across observational domains (surface, atmospheric, oceanic, terrestrial etc.). 

Operational space missions or space instruments impacted: 
  • Independent of specific space mission or space instruments
Validation aspects addressed: 
  • Radiance (Level 1 product)
  • Geophysical product (Level 2 product)
  • Gridded product (Level 3)
  • Assimilated product (Level 4)
  • Time series and trends
  • Representativity (spatial, temporal)
Gap status after GAIA-CLIM: 
  • GAIA-CLIM explored and demonstrated potential solutions to close this gap in the future

Within GAIA-CLIM, metadata standards and data formats have been harmonized with the aim of providing a model that facilitates user access to, and the usability of, in-situ data. This exercise included establishing and documenting common metadata and data formats for a selected subset of the networks that will contribute to the Virtual Observatory. The Virtual Observatory facility will also help to remedy this gap by providing data format conversion for various input data, together with a data extraction function that makes the outputs available in user-friendly formats.
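The conversion-plus-extraction role described for the Virtual Observatory can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Virtual Observatory implementation: the `Observation` record, the two input layouts (`network_a`, `network_b`), their field names and the `extract_csv` function are all hypothetical. Each input format gets its own reader into one common internal representation, and a single extraction step filters by variable and time window and writes CSV as one example of a user-friendly output format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime


@dataclass
class Observation:
    """Hypothetical common internal representation of one observation."""
    station_id: str
    time: datetime
    latitude: float
    longitude: float
    variable: str
    value: float
    units: str


def read_network_a(rows):
    """Hypothetical reader: network A reports air temperature in degrees Celsius."""
    for r in rows:
        yield Observation(r["site"], datetime.fromisoformat(r["when"]),
                          r["lat"], r["lon"], "air_temperature",
                          r["t_degC"] + 273.15, "K")  # convert to kelvin


def read_network_b(rows):
    """Hypothetical reader: network B already uses the common names and units."""
    for r in rows:
        yield Observation(r["station"], datetime.fromisoformat(r["time"]),
                          r["latitude"], r["longitude"], r["var"],
                          r["val"], r["unit"])


# One reader per input format; supporting a new network means adding one reader.
READERS = {"network_a": read_network_a, "network_b": read_network_b}


def extract_csv(sources, variable, start, end):
    """Convert every input to Observation, keep one variable in [start, end), return CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(Observation)])
    writer.writeheader()
    for fmt, rows in sources:
        for obs in READERS[fmt](rows):
            if obs.variable == variable and start <= obs.time < end:
                row = asdict(obs)
                row["time"] = obs.time.isoformat()  # ISO 8601 text in the output
                writer.writerow(row)
    return buf.getvalue()
```

The design point is that adding a network requires only one new reader, and each output format is written once from the common representation, rather than maintaining a converter for every pair of formats.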

GAIA-CLIM activities will be followed up by the Copernicus Climate Change Service, where harmonization of the data and metadata format and structure is ongoing for a selected number of the networks reviewed within GAIA-CLIM. In line with the requirements provided by Copernicus end-users through the C3S Sectoral Information System (SIS) projects, this effort involves implementing a common data model compliant with the ECMWF Observational DataBase (ODB), together with a data-management facility, which shall become part of the operational C3S services at the end of the above-mentioned contract. 

Part II Benefits to resolution and risks to non-resolution

Identified benefit: Full data interoperability and availability of full metadata records for reprocessing of CDRs
User category/Application area benefitted:
  • Operational services and service development (meteorological services, environmental services, Copernicus services C3S & CAMS, operational data assimilation development, etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Probability of benefit being realised:
  • High
Impacts:
Unlimited use of available datasets in a synergetic way for any kind of climate and weather study. Improved interoperability among existing international data repositories.

Identified benefit: Increase in the usage of multiple satellite and non-satellite products for research studies, operational and downstream services
User category/Application area benefitted:
  • All users and application areas will benefit from it
Probability of benefit being realised:
  • High
Impacts:
Improved accuracy of weather and climate projections.
Increased number of products delivered by any type of service for different sectors.
Identified risk: Missing interoperability between independent metadata standardization communities
User category/Application area at risk:
  • Operational services and service development (meteorological services, environmental services, Copernicus services C3S & CAMS, operational data assimilation development, etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Probability of risk being realised:
  • Medium
Impacts:
Limited cross-domain collaboration and data exchange between different communities. Limited ability to use the data appropriately and derive value from them.

Identified risk: Limitations on the development of robust downstream services
User category/Application area at risk:
  • Operational services and service development (meteorological services, environmental services, Copernicus services C3S & CAMS, operational data assimilation development, etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Probability of risk being realised:
  • High
  • Medium
Impacts:
Challenges in creating Copernicus downstream products and services able to satisfy the needs of European and global markets.

Identified risk: Continued need for data format conversion tools developed by many different groups
User category/Application area at risk:
  • Operational services and service development (meteorological services, environmental services, Copernicus services C3S & CAMS, operational data assimilation development, etc.)
  • Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
Probability of risk being realised:
  • High
Impacts:
Easy data exploitation is hindered by the continued need for data format conversion tools developed by many different groups.
Generally higher costs or longer data-handling times before results are achieved.

Part III Gap remedies

Gap remedies: 

Remedy 1: Design and implementation of unified metadata format under a common data model

Primary gap remedy type: 
  • Governance
Secondary gap remedy type: 
  • Technical
Technical remedy: 
  • TRL 7
Proposed remedy description: 

For the development of a sustained service, metadata, data quality and data validation are of crucial importance. Their harmonization is required to establish a common understanding of the data content and to ensure that data owners and users use and interpret the data correctly, thus maximizing the benefit to users. To address the current heterogeneity of metadata standards, a collaborative effort among different communities and stakeholders must be undertaken. Two technical approaches could be adopted:

  1. A common data model must be adopted that merges the metadata provided in the various existing metadata formats (mainly CF-NetCDF, WIGOS, ISO 19115 and NASA-Ames). As realised within GAIA-CLIM, this allows the provision of a unified metadata format (UMDF) that retains all contributing metadata and can be extended should new metadata elements be required. This improves the discoverability of data and enables easy and comprehensive conversion into the many formats desired by end users. Similar efforts include smart extensions of existing international standards, such as the Climate Science Modelling Language (CSML) developed by the University of Reading on the basis of ISO 19115, or the UNIDATA abstract model.

The Copernicus Climate Change Service is already extending the scope of the GAIA-CLIM work for selected Baseline and Reference in-situ observations, to make metadata and data compatible with the Observation Data Base (ODB) developed at ECMWF. The use of a common data model (CDM), and consequently of a UMDF, would be a significant step towards improving metadata harmonization at the international level. It could also facilitate the interoperability and, where possible, the integration of existing data repositories, improving user access to data from multiple suppliers collected with different measurement techniques.

  2. A different approach is to adopt or customize one broadly used standard for both discovery and observation metadata, and to provide users with a number of software converters that map the metadata onto the most commonly used international standards. To date, this has been the approach adopted by various international bodies (WMO, ESA, GCOS, GEOSS, GAW...). It should be noted that this solution, as well as being more computationally demanding, may raise substantial challenges in converting metadata from one format to another (a task often left to the users themselves), with the risk of losing information because the element-wise mapping between standards is often not one-to-one. 
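Why element-wise mapping onto a single standard can lose information is easiest to see in a small example. The Python sketch below assumes a hypothetical crosswalk (`SOURCE_TO_HUB`) with illustrative element names that are not taken from WMO, ESA or any other real standard, and a hypothetical `convert` function. It reports the two ways a value can fail to survive: a source element with no counterpart, and two source elements mapped onto the same target.

```python
# Hypothetical element-wise crosswalk from one source metadata standard onto a
# single "hub" standard. Element names are illustrative only.
SOURCE_TO_HUB = {
    "station_name": "platform_name",
    "instrument_model": "instrument",
    "instrument_serial": "instrument",  # many-to-one: same target as the line above
    "latitude": "latitude",
    "longitude": "longitude",
    # "calibration_history" deliberately has no counterpart in the hub standard
}


def convert(record, mapping):
    """Map a source record onto the hub standard and report what was lost.

    Returns (converted, dropped, collided):
      dropped  - source elements with no target element in the hub standard
      collided - source elements whose target was already filled by an earlier
                 element, so their value does not reach the output
    """
    converted, dropped, collided = {}, [], []
    for name, value in record.items():
        target = mapping.get(name)
        if target is None:
            dropped.append(name)
        elif target in converted:
            collided.append(name)
        else:
            converted[target] = value
    return converted, dropped, collided
```

With a record containing both `instrument_model` and `instrument_serial`, only the first value reaches the hub element `instrument`, and `calibration_history` is dropped entirely; a converter that does not report such losses leaves users unaware of them.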

Relevance: 

The proposed remedy will aid the discoverability and interoperability of holdings and avoid repeated work on format and data conversions. The first suggested approach also preserves the richness of the original metadata. Its benefit is expected to be large, affecting many types of (primarily expert) data users.
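The first approach's claim to preserve the richness of the original metadata can be illustrated with a minimal Python sketch. It assumes, hypothetically, that a crosswalk maps elements of each contributing standard onto shared unified element names; "CF" and "WIGOS" serve only as tags, and the element names, `CROSSWALK` and `UnifiedRecord` are illustrative, not the GAIA-CLIM UMDF. Each unified element stores the value supplied by every source standard, and unmapped elements are kept under a namespaced name, so each contributing record can be exported again without loss.

```python
from collections import defaultdict

# Hypothetical crosswalk: (source standard, source element) -> unified element.
# Assumes at most one element per standard maps to each unified element.
CROSSWALK = {
    ("CF", "institution"): "data_provider",
    ("WIGOS", "responsible_party"): "data_provider",
    ("CF", "instrument"): "instrument",
    ("WIGOS", "instrument_serial"): "instrument_serial",
}


class UnifiedRecord:
    """Superset metadata record that keeps every contributed value with its origin."""

    def __init__(self):
        # unified element -> {source standard -> value as originally supplied}
        self.elements = defaultdict(dict)
        # (source standard, source element) -> unified element, needed for export
        self.origin = {}

    def add(self, standard, record, crosswalk):
        for name, value in record.items():
            # Unmapped elements are kept under a namespaced name instead of being
            # dropped, which is what keeps the record extendable.
            unified = crosswalk.get((standard, name), f"{standard}:{name}")
            self.elements[unified][standard] = value
            self.origin[(standard, name)] = unified

    def export(self, standard):
        """Rebuild the record exactly as it was supplied in `standard`."""
        return {name: self.elements[unified][standard]
                for (std, name), unified in self.origin.items() if std == standard}
```

Because nothing is overwritten or discarded, new elements can be added without changing the structure, and conversions to other formats can start from the full superset rather than from a single, possibly lossy, standard.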

Expected viability for the outcome of success: 
  • Medium
Scale of work: 
  • Programmatic multi-year, multi-institution activity
Time bound to remedy: 
  • Less than 1 year
Indicative cost estimate (investment): 
  • Low cost (< 1 million)
Indicative cost estimate (exploitation): 
  • No
Potential actors: 
  • Copernicus funding
  • National Meteorological Services
  • WMO
  • ESA, EUMETSAT or other space agency