This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 640276.
G1.06 Currently heterogeneous metadata standards hinder data discoverability and usability
Extensive and accurate metadata are increasingly needed in both research and operations, as they enable large-scale, distributed management of resources. Recent years have seen a growth in interaction between previously relatively isolated communities, driven by a need for cross-domain collaboration and exchange of data and products. However, metadata standards have generally not been able to meet the needs of interoperability between independent standardization communities. Observations without usable metadata are of very limited use, as the metadata provide key context such as the time, location and modality of the measurements. Several efforts have been undertaken to improve the harmonization of metadata across networks and international programs, but these remain insufficient.
Part I Gap description
- Technical (missing tools, formats etc.)
- Governance (missing documentation, cooperation etc.)
- Temperature, Water vapour, Ozone, Aerosols, Carbon Dioxide, Methane
- Operational services and service development (meteorological services, environmental services, Copernicus Climate Change Service (C3S) and Copernicus Atmosphere Monitoring Service (CAMS), operational data assimilation development, etc.)
- International (collaborative) frameworks and bodies (space agencies, EU institutions, WMO programmes/frameworks etc.)
- Climate research (research groups working on development, validation and improvement of ECV Climate Data Records)
- Independent of instrument technique
G1.03 and G1.04 should be addressed after G1.06
Resolving G1.06 would greatly support the resolution of G1.03 and G1.04: rich, standardized metadata would facilitate the review of existing capabilities and enable a more accurate classification of measurement maturity.
G5.01 should be addressed with this gap
Metadata harmonization across multiple data providers will also improve the interoperability among different data repositories, with clear benefits for addressing gap G5.01.
Metadata are an increasingly essential tool enabling large-scale, distributed management of resources. Recent years have seen a growth in interaction between previously relatively isolated communities across observing domains and techniques, driven by a need for interdisciplinary research and understanding. However, metadata standards have not been able to meet the needs of interoperability between these communities and networks, which have to date been largely independent. Observations without metadata are of very limited use: only when accompanied by adequate metadata (data describing the data) can the full potential of the observations be realized. Every format conversion carries the risk of destroying information, in particular in the accompanying metadata, which usually receive less attention than the data themselves.
Several efforts have been undertaken to improve the harmonization of metadata across numerous networks and international programs, but these are still not sufficient. In the atmospheric science community, harmonization is beginning to be addressed by, amongst others, the emerging WIGOS standards, currently being developed and subsequently implemented at the WMO, and by the ESA Climate Change Initiative (CCI). Copernicus Climate Change Service Climate Data Store activities are also highly relevant to this gap. Further challenges arise from the need for interoperability across observational domains (surface, atmospheric, oceanic, terrestrial, etc.).
- Independent of specific space mission or space instruments
- Radiance (Level 1 product)
- Geophysical product (Level 2 product)
- Gridded product (Level 3 product)
- Assimilated product (Level 4 product)
- Time series and trends
- Representativity (spatial, temporal)
- GAIA-CLIM explored and demonstrated potential solutions to close this gap in the future
GAIA-CLIM carried out metadata standards and format harmonization with the aim of providing a model that facilitates users' access to, and the usability of, in-situ data. This exercise included the establishment and documentation of common metadata and data formats for a selected subset of networks that will contribute to the Virtual Observatory. The Virtual Observatory facility will also help to remedy this gap by providing data format conversion for various input data and a data extraction function that makes the outputs available in user-friendly formats.
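As an illustration of the kind of extraction function described above, the following minimal Python sketch selects harmonized observations by variable and time window and writes them to CSV, carrying selected metadata as comment lines so that they are not lost in the export. The record layout (`station`, `time`, `variable`, `value`, `units`) and the function name `extract_to_csv` are hypothetical assumptions; they do not describe the actual Virtual Observatory interface.

```python
import csv
import io
from datetime import datetime

# Hypothetical harmonized records, as they might look after conversion into a
# common format. Field names are assumptions made for illustration only.
RECORDS = [
    {"station": "A", "time": datetime(2016, 7, 1, 0), "variable": "temperature",
     "value": 288.1, "units": "K"},
    {"station": "A", "time": datetime(2016, 7, 1, 12), "variable": "temperature",
     "value": 295.4, "units": "K"},
    {"station": "B", "time": datetime(2016, 7, 2, 0), "variable": "water_vapour",
     "value": 12.3, "units": "kg m-2"},
]


def extract_to_csv(records, variable, start, end, metadata):
    """Return a CSV string with the records of `variable` in [start, end).

    Metadata are written as '#'-prefixed header lines so that they travel
    with the extracted data instead of being discarded (hypothetical design).
    """
    out = io.StringIO()
    for key, value in metadata.items():
        out.write(f"# {key}: {value}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["station", "time", "value", "units"])
    for rec in records:
        if rec["variable"] == variable and start <= rec["time"] < end:
            writer.writerow([rec["station"], rec["time"].isoformat(),
                             rec["value"], rec["units"]])
    return out.getvalue()


text = extract_to_csv(RECORDS, "temperature",
                      datetime(2016, 7, 1), datetime(2016, 7, 2),
                      {"source": "example network"})
print(text)
```

Writing the metadata into the exported file, rather than only the values, is one simple way for a user-friendly output format to avoid the metadata loss that ad hoc conversions often cause.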
GAIA-CLIM activities will be followed up by the Copernicus Climate Change Service, where the harmonization of the data and metadata format and structure is ongoing for a selected number of networks reviewed within GAIA-CLIM. Following the requirements provided by Copernicus end users through the C3S Sectoral Information System (SIS) projects, this effort involves the implementation of a common data model compliant with the ECMWF Observation DataBase (ODB) and of a data-management facility, which shall become part of the operational C3S services at the end of the corresponding contract.
Part II Benefits to resolution and risks to non-resolution
Identified benefit | User category/Application area benefitted | Probability of benefit being realised | Impacts |
---|---|---|---|
Full data interoperability and availability of full metadata records for reprocessing of CDRs | | | Unlimited use of available datasets in a synergistic way for any kind of climate and weather study. Facilitated interoperability among the existing international data repositories. |
Increase in the usage of multiple satellite and non-satellite products for research studies, operational and downstream services | | | Improved accuracy of weather and climate projections. Increased number of products delivered by all types of service for different sectors. |
Identified risk | User category/Application area at risk | Probability of risk being realised | Impacts |
---|---|---|---|
Missing interoperability between independent metadata standardization communities | | | Limited cross-domain collaboration and data exchange between different communities. Reduced ability to use the data appropriately and derive value from them. |
Limitations on the development of robust downstream services | | | Difficulty for Copernicus in creating downstream products and services that satisfy the needs of European and global markets. |
Continued need for data format conversion tools developed by many different groups | | | Data exploitation is hindered by the proliferation of conversion tools developed independently by different groups. Generally higher costs or longer data-handling times before results are achieved. |
Part III Gap remedies
Remedy 1: Design and implementation of a unified metadata format under a common data model
For a sustained service, metadata, data quality and data validation are of crucial importance. Their harmonization is required to establish a common understanding of the data content and to ensure correct use and interpretation of the data by both owners and users, thereby maximizing the benefit to users. To address the current heterogeneity of metadata standards, a collaborative effort among different communities and stakeholders must be undertaken. Two different technical approaches could be adopted:
1. A common data model must be adopted that merges the metadata provided in the various existing metadata formats (mainly CF-NetCDF, WIGOS, ISO 19115 and NASA-Ames). As realised within GAIA-CLIM, this makes it possible to provide a unified metadata format (UMDF) that retains all contributing metadata and can be extended should new metadata elements be required. This improves data discoverability and enables easy and comprehensive conversion into the many formats desired by end users. Similar efforts include smart extensions of existing international standards, such as the Climate Science Modelling Language (CSML), developed by the University of Reading on the basis of ISO 19115, or the UNIDATA abstract model.
The Copernicus Climate Change Service is already extending the scope of the GAIA-CLIM work for selected Baseline and Reference in-situ observations to make metadata and data compatible with the Observation DataBase (ODB) developed at ECMWF. The use of a common data model (CDM), and consequently of a UMDF, could significantly improve metadata harmonization at the international level. It could also facilitate the interoperability and, where possible, the integration of the existing data repositories, improving users' access to data from multiple suppliers collected with different measurement techniques.
2. A different approach is to adopt or customize one broadly used standard for both discovery and observation metadata and to provide users with a number of software converters that map the metadata onto the most commonly used international standards. To date, this has been the approach adopted by various international bodies (WMO, ESA, GCOS, GEOSS, GAW, ...). It should be noted that this solution, besides being more computationally demanding, may pose substantial challenges in converting metadata from one format to another (a task often left to the users themselves), with the risk of losing information because the element-wise mapping between standards is often not one-to-one.
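To make the contrast between the two approaches concrete, the following minimal Python sketch (not the GAIA-CLIM UMDF implementation) models each in a heavily simplified form. `UnifiedRecord` keeps every metadata element together with the standard it came from and can be extended with new elements (approach 1); `convert` maps the record onto one fixed target vocabulary through an element-wise mapping table and reports which elements had no counterpart (approach 2). Apart from the CF global attributes `title`, `institution` and `history`, the element names, the target vocabulary and the mapping table are hypothetical assumptions and are not taken from the actual WIGOS, ISO 19115 or NASA-Ames specifications.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Simplified source records keyed by standard. The WIGOS element names are
# hypothetical placeholders, not taken from the real specification.
SOURCES = {
    "CF-NetCDF": {"title": "Radiosonde profile", "institution": "Example NMS",
                  "history": "converted from native format"},
    "WIGOS": {"station_name": "Example Station", "observing_method": "radiosonde",
              "instrument_serial": "RS-0001"},
}


@dataclass
class UnifiedRecord:
    """Approach 1 (sketch): keep every element, tagged with its source standard."""
    elements: dict = field(default_factory=dict)  # (standard, name) -> value

    def add_source(self, standard: str, record: dict) -> None:
        for name, value in record.items():
            self.elements[(standard, name)] = value

    def extend(self, standard: str, name: str, value) -> None:
        # New metadata elements can be added without changing the model.
        self.elements[(standard, name)] = value


# Approach 2 (sketch): hypothetical element-wise mapping onto a single target
# standard. Elements with no counterpart simply have no entry here.
TARGET_MAPPING = {
    ("CF-NetCDF", "title"): "title",
    ("CF-NetCDF", "institution"): "organisation",
    ("WIGOS", "station_name"): "site",
}


def convert(record: UnifiedRecord, mapping: dict) -> tuple[dict, list]:
    """Map onto the target vocabulary; return the result and the dropped keys."""
    converted, dropped = {}, []
    for key, value in record.elements.items():
        if key in mapping:
            converted[mapping[key]] = value
        else:
            dropped.append(key)
    return converted, dropped


unified = UnifiedRecord()
for standard, rec in SOURCES.items():
    unified.add_source(standard, rec)

target, lost = convert(unified, TARGET_MAPPING)
print(len(unified.elements), "elements retained in the unified record")
print("lost in conversion:", lost)
```

In this sketch the unified record still holds all six elements and can be mapped onto any other target later, whereas the converted record keeps only three; the dropped elements cannot be recovered from `target` alone, which is the information loss described for the second approach.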
The proposed remedy will aid the discoverability and interoperability of holdings and avoid repeated work on format and data conversions. The first approach also preserves the richness of the original metadata. Its benefit can be expected to be large, affecting many types of (primarily expert) data users.
- Medium
- Programmatic multi-year, multi-institution activity
- Less than 1 year
- Low cost (< 1 million)
- No
- Copernicus funding
- National Meteorological Services
- WMO
- ESA, EUMETSAT or other space agency