G5.02

G5.02 Access to and use of reference and satellite data provided in different data formats and structures (e.g. granularity of data) prevents easy exploitation

Gap detailed description

The comparison of satellite data with reference measurements is further complicated by the fact that data are provided in multiple formats (e.g. HDF, NetCDF, BUFR, ASCII) and in different structures (granules vs. global datasets, level 1 vs. level 2 data). In particular, the granularity of available data may differ between data sources. Using such data is complex: ingesting them into a common database that allows geographical and temporal sub-setting and the reliable use of data analysis tools requires a format conversion module for each input format. Format conversions always carry the risk of destroying information, particularly in the accompanying metadata. For instance, data flag definitions are often coded in the metadata. If conversion software does not transfer them correctly (e.g. because of bugs), the flags in the data become useless because they can no longer be interpreted.
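The flag-loss problem can be illustrated with a minimal Python sketch. It assumes that variable metadata is held in a plain dict using the CF convention attributes `flag_values` and `flag_meanings` (a space-separated list of names). The helpers `flag_table` and `flags_preserved` are hypothetical. They show the kind of round-trip check a conversion module could run on every flagged variable after conversion.

```python
def flag_table(attrs):
    """Decode CF-style flag_values/flag_meanings into {value: meaning}, or None if undecodable."""
    values = attrs.get("flag_values")
    meanings = attrs.get("flag_meanings")
    if values is None or meanings is None:
        return None  # flags in the data can no longer be interpreted
    names = meanings.split()
    if len(values) != len(names):
        return None  # values and meanings no longer match one-to-one
    return dict(zip(values, names))


def flags_preserved(src_attrs, dst_attrs):
    """True if the converted metadata decodes to exactly the same flag table as the source."""
    src = flag_table(src_attrs)
    return src is not None and flag_table(dst_attrs) == src


source = {"units": "1", "flag_values": [0, 1, 2], "flag_meanings": "good suspect bad"}
# Hypothetical outputs of faulty converters: one drops the attribute, one truncates it.
dropped = {k: v for k, v in source.items() if k != "flag_meanings"}
truncated = {**source, "flag_meanings": "good suspect"}

print(flags_preserved(source, dict(source)))  # True
print(flags_preserved(source, dropped))       # False
print(flags_preserved(source, truncated))     # False
```

Comparing decoded tables rather than raw attribute strings means that a harmless reformatting of the attribute still passes the check. A lost or shortened definition fails it.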

Differences in granularity create extra work to collect and resample data until they represent the same area and time period. To perform a comparison, the data then need to be co-located using specific criteria. At this stage, lack of access to the reference data at their highest temporal resolution can seriously hamper the comparison, e.g. if the reference data cannot be brought close enough to the satellite measurement to be meaningful. For instance, if a vertical profile measured at one location is to be compared with a satellite snapshot that has a certain geographical coverage and spatial resolution, the reference data must be available at the highest possible frequency. Only then can they be averaged over timescales representative of the spatial variability seen by the satellite.
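The co-location step can be sketched in Python as follows. The sketch rests on several illustrative assumptions. A station is accepted if it lies within a maximum great-circle distance of the satellite pixel. The averaging window is derived from a frozen-flow assumption (air takes footprint size / wind speed seconds to cross the footprint). All names, thresholds and data are hypothetical.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def colocated_mean(sat, station, ref_series, max_dist_km, footprint_km, wind_speed_ms):
    """Mean of reference samples in a time window matched to the satellite footprint.

    Returns None if the station fails the distance criterion or no sample falls in the window.
    """
    if haversine_km(sat["lat"], sat["lon"], station["lat"], station["lon"]) > max_dist_km:
        return None
    # Assumption (frozen flow): the time air needs to cross the footprint, centred on the
    # overpass, samples roughly the spatial variability seen by the satellite.
    half_window = timedelta(seconds=0.5 * footprint_km * 1000.0 / wind_speed_ms)
    samples = [v for t, v in ref_series if abs(t - sat["time"]) <= half_window]
    return sum(samples) / len(samples) if samples else None


overpass = {"time": datetime(2016, 7, 1, 12, 0), "lat": 52.2, "lon": 14.1}
station = {"lat": 52.21, "lon": 14.12}
# Hypothetical high-frequency (1-minute) and coarse (hourly, at half past) reference series.
minute_series = [(overpass["time"] + timedelta(minutes=m), 280.0 + 0.1 * m) for m in range(-30, 31)]
hourly_series = [(datetime(2016, 7, 1, h, 30), 280.0 + h) for h in range(6, 18)]

print(colocated_mean(overpass, station, minute_series, 50.0, 12.0, 10.0))
print(colocated_mean(overpass, station, hourly_series, 50.0, 12.0, 10.0))  # None
```

With a 12 km footprint and a 10 m/s wind, the window is ±10 minutes. The 1-minute series contributes 21 samples to the mean, whereas the hourly series has no sample close enough to the overpass. This is the situation described above in which coarse reference data cannot be matched meaningfully.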

Users repeat the work needed to achieve correct co-locations under these conditions many times over, which is a gross duplication of effort and prone to processing errors.

Activities within GAIA-CLIM related to this gap

WP5 activities help remedy this gap by providing data format conversion tools for various input data and a data extraction function that makes the outputs available in user-friendly formats.
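A data extraction function of this kind can be sketched in Python. It combines geographical and temporal sub-setting with export to a simple format. The sketch assumes records are dicts with `time`, `lat`, `lon` and `value` keys. It writes CSV with the standard library only to stay self-contained, although NetCDF output (e.g. via a NetCDF library) would better match user needs. The function names are hypothetical.

```python
import csv
import io
from datetime import datetime


def extract(records, lat_range, lon_range, start, end):
    """Return the records inside a lat/lon box and the half-open time interval [start, end)."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    return [
        r for r in records
        if lat_min <= r["lat"] <= lat_max
        and lon_min <= r["lon"] <= lon_max
        and start <= r["time"] < end
    ]


def to_csv(records, fields=("time", "lat", "lon", "value")):
    """Serialise records to CSV text, writing times in ISO 8601 form."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    for r in records:
        writer.writerow({**r, "time": r["time"].isoformat()})
    return buf.getvalue()


records = [
    {"time": datetime(2016, 7, 1, 12), "lat": 52.2, "lon": 14.1, "value": 280.5},
    {"time": datetime(2016, 7, 2, 12), "lat": 52.2, "lon": 14.1, "value": 281.0},
    {"time": datetime(2016, 7, 1, 12), "lat": 40.0, "lon": -3.7, "value": 295.2},
]
subset = extract(records, (50.0, 55.0), (10.0, 20.0), datetime(2016, 7, 1), datetime(2016, 7, 2))
print(to_csv(subset))
```

Only the first record passes: the second falls outside the requested day and the third outside the box.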

Gap remedy

Develop a Virtual Observatory (VO) that converts all input data formats into a common database and adapts the granularity of the input data, to ensure efficient use of the data in comparisons between reference measurements and satellite data.

Remedy #1

Specific remedy proposed

The inputs to the VO must be reformatted. This can partly rely on existing data conversion tools (e.g. cdo), but for some inputs new tools need to be developed to convert the datasets into the appropriate format. Some effort will also be needed to standardise metadata. The granularity of the input data needs to be adjusted to support comparisons between data from different databases. Outputs need to be made available in the ‘most wanted’ formats identified in the recent GAIA-CLIM user survey, which clearly indicated the usefulness of NetCDF to target users.
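One simple way to adjust granularity is to aggregate granule-level pixels onto a common regular latitude/longitude grid. The following Python sketch does this under stated assumptions. Pixels are given as `(lat, lon, value)` tuples with latitude below 90° and longitude in [-180, 180). Each grid cell receives the unweighted mean of the pixels whose centres fall inside it. Area weighting, pixel footprint overlap and quality flags, which an operational tool would need to handle, are ignored. The function names are hypothetical.

```python
import math
from collections import defaultdict


def grid_cell(lat, lon, res_deg):
    """(row, column) index of the regular lat/lon cell that contains a point."""
    i = math.floor((lat + 90.0) / res_deg)
    j = math.floor((lon + 180.0) / res_deg)
    return i, j


def bin_granule(pixels, res_deg):
    """Average (lat, lon, value) pixels from one granule into per-cell means."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in pixels:
        cell = grid_cell(lat, lon, res_deg)
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


pixels = [(10.1, 20.1, 1.0), (10.4, 20.3, 3.0), (11.2, 20.2, 5.0)]
print(bin_granule(pixels, 1.0))  # {(100, 200): 2.0, (101, 200): 5.0}
```

Once granules from different sources have been binned onto the same grid, their cells can be matched directly by index.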

Measurable outcome of success

Success can be measured by assessing whether readily available conversion tools exist for all available input data formats, allowing their integration into the GAIA-CLIM VO.
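The success measure above can be made operational as a coverage check over a registry of converters. The following Python sketch assumes a hypothetical registry keyed by format name. The converter bodies are placeholders standing in for real reading and mapping code. The check lists every input format that still lacks a converter, so full coverage corresponds to an empty list.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry: input format name -> function converting a file into VO records.
CONVERTERS: Dict[str, Callable[[str], List[dict]]] = {}


def register(fmt: str):
    """Decorator that registers a converter for one input format (case-insensitive)."""
    def decorator(func):
        CONVERTERS[fmt.lower()] = func
        return func
    return decorator


@register("NetCDF")
def convert_netcdf(path: str) -> List[dict]:
    # Placeholder: a real module would read the file and map it to the VO schema.
    return [{"source": path, "format": "netcdf"}]


@register("ASCII")
def convert_ascii(path: str) -> List[dict]:
    # Placeholder, as above.
    return [{"source": path, "format": "ascii"}]


def missing_converters(input_formats: Iterable[str]) -> List[str]:
    """Input formats without a registered converter; an empty list means full coverage."""
    return sorted({f.lower() for f in input_formats} - set(CONVERTERS))


print(missing_converters(["HDF", "NetCDF", "BUFR", "ASCII"]))  # ['bufr', 'hdf']
```

A registry keeps the format-specific code isolated behind a common signature, so adding a new input format means registering one more function rather than changing the ingestion pipeline.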

Achievable outcomes

Technological viability: High

Indicative cost estimate: low (<1 million)

Relevance

The proposed remedy will help avoid repeated work on format conversions and on converting data provided at different granularities into comparable datasets.

Timebound

The envisaged demonstrator developments are within the timebounds of the GAIA-CLIM project.

Gap risks to non-resolution

Identified future risk / impact: Continued need for data format conversion tools developed separately by many different groups.

Probability of occurrence if gap not remedied: High

Downstream impacts on ability to deliver high quality services to science / industry / society: Generally higher cost of data handling before results can be achieved.

Work package: WP6