VALUE (text) format for station data

Weather station data are most often stored as text/csv files. This page describes the standard format for the station datasets used in VALUE, taking the VALUE_ECA_86 dataset (the one used in the validation experiments) as an example.

Each dataset consists of a collection of text files in comma-separated values (csv) format, structured as follows:

  • stations.txt

This file contains the metadata for the weather stations. For each station it must include at least an identification code (station_id), the longitude (longitude) and the latitude (latitude); these three columns are compulsory and should carry exactly these labels. Other metadata fields (altitude, name, etc.) can also be included. For instance, the folder of the VALUE_ECA_86 dataset includes a stations.txt file containing:

station_id, name,      longitude, latitude,  altitude, source
000012,     GRAZ,      15.450000, 47.083100, 366,      ECA&D
000013,     INNSBRUCK, 11.400000, 47.266700, 577,      ECA&D
000014,     SALZBURG,  13.000000, 47.800000, 437,      ECA&D
... 
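
As an illustration of the rules above, the following is a minimal sketch of how a stations.txt file could be parsed with the Python standard library. The function name read_stations is hypothetical and not part of any VALUE tool, and the sample text reproduces only the first rows of the example above. The sketch keeps station_id as a string (so that leading zeros survive) and checks that the three compulsory columns are present.

```python
import csv
import io

REQUIRED = ("station_id", "longitude", "latitude")

def read_stations(stream):
    """Parse stations.txt-style text into a dict keyed by station_id.

    Hypothetical helper for illustration; not part of the VALUE software.
    """
    # skipinitialspace drops the padding that follows each comma
    rows = csv.reader(stream, skipinitialspace=True)
    header = [h.strip() for h in next(rows)]
    missing = [col for col in REQUIRED if col not in header]
    if missing:
        raise ValueError("missing compulsory columns: " + ", ".join(missing))
    stations = {}
    for row in rows:
        if not row:  # skip empty lines
            continue
        record = dict(zip(header, (v.strip() for v in row)))
        # Coordinates become floats; station_id stays a string ("000012")
        record["longitude"] = float(record["longitude"])
        record["latitude"] = float(record["latitude"])
        stations[record["station_id"]] = record
    return stations

sample = io.StringIO(
    "station_id, name,      longitude, latitude,  altitude, source\n"
    "000012,     GRAZ,      15.450000, 47.083100, 366,      ECA&D\n"
    "000013,     INNSBRUCK, 11.400000, 47.266700, 577,      ECA&D\n"
)
stations = read_stations(sample)
print(stations["000012"]["name"], stations["000012"]["latitude"])
```

For a file on disk, the same function can be given an open file handle, e.g. open("stations.txt", newline="").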

  • variables.txt

This file contains the information regarding the variables contained in the dataset, including their identification code (variable_id), description (name), units of measure (unit) and the code used to identify missing data (missing_code). Other fields can optionally be included. For example:

variable_id, name, unit, missing_code, type, source
precip, Total_precipitation_accumulated_in_24 hours, mm, NaN, observation, ECA&D
tmean, Daily_mean_temperature, degC, NaN, observation, ECA&D
tmin, Daily_minimum_temperature, degC, NaN, observation, ECA&D
tmax, Daily_maximum_temperature, degC, NaN, observation, ECA&D
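
The variables.txt file can be read in the same spirit. The sketch below is only an assumption about how a reader might use it: it looks up the missing-data code of each variable and derives the name of the corresponding data file from variable_id. The function read_variables and the extra data_file key are hypothetical names introduced for this example.

```python
import csv
import io

def read_variables(stream):
    """Parse variables.txt-style text into a dict keyed by variable_id.

    Hypothetical helper for illustration; not part of the VALUE software.
    """
    rows = csv.reader(stream, skipinitialspace=True)
    header = [h.strip() for h in next(rows)]
    variables = {}
    for row in rows:
        if not row:  # skip empty lines
            continue
        record = dict(zip(header, (v.strip() for v in row)))
        # Assumed convention from the format: data live in <variable_id>.txt
        record["data_file"] = record["variable_id"] + ".txt"
        variables[record["variable_id"]] = record
    return variables

sample = io.StringIO(
    "variable_id, name, unit, missing_code, type, source\n"
    "precip, Total_precipitation_accumulated_in_24 hours, mm, NaN, observation, ECA&D\n"
    "tmin, Daily_minimum_temperature, degC, NaN, observation, ECA&D\n"
)
variables = read_variables(sample)
print(variables["tmin"]["data_file"], variables["tmin"]["missing_code"])
```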

  • Data files, one for each variable (variable_id.txt), where variable_id is the identification code of the variable (e.g. tmin.txt).

Each variable is stored in a separate text file named after its variable_id field in the variables.txt file. The first column of the file contains the observation dates, in the format YYYYMMDD. Subdaily data, which are less common in downscaling applications, use the format YYYYMMDDHH instead. The remaining columns (2 to n) contain the observed series at each station, in the order given by the station_id labels in the first row (taken from the stations.txt file). The following is a (truncated) example file for the daily minimum temperature data (file tmin.txt):

YYYYMMDD, 000012, 000013, 000014, 000015, 000016, 000017, ...
19610101, -4.7,   -4.8,   -5.2,   -13.7,  -1.7,   1.2,    ...
19610102, -1.2,   -2.0,   -1.2,   -13.2,  -0.3,   2.0,    ...
...
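
To make the layout of a data file concrete, here is a minimal sketch of a reader for files such as tmin.txt. It assumes that the time format can be inferred from the length of the stamp (8 characters for YYYYMMDD, 10 for YYYYMMDDHH) and that the missing-data code is the one declared in variables.txt (NaN in the example). The names parse_time and read_series are hypothetical, and the sample keeps only the first three stations of the truncated example.

```python
import csv
import io
import math
from datetime import datetime

def parse_time(stamp):
    """Convert a YYYYMMDD or YYYYMMDDHH stamp into a datetime."""
    formats = {8: "%Y%m%d", 10: "%Y%m%d%H"}
    if len(stamp) not in formats:
        raise ValueError("unexpected time stamp: " + repr(stamp))
    return datetime.strptime(stamp, formats[len(stamp)])

def read_series(stream, missing_code="NaN"):
    """Return (times, series), where series maps station_id to a list of floats.

    Hypothetical helper for illustration; not part of the VALUE software.
    """
    rows = csv.reader(stream, skipinitialspace=True)
    header = [h.strip() for h in next(rows)]
    station_ids = header[1:]  # the first column holds the time stamps
    times = []
    series = {sid: [] for sid in station_ids}
    for row in rows:
        if not row:  # skip empty lines
            continue
        times.append(parse_time(row[0].strip()))
        for sid, raw in zip(station_ids, row[1:]):
            raw = raw.strip()
            # Values equal to the missing-data code are stored as float NaN
            series[sid].append(math.nan if raw == missing_code else float(raw))
    return times, series

sample = io.StringIO(
    "YYYYMMDD, 000012, 000013, 000014\n"
    "19610101, -4.7,   -4.8,   -5.2\n"
    "19610102, -1.2,   -2.0,   -1.2\n"
)
times, series = read_series(sample)
print(times[0].date(), series["000012"])
```

Storing one list per station keeps the column order of the file, so the series can be matched back to the rows of stations.txt through the station_id keys.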