Database layout¶
A formal description of the data base layout used by the whdp datapool.
Legend:
pk
= Primary Keyfk
= Foreign Keyuq
= Unique within table- A
field
in bold letters indicates a field which cannot be NULL
signal¶
This is the central table holding the measurements. Each row represents a value of a parameter measured at a given time and location (site). The coordinates of the signal may not correlate to entries in the site table.
field | datatype | description |
---|---|---|
signal_id | integer (pk) | |
value | float | the actual measured value of the parameter |
timestamp | date_time | time when the value was measured. |
parameter_id | integer (fk) | parameter |
source_id | integer (fk) | source |
site_id | integer (fk) | site |
coord_x | string | at a given site, a signal may origin from a specific place |
coord_y | string | the type of coordinate system is CH1903/LV03 |
coord_z | string | elevation |
site¶
A site is a location where measurements are made. At a given site, several measuring equipments (source) can be found. The location of the site is also described by its coordinates.
field | datatype | description |
---|---|---|
site_id | integer (pk) | |
name | string (uq) | Name of that site |
description | string | |
street | string | street and number |
postcode | string | postal code |
city | string | name of the city/village |
coord_x | string | coordinates of a given site |
coord_y | string | the type of coordinate system is CH1903/LV03 |
coord_z | string | elevation |
picture¶
Every site may contain a number of pictures. Filenames for each site
must be unique. The filetype (e.g. png, jpg, tiff) is determined by the filename
extenion of the filename
field.
field | datatype | description |
---|---|---|
picture_id | integer (pk) | |
site_id | integer (fk) | referring to the site |
filename | string | |
description | string | additional information about the picture |
data | bytea | contains the (binary) content of the file |
timestamp | date_time | creation date of the picture |
source¶
A (data-) source is a specific measuring equipment. Every measurement (signal) origins from a specific source. Sources are categorized into source_types. The name of a source must be unique.
field | datatype | description |
---|---|---|
source_id | integer (pk) | |
source_type_id | integer (fk) | source category |
site_id | integer (fk) | optional reference to a site (may be NULL) |
name | string (uq) | Name of that source. Usually is a combination of source_type and site name. |
description | string | |
serial | string | serial number. Is unique, if available |
manufacturer | string | company which produced that equipment |
manufacturing_date | date | date when the equipment was manufactured |
source_type¶
Categorization of a given source.
field | datatype | description |
---|---|---|
source_type_id | integer (pk) | pk |
name | string (uq) | Name of that source |
description | string |
special_value_definition¶
Certain source types produce categorical data, such as «dry», «wet»,
«n/a» and so on. This table is used to correlate categorical data and numeric
values for a given source type. For example the numerical value 1
might encode
the state «dry».
field | datatype | description |
---|---|---|
special_value_definition_id | integer (pk) | |
source_type_id | integer (fk) | source_type |
description | string | |
categorical_value | string | the catecorical value |
numerical_value | float | the numeric value it is mapped to. |
parameter¶
Every value in the signal table is connected to a specific parameter which describes and defines its unit.
field | datatype | description |
---|---|---|
parameter_id | integer (pk) | |
name | string (uq) | e.g. “rain intensity”, “absorbance 200.00”, etc. |
description | string | |
unit | string | the physical unit, e.g. “mm/h”, “m-1” |
parameter_averaging¶
Sometimes, data coming from a source comes already processed. For example, the windspeed is the average speed during a certain time period. These kind of information is parameter- and source-specific.
field | datatype | description |
---|---|---|
parameter_averaging_id | integer (pk) | not really necessary, since parameter_id and source_id together are unique |
parameter_id | integer (fk) | parameter |
source_id | integer (fk) | source |
integration_length_x | float | data integration in meters |
integration_length_y | float | data integration in meters |
integration_angle | float | data integration in degrees |
integration_time | float | data integration time in seconds |
comment¶
There are two types of signal annotations: comments and quality. A comment is an arbitrary text, where as quality annotations have a controlled vocabulary. A signal may contain more than one comment.
field | datatype | description |
---|---|---|
comment_id | interger (pk) | |
signal_id | integer (fk) | |
text | string | the comment itself |
timestamp | date_time | the time the comment was added |
author | string | the username of the author who added the comment |
signal_quality¶
A signal may contain more than one quality flag (but not the same quality flag twice). The combination of signal_id and quality_id must be unique.
field | datatype | description |
---|---|---|
signal_quality_id | integer (pk) | |
signal_id | integer (fk) | |
quality_id | integer (fk) | |
timestamp | date_time | date when annotation was added |
author | string | username of the author who added the annotation |
quality¶
Measuring the environment is always error prone. This table holds the controlled vacabulary mentioned above. As some quality flags may be assigned programatically the method field indicates the origin of such an quality entry.
field | datatype | description |
---|---|---|
quality_id | integer (pk) | |
flag | string (uq) | a textual description of quality_id |
method | string | a description how the quality flag is generated. |
Design priniciples¶
The design of the database follows the https://en.wikipedia.org/wiki/Star_schema to model multidimensional data with a https://en.wikipedia.org/wiki/Data_warehouse.
You find a graphical descripton of the star schema here
.
We follow these principles to assure a consistent layout of the underlying tables:
- primary keys of a table are called
tablename\_id
instead ofid
- table names are in singular
- the star schema avoids too much normalization
- a table should not contain too abstract information