Database layout¶
A formal description of the data base layout used by the WHDP.
Legend:
pk
= Primary Keyfk
= Foreign Keyuq
= Unique within table- A
field
in bold letters indicates a field which cannot be NULL
Design principles¶
The design of the database follows the https://en.wikipedia.org/wiki/Star_schema to model multidimensional data with a https://en.wikipedia.org/wiki/Data_warehouse.
You find a graphical description of the star schema here
.
We follow these principles to assure a consistent layout of the underlying tables:
- primary keys of a table are called
tablename\_id
instead ofid
- table names are in singular
- the star schema avoids too much normalization
- a table should not contain too abstract information
signal¶
This is the central table holding the measurements generated by online sensors. Each row represents a value of a measured variable at a given time and location (site).
Note: In the original UWO datapool, it is so that the coordinates of the signal may not correlate to entries in the site table. In the WHDP this is not a concern. For this reason, it is sufficient to link the online sensor measurement to an instance of site without additional coordinates.
field | datatype | description |
---|---|---|
signal_id | integer (pk) | |
value | float | the actual measured value of the variable |
timestamp | date_time | time when the value was measured. |
source_id | integer (fk) | source |
site_id | integer (fk) | site |
variable_id | integer (fk) | variable |
lab_results¶
This is the central table holding the measurements generated by offline laboratory measurements. Each row represents a value of a measured variable corresponding to a given sampling time (timestamp_sample) and location (site).
field | datatype | description |
---|---|---|
lab_result_id | integer (pk) | automatically generated during upload |
lab_identifier | string (uq) | string identifying the measurement |
sample_identifier | string | string identifying the sample |
variable_id | integer (fk) | unique id of variable description |
filter_lab | string | applied filter during analysis |
dilution_lab | float | applied dilution during analysis (>= 1) |
method_lab | string | analytic method |
value_lab | float | obtained measurement |
description_lab | string | comment about the analysis |
person_id_lab | integer (fk) | unique id of person executing the analysis |
timestamp_start_lab | date_time | time of first analysis of this sample |
timestamp_end_lab | date_time | time of last analysis of this sample |
site_id | integer (fk) | unique id of site where sample was taken |
person_id_sample | integer (fk) | unique id of person taking the sample |
filter_sample | string | applied filter during sampling |
dilution_sample | string | applied dilution during sampling (>=1) |
timestamp_sample | date_time | time of sampling |
description_sample | string | comment about the sample |
site¶
A site is a specific location where measurements are made or samples are taken. At a given site, several measuring devices (source) can be found. The location of the site is described by a 4-level hierarchy including unit (=room, e.g. NE_A17), area (e.g., GW), setup (e.g., GW), and component (e.g., MBR-Tank1). Together, these four elements are combined to provide a unique name (e.g., NE_WaterHub_GW_GW_MBR-Tank1) for the site.
field | datatype | description |
---|---|---|
site_id | integer (pk) | |
name | string (uq) | name of that site |
description | string | |
unit | string | room |
area | string | room section |
setup | string | experimental system (same controller) |
component | string | subsystem of the experimental system |
status | string | status of the subsystem |
picture¶
Every site may contain a number of pictures. Filenames for each site must be unique. The file type (e.g. png, jpg, tiff) is determined by the filename extension of the filename field.
field | datatype | description |
---|---|---|
picture_id | integer (pk) | |
site_id | integer (fk) | referring to the site |
filename | string | |
description | string | additional information about the picture |
data | bytea | contains the (binary) content of the file |
timestamp | date_time | creation date of the picture |
source¶
Description of an online data-generating device. A (data-) source is a specific measuring equipment, i.e. an instance of the source_type class. Every measurement (signal) is produced by a source. The name of a source must be unique.
Note: For devices that change location frequently, it may be best to not list the site_id in this table and only specify site_id in the signal table.
field | datatype | description |
---|---|---|
source_id | integer (pk) | |
source_type_id | integer (fk) | source category |
name | string (uq) | short name for device, e.g. plc_grey_v1.0, concube_grey1) |
description | string | description of device (e.g., S::CAN con::cube) |
serial | string | serial number (unique, if available) |
manufacturer | string | company which produced that equipment |
source_type¶
Categorization of a given source.
field | datatype | description |
---|---|---|
source_type_id | integer (pk) | |
name | string (uq) | short name for device type (e.g., plc_grey) |
description | string | device type description (e.g., WAGO programmable logic controller) |
special_value_definition¶
Certain source types produce categorical data, such as «dry», «wet»,
«n/a» and so on. This table is used to correlate categorical data and numeric
values for a given source type. For example the numerical value 1
might encode
the state «dry».
field | datatype | description |
---|---|---|
special_value_definition_id | integer (pk) | |
source_type_id | integer (fk) | source_type |
description | string | |
categorical_value | string | the categorical value |
numerical_value | float | the numeric value it is mapped to |
variables¶
Every value in the signal table is connected to a specific variable which describes and defines its unit.
field | datatype | description |
---|---|---|
variable_id | integer (pk) | |
name | string (uq) | short name for variable (e.g. lab_cod, actuator_bf, sensor_bq_cond) |
description | string | explanation of measurement; include reference to SOP(s) where available |
unit | string | unit of measurement (e.g. “m3 h-1”); use no unit when the variable is not a continuous scale (e.g., pump on/off ); use 1 for dimensionless variables (e.g., pH) |
comment¶
There are two types of signal annotations: comments and quality. A comment is an arbitrary text, where as quality annotations have a controlled vocabulary. A signal may contain more than one comment.
Note: Note that current implementation uses an associative table to link each comment (comment_id) to a signal (signal_id). This allows many-to-one associations (multiple comments for same signal by different people) as well as one-to-many associations (one comment for multiple measurements).
field | datatype | description |
---|---|---|
comment_id | integer (pk) | |
signal_id | integer (fk) | (via association table) |
text | string | the comment itself |
timestamp | date_time | the time the comment was added |
person_id | integer (fk) | identified of the author who added the annotation |
signal_quality¶
An online sensor measurement may contain more than one quality flag (but not the same quality flag twice). The combination of signal_id, quality_id, and person_id must be unique.
Note: Note that current implementation uses an associative table to link each label (signal_quality_id) to a signal (signal_id). This allows many-to-one associations (multiple labels for same signal by different people) as well as one-to-many associations (one label for multiple measurements).
field | datatype | description |
---|---|---|
signal_quality_id | integer (pk) | |
quality_definition_id | integer (fk) | |
signal_id | integer (fk) | (via association table) |
timestamp | date_time | date when annotation was added |
quality¶
Measurements contain errors. This table holds the controlled vocabulary mentioned above. As some quality flags may be assigned automatically, the method field indicates the origin of such a quality entry.
field | datatype | description |
---|---|---|
quality_id | integer (pk) | |
flag | string (uq) | a textual description of quality_id |
method | string | a description how the quality flag is generated. |
person_id | integer (fk) | author who added the annotation |
persons¶
Information about the staff involved in sampling, laboratory analysis, data quality checks, and commenting.
field | datatype | description |
---|---|---|
person_id | integer (pk) | |
abbrev | string (uq) | unique identifier for the staff member (alias) |
name | string | name of staff member |
department | string | department acronym |
project¶
Information about projects.
Note: This is considered legacy code obtained from the original UWO datapool software. Since the entries in this table are not linked anywhere, filling this table has lowest priority.
field | datatype | description |
---|---|---|
project_id | integer (pk) | |
abbrev | string (uq) | unique identifier for the project |
title | string | project title |
description | string | project goals |