Database layout

A formal description of the data base layout used by the WHDP.

Legend:

  • pk = Primary Key
  • fk = Foreign Key
  • uq = Unique within table
  • A field in bold letters indicates a field which cannot be NULL

Design principles

The design of the database follows the https://en.wikipedia.org/wiki/Star_schema to model multidimensional data with a https://en.wikipedia.org/wiki/Data_warehouse.

You find a graphical description of the star schema here.

We follow these principles to assure a consistent layout of the underlying tables:

  • primary keys of a table are called tablename\_id instead of id
  • table names are in singular
  • the star schema avoids too much normalization
  • a table should not contain too abstract information

signal

This is the central table holding the measurements generated by online sensors. Each row represents a value of a measured variable at a given time and location (site).

Note: In the original UWO datapool, it is so that the coordinates of the signal may not correlate to entries in the site table. In the WHDP this is not a concern. For this reason, it is sufficient to link the online sensor measurement to an instance of site without additional coordinates.

field datatype description
signal_id integer (pk)  
value float the actual measured value of the variable
timestamp date_time time when the value was measured.
source_id integer (fk) source
site_id integer (fk) site
variable_id integer (fk) variable

lab_results

This is the central table holding the measurements generated by offline laboratory measurements. Each row represents a value of a measured variable corresponding to a given sampling time (timestamp_sample) and location (site).

field datatype description
lab_result_id integer (pk) automatically generated during upload
lab_identifier string (uq) string identifying the measurement
sample_identifier string string identifying the sample
variable_id integer (fk) unique id of variable description
filter_lab string applied filter during analysis
dilution_lab float applied dilution during analysis (>= 1)
method_lab string analytic method
value_lab float obtained measurement
description_lab string comment about the analysis
person_id_lab integer (fk) unique id of person executing the analysis
timestamp_start_lab date_time time of first analysis of this sample
timestamp_end_lab date_time time of last analysis of this sample
site_id integer (fk) unique id of site where sample was taken
person_id_sample integer (fk) unique id of person taking the sample
filter_sample string applied filter during sampling
dilution_sample string applied dilution during sampling (>=1)
timestamp_sample date_time time of sampling
description_sample string comment about the sample

site

A site is a specific location where measurements are made or samples are taken. At a given site, several measuring devices (source) can be found. The location of the site is described by a 4-level hierarchy including unit (=room, e.g. NE_A17), area (e.g., GW), setup (e.g., GW), and component (e.g., MBR-Tank1). Together, these four elements are combined to provide a unique name (e.g., NE_WaterHub_GW_GW_MBR-Tank1) for the site.

field datatype description
site_id integer (pk)  
name string (uq) name of that site
description string  
unit string room
area string room section
setup string experimental system (same controller)
component string subsystem of the experimental system
status string status of the subsystem

picture

Every site may contain a number of pictures. Filenames for each site must be unique. The file type (e.g. png, jpg, tiff) is determined by the filename extension of the filename field.

field datatype description
picture_id integer (pk)  
site_id integer (fk) referring to the site
filename string  
description string additional information about the picture
data bytea contains the (binary) content of the file
timestamp date_time creation date of the picture

source

Description of an online data-generating device. A (data-) source is a specific measuring equipment, i.e. an instance of the source_type class. Every measurement (signal) is produced by a source. The name of a source must be unique.

Note: For devices that change location frequently, it may be best to not list the site_id in this table and only specify site_id in the signal table.

field datatype description
source_id integer (pk)  
source_type_id integer (fk) source category
name string (uq) short name for device, e.g. plc_grey_v1.0, concube_grey1)
description string description of device (e.g., S::CAN con::cube)
serial string serial number (unique, if available)
manufacturer string company which produced that equipment

source_type

Categorization of a given source.

field datatype description
source_type_id integer (pk)  
name string (uq) short name for device type (e.g., plc_grey)
description string device type description (e.g., WAGO programmable logic controller)

special_value_definition

Certain source types produce categorical data, such as «dry», «wet», «n/a» and so on. This table is used to correlate categorical data and numeric values for a given source type. For example the numerical value 1 might encode the state «dry».

field datatype description
special_value_definition_id integer (pk)  
source_type_id integer (fk) source_type
description string  
categorical_value string the categorical value
numerical_value float the numeric value it is mapped to

variables

Every value in the signal table is connected to a specific variable which describes and defines its unit.

field datatype description
variable_id integer (pk)  
name string (uq) short name for variable (e.g. lab_cod, actuator_bf, sensor_bq_cond)
description string explanation of measurement; include reference to SOP(s) where available
unit string unit of measurement (e.g. “m3 h-1”); use no unit when the variable is not a continuous scale (e.g., pump on/off ); use 1 for dimensionless variables (e.g., pH)

comment

There are two types of signal annotations: comments and quality. A comment is an arbitrary text, where as quality annotations have a controlled vocabulary. A signal may contain more than one comment.

Note: Note that current implementation uses an associative table to link each comment (comment_id) to a signal (signal_id). This allows many-to-one associations (multiple comments for same signal by different people) as well as one-to-many associations (one comment for multiple measurements).

field datatype description
comment_id integer (pk)  
signal_id integer (fk) (via association table)
text string the comment itself
timestamp date_time the time the comment was added
person_id integer (fk) identified of the author who added the annotation

signal_quality

An online sensor measurement may contain more than one quality flag (but not the same quality flag twice). The combination of signal_id, quality_id, and person_id must be unique.

Note: Note that current implementation uses an associative table to link each label (signal_quality_id) to a signal (signal_id). This allows many-to-one associations (multiple labels for same signal by different people) as well as one-to-many associations (one label for multiple measurements).

field datatype description
signal_quality_id integer (pk)  
quality_definition_id integer (fk)  
signal_id integer (fk) (via association table)
timestamp date_time date when annotation was added

quality

Measurements contain errors. This table holds the controlled vocabulary mentioned above. As some quality flags may be assigned automatically, the method field indicates the origin of such a quality entry.

field datatype description
quality_id integer (pk)  
flag string (uq) a textual description of quality_id
method string a description how the quality flag is generated.
person_id integer (fk) author who added the annotation

persons

Information about the staff involved in sampling, laboratory analysis, data quality checks, and commenting.

field datatype description
person_id integer (pk)  
abbrev string (uq) unique identifier for the staff member (alias)
name string name of staff member
department string department acronym

project

Information about projects.

Note: This is considered legacy code obtained from the original UWO datapool software. Since the entries in this table are not linked anywhere, filling this table has lowest priority.

field datatype description
project_id integer (pk)  
abbrev string (uq) unique identifier for the project
title string project title
description string project goals