Database layout¶

A formal description of the data base layout used by the WHDP.

Legend:

pk = Primary Key
fk = Foreign Key
uq = Unique within table
A field in bold letters indicates a field which cannot be NULL

Design principles¶

The design of the database follows the https://en.wikipedia.org/wiki/Star_schema to model multidimensional data with a https://en.wikipedia.org/wiki/Data_warehouse.

You find a graphical description of the star schema here.

We follow these principles to assure a consistent layout of the underlying tables:

primary keys of a table are called tablename\_id instead of id
table names are in singular
the star schema avoids too much normalization
a table should not contain too abstract information

signal¶

This is the central table holding the measurements generated by online sensors. Each row represents a value of a measured variable at a given time and location (site).

Note: In the original UWO datapool, it is so that the coordinates of the signal may not correlate to entries in the site table. In the WHDP this is not a concern. For this reason, it is sufficient to link the online sensor measurement to an instance of site without additional coordinates.

field	datatype	description
signal_id	integer (pk)
value	float	the actual measured value of the variable
timestamp	date_time	time when the value was measured.
source_id	integer (fk)	source
site_id	integer (fk)	site
variable_id	integer (fk)	variable

lab_results¶

This is the central table holding the measurements generated by offline laboratory measurements. Each row represents a value of a measured variable corresponding to a given sampling time (timestamp_sample) and location (site).

field	datatype	description
lab_result_id	integer (pk)	automatically generated during upload
lab_identifier	string (uq)	string identifying the measurement
sample_identifier	string	string identifying the sample
variable_id	integer (fk)	unique id of variable description
filter_lab	string	applied filter during analysis
dilution_lab	float	applied dilution during analysis (>= 1)
method_lab	string	analytic method
value_lab	float	obtained measurement
description_lab	string	comment about the analysis
person_id_lab	integer (fk)	unique id of person executing the analysis
timestamp_start_lab	date_time	time of first analysis of this sample
timestamp_end_lab	date_time	time of last analysis of this sample
site_id	integer (fk)	unique id of site where sample was taken
person_id_sample	integer (fk)	unique id of person taking the sample
filter_sample	string	applied filter during sampling
dilution_sample	string	applied dilution during sampling (>=1)
timestamp_sample	date_time	time of sampling
description_sample	string	comment about the sample

site¶

A site is a specific location where measurements are made or samples are taken. At a given site, several measuring devices (source) can be found. The location of the site is described by a 4-level hierarchy including unit (=room, e.g. NE_A17), area (e.g., GW), setup (e.g., GW), and component (e.g., MBR-Tank1). Together, these four elements are combined to provide a unique name (e.g., NE_WaterHub_GW_GW_MBR-Tank1) for the site.

field	datatype	description
site_id	integer (pk)
name	string (uq)	name of that site
description	string
unit	string	room
area	string	room section
setup	string	experimental system (same controller)
component	string	subsystem of the experimental system
status	string	status of the subsystem

picture¶

Every site may contain a number of pictures. Filenames for each site must be unique. The file type (e.g. png, jpg, tiff) is determined by the filename extension of the filename field.

field	datatype	description
picture_id	integer (pk)
site_id	integer (fk)	referring to the site
filename	string
description	string	additional information about the picture
data	bytea	contains the (binary) content of the file
timestamp	date_time	creation date of the picture

source¶

Description of an online data-generating device. A (data-) source is a specific measuring equipment, i.e. an instance of the source_type class. Every measurement (signal) is produced by a source. The name of a source must be unique.

Note: For devices that change location frequently, it may be best to not list the site_id in this table and only specify site_id in the signal table.

field	datatype	description
source_id	integer (pk)
source_type_id	integer (fk)	source category
name	string (uq)	short name for device, e.g. plc_grey_v1.0, concube_grey1)
description	string	description of device (e.g., S::CAN con::cube)
serial	string	serial number (unique, if available)
manufacturer	string	company which produced that equipment

source_type¶

Categorization of a given source.

field	datatype	description
source_type_id	integer (pk)
name	string (uq)	short name for device type (e.g., plc_grey)
description	string	device type description (e.g., WAGO programmable logic controller)

special_value_definition¶

Certain source types produce categorical data, such as «dry», «wet», «n/a» and so on. This table is used to correlate categorical data and numeric values for a given source type. For example the numerical value 1 might encode the state «dry».

field	datatype	description
special_value_definition_id	integer (pk)
source_type_id	integer (fk)	source_type
description	string
categorical_value	string	the categorical value
numerical_value	float	the numeric value it is mapped to

variables¶

Every value in the signal table is connected to a specific variable which describes and defines its unit.

field	datatype	description
variable_id	integer (pk)
name	string (uq)	short name for variable (e.g. lab_cod, actuator_bf, sensor_bq_cond)
description	string	explanation of measurement; include reference to SOP(s) where available
unit	string	unit of measurement (e.g. “m3 h-1”); use no unit when the variable is not a continuous scale (e.g., pump on/off ); use 1 for dimensionless variables (e.g., pH)

comment¶

There are two types of signal annotations: comments and quality. A comment is an arbitrary text, where as quality annotations have a controlled vocabulary. A signal may contain more than one comment.

Note: Note that current implementation uses an associative table to link each comment (comment_id) to a signal (signal_id). This allows many-to-one associations (multiple comments for same signal by different people) as well as one-to-many associations (one comment for multiple measurements).

field	datatype	description
comment_id	integer (pk)
signal_id	integer (fk)	(via association table)
text	string	the comment itself
timestamp	date_time	the time the comment was added
person_id	integer (fk)	identified of the author who added the annotation

signal_quality¶

An online sensor measurement may contain more than one quality flag (but not the same quality flag twice). The combination of signal_id, quality_id, and person_id must be unique.

Note: Note that current implementation uses an associative table to link each label (signal_quality_id) to a signal (signal_id). This allows many-to-one associations (multiple labels for same signal by different people) as well as one-to-many associations (one label for multiple measurements).

field	datatype	description
signal_quality_id	integer (pk)
quality_definition_id	integer (fk)
signal_id	integer (fk)	(via association table)
timestamp	date_time	date when annotation was added

quality¶

Measurements contain errors. This table holds the controlled vocabulary mentioned above. As some quality flags may be assigned automatically, the method field indicates the origin of such a quality entry.

field	datatype	description
quality_id	integer (pk)
flag	string (uq)	a textual description of quality_id
method	string	a description how the quality flag is generated.
person_id	integer (fk)	author who added the annotation

persons¶

Information about the staff involved in sampling, laboratory analysis, data quality checks, and commenting.

field	datatype	description
person_id	integer (pk)
abbrev	string (uq)	unique identifier for the staff member (alias)
name	string	name of staff member
department	string	department acronym

project¶

Information about projects.

Note: This is considered legacy code obtained from the original UWO datapool software. Since the entries in this table are not linked anywhere, filling this table has lowest priority.

field	datatype	description
project_id	integer (pk)
abbrev	string (uq)	unique identifier for the project
title	string	project title
description	string	project goals