Database layout =============== A formal description of the data base layout used by the WHDP. Legend: - ``pk`` = Primary Key - ``fk`` = Foreign Key - ``uq`` = Unique within table - A ``field`` in **bold letters** indicates a field which cannot be NULL Design principles ----------------- The design of the database follows the https://en.wikipedia.org/wiki/Star_schema to model multidimensional data with a https://en.wikipedia.org/wiki/Data_warehouse. You find a graphical description of the star schema :download:`here <./graphics/DataModel_WHDP.svg>`. We follow these principles to assure a consistent layout of the underlying tables: - primary keys of a table are called ``tablename\_id`` instead of ``id`` - table names are in singular - the star schema avoids too much normalization - a table should not contain too abstract information signal ------ This is the central table holding the measurements generated by online sensors. Each row represents a *value* of a measured variable at a given *time* and location (*site*). **Note:** In the original UWO datapool, it is so that *the coordinates of the signal may not correlate to entries in the site table*. In the WHDP this is not a concern. For this reason, it is sufficient to link the online sensor measurement to an instance of site without additional coordinates. +-------------------------+----------------+--------------------------------------------------------------+ | field | datatype | description | +=========================+================+==============================================================+ | **signal\_id** | integer (pk) | | +-------------------------+----------------+--------------------------------------------------------------+ | **value** | float | the actual measured value of the variable | +-------------------------+----------------+--------------------------------------------------------------+ | **timestamp** | date\_time |  time when the value was measured. | +-------------------------+----------------+--------------------------------------------------------------+ | **source\_id** | integer (fk) | source | +-------------------------+----------------+--------------------------------------------------------------+ | **site\_id** | integer (fk) | site | +-------------------------+----------------+--------------------------------------------------------------+ | **variable\_id** | integer (fk) | variable | +-------------------------+----------------+--------------------------------------------------------------+ lab\_results ------------ This is the central table holding the measurements generated by offline laboratory measurements. Each row represents a *value* of a measured variable corresponding to a given sampling time (*timestamp_sample*) and location (*site*). +---------------------------+---------------+-----------------------------------------------------------+ | field | datatype | description | +===========================+===============+===========================================================+ | **lab\_result\_id** | integer (pk) | automatically generated during upload | +---------------------------+---------------+-----------------------------------------------------------+ | **lab\_identifier** | string (uq) | string identifying the measurement | +---------------------------+---------------+-----------------------------------------------------------+ | **sample\_identifier** | string | string identifying the sample | +---------------------------+---------------+-----------------------------------------------------------+ | **variable\_id** | integer (fk) | unique id of variable description | +---------------------------+---------------+-----------------------------------------------------------+ | **filter\_lab** | string | applied filter during analysis | +---------------------------+---------------+-----------------------------------------------------------+ | dilution\_lab | float | applied dilution during analysis (>= 1) | +---------------------------+---------------+-----------------------------------------------------------+ | **method\_lab** | string | analytic method | +---------------------------+---------------+-----------------------------------------------------------+ | **value\_lab** | float | obtained measurement | +---------------------------+---------------+-----------------------------------------------------------+ | description\_lab | string | comment about the analysis | +---------------------------+---------------+-----------------------------------------------------------+ | **person\_id\_lab** | integer (fk) | unique id of person executing the analysis | +---------------------------+---------------+-----------------------------------------------------------+ | timestamp\_start\_lab | date_time | time of first analysis of this sample | +---------------------------+---------------+-----------------------------------------------------------+ | **timestamp\_end\_lab** | date_time | time of last analysis of this sample | +---------------------------+---------------+-----------------------------------------------------------+ | **site\_id** | integer (fk) | unique id of site where sample was taken | +---------------------------+---------------+-----------------------------------------------------------+ | **person\_id\_sample** | integer (fk) | unique id of person taking the sample | +---------------------------+---------------+-----------------------------------------------------------+ | **filter\_sample** | string | applied filter during sampling | +---------------------------+---------------+-----------------------------------------------------------+ | dilution\_sample | string | applied dilution during sampling (>=1) | +---------------------------+---------------+-----------------------------------------------------------+ | **timestamp\_sample** | date_time | time of sampling | +---------------------------+---------------+-----------------------------------------------------------+ | description\_sample | string | comment about the sample | +---------------------------+---------------+-----------------------------------------------------------+ site ---- A *site* is a specific location where measurements are made or samples are taken. At a given site, several measuring devices (*source*) can be found. The location of the site is described by a 4-level hierarchy including *unit* (=room, e.g. NE_A17), *area* (e.g., GW), *setup* (e.g., GW), and *component* (e.g., MBR-Tank1). Together, these four elements are combined to provide a unique *name* (e.g., NE_WaterHub_GW_GW_MBR-Tank1) for the *site*. +----------------+----------------+------------------------------------------------+ | field | datatype | description | +================+================+================================================+ | **site\_id** | integer (pk) | | +----------------+----------------+------------------------------------------------+ | **name** | string (uq) | name of that site | +----------------+----------------+------------------------------------------------+ | description | string | | +----------------+----------------+------------------------------------------------+ | unit | string | room | +----------------+----------------+------------------------------------------------+ | area | string | room section | +----------------+----------------+------------------------------------------------+ | setup | string | experimental system (same controller) | +----------------+----------------+------------------------------------------------+ | component | string | subsystem of the experimental system | +----------------+----------------+------------------------------------------------+ | status | string | status of the subsystem | +----------------+----------------+------------------------------------------------+ picture ------- Every site may contain a number of pictures. Filenames for each site must be unique. The file type (e.g. png, jpg, tiff) is determined by the filename extension of the *filename* field. +-------------------+----------------+---------------------------------------------+ | field | datatype | description | +===================+================+=============================================+ | **picture\_id** | integer (pk) | | +-------------------+----------------+---------------------------------------------+ | **site\_id** | integer (fk) | referring to the site | +-------------------+----------------+---------------------------------------------+ | **filename** | string | | +-------------------+----------------+---------------------------------------------+ | description | string | additional information about the picture | +-------------------+----------------+---------------------------------------------+ | data | bytea | contains the (binary) content of the file | +-------------------+----------------+---------------------------------------------+ | timestamp | date\_time | creation date of the picture | +-------------------+----------------+---------------------------------------------+ source ------ Description of an online data-generating device. A (data-) source is a specific measuring equipment, i.e. an instance of the *source_type* class. Every measurement (signal) is produced by a source. The name of a source must be unique. **Note:** For devices that change location frequently, it may be best to not list the *site_id* in this table and only specify *site_id* in the signal table. +------------------------+----------------+--------------------------------------------------------------------------------+ | field | datatype | description | +========================+================+================================================================================+ | **source\_id** | integer (pk) | | +------------------------+----------------+--------------------------------------------------------------------------------+ | **source\_type\_id** | integer (fk) | source category | +------------------------+----------------+--------------------------------------------------------------------------------+ | **name** | string (uq) | short name for device, e.g. plc_grey_v1.0, concube_grey1) | +------------------------+----------------+--------------------------------------------------------------------------------+ | description | string | description of device (e.g., S::CAN con::cube) | +------------------------+----------------+--------------------------------------------------------------------------------+ | serial | string | serial number (unique, if available) | +------------------------+----------------+--------------------------------------------------------------------------------+ | manufacturer | string | company which produced that equipment | +------------------------+----------------+--------------------------------------------------------------------------------+ source\_type ------------ Categorization of a given source. +----------------------+--------------+--------------------------------------------------------------------+ | field | datatype | description | +======================+==============+====================================================================+ | **source\_type\_id** | integer (pk) | | +----------------------+--------------+--------------------------------------------------------------------+ | **name** | string (uq) | short name for device type (e.g., plc_grey) | +----------------------+--------------+--------------------------------------------------------------------+ | description | string | device type description (e.g., WAGO programmable logic controller) | +----------------------+--------------+--------------------------------------------------------------------+ special\_value\_definition -------------------------- Certain source types produce categorical data, such as «dry», «wet», «n/a» and so on. This table is used *to correlate categorical data and numeric values* for a given source type. For example the numerical value ``1`` might encode the state «dry». +--------------------------------------+----------------+---------------------------------------+ | field | datatype | description | +======================================+================+=======================================+ | **special\_value\_definition\_id** | integer (pk) | | +--------------------------------------+----------------+---------------------------------------+ | **source\_type\_id** | integer (fk) | source\_type | +--------------------------------------+----------------+---------------------------------------+ | description | string | | +--------------------------------------+----------------+---------------------------------------+ | **categorical\_value** | string | the categorical value | +--------------------------------------+----------------+---------------------------------------+ | **numerical\_value** | float | the numeric value it is mapped to | +--------------------------------------+----------------+---------------------------------------+ variables --------- Every value in the **signal** table is connected to a specific variable which describes and defines its unit. +-------------------+---------------+---------------------------------------------------------------------------------------+ | field | datatype | description | +===================+===============+=======================================================================================+ | **variable\_id** | integer (pk) | | +-------------------+---------------+---------------------------------------------------------------------------------------+ | **name** | string (uq) | short name for variable | | | | (e.g. lab_cod, actuator_bf, sensor_bq_cond) | +-------------------+---------------+---------------------------------------------------------------------------------------+ | description | string | explanation of measurement; include reference to SOP(s) where available | +-------------------+---------------+---------------------------------------------------------------------------------------+ | **unit** | string | unit of measurement (e.g. “m3 h-1”); | | | | use *no unit* when the variable is not a continuous scale (e.g., pump on/off ); | | | | use *1* for dimensionless variables (e.g., pH) | +-------------------+---------------+---------------------------------------------------------------------------------------+ comment ------- There are two types of signal annotations: comments and quality. A comment is an arbitrary text, where as quality annotations have a controlled vocabulary. A signal may contain more than one comment. **Note:** Note that current implementation uses an associative table to link each comment (*comment_id*) to a signal (*signal_id*). This allows many-to-one associations (multiple comments for same signal by different people) as well as one-to-many associations (one comment for multiple measurements). +-------------------+-----------------+----------------------------------------------------+ | field | datatype | description | +===================+=================+====================================================+ | **comment\_id** | integer (pk) | | +-------------------+-----------------+----------------------------------------------------+ | **signal\_id** | integer (fk) | (via association table) | +-------------------+-----------------+----------------------------------------------------+ | **text** | string | the comment itself | +-------------------+-----------------+----------------------------------------------------+ | **timestamp** | date\_time | the time the comment was added | +-------------------+-----------------+----------------------------------------------------+ | **person\_id** | integer (fk) | identified of the author who added the annotation | +-------------------+-----------------+----------------------------------------------------+ signal\_quality --------------- An online sensor measurement may contain more than one quality flag (but not the same quality flag twice). The combination of *signal\_id*, *quality\_id*, and *person_id* must be unique. **Note:** Note that current implementation uses an associative table to link each label (*signal\_quality\_id*) to a signal (*signal_id*). This allows many-to-one associations (multiple labels for same signal by different people) as well as one-to-many associations (one label for multiple measurements). +-------------------------------+----------------+-----------------------------------------+ | field | datatype | description | +===============================+================+=========================================+ | **signal\_quality\_id** | integer (pk) | | +-------------------------------+----------------+-----------------------------------------+ | **quality\_definition\_id** | integer (fk) | | +-------------------------------+----------------+-----------------------------------------+ | **signal\_id** | integer (fk) | (via association table) | +-------------------------------+----------------+-----------------------------------------+ | **timestamp** | date\_time | date when annotation was added | +-------------------------------+----------------+-----------------------------------------+ quality ------- Measurements contain errors. This table holds the controlled vocabulary mentioned above. As some quality flags may be assigned automatically, the *method* field indicates the origin of such a quality entry. +-------------------+----------------+--------------------------------------------------+ | field | datatype | description | +===================+================+==================================================+ | **quality\_id** | integer (pk) | | +-------------------+----------------+--------------------------------------------------+ | **flag** | string (uq) | a textual description of *quality_id* | +-------------------+----------------+--------------------------------------------------+ | method | string | a description how the quality flag is generated. | +-------------------+----------------+--------------------------------------------------+ | **person\_id** | integer (fk) | author who added the annotation | +-------------------+----------------+--------------------------------------------------+ persons ------- Information about the staff involved in sampling, laboratory analysis, data quality checks, and commenting. +-------------------+----------------+--------------------------------------------------+ | field | datatype | description | +===================+================+==================================================+ | **person\_id** | integer (pk) | | +-------------------+----------------+--------------------------------------------------+ | **abbrev** | string (uq) | unique identifier for the staff member (alias) | +-------------------+----------------+--------------------------------------------------+ | **name** | string | name of staff member | +-------------------+----------------+--------------------------------------------------+ | **department** | string | department acronym | +-------------------+----------------+--------------------------------------------------+ project ------- Information about projects. **Note:** This is considered legacy code obtained from the original UWO datapool software. Since the entries in this table are not linked anywhere, filling this table has lowest priority. +-------------------+----------------+--------------------------------------------------+ | field | datatype | description | +===================+================+==================================================+ | **project\_id** | integer (pk) | | +-------------------+----------------+--------------------------------------------------+ | **abbrev** | string (uq) | unique identifier for the project | +-------------------+----------------+--------------------------------------------------+ | **title** | string | project title | +-------------------+----------------+--------------------------------------------------+ | description | string | project goals | +-------------------+----------------+--------------------------------------------------+