Instructions for data users¶
A typical data user is an Eawag research staff member or student.
All data is store in a PostgrSQL database so that arbitrary queries can be performed. The figure below shows the database layout.
Connect to the WHDP¶
First-time use ~
- Contact the WaterHub project coordinator to request access
- Obtain the password via the WaterHub project coordinator
- Take note of the following information
- Host: eng-whdp1.eawag.wroot.emp-eaw.ch Port: 22 Name of the database: whdp Default user name: whdp_user
- Continue with steps as outlined below
Connect with terminal¶
You can connect directly to the database via psql. However, it is more convenient to load the data required directly in the environment used for further analysis:
Connect within R¶
With the RPostgreSQL package data can be loaded directly into R.
However, for Eawag internal usage the DatapoolR package should be preferred. It connects automatically to the database and provides convenient helper functions.
Example SQL queries¶
The SQL language may look cumbersome at first. However, it gives a lot of flexibility and allows to express even very complex queries. This SQL tutorial is a helpful reference.
Also note that the DatapoolR package provides functions to simplify common queries.
List all data sources:
SELECT srctype.name, src.name, src.serial, srctype.description, src.description
FROM source_type AS srctype, source AS src
WHERE src.source_type_id = srctype.source_type_id;
List all sites and check how many images a site has:
SELECT name, coord_x, coord_y, coord_z, street,
postcode, COUNT(picture.filename), site.description
FROM site
LEFT JOIN picture ON site.site_id=picture.site_id
GROUP BY site.site_id;
Get all signals measured at site site_A
within a given time interval:
SELECT signal.timestamp, value, unit, parameter.name, source_type.name, source.name
FROM signal
INNER JOIN site ON signal.site_id = site.site_id
INNER JOIN parameter ON signal.parameter_id = parameter.parameter_id
INNER JOIN source ON signal.source_id = source.source_id
INNER JOIN source_type ON source.source_type_id = source_type.source_type_id
WHERE site.name = site_A AND
?tmin::timestamp <= "2017-01-01 00:12:00"::timestamp AND
signal.timestamp <= "2017-01-01 00:18:00"::timestamp;