Instructions for data users¶
Data users are scientists analysing data from the water hub data pool.
All data is store in a PostgrSQL database so that arbitrary queries can be performed. The figure below shows the database layout.
Connect to water hubgdata pool¶
To connect with the database you need the following information from your admin:
- The host, e.g. “a.server.com”
- The port of the database, e.g. 5432
- The name of the database, e.g. “whdp”
- A database user name, e.g “data_user”
- The database password
You can connect directly to the database via psql. However, it is more convenient to load the data required directly in the environment used for further analysis:
R¶
With the RPostgreSQL package data can be loaded directly into R.
However, for Eawag internal usage the DatapoolR package should be preferred. It connects automatically to the database and provides convenient helper functions.
Example SQL queries¶
The SQL language may look cumbersome at first. However, it gives a lot of flexibility and allows to express even very complex queries. This SQL tutorial is a helpful reference.
Also note that the DatapoolR package provides functions to simplify common queries.
List all data sources:
SELECT srctype.name, src.name, src.serial, srctype.description, src.description
FROM source_type AS srctype, source AS src
WHERE src.source_type_id = srctype.source_type_id;
List all sites and check how many images a site has:
SELECT name, coord_x, coord_y, coord_z, street,
postcode, COUNT(picture.filename), site.description
FROM site
LEFT JOIN picture ON site.site_id=picture.site_id
GROUP BY site.site_id;
Get all signals measured at site site_A
within a given time interval:
SELECT signal.timestamp, value, unit, parameter.name, source_type.name, source.name
FROM signal
INNER JOIN site ON signal.site_id = site.site_id
INNER JOIN parameter ON signal.parameter_id = parameter.parameter_id
INNER JOIN source ON signal.source_id = source.source_id
INNER JOIN source_type ON source.source_type_id = source_type.source_type_id
WHERE site.name = site_A AND
?tmin::timestamp <= "2017-01-01 00:12:00"::timestamp AND
signal.timestamp <= "2017-01-01 00:18:00"::timestamp;