Instructions for data users

A typical data user is an Eawag research staff member or student.

All data is store in a PostgrSQL database so that arbitrary queries can be performed. The figure below shows the database layout.

_images/DataModel_WHDP.svg

Connect to the WHDP

First-time use ~

  1. Contact the WaterHub project coordinator to request access
  2. Obtain the password via the WaterHub project coordinator
  3. Take note of the following information
    Host: eng-whdp1.eawag.wroot.emp-eaw.ch Port: 22 Name of the database: whdp Default user name: whdp_user
  4. Continue with steps as outlined below

Connect with terminal

You can connect directly to the database via psql. However, it is more convenient to load the data required directly in the environment used for further analysis:

Connect within R

With the RPostgreSQL package data can be loaded directly into R.

However, for Eawag internal usage the DatapoolR package should be preferred. It connects automatically to the database and provides convenient helper functions.

Connect within Python

Different options exist, psycopg2 is widely used.

Example SQL queries

The SQL language may look cumbersome at first. However, it gives a lot of flexibility and allows to express even very complex queries. This SQL tutorial is a helpful reference.

Also note that the DatapoolR package provides functions to simplify common queries.

List all data sources:

SELECT srctype.name, src.name, src.serial, srctype.description, src.description
FROM source_type AS srctype, source AS src
WHERE src.source_type_id = srctype.source_type_id;

List all sites and check how many images a site has:

SELECT name, coord_x, coord_y, coord_z, street,
  postcode, COUNT(picture.filename), site.description
FROM site
  LEFT JOIN picture ON site.site_id=picture.site_id
GROUP BY site.site_id;

Get all signals measured at site site_A within a given time interval:

SELECT signal.timestamp, value, unit, parameter.name, source_type.name, source.name
FROM signal
  INNER JOIN site ON signal.site_id = site.site_id
  INNER JOIN parameter ON signal.parameter_id = parameter.parameter_id
  INNER JOIN source ON signal.source_id = source.source_id
  INNER JOIN source_type ON source.source_type_id = source_type.source_type_id
WHERE site.name = site_A AND
  ?tmin::timestamp <= "2017-01-01 00:12:00"::timestamp AND
  signal.timestamp <= "2017-01-01 00:18:00"::timestamp;