Instructions for data users =========================== A typical data user is an Eawag research staff member or student. All data is store in a PostgrSQL database so that arbitrary queries can be performed. The figure below shows the database layout. .. image:: ./graphics/DataModel_WHDP.svg Connect to the WHDP ------------------- First-time use ~ 1. Contact the WaterHub project coordinator to request access 2. Obtain the password via the WaterHub project coordinator 3. Take note of the following information Host: eng-whdp1.eawag.wroot.emp-eaw.ch Port: 22 Name of the database: whdp Default user name: whdp_user 4. Continue with steps as outlined below Connect with terminal ~~~~~~~~~~~~~~~~~~~~~ You can connect directly to the database via `psql `_. However, it is more convenient to load the data required directly in the environment used for further analysis: Connect within R ~~~~~~~~~~~~~~~~ With the `RPostgreSQL package `_ data can be loaded directly into R. However, for *Eawag internal* usage the `DatapoolR package `_ should be preferred. It connects automatically to the database and provides convenient helper functions. Connect within Python ~~~~~~~~~~~~~~~~~~~~~ Different options exist, `psycopg2 `_ is widely used. Example SQL queries ------------------- The SQL language may look cumbersome at first. However, it gives a lot of flexibility and allows to express even very complex queries. This `SQL tutorial `_ is a helpful reference. Also note that the `DatapoolR package `_ provides functions to simplify common queries. List all data sources: .. code-block:: sql SELECT srctype.name, src.name, src.serial, srctype.description, src.description FROM source_type AS srctype, source AS src WHERE src.source_type_id = srctype.source_type_id; List all sites and check how many images a site has: .. code-block:: sql SELECT name, coord_x, coord_y, coord_z, street, postcode, COUNT(picture.filename), site.description FROM site LEFT JOIN picture ON site.site_id=picture.site_id GROUP BY site.site_id; Get all signals measured at site ``site_A`` within a given time interval: .. code-block:: sql SELECT signal.timestamp, value, unit, parameter.name, source_type.name, source.name FROM signal INNER JOIN site ON signal.site_id = site.site_id INNER JOIN parameter ON signal.parameter_id = parameter.parameter_id INNER JOIN source ON signal.source_id = source.source_id INNER JOIN source_type ON source.source_type_id = source_type.source_type_id WHERE site.name = site_A AND ?tmin::timestamp <= "2017-01-01 00:12:00"::timestamp AND signal.timestamp <= "2017-01-01 00:18:00"::timestamp;