User Tools

Site Tools


postgresql

Installation & Setup

For Ubuntu 12.04:

sudo apt-get install postgresql-9.1 pgadmin3

pgAdmin III is a strongly recommended GUI for twiddling with your PostgreSQL. However, here are the first steps with the command line right after installing.

Server Configuration

Enable remote TCP/IP connections

Allow TCP/IP connections from all IPs (default = localhost only) by adding this line to /etc/postgresql/9.1/main/postgresql.conf:

listen_addresses = '*'

Then allow password authentication for all users from all IPs by adding this line to /etc/postgresql/9.1/main/pg_hba.conf:

host    all         all         0.0.0.0/0               md5

To further twiddle with more restrictive client authentication settings have a look at pg_hba.conf.

User Management

First steps with a user of your choosing

# log in with user created by default (which has no password set)
sudo -u postgres psql
 
# create a user with your linux user name and log in
# (in Ubuntu the postgres user that matches your linux user name
# can by default log in without supplying a password) 
sudo -u postgres createuser --superuser $USER
psql -d postgres
# set a password
\password
 
# make an existing user superuser (or remove superuser rights)
sudo -u postgres psql -c "ALTER USER theuser WITH SUPERUSER;"
sudo -u postgres psql -c "ALTER USER theuser WITH NOSUPERUSER;"
 
# create a restricted user with password (long and short version)
sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
sudo -u postgres createuser -SDRP username
psql -U username -d postgres
# drop a user
sudo -u postgres dropuser username

Setting a password for user postgres

\password postgres

Create Readonly User

1. Create the user

sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username

2. Grant readonly permissions

This is the tedious part (in postgres versions pre 9.0). We need to grant the user the permission to

  • connect to the DB
  • use the schema
  • perform a select on everay object in the schema
GRANT CONNECT ON DATABASE mydb TO username;
GRANT USAGE ON SCHEMA public TO username;
 
-- Generate GRANT SELECT statements for all tables/views in the schema
SELECT 'GRANT SELECT ON public.' || relname || ' TO username;'
FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
WHERE nspname = 'public' AND relkind IN ('r', 'v');

Slightly different approach

Creating a true read-only user is a real pain in postgres. Just revoking privileges from a user still allows him to create database objects which he owns. You need to revoke the CREATE and USAGE privileges from the SCHEMA public and the GROUP public.

-- Revoke default permissions from public group:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE USAGE ON SCHEMA public FROM PUBLIC;
 
-- Add back permissions for the users allowed to create stuff
GRANT CREATE ON SCHEMA public TO rw_user;
GRANT USAGE ON SCHEMA public TO rw_user;
 
-- Add SELECT permissions for read-only user
GRANT USAGE ON SCHEMA public TO readonly_user;
GRANT SELECT ON my_table TO readonly_user;

see http://www.ruizs.org/archives/89 for details.

Grant on all tables

With a single GRANT command permission for all tables in a schema can be changed:

GRANT ALL ON ALL TABLES IN SCHEMA public TO user;

Change owner on all tables

CLI: Unfortunately this can only be achieved via scripting, but we don't have to reinvent the wheel: https://gist.github.com/jirutka/afa3ce62b1430abf7572

GUI: Or in pgadminIII right-click on the schema, go to properties, and set the owner to your liking.

PSQL meta-commands

After you logged into PostgreSQL it's good to know some often-used meta-commands (read more in man psql):

-- list all users
\du
 
-- list all schemas
\dn
 
-- list tables
\d
\d+  -- (also show comments)
\d tablename -- show details of a table
 
-- list all tables / tables of a certain schema
\dt *.* -- list all tables in DB
\dt schemaname.* -- list all tables in a certain schema
 
-- list all views
\dv *.* -- list all views in DB
 
-- list all databases on the host
\l
 
-- connect to a database
\c dbname
 
-- execute a command on the Linux shell (as-is)
\! command
 
-- quit
\q

For more meta-commands check out the documentation.

List and Disconnect Active Sessions

-- list active connections (and get procpids)
SELECT * FROM pg_stat_activity;
 
-- cancel a process but leave the user's session open
SELECT pg_cancel_backend( <procpid> );
 
-- terminate a user's query and session
SELECT pg_terminate_backend( <procpid> );

DB Management

Show DB Sizes

SELECT pg_database.datname,  
       pg_size_pretty(pg_database_size(pg_database.datname)) AS SIZE  
  FROM pg_database 
  ORDER BY pg_database_size(pg_database.datname) DESC;

Find large databases

SELECT d.datname AS Name,  pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
        ELSE 'No Access'
    END AS SIZE
FROM pg_catalog.pg_database d
    ORDER BY
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE NULL
    END DESC -- nulls first
    LIMIT 20

see also: https://wiki.postgresql.org/wiki/Disk_Usage

Find large tables (in one DB)

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;

Show Table size

-- size of data
SELECT pg_size_pretty(pg_total_relation_size('<TABLE_NAME>'));
 
-- size of indices
SELECT c2.relname, c2.relpages * 8 / 1024 AS "MB"
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = '<TABLE_NAME>' AND
      c.oid = i.indrelid AND
      c2.oid = i.indexrelid
ORDER BY c2.relname;

SQL

Date and Time

Add 15-min interval to time

   SELECT DATE(TIMESTAMP) + (interval_id) * TIME '00:15:00' FROM ...

Extract date-parts

   SELECT EXTRACT(MINUTE FROM TIMESTAMP '2008-08-01 12:14:13');

see http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

Extract TIME from TIMESTAMP

SELECT "time"(mytimestamp) FROM TABLE; 

Convert Timestamp to Seconds

To convert a timestamp / interval to seconds (in case of timestamp seonds since 01-01-1970)

   SELECT EXTRACT (EPOCH FROM  timestamp_or_interval );

Creating Timestamps from Text Fields with missing leading Zeros

If your data looks like this:

“2011/3/15 ”;“6:17:3 ”

You can construct timestamps using the “FM” (fill mode) modifier:

   SELECT to_timestamp(utc_date|' '|utc_time,'YYYY/FMMM/FMDD FMHH24:FMMI:FMSS' );

Update parts of date

UPDATE datetesting SET d = to_timestamp( to_char (d,'YYYY-MM-DD 12:00:00'),'YYYY-MM-DD HH24:MI:SS' ) ;

Time Zone Information

Boy that was confusing so here it goes…

Converting from string to TIMESTAMP

SELECT '2013-05-18 12:30:00'::TIMESTAMP WITH TIME ZONE ; 
-- 2013-05-18 12:30:00+02         no timezone info provided. Postgres simply assumes the current TIME ZONE
 
SELECT'2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE AS with_tz, '2013-05-18 12:30:00+05'::TIMESTAMP WITHOUT TIME ZONE AS without_tz;
--        with_tz         |     without_tz      
------------------------+---------------------
-- 2013-05-18 14:30:00+02 | 2013-05-18 12:30:00

Notes

  • When no TIME ZONE info is provided, but Postgres wants TIME ZONE info, then it simply assumes the current TIME ZONE of the session.
  • Timestamps WITH TIME ZONE are displayed in the TIME ZONE of the current session. Unless explicitly requested otherwise (see later).
  • The TIME ZONE part of a string will simply be ignored if you are converting to TIMESTAMP WITHOUT TIME ZONE
    • this also applies when you insert to a column of type TIMESTAMP WITHOUT TIME ZONE
SELECT '2013-05-18 12:30:00+06'::TIMESTAMP;
-- 2013-05-18 12:30:00         --> the  +06 has been completely ignored
 
 
SELECT '2013-05-18 12:30:00+06'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 08:30:00+02      -->   6:30 UTC --> 8:30CET (+02)

Explicitly setting the TIME ZONE

SET TIME ZONE 'UTC';
SELECT '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 12:30:00+00
 
 
 
SET TIME ZONE 'CET';
SELECT '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 14:30:00+02

Converting from epoch values to TIMESTAMP

SELECT to_timestamp(1358749391);
 
--> "2013-01-21 07:23:11+01"

Notes:

  • Postgres converts the epoch value to TIMESTAMP WITH TIME ZONE. Thats why the result is displayed in the local time zone (CEST, winter time).

Discard TIME ZONE Information

SELECT to_timestamp(1358749391)::TIMESTAMP WITHOUT TIME zone;
--> "2013-01-21 07:23:11"

Notes

  • does the same thing as before, but also discards the time zone information and returns a TIMESTAMP WITHOUT TIME ZONE

Convert to other TIME ZONE

SELECT
  to_timestamp(1358749391) at TIME zone 'utc'; 
 
--> "2013-01-21 06:23:11"

Notes

Useful Functions

Aggregate Concat

  CREATE AGGREGATE textcat_all(
    basetype    = text,
    sfunc       = textcat,
    stype       = text,
    initcond    = ''
  );

Usage:

   SELECT company_id, textcat_all(employee | ', ') FROM mytable GROUP BY company_id;

row_number

Is e.g. useful if you need an OID where none is present:

SELECT  ROW_NUMBER() OVER (ORDER BY NULL) AS oid FROM TABLE;

the “order by” clause is required, but in this special case we do not require any particular order, so we use NULL here.

Round time to interval

CREATE OR REPLACE FUNCTION round_time_to_interval(TIMESTAMP) 
RETURNS TIMESTAMP AS $$ 
  SELECT date_trunc('hour', $1) + INTERVAL '15 min' * FLOOR(date_part('minute', $1) / 15.0) 
$$ LANGUAGE SQL;

Rows to Columns

This is an obscure one, but it is very handy if you must transform a number of rows in a number of columns. This can be achieved with the crosstab() function provided by the tablefunc package of postgres-contrib. So first we need to install postgresql-contrib and enable the function in our db:

sudo apt-get install postgresql-contrib
sudo service postgresql-8.4 restart
cd /usr/share/postgresql/8.4/contrib
psql -U username -d dbname -h localhost -f tablefunc.sql

Next have a look at the documentation of crosstab here: http://www.postgresql.org/docs/8.4/static/tablefunc.html

Here is an example of how to aggregate a traffic state from rows to columns (for a number of known roads).

SELECT * FROM
crosstab(
	'SELECT valid_from, road_id, q from ait.its_traffic_state WHERE road_id IN (23005242,23008577) order by 1',
	'SELECT distinct road_id FROM ait.its_traffic_state WHERE road_id IN (23005242,23008577) order by 1'
 
) 
AS its_traffic_state(TIME TIMESTAMP, road1 INTEGER, road2 INTEGER);

…for a higher number of columns you may need a script to generate the column_name/datatype row at the end of the query (if anyone knows a better solution plz tell).

Updating Sequence Counters

When data is inserted into the database with a specified value for the autoincrementing column the sequence is not touched and must be updated manually. This is the case when e.g. importing dumps using COPY, or a simple INSERT statement with the autoincrementing column set to a value.

While the existing values (e.g. id 1) are reused the sequence counters stay the same (i.e. start with 0): this prevents new data records from being created as the new object would have a duplicate object id (1 == 1).

The fix:

SELECT SETVAL('sequence_name', MAX(auto_incrementing_column)) FROM TABLE_NAME;

SQL Hints

Timeseries completion

Assume you have incomplete timeseries in your database but want to select a complete series (pad the missing values with a Zero). The following query should take care of the padding and replacing missing values by 0-values:

SELECT intervals.intervals AS INTERVAL, COALESCE(q,0)
FROM timeseries t 
RIGHT OUTER JOIN generate_series(0,24) intervals
ON t.interval = intervals.intervals ; 

Timeseries creation

To create a series of timestamps

select * from
generate_series('2013-11-01'::timestamp, '2013-11-02'::timestamp, '15 minutes')

Convert Array to Rows

This might be useful to replace WHERE x IN ('a','b','c') statements (which are notoriously slow in postgres) with JOINs.

SELECT i, (ARRAY['a','b','c'])[i] 
FROM generate_series(1,array_upper(ARRAY['a','b','c'],1)) i;
 
 
 i | array
---+-------
 1 | a
 2 | b
 3 | c
 
(3 ROWS)

It is not very elegant though especially as you need to specify the array twice (or know the size of the array, then you can replace the array_upper statement with the size). If someone has a better solution I would be interested :)

Stored Procedures

PL/Python

Aggreagte Functions in PL/Python

PL/Java

Installing

Download the most recent version from http://pgfoundry.org/frs/?group_id=1000038 . Most of the instructions in this sections are taken from the README file in the tarball.

  • unzip pljava to /opt/pljava
  • sudo vi /etc/postgresql/8.3/main/postgresql.conf
SET the following parameters:
    dynamic_library_path = '$libdir:/opt/pljava'
    log_min_messages = info   #optional
    custom_variable_classes = 'pljava'

ADD the following parameters:
    pljava.classpath = '/opt/pljava/pljava.jar'

* update ldconfig so postgres finds the reqired shared objects. Create a file /etc/ld.so.conf.d/pljava.conf containing

. /usr/lib/jvm/java-6-sun/jre/lib/amd64 /usr/lib/jvm/java-6-sun/jre/lib/amd64/server /usr/lib/jvm/java-6-sun/jre/lib/amd64/native_threads

  • run
  sudo ldconfig

* install pl/java

    sudo su postgres
    psql -f install.sql

PL/R

Installing

* Install the postgres-plr package

. sudo aptitude install postgresql-8.3-plr

   wget http://de.archive.ubuntu.com/ubuntu/pool/universe/p/plr/postgresql-8.3-plr_8.3.0.6-3_amd64.deb
   dpkg -i postgresql-8.3-plr_8.3.0.6-3_amd64.deb

* Install PL/R in the database of choice

   psql <dbname> < /usr/share/postgresql/8.3/plr.sql

* Test

  CREATE OR REPLACE FUNCTION r_max (INTEGER, INTEGER) RETURNS INTEGER AS '
    if (arg1 > arg2)
       return(arg1)
    else
       return(arg2)
  ' LANGUAGE 'plr' STRICT;
 
  SELECT r_max(10,20);

Using

Have a look at the guide here http://www.joeconway.com/plr/doc/index.html

Useful commands

Inserting from CSV Files

To efficiently insert a lot of records from csv-files into the table “tolldata” use the http://www.postgresql.org/docs/9.1/static/sql-copy.html COPY command:

cat *.csv | psql -c "COPY tolldata FROM stdin WITH DELIMITER ';'" -d toll_tst -U toll -h localhost

Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.

Outputting in CSV Format

If you need to export an entire table

COPY table TO '/tmp/table.csv' DELIMITER ',' CSV HEADER;

If query is more complicated:

--- toggle to unaligned output mode
\a
--- set ; as field delimiter
\f ;
--- disable the footer (row count)
\pset footer off
 
SELECT * FROM TABLE \g file.csv

Piping query result to file

In psql command line:

SELECT * FROM TABLE \g filename

(http://archives.postgresql.org/pgsql-general/2001-12/msg00687.php see more details)

Show Tables

  \dt
OR
  \dt my_schema

drop all tables in a schema

 SELECT 'drop table '|schemaname|'.'|tablename|';' FROM pg_tables WHERE schemaname IN ('public') ORDER BY schemaname, tablename;

drop all sequences in a schema

 SELECT 'drop sequence public.'|sequence_name|';' FROM information_schema.sequences WHERE sequence_schema IN ('public') ORDER BY  sequence_name;

Grant on all tables

 SELECT 'grant select, insert, update, delete on ' | tablename | ' to muh;' FROM pg_tables WHERE schemaname = 'public';

Backups

There are many options available on how to back up your databases. Postgres already contains some tools itself:

  • pg_dump: per-database dumping of db contents
  • pg_basebackup: binary backup of whole postgres clusters (per db backup is not possible)

However, especially for larger databases (hundreds of GB) more powerful third-party tools featuring parallelism and incremental backups are recommended:

pg_dump

It makes sense to use the postgres user

sudo su postgres

There are two important variants of making backups:

Create and restore a backup using plain-text sql: pg_dump and psql

pg_dump -U user source_db -f db_backup.sql
 
# restore
createdb target_db
psql -U user -d target_db -f db_backup.sql

Create and restore a backup using the compressed 'custom' format: pg_dump and pg_restore

pg_dump -Fc -U user source_db > db_backup.dmp
 
# restore (requires an empty database)
createdb target_db
pg_restore -Fc -d target_db db_backup.dmp

In addition we can use gzip (and zcat) to further compress the compressed 'custom' format and give the dump a nice name:

pg_dump -Fc -U user source_db | gzip > "$(date +%Y-%m-%d)_db_backup.dmp.gz"
zcat YYYY-MM-DD_db_backup.dmp.gz | pg_restore -Fc -d target_db

Or, when creating the backup via cronjob in /etc/cron.d:

  0   5   *   *   *   postgres  pg_dump -Fc dbname | gzip > /path/to/db_backup_`date +\%Y-\%m-\%dT\%H_\%M`.dmp.gz
# ┬   ┬   ┬   ┬   ┬   ┬         ┬
# │   │   │   │   |   │         └ command
# │   │   │   │   |   └───── user executing the command
# │   │   │   │   └───────── day of week (0 - 7) (0 to 7 are Sunday to Sunday, or use names)
# │   │   │   └───────────── month (1 - 12)
# │   │   └───────────────── day of month (1 - 31)
# │   └───────────────────── hour (0 - 23)
# └───────────────────────── min (0 - 59)

pg_basebackup

see e.g. this tutorial

pgBackRest

A typical use of pgBackRest would be to install a backup server that backs up several DB servers. For now we only use a simpler approach: pgBackRest backing up on localhost. For more see the excellent user guide or this good introduction blog post.

Install from postgres repo:

sudo add-apt-repository "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main"
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update
sudo apt install pgbackrest

Incremental Backups on Localhost

In addition to the global section, where e.g. retention is configured, you must add a so-called “stanza” for each cluster you want to back up. Make sure the repo1-path exists and is owned by user postgres.

/etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=5
repo1-retention-diff=1

[STANZANAME]
pg1-path=/var/lib/postgresql/9.5/CLUSTERNAME

Then adjust the postgres config to allow for proper backups

/etc/postgresql/9.5/main/postgresql.conf
listen_addresses = '*'

# WRITE AHEAD LOG
wal_level = hot_standby 
archive_mode = on 
archive_command = 'pgbackrest --stanza=STANZANAME archive-push %p'

# REPLICATION
max_wal_senders = 3

# ERROR REPORTING AND LOGGING
log_line_prefix = ''

Reload the cluster (or the complete postgres server) to activate changes

sudo systemctl restart postgresql@9.5-CLUSTERNAME

Create and test the pgbackrest stanza

sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info stanza-create
sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info check

Make backups manually (e.g. from a cronjob)

sudo -u postgres pgbackrest --stanza=STANZANAME --type=full --log-level-console=detail backup
sudo -u postgres pgbackrest --stanza=STANZANAME --type=incr --log-level-console=detail backup

Make regular backups via cronjob

/etc/cron.d/pgbackrest
#minute hour day-of-month month day-of-week user command
0 4 * * 0   postgres pgbackrest --type=full --stanza=STANZANAME backup
0 4 * * 1-6 postgres pgbackrest --type=diff --stanza=STANZANAME backup

Restore latest backup. The cluster must be stopped and the cluster directory must exist and be empty.

sudo systemctl stop postgresql@9.5-CLUSTERNAME
sudo mv /var/lib/postgresql/9.5/CLUSTERNAME /opt/backups/CLUSTERNAME_backup
sudo -u postgres mkdir /var/lib/postgresql/9.5/CLUSTERNAME
sudo chmod 0700 /var/lib/postgresql/9.5/CLUSTERNAME
sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=detail restore
sudo systemctl start postgresql@9.5-CLUSTERNAME

To restore a specific backup use --set with the name of a folder in /var/lib/pgbackrest/backup/STANZANAME, e.g.:

sudo -u postgres pgbackrest --set=20180724-144511F --stanza=STANZANAME --log-level-console=detail restore

You can also copy the contents of /var/lib/pgbackrest to a different device and restore it there.

Log files can be found in /var/log/pgbackrest.

Performance Tuning

References

postgresql.txt · Last modified: 2020/07/15 09:42 by mstraub