====== Installation & Setup ======

For Ubuntu 12.04:

  sudo apt-get install postgresql-9.1 pgadmin3

pgAdmin III is a strongly recommended GUI for twiddling with your PostgreSQL. However, here are the first steps on the command line right after installing.

===== Server Configuration =====

==== Enable remote TCP/IP connections ====

Allow TCP/IP connections from all IPs (default = localhost only) by adding this line to ''/etc/postgresql/9.1/main/postgresql.conf'':

  listen_addresses = '*'

Then allow password authentication for all users from all IPs by adding this line to ''/etc/postgresql/9.1/main/pg_hba.conf'':

  host    all    all    0.0.0.0/0    md5

To further twiddle with more restrictive client authentication settings have a look at [[http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html|pg_hba.conf]].

===== User Management =====

First steps with a user of your choosing:

  # log in with the user created by default (which has no password set)
  sudo -u postgres psql

  # create a user with your Linux user name and log in
  # (in Ubuntu the postgres user that matches your Linux user name
  # can by default log in without supplying a password)
  sudo -u postgres createuser --superuser $USER
  psql -d postgres

  # set a password
  \password

  # make an existing user superuser (or remove superuser rights)
  sudo -u postgres psql -c "ALTER USER theuser WITH SUPERUSER;"
  sudo -u postgres psql -c "ALTER USER theuser WITH NOSUPERUSER;"

  # create a restricted user with password (long and short version)
  sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
  sudo -u postgres createuser -SDRP username
  psql -U username -d postgres

  # drop a user
  sudo -u postgres dropuser username

==== Setting a password for user postgres ====

  \password postgres

==== Create Readonly User ====

**1. Create the user**

  sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username

**2. Grant readonly permissions**

This is the tedious part (in Postgres versions before 9.0). We need to grant the user the permission to

  * connect to the DB
  * use the schema
  * perform a SELECT on every object in the schema

  GRANT CONNECT ON DATABASE mydb TO username;
  GRANT USAGE ON SCHEMA public TO username;

  -- Generate GRANT SELECT statements for all tables/views in the schema
  SELECT 'GRANT SELECT ON public.' || relname || ' TO username;'
  FROM pg_class
  JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
  WHERE nspname = 'public' AND relkind IN ('r', 'v');

=== Slightly different approach ===

Creating a true read-only user is a real pain in Postgres. Just revoking privileges from a user still allows them to create database objects which they own. You need to revoke the CREATE and USAGE privileges from the SCHEMA public and the GROUP public.

  -- Revoke default permissions from public group:
  REVOKE CREATE ON SCHEMA public FROM PUBLIC;
  REVOKE USAGE ON SCHEMA public FROM PUBLIC;

  -- Add back permissions for the users allowed to create stuff
  GRANT CREATE ON SCHEMA public TO rw_user;
  GRANT USAGE ON SCHEMA public TO rw_user;

  -- Add SELECT permissions for the read-only user
  GRANT USAGE ON SCHEMA public TO readonly_user;
  GRANT SELECT ON my_table TO readonly_user;

See http://www.ruizs.org/archives/89 for details.

==== Grant on all tables ====

With a single [[http://www.postgresql.org/docs/current/static/sql-grant.html|GRANT command]] permissions for all tables in a schema can be changed:

  GRANT ALL ON ALL TABLES IN SCHEMA public TO user;

==== Change owner on all tables ====

CLI: Unfortunately this can only be achieved via scripting, but we don't have to reinvent the wheel: https://gist.github.com/jirutka/afa3ce62b1430abf7572

GUI: Or in pgAdmin III right-click on the schema, go to properties, and set the owner to your liking.
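The GRANT-generation query used for the read-only user above is just string building from catalog rows. The same idea as a minimal Python sketch (the table names and user name here are made up for illustration):

```python
def generate_grants(tables, user):
    """Build one GRANT SELECT statement per table,
    mirroring the catalog query that generates them in SQL."""
    return ["GRANT SELECT ON public.{} TO {};".format(t, user) for t in tables]

# hypothetical table names; in practice these come from pg_class
for stmt in generate_grants(["orders", "customers"], "readonly_user"):
    print(stmt)
```

The generated statements can then be pasted back into psql (or piped through ''\gexec'' on newer Postgres versions).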
===== PSQL meta-commands =====

After you have logged into PostgreSQL it's good to know some often-used meta-commands (read more in ''man psql''):

  -- list all users
  \du
  -- list all schemas
  \dn
  -- list tables
  \d
  \d+               -- (also show comments)
  \d tablename      -- show details of a table
  -- list all tables / tables of a certain schema
  \dt *.*           -- list all tables in DB
  \dt schemaname.*  -- list all tables in a certain schema
  -- list all views
  \dv *.*           -- list all views in DB
  -- list all databases on the host
  \l
  -- connect to a database
  \c dbname
  -- execute a command on the Linux shell (as-is)
  \! command
  -- quit
  \q

For more meta-commands check out [[http://www.postgresql.org/docs/9.1/static/app-psql.html|the documentation]].

===== List and Disconnect Active Sessions =====

  -- list active connections (and get procpids)
  SELECT * FROM pg_stat_activity;
  -- cancel a process but leave the user's session open
  SELECT pg_cancel_backend(procpid);
  -- terminate a user's query and session
  SELECT pg_terminate_backend(procpid);

====== DB Management ======

===== Show DB Sizes =====

  SELECT pg_database.datname,
         pg_size_pretty(pg_database_size(pg_database.datname)) AS size
  FROM pg_database
  ORDER BY pg_database_size(pg_database.datname) DESC;

===== Find large databases =====

  SELECT d.datname AS Name,
         pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
         CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
              THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
              ELSE 'No Access'
         END AS Size
  FROM pg_catalog.pg_database d
  ORDER BY CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
                THEN pg_catalog.pg_database_size(d.datname)
                ELSE NULL
           END DESC -- nulls first
  LIMIT 20

see also: https://wiki.postgresql.org/wiki/Disk_Usage

===== Find large tables (in one DB) =====

  SELECT nspname || '.' || relname AS "relation",
         pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;

===== Show Table size =====

  -- size of data
  SELECT pg_size_pretty(pg_total_relation_size('tablename'));
  -- size of indices
  SELECT c2.relname, c2.relpages * 8 / 1024 AS "MB"
  FROM pg_class c, pg_class c2, pg_index i
  WHERE c.relname = 'tablename'
    AND c.oid = i.indrelid
    AND c2.oid = i.indexrelid
  ORDER BY c2.relname;

====== SQL ======

===== Date and Time =====

  * http://www.postgresql.org/docs/9.1/interactive/datatype-datetime.html
  * http://www.postgresql.org/docs/9.1/static/functions-datetime.html

==== Add 15-min interval to time ====

  SELECT DATE(timestamp) + (interval_id) * interval '00:15:00' FROM ...

==== Extract date-parts ====

  select extract(minute from timestamp '2008-08-01 12:14:13');

see http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

==== Extract TIME from TIMESTAMP ====

  SELECT "time"(mytimestamp) FROM table;

==== Convert Timestamp to Seconds ====

To convert a timestamp / interval to seconds (in the case of a timestamp: seconds since 1970-01-01):

  SELECT EXTRACT(EPOCH FROM timestamp_or_interval);

==== Creating Timestamps from Text Fields with missing leading Zeros ====

If your data looks like this:

  "2011/3/15 ";"6:17:3 "

you can construct timestamps using the "FM" (fill mode) modifier:

  SELECT to_timestamp(utc_date || ' ' || utc_time, 'YYYY/FMMM/FMDD FMHH24:FMMI:FMSS');

==== Update parts of date ====

  update datetesting
  set d = to_timestamp(to_char(d, 'YYYY-MM-DD 12:00:00'), 'YYYY-MM-DD HH24:MI:SS');

==== Time Zone Information ====

Boy, that was confusing, so here it goes...

=== Converting from string to TIMESTAMP ===

  select '2013-05-18 12:30:00'::TIMESTAMP WITH TIME ZONE;
  -- 2013-05-18 12:30:00+02

No time zone info was provided; Postgres simply assumes the current TIME ZONE.

  select '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE as with_tz,
         '2013-05-18 12:30:00+05'::TIMESTAMP WITHOUT TIME ZONE as without_tz;
  --         with_tz         |     without_tz
  -- ------------------------+---------------------
  --  2013-05-18 14:30:00+02 | 2013-05-18 12:30:00

**Notes**
  * When no TIME ZONE info is provided, but Postgres wants TIME ZONE info, it simply assumes the current TIME ZONE of the session.
  * Timestamps WITH TIME ZONE are displayed in the TIME ZONE of the current session, unless explicitly requested otherwise (see later).
  * The TIME ZONE part of a string will simply be **ignored** if you are converting to TIMESTAMP WITHOUT TIME ZONE
    * this also applies when you insert into a column of type TIMESTAMP WITHOUT TIME ZONE

  SELECT '2013-05-18 12:30:00+06'::TIMESTAMP;
  -- 2013-05-18 12:30:00 --> the +06 has been completely ignored
  SELECT '2013-05-18 12:30:00+06'::TIMESTAMP WITH TIME ZONE;
  -- 2013-05-18 08:30:00+02 --> 6:30 UTC --> 8:30 CET (+02)

=== Explicitly setting the TIME ZONE ===

  set TIME ZONE 'UTC';
  select '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
  -- 2013-05-18 12:30:00+00
  set TIME ZONE 'CET';
  select '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
  -- 2013-05-18 14:30:00+02

=== Converting from epoch values to TIMESTAMP ===

  select to_timestamp(1358749391);
  --> "2013-01-21 07:23:11+01"

**Notes:**
  * Postgres converts the epoch value to TIMESTAMP WITH TIME ZONE. That's why the result is displayed in the local time zone (CET, winter time).

=== Discard TIME ZONE Information ===

  select to_timestamp(1358749391)::timestamp without time zone;
  --> "2013-01-21 07:23:11"

**Notes**
  * does the same thing as before, but also discards the time zone information and returns a TIMESTAMP WITHOUT TIME ZONE

=== Convert to other TIME ZONE ===

  select to_timestamp(1358749391) at time zone 'utc';
  --> "2013-01-21 06:23:11"

**Notes**
  * Displays the value at time zone UTC --> 6:23
  * The result is a TIMESTAMP WITHOUT TIME ZONE.
  * In this case Postgres effectively removes the time zone information from the result
  * see http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-ZONECONVERT

===== Useful Functions =====

==== Aggregate Concat ====

  CREATE AGGREGATE textcat_all(
    basetype = text,
    sfunc    = textcat,
    stype    = text,
    initcond = ''
  );

**Usage:**

  SELECT company_id, textcat_all(employee || ', ')
  FROM mytable
  GROUP BY company_id;

(On PostgreSQL 9.0 and later the built-in ''string_agg'' does the same job.)

==== row_number ====

Is e.g. useful if you need an OID where none is present:

  SELECT row_number() OVER (ORDER BY NULL) AS oid FROM table;

The "order by" clause is required, but in this special case we do not require any particular order, so we use NULL here.

==== Round time to interval ====

  CREATE OR REPLACE FUNCTION round_time_to_interval(TIMESTAMP)
  RETURNS TIMESTAMP AS $$
    SELECT date_trunc('hour', $1) + INTERVAL '15 min' * FLOOR(date_part('minute', $1) / 15.0)
  $$ LANGUAGE SQL;

==== Rows to Columns ====

This is an obscure one, but it is very handy if you must transform a number of rows into a number of columns. This can be achieved with the crosstab() function provided by the tablefunc package of postgres-contrib. So first we need to install postgresql-contrib and enable the function in our DB:

  sudo apt-get install postgresql-contrib
  sudo service postgresql-8.4 restart
  cd /usr/share/postgresql/8.4/contrib
  psql -U username -d dbname -h localhost -f tablefunc.sql

Next have a look at the documentation of crosstab here: http://www.postgresql.org/docs/8.4/static/tablefunc.html

Here is an example of how to aggregate a traffic state from rows to columns (for a number of known roads).
  SELECT * FROM crosstab(
    'SELECT valid_from, road_id, q FROM ait.its_traffic_state
     WHERE road_id IN (23005242,23008577) ORDER BY 1',
    'SELECT DISTINCT road_id FROM ait.its_traffic_state
     WHERE road_id IN (23005242,23008577) ORDER BY 1'
  ) AS its_traffic_state(time timestamp, road1 integer, road2 integer);

...for a higher number of columns you may need a script to generate the column_name/datatype row at the end of the query (if anyone knows a better solution please tell).

===== Updating Sequence Counters =====

When data is inserted into the database with a specified value for the auto-incrementing column, the sequence is not touched and must be updated manually. This is the case when e.g. importing dumps using COPY, or with a simple INSERT statement that sets the auto-incrementing column to a value. While the existing values (e.g. id 1) are reused, the sequence counters stay the same (i.e. start with 0): this prevents new data records from being created, as a new object would have a duplicate object id (1 == 1). The fix:

  SELECT SETVAL('sequence_name', MAX(auto_incrementing_column)) FROM table_name;

===== SQL Hints =====

==== Timeseries completion ====

Assume you have an incomplete timeseries in your database but want to select a complete series (pad the missing values with a zero). The following query takes care of the padding, replacing missing values by 0-values:

  SELECT intervals.intervals AS interval, coalesce(q, 0)
  FROM timeseries t
  RIGHT OUTER JOIN generate_series(0, 24) intervals
    ON t.interval = intervals.intervals;

==== Timeseries creation ====

To create a series of timestamps:

  select * from generate_series('2013-11-01'::timestamp, '2013-11-02'::timestamp, '15 minutes');

==== Convert Array to Rows ====

This might be useful to replace WHERE x IN ('a','b','c') statements (which are notoriously slow in Postgres) with JOINs.
  SELECT i, (ARRAY['a','b','c'])[i]
  FROM generate_series(1, array_upper(ARRAY['a','b','c'], 1)) i;

   i | array
  ---+-------
   1 | a
   2 | b
   3 | c
  (3 rows)

It is not very elegant though, especially as you need to specify the array twice (or know the size of the array, in which case you can replace the array_upper statement with the size). If someone has a better solution I would be interested :)

====== Stored Procedures ======

===== PL/Python =====

  * http://www.postgresql.org/docs/8.3/interactive/plpython.html

==== Aggregate Functions in PL/Python ====

  * Aggregates in Postgres (basics): [[http://www.postgresonline.com/periodical.php?i_id=7#item-67]]
  * PL/Python specific: [[http://www.postgresonline.com/journal/index.php?/archives/102-PLPython-Part-4-PLPython-meets-aggregates.html]]

===== PL/Java =====

==== Installing ====

Download the most recent version from http://pgfoundry.org/frs/?group_id=1000038 . Most of the instructions in this section are taken from the README file in the tarball.

  * unzip pljava to /opt/pljava
  * sudo vi /etc/postgresql/8.3/main/postgresql.conf

SET the following parameters:

  dynamic_library_path = '$libdir:/opt/pljava'
  log_min_messages = info    # optional
  custom_variable_classes = 'pljava'

ADD the following parameters:

  pljava.classpath = '/opt/pljava/pljava.jar'

  * update ldconfig so Postgres finds the required shared objects. Create a file **/etc/ld.so.conf.d/pljava.conf** containing

  /usr/lib/jvm/java-6-sun/jre/lib/amd64
  /usr/lib/jvm/java-6-sun/jre/lib/amd64/server
  /usr/lib/jvm/java-6-sun/jre/lib/amd64/native_threads

  * run sudo ldconfig
  * install PL/Java

  sudo su postgres
  psql -f install.sql

===== PL/R =====

==== Installing ====

  * Install the postgres-plr package

  sudo aptitude install postgresql-8.3-plr

  * On Ubuntu 8.04 the package does not exist, but the Jaunty version works just fine.
Search for the correct package on http://packages.ubuntu.com/search?keywords=postgresql-8.3-plr then do something like

  wget http://de.archive.ubuntu.com/ubuntu/pool/universe/p/plr/postgresql-8.3-plr_8.3.0.6-3_amd64.deb
  dpkg -i postgresql-8.3-plr_8.3.0.6-3_amd64.deb

  * Install PL/R in the database of choice

  psql < /usr/share/postgresql/8.3/plr.sql

  * Test

  CREATE OR REPLACE FUNCTION r_max (integer, integer) RETURNS integer AS '
    if (arg1 > arg2)
      return(arg1)
    else
      return(arg2)
  ' LANGUAGE 'plr' STRICT;

  SELECT r_max(10, 20);

==== Using ====

Have a look at the guide here: http://www.joeconway.com/plr/doc/index.html

====== Useful commands ======

===== Inserting from CSV Files =====

To efficiently insert a lot of records from CSV files into the table "tolldata" use the [[http://www.postgresql.org/docs/9.1/static/sql-copy.html|COPY command]]:

  cat *.csv | psql -c "COPY tolldata FROM stdin WITH DELIMITER ';'" -d toll_tst -U toll -h localhost

Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must //reside on or be accessible to the database server machine//, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
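Piping data via stdin, as in the command above, avoids those server-side file restrictions. A minimal Python sketch of preparing such a payload (the column layout here is invented; only the '';'' delimiter matches the command above):

```python
def to_copy_payload(rows, delimiter=";"):
    """Render rows of values as delimiter-separated lines, the shape
    expected by: COPY tolldata FROM stdin WITH DELIMITER ';'"""
    return "\n".join(delimiter.join(str(v) for v in row) for row in rows) + "\n"

# hypothetical toll records (id, road, charge)
payload = to_copy_payload([(1, "A4", 120), (2, "A23", 80)])
print(payload, end="")
```

The resulting string could be fed to psql's stdin in place of ''cat *.csv'', e.g. via ''subprocess.run(..., input=payload)''.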
===== Outputting in CSV Format =====

If you need to export an entire table:

  COPY table TO '/tmp/table.csv' DELIMITER ',' CSV HEADER;

If the query is more complicated:

  -- toggle to unaligned output mode
  \a
  -- set ; as field delimiter
  \f ;
  -- disable the footer (row count)
  \pset footer off
  SELECT * FROM table \g file.csv

===== Piping query result to file =====

In the psql command line:

  SELECT * FROM table \g filename

([[http://archives.postgresql.org/pgsql-general/2001-12/msg00687.php|see more details]])

===== Show Tables =====

  \dt
or
  \dt my_schema

===== drop all tables in a schema =====

  select 'drop table ' || schemaname || '.' || tablename || ';'
  from pg_tables
  where schemaname in ('public')
  order by schemaname, tablename;

===== drop all sequences in a schema =====

  select 'drop sequence public.' || sequence_name || ';'
  from information_schema.sequences
  where sequence_schema in ('public')
  order by sequence_name;

===== Grant on all tables =====

  select 'grant select, insert, update, delete on ' || tablename || ' to muh;'
  from pg_tables
  where schemaname = 'public';

====== Backups ======

There are many options available on how to back up your databases.
Postgres already contains some tools itself:

  * ''pg_dump'': per-database dumping of DB contents
  * ''pg_basebackup'': binary backup of a whole Postgres cluster (per-DB backup is not possible)

However, especially for larger databases (hundreds of GB), more powerful third-party tools featuring parallelism and incremental backups are [[http://bonesmoses.org/2017/02/03/pg-phriday-getting-back-up|recommended]]:

  * [[https://pgbackrest.org|pgBackRest]]
  * [[https://www.pgbarman.org|pgBarman]]

===== pg_dump =====

It makes sense to use the postgres user:

  sudo su postgres

There are two important variants of making backups.

Create and restore a backup using plain-text SQL: ''pg_dump'' and ''psql''

  pg_dump -U user source_db -f db_backup.sql
  # restore
  createdb target_db
  psql -U user -d target_db -f db_backup.sql

Create and restore a backup using the compressed 'custom' format: ''pg_dump'' and ''pg_restore''

  pg_dump -Fc -U user source_db > db_backup.dmp
  # restore (requires an empty database)
  createdb target_db
  pg_restore -Fc -d target_db db_backup.dmp

In addition we can use gzip (and zcat) to further compress the 'custom' format and give the dump a nice name:

  pg_dump -Fc -U user source_db | gzip > "$(date +%Y-%m-%d)_db_backup.dmp.gz"
  zcat YYYY-MM-DD_db_backup.dmp.gz | pg_restore -Fc -d target_db

Or, when creating the backup via cronjob in ''/etc/cron.d'':

  0 5 * * * postgres pg_dump -Fc dbname | gzip > /path/to/db_backup_`date +\%Y-\%m-\%dT\%H_\%M`.dmp.gz
  # ┬ ┬ ┬ ┬ ┬    ┬        ┬
  # │ │ │ │ │    │        └ command
  # │ │ │ │ │    └───────── user executing the command
  # │ │ │ │ └────────────── day of week (0 - 7) (0 and 7 are Sunday, or use names)
  # │ │ │ └──────────────── month (1 - 12)
  # │ │ └────────────────── day of month (1 - 31)
  # │ └──────────────────── hour (0 - 23)
  # └────────────────────── minute (0 - 59)

===== pg_basebackup =====

see e.g. [[https://blog.sleeplessbeastie.eu/2016/02/15/how-to-perform-base-postgresql-backup-using-pg_basebackup-utility|this tutorial]]

===== pgBackRest =====

A typical use of pgBackRest would be to install a backup server that backs up several DB servers. For now we only use a simpler approach: pgBackRest backing up on localhost. For more see the [[https://pgbackrest.org/user-guide.html|excellent user guide]] or [[https://pgstef.github.io/2018/01/04/introduction_to_pgbackrest.html|this good introductory blog post]].

Install from the Postgres repo:

  sudo add-apt-repository "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main"
  wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
  sudo apt update
  sudo apt install pgbackrest

==== Incremental Backups on Localhost ====

In addition to the global section, where e.g. retention is configured, you must add a so-called "stanza" for each cluster you want to back up. Make sure the repo1-path exists and is owned by user postgres.

  [global]
  repo1-path=/var/lib/pgbackrest
  repo1-retention-full=5
  repo1-retention-diff=1

  [STANZANAME]
  pg1-path=/var/lib/postgresql/9.5/CLUSTERNAME

Then adjust the Postgres config to allow for proper backups:

  listen_addresses = '*'

  # WRITE AHEAD LOG
  wal_level = hot_standby
  archive_mode = on
  archive_command = 'pgbackrest --stanza=STANZANAME archive-push %p'

  # REPLICATION
  max_wal_senders = 3

  # ERROR REPORTING AND LOGGING
  log_line_prefix = ''

Restart the cluster (or the complete Postgres server) to activate the changes:

  sudo systemctl restart postgresql@9.5-CLUSTERNAME

Create and test the pgBackRest stanza:

  sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info stanza-create
  sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info check

Make backups manually (e.g. from a cronjob):

  sudo -u postgres pgbackrest --stanza=STANZANAME --type=full --log-level-console=detail backup
  sudo -u postgres pgbackrest --stanza=STANZANAME --type=incr --log-level-console=detail backup

Make **regular backups via cronjob**:

  #minute hour day-of-month month day-of-week user     command
  0       4    *            *     0           postgres pgbackrest --type=full --stanza=STANZANAME backup
  0       4    *            *     1-6         postgres pgbackrest --type=diff --stanza=STANZANAME backup

**Restore the latest backup.** The cluster must be stopped and the cluster directory must exist and be empty.

  sudo systemctl stop postgresql@9.5-CLUSTERNAME
  sudo mv /var/lib/postgresql/9.5/CLUSTERNAME /opt/backups/CLUSTERNAME_backup
  sudo -u postgres mkdir /var/lib/postgresql/9.5/CLUSTERNAME
  sudo chmod 0700 /var/lib/postgresql/9.5/CLUSTERNAME
  sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=detail restore
  sudo systemctl start postgresql@9.5-CLUSTERNAME

To **restore a specific backup** use ''%%--set%%'' with the name of a folder in ''/var/lib/pgbackrest/backup/STANZANAME'', e.g.:

  sudo -u postgres pgbackrest --set=20180724-144511F --stanza=STANZANAME --log-level-console=detail restore

You can also copy the contents of ''/var/lib/pgbackrest'' to a different device and restore it there. Log files can be found in ''/var/log/pgbackrest''.

====== Performance Tuning ======

[[Monitoring Server Activity]]

===== References =====

  * See http://samiux.wordpress.com/2009/07/26/howto-performance-tuning-for-postgresql-on-ubuntudebian/ for a good introduction and some reasonable default values.
  * See http://www.postgresql.org/docs/8.4/interactive/kernel-resources.html#SYSVIPC for kernel parameters that may be relevant when tuning Postgres performance on Linux systems
    * Note that while SHMMAX is measured in bytes, SHMALL is measured in **pages**
    * The default page size on a 32-bit Linux system is usually 4096 bytes (check with the command //getconf PAGE_SIZE//).
    * So if you change SHMMAX you should also check that //SHMALL >= SHMMAX / PAGE_SIZE//.
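That SHMALL/SHMMAX relationship is easy to sanity-check with a few lines of Python (the 4 GiB SHMMAX below is just an example value, not a recommendation):

```python
def min_shmall_pages(shmmax_bytes, page_size=4096):
    """Smallest SHMALL (in pages) satisfying SHMALL >= SHMMAX / PAGE_SIZE,
    using ceiling division so partial pages are still counted."""
    return -(-shmmax_bytes // page_size)

# e.g. a hypothetical SHMMAX of 4 GiB with 4096-byte pages:
print(min_shmall_pages(4 * 1024**3))  # 1048576 pages
```

On a live system, compare the result against ''/proc/sys/kernel/shmall'' after reading the page size from ''getconf PAGE_SIZE''.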