On 14th of May, Stephen Frost committed patch: Add pg_audit, an auditing extension   This extension provides detailed logging classes, ability to control logging at a per-object level, and includes fully-qualified object names for logged statements (DML and DDL) in independent fields of the log output.   Authors: Ian Barwick, Abhijit Menon-Sen, David Steele Reviews […]
Posted by Feng Tian on 2015-05-21 at 23:33:00
Say you have carefully tuned your database and wow! it is ten times faster -- but, before you push it to the production cluster, how do you know the answers of your queries are correct?   Or at least, the same as before?   :-)    You may have the same question if you want to upgrade or migrate your database -- or, in our case, if we want to make sure a great new join algorithm actually produces correct answers.   Well, let's just put the results from before and after into two tables, and compare the two tables.

This sounds simple, but it turns out to be quite tricky.   You need to take care of these cases:
  •  Either table may not have a primary key
  •  Rows in the two tables may be physically stored in different orders.
  • Columns could contain nulls. 
  • The two tables could be huge, so performance must be good.
Here are some things you could do.

Method 1: You can dump the tables to text or csv files, then run diff -- but you need to take care of ordering, so you have to copy the tables out with an order by clause.    And good luck diff-ing a 2TB text file.

Method 2: Use except.  How about this query?  
select * from (
    select * from ta except select * from tb 
    union all 
    select * from tb except select * from ta
) foo; 

It is totally wrong!   See attached sql script for some surprise.   And this?

select * from (
    select * from (select * from ta except select * from tb) xxx
    union all
    select * from (select * from tb except select * from ta) yyy
) foo;
Better, but it does not take care of duplicate rows -- and, comparing select count(*) is not good enough.
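One variant worth noting (my sketch, not from the original post): Postgres also supports EXCEPT ALL, which keeps duplicates, and explicit parentheses avoid the precedence surprise of the first query:

```sql
select * from (
    (select * from ta except all select * from tb)
    union all
    (select * from tb except all select * from ta)
) foo;
```

EXCEPT ALL computes a bag difference, so a row appearing three times in ta and once in tb shows up twice here, instead of disappearing as it would with plain EXCEPT.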

Method 3: Join by ourselves!  Here is a script from my friend CK.

with A as ( 
    select hashtext(textin(record_out(ta))) as h, count(*) as c 
    from ta group by h
), 
B as (
    select hashtext(textin(record_out(tb))) as h, count(*) as c 
    from tb group by h
)
select * from A full outer join B on (A.h = B.h and A.c = B.c)
where A.h is null or B.h is null limit 5;

The hashtext part is optional -- for very wide rows, using hashtext will save some cycles and memory.   This is

[continue reading]

Posted by Joshua Drake in CommandPrompt on 2015-05-21 at 17:48:32
I spent a great deal of time working through the SQL DUMP portion of the 9.5devel docs this past week. Below is the current text of what I have, and it would be great if my readers would take a look and offer some thoughtful feedback. What would you like to see added? What would you like to see changed? Please note that this is reference documentation, not tutorial documentation.

This is just the straight HTML dump that is generated from Docbook, so the inline links won't work. The current -devel docs are here, and the updated version I am working on is below:

24.1. SQL Dump

PostgreSQL provides the program pg_dump for generating a backup file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. The basic usage of pg_dump is:

pg_dump -C -F p -f outfile dbname

The use of -C ensures that the dump file will contain the requisite CREATE DATABASE command. The use of -Fp ensures that you are using the plain text format, and the use of -f allows you to specify the name of the file the dump will be written to. It is also possible for pg_dump to create files in other formats that allow for parallelism and fine-grained control of object backup or restoration. For more details on all options available to pg_dump please refer to the pg_dump reference page.

The pg_dump application requires read access to all objects within the database that it will be operating with. This generally requires database super-user access. It is possible for any database user to use pg_dump to back up the objects that they own regardless of super-user access. This can be achieved using options such as -n schema or -t table.

The primary advantage of using pg_dump over the other backup methods described is that pg_dump

[continue reading]

In this entry, I would like to explain how to use the pt-table-usage and pt-index-usage commands in Postgres Toolkit, which I introduced in the previous entry. If you have never heard about Postgres Toolkit, please visit the previous entry: A Hacker's Diary: Postgres Toolkit 0.2 released. pt-table-usage and pt-index-usage are commands that can be used to obtain usage information for tables and indexes
On 15th of May, Peter Eisentraut committed patch: Add pg_settings.pending_restart column   with input from David G. Johnston, Robert Haas, Michael Paquier This is, for me, great. It might not sound like much, but it is actually something that I'm very excited about. Basically – let's assume you change system settings. Either by modifying postgresql.conf, […]
Posted by Andrew Dunstan in pgExperts on 2015-05-20 at 16:47:00
There has been some debate going on on the hackers list about problems certain drivers have with operators that contain the ? character. Since drivers such as the JDBC driver consider any such character outside of quote marks to be a parameter placeholder, its use as part of an operator makes life difficult for them. Part of the problem is that we allow almost every available non-alphanumeric ASCII character to be part of an operator. Why, for example, do we allow `? I've never seen it used that I can recall, but the fact that it's allowed means we can't use it for something else (such as marking an operator). That makes doing anything about the problem hard. Some people want to deprecate the use of ? in operators. I think it's far too late for that - its use is far too widespread.
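To illustrate how widespread ? already is, the key-existence operators of hstore and jsonb are probably the best-known examples (my illustration, not from the post):

```sql
-- jsonb key-existence operator, added in 9.4
SELECT '{"a": 1, "b": 2}'::jsonb ? 'b';

-- hstore has used ? for years as well (requires the hstore extension)
SELECT 'a=>1'::hstore ? 'a';
```

A driver that treats every bare ? as a placeholder will mangle both of these statements.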
On 12th of May, Andrew Dunstan committed patch: Additional functions and operators for jsonb   jsonb_pretty(jsonb) produces nicely indented json output. jsonb || jsonb concatenates two jsonb values. jsonb - text removes a key and its associated value from the json jsonb - int removes the designated array element jsonb - text[] removes a key […]

A couple of days back the following commit has landed in the Postgres world, for the upcoming 9.5 release:

commit: a486e35706eaea17e27e5fa0a2de6bc98546de1e
author: Peter Eisentraut <>
date: Thu, 14 May 2015 20:08:51 -0400
Add pg_settings.pending_restart column

with input from David G. Johnston, Robert Haas, Michael Paquier

Particularly useful for systems doing a lot of server parameter updates, this allows tracking parameters in need of a server restart for their updated values to take effect. Note that this applies to all the parameters marked as PGC_POSTMASTER in guc.c (shared_buffers being one), as well as any custom parameters a system may have after being loaded by a plugin. This information is tracked by a new column called pending_restart in the system view pg_settings, whose boolean value is set to "true" if a given GUC parameter is indeed waiting for a server restart.

In order to make visible the fact that a parameter is waiting for a restart, the server's parameters can be reloaded with pg_reload_conf(), "pg_ctl reload", or a SIGHUP signal. Modifications made in postgresql.conf (as well as in any included configuration files) or via ALTER SYSTEM are taken into account. See for example:

=# \! echo "port = 6666" > $PGDATA/postgresql.conf
=# ALTER SYSTEM SET shared_buffers TO '1GB';
=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
=# SELECT name FROM pg_settings WHERE pending_restart;
      name
----------------
 port
 shared_buffers
(2 rows)

This will prove useful for many systems, like those doing automatic tuning of server parameters, or containers (though not limited to those, of course).

Posted by Giulio Calacoci in 2ndQuadrant on 2015-05-19 at 08:30:45

The release of PostgreSQL 9.5 is imminent so the time has come to analyse what’s new in this latest version.

A very interesting feature of version 9.5 is the ability to import a schema from a remote database, using Foreign Data Wrapper and the IMPORT FOREIGN SCHEMA command.


Foreign Data Wrappers (FDW)

Before the introduction of Foreign Data Wrappers, the only way to connect a Postgres database with an external data source was using the dblink module.

In 2003, a set of rules for the standard management of external data sources was defined within the SQL language: SQL/MED (Management of External Data).

PostgreSQL 9.1 introduced a first implementation of the standard SQL/MED with the Foreign Data Wrappers, which provided Postgres with direct access to data sources such as files or other databases (Oracle, Mysql…), allowing their use as tables.

The advantage of this approach is obvious: it gives you the ability to connect to an external data source to extract data natively by running a simple query. The fact that it is not necessary to use external modules to obtain this result considerably simplifies the DBAs’ work.

If you want to know more, take a look at our 2011 blog post: PostgreSQL 9.1: Tabelle esterne con SQL/MED (Warning: blog post in Italian).

Small example of an FDW

PostgreSQL 9.3 introduced support for writable Foreign Data Wrappers (Warning: blog post in Italian) and also added a foreign data wrapper for PostgreSQL itself. Let's have a look at a simple example of how to use an FDW by connecting two Postgres databases.

First we create two databases:

CREATE DATABASE source;
CREATE DATABASE destination;

Within the source we create a test table with test data:

\c source
CREATE TABLE test1 AS SELECT id, md5(random()::text) FROM generate_series(1,5) id;

Now we connect to the destination db and then, connect the two databases:

\c destination
CREATE EXTENSION postgres_fdw;
CREATE SERVER src_srv FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'source');
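The excerpt is cut off below, before reaching the 9.5 command this post is about. As a sketch (reusing the server and table names from the example above; the user mapping options are an assumption), importing the remote definitions looks like:

```sql
-- Sketch: map the local user to a remote role, then import test1 (9.5+)
CREATE USER MAPPING FOR CURRENT_USER SERVER src_srv OPTIONS (user 'postgres');

IMPORT FOREIGN SCHEMA public LIMIT TO (test1)
    FROM SERVER src_srv INTO public;

SELECT count(*) FROM test1;  -- reads through the FDW
```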

[continue reading]

Tom Lane introduced a new state for TOAST types - the expanded state. A side effect of this change, together with some changes in plpgsql internals, is a big impact on the speed of array operations in plpgsql.
I have two plpgsql blocks with loops. The first performs array field updates, the second array appends.
DO $$
DECLARE a int[] = array_fill(10, ARRAY[10000]);
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a[i] := 0;
  END LOOP;
END;
$$;

DO $$
DECLARE a int[] = '{}';
BEGIN
  FOR i IN 1 .. 10000 LOOP
    a := a || 10;
  END LOOP;
END;
$$;
You can try this code on PostgreSQL 9.4 - you get times of about 450 and 220 ms. The same code needs only 6 and 5 ms on PostgreSQL 9.5! That is more than an order-of-magnitude speedup.
Posted by gabrielle roth on 2015-05-19 at 02:07:00
I do a lot of “is this !#@(* really working correctly” testing. (See also: simple test for autovacuum log messages.) Here’s a simple test to verify that log_lock_waits is configured correctly. This is essential for troubleshooting lock contention. -- verify settings -- log_lock_waits should be 'on'; it is off by default. SHOW log_lock_waits; -- lock_wait […]
Posted by Greg Sabino Mullane in End Point on 2015-05-18 at 12:00:00

Determining if your current database connection is using PgBouncer, or going directly to Postgres itself, can be challenging, as PgBouncer is a very low-level, transparent interface. It is possible, and here are some detection methods you can use.

This was inspired by someone asking on the Perl DBD IRC channel if it was possible to easily tell if your current database handle (usually "$dbh") is connected to PgBouncer or not. Since I've seen this question asked in other venues, I decided to take a crack at it.

There are actually two questions to be answered: (1) are we connected to PgBouncer, and if so, (2) what pool_mode is being run? The quickest and easiest way I found to answer the first question is to try to connect to a non-existent database. Normally, this is a FATAL message, as seen here:

$ psql testdb -p 5432
testdb=# \c ghostdb
FATAL:  database "ghostdb" does not exist
Previous connection kept

However, a slightly different ERROR message is returned if the same thing is attempted while connected to PgBouncer:

$ psql testdb -p 6432
testdb=# \c ghostdb
ERROR:  No such database: ghostdb
Previous connection kept

Thus, an ERROR will always indicate that you are connected to PgBouncer and not directly to Postgres, which will always issue a FATAL.

In the future, there will be an even simpler method. As of this writing, PgBouncer 1.6 has not been released, but it will have the ability to customize the application_name. This is a configurable session-level variable that is fairly new in Postgres. Andrew Dunstan wrote a patch which enables adding this to your pgbouncer.ini file:

application_name_add_host = 1

This will make PgBouncer modify the application_name to append some information to it such as the remote host, the remote port, and the local port. This is a feature many PgBouncer users will appreciate, as it offers an escape from the black hole of connection information that PgBouncer suffers from. Here is what it looks like on both a normal Postgres conne

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-05-17 at 17:20:00
One of the things I like about data recovery jobs is that each one is different. Right now I'm wrestling with this problem: how can I recover whatever is possible from a backup that's missing a handful of xlog files. We know this will result in a probably corrupt database, but we still want to see what data is available.

My first two approaches didn't work too well. I tried mangling the control file, and creating dummy xlog files to replace the handful that are missing.

Right now I'm trying to follow a suggestion from Andres Freund to drive the process via the debugger, moving it on to the next available log file when it gets to where it would want the first missing log file.  The trouble is it's hard to find what to set exactly and where to set it (this is Postgres 9.3). Every time I try something I get errors like "unexpected page_addr". If I were more familiar with this section of the code it might be easier, but it's something I've pretty much never dabbled in.

It all makes for an exciting life.
Posted by Robins Tharakan on 2015-05-16 at 14:50:00
While reviewing an existing application, I thought it'd be worthwhile to review how good / bad PostgreSQL is in terms of OLAP. This (growing) post is going to be my (un)learning of how ready PostgreSQL is. Row Numbering Support: Yes.  Use: The Row_Number() function numbers rows generated in a result-set. Example: SELECT row_number() OVER (ORDER BY marks DESC) AS rn, name FROM x; Review: Some
Finally! A *much* awaited feature: this attempt at adding GROUPING SETS / ROLLUP / CUBE to PostgreSQL has been in the works for about a year (besides the many attempts in the past decade and a half that didn't get through), and thankfully it has finally got the approval of the powers that be, so the upcoming Postgres 9.5 will finally have this long-pending SQL feature. MSSQL and
Posted by Joel Jacobson on 2015-05-16 at 10:08:21

By coincidence I stumbled upon a table where one of its columns didn’t have a foreign key. I found this strange, since our convention is to always add foreign keys when appropriate.

But humans are humans, and will eventually forget to add a foreign key, so we better have both belt and suspenders and find a way to deal with this inevitable problem in an efficient way.

It would be a tedious job to manually look for missing foreign keys in all tables and columns.
But if you’re lucky enough to have the pleasure of working with a system where all tables and columns have been given their names by following a strict naming convention, it might be possible to fully automate the task.

This is the naming convention we use:

  1. Table names are always in plural form with a trailing “s”, e.g. users
  2. Primary key column names are always equal to the table name in singular form with a trailing id, e.g. userid.
  3. Foreign key columns are always equal to the primary key they are referencing, e.g. transactions.userid -> users.userid

This means you always know, based on the column name alone, which table (if any) that column might be referencing.

Example: If the table transactions has a column named userid, and there is a table where userid is the primary key, but there isn't any foreign key on transactions.userid, then it's a missing foreign key -- or else someone has not followed the naming convention.

Thanks to PostgreSQL’s fantastic pg_catalog system tables, we can write a query which uses the rules from the naming convention and returns all the columns that appear to be missing foreign keys. It doesn’t support multi-column keys, but we don’t have many of those, so it’s not a problem in my case.
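The author's actual view is truncated from this excerpt; as a sketch (my reconstruction from the stated naming rules, single-column keys only, not the author's code), such a query could look like:

```sql
-- Columns named like a key that match some other table's single-column
-- primary key by name, but carry no foreign key constraint themselves.
SELECT n.nspname, c.relname, a.attname
FROM pg_attribute a
JOIN pg_class c     ON c.oid = a.attrelid AND c.relkind = 'r'
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE a.attnum > 0
  AND NOT a.attisdropped
  -- another table's primary key column has the same name
  AND EXISTS (
      SELECT 1
      FROM pg_constraint pk
      JOIN pg_attribute pka ON pka.attrelid = pk.conrelid
                           AND pka.attnum = pk.conkey[1]
      WHERE pk.contype = 'p'
        AND array_length(pk.conkey, 1) = 1
        AND pka.attname = a.attname
        AND pk.conrelid <> c.oid
  )
  -- but no foreign key constraint exists on this column
  AND NOT EXISTS (
      SELECT 1
      FROM pg_constraint fk
      WHERE fk.contype = 'f'
        AND fk.conrelid = c.oid
        AND fk.conkey[1] = a.attnum
  );
```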

Thanks to the view below, I automatically found three more missing foreign keys of the same type, which saves me many hours of boring work today.

pg1:joel=#* SELECT * FROM view_missing_foreign_keys;
 nspname |      relname    | attname 
---------+-----------------+---------
 public  | transactions    | userid
 public  | someohtertable1 | use

[continue reading]

Yesterday I gave the developers at my company what I call a DBA Chat. It’s something I try to do every month to keep them apprised on various features, caveats, performance considerations, and so on. I find that educating the folks who regularly work with the database does wonders for application performance and my sanity. The benefit of this long format is that I can go over more information than a time constrained set of slides.

This month we went over one of my slightly older talks, and even though the material was three years old from my perspective, it was all new to them. And why not? Developers have a job to do, and while they do work with the database, it generally isn’t their responsibility to research the voluminous minutiae and quirks commonly associated with a specific platform. That’s my job.

So here is the first of a ten part series on anti-patterns that can annihilate PGDB (PostgreSQL) performance, some of which are outright insidious. This week we’ll start slow with a short discussion on the importance of EXPLAIN. This is PGDB’s first line of defense against bad queries, and it’s overlooked more than you might expect, as well as somewhat difficult for the uninformed to interpret. I’ll probably have a more in-depth version of this article in the future for truly interpreting EXPLAIN output, but this time we’ll be focusing on using it in general.

First, we need to set up a use case to illustrate a couple of variants. Two tables with a close relationship should do it:

CREATE TABLE sys_product
(
    product_id   SERIAL  PRIMARY KEY,
    prod_name    TEXT    NOT NULL,
    descr        TEXT    NOT NULL DEFAULT now()::text
);

INSERT INTO sys_product (prod_name, descr)
SELECT 'Product ' ||, 'It does stuff.'
  FROM generate_series(1, 1000) a(id);

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

INSERT INTO sys_order (product_id, item_count)

[continue reading]

Posted by Josh Berkus on 2015-05-15 at 19:13:28

We're having two meetups in May: one in Oakland, and one in San Francisco:

Unfortunately, the Oakland one already has a waiting list, but get on it if you're interested in case of cancellations. And a big thank-you to and for sponsoring these meetups.

Posted by Andrew Dunstan in pgExperts on 2015-05-15 at 13:04:00
Every so often I've wanted a SET metatype in Postgres.

We do have bit fields, which can be used as sets with a little work, and this is how MySQL does its SET types, as I understand it. But what if you want a set of enums (our enums can have very large lists of values), or a set of text values, or of integers? Bit fields won't work for these - we would need some sort of sparse representation.

One obvious way would be to store the set as a one-dimensional array, kept in sorted order for efficient searching. For text values, we could probably use some smart form of sorting such as is used by jsonb and hstore to store object keys, to reduce the need for string comparisons.

Obviously, I have done no work on this, but it seems like it would be a worthwhile project.

In the meantime, you can use hstore more or less in this way. Just treat it as a mapping from whatever values you have to 1.  It's a bit of a hack, but it should work. You'll just need to cast your values to text.
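A minimal sketch of the hstore-as-set trick described above (values are mine):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- the "set" {red, blue}: every member simply maps to 1
SELECT 'red=>1, blue=>1'::hstore ? 'red';       -- membership test
SELECT 'red=>1'::hstore || 'blue=>1'::hstore;   -- set union
SELECT akeys('red=>1, blue=>1'::hstore);        -- list the members
```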
Posted by Andrew Dunstan in pgExperts on 2015-05-13 at 16:32:00
Yesterday I committed some new features for jsonb. These are:
  • jsonb_pretty(jsonb) produces nicely indented json output.
  • jsonb || jsonb concatenates two jsonb values.
  • jsonb - text removes a key and its associated value, or a matching array element, from the json
  • jsonb - int removes the designated array element
  • jsonb - text[] removes a key and associated value or array element at the designated path
  • jsonb_replace(jsonb,text[],jsonb) replaces the array element designated by the path or the value associated with the key designated by the path with the given value.
This was based on Dmitry Dolgov's jsonbx extension for 9.4, which in turn was based on earlier work by Oleg Bartunov, Teodor Sigaev and Alexander Korotkov on nested hstore. Petr Jelinek also did some useful review and tidy-up work on the core features. So, a good team effort.
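Quick sketches of the new functions and operators listed above (the literals are mine):

```sql
SELECT jsonb_pretty('{"a": [1, 2], "b": 3}'::jsonb);
SELECT '{"a": 1}'::jsonb || '{"b": 2}'::jsonb;   -- {"a": 1, "b": 2}
SELECT '{"a": 1, "b": 2}'::jsonb - 'a';          -- {"b": 2}
SELECT '[1, 2, 3]'::jsonb - 1;                   -- removes element 1 (0-based): [1, 3]
SELECT jsonb_replace('{"a": {"b": 1}}'::jsonb, '{a,b}', '9');   -- {"a": {"b": 9}}
```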
Posted by Amit Kapila in EnterpriseDB on 2015-05-13 at 14:26:00

As of now, one can't reliably use the tar format to take a backup on Windows,
because it can't restore tablespace data, which is stored in the form of symbolic
links in <data_directory>/pg_tblspc/.  The reason is that native
Windows utilities are not able to create symbolic links while extracting files
from tar.  It might be possible to create symbolic links if Cygwin is installed
on your system, however we need this feature to work on native Windows as well.

In PostgreSQL 9.5, a new feature (commit id - 72d422a5) that extends the existing
tar format makes it possible to reliably take a backup in tar mode.
From the user's perspective, nothing much has changed about taking a backup,
except that tar format mode (--format=tar) in pg_basebackup (of the PostgreSQL
9.5 version) will only work with server version 9.5 or later.  This feature is mainly
required for Windows, but for the sake of consistency it has been changed for all
platforms, and it should also enable long (length greater than 99) target symbolic
links for the tar format (I think the changes for that are still not done, but we can
do them now that this feature is committed).

The basic idea behind the feature is that it forms a tablespace map of
all the tablespace symbolic links present inside
<data_directory>/pg_tblspc/ and stores it in the data directory for
Exclusive backups (aka backups taken via the pg_start_backup() and
pg_stop_backup() functions), and in the backup archive for Non-Exclusive
backups (aka backups taken by pg_basebackup).

The format of tablespace_map file is:
16384 E:\WorkSpace\PostgreSQL\master\tbs
16388 E:\WorkSpace\PostgreSQL\master\tbs              2               3

The tablespace symbolic links are restored during archive recovery, and the
tablespace_map file is renamed to tablespace_map.old at the end of
recovery, similar to the backup_label file.
Posted by Satoshi Nagayasu on 2015-05-12 at 00:34:00
Today, we are pleased to announce the release of Postgres Toolkit 0.2. ■What is "Postgres Toolkit"? Postgres Toolkit is a collection of scripts and utilities intended to help PostgreSQL DBAs improve the quality and productivity of their daily jobs and operations. With Postgres Toolkit,
Posted by Josh Berkus in pgExperts on 2015-05-12 at 00:24:00
In prior posts, I've gone over some methods to prevent cycles from being added to your database.  However, imagine that someone has handed you an existing adjacency list tree -- perhaps migrated from another DBMS -- and you need to find all of the cycles as part of data cleaning?  How do you do that?

One way, obviously, would be to just explore all paths and flag the ones where any ID appears twice:

    WITH RECURSIVE prev AS (
        SELECT id, 1 AS depth, array[id] as seen, false as cycle
        FROM folders
        UNION ALL
        SELECT, prev.depth + 1, seen || as seen,
   = any(seen) as cycle
        FROM prev
        INNER JOIN folders on = parent_id
    )
    SELECT *
    FROM prev;

However, the above has a serious issue: the query itself will cycle and never complete (in fact, it will error out). So we need to terminate each path when the first repeat happens.  Fortunately, that's easy to do, and we'll filter for only the cycles while we're at it:

    WITH RECURSIVE prev AS (
        SELECT id, 1 AS depth, array[id] as seen, false as cycle
        FROM folders
        UNION ALL
        SELECT, prev.depth + 1, seen || as seen,
   = any(seen) as cycle
        FROM prev
        INNER JOIN folders on = parent_id
        AND prev.cycle = false
    )
    SELECT *
    FROM prev
    WHERE cycle = true;

The results of the above query look like this:

    id | depth |       seen       | cycle
    ---+-------+------------------+-------
    21 |     2 | {21,21}          | t
    13 |     5 | {13,14,15,11,13} | t
    14 |     5 | {14,15,11,13,14} | t
    15 |     5 | {15,11,13,14,15} | t
    11 |     5 | {11,13,14,15,11} | t
    (5 rows)

One thing to notice is that you'll get a row for every node in a cycle loop.  That's because with cycles, the choice of starting point is arbitrary.  So the above query isn't the best choice for a deep tree where you have a lot of existing cycles.  There's a 2nd

[continue reading]

On 12th of April, Magnus Hagander committed patch: Add system view pg_stat_ssl   This view shows information about all connections, such as if the connection is using SSL, which cipher is used, and which client certificate (if any) is used.   Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael Paquier It was committed […]
I had to set it up relatively recently, and hit some roadblocks, so I figured I'd write about my experiences – for myself in the future, or for anyone else that might want to set it up. First, let's state the goals: remote communication to PostgreSQL (as in: not within localhost) should go via ssl-encrypted channels to […]
If you are trying to get your head wrapped around exclusion constraints via a Google search you may get the impression that the only thing it's good for is solving the double booking problem common to reservation systems. The PostgreSQL documentation expands the possibility a teeny tiny bit by mentioning preventing overlapping circles. But that's it! So here is this truly phenomenal feature that
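For reference (my sketch, not from the post), the canonical double-booking example the docs gesture at looks like this:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets = and && share one gist index

CREATE TABLE room_booking (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

INSERT INTO room_booking VALUES (1, '[2015-05-11 10:00, 2015-05-11 11:00)');
-- this one fails: same room, overlapping time range
INSERT INTO room_booking VALUES (1, '[2015-05-11 10:30, 2015-05-11 11:30)');
```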
Posted by Michael Paquier on 2015-05-11 at 08:05:27

Marked by many people as the #1 feature that Postgres has been missing for years, upsert support has landed in the Postgres world and will be released with the upcoming 9.5:

commit: 168d5805e4c08bed7b95d351bf097cff7c07dd65
author: Andres Freund <>
date: Fri, 8 May 2015 05:31:36 +0200

The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.


Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
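As a sketch of the syntax described in the commit message (the counters table is mine, not from the commit):

```sql
CREATE TABLE counters (key text PRIMARY KEY, value int NOT NULL);

INSERT INTO counters VALUES ('hits', 1);

-- the second insert conflicts on the primary key and updates instead;
-- EXCLUDED is the "magic" alias for the row proposed for insertion
INSERT INTO counters AS c VALUES ('hits', 1)
ON CONFLICT (key) DO UPDATE SET value = c.value + EXCLUDED.value;

SELECT * FROM counters;  -- hits | 2
```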

For years, application developers have been using row triggers with a set of dedicated functions to handle UPSERT manually, that is, to deal on the fly with constraint violations when a tuple is inserted into a relation. On top of being slow, because triggers add the overhead of a function call to the call stack, many developers have surely longed for a feature that many other RDBMS, be they proprietary or open source, have supported for years. Have a look for example at the function merge_db() defin

[continue reading]

Posted by Pavel Stehule on 2015-05-11 at 06:04:00
PL/pgSQL is an important part of PostgreSQL. The implementation of plpgsql is terribly simple, which has two impacts (one positive, one negative).
  • We are able to implement new features very simply - implementing some new functionality like ASSERT or RAISE WITH CONTEXT needs only a few hours.
  • Some usual features from other languages are missing, or their implementation is suboptimal. But there has been significant progress in the upcoming PostgreSQL 9.5.
One less-known issue of PLpgSQL is casting. PLpgSQL uses IO casting - every time the type or typmod of a variable is not identical, the plpgsql runtime transforms values from the source type to text and back to the target type. This IO casting is very simple to implement, but it can be a well-hidden performance killer. plpgsql_check can raise warnings about it. You can see the impact of this IO cast in the following test:
-- tested on PostgreSQL 9.4 (synthetic test, worst case) 
-- IO cast
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i; end loop; end $$;
Time: 6708.727 ms
-- enforced binary casts
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i::numeric; end loop; end $$;
Time: 4738.093 ms
Tom Lane wrote and committed a patch that enforces binary casting everywhere it is possible. When it is impossible, IO casting is used as before.
-- tested on PostgreSQL 9.5
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i; end loop; end $$;
Time: 3992.551 ms
postgres=# do $$declare s numeric = 0; begin for i in 1..10000000 loop s := i::numeric; end loop; end $$;
Time: 3693.739 ms
There is still some overhead, but it is significantly less: 30% versus 7.5%. Another positive: PostgreSQL is about 30% faster in this test (it is a synthetic test, so reality will be different, but the expectation that 9.5 will be faster than 9.4 is valid).
On 8th of May, Andres Freund committed patch: Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.   The newly added ON CONFLICT clause allows to specify an alternative to raising a unique or exclusion constraint violation error when inserting. ON CONFLICT refers to constraints that can either be specified using a inference clause (by […]
Posted by Quinn Weaver in pgExperts on 2015-05-08 at 21:37:00
A client of ours was suffering mysterious, intermittent lock pileups on a large table. To diagnose the problem, I ran our lock-logging scripts.

Querying the resulting log_transaction_locks table, I saw lots of extend locks and RowExclusiveLocks piling up behind normal UPDATEs (and INSERTs) — there were even extend locks waiting on other extend locks. This is pretty unusual; you'd expect that when one UPDATE completed, the extend lock would succeed, PostgreSQL would quickly extend the relation (in this case, the big table's TOAST relation), and queries would keep running smoothly. These were all single-row writes; they shouldn't have taken that long. However, a complicating factor was that a couple of columns in that big table were giant TEXT columns storing JSON strings. Was that the issue? The problem was unclear, and we needed more info.

At this point my colleague Christophe Pettus made the excellent suggestion to do 'strace -p' on one of the locked processes. The dump revealed lots of sendto() calls with giant strings, which he immediately recognized as syslog messages based on their formatting.

Now, this was a problem. syslog writes to /var, and /var was hanging off the root partition, which was an EBS volume (without provisioned IOPS). So, basically, the entire SQL of each query was being written across the network, including those giant JSON strings, and the query itself wouldn't terminate till the write was fsync()'ed. In the meantime, extend locks and other UPDATEs piled up behind it. The PostgreSQL data directory was on a big, fast software RAID across SSD instance storage, but performance was being bottlenecked by that single EBS volume.

The client elected to turn off log_lock_waits, and the problem went away. In addition, Christophe suggested prepending a minus sign to the filename in /etc/rsyslog.d/postgres.conf, like so, which tells syslog not to flush between each log statement:

local0.* -/var/log/postgres.log

I thought that was neat: a potential fix to a huge performance problem, in one chara

[continue reading]