The world's most advanced open source database
Top posters
Number of posts in the past month
Top teams
Number of posts in the past month
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
  • Get in touch with the Planet PostgreSQL administrators at planet at

A couple of days back a new mode has been added in vacuumdb for the support of parallel jobs:

commit: a17923204736d8842eade3517d6a8ee81290fca4
author: Alvaro Herrera <>
date: Fri, 23 Jan 2015 15:02:45 -0300
vacuumdb: enable parallel mode

This mode allows vacuumdb to open several server connections to vacuum
or analyze several tables simultaneously.

Author: Dilip Kumar.  Some reworking by Álvaro Herrera
Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund

When specifying a number of jobs with -j, the number of maximum connections defined by max_connections should be higher than the number of jobs specified as process creates a number of connections to the remote database equal to the number of jobs, and then reuses those connections to process the tables specified.

This of course supports all the modes already present in vacuumdb, like --analyze, --analyze-in-stages, etc. The list of tables processed in parallel can as well be customized when passing several values via --tables.

An important thing to note is that when using this feature with -f (VACUUM FULL), there are risks of deadlocks when processing catalog tables. For example in this case what happens was a conflict between pg_index and pg_depend:

$ vacuumdb -j 32 -f -d postgres
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming of database "postgres" failed: ERROR:  deadlock detected
DETAIL:  Process 2656 waits for RowExclusiveLock on relation 2608 of database 12974; blocked by process 2642.
Process 2642 waits for AccessShareLock on relation 2610 of database 12974; blocked by process 2656.
HINT:  See server log for query details.
$ psql -At -c "SELECT relname FROM pg_class WHERE oid IN (2608,2610);"

Note that this has higher chances to happen if:

  • the number of relations defined on the database processed is low.
  • the quantity of data to be processed is low
  • the number of jobs is high

So be careful when using parallel jobs with FULL on a complete database.

On 19th of January, Robert Haas committed patch: Use abbreviated keys for faster sorting of text datums.   This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the […]
Posted by Daniel Pocock on 2015-01-26 at 21:37:32

The other day I demonstrated how to get your Github issues/bugs as an iCalendar feed.

I'm planning to take this concept further and I just whipped up another Python script, exposing Nagios issues as an iCalendar feed.

The script is nagios-icalendar. Usage is explained concisely in the README file, it takes just minutes to get up and running.

One interesting feature is that you can append a contact name to the URL and just get the issues for that contact, e.g.:


Here I demonstrate using Mozilla Lightning / Iceowl-extension to aggregate issues from Nagios, the Fedora instance of Bugzilla and Lumicall's Github issues into a single to-do list.

Today version 1.4.0 of Barman has been officially released. The most important feature is incremental backup support, which relies on rsync and hard links and helps you reduce both backup time and disk space by 50-70%.

Stacked coffee cupsBarman adds one configuration option, called reuse_backup. By setting this option to link, Barman will transparently reuse the latest available backup in the catalogue of a given server when issuing a barman backup command.

“Mhh … hang on … Did you just say ‘Reuse’? In what terms?”

Essentially, Barman will reuse the previous backup in two phases:

  • when reading the backup files;
  • when permanently saving the backup files.

Barman simply exploits rsync’s robust and proven technology in order:

  • to skip transferring those database files that have not changed from the latest base backup;
  • use hard links on the backup side in order to save disk space (data deduplication).

If you happen to follow me on Twitter, you might have already seen this message about data deduplication on one of our customer’s database:

#pgbarman #postgresql #IncrementalBackup #database size: 13.2 TB. Actual size on disk: 5.0 TB (-62.01% #deduplication ratio). Saved 8.2TB!This particular case involves a very large database of 13.2 Terabyte. Consider doing a weekly backup of a 13.2 TB database. You face two major problems:

  • backup time
  • backup size (with a large impact on retention policies)

As you can see, over 8.2 TB of data had not changed between the two backups, with a subsequent reduction of both backup time (17 hours instead of more than 50!) and disk space (5TB instead of 13TB!), as well as network bandwidth (by the way, Barman allows you to enable network compression too).

I must confess that a 13TB database is quite an exceptional case. However, what we have witnessed so far, even on smaller (and much smaller) databases, is an almost constant deduplication ratio in all the backups that we are managing with 2ndQuadrant. On average, deduplication ratio is between 50% and 70%.

The experience gained on this topic by Barman’s development team won’t stop here. A similar approach is being followed by Marco Nenciarini in its attempt to introduce file level incrementa

[continue reading]

On 23rd of January, Alvaro Herrera committed patch: vacuumdb: enable parallel mode   This mode allows vacuumdb to open several server connections to vacuum or analyze several tables simultaneously.   Author: Dilip Kumar. Some reworking by Álvaro Herrera Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund This is great for multi-table vacuums/analyzes. Previously, […]
Posted by Guillaume LELARGE in Dalibo on 2015-01-25 at 21:28:00

The simplest tools are usually the best.

One of the tools I usually need when I go see customers is vmstat. Nothing beats vmstat to give me a real overview of what the server is really doing. This overview gives system metrics, such as CPU usage, and disk usage. That's quite useful to check where the bottleneck comes from.

I wish I had a PostgreSQL tool like that. I wished enough to eventually build it. I call it pgstat because I couldn't find a better name for it.

It's an online command tool that connects to a database and grabs its activity statistics. As PostgreSQL has many statistics, you have a command switch to choose the one you want (-s):

  • archiver for pg_stat_archiver
  • bgwriter for pg_stat_bgwriter
  • connection for connections by type
  • database for pg_stat_database
  • table for pg_stat_all_tables
  • tableio for pg_statio_all_tables
  • index for pg_stat_all_indexes
  • function for pg_stat_user_function
  • statement for pg_stat_statements
  • pbpools for pgBouncer pools statistics
  • pbstats for pgBouncer general statistics

It looks a lot like vmstat. You ask it the statistics you want, and the frequency to gather these statistics. Just like this:

$ pgstat -s connection
 - total - active - lockwaiting - idle in transaction - idle -
    1546       15             0                     0   1531  
    1544       17             0                     0   1527  
    1544       14             0                     0   1530  
    1546       26             0                     0   1520  
    1543       21             0                     0   1522 

Yeah, way too many idle connections. Actually, way too many connections. Definitely needs a pooler there.

This is what happens on a 10-secondes 10-clients pgbench test:

$ pgstat -s database 1
- backends - ------ xacts ------ -------------- blocks -------------- -------------- tuples -------------- ------ temp ------ ------- misc --------
                commit rollback     read    hit read_time write_time      ret    fet    ins    upd    del    files     bytes   conflicts deadlocks

[continue reading]

Posted by Daniel Pocock on 2015-01-24 at 23:07:10

I've just whipped up a Python script that renders Github issue lists from your favourite projects as an iCalendar feed.

The project is called github-icalendar. It uses Python Flask to expose the iCalendar feed over HTTP.

It is really easy to get up and running. All the dependencies are available on a modern Linux distribution, for example:

$ sudo apt-get install python-yaml python-icalendar python-flask python-pygithub

Just create an API token in Github and put it into a configuration file with a list of your repositories like this:

api_token: 6b36b3d7579d06c9f8e88bc6fb33864e4765e5fac4a3c2fd1bc33aad
bind_address: ::0
bind_port: 5000
- repository: your-user-name/your-project
- repository: your-user-name/another-project

Run it from the shell:

$ ./github_icalendar/ github-ics.cfg

and connect to it with your favourite iCalendar client.

Consolidating issue lists from Bugzilla, Github, Debian BTS and other sources

A single iCalendar client can usually support multiple sources and thereby consolidate lists of issues from multiple bug trackers.

This can be much more powerful than combining RSS bug feeds because iCalendar has built-in support for concepts such as priority and deadline. The client can use these to help you identify the most critical issues across all your projects, no matter which bug tracker they use.

Bugzilla bugtrackers already expose iCalendar feeds directly, just look for the iCalendar link at the bottom of any search results page. Here is an example URL from the Mozilla instance of Bugzilla.

The Ultimate Debian Database consolidates information from the Debian and Ubuntu universe and can already export it as an RSS feed, there is discussion about extrapolating that to an iCalendar feed too.

Further possibilities

  • Prioritizing the issues in Github and mapping these priorities to iCalendar priorities
  • Creating tags in Github that allow issues to be ignored/excluded from the feed (e.g. excluding wishlist items)
  • Creating summary entries instead of listing all the issues, e.g.

[continue reading]

One of the foreign data wrappers I included in the PostgreSQL 9.3 Windows FDW bag and PostgreSQL 9.4 Windows FDW bag is the www_fdw extension used for querying web services. Someone asked that since I didn't build curl with SSL support, they are unable to use it with https connections. The main reason I didn't is that the EDB installs come with ssleay32.dll and libeay32.dll (even the 64-bit) which are dependencies of curl when built with SSL support. I wanted to minimize the issue of distributing dlls that are packaged with Windows PostgreSQL installers already.

Though this article is specific to using www_fdw on Windows systems, many of the issues are equally applicable to other platforms, so may be worth a read if you are running into similar issues with using specialty SSL certificates on Linux/Unix/Mac.

Continue reading "Using SSL https connections with www_fdw on windows"
Posted by Amit Kapila on 2015-01-24 at 10:36:00

In PostgreSQL 9.5, we will see a boost in scalability for read workload
when the data can fit in RAM.  I have ran a pgbench read-only load to
compare the performance difference between 9.4 and HEAD (62f5e447)
on IBM POWER-8 having 24 cores, 192 hardware threads, 492GB RAM
and here is the performance data

The data is mainly taken for 2 kind of workloads, when all the data fits
in shared buffers (scale_factor = 300) and when all the data can't fit in
shared buffers, but can fit in RAM (scale_factor = 1000).

First lets talk about 300 scale factor case, in 9.4 it peaks at 32 clients,
now it peaks at 64 clients and we can see the performance improvement
upto (~98%) and it is better in all cases at higher client count starting from
32 clients.  Now the main work which lead to this improvement is
commit - ab5194e6 (Improve LWLock scalability).  The previous implementation
has a bottleneck around spin locks that were acquired for  LWLock
Acquisition and Release and the implantation for 9.5 has changed the
LWLock implementation to use atomic operations to manipulate the state.
Thanks to Andres Freund (and according to me the credit goes to reviewers
(Robert Haas and myself) as well who have reviewed multiple versions
of this patch) author of this patch due to whom many PostgreSQL users will
be happy.

Now lets discuss about 1000 scale factor case,  in this case, we could

see the good performance improvement (~25%) even at 32 clients and it

went upto (~96%) at higher client count, in this case also where in 9.4
it was peaking at 32 client count, now it peaks at 64 client count and
the performance is better at all higher client counts.  The main work
which lead to this improvement is commit id 5d7962c6 (Change locking
regimen around buffer replacement) and  commit id  3acc10c9 (Increase
the number of buffer mapping partitions to 128).  In this case there were
mainly 2 bottlenecks (a) a BufFreeList LWLock was getting acquired to
find a free buffer for a page (to find free buffer, it needs to ex

[continue reading]

On Monday, Robert Haas committed a patch of mine that considerably speeds up the sorting of text in PostgreSQL. This was the last and the largest in a series of such patches, the patch that adds "abbreviated keys". PostreSQL 9.5 will have big improvements in sort performance.

In realistic cases, CREATE INDEX operations on text are over 3 times faster than in PostgreSQL 9.4. Not every such utility operation, or data warehousing query involving a big sort is sped up by that much, but many will be.

This was a piece of work that I spent a considerable amount of time on over the past few months. It's easy to justify that effort, though: sorting text is a very fundamental capability of any database system. Sorting is likely the dominant cost when creating B-Tree indexes, performing CLUSTER operations, and, most obviously, for sort nodes that are required by many plans that are executed in the service of queries with ORDER BY or DISTINCT clauses, or aggregates using the GroupAggregate strategy. Most of the utility statements that need to perform sorts must perform them with a very disruptive lock on the target relation (CREATE INDEX CONCURRENTLY is a notable exception), so quite apart from the expense of the sort, the duration of sorts often strongly influences how long a production system is seriously disrupted.

My interest in sorting is not new: I first worked on it in 2011. Early research on it back then prompted Robert Haas and Tom Lane to write the SortSupport infrastructure, which I've now extended here. Originally, the SortSupport infrastructure was all about providing alternative versions of comparators for use in sort routines, versions that avoided certain overhead otherwise inherent to calling functions that are generally accessible from SQL. As a highly extensible system, PostgreSQL requires that sort behavior be defined in terms of a default B-Tree operator class, which is itself defined in terms of SQL operators with underlying SQL-callable functions. These functions are written in C for built-in type

[continue reading]

Posted by robert berry on 2015-01-24 at 00:00:00

Regular Expression Stop Words for Postgresql

Jan 24, 2015 – Portland

While working on a NLP project with OCR-derived text, I discovered a need that wasn’t immediately met by the built in parsers and dictionaries which ship with Postgresql. Fortunately text search in Postgresql is fairly extensible.

This post looks at the process of implementing a custom dictionary which can be used to create stop words which are really stop regular expressions.

Background on Parsing, Dictionaries, and Stop Words

The first step in using most text search features is converting unstructured text into text search vectors. A text search vector is a set of (lexeme, position) pairs.

=# select to_tsvector('Time is an illusion.  Lunchtime doubly so.');
 'doubli':6 'illus':4 'lunchtim':5 'time':1
(1 row)

As you might expect, the internal representation of a ts_vector is simply information about where words are located inside a big string.

typedef struct
  int32   vl_len_;    /* varlena header (do not touch directly!) */
  int32   size; 
  WordEntry entries[1];   /* variable length */
  /* lexemes follow the entries[] array */
} TSVectorData;

The transformation from text to ts_vector involves parsing text into tokens, then filtering the tokens through a dictionary which may change or eliminate words.

A text search configuration can be used to map token categories to dictionaries. The official documentation contains additional details.

The Problem Statement

The text corpus is derived from an OCR process which results in some unhelpful tokens.

-43 44 40
Lunchtime is 46 #@$dfsdf an illusion.Q!~ Lunchtime ksdfkjsadjfksdjf so.

So how to get pleasant ts_vectors?

One solution would be adding a token type ocr gibberish to the parser and removing mappings from ocr_gibberish to any dictionary. While you can write your own parser, the default is pretty good, and it does not appear to be easily extended.

Another approach would be to use a dictionary as a white

[continue reading]

Posted by Binod Nirvan on 2015-01-23 at 21:36:32

MixERP PgDoc is a console-based application that creates beautiful PostgreSQL database documentation. The first release supported Windows and with the second release today, we now support OSX and Linux (tested on OSX Yosemite and Ubuntu 14.4). Please make sure you have Mono installed first before you download PgDoc here:

Running on OSX and Linux

Extract the downloaded archive Open terminal and type

mono /path/to/mixerp-pgdoc.exe <arguments>

Or simply

mono /users/nirvan/desktop/pg-doc/mixerp-pgdoc.exe -s=localhost -d=mixerp -u=postgres -p=secret -o=/users/nirvan/desktop/db-doc

I have created a category in MixERP Forums for further discussion.

PostgreSQL 9.4 introduces a new statistic in the catalogue, called pg_stat_archiver.
Thanks to the SQL language it is now possible, in an instant, to check the state of the archiving process of transactional logs (WALs), crucial component of a PostgreSQL disaster recovery system.

PostgreSQL and Barman

Introduction and reasons

The need for the pg_stat_archiver view comes from the last few years of experience with Barman as a solution for disaster recovery of PostgreSQL databases in business continuity environments.

In particular, some of the needs and recurring questions that DBAs, system administrators, CTOs and CIOs repeatedly – and legimitately – share with us, are:

  • how much disk space will I need?
  • how can I keep the whole backup process, including continuous archiving, under control?

The starting point can only be a business requirement and it is defined through the concept of retention policy.
Usually, a company defines a disaster recovery plan within a business continuity plan, where it is clearly defined the period of retention of backup data. In the same documents we find both the recovery point objective (RPO) and the recovery time objective (RTO) definitions, the two key metrics that respectively measure the amount of data that a business can afford to lose and the maximum allowed time to recover from a disaster.
Monitoring and studying the past behaviour of a database are important elements for correctly sizing the storage requirements of a PostgreSQL backup solution.

For example, a company may decide to retain data of a PostgreSQL database for a month, in order to reconstruct the state of the database in a consistent manner at any point in time from the first available backup to the last available WAL file (through the rock solid Point-in-Time-Recovery technology of PostgreSQL, introduced 10 years ago).

The required size is given not only by the number of periodic full backups (for example, one a week), but also the number of WAL files stored in the Barman archive, each containing all the transactions that have been pro

[continue reading]

In the spirit of open-source, we would like to share how we handled the talk selection process for PGConf US 2015. This post will discuss the entire process of how a talk ends up in one of our United States PostgreSQL Association conferences; our goal is to help you understand what our conference is looking for in talk proposals and help you decide what you submit for PGConf US 2016!

read more

One of the changes coming to you in PostGIS 2.2 are additional extensions. Two ones close to my heart are the address_standardizer (which was a separate project before, but folded into PostGIS in upcoming 2.2) and the SFCGAL extension for doing very advanced 3D stuff. We had a need to have address standardizer running on our Ubuntu box, but since PostGIS 2.2 isn't released yet, you can't get it without some compiling. Luckily the steps are fairly trivial if you are already running PostGIS 2.1. In this article, I'll walk thru just building an installing the address_standardizer extension from the PostGIS 2.2 code base. Though I'm doing this on Ubuntu, the instructions are pretty much the same on any Linux, just replacing with your Linux package manager.

Continue reading "Installing PostGIS packaged address_standardizer on Ubuntu"
Posted by gabrielle roth on 2015-01-23 at 01:53:00

When: Thu Jan 29, 6pm-8pm
Where: Renewable Funding, 400 SW 6th Ave

We’ll go over Foreign Data Wrappers, focusing on the Postgres FDW.

Our labs are casual, attendee-led affairs with no agenda set in advance. The goal is to share what we know & learn together.

Show up with a couple of Pg databases – they can be on the same cluster or different ones, doesn’t really matter. One of them should have at least one table with data in it.

We do need to know who’s coming so we can notify building security, so please sign up in advance here.

Grab dinner at the carts & come on over!

Posted by damien clochard in Dalibo on 2015-01-22 at 20:52:14

We’ve just released a new version of PoWA, the PostgreSQL Workload Analyzer. Check out complete release note here.

This new version adds an highly requested feature : you can now collect stats from multiple PostgreSQL servers using a single PoWA instance. This should be usefull if you have many servers to monitor. We’ve also improved the UI in many ways : the graph should be easier to read on full screen and the install process is more convienient

This version is probably the major release of the 1.x branch. We’re currently working on the branch that will change almost everything : PoWA 2.x will be only compatible with PostgreSQL 9.4 because we want to use the new stats and features of latest version. This should allow PoWA 2.x to some nice things like finding missing indexes or displaying advanced filesysteme stats. The 2.x version will also have a brand new user interface and we hope this will be similar to what we did when we rewrote completely the pgBadger interface 2 years ago…

Here’s a some preview:

PoWA 2.x screenshot

PoWA 2.x screenshot

PoWA 2.x screenshot

Of course we will continue to maintain PoWA 1.x for the users who installed it on PostgreSQL 9.3 servers

If you go to FOSDEM 2015 next week, come see us ! We will be presenting these new features and more !

If you can’t go to FOSDEM 2015, subscribe to the powa-users mailing list to receive news and updates.

Postgres provides much clear ERROR reporting messages compare to other databases/data-stores I have worked or been working at my $DAYJOB. However, someone reported following error on their Secondary Postgres replicated database server.

2015-01-22 15:20:49.247 GMT 56914 LOG: unexpected pageaddr 142D/73000000 in log file 5166, segment 75, offset 0
2015-01-22 15:20:49.413 GMT 42933 FATAL: could not connect to the primary server: ERROR: Unsupported startup parameter: replication

On the first site, it looks scary but it’s actually not !

I googled a bit but I couldn’t able to find quick answer. So, I investigated further to understand what does reported error mean?  Fortunately, the replication was up-to-date because most of our clients have been recommended or setup to use hybrid replication (WAL shipping using OmniPITR + Streaming Replication).

While discussing further with the team, it turned out they introduced pgbouncer, a connection pooling solution, in to their architecture to better manage connections on the db server. The pgbouncer is set up to run on port 5432 (default Postgres port) on the master database server and make Postgres to listen on 5433. This change wasn’t reflected on secondary replicated database server recovery.conf file and it was still pointing to port 5432. So, we found the cause of the error & the mystery is solved !!

recovery.conf file has been modified to use port 5433 and restarted secondary database to load the config changes. Yay !! Secondary database is connected to primary db server using streaming replication.

2015-01-22 15:41:09.357 GMT 49294 LOG: database system is ready to accept read only connections
2015-01-22 15:41:09.512 GMT 49296 LOG: restored log file "000000030000142E0000006A" from archive
2015-01-22 15:41:10.016 GMT 49296 LOG: unexpected pageaddr 142D/8E000000 in log file 5166, segment 107, offset 0
2015-01-22 15:41:10.182 GMT 49408 LOG: streaming replication successfully connected to primary

I wanted to share my experience that can be useful if you see this ERROR  in you

[continue reading]

I'm excited to have my talk "Choosing a Logical Replication System" accepted to PGConf.US! I'll be speaking on Friday, March 27th from 2:00 - 2:50, as part of the Strategy track.

In this talk I will cover a variety of existing Logical Replication systems for PostgreSQL and go over some of the differences between requirements, supported capabilities, and why you might choose one system over another. I'll also cover some of the changes in PostgreSQL 9.4.

Read about the talk here.

Posted by Dimitri Fontaine in 2ndQuadrant on 2015-01-22 at 00:48:00

Thanks to the Postgres Weekly issue #89 and a post to Hacker News front page (see Pgloader: A High-speed PostgreSQL Swiss Army Knife, Written in Lisp it well seems that I just had my first Slashdot effect...

Well actually you know what? I don't...

So please consider using the new mirror and maybe voting on Hacker News for either tooling around your favorite database system, PostgreSQL or your favorite programming language, Common Lisp...

It all happens at

Coming to FOSDEM?

If you want to know more about pgloader and are visiting FOSDEM PGDAY or plain FOSDEM I'll be there talking about Migrating to PostgreSQL, the new story (that's pgloader) and about some more reasons why You'd better have tested backups...

If you're not there on the Friday but still want to talk about pgloader, join us at the PostgreSQL devroom and booth!

On 15/01/2015 Pavel Stehule wrote about implementing a dead simple history table using the hstore type. On Friday evening I wanted to copy this almost line for line switching the hstore type for jsonb , but I counldn’t really see how to replicate the update part so simply without creating a delete operator. Once that […]

President Obama recently signed Executive Order: #13685 [1] , in short this Order states:

(a) The following are prohibited:
(i) new investment in the Crimea region of Ukraine by a United States person, wherever located;
(ii) the importation into the United States, directly or indirectly, of any goods, services, or technology from the Crimea region of Ukraine;
(iii) the exportation, reexportation, sale, or supply, directly or indirectly, from the United States, or by a United States person, wherever located, of any goods, services, or technology to the Crimea region of Ukraine;

read more

Posted by Binod Nirvan on 2015-01-19 at 13:45:34

We chose PostgreSQL database for our ERP software MixERP, which is going Beta 1 release very soon. Unlike other ERP solutions which support PostgreSQL, we do not use PostgreSQL just to merely only store the tables. In fact, we try and take the full advantage of PostgreSQL database power and capabilities. Just to remind you, MixERP is an open source ERP solution, which has been undergoing development since a year. Despite of being a very small team, we have been getting a substantial number of requests for our database design.

Since we needed a good documentation generator, I investigated the existing tools which could do that. I was not so pleased with the existing tools related to documentation generation. I thought I would give it a try myself.

For the task, I used and Npgsql, and of course PostgreSQL, as well. During the last few days, I was in PostgreSQL documentation site all the time:

The documentation site provided metadata information profoundly well. There was nothing that was not already there, except for the fact that I could not find a find a way to ask PostgreSQL to give me the definitions of Composite Types. Interestingly, pgAdmin3 could tell me exactly how a Type was defined. Upon investigating what pgAdmin was doing under the hood, I was able to create a function to extract that information. :)

Coming back to my project MixERPPgDoc, it is a small console application exposing a few arguments, and packs all necessary dependencies inside itself. Since it is a self-contained executable file, one should be able to quickly use it and play around with. To give you an idea, MixERPPgDoc creates HTML documentation on the fly. I used a new and sexy CSS component library called SemanticUI for design, and for code highlighting task, prism.js.


mixerp-pgdoc.exe -s=[server] -d=[database] -u=[pg_user] -p=[pwd] -o=[output_dir]


mixerp-pgdoc.exe -s=localhost -d=mixerp -u=postgres -p=secret -o="c:\mixerp-doc"

Example Documentation Site (

[continue reading]

Many improvements are being done in the ODBC driver for PostgreSQL these days, one being for example the addition of more and more integrated regression tests insuring the code quality. One feature particularly interested has been committed these days and consists of the following commit:

commit: e85fbb24249ae81975b1b2e46da32479de0b58d6
author: Heikki Linnakangas <>
date: Wed, 31 Dec 2014 14:49:20 +0200
Use libpq for everything.

Instead of speaking the frontend/backend protocol directly, use libpq's
functions for executing queries. This makes it libpq a hard dependency, but
removes direct dependencies to SSL and SSPI, and a lot of related code.

This feature can be defined in one single word: simplification. Before discussing about it, see for example the cleanup that this has done in the driver code in terms of numbers:

$ git log -n1 e85fbb2 --format=format: --shortstat
53 files changed, 1720 insertions(+), 8173 deletions(-)

Note the total size of the source code after this commit, which has been reduced by a bit more than 10% in total, which is huge!

# All code
$ git ls-files "*.[c|h]" | xargs wc -l | tail -n1
55998 total
# Regression tests
$ cd test/ && git ls-files "*.[c|h]" | xargs wc -l | tail -n1
5910 total

Now, let's consider the advantages that this new feature has.

First of all, libpq is an in-core library of PostgreSQL managing communication with the backend server that is well-maintained by the core developers of Postgres. Before doing the all-libpq move in Postgres ODBC it was a soft dependency: the driver being usable as well through SSPI, SSL or even nothing thanks to the additional code it carried for managing directly the backend/frontend communication protocol, while with libpq there are APIs directly usable for this purpose. So a large portion of the simplification is related to that (and also to some code used to manage communication socket and SSPI).

Hence, this move is an excellent thing particularly for the Windows installer of Postgres ODBC because unti

[continue reading]

PostgreSQL DO statement doesn't support parametrization. But with psql variables we are able to "inject" do statement safely and we can do it:
bash-4.1$ cat 

echo "
set myvars.msgcount TO :'msgcount';
DO \$\$
FOR i IN 1..current_setting('myvars.msgcount')::int LOOP
END \$\$" | psql postgres -v msgcount=$1

bash-4.1$ ./ 3
Time: 0.386 ms
Time: 1.849 ms
PostgreSQL 9.4 intorduced the jsonb type, but it’d be nice to be able to delete keys and pairs using the “-” operator; just like you can with the hstore type. Fortunately postgres makes creating an operator really easy for us, so lets have a go at creating a delete operator for jsonb. First lets try […]
Posted by Dimitri Fontaine in 2ndQuadrant on 2015-01-16 at 08:35:00

PostgreSQL comes with an awesome bulk copy protocol and tooling best known as the COPY and \copy commands. Being a transactional system, PostgreSQL COPY implementation will ROLLBACK any work done if a single error is found in the data set you're importing. That's the reason why pgloader got started: it provides with error handling for the COPY protocol.

That's basically what pgloader used to be all about

As soon as we have the capability to load data from unreliable sources, another use case appears on the horizon, and soon enough pgloader grew the capacity to load data from other databases, some having a more liberal notion of what is sane data type input.

To be able to adapt to advanced use cases in database data migration support, pgloader has grown an advanced command language wherein you can define your own load-time data projection and transformations, and your own type casting rules too.

New in version 3.2 is that in simple cases, you don't need that command file any more. Check out the pgloader quick start page to see some examples where you can use pgloader all from your command line!

Here's one such example, migrating a whole MySQL database data set over to PostgreSQL, including automated schema discovery, automated type casting and on-the-fly data cleanup (think about zero dates or booleans in tinyint(1) disguise), support for indexes, primary keys, foreign keys and comments. It's as simple as:

$ createdb sakila
$ pgloader mysql://root@localhost/sakila pgsql:///sakila
2015-01-16T09:49:36.068000+01:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2015-01-16T09:49:36.074000+01:00 LOG Data errors in '/private/tmp/pgloader/'
                    table name       read   imported     errors            time
------------------------------  ---------  ---------  ---------  --------------
               fetch meta data         43         43          0          0.222s
                  create, drop          0         36          0          0.130s
------------------------------  ---------  ---------  ---

[continue reading]

Posted by David Kerr on 2015-01-15 at 21:42:00
Two simple gems to help tune your Postgres databases.

I'm a big fan of PgTune. I think that in many cases you can run PgTune and set-it-and-forget-it for your Postgres parameters. I like it so much that I often wish I had access to it in my code - especially when working with Puppet to provision new databases servers.

When I started looking into RDS Postgres a while back I realized that the default configuration for those instances was lacking and I really wished I could run PgTune on the RDS instances.

It was to solve those problems above that these two projects formed.

  • RDSTune will create a MrTuner-ized RDS Parameter Group 
  • MrTuner is a Ruby gem that follows in the sprit of PgTune if not directly in it's footsteps.

Both will run from the command line but, more importantly, they can be `required` by your ruby projects to allow you to access these values programmatically. 

Both Gems are available on rubygems and source, examples, configuration and docks available at their respective bitbucket pages.

RDSTune -
MrTuner -

Feedback and Pull requests very welcome!
Postgres has this nice extension (hstore) lot of years. It can be used for simulation some features of doc databases - or can be used for implementation of generic triggers for history table:

I have a table test and table history:

CREATE TABLE test(a int, b int, c int);

event_time timestamp(2),
executed_by text,
origin_value hstore,
new_value hstore

INSERT INTO history(event_time, executed_by, new_value)
$$ LANGUAGE plpgsql;

INSERT INTO history(event_time, executed_by, origin_value)
$$ LANGUAGE plpgsql;

hs_new hstore := hstore(NEW);
hs_old hstore := hstore(OLD);
INSERT INTO history(event_time, executed_by, origin_value, new_value)
VALUES(CURRENT_TIMESTAMP, SESSION_USER, hs_old - hs_new, hs_new - hs_old);
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_history_insert AFTER INSERT ON test

CREATE TRIGGER test_history_delete AFTER DELETE ON test

CREATE TRIGGER test_history_update AFTER UPDATE ON test
INSERT INTO test VALUES(1000, 1001, 1002);
UPDATE test SET a = 10, c = 20;

postgres=# SELECT * FROM history ;
event_time | executed_by | origin_value | new_value
2015-01-15 20:59:05.52 | pavel | | "a"=>"1000", "b"=>"1001", "c"=>"1002"
2015-01-15 20:59:05.6 | pavel | "a"=>"1000", "c"=>"1002" | "a"=>"10", "c"=>"20"

[continue reading]

Postgres provides various parameters in postgresql.conf  for performance tuning related to help better use database server resources. At OmniTI, One of our client’s application was reported slowness during high intensive but very critical daily operations. The Postgres DB server is servicing mail generation application with 3-4K TPS and containing more than 400K partitioned tables.  The systems, including DB servers, are getting monitored through Circonus, so it was easier to review graph patterns to find out anything changed recently.

While investigating the graph patterns, we came across significantly high IO increase  (~10-11K per Sec) on Circonus IOSTAT graph around end of Nov. This pattern change matches with the Software upgrade. The software was upgraded by end of November !! As you can see below in the graph, the IO increased up to 10-11K per seconds after upgrading software since last week of Nov,2014.


It was easier to dig further because daily pgbadger log analysis reports were ready for review! Since Software upgrade, the most accessible stored function was generating ~8TB of of temp files per day!! Digging further into report, the average size of generated temp files were around ~100MB size.  As you might be aware, hash tables and sort operations should happen in memory , which depends on work_mem setting,  if the size of the operation is larger than work_mem , then it will end up happening on disks. The work_mem was set to 50MB. It’s clear that the last software upgrade introduced a change in function and underlying query. This change was the root cause of spike in disk IO activities observed in the IOSTAT graph.

The first attempt to tune the query so it avoids disk sorting but there are two challenges to tune the query:

(1) It wasn’t easy to change the application code because it has to go through application change process, which might take 1-2 months.

(2) the query is accessing pg catalog tables to gather details. The pg catalog tables are pretty large because of large number of partitioned tabl

[continue reading]