PostgreSQL
The world's most advanced open source database
Posted by Guillaume LELARGE in Dalibo on 2015-01-25 at 21:28:00

The simplest tools are usually the best.

One of the tools I usually need when I go see customers is vmstat. Nothing beats vmstat for a real overview of what the server is actually doing. It reports system metrics, such as CPU and disk usage, which is quite useful for finding where a bottleneck comes from.

I wish I had a PostgreSQL tool like that. I wished enough to eventually build it. I call it pgstat because I couldn't find a better name for it.

It's a command-line tool that connects to a database and grabs its activity statistics. As PostgreSQL has many statistics, there is a command switch (-s) to choose the ones you want:

  • archiver for pg_stat_archiver
  • bgwriter for pg_stat_bgwriter
  • connection for connections by type
  • database for pg_stat_database
  • table for pg_stat_all_tables
  • tableio for pg_statio_all_tables
  • index for pg_stat_all_indexes
  • function for pg_stat_user_functions
  • statement for pg_stat_statements
  • pbpools for pgBouncer pools statistics
  • pbstats for pgBouncer general statistics

It looks a lot like vmstat. You tell it which statistics you want, and how frequently to gather them. Just like this:

$ pgstat -s connection
 - total - active - lockwaiting - idle in transaction - idle -
    1546       15             0                     0   1531  
    1544       17             0                     0   1527  
    1544       14             0                     0   1530  
    1546       26             0                     0   1520  
    1543       21             0                     0   1522 

Yeah, way too many idle connections. Actually, way too many connections. Definitely needs a pooler there.
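For comparison, this is roughly the kind of query such a tool has to run under the hood. A sketch against pg_stat_activity (9.4's FILTER syntax; my approximation, not pgstat's actual query):

-- one connection-count sample, similar to a pgstat -s connection line
SELECT count(*)                                              AS total,
       count(*) FILTER (WHERE state = 'active')              AS active,
       count(*) FILTER (WHERE waiting)                       AS lockwaiting,
       count(*) FILTER (WHERE state = 'idle in transaction') AS idle_in_transaction,
       count(*) FILTER (WHERE state = 'idle')                AS idle
FROM pg_stat_activity;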

This is what happens during a 10-second, 10-client pgbench test:

$ pgstat -s database 1
- backends - ------ xacts ------ -------------- blocks -------------- -------------- tuples -------------- ------ temp ------ ------- misc --------
                commit rollback     read    hit read_time write_time      ret    fet    ins    upd    del    files     bytes   conflicts deadlocks
         1

[continue reading]

Posted by Daniel Pocock on 2015-01-24 at 23:07:10

I've just whipped up a Python script that renders Github issue lists from your favourite projects as an iCalendar feed.

The project is called github-icalendar. It uses Python Flask to expose the iCalendar feed over HTTP.

It is really easy to get up and running. All the dependencies are available on a modern Linux distribution, for example:

$ sudo apt-get install python-yaml python-icalendar python-flask python-pygithub

Just create an API token in Github and put it into a configuration file with a list of your repositories like this:

api_token: 6b36b3d7579d06c9f8e88bc6fb33864e4765e5fac4a3c2fd1bc33aad
bind_address: ::0
bind_port: 5000
repositories:
- repository: your-user-name/your-project
- repository: your-user-name/another-project

Run it from the shell:

$ ./github_icalendar/main.py github-ics.cfg

and connect to it with your favourite iCalendar client.

Consolidating issue lists from Bugzilla, Github, Debian BTS and other sources

A single iCalendar client can usually support multiple sources and thereby consolidate lists of issues from multiple bug trackers.

This can be much more powerful than combining RSS bug feeds because iCalendar has built-in support for concepts such as priority and deadline. The client can use these to help you identify the most critical issues across all your projects, no matter which bug tracker they use.

Bugzilla bugtrackers already expose iCalendar feeds directly; just look for the iCalendar link at the bottom of any search results page. Here is an example URL from the Mozilla instance of Bugzilla.

The Ultimate Debian Database consolidates information from the Debian and Ubuntu universe and can already export it as an RSS feed; there is discussion about extending that to an iCalendar feed too.

Further possibilities

  • Prioritizing the issues in Github and mapping these priorities to iCalendar priorities
  • Creating tags in Github that allow issues to be ignored/excluded from the feed (e.g. excluding wishlist items)
  • Creating summary entries instead of listing all the issues, e.g.

[continue reading]

One of the foreign data wrappers I included in the PostgreSQL 9.3 Windows FDW bag and PostgreSQL 9.4 Windows FDW bag is the www_fdw extension used for querying web services. Someone pointed out that since I didn't build curl with SSL support, they were unable to use it for https connections. The main reason I didn't is that the EDB installs come with ssleay32.dll and libeay32.dll (even the 64-bit ones), which are dependencies of curl when built with SSL support. I wanted to minimize the issue of distributing dlls that are already packaged with the Windows PostgreSQL installers.

Though this article is specific to using www_fdw on Windows systems, many of the issues are equally applicable to other platforms, so may be worth a read if you are running into similar issues with using specialty SSL certificates on Linux/Unix/Mac.


Continue reading "Using SSL https connections with www_fdw on windows"
Posted by Amit Kapila on 2015-01-24 at 10:36:00

In PostgreSQL 9.5, we will see a boost in scalability for read workloads
when the data can fit in RAM.  I ran a pgbench read-only load to
compare the performance difference between 9.4 and HEAD (62f5e447)
on IBM POWER-8 having 24 cores, 192 hardware threads and 492GB RAM,
and here is the performance data:

[Graph: pgbench read-only TPS on 9.4 vs HEAD, scale factors 300 and 1000, at increasing client counts]

The data is mainly taken for two kinds of workloads: when all the data fits in shared buffers (scale_factor = 300), and when all the data can't fit in shared buffers but can fit in RAM (scale_factor = 1000).
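For reference, a sketch of the kind of pgbench run being described (the exact options used in these tests may have differed):

$ pgbench -i -s 300 bench                          # initialize, scale factor 300
$ pgbench -S -M prepared -c 64 -j 64 -T 300 bench  # read-only run, 64 clients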

First, let's talk about the 300 scale factor case: in 9.4 it peaks at 32 clients,
now it peaks at 64 clients, and we can see a performance improvement of
up to ~98%; it is better in all cases at higher client counts, starting from
32 clients.  The main work which led to this improvement is
commit ab5194e6 (Improve LWLock scalability).  The previous implementation
had a bottleneck around the spinlocks that were acquired for LWLock
acquisition and release, and the implementation for 9.5 changes LWLocks
to use atomic operations to manipulate the state.
Thanks to Andres Freund, the author of this patch (and, in my view, credit
goes to the reviewers as well, Robert Haas and myself, who reviewed
multiple versions of it), many PostgreSQL users will be happy.

Now let's discuss the 1000 scale factor case: here we see a good
performance improvement (~25%) even at 32 clients, going up to ~96%
at higher client counts. In this case as well, where 9.4 peaked at 32 clients,
it now peaks at 64 clients, and performance is better at all higher client
counts.  The main work which led to this improvement is commit 5d7962c6
(Change locking regimen around buffer replacement) and commit 3acc10c9
(Increase the number of buffer mapping partitions to 128).  In this case there
were mainly 2 bottlenecks: (a) a BufFreeList LWLock was getting acquired to
find a free buffer for a page (to find free buffer, it needs to ex

[continue reading]

On Monday, Robert Haas committed a patch of mine that considerably speeds up the sorting of text in PostgreSQL. This was the last and the largest in a series of such patches, the patch that adds "abbreviated keys". PostgreSQL 9.5 will have big improvements in sort performance.

In realistic cases, CREATE INDEX operations on text are over 3 times faster than in PostgreSQL 9.4. Not every such utility operation, or data warehousing query involving a big sort is sped up by that much, but many will be.
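To give a feel for the kind of operation that benefits, here is a toy workload sketch (the ~3x figure above comes from the author's tests, not from this example):

-- a trivial text-sorting workload of the kind abbreviated keys speed up
CREATE TABLE words AS
  SELECT md5(random()::text) AS word
  FROM generate_series(1, 1000000);

CREATE INDEX ON words (word);           -- dominated by sorting text keys
SELECT word FROM words ORDER BY word;   -- an explicit sort node over text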

This was a piece of work that I spent a considerable amount of time on over the past few months. It's easy to justify that effort, though: sorting text is a very fundamental capability of any database system. Sorting is likely the dominant cost when creating B-Tree indexes, performing CLUSTER operations, and, most obviously, for sort nodes that are required by many plans that are executed in the service of queries with ORDER BY or DISTINCT clauses, or aggregates using the GroupAggregate strategy. Most of the utility statements that need to perform sorts must perform them with a very disruptive lock on the target relation (CREATE INDEX CONCURRENTLY is a notable exception), so quite apart from the expense of the sort, the duration of sorts often strongly influences how long a production system is seriously disrupted.


My interest in sorting is not new: I first worked on it in 2011. Early research on it back then prompted Robert Haas and Tom Lane to write the SortSupport infrastructure, which I've now extended here. Originally, the SortSupport infrastructure was all about providing alternative versions of comparators for use in sort routines, versions that avoided certain overhead otherwise inherent to calling functions that are generally accessible from SQL. As a highly extensible system, PostgreSQL requires that sort behavior be defined in terms of a default B-Tree operator class, which is itself defined in terms of SQL operators with underlying SQL-callable functions. These functions are written in C for built-in type

[continue reading]

Posted by robert berry on 2015-01-24 at 00:00:00

Regular Expression Stop Words for Postgresql

Jan 24, 2015 – Portland

While working on a NLP project with OCR-derived text, I discovered a need that wasn't immediately met by the built-in parsers and dictionaries which ship with Postgresql. Fortunately text search in Postgresql is fairly extensible.

This post looks at the process of implementing a custom dictionary which can be used to create stop words which are really stop regular expressions.

Background on Parsing, Dictionaries, and Stop Words

The first step in using most text search features is converting unstructured text into text search vectors. A text search vector is a set of (lexeme, position) pairs.

=# select to_tsvector('Time is an illusion.  Lunchtime doubly so.');
                to_tsvector
--------------------------------------------
 'doubli':6 'illus':4 'lunchtim':5 'time':1
(1 row)

As you might expect, the internal representation of a ts_vector is simply information about where words are located inside a big string.

typedef struct
{
  int32   vl_len_;    /* varlena header (do not touch directly!) */
  int32   size; 
  WordEntry entries[1];   /* variable length */
  /* lexemes follow the entries[] array */
} TSVectorData;

The transformation from text to ts_vector involves parsing text into tokens, then filtering the tokens through a dictionary which may change or eliminate words.

A text search configuration can be used to map token categories to dictionaries. The official documentation contains additional details.

The Problem Statement

The text corpus is derived from an OCR process which results in some unhelpful tokens.

a-433
-43 44 40
Lunchtime is 46 #@$dfsdf an illusion.Q!~ Lunchtime ksdfkjsadjfksdjf so.
431,-

So how to get pleasant ts_vectors?

One solution would be adding a token type ocr_gibberish to the parser and removing mappings from ocr_gibberish to any dictionary. While you can write your own parser, the default is pretty good, and it does not appear to be easily extended.
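The mapping half of that idea does work with the stock parser, though: tokens whose type is mapped to no dictionary are simply dropped. A sketch using built-in numeric token types rather than a true ocr_gibberish type:

-- clone the english configuration, then unmap some built-in token
-- types so those tokens never make it into the tsvector
CREATE TEXT SEARCH CONFIGURATION ocr ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION ocr
    DROP MAPPING FOR int, uint, numword, numhword, hword_numpart;

SELECT to_tsvector('ocr', 'Lunchtime is 46 an illusion');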

Another approach would be to use a dictionary as a white

[continue reading]

Posted by Binod Nirvan on 2015-01-23 at 21:36:32

MixERP PgDoc is a console-based application that creates beautiful PostgreSQL database documentation. The first release supported Windows, and with the second release today we now support OSX and Linux (tested on OSX Yosemite and Ubuntu 14.04). Please make sure you have Mono installed before you download PgDoc here:

http://mixerp.org/erp/mixerp-pgdoc-mono.zip

Running on OSX and Linux

Extract the downloaded archive mixerp-pgdoc-mono.zip. Open terminal and type

mono /path/to/mixerp-pgdoc.exe <arguments>

Or simply

mono /users/nirvan/desktop/pg-doc/mixerp-pgdoc.exe -s=localhost -d=mixerp -u=postgres -p=secret -o=/users/nirvan/desktop/db-doc

I have created a category in MixERP Forums for further discussion.

PostgreSQL 9.4 introduces a new statistics view in the catalogue, called pg_stat_archiver.
Thanks to the SQL language it is now possible, in an instant, to check the state of the archiving process of transaction logs (WALs), a crucial component of a PostgreSQL disaster recovery system.
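A quick look at what the view exposes (column names as in the 9.4 catalogue):

SELECT archived_count,      -- WAL segments successfully archived
       last_archived_wal,   -- name of the last archived segment
       last_archived_time,
       failed_count,        -- failed archiving attempts
       last_failed_wal,
       last_failed_time,
       stats_reset
  FROM pg_stat_archiver;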

PostgreSQL and Barman

Introduction and reasons

The need for the pg_stat_archiver view comes from the last few years of experience with Barman as a solution for disaster recovery of PostgreSQL databases in business continuity environments.

In particular, some of the needs and recurring questions that DBAs, system administrators, CTOs and CIOs repeatedly – and legitimately – share with us are:

  • how much disk space will I need?
  • how can I keep the whole backup process, including continuous archiving, under control?

The starting point can only be a business requirement, and it is defined through the concept of retention policy.
Usually, a company defines a disaster recovery plan within a business continuity plan, where the retention period of backup data is clearly defined. In the same documents we find both the recovery point objective (RPO) and the recovery time objective (RTO) definitions, the two key metrics that respectively measure the amount of data that a business can afford to lose and the maximum time allowed to recover from a disaster.
Monitoring and studying the past behaviour of a database are important elements for correctly sizing the storage requirements of a PostgreSQL backup solution.

For example, a company may decide to retain data of a PostgreSQL database for a month, in order to reconstruct the state of the database in a consistent manner at any point in time from the first available backup to the last available WAL file (through the rock solid Point-in-Time-Recovery technology of PostgreSQL, introduced 10 years ago).

The required size is given not only by the number of periodic full backups (for example, one a week), but also by the number of WAL files stored in the Barman archive, each containing all the transactions that have been pro

[continue reading]

In the spirit of open-source, we would like to share how we handled the talk selection process for PGConf US 2015. This post will discuss the entire process of how a talk ends up in one of our United States PostgreSQL Association conferences; our goal is to help you understand what our conference is looking for in talk proposals and help you decide what you submit for PGConf US 2016!

read more

One of the changes coming to you in PostGIS 2.2 is additional extensions. Two that are close to my heart are the address_standardizer (which was a separate project before, but is folded into PostGIS in the upcoming 2.2) and the SFCGAL extension for doing very advanced 3D stuff. We had a need to have the address standardizer running on our Ubuntu box, but since PostGIS 2.2 isn't released yet, you can't get it without some compiling. Luckily the steps are fairly trivial if you are already running PostGIS 2.1. In this article, I'll walk thru just building and installing the address_standardizer extension from the PostGIS 2.2 code base. Though I'm doing this on Ubuntu, the instructions are pretty much the same on any Linux, just substituting your Linux package manager.


Continue reading "Installing PostGIS packaged address_standardizer on Ubuntu"
Posted by gabrielle roth on 2015-01-23 at 01:53:00

When: Thu Jan 29, 6pm-8pm
Where: Renewable Funding, 400 SW 6th Ave

We’ll go over Foreign Data Wrappers, focusing on the Postgres FDW.

Our labs are casual, attendee-led affairs with no agenda set in advance. The goal is to share what we know & learn together.

Show up with a couple of Pg databases – they can be on the same cluster or different ones, doesn’t really matter. One of them should have at least one table with data in it.

We do need to know who’s coming so we can notify building security, so please sign up in advance here.

Grab dinner at the carts & come on over!


Posted by damien clochard in Dalibo on 2015-01-22 at 20:52:14

We’ve just released a new version of PoWA, the PostgreSQL Workload Analyzer. Check out the complete release notes here.

This new version adds a highly requested feature: you can now collect stats from multiple PostgreSQL servers using a single PoWA instance. This should be useful if you have many servers to monitor. We’ve also improved the UI in many ways: the graphs should be easier to read in full screen, and the install process is more convenient.

This version is probably the last major release of the 1.x branch. We’re currently working on the branch that will change almost everything: PoWA 2.x will only be compatible with PostgreSQL 9.4, because we want to use the new stats and features of the latest version. This should allow PoWA 2.x to do some nice things like finding missing indexes or displaying advanced filesystem stats. The 2.x version will also have a brand new user interface, and we hope this will be similar to what we did when we completely rewrote the pgBadger interface 2 years ago…

Here’s a preview:

[PoWA 2.x screenshots]

Of course we will continue to maintain PoWA 1.x for the users who installed it on PostgreSQL 9.3 servers.

If you go to FOSDEM 2015 next week, come see us! We will be presenting these new features and more!

If you can’t go to FOSDEM 2015, subscribe to the powa-users mailing list to receive news and updates.

Postgres provides much clearer ERROR messages compared to other databases/data-stores I have worked with at my $DAYJOB. However, someone reported the following error on their secondary Postgres replicated database server.

2015-01-22 15:20:49.247 GMT 56914 LOG: unexpected pageaddr 142D/73000000 in log file 5166, segment 75, offset 0
2015-01-22 15:20:49.413 GMT 42933 FATAL: could not connect to the primary server: ERROR: Unsupported startup parameter: replication

At first sight it looks scary, but it's actually not!

I googled a bit but couldn't find a quick answer, so I investigated further to understand what the reported error means.  Fortunately, the replication was up-to-date, because most of our clients have been recommended or set up to use hybrid replication (WAL shipping using OmniPITR + streaming replication).

While discussing further with the team, it turned out they had introduced pgbouncer, a connection pooling solution, into their architecture to better manage connections on the db server. pgbouncer was set up to run on port 5432 (the default Postgres port) on the master database server, and Postgres was made to listen on 5433. This change wasn't reflected in the secondary replicated database server's recovery.conf file, which was still pointing to port 5432. So, we found the cause of the error and the mystery was solved!!

The recovery.conf file was modified to use port 5433 and the secondary database was restarted to load the config change. Yay!! The secondary database is connected to the primary db server using streaming replication.
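In recovery.conf terms, the fix boils down to one line (a sketch; host and user are placeholders, not the client's actual values):

# recovery.conf on the standby
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5433 user=replicator'  # was port=5432
restore_command = '...'  # unchanged: the WAL-shipping half of the hybrid setup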

2015-01-22 15:41:09.357 GMT 49294 LOG: database system is ready to accept read only connections
2015-01-22 15:41:09.512 GMT 49296 LOG: restored log file "000000030000142E0000006A" from archive
2015-01-22 15:41:10.016 GMT 49296 LOG: unexpected pageaddr 142D/8E000000 in log file 5166, segment 107, offset 0
2015-01-22 15:41:10.182 GMT 49408 LOG: streaming replication successfully connected to primary

I wanted to share my experience that can be useful if you see this ERROR  in you

[continue reading]

I'm excited to have my talk "Choosing a Logical Replication System" accepted to PGConf.US! I'll be speaking on Friday, March 27th from 2:00 - 2:50, as part of the Strategy track.

In this talk I will cover a variety of existing Logical Replication systems for PostgreSQL and go over some of the differences between requirements, supported capabilities, and why you might choose one system over another. I'll also cover some of the changes in PostgreSQL 9.4.

Read about the talk here.

Posted by Dimitri Fontaine in 2ndQuadrant on 2015-01-22 at 00:48:00

Thanks to the Postgres Weekly issue #89 and a post on the Hacker News front page (see Pgloader: A High-speed PostgreSQL Swiss Army Knife, Written in Lisp), it seems that I just had my first Slashdot effect...

Well actually you know what? I don't...

So please consider using the new mirror http://dimitri.github.io/pgloader/ and maybe voting on Hacker News for either tooling around your favorite database system, PostgreSQL or your favorite programming language, Common Lisp...

It all happens at https://news.ycombinator.com/item?id=8924270.

Coming to FOSDEM?

If you want to know more about pgloader and are visiting FOSDEM PGDAY or plain FOSDEM, I'll be there talking about Migrating to PostgreSQL, the new story (that's pgloader) and about some more reasons why You'd better have tested backups...

If you're not there on the Friday but still want to talk about pgloader, join us at the PostgreSQL devroom and booth!

On 15/01/2015 Pavel Stehule wrote about implementing a dead simple history table using the hstore type. On Friday evening I wanted to copy this almost line for line, switching the hstore type for jsonb, but I couldn't really see how to replicate the update part so simply without creating a delete operator. Once that […]

President Obama recently signed Executive Order #13685 [1]; in short, this Order states:

(a) The following are prohibited:
(i) new investment in the Crimea region of Ukraine by a United States person, wherever located;
(ii) the importation into the United States, directly or indirectly, of any goods, services, or technology from the Crimea region of Ukraine;
(iii) the exportation, reexportation, sale, or supply, directly or indirectly, from the United States, or by a United States person, wherever located, of any goods, services, or technology to the Crimea region of Ukraine;
and

read more

Posted by Binod Nirvan on 2015-01-19 at 13:45:34

We chose the PostgreSQL database for our ERP software MixERP, which is going to Beta 1 release very soon. Unlike other ERP solutions that support PostgreSQL, we do not use PostgreSQL merely to store the tables; in fact, we try to take full advantage of PostgreSQL's power and capabilities. Just to remind you, MixERP is an open source ERP solution that has been under development for a year. Despite being a very small team, we have been getting a substantial number of requests for our database design.

Since we needed a good documentation generator, I investigated the existing tools that could do the job. I was not so pleased with the existing documentation generation tools, so I thought I would give it a try myself.

For the task, I used C#.NET and Npgsql, and of course PostgreSQL as well. During the last few days, I was on the PostgreSQL documentation site all the time:

http://www.postgresql.org/docs/9.4/static/catalogs.html

The documentation site presented the metadata information profoundly well. There was nothing that was not already there, except for the fact that I could not find a way to ask PostgreSQL to give me the definitions of composite types. Interestingly, pgAdmin3 could tell me exactly how a type was defined. Upon investigating what pgAdmin was doing under the hood, I was able to create a function to extract that information. :)
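For the curious, the attributes of a composite type can be read back with a catalog query along these lines (my sketch, not necessarily what pgAdmin3 runs under the hood):

-- list the attributes of standalone composite types
SELECT n.nspname AS schema,
       t.typname AS type_name,
       a.attname AS attribute,
       format_type(a.atttypid, a.atttypmod) AS data_type
  FROM pg_type t
  JOIN pg_namespace n ON n.oid = t.typnamespace
  JOIN pg_class c     ON c.oid = t.typrelid AND c.relkind = 'c'
  JOIN pg_attribute a ON a.attrelid = c.oid
                     AND a.attnum > 0 AND NOT a.attisdropped
 WHERE t.typtype = 'c'
 ORDER BY n.nspname, t.typname, a.attnum;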

Coming back to my project MixERPPgDoc: it is a small console application exposing a few arguments, and it packs all necessary dependencies inside itself. Since it is a self-contained executable file, one should be able to quickly use it and play around with it. To give you an idea, MixERPPgDoc creates HTML documentation on the fly. I used a new and sexy CSS component library called SemanticUI for the design, and for the code highlighting task, prism.js.

Syntax

mixerp-pgdoc.exe -s=[server] -d=[database] -u=[pg_user] -p=[pwd] -o=[output_dir]

Example

mixerp-pgdoc.exe -s=localhost -d=mixerp -u=postgres -p=secret -o="c:\mixerp-doc"

Example Documentation Site (

[continue reading]

Many improvements are being made to the ODBC driver for PostgreSQL these days, one being for example the addition of more and more integrated regression tests to ensure code quality. One particularly interesting feature has been committed recently, consisting of the following commit:

commit: e85fbb24249ae81975b1b2e46da32479de0b58d6
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Wed, 31 Dec 2014 14:49:20 +0200
Use libpq for everything.

Instead of speaking the frontend/backend protocol directly, use libpq's
functions for executing queries. This makes it libpq a hard dependency, but
removes direct dependencies to SSL and SSPI, and a lot of related code.

This feature can be defined in one single word: simplification. Before discussing it, see for example the cleanup this has done in the driver code in terms of numbers:

$ git log -n1 e85fbb2 --format=format: --shortstat
53 files changed, 1720 insertions(+), 8173 deletions(-)

Note the total size of the source code after this commit; it has been reduced by a bit more than 10% in total, which is huge!

# All code
$ git ls-files "*.[c|h]" | xargs wc -l | tail -n1
55998 total
# Regression tests
$ cd test/ && git ls-files "*.[c|h]" | xargs wc -l | tail -n1
5910 total

Now, let's consider the advantages that this new feature has.

First of all, libpq is an in-core library of PostgreSQL managing communication with the backend server, and it is well-maintained by the core developers of Postgres. Before the all-libpq move, it was a soft dependency of Postgres ODBC: the driver was usable as well through SSPI, SSL, or even nothing, thanks to the additional code it carried for managing the backend/frontend communication protocol directly, while libpq has APIs directly usable for this purpose. So a large portion of the simplification is related to that (and also to some code used to manage communication sockets and SSPI).

Hence, this move is an excellent thing particularly for the Windows installer of Postgres ODBC because unti

[continue reading]

The PostgreSQL DO statement doesn't support parametrization, but with psql variables we are able to "inject" values into a DO statement safely:
bash-4.1$ cat test.sh 
#!/bin/bash

echo "
set myvars.msgcount TO :'msgcount';
DO \$\$
BEGIN
FOR i IN 1..current_setting('myvars.msgcount')::int LOOP
RAISE NOTICE 'Hello';
END LOOP;
END \$\$" | psql postgres -v msgcount=$1

Usage:
bash-4.1$ ./test.sh 3
SET
Time: 0.386 ms
NOTICE: Hello
NOTICE: Hello
NOTICE: Hello
DO
Time: 1.849 ms
PostgreSQL 9.4 introduced the jsonb type, but it'd be nice to be able to delete keys and pairs using the "-" operator, just like you can with the hstore type. Fortunately postgres makes creating an operator really easy for us, so let's have a go at creating a delete operator for jsonb. First let's try […]
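The excerpt cuts off before the code, but for flavour, a minimal 9.4-compatible sketch of such an operator could look like this (not necessarily the post's version; 9.5 later added a built-in jsonb - text operator):

-- delete a top-level key from a jsonb value
CREATE FUNCTION jsonb_delete_key(jsonb, text)
RETURNS jsonb AS $$
  SELECT COALESCE(
           (SELECT json_object_agg(key, value)
              FROM jsonb_each($1)
             WHERE key <> $2)::jsonb,
           '{}'::jsonb)
$$ LANGUAGE sql IMMUTABLE;

CREATE OPERATOR - (LEFTARG = jsonb, RIGHTARG = text,
                   PROCEDURE = jsonb_delete_key);

SELECT '{"a": 1, "b": 2}'::jsonb - 'a';   -- {"b": 2}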
Posted by Dimitri Fontaine in 2ndQuadrant on 2015-01-16 at 08:35:00

PostgreSQL comes with an awesome bulk copy protocol and tooling best known as the COPY and \copy commands. Being a transactional system, PostgreSQL's COPY implementation will ROLLBACK any work done if a single error is found in the data set you're importing. That's the reason why pgloader got started: it provides error handling for the COPY protocol.

That's basically what pgloader used to be all about

As soon as we have the capability to load data from unreliable sources, another use case appears on the horizon, and soon enough pgloader grew the capacity to load data from other databases, some of which have a more liberal notion of what is sane data type input.

To be able to adapt to advanced use cases in database data migration support, pgloader has grown an advanced command language wherein you can define your own load-time data projection and transformations, and your own type casting rules too.

New in version 3.2 is that in simple cases, you don't need that command file any more. Check out the pgloader quick start page to see some examples where you can use pgloader all from your command line!

Here's one such example, migrating a whole MySQL database data set over to PostgreSQL, including automated schema discovery, automated type casting and on-the-fly data cleanup (think about zero dates or booleans in tinyint(1) disguise), support for indexes, primary keys, foreign keys and comments. It's as simple as:

$ createdb sakila
$ pgloader mysql://root@localhost/sakila pgsql:///sakila
2015-01-16T09:49:36.068000+01:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2015-01-16T09:49:36.074000+01:00 LOG Data errors in '/private/tmp/pgloader/'
                    table name       read   imported     errors            time
------------------------------  ---------  ---------  ---------  --------------
               fetch meta data         43         43          0          0.222s
                  create, drop          0         36          0          0.130s
------------------------------  ---------  ---------  ---

[continue reading]

Posted by David Kerr on 2015-01-15 at 21:42:00
Two simple gems to help tune your Postgres databases.

I'm a big fan of PgTune. I think that in many cases you can run PgTune and set-it-and-forget-it for your Postgres parameters. I like it so much that I often wish I had access to it in my code - especially when working with Puppet to provision new database servers.

When I started looking into RDS Postgres a while back I realized that the default configuration for those instances was lacking and I really wished I could run PgTune on the RDS instances.

It was to solve those problems above that these two projects formed.

  • RDSTune will create a MrTuner-ized RDS Parameter Group 
  • MrTuner is a Ruby gem that follows in the spirit of PgTune if not directly in its footsteps.

Both will run from the command line but, more importantly, they can be `required` by your ruby projects to allow you to access these values programmatically. 

Both Gems are available on rubygems; source, examples, configuration and docs are available at their respective bitbucket pages.

RDSTune - https://bitbucket.org/davidkerr/rdstune
MrTuner - https://bitbucket.org/davidkerr/mrtuner

Feedback and Pull requests very welcome!
Postgres has had this nice extension (hstore) for a lot of years. It can be used to simulate some features of document databases, or to implement generic triggers for a history table:

I have a table test and table history:

-- hstore is a contrib extension and must be installed first
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE test(a int, b int, c int);

CREATE TABLE history(
  event_time timestamp(2),
  executed_by text,
  origin_value hstore,
  new_value hstore
);

CREATE OR REPLACE FUNCTION history_insert()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO history(event_time, executed_by, new_value)
    VALUES(CURRENT_TIMESTAMP, SESSION_USER, hstore(NEW));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_delete()
RETURNS TRIGGER AS $$
BEGIN
  INSERT INTO history(event_time, executed_by, origin_value)
    VALUES(CURRENT_TIMESTAMP, SESSION_USER, hstore(OLD));
  RETURN OLD;  -- NEW is not defined in a DELETE trigger
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_update()
RETURNS TRIGGER AS $$
DECLARE
  hs_new hstore := hstore(NEW);
  hs_old hstore := hstore(OLD);
BEGIN
  -- store only the changed keys: old minus new, and new minus old
  INSERT INTO history(event_time, executed_by, origin_value, new_value)
    VALUES(CURRENT_TIMESTAMP, SESSION_USER, hs_old - hs_new, hs_new - hs_old);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_history_insert AFTER INSERT ON test
  FOR EACH ROW EXECUTE PROCEDURE history_insert();

CREATE TRIGGER test_history_delete AFTER DELETE ON test
  FOR EACH ROW EXECUTE PROCEDURE history_delete();

CREATE TRIGGER test_history_update AFTER UPDATE ON test
  FOR EACH ROW EXECUTE PROCEDURE history_update();
Result:
INSERT INTO test VALUES(1000, 1001, 1002);
UPDATE test SET a = 10, c = 20;
DELETE FROM test;

postgres=# SELECT * FROM history;
       event_time       | executed_by |       origin_value       |               new_value
------------------------+-------------+--------------------------+---------------------------------------
 2015-01-15 20:59:05.52 | pavel       |                          | "a"=>"1000", "b"=>"1001", "c"=>"1002"
 2015-01-15 20:59:05.6  | pavel       | "a"=>"1000", "c"=>"1002" | "a"=>"10", "c"=>"20"
 2015-01

[continue reading]

Postgres provides various parameters in postgresql.conf for performance tuning, to help make better use of database server resources. At OmniTI, one of our clients' applications reported slowness during high-intensity but very critical daily operations. The Postgres DB server services a mail generation application at 3-4K TPS and contains more than 400K partitioned tables.  The systems, including the DB servers, are monitored through Circonus, so it was easy to review graph patterns to find out whether anything had changed recently.

While investigating the graph patterns, we came across a significantly high IO increase (~10-11K per sec) on the Circonus IOSTAT graph around the end of November. This pattern change matches the software upgrade: the software was upgraded at the end of November!! As you can see below in the graph, IO increased to 10-11K per second after the software upgrade, starting in the last week of Nov 2014.

[IOSTAT graph: IO rising to ~10-11K/sec after the software upgrade]

It was easy to dig further because daily pgbadger log analysis reports were ready for review! Since the software upgrade, the most frequently called stored function had been generating ~8TB of temp files per day!! Digging further into the report, the average size of the generated temp files was around ~100MB.  As you might be aware, hash table and sort operations should happen in memory, which depends on the work_mem setting; if the size of the operation is larger than work_mem, it ends up happening on disk. work_mem was set to 50MB. It's clear that the last software upgrade introduced a change in a function and its underlying query. This change was the root cause of the spike in disk IO activity observed in the IOSTAT graph.
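To check whether a given query spills its sort to disk, EXPLAIN ANALYZE tells you directly; a sketch (table and column names are illustrative, and the work_mem value is just an example):

EXPLAIN ANALYZE
SELECT * FROM some_large_table ORDER BY some_column;
-- in the plan, look for one of:
--   Sort Method: external merge  Disk: 102400kB   (spilled; larger than work_mem)
--   Sort Method: quicksort  Memory: 25kB          (fit in work_mem)

SET work_mem = '128MB';  -- per-session override; no server restart required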

The first attempt was to tune the query so it avoids disk sorting, but there were two challenges:

(1) It wasn't easy to change the application code, because it has to go through the application change process, which might take 1-2 months.

(2) The query accesses pg catalog tables to gather details. The pg catalog tables are pretty large because of the large number of partitioned tabl

[continue reading]

Posted by Hans-Juergen Schoenig in Cybertec on 2015-01-15 at 14:52:56
In the past couple of years replication has been made easier and easier. Setting up streaming replication is pretty easy these days. When doing training I get a lot of positive feedback, and people seem to like the way entire database instances can be cloned. In many cases DBAs use a straightforward approach […]
Posted by Marko Tiikkaja on 2015-01-15 at 00:15:00
Roughly five years ago I presented a way of pulling items off a queue table using advisory locks.  Now that the upcoming 9.5 release is going to have a similar method built in, I wanted to explore the performance characteristics of the different approaches available starting from PostgreSQL 9.5.  To recap, the problem is: given a table of items (or "jobs"), distribute the items to a set of
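(The excerpt cuts off here, but the built-in method referred to is presumably 9.5's FOR UPDATE SKIP LOCKED; a sketch of claiming one item, with a hypothetical items table:)

-- atomically claim and remove one queue item, skipping rows
-- currently locked by other workers (PostgreSQL 9.5+)
WITH next_item AS (
    SELECT id
      FROM items
     ORDER BY id
     LIMIT 1
       FOR UPDATE SKIP LOCKED
)
DELETE FROM items i
 USING next_item
 WHERE i.id = next_item.id
RETURNING i.*;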
Posted by damien clochard in Dalibo on 2015-01-14 at 10:52:14

As in previous years, Dalibo will participate in FOSDEM 2015, the greatest FLOSS event in Europe, which is held from Jan. 31 to Feb. 1.

FOSDEM is a unique free event that offers open source communities a place to meet, share ideas and collaborate. The PostgreSQL Europe association organizes an additional event called FOSDEM PG Day the day before (Friday Jan. 30), which means you can enjoy 2 days entirely dedicated to PostgreSQL and its community. Yay!

The schedule contains 2 talks from Dalibo's team:

  • Friday at 10h20: Julien Rouhaud and Ronan Dunklau will present Identifying missing indexes and the forthcoming version 2.0 of PoWA, a dynamic workload analyzer for PostgreSQL. This new version will feature major improvements, especially a missing index detector and an advanced WHERE clause analyzer (check pg_qualstats for more details), more filesystem I/O stats (thanks to the pg_stat_kcache extension) and a complete rewrite of the user interface.

  • Saturday at 14h00: Ronan will talk about Foreign Data Wrappers in PostgreSQL. Ronan is the creator of the multicorn project and wrote the IMPORT FOREIGN SCHEMA patch; he will present the latest improvements of the various Foreign Data Wrappers.

These 2 talks are part of a rich and complete schedule. If you're a PostgreSQL user, come spend 2 days with us in Brussels!

Details and registration at http://fosdem2015.pgconf.eu/

PS: Thanks to Virginie for her help on this post!

In PostgreSQL high availability (streaming replication/hot standby), one thing requires human interference and has no automation: in the event of a master database crash, initiating the failover procedure (trigger file creation or the pg_ctl promote command) on the standby. In core PostgreSQL there's no built-in functionality to identify a failure of the master and notify the standby. Thus, we require some automated tool to take over the manual failover work, or we have to dive into scripting land and write our own script to do it.

Today, we have very good external tools to handle automatic failover, like Replication Manager (repmgr), EDB Failover Manager (EFM), pgHA and HandyRep. Thanks to all for filling the gap of automatic failover in PostgreSQL.

In this post, I am demonstrating Replication Manager on a single node (localhost) on RHEL 6.5 with PostgreSQL 9.3.5. To achieve an easy and good understanding of the concept, I have compiled repmgr against the EnterpriseDB One Click Installer (a pre-built binary package) instead of PG source.

To compile repmgr, we need to install a few mandatory dependency packages (gcc, postgresql-devel, libxslt-devel, pam-devel, libopenssl-devel, krb5-devel and libedit-devel) using yum or rpm. After installing the dependencies, download repmgr 2.0 from here, set pg_config in your path, and start compiling.
[postgres@localhost:/opt/PostgreSQL/9.3~]$ type pg_config
pg_config is hashed (/opt/PostgreSQL/9.3/bin/pg_config)

export PATH=/opt/PostgreSQL/9.3/bin:$PATH
export LD_LIBRARY_PATH=/opt/PostgreSQL/9.3/lib:$LD_LIBRARY_PATH

tar xvzf repmgr-2.0.tar.gz
cd repmgr
make USE_PGXS=1
make USE_PGXS=1 install
Mostly, repmgr compiles smoothly without any hiccups if we have installed all the dependency packages. Since I am compiling against PG pre-built binaries, there may be diverse variants of libraries between the pre-built binaries and the rpms, which might throw some compilation errors, like the one you see here:
/lib64/libldap_r-2.4.so.2: undefined reference to `ber_sockbuf_io_udp'
collect2: ld returned 1 exit status
make:

[continue reading]

Posted by gabrielle roth on 2015-01-13 at 03:48:00
Do you have to load data from funky files into postgres frequently? I do. I get a few requests per week of the “hey, can you give me a mass data extract for things that match the criteria in this spreadsheet” variety. Frequently, these are windows-formatted all-in-one-line “csv” files. A year or so ago, I […]