Posted by Michael Paquier on 2015-03-05 at 08:16:54

The last few months have revealed a couple of vulnerabilities in OpenSSL, so it is sometimes handy to get a status of how SSL is used on a given instance. For this purpose, there is a nice tool called sslyze that can help scan SSL usage on a given server, and it happens to have support for the SSLRequest handshake that PostgreSQL embeds in its protocol (see here for more details regarding the SSLRequest message).

The Postgres pre-handshake invocation, support for which was added recently with this commit, can be done using --starttls=postgres. With --starttls=auto, the utility can guess that the Postgres protocol needs to be used thanks to the port number.

Something important to note is that, at the time of writing, the latest release of sslyze does not include support for the Postgres protocol, so it is necessary to fetch the raw code from github and add nassl/ to the root of the git code tree to get the utility working (simply fetch it from the latest release build, for example).

Once the utility is ready, simply scan a server with SSL enabled using a command similar to this:

python sslyze.py --regular --starttls=postgres $SERVER_IP:$PORT

PORT would normally be 5432.
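
As mentioned above, the protocol can also be inferred from the port with the auto mode (a sketch of the same invocation):

python sslyze.py --regular --starttls=auto $SERVER_IP:5432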

Now let's take the example of EXPORT ciphers, which are not included by default in the list of ciphers available on the server. The scanner is able to detect their presence:

$ python sslyze.py --regular --starttls=postgres 192.168.172.128:5432 | grep EXP
EXP-EDH-RSA-DES-CBC-SHA       DH-512 bits    40 bits
EXP-EDH-RSA-DES-CBC-SHA       DH-512 bits    40 bits
EXP-EDH-RSA-DES-CBC-SHA       DH-512 bits    40 bits
$ psql -c 'show ssl_ciphers'
         ssl_ciphers
------------------------------
 HIGH:MEDIUM:+3DES:EXP:!aNULL
(1 row)

And once they are disabled, the contrary happens:

$ python sslyze.py --regular --starttls=postgres 192.168.172.128:5432 | grep EXP
$ psql -c 'show ssl_ciphers'
          ssl_ciphers
-------------------------------
 HIGH:MEDIUM:!EXP:+3DES:!aNULL
(1 row)

This makes this utility quite a handy tool to get a full status of a given server reg

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2015-03-04 at 23:50:00
Keith Fiske's pg_partman is a pretty nice tool for managing partitions of tables. I've recommended it recently to a couple of clients, and it's working well for them.

Based on that I have made a couple of suggestions for improvement, and today he's made a release including one of them. Previously, the script to rebuild the child indexes essentially removed them all and rebuilt them. Now it only removes those that are no longer on the parent table, and only adds those that are on the parent but not on the child, so if you just add or delete one index on the parent that's all that gets done on the children too.

I'm happy to say that he's also working on my other, more significant suggestion, which is to have a hybrid partitioning mode where the trigger has static inserts for the likely common tables and dynamic inserts for the rest. That will mean you don't have to make nasty choices between flexibility and speed. I'm looking forward to it.
Posted by Tomas Vondra in 2ndQuadrant on 2015-03-02 at 20:20:00

When I gave my Performance Archaeology talk at pgconf.eu 2014 in Madrid, I mentioned I planned to do a few blog posts about that topic soon. It's been four months since the conference, so it's about time to fulfill the promise. In my defense, a lot has changed since that talk (e.g. I moved to 2ndQuadrant).

If you look at the pgconf.eu talk, you'll see it has four sections:

  1. intro - motivation, gotchas and brief description of benchmarks/hardware
  2. pgbench - pgbench results (TPC-B like workload)
  3. TPC-DS - TPC-DS results (warehouse / analytical workload)
  4. fulltext - fulltext results (applicable to GIN/GiST indexes in general)

I've decided to keep this structure for the blog posts too, so this post is a short explanation of the benchmarking: why I did it and what to be careful about when interpreting the data. Then there will be three more blog posts, each discussing the results of one of the benchmarks.

The series of blog posts may be seen as a different way to present the talk, but in case you'd like to see the actual talk (say, on a local meetup), let me know.

PostgreSQL 7.4, the first release I used in production, was released more than 10 years ago (in November 2003, to be precise). When it comes to features, it's simple (although tedious) to skim through the Release Notes and check what new features were added - but what about performance? How has performance evolved (hopefully improved) since 7.4?

Why should you care?

The question you're maybe asking at this point is "Why do I care?" I personally see three main reasons why that might be interesting for you.

If you're already using an older PostgreSQL release (particularly one of the already unsupported ones, so anything older than 9.0), this might give you another reason to upgrade - not only will you get a properly maintained version and new features, you'll also get much better performance in many cases.

If you're considering using PostgreSQL, this might give you a sense of how careful and reliable the development community is. Sure, we make mistakes

[continue reading]

Posted by damien clochard in Dalibo on 2015-03-02 at 19:03:14

Yesterday, DALIBO released the second major version of PoWA, the PostgreSQL Workload Analyzer. We're kind of proud of what we've achieved with this new release; however, we did crack some eggs while making the omelette… Here's a brief explanation of all the disruptive choices we made…

When we launched the PoWA project in 2014, we decided to release the project very early and see where it would lead us. Over the last few months, we took some time to analyze the situation. We looked at the users' feedback. We tried to imagine how the project would interact with the other monitoring and performance tools we're working on, especially OPM, pg_stat_kcache and pg_qual_stats.

We came to the conclusion that in order to move forward, we had to make a lot of disruptive technical choices. Please fasten your seatbelt:

  • We're splitting the code into 2 sub-projects: the PostgreSQL extension is now called PoWA-archivist and the graphic interface is now called PoWA-web.

  • We're switching to Semantic Versioning to avoid dependency nightmares. In particular, you will have the guarantee that minor versions of the sub-projects are compatible. For instance, PoWA-archivist 2.0.1 will be compatible with PoWA-web 2.0.9.

  • We're dropping the mojolicious web framework in favor of tornado. There's a lot we could say about this, but in a nutshell we found that mojolicious's release cycle is too erratic for us, and switching to Python makes it easier to open the project to other contributors. And yes, this means we rewrote the GUI code entirely from scratch.

  • We're dropping support for PostgreSQL 9.3; from now on, new versions of PoWA will only work with versions 9.4 and later. This was a hard choice, but if we want to implement ground-breaking features like the Missing Index Finder, we need to use the latest capabilities of PostgreSQL. That being said, we will continue to maintain PoWA v1.2 and you can continue to use it on your PostgreSQL 9.3 servers.

  • We're allowing multiple sources of stats. So far PoWA is able to collect stats from 3 di

[continue reading]

One of the neat little features that arrived in PostgreSQL 9.4 is the WITH ORDINALITY ANSI-SQL construct. What this construct does is tack on an additional column called ordinality when you use a set-returning function in the FROM part of an SQL statement.
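
A minimal sketch of what this looks like (unnest and the aliases are my own example):

SELECT t.val, t.ord
  FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(val, ord);

 val | ord
-----+-----
 a   |   1
 b   |   2
 c   |   3
(3 rows)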


Continue reading "LATERAL WITH ORDINALITY - numbering sets"
Posted by Guillaume LELARGE in Dalibo on 2015-02-28 at 22:46:00

Since the last time I talked about it, I've had quite a bit of feedback, bug reports, pull requests, and so on. Many issues were fixed, the last of them tonight.

I also added two new reports. I had the idea while working on my customers' clusters.

One of them had a lot of writes on their databases, and I wanted to know how many writes occurred in the WAL files. vmstat would only tell me how many writes happened across all files, but I was only interested in WAL writes. So I added a new report that grabs the current XLOG position and diffs it against the previous XLOG position. It gives something like this with a pgbench test:

$ ./pgstat -s xlog 
-------- filename -------- -- location -- ---- bytes ----
 00000001000000000000003E   0/3EC49940        1053071680
 00000001000000000000003E   0/3EC49940                 0
 00000001000000000000003E   0/3EC49940                 0
 00000001000000000000003E   0/3EC875F8            253112
 00000001000000000000003E   0/3ED585C8            856016
 00000001000000000000003E   0/3EE36C40            910968
 00000001000000000000003E   0/3EEFCC58            811032
 00000001000000000000003E   0/3EFAB9D0            716152
 00000001000000000000003F   0/3F06A3C0            780784
 00000001000000000000003F   0/3F0E79E0            513568
 00000001000000000000003F   0/3F1354E0            318208
 00000001000000000000003F   0/3F1F6218            789816
 00000001000000000000003F   0/3F2BCE00            814056
 00000001000000000000003F   0/3F323240            418880
 00000001000000000000003F   0/3F323240                 0
 00000001000000000000003F   0/3F323240                 0
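
Under the hood, this is essentially a diff between two samples of the WAL write location, which can be reproduced in SQL (a sketch using the 9.4 function names; the two locations are taken from the output above):

-- current WAL file name and write location:
SELECT pg_xlogfile_name(pg_current_xlog_location()),
       pg_current_xlog_location();

-- bytes written between two sampled locations:
SELECT pg_xlog_location_diff('0/3F06A3C0', '0/3EFAB9D0') AS bytes;

The result, 780784, matches the corresponding row above.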

Those aren't big numbers, so it's easy to see it writes at about 253KB/s, but if the numbers were bigger, they might get hard to read. One of my co-workers, Julien Rouhaud, added a human-readable option:

$ ./pgstat -s xlog -H
-------- filename -------- -- location -- ---- bytes ----
 00000001000000000000003F   0/3F32EDC0      1011 MB
 00000001000000000000003F   0/3F32EDC0      0 bytes
 00000001000000000000003F   0/3F32EDC0      0 bytes
 0000000100000000

[continue reading]

Posted by Hubert 'depesz' Lubaczewski on 2015-02-27 at 18:59:40
One of my clients is upgrading some servers. The procedure we have took some time to get to its current state, and we found some potential problems, so I decided to write more about it. First, what we have, and what we want to have. We usually have 3 servers: master, slave, slave2. Both slaves use streaming […]
Posted by Shaun M. Thomas on 2015-02-27 at 17:54:29

Long have CASE statements been a double-edged sword in the database world. They’re functional, diverse, adaptive, and simple. Unfortunately they’re also somewhat bulky, and when it comes to using them to categorize aggregates, something of a hack. This is why I wanted to cry with joy when I found out that PostgreSQL 9.4 introduced a feature I’ve always wanted, but found difficult to express as a need. I mean, CASE statements are fine, right? Well, yes they are, but now we have something better. Now, we have the FILTER aggregate expression.

I always like working with examples, so let’s create some test data to illustrate just what I’m talking about.

CREATE TABLE sys_order
(
    order_id     SERIAL     NOT NULL,
    product_id   INT        NOT NULL,
    item_count   INT        NOT NULL,
    order_dt     TIMESTAMP  NOT NULL DEFAULT now()
);

INSERT INTO sys_order (product_id, item_count)
SELECT (a.id % 100) + 1, (random()*10)::INT + 1
  FROM generate_series(1, 1000000) a(id);

ALTER TABLE sys_order ADD CONSTRAINT pk_order_order_id
      PRIMARY KEY (order_id);

We now have a table for tracking fake orders, using 100 nonexistent products. I added the primary key after loading the table, a well-known DBA trick. Doing this after data loading means the index can be created as a single step, which is much more efficient than repeatedly extending an existing index.

With that out of the way, let’s do a basic product order count, since that’s something many people are already familiar with:

SELECT sum(item_count) AS total
  FROM sys_order;

-[ RECORD 1 ]--
total | 1000000

No surprises here. But what happens when Jeff from Accounting wants to know how many people ordered five specific products as a column list? In the old days, we might do something like this:

SELECT sum(CASE WHEN product_id = 1 THEN item_count ELSE 0 END) AS horse_mask_count,
       sum(CASE WHEN product_id = 7 THEN item_count ELSE 0 END) AS eyeball_count,
       sum(CASE WHEN product_id = 13 THEN item_count ELSE 0 END) AS badger_count,
       sum(C

[continue reading]
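
For comparison, the FILTER syntax the post goes on to introduce expresses the same aggregates roughly like this (a sketch of the 9.4 syntax, not the author's exact continuation; note one subtle difference: where the CASE version's ELSE 0 yields 0 for an unmatched product, FILTER yields NULL):

SELECT sum(item_count) FILTER (WHERE product_id = 1) AS horse_mask_count,
       sum(item_count) FILTER (WHERE product_id = 7) AS eyeball_count,
       sum(item_count) FILTER (WHERE product_id = 13) AS badger_count
  FROM sys_order;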

Posted by gabrielle roth on 2015-02-27 at 03:20:00

Mark & I left David Wheeler in charge of the PDXPUG February meeting while we were at SCALE last week.

Here’s David’s report:

This week Dave Kerr discussed using Bucardo multi-master replication to gradually migrate a production database from EC2 to RDS. This work allowed his team to switch back and forth between the two systems with the assurance that the data would be the same on both. It also allowed them a fallback in case the RDS database didn't work out: the EC2 system would still be there. The discussion allowed those present to complain about Bucardo, EC2, RDS, and the French.

Thanks!


With all the parts from the previous post in place, some mechanism is needed to do the routine maintenance by calling the transfer function automagically.

Of course this could be done with pgAgent or cron, but since it should be elegant, this calls for a background worker process.

For illustration purposes, I wrote a sample implementation called worker_ltt, based on the worker_spi sample code. It is sloppy - even the original comments are still in there.

Adding worker_ltt to shared_preload_libraries and

worker_ltt.naptime = 60
worker_ltt.database = 'yourdb'
worker_ltt.user = 'youruser'
worker_ltt.function = 'move_longtail'
 
to postgresql.conf starts executing move_longtail() every 60 seconds in yourdb as youruser. If the user is omitted, it runs with superuser rights!

Since move_longtail() can basically do anything, restricting the user is a good idea.
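
A minimal way to apply that restriction, assuming the function and role names from the configuration above, could be:

REVOKE ALL ON FUNCTION move_longtail() FROM PUBLIC;
GRANT EXECUTE ON FUNCTION move_longtail() TO youruser;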

For more security, the SQL statements could be moved entirely into the background worker, but then the approach loses much of its flexibility... But this is a concept anyway; there is always room for improvement.

But it really works.

In part III I'll try to give a rough estimate of how big the performance penalty is when the partitioned table switches from fast to slow storage during a query. And there is another important problem to be solved...

PostBooks has been successful on Debian and Ubuntu for a while now and for all those who asked, it is finally coming to Fedora.

The review request has just been submitted and the spec files have also been submitted to xTuple as pull requests so future upstream releases can be used with rpmbuild to create packages.

Can you help?

A few small things outstanding:

  • Putting a launcher icon in the GNOME menus
  • Packaging the schemas - they are in separate packages on Debian/Ubuntu. Download them here and load the one you want into your PostgreSQL instance using the instructions from the Debian package.

Community support

The xTuple forum is a great place to ask any questions and get to know the community.

Screenshot

Here is a quick look at the login screen on a Fedora 19 host:

Posted by Josh Berkus in pgExperts on 2015-02-26 at 18:27:00
Here's a commonplace ops crisis: the developers push a new dashboard display widget for user homepages on your application.  This seems to work fine in testing, and they push it out to production ... not realizing that for some large subset of users dissimilar from your tests, the generated query triggers a sequential scan on the second-largest table in the database.   Suddenly your database servers are paralyzed with load, and you have to shut down the whole site and back out the changes.

Wouldn't it be nice if you could just tell the database server "don't run expensive queries for the 'web' user"?  Well, thanks to my colleague Andrew Dunstan, who wrote plan_filter with support from Twitch.TV, now you can.

Sort of.  Let me explain.

PostgreSQL has had statement_timeout for a while, which can be set on a per-user basis (or in other places) to prevent application errors from running queries for hours.  However, this doesn't really solve the "overload" issue, because the query still runs for that length of time, gobbling resources until it's terminated.  What you really want is to return an error immediately if a query is going to be too costly.
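
For reference, the per-user timeout form looks like this (role name as in the example below; the value is an arbitrary choice):

    ALTER USER web SET statement_timeout = '30s';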

plan_filter is a loadable module which allows you to set a limit on the cost of queries you can execute.  It works, as far as we know, with all versions of Postgres starting at 9.0 (we've tested 9.1, 9.3 and 9.4). 

Let me show you.  First, you have to load the module in postgresql.conf:

    shared_preload_libraries = 'plan_filter'

Then you alter the "web" user to have a strict limit:

    ALTER USER web SET plan_filter.statement_cost_limit = 200000.0

Then try some brain-dead query as that user, like a blanket select from the 100m-row "edges" graph table:

    \c - web
    SELECT * FROM edges;

    STATEMENT:  select * from edges;
    ERROR:  plan cost limit exceeded
    HINT:  The plan for your query shows that it would probably
    have an excessive run time. This may be due to a logic error
    in the SQL, or it maybe just a very costly query. Rewrite 
    your query

[continue reading]

DISCLAIMER: This is just an idea, I haven't tried this in a production environment!

Having said that, any input is welcome. :-)

Storing tons of rows in a table of which only a small percentage are frequently queried is a common scenario for an RDBMS, especially with databases that have to keep historical information just in case they might be audited, e.g. in environments regulated by law.

With the advent of the first affordable 8TB hard disks, and with fast SSDs still being much more expensive per GB, I wondered if such long-tailed tables could be split over an SSD holding the frequently accessed pages and a near-line storage HDD keeping the archive - elegantly - with PostgreSQL.

By elegant, I mean without fiddling around with VIEWs and INSTEAD OF triggers, while exposing a clean and familiar interface to the developer.

OK, since PostgreSQL already supports horizontal partitioning, spreading one table transparently over many parallel tables, how about vertical partitioning, spreading one table over a hierarchy of speed?

The speed zones can be mapped to tablespaces:

CREATE TABLESPACE fast LOCATION '/mnt/fastdisk';
CREATE TABLESPACE slow LOCATION '/mnt/slowdisk';

Next comes the table(s):

CREATE TABLE the_table
(
  id integer NOT NULL,
  value real
)
WITH (
  OIDS=FALSE
)
TABLESPACE fast;


CREATE TABLE the_table_archive
(
)
INHERITS (the_table)
WITH (
  OIDS=FALSE
)
TABLESPACE slow;


Table inheritance in PostgreSQL is so cool...

And a function to move data from fast to slow:

CREATE OR REPLACE FUNCTION move_longtail()
  RETURNS boolean AS
$BODY$
DECLARE worked BOOLEAN;
DECLARE rowcount INTEGER;
BEGIN
worked := false;
rowcount := count(*) FROM ONLY the_table WHERE id >= 5000000;
IF (rowcount > 100000) THEN
INSERT INTO the_table_archive SELECT * FROM ONLY the_table WHERE id >= 5000000;
DELETE FROM ONLY the_table WHERE id >= 5000000;
worked := true;
END IF;
RETURN worked;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE STRICT;


This function runs only if a minimum number of movable rows qualify. This is recommended since SMR di

[continue reading]

Up to Postgres 9.4, when a node in recovery checks for the availability of WAL from a source, be it a WAL stream, a WAL archive or the local pg_xlog, and fails to obtain what it wanted, it has to wait for 5s, an amount of time hardcoded directly in xlog.c. 9.5 brings more flexibility with a built-in parameter allowing control of this interval of time, thanks to this commit:

commit: 5d2b45e3f78a85639f30431181c06d4c3221c5a1
author: Fujii Masao <fujii@postgresql.org>
date: Mon, 23 Feb 2015 20:55:17 +0900
Add GUC to control the time to wait before retrieving WAL after failed attempt.

Previously when the standby server failed to retrieve WAL files from any sources
(i.e., streaming replication, local pg_xlog directory or WAL archive), it always
waited for five seconds (hard-coded) before the next attempt. For example,
this is problematic in warm-standby because restore_command can fail
every five seconds even while new WAL file is expected to be unavailable for
a long time and flood the log files with its error messages.

This commit adds new parameter, wal_retrieve_retry_interval, to control that
wait time.

Alexey Vasiliev and Michael Paquier, reviewed by Andres Freund and me.

wal_retrieve_retry_interval is a SIGHUP parameter of postgresql.conf (it can be updated by reloading parameters, without restarting the server) that controls this check interval when a node is in recovery. This parameter is useful when set to values shorter than its default of 5s, for example to shorten the interval at which a warm-standby node tries to get WAL from a source. On the contrary, a higher value can help to reduce log noise and repetitive attempts to retrieve a missing WAL archive, for example when WAL archives are located on an external instance that is priced based on the number of connection attempts or similar (note as well that a longer interval can be achieved with some timestamp control using a script that is kicked by restore_command, still it is good to have a built-in option to do it instea

[continue reading]
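
As an illustration, shortening the interval on a standby is a one-line change in postgresql.conf, applied with a reload (the value is an arbitrary example):

wal_retrieve_retry_interval = 1s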

Today we're releasing the code for a small PostgreSQL module called plan_filter that lets you stop queries from even starting if they meet certain criteria. Currently the module implements one such criterion: the estimated cost of the query.

After you have built and installed it, you add a couple of settings to the postgresql.conf file, like this:
shared_preload_libraries = 'plan_filter'
plan_filter.statement_cost_limit = 100000.0
Then, if the planner estimates the cost as higher than statement_cost_limit, it will raise an error rather than allowing the query to run.

This module follows an idea from a discussion on the postgresql-hackers mailing list some time ago. It was developed by PostgreSQL Experts Inc for our client Twitch.TV, who have generously allowed us to make it publicly available.
Posted by Andrew Dunstan in pgExperts on 2015-02-25 at 23:16:00
Yesterday I ordered a Raspberry Pi 2 Model B, and it should be delivered in a few days. I'm intending to set it up as a buildfarm member. The fact that you can purchase a general-purpose computer the size of a credit card with a quad-core processor and 1GB of memory (I remember when RAM was counted in kilobytes), all for USD 35.00, is amazing, even when you remember Moore's Law.

New repository

It's been almost a year now since I wrote the first version of the btree bloat estimation query. Then came the first fixes, the bloat estimation queries for tables, more fixes, and so on. Maintaining these queries as gists on github was quite difficult and lacked some features: no documented history, multiple links, no doc, impossible to fork, etc.

So I decided to move everything to a git repository you can fork right away: http://github.com/ioguix/pgsql-bloat-estimation. There are already 10 commits with improvements and bug fixes.

Do not hesitate to fork this repo, play with the queries, test them or make pull requests. Another way to help is to discuss your results or report bugs by opening issues. This can lead to bug fixes or the creation of a FAQ.

Changelog

Here is a quick changelog since my last post about bloat:

  • support for fillfactor! Previous versions of the queries considered any extra space as bloat, even the fillfactor. Now, the bloat is reported without it. So a btree with the default fillfactor and no bloat will report a bloat of 0%, not 10%.
  • fix bad tuple header size for tables under 8.0, 8.1 or 8.2.
  • fix bad header size computation for varlena types.
  • fix illegal division by 0 for the btrees.
  • added some documentation! See README.md.

In conclusion, do not hesitate to use these queries in your projects, contribute to them and give some feedback!

Posted by Ernst-Georg Schmid on 2015-02-25 at 21:33:00
pgchem::tigress 3.2 is finally out!

  • This builds against PostgreSQL 9.4 and OpenBabel 2.3.2 on Linux.
  • It contains all fixes and contributions of the previous versions.
  • Windows is not supported anymore - and since it builds and runs way better on Linux, probably never will be again.
  • Depiction functions have been removed. Their run time was too unpredictable to be run inside a database server.
  • Theoretical isotope pattern generation with MERCURY7 is now available with isotopes for 39 elements.

So: CREATE EXTENSION pgchem_tigress;
On 23rd of February, Heikki Linnakangas committed patch: Replace checkpoint_segments with min_wal_size and max_wal_size.   Instead of having a single knob (checkpoint_segments) that both triggers checkpoints, and determines how many checkpoints to recycle, they are now separate concerns. There is still an internal variable called CheckpointSegments, which triggers checkpoints. But it no longer determines how […]
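
For reference, the two replacement knobs appear in postgresql.conf like this (the values shown are, as far as I recall, the 9.5 defaults):

min_wal_size = 80MB
max_wal_size = 1GB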
Last week was one of the busiest community weeks I have had in a long time. It started with an excellent time in Vancouver, B.C., giving my presentation, "An evening with PostgreSQL!" at VanLUG. They are a great group of people. They took all my jibes with good humor (Canadians gave us Maple Syrup, we gave them Fox News) and we enjoyed not only technical discussion but discussions on technology in general. It is still amazing to me how many people don't realize that Linux 3.2 - 3.8 is a dead end for random IO performance.

After VanLUG I spent the next morning at the Vancouver Aquarium with my ladies. Nothing like beautiful weather, dolphins and jellyfish to brighten the week. Once back in Bellingham, we moved on to a WhatcomPUG meeting where I presented "Practical PostgreSQL: AWS Edition". It was the inaugural meeting but was attended by more than just the founders, which is a great start!

I got to rest from community work on Wednesday and instead dug my head into some performance problems on a client's High Availability cluster. It is amazing, even with proper provisioning, how much faster ASYNC rep is than SYNC rep. After some detailed diagnosis and data to prove it, we switched to ASYNC rep and all critical problems were resolved.

On Thursday it was off to Southern California Linux Expo where I presented, "Suck it! Webscale is dead; long live PostgreSQL!". The room was packed, people laughed and for those who might have been offended, I warned you. Your offense is your problem. Look inside yourself for your insecurities! All my talks are PG-13 and it is rare that I will shy away from any topic. My disclosure aside, I had two favorite moments:

  1. When someone was willing to admit they hadn't seen Terminator. I doubt that person will ever raise his hand to one of my questions again.
  2. When Berkus (who knew the real answer) suggested it was Elton John that wrote the lyrics at the end of the presentation.

After I spent the evening with JimmyM (BigJim, my brother), Joe Conway of SDPUG/Credativ, Jim Nasby of the

[continue reading]

Posted by Tomas Vondra in 2ndQuadrant on 2015-02-24 at 22:00:00

So ... Prague PostgreSQL Developer Day 2015, the local PostgreSQL conference, happened about two weeks ago. Now that we've collected all the available feedback, it's probably the right time for a short report and a sketch of plans for next year.


The first year of Prague PostgreSQL Developer Day (P2D2) happened in 2008, and from the very beginning was organized as a community event for developers - from students of software engineering to people who use PostgreSQL at work.

We've changed the venue a few times, but in most cases we've just moved from one university / faculty to another, and the same happened this year for capacity reasons. The previous venue at Charles University served us well, but we couldn't fit more than 120 people in, and we usually reached that limit within a week of opening the registration. The new venue, located at Czech Technical University, can handle up to ~180 people, which should be enough for the near future - this year we registered 150 people, but a few more ended up on a wait list.

The most obvious change was adding a full day of trainings on February 11 (i.e. the day before the main conference day), similar to what happens at pgconf.eu and various other conferences. The feedback on this is overwhelmingly good, so we're pretty sure we'll keep it in the coming years.

The main conference (on February 12) consisted of 9 talks, not counting the initial "welcome" speech. We had the usual mixture of talks, from a brief talk about features introduced in 9.4, talks about using PostgreSQL in actual projects, to a talk about Bi-Directional Replication.

Although the conference is aimed at local users, and thus the majority of talks are either in Czech or Slovak, every year we invite a few foreign speakers to give talks in English. This year we had the pleasure of welcoming Marc Balmer, who gave the talk "Securing your PostgreSQL applications", and Keith Fiske, who explained that "When PostgreSQL Can't You Can".

In the early years of the conference we've been getting "too many talks in English"

[continue reading]

Posted by Luca Ferrari in ITPUG on 2015-02-24 at 17:52:00
2014 was a very bad year, one I will remember forever for the things and the people I missed. It was also the first year I missed the PGDay.IT, but today, thanks to the board of directors and volunteers, I received the shirts of the event. This is a great thing for me, as it makes me feel part of this great community. A special thanks also to OpenERP Italia!
Posted by Greg Sabino Mullane in EndPoint on 2015-02-24 at 12:00:00

Way back in 2005 I added the ON_ERROR_ROLLBACK feature to psql, the Postgres command line client. When enabled, any errors cause an immediate rollback to just before the previous command. What this means is that you can stay inside your transaction, even if you make a typo (the main error-causing problem and the reason I wrote it!). Since I sometimes see people wanting to emulate this feature in their application or driver, I thought I would explain exactly how it works in psql.

First, it must be understood that this is not a Postgres feature; there is no way you can instruct Postgres itself to ignore errors inside of a transaction. The work must be done by a client (such as psql) that can do some voodoo behind the scenes. The ON_ERROR_ROLLBACK feature has been available since psql version 8.1.

Normally, any error you make will throw an exception and cause your current transaction to be marked as aborted. This is sane and expected behavior, but it can be very, very annoying if it happens when you are in the middle of a large transaction and mistype something! At that point, the only thing you can do is roll back the transaction and lose all of your work. For example:

greg=# CREATE TABLE somi(fav_song TEXT, passphrase TEXT, avatar TEXT);
CREATE TABLE
greg=# begin;
BEGIN
greg=# INSERT INTO somi VALUES ('The Perfect Partner', 'ZrgRQaa9ZsUHa', 'Andrastea');
INSERT 0 1
greg=# INSERT INTO somi VALUES ('Holding Out For a Hero', 'dx8yGUbsfaely', 'Janus');
INSERT 0 1
greg=# INSERT INTO somi BALUES ('Three Little Birds', '2pX9V8AKJRzy', 'Charon');
ERROR:  syntax error at or near "BALUES"
LINE 1: INSERT INTO somi BALUES ('Three Little Birds', '2pX9V8AKJRzy'...
greg=# INSERT INTO somi VALUES ('Three Little Birds', '2pX9V8AKJRzy', 'Charon');
ERROR:  current transaction is aborted, commands ignored until end of transaction block
greg=# rollback;
ROLLBACK
greg=# select count(*) from somi;
 count
-------
     0
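
Enabling the feature is a single psql setting, which can also go in your ~/.psqlrc; valid values are off, on, and interactive, the last applying only to interactive sessions:

greg=# \set ON_ERROR_ROLLBACK interactive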

When ON_ERROR_ROLLBACK is enabled, psql will issue a SAVEPOINT before every command you sen

[continue reading]
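
Roughly speaking, the voodoo psql performs for each command when the feature is on looks like this (a sketch; the savepoint name here is illustrative, psql uses its own internal one):

BEGIN;
SAVEPOINT psql_auto_savepoint;
INSERT INTO somi VALUES ('The Perfect Partner', 'ZrgRQaa9ZsUHa', 'Andrastea');
-- on success: RELEASE SAVEPOINT psql_auto_savepoint; then set a new one before the next command
-- on error:   ROLLBACK TO SAVEPOINT psql_auto_savepoint; the transaction stays usable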

In my previous posting on PostgreSQL 9.4 I showed aggregate FILTER clauses, which are a neat way to make partial aggregates more readable. Inspired by some comments on this blog post, I decided to create a follow-up posting to see what impact this new FILTER clause has on performance. Loading some demo data To […]
Posted by Luca Ferrari in ITPUG on 2015-02-22 at 15:53:00
Thanks to the effort of some of our associates, we were able to conduct a short survey of our associates themselves to see how ITPUG is working and how they feel within the association. The results, in Italian, are available here as a first brief description. As a general trend, ITPUG is doing fine, or even better than it was a few years ago. However there is still a
I had to deal with this question, or some version of it, quite a few times. So, I decided to write a summary of what one could (or should) do after the data is in the database and the application is running. Namely – set up some kind of replication and backups. What to use, how, and why? This is […]
Taking a second look at the execution plans, I noticed that the scan on slow read twice as many pages from disk as the one on fast:

Buffers: shared read=442478 vs. Buffers: shared read=221239

Since I loaded all rows into slow first and then moved 50% of them into fast, this makes sense, I guess.
If I understand it correctly, those pages in slow are now empty, but PostgreSQL keeps them for future use.

So I tried a VACUUM FULL on slow and ran my queries again. That changed the plans:

Buffers: shared read=221239 vs. Buffers: shared read=221239

And execution times are now about equal.


Posted by Jim Mlodgenski in OpenSCG on 2015-02-20 at 16:30:40

Some of our customers really like writing their business logic inside of PostgreSQL. While it is really cool that PostgreSQL can handle this, trying to performance-tune large amounts of PL/pgSQL code becomes unwieldy. If your functions are small enough, it's possible to add some logging statements, but that is not practical with hundreds or even thousands of lines of legacy code.

Several years ago, as part of the PL/pgSQL debugger, Korry Douglas wrote a PL/pgSQL profiler, but over the years it seems to have suffered from bit rot. A profiler for PL/pgSQL code helps solve a lot of problems and gives us insight into how your server-side code is running.

Below is an example output from the profiler showing how many times each line of code executed and how much time was taken by each line.

[screenshot: PL/pgSQL Profiler output]

The plprofiler has not been tested in many different environments yet, so be careful about rolling it out to production servers. Check it out and let me know if you find any issues.

https://bitbucket.org/openscg/plprofiler

 

Just this week, a bug regarding pg_dump and zlib compression when dumping data was reported here.

The issue was that when using -Fd, the compression level specified by -Z was ignored, making the compressed dump have the same size for any Z > 0. For example, with a simple table:

=# CREATE TABLE dump_tab AS SELECT random() as a,
                                   random() as b
   FROM generate_series(1,10000000);
SELECT 10000000

A dump keeps the same size whatever compression level is specified:

$ for num in {0..4}; do pg_dump -Fd -t dump_tab -f \
    level_$num.dump -Z $num ; done
$ ls -l level_?.dump/????.dat.gz level_0.dump/????.dat
-rw-r--r--  1 michael  staff  419999247 Feb 20 22:13 level_0.dump/2308.dat
-rw-r--r--  1 michael  staff  195402899 Feb 20 22:13 level_1.dump/2308.dat.gz
-rw-r--r--  1 michael  staff  195402899 Feb 20 22:14 level_2.dump/2308.dat.gz
-rw-r--r--  1 michael  staff  195402899 Feb 20 22:15 level_3.dump/2308.dat.gz
-rw-r--r--  1 michael  staff  195402899 Feb 20 22:16 level_4.dump/2308.dat.gz

After a couple of emails were exchanged, it was found that a call to gzopen() was missing the compression level: for example, to compress at level 7, the compression mode (without a strategy) needs to be something like "w7" or "wb7", but the last digit was simply missing. An important thing to note is how quickly the bug was addressed, the issue being fixed within one day by this commit (which will be available in the next series of minor releases, 9.4.2, 9.3.7, etc.):

commit: 0e7e355f27302b62af3e1add93853ccd45678443
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Wed, 18 Feb 2015 11:43:00 -0500
Fix failure to honor -Z compression level option in pg_dump -Fd.

cfopen() and cfopen_write() failed to pass the compression level through
to zlib, so that you always got the default compression level if you got
any at all.

In passing, also fix these and related functions so that the correct errno
is reliably returned on failure; the original coding supposes that free()
cannot change

[continue reading]

Hm, for something I'm trying at the moment on 9.4, I created two identical tables, where the second one inherits everything from the first:

CREATE UNLOGGED TABLE everything.slow
(
  id integer NOT NULL,
  value real
)
WITH (
  OIDS=FALSE
);


CREATE UNLOGGED TABLE everything.fast
(
)
INHERITS (everything.slow)
WITH (
  OIDS=FALSE
);


No indexes. Default tablespace.

If I then load 50 million records into each of those tables and query them individually using the ONLY restrictor, a count(*) on the parent table (slow) is slower than on the descendant (fast):

select count(*) from only everything.slow;

"Aggregate  (cost=1067783.10..1067783.11 rows=1 width=0) (actual time=4973.812..4973.813 rows=1 loops=1)"
"  Output: count(*)"
"  Buffers: shared read=442478"
"  ->  Seq Scan on everything.slow  (cost=0.00..942722.08 rows=50024408 width=0) (actual time=1012.708..3416.349 rows=50000000 loops=1)"
"        Output: id, value"
"        Buffers: shared read=442478"
"Planning time: 0.118 ms"
"Execution time: 4973.901 ms"


select count(*) from only everything.fast;

"Aggregate  (cost=846239.00..846239.01 rows=1 width=0) (actual time=3988.235..3988.235 rows=1 loops=1)"
"  Output: count(*)"
"  Buffers: shared read=221239"
"  ->  Seq Scan on everything.fast  (cost=0.00..721239.00 rows=50000000 width=0) (actual time=0.101..2403.813 rows=50000000 loops=1)"
"        Output: id, value"
"        Buffers: shared read=221239"
"Planning time: 0.086 ms"
"Execution time: 3988.302 ms"


This works with other aggregates like avg() too.

I had expected some overhead when querying without ONLY on slow, because of the traversal of the inheritance hierarchy, but not when I restrict the query to a specific table with ONLY...

Can someone explain this?