
The Helsinki PostgreSQL User Group will convene tomorrow, Tuesday September 1st 2015, at 15:00.

I will be babbling about the various index types in PostgreSQL: GIN, GiST, and the humble B-tree. What does the structure of each one look like? What are the strengths and weaknesses of each? And of course, a lot of free chatter on whatever other topics you want to bring up.
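
For reference, a quick sketch of how each of the index types mentioned above is created (table and column names are made up for illustration):

```sql
-- B-tree is the default index type
CREATE INDEX ON accounts (username);

-- GIN, typically for full-text search or jsonb columns
CREATE INDEX ON documents USING gin (body_tsv);

-- GiST, typically for geometric or range data
CREATE INDEX ON places USING gist (location);
```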

We have also created a Meetup group. Registration at Meetup is recommended to ensure you get a seat:

This is the fourth meeting of the Helsinki PostgreSQL User Group. Come and meet fellow PostgreSQL users and developers. Welcome!

September 1st 2015

Rikhardinkatu Library
Rikhardinkatu 3

Posted by Oleg Bartunov in Postgres Professional on 2015-08-30 at 20:14:37
Our company Postgres Professional has conducted performance testing on a big P8 server provided by IBM. One exciting result is that we were able to identify a scalability issue and write a patch (a kludge for now), which brought Postgres performance close to a million tps. We'll discuss the approach used in the patch on -hackers.


More details are in Alexander Korotkov's blog post.

In the next post we'll compare the performance of Postgres and MariaDB :)
Posted by gabrielle roth on 2015-08-30 at 19:00:00
PostgresOpen is coming up in just a few weeks. (Disclosure: I am on the conference committee.) We are still working out the room assignments; your conference badge/printed program will have the right info. The committee is very excited to have snagged Lacey Williams Henschel to give our keynote, on Open Source & Higher Ed. I’m looking […]
A few days ago, I released a brand-new PostgreSQL extension called "sql_firewall". sql_firewall is intended to protect a PostgreSQL database from SQL injection attacks by limiting which SQL queries can be executed on the database. In this entry, I would like to introduce how it works and how to use it.
Posted by Josh Berkus in pgExperts on 2015-08-28 at 22:16:00
I write a lot more PL/pgSQL than I'd like to.  Not that I don't like SQL, but as a language PL/pgSQL really shows its thrown-together origin; it's like 80's primitive.  One thing that PL/pgSQL lacks is good string manipulation tools, which is particularly tragic given that the #1 thing to do in PL/pgSQL is to generate queries from parameters and run EXECUTE.

Postgres has two built-in ways to do string substitution: concatenation and format().  Both have drawbacks.  Let me give you an example:

EXECUTE 'SELECT ' || col1 || ', ' || col2 || ' FROM ' || userschema ||
  '.accounts WHERE ' || filterclause || ' ORDER BY ' || col1 || ',' || col2;

EXECUTE format('SELECT %s, %s FROM %s.accounts WHERE %s ORDER BY %s, %s', col1, col2, userschema, filterclause, col1, col2);

You can see the problem here.  Both formats are hard to read and hard to maintain.  Python and Perl have a good fix for this: dictionary/hash-based string substitution, where you can swap in the dictionary keys for the values.  So I wrote up a quick hack to do this in PL/pgSQL.

Here's the dict-replace function.
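
The linked function itself isn’t reproduced here, but a minimal sketch of such a dict-replace helper could look like this (the name replace_vars and the jsonb parameter type are my assumptions, not necessarily the author’s exact code):

```sql
CREATE OR REPLACE FUNCTION replace_vars(query text, vardict jsonb)
RETURNS text
LANGUAGE plpgsql AS $f$
DECLARE
    var text;
BEGIN
    -- swap each ${key} in the query for its value from the dictionary
    FOR var IN SELECT jsonb_object_keys(vardict) LOOP
        query := replace(query, '${' || var || '}', vardict ->> var);
    END LOOP;
    RETURN query;
END;
$f$;
```

Note that this sketch does no quoting at all; quoting the substituted values is still up to you.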

Using it, you'd replace the above with:

EXECUTE replace_vars('SELECT ${col1}, ${col2} FROM ${userschema}.accounts
  WHERE ${filterclause} ORDER BY ${col1}, ${col2}', vardict);

Of course, you need to first set up the vardict as a JSON value, in the form:

vardict := '{ "col1" : "username", 'col2' : "branchname", ...}'

Still, much more readable, eh?  No refcounting, no repeating variables, no string breaks.  I used Bash's variable substitution syntax of ${var} because it seemed like the thing least likely to conflict with user text, unlike anything involving %.

There are some caveats, though: it's not performant, and you could probably find strings which will break it, particularly if you're swapping in JSON values.  You still have to do your own quoting of strings.  And if you have access to PL/Perl or PL/Python you don't need this nonsense.

But for the minimal case, it should help.
Posted by Shaun M. Thomas on 2015-08-28 at 17:00:23

This week we’ll be covering another method of Postgres partitioning. This is a technique I personally prefer and try to use and advocate at every opportunity. It’s designed to straddle the line between traditional partitioning and standard monolithic table structure by using table inheritance as a convenience factor. The assumption here is that end-user applications either:

  1. Know that partitioning is in use.
  2. Only load “current” data and don’t care about partitions.

These two things may seem mutually exclusive, but they can actually be complementary. There are also notable performance benefits to this approach that we’ll explore later.

The SQL for this structure is similar to what we used last week, but there are a couple of notable omissions.

TRUNCATE TABLE sensor_log;
CREATE TABLE sensor_log_part_2012 (
    CHECK (reading_date >= '2012-01-01' AND
           reading_date < '2013-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2013 (
    CHECK (reading_date >= '2013-01-01' AND
           reading_date < '2014-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2014 (
    CHECK (reading_date >= '2014-01-01' AND
           reading_date < '2015-01-01')    
) INHERITS (sensor_log);

This time, we don’t need the function that distributes the rows to the correct partition, or the associated trigger. Further, note that the 2015 partition is missing. This is possible because we’ve made two assumptions about how partitions will be used:

  1. Current data will be stored in the base table.
  2. Old or “archived” data will be moved into partitions.
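
The second assumption implies a periodic archival step. A minimal sketch of moving one year of old rows out of the base table (cutoff dates are illustrative) might be:

```sql
-- ONLY restricts the DELETE to the parent table itself,
-- leaving existing partitions untouched.
WITH moved AS (
    DELETE FROM ONLY sensor_log
     WHERE reading_date >= '2014-01-01'
       AND reading_date < '2015-01-01'
    RETURNING *
)
INSERT INTO sensor_log_part_2014
SELECT * FROM moved;
```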

To fulfill the first requirement, we made a slight change to the Python loader script introduced at the beginning of this series. We could have done this with the partitioning introduced last week as well, but while it would merely be a performance tweak for that format, this method depends on it.

# Set these to modify how large the C

[continue reading]

PostgreSQL JDBC Driver Version 9.4-1202 released

Lots of bug fixes and some awesome performance enhancements, including statement caching

Version 9.4-1201 (2015-02-25)

  • ResultSet positioning methods in some particular cases PR #296 (282536b)
Author: Craig Ringer
  • Disable binary xfer on batches returning generated keys PR #273 (763ae84)
  • Add a new test case demonstrating a bug in returning support PR #273 (4d2b046)
  • Always Describe a query in a batch that returns generated keys PR #273 (a6bd36f)
Author: Dave Cramer
  • chore: fix build.xml to allow releasing to maven PR #262 (34f9361)
  • fix: BlobInputStream ignores constructor parameters #263 PR #273 (c1c6edc)
  • don't reset forceBinary transfer if we setPreparedThreshold (937a11c)
  • Revert "perf: Remove expensive finalize methods from Statement and Connection" PR #293 (a0d3997)
  • updated copyright PR #312 (263375c)
  • Revert "Issue 250 -- Adding setURL/getURL to" PR #312 (a1ac380)
  • fixed mailing list href PR #326 (c3e86a6)
  • increment driver version PR #346 (b8ee75d)
Author: David R. Bild
  • feat: add equality support to PSQLState PR #277 (7698cd9)
  • Improve version checking PR #355 (f7a84db)
Author: Eugene Koontz
  • Add support within "private Object buildArray (PgArrayList input, int index, int count)" for array elements whose type is jsonb PR #349 (d313138)
  • Added setter method for logging level. The method exactly matches property name in documentation. PR #282 (d9595d1)
  • Added getter method. PR #282 (65759f0)
  • Adding XML catalog to help unit tests avoid remote entity resolution. PR #284 (cb87067)
  • Added support to locally resolve dtd or entity files. PR #284 (017970d)
  • Disable verbose logging of the catalog resolver. PR #284 (fcc34f5)
Author: Kris Jurka
  • Improve error message for failure to update multicolumn primary key RSs. PR #284 (05ff811)
  • Remove all JDBC3 code as JDK 1.4/1.5 are no longer supported. PR #284 (f9a956b)
  • Add preliminary support for JDBC4.2. PR #284 (bd05fd2)
Author: Lo

[continue reading]

I've been writing a little provisioning script for a Vagrant CentOS 7 machine that will be a buildfarm client.

Stripped of the vagrant stuff and some stuff that's special for the machine, here is what I did to get it running:

sudo yum install -y wget gcc make flex bison ccache git \
perl-libwww-perl perl-Digest-SHA perl-devel perl-ExtUtils-Embed \
zlib-devel openssl-devel readline-devel python-devel tcl-devel \
libxml2-devel libxslt-devel openldap-devel

mkdir bf
cd bf
# (first download the latest buildfarm client tarball as latest-client.tgz)
mv latest-client.tgz buildfarm-latest-client.tgz
tar -z --strip-components=1 -xf buildfarm-latest-client.tgz
mkdir root
sed -i "s!build_root =>.*!build_root => '$HERE/root',!" build-farm.conf
At this stage the buildfarm client will run quite successfully, as can be tested using:
perl --test

While hacking PostgreSQL it’s very useful to know the pid of the backend you are working with. You need the pid of the process to attach a debugger, profiler etc. Luckily, .psqlrc provides us an elegant way to define shortcuts for psql. Using the config line below, one can find out the backend pid just by typing :pid.

\set pid 'SELECT pg_backend_pid();'

=# :pid
 pg_backend_pid
----------------
           1234
(1 row)

In 9.6 it becomes possible to even include backend pid into psql prompt.
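
For example, a sketch of a 9.6-era prompt setting (%p is the new substitution for the backend pid; the rest of the prompt string is arbitrary):

```
\set PROMPT1 '%p %n@%/%R%# '
```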

However, it’s possible to automate more complex actions in psql. I’ve configured my psql to run gdb attached to current backend in new tab of iTerm2 just by typing :gdb.

The :gdb command selects pid of current backend and puts it to the input of pg_debug script.

\set gdb 'SELECT pg_backend_pid() \\g |pg_debug'

pg_debug extracts pid from its input and then runs OSA script which runs gdb in the new tab of iTerm2.



while read line; do
    # Extended display off
    if [[ $line =~ ^\ +([0-9]+) ]]; then
        PID=${BASH_REMATCH[1]}
    fi
    # Extended display on
    if [[ $line =~ ^pg_backend_pid.*\ ([0-9]+) ]]; then
        PID=${BASH_REMATCH[1]}
    fi
done

# Open gdb session
osascript -e "
tell application \"iTerm\"
    tell the current terminal
        set mysession to (the current session)
        launch session \"Default Session\"
        tell the last session
            write text \"gdb --pid=$PID -x <(echo continue)\"
        end tell
        select mysession
    end tell
end tell"

This script works for Mac OS X and iTerm2, but the same approach should work for other platforms and terminal emulators.

Posted by Pavan Deolasee in 2ndQuadrant on 2015-08-26 at 11:12:40

It’s been a busy few months as we work towards merging Postgres-XL with the latest and greatest release of PostgreSQL. Postgres-XL is an open source fork of PostgreSQL that provides a scalable platform for OLTP and Business Intelligence. The current release of Postgres-XL is based on PostgreSQL 9.2, so it lacks all the improvements made to PostgreSQL over the last three years.

2ndQuadrant and other companies are working on bringing distributed scalability into PostgreSQL core as well as building tools and extensions outside the core. As part of that, Postgres-XL has a number of features that we’d like to bring back into core PostgreSQL, so 2ndQuadrant has picked up the task of updating the Postgres-XL code base to the latest PostgreSQL release as the first step. After more than three months of work, and with PostgreSQL 9.5 still in alpha, we wanted to give a progress report on how the work is proceeding. I also need to say the magic words: this ongoing work on Postgres-XL is part of the AXLE project, funded by the European Union under grant agreement 318633.

Preparation for the Merge

Since PostgreSQL and Postgres-XL both use Git as their source control system, the merge process is much simpler because Git provides many tools to assist it. But as soon as we tried the merge, we faced the first hurdle.

We realised that the current Postgres-XL repository is based on an older minor 9.2 release of PostgreSQL. That means there were commits and changes in the Postgres-XL master branch which either never made it to PostgreSQL’s master branch or had different commit ids. So merging with the PostgreSQL master branch threw a lot more conflicts than we would otherwise have expected. So the first task we had to accomplish was to rebase the Postgres-XL 9.2 repository on a later commit point. This obviously required careful treading, making sure that nothing broke during the process. Once we had the basic rebase done, we also merged all Postgres-XL bug fixes and enhancements, created a Postgres-XL 9.2 stable branch and merged the 9.2 br

[continue reading]

Posted by Yann Larrivee on 2015-08-26 at 00:50:48

ConFoo is once more seeking passionate speakers for the upcoming conference.

The event is happening in Montreal, Canada, between February 24th and 26th, 2016. It is an exciting conference for web developers with speakers from all over the world. It unites many web programming languages under one roof, as well as other topics related to web development. The call for papers closes on September 20th.

ConFoo renews 50% of its speakers each year. If you’re new to this conference, you should definitely submit.

If you would just like to attend, there is a discount until October 13th.

Posted by Terry Erisman in CitusData on 2015-08-25 at 17:13:49

We are pleased to announce that the PGConf Silicon Valley breakout sessions and tutorials are now posted on the conference website. Our outstanding Conference Committee has worked through the large pool of submissions and selected the sessions which will be presented November 17-18, 2015 at the South San Francisco Conference Center. As the first dedicated PostgreSQL conference in the Bay Area, the conference is a great opportunity for PostgreSQL community members to exchange technical knowledge and make new connections. Early Bird registration prices are available through October 4, 2015.

Breakout Sessions

The breakout sessions are organized into five tracks:

  • DevOps
  • In the Trenches
  • Features
  • Hackery
  • PostgreSQL at Scale

The outstanding list of speakers includes representatives from TripAdvisor, Urban Airship, Square, Heap, Chartio, AWS, Rackspace, Joyent, and Heroku. The breakout sessions, speakers, and talk descriptions are available on the conference website.


The tutorials will be presented by leading PostgreSQL practitioners and range from beginner to advanced level. Presenters include representatives from 2ndQuadrant, EnterpriseDB, Citus Data, OmniTI, and PostgreSQL Experts. Check the conference website for talk descriptions and speaker information.

More Information Coming Soon

The schedule for the tutorials and breakout sessions will be posted to the PGConf Silicon Valley conference website in the near future. We will also post a series of interviews with conference speakers who will provide insights into their sessions and what they are looking forward to at the conference. 

Take advantage of Early Bird pricing before it ends on October 4th. I hope to see you at PGConf Silicon Valley!



ALTER TABLE has been known for many years in the Postgres ecosystem as a command that systematically takes an ACCESS EXCLUSIVE lock on the relation being modified, preventing all operations on the relation in parallel. Those locks are getting more and more relaxed, with for example in Postgres 9.4 the following commands that got improvements:

  • ALTER COLUMN SET and ALTER COLUMN RESET for attribute options

In 9.5 as well those commands have been improved:


Now, Postgres 9.6, which is currently in development, brings in more lock reduction, with the following commit:

commit: 47167b7907a802ed39b179c8780b76359468f076
author: Simon Riggs <>
date: Fri, 14 Aug 2015 14:19:28 +0100
Reduce lock levels for ALTER TABLE SET autovacuum storage options

Reduce lock levels down to ShareUpdateExclusiveLock for all
autovacuum-related relation options when setting them using ALTER TABLE.

Add infrastructure to allow varying lock levels for relation options in
later patches. Setting multiple options together uses the highest lock
level required for any option. Works for both main and toast tables.

Fabrízio Mello, reviewed by Michael Paquier, mild edit and additional
regression tests from myself

Code-wise, ALTER TABLE SET has been improved to be able to define different types of locks depending on the parameter touched, and in the case of this commit all the parameters tuning autovacuum and auto-analyze at relation level have been updated to use SHARE UPDATE EXCLUSIVE. In short, this allows read as well as write operations to occur in parallel with the ALTER TABLE, something that will definitely help limit the impact of such maintenance commands on busy relations.
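
As a concrete illustration (the table name is mine), a command like the following now only takes SHARE UPDATE EXCLUSIVE, so concurrent reads and writes on the table keep running:

```sql
ALTER TABLE my_table
    SET (autovacuum_vacuum_scale_factor = 0.05,
         autovacuum_analyze_scale_factor = 0.02);
```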

Note as well that when multiple subcommands are used, the strongest lock of the whole set is taken for the duration of the ALTER TABLE command. So for example, should an update of the parameter fillfactor be mixed with a

[continue reading]

Posted by Gulcin Yildirim in 2ndQuadrant on 2015-08-23 at 16:01:35

PgBouncer is a lightweight connection pooler for PostgreSQL.

The latest release of PgBouncer was announced on the 1st of August 2015. In this blog post we’ll talk about its major new improvements.

Main new features of PgBouncer

Load user password hash from postgres database

With this hot feature, PgBouncer can load a user’s password from the database, via two config parameters: auth_user and auth_query.

Note: These config parameters are defined in the configuration file pgbouncer.ini.

  • auth_user
    If auth_user is set, any user not specified in auth_file will be queried from pg_shadow in the database, using auth_user. auth_user’s password will be taken from auth_file. This parameter can also be set per-database.

    Note: auth_file is the name of the file to load user names and passwords from.

  • auth_query
    This parameter allows us to write a SQL query to load a user’s password from the database. It runs under auth_user. See the default query below:
    SELECT usename, passwd FROM pg_shadow WHERE usename=$1
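
Putting the two parameters together, a pgbouncer.ini fragment could look like this (the user name is illustrative; the query shown is the documented default):

```ini
[pgbouncer]
; pgbouncer connects as this user to run auth_query
auth_user = pgbouncer
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename=$1
```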

Pooling mode can be configured both per-database and per-user

With this feature, independent of the main pooling mode, clients can now connect to different databases with one of the 3 pooling modes described below. The same applies to users. For example, if the main pooling mode is session pooling, a specific user can be configured to use transaction pooling. This gives us database-level and user-level flexibility to apply more appropriate pooling options.
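
As a sketch of what this looks like in pgbouncer.ini (names and values are illustrative), a per-database override goes into the connection string, and a per-user override into the [users] section:

```ini
[databases]
; the global pool_mode stays session; this database uses transaction pooling
mydb = host=localhost dbname=mydb pool_mode=transaction

[users]
batch_user = pool_mode=statement
```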

PgBouncer provides 3 connection pooling modes:

  • Session pooling
    During the lifetime of a client connection, an existing server connection is assigned to the client and after the client disconnection, the assigned server connection is put back to the connection pool.
  • Transaction pooling
    In this mode, a server connection is not assigned to a connected client immediately: it is assigned only for the duration of a transaction. As soon as the transaction is over, the connection is put back into the pool.
  • Statement pooling
    This is similar to the transa

[continue reading]

Posted by Shaun M. Thomas on 2015-08-21 at 16:32:15

PG Phriday: Basic Partitioning

Most PGDB (PostgreSQL) users who are familiar with partitioning use the method described in the partitioning documentation. This architecture comes in a fairly standard stack:

  • One empty base table for structure.
  • At least one child table that inherits the base design.
  • A trigger to redirect inserts based on the partitioning scheme.
  • A constraint on each child table to enforce the partition scheme, and help the planner exclude child partitions from inapplicable queries.

It looks simple, but there’s a lot of SQL involved. Let’s take the sensor_log table we introduced a couple weeks ago and turn it into a partitioned structure.

TRUNCATE TABLE sensor_log;
CREATE TABLE sensor_log_part_2012 (
    CHECK (reading_date >= '2012-01-01' AND
           reading_date < '2013-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2013 (
    CHECK (reading_date >= '2013-01-01' AND
           reading_date < '2014-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2014 (
    CHECK (reading_date >= '2014-01-01' AND
           reading_date < '2015-01-01')    
) INHERITS (sensor_log);
CREATE TABLE sensor_log_part_2015 (
    CHECK (reading_date >= '2015-01-01' AND
           reading_date < '2016-01-01')    
) INHERITS (sensor_log);
CREATE OR REPLACE FUNCTION sensor_partition()
RETURNS TRIGGER AS $$
BEGIN
  CASE EXTRACT(YEAR FROM NEW.reading_date)
    WHEN 2015 THEN
      INSERT INTO sensor_log_part_2015 VALUES (NEW.*);
    WHEN 2014 THEN
      INSERT INTO sensor_log_part_2014 VALUES (NEW.*);
    WHEN 2013 THEN
      INSERT INTO sensor_log_part_2013 VALUES (NEW.*);
    WHEN 2012 THEN
      INSERT INTO sensor_log_part_2012 VALUES (NEW.*);
  END CASE;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER t_correct_partition
BEFORE INSERT ON sensor_log
   FOR EACH ROW EXECUTE PROCEDURE sensor_partition();

[continue reading]

Posted by Andreas Scherbaum on 2015-08-20 at 20:17:49
Andreas 'ads' Scherbaum

What happens if you ask the Chinese PostgreSQL community for a Meetup-like event with one or two speakers? You end up getting a full-day conference ;-)


On Saturday, September 12, a full-day PostgreSQL conference will take place at the Pivotal office in Beijing. If you want to attend, please sign up here.

The full address is: 17/F, South Block, Tower C, Raycom Info Tech Park, NO.2, Kexueyuan South Road, Beijing 100190, China.


Continue reading "PostgreSQL Day in Beijing"

At this year's PGCon, we gave a talk on pg_shard that included a futuristic pg_shard demo: a distributed table with JSONB fields, backed by a dynamically changing row and columnar store. The demo is based on the Github archive data, which comprises a log of all events across all public Github repositories with detailed metadata in JSON format.

The pg_shard extension can store very large event logs in a PostgreSQL table by transparently sharding the table across many servers. We envision a use-case where new events are added to the table in real-time and frequent look-ups occur on data less than 7 days old. Data older than 7 days is compressed inside the database using cstore_fdw. A video of the pg_shard talk and demo is available on our website and at the end of this post.

In our demo, the events table was range-partitioned by time, such that each shard contains events for a specific hour of the day. One of the advantages of range-partitioning a table by time is that older shards are read-only, which makes it easier to perform certain operations on the shards, such as replacing the shard with an equivalent cstore_fdw table. We consider this demo to be futuristic since pg_shard does not yet have functions to create a range-partitioned table. It can only be set up by manually changing the pg_shard metadata tables, for which we created several PL/pgSQL functions.

The architecture used in the demo resembles a shard assembly line, as shown below. In the first stage of the assembly line the shard is filled with data; after that the shard is frequently read by SELECT queries, and after 7 days the shard is compressed for archival. When pg_shard receives an INSERT, it compares the timestamp to the time ranges of the shards in the metadata tables to find the shard for the current hour. Assuming INSERTs only happen for the current time, there is only ever one shard to which new data is being written. At the end of the hour, INSERTs will start going into a new shard. The shard for the next hour needs to be set up in ad

[continue reading]

On 30th of July, Andrew Dunstan committed patch: Add IF NOT EXISTS processing to ALTER TABLE ADD COLUMN   Fabrízio de Royes Mello, reviewed by Payal Singh, Alvaro Herrera and Michael Paquier. Nice. I always like when there is new “IF EXISTS" or “IF NOT EXISTS", because it makes my life as dba simpler. This […]
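
For illustration (table and column names are made up), the new clause makes such DDL safe to re-run:

```sql
ALTER TABLE sensor_log ADD COLUMN IF NOT EXISTS notes text;
-- running it again raises a NOTICE instead of an error
ALTER TABLE sensor_log ADD COLUMN IF NOT EXISTS notes text;
```
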
Posted by Josh Berkus on 2015-08-18 at 23:58:57

For September, two of the founders of PipelineDB are going to be introducing this streaming PostgreSQL fork. Come learn what PipelineDB is about and how you'd use it alongside PostgreSQL. This meetup will be at a new venue, If(we), a social networking company which uses PostgreSQL together with internal databases. RSVP on meetup.

P.S. we are looking for a lightning talk for this meetup! Contact

Posted by Glyn Astill on 2015-08-18 at 18:52:44
Prior to pg 9.1, you could connect to any database with psql regardless of encoding and you’d get the server encoding as your client encoding unless you set it: That wasn’t quite right; the client_encoding is a lie. On a modern psql version that’s quite rightly prevented: This is not an issue if you want […]
Posted by Julien Rouhaud in Dalibo on 2015-08-18 at 14:03:34


The work memory, or work_mem, is one of the hardest things to configure. It can be used for various purposes. It’s mainly used when sorting data or creating hash tables, but it can also be used by set returning functions using a tuplestore, for instance the generate_series() function. And each node of a query can use this amount of memory. Set this parameter too low, and a lot of temporary files will be used; set it too high, and you may encounter errors, or even an Out Of Memory (OOM) condition depending on your OS configuration.

I’ll focus here on the amount of memory needed when sorting data, to help you understand how much memory is required when PostgreSQL runs a sort operation.
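
A quick way to see whether a sort stayed in memory or spilled to disk is EXPLAIN ANALYZE, which reports the sort method per Sort node (the table name is illustrative):

```sql
SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sensor_log ORDER BY reading_date;
-- In the Sort node, look for one of:
--   Sort Method: quicksort  Memory: ...kB
--   Sort Method: external merge  Disk: ...kB
```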

Truth is out

I often hear people say there is a correlation between the size of the temporary files generated and the amount of memory needed. That’s wrong: you can’t make any assumption about the value of work_mem based on the size of a sort’s temporary file.

That’s because when the data to be sorted doesn’t fit in the allowed memory, PostgreSQL switches to an external sort. In addition to the memory currently in use, a temporary file is used, multiple times, to avoid wasting disk space. If you want more details on this, the relevant source code is in tuplesort.c and logtape.c. As a brief introduction, the header of tuplesort.c says:

[…] This module handles sorting of heap tuples, index tuples, or single Datums (and could easily support other kinds of sortable objects, if necessary). It works efficiently for both small and large amounts of data. Small amounts are sorted in-memory using qsort(). Large amounts are sorted using temporary files and a standard external sort algorithm.

See Knuth, volume 3, for more than you want to know about the external sorting algorithm. We divide the input into sorted runs using replacement selection, in the form of a priority tree implemented as a heap (essentially his Algorithm 5.2.3H), then merge the runs using polyphase merge, Knuth’s Algorithm 5.4.2D. The logical “tapes” used by Algorithm D are imp

[continue reading]

Q: So, what is TABLESAMPLE actually good for?

A: To speed things up that can be answered from a small sample instead of the whole population!

Like so:

CREATE SEQUENCE ts_test_id_seq;
CREATE TABLE public.ts_test (
  id integer NOT NULL DEFAULT nextval('ts_test_id_seq'::regclass),
  val double precision,
  CONSTRAINT ts_test_pkey PRIMARY KEY (id)
);

Insert 100 million random values in the range 0.0 <= x < 1.0:

INSERT INTO public.ts_test (val)
SELECT random()
FROM generate_series(1, 100000000);

And now, some queries to get the standard deviation.

For the whole population:

SELECT round(stddev_pop(val)::numeric,5) FROM ts_test;

0.28869 in 15005 ms
Error: 0%
Speedup: 1x

For a sample of size 20000 (good for 1% error and 99% confidence):

SELECT round(stddev_samp(val)::numeric,5) FROM ts_test TABLESAMPLE BERNOULLI(0.02);

0.28867 in 2387 ms
Error: < 1%
Speedup: 6.3x

SELECT round(stddev_samp(val)::numeric,5) FROM ts_test TABLESAMPLE SYSTEM(0.02);

0.28791 in 25 ms
Error: < 1%
Speedup: 600x

With a table that large, both methods BERNOULLI and SYSTEM are accurate enough to give results well below the 1% error target.

(For potential problems with SYSTEM, see this post)

SYSTEM aces BERNOULLI with a speedup of 600x vs. 6.3x relative to a full population query.

And this is hard to beat...
On 17th of July, Robert Haas committed patch: Add new function pg_notification_queue_usage.   This tells you what fraction of NOTIFY's queue is currently filled.   Brendan Jurd, reviewed by Merlin Moncure and Gurjeet Singh. A few further tweaks by me. LISTEN / NOTIFY are very cool features, that don't get enough recognition. For completeness sake […]
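
A quick sketch of the new function in action (channel name and payload are made up):

```sql
LISTEN demo_channel;
NOTIFY demo_channel, 'ping';
-- fraction of the NOTIFY queue currently in use (0 when empty)
SELECT pg_notification_queue_usage();
```
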
Posted by Federico Campoli on 2015-08-17 at 21:19:00
Friday 14th August we kicked off the Brighton PostgreSQL  Meetup.
We had a nice evening with cool people, all together discussing PostgreSQL and how we can run the meetup effectively.

We decided to have a regular monthly meetup hosted around Brighton, possibly, by companies or any suitable venue.

The next meetup will be on the 25th of September, and this time there will be some PostgreSQL talks. The general interest favours standby servers and streaming replication. Another topic could be query planning and execution.
I can cover both of these topics. We just need to decide how many talks will fit in a one hour session.

After the meetup we’ll head to the nearest pub for a nice beer.

I can only say a big thank you, to all the meetup members.

That’s all for now.  
On 7th of July, Andres Freund committed patch: Add psql PROMPT variable showing the pid of the connected to backend.   The substitution for the pid is %p.   Author: Julien Rouhaud It was over a month ago, but in the mean time I was on vacation, and I wanted to test […]
Posted by Andrew Dunstan in pgExperts on 2015-08-17 at 18:19:00
Found in the last couple of days.

1. If you have more than one active database, and you're using text format logs, make sure you have %d (the database name) in your log line prefix. Not knowing which database the log line comes from can be confusing and makes life more difficult.
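
For instance, one plausible log_line_prefix for postgresql.conf that records timestamp, pid, database, and user (the exact escapes beyond %d are a matter of taste):

```
log_line_prefix = '%m [%p] db=%d user=%u '
```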

2. If you report on a possible database error, report from the Postgres logs, not from your client's logs. We need to see things from the database's point of view. In the case in point, the Postgres logs contained no relevant errors at all, i.e. this wasn't a Postgres error.

3. Found when trying to resize a data volume on Amazon: certain Linux disk partitioning utilities don't optimally align partitions by default, notably sfdisk and cfdisk. Stay away from those, or make sure you explicitly choose proper alignment for partitions. Programs that do the right thing include fdisk and parted. In general you are probably best to stick to using parted.
Posted by David Fetter in Farmers Business Network on 2015-08-17 at 16:00:49
PostgreSQL has MVCC, which is awesome until you want an approximate count on what could be a long view...until now. Here's how. Continue reading "Approximate Counts"
Posted by Hans-Juergen Schoenig in Cybertec on 2015-08-17 at 07:44:44
There are still many people out there, who cannot spell the name of their favorite database. “PostgreSQL”, “PostgresSQL”, “Postgre SQL” … the list goes on and on. Can we blame those people? Actually no. Some words are simply pretty tricky. Each of us has failed once in a while. After all database work is not […]

So I had this PostgreSQL database that was getting a bit too big, and since it was really only for analytics, I figured it would be a good fit for putting in Hadoop+Hive instead.

(For those not completely familiar with this: Hadoop is sort of a job tracker and distributed file system. Hive is an SQL-like layer on top of that. I know the cool kids are now using Spark. Maybe for another day.)

The first thing you need to learn about the Hadoop ecosystem is its idiosyncratically fragmented structure. With PostgreSQL, you basically have the community website, the community mailing lists, the community source code distribution, the community binaries, and a handful of binaries made by Linux distributions. If you search the web for a problem with PostgreSQL, you will normally get hits on one or more of: the documentation, the mailing lists, third-party mirrors of the mailing lists, or Stack Overflow. With Hadoop, you have the resources provided by the Apache Software Foundation, including the source distribution, bug tracker, documentation, and then a bunch of commercial vendors with their parallel universes, including their own mutually incompatible binary distributions, their own copy of the documentation, their own mailing lists, their own bug trackers, etc. When you search for a problem with Hadoop, you will typically get hits from three separate copies of the documentation, about eight mailing lists, fifteen tutorials, and one thousand blog posts. And about 20 unanswered posts on Stack Overflow. Different vendors also favor different technology extensions. So if, say, you read that you should use some storage method, chances are it’s not even supported in a given distribution.

The next thing to know is that any information about Hadoop that is older than about two years is obsolete. Because they keep changing everything from command names to basic architecture. Don’t even bother reading old stuff. Don’t even bother reading anything.

So Hive. The basic setup is actually fairly well documented. You set up a Hadoop clu

[continue reading]

Posted by Josh Berkus in pgExperts on 2015-08-14 at 23:14:00
Since we're still working on the schedule, I wanted to give folks a preview of the talks we've picked for pgConfSV.  We still have a few more talks to pick and the tutorials to finalize.  Regardless, it's looking like a really exciting lineup! 
We'll have folks from Heap, Wanelo, TreasureData, Rackspace, EMC, Joyent, Square, and more.   The sad part will be missing three talks in each timeslot.

Note that this set of talks is not final; we're still swapping a couple of things, and some speakers have not confirmed.

pgConfSV is November 17th and 18th in South San Francisco.  If you're going to AnsibleCon, that's the day after, so you can do both!