PostgreSQL
The world's most advanced open source database

Originating as a fix for pg_rewind, where the rewind process could fail if a file listed in the source server was removed before its data was fetched (see more details here; this could impact temporary files or relation files, for example), here is a small feature of PostgreSQL 9.5 that may be useful for application developers:

commit: cb2acb1081e13b4b27a76c6b5311115528e49c59
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Sun, 28 Jun 2015 21:35:46 +0300
Add missing_ok option to the SQL functions for reading files.

This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)

The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.

Michael Paquier, with plenty of kibitzing by me.

Postgres has a set of superuser functions that allow looking at the files in PGDATA from the SQL interface:

  • pg_ls_dir, to list the files of a given path.
  • pg_read_file, to read a given file and return its data as text.
  • pg_read_binary_file, to read a given file and return its data as bytea.
  • pg_stat_file, to get statistics about a file similarly to the system function stat().

With the above commit, all those functions have gained a missing_ok option that lets them avoid failing if the file (or path, for pg_ls_dir) selected by the user does not exist, returning a NULL result instead (or, more or less, an empty record set for pg_ls_dir). In each case, the default is false.
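
As a minimal sketch of the new calls (the file and directory names below are made up, and for pg_ls_dir the third argument is include_dot_dirs):

=# SELECT pg_read_file('not_there.txt', 0, 64, true);         -- NULL instead of an error
=# SELECT pg_read_binary_file('not_there.txt', 0, 64, true);  -- NULL instead of an error
=# SELECT pg_ls_dir('not_there_dir', true, false);            -- empty result set instead of an error
=# SELECT pg_stat_file('not_there.txt', true);                -- NULL instead of an error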

In the case of pg_read_file and pg_read_binary_file, this new boolean flag is available with a new 4-argument version of those functions (for backward-compatibility purposes).

=# SELEC

[continue reading]

Posted by Barry Jones on 2015-07-03 at 15:04:23
PostgreSQL is becoming the relational database of choice for web development for a whole host of good reasons. That means that development teams have to make a decision on whether to host their own or use a database as a service provider. The two biggest players in the world of PostgreSQL are Heroku PostgreSQL and Amazon RDS for PostgreSQL. Here's a detailed comparison.
Posted by Christoph Berg in credativ on 2015-07-02 at 18:03:35

Today saw the release of PostgreSQL 9.5 Alpha 1. Packages for all supported Debian and Ubuntu releases are available on apt.postgresql.org:

deb http://apt.postgresql.org/pub/repos/apt/ YOUR_RELEASE_HERE-pgdg main 9.5

The package is also waiting in NEW to be accepted for Debian experimental.

Being curious which PostgreSQL releases have been in use over time, I pulled some graphics from Debian's popularity contest data:

Before we included the PostgreSQL major version in the package name, "postgresql" contained the server, so that line represents the installation count of the pre-7.4 releases at the left end of the graph.

Interestingly, 7.4 reached its installation peak well past 8.1's. Has anyone an idea why that happened?

In the beginning

There was Unix, Linux and Windows. They all run on hardware and that hardware all has bugs. What is the best way to work around hardware bugs? Backups.

You haven't had bad hardware, only bad developers? That's o.k., we have a solution for them too. It is called backups.

You haven't had bad hardware or bad developers, just bosses who still demand to have direct access to the data even though they haven't proven an ability to extract useful information without an extreme amount of hand holding? That's o.k., we have a solution for them too. It is called backups.

You haven't had any of the above? Lucky you! Can we move to your country or will you share whatever it is that you are putting in your tea? It sounds fantastic.

Either way, we have a solution for your data confidence: it is called backups, and that is what this training is about.

See more here!

Register here!

Yay, the first alpha of PostgreSQL 9.5 is out!  This has lots of cool stuff in it, which you can read about elsewhere.  What I'm posting here is a new experiment: offering up Docker images for testing the alphas and betas.

TL;DR: image here, information and instructions on the wiki.

This occurred to me last week at DockerCon (naturally enough); one of the reasons more people don't test the PostgreSQL beta releases is that it can be a pain to install them, especially with all of the stuff you want to test.  So I've just put a couple days' work into making it easier for you: all you gotta do is install the image and you can test away.

And, since you're testing in a container, you don't have to worry about messing up the PostgreSQL installation on the desktop/laptop/server you normally use.  If you feel like doing your testing in a public cloud, many of them have ways to run containers as cloud servers, so that's easy to do.

I created this image crammed full of stuff: all of contrib, jsquery, pgxn, PL/python and PL/perl, a sample database, etc., so that you'd have something to test.  I wasn't able to include PL/Lua due to build issues, nor PostGIS because I'm building from source, and building PostGIS from source is kinda scary.  If you want to help me build a better test image, though, the Dockerfile project for it is here.

In the meantime, I've removed another excuse for not testing PostgreSQL 9.5.  So what are you waiting for?  Test already!
Posted by Julien Rouhaud in Dalibo on 2015-07-02 at 10:08:03

After missing this feature for so long, HypoPG implements hypothetical index support for PostgreSQL, available as an extension.

Introduction

It’s now been some time since the second version of PoWA was announced. One of the new features of this version is the pg_qualstats extension, written by Ronan Dunklau.

Thanks to this extension, we can now gather real-time statistics to detect missing indexes, and much more (if you’re interested in this extension, you should read Ronan’s article about pg_qualstats). And used with PoWA, you have an interface that allows you to find the most resource-consuming queries and will suggest the missing indexes if they’re needed.

That’s really nice, but now a lot of people come up with this natural question: OK, you say that I should create this index, but will PostgreSQL actually use it? That’s a good question, because depending on many parameters (among other things), PostgreSQL could choose to just ignore your freshly created index. That could be a really bad surprise, especially if you had to wait many hours for it to be built.

Hypothetical Indexes

So yes, the answer to this question is hypothetical index support. That’s really not a new idea; a lot of popular RDBMSs support them.

There was already some previous work on this several years ago, presented at pgCon 2010, which implemented much more than hypothetical indexes, but it was research work, which means that we never saw those features show up in PostgreSQL. This great work is only available as a fork of a few specific PostgreSQL versions, the most recent being 9.0.1.

A lightweight implementation: HypoPG

I took quite a different approach in HypoPG to implementing hypothetical index support.

  • first of all, it must be completely pluggable. It’s available as an extension and can be used (for now) on any 9.2 or higher PostgreSQL server.
  • it must be as non-intrusive as possible. It’s usable as soon as you create the extension, without a restart. Also, each backend has its own set of hypothetical inde
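
The excerpt stops here, but as a rough sketch of the idea (function names follow the HypoPG documentation; the table and column are made up), basic usage looks like this:

CREATE EXTENSION hypopg;

-- declare a hypothetical index: nothing is built on disk
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (user_id)');

-- a plain EXPLAIN (without ANALYZE) can now consider the hypothetical index
EXPLAIN SELECT * FROM orders WHERE user_id = 42;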

[continue reading]

Posted by Greg Sabino Mullane in End Point on 2015-07-01 at 18:22:00

Back in the old days, upgrading Postgres required doing a pg_dump and loading the resulting logical SQL into the new database. This could be a very slow, very painful process, requiring a lot of downtime. While there were other solutions (such as Bucardo) that allowed little (or even zero) downtime, setting them up was a large complex task. Enter the pg_upgrade program, which attempts to upgrade a cluster with minimal downtime. Just how fast is it? I grew tired of answering this question from clients with vague answers such as "it depends" and "really, really fast" and decided to generate some data for ballpark answers.

Spoiler: it's either about 3.5 times as fast as pg_dump, or insanely fast at a flat 15 seconds or so. Before going further, let's discuss the methodology used.

I used the venerable pgbench program to generate some sample tables and data, and then upgraded the resulting database, going from Postgres version 9.3 to 9.4. The pgbench program comes with Postgres, and simply requires an --initialize argument to create the test tables. There is also a --scale argument you can provide to increase the amount of initial data - each increment increases the number of rows in the largest table, pgbench_accounts, by one hundred thousand rows. Here are the scale runs I did, along with the number of rows and overall database size for each level:

Effect of --scale

--scale   Rows in pgbench_accounts   Database size
100       10,000,000                 1418 MB
150       15,000,000                 2123 MB
200       20,000,000                 2829 MB
250       25,000,000                 3535 MB
300       30,000,000                 4241 MB
350       35,000,000                 4947 MB
400       40,000,000                 5652 MB
450       45,000,000                 6358 MB
500       50,000,000                 7064 MB
550       55,000,000                 7770 MB
600       60,000,000                 8476 MB

To test the speed of the pg_dump program, I used this simple command:

$ pg_dump postgres | psql postgres -q -p 5433 -f -

I did make one important optimization, which was to set fsync off on the target database (version 9.4). Although this setting should never be turned off in production - or anytime y

[continue reading]

PostGIS 2.2 is planned to reach feature freeze June 30th, 2015, so we can make the September PostgreSQL 9.5 curtain call with confidence. There are great KNN enhancements for PostgreSQL 9.5-only users. I've been busy getting all my ducks lined up: a lot on the tiger geocoder and address standardizer extension to be shipped with the Windows builds, a story for later. One other feature we plan to ship with the Windows PostGIS 2.2 builds is the ogr_fdw foreign data wrapper extension. I've been nagging Paul Ramsey a lot about issues with it, this one in particular: https://github.com/pramsey/pgsql-ogr-fdw/issues/25, and after some prodding, he finally put his nose in and fixed them and pinged Even Rouault for some help on a GDAL-specific item.

Needless to say, I've been super happy with the progress and support I've gotten with ogr_fdw development, and I'm really enjoying my ogr_fdw use. Reading an XLSX file that was saved after the connection was opened required a fix in the GDAL 2.0 branch (which missed the GDAL 2.0.0 release), so this new package contains a GDAL 2.0.1-ish library. Hopefully GDAL 2.0.1 will be out before PostGIS 2.2.0 comes out so I can release this fix without guilt.


Continue reading "PostgreSQL OGR FDW update and PostGIS 2.2 news"

jsonb is coming up with a set of new features in Postgres 9.5. Most of them have been introduced by the following commit:

commit: c6947010ceb42143d9f047c65c1eac2b38928ab7
author: Andrew Dunstan <andrew@dunslane.net>
date: Tue, 12 May 2015 15:52:45 -0400
Additional functions and operators for jsonb

jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.

Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.

Note that some slight modifications have been made after this commit, though. So the list of new operators and functions presented here is not exactly the one listed in this commit log, but the one that will be included in Postgres 9.5 Alpha 1, to be released next week. Also worth mentioning is that a portion of this work is available as the extension jsonbx, which is compatible even with 9.4 installations (see here).

So, 4 new operators have been added to the existing jsonb set in 9.5.

jsonb || jsonb concatenates two jsonb values, and two things can be noted here. First, keys are ordered by name (this is not surprising, as the on-disk format is a parsed tree). Second, when the same key appears on both sides, the last value is kept, as jsonb enforces key uniqueness; this is of course true even if the values are of json type.

=# SELECT '{"a1":"v1","a3":"v3"}'::jsonb || '{"a2":"v2"}'::jsonb AS field;
                 field
--------------------------------------
 {"a1": "v1", "a2": "v2", "a3": "v3"}
(1 row)
=# SELECT '{"a1":{"b1":"y1","b2":"y2"},"a2":"v2"}'::jsonb ||
          '{"a1":"v1"}'::jsonb AS field;
          field

[continue reading]

Posted by Josh Berkus in pgExperts on 2015-06-26 at 21:36:00
It's boomtime in San Francisco, which means we're also full swing into The Cult of the NDA.  This includes many of our venues for SFPUG meetups; they require signing a confidentiality disclaimer before entering their office.  While I question the utility of this, since these hosts are providing us with free space, food, and drink, I'm really not in a position to argue.  So I launched a survey a month ago to see how much of a problem this is for our members.  I thought it might be useful to share the results with other PUG leaders so that they can also develop policies around this.

Here's the results. First, let me give you the overall results in a pie chart.  Scroll down to the bottom of my post for my suggested policy conclusions.


Incidentally, these are all graphed using iPython Notebook and Pylab, which is an awesome way to do one-off graphs.  Here's the code for that graph:

    %matplotlib inline
    import psycopg2
    from pylab import *

    conn=psycopg2.connect('dbname=sfpug host=127.0.0.1')
    cur = conn.cursor()
    cur.execute("""SELECT att, count(*) as members
        FROM ndasurvey GROUP BY att ORDER BY att""");

    labels = []
    fracs = []
    explode = []
    for rec in cur:
        labels.append(rec[0])
        fracs.append(rec[1])
        explode.append(0.05)

    figure(1, figsize=(6,6))
    pie(fracs, explode=explode, labels=labels,
                    autopct='%1.0f%%', shadow=True, startangle=90)
    title('Attitudes Towards Venue NDAs: Overall')
    show()


So overall we have a somewhat split distribution.  BTW, here's the definitions of the attitudes:

  • won't attend: I won't go to a meetup which requires signing
  • needs review: I/my employer must review the agreement first
  • depends on text: depends on what agreement says
  • go anyway: I don't like them, but I'll still go
  • don't care: I don't care, whatever
Does this differ across our three divisions (San Francisco, East Bay, and South Bay)?

So, East Bay attendees don't seem to care in general, and South Bay attendees are m

[continue reading]

Posted by Josh Berkus on 2015-06-26 at 18:25:22

main-image

In the wake of DockerCon, the July meetup in San Francisco will be all about Linux Containers:

  • Chris Winslett of Compose.io will present Governor, an auto-failover system for managing a cluster of PostgreSQL replicas.
  • Madhuri Yechuri of ClusterHQ will explain Flocker, which uses the brand-new plugin interface for Docker in order to support volume management and data migration for containers.

If you're not familiar with all of this containerish stuff, Josh Berkus will do a 5-minute primer on using Postgres with Docker at the beginning of the session. Hosted by Salesforce.com in their new Rincon Cafe.

(note: Salesforce will require attendees to RSVP with their full names, and sign a confidentiality statement, which will be supplied to attendees in advance)

As a database, PGDB (PostgreSQL) is fairly standard in its use of SQL. Developers of all colors, however, might have trouble switching gears and thinking in set operations, since so many language constructs focus on conditionals and looping. Last week in the performance pitfalls series, we discussed a bit of Set Theory, and how ignorance of its implications can be disastrous. But what about the more mundane?

What happens, for instance, when we treat a database like a programming language?

This time, the example tables will emulate a pretty terrible ordering system with a user table of 1000 users, and a million orders distributed over roughly the last three years. Here it is, in all its glory:

CREATE TABLE sys_user
(
    user_id      SERIAL       NOT NULL,
    username     VARCHAR      NOT NULL,
    password     VARCHAR      NOT NULL,
    last_order   TIMESTAMPTZ  NULL,
    created_dt   TIMESTAMPTZ  NOT NULL DEFAULT now(),
    modified_dt  TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_user (username, password, created_dt, modified_dt)
SELECT 'user' || a.id,
       md5('use-bcrypt-instead' || 'user' || a.id || 'somepassword'),
       now() - (a.id % 1000 || 'd')::INTERVAL,
       now() - (a.id % 100 || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id);
 
ALTER TABLE sys_user ADD CONSTRAINT pk_user_id
      PRIMARY KEY (user_id);
 
ANALYZE sys_user;
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    user_id      INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, user_id, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1, (a.id % 1000) + 1,
       now() - (a.id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CON

[continue reading]

I am back from PGCon 2015 and was fortunate enough to present my first paper, on "Native Compilation" technology. I also got the opportunity to meet many of the most prominent folks of the PostgreSQL community, and found everyone to be very polite and easy to get along with.

As part of this Native Compilation technology, I mostly focused on the native compilation of relations, which we call Schema Binding.

Some of the details from the presentation are below (for the complete presentation, please visit Go Faster With Native Compilation):

Native Compilation:
Native Compilation is a methodology for reducing CPU instructions by executing only the instructions specific to a given query/object, unlike interpreted execution. The steps are:

  • Generate C-code specific to objects/query.
  • Compile C-code to generate DLL and load with server executable.
  • Call specialized function instead of generalized function.
Schema Binding:
Native compilation of a relation is called Schema Binding. Since most of the properties of a relation remain the same once it is created, its data is stored and accessed in the same pattern regardless of the data it holds. So instead of accessing tuples for the relation in a generic way, we create a specialized access function for the relation when it is created, and that function is then used for any further query involving that table.


It gives a performance improvement of up to 30% on the standard TPC-H benchmark.


The nseq datatype allows storing DNA and RNA sequences, consisting of the letters AGCT or AGCU respectively, in PostgreSQL.

By encoding four bases per byte, it uses 75 percent less space on disk than text. While this idea is obvious and far from novel, it is still one of the most efficient compression schemes for DNA/RNA, if not the most efficient.

As of now, nseq only supports very basic native operations. The main shortcoming is that it has to be decompressed into text, expanding by 4x, for substring operations and the like.

This will change.

Enough said - here it is...
A short blog on enabling the writeable mongo_fdw extension in PostgreSQL 9.4. PostgreSQL provides a powerful feature called Foreign Data Wrappers (FDW), which allows DBAs to connect to other data sources from within PostgreSQL. The Foreign Data Wrapper implementation is based on SQL/MED, supported from PostgreSQL 9.1 onwards, which means we can now access remote databases through PostgreSQL seamlessly. Today we have a variety of FDWs available; in this blog, we will be compiling the latest version of the writeable FDW "mongo_fdw" to access MongoDB.

The latest mongo_fdw extension is based on mongo-c-driver and libbson. To implement mongo_fdw, we first need to compile all of the dependencies required by the extension. Below is the step-by-step execution on my CentOS 7 (64-bit) machine with PostgreSQL 9.4 installed.

Step 1. First, install the dependency packages required by mongo-c-driver and libbson.
yum install git automake autoconf libtool gcc
Step 2. Clone mongo_fdw repository from Github.
git clone https://github.com/EnterpriseDB/mongo_fdw.git
Step 3. Pre-compilation requires pkgconfig/pkg-config (installed in Step 1) and the PostgreSQL pg_config location set in the PATH.
[root@localhost ~]# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
[root@localhost ~]# export PATH=/opt/PostgreSQL/9.4/bin:$PATH

[root@localhost mongo_fdw]# type pg_config
pg_config is /opt/PostgreSQL/9.4/bin/pg_config
Step 4. mongo_fdw compilation can be done manually or with the help of the auto-compilation script (autogen.sh) provided in the bundle. Here, I will be using the auto-compilation script, which will download and install the required mongo-c-driver and libbson libraries in the default location (/usr/local/lib). For more details on the compilation script, refer to the documentation here.
cd mongo_fdw/
./autogen.sh --with-master
make
make install
After compilation, we can see the files created in the PostgreSQL home directory.
-bash-4.2$ find $PWD -name "mongo*"
/opt/PostgreSQL/9.4/lib/postgresql/mongo_fdw.so
/opt/PostgreSQL/9.4/share/postgresql/exte
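
The excerpt ends here; for context, a typical first use of the wrapper after installation (following the mongo_fdw documentation; the server name, address, and user are made up) would be along these lines:

CREATE EXTENSION mongo_fdw;

CREATE SERVER mongo_server
  FOREIGN DATA WRAPPER mongo_fdw
  OPTIONS (address '127.0.0.1', port '27017');

CREATE USER MAPPING FOR postgres
  SERVER mongo_server;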

[continue reading]

Posted by Shaun M. Thomas on 2015-06-23 at 20:30:24

Well, I’ve returned from PGCon 2015 in Canada, and after a couple days to decompress, it’s time to share. I wrote about the PGCon 2014 unconference after returning to Chicago last year, so I felt it was only fitting that I start there. I feel as strongly now as I did a year ago, that directly interacting with the PostgreSQL maintainers at this level helps the community thrive. Even though PGCon is generally a more developer-focused conference, being able to brainstorm with the bigwigs, even if nothing comes of it, means the ideas have been given a fair shake.

The format this year was quite a bit different than last year. I attribute this to Josh Berkus once again, in what I see as his attempt to formalize the process and give it more weight. Indeed, it’s hard to argue with his results. Just take a look at the 2015 Unconference Wiki page. It’s a deluge of information I wish we had about the 2014 talks, from attendees and talk summaries, to relevant external links and the all-important schedule. I’m a bit biased in that regard because I tend to produce and consume vast swaths of excessive information, but it’s an excellent reflection on how much the PostgreSQL community values documentation in general.

Unfortunately due to inclement weather, I missed the voting process and the first day of talks entirely. I desperately missed watching the talk selection process, though Josh said they did a lot of that electronically because several people would be late to the conference. I’m not sure how I missed that, so I’ll blame email; it deserves it anyway. Regardless, after witnessing the 2014 talk selection, I stand by my earlier assertion that it’s a sight to behold. It warms my crooked old heart to watch people so excited about technology and the amicable interaction they have with the friendly developers.

Despite the setbacks, I did attend several chats on Wednesday, which is another departure from last year’s format. In 2014, the selection process and every talk were constrained to one long Saturday and was very ad-ho

[continue reading]

Posted by Bruce Momjian in EnterpriseDB on 2015-06-20 at 18:07:50

This nice photo was taken during this year's PgCon Developer Meeting. My family was surprised at how many other developers they have met over the years. A signed copy of the photo was sold for USD ~$500 at the conference's charity auction.

The meeting was very productive, and in fact the entire conference has been productive.

Here are my slides from pgCon. First, my slides and Dmitry Dolgov's slides from the presentation on 9.5 Json Features, and also those from my lightning talk on the Rotfang Foreign Data Wrapper.
Posted by Andrew Dunstan in pgExperts on 2015-06-20 at 01:54:00
The other day I gave a lightning talk at pgCon about a Foreign Data Wrapper called Rotfang, which is for generating arbitrary amounts of random data. This is intended for generating test cases. The software supports a small number of data types natively, but also allows you to use user-supplied functions to generate data. It's available on bitbucket. A short example:
CREATE EXTENSION rotfang_fdw;

CREATE FUNCTION random_tstz (typmod int)
RETURNS timestamptz
LANGUAGE SQL as
$$
SELECT now() - (random() * 20.0) * interval '1 year'
$$;

CREATE FOREIGN TABLE rand2 (b boolean, ts timestamptz)
SERVER rotfang
OPTIONS (maxrows '10', func_ts 'random_tstz');

SELECT * FROM rand2;

Functions are great. Having cut my teeth on a database that didn’t even provide the ability to define functions, I’ve come to almost take them for granted in PGDB (PostgreSQL). However, with this kind of ubiquity, sometimes they can be overused in ways that don’t seem to be part of the common programmer lexicon. In this week’s PG Phriday series on performance-killing missteps, I’m going to talk a bit about set theory, and how a certain amount of familiarity is necessary to properly interact with a database.

One of the axioms of set theory states that f(x) = y, and x !~ y. That is, a function applied to some value produces a value that may not be equivalent to the original. Put another way, a spayed pet does not have the same reproductive capacity as a regular pet, and is thus not equal to the original. In the context of a database, this is extremely relevant because equivalence is what makes calculations and restrictions possible.

Given this, consider what a database index does. When creating an index, the database takes the value of one or more columns and essentially builds a pointer tree to quickly locate a record in logarithmic time. This is much faster than reading every record in a table and applying a filter to remove unwanted records. That’s something even novice developers tend to know. But they don’t know how functions modify this scenario.

Given the very simplified introduction to set theory, applying a function to the column value means the database must discard equality. Remember: x !~ y. The database has indexed x, not y. So when a function is used in a WHERE clause, any index on x will be ignored. This makes perfect sense, considering there are an infinite number of possible functions, and the output of each is indeterminate. It’s not possible to predict the end result of every possible function.
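
As a quick illustration (the table, column, and index names here are only illustrative):

-- An index on the raw column value:
CREATE INDEX idx_user_username ON sys_user (username);

-- Wrapping the column in a function discards that equality, so the index above is ignored:
SELECT * FROM sys_user WHERE lower(username) = 'user42';

-- An expression index on the same function makes the predicate indexable again:
CREATE INDEX idx_user_username_lower ON sys_user (lower(username));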

To further illustrate this, we’ll reuse one of our tried-and-true examples:

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT        

[continue reading]

After seeing an email on pgsql-general from a user who wanted to archive WAL from a standby, storing the segments locally and saving bandwidth by receiving them only through the WAL stream, let's talk about a new feature of Postgres 9.5 that introduces exactly what this user was looking for, namely the ability to archive WAL from a standby and build more complicated archiving strategies. This feature has been introduced by this commit:

commit: ffd37740ee6fcd434416ec0c5461f7040e0a11de
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Fri, 15 May 2015 18:55:24 +0300
Add archive_mode='always' option.

In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.

As mentioned in the commit message, setting archive_mode = 'always' will make a standby receiving WAL from a primary server archive each segment once its reception is complete. While it can be interesting even for a set of nodes running on the same host to have each of them archive WAL segments independently to different partitions, this becomes more interesting when the nodes are on separate hosts: for example, it can reduce bandwidth usage, since the bandwidth needed to archive the WAL segments on the standby host is already included in the WAL stream that the standby gets from its root node, saving resources at the same time.

Let's have a look at how this actually works with a simple set of nodes, one master and one standby running on the same host, listening respectively to ports 5432 and 5433 for example. Each node runs the following archiving configuration:

$ psql -At -c 'show archive_command' -p 5432
cp -i %p /path/to/archive/5432/%f.master
$ psql -At -c 'show archive_command' -p 5433
cp -i %p /path/to/archive/5432/%f.standby
$ psql -At -c 'show archive_mode' -p 5432
always
$ psql -At -c 'show archive_mode' -p 5433
always

So with that, both the standby and its primary node will archive their WAL segments once they are cons
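
Although the excerpt cuts off here, one simple way to observe the effect (this is an assumption, not part of the post) is to force a WAL segment switch on the primary and then check that the completed segment shows up in both archive directories:

=# SELECT pg_switch_xlog();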

[continue reading]

Posted by Denish Patel in OmniTI on 2015-06-18 at 19:02:59
Postgres 9.4 introduced an awesome feature: Replication Slots. This allows you to implement Postgres replication without using any external archive management tools. Yesterday, I presented a three-hour-long tutorial on “Out of the box Postgres 9.4 Replication using Replication slots” at PgCon (the Postgres conference) in Ottawa, Canada. If you want to follow along with the slides and VM, you … Continue reading Postgres Replication using Replication Slots
Posted by Andrew Dunstan in pgExperts on 2015-06-14 at 15:44:00
Yesterday I proposed an Unconference talk about Set types, and I've had a couple of people ask me about what Set types are and what they would be for, so here is a brief rundown.

Say you have a table of things that you want to be able to apply some set of tags to. It might be blog posts, for example. The tags might be subject classifications, or reader ratings. Let's say for the sake of argument that it's going to be subject classifications, and that we actually have a (large) enum type to standardize that.

The classic way to do this is to have a table of {blog_post_id, tag} which is unique on the combination. Up to now adding a tag is a nuisance - you need to make sure the tag doesn't already exist or you'll get a uniqueness violation. In 9.5 that's improved with INSERT ... ON CONFLICT IGNORE. Even then it's a bit more work than I really want to do. What I really want is something like this:
UPDATE blog_posts
SET subject_tags = subject_tags + 'my_new_subject'
WHERE post_id = 12345
and then I'll be able to do
SELECT * FROM blog_posts
WHERE subject_tags ? 'some subject'
It's also possible to do this, somewhat inefficiently, by using hstore or jsonb fields (in jsonb you'd just have a top level object). In both cases you would use dummy values - say make everything have a value of 1. But that's ugly, and error prone, and rather hackish.
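
A rough sketch of the hstore variant described above (assuming the hstore extension is installed and subject_tags is an hstore column, with a dummy value of 1 for every tag):

UPDATE blog_posts
SET subject_tags = subject_tags || hstore('my_new_subject', '1')
WHERE post_id = 12345

SELECT * FROM blog_posts
WHERE subject_tags ? 'some subject'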

Now I don't know if there is enough extra value here to justify the work and code maintenance involved. I've managed to live without it all these years without attacking what little hair I have left. But it's something where I have occasionally thought "I could do this so much more simply and elegantly if I had a set type." So I'd like to see if there is enough interest to make it into a real project.
You may have noticed that we've had a lot of churn in PostgreSQL update releases lately, doing three updates in less than two months.  We expect that we're done with that now, so it's time to get on updating all of your servers with today's update release.

Users of PostgreSQL 9.3 and 9.4 should update this weekend if possible.  This cumulative update fixes several problems with our "multixact" mechanism which have been plaguing Postgres since the release of 9.3.0.  While most users will not hit these issues, if you do, the bugs cause unrecoverable data corruption.  In other words, waiting "until you see a problem" is a really, really bad idea.

Additionally, one of the problems fixed requires cleaning up some garbage left by the version of pg_upgrade which shipped with Postgres versions 9.3.0 to 9.3.4.  If you used pg_upgrade to upgrade to one of those versions -- even if you subsequently upgraded to 9.4 -- then the server is liable to trigger a database-wide all-table autovacuum as soon as you restart after applying 9.3.9 or 9.4.2. 

If you are prepared for minutes to hours of degraded performance after applying the update, this is not a problem.  If it is a problem, though, you can do precautionary vacuuming before you apply the update.  Detailed instructions are supplied in the release notes on how to do this.

Now, go update those servers!
Posted by Bruce Momjian in EnterpriseDB on 2015-06-12 at 18:45:01

The draft Postgres 9.5 release notes are now online. This document will be adjusted regularly until the final release, which is expected later this year.

There are a lot of database engines out there. As such, a developer or DBA will naturally have varying levels of experience with each, and some of this might conflict with how PGDB (PostgreSQL) operates. These kinds of embedded misunderstandings can cause potential issues by themselves, but in this particular case, corrective action is fairly simple.

So this week, I’d like to talk about indexes. Many people treat them as a “make query faster” button, and this often results in egregious misuse. Indexes take space, require CPU resources to maintain, and bring overhead that adversely affects INSERT and UPDATE performance. Most know these drawbacks and generally acknowledge that too many indexes can be detrimental. However, with some databases, even column order can make a drastic difference. PGDB is among these.

To illustrate the point, let’s make a simple test case of one-million records in an inventory tracking table of one thousand products over the course of about three years.

CREATE TABLE sys_inventory_snapshot
(
    product_id   SERIAL       NOT NULL,
    record_dt    TIMESTAMPTZ  NOT NULL,
    item_count   INT          NOT NULL,
    order_count  INT          NOT NULL
);
 
INSERT INTO sys_inventory_snapshot (
       product_id, item_count, order_count, record_dt
)
SELECT b.id, a.id, a.id,
       now() - (a.id || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id),
       generate_series(1, 1000) b(id);
 
ALTER TABLE sys_inventory_snapshot ADD CONSTRAINT pk_inventory_snapshot
      PRIMARY KEY (product_id, record_dt);

At first glance, this looks pretty good. If we want information based on date or product, it’s right there. Things are even better if we have both values available! The query plan is also encouraging:

EXPLAIN ANALYZE
SELECT *
  FROM sys_inventory_snapshot
 WHERE record_dt >= CURRENT_DATE - INTERVAL '1 day';
 
                            QUERY PLAN                             
--------------------------------------------------------------------
 Bitmap Heap Scan ON sys_inventory_snap

[continue reading]

As the title suggests, if you are running Postgres in your environment, it is very important that you plan to upgrade Postgres to the latest announced release.

I upgraded Postgres in the last couple of weeks. Should I upgrade again? Why?

Yes, you should plan to upgrade again.

Postgres released a data corruption and security bug fix release on May 22nd and a follow-up release on June 4th to fix some of the issues introduced in the previous release. However, the June 4th release did not fix the multixact wraparound bugs (Thread1, Thread2) introduced in recent versions. The wraparound bug could cause problems starting Postgres after a crash, so it is critical to apply. The wraparound bug is more critical for 9.3 & 9.4, but it is a very good idea to upgrade other versions as well to make sure there isn’t any problem. Read the release notes for further details/explanation.

Special care should be taken regarding vacuuming if you ever ran Postgres version 9.3 and used pg_upgrade for production databases.

What if I run Postgres 8.X (8.1, 8.2, 8.3, 8.4)?

Yes, absolutely!! You are running an EOL or unsupported version of Postgres. It is very important that you upgrade your Postgres database to a supported release, because it is very likely to hit some or all of the security and/or data loss bugs fixed in recent Postgres releases.

What if I run Postgres 9.X but have not applied the latest minor release?

Yes, you should upgrade to the latest minor release (the 3rd digit in the version numbering). The minor release only requires installing new binaries and restarting the database. It does _not_ require pg_dump/restore or pg_upgrade.

You should be upgrading to the following latest relevant Postgres release for your environment ASAP:

  • 9.4.4
  • 9.3.9
  • 9.2.13
  • 9.1.18
  • 9.0.22
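
A quick way to confirm which release a server is actually running, before and after applying the minor update:

SELECT version();
-- or, for just the version number:
SHOW server_version;
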
How can I keep track of the supported Postgres releases?

Postgres release support information can be found here. It is worth noting that, if you are running Postgres 9.0, you should plan for a major version upgrade (most likely to the latest Postgres 9.4.X release) before the holiday

[continue reading]

Temporary tables are a core feature of SQL and are commonly used by people around the globe. PostgreSQL provides a nice implementation of temporary tables, which has served me well over the years. An interesting question arises when using a temporary table: What if a temporary table has the same name as a “real” table? What […]
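
As a minimal sketch of the situation that post asks about (the table name is made up): a temporary table with the same name shadows the permanent one, because the pg_temp schema is searched first by default.

CREATE TABLE t_demo (id int);
INSERT INTO t_demo VALUES (1);

CREATE TEMPORARY TABLE t_demo (id int);  -- same name as the permanent table

SELECT count(*) FROM t_demo;             -- 0: the temporary table is found first
SELECT count(*) FROM public.t_demo;      -- 1: the permanent table is still reachable
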
Posted by gabrielle roth on 2015-06-12 at 00:00:00

When: 6-8pm Thursday June 18, 2015
Where: Iovation
Who: Mark Wong
What: Pg Performance

Come have a chat with our local performance expert, Mark Wong.

This presentation will take a look at the performance changes in the development version of PostgreSQL. A selection of features, some proposed and some already committed, are evaluated using synthetic open source workloads. One of the workloads used is the former OSDL’s Database Test 3, which consists of business-oriented ad-hoc queries with concurrent data modification.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.


See you there!


PG Partition Manager has been the most popular project I’ve ever done a significant amount of work on and I really appreciate everyone’s feedback for the roughly 2 years it’s been out there. I’ve got plenty more ideas for development and features and look forward to being able to move forward on them with this new major version released.

PostgreSQL 9.3 introduced the ability to have user-created, programmable background workers (BGWs). 9.4 then introduced the ability to dynamically start & stop these within an already running cluster. The first thing that popped into my mind when I heard about this was hopefully having some sort of built-in scheduling system. There still hasn’t been a generalized version of anything like this, so in the meantime I studied the worker_spi contrib module. This is a very simple BGW example with a basic scheduler that runs a process at a configurable interval. This is basically all pg_partman needs for partition maintenance, and what required an external scheduler like cron before.

pg_partman_bgw.dbname
    * Required. The database(s) that `run_maintenance()` will run on. If more than one, use a comma separated list. If not set, BGW will do nothing.

pg_partman_bgw.interval
    * Number of seconds between calls to `run_maintenance()`. Default is 3600 (1 hour).
    * See further documentation on suggested values for this based on partition types & intervals used.

pg_partman_bgw.role
    * The role that `run_maintenance()` will run as. Default is "postgres". Only a single role name is allowed.

pg_partman_bgw.analyze
    * Same purpose as the p_analyze argument to `run_maintenance()`. Set to 'on' for TRUE. Set to 'off' for FALSE. Default is 'on'.

pg_partman_bgw.jobmon
    * Same purpose as the p_jobmon argument to `run_maintenance()`. Set to 'on' for TRUE. Set to 'off' for FALSE. Default is 'on'.
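
As a minimal sketch, a postgresql.conf using the background worker might contain something like the following (the shared_preload_libraries entry and the values are assumptions based on the pg_partman documentation, not taken from this excerpt):

shared_preload_libraries = 'pg_partman_bgw'   # loading the worker requires a restart
pg_partman_bgw.interval = 3600
pg_partman_bgw.role = 'postgres'
pg_partman_bgw.dbname = 'mydb'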

The above are the new postgresql.conf options that pg_partman v2.0.0 can use to control the background worker it now comes with. These options can be changed at any time with a simple reload, but

[continue reading]