PostgreSQL
The world's most advanced open source database
Posted by Martín Marqués in 2ndQuadrant on 2016-09-30 at 11:57


PostgreSQL 9.6 has just been released, and most Postgres users will start asking themselves how to upgrade to the new major version. This post shows different procedures for upgrading your PostgreSQL server.

Upgrading to a new major version is a task with a high ratio of preparation time to total execution time, especially when skipping a release in the middle, for example when you jump from version 9.3 to version 9.5.

Point releases

On the other hand, point release upgrades don’t need as much preparation. Generally, the only requirement is for the postgres service to be restarted. There are no changes to the underlying data structure, so there’s no need to dump and restore. In the worst case you may need to recreate some of your indexes after you’ve finished the point release upgrade.
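
As a rough illustration only (the package and service names below are examples for a PGDG RPM install, not something taken from this post), a point release upgrade can be as simple as:

# Update the server packages and restart the service; the data files are untouched
sudo yum update postgresql95-server postgresql95
sudo systemctl restart postgresql-9.5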

It’s very wise to always stay on the latest point release, so you don’t stumble over a known (and likely fixed) bug. This means that once the latest version is out, schedule time for the upgrade as soon as possible.

Major release upgrade

When doing complex tasks like this one, it’s good to consider all possible options to accomplish the final result.

For major release upgrades, there are three possible paths you can take:

  • Upgrade restoring from a logical dump
  • Physical upgrade
  • On-Line upgrade

Let me explain each one in detail:

1) Upgrade restoring from a logical dump

This is the simplest of all possible ways to upgrade your cluster’s data structure.

To make it short, the process requires a logical dump using pg_dump from the old version, followed by a pg_restore into a clean cluster created with the newly installed version.
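
As a sketch of that procedure (ports, paths and database names here are illustrative, not part of the original post):

# Dump globals (roles, tablespaces) and each database from the old cluster
pg_dumpall --globals-only -p 5432 > globals.sql
pg_dump -Fc -p 5432 mydb > mydb.dump

# Create a clean cluster with the newly installed binaries
/usr/pgsql-9.6/bin/initdb -D /var/lib/pgsql/9.6/data
# ... start the new cluster on port 5433, then restore into it:
psql -p 5433 -f globals.sql postgres
createdb -p 5433 mydb
pg_restore -p 5433 -d mydb mydb.dump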

Key points in favor of using this path are:

  • It’s the most tested
  • Compatibility goes back to 7.0 versions so you could eventually upgrade from 7.x to one of the recent releases

Reasons why you should avoid using this option:

  • The total downtime on large databases can be a problem, as you have to stop write connections before you start running pg_dump;
  • If there are many l
[...]

As Devrim blogged about the Postgres YUM repo changes, I wanted to write down the procedure I had to follow this morning :)

For example, if you want to update the YUM repo for Postgres 9.4 on CentOS x86_64, you can do so by executing the command below:

yum localinstall https://download.postgresql.org/pub/repos/yum/9.4/redhat/rhel-6-x86_64/pgdg-centos94-9.4-3.noarch.rpm

Now you can follow the normal Postgres version upgrade procedure.

Hats off to Devrim for building the new packages and for all the recent improvements in the repo management process! He updated the wiki page as well.

Hope this will help someone :)

Over the last few weeks, I spent some time to shuffle the PostgreSQL YUM repo a bit, before 9.6 is released. With the help of Magnus, we also made some changes on the server side, so I will summarize all changes in this post:

Direct downloads from yum.PostgreSQL.org are now disabled. All RPMs have already been served from download.postgresql.org over the last year, so it was time to disable direct downloads. Please change your scripts to use download.PostgreSQL.org/pub/repos/yum .
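
For instance, a script or .repo file that still points at the old host can be switched with something like the following (the file name is only an example; adjust it to your installation):

# Rewrite the old host to the new canonical download location
sudo sed -i 's|http://yum.postgresql.org|https://download.postgresql.org/pub/repos/yum|g' \
    /etc/yum.repos.d/pgdg-96-centos.repo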

Starting Sep 25th, yum.PostgreSQL.org started speaking HTTPS. I updated all repo packages to point to the new URL for GPG checks. The website will be HTTPS-only in the near future.

Starting Sep 25th, we started keeping only the most recent releases, down to n-2, in the repo. This means that when a new release is available, we will delete the n-3rd one. This policy keeps the repo clean.

We started serving delta RPMs for all distros except RHEL 5. This will save a great deal of bandwidth.

Please let us know if you have more questions, via mailing list: pgsql-pkg-yum@PostgreSQL.org

Thanks!

A new major release of PostgreSQL is approaching: PostgreSQL 9.6 is expected to be released later today. This is a great release that provides users with a set of outstanding new features. I’m especially happy that Postgres Professional made substantial contributions to this release. In particular, full-text search for phrases and scalability improvements are listed as major enhancements of this new PostgreSQL release.

The full list of Postgres Professional contributions includes:

  • Improve the pg_stat_activity view’s information about what a process is waiting for (Amit Kapila, Ildus Kurbangaliev)

    Historically a process has only been shown as waiting if it was waiting for a heavyweight lock. Now waits for lightweight locks and buffer pins are also shown in pg_stat_activity. Also, the type of lock being waited for is now visible. These changes replace the waiting column with wait_event_type and wait_event.

    See blog post for details.

  • Fix the default text search parser to allow leading digits in email and host tokens (Artur Zakirov)

    In most cases this will result in few changes in the parsing of text. But if you have data where such addresses occur frequently, it may be worth rebuilding dependent tsvector columns and indexes so that addresses of this form will be found properly by text searches.

    See mailing list discussion for details.

  • Allow GIN index builds to make effective use of maintenance_work_mem settings larger than 1 GB (Robert Abraham, Teodor Sigaev)

    See mailing list discussion for details.

  • Add pages deleted from a GIN index’s pending list to the free space map immediately (Jeff Janes, Teodor Sigaev)

    This reduces bloat if the table is not vacuumed often.

    See mailing list discussion for details.

  • Improve handling of dead index tuples in GiST indexes (Anastasia Lubennikova)

    Dead index tuples are now marked as such when an index scan notices that the corresponding heap tuple is dead. When inserting tuples, marked-dead tuples will be removed if needed to make space on the page.

    See mailing list discuss

[...]


PostgreSQL 9.6 is planned for release later today, bringing with it some much anticipated features and updates. As the most advanced open source database, PostgreSQL strives to release a major version roughly once a year. With an active and collaborative community, this PostgreSQL release boasts impressive features and updates thanks to contributions from many highly knowledgeable community members.

The expanding team at 2ndQuadrant has continued to show dedication to the PostgreSQL database project by contributing heavily to the PostgreSQL 9.6 release. Parallel execution of large queries has been a known shortcoming of PostgreSQL for some time, but this is no longer an issue with the 9.6 release. David Rowley and Simon Riggs contributed to this effort through working on initial support for parallel execution of large queries and the capability of an aggregate function to be split into two separate parts. For example, now the parallel worker processes can cooperate on computing an aggregate function. David and Simon worked alongside Robert Haas, Amit Kapila, and many others to develop this robust feature.
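
As a quick illustration of what this enables (the table name below is hypothetical), on 9.6 you can request parallel workers and look for the Finalize/Partial Aggregate nodes under a Gather in the plan:

psql <<'SQL'
-- allow up to 4 workers per Gather node for this session
SET max_parallel_workers_per_gather = 4;
-- on a sufficiently large table the plan should show a parallel aggregate
EXPLAIN SELECT count(*) FROM big_table;
SQL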

Simon also contributed to the improvement of functions for very large databases by reducing interlocking on standby servers during replay of btree index vacuuming operations. Other contributions from Simon include:

  • Improved performance of short-lived prepared transactions (with Stas Kelvich and Pavan Deolasee)
  • Improved performance by moving buffer content locks into the buffer descriptors (with Andres Freund)
  • Reduced the lock strength needed by ALTER TABLE when setting fillfactor and autovacuum-related relation options (with Fabrízio de Royes Mello)
  • Raised the maximum allowed value of checkpoint_timeout to 24 hours

Tomas Vondra was another heavy contributor, adding the following to the 9.6 release:

  • Ability to use an index-only scan with a partial index when the index’s predicate involves column(s) not stored in the index (with Kyotaro Horiguchi)
  • Improved plann
[...]

It is not every day you get to hear about how a big organization switched from Oracle to Postgres, but it was covered this year at PGCon in Ottawa. I just got time to watch the video (English/Russian slides) of Vladimir Borodin from Yandex Mail explaining the massive migration.

He goes into great detail about how they planned and deployed the migration of 300TB across ninety Postgres shards serving 200+ million users. I loved the line he mentioned twice, "We expected lots of problems, and there weren't." Of course, slide 41 is their wish list of features, but fortunately most of those are being worked on.

I am attending Highload++ this year in Moscow, and I am sure that Yandex will be a hot topic. The migration was also recently discussed on Hacker News. If you are interested in the challenges of deploying Postgres in large enterprises, the management-focused Postgres Vision conference in two weeks would be a good choice. I know Oleg Bartunov will be there and perhaps he will share more details. Of course, there are many other conferences this season, and several I am attending.

Continue Reading »

The PostgreSQL 9.6 Release

It feels like only a few months ago that we were celebrating the release of PostgreSQL 9.5.0, but we’re already very close to the 9.6.0 release! Personally, I’m very excited about this particular release of PostgreSQL. It was just 5 or 6 months ago that I was busy refactoring my Parallel Aggregate patch after Tom Lane made some major changes to the output of the sub-query planner.

I was really excited when my work was finally accepted, as the ability for aggregate functions to be calculated in multiple processes really helps make the whole parallel query infrastructure shine. If you missed all that, then please see my blog post about it.

Aside from a few bug fixes I submitted, all of the work I contributed related to improving the performance of some workload, and for me this made working on the 9.6 release a lot of fun, as I can’t even walk around my local supermarket without thinking of efficiency and performance while I try to plan out in my head the shortest path to the checkout (via all the shelves that I need to visit first).

There are lots of things in this release that are very exciting. I already mentioned Parallel Query, and also Tom Lane’s changes to the output of the final state of the sub-query planner, which allow it to generate multiple different Paths rather than a single final plan. For me the most exciting part is that this paves the way for lots of future optimisations that were previously just too hard, or just too disruptive to the code base, to be worthwhile. Optimizing UNION to make use of a MergeAppend of presorted results, rather than having to Sort the Appended results and Uniquify them, is now a fairly trivial change, where before such a change was rejected.

The infrastructure I worked on to allow Parallel Aggregate is also quite exciting as in the future it could also be used to calculate aggregate results over multiple PostgreSQL instances, rather than just across multiple processes in the same instance. That could allow aggregation of

[...]


In the previous blog article we saw how pg_rewind works with a simple HA cluster, composed of a master node replicating to a standby. In that context, a switchover involves just two nodes that have to be realigned. But what happens in HA clusters with several (possibly cascading) standbys?

Now, consider a more complicated HA cluster, composed of a master with two standbys, based on PostgreSQL 9.5. Similar to what we did in the first blog article dedicated to pg_rewind, we now create a master node replicating to two standby instances. Let’s start with the master:

# Set PATH variable
export PATH=/usr/pgsql-9.5/bin:${PATH}

# This is the directory where we will be working on
# Feel free to change it and the rest of the script
# will adapt itself
WORKDIR=/var/lib/pgsql/9.5

# Environment variables for PGDATA and archive directories
MASTER_PGDATA=${WORKDIR}/master
STANDBY1_PGDATA=${WORKDIR}/standby1
STANDBY2_PGDATA=${WORKDIR}/standby2
ARCHIVE_DIR=${WORKDIR}/archive

# Initialise the cluster
initdb --data-checksums -D ${MASTER_PGDATA}

# Basic configuration of PostgreSQL
cat >> ${MASTER_PGDATA}/postgresql.conf <<EOF
archive_command = 'cp %p ${ARCHIVE_DIR}/%f'
archive_mode = on
wal_level = hot_standby
max_wal_senders = 10
min_wal_size = '32MB'
max_wal_size = '32MB'
hot_standby = on
wal_log_hints = on
EOF

cat >> ${MASTER_PGDATA}/pg_hba.conf <<EOF
# Trust local access for replication
# BE CAREFUL WHEN DOING THIS IN PRODUCTION
local replication replication trust
EOF

# Create the archive directory
mkdir -p ${ARCHIVE_DIR}

# Start the master
pg_ctl -D ${MASTER_PGDATA} -l ${WORKDIR}/master.log start

# Create the replication user
psql -c "CREATE USER replication WITH replication"

And now let’s proceed with the first standby server:

# Create the first standby
pg_basebackup -D ${STANDBY1_PGDATA} -R -c fast -U replication -x

cat >> ${STANDBY1_PGDATA}/postgresql.conf <<EOF
port = 5433
EOF

# Start the first standby
pg_ctl -D ${STANDBY1_PGDATA} -l ${WORKDIR}/standby1.log start

Likewise, we

[...]
Posted by David Rader in OpenSCG on 2016-09-28 at 08:00

Replication is a key part of creating an “Enterprise” Postgres deployment, to support high availability, failover, disaster recovery, or scale-out queries.

Built-in streaming replication was added to PostgreSQL in version 9.0, but the Postgres community has had a number of trigger-based replication options for many years, with the big 3 being Slony-I, Londiste, and Bucardo (we’re biased towards Slony-I with its high-performance C-language triggers — and not just because of the cute slonik logo or that the original author works at OpenSCG). And a whole new generation of logical replication has been introduced in the pglogical project.

How do you choose which replication solution? Use this checklist to decide which Postgres replication solution to use.

1. Version Support

  • Are all of your PostgreSQL instances (master and all slaves) the same version? (Streaming replication requires the same PG version on all nodes, so it cannot be used for upgrades)
  • Are you using a recent Postgres version?
  • Are you using an ancient Postgres 7.x?

The different solutions have very different version compatibility so if you are working with an old Postgres version or upgrading to a new major version you can cross some off:

Solution                Approach       Supported Versions
Streaming Replication   WAL file       9.0+ (all nodes must be same version)
Slony-I v1.2.23         Triggers       7.3.3 up to 8.4
Slony-I v2.0.8          Triggers       8.3+
Bucardo                 Triggers       8.1+
PG Logical              WAL decoding   9.4+

If you’re trying to upgrade an ancient version 7.x or < 8.3, you’ll probably need a 2-step upgrade using Slony-I (Sorry. Slony-I 1.2.23 works on 7.3.3 to 8.4. Slony-I 2.x supports 8.3+). Starting from 8.1 or 8.2 Bucardo is likely the best way to get to a modern version.

2. Topology

  • Do you need bi-directional replication or master-> slave?
  • Are you trying to replicate a single database or the entire Postgres instance?

Slony-I, Bucardo, and pglogical can all be configured to replicate a single database or individual tables, while Streaming replication works on the entire database instance (cluster). Stre

[...]
Posted by Gabriele Bartolini in 2ndQuadrant on 2016-09-27 at 10:59


This is my (very) biased opinion, but I am ready to bet that once you try Barman 2.0 you’ll agree with me.

Version 2.0 takes Barman to a new level, thanks to full support for PostgreSQL’s streaming replication protocol for backup operations as well as for continuous and synchronous WAL shipping.

You might ask – what does this mean exactly? Well, I will give you some examples:

  • You can now safely have zero data loss PostgreSQL clusters with just one standby!
  • You can integrate Barman with Docker in a much easier way!
  • You can now backup PostgreSQL servers that run on Windows (experimentally)!

Of course, you can also continue to rely on rsync/SSH for backup and WAL archiving, if you prefer.

Also, thanks to the new “barman-cli” package, which delivers the “barman-wal-restore” tool, Barman sits at the heart of your PostgreSQL clusters and better integrates with any standby server. We are also working with our fellow repmgr developers to improve the integration between these two critical tools for PostgreSQL business continuity delivered by 2ndQuadrant.
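
For example, a standby can pull its WAL segments straight from Barman by pointing restore_command at barman-wal-restore; a minimal sketch (the host and server names are made up) might be:

cat >> $PGDATA/recovery.conf <<'EOF'
# fetch WALs from the Barman host "backup-srv" for the server Barman knows as "pg96"
restore_command = 'barman-wal-restore -U barman backup-srv pg96 %f %p'
EOF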

Barman 2.0 is the product of months of development and effort by our devops team at 2ndQuadrant.

The amount of code that has been developed is significant. However, the hardest job has been testing the different options that Barman offers for backup and WAL archiving against all the supported versions of PostgreSQL, including the new release of PostgreSQL 9.6. Without a true devops culture, which nurtures cooperation and collaboration and promotes global goals, we would never have been able to build the continuous integration pipeline that rigorously and automatically verifies Barman’s capabilities for backup, recovery, archiving, and so on.

As project leader, I would like to thank everyone on the team for their fantastic contributions and for what we are working on for the future.

In particular, improved performance on VLDBs and taking the management of PostgreSQL backups to a whole new level!

And that’s why I think this is the best Barman ever. ( So far anyways ;) )

Now… let’s

[...]

This is my very first post on Planet PostgreSQL, so thank you for having me here! I’m not sure if you’re aware, but the PostgreSQL Events page lists the conference as something that should be of interest to PostgreSQL users and developers.

There is a PostgreSQL Day on October 4 2016 in Amsterdam, and if you’re planning on just attending a single day, use code PostgreSQLRocks and it will only cost €200+VAT.

I for one am excited to see Patroni: PostgreSQL High Availability made easy, Relational Databases at Uber: MySQL & Postgres, and Linux tuning to improve PostgreSQL performance: from hardware to postgresql.conf.

I’ll write notes here; if time permits we’ll do a database hackers lunch gathering (it’s good to mingle with everyone), and if you’re coming for the PostgreSQL Day, don’t forget to also sign up for the Community Dinner at Booking.com.

Posted by Hubert 'depesz' Lubaczewski on 2016-09-26 at 16:41
It looks like for some reason my subscription to pgsql-committers disappeared, and I stopped receiving mails from this list. I also didn't notice it for some time – not sure when it got lost. I have resubscribed now, but if you can think of any specific commits that I should have written about, but haven't, […]
Posted by Regina Obe in PostGIS on 2016-09-26 at 00:00

The PostGIS development team is pleased to announce the release of PostGIS 2.3.0.
This is the first version to utilize the parallel support functionality introduced in PostgreSQL 9.6. As such, if you are using PostgreSQL 9.6, we strongly encourage you to use this version.

Parallel support will make many queries using PostGIS relationship operators and functions faster. In order to take advantage of parallel query support, make sure to set max_parallel_workers_per_gather to something greater than 0, as noted in the PostgreSQL runtime configuration documentation for max_parallel_workers_per_gather.
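
For example, to allow up to two workers per Gather node cluster-wide (a sketch; pick a value that suits your hardware):

psql -c "ALTER SYSTEM SET max_parallel_workers_per_gather = 2;"
psql -c "SELECT pg_reload_conf();"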

Best served with PostgreSQL 9.6+, which is due out this week, and pgRouting 2.3.0, which also just got released.

Packages from maintainers will be out in the coming days and weeks.

Continue Reading by clicking title hyperlink ..
Posted by Shaun M. Thomas on 2016-09-23 at 19:56

When it comes to putting Postgres through its paces, we often turn to benchmarks to absolutely bury it under a torrent of oppressive activity. It’s a great way to obtain maximum performance metrics and also observe how Postgres reacts and breaks down under such pressure. But these kinds of tests aren’t really practical, are they? After all, many such simulated workloads are nothing but bragging rights measured against previous Postgres releases, or for hardware comparisons. But while functionality beyond defaults is often overlooked, tools like pgbench are actually critical to the development process.

I’ve used pgbench frequently in the past for examples in these articles. It’s much like Apache’s own ab for bombarding a server with various workloads, except pgbench must bootstrap by creating a set of test tables. Or does it? What if we have our own existing tables from a project that we want to measure, either for a hardware migration, or for scalability concerns?
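
It doesn’t have to: pgbench can run arbitrary SQL from a script file against existing tables. A minimal sketch (the table, column and database names are hypothetical) looks like this:

# Custom transaction: bump a counter on a random row of our own table
cat > custom_bench.sql <<'SQL'
\set rid random(1, 1000000)
UPDATE my_busy_table SET hits = hits + 1 WHERE id = :rid;
SQL

# -n skips vacuuming the (absent) pgbench_* tables, -f points at our script
pgbench -n -c 8 -T 60 -f custom_bench.sql mydb

(The random() function inside \set is the 9.6 syntax; older pgbench versions use \setrandom instead.)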

Let’s see how we can utilize pgbench with our trusty sensor_log table. It’s not really designed to experience any updates, so we need something else as well. Let’s say the sensor log is an endpoint capture for a very busy sensor table that is only updated when sensors detect a change in the environment. That should provide us with plenty of opportunities for contention!

CREATE TABLE sensor (
  sensor_id      SERIAL PRIMARY KEY NOT NULL,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  modified_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor (location, reading, modified_date)
SELECT s.id, s.id % 100,
       CURRENT_DATE + INTERVAL '8h' 
                    - (s.id % 60 || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE INDEX idx_sensor_location
    ON sensor (location);
 
ANALYZE sensor;
 
CREATE TABLE sensor_log (
  sensor_id     BIGINT NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
CREATE UNIQUE INDEX udx_sensor_log_sensor_id_reading_date
[...]

In the previous post, we provided a business and architectural background for the Postgres FDWs that we are developing for Spark, Hadoop and Cassandra. In particular, we highlighted the key benefits of bringing Cassandra and PostgreSQL together.

With this post, we will start taking a more technical look at the Cassandra FDW.

The C* FDW speaks natively with Cassandra on two levels; it:

  • uses the binary CQL protocol instead of the legacy Thrift protocol.
  • directly relies on the DataStax Native C++ driver for Cassandra.

The DataStax C++ driver is performant and feature-rich; various load balancing and routing options are available and configurable. We are already making use of some of these features and plan to provide more of these to our users.

While exploring Cassandra as a Postgres user, the defaults such as automatic inclusion of the ALLOW FILTERING clause are useful as they allow gradual familiarity; especially useful in small development environments. Our intent is to support tuning for large environments but to default to a configuration geared toward existing PostgreSQL users.

At this point, let us consider whether we are introducing a new SPOF by using PostgreSQL with a Cassandra system. We believe not; a PostgreSQL node at the edge of a Cassandra cluster – as a transactional or open-SQL end point – is not at all the same as a central master node critical to the operation of an entire cluster. We see some trade-offs but mostly we see benefits of bringing PostgreSQL to Cassandra in this way, as we intend to elucidate through this series.

In the next post, we will show you how to get started with the Cassandra FDW.

Posted by Ernst-Georg Schmid on 2016-09-23 at 05:59
OpenBabel 2.4.0 is released, the build process worked flawlessly, 100% tests passed.

Now I have to see if the Tigress still likes it...

Well, pgchem::tigress builds against OpenBabel 2.4.0 without errors, but will it work?

Yes, pgchem::tigress works with OpenBabel 2.4.0 without any notable alterations, except changing:

OPENBABEL_SOURCE=openbabel-2.3.2

in the Makefile to:

OPENBABEL_SOURCE=openbabel-2.4.0

As always with .0 releases, subtle issues might be lurking below the surface, though.
Posted by Bruce Momjian in EnterpriseDB on 2016-09-21 at 13:45

I have written a presentation covering the important features in Postgres 9.6 and some of the features we hope for in Postgres 10.

After about 100 hours of packaging and testing work, pgadmin4 RPMs hit PostgreSQL 9.4, 9.5 and 9.6 repos, for RHEL/CentOS 7 and Fedora 23, 24!

First of all, I'd like to write down the list of packages that entered the git repo as pgadmin4 dependencies: Continue reading "Installing pgadmin4 to Red Hat, CentOS, and Fedora"
Posted by Craig Ringer in 2ndQuadrant on 2016-09-21 at 00:52

I’m pleased to say that Postgres-BDR is on its way to PostgreSQL 9.6, and even better, it works without a patched PostgreSQL.

BDR has always been an extension, but on 9.4 it required a heavily patched PostgreSQL, one that isn’t fully on-disk-format compatible with stock community PostgreSQL 9.4. The goal all along has been to allow it to run as an extension on an unmodified PostgreSQL … and now we’re there.

The years of effort we at 2ndQuadrant have put into getting the series of patches from BDR into PostgreSQL core have paid off. As of PostgreSQL 9.6, the only major patch that Postgres-BDR on 9.4 has and that PostgreSQL core doesn’t is the sequence access method patch that powers global sequences.

This means that Postgres-BDR on 9.6 will not support global sequences, at least not the same way they exist in 9.4. The 9.6 version will incorporate a different approach to handling sequences on distributed systems, and in the process address some issues that arose when using global sequences in production.

Since Postgres-BDR on 9.6 is now passing regression tests, I expect to have an alpha release out not long after the release of PostgreSQL 9.6 itself. Keep an eye out!

There’s also a pglogical 1.2.0 update coming to coincide with the release of PostgreSQL 9.6.

At the same time, Petr has submitted another revision of logical replication for core PostgreSQL to the 10.0 development series.

Along with updating Postgres-BDR to keep pace with 9.6, there is work in progress to enhance Postgres-BDR’s HA capabilities. I’ll have more to say on that in an upcoming post.

We’ve been busy here at 2ndQuadrant.

Posted by Martín Marqués in 2ndQuadrant on 2016-09-20 at 14:11

Intro

PostgreSQL gives developers the choice between two possible storage facilities for large binary data: bytea and LargeObjects.

Large Objects have been around for a long time, and PostgreSQL has a smart way of storing large binary data: it splits it into chunks of LOBLKSIZE (a fourth of BLCKSZ). That way the tuples from pg_largeobject don’t spill over to the TOAST table.

On the other hand bytea stores the binary data directly in the tuple, which may lead to poor performance depending on how your schema looks.

This sounds great if you have an intelligent interface for dealing with the manipulation of these binary files, especially if updates modify just a small portion of the whole binary file.

But normally we don’t bother writing code that takes advantage of this, and instead we write the whole binary data again.

One of the things that I believe makes people adopt large objects is the functions available for importing and exporting files directly between the database server and its filesystem. There’s a con to this: if the application is on a different server, you’ll need more code to move the file to the location where it’s needed.
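
Those import/export calls run on the server side, which is exactly where that caveat comes from; for illustration (the paths and the OID below are made up, and server-side lo_import/lo_export require superuser):

# lo_import/lo_export read and write files on the database server's own filesystem
psql -d mydb -c "SELECT lo_import('/var/lib/pgsql/files/report.pdf');"
psql -d mydb -c "SELECT lo_export(16403, '/tmp/report_copy.pdf');"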

A problem you might face

Over the past few days I had to examine a database used to store information about user sessions from a Java CAS system. I found there were almost 100 million large objects in the database, not very big ones.

I went over the user tables checking the fields that had an oid type, and then I cross-referenced the values in those fields with the pg_largeobject_metadata table. I found that 96% of those large objects were orphans, that is, large objects which weren’t referenced by any tuple in the user tables.
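
The kind of cross-check described above boils down to an anti-join against pg_largeobject_metadata (the table and column names below are invented for illustration); the contrib tool vacuumlo automates the same scan:

# Count large objects that no application column references any more
psql -d casdb <<'SQL'
SELECT count(*)
  FROM pg_largeobject_metadata m
 WHERE NOT EXISTS (SELECT 1 FROM app_session s WHERE s.attributes_oid = m.oid);
SQL

# vacuumlo checks every oid/lo column for you; -n is a dry run that only reports
vacuumlo -n -v casdb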

Further investigation concluded that Hibernate didn’t take care of purging the large objects it created when deleting or updating tuples with oid fields. So it was generating a great amount of bloat which could not be cleaned up by vacuuming, but had to be purged from the pg_largeobject table manually.

In the particular case of the CAS database, t

[...]

I was in Jakarta a couple of weeks ago and there happened to be a meetup of the Indonesia PUG in Bandung while I was there. Because it is just a 2 hour, rather picturesque drive, from Jakarta, I thought it was too good of an opportunity to miss. So I showed up.


The meetup was hosted by Julyanto Sutandang of Equnix Business Solutions and the conversation was mostly centered around convincing the local technology industry about PostgreSQL in comparison to Oracle. We got into fairly detailed discussions on the typical challenges of moving an Oracle production database to PostgreSQL. I especially love talking about hierarchical queries – Oracle’s CONNECT BY PRIOR and PostgreSQL’s WITH RECURSIVE.
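
For the curious, the translation we kept coming back to looks roughly like this (a sketch with a hypothetical employees(id, manager_id, name) table):

psql <<'SQL'
WITH RECURSIVE org AS (
    -- Oracle: START WITH manager_id IS NULL
    SELECT id, manager_id, name, 1 AS depth
      FROM employees
     WHERE manager_id IS NULL
    UNION ALL
    -- Oracle: CONNECT BY PRIOR id = manager_id
    SELECT e.id, e.manager_id, e.name, o.depth + 1
      FROM employees e
      JOIN org o ON e.manager_id = o.id
)
SELECT * FROM org ORDER BY depth;
SQL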

It was very interesting to find out how popular BDR – the Multi Master Replication solution from 2ndQuadrant – was in Indonesia. I got a lot of questions about the technology (which, admittedly, I am not an expert at) and its roadmap (which I was able to answer more confidently). Not only is BDR already being used in production at multiple locations but many corporates, including large banks, are actively doing POCs using the technology.

We also got talking about ACID compliance, and how that figures into the whole NoSQL equation. We talked about the major NoSQL-like features of PostgreSQL: JSON/JSONB, HSTORE, XML, etc. and the fact that they are all ACID compliant, unlike exclusively NoSQL data stores. The main takeaway was that NoSQL Databases and RDBMS complement each other, they can’t replace each other. NoSQL is good for POCs and massive data writes at high speeds, but if you hold your data valuable and want to derive analytical value from it, you have to eventually move it to an RDBMS. Of course, our database of choice is PostgreSQL :-)


Oh, and did I mention that Julyanto insisted on stuffing me full of traditional Indonesian food? ;-)


It sounds like click-bait, or one of those late night TV promotions – “Improve your database performance by 100% – by changing just this one setting!” But in this case, it’s true – you can drastically improve PostgreSQL on Windows performance by changing one configuration setting – and we made this the default in our Postgres by BigSQL distribution for 9.2 thru 9.6.

tl;dr – if you have high query load, change “update_process_title” to ‘off’ on Windows, and get 100% more throughput.

Chart: performance improvement from turning off update_process_title

Most Postgres DBAs already know that they need to tune settings for shared buffers, WAL segments, checkpoints, etc., to get the best performance from their database. If you are running PostgreSQL on Windows, there’s another setting that you need to look at, specifically “update_process_title”. Changing this setting from “on” to “off” can improve throughput under a heavy query load by close to 100%.
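
The change itself is tiny; one way to apply it (a sketch, assuming $PGDATA points at your data directory) is to add the setting and reload, with no restart required:

echo "update_process_title = off" >> "$PGDATA/postgresql.conf"
pg_ctl reload -D "$PGDATA"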

We ran a series of benchmark tests in our performance lab and you can see the dramatic improvement in the graphs displayed. We tested PostgreSQL 9.5 on a 16-core Windows server with fast SSD drives using a standard pgbench run in both read-only and read-write modes. Scaling from 4 to 40 clients shows a plateau in throughput (measured by TPS) after 8 clients when the setting is set to “on”. Changing the update_process_title setting to “off” allows PostgreSQL to continue to scale throughput, showing increasing TPS up to 40 clients. The throughput at 32 read-only clients increases from 20K TPS to 58K TPS (180% higher) and at 40 clients continues to climb to 76K TPS (270% higher).

Chart: improvement in read-write transactions from turning off update_process_title

This performance gain is seen for both read-only and read-write workloads. With 32 clients, the write throughput increases from 2,700 TPS to 7,700 TPS (180% higher) and at 40 clients continues to climb to 8,200 (200% higher).

The update_process_title setting controls whether or not Postgres will update the process description that you

[...]
Posted by Federico Campoli on 2016-09-20 at 05:38
Back in 2013 I started playing with SQLAlchemy to create a simple extractor from heterogeneous systems to be pushed into PostgreSQL.
I decided to give the project a name which recalled the transformation, and I called it pg_chameleon.

To be honest I didn't like SQLAlchemy. Like any other ORM, it adds an interface to the data layer that imposes its own mental approach to the data. I lost interest in developing a migrator very soon, and after all there are thousands of similar tools, thousands of times better than mine (e.g. the awesome pgloader).

However, I recently revamped the project after discovering a Python library capable of reading the MySQL replication protocol. In a few weeks I cleaned out all the SQLAlchemy stuff, rebuilt the metadata extraction using the information_schema, and finally had a usable tool to replicate the data across the two systems.

I've also changed the license from GPL to the 2 clause BSD.

The tool requires testing. I'm absolutely sure it's full of bugs and issues, but it seems to work quite nicely.

Some key aspects:

  • It is developed in Python 2.7. Bear with me, I'll build a port to Python 3.x when/if the project gets to an initial release.
  • I use tabs (4-space tabs). Bear with me again. I tried to use spaces and I almost threw my laptop out of the window.
  • setup.py is not working. I'll fix this as soon as I do a release.
  • Yes, the SQL part uses "Hungarian notation" and the keywords are uppercase, with strange indentation on the statements.
  • The DDL are not yet replicated. I'm thinking about a clever approach to the problem.

That's it. If you want to test it please do and try to break the tool :)

The tool is on github here: https://github.com/the4thdoctor/pg_chameleon/

Sometimes, the elephant gets hurt - inducing database errors! Data corruption is a fact of life in working with computers, and Postgres is not immune. With the addition of the "data checksums" feature, detecting such corruption is now much easier. But detection is not enough - what happens after the corruption is detected? What if Postgres could fix the problem all by itself - what if we could give the elephant a mutant healing power?!?

Now we can. I wrote an extension named pg_healer that does just that - detects corruption issues, and automatically repairs them. Let's see how it works with a demonstration. For this, we will be purposefully corrupting the "pgbench_branches" table, part of the venerable pgbench utility.

For the initial setup, we will create a new Postgres cluster and install the pgbench schema. The all-important checksum feature needs to be enabled when we initdb, and we will use a non-standard port for testing:

$ initdb --data-checksums dojo
The files belonging to this database system will be owned by user "greg".
...
Data page checksums are enabled.

creating directory dojo ... ok
creating subdirectories ... ok
...
$ echo port=9999 >> dojo/postgresql.conf
$ pg_ctl start -D dojo -l log.dojo.txt
server starting
$ createdb -p 9999 $USER
$ pgbench -p 9999 -i
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
creating tables...
100000 of 100000 tuples (100%) done (elapsed 0.35 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

Next, we install the pg_healer extension. As it needs to access some low-level hooks, we need to load it on startup, by adding a line to the postgresql.conf file:

$ git clone git://github.com/turnstep/pg_healer.git
Cloning into 'pg_healer'...
$ cd pg_healer
$ make install
gcc -Wall -Wmissing-prototypes ... -c -o pg_healer.o pg_healer.c
gcc -Wall -Wmissing-
[...]
Posted by Regina Obe in PostGIS on 2016-09-19 at 00:00

PostGIS 2.3.0rc1 is feature complete, so we’re looking for testing and feedback! Best served with PostgreSQL 9.6rc1 and pgRouting 2.3.0-rc1

Please give this release candidate a try and report back any issues you encounter. New things since the 2.3.0beta1 release are listed below.

Please report bugs that you find in this release.

Important / Breaking Changes

  • 3466, Casting from box3d to geometry now returns a 3D geometry (Julien Rouhaud of Dalibo)

  • 3604, pgcommon/Makefile.in orders CFLAGS incorrectly leading to wrong liblwgeom.h (Greg Troxel)

  • 3396, ST_EstimatedExtent, now throws WARNING instead of ERROR (Regina Obe)

New Features and Performance Enhancements

  • Add support for custom TOC in postgis_restore.pl (Christoph Moench-Tegeder)

  • Add support for negative indexing in ST_PointN and ST_SetPoint (Rémi Cura)
  • Numerous new function additions and enhancements: New Functions and Enhancements

  • 3549, Support PgSQL 9.6 parallel query mode, as far as possible (Paul Ramsey, Regina Obe)

  • 3557, Geometry function costs based on query stats (Paul Norman)
  • 3591, Add support for BRIN indexes (Giuseppe Broccolo of 2nd Quadrant, Julien Rouhaud and Ronan Dunklau of Dalibo)
  • 3496, Make postgis non-relocatable (for extension install), schema qualify calls in functions (Regina Obe). Should resolve once and for all tickets #3494, #3486, #3076

  • 3547, Update tiger geocoder to support TIGER 2016 and use http or ftp (Regina Obe)

See the full list of changes in the news file and please report bugs that you find in the release. Binary packages will appear in repositories over the coming weeks as packagers roll out builds.

View all closed tickets for 2.3.0.

It’s been a long time since my last post. It’s time to write something useful :)

When people start working with PostgreSQL they sometimes make mistakes which are really difficult to fix later. For example, during initdb of your first DB you don’t really understand whether you need checksums for data or not, especially since by default they are turned off and the documentation says that they “may incur a noticeable performance penalty”.

And when you already have several hundred databases with a few hundred terabytes of data on different hardware or (even worse) in different virtualization systems, you do understand that you are ready to pay some performance for the identification of silent data corruption. But the problem is that you can’t easily turn checksums on. It is one of the things that is decided only once, while invoking the initdb command. In the bright future we hope for logical replication, but until that moment the only way is pg_dump, initdb, pg_restore, that is, with downtime.
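
Both decisions are made once, at cluster creation time; an illustrative initdb invocation that fixes them explicitly (substitute whatever collation you actually want) would be:

# Enable data checksums and pin the collation instead of inheriting it from the environment
initdb --data-checksums --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8 -D /var/lib/pgsql/9.6/data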

And even if checksums may not be useful for you (e.g. you have perfect hardware and an OS without bugs), lc_collate is important for everyone. And now I will prove it.

Sort order

Suppose you have installed PostgreSQL from packages or built it from sources and initialized DB by yourself. Most probably, in the modern world of victorious UTF-8 you would see something like that:

d0uble ~ $ psql -l
                               List of databases
   Name    | Owner  | Encoding |   Collate   |    Ctype    | Access privileges
-----------+--------+----------+-------------+-------------+-------------------
 postgres  | d0uble | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | d0uble | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/d0uble        +
           |        |          |             |             | d0uble=CTc/d0uble
 template1 | d0uble | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/d0uble        +
           |        |          |             |             | d0uble=CTc/d0uble
(3 rows)

d0uble ~ $

If you don’t specify explicitly, initdb will take settings

[...]
Posted by Luca Ferrari in ITPUG on 2016-09-18 at 16:42
As you probably already know, the Call for Papers for PGDay.IT 2016 is now open. Please see the details here and send your contribution following the instructions. The organizing committee will review each proposal in order to deliver a great program for the tenth edition of the Italian PostgreSQL conference.

Since version 1.7, Django has natively supported database migrations, similar to Rails migrations. The biggest fundamental difference between the two is the way the migrations are created: Rails migrations are written by hand, specifying the changes you want made to the database, while Django migrations are usually automatically generated to mirror the database schema in its current state.

Usually, Django’s automatic schema detection works quite nicely, but occasionally you will have to write some custom migration that Django can’t properly generate, such as a functional index in PostgreSQL.

Creating an empty migration

To create a custom migration, it’s easiest to start by generating an empty migration. In this example, it’ll be for an application called blog:

$ ./manage.py makemigrations blog --empty -n create_custom_index
Migrations for 'blog':
  0002_create_custom_index.py:

This generates a file at blog/migrations/0002_create_custom_index.py that will look something like this:

# -*- coding: utf-8 -*-
# Generated by Django 1.9.4 on 2016-09-17 17:35
from __future__ import unicode_literals

from django.db import migrations
[...]
Posted by Shaun M. Thomas on 2016-09-16 at 21:00

There seem to be quite a few popular Postgres conferences peppering the globe these days. This year, Simon Riggs of 2ndQuadrant gave the sponsored keynote at Postgres Open. I’m not entirely sure it was intentional since it wasn’t the title of his presentation, but he uttered the words “working together to make Postgres better for everyone” at one point. The phrase “Working Together” really stood out, because that’s a significant part of what makes Postgres so great. It resonated acutely with the impetus behind the Unconference track that remains a regular fixture at PGCon.

Then Simon dropped another bomb that shocked everyone in attendance. Everywhere I looked were visions of absolute disbelief and awe. He suggested that somewhere in the twisting catacombs of 2ndQuadrant was a beta version of WordPress running on Postgres instead of MySQL. Considering its roots in PHP before there were readily available database abstraction layers, and the fact many extensions call the MySQL functions outright, this is clearly an impossibility. This is clearly a cruel hoax, and I demand proof that this particular unicorn exists for reasons completely unrelated to exorcising MySQL from my own website.

Perhaps primarily accidental, but I seem to be a regular fixture at Postgres Open since it began in 2011. Most of this is because it started in Chicago, which made it extremely convenient to attend for the first four years of its existence. Now I’m just operating on pure momentum. So what did I learn this time around? I’ve been doing this too long to learn much regarding Postgres, and short of diving into writing extensions in C, that isn’t going to change. But maybe I can offer a useful perspective on the conference itself and the talks I visited.

PGLogical – the logical replication for PostgreSQL

First on the agenda was a talk on PGLogical by Petr Jelinek, yet another 2ndQuadrant contribution. It works by interpreting the Postgres WAL stream and decoding activity into generic SQL statements that can be replayed on a remote databas

[...]
Posted by Julien Rouhaud in Dalibo on 2016-09-16 at 12:03