Posted by Josh Berkus on 2015-03-26 at 19:51:28

On November 18th, 2015, we will have an independent, multi-track conference all about high performance PostgreSQL: pgConf SV. This conference is being organized by CitusData at the South San Francisco Convention Center. Stay tuned for the call for presentations, sponsorship opportunities, and more details soon.

Posted by David Fetter in Farmers Business Network on 2015-03-26 at 13:27:51
SQL is code.

This may seem like a simple idea, but out in the wild, you will find an awful lot of SQL programs which consist of a single line, which makes them challenging to debug.

Getting it into a format where debugging was reasonably easy used to be tedious and time-consuming, but no more!
Continue reading "Formatting!"
Posted by Rajeev Rastogi on 2015-03-25 at 05:00:00
In PostgreSQL 9.5, we can see improved performance for index scans on ">" conditions.

In order to explain this optimization, consider the schema below:
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

And the following query:
                select count(*) from tbl2 where id2>'a' and id3>990000;

Prior to this patch, the above query used the following steps to retrieve index tuples:

  • Find the scan start position by searching the B-tree for the first position that satisfies the first key condition, i.e. id2 > 'a'.
  • Fetch each tuple starting from the position found in step 1.
  • For each tuple, check all scan key conditions; in our example both conditions are checked.
  • If the conditions match, return the tuple; otherwise the scan stops.


The problem here is that the first scan key condition has already been satisfied when finding the scan start position (step 1), so it is obvious that any further tuple will also match the first scan key condition (as records are sorted).

So comparing the first scan key condition again in step 3 is redundant.

So we made changes to the B-tree scan algorithm to avoid the redundant check, i.e. the first key comparison is skipped for each tuple, as it is guaranteed to always be true.
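One quick way to observe the effect yourself (a sketch using the schema above; the exact plan and timings will of course vary with your build and data) is to populate the table and time the query with EXPLAIN ANALYZE on 9.4 and on 9.5:

-- Populate the example table, then compare timings on 9.4 vs. 9.5.
INSERT INTO tbl2
SELECT i, 'a' || i, i FROM generate_series(1, 1000000) AS i;
ANALYZE tbl2;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM tbl2 WHERE id2 > 'a' AND id3 > 990000;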

Performance result summary:



I would like to thank Simon Riggs for verifying and committing this patch. Simon also confirmed an improvement of around 5% for both short and long indexes on the least beneficial data type, and considered it a very positive win overall.
Posted by Josh Berkus on 2015-03-24 at 19:30:10


On March 10th, we had our third ever pgDay for SFPUG, which was a runaway success. pgDaySF 2015 was held together with FOSS4G-NA and EclipseCon; we were especially keen to join FOSS4G because of the large number of PostGIS users attending the event. In all, around 130 DBAs, developers and geo geeks joined us for pgDay SF ... so many that the conference had to reconfigure the room to add more seating!

standing room only

The day started out with Daniel Caldwell showing how to use PostGIS for offline mobile data, including a phone demo.

Daniel Caldwell setting up

Ozgun Erdogan presented pg_shard with a short demo.

Ozgun presents pg_shard with PostGIS

Gianni Ciolli flew all the way from London to talk about using Postgres' new Logical Decoding feature for database auditing.

Gianni Ciolli presenting

Peak excitement of the day was Paul Ramsey's "PostGIS Feature Frenzy" presentation.

Paul Ramsey making quotes

We also had presentations by Mark Wong and Bruce Momjian, and lightning talks by several presenters. Slides for some sessions are available on the FOSS4G web site. According to FOSS4G, videos will be available sometime soon.

Of course, we couldn't have done it without our sponsors: Google, EnterpriseDB, 2ndQuadrant, CitusDB and pgExperts. So a big thank you to our sponsors, our speakers, and the staff of FOSS4G-NA for creating a great day.

Last Thursday, I had this short and one-sided conversation with myself: “Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 […]
Posted by Heikki Linnakangas on 2015-03-23 at 20:05:31

Before PostgreSQL got streaming replication, back in version 9.0, people kept asking when we’re going to get replication. That was a common conversation-starter when standing at a conference booth. I don’t hear that anymore, but this dialogue still happens every now and then:

- I have streaming replication set up, with a master and standby. How do I perform failover?
- That’s easy, just kill the old master node, and run “pg_ctl promote” on the standby.
- Cool. And how do I fail back to the old master?
- Umm, well, you have to take a new base backup from the new master, and re-build the node from scratch..
- Huh, what?!?

pg_rewind is a better answer to that. One way to think of it is that it’s like rsync on steroids. Like rsync, it copies files that differ between the source and target. The trick is in how it determines which files have changed. Rsync compares timestamps, file sizes and checksums, but pg_rewind understands the PostgreSQL file formats, and reads the WAL to get that information instead.

I started hacking on pg_rewind about a year ago, while working for VMware. I got it working, but it was a bit of a pain to maintain. Michael Paquier helped to keep it up-to-date, whenever upstream changes in PostgreSQL broke it. A big pain was that it has to scan the WAL, and understand all the different WAL record types – miss even one and you might end up with a corrupt database. I made big changes to the way WAL-logging works in 9.5, to make that easier. All WAL record types now contain enough information to know which blocks they apply to, in a common format. That slashed the amount of code required in pg_rewind, and made it a lot easier to maintain.

I have just committed pg_rewind into the PostgreSQL git repository, and it will be included in the upcoming 9.5 version. I always intended pg_rewind to be included in PostgreSQL itself; I started it as a standalone project to be able to develop it faster, outside the PostgreSQL release cycle, so I’m glad it finally made it into the main distribution now. Please give it a l

[continue reading]

Posted by Marco Slot in CitusData on 2015-03-23 at 12:53:12
Posted by Paul Ramsey on 2015-03-21 at 16:16:00

I did a new PostGIS talk for FOSS4G North America 2015, an exploration of some of the tidbits I've learned over the past six months about using PostgreSQL and PostGIS together to make "magic" (any sufficiently advanced technology...)

 

Posted by Paul Ramsey on 2015-03-20 at 20:07:00

Somehow I've gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague's blog post; it looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there's always something new to learn.

Suppose you have a GPS location table:

  • gps_id: integer
  • geom: geometry
  • gps_time: timestamp
  • gps_track_id: integer

You can get a correct set of lines from this collection of points with just this SQL:


SELECT
gps_track_id,
ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id

Those of you who already knew about placing ORDER BY within an aggregate function are going "duh", and the rest of you are, like me, going "whaaaaaa?"

Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.
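For contrast, here is roughly what that older approach looks like, as a sketch against the same gps_points table (ordering in a sub-query and relying on that order surviving into the aggregate, which is exactly why the ORDER BY-inside-aggregate form is nicer):

SELECT
  gps_track_id,
  ST_MakeLine(geom) AS geom
FROM (
  SELECT gps_track_id, geom
  FROM gps_points
  ORDER BY gps_track_id, gps_time ASC
) AS ordered_points
GROUP BY gps_track_id;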

PostgreSQL has provided table partitions for a long time. In fact, one might say it has always had partitioning. The functionality and performance of table inheritance have increased over the years, and there are innumerable arguments for using it, especially for larger tables consisting of hundreds of millions of rows. So I want to discuss a quirk that often catches developers off guard. In fact, it can render partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning documentation. And the pg_partman extension at PGXN follows the standard partitioning model to automate many of the pesky tasks of maintaining partitions. With modules like this, there's no need to manually manage new partitions, constraint maintenance, or even some aspects of data movement and archival.

However, plenty of existing partition sets are already out there, and not everyone knows about extensions like this; many have developed in-house systems instead. Here's something I encountered recently:

CREATE TABLE sys_order
(
    order_id     SERIAL       PRIMARY KEY,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE sys_order_part_201502 ()
       INHERITS (sys_order);

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-02-01'::DATE + INTERVAL '1 mon');

This looks innocuous enough, but PostgreSQL veterans are already shaking their heads. The documentation alludes to how this could be a problem:

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don’t need to be visited.

The issue in this case is that adding the interval of a month changes the right boundary of this range constraint into a dynamic value. PostgreSQL will not use dynamic values in evaluating check constraints. Here's a query plan from PostgreSQL 9.4.1, which is the most recent release as of

[continue reading]
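One common remedy for the issue described above (a sketch only, not necessarily the fix the full post arrives at) is to keep both boundaries of the CHECK constraint as plain constants, so the planner can use constraint exclusion:

ALTER TABLE sys_order_part_201502
  DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::TIMESTAMPTZ AND
             order_dt < '2015-03-01'::TIMESTAMPTZ);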

The BDR team has recently introduced support for dynamically adding new nodes to a BDR group from SQL into the current development builds. Now no configuration file changes are required to add nodes and there’s no need to restart the existing or newly joining nodes.

This change does not appear in the current 0.8.0 stable release; it’ll land in 0.9.0 when that’s released, and can be found in the bdr-plugin/next branch in the mean time.

New nodes negotiate with the existing nodes for permission to join. Soon they'll be able to join the group without disrupting any DDL locking, global sequence voting, etc.

There’s also an easy node removal process so you don’t need to modify internal catalog tables and manually remove slots to drop a node anymore.

New node join process

With this change, the long-standing GUC-based configuration for BDR has been removed. bdr.connections no longer exists and you no longer configure connections with bdr.[conname]_dsn etc.

Instead, node addition is accomplished with the bdr.bdr_group_join(...) function. Because this is a function in the bdr extension, you must first CREATE EXTENSION bdr;. PostgreSQL doesn't install extension dependencies automatically, and the bdr extension requires the btree_gist extension, so you'll have to CREATE EXTENSION btree_gist first.

Creating the first node

Creation of the first node must now be done explicitly using bdr.bdr_group_create. This promotes a standalone PostgreSQL database to a single-node BDR group, allowing other nodes to then be joined to it.

You must pass a node name and a valid externally-reachable connection string for the dsn parameter, e.g.:

CREATE EXTENSION btree_gist;

CREATE EXTENSION bdr;

SELECT bdr.bdr_group_create(
  local_node_name := 'node1',
  node_external_dsn := 'host=node1 dbname=mydb'
);

Note that the dsn is not used by the root node itself. It's used by other nodes to connect to the root node, so you can't use a dsn like host=localhost dbname=mydb if you intend to have nodes on multiple machines.

Adding other nodes

You can now join other nodes to f

[continue reading]

Posted by Paul Ramsey on 2015-03-20 at 00:00:00

The 2.1.6 release of PostGIS is now available.

The PostGIS development team is happy to release a patch for PostGIS 2.1, the 2.1.6 release. As befits a patch release, the focus is on bugs, breakages, and performance issues. Users with large tables of points will want to prioritize this patch, for substantial (~50%) disk space savings.

http://download.osgeo.org/postgis/source/postgis-2.1.6.tar.gz

Continue Reading by clicking title hyperlink ..
Posted by Josh Berkus in pgExperts on 2015-03-19 at 20:55:00
Since the folks at Tilt.com aren't on Planet Postgres, I thought I'd link their recent blog post on cool data migration hacks.  Tilt is a YCombinator company, and a SFPUG supporter.
Posted by Jason Petersen in CitusData on 2015-03-19 at 19:53:36

Last winter, we open-sourced pg_shard, a transparent sharding extension for PostgreSQL. It brought straightforward sharding capabilities to PostgreSQL, allowing tables and queries to be distributed across any number of servers.

Today we’re excited to announce the next release of pg_shard. The changes in this release include:

  • Improved performance — INSERT commands run up to four times faster
  • Shard repair — Easily bring inactive placements back up to speed
  • Copy script — Quickly import data from CSV and other files from the command line
  • CitusDB integration — Expose pg_shard’s metadata for CitusDB’s use
  • Resource improvements — Execute larger queries than ever before

For more information about recent changes, you can view all the issues closed during this release cycle on GitHub.

Upgrading or installing is a breeze: see pg_shard’s GitHub page for detailed instructions.

Whether you want a distributed document store alongside your normal PostgreSQL tables or need the extra computational power afforded by a sharded cluster, pg_shard can help. We continue to grow pg_shard’s capabilities and are open to feature requests.

Got questions?

If you have any questions about pg_shard, please contact us using the pg_shard-users mailing list.

If you discover an issue when using pg_shard, please submit it to our issue tracker on GitHub.

Further information is available on our website, where you are free to contact us with any general questions you may have.

A nice feature extending the usage of pgbench, the in-core benchmarking tool of Postgres, has landed in 9.5 with this commit:

commit: 878fdcb843e087cc1cdeadc987d6ef55202ddd04
author: Robert Haas <rhaas@postgresql.org>
date: Mon, 2 Mar 2015 14:21:41 -0500
pgbench: Add a real expression syntax to \set

Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated.  Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here.  This also
adds support for the modulo operator (%), with the same semantics as in
C.

Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost

pgbench has for ages supported custom input files via -f, with custom variables that can be set with, for example, \set or \setrandom, and then used in a custom set of SQL queries:

\set id 10 * :scale
\setrandom id2 1 :id
SELECT name, email FROM users WHERE id = :id;
SELECT capital, country FROM world_cities WHERE id = :id2;

Up to 9.4, those custom variables could only be calculated with simple rules of the form "var operator var2" (the commit message above is explicit enough), resulting in many intermediate steps and variables when doing more complicated calculations (note as well that additional operands and variables, if provided, are simply ignored after the first three):

\setrandom ramp 1 200
\set scale_big :scale * 10
\set min_big_scale :scale_big + :ramp
SELECT :min_big_scale;

In 9.5, such cases become much easier because pgbench now includes a parser for complex expressions. The calculation written above can be done more simply with that, and far fancier things are possible:

\setrandom ramp 1 200
\set min_big_scale :scale * 10 + :ramp
SELECT :min_big_scale;

With pgbench run for a couple of transactions, here is what you could get:

$ pgbench -f test.sql -t 5
[...]
$ tail -n5 $PGDATA/pg_log/postgresql.log
LOG:  statement: SELECT 157;
LOG:  statement: SELECT 53;
LOG

[continue reading]

Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5.  It is not all there yet, but we are making very good progress.  I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort.  Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done.  Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.
Read more »
On 18th of March, Alvaro Herrera committed patch: array_offset() and array_offsets()   These functions return the offset position or positions of a value in an array.   Author: Pavel Stěhule Reviewed by: Jim Nasby It's been a while since my last “waiting for" post – mostly because while there is a lot of work happening, […]
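Going by the commit message above, a quick illustration of the two functions (the exact semantics are assumed from the commit, and the results in the comments are expectations rather than verified output):

SELECT array_offset(ARRAY['a', 'b', 'c', 'b'], 'b');
-- expected: 2 (position of the first match, arrays being 1-based)

SELECT array_offsets(ARRAY['a', 'b', 'c', 'b'], 'b');
-- expected: {2,4} (positions of all matches)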
The Turkish PostgreSQL Users' and Developers' Association is organizing the 4th PGDay.TR on May 9, 2015 in Istanbul. Dave Page, one of the community leaders, will be giving the keynote.

This year, we are going to have 1 full English track along with 2 Turkish tracks, so if you are close to Istanbul, please join us for a wonderful city, event and fun!

We are also looking for sponsors for this great event. Please email to sponsor@postgresql.org.tr for details.

See you in Istanbul!

Conference website: http://pgday.postgresql.org.tr/en/
Last night I attended the second WhatcomPUG. This meeting was about Sqitch, an interesting database revision control mechanism. The system is written in Perl and was developed by David Wheeler of pgTAP fame. It looks and feels like git. As it is written in Perl it definitely has too many options. That said, what we were shown works, works well and appears to be a solid and thorough system for the job.

I also met a couple of people from CoinBeyond. They are a point-of-sale software vendor that specializes in letting "regular" people (read: not I or likely the people reading this blog) use Bitcoin!

That's right, folks: the hottest young currency in the market today is using the hottest middle-aged technology for their database, PostgreSQL. It was great to see that they are also located in Whatcom County. The longer I am here, the more I am convinced that Whatcom County (and especially Bellingham) is a quiet tech center working on profitable ventures without the noise of places like Silicon Valley. I just keep running into people doing interesting things with technology.

Oh, for reference:

Posted by Julien Rouhaud in Dalibo on 2015-03-18 at 08:09:10

Last month, I had the chance to talk about PostgreSQL monitoring, and present some of the tools I’m working on at pgconf.ru.

This talk was a good opportunity to work on an overview of existing projects dealing with monitoring or performance, see what may be lacking and what can be done to change this situation.

Here are my slides:

If you’re interested in this topic, or if you developped a tool I missed while writing these slides (my apologies if it’s the case), the official wiki page is the place you should go first.

I’d also like to thank all the pgconf.ru staff for their work, this conference was a big success, and the biggest postgresql-centric event ever organized.

Talking About OPM and PoWA at pgconf.ru was originally published by Julien Rouhaud at rjuju's home on March 18, 2015.

Posted by Rajeev Rastogi on 2015-03-18 at 07:01:00
PostgreSQL is an open-source, full-featured relational database. This blog gives an overview of how the PostgreSQL engine processes queries received from the user.
A typical, simplified flow of the PostgreSQL engine is:

SQL Engine Flow

As part of this blog, I am going to cover all modules marked in yellow colour.

Parser:
The parser module is responsible for syntactic analysis of the query. It consists of two sub-modules:
1. Lexical scanner
2. Bison rules/actions

Lexical Scanner:
The lexical scanner reads each character of the given query and returns the appropriate token based on the matching rules. E.g. rules can be as follows:

   

The name given in <> is a state name; <xc>, for example, is the state for the start of a comment. So once the scanner sees the comment-start characters, the comment body token is read in the <xc> state only.

Bison:
Bison reads the tokens returned from the scanner, matches them against the grammar rules for a particular query, and performs the associated actions. E.g. the bison rule for the SELECT statement is:

       

Each returned token is matched against the rule mentioned above in left-to-right order; if at any point no matching rule is found, it either moves on to the next possible matching rule or throws an error.

Analyzer:
The analyzer module is responsible for semantic analysis of the given query. The raw information about the query received from the parser module is transformed into database-internal object form to get the corresponding object IDs; e.g. the relation name "tbl" is replaced with its object id.
The output of the analyzer module is a query tree; its structure can be seen in the "Query" struct in the file src/include/nodes/parsenodes.h

Optimizer:
The optimizer module, often considered the brain of the SQL engine, is responsible for choosing the best path for executing the query. The best path for a query is selected based on the cost of the path; the path with the least cost is considered the winner.
Based on the winner path, a plan is created, which is used by the executor to execut

[continue reading]

After discussing the pgbench and TPC-DS results, it's time to look at the last benchmark, testing the performance of the built-in fulltext search (and the GIN/GiST index implementation in general).

The one chart you should remember from this post is this one, GIN speedup between 9.3 and 9.4:

fulltext-timing-speedups.png

Interpreting this chart is a bit tricky - the x-axis tracks duration on PostgreSQL 9.3 (log scale), while the y-axis (linear scale) tracks the relative duration of 9.4 vs. 9.3, so 1.0 means 'equal performance', and 0.5 means that 9.4 is 2x faster than 9.3.

The chart pretty much shows exponential speedup for the vast majority of queries - the longer the duration on 9.3, the higher the speedup on 9.4. That's pretty awesome, IMNSHO. What exactly caused that will be discussed later (spoiler: it's thanks to GIN fastscan). Also notice that almost no queries are slower on 9.4, and those few examples are not significantly slower.

Benchmark

While both pgbench and TPC-DS are well established benchmarks, there's no such benchmark for testing fulltext performance (as far as I know). Luckily, I had played with the fulltext features a while ago, implementing archie - an in-database mailing list archive.

It's still quite experimental and I use it for testing GIN/GiST related patches, but it's suitable for this benchmark too.

So I've taken the current archives of PostgreSQL mailing lists, containing about 1 million messages, loaded them into the database and then executed 33k real-world queries collected from postgresql.org. I can't publish those queries because of privacy concerns (there's no info on users, but still ...), but the queries look like this:

SELECT id FROM messages
 WHERE body_tsvector @@ ('optimizing & bulk & update')::tsquery
 ORDER BY ts_rank(body_tsvector, ('optimizing & bulk & update')::tsquery)
          DESC LIMIT 100;

The number of search terms varies quite a bit - the simplest queries have a single letter, the most complex ones often tens of words.
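For reference, this is roughly the kind of setup such queries assume (a sketch only: the body_tsvector column name comes from the query above, but the source column and index names are my guesses, not necessarily what archie uses):

-- A tsvector column plus a GIN index is what makes the @@ queries above fast.
ALTER TABLE messages ADD COLUMN body_tsvector tsvector;
UPDATE messages SET body_tsvector = to_tsvector('english', body);
CREATE INDEX messages_body_gin ON messages USING gin (body_tsvector);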

PostgreSQL config

The PostgreSQL configuration was mostly default, with only minor changes:

shared_

[continue reading]

I submitted to PgConf.US. I submitted talks from my general pool. All of them have been recently updated. They are also all solid talks that have been well received in the past. I thought I would end up giving my, "Practical PostgreSQL Performance: AWS Edition" talk. It is a good talk, is relevant to today and the community knows of my elevated opinion of using AWS with PostgreSQL (there are many times it works just great, until it doesn't and then you may be stuck).

I also submitted a talk entitled: "Suck it! Webscale is Dead; PostgreSQL is King!". This talk was submitted as a joke. I never expected it to be accepted, it hadn't been written, the abstract was submitted on the fly, improvised and in one take. Guess which talk was accepted? "Webscale is Dead; PostgreSQL is King!". They changed the first sentence of the title which is absolutely acceptable. The conference organizers know their audience best and what should be presented.

What I have since learned is that the talk submission committee was looking for dynamic talks, dynamic content, and new, inspired ideas. A lot of talks that would have been accepted in years past weren't and my attempt at humor fits the desired outcome. At first I thought they were nuts but then I primed the talk at SDPUG/PgUS PgDay @ Southern California Linux Expo.

I was the second to last presenter on Thursday. I was one hour off the plane. I was only staying the night and flying home the next morning, early. The talk was easily the best received talk I have given. The talk went long, the audience was engaged, and laughter, knowledge and opinions abounded. When the talk was over, it was given enthusiastic applause and, with a definite need for water, I left the room.

I was followed by at least 20 people, if not more. I don't know how many there were but it was more than I have ever had follow me after a talk before. I was deeply honored by the reception. One set of guys that approached me said something to the effect of: "You seem like you don't mind expressing your opinions".

[continue reading]

Database management systems use MVCC to avoid the problem of writers blocking readers and vice versa, by making use of multiple versions of data.

There are essentially two approaches to multi-version concurrency.

Approaches for MVCC
The first approach is to store multiple versions of records in the
database, and garbage collect records when they are no longer
required. This is the approach adopted by PostgreSQL and
Firebird/Interbase. SQL Server also uses a somewhat similar approach,
with the difference that old versions are stored in tempdb
(a database different from the main database).

The second approach is to keep only the latest version of data in
the database, but reconstruct older versions of data dynamically
as required by using undo. This is the approach adopted by Oracle
and MySQL/InnoDB.


MVCC in PostgreSQL
In PostgreSQL, when a row is updated, a new version (called a tuple)
of the row is created and inserted into the table. The previous version
is provided a pointer to the new version. The previous version is
marked "expired", but remains in the database until it is garbage collected.

In order to support multi-versioning, each tuple has additional data
recorded with it:
xmin - The ID of the transaction that inserted/updated the
row and created this tuple.
xmax - The transaction that deleted the row, or created a
new version of this tuple. Initially this field is null.
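These system columns can be inspected directly; a quick illustrative query (the table name is hypothetical):

-- xmin and xmax are hidden system columns available on every table.
SELECT xmin, xmax, ctid, * FROM accounts;

-- After an UPDATE, the old tuple version gets its xmax set to the updating
-- transaction's ID, while the new version starts with that ID as its xmin.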

Transaction status is maintained in CLOG which resides in $Data/pg_clog.
This table contains two bits of status information for each transaction;
the possible states are in-progress, committed, or aborted.

PostgreSQL does not undo changes to database rows when a transaction
aborts - it simply marks the transaction as aborted in CLOG. A PostgreSQL
table therefore may contain data from aborted transactions.

A Vacuum cleaner process is provided to garbage collect expired/aborted
versions of a row. The Vacuum Cleaner also deletes index entries
associated with tuples that are garbage collected.

A tuple is visible if its xmin is valid and xmax is

[continue reading]

This is not the first time that 2ndQuadrant has looked at Puppet. Gabriele Bartolini has already written an article in two parts on how to rapidly configure a PostgreSQL server through Puppet and Vagrant, accompanied by the release of the code used in the example on GitHub (https://github.com/2ndquadrant-it/vagrant-puppet-postgresql).

Split into three parts, this article aims to demonstrate automating the setup and configuration of Barman to back up a PostgreSQL test server.

This article is an update of what was written by Gabriele with the idea of creating two virtual machines instead of one, a PostgreSQL server and a Barman server.

it2ndq/barman is the module released by 2ndQuadrant Italy to manage the installation of Barman through Puppet. The module has a GPLv3 licence and is available on GitHub at the address https://github.com/2ndquadrant-it/puppet-barman. The following procedure was written for Ubuntu 14.04 Trusty Tahr but can be performed in a similar manner on other distributions.

Requirements

To start the module for Barman on a virtual machine, we need the following software:

Vagrant

Vagrant is a virtual machine manager, capable of supporting many virtualisation platforms, with VirtualBox as its default.

We install VirtualBox this way:

$ sudo apt-get install virtualbox virtualbox-dkms

The latest version of Vagrant can be downloaded from the site and installed with the command:

$ sudo dpkg -i /path/to/vagrant_1.7.2_x86_64.deb

Ruby

Regarding Ruby, our advice is to use rbenv, which creates a Ruby development environment with a specific version for the current user, thereby avoiding contamination of the system environment. To install rbenv we suggest using rbenv-installer (https://github.com/fesplugas/rbenv-installer).

Let’s download and execute the script:

$ curl https://raw.githubusercontent.com/fesplugas/rbenv-installer/master/bin/rbenv-installer | bash

At the end, the script will prompt you to append

[continue reading]

Posted by Josh Berkus in pgExperts on 2015-03-17 at 06:14:00
In 2008, when Heroku started, there was only one real option for cloud hosting PostgreSQL: roll-your-own on EC2, or a couple other not-very-competitive platforms.  Since then, we've seen the number of cloud hosting providers explode, and added several "PostgreSQL-As-A-Service" providers as well: first Heroku, then Gandi, CloudFoundry, RDS, OpenShift and more.  This has led many of pgExperts' clients to ask: "Where should I be hosting my PostgreSQL?"

So to provide a definitive answer to that question, for the past several weeks I've been doing some head-to-head testing of different cloud hosting options for PostgreSQL.  Even more work has been done by my collaborator, Ruben Rudio Rey of ManageACloud.com.  I will be presenting on the results of this testing in a series of blog posts, together with a series of presentations starting at SCALE and going through pgConf NYC, LinuxFestNorthWest, and culminating at pgCon.   Each presentation will add new tests and new data.

Here's my slides from SCALE, which compare AWS, RDS, and Heroku, if you want to get some immediate data.

What We're Testing

The idea is to run benchmarks against ephemeral instances of PostgreSQL 9.3 on each cloud or service.  Our main goal is to collect performance figures, since while features and pricing are publicly available, performance information is not.  And even when the specification is the same, the actual throughput is not.  From each cloud or service, we are testing two different instance sizes:

Small: 1-2 cores, 3 to 4GB RAM, low throughput storage (compare EC2's m3.medium).  This is the "economy" instance for running PostgreSQL; it's intended to represent what people with non-critical PostgreSQL instances buy, and to answer the question of "how much performance can I get for cheap".

Large: 8-16 cores, 48 to 70GB RAM, high throughput storage (compare EC2's r3.2xlarge).  This is the maximum for a "high end" instance which we could afford to test in our test runs. 

The clouds we're testing or plan to test include:
  • AWS EC2 "roll-you

[continue reading]

Posted by Dimitri Fontaine in 2ndQuadrant on 2015-03-16 at 14:22:00

I was lucky to participate as a speaker at Nordic PostgreSQL Day 2015, and it's been another awesome edition of the conference. Really smooth: everything ran as it should, with about one hundred people at the conference.

In action at Nordic pgDay in Copenhagen

You can get the slides I've been using for my talk at the Nordic pgDay 2015 page on the Conferences pages of this website.

The Nordic pgDay is such a successful conference that I have long wanted to have just the same in my area, so we made pgDay Paris and modeled it on Nordic. It's planned to be all the same, just with a different audience given the location.

April 21st, save the date and join us in Paris

The pgDay Paris welcomes English-speaking speakers and the Call for Papers is now open, so please consider submitting your talk proposal for pgDay Paris by clicking on the link and filling in a form. Not too hard a requirement for being allowed to visit such a nice city as Paris, really!

Though it is a rare occurrence, we have had occasions where we needed to purge ALL data from a table. Our preferred approach is TRUNCATE TABLE, because it's orders of magnitude faster than the DELETE FROM construct. However, you can't use TRUNCATE TABLE unqualified if the table you are truncating has foreign key references from other tables. In comes its extended form, the TRUNCATE TABLE .. CASCADE construct, which was introduced in PostgreSQL 8.2 and will not only delete all data from the main table, but will cascade to all the referencing tables as well.
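As an illustrative sketch (table names made up): truncating a parent table that other tables reference fails without CASCADE, and succeeds with it:

CREATE TABLE customers (customer_id int PRIMARY KEY);
CREATE TABLE orders (
    order_id    serial PRIMARY KEY,
    customer_id int REFERENCES customers (customer_id)
);

TRUNCATE TABLE customers;
-- ERROR:  cannot truncate a table referenced in a foreign key constraint

TRUNCATE TABLE customers CASCADE;
-- NOTICE:  truncate cascades to table "orders"
-- Both customers and orders are now empty.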


Continue reading "DELETE all data really fast with TRUNCATE TABLE CASCADE"
Posted by damien clochard in Dalibo on 2015-03-15 at 15:17:36

Support for the SQL/MED standard was introduced in PostgreSQL 9.1 (2011). Four years later, we now have more than 70 Foreign Data Wrappers (FDWs) available, which let PostgreSQL read and write all kinds of data storage: Oracle, MongoDB, XML files, Hadoop, Git repos, Twitter, you name it, …
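For readers new to SQL/MED, here is a minimal sketch of what using an FDW looks like, based on the in-core postgres_fdw (server names, credentials and columns are made up for the example):

CREATE EXTENSION postgres_fdw;

-- Declare where the remote data lives.
CREATE SERVER remote_pg
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'sales');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER remote_pg
    OPTIONS (user 'reporting', password 'secret');

-- A foreign table then behaves like a local table in queries.
CREATE FOREIGN TABLE remote_orders (
    order_id  int,
    total     numeric
) SERVER remote_pg OPTIONS (table_name 'orders');

SELECT count(*) FROM remote_orders;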

A few days ago, during FOSDEM, I attended a talk by my colleague Ronan Dunklau about Foreign Data Wrappers in PostgreSQL: Where are we now? Ronan is the author of the multicorn extension, and during his talk I couldn't help thinking that multicorn is probably one of the most underrated pieces of code in the PostgreSQL community. As a quick exercise I started to count all the PostgreSQL FDWs I knew about… and soon realized there were too many to fit in my small memory.

So I went back to the FDW page in the PostgreSQL wiki and started to update and clean up the catalog of all the existing FDWs, and ended up with a list of 70 wrappers for PostgreSQL…

Lessons learned

During this process I learned a few things:

  • Almost all the major RDBMSs are covered, except DB2. Strangely, DB2 is also the only other RDBMS with a SQL/MED implementation. I'm not sure whether those two facts are related.

  • One third of the FDWs are written in Python and based on Multicorn; the others are "native" C wrappers. Obviously, C wrappers are preferred for the database wrappers (ODBC, Oracle, etc.) for performance reasons, while Multicorn is mostly used for querying web services (S3 storage, RSS files, etc.) and specific data formats (genotype files, GeoJSON, etc.)

  • I’m not sure they were meant to be this way, but Foreign Data Wrapper are also used to innovate. Some wrappers are diverted from the original purpose to implement new features. For instance, CitusDB has released a column-oriented storage, others wrappers are going in more exotic directions such as an OS level stat collector, GPU acceleration for seq scan, or a distbributed paralelle query engine, … Most of these project are not suited for production, of course. However it shows how easy you can implem

[continue reading]

Posted by Tomas Vondra in 2ndQuadrant on 2015-03-14 at 19:00:00

I spend more and more time debugging PostgreSQL internals - analyzing bugs in my patches, working on new patches etc. That requires looking at structures used by the internals - optimizer, planner, executor etc. And those structures are often quite complex, nested in various ways so exploring the structure with a simple print gets very tedious very quickly:

(gdb) print plan
$16 = (PlannedStmt *) 0x2ab7dc0
(gdb) print plan->planTree
$17 = (struct Plan *) 0x2ab6590
(gdb) print plan->planTree->lefttree
$18 = (struct Plan *) 0x2ab5cf0
(gdb) print plan->planTree->lefttree->lefttree
$19 = (struct Plan *) 0x2ab5528
(gdb) print plan->planTree->lefttree->lefttree->lefttree
$20 = (struct Plan *) 0x2ab1290
(gdb) print *plan->planTree->lefttree->lefttree->lefttree
$21 = {type = T_SeqScan, startup_cost = 0, total_cost = 35.5, plan_rows = 2550,
        plan_width = 4, targetlist = 0x2ab4e48, qual = 0x0, lefttree = 0x0,
        righttree = 0x0, initPlan = 0x0, extParam = 0x0, allParam = 0x0}
(gdb) print *(SeqScan*)plan->planTree->lefttree->lefttree->lefttree
$22 = {plan = {type = T_SeqScan, startup_cost = 0, total_cost = 35.5,
        plan_rows = 2550, plan_width = 4, targetlist = 0x2ab4e48, qual = 0x0,
        lefttree = 0x0, righttree = 0x0, initPlan = 0x0, extParam = 0x0,
        allParam = 0x0}, scanrelid = 1}

And then you move somewhere else and have to start from scratch :-(

Fortunately, gdb provides a Python API so that it's possible to extend it quite easily - define new commands, change the way values are printed etc.

I've hacked a small script that defines a new command pgprint to make my life easier. It's neither perfect nor the most beautiful code in the world, but you can do this for example:

(gdb) pgprint plan
          type: CMD_SELECT
      query ID: 0
    param exec: 0
     returning: False
 modifying CTE: False
   can set tag: True
     transient: False
  row security: False

     plan tree: 
        -> HashJoin (cost=202.125...342.812 rows=2550 width=16)
                target list:

[continue reading]