
The other day I got a link to an interesting post published by Uber, which has caught our attention here at Cybertec: https://eng.uber.com/go-geofence The idea behind geo-fencing is to provide information about an area to users. Somebody might want to find a taxi near a certain location or somebody might simply want to order a […]

The post Beating Uber with a PostgreSQL prototype appeared first on Cybertec - The PostgreSQL Database Company.

As one of the founders of United States PostgreSQL, I have always had a vision of where the corporation should go. Our corporation has continued to move forward and we have achieved some great things. Although it has taken longer than I expected and we are still behind on some of my goals, PostgreSQL has become one of the dominant Open Source databases in the market. By sticking to a tried, true and tested model, our community has built strongholds in every sector of business from small mom-and-pops to Wall Street to the Military.


Some time ago, I wrote a blog post about how to pick a task from a queue without locking. It was written in 2013, and as such it couldn't reflect everything we have now in PostgreSQL – namely SKIP LOCKED – which was added to PostgreSQL over a year later. Two people mentioned SKIP LOCKED in comments, but it […]
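For reference, a minimal sketch of the SKIP LOCKED approach; the queue table and its columns here are invented for illustration:

-- illustrative schema:
-- CREATE TABLE queue (id bigserial PRIMARY KEY, payload text, processed boolean DEFAULT false);
WITH next_job AS (
    SELECT id
    FROM queue
    WHERE NOT processed
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE queue q
SET processed = true
FROM next_job
WHERE q.id = next_job.id
RETURNING q.id, q.payload;

Concurrent workers running the same statement each claim a different row instead of blocking on, or double-processing, the same one.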
Posted by Craig Ringer in 2ndQuadrant on 2016-05-03 at 00:59

Having recently been doing some debugging work where many watchpoints, conditional breakpoints etc. were necessary, I'd like to shout out to a really useful tool: the standalone CDT debugger.

It’s part of the Eclipse project, but before you run screaming – it doesn’t require project setup or anything and it serves as a good GUI gdb wrapper. It’s good for working with PostgreSQL because you don’t need to set up a fake IDE project or any such time-wasting business. You just launch the debugger. You’ve still got your .gdbinit with postgresql debug helper macros and everything.

The CDT debugger still gives you access to the gdb console and all the features you’re used to, it just has a good UI for showing the values of locals, watched variables, stack browsing, setup of conditional breakpoints, etc.

Just:

dnf install eclipse-cdt

or

aptitude install eclipse-cdt

then get the backend pid of interest from ps or SELECT pg_backend_pid() and

cdtdebug -a $PID

It supports core files, debugging a program under direct control, and attaching.

The debugger does have a small startup time, unlike gdb, so it’s worth keeping open and detaching/re-attaching as needed when convenient. But it’s well worth it:

[Screenshot: the standalone Eclipse CDT debugger]

I’m looking into how to incorporate Tomas’s gdbpg or even better, extension support for displaying a List*‘s contents in the variable browse tree. Even without that it’s pretty handy.

Posted by Eyðun Nielsen on 2016-05-02 at 10:50
Came around to this old stuff - and realized that it didn't really work :-( Bash loops and pipes can be tricky :-)

So an update was long overdue:

#!/bin/bash
# PostgreSQL backup script
# Run as postgres user

BACKUP_HOST=replace-with-your-backup-host
BACKUP_HOST_USER=replace-with-your-backup-host-user
BACKUP_HOST_PATH=replace-with-your-backup-host-/path/to/backup/of/postgresql

# For all running PostgreSQL clusters
while read cluster;
do
# Create cluster-path on backuphost
ssh -n -q $BACKUP_HOST_USER@$BACKUP_HOST "mkdir -p $BACKUP_HOST_PATH/$cluster";
# Global stuff from cluster: roles etc.
pg_dumpall --globals-only | ssh -q $BACKUP_HOST_USER@$BACKUP_HOST "dd of=$BACKUP_HOST_PATH/$cluster/globals.sql" > /dev/null 2>&1; # no -n here: ssh must read the dump from stdin
# And then each database (except templates)
while read databasename;
do
pg_dump --cluster $cluster --format=c -- $databasename | ssh -q $BACKUP_HOST_USER@$BACKUP_HOST "dd of=$BACKUP_HOST_PATH/$cluster/$databasename.sqlc" > /dev/null 2>&1; # again, no -n on a piped ssh
done < <(psql --cluster $cluster --no-align --tuples-only --command="SELECT datname from pg_database WHERE NOT datistemplate");
done < <( awk -- '{ print $1"/"$2 }' <(pg_lsclusters --no-header | grep online) ) 

The new feature release of pglogical is now available. It brings support for sequence replication, manually configured parallel subscriptions, replica triggers, and numerous usability improvements and bug fixes. Let's look into the changes in more detail.

Sequences support

As the title of this post says, pglogical now supports sequence replication. While I am sure this is good news for everybody, for me the main importance of this feature is delivering a better experience with zero-downtime upgrades. With the first version of pglogical you'd have to manage sequences manually after the data was transferred, which complicated the whole procedure; now they just get replicated.

PostgreSQL does not provide any infrastructure for capturing sequence changes (not for lack of trying; it just turns out that it's hard to add support for it). So we use a trick similar to the one used in a previous replication project I worked on. We periodically capture the current state of the sequence and send an update to the replication queue with some additional buffer. This means that in normal operation the sequence values on the subscriber will be ahead of those on the provider. We try to size the buffer dynamically so that the value is as close as possible but also does not fall behind on frequently updated sequences. It is, however, advisable to force a sequence update using the pglogical.synchronize_sequence() function after big data loads or an upgrade.

Other than the above, sequence replication works similarly to the way tables work. This includes replication sets, so there are functions for adding sequences to and removing them from replication sets, and subscribers can receive updates for only some sequences.
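For illustration, adding a sequence to a replication set and forcing a sync after a bulk load could look roughly like this. The sequence name is made up, and the add function name may differ between pglogical releases; pglogical.synchronize_sequence() is the function mentioned above:

-- on the provider node: add a sequence to the default replication set
SELECT pglogical.replication_set_add_sequence('default', 'public.orders_id_seq');

-- after a big data load or an upgrade, push the current value out immediately
SELECT pglogical.synchronize_sequence('public.orders_id_seq');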

Parallel subscriptions

Another major change is support for multiple subscriptions between one pair of nodes. In pglogical 1.0 there was a strict limit of a single subscription between one pair of nodes. You could replicate from one node to multiple nodes or from multiple nodes to one node, but no parallel subscriptions between the same two

[...]
This post is a continuation of my previous post. Here I will explain how a main transaction works from a code-flow perspective. Sub-transactions, MVCC and other related details will be covered in subsequent posts. Please see my previous post, Basics of Transactions, to get a better understanding.
Internally, each transaction is represented by a unique number assigned in increasing order, with an exception in case of overflow. The whole transaction flow is tracked using various states, as explained below:

Command Execution:

Usually each command execution has three steps:
  • StartTransactionCommand
  • Do the operation specific to command
  • CommitTransactionCommand
As mentioned in my earlier post, if a transaction is not started explicitly, it will be started internally for any command execution. So there are two ways of starting a transaction. Below I explain both cases and the corresponding state transitions involved.
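At the SQL level the two cases look like this (a trivial illustration; the table t is made up):

-- Explicit: the client opens and closes the transaction itself
BEGIN;
INSERT INTO t VALUES (1);
COMMIT;

-- Implicit: a standalone statement gets a transaction started and
-- committed internally around the single command
INSERT INTO t VALUES (2);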

Meaning of the nomenclature below:
                    X ====action====>Y
The current state is X; in this state it performs the operation "action" and moves to state "Y". If no action is mentioned, the current state performs no operation and moves directly to state "Y".
There are mainly three actions mentioned below; the purpose of each is as follows:
  • StartTransaction: Assigns resources in terms of memory, initializes the relevant GUC variables, initializes transaction properties such as read-only or read-write, creates a new resource owner, etc.
  • CommitTransaction: Undoes all initialization done by StartTransaction. Executes any pending triggers and handles any on-commit actions. Generates a COMMIT WAL record and inserts it into the WAL buffer, and updates the commit timestamp data. Then, depending on whether synchronous or asynchronous commit is configured, it waits for a response from the standby node or directly flushes the transaction to the CLOG, respectively.
  • AbortTransaction: Almost the same as CommitTransaction, except that it writes an ABORT WAL record and marks the transaction as ABORTED.

State transition mentioned in subseque
[...]
Posted by Rajeev Rastogi on 2016-04-30 at 12:19
Have you ever thought about how the classic example below works:
           You swipe your credit card in a shop and money gets debited from your account, but the overall billing fails. In that case your money gets credited back to your account instantly.

If you can already relate this to one of the key properties of a database, the transaction, then you already have a head start on this topic. If not, no need to worry; I will take you through the very basics of transactions.

Background

Transactions are one of the most fundamental concepts for any database, including PostgreSQL. They enable multi-step actions to be grouped together in such a way that either all steps are accepted or all are rejected. If all of the steps succeed, then the whole group of actions is considered a success. If any step fails in between, the effects of all the previous successful steps are nullified.
Consider the example of a bank fund transfer, where there is a table consisting of customer names, their balances, etc. Customer 'A' wants to transfer Rs 1000 to customer 'B'. In terms of SQL it will look like:

            UPDATE accounts SET balance = balance - 1000 where cust_name='A';
            UPDATE accounts SET balance = balance + 1000 where cust_name='B';

There are two steps involved in the overall fund transfer and, as per the above definition of a transaction, either both of them should succeed (i.e. once Rs 1000 is debited from A's account it must be credited to B's account) or there should not even be a debit from A's account. There should also never be a case where the debit from A's account failed but B's account got credited with Rs 1000, in which case the bank will not be happy.
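A minimal sketch of how the two statements are grouped so that they succeed or fail together:

BEGIN;
UPDATE accounts SET balance = balance - 1000 WHERE cust_name = 'A';
UPDATE accounts SET balance = balance + 1000 WHERE cust_name = 'B';
COMMIT;   -- both changes become visible together
-- If either UPDATE errors out, the transaction is aborted and a ROLLBACK
-- leaves both balances untouched.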
The property just explained is called "Atomicity"; there are three additional properties of each transaction, explained below:

Properties

There are four mandatory properties of each transaction:

Atomicity: 

This property makes sure that all changes to data are performed as if they were a single

[...]
Posted by Shaun M. Thomas on 2016-04-29 at 19:46

Having run into a bit of a snag with Postgres-XL, and not wanting to be dead in the water with our project, I went on a bit of a knowledge quest. Database scaling is hard, so I expected a bunch of either abandoned or proprietary approaches. In addition, as huge fans of Postgres, compatibility or outright use of the Postgres core was a strict prerequisite.

So, what options are out there? Is there even anything worth further investigation? Maybe more importantly, what do you do when you’re under a bit of a scheduling constraint? Projects need to move forward after all, and regardless of preferences, sometimes concessions are necessary. The first step was obviously the list of databases derived from Postgres.

At first glance, that’s a pretty big list. If we look carefully though, we can see that quite a few of those projects were abandoned years ago. Others are commercial, not based on scalability, or both. Being commercial isn’t automatically a disqualification, but most of the commercial options were forked from practically ancient versions of Postgres and never kept up compatibility, or don’t mention the version at all. Amazon Redshift fits that profile, being based on Postgres 8.0, which few would want to use these days. Fujitsu Enterprise is another, which doesn’t even list which version they’re based on, nor do they provide a download for testing purposes.

What’s left? It’s hard to tell from the Wiki page, so I just started with the projects that include some kind of data scaling not based on replication. These candidates present a longer list than I’d initially anticipated, which is always a good problem to have!

Let’s scrutinize the nominees.

CitusDB

It’s not really a secret that CitusDB and Postgres-XL are both tackling the same problem, and are currently the top two contenders. Unlike Postgres-XL and its approach of extending SQL syntax to directly embrace data distribution, CitusDB is actually just a Postgres extension.
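To make the contrast concrete, here is a hedged sketch; the Citus function name is from later releases and the exact API available at the time may have differed:

-- Postgres-XL: distribution is part of the DDL syntax
-- CREATE TABLE events (id bigint, payload jsonb) DISTRIBUTE BY HASH (id);

-- Citus: plain DDL plus an extension function call
CREATE EXTENSION citus;
CREATE TABLE events (id bigint, payload jsonb);
SELECT create_distributed_table('events', 'id');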

As a result, it’s a bit more janky. There’s no CREATE TABLE ... DISTRIBUTE BY magic

[...]
Posted by Paul Ramsey in PostGIS on 2016-04-29 at 18:00

I’ve had a productive couple of weeks here, despite the intermittently lovely weather and the beginning of Little League baseball season (not coaching, just supporting my pitcher-in-training).

13 Days

The focus of my energies has been a long-awaited (by me) update to the OGR FDW extension for PostgreSQL. By binding the multi-format OGR library into PostgreSQL, we get access to the many formats supported by OGR, all with just one piece of extension code.

As usual, the hardest part of the coding was remembering how things worked in the first place! But after getting my head back in the game the new code flowed out and now I can reveal the new improved OGR FDW!

OGR FDW Update

The new features are:

  • Column name mapping between OGR layers and PgSQL tables is now completely configurable. The extension will attempt to guess the mapping automagically, using names and type consistency, but you can override mappings using the table-level column_name option (see the sketch after this list).
  • Foreign tables are now updateable! That means, for OGR sources that support it, you can run INSERT, UPDATE and DELETE commands on your OGR FDW tables and the changes will be applied to the source.

    • You can control which tables and foreign servers are updateable by setting the UPDATEABLE option on the foreign server and foreign table definitions.
  • PostgreSQL 9.6 is supported. It’s not released yet, but we can now build against it.
  • Geometry type and spatial reference system are propagated from OGR. If your OGR source defines a geometry type and spatial reference identifier, the FDW tables in PostGIS will now reflect that, for easier integration with your local geometry data.
  • GDAL2 and GDAL1 are supported. Use of GDAL2 syntax has been made the default in the code-base, with mappings back to GDAL1 for compatibility, so the code is now future-ready.
  • Regression tests and continuous integration are in place, for improved code reliability. Thanks to help from Even Rouault, we are now using Travis-CI for integration testing, and I’ve enabled a growing number of integration tests.
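As promised above, a hedged sketch of how the column mapping and updateability options might be spelled; the server, layer, path and column names are made up, and the exact option names and placement should be checked against the ogr_fdw documentation:

CREATE EXTENSION ogr_fdw;   -- assumes PostGIS is installed for the geometry type

CREATE SERVER gis_source
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (datasource '/data/shapefiles', format 'ESRI Shapefile', updateable 'true');

CREATE FOREIGN TABLE parks (
  fid bigint,
  geom geometry,
  park_name text OPTIONS (column_name 'NAME')   -- map the OGR field NAME to park_name
)
SERVER gis_source
OPTIONS (layer 'parks', updateable 'true');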

As usual, I’m in d

[...]
Posted by Andrew Dunstan on 2016-04-29 at 12:34
I just committed a patch to allow building with Visual Studio 2015. Due to a change in the way we need to detect locales with this compiler, it will not be possible to use binaries built with it on Windows XP. Despite the fact that Microsoft declared Windows XP to be completely out of support well over a year ago, it still lives on in a huge number of installations. My own instance still gets occasional updates from Microsoft. And you can still build and run the very latest PostgreSQL on Windows XP. But you can't use it on Windows XP if it's built with Visual Studio 2015 or later.

I will keep my instance (whose only job is to run several buildfarm members) running as long as it doesn't require any significant tweaks. But a day will eventually come when it does require such tweaks, or just won't build and run successfully any more,  and at that stage I'll shut it down.


Postgres has a wonderful feature called concurrent indexes. It allows you to create indexes on a table without blocking reads OR writes, which is quite a handy trick. There are a number of circumstances in which one might want to use concurrent indexes, the most common one being not blocking writes to production tables. There are a few other use cases as well, including:


Photograph by Nicholas A. Tonelli

  • Replacing a corrupted index
  • Replacing a bloated index
  • Replacing an existing index (e.g. better column list)
  • Changing index parameters
  • Restoring a production dump as quickly as possible
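For reference, the basic pattern for swapping in a replacement index without blocking writes (the index and table names here are only illustrative):

-- build the new index without blocking reads or writes on the table
CREATE INDEX CONCURRENTLY orders_customer_id_idx_new ON orders (customer_id);

-- swap it in
DROP INDEX CONCURRENTLY orders_customer_id_idx;
ALTER INDEX orders_customer_id_idx_new RENAME TO orders_customer_id_idx;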

In this article, I will focus on that last use case, restoring a database as quickly as possible. We recently upgraded a client from a very old version of Postgres to the current version (9.5 as of this writing). The fact that use of pg_upgrade was not available should give you a clue as to just how old the "very old" version was!

Our strategy was to create a new 9.5 cluster, get it optimized for bulk loading, import the globals and schema, stop write connections to the old database, transfer the data from old to new, and bring the new one up for reading and writing.

The goal was to reduce the application downtime as much as reasonably possible. To that end, we did not want to wait until all the indexes were created before letting people back in, as testing showed that the index creations were the longest part of the process. We used the "--section" flags of pg_dump to create pre-data, data, and post-data sections. All of the index creation statements appeared in the post-data file.

Because the client determined that it was more important for the data to be available, and the tables writable, than it was for them to be fully indexed, we decided to try using CONCURRENT indexes. In this way, writes to the tables could happen at the same time that they were being indexed - and those writes could occur as soon as the table was populated. That was the theory anyway.

The migration went smoothly - the data was transferred over quickly,

[...]
Posted by Josh Berkus in pgExperts on 2016-04-28 at 18:17
This StackOverflow question reminded me of this old blog post, which is still relevant today:

pg_log, pg_xlog and pg_clog


There are three directories named "pg_*log" in a default $PGDATA directory when you create it.


pg_log


$PGDATA/pg_log is the default location for the database activity logs, which include error messages, query logging, and startup/shutdown messages.  This is where you should first look for information when PostgreSQL won't start.  Many Linux distributions and other packaging systems relocate this log directory to somewhere like /var/log/postgresql.

You can freely delete, rename, compress, and move files in pg_log without penalty, as long as the postgres user still has rights to write to the directory. If pg_log becomes bloated with many large files, you probably need to decrease the number of things you're logging by changing the settings in postgresql.conf.
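For example, a few postgresql.conf settings that commonly drive pg_log growth (the values shown are only illustrative):

log_min_duration_statement = 1000   # log only statements slower than 1s, instead of every query
log_statement = 'ddl'               # rather than 'all'
log_min_messages = warning
log_rotation_size = 100MB
log_truncate_on_rotation = on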

Do note that if you "delete" the current log file on a Linux or Unix system, it may remain open but not accessible, just sending any successive log messages to /dev/null until the file rotates.

pg_xlog 


$PGDATA/pg_xlog is the PostgreSQL transaction log.  This set of binary log files, with names like '00000001000000000000008E', contains images of the data from recent transactions.  These logs are also used for binary replication.

If replication, archiving, or PITR is failing, this directory can become bloated with gigabytes of logs the database server is saving for when archiving resumes. This can cause you to run out of disk space.
Unlike pg_log, you may not freely delete, move, or compress files in this directory.  You may not even move the directory without symlinking it back to its original location.  Deleting pg_xlog files may result in unrecoverable database corruption.

If you find yourself in a situation where you've got 100GB of files in pg_xlog and the database won't start, and you've already disabled archiving/replication and tried clearing disk space every other way, then please take two steps:
  1. Move files fro
[...]

Scalable Real-time Product Search using PostgreSQL with Citus

Product search is a common, yet sometimes challenging use-case for online retailers and marketplaces. It typically involves a combination of full-text search and filtering by attributes which differ for every product category. More complex use-cases may have many sellers that offer the same product, but with a different price and different properties.

PostgreSQL has the functionality required to build a product search application, but with a large product catalog, scaling it can be difficult. With the Citus extension, PostgreSQL can distribute tables and parallelize queries across many servers, making it easy to scale out your memory and compute power. Couple Citus with PostgreSQL's full-text search and it becomes fast and easy to do interactive searches on a large product catalog. While the search functionality is not as comprehensive as in dedicated search solutions, a huge benefit of keeping the data in PostgreSQL is that it can be updated in real time and tables can be joined. This post will go through the steps of setting up an experimental products database with parallel search functions using PostgreSQL and Citus, with the goal of showcasing several powerful features.

We start by setting up a multi-node Citus cluster on EC2 using 4 m3.2xlarge instances as workers. An even easier way to get started is to use the brand new Citus Cloud, which gives you a managed PostgreSQL cluster with Citus and full auto-fail-over. The main table in our database schema is the "product" table, which contains the name and description of a product, its price, and attributes in JSON format such that different types of products can use different attributes:

$ psql
CREATE TABLE product (
  product_id int primary key,
  name text not null,
  description text not null,
  price decimal(12,2),
  attributes jsonb
);

To distribute the table using Citus, we call the functions for hash-distributing the table into 16 shards (one per physical core). The shards are distributed

[...]
Posted by Oskari Saarenmaa on 2016-04-28 at 10:38
PGHoard is the cloud backup and restore solution we're using in Aiven. We started PGHoard development in early 2015 when the Aiven project was launched as a way to provide real-time streaming backups of PostgreSQL to a potentially untrusted cloud object storage.

PGHoard has an extensible object storage interface, which currently works with the following cloud object stores:
  • Amazon Web Services S3
  • Google Cloud Storage
  • OpenStack Swift
  • Ceph's RADOSGW utilizing either the S3 or Swift drivers 
  • Microsoft Azure Storage (currently experimental)

Data integrity

PostgreSQL backups consist of full database backups (basebackups) plus the write-ahead log (WAL) and related metadata. Both basebackups and WAL are required to create and restore a consistent database.

PGHoard handles both the full, periodic backups (driving pg_basebackup) and streaming of the database's write-ahead log.  Constantly streaming WAL as it's generated allows PGHoard to restore a database to any point in time since the oldest basebackup was taken.  This is used to implement Aiven's Database Forks and Point-in-time-Recovery as described in our PostgreSQL FAQ.

To save disk space and reduce the data that needs to be sent over the network (potentially incurring extra costs), backups are compressed by default using Google's Snappy, a fast compression algorithm with a reasonable compression ratio. LZMA (a slower algorithm with a very high compression ratio) is also supported.

To protect backups from unauthorized access and to ensure their integrity, PGHoard can also transparently encrypt and authenticate the data using RSA, AES and SHA256.  Each basebackup and WAL segment gets a unique random AES key which is encrypted with RSA.  HMAC-SHA256 is used for file integrity checking.

Restoring is the key

As noted in the opening paragraph, PGHoard is a backup and restore tool: backups are largely useless unless they can be restored.  Experience tells us that backups, even if set up at some point, are usually not restorable unless restore is routinely teste[...]

There are a lot of amazing features coming in PostgreSQL 9.6, but I’m personally very happy about a really small, simple one that helps close a long-standing user foot-gun.

commit a31212b429cd3397fb3147b1a584ae33224454a6
Author: Robert Haas 
Date:   Wed Apr 27 13:46:26 2016 -0400

    Change postgresql.conf.sample to say that fsync=off will corrupt data.

    Discussion: 24748.1461764666@sss.pgh.pa.us
    
    Per a suggestion from Craig Ringer.  This wording from Tom Lane,
    following discussion.

There’s a bit of terrible advice floating around that turning fsync=off will make PostgreSQL run faster. Which is true, as far as it goes, but neglects that little "risk of massive data corruption on crash" part. So users get bitten. I’ve tried to scrub as much of that advice from the Internet as I can or get it qualified with warnings, but I still see people advised to do it on random forums sometimes.

The docs do a good job of explaining that setting fsync=off is a bad idea, but the sample config file made it seem pretty innocuous:

#fsync = on                             # turns forced synchronization on or off

so users keep turning it off. Then teams like 2ndQuadrant land up doing expensive forensic data recovery on their database clusters once they crash weeks, months or years later and experience severe corruption… or notice the corruption they experienced a while ago. Or users just write off their data and start again because it’s all too hard and expensive.

To make turning fsync=off a little less attractive the sample now reads:

#fsync = on                            # flush data to disk for crash safety
                                       # (turning this off can cause
                                       # unrecoverable disk corruption)

It won’t stop someone using ALTER SYSTEM SET without realising, but there’s only so much you can do.

I’m really happy about this. It’s nice to knock off such minor improvements that have a disproportionate impact on usability and UX.

If you are ever tempted to set fsy

[...]

I find myself quite often having to split a large write operation such as updating millions of rows in a heavily used table. This usually boils down to writing a small Perl script that runs a small one-off function that performs the task by updating a limited number of rows per run, committing in between to avoid a long-running transaction.

This time I needed to do this, I decided not to write yet another Perl script and to instead do something to improve the situation. I’ve not been able to find any cron-ish tools at all to run database functions, except for pgAgent, but that looks more like a client-side daemon; I wanted something bare-minimum that can run server-side and piggy-back on the OS cron.

A few hours and 382 lines of code later, the project was finished and pushed to GitHub:

https://github.com/trustly/pgcronjob

createuser pgcronjob
psql -f install.sql
crontab pgcronjob.crontab
psql

CREATE OR REPLACE FUNCTION public.CronJob_Function_Template_Skeleton()
RETURNS batchjobstate
LANGUAGE plpgsql
SET search_path TO public, pg_temp
AS $FUNC$
DECLARE
BEGIN
RAISE NOTICE 'Hello world!';
PERFORM pg_sleep(random());
RAISE NOTICE 'Slept for a while.';
IF random() < 0.5 THEN
 -- Tell CronJob() we have more work to do and we want it to run us again in due time
 RAISE NOTICE 'See you again!';
 RETURN 'AGAIN';
ELSIF random() < 0.5 THEN
 -- Throw error to CronJob() to test errors
 RAISE EXCEPTION 'Simulate error in CronJob function';
ELSE
 -- Tell CronJob() we're done and we don't want it to run us ever again
 RAISE NOTICE 'Bye world!';
 RETURN 'DONE';
END IF;
END;
$FUNC$;

GRANT EXECUTE ON FUNCTION public.CronJob_Function_Template_Skeleton() TO pgcronjob;

SELECT CronJob_Register('public','cronjob_function_template_skeleton');

 

Hopefully this will be useful for others as well. It would be fun to get some feedback.

The initial commit even comes with a nice README and install and uninstall scripts.

Extract from the README:

The settings are conditions that must all be TRUE for the cronjob to run, i.e. 
[...]
On 8th of April, Alvaro Herrera committed patch: Support \crosstabview in psql   \crosstabview is a completely different way to display results from a query: instead of a vertical display of rows, the data values are placed in a grid where the column and row headers come from the data itself, similar to a spreadsheet. […]
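A quick, self-contained illustration of the idea (column aliases invented):

SELECT extract(year FROM d)::int AS year,
       to_char(d, 'Mon')         AS month,
       count(*)
FROM generate_series(timestamp '2015-01-01', '2016-04-30', '1 day') AS d
GROUP BY 1, 2
\crosstabview

With no arguments, \crosstabview uses the first column as the row header, the second as the column header and the remaining column as the cell value, so this prints a year-by-month grid of counts.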
Posted by Pavel Golub in MicroOLAP on 2016-04-27 at 09:10

If you ever need quick help with PostgreSQL from other people (sure, you should check the manuals first), just remember one link: POSTGRES.chat.

On this page you will find links to online chats, e.g. Slack, Gitter.im, Telegram, IRC etc.

Thanks to PostgreSQL in Russia for the hint.


This year, we're continuing to experiment with new formats for the pgCon unconference.  In 2013 and 2014 we had an Unconference on the Saturday of pgCon.  In 2015 we had a limited Developer Unconference on Wednesday.

This year, we will have a Developer Unconference on Wednesday, and a User Unconference on Saturday.  We're doing this because people were disappointed that we didn't do the User Unconference last year, and asked us to bring it back.  So, hopefully you planned to stay over Saturday!

The User Unconference has several purposes:

  • to give various teams and special interest groups an opportunity to schedule something
  • to give folks whose technology was released too late for the CfP another chance to present something
  • to continue discussions started around talks in the main program
So, please join us!  And if you have ideas for User Unconference sessions which you want to make sure get on the program, please list them on the wiki page using the template provided.  Note that final sessions will be chosen at 10am Saturday morning, though.


After spending pretty much the entire week on freeing a customer from the chains of Oracle’s license policy, I found an interesting issue, which made me smile: When the Oracle code was built 15 years ago or so, somebody created a table called “pg_type” on the Oracle side as part of the application. With only […]

The post Migrating from Oracle: One word about pg_type appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Bruce Momjian in EnterpriseDB on 2016-04-25 at 20:30

Having worked on Postgres for twenty years, I have seen a lot of features added. However, I usually evaluate new features in isolation and rarely see the big picture of how they fit together. Last month I created a new talk, Non-Relational Postgres, which presents a comprehensive overview of non-relational Postgres storage.

The consistent use of @> (containment operator) listed on slide 70 is interesting. It also has good coverage of how character strings can be split apart and indexed.
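If you haven't run into it, @> is the same "contains" idea across several types; a few standalone examples (not taken from the talk):

SELECT ARRAY[1,2,3] @> ARRAY[2,3];                                -- true: array containment
SELECT '{"tags": ["sql", "json"]}'::jsonb @> '{"tags": ["sql"]}'; -- true: jsonb containment
SELECT int4range(1, 10) @> 4;                                     -- true: range contains element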

Posted by Joshua Drake in CommandPrompt on 2016-04-25 at 18:10
I attended the fantastically presented PgConf US 2016 last week. An amazing conference, my training was well attended, my talk was at capacity, the 20th Anniversary Party was phenomenal and the conference raised money for an excellent cause. There were over 435 attendees, giving our brothers and sisters at PgConf EU something to work for during their conference in November.

While attending the hallway track, I was talking to a gentleman whose name escapes me. He asked me how he could contribute to the community. I am always excited to have that conversation because we are able to discuss all kinds of different ways to contribute, whether it be social (user groups, pgday, speaking at alternative conferences), documentation, code, or infrastructure. Bringing people to the community truly inspires me to continue building our community into something special.

Then he said, "I talked to X and he said, 'Why would I tell you that, you are my competitor?'" I was stunned. It is a sad and unacceptable ideology to hold within any open source community. It is an ideal that should be left to bad business practices from past decades. It is a horrible statement to make to any potential community member and speaks to a less than stellar understanding of how our community works.

If any person asks you how they can help with .Org, you should answer them honestly and enthusiastically. If you don't know how they can help, find a contributor and help the new person get connected. You should do this for any person looking to contribute, no matter who they work for (even if it is a competitor). Any other way of handling it is taking away from the community and is a testament to a selfishness that has no place here.

We are an inclusive community and that means we recognize something greater than ourselves. We recognize the power of a sustainable, productive .Org that is a collaborative space without the belittling behavior described in this post. I strongly hope that we can eliminate negative behavior such as this and become the most un

[...]

pg_rewind, a utility introduced in PostgreSQL 9.5 that allows one to reconnect a master to one of its promoted standbys, removing the need for a new base backup and hence saving a huge amount of time, has gained something new in Postgres 9.6 with the following commit:

commit: e50cda78404d6400b1326a996a4fabb144871151
author: Teodor Sigaev <teodor@sigaev.ru>
date: Tue, 1 Dec 2015 18:56:44 +0300
Use pg_rewind when target timeline was switched

Allow pg_rewind to work when target timeline was switched. Now
user can return promoted standby to old master.

Target timeline history becomes a global variable. Index
in target timeline history is used in function interfaces instead of
specifying TLI directly. Thus, SimpleXLogPageRead() can easily start
reading XLOGs from next timeline when current timeline ends.

Author: Alexander Korotkov
Review: Michael Paquier

The following document gives a short introduction to pg_rewind. You may want to look at it before moving on with this post, to understand why what is described here is a damn cool feature. First, before entering into the details of this feature, let’s take the example of the following cluster:

                            Node 1
                            /    \
                           /      \
                          /        \
                       Node 2   Node 3
                       /
                      /
                     /
                  Node 4

Node 1 is a master node, with two standbys directly connected to it, presented here as nodes 2 and 3. Finally, node 4 is a cascading standby connected to node 2. Then say that all the nodes have been successively promoted. At promotion, a standby finishes recovery and jumps to a new timeline to identify a new series of WAL records generated by what is now a master node, to avoid overlapping WAL records from another node. The promotion of each node in the cluster is done in the following order, and results in the following timelines being taken by each node:

  • Node 2 promotes, at the end of recover
[...]
Posted by Álvaro Herrera in 2ndQuadrant on 2016-04-25 at 10:34

Over at pgsql-general, Bráulio Bhavamitra asks:

I wonder if there is any plans to move postgresql entirely to a columnar store (or at least make it an option), maybe for version 10?

This is a pretty interesting question. Completely replacing the current row-based store wouldn’t be a good idea: it has served us extremely well and I’m pretty sure that replacing it entirely with a columnar store would be disastrous performance-wise for OLTP use cases.

Some columns. Picture courtesy of Yiming Sun on Flickr

That doesn’t mean columnar stores are a bad idea in general — because they aren’t. They just have a more limited use case than “the whole database”. For analytical queries on append-mostly data, a columnar store is a much more appropriate representation than the regular row-based store, but not all databases are analytical.

However, in order to attain interesting performance gains you need to do a lot more than just change the underlying storage: you need to ensure that the rest of the system can take advantage of the changed representation, so that it can execute queries optimally; for instance, you may want aggregates that operate in a SIMD mode rather than one-value-at-a-time as it is today. This, in itself, is a large undertaking, and there are other challenges too.

As it turns out, there’s a team at 2ndQuadrant working precisely on these matters. We posted a patch last year, but it wasn’t terribly interesting — it only made a single-digit percentage improvement in TPC-H scores; not enough to bother the development community with (it was a fairly invasive patch). We want more than that.

In our design, columnar or not is going to be an option: you’re going to be able to say Dear server, for this table kindly set up columnar storage for me, would you? Thank you very much. And then you’re going to get a table which may be slower for regular usage but which will rock for analytics. For most of your tables the current row-based store will still likely be the best option, because row-based storage is much bette

[...]
Posted by gabrielle roth on 2016-04-23 at 01:27

What a great meeting last night! Made me feel a bit better about missing PgConf.US:)

We had a good turnout, too; several first-timers showed up, some hauling themselves all the way in from Nike & Intel.

Thanks to Eric Ferreira, AWS Database Engineer, and Tony Gibbs, AWS Solutions Architect, for coming all the way down from Seattle to give this excellent talk. Eric is the originator of Redshift (it’s based on Pg 8.0.4), and he shared with us some of its features and why he made some of the decisions he did.

Special thanks goes to Veronika Megler, RDS Professional Services team, for proposing the topic and arranging for Eric & Tony to come on down to PDX.

We’re already working on a return visit for late summer, so if you missed this meeting, you will get another chance.


On 7th of April, Teodor Sigaev committed patch: Phrase full text search.   Patch introduces new text search operator (<-> or <DISTANCE>) into tsquery. On-disk and binary in/out format of tsquery are backward compatible. It has two side effect: - change order for tsquery, so, users, who has a btree index over tsquery, should reindex […]
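A small example of the new operators (not from the commit message):

-- matches only when 'error' immediately follows 'fatal'
SELECT to_tsvector('english', 'fatal error detected')
       @@ to_tsquery('english', 'fatal <-> error');   -- true

-- <2> means 'error' is exactly two positions after 'fatal', i.e. one word in between
SELECT to_tsvector('english', 'fatal disk error')
       @@ to_tsquery('english', 'fatal <2> error');   -- true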
Posted by Shaun M. Thomas on 2016-04-22 at 19:36

Postgres is a great tool for most databases. Larger installations however, pretty much require horizontal scaling; addressing multi-TB tables relies on multiple parallel storage streams thanks to the laws of physics. It’s how all immense data stores work, and for a long time, Postgres really had no equivalent that wasn’t a home-grown shard management wrapper. To that end, we’ve been considering Postgres-XL as a way to fill that role. At first, everything was going well. Performance tests showed huge improvements, initial deployments uncovered no outright incompatibilities, and the conversion was underway.

Then we started to use it.

Much to our chagrin, dev and test environments aren’t always indicative of real scenarios and utilization patterns. Once Postgres-XL reached our semi-production environment, everything started to go wrong. Here’s a list of what happened, and why our 43TB (and growing) database can’t use Postgres-XL until it matures a bit more. As a note, all of the examples listed here were observed using the latest 9.5r1 release.

I also want to strongly emphasize that this is not an indictment of Postgres-XL. Postgres greatly needs the capabilities it adds to truly break into the VLDB market. Sometimes though, projects aren’t as capable as we originally hoped, and subtle issues aren’t always readily apparent until software is field-tested. We’re still not ruling it out.

That said, on to the adventure.

Trigger Support

In short, there isn’t any. This at least, was something we knew about. In our case, it wasn’t a problem because a VLDB installation is commonly a warehouse of some description. These tend to depend on ETL or other source scripts to supply data that’s already prepared and doesn’t rely on post-insert transformations.

For anyone else, this could be a roadblock. Trigger support isn’t always essential to running an active database, but they’re common enough that a conversion process will have to contend with replacing them. This missing feature practically relegates Postgres-XL to warehouse u

[...]
On 4th of April, Tom Lane committed patch: Add a \gexec command to psql for evaluation of computed queries.   \gexec executes the just-entered query, like \g, but instead of printing the results it takes each field as a SQL command to send to the server. Computing a series of queries to be executed is […]
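A small illustration of the kind of thing this enables, generating DDL and running it immediately (the table name 'orders' is hypothetical):

SELECT format('CREATE INDEX ON %I (%I)', table_name, column_name)
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'orders'
  AND data_type = 'text'
\gexec

Each row returned by the query is treated as a SQL statement and sent to the server in turn.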

We gave a PostGIS Intro Training and a PostGIS talk at PGConfUS 2016 in Brooklyn, New York and just got back. A number of people asked if we'd make the slides and material available. We have these posted on our presentation page: http://www.postgis.us/presentations and will be putting them on the PostgreSQL Wiki as well in due time. There will be a video coming along for the talk, but the training was not recorded.

We also have two more talks coming up in North Carolina in early May at FOSS4G NA 2016 - one on PostGIS Spatial Tricks, which has more of a GIS specialist focus than the top 10 talk we gave, but there will be some overlap. The other talk covers a topic a couple of people asked us about in training and after our talk: routing along constrained paths. If you are attending FOSS4G NA 2016, you won't want to miss our talk pgRouting: A Crash Course, which is also the topic of our upcoming book.

Just like FOSS4G NA 2015, there is a pgDay track which is PostgreSQL specific material, useful to a spatial crowd, but not PostGIS focused.

