
Join the conversation to build a better GUI for Postgres!

Last year, we organized a meeting in Warsaw, Poland to discuss and share experiences on how to improve the PostgreSQL user experience. You can find details about this meeting on the PostgreSQL wiki page below:

https://wiki.postgresql.org/wiki/PostgreSQL_Graphic_Tools_Developper_Meeting_2017

During the meeting, we talked about creating a dedicated communication channel for developers and graphic designers inside the PostgreSQL community to facilitate discussions, ideas, feedback and further meetings.

We’ve asked the Postgres admin team to open a dedicated mailing list, and they kindly agreed to create a new list: pgsql-gui-dev@lists.postgresql.org

To join this list, please go to https://lists.postgresql.org/manage/ to register and subscribe.

This mailing list is intended to be a place where developers from various projects can talk about UI design, UX guidelines, user research, APIs, common libraries, etc. with the same principles that prevail in the Postgres community: diversity, openness, sharing, respect and self-organization. Please also note that this new mailing list should NOT be used as an end-user support channel for a specific tool.

When you join the mailing list, please take a few minutes to introduce yourself and your project: explain what your tool is for, who your users are, what problems you’d like to solve, etc.

See you there!

Posted by Stuart Bishop on 2018-01-17 at 06:28

New features and improvements to Juju 2.3’s storage feature provide new mechanisms for server migrations. A PostgreSQL deployment can be made with the data stored on an attached volume, such as a Ceph mount or Amazon EBS volume. To migrate to a new instance, we can bring up new units in the same or a new Juju model, or even with a new Juju controller, and reuse the storage volume to bring our data across. While this has been possible for a while, it was an ad hoc process that needed to be performed manually or with non-standard tools like the (now deprecated) BlockStorageBroker charms. With Juju 2.3, the process becomes smooth and can be managed with Juju. Charms like PostgreSQL now have standard mechanisms they can use to support these sorts of processes, which creates opportunities for new features such as major version upgrades.

Starting with a configured Juju controller and a fresh model, PostgreSQL can easily be deployed using Juju, with Juju managing storage. Using Amazon for this example, this will deploy a PostgreSQL instance with 50GB of attached EBS storage:

juju deploy cs:postgresql --storage pgdata=ebs,50G

While it is possible to attach storage after the initial deploy, for PostgreSQL it is best to specify storage at deployment time. This way, new units will also be deployed with the same attached storage and have enough space to replicate the database from the primary. So to add a hot standby unit:

juju add-unit postgresql

After things settle, you end up with a deployment like this:

$ juju status

Model          Controller          Cloud/Region        Version  SLA
rightsaidfred  aws-ap-southeast-2  aws/ap-southeast-2  2.3.1    unsupported

App         Version  Status  Scale  Charm       Store       Rev  OS      Notes
postgresql  9.5.10   active  2      postgresql  jujucharms  164  ubuntu

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0*  active    idle   0        13.211.42.219   5432/tcp  Live master (9.5.10)
postgresql/1   active    idle   1        52.65.20.140    5432/tcp  Live secondary (9.5.10)

Machine  State    DNS            Inst id    Series  AZ  Message
0        started  13.211.42.219  i-0fc0f0a
[...]
Posted by Regina Obe in PostGIS on 2018-01-17 at 00:00

The PostGIS development team is pleased to provide bug fix releases 2.3.6 and 2.4.3 for the 2.3 and 2.4 stable branches.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

If you use the other extensions packaged with PostGIS, make sure to upgrade those as well:

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

2.3.6

2.4.3

Posted by Bruce Momjian in EnterpriseDB on 2018-01-15 at 19:45

In the past few months I have completed four new security talks, totaling 294 slides. The first and third talks explain the fundamentals of cryptography and cryptographic hardware, respectively. The second and fourth talks cover applications of these fundamentals. The second talk covers TLS, including the use of SSL certificates by Postgres. The fourth covers the use of cryptographic hardware by applications, including Postgres.

I recently had an interesting support case that shows how the cause of a problem can sometimes be where you would least suspect it.

About table bloat

After an UPDATE or DELETE, PostgreSQL keeps old versions of a table row around. This way, concurrent sessions that want to read the row don’t have to wait. But eventually this “garbage” will have to be cleaned up. That is the task of the autovacuum daemon.

Usually you don’t have to worry about that, but sometimes something goes wrong. Then old row versions don’t get deleted, and the table keeps growing. Apart from the wasted storage space, this will also slow down sequential scans and – to some extent – index scans.

To get rid of the bloat, you can use VACUUM (FULL) and other tools like pg_squeeze. But it is important to find and fix the cause of the table bloat so that it does not reappear.
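
A quick way to check whether a table is a candidate is to look at the dead tuple counts in the statistics views; here is a minimal sketch (mytable is a placeholder):

SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC;

-- Rewrites the table and its indexes to reclaim space;
-- note that this takes an ACCESS EXCLUSIVE lock on the table
VACUUM (FULL, VERBOSE) mytable;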

The problem

I got called by a customer who experienced table bloat in pg_attribute, the system catalog that contains the table column metadata.

This can happen if table columns get modified or dropped frequently. Most of the time, these are temporary tables which are automatically dropped when the session or transaction ends.

The customer made heavy use of temporary tables. But they have several Linux machines with databases that experience the same workload, and only some of those had the problem.

Searching for the cause

I went through the list of common causes for table bloat:

  • Database transactions that remain open (state “idle in transaction”).
    These will keep autovacuum from cleaning up row versions that have become obsolete after the start of the transaction.
  • A rate of data modification that is so high that autovacuum cannot keep up.
    In this case, the correct answer is to make autovacuum more aggressive.
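
As an aside, sessions in the first category are easy to spot; a minimal sketch against pg_stat_activity:

SELECT pid, usename, xact_start, state
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
 ORDER BY xact_start;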

Neither was the case; the second option could be ruled out because it would have caused bloat on all machines, not only on some.

Then I had a look at the usage statistics for the affected table:


dbname=> SELECT * FROM pg_stat_sys_tables
dbname->          WHERE relname
[...]

ElasticSearch is one of the most popular solutions for full-text search. It is based on Apache Lucene, a powerful and flexible text-processing framework, and adds components on top of it, such as indices and an easy-to-use REST API. On the other hand, PostgreSQL also provides FTS features and is widely used to store huge volumes of data. The aim of this post is to describe a prototype of a proxy server that allows PostgreSQL to be used via the ElasticSearch protocol. The source code of the prototype is published on the project’s GitHub page.
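
For comparison, this is what PostgreSQL’s built-in FTS primitives look like (a minimal sketch, not part of the prototype):

SELECT to_tsvector('english', 'PostgreSQL also provides full-text search')
       @@ to_tsquery('english', 'postgresql & search');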

Introduction

Recently we were approached by one of our clients with an interesting task. They have a large database, around 1.5 TB, and several standbys replicating from the primary server. The main goal is to be able to run long analytical queries, and also to connect ETL tools to a replica. The main problem is that such queries on a replica can affect the primary server: there may be replication conflicts and other unpleasant side effects. The customer therefore asked us to minimize the impact on the primary server. In addition, these queries are periodic in nature, so a replica is not always needed.

Possible solutions

Using Stream Replication

The first thing that comes to mind is to create another PostgreSQL standby server. Streaming replication is the fastest way to deliver changes from one PostgreSQL server to another. But it can lead to replication conflicts and some other problems; the standby-side settings that govern such conflicts are sketched after the list below. Some words from the documentation:

Conflict cases include:

  • Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.
  • Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
  • Dropping a database on the primary conflicts with sessions connected to that database on the standby.
  • Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still “see” any of the rows to be removed.
  • Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.
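
How such conflicts are resolved is controlled by a handful of standby-side settings; a sketch of the relevant postgresql.conf parameters (values for illustration only):

# How long WAL application may be delayed before conflicting standby queries are cancelled
max_standby_streaming_delay = 30s
# Report the standby's oldest snapshot to the primary, so vacuum does not remove rows it still needs
hot_standby_feedback = on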

Actions recorded in the WAL have already occurred on the primary server, so the standby is required to apply them. Moreover, allowing WAL application to wait indefinitely can be extremely undesirable, since the standby’s lag behind the primary would keep increasing. Thus, the mechanism provides for forced cancellation

[...]

You have probably noticed that everyone is talking about Artificial Intelligence and Machine Learning these days. Quite rightly, because it is a very important topic, which is going to shape our future for sure. However, when looking at most of the code related to machine learning available on the net, it strikes me how much “coding” people actually use to prepare the data. In many cases the input data is “just a matrix” and people spend a lot of time (and sometimes memory) to prepare it.

The question which naturally arises in my case is: why not prepare the data in SQL? It is easier, less work, and a lot more flexible.

Creating some sample data

To demonstrate what we can do on the PostgreSQL side, I will create a simple table containing just 20 random values. Let us define a table first:

test=# CREATE TABLE inputdata
(
        id              int,
        data    numeric         DEFAULT random() * 1000
);
CREATE TABLE

Then the table is populated:

test=# INSERT INTO inputdata
        SELECT * FROM generate_series(1, 20);
INSERT 0 20

What we have now is a table containing 20 random values. In reality you will already have some existing data to work with:

test=# SELECT * FROM inputdata;
 id |       data       
----+------------------
  1 |  542.76927607134
  2 | 813.954454381019
  3 |  215.18046176061
  4 | 989.989245776087
  5 | 142.890753224492
  6 | 326.086463406682
  7 | 24.8975520953536
  8 | 266.512574627995
  9 | 86.0621216706932
 10 | 801.756543107331
 11 | 790.149183012545
 12 | 317.997705657035
 13 | 975.230060052127
 14 | 385.490739252418
 15 | 746.592517476529
 16 | 621.084009762853
 17 | 208.689162041992
 18 | 529.119417071342
 19 | 260.399237740785
 20 | 563.285110052675
(20 rows)

Thoughts on sampling, training, and verification

If you are training an AI model (maybe a Support Vector Machine / SVM, a neural network, or whatever) you will always start by splitting the data into various parts:

• Training data for your AI model
• Test data for your AI model
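
Staying true to the “do it in SQL” idea, such a split can be sketched directly in the database (the 75/25 threshold is illustrative; a real setup would persist the assignment):

SELECT id, data,
       CASE WHEN random() < 0.75
            THEN 'training'
            ELSE 'test'
       END AS sample_group
  FROM inputdata;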

The training data is us

[...]
Posted by Michael Paquier on 2018-01-12 at 07:44

passwordcheck is a PostgreSQL contrib module that checks whether raw password strings respect some policies. Encrypted passwords, which are what should be used in most cases to avoid passing plain-text passwords over the wire, only allow limited checks; still, it is possible to check, for example, whether MD5-hashed entries match the user name. For plain-text passwords, things get a bit more advanced, with the following checks:

  • Minimum length of 8 characters.
  • Check if the password contains the user name.
  • Check if password includes both letters and non-letters.
  • Optionally use cracklib for more checks.

Note that all those characteristics are decided at compilation time and it is not possible to configure them, except by forking the code and creating your own module. passwordcheck_extra is a small module that I have written to make things more flexible, with a set of configuration parameters aimed at simplifying administration:

  • Minimum length of password.
  • Maximum length of password.
  • Define a custom list of special characters.
  • Decide if password should include at least one special character, one lower-case character, one number or one upper-case character (any combination is possible as there is one switch per type).

In order to enable this module, you should update shared_preload_libraries and list it:

shared_preload_libraries = 'passwordcheck_extra'

And then this allows for fancier checks than the native module, for example enforcing that only numbers be present, with a length between 4 and 6 (don’t do that at home):

=# LOAD 'passwordcheck_extra';
LOAD
=# SET passwordcheck_extra.restrict_lower = false;
SET
=# SET passwordcheck_extra.restrict_upper = false;
SET
=# SET passwordcheck_extra.restrict_special = false;
SET
=# SET passwordcheck_extra.minimum_length = 4;
SET
=# SET passwordcheck_extra.maximum_length = 6;
SET
=# CREATE ROLE hoge PASSWORD 'foobar';
ERROR:  22023: Incorrect password format: number missing
=# CREATE ROLE hoge PASSWORD 'fooba1';
CREATE ROLE

One property to note

[...]

After having been involved in many migration projects over the last 10 years, I decided to publish the following white paper in order to share what I have learned.

The paper is titled Migrating to PostgreSQL, Tools and Methodology and details the Continuous Migration approach. It describes how to migrate from another relational database server technology to PostgreSQL. The reasons to do so are many, and first among them is often the licensing model.

Posted by Victor Yegorov in Data Egret on 2018-01-10 at 21:56

Overview

On 30th of December 2017, the Kernel page-table isolation (PTI) patch was merged
into the mainline. Its main goal is to mitigate the Meltdown CPU bug.
This patch has also been backported into the stable releases by RedHat,
Debian, CentOS, Ubuntu and other Linux vendors. Therefore, the next security
upgrade of servers running such stable releases will bring in the PTI feature.

Unfortunately, this bugfix comes with a performance penalty.
Andres Freund provided figures showing that performance might drop by 7-23% on TPC-H-like workloads.

The Meltdown vulnerability “allows a rogue process to read any kernel memory, even when it is not authorized to do so”. PTI is designed to close this flaw, i.e. restrict memory access to unauthorized areas.
Of course, this comes at a price, and it’d be good to find out what performance impact the kernel upgrade will have. As PTI affects memory-based operations, one way to measure this is to use a database that fits into the RAM of the server and to perform a series of pgbench-based read-only (RO) tests.
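
For reference, a read-only pgbench run of that kind might look like this (a sketch; client counts and duration are illustrative):

$ pgbench -S -c 16 -j 8 -T 600 pgbench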

Test setup

CPU: Intel(R) Xeon(R) CPU E5-2430 0 @ 2.20GHz
RAM: 32GB (swap 16GB)
Disks: Dell PERC H710, 120GB EDGE E3 SSD in RAID1 (110GB available)
Linux: CentOS 7.3.1611
PostgreSQL: 10.1

PostgreSQL configuration changes:
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 1000
bgwriter_lru_multiplier = 10.0
checkpoint_timeout = 1h
max_wal_size = 8GB
min_wal_size = 2GB
checkpoint_completion_target = 0.9
log_destination = stderr
logging_collector = on
log_directory = /var/log/postgresql
log_filename = postgresql-%a.log
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_checkpoints = on
log_line_prefix = '%m %p %u@%d from %h [vxid:%v txid:%x] [%i] '
log_lock_waits = on
log_temp_files = 0

Test details

The test database was prepared with: pgbench -i -s 1700 pgbench
Database size: 26 035 MB

The following kernels and options were tested:
  1. Old kernel version: 3.10.0-514.el7.x86_64
  2. New kernel version, as-is: 3.10.0-693.11.6.el7.x86_64-pti-pc
[...]

I have connected our Postgres instance to Active Directory for authentication because I didn’t want to manage passwords for hundreds of users accessing mission-critical databases. Authentication is performed by Active Directory, but authorization (managing groups and permissions at table/column/row level) is still handled by Postgres. The biggest advantage of connecting Postgres to AD for real users is that I don’t have to worry about passwords and password policies, because the policies are handled by AD. On the other hand, database authentication for users will depend on the AD infrastructure, so please make sure the AD infrastructure is redundant enough, with a low-latency connection, for your organization before making this decision.

I have come up with simple steps to connect Postgres to Active Directory. TBH, if you can get a service account to connect to your organization’s Active Directory, the rest of the setup is fairly straightforward.

Step-1:

Get the Active Directory server name and test the connection from the Postgres server:

postgres-dbserver$ nc -v ldap-server.example.com 389
Connection to ldap-server.example.com 389 port [tcp/ldap] succeeded!

Step-2:

Install the ldap-utils package so you can use the LDAP command-line tools to test connections.

yum install ldap-utils
or
apt-get install ldap-utils

Step-3:

Ask your AD/LDAP admins for the full path of the OU and CN for your organization and construct the ldapsearch query. I’m using sAMAccountName as the username attribute, but you can use other attributes for the username, e.g. uid:

ldapsearch -v -x -H ldap://ldap-server.example.com -b "OU=Users,OU=local,DC=example,DC=com" -D "CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" -W "(sAMAccountName=dpatel)"

If you are using uid for the validation, your search query will look like this:

ldapsearch -v -x -H ldap://ldap-server.example.com -b "OU=Users,OU=local,DC=example,DC=com" -D "CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" -W "(uid=dpatel)"

If the above query returns the

[...]
Recently we had a customer facing some memory issues with an Aurora PostgreSQL instance. While loading data or creating GiST indexes, they got the following error:

2018-01-04 14:03:06 UTC:172.31.45.250(35508):root@test:[20457]:ERROR:  out of memory
2018-01-04 14:03:06 UTC:172.31.45.250(35508):root@test:[20457]:DETAIL:  Failed on request of size 23.
2018-01-04 14:03:06 UTC:172.31.45.250(35508):root@test:[20457]:STATEMENT:  INSERT INTO test_table1 VALUES
('domain1.com','Manual','[198.168.1.148,198.168.1.250]','t','f','2016-02-16 15:58:58.016626','',''),   ('domain2.com','Manual','[192.168.1.1,192.168.1.174]','t','f','2016-02-16 15:58:58.023136','',''),
...

At the same time they saw a major drop in freeable memory:


The logs showed:

TopMemoryContext: 1184792 total in 13 blocks; 14024 free (5 chunks); 1170768 used
  CFuncHash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  Type information cache: 24472 total in 2 blocks; 2840 free (0 chunks); 21632 used
  TopTransactionContext: 8192 total in 1 blocks; 7384 free (1 chunks); 808 used
  MessageContext: 8511544 total in 11 blocks; 2077632 free (0 chunks); 6433912 used
  Operator class cache: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  smgr relation context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    smgr relation table: 24576 total in 2 blocks; 13008 free (4 chunks); 11568 used
  TransactionAbortContext: 32768 total in 1 blocks; 32728 free (0 chunks); 40 used
  Portal hash: 8192 total in 1 blocks; 776 free (0 chunks); 7416 used
  PortalMemory: 8192 total in 1 blocks; 7880 free (0 chunks); 312 used
    PortalHeapMemory: 1024 total in 1 blocks; 840 free (0 chunks); 184 used
      ExecutorState: 24576 total in 2 blocks; 3328 free (13 chunks); 21248 used
        GiST scan context: 49245329112 total in 5881 blocks; 138720 free (3880 chunks); 49245190392 used
        ExprContext: 8192 total in 1 blocks; 8056 free (0 chunks); 136 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
        ExprContext:
[...]

Sharding is one of those database topics that most developers have a distant understanding of, but the details aren’t always perfectly clear unless you’ve implemented sharding yourself. In building the Citus database (our extension to Postgres that shards the underlying database), we’ve followed a lot of the same principles you’d follow if you were manually sharding Postgres yourself. The main difference of course is that with Citus, we’ve done the heavy lifting to shard Postgres and make it easy to adopt, whereas if you were to shard at the application layer then there’s a good bit of work needed to re-architect your application.

I’ve found myself explaining how sharding works to many people over the past year and realized it would be useful (and maybe even interesting) to break it down in plain English.

Defining your partition key (also called a 'shard key' or 'distribution key')

Sharding, at its core, is splitting your data up so that it resides in smaller chunks spread across distinct, separate buckets. A bucket could be a table, a Postgres schema, or a different physical database. Then, as you need to continue scaling, you can move your shards to new physical nodes, thus improving performance.

The one step that happens in every sharding implementation is deciding what you’re going to shard or partition your data on. There are a number of trade-offs to various keys, and which is right really depends on your application. Once you determine what your sharding key will be, you’re going to ensure the sharding key is in place throughout your application. You can do this a few ways; an easy one is materializing that sharding key on all your models. By materializing or de-normalizing it, you allow your application to make fewer queries to determine how it needs to route the data.
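
To make that concrete, here is a hypothetical hash-based routing rule sketched in plain SQL (hashtext() is an internal Postgres hash function; the shard count of 32 is made up for the example):

-- Map a customer key to one of 32 logical shards
SELECT mod(abs(hashtext('customer-42')), 32) AS shard_id;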

When a request comes in, then, if you’re doing your sharding at the application layer, your application will need to determine how to route the request. The same principle applies with Citus; you just don’t have to do any extra wor

[...]
Posted by Bruce Momjian in EnterpriseDB on 2018-01-10 at 18:45

I just did a two-hour interview in English with the Moscow Postgres user group. A video recording of the interview is online and covers questions asked by the Russian attendees.

Posted by Simon Riggs in 2ndQuadrant on 2018-01-10 at 17:16

Spectre and Meltdown have caused severe alarm in recent days. You may have read about up to 30% impact on PostgreSQL databases, which I believe to be overstated because of misunderstandings in the media. Let’s dig into this in more detail.

TL;DR Summary: no PostgreSQL patch required, -7% performance hit

In response to these new security threats, various OS patches have been released. Various authors have published benchmarks around these and have, in some cases, stated worst-case measurements as impact measurements: for example, stating a 30% hit when, in fact, we are seeing a 7% hit on a busy server. Regrettably, it looks to me like some people have spread this news as a problem for PostgreSQL, without clearly stating the workload measured, or that other databases could be affected by about the same amount.
So let’s back up a little: these patches affect the OS – there is no specific vulnerability in PostgreSQL and, crucially, there is no security patch planned. Yup, that’s right: no patch.

Why? The published exploits require access to the OS, which is not possible through PostgreSQL except where the user has SuperUser access through the use of untrusted PL languages, such as PL/PerlU. So in general, there is no attack vector through user access to PostgreSQL databases.
However, there is a noticeable CPU impact because of generalized patches being applied at OS level.

These will have a greater impact on CPU-bound database requests, so we expect the impact on I/O bound transactional workloads to be much smaller. The impact will be more noticeable on a server that is already heavily loaded than on a lightly loaded server, so the impact may vary as your workload increases/decreases. There is a small increase in latency.

Patches are now available for Red Hat and Ubuntu that allow you to enable the Page Table Isolation feature (pti:on).
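
To verify whether PTI is actually active after patching, one place to look is the kernel boot log (a sketch; the exact message and interface vary by kernel version):

$ dmesg | grep -i 'page tables isolation'
Kernel/User page tables isolation: enabled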

We have performed a series of benchmarking tests to get a better idea of the real impact that can be caused by the patches. In general, a drop of around 7% on the number of transactions per

[...]

Continuing this short series of blog posts on some of my drivers for moving to Postgres from Oracle.
Please do read Part I of the series if you have not done so. It discussed the topics “History”, “More recently” and “The switch to Postgres”.

Realization

In the last months, discussing Postgres with my Oracle peers, playing with the software and the tooling, I actually quite quickly realized Postgres is a lot cooler, at least to me. Not so much overly complicated technology, but rather built to be super KISS: the elegance of simplicity, and it still gets the job done.
Postgres handles a lot more complex workloads than many (outsiders) might think. Some pretty serious mission-critical workloads are handled by Postgres today; basically, it has been doing this for many, many years. This is obviously very little known, because who would want to spend good money on marketing for Open Source Software, right? You just spend your time building the stuff and let somebody else take care of the marketing.
Well… we at EnterpriseDB do just those things, too!

And, please, make no mistake, Postgres is everywhere: from your fridge and video camera, through TV set-top boxes, up to major on-line banking software, and many other places you would not expect a database to (be able to) run. Postgres is installed in places that never get touched again. Because of its stability and its low-to-no-touch administrative character, Postgres is ideally suited for these specific implementations. Built on some of the oldest design principles around, the Postgres engine doesn’t have to be easy to create, as long as it “just works” in the end.
Many years ago, an Oracle sales director included a similar overview in his pitch: all the places Oracle touches everybody’s lives, every day. This is no different for Postgres; it is just not pitched anywhere, by anyone, as much.

I have the fortunate opportunity to work closely together with (for instance) Bruce Momjian (PostgreSQL core team founding member and EnterpriseDB c

[...]

The annual DB-Engines ranking has just been updated, with PostgreSQL winning the “DBMS of the Year 2017” award.
https://db-engines.com/en/blog_post/76

PostgreSQL was a runner-up last year.
https://db-engines.com/en/blog_post/67

So across the year, we have made more gains in popularity than any other database. Looking at the detail, we see that the other four databases that make up the Top 5 have all declined in popularity since last year. PostgreSQL has gone up; the others have gone down.

What’s even more interesting is that PostgreSQL is the only database in the top 5 whose popularity has increased steadily over the last 4 years.

Slow, steady progress. The word is out!

Why is that? PostgreSQL is multi-talented, offering relational features when needed, as well as JSON features for document-centric apps, full-text search, GIS functionality, graph use cases and many others. And it keeps getting better, year after year, with a steady rhythm of improvements and a clear roadmap of new features in development.

Posted by pgCMH - Columbus, OH on 2018-01-10 at 05:00

The Jan meeting will be held at 18:00 EST on Tues, the 23rd. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

Topic

For the first meeting of 2018, we’re going with a free-form, audience participation Q&A type event. You’ve got PostgreSQL-related questions, we’ve got PostgreSQL-related answers. So bring your questions (and thinking caps) and let’s see if we can’t get some answers for everyone.

Parking

Please park at a meter on the street or in the parking garage (see below). You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive. Park on the first level in any space that is not marked ‘24 hour reserved’. Once parked, take the elevator to the 3rd floor to reach the Miranova lobby.

Finding us

The elevator bank is in the back of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. Once you exit the elevator, look to your left and right. One side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space. The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079

When running an application in production, it might happen that the data structure has to be changed once in a while: adding columns, dropping columns, and so on, might simply become necessary. However, changing data structures should not be done mindlessly – there are some things you have to take care of.

The main issue with DDL is that in some cases locks are held for quite a long time, which can have serious consequences if you are running PostgreSQL on critical production systems. I hope this post can shed some light and help people run DDL more wisely.

Creating some demo data

As usual, some demo data is needed to show how things work. The following script simply creates 10 million integer values:


test=# CREATE TABLE data (id int);
CREATE TABLE
test=# INSERT INTO data
        SELECT * FROM generate_series(1, 10000000);
INSERT 0 10000000

Be aware of locking: Indexing

One of the most common issues related to DDL is the creation of indexes. If you create an index in PostgreSQL, writes are “normally” blocked while the index is being created. In the case of very large tables this can be quite counterproductive. A large index simply takes time to build, and concurrent writes might suffer, as shown in the next example:

Connection 1                               Connection 2
BEGIN;
CREATE UNIQUE INDEX idx_id ON data (id);   BEGIN;
-- running                                 INSERT INTO data VALUES (0);
-- running                                 -- waiting
COMMIT;                                    -- will proceed
                                           COMMIT;

If data is really large, the INSERT might have to wait for an unacceptable amount of time. This is especially critical if the number of concurrent requests is high: connection pools might get into trouble, or your webserver might run out of connections. In short, a solution is needed.
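
While such an index build is running, the queued writers are easy to observe; a minimal sketch against pg_stat_activity (PostgreSQL 9.6 or later):

test=# SELECT pid, wait_event_type, wait_event, query
       FROM pg_stat_activity
       WHERE wait_event_type = 'Lock';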

Note that reading is still possible – PostgreSQL will only block writes while the index is built. This is extremely important to point out.

If you cannot afford to lock a table while building an index, CREATE INDEX CONCURRENTLY is the tool of choice. CREATE INDEX CONCURRENTLY takes longer than a “normal”

[...]

check_pgactivity (https://github.com/OPMDG/check_pgactivity) is designed to monitor PostgreSQL clusters from Nagios. It offers many options to measure and monitor useful performance metrics.

Imagine you have a very large database and pg_dump produces abnormal query-time alerts. The upcoming release of check_pgactivity offers a way to filter out those alerts!


In the 2.4 version, it will be possible to filter out pg_dump from the oldest_idlexact service. Indeed, from PostgreSQL 9.2 onwards, the service supports --exclude to filter out connections.

From PostgreSQL 9.0 onwards, the longest_query service will also support --exclude, to filter out application names.

Let’s see an example.

Launch a pg_dump of a “not-empty” database:

$ pgbench -i -s 300 blog
$ pg_dump -d blog -f tmp.dump --exclude-table=pgbench_tellers

We can follow the pg_dump progress in pg_stat_activity:

postgres=# SELECT application_name, query, (current_timestamp-state_change) AS elapsed, state, wait_event from pg_stat_activity where application_name = 'pg_dump';
-[ RECORD 1 ]----+---------------------------------------------------------------------
application_name | pg_dump
query            | COPY public.pgbench_accounts (aid, bid, abalance, filler) TO stdout;
elapsed          | 00:01:26.435591
state            | active
wait_event       | ClientWrite

The longest_query service will raise an alert:

$ check_pgactivity --service longest_query -w 1m -c 5m
POSTGRES_LONGEST_QUERY WARNING: blog: 1m31s | 
'blog max'=91s;60;300 'blog avg'=91s;60;300 'blog #queries'=1 
'postgres max'=0s;60;300 'postgres avg'=0s;60;300 'postgres #queries'=1 
'template1 max'=NaNs;60;300 'template1 avg'=NaNs;60;300 'template1 #queries'=0

We can remove that alert by adding the filter:

$ check_pgactivity --service longest_query -w 1m -c 5m --exclude ^pg_dump
POSTGRES_LONGEST_QUERY OK: 1 running querie(s) | 
'postgres max'=0s;60;300 'postgres avg'=0s;60;300 'postgres #queries'=1 
'template1 max'=NaNs;60;300 'template1 avg'=NaNs;60;300 'template1 #queries'=0 
'blog max'=NaNs;60;
[...]
Support for both BRIN and SP-GiST access methods was recently added to pg_hexedit, the experimental hex editor framework for PostgreSQL relation files. These were the final access methods among the standard Postgres index access methods that required support.

SP-GiST (Space-Partitioned GiST)



Beginning of an SP-GiST leaf page

SP-GiST is unique among index access methods whose index structure is tree-like, in that it supports tree structures that are unbalanced. SP-GiST operator classes exist that support k-d trees, quadtrees, and suffix trees. These structures are traditionally only suited to a fully in-memory representation, with dynamically-allocated nodes that contain a small number of simple pointers (byte addresses) pointing to other nodes.

SP-GiST presents a generalized interface through which all of these space-partitioned trees can be constructed for a given datatype, in a way that minimizes disk seeks (PDF) and works well with block-orientated storage. Essentially, SP-GiST maps tree nodes onto disk blocks in an adaptive fashion, rather than simply having a block directly correspond to a tree node, as happens with other access methods.

There are particularly intricate data structures needed to support all of this. Space utilization can be an issue with SP-GiST indexes, though that's probably very workload dependent. This is something that pg_hexedit can be effective at representing visually.

SP-GiST is a good example of the PostgreSQL community implementing a concept that comes directly from state of the art database research (PDF). I suspect that we have yet to fully realize the benefit of SP-GiST for specific application domains, due to a simple lack of awareness among users and potential users that work in those domains. Perhaps this enhancement can contribute in some small way towards a better understanding of what is possible.
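
For those who prefer SQL to a hex editor, the pageinspect contrib extension offers a complementary (if far less visual) way to peek at the same pages; a sketch, with the index name as a placeholder:

CREATE EXTENSION pageinspect;
-- Generic page header of the first block of any relation
SELECT * FROM page_header(get_raw_page('my_spgist_idx', 0));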

BRIN (Block Range Index)


BRIN "revmap" page
The structure of BRIN indexes is not at all tree-like. BRIN works by summarizing the locations of ranges[...]
No year has been better for PostgreSQL or the Postgres ecosystem than 2017. The continued adoption and growth of open source communities over the last 40 years shows a mature and strong ecosystem. It is true what they say: "Middle age is the best time of your life." Here are just a few of the great results of 2017:
  • Amazing work from PostgreSQL.Org with the release of v10, which brought much-sought-after technologies such as native table partitioning, integrated logical replication, and mature support for federated tables.
  • Pivotal announced multi-cloud support for their open source, big-data, MPP Postgres variant Greenplum.
  • Increased support and features from cloud industry heavyweights AWS, Compose.IO, and Microsoft. Microsoft released Azure Database for PostgreSQL, Compose increased their high-availability options, and AWS announced the availability of Amazon Aurora with PostgreSQL compatibility.
  • Enterprise Consulting and Support continued to grow with support from PostgreSQL.Org Major Sponsors 2ndQuadrant and OpenSCG.
2017 was also the year we saw the launch of the International Postgres Conference, PostgresConf. The PostgresConf project is a globally aware, ecosystem-centric conference focused on People, Postgres, Data. The project organized more events this year than any other Postgres advocacy and education project. In the United States there were PGConf US (now PostgresConf US), an Austin Mini, Philadelphia, two NYC Minis, Seattle, and finally a full Austin event. The project also hosted PostgresConf South Africa and has several international events planned for 2018.

The PostgresConf International efforts wouldn't be possible without the fundamental support of the community and our ecosystem partners:



We have nothing but confidence in the continued growth of PostgreSQL and the Postgres related ecosystem through 2018. Thank you to the PostgreSQL.Org community, our ecosystem partners, and the global Postgres Ecosystem community; without you our efforts would not continue to succeed as a volunteer organ
[...]
It is possible, with the new --force-uniborder option of pspg.
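
Typical usage is to set pspg as psql’s pager; a sketch (the option is available in the pspg development version at the time of writing):

$ export PAGER='pspg --force-uniborder'
$ psql test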

See screenshots:



Posted by Thomas Munro in EnterpriseDB on 2018-01-07 at 22:32
PostgreSQL 9.6 and 10 can use all three join strategies in parallel query plans, but they can only use a partial plan on the outer side of the join.  As of commit 18042840, assuming nothing irreparably busted is discovered in the next few months, PostgreSQL 11 will ship with Parallel Hash.  Partial plans will be possible on both sides of a join for the first time.

There will certainly be some adjustments before it's released, but it seems like a good time to write a blog article to present Parallel Hash.  This is the biggest feature I've worked on in PostgreSQL so far, and I'm grateful to the reviewers, testers, committers and mentors of the PostgreSQL hacker community and EnterpriseDB for making this work possible.


So what does this feature really do?

A simple example


Using the "orders" and "lineitem" tables from TPC-H scale 30GB, here is a very simple join query answering the (somewhat contrived) question "how many lineitems have there ever been, considering only orders over $5.00?".


select count(*)
  from lineitem
  join orders on l_orderkey = o_orderkey
 where o_totalprice > 5.00;


PostgreSQL 9.6 or 10 can produce a query plan like this:


 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Hash Join
                     Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                     ->  Parallel Seq Scan on lineitem
                     ->  Hash
                            ->  Seq Scan on orders
                                  Filter: (o_totalprice > 5.00)


Using the development master branch, it can now also produce a query plan like this:

 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Hash Join
                     Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
                     ->  Parallel Seq Scan on lineitem
                     ->  Parallel Hash
                           ->  Parallel Seq Scan on orders
                                  Filter: (o_totalprice > 5.00)
[...]
Posted by Andrew Dunstan in 2ndQuadrant on 2018-01-05 at 20:18

I have released version 6.1 of the PostgreSQL Buildfarm client. It is available at https://buildfarm.postgresql.org/downloads/releases/build-farm-6_1.tgz

This release fixes a couple of bugs that became apparent in yesterday’s release. The first was a relatively minor one where the verbosity was always set if using the run_branches.pl script. The second was a portability issue where some versions of Perl and its Time::HiRes module behaved unexpectedly and as a result log files were not sorted in correct order, leading to very strange timing results. The second one is sufficiently bad to warrant this point release.

Thanks to Tom Lane for identifying and helping to diagnose and patch these bugs.

Also, one other small bug is fixed in some utility scripts, and the BlackholeFDW module advertised for the version 6 release but not in fact present in the release file is now present.

Posted by Robert Haas in EnterpriseDB on 2018-01-05 at 18:34
In a recent blog post, I talked about why every system that implements MVCC needs some scheme for removing old row versions, and how VACUUM meets that need for PostgreSQL. In this post, I’d like to examine the history of VACUUM improvements in recent years, the state of VACUUM as it exists in PostgreSQL today, and how it might be improved in the future.
Read more »
Posted by Bruce Momjian in EnterpriseDB on 2018-01-05 at 16:15

This email thread explores the idea of the community supporting web forums instead of, or in addition to, the email lists where the majority of community development and discussion happens. Reasons stated for not pursuing web forums included:

  • It has been tried before and failed
  • Larger web forum communities already exist, e.g. Stack Overflow
  • Established Postgres community members prefer email

Ultimately I think the Postgres community needs to do a better job of publicizing the existence of external communities that help Postgres users, e.g. Slack. As an example, the Postgres IRC channel is well publicized and currently has 11k connected users. Also, EnterpriseDB created the Postgres Rocks web forum six months ago.

This post is about a new feature of PostgreSQL 11 that I have been working on for the last couple of months and that has finally been merged into the upstream repository. So if nothing goes wrong, we will have channel binding support for SCRAM authentication in the next release of Postgres. The feature set consists mainly of the following commits. Here is the first one, which added tls-unique:

commit: 9288d62bb4b6f302bf13bb2fed3783b61385f315
author: Peter Eisentraut <peter_e@gmx.net>
date: Sat, 18 Nov 2017 10:15:54 -0500
Support channel binding 'tls-unique' in SCRAM

This is the basic feature set using OpenSSL to support the feature.  In
order to allow the frontend and the backend to fetch the sent and
expected TLS Finished messages, a PG-like API is added to be able to
make the interface pluggable for other SSL implementations.

This commit also adds a infrastructure to facilitate the addition of
future channel binding types as well as libpq parameters to control the
SASL mechanism names and channel binding names.  Those will be added by
upcoming commits.

Some tests are added to the SSL test suite to test SCRAM authentication
with channel binding.

Then there is a second one to control how channel binding is used from the client side, a libpq feature:

commit: 4bbf110d2fb4f74b9385bd5a521f824dfa5f15ec
author: Peter Eisentraut <peter_e@gmx.net>
date: Tue, 19 Dec 2017 10:12:36 -0500
Add libpq connection parameter "scram_channel_binding"

This parameter can be used to enforce the channel binding type used
during a SCRAM authentication.  This can be useful to check code paths
where an invalid channel binding type is used by a client and will be
even more useful to allow testing other channel binding types when they
are added.

The default value is tls-unique, which is what RFC 5802 specifies.
Clients can optionally specify an empty value, which has as effect to
not use channel binding and use SCRAM-SHA-256 as chosen SASL mechanism.

More tests for SCRAM and channel binding are added to the SSL test
suite.

And

[...]