Posted by Lukas Fittl in CitusData on 2019-05-23 at 15:30

If you’ve worked with Ruby on Rails, you likely have some understanding of how your database works with Rails. Traditionally, that has always meant specifying a single database per environment in your config/database.yml, possibly together with an environment setting like DATABASE_URL. Based on that configuration, all reads and writes access that one database.

With Rails 6 this is about to change, thanks to the work of Eileen M. Uchitelle together with contributors from GitHub, Basecamp and Shopify. In the upcoming Rails 6 (currently in RC1), you will be able to easily change which database server you are connecting to, to support a variety of scenarios such as using read replicas and splitting your database into dedicated components.

The most interesting part, which we wanted to detail in this post, is the configuration of automatic query routing to a read replica, or follower database.

First, let’s see how we can configure an additional database in Rails 6. In your config/database.yml, you can now specify multiple connections like this:

production: 
  primary: 
    <<: *defaults 
    url: <%= ENV['DATABASE_URL'] %> 
  follower: 
    <<: *defaults 
    url: <%= ENV['FOLLOWER_DATABASE_URL'] %> 

Next, we configure our model to use this new database connection. There are two default connection types, called “reading” and “writing”, for which we’d want to specify the database it should connect to:

class ApplicationRecord < ActiveRecord::Base 
  connects_to database: { writing: :primary,  reading: :follower } 
end 

Now, this doesn’t actually change anything yet. By default all queries would still go to the writing database, even if they are read queries.

First, if we want to verify this is working manually, we can enforce the use of the read database like this:

ActiveRecord::Base.connected_to(role: :reading) do 
  puts MyObject.count 
end 

But modifying our application code like that would be a lot of work. Instead, we can tell Rails to automatically utilize the reading role for GET and HEAD requests (which are not sup

[...]
Posted by Pavel Stehule on 2019-05-21 at 05:23
Last week I fixed some critical bugs in the profiler integrated into plpgsql_check.

The integrated profiler is now being tested on a bigger real-world project.
Posted by Dave Conlin on 2019-05-20 at 12:08

This post covers standard Postgres B-tree indexes. To make it easier to read, I’ve just referred to “indexes” throughout, but it’s worth bearing in mind that the other Postgres index types behave very differently.

At pgMustard, we’ve been looking into multi-column indexes recently: particularly what problems they can cause and what advice we can offer to fix those problems. So I thought that now would be a good time to talk about their disadvantages, how to tell if they’re letting you down, and the alternatives.

In my introduction to indexing, I talked about multi-column indexes and how useful they can be if you have an expensive sort that you want to prepare in advance. They can also be used to speed up retrieving rows that match a filter or constraint, but unfortunately it’s not as simple as “I need to query on columns a, b and c, so I’ll make a multi-column index on a, b and c”.


When multi-column indexes fall flat

Suppose I have a table full of information on fiction books and I want to query it by author. Then you can imagine my index is a filing cabinet full of index cards, organised in alphabetical order by author, with dividers that make it easy to jump to any point in the catalogue. So if I want to find all the John le Carré books, I look him up under “C” and find a list of excellent spy novels. Great.

Now suppose I want to add another column to my index — publication date. If I add it as a second column after the author, then this is similar to sorting all the cards in my filing cabinet, first by author, and then by publication date within each author.

This works fine if I want to know, for example, all the Ursula K. le Guin books published after 1970. Turn to “Guin, Ursula K. le”, and then within that section, take all the cards that are after the first of January 1970. Sweet.
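To make the analogy concrete, here is a small sketch (the table and column names are made up for illustration) of an index that helps the author-then-date case but not a date-only search:

CREATE TABLE books (
    title    text,
    author   text,
    pub_date date
);

CREATE INDEX books_author_pub_date_idx ON books (author, pub_date);

-- Served well: the leading column is constrained, so the index can jump
-- straight to the author's section and scan a small date range within it.
SELECT title FROM books
WHERE author = 'Ursula K. le Guin' AND pub_date >= '1970-01-01';

-- Served poorly: the leading column is not constrained, so every author's
-- section has to be visited.
SELECT title FROM books
WHERE pub_date < '1700-01-01';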

But if I want to know which books in my library were published before 1700? I have to go through every single author and look up which of their books are pre-1700. The index won’t save us much time, in fact

[...]
Posted by pgCMH - Columbus, OH on 2019-05-20 at 04:00

The May meeting will be held at 18:00 EST on Tues, the 28th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

What

Seal Software’s very own Arlette will be presenting this month. She’s going to tell us all about PostgreSQL users, roles, and groups. The talk will discuss best practices around their use and GRANTing permissions, as well as some lessons learned along the way. Arlette will even show us a trick using roles to cut down on having to manage your pg_hba.conf file.

Where

CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

Community space as seen from the stage

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

Posted by Stefan Fercot in Dalibo on 2019-05-20 at 00:00

The first PGDay organized in Belgium by the PgBE PostgreSQL Users Group Belgium brought together around 40 PostgreSQL fans on 17 May 2019 in Leuven. It was a really nice day for me and I’d like to share it with you.


This day was all about PostgreSQL and was a big opportunity to meet other people interested in PostgreSQL in Belgium. The event was suitable for everybody, from first-time users and students to experts, and from clerks to decision-makers.

With no less than 10 talks and 2 parallel tracks in the afternoon, the list of speakers was impressive, with a lot of international speakers too.

After learning some good advice from Hans-Jürgen Schönig during his Fixing common performance problems talk, Ilya Kosmodemiansky told us how to screw it up with some PostgreSQL Worst practices.

In the afternoon, I chose to stay in the main room. So I could watch Thijs Lemmens showing a demo on how to perform Downtimeless PG upgrades using logical replication. Based on docker and docker-compose, he created 2 PostgreSQL clusters and installed pgAdmin 4 and HAProxy. If you wish to try the demo yourself, Thijs has released everything you need on his GitHub account.

Tomas Vondra then explained to us what Create statistics is and when to use it. He also gave us a quick overview of what will be new in PostgreSQL 12 on this topic. It was really interesting to see examples based on Belgian cities.

Then… the pressure was on my shoulders to talk about Streaming Replication, the basics. The idea was to summarize for beginners what WALs are and how Streaming Replication works, and to give some advice and best practices.

Boriss Mejías afterwards re-explained, in his own way, Replication, where things just work but you don’t know how. Even though we had some overlap, the two talks were in fact pretty much complementary.

Unfortunately, I missed:

  • Declarative Table Partitioning - What do I need it for? - Boriss Mejías
  • Dynamically switch between datasources in code - Marco Huygen
  • PostgreSQL Buffers - Vik Fearing
  • Data Vault 2.0 & Pivo
[...]
PostgreSQL has no global variables for PL/pgSQL. This week I worked for a customer who had an emulation of global variables based on temporary tables.

Depending on usage, this implementation can be very slow: with frequent use of temporary tables, the pg_class and pg_attribute catalogs can bloat, and these functions get slower and slower. So don't use it. There is an alternative, much better solution based on custom configuration variables:

Original code:

CREATE OR REPLACE FUNCTION public.setvar(_varname text, _value text)
RETURNS void
LANGUAGE plpgsql
AS $function$
begin
  create temporary table if not exists variables(varname text unique, value text);
  insert into variables (varname, value) values(_varname, _value)
    on conflict (varname) do update set value = _value;
end;
$function$

CREATE OR REPLACE FUNCTION public.getvar(_varname text)
RETURNS text
LANGUAGE plpgsql
AS $function$
begin
  return (select value from variables where varname = _varname);
exception when others then
  return null;
end;
$function$

There are more slow spots: a) creating the table (the first time in a session it can be slow, or very slow if the system catalog is bloated); b) handling the exception (internally, a savepoint is created and dropped every time).

Better code:

CREATE OR REPLACE FUNCTION public.setvar2(_varname text, _value text)
RETURNS void
LANGUAGE plpgsql
AS $function$
begin
  perform set_config('variables.' || _varname, _value, false);
end
$function$

CREATE OR REPLACE FUNCTION public.getvar2(_varname text)
RETURNS text
LANGUAGE plpgsql
AS $function$
begin
  return current_setting('variables.' || _varname, true);
end;
$function$
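A quick way to try the configuration-variable version from any session (the variable names are arbitrary):

SELECT setvar2('answer', '42');
SELECT getvar2('answer');   -- returns '42'
SELECT getvar2('missing');  -- returns NULL, no exception handling needed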

The differences can be measured by a few synthetic benchmarks (attention: because it is tested on a fresh PostgreSQL instance, the result is a best case for the temporary tables solution, realit[...]
Posted by Paul Ramsey in PostGIS on 2019-05-17 at 16:00

Last month I was invited to give a keynote talk at FOSS4G North America in San Diego. I have been speaking about open source economics at FOSS4G conferences more or less every two years since 2009, and I look forward to revisiting the topic regularly: the topic is ever-changing, just like the technology.

In 2009, the central pivot of thought about open source in the economy was professional open source firms in the Red Hat model. Since then we’ve taken a ride through a VC-backed “open core” bubble and are now grappling with an environment where the major cloud platforms are absorbing most of the value of open source while contributing back proportionally quite little.

What will the next two years hold? I dunno! But I have increasingly little faith that a good answer will emerge organically via market forces.

If you liked the video and want to use the materials, the slides are available here under CC BY.

PoWA 4 is available in beta.

New remote mode!

The new remote mode is the biggest feature introduced in PoWA 4, though there have been other improvements.

I’ll describe here what this new mode implies and what changed in the UI.

If you’re interested in more details about the rest of the changes in PoWA 4, I’ll soon publish other articles for that.

If you’re in a hurry, feel free to go directly to the v4 demo of PoWA, kindly hosted by Adrien Nayrat. No credentials needed, just click on “Login”.

Why is a remote mode important

This feature has probably been the most frequently requested since PoWA was first released, back in 2014. And it was requested for good reasons, as a local-only mode has some drawbacks.

First, let’s look at the architecture up to PoWA 3, assuming an instance with 2 databases (db1 and db2) plus one database dedicated to PoWA. This dedicated database contains the stat extensions required to get the live performance data, and it also stores that data.

A custom background worker is started by PoWA, which is responsible for regularly taking snapshots and storing them in the dedicated powa database. Then, using powa-web, you can see the activity of any of the local databases by querying the stored data on the dedicated database, and possibly connect to one of the other local databases when complete data are needed, for instance when using the index suggestion tool.

With version 4, the architecture of a remote setup changes quite a lot:

You can see that a dedicated powa database is still required, but only for the stat extensions. Data are now stored on a different instance. The background worker is replaced by a new collector daemon, which reads the performance data from the remote servers and stores it on the dedicated repository server. Powa-web will then be able to display the activity by connecting to the repository server, and also to the remote server when complete data are needed.

In short, with the new remote mode introduced in this version 4:

  • a PostgreSQL restart is not required anymore to
[...]
Posted by Laurenz Albe in Cybertec on 2019-05-17 at 08:00
security problems
© xkcd.com under the Creative Commons License 2.5

 

Functions defined as SECURITY DEFINER are a powerful, but dangerous tool in PostgreSQL.

The documentation warns of the dangers:

Because a SECURITY DEFINER function is executed with the privileges of the user that owns it, care is needed to ensure that the function cannot be misused. For security, search_path should be set to exclude any schemas writable by untrusted users. This prevents malicious users from creating objects (e.g., tables, functions, and operators) that mask objects intended to be used by the function.

This article describes such an attack, in the hope of alerting people that this is no idle warning.

What is SECURITY DEFINER good for?

By default, PostgreSQL functions are defined as SECURITY INVOKER. That means that they are executed with the User ID and security context of the user that calls them. SQL statements executed by such a function run with the same permissions as if the user had executed them directly.

A SECURITY DEFINER function will run with the User ID and security context of the function owner.

This can be used to allow a low privileged user to execute an operation that requires high privileges in a controlled fashion: you define a SECURITY DEFINER function owned by a privileged user that executes the operation. The function restricts the operation in the desired way.

For example, you can allow a user to use COPY TO, but only to a certain directory. The function has to be owned by a superuser (or, from v11 on, by a user with the pg_write_server_files role).
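As a rough sketch of that idea (the function, directory, and role names are invented here, not taken from the article), such a wrapper could look like this:

-- Owned by a superuser; callers run it with the owner's privileges.
CREATE FUNCTION export_report(tab regclass, filename text)
RETURNS void
LANGUAGE plpgsql
SECURITY DEFINER
SET search_path = pg_catalog  -- pin the search path, as the documentation advises
AS $$
BEGIN
   -- allow only plain file names, so the target stays inside one directory
   IF filename ~ '[/\\]' THEN
      RAISE EXCEPTION 'plain file names only';
   END IF;
   EXECUTE format('COPY %s TO %L', tab, '/var/lib/reports/' || filename);
END;
$$;

REVOKE EXECUTE ON FUNCTION export_report(regclass, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION export_report(regclass, text) TO reporting_user;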

What is the danger?

Of course such functions have to be written very carefully to avoid software errors that could be abused.

But even if the code is well written, there is a danger: unqualified access to database objects from the function (that is, accessing objects without explicitly specifying the schema) can affect objects other than the ones the author of the function intended. This is because the configuration parameter search_path can be modified in a database session. This para

[...]

This is the 4th and last part in the series Benchmarking Managed PostgreSQL Cloud Solutions. At the time of this writing, Microsoft Azure PostgreSQL was at version 10.7, newer than the two contenders: Amazon Aurora PostgreSQL at version 10.6 and Google Cloud SQL for PostgreSQL at version 9.6.

Microsoft decided to run Azure PostgreSQL on Windows:

postgres=> select version();
                        version
------------------------------------------------------------
PostgreSQL 10.7, compiled by Visual C++ build 1800, 64-bit
(1 row)

For this particular test, that didn’t work out too well, and I will hazard a guess that Microsoft is well aware of the limitations, which would explain why, under the PostgreSQL umbrella, they also offer a preview version of the Citus Data flavor of PostgreSQL. The approach looks similar to the AWS PostgreSQL flavors, RDS and Aurora respectively.

As a side note, while setting up my Azure account, I was taken aback by the lack of 2FA/MFA (Two-Factor/Multi-Factor) authentication that I took for granted with Amazon’s AWS Virtual MFA and Google’s 2-step Verification. Microsoft offers MFA only to enterprise customers subscribed to Active Directory or Office 365. Since Citus Cloud enforces 2FA for production databases, perhaps Microsoft isn’t that far from implementing it in Azure.

TL;DR

There are no results for Azure. On the 8-core database instance, identical in the number of cores to those used on AWS and G Cloud, the tests failed to complete due to database errors. On a 16-core instance, pgbench did complete, and sysbench got as far as creating the first 3 tables at which point I interrupted the process. While I was willing to spend a reasonable amount of effort, time, and money on performing the tests, and documenting the errors and their causes, the goal of this exercise was running the benchmark, therefore I never considered pursuing any advanced troubleshooting, or contacting Azure support, nor did I finish the sysbench test on the 16-core database.

Cloud Instances

Client

The Azure client instance the clo

[...]
Posted by Mark Wong on 2019-05-14 at 22:13

When: 6-8pm Thursday May 16, 2019
Where: PSU Business Accelerator (Parking is open after 5pm.)
Who: Mark Wong
What: Disaster Recovery and High Availability Planning

Learn about the considerations in planning for disaster recovery in order to maintain business continuity. There are solutions available that can help you achieve your backup and recovery objectives by taking advantage of PostgreSQL features, such as point-in-time recovery, and help implement retention policies.

Also learn how to take advantage of PostgreSQL’s replication features to keep your database highly-available, and the basic cluster architectures to consider to keep a PostgreSQL cluster up and running.

Mark works at 2ndQuadrant as a consultant for English Speaking Territories, based out of Oregon. He is a Contributor to PostgreSQL, co-organizer of the Portland PostgreSQL User Group, and serves as a Director and Treasurer for the United States PostgreSQL Association.

Lately a reader of my blog asked about some detail, and then in a follow-up comment said that he doesn't have PostgreSQL 12 (which is the currently developed version of Pg) available. Well, I answered the original question, but I figured that if someone is reading my Waiting for … series, then it would make sense that such … Continue reading "How to play with upcoming, unreleased, PostgreSQL?"

Most businesses have databases of previous customers, and data analysts will frequently be asked to join arbitrary data to the customer tables in order to provide analysis.

You buy a cool new technology for your organization in order to cut operational costs. It works really well for you, and incrementally, your entire business starts to rely on this tech for its day to day operations. You have successfully made it an essential component of your business. There are some issues now and […]

Usually, the PostgreSQL Community releases minor patches on the Thursday of the second week of the second month of each quarter. This can vary depending on the nature of the fixes, for example if the fixes include critical fixes as a community response to security vulnerabilities or to bugs that might affect data integrity. The previous minor release happened on February 14, 2019; in that release, the fsync failure issue was corrected, along with some other enhancements and fixes.

The latest minor release, published on May 9, 2019, has some security fixes and bug fixes for these PostgreSQL Major versions.

PostgreSQL 11 (11.3)
PostgreSQL 10 (10.8)
PostgreSQL 9.6 (9.6.13)
PostgreSQL 9.5 (9.5.17)
PostgreSQL 9.4 (9.4.22)

Let’s take a look at some of the security fixes in this release.

Security Fixes

CVE-2019-10130 (Applicable to PostgreSQL 9.5, 9.6, 10 and 11 versions)

When you ANALYZE a table in PostgreSQL, statistics of the database objects are stored in pg_statistic. The query planner uses this statistical data, and it may contain some sensitive data, for example the min and max values of a column. Some of the planner's selectivity estimators apply user-defined operators to values found in pg_statistic. There was a security fix in the past, CVE-2017-7484, that restricted a leaky user-defined operator from disclosing some of the entries of a data column.

Starting from PostgreSQL 9.5, tables in PostgreSQL not only have the SQL-standard privilege system but also row security policies. To keep it short and simple, you can restrict a user so that they can only access specific rows of a table. We call this RLS (Row-Level Security).
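As a minimal illustration (the table, policy, and role names are made up):

CREATE TABLE accounts (id int, owner text, balance numeric);

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- app_user may SELECT from the table, but only sees its own rows
CREATE POLICY owner_rows ON accounts
   FOR SELECT
   USING (owner = current_user);

GRANT SELECT ON accounts TO app_user;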

CVE-2019-10130 is about restricting a user who has SQL permission to read a column, but who is forbidden from reading some rows due to an RLS policy, from discovering the restricted rows through a leaky operator. With this patch, leaky operators are only allowed access to the statistical data when there is no relevant RLS policy. We shall see more about RLS in our future blog posts.

CVE-2019-10129 (Applic

[...]
Posted by Bruce Momjian in EnterpriseDB on 2019-05-12 at 17:45

I have completed the draft version of the Postgres 12 release notes. Consisting of 186 items, this release makes big advances in partitioning, query optimization, and index performance. Many long-awaited features, like REINDEX CONCURRENTLY, multi-variate most-frequent-value statistics, and common table expression inlining, are included in this release.

The release notes will be continually updated until the final release, which is expected in September or October of this year.


pgBackRest addresses many of the must-have features that you’ll want to look for in a PostgreSQL backup solution. I have been a great fan of the pgBackRest project for quite some time, and it gets better all the time. Historically, it was written in Perl, and over the last year the project has been making steady progress converting to native C code. At the time of writing, the latest version is 2.13 and there remains a dependency on a long list of Perl libraries. In case you’ve never tried pgBackRest, now is a great time to do it. This post should help you to set up a simple backup with a local backup repository.

Installation

The pgBackRest project packages are maintained in the PGDG repository. If you have already added the PGDG repository to your package manager, installation is a breeze.

On RHEL/CentOS/OEL:

$ sudo yum install pgbackrest

On Ubuntu/Debian

$ sudo apt install pgbackrest

This will fetch all the required Perl libraries too.

pgBackRest is now a native executable (version 2):

$ file /usr/bin/pgbackrest
/usr/bin/pgbackrest: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=5e3f6123d02e0013b53f6568f99409378d43ad89, not stripped

Some of the other changes DBAs should keep in mind are:

  1. thread-max option is no longer valid – use process-max instead.
  2. archive-max-mb option is no longer valid and has been replaced with the archive-push-queue-max option which has different semantics
  3. The default for the backup-user (deprecated), which is a new repo-host-user, has changed from backrest to pgbackrest.
  4. The configuration file has changed from /etc/pgbackrest.conf to /etc/pgbackrest/pgbackrest.conf

Building from source

We may want to build pgBackRest from source, depending on our environment and version. Building pgBackRest from source on Debian / Ubuntu is already covered in the official documentation. Below I’ve provided the steps to follow for the Red Hat family.

Get the tarball of the latest release:

curl -LO https://github.com/pgbackrest/pgback
[...]
Propagating support functions
© Laurenz Albe 2019

 

PostgreSQL commit 74dfe58a5927b22c744b29534e67bfdd203ac028 has added “support functions”. This exciting new functionality allows the optimizer some insight into functions. This article will discuss how this improves query planning for PostgreSQL v12. If you are willing to write C code, you can also use this functionality for your own functions.

Functions as “black boxes”

Up to now, the PostgreSQL optimizer couldn’t really do a lot about functions. No matter how much it knew about the arguments of a function, it didn’t have the faintest clue about the function result. This also applied to built-in functions: no information about them was “wired into” the optimizer.

Let’s look at a simple example:

EXPLAIN SELECT * FROM unnest(ARRAY[1,2,3]);

                         QUERY PLAN
-------------------------------------------------------------
 Function Scan on unnest  (cost=0.00..1.00 rows=100 width=4)
(1 row)

PostgreSQL knows exactly that the array contains three elements. Still, it has no clue how many rows unnest will return, so it estimates an arbitrary 100 result rows. If this function invocation is part of a bigger SQL statement, the wrong result count can lead to a bad plan. The most common problem is that PostgreSQL will select bad join strategies based on wrong cardinality estimates. If you have ever waited for a nested loop join to finish that got 10000 instead of 10 rows in the outer relation, you know what I’m talking about.

There is the option to specify COST and ROWS on a function to improve the estimates. But you can only specify a constant there, which often is not good enough.
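For example (a sketch, with a made-up function name), a set-returning function can carry such constant estimates:

CREATE FUNCTION my_unnest(int[]) RETURNS SETOF int
   LANGUAGE sql
   COST 1
   ROWS 3
   AS 'SELECT unnest($1)';

-- The planner now assumes 3 result rows for every call, whatever the input.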

There were many other ways in which optimizer support for functions was lacking. This situation has been improved with support functions.

Support function syntax

The CREATE FUNCTION statement has been extended like this:

CREATE FUNCTION name (...) RETURNS ...
SUPPORT supportfunction
AS ...

This way a function gets a “support function” that knows about the function and can

[...]
Improving OLAP Workload Performance for PostgreSQL

No database management system is optimized for every workload. Database systems are designed for specific loads and thereby give better performance for those workloads. Similarly, some kinds of queries work better on some database systems and worse on others. This is the era of specialization, where a product is designed for a specific requirement or a specific set of requirements. We cannot achieve everything performance-wise from a single database system. PostgreSQL is one of the finest object-relational databases and performs really well for OLTP workloads, but I have observed that its performance is not as good as some other database systems for OLAP workloads. ClickHouse is one example that outperforms PostgreSQL for OLAP.

ClickHouse is an open source, column-based database management system which claims to be 100–1,000x faster than traditional approaches, capable of processing more than a billion rows in less than a second.

Foreign Data Wrapper (Clickhousedb_fdw)

If we have a different kind of workload that PostgreSQL is not optimized for, what is the solution? Fortunately, PostgreSQL provides a mechanism where you can handle a different kind of workload while using it, by creating a data proxy within PostgreSQL that internally calls and fetches data from a different database system. This is called Foreign Data Wrapper, which is based on SQL-MED. Percona provides a foreign data wrapper for the Clickhousedb database system, which is available at Percona’s GitHub project repository.
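The setup follows the usual SQL/MED pattern. The sketch below shows the general shape; the exact OPTIONS accepted by clickhousedb_fdw are assumptions here and should be checked against the wrapper's documentation:

CREATE EXTENSION clickhousedb_fdw;

CREATE SERVER clickhouse_svr
   FOREIGN DATA WRAPPER clickhousedb_fdw
   OPTIONS (host '127.0.0.1', dbname 'test_database');

CREATE USER MAPPING FOR CURRENT_USER
   SERVER clickhouse_svr
   OPTIONS (user 'default', password '');

-- A foreign table is then queried like any local table,
-- while the rows actually come from ClickHouse.
CREATE FOREIGN TABLE ontime_ch ("Year" int, "Carrier" text)
   SERVER clickhouse_svr;

SELECT count(*) FROM ontime_ch;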

Benchmark Machine

  • Supermicro server:
    • Intel(R) Xeon(R) CPU E5-2683 v3 @ 2.00GHz
    • 2 sockets / 28 cores / 56 threads
    • Memory: 256GB of RAM
    • Storage: Samsung  SM863 1.9TB Enterprise SSD
    • Filesystem: ext4/xfs
  • OS: Linux smblade01 4.15.0-42-generic #45~16.04.1-Ubuntu
  • PostgreSQL: version 11

Benchmark Workload

Ontime (On Time Reporting Carrier On-Time Performance) is an openly-available dataset I have used for the benchmark. It has a table size of 85GB with 109 columns of different types, and its designed queries more

[...]

Docker has become the most common tool to create, deploy, and run applications by using containers. It allows us to package up an application with all of the parts it needs, such as libraries and other dependencies, and ship it all out as one package. Docker could be considered as a virtual machine, but instead of creating a whole virtual operating system, Docker allows applications to use the same Linux kernel as the system that they're running on and only requires applications to be shipped with things not already running on the host computer. This gives a significant performance boost and reduces the size of the application.

In this blog, we’ll see how we can easily deploy a PostgreSQL setup via Docker, and how to turn our setup in a primary/standby replication setup by using ClusterControl.

How to Deploy PostgreSQL with Docker

First, let’s see how to deploy PostgreSQL with Docker manually by using a PostgreSQL Docker Image.

The image is available on Docker Hub and you can find it from the command line:

$ docker search postgres
NAME                                         DESCRIPTION                                     STARS               OFFICIAL            AUTOMATED
postgres                                     The PostgreSQL object-relational database sy…   6519                [OK]

We’ll take the first result. The Official one. So, we need to pull the image:

$ docker pull postgres

And run the node containers mapping a local port to the database port into the container:

$ docker run -d --name node1 -p 6551:5432 postgres
$ docker run -d --name node2 -p 6552:5432 postgres
$ docker run -d --name node3 -p 6553:5432 postgres

After running these commands, you should have this Docker environment created:

$ docker ps
CONTAINER ID        IMAGE                         COMMAND                  CREATED             STATUS                 PORTS                                                                                     NAMES
51038dbe21f8        postgres                      "docker-entrypoint.s…"   About an hour
[...]
Posted by Luca Ferrari on 2019-05-09 at 00:00

A recursive title for a kind of recursive topic: what does it really mean to have a role within another one? This article tries to lay out some basic knowledge about it.

The role of a role within another role

After reading the very excellent article by Hans-Jürgen Schönig about roles, I decided to provide my own vision about users, groups and the more abstract role concept.

The word role

First of all, the word role has little to do with PostgreSQL: it is a word used in the SQL standard, so don’t blame our favourite database for using the same word to express different concepts like user and group.

Roles: are they users or groups?

The wrong part of the question is the or: roles are both users and groups. Period. A role is a stereotype, an abstraction for a collection of permissions to do some stuff. Now, often a collection of permissions is granted to a single user, and therefore a role smells like a user account, but in my opinion this is just a coincidence. And in fact, as in the best system administration tradition, when you have to assign a collection of permissions to more than one user you need a group; roles can therefore smell like a group.
Remember: roles are collections of permissions; what makes them smell like a group or a user is just the way you use them. If you use a role for a single user, then it is fine to think of the role as a user account. If you use the role for more than one user, then it is fine to think of the role as a group.
Now, if you think this is trivial and simple, consider that a role can smell...

PostgreSQL offers powerful means to manage users / roles and enables administrators to implement everything from simple to really complex security concepts. However, if the PostgreSQL security machinery is not used wisely, things might become a bit rough.

This fairly short post will try to shed some light on to this topic.

The golden rule: Distinguish between users and roles

The most important thing you have to remember is the following: you cannot drop a user as long as permissions, objects, policies, tablespaces, and so on are still assigned to it. Here is an example:

test=# CREATE TABLE a (aid int);
CREATE TABLE
test=# CREATE USER joe;
CREATE ROLE
test=# GRANT SELECT ON a TO joe;
GRANT

As you can see, “joe” has a single permission, and already there is no way to kill the user without revoking the permission first:

test=# DROP USER joe;
ERROR: role "joe" cannot be dropped because some objects depend on it
DETAIL: privileges for table a

Note that there is no such thing as “DROP USER … CASCADE” – it does not exist. The reason for that is that users are created at the instance level. A user can therefore have rights in potentially dozens of PostgreSQL databases, and if you drop a user you cannot just blindly remove objects from other databases. It is therefore necessary to revoke all permissions first before a user can be removed. That can be a real issue if your deployments grow in size.
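One possible clean-up sequence before the final DROP, to be repeated in every database where the user owns objects or holds privileges (role names as in the example above):

REASSIGN OWNED BY joe TO postgres;  -- hand joe's objects to another role
DROP OWNED BY joe;                  -- drop what is left and revoke joe's privileges
DROP USER joe;                      -- now succeeds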

Using roles to abstract tasks

One thing we have seen over the years is: Tasks tend to exist longer than staff. Even after hiring and firing cleaning staff for your office 5 times the task is still the same: Somebody is going to clean your office twice a week. It can therefore make sense to abstract the tasks performed by “cleaning_staff” in a role, which is then assigned to individual people.

How can one implement this kind of abstraction?

test=# CREATE ROLE cleaning_staff NOLOGIN;
CREATE ROLE
test=# GRANT SELECT ON a TO cleaning_staff;
GRANT
test=# GRANT cleaning_staff TO joe;
GRANT ROLE

First we create a role called “cl

[...]
Posted by Andreas 'ads' Scherbaum

For the 13th year, the PostgreSQL Project is participating in Google Summer of Code (GSoC). This project is a great opportunity to let students learn about Open Source projects, and help them deliver new features. It is also a chance to engage the students beyond just one summer, and grow them into active contributors.

In GSoC, students first learn about the Open Source organization, and either pick a summer project from the list provided by the org, or submit their own idea for review. After a “community bonding” period, the students have time to implement their idea, under supervision of mentors from the Open Source organization. There is also an incentive: first, Google pays the students for their work on improving Open Source projects. And second, having a completed GSoC project in a CV is well recognized.

Continue reading "Google Summer of Code 2019 - PostgreSQL participates with 5 projects"

For roughly ten years now, I’ve had the pleasure of running and managing databases for people. In the early stages of building an application you move quickly, adding new tables and columns to your Postgres database to support new functionality. You move quickly, but you don’t worry too much because things are fast and responsive–largely because your data is small. Over time your application grows and matures. Your data model stabilizes, and you start to spend more time tuning and tweaking to ensure performance and stability stay where they need to. Eventually you get to the point where you miss the days of maintaining a small database, because life was easier then. Indexes were created quickly, joins were fast, count(*) didn’t bring your database to a screeching halt, and vacuum was not a regular part of your lunchtime conversation. As you continue to tweak and optimize the system, you know you need a plan for the future and know how you’re going to continue to scale.

Now in Preview: Introducing Hyperscale (Citus) on Azure Database for PostgreSQL

With Hyperscale (Citus) on Azure Database for PostgreSQL, we help many of those worries fade away. I am super excited to announce that Citus is now available on Microsoft Azure, as a new deployment option on the Azure Database for PostgreSQL called Hyperscale (Citus).

Hyperscale (Citus) scales out your data across multiple physical nodes, with the underlying data being sharded into much smaller bits. The same database sharding principles that work for Facebook and Google are baked right into the database. But, unlike traditional sharded systems, your application doesn’t have to learn how to shard the data. With Azure Database for PostgreSQL, Hyperscale (Citus) takes Postgres, the open source relational database, and extends it with low-level internal hooks.
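To give an idea of what that looks like in SQL (a sketch with made-up table and column names, using the Citus extension's create_distributed_table function):

CREATE TABLE events (
    tenant_id bigint,
    event_id  bigint,
    payload   jsonb,
    PRIMARY KEY (tenant_id, event_id)
);

-- shard the rows across the worker nodes by tenant_id
SELECT create_distributed_table('events', 'tenant_id');

-- the application keeps writing plain SQL; routing to the right shard
-- happens inside the database
SELECT count(*) FROM events WHERE tenant_id = 42;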

This means you can go back to building new features and functionality, without having to deal with a massive database that is a pain to maintain. When you provision a Hyperscale (Citus) server group, you’ll have a c

[...]
Posted by Josh Williams in End Point on 2019-05-03 at 00:00


I’m sitting in an airport, writing this in an attempt to stay awake. My flight is scheduled to depart at 11:59 PM, or 2:59 AM in the destination time zone which I’m still used to. This is the first red eye flight I’ve attempted, and I’m wondering why I’ve done this to myself.

I have dedicated a good portion of my life to free, open source software. I’ll occasionally travel to conferences, sitting on long flights and spending those valuable weekends in talks about email encryption and chat bots. I’ve also done this to myself. But even with all this I have zero regrets.

This little retrospective comes courtesy of my experience at LinuxFest Northwest this last weekend in Bellingham, Washington.

Specifically I think it was some of the talks, painting things in broad strokes, that did it. I attended Jon “maddog” Hall’s beard-growing Fifty Years of Unix, and later sat in on the Q&A, which was a bit less technical than expected. So I didn’t ask about the “2038 problem.” But that’s okay.

I felt a little guilty, on one hand, doing these general interest sessions instead of something on a much more specific topic, like ZFS, which would have arguably had a more direct benefit. On the other hand, doing those general interest talks helps me stay grounded, I suppose, helps me keep perspective.

I did attend some more specialized talks, naturally. LFNW was a packed conference; oftentimes there were a number of discussions I would have liked to attend happening at the same time. I’m hoping recordings will become available, or at least slides or other notes will appear. Some of the other talks I attended included, in no particular order:

  • Audio Production on Linux
    Like many other End Pointers, I dabble in a little bit of music. Unlike those other End Pointers, I’ve got no talent for it. Still, I try, and so I listened in on this one to find out a little more about how Jack works. I also caught wind of PipeWire, a project that’s aiming to supplant both PulseAudio and Jack. Neat!

  • Using GIS in Postgres
    I’ve got a couple PostGIS-

[...]

Your data is probably the most valuable asset in the company, so you should have a Disaster Recovery Plan (DRP) to prevent data loss in the event of an accident or hardware failure. A backup is the simplest form of DR. It might not always be enough to guarantee an acceptable Recovery Point Objective (RPO), but it is a good first approach. You should also define a Recovery Time Objective (RTO) according to your company requirements. There are many ways to reach the RTO value; it depends on the company goals.

In this blog, we’ll see how to use pgBackRest for backing up PostgreSQL and TimescaleDB and how to use one of the most important features of this backup tool, the combination of Full, Incremental and Differential backups, to minimize downtime.

What is pgBackRest?

There are different types of backups for databases:

  • Logical: The backup is stored in a human-readable format like SQL.
  • Physical: The backup contains binary data.
  • Full/Incremental/Differential: The definition of these three types of backups is implicit in the name. The full backup is a full copy of all your data. Incremental backup only backs up the data that has changed since the previous backup and the differential backup only contains the data that has changed since the last full backup executed. The incremental and differential backups were introduced as a way to decrease the amount of time and disk space usage that it takes to perform a full backup.

pgBackRest is an open source backup tool that creates physical backups with some improvements compared to the classic pg_basebackup tool. We can use pgBackRest to perform an initial database copy for Streaming Replication by using an existing backup, or we can use the delta option to rebuild an old standby server.

Some of the most important pgBackRest features are:

  • Parallel Backup & Restore
  • Local or Remote Operation
  • Full, Incremental and Differential Backups
  • Backup Rotation and Archive Expiration
  • Backup Integrity check
  • Backup Resume
  • Delta Restore
  • Encryption

Now, let’s see how we can use pgBackRe

[...]
Posted by Ernst-Georg Schmid on 2019-05-02 at 16:32
Can somebody explain this?

PostgreSQL 11.2.

The documentation says:

"A type cast specifies a conversion from one data type to another. PostgreSQL accepts two equivalent syntaxes for type casts:

CAST ( expression AS type )

expression::type

The CAST syntax conforms to SQL; the syntax with :: is historical PostgreSQL usage."

But when I test the lower limits of PostgreSQL's integer types, strange things happen.

select cast(-9223372036854775808 as bigint);
select cast(-2147483648 as integer);
select cast(-32768 as smallint);

All OK.

select -9223372036854775808::bigint;
select -2147483648::integer;
select -32768::smallint;

All fail with SQL Error [22003]: ERROR: out of range

But:

select -9223372036854775807::bigint;
select -2147483647::integer;
select -32767::smallint;

All OK.

???
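One likely explanation (my assumption, not part of the original post): the :: cast binds more tightly than the unary minus, so the literal is cast first and only negated afterwards:

select -(32768::smallint);        -- what -32768::smallint is parsed as: out of range
select (-32768)::smallint;        -- parenthesized: OK
select cast(-32768 as smallint);  -- CAST applies to the whole expression: OK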

Years ago

Years ago I wrote the post describing how to implement 1-to-1 relationship in PostgreSQL. The trick was simple and obvious:

CREATE TABLE UserProfiles (
        UProfileID BIGSERIAL PRIMARY KEY,
...
);

CREATE TABLE Users (
        UID BIGSERIAL PRIMARY KEY,
        UProfileID int8 NOT NULL,
...
        UNIQUE(UProfileID),
        FOREIGN KEY(UProfileID) REFERENCES UserProfiles(UProfileID)
);

You put a unique constraint on a referenced column and you’re fine. But then one of the readers noticed, that this is the 1-to-(0..1) relationship, not a true 1-to-1. And he was absolutely correct.

Keep it simple stupid!

A lot of time has passed, and now we can do this trick much more simply using modern features of PostgreSQL. Let’s check:

BEGIN;

CREATE TABLE uProfiles (
        uid int8 PRIMARY KEY,
        payload jsonb NOT NULL
);

CREATE TABLE Users (
        uid int8 PRIMARY KEY,
        uname text NOT NULL,
        FOREIGN KEY (uid) REFERENCES uProfiles (uid)
);

ALTER TABLE uProfiles 
        ADD FOREIGN KEY (uid) REFERENCES Users (uid);

INSERT INTO Users VALUES (1, 'Pavlo Golub');

INSERT INTO uProfiles VALUES (1, '{}');

COMMIT;

Things are obvious. We create two tables that reference each other using the same columns in both directions.
Moreover, in such a model both our foreign keys are automatically indexed!
Seems legit, but executing this script will produce the error:

SQL Error [23503]: ERROR: insert or update on table "users" 
   violates foreign key constraint "users_uid_fkey"
Detail: Key (uid)=(1) is not present in table "uprofiles".

Oops. And that was the pitfall preventing the easy solutions years ago during my first post.

What about now?

But now we have DEFERRABLE constraints:

This controls whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately after every command. Checking of constraints that are deferrable can be postponed until the end of the transaction (using the SET CONSTRAINTS command). NOT DEFERRABLE is the default. Currently, only UNIQUE, PRIMARY KEY, EXC

[...]
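A minimal sketch of where this leads, assuming both circular foreign keys from the example above are declared DEFERRABLE INITIALLY DEFERRED:

BEGIN;

CREATE TABLE uProfiles (
        uid int8 PRIMARY KEY,
        payload jsonb NOT NULL
);

CREATE TABLE Users (
        uid int8 PRIMARY KEY,
        uname text NOT NULL,
        FOREIGN KEY (uid) REFERENCES uProfiles (uid)
                DEFERRABLE INITIALLY DEFERRED
);

ALTER TABLE uProfiles
        ADD FOREIGN KEY (uid) REFERENCES Users (uid)
                DEFERRABLE INITIALLY DEFERRED;

INSERT INTO Users VALUES (1, 'Pavlo Golub');
INSERT INTO uProfiles VALUES (1, '{}');

COMMIT;  -- both foreign keys are checked here, once both rows exist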
Posted by Pavel Stehule on 2019-05-02 at 07:54
I fixed some issues, and pspg can now be used on Solaris too. I found some issues on the Solaris side with UTF-8 support, but they affect just a subset of characters. Due to these issues, don't use psql unicode borders there.