On 13th of November 2019, Amit Kapila committed patch: Introduce the 'force' option for the Drop Database command.     This new option terminates the other sessions connected to the target database and then drop it. To terminate other sessions, the current user must have desired permissions (same as pg_terminate_backend()). We don't allow to terminate … Continue reading "Waiting for PostgreSQL 13 – Introduce the ‘force’ option for the Drop Database command."
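For reference, a minimal sketch of the new syntax described by the commit (the database name is hypothetical):

DROP DATABASE appdb WITH (FORCE);   -- terminates other sessions connected to appdb, then drops it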
On 6th of November 2019, Tomas Vondra committed patch: Allow sampling of statements depending on duration   This allows logging a sample of statements, without incurring excessive log traffic (which may impact performance). This can be useful when analyzing workloads with lots of short queries.   The sampling is configured using two new GUC parameters: … Continue reading "Waiting for PostgreSQL 13 – Allow sampling of statements depending on duration"
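For context, a minimal sketch of enabling such sampling, assuming the two sampling GUCs as they appear in PostgreSQL 13 (log_min_duration_sample and log_statement_sample_rate):

ALTER SYSTEM SET log_min_duration_sample = '100ms';  -- only statements at least this long are sampling candidates
ALTER SYSTEM SET log_statement_sample_rate = 0.1;    -- log roughly 10% of those candidates
SELECT pg_reload_conf();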
Posted by Christophe Pettus in pgExperts on 2019-11-15 at 17:00

If you have something interesting to say about PostgreSQL, we would love to get a proposal from you. Even if you have never spoken before, consider responding to the CfP! PgDay 2020 is particularly friendly to first-time and inexperienced speakers. You’re among friends! If you use PostgreSQL, you almost certainly have opinions and experiences that others would love to hear about… go for it!

Posted by Christophe Pettus in pgExperts on 2019-11-13 at 20:26

The very first PgDay San Francisco is coming to the Swedish-American Hall on January 21, 2020. It’s going to be an amazing event.

If you have something to say about PostgreSQL…

… the Call for Proposals is now open through November 22, 2019. We are looking for 40 minute talks about anything related to PostgreSQL. First-time speakers are particularly encouraged to send in proposals.

If you are interested in or use PostgreSQL…

Early-Bird Tickets are now available! Attendance is limited, so be sure to get your seat now.

If your company uses PostgreSQL…

consider sponsoring the event! We can’t do it without our sponsors, and it is a great way to recruit PostgreSQL people. Show off your company to the PostgreSQL community!

I’ve been at dinners with developers who admitted that developers, themselves included, can be a bit opinionated. In one case, one said, for example, “I love Postgres, but I have no idea why.” They were sitting at the wrong table to use Postgres as an example… But I am quite often asked: why Postgres?

In fact, a little over a year ago my good friend Dimitri Fontaine asked if he could interview me for a book he’s working on about Postgres. I’ve long said there is a shortage of good books about Postgres, and he’s done a great job with his, providing a guide targeted at developers, not just DBAs, who want to become better with their database. What follows is an excerpt of the interview from the book. And if you’re interested in picking up a copy, he was friendly enough to share a discount code you can find below.

Intro

Craig heads up the Cloud team at Citus Data, now running product for Azure Postgres since the acquisition by Microsoft. Citus extends Postgres to be a horizontally scalable distributed database. If you have a database, especially Postgres, that needs to scale beyond a single node (typically at 100GB and up), Craig is always happy to chat and see if Citus can help.

Previously Craig has spent a number of years @heroku, a platform-as-a-service, which takes much of the overhead out of IT and lets developers focus on building features and adding value. The bulk of Craig’s time at Heroku was spent running product and marketing for Heroku Data.

In your opinion, how important are extensions for the PostgreSQL open source project and ecosystem?

To me the extension APIs and the growing ecosystem of extensions are the biggest advancement to Postgres in probably the last 10 years. Extensions have allowed Postgres to extend beyond a traditional relational database to much more of a data platform. Whether it’s the initial NoSQL datatypes (if we exclude XML, that is) in hstore, the rich geospatial feature set in PostGIS, or approximation algorithms such as HyperLogLog or TopN, you ha

[...]

The call for sponsors for pgDay Paris 2020 is now open. There are two levels available: Partner and Supporter. The Partner level, which gives you a booth at the event, is limited to just five spots so hurry up and get yours now before it's too late!

https://2020.pgday.paris/become-sponsor/

OSM to PostGIS – The Basics

Ever wondered how to import OSM (OpenStreetMap) data into PostGIS [1] for the purpose of visualization and further analytics? Here are the basic steps to do so.
There are a bunch of tools on the market — osm2pgsql, imposm, and ogr2ogr, just to mention some of them. In this article I will focus on osm2pgsql [2].

Let’s start with the software prerequisites. PostGIS comes as a PostgreSQL database extension, which must be installed in addition to the core database. As of now, the latest PostGIS version is 3, which was released a few days ago. For the current tasks I utilized PostGIS 2.5 on top of PostgreSQL 11.
This brings me to the basic requirements for the import: PostgreSQL >= 9.4 and PostGIS 2.2 are required, though I recommend installing PostGIS >= 2.5, which is supported from PostgreSQL 9.4 upwards. Please consult PostGIS’ overall compatibility and support matrix [3] to find a matching pair of components.
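Assuming the PostGIS packages are already installed at the operating-system level, a quick sanity check of the versions in the target database could look like this:

CREATE EXTENSION IF NOT EXISTS postgis;
SELECT version();                -- PostgreSQL version
SELECT postgis_full_version();   -- PostGIS version and the libraries it is linked against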

Osm2pgsql Setup

Let’s start by setting up osm2pgsql on the OS of your choice – I stick to Ubuntu 18.04.4 Bionic Beaver and compiled osm2pgsql from source to get the latest updates.

Install required libraries

 sudo apt-get install make cmake g++ libboost-dev libboost-system-dev \
libboost-filesystem-dev libexpat1-dev zlib1g-dev \
libbz2-dev libpq-dev libproj-dev lua5.2 liblua5.2-dev 

Grab the repo

 git clone https://github.com/openstreetmap/osm2pgsql.git 

Compile

 mkdir build && cd build
cmake ..
make
sudo make install 

If everything went fine, I suggest checking the resulting binary and its release by executing

./osm2pgsql --version
osm2pgsql version 1.0.0 (64 bit id space)

Data acquisition

In the world of OSM, data acquisition is a topic of its own, worth a separate post discussing different acquisition strategies depending on business needs, spatial extent and update frequency. I won’t get into details here; instead, I’ll just grab the OSM data for my preferred area directly from Geofabrik, a company offering data

[...]
Posted by Paul Ramsey in PostGIS on 2019-11-13 at 08:00

This post originally appeared in the CARTO blog.

One of the things that makes managing geospatial data challenging is the huge variety of scales that geospatial data covers: areas as large as a continent or as small as a manhole cover.

The data in the database also covers a wide range, from single points, to polygons described with thousands of vertices. And size matters! A large object takes more time to retrieve from storage, and more time to run calculations on.

The Natural Earth countries file is a good example of that variation. Load the data into PostGIS and inspect the object sizes using SQL:

SELECT admin, ST_NPoints(the_geom), ST_MemSize(the_geom) 
FROM ne_10m_admin_0_countries 
ORDER BY ST_NPoints;
  • Coral Sea Islands are represented with a 4 point polygon, only 112 bytes.
  • Canada is represented with a 68,159 point multi-polygon, 1 megabyte in size!

Countries by Size in KB

Over half (149) of the countries in the table are larger than the database page size (8KB), which means they will take extra time to retrieve.

SELECT Count(*) 
FROM ne_10m_admin_0_countries 
WHERE ST_MemSize(the_geom) > 8192;

We can see the overhead involved in working with large data by forcing a large retrieval and computation.

Load the Natural Earth populated places into PostGIS as well, and then run a full spatial join between the two tables:

SELECT Count(*)
FROM ne_10m_admin_0_countries countries 
JOIN ne_10m_populated_places_simple places 
ON ST_Contains(countries.the_geom, places.the_geom)

Even though the places table (7,322 rows) and the countries table (255 rows) are quite small, the computation still takes quite a while (about 30 seconds on my computer).

The large objects cause a number of inefficiencies:

  • Geographically large areas (like Canada or Russia) have large bounding boxes, so the indexes don’t work as efficiently in winnowing out points that don’t fall within the countries.
  • Physically large objects have large vertex lists, which take a long time to pass through the containment calcu
[...]
Added two new functions, which basically work only for JSON / YAML / XML plans. First, the source view of an explain is now syntax highlighted. Just click the source button on one of these to see: JSON YAML XML. The next addition is a text tab, which shows the same explain converted to text format. Hope you'll … Continue reading "More functions for json/yaml/xml plans on explain.depesz.com"
Just now I pushed a new version of the Pg::Explain Perl library that handles parsing for explain.depesz.com. There have been many changes, but the short summary is: Fix display of heap fetches from json (https://gitlab.com/depesz/explain.depesz.com/issues/15) Move global data (planning/execution/total time, trigger info) from top_node to the explain object itself Add method to easily get real total, wall-clock, runtime … Continue reading "New Pg::Explain and explain.depesz.com"

Postgres has extensions, and that’s awesome! Of course, as the author of CREATE EXTENSION I’m a little biased… just remember that the ability to extend Postgres is way more than just this command. The whole database system has been designed from the ground up to allow for extensibility. Part of that design is to be found in the way you can register new objects at runtime: functions of course, and also data types, operators, index support structures such as operator classes and families, even index access methods!

Today’s article shows a query that you can use to list those tables in your schemas that are using a data type which is provided by an extension.
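As a rough sketch (not necessarily the query from the article), such a query can walk pg_attribute and pg_depend to find columns whose type is owned by an extension:

SELECT c.relnamespace::regnamespace AS schema_name,
       c.relname  AS table_name,
       a.attname  AS column_name,
       t.typname  AS type_name,
       e.extname  AS extension_name
  FROM pg_class c
  JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum > 0 AND NOT a.attisdropped
  JOIN pg_type t      ON t.oid = a.atttypid
  JOIN pg_depend d    ON d.classid = 'pg_type'::regclass
                     AND d.objid = t.oid
                     AND d.refclassid = 'pg_extension'::regclass
                     AND d.deptype = 'e'
  JOIN pg_extension e ON e.oid = d.refobjid
 WHERE c.relkind IN ('r', 'p', 'm');   -- ordinary, partitioned and materialized tables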

I spent some time making an elephant logo to be used as the FreeBSD boot loader logo.

PostgreSQL ascii logo for FreeBSD boot loader

I use FreeBSD as my main PostgreSQL server, and also as a virtual machine for training courses. A long time ago, I changed the message of the day (/etc/motd) to show the elephant logo in ASCII art, so why not also change the boot loader logo?
FreeBSD by default shows what is called the orb or the devil (named beastie), and the new Lua-based boot loader uses some simple string concatenation to generate the logo. However, it was not so simple to make a new logo, since I had no idea how to debug it in production, and that forced me into a very long and repetitive try-and-reboot process to identify all the problems with my logos.
At last, I made it!
Now there are two logos available for the boot loader, providing both the black-and-white and the coloured elephant. Below you can see a couple of screenshots:






In order to use one of the logos, you have to:

  • download the Lua script from my GitHub repository; within the logos directory you can find the files;
  • put the chosen file into the /boot/lua directory and provide read permissions;
  • edit your /boot/loader.conf and add the loader_logo setting depending on the chosen logo
    # for...
Posted by Colin Charles on 2019-11-10 at 14:22

I miss a proper database-related newsletter for busy people. There’s so much happening in the space, from tech to licensing and even usage. Anyway, a quick tab sweep.

Paul Vallée (of Pythian fame) has been working on Tehama for some time, and now he gets to do it full time, as a PE firm bought control of Pythian’s services business. Pythian has more than 350 employees and 250 customers, and has raised capital before. More at Ottawa’s Pythian spins out software platform Tehama.

Database leaks data on most of Ecuador’s citizens, including 6.7 million children – ElasticSearch.

Percona has launched Percona Distribution for PostgreSQL 11. This means they have servers for MySQL, MongoDB, and now PostgreSQL. Looks very much like a packaged server with tools from 3rd parties (source).

Severalnines has launched Backup Ninja, an agent-based SaaS service to backup popular databases in the cloud. Backup.Ninja (cool URL) supports MySQL (and variants), MongoDB, PostgreSQL and TimeScale. No pricing available, but it is free for 30 days.

Comparing Database Types: How Database Types Evolved to Meet Different Needs

New In PostgreSQL 12: Generated Columns – anyone doing a comparison with MariaDB Server or MySQL?

Migration Complete – Amazon’s Consumer Business Just Turned off its Final Oracle Database – a huge deal as they migrated 75 petabytes of internal data to DynamoDB, Aurora, RDS and Redshift. Amazon, powered by AWS, and a big win for open source (a lot of these services are built-on open source).

MongoDB and Alibaba Cloud Launch New Partnership – I see this as a win for the SSPL relicense. It is far too costly to maintain a drop-in compatible fork, in a single company (Hi Amazon DocumentDB!). Maybe if the PostgreSQL layer gets open sourced, there is a chance, but otherwise, all good news for Alibaba and MongoDB.

MySQL 8.0.18 brings hash join, EXPLAIN ANALYZE, and more interestingly, HashiCorp Vault support for MySQL Keyring. (Percona has an open source variant).

HAProxy PostgreSQL pgsql-check

We discussed one of the traditional ways to configure HAProxy with PostgreSQL in our previous blog about HAProxy using Xinetd. There we briefly mentioned the limitation of HAProxy’s built-in pgsql-check health check option: it lacks the features needed to detect and differentiate between a Primary and a Hot Standby. It tries to establish a connection to the database instance, and if the connection request progresses, it is considered a successful check; there is no provision to check the current role (Primary or Standby).

So the question remains:

  1. Is HAProxy’s built-in pgsql-check completely useless, since it cannot distinguish between a Primary and a Hot Standby (a standby that accepts reads) in an HA setup?
  2. Is there a way to tweak pgsql-check so that it can distinguish between a Primary and a Hot Standby?

This blog post discusses what is possible using pgsql-check and how to achieve that.

Note: This blog demonstrates the concept. Integration with a specific HA framework/script is left to users, because there are a large number of HA solutions for PostgreSQL and these concepts apply equally to them.

Concept

When a client initiates a connection to PostgreSQL, the first stage of the check is whether the connection is acceptable per the rules specified in pg_hba.conf. That stage needs to complete before proceeding to the next stage, the specific authentication mechanism.

The pgsql-check is designed to check this first stage (pg_hba.conf) and return success if it is passed, because a positive response from the server can be considered a litmus test for whether the instance is up and capable of accepting connections. It doesn’t have to complete the authentication: pgsql-check abandons the connection after this check, before completing the initial handshake, and PostgreSQL terminates it.

A connection request will be rejected straight away if a pg_hba.conf rule says to “reject” it. For example, a pg_hba.conf entry like

host    postgres    pmm_user    192.168.80.20/32    reject

tel

[...]

etcd is one of several solutions to a problem that is faced by many programs that run in a distributed fashion on a set of hosts, each of which may fail or need rebooting at any moment.

One such program is Patroni; I’ve already written an introduction to it as well as a guide on how to set up a highly-available PostgreSQL cluster using Patroni.

In that guide, I briefly touched on the reason why Patroni needs a tool like etcd.
Here’s a quick recap:

  • Each Patroni instance monitors the health data of a PostgreSQL instance.
  • The health data needs to be stored somewhere where all other Patroni instances can access it.
  • Based on this data, each Patroni instance decides what actions have to be taken to keep the cluster as a whole healthy.
  • A Patroni instance may decide that it needs to promote its PostgreSQL instance to become a primary, because it registered that there is currently no primary.
  • That Patroni instance needs to be sure that while it attempts to promote the database, no other Patroni instances can try to do the same. This process is called the “leader-race”, where the proverbial finish line consists of acquiring the “leader” lock. Once a Patroni instance has acquired this lock, the others cannot acquire it unless the new leader gives it up, or fails to extend its time to live.

The challenge now lies in providing a mechanism that makes sure that only a single Patroni instance can be successful in acquiring said lock.

In conventional, non-distributed computing systems, this condition would be guarded by a mechanism that enables mutual exclusion, aka a mutex. A mutex is a software solution that helps make sure that a given variable can only be manipulated by a single program at any given time.

 

For distributed systems, implementing such a mutex is more challenging:

The programs that contend for the variable need to send their request for change and then a decision has to be inferred somewhere as to whether this request can be accept

[...]

1. Overview

In the previous blog, we discussed how to correctly set up a streaming replication cluster between one master and one slave in Postgres version 12. In this blog, we will simulate a failover scenario on the master database, which causes the replica (or slave) database cluster to be promoted as the new master and continue operation. We will also simulate a failback scenario to reuse the old master cluster after the failover, with the help of pg_rewind.

Normally it is quite easy to fail back to the old master after the slave gets promoted to master, but if data is written to the old master after the slave promotion, the two will be out of sync and we will have to use the pg_rewind tool to synchronize the two data directories, in order to bring the old master to match the state of the new master. Please note that the pg_rewind tool will remove transactions from the old master in order to match up with the new one, so certain precautions are needed when using this tool.

Here’s a brief overview of list of actions we are going to perform:

  • simulate failover by promoting slave cluster, so it becomes a new master
  • simulate data insertion to the master cluster, also referred to as the old master after promotion
  • shutdown the old master cluster and set it up as a standby server
  • run pg_rewind on old master to synchronize transaction states with new master
  • bring up old master as a standby server to synchronize with the new master

This blog assumes you already have a streaming replication setup between one master and one slave from the previous blog. If you have not checked out the previous blog, titled “Streaming Replication Setup in PG12 – How to Do it Right”, it is recommended to give that a read first.

The procedures illustrated in this blog are based on Postgres version 12 built from source, running on Ubuntu 18.04.

2. Simulate a Failover Case

We will simply promote the slave database cluster to simulate a failover.

$ pg_ctl promote -D db-slave

2019
[...]

1. Overview

PostgreSQL 12 is considered a major update, with significant performance boosts, partitioning enhancements, indexing improvements, optimized planner logic and several other features. One notable change is the removal of recovery.conf in a standby cluster. For this reason, the procedure to set up a streaming replication cluster has changed, and in this blog I will demonstrate how to properly set up streaming replication in PG12.

A streaming replication setup requires a master cluster and one or more slave clusters that replicate the data inserted into the master by streaming the archived WAL files generated by the master. The master and slaves can reside on different machines connected via a network, but in this blog we will use a one-master, one-slave setup, with both running on the same machine on different port numbers.

The procedures illustrated in this blog are based on Postgres version 12 built from source, running on Ubuntu 18.04.

2. Master Database Cluster Setup

Create a master database cluster using initdb tool:

$ initdb /home/caryh/streaming-replication/db-master
$ cd /home/caryh/streaming-replication

/home/caryh/streaming-replication is the root folder for all the database clusters that we will be creating in this blog, and the db-master directory will be created there as a result of the above commands. Let’s modify the default postgresql.conf and enable several important configuration options, as shown below, for the streaming replication setup.

############# db-master/postgresql.conf #############
wal_level = replica
archive_mode = on
max_wal_senders = 10 
wal_keep_segments = 10
hot_standby = on
archive_command = 'test ! -f /home/caryh/streaming-replication/archivedir/%f && cp %p /home/caryh/streaming-replication/archivedir/%f'
port = 5432
wal_log_hints = on

The configuration above enables Postgres to archive the WAL files in the directory /home/caryh/streaming-replication/archivedir/ when it has completed writing to a full block of WAL file or whe

[...]
Posted by damien clochard in Dalibo on 2019-11-07 at 11:17

Dalibo Labs yesterday released version 0.5 of PostgreSQL Anonymizer, an extension that hides sensitive information inside a PostgreSQL database.

The extension already offers a wide range of anonymization strategies: randomization, faking, partial destruction, shuffling, noise addition, etc.

For this new release, I worked on a different approach called Generalization, together with the k-anonymity property.

Generalization

The idea behind Generalization is quite simple: to avoid giving out sensitive information, we can replace the data with a broader, less accurate value.

For instance, instead of saying “Bob is 28 years old”, you can say “Bob is between 20 and 30 years old”. This is something we all do unconsciously when we don’t want to reveal precise information. Typically, when someone asks “Where do you live?”, we often respond with a vague answer (“I live in Paris” or even “I live in France”) instead of giving our exact address.

What is interesting with this approach is that the data remains true while avoiding the risk of re-identification.

So Generalization is an instinctive way for human beings to hide personal information. But how would that apply to a database management system like PostgreSQL? After all, the primary purpose of a database is to store accurate information. If you have a birthdate column in a SQL table, you can’t practically insert data saying “sometime in the nineties”, right? 😀

Well… it turns out that PostgreSQL can handle generalization very easily with the RANGE data types, a very powerful way to store and manipulate a set of values contained between a lower and an upper bound.
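To make that concrete, here is a minimal sketch (plain SQL, not the extension’s own API) of what a generalized value looks like with range types:

SELECT int4range(20, 30)                     AS age_range,     -- "between 20 and 30 years old"
       daterange('1980-01-01', '1990-01-01') AS birth_range;   -- "born in the eighties"

-- The generalized value still supports meaningful queries, e.g. containment checks:
SELECT daterange('1980-01-01', '1990-01-01') @> DATE '1989-12-29';   -- true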

Example: Blurring medical data

Here’s a basic table containing sensitive health data:

# SELECT * FROM patient;
     ssn     | firstname | zipcode |   birth    |    disease    
-------------+-----------+---------+------------+---------------
 253-51-6170 | Alice     |   47012 | 1989-12-29 | Heart Disease
 091-20-0543 | Bob       |   42678 | 1979-03-22 | Allergy
 565-94-
[...]
Posted by Abdul Yadi on 2019-11-07 at 04:27

Why do I still need pgAdmin3? As of now, pgAdmin4 does not show nested partition tables in its object tree view. Since the pgAdmin3 LTS repository on Bitbucket by the BigSQL Development Team is no longer available, I have republished it on GitHub: https://github.com/AbdulYadi/pgadmin3 with code fixes for PostgreSQL 12’s internal catalog field changes:

  1. No more relhasoids in pg_class.
  2. No more cache_value, is_cycled, is_called in sequence object (since PostgreSQL 11).
  3. No more adsrc in pg_attrdef; it should be calculated as pg_catalog.pg_get_expr(adbin, adrelid) instead, as shown below.
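For the third point, a minimal sketch of how a client can now fetch column defaults without adsrc:

SELECT c.relname AS table_name,
       a.attname AS column_name,
       pg_catalog.pg_get_expr(d.adbin, d.adrelid) AS default_expression
  FROM pg_catalog.pg_attrdef d
  JOIN pg_catalog.pg_class c     ON c.oid = d.adrelid
  JOIN pg_catalog.pg_attribute a ON a.attrelid = d.adrelid AND a.attnum = d.adnum;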

Introduction

I recently had to work on a case where a customer noticed some poor application performance after migrating from Oracle to PostgreSQL. They seemed to have tried everything in the playbook, but the problem simply wouldn’t get any better. They tried tuning autovacuum (but the tables weren’t bloated), tried tuning shared_buffers (but EXPLAIN ANALYZE wasn’t showing stuff getting pulled from heap), tried swapping JDBC drivers (but both EDB and Community PgJDBC drivers had the same performance) – in short, they poked around just about everywhere they could think of, but couldn’t make the queries run any faster. They were very convinced that the cause of the slowness was due to some waiting required after inserting/updating rows in the database; we removed the replica and had the application work with just one database, but the statistics didn’t change–it was still slower than expected.

Getting Past the Smoke

The first step we took in resolving this issue was to log all durations, just in case anything was missed. We set log_min_duration_statement = 0 and set off the test sequence. What came back was interesting (after some sed, grep, and sorting):

 Duration     Statement
------------+-------------------------------
 828.950 ms   execute : UPDATE ...
 829.322 ms   execute : UPDATE ...
 830.615 ms   execute : UPDATE ...
 831.923 ms   execute : UPDATE ...
 832.499 ms   execute : UPDATE ...
 833.595 ms   execute : UPDATE ...
 836.353 ms   execute : UPDATE ...
 863.769 ms   execute : UPDATE ...

That proved to me the bottleneck wasn’t some INSERT-delay, but that a particular UPDATE on one table was consuming about 4.5 minutes of testing time (there were ~270 instances of these UPDATEs). Easy enough–just figure out why the UPDATE was taking so long, right? Not so fast. On a table with ~3M rows, only a sequential scan would make it take ~800ms to do an UPDATE on a WHERE clause based on two indexed integer columns. I too

[...]
Posted by Dimitri Fontaine on 2019-11-06 at 19:45
Photo by Hanna Morris. Postgres Query Planning: PostgreSQL provides advanced tooling to understand how it executes SQL queries. The execution of a query follows specific steps: parsing, planning, optimization, execution. When it comes to the execution step, all Postgres does is follow the selected optimized plan. It is possible to ask Postgres for the query plan and inspect it, in order to better understand the execution of any query and then, when necessary, find a way to execute the same query faster.
Posted by Paul Ramsey in PostGIS on 2019-11-06 at 08:00

I love PostgreSQL extensions.

Extensions are the truest expression of the second principle of the original “design of Postgres” vision, to

provide user extendibility for data types, operators and access methods.

Extensions allow users to do more with PostgreSQL than just basic storage and retrieval. PostgreSQL is a full-on integration environment, like Python or Perl, and you can build very complete data manipulation pipelines very close to the metal using native and extension features of PostgreSQL.

Even though I’m a contributor to one of the largest PostgreSQL extensions, I have particularly come to love small extensions that do one simple thing, especially one simple thing we maybe take for granted in other environments.

My old HTTP extension is just a binding of libcurl to a SQL interface, so users can do web queries inside the SQL environment.

And today I’ve finished up a GZIP extension, that is just a binding of zlib to SQL, so that users can… compress and decompress things.

It’s not a lot, but it’s a little.
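A minimal usage sketch, assuming the extension is named gzip and exposes gzip() and gunzip() functions as its SQL interface:

CREATE EXTENSION gzip;
SELECT gzip('compress me, please');                        -- returns a gzipped bytea
SELECT convert_from(gunzip(gzip('round trip')), 'UTF8');   -- decompress and convert back to text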

The GZIP extension came about because of an email on the PostGIS development list, where Yuri noted

The amazing ST_AsMVT() has two common usage patterns: copy resulting MVTs to a tile cache (e.g. .mbtiles file or a materialized view), or serve MVT to the users (direct SQL->browser approach). Both patterns still require one additional data processing step – gziping.

Huh. And this use case also applies to people generating GeoJSON directly in the database and sending it out to web clients.

The PostgreSQL core has generally frowned on compression functions at the SQL level, because the database already does compression of over-sized tuples as necessary. The last thing we want is people manually applying compression to column values, and then stuffing them into rows where the database will then re-compress them internally. From the perspective of storage efficiency, just standing back and letting PostgreSQL do its work is preferable.

[...]
K-nearest neighbor answers the question of "What is the closest match?". PostgreSQL 12 can answer this question, and use indexes while doing it.
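A minimal sketch of what that looks like, using a hypothetical table of point locations; ordering by the distance operator can use a GiST index, and PostgreSQL 12 extends this kind of ordered (KNN) index scan to SP-GiST as well:

CREATE TABLE stores (id serial PRIMARY KEY, location point);
CREATE INDEX stores_location_idx ON stores USING gist (location);

SELECT id, location <-> point '(0,0)' AS distance
  FROM stores
 ORDER BY location <-> point '(0,0)'   -- index-assisted nearest-neighbor ordering
 LIMIT 5;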

 

With the latest major version freshly released, it’s again time to gauge its performance. I’ve already been doing this for years, and I have my scripts ready, so it’s not too much work for me – but rather, for the machines. The v12 release, as always, adds quite a lot of nice stuff. Some highlights are a framework for changing storage engines, better partitioning, smaller multi-column and repetitive value indexes, and concurrent re-indexing, to name a few. So basically, there should be quite a few performance improvements for OLAP / warehouse use cases, but not so much for typical OLTP stuff. The smaller indexes should help, of course, if datasets get very big… but I guess the only way to get a real impression is to look at some test numbers, so feel free to jump to the summarizing table at the end of the post or get into the details by reading on.

By the way – from a code change statistics point of view, the numbers look quite similar to the v11 release, with the exception that there is an increase of about 10% in the number of commits. This is a good sign for the overall health of the project, I think 🙂



git diff --shortstat REL_11_5_STABLE..REL_12_0

3154 files changed, 317813 insertions(+), 295396 deletions(-)

git log --oneline REL_11_5..REL_12_0 | wc -l

2429

Test Queries

I decided to run 4 different test cases, each with a couple of different scale / client combinations. Scales were selected 100, 1000 and 5000 – with the intention that with 100, all data fits into shared buffers (which should provide the most accurate results), 1000 means everything fits into RAM (Linux buffer cache) for the used hardware and 5000 (4x RAM) to test if disk access was somehow improved. A tip – to quickly get the right “scaling factor” numbers for a target DB size, one could check out this post here.

[...]
Posted by Luca Ferrari on 2019-11-04 at 00:00

PostgreSQL 12 is available as a binary package on FreeBSD, but not yet in the quarterly update.

PostgreSQL 12 package on FreeBSD

In case you need to install PostgreSQL 12 on FreeBSD, please consider that it has not yet reached the quarterly pkg(1) update; therefore, if you install it via pkg(1) you will get PostgreSQL 12 rc1. However, in the ports tree, PostgreSQL is clearly at version 12 (release).
This behavior is due to the fact that since FreeBSD 12, the default repository for packages is quarterly, which in short means packages are older than the ports tree.

In order to install the official release, a new URL for the FreeBSD repository must be set up. The repository URL is placed into the file /etc/pkg/FreeBSD.conf:

FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/quarterly",
  mirror_type: "srv",
  signature_type: "fingerprints",
  fingerprints: "/usr/share/keys/pkg",
  enabled: yes
}

The pkg(1) configuration allows overriding the default URL by placing a file /usr/local/etc/pkg/FreeBSD.conf that overrides the properties of the above, with the content:

FreeBSD: {
  url: "pkg+http://pkg.FreeBSD.org/${ABI}/latest"
}

After that, the repository can be updated and new packages will be available. Therefore, run:

% sudo pkg update
% sudo pkg install postgresql12-client-12 \
                   postgresql12-contrib-12 \
                   postgresql12-docs-12 \
                   postgresql12-plperl-12 \ 
                   postgresql12-server-12
PostgreSQL Application Connection Failover Using HAProxy

Recently we published a blog about a very simple application failover using libpq features, which could be the simplest of all automatic application connection routing methods. In this blog post, we discuss how a proxy server using HAProxy can be used for connection routing, which is a well-known technique with very wide deployment. There are multiple ways HAProxy can be configured with PostgreSQL, which we shall cover in upcoming blogs, but configuring a xinetd service to respond to HTTP requests on individual nodes of a database cluster is one of the most traditional approaches.

On HAProxy

HAProxy could be the most popular connection routing and load balancing software available. Along with PostgreSQL, it is used across different types of high availability clusters. HAProxy, as the name indicates, works as a proxy for TCP (Layer 4) and HTTP (Layer 7), but it also offers load balancing features. The TCP proxying feature allows us to use it for PostgreSQL database connections. There are three objectives of connection routing for a PostgreSQL cluster:

  1. Read-Write load to Master
  2. Read-Only load to Slave
  3. Load balancing of multiple slaves, which is achievable with HAProxy.

HAProxy maintains an internal routing table. In this blog, we are going to take a look at the most traditional approach to configure HAProxy with PostgreSQL. This approach is independent of underlying clustering software and can be used even with the traditional PostgreSQL built-in replication feature without any clustering or automation solutions.

In this generic configuration, we won’t use any special software or capabilities offered by clustering frameworks. This requires us to have 3 components:

  1. A simple shell script to check the status of the PostgreSQL instance running on the local machine (a minimal SQL sketch of such a check follows this list).
  2. A xinetd service daemonizer.
  3. HAProxy: Which maintains the routing mechanism.
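As a minimal sketch (not Percona's script), the status check behind such a shell script can be a single SQL call that distinguishes a primary from a hot standby:

SELECT pg_is_in_recovery();   -- false on the primary, true on a hot standby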

Concept:

Every database server needs to have a xinetd service running on a port for status checks of PostgreSQL instan

[...]

Some weeks ago, in light of the PostgreSQL v12 release, I wrote a general overview of various major version upgrade methods and the benefits of upgrading in general, so if upgrading is new to you I’d recommend reading that post first. This time I’m concentrating on the newest (available since v10) and most complex upgrade method, called “Logical Replication”, or LR for short. For demonstration purposes I’ll be migrating from v10 to the freshly released v12, as this is probably the most likely scenario, but it should work the same from v11 to v12. Do read on for details.
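For orientation, the core building blocks of an LR upgrade are the publication and subscription commands available since v10; a minimal sketch with hypothetical names and connection string:

-- on the old v10 instance:
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- on the new v12 instance, after the schema has been carried over:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary dbname=appdb user=replicator'
    PUBLICATION upgrade_pub;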

Benefits of LR upgrades

First, a bit of recap from the previous post on why you would use LR for upgrading at all. Well, in short: because it’s the safest option, with the shortest possible downtime! With that last point I’m already sold… but here again is the list of “pros” / “cons”:

PROS

  • Minimal downtime required

After the initial setup burden one just needs to wait (and verify) that the new instance has all the data from the old one… and then just shut down the old instance and point applications to the new instance. Couldn’t be easier!

Also, before the switchover one can make sure that statistics are up to date, to minimize the typical “degraded performance” period seen after “pg_upgrade” for more complex queries (on bigger databases). For high-load applications one could even be more careful here and pull the most popular relations into shared buffers by using the (relatively unknown) “pg_prewarm” contrib extension or by just running common SELECTs in a loop, to counter the “cold cache” effect.

  • Flexible

One can for example already make some changes on the target DB – add columns / indexes, change datatypes, leave out some old archive tables, etc. The general idea is that LR does not work on the binary, 1-to-1 level as “pg_upgrade” does, but rather JSON-like data objects are sent over to another master / primary instance, providing quite some freedom on the details.

  • Safe

Before the fi

[...]

My very simple attempt at keeping PostgreSQL up-to-date on FreeBSD machines.

Installing PostgreSQL on FreeBSD via Ansible

I’m slowly moving to Ansible to manage my machines, and one problem I’m trying to solve as best I can is how to keep PostgreSQL up to date.
In the case of FreeBSD machines, pkgng is the module to use, and in the past I used this very simple playbook snippet:

- name: PostgreSQL 11
  become: yes
  with_items:
    - server
    - contrib
    - client
    - plperl
  pkgng:
    name: postgresql11-{{ item }}
    state: latest

However, there is a very scary warning message when running the above:

TASK [PostgreSQL 11] [DEPRECATION WARNING]: Invoking "pkgng" only once while using a loop via squash_actions is deprecated. Instead of using a loop to supply multiple items and specifying `name: "postgresql11-"`, please use `name: ['server', 'contrib', 'client', 'plperl']` and remove the loop. This feature will be removed in version 2.11. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg. 

That’s easy to fix, but also annoying (at least to me), because I have to change the above snippet to the following one:

- name: PostgreSQL 11
  become: yes
  pkgng:
    name:
      - postgresql11-server
      - postgresql11-contrib
      - postgresql11-client
      - postgresql11-plperl
    state: latest

So far, the best solution I’ve found that helps me keep readability is...

Monitoring PostgreSQL with Percona Monitoring Management

PostgreSQL is a widely-used open source database and has been ranked #1 for the past 2 years in the DB-Engines rankings. As such, there is always a need for reliable and robust monitoring solutions. While there are some commercial monitoring tools, there is an equally good number of open source tools available for monitoring PostgreSQL. Percona Monitoring and Management (PMM) is one of those open source solutions, continuously improved and maintained by Percona. It is simple to set up and easy to use.

PMM can monitor not only PostgreSQL but also MySQL and MongoDB databases, so it is a simple monitoring solution for monitoring multiple types of databases. In this blog post, you will see all the steps involved in monitoring PostgreSQL databases using PMM.

This is what we will be discussing:

  1. Using the PMM docker image to create a PMM server.
  2. Installing PMM client on a Remote PostgreSQL server and connecting the PostgreSQL Client to PMM Server.
  3. Creating required users and permissions on the PostgreSQL server.
  4. Enabling PostgreSQL Monitoring with and without QAN (Query Analytics)

If you already know how to create a PMM Server, please skip the PMM server setup and proceed to the PostgreSQL client setup.

Using the PMM docker image to create a PMM server

PMM uses a client-server architecture, where the clients are the PostgreSQL, MySQL, or MongoDB databases and the server is the PMM Server. We see a list of metrics on the Grafana dashboard by connecting to the PMM Server UI. In order to demonstrate this setup, I have created 2 virtual machines, where one of them is the PMM Server and the second is the PostgreSQL database server.

192.168.80.10 is my PMM-Server
192.168.80.20 is my PG 11 Server

Step 1 : 

On the PMM Server, install and start docker.

# yum install docker -y
# systemctl start docker

Here are the installation instructions of PMM Server.

Step 2 :

Pull the pmm-server docker image. I am using the latest PMM2 docker image for t

[...]