PostgreSQL
The world's most advanced open source database
Posted by Federico Campoli on 2022-08-12 at 10:30

This week the PostgreSQL pets are two lovely dogs, Gustaw and Stefan, who live with a mean parrot. They are presented by Alicja Kucharczyk.

The PostgreSQL Global Development Group has released the latest update to the community, crushing 40 previously disclosed bugs. The critical security vulnerability CVE-2022-2625 has also been announced, and an associated remediation is included in this quarter’s release.
Checking how well a Postgres cluster running in Kubernetes tolerates failures
Posted by Egor Rogov in Postgres Professional on 2022-08-11 at 00:00

So far we have covered query execution stages, statistics, sequential and index scan, and have moved on to joins.

The previous article focused on the nested loop join, and in this one I will explain the hash join. I will also briefly mention group-bys and distincts.

One-pass hash join

The hash join looks for matching pairs using a hash table, which has to be prepared in advance. Here's an example of a plan with a hash join:

EXPLAIN (costs off)
SELECT * FROM tickets t
  JOIN ticket_flights tf ON tf.ticket_no = t.ticket_no;

                QUERY PLAN
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
 Hash Join
   Hash Cond: (tf.ticket_no = t.ticket_no)
   −> Seq Scan on ticket_flights tf
   −> Hash
         −> Seq Scan on tickets t
(5 rows)

First, the Hash Join node calls the Hash node. The Hash node fetches all the inner set rows from its child node and arranges them into a hash table.

A hash table stores data as hash key and value pairs. This makes key-value search time constant and unaffected by hash table size. A hash function distributes hash keys randomly and evenly across a limited number of buckets. The total number of buckets is always two to the power N, and the bucket number for a given hash key is the last N bits of its hash function result.

So, during the first stage, the hash join begins by scanning all the inner set rows. The hash value of each row is computed by applying a hash function to the join attributes (Hash Cond), and all the fields from the row required for the query are stored in a hash table.
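To see the hash table at work, you can run the same join with EXPLAIN ANALYZE. This is a hedged illustration against the same demo-database tables as above; the buckets, batches and memory figures it reports depend entirely on your data and your work_mem setting:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
SELECT * FROM tickets t
  JOIN ticket_flights tf ON tf.ticket_no = t.ticket_no;

-- The Hash node in the output reports how the table was built, e.g.
--   Hash
--     Buckets: ...  Batches: ...  Memory Usage: ...kB
-- Batches: 1 means the whole inner set fit into work_mem (the one-pass case);
-- a larger number means the join had to spill to disk and work in batches.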

...

Posted by Elizabeth Garrett Christensen in Crunchy Data on 2022-08-08 at 15:00

In a world where everything is stored in git following IaC (infrastructure as code), you may want the same from your database. For many teams following this style of engineering modernization, we see a focus on IaC and Kubernetes, and we have many users standardizing on our PGO Kubernetes Operator to help. But following an IaC approach doesn’t mean you always want to manage your database yourself or run it in Kubernetes. If you want to forget about your database and trust its uptime, safety, and security to someone else, but still want to evolve your development practices, you can have your cake and eat it too.

Crunchy Bridge is excited to announce that we have released a verified Terraform provider. Many of our Crunchy Bridge customers are already using Terraform to manage their cloud infrastructure. This addition to our platform will allow Crunchy Bridge PostgreSQL resources to be integrated into infrastructure as code and DevOps workflows for our customers.

Working with Terraform allows our customers to manage their cloud infrastructure with configuration files that can be shared, reused, and version controlled across their organization. To get started with Terraform, see HashiCorp’s documentation. The Crunchy Bridge Terraform provider is intended for use with both open-source Terraform and Terraform Cloud.

Get started with our Terraform provider

Posted by Andreas 'ads' Scherbaum on 2022-08-08 at 14:00
PostgreSQL Person of the Week Interview with Adam Wright: I grew up as an Army brat - someone who moves from base to base. I am settled with my family now and live in a coastal town between Boston and Cape Cod.
Posted by Regina Obe in PostGIS on 2022-08-08 at 00:00

The PostGIS Team is pleased to release PostGIS 3.3.0rc1! Best served with PostgreSQL 15 beta2, GEOS 3.11.0, and SFCGAL 1.4.1.

Lower versions of the aforementioned dependencies will not have all new features.

This release supports PostgreSQL 11-15.

3.3.0rc1

This release is a release candidate of a major release; it includes bug fixes since PostGIS 3.2.2 as well as new features.

Posted by cary huang in Highgo Software on 2022-08-05 at 22:51

Introduction

Network File System (NFS) is a distributed file system protocol that allows a user on a client node to access files residing on a server node over a network, much like local storage is accessed. In this blog, I will share how to set up both an NFSv4 server and client on CentOS 7 and run PostgreSQL on it.

NFS Server

First, install the NFS server components by running:

$ yum install nfs-utils

This installs the NFS services on the server machine, after which we can enable and start the NFS server:

$ systemctl enable nfs
$ systemctl start nfs

Create a directory that will be mounted by NFS clients

$ mkdir /home/myserveruser/sharedir

In /etc/exports, add a new record like the one below:

/home/myserveruser/sharedir    X.X.X.X(rw,sync,no_subtree_check,no_root_squash)

This line allows a client with IP address X.X.X.X to mount the directory /home/myserveruser/sharedir with read and write access, as specified by the rw option. For all possible options, refer to the blog linked here for a definition of each option used. This directory will be used to initialize a PostgreSQL database cluster from an NFS client over the network.

If more than one client/host will mount the same directory, you will need to include them in /etc/exports as well.

/home/myserveruser/sharedir    X.X.X.X(rw,sync,no_subtree_check,no_root_squash)
/home/myserveruser/sharedir    A.A.A.A(rw,sync,no_subtree_check,no_root_squash)
/home/myserveruser/sharedir    B.B.B.B(rw,sync,no_subtree_check,no_root_squash)

Then, we are ready to restart the NFS service to take into account the new changes in /etc/exports:

$ systemctl restart nfs

Record the userid and groupid of the user associated with the directory to be exported to the client. For example, userid = 1009, groupid = 1009:

$ id myserveruser

You may also want to ensure that the firewall on CentOS 7 is either disabled or set to allow NFS traffic to pass through. You can check the firewall status and add a new port to be allowed with the following commands:

$ sudo firewall-cmd --zone=publi
[...]
Posted by Federico Campoli on 2022-08-05 at 10:30

This week the PostgreSQL pet is Milo the ginger cat adopted by Dave Page.

Posted by Dave Page in EDB on 2022-08-02 at 15:28

On Monday 11th July the pgAdmin Development Team opened the first pgAdmin user survey, which we then ran for three weeks, closing it on Monday 1st August. The aim of the survey was to help us understand how users are using pgAdmin, so that we can shape and focus our future development efforts.

We had a fantastic response with 278 people taking the time to complete the survey - far exceeding our expectations. Responses were generally positive as well, with a number of people expressing their appreciation for the work of the development team, which is always nice to hear.

In this blog post I'll go through the high level topics of the survey and attempt to summarise what has been reported, and draw some initial conclusions. If you would like to take a look at the results yourself... [Continue reading...]

Posted by Andreas 'ads' Scherbaum on 2022-08-01 at 14:00
PostgreSQL Person of the Week Interview with Beena Emerson: I am an Indian who grew up in Mumbai and am now settled near Chennai.
Posted by Jeremy Schneider in Amazon RDS on 2022-08-01 at 00:02

The PostgreSQL Performance Puzzle was, perhaps, too easy – it didn’t take long for someone to guess the correct answer!

But I didn’t see much discussion about why there was a difference or what was happening. My emphasis on the magnitude of the difference was a tip-off that there’s much more than meets the eye with this puzzle challenge – and one reason I published it is that I’d looked and I thought there were some very interesting things happening beneath the surface.

So let’s dig!

There are several layers. But – as with all SQL performance questions always – we begin simply with EXPLAIN.

[root@ip-172-31-36-129 ~]# sync; echo 1 > /proc/sys/vm/drop_caches
[root@ip-172-31-36-129 ~]# service postgresql-14 restart;
Redirecting to /bin/systemctl restart postgresql-14.service

pg-14.4 rw root@db1=# select 1;
...
The connection to the server was lost. Attempting reset: Succeeded.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)

pg-14.4 rw root@db1=# set track_io_timing=on;
SET
pg-14.4 rw root@db1=# set log_executor_stats=on;
SET
pg-14.4 rw root@db1=# set client_min_messages=log;
SET
pg-14.4 rw root@db1=# \timing on
Timing is on.

pg-14.4 rw root@db1=# explain (analyze,verbose,buffers,settings) select count(mydata) from test where mynumber1 < 500000;
LOG:  00000: EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.107946 s user, 0.028062 s system, 0.202054 s elapsed
!       [0.113023 s user, 0.032292 s system total]
!       7728 kB max resident size
!       89984/0 [92376/360] filesystem blocks in/out
!       0/255 [1/1656] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       197/0 [260/0] voluntary/involuntary context switches
LOCATION:  ShowUsage, postgres.c:4898
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost
[...]
Posted by Federico Campoli on 2022-07-29 at 08:30

The PostgreSQL pets of the week are Comet and Shadow, two lovely dogs adopted by Karen Jex.

If you’ve been running PostgreSQL for a while, you’ve heard about autovacuum. Yes, autovacuum, the thing which everybody asks you not to turn off, which is supposed to keep your database clean and reduce bloat automatically.

And yet—imagine this: one fine day, you see that your database size is larger than you expect, the I/O load on your database has increased, and things have slowed down without much change in workload. You begin looking into what might have happened. You run the excellent Postgres bloat query and you notice you have a lot of bloat. So you run the VACUUM command manually to clear the bloat in your Postgres database. Good!

But then you have to address the elephant in the room: why didn’t Postgres autovacuum clean up the bloat in the first place…? Does the above story sound familiar? Well, you are not alone. 😊

Autovacuum and VACUUM provide a number of configuration parameters to adapt them to your workload, but the challenge is figuring out which ones to tune. In this post—based on my optimizing autovacuum talk at Citus Con: An Event for Postgres—you’ll learn to figure out where the problem lies and what to tune to make it better.
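As a hedged first step in that investigation, the standard statistics views already show when (auto)vacuum last touched each table and how many dead tuples have accumulated since:

-- Tables with the most dead tuples first; which numbers are worrying is up to you.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;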

More specifically, you’ll learn how to investigate—and how to fix—these 3 common types of autovacuum problems:

Another common type of autovacuum problem is related to transaction ID wraparound, which is a meaty topic all on its own. In the future I plan to write a separate, follow-on blog post to focus on that topic.

Overview of all 13 autovacuum tips in this blog post

This cheat sheet diagram of “autovacuum tips” gives you an overview of all the Postgres autovacuum fixes you’ll learn about in this blog post:

Figure 1: Diagram of the 13 different types of possible autovacuum fixes for the 3 most common types of autovacuum problems in Postgres.

Intro to Au

[...]
I just released Pagila 3.0.0, with some new shiny features: Continue reading "Pagila 3.0.0 is out with shiny new features and bugfixes"
Rocky Linux 9 is finally out. Some repo names changed, so here is a short guide about installing PostgreSQL and PostGIS on Rocky Linux 9: Continue reading "How to install PostgreSQL and PostGIS on Rocky Linux 9"
Posted by Ryan Booz in Timescale on 2022-07-26 at 13:26

The State of PostgreSQL 2022 survey closed a few weeks ago, and we're hard at work cleaning and analyzing the data to provide the best insights we can for the PostgreSQL community.

In the database community, however, there are usually two things that drive lots of discussion year after year: performance and tooling. During this year's survey, we modified the questions slightly so that we could focus on three specific use cases and the PostgreSQL tools that the community finds most helpful for each: querying and administration, development, and data visualization.

PostgreSQL Tools: What Do We Have Against psql?

Absolutely nothing! As evidenced by the majority of respondents (69.4 %) that mentioned using psql for querying and administration, it's the ubiquitous choice for so many PostgreSQL users, and there is already good documentation and community-contributed resources (https://psql-tips.org/ by Lætitia Avrot is a great example) to learn more about it.

So that got us thinking. What other tools did folks bring up often for interacting with PostgreSQL along the three use cases mentioned above?

I'm glad we asked. 😉

PostgreSQL Querying and Administration

As we just said, psql is by far the most popular tool for interacting with PostgreSQL. 🎉

It's clear, however, that many users with all levels of experience do trust other tools as well.

Query and administration tools

pgAdmin (35 %), DBeaver (26 %), DataGrip (13 %), and IntelliJ (10 %) IDEs received the most mentions. Most of these aren't surprising if you've been working with databases, PostgreSQL or not. The most popular GUIs (pgAdmin and DBeaver) are open source and freely available to use. The next most popular GUIs (DataGrip and IntelliJ) are licensed per seat. However, if your company or team already uses JetBrains' tools, you might have access to these popular tools.


What I was more interested in were the mentions that happened just after the more popular tools I expected to see. Often, it's this next set of

[...]
Posted by Laurenz Albe in Cybertec on 2022-07-26 at 09:00

We all know and value SQL functions as a handy shortcut. PostgreSQL v14 has introduced a new, better way to write SQL functions. This article will show the advantages of the new syntax.

An example of an SQL function

Let’s create a simple example of an SQL function with the “classical” syntax so that we have some material for demonstrations:

CREATE EXTENSION unaccent;

CREATE FUNCTION mangle(t text) RETURNS text
   LANGUAGE sql
   AS 'SELECT lower(unaccent(t))';

You can use the new function like other database functions:

SELECT mangle('Schön dumm');

   mangle   
════════════
 schon dumm
(1 row)

Why SQL functions?

You may ask what good an SQL function is. After all, the main purpose of a database function is to be able to run procedural code inside the database, something you cannot do with SQL. But SQL functions have their use:

  • code reuse for expressions frequently used in different SQL statements
  • to make SQL statements more readable by factoring out part of the code into a function with a meaningful name
  • whenever you need a function for syntactical reasons, like in CREATE AGGREGATE or CREATE OPERATOR

Moreover, simple SQL functions can be inlined, that is, the optimizer can replace the function call with the function definition at query planning time. This can make SQL functions singularly efficient:

  • it removes the overhead of an actual function call
  • since functions are (mostly) black boxes to the optimizer, replacing the function with its definition usually gives you better estimates

We can see function inlining if we use EXPLAIN (VERBOSE) on our example function:

EXPLAIN (VERBOSE, COSTS OFF) SELECT mangle('Schön dumm');

                  QUERY PLAN                   
═══════════════════════════════════════════════
 Result
   Output: lower(unaccent('Schön dumm'::text))
(2 rows)

Shortcomings of PostgreSQL functions

PostgreSQL functions are great. One of the nice aspects is that you are not restricted to a single programming language. Out o

[...]
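(The excerpt ends before the article reaches the new syntax itself. For orientation, here is a minimal, hedged sketch of the example function rewritten in the PostgreSQL v14 SQL-standard form; the name mangle_new is only illustrative.)

-- The body is plain SQL rather than a string, so it is parsed and
-- dependency-checked when the function is created, not when it is first run.
CREATE FUNCTION mangle_new(t text) RETURNS text
   LANGUAGE sql
   RETURN lower(unaccent(t));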
Posted by Jonathan Katz on 2022-07-26 at 00:00

Around this time of year, I am reading through the upcoming PostgreSQL release notes (hello PostgreSQL 15), reading mailing lists, and talking to PostgreSQL users to understand which features are the most impactful. Many of these features will be highlighted in the feature matrix and release announcement (hello PostgreSQL 14!).

However, sometimes I miss an impactful feature. It could be that we need to see how the feature is actually used post-release, or it could be that it was missed. I believe one such change is the introduction of the SQL-standard BEGIN ATOMIC syntax in PostgreSQL 14 that is used to create function and stored procedure bodies.

Let’s see how we could create functions before PostgreSQL 14, the drawbacks to this method, and how, going forward, BEGIN ATOMIC makes it easier and safer to manage functions!

Before PostgreSQL 14: Creating Functions as Strings

PostgreSQL has supported the ability to create stored functions (or “user-defined functions”) since POSTGRES 4.2 in 1994 (thanks to Bruce Momjian (https://momjian.us/) for the answer on this). When declaring the function body, you would write the code as a string (which is why you see the $$ marks in functions). For example, here is a simple function to add two numbers:

CREATE FUNCTION add(int, int)
RETURNS int
LANGUAGE SQL
IMMUTABLE PARALLEL SAFE
AS $$
 SELECT $1 + $2;
$$;

If I want a function that calls another user-defined function, I can do so similarly to the above. For example, here is a function that uses the add function to add up a whole bunch of numbers:

CREATE FUNCTION test1_add_stuff(int, int, int, int)
RETURNS int
LANGUAGE SQL
IMMUTABLE PARALLEL SAFE
AS $$
 SELECT add(add($1, $2), add($3, $4));
$$;

After I create the two functions, if I try to run test1_add_stuff, it works:

SELECT test1_add_stuff(1,2,3,4);

 test1_add_stuff
-----------------
 10

What happens if I drop the add function?

DROP FUNCTION add(int, int);

It drops successfully:

DROP FUNCTION add(int, int);
D
[...]
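(The excerpt is cut off here. As a hedged sketch of the BEGIN ATOMIC syntax the post is about, here are the same two functions with SQL-standard bodies; test2_add_stuff is an illustrative name. Because the bodies are parsed at creation time, PostgreSQL records the dependency between the functions.)

CREATE FUNCTION add(int, int)
RETURNS int
LANGUAGE SQL
IMMUTABLE PARALLEL SAFE
BEGIN ATOMIC
 SELECT $1 + $2;
END;

CREATE FUNCTION test2_add_stuff(int, int, int, int)
RETURNS int
LANGUAGE SQL
IMMUTABLE PARALLEL SAFE
BEGIN ATOMIC
 SELECT add(add($1, $2), add($3, $4));
END;

-- DROP FUNCTION add(int, int) is now rejected with a dependency error
-- unless CASCADE is used, so callers cannot be silently broken.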
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2022-07-25 at 15:00

Vanilla Postgres has native partitioning?

Yes! And it's really good!

We frequently get questions like: Can Postgres handle JSON? Can Postgres handle time series data? How scalable is Postgres? It turns out the answer is usually yes! Postgres, vanilla Postgres, can handle whatever your need is without having to go to a locked-in proprietary database. Unless you're really close to the Postgres internals and code releases, you might have missed that Postgres natively has partitioning. Our head of product, Craig, recently talked about the advantages of working with vanilla Postgres versus niche products. With that in mind, I wanted to take a step back and go through the basics of vanilla Postgres partitioning.

Crunchy customers on cloud and Kubernetes often ask about partitioning features and our general recommendations. For existing data sets, our architects will take a deep dive into their specific use case. Generally speaking though, partitioning is going to be beneficial where data needs to scale, either to keep up with performance demands or to manage the lifecycle of data. Do you need to partition a 200GB database? Probably not. If you're building something new that's likely to scale into the terabytes, or dealing with something in the multi-TB range, it might be time to take a peek at native partitioning and see if that can help.
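For readers who haven't looked at it yet, here is a minimal, hedged sketch of what native (declarative) partitioning looks like in vanilla Postgres; the table and column names are purely illustrative:

-- The parent table is partitioned by a range of timestamps.
CREATE TABLE events (
    event_id   bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- One partition per month; rows are routed to the right partition on INSERT.
CREATE TABLE events_2022_07 PARTITION OF events
    FOR VALUES FROM ('2022-07-01') TO ('2022-08-01');
CREATE TABLE events_2022_08 PARTITION OF events
    FOR VALUES FROM ('2022-08-01') TO ('2022-09-01');

-- Retiring old data becomes a cheap metadata operation instead of a bulk DELETE.
ALTER TABLE events DETACH PARTITION events_2022_07;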

Partitioning use cases

  • Data lifecycle & cost management

The main benefit of working with partitions is helping with lifecycle management of data. Big data gets really expensive so archiving off data you no longer need can be really important to managing c[...]

Posted by Andreas 'ads' Scherbaum on 2022-07-25 at 14:00
PostgreSQL Person of the Week Interview with Elizabeth Garrett Christensen: I’m not your average Postgres person of the week - I’m a non-developer as well as a Postgres fan and marketing/sales/customer facing person. I am part of a Postgres co-working couple and my husband David Christensen and I both work for Crunchy Data from our home in Lawrence, Kansas.
Posted by Michael Christofides on 2022-07-25 at 13:48

When doing query optimization work, it is natural to focus on timing data. If we want to speed up a query, we need to understand which parts are slow.

But timings have a few weaknesses:

  • They vary from run to run

  • They are dependent on the cache

  • Timings alone can hide efficiency issues — e.g. through parallelism

Don’t get me wrong, I still think timings are very important for query tuning, but in this article we’re going to explore how you can complement them with the amount of data read and/or written by a query, by using BUFFERS.

When people share stories of 1000x query speed-ups, they are usually a result of reading far less data overall (usually as a result of adding an index). Much like the world of design, less is more.

To see the amount of data read/written by a query you can use the buffers parameter, for example by prefixing your query with:

explain (analyze, buffers)
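For example (a hedged illustration; the table name is a placeholder and the numbers will be whatever your query actually touches):

explain (analyze, buffers)
select count(*) from my_table where created_at > now() - interval '1 day';

-- Each node of the plan (and the plan as a whole) then gains lines such as
--   Buffers: shared hit=... read=...
-- where "shared hit" counts blocks found in Postgres' buffer cache and "read"
-- counts blocks that had to be fetched from outside it (OS cache or disk).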

There is a growing movement to turn buffers on by default, which I’d personally love to see. If anyone reading is able to review the patch to do so, I’d be very grateful!

One downside of including this extra data is that many folks already find Postgres explain output difficult to interpret. As such, I thought it’d be helpful to recap what exactly the buffers numbers mean, and describe some simple ways you can use them.

What are the buffer statistics again?

We’ve written about what the buffers statistics mean before, but the short version is that each of them consists of two parts, a prefix and a suffix.

There are three prefixes:

  • Shared blocks contain data from normal tables and indexes.

  • Temp blocks contain short-term data used to calculate hashes, sorts, materialize operations, and similar.

  • Local blocks contain data from temporary tables and indexes (yes, this is quite confusing, given that there is also a prefix “Temp”).

And there are four suffixes:

[...]
Posted by Regina Obe in PostGIS on 2022-07-23 at 00:00

The PostGIS Team is pleased to release PostGIS 3.2.2! This release works for PostgreSQL 9.6-15.

3.2.2

This release is a bug fix release, addressing issues found in the previous 3.2 releases.

1. Overview

Similar to PostgreSQL, the Lustre file system is an open source project, which started about 20 years ago. According to Wikipedia, Lustre is a type of parallel distributed file system, designed for large-scale cluster computing with native Remote Direct Memory Access (RDMA) support. Lustre file systems are scalable and can be part of multiple computer clusters with tens of thousands of client nodes, tens of petabytes (PB) of storage on hundreds of servers, and more than a terabyte per second (TB/s) of aggregate I/O throughput. This blog will explain how to set up a simple Lustre file system on CentOS 7 and run PostgreSQL on it.

2. Lustre file system

To deliver parallel file access and improve I/O performance, the Lustre file system separates metadata services from data services. From a high-level architecture point of view, a Lustre file system contains the following basic components:

  • Management Server (MGS), provides configuration information about how the file system is configured, notifies clients about changes in the file system configuration and plays a role in the Lustre recovery process.
  • Metadata Server (MDS), manages the file system namespace and provides metadata services to clients such as filename lookup, directory information, file layouts, and access permissions.
  • Metadata Target (MDT), stores metadata information, and holds the root information of the file system.
  • Object Storage Server (OSS), stores file data objects and makes the file contents available to Lustre clients.
  • Object Storage Target (OST), stores the contents of user files.
  • Lustre Client, mounts the Lustre file system and makes the contents of the namespace visible to the users.
  • Lustre Networking (LNet) – a network protocol used for communication between Lustre clients and servers with native RDMA supported.

If you want to know more details inside Lustre, you can refer to Understanding Lustre Internals.

3. Setup Lustre on CentOS 7

To setup a simple Lustre fil

[...]

Join us virtually on August 10, 2022 as we celebrate World Elephant Day with “Elephants at the Watering Hole” – a PostgreSQL open discussion!

Instead of a single speaker, we’ll open the virtual floor and give everyone a chance to speak.

This is a joint event with our friends at SFPUG!

Share your favorite tips and tricks, your “you won’t believe this!” stories, and your enthusiasm for all things Postgres.

Schedule:
12:00 PM Announcements, discussion.
1:15 PM Closing announcements, wrap up.
1:30 PM Event ends.

Link and password for the GoToMeeting room will be sent via MeetUp on the evening before the event.

Posted by Federico Campoli on 2022-07-22 at 10:30

This week the PostgreSQL pet is Shenva the ginger cat adopted by Lætitia Avrot.

Shenva is a ginger stray cat with an interesting story.

We released Citus 11 in recent weeks, and it is packed. Citus went full open source, so now previously enterprise features—like the non-blocking aspect of the shard rebalancer and multi-user support—are all open source for everyone to enjoy. One other huge change in Citus 11 is that you can now query your distributed Postgres tables from any Citus node, by default.

When using Citus to distribute Postgres before Citus 11, the coordinator node was your application’s only point of contact. Your application needed to connect to the coordinator to query your distributed Postgres tables. The coordinator node can handle high query throughput, about 100K queries per second, but your application might need even more processing power. Thanks to our work in Citus 11, you can now query from any node in the Citus database cluster. In Citus 11 we sync the metadata to all nodes by default, so you can connect to any node and run queries on your tables.

Running queries from any node is awesome, but you also need to be able to monitor and manage your queries from any node. Before, when you only connected to the coordinator, using Postgres’ monitoring tools was enough, but this is not the case anymore. So in Citus 11 we added some ways to observe your queries, similar to what you would do in a single Postgres instance.

In this blog post you’ll learn about some new monitoring tools introduced in Citus 11 that’ll help you track and take control of your distributed queries, including:

New identifier for Citus processes: Global PID

You can now use the Citus global process id, global PID for short, which is new to Citus 11. The Global PID is just like Postgres’ process id, but this new value is unique across a Citus cluster. We call the new value global process identifier

[...]
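(The excerpt is truncated here. As a hedged sketch of the kind of cluster-wide monitoring this enables, Citus 11 ships a citus_stat_activity view that exposes the global PID alongside the usual pg_stat_activity columns; the exact column set may vary between releases.)

-- Run from any node: one row per backend across the whole Citus cluster,
-- keyed by the new global PID.
SELECT global_pid, state, query
FROM citus_stat_activity
WHERE state <> 'idle';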
Posted by Regina Obe in PostGIS on 2022-07-20 at 12:50

The PostGIS Team is pleased to release PostGIS 3.1.6! This release works for PostgreSQL 9.6-14.

3.1.6

This release is a bug fix release, addressing issues found in the previous 3.1 releases.

Posted by Regina Obe in PostGIS on 2022-07-20 at 00:00

The PostGIS Team is pleased to release PostGIS 3.0.6. This release works for PostgreSQL 9.5-13.

3.0.6

Posted by Michael Aboagye on 2022-07-19 at 22:25

What is faster: PostgreSQL or AlloyDB? Some benchmarks and a performance analysis. Continue reading AlloyDB versus PostgreSQL: a performance review

The post AlloyDB versus PostgreSQL: a performance review appeared first on Vettabase.