PostgreSQL
The world's most advanced open source database

Citus 10.1 is out! In this latest release of the Citus extension to Postgres, our team focused on improving your user experience. Some of the 10.1 fixes are operational improvements—such as with the shard rebalancer, or with citus_update_node. Some are performance improvements—such as for multi-row INSERTs or with citus_shards. And some are fixes you’ll appreciate if you use Citus with lots of Postgres partitions.

Given that the previous Citus 10 release included a bevy of new features—including things like columnar storage, Citus on a single node, open sourcing the shard rebalancer, new UDFs so you can alter your distributed table properties, and the ability to combine Postgres and Citus tables via support for JOINs between local and distributed tables, and foreign keys between local and reference tables—well, we felt that Citus 10.1 needed to prioritize some of our backlog items, the kind of things that can make your life easier.

This post is your guide to what’s new in Citus 10.1. And if you want to catch up on all the new things in past releases of Citus, check out the release notes posts about Citus 10, Citus 9.5, Citus 9.4, Citus 9.3, and Citus 9.2.

So what is new in Citus 10.1?

In this post let’s explore the Citus 10.1 fixes to:
  • the shard rebalancer
  • citus_update_node
  • multi-row INSERT performance
  • citus_shards performance
  • Citus with lots of Postgres partitions

Or, if you want the itemized list of all the improvements in 10.1, then take a peek at the Changelog for Citus 10.1.0 over in our GitHub repo.

Image: Citus 10.1 release graphic, with Eli the elephant in the pilot seat of an Osprey drone (decal and all).
[...]
Improve PostgreSQL Query Performance with pg_stat_monitor

Understanding query performance patterns is essentially the foundation for query performance tuning. It, in many ways, dictates how a database cluster evolves. And then there are direct and indirect cost implications as well.

PostgreSQL provides very detailed statistics through a number of catalog views and extensions that can be easily added to provide more detailed query statistics. With each view focused on a particular aspect, the picture almost always needs to be stitched together by combining different datasets. That requires effort and still, the whole picture might not be complete.

The pg_stat_monitor extension attempts to provide a more holistic picture by providing much-needed query performance insights in a single view. The extension has been evolving over the past year and is now nearing the GA release.

Some Useful Extensions

Currently, you may be relying on a number of extensions to understand how a query behaves, the time taken in the planning and execution phases, min/max/mean time values, index hits, the query plan, and client application details. Here are some extensions that you might already be very familiar with.

pg_stat_activity

This view is available by default with PostgreSQL. It provides one row per server process along with current activity and query text.

In case you’d like to learn more about it, hop over to the official PostgreSQL documentation.
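For a quick look, here is a sketch of the kind of query you might run against it (the column names are as documented; the filter is just an example):

SELECT pid, state, query_start, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;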

pg_stat_statements

This extension is part of the contrib packages provided with the PostgreSQL server. However, you’d have to create the extension manually. It provides per-query aggregated statistics, with min/max/mean/standard deviation for execution and planning times, along with other useful information and the query text.

You can read more about pg_stat_statements at the official PostgreSQL documentation site.
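For example, here is a sketch of enabling it and pulling out the most expensive statements (column names follow PostgreSQL 13+, where total_time was split into planning and execution time; the extension also has to be listed in shared_preload_libraries):

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;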

auto_explain

Another useful extension provided with the PostgreSQL server. It dumps query plans in the server log for any query exceeding a time threshold specified by a GUC (auto_explain.log_min_duration).
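As a rough sketch, enabling it for a single session might look like this (the threshold value is just an example; for all sessions it would instead be loaded via shared_preload_libraries):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';  -- log plans of statements slower than this
SET auto_explain.log_analyze = on;            -- include actual run times in the logged plan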

[...]

Hardening PostgreSQL has become ever more important. Security is king these days and people want to know how to make PostgreSQL safe. Some of us might still remember what happened to MongoDB in recent years and we certainly want to avoid similar security problems in the PostgreSQL world. What happened to MongoDB is actually stunning: Thousands of databases were held ransom because of poor default security settings – it was an absolute nightmare and greatly damaged the reputation of not just MongoDB but the entire industry. PostgreSQL people do everything they can to avoid a repeat performance of that in our ecosystem.

The hashtag #ransomware is not what you want to see when somebody is talking about your company. To avoid some of the most common problems, we have compiled a “best of PostgreSQL security problems” which can be used as a guideline of 12 steps to improve your setup.

1. Avoid relaxed listen_addresses settings

PostgreSQL is running as a server process and people want to connect to the database. The question is: Where are those connections coming from? The listen_addresses setting which can be found in postgresql.conf controls those bind addresses.

In other words: If listen_addresses is set to ‘*’, PostgreSQL will listen on all network devices, consider those connections and move on to the next stage, which is evaluating the content of pg_hba.conf. Listening on all devices is a problem, because a bad actor could easily spam you with authentication requests – disaster is then just one pg_hba.conf entry away.

Recommendation:

  • In case only local connections are needed:
    • Set listen_addresses = ‘localhost’
  • In case remote connections are needed:
    • Set listen_addresses = ‘localhost, <IP of the network interface that should accept remote connections>’

If you don’t listen at all, you are definitely more secure. PostgreSQL will not even have to reject your connection if you are already limiting network access.
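To check and tighten the setting from SQL (note that listen_addresses only takes effect after a server restart, not a mere reload):

SHOW listen_addresses;
ALTER SYSTEM SET listen_addresses = 'localhost';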

2. Using “trust” in pg_hba.conf

After dealing with listen_ad

[...]
Posted by Egor Rogov in Postgres Professional on 2021-07-28 at 00:00

To remind you, we've already talked about relation-level locks, row-level locks, locks on other objects (including predicate locks) and interrelationships of different types of locks.

The following discussion of locks in RAM finishes this series of articles. We will consider spinlocks, lightweight locks and buffer pins, as well as events monitoring tools and sampling.

...

Posted by Kat Batuigas in Crunchy Data on 2021-07-27 at 14:58

Early on in my SQL journey, I thought that searching for a piece of text in the database mostly involved querying like this:

SELECT col FROM table WHERE col LIKE '%some_value%';

Then I would throw in some wildcard operators or regular expressions if I wanted to get more specific.

Later on, I worked with a client who wanted search functionality in an app, so “LIKE” and regex weren't going to cut it. What I had known all along was just pattern matching. It works perfectly fine for certain purposes, but what happens when it's not just a matter of checking for a straightforward pattern in a single text field? 
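For contrast with plain LIKE, here is a minimal sketch of PostgreSQL's built-in full-text search, one of the usual next steps (the table and column names are made up):

SELECT col
FROM my_table
WHERE to_tsvector('english', col) @@ to_tsquery('english', 'some & value');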

Posted by Michael Christofides on 2021-07-27 at 10:37

Over the past 5 years, new Postgres versions have been getting better and better at running operations in parallel.

This is especially useful for analytical queries, which can sometimes speed up massively by fully utilising a large server (with lots of cores).

However, for fast, transactional queries, the overhead of starting and managing processes is almost always undesirable. As such, the default settings in Postgres are on the conservative side.

If you think your workload could benefit from more parallelism, then this post is for you.

The first limiting factor

The first setting you’re likely to be limited by is the max_parallel_workers_per_gather parameter, which is only two by default.

This means that each Gather (or Gather Merge) operation can use at most two worker processes. If you haven’t changed this, you will likely have seen “Workers Planned: 2” and “Workers Launched: 2” in your EXPLAIN plans. We look out for these fields in pgMustard, as a sign the query might benefit from increased parallelisation.

It’s worth noting that the leader process is not included in this limit or these numbers, so by default you get a maximum of three processes.

Testing and making changes

You can see what your current setting is with:

SHOW max_parallel_workers_per_gather;

You can also change it just for your current session by running a query, eg:

SET max_parallel_workers_per_gather = 4;

You can then use EXPLAIN to see whether Postgres thinks using the extra workers is a good idea for a given query. You can then use EXPLAIN ANALYZE to see whether (or by how much) it is actually faster.
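For instance, a sketch of checking a query's parallel behaviour (the table name is made up; look for the Workers Planned and Workers Launched fields in the output):

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM big_table;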

Some settings that you should definitely consider at the same time are:

  • work_mem – as each worker can utilise it separately

  • max_parallel_workers – a system-wide limit (default 8)

  • max_worker_processes – a higher limit including background jobs (default 8)

Additionally, if parallelism is kicking in too soon, or too late,

[...]
Posted by Bo Peng in SRA OSS, Inc. on 2021-07-27 at 02:00

Pgpool-II Exporter is an open source exporter which exposes Pgpool-II and PostgreSQL cluster metrics so that they can be collected by Prometheus. In this previous post, I described how to build and run Pgpool-II Exporter on a local machine. The Pgpool Global Development Group provides Docker images for Pgpool-II Exporter to simplify and speed up the deployment process. In this post, I'll show how to get Pgpool-II Exporter running using Docker.

To deploy and verify Pgpool-II Exporter, you need:

  • Two PostgreSQL containers configured with streaming replication (1 primary and 1 replica)
  • A running Pgpool-II container. The Pgpool-II container will direct write queries to the primary and read queries to the primary or replica.

Let's get started.

Running a PostgreSQL streaming replication cluster

In this post, we use Crunchy Data Container to deploy a PostgreSQL streaming replication cluster manually. In a production environment, PostgreSQL Operator for Kubernetes is the recommended way to deploy and manage PostgreSQL containers.

The following commands and configurations are for testing only. For more advanced usage, please refer to Crunchy Data Container.

Create a network

Create a network which allows your application containers to communicate with each other by using their container name as a hostname.

docker network create --driver bridge pg-network

Start the primary container

Start the PostgreSQL primary container with the following command:

docker run -d --rm \
--name=primary \
--network pg-network \
-e MODE=postgres \
-e PG_MODE=primary \
-e PG_PRIMARY_PORT=5432 \
-e PG_PRIMARY_USER=repluser \
-e PG_PRIMARY_PASSWORD=password \
-e PG_USER=testuser \
-e PG_PASSWORD=password \
-e PG_ROOT_PASSWORD=password \
-e PG_DATABASE=userdb \
crunchydata/crunchy-postgres:centos8-13.2-4.6.2
  • MODE: specify the running mode of PostgreSQL. (Running modes)
  • PG_MODE: specify the role (primary or replica)
  • PG_PRIMARY_PORT: the primar
[...]
Posted by Luca Ferrari on 2021-07-27 at 00:00

pgbackrest can work in an asynchronous way in order to improve resource usage.

pgbackrest async behavior

pgbackrest is an amazing backup tool, it is rock-solid (as PostgreSQL is) and designed to work under heavy database load.
One feature it has to improve the efficiency of WAL archiving is the async mode.

In “standard” mode, pgbackrest will push WAL segments to the backup machine, using the classical archive_command provided by PostgreSQL. As you probably already know, PostgreSQL will wait for archive_command to complete and acknowledge the WAL transfer. It could happen that:

  • the archive_command could take a very long time, and while PostgreSQL will continue to work, not-yet-transferred WALs will make pg_wal grow;
  • the archive_command could fail, and PostgreSQL will warn you (in the logs) about this event and will try again to archive the failed WALs (forever, or better, until it succeeds).


On the other hand, when doing a restore, PostgreSQL executes the restore_command to get a new WAL segment, and this in turn results in running pgbackrest for a single WAL request.
The key concept here is probably single WAL request, both for push and get.

pgbackrest allows for an improvement on this situation by means of asynchronous archive management, both push and get. The idea is to give more control to pgbackrest so that it can optimize I/O operations.
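To give a flavour of how this is wired up (the stanza name below is made up), PostgreSQL itself only ever runs pgbackrest through archive_command; the asynchronous behaviour is then enabled on the pgbackrest side, with archive-async=y and a spool-path in pgbackrest.conf:

-- PostgreSQL hands each ready WAL segment to pgbackrest
ALTER SYSTEM SET archive_command = 'pgbackrest --stanza=main archive-push %p';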
When PostgreSQL archives a WAL segment, it executes the archive_command within a loop (allow me to simplify things): when a WAL is ready, archive_command is invoked and until it has finished, there is no chance to archive an already available WAL segment. On the other hand, when PostgreSQL needs to get a WAL in order to do a restore/recovery, it executes restore_command on every WAL segment it is expecting to replay. Therefore, if the server has to replay many WALs, it has to execute restore_command and “download” every WAL one after the other.
How does the asynchronous mode improve on the above?
When archiving, that means

[...]
Posted by Pavel Stehule on 2021-07-26 at 20:39

https://github.com/okbob/pspg/releases/tag/5.2.0

There are only two, but I hope important, features.

First feature is "progressive data load". Before this release, pspg loaded all rows before the first print to screen. Now, with progressive data load, only 500 rows are loaded, these rows are printed to screen, and then the next rows are loaded repeatedly in chunks of 2000. Although the load may not be complete, almost all pspg commands can be used.

pspg is designed for browsing tabular data, but it can be used for plain text too. This is important - psql can produce a lot of data in plain text format (\? \h). Now, pspg can highlight some parts of these documents.

Earlier this year, Christos Christoudias – an Engineer from Instacart – published an article on Medium about Creating a Logical Replica from a Snapshot in RDS Postgres. Recently I’ve seen discussions about this article a few times, in a few different places. Is it safe? Is it ok to do this?

A little under a year before publishing this article (just before the pandemic completely shut everything down), Christos presented at the San Francisco Bay Area PostgreSQL Users Group. The YouTube recording is well worth watching. Christos started off his talk by saying “this is a novel not a textbook” which I think is great framing for technical discussions like this. I decided to post a few thoughts here on my own blog as another chapter in that novel.

I’ll start with my colleague’s answer to the question: “Is it ok to do this?” … he said, “Whenever I’ve talked to the author, they said to ‘be careful with that blog’”

I wasn’t in that conversation between my colleague and Christos (I don’t know if Christos really said that) but it perfectly reflects my own sentiment. Be careful with this blog. Let’s dig a little deeper.

First, some background:

Logical replication always consists of two distinct things.

  1. initial sync/copy
  2. start the stream of changes

On the surface, this sounds simple enough. But when you look closer, it’s more complicated than you think – and you can lose data if you’re not careful.

The safest way to do logical replication is to simply use a single integrated solution which handles both parts, where some other engineers have already worked through the data loss corner cases. The best example of this would be something like the CREATE PUBLICATION and CREATE SUBSCRIPTION SQL commands in PostgreSQL, which can correctly handle both the initial copy and starting the change stream.
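As a sketch of that integrated approach (the names and connection string below are made up), the initial copy and the change stream are handled together:

-- on the publisher
CREATE PUBLICATION my_pub FOR TABLE orders;

-- on the subscriber: copy_data = true (the default) performs the initial sync,
-- after which the subscription keeps applying the change stream
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher dbname=shop'
    PUBLICATION my_pub
    WITH (copy_data = true);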

That said, it’s entirely possible to separate them – and many people do this for exactly the same reason as Instacart: to do the copy faster.

James Coleman from Braintree left a comment on the

[...]
Posted by Andreas 'ads' Scherbaum on 2021-07-26 at 14:00
PostgreSQL Person of the Week Interview with Roman Druzyagin: Name’s Roman. I’ve been born and lived most of my life in Russia. I grew up near Moscow, and since 2002 I’ve been residing in St. Petersburg, with plans to relocate to the European Union in the near future. I am currently 33 years old.
Posted by David Z in Highgo Software on 2021-07-23 at 21:59

1. Overview

I was working on the PostgreSQL storage related features recently, and I found PostgreSQL has designed an amazing storage addressing mechanism, i.e. Buffer Tag. In this blog, I want to share with you my understanding about the Buffer Tag and some potential usage of it.

2. Buffer Tag

I was always curious about how PostgreSQL can find the tuple data blocks so quickly, ever since I first started to use PostgreSQL in an IoT project, but I never got the chance to look into the details even though I knew PostgreSQL is a very well organized open-source project, until I recently got a task that required solving a storage related issue in PostgreSQL. There is a very detailed explanation of how the Buffer Manager works in an online PostgreSQL book for developers, The Internals of PostgreSQL, one of the best PostgreSQL books I would recommend to beginners of PostgreSQL development.

Buffer Tag, in simple words, is just five numbers. Why five numbers? First, all objects, including the storage files, are managed by Object Identifiers, i.e. OIDs. For example, when a user creates a table, the table name is mapped to an OID; when a user creates a database, the name is mapped to an OID; and when the corresponding data needs to be persisted on disk, the file is also named using an OID. Secondly, when a table requires more pages to store more tuples, each page for the same table is managed by a page number in sequence. For example, when PostgreSQL needs to estimate the table size before deciding what kind of scan should be used to find the tuples faster, it needs to know the number of blocks. Thirdly, data tuples are the main user data that needs to be stored, but in order to better manage these data tuples, PostgreSQL needs other information, such as visibility to manage the status of these data tuples, and free space to optimize file usage. So this ends up with five numbers, i.e. Tablespace, Database, Table, ForkNumber, BlockNumber.
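For illustration, the pg_buffercache contrib extension exposes exactly these five numbers for every buffer in shared memory; a minimal sketch, assuming the extension is installed and a table named t exists:

CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- the five numbers of the buffer tag for every cached page of table t
SELECT reltablespace, reldatabase, relfilenode, relforknumber, relblocknumber
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('t');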

Give

[...]

One theme of the 3.2 release is new analytical functionality in the raster module, and access to cloud-based rasters via the "out-db" option for rasters. Let's explore two new functions and exercise cloud raster support at the same time.

Having explored one fork in the path (Elasticsearch and Kibana) in the previous pipeline blog series (here is part 5), in this blog we backtrack to the junction to explore the alternative path (PostgreSQL and Apache Superset). But we’ve lost our map (aka the JSON Schema)—so let’s hope we don’t get lost or attacked by mutant radioactive monsters.

Fork in an abandoned uranium mine
(Source: Shutterstock)

Just to recap how we got here, here’s the Kafka source connector -> Kafka -> Kafka sink connector -> Elasticsearch -> Kibana technology pipeline we built in the previous blog series:

And here’s the blueprint for the new architecture, with PostgreSQL replacing Elasticsearch as the target sink data store, and Apache Superset replacing Kibana for analysis and visualization:

Well that’s the plan, but you never know what surprises may be lurking in the unmapped regions of a disused uranium mine!

Step 1: PostgreSQL database

In the news recently was Instaclustr’s acquisition of Credativ, experts in the open source PostgreSQL database (and other technologies). As a result, Instaclustr has managed PostgreSQL on its roadmap, and I was lucky to get access to the internal preview (for staff only) a few weeks ago. This has all the features you would expect from our managed services, including easy configuration and provisioning (via the console and REST API), server connection information and examples, and built-in monitoring with key metrics available (via the console and REST API). 

Having a fully functional PostgreSQL database available in only a few minutes is great, but what can you do with it?

The first thing is to decide how to connect to it for testing. There are a few options for client applications including psql (a terminal-based front-end), and a GUI such as pgAdmin4 (which is what I used). Once you have pgAdmin4 running on your desktop you can easily create a new connection to the Instaclustr managed PostgreSQL server as d

[...]

My colleague Kat Batuigas recently wrote about using the powerful open-source QGIS desktop GIS to import data into PostGIS from an ArcGIS Feature Service. This is a great first step toward moving your geospatial stack onto the performant, open source platform provided by PostGIS. And there's no need to stop there! Crunchy Data has developed a suite of spatial web services that work natively with PostGIS to expose your data to the web, using industry-standard protocols. These include:

zheap has been designed as a new storage engine to handle UPDATE in PostgreSQL more efficiently. A lot has happened since my last report on this important topic, and I thought it would make sense to give readers a bit of a status update – to see how things are going, and what the current status is.

zheap: What has been done since last time

Let’s take a look at the most important things we’ve achieved since our last status report:

  • logical decoding
  • work on UNDO
  • patch reviews for UNDO
  • merging code
  • countless fixes and improvements

zheap: Logical decoding

The first thing on the list is definitely important. Most people might be familiar with PostgreSQL’s capability to do logical decoding. What that means is that the transaction log (= WAL) is transformed back to SQL so that it can be applied on some other machine, leading to identical results on the second server. The capability to do logical decoding is not just a given. Code has to be written which can decode zheap records and turn them into readable output. So far this implementation looks good. We are not aware of bugs in this area at the moment.

Work on UNDO

zheap is just one part of the equation when it comes to new storage engines. As you might know, a standard heap table in PostgreSQL will hold all necessary versions of a row inside the same physical files. In zheap this is not the case. It is heavily based on a feature called “UNDO”, which works similarly to what Oracle and some other database engines do. The idea is to move old versions of a row out of the table and then, in case of a ROLLBACK, put them back in.

What has been achieved is that the zheap code is now compatible with the new UNDO infrastructure suggested by the community (which we hope to see in core by version 15). The general idea here is that UNDO should not only be focused on zheap, but provide a generic infrastructure other storage engines will be able to use in the future as well. That’s why

[...]
Posted by Luca Ferrari on 2021-07-20 at 00:00

How to see the available and/or installed extensions?

PostgreSQL Extension Catalogs

There are three main catalogs that can be useful when dealing with extensions:

  • pg_extension
  • pg_available_extensions
  • pg_available_extension_versions

The first one, pg_extension, provides information about which extensions are installed in the current database, while pg_available_extensions provides information about which extensions are available to the cluster.
The difference is simple: to be used, an extension must first appear in pg_available_extensions, which means it has been installed on the cluster (e.g., via pgxnclient). From that point on, the extension can be installed into the database by means of a CREATE EXTENSION statement; as a result the extension will appear in the pg_extension catalog.


As an example:



testdb=> select name, default_version from pg_available_extensions;
        name        | default_version 
--------------------+-----------------
 intagg             | 1.1
 plpgsql            | 1.0
 dict_int           | 1.0
 dict_xsyn          | 1.0
 adminpack          | 2.1
 intarray           | 1.3
 amcheck            | 1.2
 autoinc            | 1.0
 isn                | 1.2
 bloom              | 1.0
 fuzzystrmatch      | 1.1
 jsonb_plperl       | 1.0
 btree_gin          | 1.3
 jsonb_plperlu      | 1.0
 btree_gist         | 1.5
 hstore             | 1.7
 hstore_plperl      | 1.0
 hstore_plperlu     | 1.0
 citext             | 1.6
 lo                 | 1.1
 ltree              | 1.2
 cube               | 1.4
 insert_username    | 1.0
 moddatetime        | 1.0
 dblink             | 1.2
 earthdistance      | 1.1
 file_fdw           | 1.0
 pageinspect        | 1.8
 pg_buffercache     | 1.3
 pg_freespacemap    | 1.2
 pg_prewarm         | 1.2
 pg_stat_statements | 1.8
 pg_trgm            | 1.5
 pg_visibility      | 1.2
 pgcrypto           | 1.3
 pgrowlocks         | 1.2
 pgstattuple        | 1.5
 postgres_fdw       | 1.0
 refint          
[...]
Posted by Paul Ramsey in Crunchy Data on 2021-07-19 at 15:01

One of the less visible improvements coming in PostGIS 3.2 (via the GEOS 3.10 release) is a new algorithm for repairing invalid polygons and multipolygons.

Algorithms like polygon intersection, union and difference rely on guarantees that the structure of inputs follows certain rules. We call geometries that follow those rules "valid" and those that do not "invalid".
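As a minimal illustration (the self-intersecting “bowtie” polygon below is an assumed example, not taken from the post), repairing an invalid polygon can be as simple as:

SELECT ST_AsText(
  ST_MakeValid('POLYGON((0 0, 2 2, 2 0, 0 2, 0 0))'::geometry)
);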

Posted by Andreas 'ads' Scherbaum on 2021-07-19 at 14:00
PostgreSQL Person of the Week Interview with Rafia Sabih: I am basically from India, currently living in Berlin, Germany. I started my PostgreSQL journey in my masters and continued it by joining EDB, India. Then, I increased my spectrum to Postgres on Kubernetes working at Zalando.

A quickstart guide to create a web map with the Python-based web framework Django using its module GeoDjango, the PostgreSQL database with its spatial extension PostGIS and Leaflet, a JavaScript library for interactive maps.

Some time ago I wrote about new options for explains – one that prints settings that were modified from default. This looks like this:

Aggregate (cost=35.36..35.37 rows=1 width=8)
  -> Index Only Scan using pg_class_oid_index on pg_class (cost=0.27..34.29 rows=429 width=0)
Settings: enable_seqscan = 'off'

Finally, today, I pushed a change that displays them on explain.depesz.com. To … Continue reading "Display “settings” from plans on explain.depesz.com"

Logical Replication was introduced in PostgreSQL-10 and since then it has been improved with each version. Logical Replication is a method to replicate data selectively, unlike physical replication where the data of the entire cluster is copied. This can be used to build a multi-master or bi-directional replication solution. One of the main differences compared with physical replication is that it replicates a transaction only at commit time. This leads to apply lag for large transactions, where we need to wait to transfer the data till the transaction is finished. In the upcoming PostgreSQL-14 release, we are introducing a mechanism to stream large in-progress transactions. We have seen replication performance improve by 2 or more times due to this for large transactions, especially due to early filtering. See the performance test results reported on hackers and in another blog on the same topic. This will reduce the apply lag to a good degree.

The first thing we needed for this feature was to decide when to start streaming the WAL content. One could ask: if we have such a technology, why not stream each change of a transaction separately as and when we retrieve it from the WAL? But that would actually lead to sending much more data across the network, because we need to send some additional transaction information with each change so that the apply side can recognize the transaction to which the change belongs. To address this, in PostgreSQL-13, we introduced a new GUC parameter logical_decoding_work_mem which allows users to specify the maximum amount of memory to be used by logical decoding, before some of the decoded changes are either written to local disk or streamed to the subscriber. The parameter is also used to control the memory used by logical decoding as explained in the blog.
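For illustration, here is a sketch of how these knobs fit together (parameter and option names as documented for PostgreSQL 13/14; the subscription name and connection string are made up):

-- cap the memory used by logical decoding before changes are spilled or streamed
ALTER SYSTEM SET logical_decoding_work_mem = '64MB';
SELECT pg_reload_conf();

-- on the subscriber, opt in to streaming of large in-progress transactions (PostgreSQL 14)
CREATE SUBSCRIPTION big_tx_sub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION big_tx_pub
    WITH (streaming = on);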

The next thing that prevents incremental decoding was the delay in finding the association of subtransaction and top-level XID. During logical decoding, we accumulate all changes along with its (s

[...]
Posted by Dinesh Chemuduru in MigOps on 2021-07-16 at 22:38

In December 2020, you might have seen an article from CentOS about shifting their focus towards CentOS Stream, which is the upstream version of RHEL. CentOS also mentioned that version 8 would be EOL (end of life) by the end of 2021. This means that it will no longer receive any updated fixes from its upstream version of RHEL. A few days after this announcement, Rocky Linux was announced by the CentOS founder, Gregory Kurtzer, as 100% bug-for-bug compatible with RHEL. The Rocky Linux project quickly gained a lot of attention, and also got sponsors from cloud vendors like AWS, Google Cloud and Microsoft. We wanted to take this opportunity and write an article about a CentOS vs Rocky Linux benchmark with PostgreSQL.

CentOS

As of now, CentOS is widely used in production, because it was a downstream version of RHEL. This means that CentOS was receiving all the RHEL critical bug fixes for free, which made CentOS as robust and reliable as RHEL. The future of the CentOS project is its Stream version, which is the upstream of RHEL. This means that CentOS Stream may not receive any such critical bug fixes from RHEL; instead, CentOS Stream bug fixes will be pushed down to the RHEL project.

CentOS vs Rocky Linux Benchmark

After seeing the Rocky Linux project announcement, we wanted to run some benchmarks regarding PostgreSQL and see if we get the same performance on Rocky Linux 8 as on CentOS 8. To run this benchmark, we have chosen the phoronix tool, which offers a big list of benchmarking test suites. Using this phoronix tool, we will be running a few sets of general benchmarks in addition to PostgreSQL's pgbench.

Phoronix test suites

Phoronix is a system benchmarking tool which offers a big list of test suites. This tool provides test suites for CPU, MEMORY, DISK, COMPILE, etc. for most operating systems. We will be using this tool to perform the benchmarking on both Rocky Linux and CentOS. For the purpose of this article, we have considered running the following

[...]
Today I released pspg 5.1.0. Mostly this is a bugfix and refactoring release, but there is one, I hope, interesting function. You can try to press Ctrl o for a temporary switch to the terminal's primary screen. In the primary screen you can see the psql session. After pressing any key, the terminal switches back to the alternative screen with pspg.

Thanks to Thomas Munro's work, the psql \watch command will support pagers (in PostgreSQL 15). At this time only pspg can do this work (in streaming mode). When you set the environment variable PSQL_WATCH_PAGER, the \watch command redirects output to the specified pager (for pspg, export PSQL_WATCH_PAGER="pspg --stream"). Then you can run commands such as:


select * from pg_stat_database \watch 5
or

select * from pg_stat_activity where state='active' \watch 1

So, some time ago, Pg devs added multiranges – that is, a datatype that can be used to store multiple ranges in a single column. The thing is that it wasn't really simple to get the list of ranges from within such a multirange. There was no operator, no way to split it. A month ago Alexander … Continue reading "How to get list of elements from multiranges?"
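For context, the feature the post appears to describe is the unnest() support for multirange types added during PostgreSQL 14 development; a minimal sketch (the literal is just an illustrative int4multirange):

SELECT unnest('{[1,4), [10,20)}'::int4multirange);
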
Posted by Aya Iwata in Fujitsu on 2021-07-15 at 01:04

The usability of the libpq feature to trace an application's server/client communications has been enhanced in PostgreSQL 14, with an improved format and an option to control output.

Posted by Egor Rogov in Postgres Professional on 2021-07-15 at 00:00

We've already discussed some object-level locks (specifically, relation-level locks), as well as row-level locks with their connection to object-level locks and also explored wait queues, which are not always fair.

We have a hodgepodge this time. We'll start with deadlocks (actually, I planned to discuss them last time, but that article was excessively long in itself), then briefly review the remaining object-level locks and finally discuss predicate locks.

Deadlocks

When using locks, we can confront a deadlock. It occurs when one transaction tries to acquire a resource that is already in use by another transaction, while the second transaction tries to acquire a resource that is in use by the first. The figure on the left below illustrates this: solid-line arrows indicate acquired resources, while dashed-line arrows show attempts to acquire a resource that is already in use.
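A classic way to provoke one, as a sketch (the table and ids are made up), is two sessions updating the same two rows in opposite order:

-- session 1
BEGIN;
UPDATE accounts SET amount = amount - 100 WHERE id = 1;

-- session 2
BEGIN;
UPDATE accounts SET amount = amount - 100 WHERE id = 2;
UPDATE accounts SET amount = amount - 100 WHERE id = 1;  -- blocks, waiting for session 1

-- session 1 again
UPDATE accounts SET amount = amount - 100 WHERE id = 2;  -- deadlock detected, one transaction is aborted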

To visualize a deadlock, it is convenient to build the wait-for graph. To do this, we remove specific resources, leave only transactions and indicate which transaction waits for which other. If a graph contains a cycle (from a vertex, we can get to itself in a walk along arrows), this is a deadlock.

...

Posted by Luca Ferrari on 2021-07-15 at 00:00

I did some more experiments with WALs.

How much data goes into the WALs? (part 2)

In order to get a better idea about how WAL settings can change the situation within the WAL management, I decided to run a kind of automated test and store the results into a table, so that I can query them back later.
The idea is the same as in my previous article: produce some workload, measure the differences in the Log Sequence Numbers, and see how the size of WALs changes depending on some settings. This is not accurate research, it's just a quick and dirty experiment.
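In SQL terms the measurement boils down to something like this sketch (the functions are standard; the workload itself is whatever runs in between):

-- before the workload
SELECT pg_current_wal_lsn() AS start_lsn \gset

-- ... run the INSERT-only workload ...

-- after the workload: bytes of WAL generated in between
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn');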


At the end, I decided to share my numbers so that you can have a look at them and elaborate a bit more. For example, I’m no good at all at doing graphs (I know only the very minimum about gnuplot!).

!!! WARNING !!!

WARNING: this is not a guide on how to tune WAL settings! This is not even a real and comprehensive set of experiments, it is just what I've played with to see how much traffic can be generated for certain amounts of workload.
Your case and situation could be, and probably is, different from the very simple tests I've done, and I do not pretend to be right about the small and obvious conclusions I come up with at the end. If you see or know something that can help make what I write in the following clearer, please comment or contact me!

Set up

First of all I decided to run an INSERT-only workload, so that the size of the resulting table does not include any bloat and is therefore comparable to the amount of WAL records generated.
No other database activity was ongoing, so that the only generated WAL traffic was about my own workload.
Each time the configuration was changed, the system was restarted, so that every workload started with the same (empty) clean situation and without any need to reason about ongoing checkpoints. Of course, checkpoints were happening as usual, but not at the beginning of the workload.

I used two tables to run the test:

  • wal_traffic stores the results of each ru
[...]

Too often, web tiers are full of boilerplate that does nothing except convert a result set into JSON. A middle tier could be as simple as a function call that returns JSON. All we need is an easy way to convert result sets into JSON in the database.

PostgreSQL has built-in JSON generators that can be used to create structured JSON output right in the database, upping performance and radically simplifying web tiers.

Fortunately, PostgreSQL has such functions, that run right next to the data, for better performance and lower bandwidth usage.
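As a small sketch of what that can look like (the users table and its columns are made up):

SELECT json_agg(row_to_json(u))
FROM (
    SELECT id, name, email
    FROM users
    WHERE active
) AS u;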

Most people in the SQL and PostgreSQL communities have used the LIMIT clause provided by many database engines. However, what many do not know is that LIMIT / OFFSET are not part of the SQL standard and are thus not portable. The proper way to handle LIMIT is basically to use SELECT … FETCH FIRST ROWS. However, there is more than meets the eye.

LIMIT vs. FETCH FIRST ROWS

Before we dig into some of the more advanced features we need to see how LIMIT and FETCH FIRST ROWS can be used. To demonstrate this feature, I have compiled a simple data set:

test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# INSERT INTO t_test 
VALUES  (1), (2), (3), (3), 
(4), (4), (5);
INSERT 0 7
test=# TABLE t_test;
 id
----
  1
  2
  3
  3
  4
  4
  5
(7 rows)

Our data set has 7 simple rows. Let’s see what happens if we use LIMIT:

test=# SELECT * FROM t_test LIMIT 3;
 id
----
  1
  2
  3
(3 rows)

In this case, the first three rows are returned. Note that we are talking about ANY rows here. Whatever can be found first is returned. There is no special order.

The ANSI SQL compatible way of doing things is as follows:

test=# SELECT * 
           FROM  t_test 
           FETCH FIRST 3 ROWS ONLY;
 id
----
  1
  2
  3
(3 rows)

Many of you may never have used or seen this kind of syntax before, but this is actually the “correct” way to handle LIMIT.

However, there is more: What happens if NULL is used inside your LIMIT clause? The result might surprise you:

test=# SELECT * FROM t_test LIMIT NULL;
 id
----
  1
  2
  3
  3
  4
  4
  5
(7 rows)

The database engine does not know when to stop returning rows. Remember, NULL is undefined, so it does not mean zero. Therefore, all rows are returned. You have to keep that in mind in order to avoid unpleasant surprises…

FETCH FIRST … ROWS WITH TIES

WITH TIES has been introduced in PostgreSQL 13 and fixes a common problem: handling duplicates. If you fetch the first couple of rows, PostgreSQL stops at a fixed number of rows. However, what happens if the same data

[...]