PostgreSQL
The world's most advanced open source database

cancel a statement because it is sexist
© Laurenz Albe 2022

Sometimes a PostgreSQL query takes forever. Usually, it is easy to interrupt (cancel) such a statement, but sometimes it can turn into a problem, and a query cannot be canceled. This article discusses what might be the cause. I’ll also show you a trick for solving the problem (not for the faint of heart!)

How to cancel a running query

The PostgreSQL protocol has provisions for interrupting a running statement. This is done by opening a new connection and sending a CancelRequest message with a secret key. That secret key was sent by the server during the start of the original connection. Without that key, anybody could cancel your query, which would be an unacceptable security problem.

The C library libpq provides the functions PQgetCancel() and PQcancel() to cancel queries, and other database APIs should have similar provisions. In an interactive psql session, you can simply hit Ctrl+C to send a cancel request, and GUI clients usually have a button for that purpose.

But it is also possible to cancel somebody else’s query by calling the database function pg_cancel_backend(). Another option is pg_terminate_backend(), which goes one step further and terminates someone else’s database session. To be able to use these functions, you must either be a superuser or a member of the default role pg_signal_backend (the next section will explain why), or you must have connected to the database as the same database user as the session you want to torpedo (you are allowed to cancel your own statements).
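For example, a superuser (or a member of pg_signal_backend) could locate and cancel a long-running statement like this (a minimal sketch; the five-minute filter and the PID are just placeholders):

    -- find backends whose current statement has been running for more than 5 minutes
    SELECT pid, usename, state, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND now() - query_start > interval '5 minutes';

    -- cancel the current statement of one of them (the session stays alive)
    SELECT pg_cancel_backend(12345);

    -- or terminate the whole session
    SELECT pg_terminate_backend(12345);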

How the server responds to a cancel request

Inter-process communication in PostgreSQL relies a lot on signals.

When the postmaster process receives a CancelRequest, it sends the signal SIGINT to the backend process of the corresponding database session. This is also what the function pg_cancel_backend() does. pg_terminate_backend() sends the signal SIGTERM.

Now each PostgreSQL process has a signal handler that processes these signals when they are received. This s

[...]
Posted by Andreas 'ads' Scherbaum on 2022-05-23 at 14:00
PostgreSQL Person of the Week Interview with Masahiko Sawada: I’m from Japan, and live in Kanagawa. I’m a PostgreSQL user and contributor, and work for EDB. I’m the father of my kid - a 1 year old boy.

This post is about an issue that, upon googling today, seems to be quite widespread, and there's a lot of confusion about it. I think I have built a fairly good understanding of it, so to hopefully take away some of the confusion, I have created this blogpost.

This post is about when you encounter the following dreaded line after applying an update on Mac OSX and logging into a Linux system with ssh:

warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

This is just a message, and it allows you to carry on. However, much that uses "locales" (language specifics) can stop functioning, or generate errors. This is the confusing bit: because it doesn't immediately stop you from your work, any actual crashes (such as perl crashing) might seem random.

I encountered this for the first time when I was doing patching at a client. Patching means things go down, and is therefore time-sensitive. So that was really frustrating. I did find the root cause and provided it to the vendor whose software I was patching, who then put it in its non-public issue database, and the support engineer who didn't do anything took credit for it.

So, what is happening here? This is happening because of ssh (secure shell) mechanisms that allow lots of flexibility in how a session is created on a remote server, and some settings which get reverted when patching OSX. Let's look at the message again:

warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

The LC_CTYPE variable makes Linux try to set the character-classification locale. Let's query the environment for LC_CTYPE:

$ env | grep LC_CTYPE
LC_CTYPE=UTF-8

So that is the reason. But how come this seems to be set all of a sudden?

The reason is the ssh client and an ssh setting called 'SendEnv', which allows you to send environment settings along when logging on to a remote ssh server.

So why does this pop up all of a sudden? The reason is that there is a global ssh client settings file on the Mac at /etc/ssh/ssh_config.

[...]
Posted by Regina Obe in PostGIS on 2022-05-21 at 00:00

The PostGIS Team is pleased to release PostGIS 3.3.0alpha1! This is the first release to support PostgreSQL 15. Best Served with PostgreSQL 15 Beta1. This release supports PostgreSQL 11-15.

3.3.0alpha1

This release is an alpha of a major release; it includes bug fixes since PostGIS 3.2.1 and new features.

Posted by Lætitia AVROT in EDB on 2022-05-21 at 00:00
Thanks to Vik Fearing (https://twitter.com/pg_xocolatl) and his talk about PostgreSQL and standard SQL, delivered so far at pgconf.de (https://2022.pgconf.de/) and pgconf.be (http://pgconf.be/index.html), I discovered that the reason we use a * in count(*) is pedantry. Let me rewind a little and tell the story… So, most of us know there is no more valid reason to use count(*) instead of count() than “some folks during the 80s thought it was ugly to use count()”. Most of us also know the * in count(*) does not have the same meaning as the * in select *, which is very confusing for a lot of people.
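To make the difference concrete (my own example, not from the talk): count(*) counts rows, while count(some_column) counts only the rows where that column is not NULL:

    SELECT count(*), count(price)
    FROM (VALUES (1, 10), (2, NULL), (3, 30)) AS items(id, price);
    -- count(*) = 3, count(price) = 2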
Posted by David Christensen in Crunchy Data on 2022-05-20 at 15:00

Introduction

The PostgreSQL optimizer is an amazing thing, getting only more amazing with each release. It is able to take information about your data definitions, your data distribution, constraints, and the specific queries and come up with the generally most efficient way to return the results of that query.

Since SQL is a declarative language, we're explicitly giving up defining how the database determines the results and trusting it to get the correct results in whatever method it deems most efficient. Sometimes we can structure queries in a certain way to get a better result than we get with the straightforward approach.

The issue

I was working with one of our clients hosted on Crunchy Bridge, looking at the top queries in pg_stat_statements by mean time spent. The top outlier was a query of the form:

SELECT "table_a".*, COALESCE("table_b"."count", 0) AS table_count FROM table_a LEFT JOIN table_b ON table_a.id = table_b.a_id WHERE NOT bool_1 AND NOT bool_2 AND NOT bool_3 ORDER BY table_count DESC LIMIT 100 

The average time on this query, as reported by pg_stat_statements, was in the 2-3 second range, which for this client was too long. The tables involved here were decently sized, with millions of rows per table.

The specific query involved here utilized a table which stored the rollup counts for an additional table; this query was used to support optimizing a COUNT(*) for each row in a corresponding table. For this specific table, the data distribution was a bit lopsided, so for the common case the underlying COUNT(*) would be fast; however, there were enough outliers with huge counts that the general-purpose plan would not work. For this reason, we use an unlogged table to store the results of the lookups. (Since this version of PostgreSQL does not support UNLOGG[...]
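A minimal sketch of the rollup-table approach described above (hypothetical table and column names, not the client's actual schema):

    -- unlogged: cheap to write, and acceptable to lose, since it can be recomputed
    CREATE UNLOGGED TABLE table_b (
        a_id  bigint PRIMARY KEY,
        count bigint NOT NULL
    );

    -- periodically refresh the pre-computed counts from the detail table
    INSERT INTO table_b (a_id, count)
    SELECT a_id, count(*) FROM detail_table GROUP BY a_id
    ON CONFLICT (a_id) DO UPDATE SET count = EXCLUDED.count;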

Posted by David Rowley in CitusData on 2022-05-19 at 17:23

In recent years, PostgreSQL has seen several improvements which make sorting faster. In the PostgreSQL 15 development cycle—which ended in April 2022—Ronan Dunklau, Thomas Munro, Heikki Linnakangas, and I contributed some changes to PostgreSQL to make sorts go even faster.

Each of the improvements to sort should be available when PostgreSQL 15 is out in late 2022.

Why care about sort performance? When you run your application on PostgreSQL, there are several scenarios where PostgreSQL needs to sort records (aka rows) on your behalf. The main one is for ORDER BY queries. Sorting can also be used in:

  • Aggregate functions with an ORDER BY clause
  • GROUP BY queries
  • Queries with a plan containing a Merge Join
  • UNION queries
  • DISTINCT queries
  • Queries with window functions with a PARTITION BY and/or ORDER BY clause

If PostgreSQL is able to sort records faster, then queries using sort will run more quickly.

Let’s explore each of the 4 improvements in PostgreSQL 15 that make sort performance go faster:

  • Change 1: Improvements sorting a single column
  • Change 2: Reduce memory consumption by using generation memory context
  • Change 3: Add specialized sort routines for common datatypes
  • Change 4: Replace polyphase merge algorithm with k-way merge

Change 1: Improvements sorting a single column

PostgreSQL 14’s query executor would always store the entire tuple during a Sort operation. The change here makes it so that PostgreSQL 15 only stores a Datum when there is a single column in the result of the sort. Storing just the Datum means that the tuple no longer has to be copied into the sort’s memory.

The optimization works for the following query:

SELECT col1 FROM tab ORDER BY col1;

but not:

SELECT col1, col2 FROM tab ORDER BY col1;

The first of the above queries is likely rare in the real world. A more common reason to have single column sorts is for Merge Semi and Anti Joins. These are likely to appear in queries co

[...]
Posted by Paul Ramsey in Crunchy Data on 2022-05-19 at 15:00

Instant Heatmap with pg_featureserv

The pg_featureserv micro-service is a thin middleware that binds tables and functions in a PostgreSQL database to a JSON collections API, accessible over HTTP. Using the Crunchy Bridge container apps, I'm going to give a quick overview of how to set up a web based spatial heatmap from Postgres.

Application

The application uses PostgreSQL to store and search 2.2M geographic names in the USA. Type in the search box and the auto-fill form will find candidate words. Select a word, and the database will perform a full-text search for all the names that match your word, and return the result to the map. The map displays the result using a heat map.

More names closer together will get more vibrant colors, so you can see name density. Try some regional names: bayou, swamp, cherokee, baptist, cougar. Try it for yourself.
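A rough sketch of the kind of full-text query that powers the heatmap (hypothetical table and column names, not the demo's actual function):

    SELECT name, geom
    FROM geonames
    WHERE to_tsvector('english', name) @@ plainto_tsquery('english', 'bayou');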

Architecture

Using pg_featureserv to build a demonstration application is pretty easy; all you need is:

  • PostgreSQL running somewhere
  • pg_featureserv running somewhere
  • A web page hosted somewhere.

Wait, "running somewhere" is doing a lot of work here! Is there any way to do this without become a specialist in managing database and container ops?

Yes, we can do all the heavy lifting with Crunchy Bridge!

  • Crunchy Bridge hosts [...]

A statistic corruption on TOAST chunk case

Introduction

PostgreSQL stores data in pages that are usually 8KB in size. This means that when there isn’t enough space to store a value (for example, of the text or varchar data type), PostgreSQL uses the TOAST technique, which stores the extra data as chunks in another table; each chunk is stored as a separate row in the TOAST table belonging to the main table. Users don’t need to know whether the information is stored in TOAST; PostgreSQL transfers/obtains the data to/from there transparently. Of course, not all tables in our database have a TOAST table. To find out which tables do, you can use the following query:

    SELECT t1.oid, t1.relname, t1.relkind, t1.reltoastrelid, t1.reltoastrelid::regclass
    FROM pg_class t1
    INNER JOIN pg_class t2 ON t1.reltoastrelid = t2.oid
    WHERE t1.relkind = 'r' AND t2.relkind = 't';

Data corruption in PostgreSQL is not very frequent and generally appears after some failure, such as a disk failure, an unexpected power loss, a hardware failure, a database system configured with fsync=off, or, to a lesser extent, a PostgreSQL bug. Some months ago the following error was found:

    ERROR: missing chunk number 0 for toast value XXXXXXX in pg_toast_2619

This error message is an example of data corruption in our database, in this case in the TOAST table pg_toast_2619. This TOAST table belongs to the statistics table (pg_statistic; you can check it in the source code), which is used by the planner to make decisions and choose the better plan to execute a query. The corruption can therefore directly impact database performance, or queries may simply fail with this error. In addition, this table is populated and updated by the AUTOVACUUM process or a manual ANALYZE command. So, how do we fix this data corruption so that our database keeps working well? We will see this in the next section.
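As a quick side note (not part of the original post), the numeric suffix of a pg_toast_NNNN table is the OID of its owning table, so you can confirm the owner like this:

    SELECT 2619::regclass;
    -- pg_statistic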

Solution

As the information stored in this table is practically temporal and is computed and stored each time

[...]
Posted by Craig Kerstiens in Crunchy Data on 2022-05-18 at 16:52

7 years ago I left Heroku. There has been a lot of discussion over the past weeks about Heroku's demise, whether it was a success or failure, and its current state. Much of this was prompted by the recent and ongoing security incident, but as others have pointed out, the product has been frozen in time for some years now.

I’m not here to rehash the many debates of what is the next Heroku, or whether it was a success or failure, or how it could have been different. Heroku is still a gold standard of developer experience and often used in pitches as Heroku for X. There were many that tried to imitate Heroku for years and failed. Heroku generates sizable revenue to this day. Without Heroku we’d all be in a worse place from a developer experience perspective.

But I don’t want to talk about Heroku the PaaS. Instead I want to share a little of my story and some of the story of Heroku Postgres (DoD - Department of Data, as we were internally known). I was at Heroku in a lot of product roles over the course of 5 years, but most of my time was with that DoD team. When I left Heroku, it was a team of about 8 engineers running and managing over 1.5m Postgres databases (a one-in-a-million problem came up about once a week); we engineered a system that allowed us to scale without requiring a 50-person ops team just for databases.

This will be a bit of a personal journey, but also hopefully give some insights into what the vision was and hopefully a bit of what possibilities are for Postgres services in the future.

I wasn’t originally hired to work on anything related to Postgres. As an early PM I first worked on billing, then later on core languages and launching the Python support for Heroku. It was a few months in when I found myself having conversations with many of the internal engineers about Postgres. “Why aren’t you using hstore?”, “Transactional DDL to rollback transactions is absolutely huge!”, “Concurrent index creation runs in the background while not holding a lock, this should always be how you add an index.” Now we h

[...]
Posted by Francisco Figueiredo Jr on 2022-05-18 at 11:30

 

    Happy 20th Anniversary, Npgsql!

 

   It seems like yesterday when I decided to start Npgsql. And today, Npgsql completes 20 years since the first public beta release!

    On May 18th, 2002, this is what was written in the first release notes of Npgsql  version 0.1:

 

2002-05-18
    
    First Npgsql beta release.
    In this version there is a limited functionality. It is only possible to connect and disconnect from server.
    Only clear text authentication is supported right now.
    To compile you can use the SharpDevelop to open the Combine file (Npgsql.cmbx) or
type at command console: csc /t:library /out:Npgsql.dll NpgsqlConnection.cs AssemblyInfo.cs NpgsqlException.cs
This will create the file Npgsql.dll that can be copied to the application directory.

    I will be working in the Command functionality now.
    
    
    Play with it and send your bugs and comments :)
    (fxjr)

    

    It's been a long journey since then. And what a journey!! :)

    It all started when I decided to contribute to the open source community after so much time learning from it.

    At the time (year 2002), .Net was in its infancy but it looked very promising. Then I decided to create this new project in C#.

    As I wanted to play with network protocols and I also enjoyed working with databases, I started to check if there were .Net Data Providers for some open source database servers.  I ended up picking Postgresql because the communication protocol was very well documented.

    To me, this project is the most important contribution I have ever made to the opensource community. I'm very proud of it! :)

    I'm especially happy because, since the beginning, Npgsql has been like a reference in the .Net world when talking about connecting to Postgresql.

    Of course, all this wouldn't be possible without the help of a lot of people. I'd like to thank everybody who helped to make Npgsql the success it is today! 

    Thank you!

    H

[...]

For many years the documentation has contained the following statement :

If the file includes a function named _PG_fini, that function will be called immediately before unloading the file. Likewise, the function receives no parameters and should return void. Note that _PG_fini will only be called during an unload of the file, not during process termination. (Presently, unloads are disabled and will never occur, but this may change in the future.)

meaning all those instances of _PG_fini() dutifully included in shared library code have never actually done anything (see: src/backend/utils/fmgr/dfmgr.c).

We are now finally in that future, but it has been decided that the never-implemented idea of unloading shared libraries is officially abandoned, and _PG_fini() has been removed completely (commit ab02d702).

In practical terms this will have no effect on existing code containing _PG_fini(), and the code will compile and function as before. However, as of PostgreSQL 15, any _PG_fini() function will be demoted from an effective no-op to a complete waste of space, so it can be removed entirely. For that matter, it can just as well be removed from code meant to build against earlier PostgreSQL versions.

more...

If you are using pg_timetable to run multiple jobs on PostgreSQL, you need to understand how to calculate the number of database sessions, so you don’t overload your system. Recently, a related question came up on pg_timetable’s GitHub Discussions page: what do you do when you get the following error message: “Failed to send chain to the execution channel”?
TLDR: See the bottom of the post for Key Takeaways.

Q: Run multiple jobs?

I have started multiple (20) jobs using timetable.add_job(). These are long-running, CPU-intensive jobs.
I have checked the timetable.log regularly and I sometimes found the following error message:
Failed to send chain to the execution channel
But the chain is running forward.
Could you explain to me what this message is?

A: Understand channel capacity

Assuming you’re running the latest version, you’re out of the capacity of the channel for passing chains to workers.
The channel accumulates jobs and then workers pull jobs one by one from this channel. If all of your workers are busy and new jobs are fetched for execution, the channel will be out of capacity to hold new jobs.

Solution:

  • increase the number of workers
  • make chains less aggressive, e.g. execute not every minute but every 10 minutes
  • use AFTER interval chains, so new chains will start only AFTER the period of time the last one finished

Q: Rules regarding active sessions?

I now have 2 workers. I initiated 10 multiple chains using self-destruction and job_max_instances = 1.
After the 10 chains finished their run, I see 10 active pg_timetable sessions in pg_stat_activity and timetable.active_session contains 39 rows.
When will the 10 active sessions be idle? Because I suspect that if I want to schedule another 10 chains, maybe the active sessions in pg_stat_activity will grow up to 20, which is not suitable for my 4 cpu db server.
UPDATE: when I launched the next 10 chains, the active sessions were gone from pg_stat_activity.
But I would like to know what the rul

[...]
Posted by Bruce Momjian in EDB on 2022-05-16 at 17:00

I have just completed the first draft of the Postgres 15 release notes. (It still needs more XML markup and links.) The release note feature count is 186, which is similar to recent major releases, excluding Postgres 14's high feature count. Postgres 15 Beta 1 should be released soon. The final Postgres 15 release is planned for September/October of this year.

Posted by Andreas 'ads' Scherbaum on 2022-05-16 at 14:00
PostgreSQL Person of the Week Interview with Haki Benita: I’m Haki Benita. I’m living with my family in a city near Tel-Aviv. My interest in development started at around junior-high, and I’ve been working as a professional developer since I was 18.

In the old days, things were simpler. Computer systems were running in the server room in the office building, hopefully the server was carefully sized, unix operating systems ran processes, and when a server was overloaded, processes waited in the runqueue.

In today's world, this is different. Computer systems are running in the cloud, sizing mostly means you can scale in the cloud, and the Linux operating system is running processes and (lots of) threads.

Mind the absence of 'runqueue'. Linux does not have an explicit runqueue. Instead, when a process or a thread needs to run, it is set to running, after which it is the task of the task scheduler (I say 'task scheduler'; there is also an IO scheduler) to assign the process or thread a CPU time slice (sometimes called a 'quantum'). So both waiting to run and actual runtime are covered by the process status 'running'.

This is leading to a lot of confusion and misunderstanding, because it's not clear when a system is just busy or really overloaded.

CPU usage?

You might think: but how about just looking at the amount of CPU used? CPU used time from /proc, used by lots of tools, shows how the time is spent on what the operating system considers to be a CPU; it does not say anything about anything other than what is running on the CPU, such as processes or threads waiting to get running.

load

Another thought might be: this is old news: we've got the load per 1, 5 and 15 minutes. This indeed tells more than CPU usage, which cannot tell anything outside of what the CPU spent its time on, but the load figure has serious problems.

First of all, it's an exponentially-damped sum of a five second average. In layman's terms: if activity changes, the load figure starts moving towards it, so it takes time to reflect the actual change in activity, and it's not steady, it will always be moving towards it.

Second, Linux load does NOT actually indicate CPU load only, but rather has become a more general, non-CPU-specific indicator of activity of t

[...]
Posted by Yugo Nagata in SRA OSS, Inc. on 2022-05-13 at 02:29

pg_ivm v1.0 was officially released!

pg_ivm is an extension module that provides an Incremental View Maintenance (IVM) feature, a way to keep materialized views up to date by computing and applying only the incremental changes to the views rather than recomputing them from scratch.

I have already explained pg_ivm in the previous post, but it was in the alpha release stage at that time. The official v1.0 was released after a few bug fixes and some code cleanup.

Also, RPM packages for pg_ivm are now available from the PostgreSQL yum repository. So, you can also install pg_ivm for PostgreSQL 14 using yum or dnf instead of building from the source code, like:

$ sudo yum install pg_ivm_14

See the official instruction for details.
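As a small illustration (based on the pg_ivm README; treat it as a sketch, and the sales table is hypothetical), an incrementally maintained materialized view is created with the create_immv function instead of CREATE MATERIALIZED VIEW:

    CREATE EXTENSION pg_ivm;
    SELECT create_immv('recent_sales', 'SELECT * FROM sales WHERE amount > 100');
    -- subsequent INSERT/UPDATE/DELETE on sales is reflected in recent_sales automatically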

We would appreciate it if you could try pg_ivm v1.0 and give us any feedback!

Posted by Egor Rogov in Postgres Professional on 2022-05-13 at 00:00
In previous articles we discussed query execution stages and statistics. Last time, I started on data access methods, namely Sequential scan. Today we will cover Index Scan. This article requires a basic understanding of the index method interface. If words like "operator class" and "access method properties" don't ring a bell, check out my article on indexes from a while back for a refresher. Plain Index Scan: Indexes return row version IDs (tuple IDs, or TIDs for short), which can be handled in one of two ways. The first one is Index scan. Most (but not all) index methods have the INDEX SCAN property and support this approach. The operation is represented in the plan with an Index Scan node ...
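For instance (a generic illustration with a hypothetical table, not one of the article's examples), an equality search on an indexed column typically shows up as an Index Scan node in EXPLAIN output:

    EXPLAIN SELECT * FROM orders WHERE order_id = 42;
    --  Index Scan using orders_pkey on orders
    --    Index Cond: (order_id = 42)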
Frequent participants in the Postgres Discord and Slack servers, mailing lists, IRC chat, and other Postgres hangouts often hear a common refrain: I’ve installed Postgres, so now what? We may be tempted to say “Go forth young one, and explore!” Yet there’s a more fundamental question left unanswered: how do you even connect to explore in any capacity? What does that entail? Let’s explore how to connect to a fresh Postgres installation, and see what a new user really needs to know in order to experiment with Postgres. [Continue reading...]
Posted by Keith Fiske in Crunchy Data on 2022-05-11 at 15:55

One of the most requested features by Crunchy Data customers using modern enterprise database environments is some form of data encryption. However, nailing down exactly what someone means when they say "We need our data encrypted" is often a challenge due to the actual requirements not being fully clarified or even understood. So, before anyone tries to implement database encryption it is critically important to understand what needs to be encrypted and what benefit is actually gained by the methods that are employed. This blog post is not going to discuss any deep technical implementations of encryption. Instead, let's discuss what vectors of attack any given encryption method will mitigate since that will greatly influence which method is effective before you even reach any sort of development or deployment phases.

The application of encryption to a database environment can be broken down into three different methods:

  1. Data-At-Rest
  2. Data-In-Transit
  3. Data-In-Use

Data-At-Rest

Data-At-Rest is probably one of the most talked about and requested methods of encryption when talking about databases, so let's talk about that one first. What vectors of attack is this method effective for?

  • Attack vector of concern is when data is not in use
  • Data must remain encrypted at all times while not in use
  • ALL data must be encrypted
  • Physical access to hardware is a perceived threat

When is this method of encryption ineffective?

  • Attack vector of concern is a fully compromised host
  • Attack vector of concern is not physical access
  • Attack vector of concern is data transmission

The solutions to Data-At-Rest encryption can shed some light on why it is either effective or ineffective based on the above statements. The most common solution to this is full disk encryption, which is completely independent of any RDBMS or application [...]

PostgreSQL 15 will include COPY FREEZE support

In my blog post I reported that a patch to allow the use of COPY FREEZE in pgbench was accepted for the upcoming PostgreSQL 15 (supposed to be released by the end of 2022). It seems the patch has survived without being reverted!

 But it will not be in the release note

Now that PostgreSQL 15 beta1 is coming soon, I checked the proposed release note for PostgreSQL 15 beta1. Unfortunately it does not mention anything about COPY FREEZE support in pgbench.  Don't worry. The code for supporting COPY FREEZE in pgbench is surely in PostgreSQL 15. Please try PostgreSQL 15 beta1 once it is out!

 


Posted by Luca Ferrari on 2022-05-11 at 00:00

pgenv 1.3.0 adds a new command: switch

pgenv switch

pgenv, a simple but great shell script that helps manage several PostgreSQL instances on your machine, has been improved in the last few days.

Thanks to the contribution of Nils Dijk @thanodnl on GitHub, there is now a new command named switch that allows you to quickly prepare the whole environment for a different PostgreSQL version without having to start it.


The problem, as described in this pull request, was that the use command, trying to be smart, starts a PostgreSQL instance once it has been chosen. On the other hand, switch allows you to pre-select the PostgreSQL instance to use without starting it. This is handy, for example, when you want to compile some code against a particular version of PostgreSQL (managed by pgenv) but don’t want to waste your computer resources starting up PostgreSQL.
To some extent, switch can be thought of as an efficient equivalent of:



% pgenv use 14.2
% pgenv stop



The command has been implemented as a subcase of use, but while use does fire up an instance, switch does not.
However, if an instance is already running, switching to a new instance will stop the previously running one!

Other minor contributions

If you have pgenv on the radar, you have probably seen another release in the last few days, which covered a bug fix spotted by Nils Dijk in the management of the configuration.

Conclusions

pgenv keeps growing and adding new features, and is becoming a more complex beast than it was in the beginning. Hopefully, it can help your workflow too!

Posted by Nikhil Mohite in EDB on 2022-05-10 at 14:58
pgAdmin 4 provides 2 different ways to execute queries on the database:
  • Query tool: You can write and execute custom queries on any table in the connected database.
  • View/Edit data: View/Edit data can be used to view or modify a particular table's data.
Posted by Gabriele Bartolini in EDB on 2022-05-10 at 14:47
If your organization requires professional assistance on Postgres in Kubernetes with CloudNativePG, you can take advantage of EDB’s leading expertise. Gabriele will be speaking about Postgres in Kubernetes at the Data on Kubernetes Day before KubeCon Europe 2022. [Continue reading...]
Posted by Laurenz Albe in Cybertec on 2022-05-10 at 08:00

Time zone as an excuse for oversleeping
© Laurenz Albe 2022

Next to character encoding, time zones are among the least-loved topics in computing. In addition, PostgreSQL’s implementation of timestamp with time zone is somewhat surprising. So I thought it might be worthwhile to write up an introduction to time zone management and some recommendations for its practical use.

Time zones and the SQL standard

The SQL standard has rather limited provisions for specifying time zones:

<time zone> ::=
    AT <time zone specifier>

<time zone specifier> ::=
    LOCAL
  | TIME ZONE <interval primary>

It has the following to say about time zones:

The surface of the earth is divided into zones, called time zones, in which every correct clock tells the same time, known as local time. Local time is equal to UTC (Coordinated Universal Time) plus the time zone displacement, which is an interval value that ranges between INTERVAL ‘-14:00’ HOUR TO MINUTE and INTERVAL ‘+14:00’ HOUR TO MINUTE. The time zone displacement is constant throughout a time zone, and changes at the beginning and end of Summer Time, where applicable.

In other words, the standard only has time zone management provisions for representing an offset from UTC. Users or applications that have to cope with daylight savings time changes have to change the offset at the appropriate times to reflect that. Apparently not a lot of usability considerations went into this part of the SQL standard (or none of the big database vendors whose voice carries weight in the standard committee had a smarter implementation).

It is also interesting to note that the SQL standard seems to consider the time zone offset to be a part of the timestamp value, although it is not totally clear on that point.
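A small illustration of how PostgreSQL treats such an offset (my example, not from the article): the offset in a timestamp with time zone literal is only used to convert the value to UTC, and it is not stored with the value:

    SET timezone = 'UTC';
    SELECT TIMESTAMP WITH TIME ZONE '2022-05-10 12:00:00+02';
    -- 2022-05-10 10:00:00+00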

Timestamp data types in PostgreSQL

There are two data types for timestamps in PostgreSQL: timestamp (also known as timestamp without time zone) and timestamp with time zone (or, shorter, timest

[...]
Posted by Andreas 'ads' Scherbaum on 2022-05-09 at 14:00
PostgreSQL Person of the Week Interview with Abhijit Menon-Sen: My name is Abhijit, and my family and I live in a small village in the Himalayas in India, overlooking a peaceful little valley with a small river that we can hear when it rains heavily.

1. Introduction

PostgreSQL’s 2 phase commit (2PC) feature allows a database to store the details of a transaction on disk without committing it. This is done by issuing a PREPARE TRANSACTION [name] command at the end of a transaction block. When the user is ready to commit, he/she can issue COMMIT PREPARED [name], where [name] should match the [name] in the PREPARE TRANSACTION command. Because the transaction details are stored on disk with 2PC, the server is able to commit this transaction at a later time even if it crashes or is out of service for some time. In a single database instance, the use of 2PC is not critical; the plain ‘commit’ can perform the job equally well. However, in a larger setup, the data may be distributed over 2 or more database instances (for example, via Foreign Data Wrapper (FDW)), and the use of 2PC is absolutely critical there to keep every database instance in sync.
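The bare SQL for a two-phase commit looks like this (a generic example with a hypothetical accounts table; it requires max_prepared_transactions > 0 on the server):

    BEGIN;
    INSERT INTO accounts (id, balance) VALUES (1, 100);
    PREPARE TRANSACTION 'tx_1';   -- transaction state is persisted on disk

    -- later, possibly after a server restart:
    COMMIT PREPARED 'tx_1';       -- or ROLLBACK PREPARED 'tx_1';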

2. Atomic Commit Problem with Foreign Data Wrapper (FDW)

The current postgres_fdw does not support the use of 2PC to commit foreign servers. When a commit command is sent to the main server, it will send the same commit command to all of the foreign servers before processing the commit for itself. If one of the foreign nodes fails the commit, the main server will go through an abort process and will not commit itself due to the failure. However, some of the foreign nodes could already have been committed successfully, resulting in a partially committed transaction.

Consider this diagram:

where the CN node fails the commit to DN3 and goes through an abort process, but at the same time, DN1 and DN2 have already been committed successfully and can no longer be rolled back. This scenario creates a partial commit that may not be desirable.

3. FDW with 2PC Capability

If we were to add 2PC functionality to the current postgres_fdw, then instead of sending the same commit to all foreign servers, we would let the main server send PREPARE TRANSACTION instead. The main server should proceed to send COMMIT PREPARED to all foreign server

[...]
Posted by Christopher Winslett in Crunchy Data on 2022-05-06 at 20:00

For many developers, databases are basically magic. Like Penn & Teller, this blog post is set to break the illusion. Databases are just like any other code: they have algorithms and processes. These algorithms and processes are meant to improve performance, but can cause limitations if they are not expected.

Disclaimer: it is okay to break the rules. Sometimes, you may choose to have slow database interactions, because you know they are rare events.

Assuming a well-designed database infrastructure (which is what we launch at Crunchy Data), database performance is reactive to the queries, reads, and writes sent to it. Databases do not blow themselves up on purpose; in fact, they are continuously trying to self-right from interactions so that they can return to equilibrium.

In this post, we'll describe things conceptually (want more? check out Postgres Tips), then give examples using a physical library as the metaphor. Libraries used these algorithms in a physical form long before databases used them in logical form, so the examples are applicable for understanding performance.

A sample bookshelf

Although we are showing tables below, the same concepts apply to most NoSQL databases.

Should you write code and queries with a knowledge of the following topics, you'll dodge quite a few reasons for performance issues.

Indexes

Basically, think of indexes as a sorted list of keys with values. Indexes speed up reads at the expense of speed during writes. Indexes can be a simple single-key, or a multi-dimensional key. Often multi-dimensional ke[...]
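For example (hypothetical table, sketching the idea above), a single-key index and a multi-column ("multi-dimensional") index:

    CREATE INDEX books_title_idx ON books (title);                   -- single key
    CREATE INDEX books_location_idx ON books (floor, aisle, shelf);  -- multi-column key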

Posted by Matt Yonkovit in Percona on 2022-05-05 at 15:49
Understand Your PostgreSQL Workloads Better with pg_stat_monitor

Awesome news: pg_stat_monitor has reached GA status! Percona has long been associated with pushing the limits and understanding the nuances involved in running database systems at scale, so building a tool that brings a bit more insight and detail around query performance and scale on PostgreSQL systems fits with our history. So what the hell does pg_stat_monitor do, and why should you care? Excellent question!

Currently, for collecting and reviewing query metrics, the de facto standard is pg_stat_statements. This extension collects query metrics and allows you to go back and see which queries have impacted your system. Querying the extension would yield something like this:

postgres=# \dx
List of installed extensions
-[ RECORD 1 ]-----------------------------------------------------------------------
Name        | pg_stat_statements
Version     | 1.8
Schema      | public
Description | track planning and execution statistics of all SQL statements executed
-[ RECORD 2 ]-----------------------------------------------------------------------
Name        | plpgsql
Version     | 1.0
Schema      | pg_catalog
Description | PL/pgSQL procedural language


postgres=# \x
Expanded display is on.
postgres=# select * from pg_stat_statements;


-[ RECORD 2 ]-------+--------------------------------------------------------
userid              | 16384
dbid                | 16608
queryid             | -7945632213382375966
query               | select ai_myid, imdb_id, year, title, json_column from movies_normalized_meta where ai_myid = $1
plans               | 0
total_plan_time     | 0
min_plan_time       | 0
max_plan_time       | 0
mean_plan_time      | 0
stddev_plan_time    | 0
calls               | 61559
total_exec_time     | 27326.783784999938
min_exec_time       | 0.062153
max_exec_time       | 268.55287599999997
mean_exec_time      | 0.44391208084927075
stddev_exec_time    | 2.522740928486301
rows                | 61559
shared_blks_hit     | 719441
shared_blks_read    | 1031
shared_blks_dirti
[...]