Latest Blog Posts

Writing a good talk proposal
Posted by Tomas Vondra on 2024-09-10 at 10:00

I’ve submitted a lot of talk proposals to a lot of Postgres conferences over the years. Some were accepted, many more were not. And I’ve been on the other side of this process too, as a member of the CfP committee responsible for selecting talks. So let me give you a couple of suggestions on how to write a good talk proposal.

Why "SELECT * FROM table" is NOT a read
Posted by Hans-Juergen Schoenig in Cybertec on 2024-09-10 at 08:54

People tend to think that a SELECT operation is always a read. This might look obvious and make sense, but does it actually hold true? Is a SELECT statement a read? Can we really determine from the statement alone what it does, so that we can safely load balance? In fact, no. The fact that something seems to be a pure read does not make it a read - and that has huge implications, which will be discussed in this short but important post.

Load balancing: How to do it the wrong way

Often people try to build their own replication solution, assuming that load balancing of queries can simply be handled by some kind of middleware. However, this is definitely doomed to fail. In short: Do NOT build software that magically decides in some middleware what to do.

Why is that? Consider the following:

SELECT  * FROM tab;

But what if the following happens?

test=# CREATE TABLE a (id int);
CREATE TABLE
test=# CREATE FUNCTION x(int) 
    RETURNS int AS ' INSERT INTO a VALUES ($1) RETURNING *' 
LANGUAGE 'sql';
CREATE FUNCTION

In this case I have defined a simple table and a function that just happens to insert some simple data. But what if we add the following view to the scenario?

test=# CREATE VIEW tab AS SELECT * FROM a, x(10) AS y;
CREATE VIEW

This will change things because we can access the view easily:

test=# SELECT * FROM tab;
 id | y 
----+---
(0 rows)

If we call it again we will already see the leftovers of the previous execution:

test=# SELECT * FROM tab;
 id | y  
----+----
 10 | 10
(1 row)

From the application or middleware point of view, we are still talking about a simple SELECT statement. What can possibly go wrong? Well, as you can see, a lot can go wrong, because the string we send to the database does not carry all the information the middleware needs to decide whether we are reading or writing.
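
To make the point concrete: the only way a middleware could learn that this harmless-looking SELECT writes data is to look into the system catalogs. A minimal, hypothetical check (my sketch, not something the post recommends building):

-- Is the function hiding behind the view marked volatile?
-- provolatile = 'v' means it may have side effects, such as the INSERT inside x().
SELECT p.proname, p.provolatile
  FROM pg_catalog.pg_proc AS p
 WHERE p.proname = 'x';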

Send reads to the secondary and writes to the primary?

Let us focus on what we want to solve here: We want to use our read-replicas which w

[...]

CNPG Recipe 13 - Configuring PostgreSQL Synchronous Replication
Posted by Gabriele Bartolini in EDB on 2024-09-10 at 08:12

CloudNativePG 1.24 introduces a highly customisable approach to managing PostgreSQL synchronous replication through the new .spec.postgresql.synchronous stanza. In this article, I’ll guide you through configuring synchronous replication within a single Kubernetes cluster and across multiple clusters. I’ll explore quorum-based and priority-based replication methods, highlighting their benefits and trade-offs. Additionally, I’ll explain how to adjust the synchronous_commit setting to strike the right balance between data durability and performance.
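
For readers who want to see what this maps to at the PostgreSQL level, the two modes ultimately boil down to the synchronous_standby_names and synchronous_commit settings. A rough sketch with invented standby names (with CloudNativePG you declare this in the .spec.postgresql.synchronous stanza instead of running SQL by hand):

-- Quorum-based: any one of the listed standbys must confirm a commit.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby1, standby2)';
-- Priority-based alternative: the first connected standby in the list confirms.
-- ALTER SYSTEM SET synchronous_standby_names = 'FIRST 1 (standby1, standby2)';
-- Trade durability against commit latency.
ALTER SYSTEM SET synchronous_commit = 'remote_apply';
SELECT pg_reload_conf();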

Understanding Factors Impacting Data Replication Latency in PostgreSQL Across Geographically Distributed Nodes
Posted by Umair Shahid in Stormatics on 2024-09-10 at 08:04

In an increasingly globalized world, companies and organizations are leveraging distributed systems to handle massive amounts of data across geographically separated locations. Whether it is for ensuring business continuity, disaster recovery, or simply improving data access for users in different regions, replication of data between nodes situated in diverse geographical locations has become a critical aspect of modern database systems.

The post Understanding Factors Impacting Data Replication Latency in PostgreSQL Across Geographically Distributed Nodes appeared first on Stormatics.

Philippe Noël
Posted by Andreas 'ads' Scherbaum on 2024-09-09 at 14:00
PostgreSQL Person of the Week Interview with Philippe Noël: My name is Philippe. I was raised in the woods of Québec, in a small town called Rivière-du-Loup. I moved to the US for university, and since then have lived a bit all over the place, but I’m primarily based in NYC.

PostgreSQL Backups to Support Point In Time Recovery: Learning PostgreSQL with Grant
Posted by Grant Fritchey in Redgate on 2024-09-07 at 20:00

The focus of your backup strategy should not be backups. Instead, you should be focused on restoring databases, because, after all, that’s what matters. Simply having a backup is fine. However, if you can’t restore your data from that backup, it doesn’t matter. On top of this is the ability to ensure you can restore your databases to a point in time, not simply to the last full backup. That ability requires a different approach to how you perform your backups.

PostgreSQL has the capability to support backups, as I described in my first article on the topic. PostgreSQL also has the capability to restore to a point in time. However, that does require you to change the way you’re performing your backups. This article advances our understanding of how to better protect your PostgreSQL databases by expanding database backups and restores into a more full-blown disaster recovery process through point-in-time restores.

While the important part is the restore, in a classic chicken-or-egg conundrum, we can’t talk about restoring until we first have a backup, so I’ll start with how you need to back up your databases in preparation for a point-in-time restore.

Continuous Archiving

Unlike in SQL Server, the backups needed for point-in-time recovery in PostgreSQL are a little different from the full backups. Like SQL Server though, the mechanism that makes this work is a log. In the case of PostgreSQL, it’s the WAL, or Write Ahead Log. This plays an almost identical role to the database logs within SQL Server, recording all actions that modify the data or structures. This means our core process for getting the right kind of protection in place for our PostgreSQL database is the same, conceptually, as it is for SQL Server:

  1. Get a database backup
  2. Get WAL backups (a quick check is sketched after this list)
  3. Restore the database, but don’t finish the recovery process
  4. Replay the WAL files up to a point in time
  5. Recover our database and be back online.
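
A quick way to sanity-check step 2, namely that WAL segments really are being archived, is to ask the server for its archiver statistics (a generic check, not part of the article's process):

-- Force the current WAL segment to be handed over to the archiver.
SELECT pg_switch_wal();
-- Cumulative archiver statistics kept by the server.
SELECT archived_count, last_archived_wal, last_archived_time, failed_count
  FROM pg_stat_archiver;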

The backup process I showed in the first article on backups used p

[...]

How to get info about relations between system tables?
Posted by Pavel Stehule on 2024-09-07 at 06:52
One of my customers asked me: "what tables hold references to the table pg_roles?"

The reply is simple - none. pg_roles is a view. But this view uses a very important table, pg_authid. For custom tables we can use a query:

SELECT conname, conrelid::pg_catalog.regclass AS ontable,
       pg_catalog.pg_get_constraintdef(oid, true) AS condef
  FROM pg_catalog.pg_constraint c
 WHERE confrelid = 'a'::pg_catalog.regclass
       AND contype = 'f'
ORDER BY conname;
┌─────────────┬─────────┬─────────────────────────────────────┐
│   conname   │ ontable │               condef                │
╞═════════════╪═════════╪═════════════════════════════════════╡
│ b_a_id_fkey │ b       │ FOREIGN KEY (a_id) REFERENCES a(id) │
└─────────────┴─────────┴─────────────────────────────────────┘
(1 row)

But this method doesn't work for system tables. These tables don't use explicitly defined foreign keys, so we have to use a different method. The system function pg_get_catalog_foreign_keys returns all references between system tables, and we can filter the result:
SELECT * 
 FROM pg_get_catalog_foreign_keys()
WHERE pktable = 'pg_authid'::regclass;

┌─────────────────────────┬──────────────┬───────────┬────────┬──────────┬────────┐
│         fktable         │    fkcols    │  pktable  │ pkcols │ is_array │ is_opt │
╞═════════════════════════╪══════════════╪═══════════╪════════╪══════════╪════════╡
│ pg_proc                 │ {proowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_type                 │ {typowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_class                │ {relowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_operator             │ {oprowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_opfamily             │ {opfowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_opclass              │ {opcowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_language             │ {lanowner}   │ pg_authid │ {oid}  │ f        │ f      │
│ pg_largeobject_metadata │ {lomowner}   │ pg_authid │ {oid}  │ f        │ f 
[...]

Shrinking Big PostgreSQL tables: Copy-Swap-Drop
Posted by Andrew Atkinson on 2024-09-06 at 00:00

In this post, you’ll learn a recipe that you can use to “shrink” a large table. This is a good fit when only a portion of the data is accessed, the big table has become unwieldy, and you don’t want a heavier solution like table partitioning.

This recipe has been used on tables with billions of rows, and without taking Postgres offline. How does it work?

Postgres testing details

These are the PostgreSQL instance details where this was tested:

  • PostgreSQL 16
  • Transactions use the default isolation level of READ COMMITTED
  • Steps were performed using psql

A few big tables

Let’s discuss some context.

Application databases commonly have a few tables that are much larger than the others. These jumbo tables hold data for data-intensive features: granular information captured at high frequency.

Because of that, the table row counts can grow into the hundreds of millions, or billions. Depending on the size of the rows, the table can reach hundreds of gigabytes or even terabytes in size.

Problems with big tables

While PostgreSQL supports tables up to 32TB in size, working with large tables of 500GB or more can be problematic and slow.

Query performance can be poor, and adding indexes or constraints is slow. Backup and restore operations slow down because of these large tables.

Large tables can also force you to scale the database server instance vertically, provisioning more CPU and memory capacity unnecessarily when only more storage is needed.

When the application queries only a portion of the rows, such as recent rows, or rows for active users or customers, there’s an opportunity here to move the unneeded rows out of Postgres.

One tactic to do that is to DELETE rows, but this is a problem due to the multiversion row design of Postgres. We’ll cover this in more detail in an upcoming post on massive delete operations.
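
For orientation, here is a rough sketch of the general copy-swap-drop idea the title refers to (table and column names are invented, and the article's actual recipe is certainly more careful about keeping new writes in sync and avoiding long locks):

-- Create an empty copy with the same columns, indexes and constraints.
CREATE TABLE events_new (LIKE events INCLUDING ALL);
-- Copy only the rows the application still needs (incrementally, in practice).
INSERT INTO events_new
SELECT * FROM events WHERE created_at >= now() - interval '90 days';
-- Swap the tables in one short transaction.
BEGIN;
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_new RENAME TO events;
COMMIT;
-- DROP TABLE events_old;   -- later, once nothing depends on it any more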

Another option would be to migrate data into partitioned tables.

We aren’t going to cover table partitioning in this post, but let’

[...]

PostgreSQL Snapshots and Backups with pgBackRest in Kubernetes
Posted by Brian Pace in Crunchy Data on 2024-09-05 at 15:00

Backups are dead. Now that I have your attention, let me clarify. Traditional backups have earned a solid reputation for their reliability over time. However, they are dead in the sense that a backup is essentially useless until it's restored—essentially "resurrected." In this post, we'll explore best practices for managing PostgreSQL snapshots and backups using pgBackRest. We will then provide some guidance on how to apply these techniques in Kubernetes using the Postgres Operator (PGO) from Crunchy Data. Whether you're overseeing a production environment, handling replicas, or refreshing lower environments, understanding how to effectively manage snapshots is key.

Creating snapshots

There are two effective methods for creating snapshots, but before we dive into those, let's address a common but ill-advised solution.

You shouldn't snapshot the primary PostgreSQL instance

When working with PostgreSQL, it's crucial to avoid taking snapshots of the primary instance or of running replicas, for a few reasons:

  • Volume Overhead: Snapshotting the primary instance can impose unnecessary overhead on the underlying volume, potentially affecting performance.
  • Risk of Corruption: If the database contains a corrupt block, it can propagate to the snapshots, compromising the integrity of your backups and hindering data recovery.
  • Backup Label Management: To snapshot a running instance, you need to execute pg_backup_start and pg_backup_stop. The output of the stop command must be stored, and the appropriate content injected into the backup_label file if the clone is used.
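
For illustration, the manual bookkeeping described in the last bullet looks roughly like this on PostgreSQL 15 and later (where the functions carry these names); it is exactly the busywork the approaches below avoid:

-- Put the cluster into backup mode before triggering the storage snapshot.
SELECT pg_backup_start('pre-snapshot', true);
-- ... take the storage-level snapshot here ...
-- Leave backup mode; the returned labelfile content must be written into a
-- backup_label file in the data directory of any clone created from the snapshot.
SELECT labelfile FROM pg_backup_stop(true);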

To avoid these issues, I recommend two alternative approaches.

Option 1: Delta restores with pgBackRest

The first and preferred approach is to use pgBackRest for delta restores. When you snapshot a PostgreSQL instance, there's a risk of corrupt blocks being included, endangering your snapshots. pgBackRest adds a layer of protection by checking for corrupt blocks during the backup. If the previous ba

[...]

PITR and Streaming Replication environments
Posted by Stefan Fercot in Data Egret on 2024-09-05 at 12:27

Recently, I had to address a support case where a user was struggling to perform Point-in-Time Recovery in his PostgreSQL cluster using streaming replication. We have already discussed how to achieve a successful recovery in our past webinars, but the question remains: what is the impact on the standby servers? In this specific case, the user was unsure whether he needed to apply the Point-in-Time Recovery steps on the standby as well, and after doing so, the standby refused to connect to the primary.

In this post, we will examine this specific use case, perform a recovery on the primary, and explore the best approach to re-synchronize the standby servers.


For the purpose of this post, we will use 2 nodes called pg1 and pg2. Both are running on Rocky Linux 9.

Installation

On both the pg1 and pg2 servers, first configure the PGDG yum repositories:

$ sudo dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/\
    EL-9-x86_64/pgdg-redhat-repo-latest.noarch.rpm
$ sudo dnf -qy module disable postgresql

Then, install PostgreSQL and create a basic PostgreSQL cluster on pg1:

$ sudo dnf install -y postgresql16-server
$ sudo PGSETUP_INITDB_OPTIONS="--data-checksums"\
    /usr/pgsql-16/bin/postgresql-16-setup initdb
$ sudo systemctl enable postgresql-16 --now
$ sudo -iu postgres ps -o pid,cmd fx

As we are talking about Point-in-time Recovery, we need to store the WAL archives in a location shared across the nodes. In this example, we will create an archives directory inside a /shared NFS mount:

$ sudo -iu postgres mkdir /shared/archives

Configure the listening addresses and enable archiving in the postgresql(.auto).conf file:

listen_addresses = '*'
archive_mode = on
archive_command = 'test ! -f /shared/archives/%f && cp %p /shared/archives/%f'

Finally, restart the PostgreSQL service using:

$ sudo systemctl restart postgresql-16.service

Set up streaming replication

On pg1, create a specific user for the replication:

$ sudo -iu postgres psql
postgres=# CREATE ROLE re
[...]

Follow-Up: Reduce Vacuum by Using “ON CONFLICT” Directive
Posted by Shane Borden on 2024-09-04 at 12:58

I previously blogged about ensuring that the “ON CONFLICT” directive is used in order to keep vacuum from having to do additional work. You can read the original blog here: Reduce Vacuum by Using “ON CONFLICT” Directive

Now that Postgres 15 and above have incorporated the “MERGE” functionality, I wanted to ensure that there was no “strange” behavior as it relates to vacuum when using MERGE. As you can see here, the “MERGE” functionality performs exactly as expected. For example, when a MERGE is used as an upsert and the row already exists, the UPDATE branch runs instead of the INSERT and exactly one row is marked dead.

/* Create the table: */
CREATE TABLE public.pk_violation_test (
        id int PRIMARY KEY, 
        value numeric,
        product_id int,
        effective_date timestamp(3)
        );
 
 
/* Insert some mocked up data */
INSERT INTO public.pk_violation_test VALUES ( 
        generate_series(0,10000), 
        random()*1000,
        random()*100,
        current_timestamp(3));
 
/* Verify that there are no dead tuples: */
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup
FROM
    pg_stat_all_tables
WHERE
    relname = 'pk_violation_test';
 
 schemaname |      relname      | n_live_tup | n_dead_tup
------------+-------------------+------------+------------
 public     | pk_violation_test |    100001  |          0

Then, create a simple merge and check the results:

WITH insert_query AS (
    SELECT
        0 AS id,
        44.33893489873 AS value,
        46 AS product_id,
        now() AS effective_date)
MERGE INTO pk_violation_test pkt
    USING insert_query i ON pkt.id = i.id
    WHEN MATCHED THEN
        UPDATE SET
            value = i.value, product_id = i.product_id, effective_date = i.effective_date
    WHEN NOT MATCHED THEN
        INSERT (id, value, product_id, effective_date)
            VALUES (i.id, i.value, i.product_id, i.effective_date);
MERGE 1

And then check the dead tuple count:

[...]

We need to talk about ENUMs
Posted by Radim Marek on 2024-09-04 at 00:00

Designing a database schema, whether for a new application or a new feature, always raises a lot of questions. The choices you make can have a big impact on how well your database performs and how easy it is to maintain and scale. Whether you’re just getting started with PostgreSQL or consider yourself a seasoned pro, it’s easy to rely on old habits or outdated advice. In this article, I want to take a fresh look at one of those topics that often sparks debate: the use of ENUMs in PostgreSQL.

I have to admit, not so long ago I would advise "don't use ENUMs" without thinking about it too much. Relying only on random articles and some personal but outdated experience, I had made this my go-to answer for some years. And while there were several limitations of ENUMs in PostgreSQL in the (distant) past, the support improved a long time ago.

The improvements are so long-standing that:

  • PostgreSQL 9.1 (released in 2011) introduced the option to add new values to ENUMs without a table rewrite. From what I can tell, this fact alone has remained the source of the biggest misconception about ENUMs.
  • Renaming of values was added in version 10 (2017).
  • The ability to ALTER TYPE ... ADD VALUE in a transaction block was introduced in PostgreSQL 12 (2019).

And new features coming (at the time of writing this article) with PostgreSQL 17 allow newly added values to be used within the same transaction block (previously not possible without an explicit commit).
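
Put together, a quick illustration of what those improvements allow (type and value names are mine):

CREATE TYPE order_status AS ENUM ('pending', 'shipped');
BEGIN;
ALTER TYPE order_status ADD VALUE 'cancelled';  -- allowed inside a transaction block since PostgreSQL 12
COMMIT;                                         -- before PostgreSQL 17 the new value can only be used after the commit
ALTER TYPE order_status RENAME VALUE 'shipped' TO 'dispatched';  -- renaming values, available since PostgreSQL 10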

Therefore, I want to correct even myself and say—let's give ENUMs another chance. This article will go into detail and help us correctly decide when it makes sense to use them or not.

How are ENUMs Implemented?

Every stored ENUM value occupies 4 bytes on disk. This is the size of the OID representing the actual ENUM value, which is stored as a row within the pg_enum table. You can see this yourself by querying the table directly, though by default it contains no rows until you define an ENUM type.

# select * from pg_catalog.pg_enum;

 oid | enumtyp
[...]

The Timescale PostgreSQL Survey 2024
Posted by Stefanie Janine on 2024-09-03 at 22:00

Timescale PostgreSQL Survey

Timescale Inc. is a well-known company in the PostgreSQL community, mostly because of their PostgreSQL extension TimescaleDB.

The survey for 2024 is open now until September 30th 2024. If you could spare some time to answer the questions, the community would appreciate it.

The results will be published to everyone shortly after the survey has ended.

Why It Matters

It helps everyone in the PostgreSQL community, every company and every user, to get more information about how PostgreSQL is used and why people are using it.

It is also important to know what is not going well. Don’t spare your criticism. Problems will only go away if they are known and can be solved.

Results From Previous Years

Quick Benchmark: ANALYZE vs. maintenance_io_concurrency
Posted by Michael Banck in credativ on 2024-09-03 at 13:55

Introduction

Running ANALYZE (either explicitly or via auto-analyze) is very important in order to have up-to-date data statistics for the Postgres query planner. In particular after in-place upgrades via pg_upgrade, ANALYZE needs to be run in order to have any query statistics at all. As ANALYZE samples only part of the blocks in a table, its I/O pattern looks more like random access than a sequential read. Version 14 of Postgres gained the ability to use prefetching (where available, which is the case on Linux) to tell the operating system kernel which blocks it will look at next. This is controlled via the maintenance_io_concurrency configuration parameter, which is set to 10 by default (contrary to effective_io_concurrency, which is set to 1 by default).

Benchmark

In order to test and demonstrate the changes between version 13 and 14, we have done some quick benchmarks using the current maintenance releases (13.16 and 14.13) on Debian 12 with packages from https://apt.postgresql.org. Hardware-wise, a ThinkPad T14s Gen 3 with an Intel i7-1280P CPU with 20 cores and 32 GB of RAM was used. The basis is a pgbench database, initialized with a scale factor of 1000:

    $ pgbench -i -I dtg -s 1000 -d pgbench

This creates 100 million rows and leads to a database size of around 15 GB. In order to have ANALYZE do a bit more work, we increase default_statistics_target from the default of 100 to the same value as the pgbench scale factor (i.e., 1000). This results in ANALYZE scanning around 20% of all blocks. We then analyze the main pgbench table, pgbench_accounts:

    $ vacuumdb -Z -v -d pgbench -t pgbench_accounts
    INFO:  analyzing "public.pgbench_accounts"
    INFO:  "pgbench_accounts": scanned 300000 of 1639345 pages,
           containing 18300000 live rows and 0 dead rows;
           300000 rows in sample, 100000045 estimated total rows
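
For reference, the two settings involved can be adjusted like this (a generic snippet, not the benchmark scripts themselves):

ALTER SYSTEM SET default_statistics_target = 1000;  -- match the pgbench scale factor so ANALYZE samples more blocks
ALTER SYSTEM SET maintenance_io_concurrency = 10;   -- 10 is the default; 0 disables prefetching
SELECT pg_reload_conf();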

Between runs, the file system page cache is dropped via echo 3 | sudo tee /proc/sys/vm/drop_caches and all runs are repeated three times. The following

[...]

[PATCH IDEA] Statistics for the file descriptor cache
Posted by Tomas Vondra on 2024-09-03 at 10:00

Let me present another “first patch” idea, related to runtime stats on access to the files storing data. Having this kind of information would be very valuable on instances with many files (which can happen for many reasons).

This is a very different area from the previous patch idea, which was about an extension. The runtime stats are at the core of the system, and so is the interaction with the file system. But it’s still fairly isolated, and thus suitable for new contributors.

Deploying Your First PostgreSQL Clusters on Kubernetes with Cloud Native Postgres
Posted by semab tariq in Stormatics on 2024-09-03 at 07:54

CloudNativePG is an open-source operator designed to manage PostgreSQL workloads on any supported Kubernetes cluster running in private, public, hybrid, or multi-cloud environments. CloudNativePG adheres to DevOps principles and concepts such as declarative configuration and immutable infrastructure.

The post Deploying Your First PostgreSQL Clusters on Kubernetes with Cloud Native Postgres appeared first on Stormatics.

How to test a PostgreSQL Commitfest patch in Kubernetes
Posted by Gabriele Bartolini in EDB on 2024-09-03 at 07:01

In this article, I’ll explore how Kubernetes and CloudNativePG can be leveraged to streamline the testing of PostgreSQL patches, especially during a Commitfest. By automating the creation of PostgreSQL container images from any public Git repository, developers and reviewers can quickly and efficiently test patches within a Kubernetes environment. This approach not only saves time but also enhances the reliability of patches before integration into the main codebase.

VACUUM: Managing and detecting bloat
Posted by Hans-Juergen Schoenig in Cybertec on 2024-09-03 at 07:00

In this introduction, we will explore the concept of VACUUM in PostgreSQL, its importance, and best practices for implementing effective vacuuming strategies to ensure your database remains efficient, performant, and reliable. Before we dive into strategies, problem detection and so on, it is important to understand why VACUUM is needed in the first place.

VACUUM and database transactions

VACUUM will always exist in PostgreSQL and many myths have formed around it. Therefore, it is important to shed some light on why VACUUM is necessary.  The reason why VACUUM is so relevant can be summed up in one word and one word only: “Transactions”. 

Why is that the case? Consider the following example:

Transaction 1                               Transaction 2
SELECT count(*) FROM tab;
… returns 0 …
BEGIN;
INSERT INTO tab
    VALUES (1), (2), (3);
SELECT count(*) FROM tab;                   SELECT count(*) FROM tab;
… returns 3 …                               … returns 0 …
ROLLBACK; ← 3 rows are now eliminated

What is important to understand is that our two sessions will see different data. The left session will actually see 3 rows, while the right session, happening at the same time, won’t see any rows because the left transaction has not been committed yet. 

The question naturally arising is: How can we get rid of those 3 rows that are subject to our ROLLBACK? In contrast to MS SQL and Oracle, PostgreSQL does NOT remove those rows from disk during a ROLLBACK. That is the reason why ROLLBACK is lightning fast in PostgreSQL and usually a real

[...]

PostgreSQL LTO battle
Posted by Peter Eisentraut in EDB on 2024-09-03 at 04:00

I wrote recently about the performance of PostgreSQL when compiled with different compilers and optimization levels. Another dimension in that evaluation is link-time optimization (LTO).

LTO allows the compiler to perform optimizations across source-file boundaries (more correctly, compilation unit boundaries).

Sounds useful, so let’s give it a try. I’m using the same setup as in the previous article, except this time I’m up to tag REL_16_4. To enable LTO, I’m using the meson option -Db_lto.

As before, the numbers in the following table are transactions per second (tps) as reported by pgbench (higher is better):

OS         Compiler   LTO     debugoptimized   release
macOS 14   gcc-14     false   48063            48248
                      true    49554            50427
           clang-18   false   48143            47864
                      true    50014            49935

These results show that there is a positive effect from LTO, but it is quite limited. The improvement is about 3–5% tps, which is not nothing, but it’s also not overwhelming.

I also tried with the “release” buildtype in the hope that it would result in additional impact, maybe because the compiler is trying harder, but it was not there.

My recommendation here is the same as for the optimization level: Try to stick with what most people are using. If, say, some Linux distribution switched all their builds to use LTO, then ok. But otherwise, you don’t want to be the only one running a particular configuration in production.

Postgres Happy Hour with Xata at Open Source Summit Europe
Posted by Gülçin Yıldırım Jelínek in Xata on 2024-09-03 at 00:00
Join us for the Postgres Happy Hour in Vienna on September 18th!

Contributions of w/c 2024-08-26 (week 35)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-09-02 at 19:36

“Sometimes, It’s Slow!”
Posted by Henrietta Dombrovskaya on 2024-09-02 at 15:52

These are my least favorite mysteries to resolve. I hate mysteries when it comes to performance problems. A performance mystery means I do not know how to make it work faster! However, I love solving mysteries—except when presented with a “sometimes it is slow” case.

Cases like “it was fast just yesterday, and now it is slow” are easier, because you have a pretty good idea of what you might want to check: stale statistics, increased data volumes, “not fitting into memory anymore” – all the usual suspects. But what do you do with “sometimes”?! What are these “times”? The only thing you can suspect is prepared statements, and if that’s not the case, you are out of ideas.
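
Why are prepared statements the classic suspect? Because after a handful of executions the planner may switch to a cached generic plan, and a generic plan that is terrible for some parameter values is precisely what produces "sometimes". A quick way to test that theory (table and parameter are made up):

PREPARE recent_orders(int) AS
  SELECT count(*) FROM orders WHERE customer_id = $1;
EXECUTE recent_orders(42);                  -- early executions get custom plans
-- After roughly five executions PostgreSQL may settle on a generic plan.
SET plan_cache_mode = force_custom_plan;    -- available since PostgreSQL 12; rules generic plans in or out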

***

Recently, I was the unfortunate recipient of one of these “sometimes it is slow” service tickets. The only advantage I had this time was a “new pgBadger” in place, so I could have a full view of what was happening inside the database during this slowness. Still, I could not explain what I saw. Yes, the process was definitely suboptimal, but I still could not explain what could make some queries wait for an extended time.

Finally, after examining multiple snapshots during the slow time, I figured it out. There were two contributing factors which had nothing to do with Postgres.

  1. All textbooks, not only the ones we used forty years ago but even the ones students use nowadays, suggest dropping all indexes and constraints before you start a bulk load and recreating them afterwards.
  2. Most ORMs offer the courtesy of adding BEGIN at the beginning of an interaction with a database and COMMIT/ROLLBACK at the end, even in the case of read-only transactions, and an app developer who uses Python has no idea about it!

In addition, there are some Postgres specifics that contributed, so let’s look at what exactly happened.

  • Why is #1 a horrible idea? It all depends on the total size of the table and its size relative to the number of records we are about to insert. If we have a table that already has 10M rows, and we will add 50K, dro
[...]

The 2024 State of the Database Landscape Survey
Posted by Ryan Booz on 2024-09-02 at 14:43
Redgate's State of the Database Landscape Survey is back again in 2024 and ready for your feedback! The 2023 survey gathered insights from nearly 4,000 data professionals about evolving database complexities and skill demands. The 2024 survey will focus on learning, AI, cloud, and security. Participate for insights and a chance to win Amazon vouchers.

Amul Sul
Posted by Andreas 'ads' Scherbaum on 2024-09-02 at 14:00
PostgreSQL Person of the Week Interview with Amul Sul: I am Amul Sul, and I grew up and completed my university studies in Mumbai. I did my master’s in computer application from VJTI, Mumbai. In 2012, NTT DATA brought me on board through campus placement. For three years, I dived deep into PostgreSQL, working on NTT DATA’s internal PostgreSQL project.

pg_easy_replicate Now Supports Tracking Schema Changes in PostgreSQL
Posted by Shayon Mukherjee on 2024-08-31 at 16:19
I have been meaning to support common DDLs (Data Definition Language) for pg_easy_replicate for quite some time now and I am super stoked that it is now finally out. This new capability addresses one of the limitations of PostgreSQL’s native logical replication, bringing more flexibility to database migrations and replication through pg_easy_replicate. What is pg_easy_replicate? For those new to the project, pg_easy_replicate is a CLI orchestrator tool that simplifies the process of setting up and managing logical replication between PostgreSQL databases.

Office hours experiment
Posted by Tomas Vondra on 2024-08-31 at 10:00
I’ve decided to experiment a little bit and do regular “office hours.” I’ll be available to chat about almost anything related to Postgres. It might be a technical discussion about a patch you’re working on, or a topic about the community etc. This is not an entirely new thing. I’ve been telling people to just ping me if they want to discuss something off-list, or have a call and chat about it.

Exploring PostgreSQL 17: A Developer’s Guide to New Features – Part 6: Login Event Trigger
Posted by Deepak Mahto on 2024-08-31 at 01:27

Welcome to Part 6 of our series exploring the exciting new features anticipated in the official PostgreSQL 17 release. In this series, we delve into newly added features and discuss how they can benefit database developers and migration engineers transitioning to PostgreSQL 17 in the future.

In case you missed the earlier parts:

Ready to enhance your PostgreSQL development skills? My course on PostgreSQL and PL/pgSQL will help you master database development. Click here to start your journey, and use code DBROOKIE20 to get 20% off!

Event Trigger during Database Connections in PostgreSQL 17

In databases, the login event is the first interaction a user has when connecting, making it an ideal candidate to trigger functionality based on different use cases. Logon triggers are used for various purposes, such as logging, enforcing business constraints, or populating global variables within database sessions.
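
As a minimal sketch of the new capability (my own example, simpler than the one developed below), a login trigger can log every successful connection:

CREATE FUNCTION log_login() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
  RAISE LOG 'login by % into database %', session_user, current_database();
END;
$$;

CREATE EVENT TRIGGER on_login_logging
  ON login
  EXECUTE FUNCTION log_login();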

If you are wondering how connections were logged until now: PostgreSQL already provides the log_connections and log_disconnections configuration parameters to enable this at the instance level.

Let’s explore Login Event Triggers with an example that mandates checks on application_name for every connection to a database.

Example: Enforcing application_name for Every Connection

In the example below, we create a Login Trigger that ensures all connections specify an application_name. Setting application_name can be useful for code instrumentation and root cause analysis; it is also part of the pg_stat_activity view, which can be used for snapshots of database or connection processes at the application level.

CREATE OR REPLACE FUNCTION mandate_app_info() RETURNS event_trigger AS $$
BEGIN
  IF nullif(current_setting
[...]

Contributions of w/c 2024-08-19 (week 34)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-08-30 at 11:34

The annual PgUS membership meeting was prepared and conducted by:

  • Stacey Haysler (President)
  • Mark Wong (Treasurer & Expo Committee)
  • Michael Brewer (Secretary)
  • Jonathan S. Katz (Director & Conference Committee)
  • Elizabeth Garrett Christensen (Director)
  • Chelsea Dole (Diversity Committee)
  • Valeria Kaplan (User Group Committee)
  • James Myers (Volunteer Coordinator)
  • Katharine Saar (Membership Coordinator)
