Latest Blog Posts

The bastard DBA from hell
Posted by Laurenz Albe in Cybertec on 2025-11-12 at 05:00

The lair of the bastard DBA from hell: a door inscribed "Database administration" with stickers saying "Enter at your own risk" and "Lasciate ogni speranza"
© Laurenz Albe 2025

While I sometimes write about real-life problems I encounter in consulting sessions (see this example), I admit that curiosity and the desire to play are the motivation for many of my articles (like this one). So I am glad that a friend of mine, whose job as a DBA puts him on the front line every day, offered to let me quote from his (unreleased) memoirs. Finally, this is material that I hope will connect with all the hard-working souls out there! My friend insists on anonymity, hence the “bastard DBA from hell” in the title.

The bastard DBA from hell fights the deadlocks

I like being a DBA. The only problem is the people using the databases.

A week ago I get a call from accounting. The guy sounds like I stole his sandwich, “We get deadlocks all the time.”

“Well, don't do it then,” I tell him. I should have known that this kind of person wouldn't listen to good advice.

“It's your Postgre system that gives me these deadlock errors. So you fix them”.

“As you wish.” >clickety click< “Ok, you shouldn't get any more deadlock errors now.”

He mumbles something and hangs up. It didn't sound like “thank you”. That's fine with me. After all, I set his lock_timeout to 500.
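Presumably the >clickety click< was something along these lines (the role name is invented for illustration):

-- a sketch: any lock wait longer than 500 ms now fails with a lock_timeout
-- error before the deadlock detector (deadlock_timeout, default 1s) ever fires
ALTER ROLE accounting_app SET lock_timeout = '500ms';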

The bastard DBA from hell fights table bloat

Yesterday the guy from accounting called me again, in that accusing, complaining tone, “Our disk is 95% full.”

“I know, I got an alert from the monitoring system. Actually, I sent you guys an e-mail about it. 10 days ago.”

“Can't you fix that?”

“Do you want me to extend the file system?”

“We have no more data in there than before. So why do we need a bigger disk?”

I tell him “hang on” and log into their system. >clickety click<

“Well, you got a 350GB table there that's taking up all the space. But it's 99% dead data... Did you create that prepared transaction a month ago and never commit it?” >clickety click< “There, gone now. Just run a VACUUM (FULL) on the table and you'll be good.”
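For the record, hunting down an orphaned prepared transaction looks roughly like this (a sketch; the GID is of course made up):

-- list prepared transactions that nobody ever committed or rolled back
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts
ORDER BY prepared;

-- get rid of the offender so VACUUM can finally clean up the dead rows
ROLLBACK PREPARED 'some_forgotten_gid';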

--- * ---

I thought I was r

[...]

Waiting for PostgreSQL 19 – Sequence synchronization in logical replication.
Posted by Hubert 'depesz' Lubaczewski on 2025-11-11 at 12:24
First, on 9th of October 2025, Amit Kapila committed patch: Add "ALL SEQUENCES" support to publications. This patch adds support for the ALL SEQUENCES clause in publications, enabling synchronization/replication of all sequences, which is useful for upgrades. Publications can now include all sequences via FOR ALL SEQUENCES. psql enhancements: \d shows publications for … Continue reading "Waiting for PostgreSQL 19 – Sequence synchronization in logical replication."
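Based on the commit message quoted above, the new syntax itself is compact; a minimal sketch against a PostgreSQL 19 development build (details may still change before release):

-- publish every sequence in the database, so subscribers can synchronize
-- sequence values (useful for major-version upgrades via logical replication)
CREATE PUBLICATION all_seq_pub FOR ALL SEQUENCES;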

PostgreSQL 18: More performance with index skip scans
Posted by Hans-Juergen Schoenig in Cybertec on 2025-11-11 at 06:00

PostgreSQL 18 brings a couple of performance-related features to the table which will help applications run more efficiently, providing a better and more enjoyable user experience. One of those performance features is called “skip scans”. Most of you might ask yourselves at this point: Wow, sounds cool, but what is a skip scan? The purpose of this post is to shed some light and explain how this works, what it does and, most importantly, how one can benefit from this feature in real life.

Preparing the database

To understand what a skip scan does, we first need to create some test data that is suitable for the challenge. Here is a classic example that illustrates the topic nicely:

test=> CREATE UNLOGGED TABLE tab (
frequent        integer, 
rare            integer
);
CREATE TABLE
test=> INSERT INTO tab 
SELECT  i / 100000, i % 100000 
FROM            generate_series(0, 999999) AS i;
INSERT 0 1000000
test=> VACUUM (ANALYZE) tab;
VACUUM
test=> CREATE INDEX ON tab (frequent, rare);
CREATE INDEX

We have created a table with just two columns. The first column contains just a handful of different values (so most of them are identical, making the index struggle). The second column contains many more distinct values. The challenge now is: how can we make efficient use of the combined index we created?
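The kind of query we ultimately care about filters only on the second column; a small sketch (on PostgreSQL 18, the planner may satisfy it from the composite index via a skip scan):

EXPLAIN SELECT * FROM tab WHERE rare = 1234;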

How btrees work in PostgreSQL

In general, a btree in PostgreSQL can be seen as a sophisticated version of a sorted list. This is also true if we are dealing with a composite index. In our case, a subset of the index might look like this:

[Figure: how a B-tree works]

A composite index is sorted by “frequent” and “rare” (just like a phone book). Of course, there are various sophistications such as deduplication and a lot more, but the basic principle remains the same: We are dealing with a list sorted by those columns (or expressions) in the index. If you want to learn more about indexing, consider the following posts:

[...]

Waiting for SQL:202y: GROUP BY ALL
Posted by Peter Eisentraut in EDB on 2025-11-11 at 05:00

Making GROUP BY a bit easier to use is in my experience among the top three requested features in SQL.

Like, if you do

CREATE TABLE t1 (a int, b int, ...);

SELECT a, avg(b) FROM t1 GROUP BY a;

the column list in the GROUP BY clause doesn't convey much information. Of course you wanted to group by a; there is no other reasonable alternative. You can't not group by a, because that would be an error, and you can't group by things besides a, because there is nothing else in the select list other than the aggregate.

The problem gets worse if you have a longer select list or complicated expressions, because you need to repeat these in the GROUP BY clause and carefully keep them updated if you change something. (Or you can try to work around this by using subqueries.) Could be easier!

A number of implementations have adopted the syntax GROUP BY ALL to simplify this.

SELECT a, avg(b) FROM t1 GROUP BY ALL;

The SQL standard working group discussed this feature informally at the June 2025 meeting, and there was consensus about going forward with it.

At the September 2025 meeting, a formal change proposal was brought forward and accepted. (So, technically, it’s just a working draft right now, and it won’t be final until the standard is released.)

The formal meaning of GROUP BY ALL is that it expands to a list of the elements of the select list that do not contain aggregate functions. So in

SELECT a, avg(b) FROM t1 GROUP BY ALL;

a does not contain an aggregate function, but avg(b) does, so GROUP BY ALL resolves to GROUP BY a, as expected.

This doesn’t completely remove the need for explicit GROUP BY lists. Consider a more complicated case like

CREATE TABLE t1 (a int, b int, c int, d int, ...);

SELECT a, avg(b) + c + d FROM t1 GROUP BY ALL;

Here, a does not contain an aggregate function, but avg(b) + c + d does contain an aggregate function, so the query resolves to

SELECT a, avg(b) + c + d FROM t1 GROUP BY a;

But that is not valid, because you need t

[...]

#PostgresMarathon 2-013: Why keep your index set lean
Posted by Nikolay Samokhvalov in Postgres.ai on 2025-11-10 at 23:59

Your API is slowing down. You check your database and find 42 indexes on your users table. Which ones can you safely drop? How much performance are they costing you? Let's look at what actually happens in Postgres when you have too many indexes.

If you're a backend or full-stack engineer, you probably don't want to become an indexing expert — you just want your API fast and stable, without babysitting pg_stat_user_indexes.

Index maintenance includes multiple activities: dropping unused indexes, dropping redundant indexes, and rebuilding indexes on a regular basis to get rid of index bloat (and of course, keeping autovacuum well tuned).

There are many reasons why we need to keep our index set lean, and some of them are tricky.

Why drop unused and redundant indexes

I have been collecting these ideas over the years. Here's the current list (more to come):

  1. Extra indexes slow down writes — infamous "index write amplification"
  2. Extra indexes can slow down SELECTs, sometimes radically (surprising but true)
  3. Extra indexes waste disk space
  4. Extra indexes pollute buffer pool and OS page cache
  5. Extra indexes increase autovacuum work
  6. Extra indexes generate more WAL, affecting replication and backups

As for index bloat, reasons 3-6 apply to bloated indexes as well. Plus, if an index is extremely bloated (e.g., 90%+ bloat, or more than 10x the optimal size), index scan latencies suffer. The Postgres B-tree implementation lacks merge operations — once a page splits, those pages never merge back together, even after deletions. Over time, this leads to increasing fragmentation and bloat. Deduplication in PG13+ helps compress duplicate keys, and bottom-up deletion in PG14+ reduces bloat by removing dead tuples more aggressively during insertions. However, these features don't address structural degradation from page splits. Regular monitoring and rebuilding of bloated indexes remains essential maintenance work.
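A quick way to spot candidates for dropping (a sketch, not the author's exact tooling; remember that a stats reset or usage on standbys can mislead you):

SELECT schemaname, relname, indexrelname,
       idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0          -- never used since the last stats reset
ORDER BY pg_relation_size(indexrelid) DESC;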

Let's examine each item from the list, studying Postgres source code (here, it's Postgre

[...]

PgPedia Week, 2025-11-09
Posted by Ian Barwick on 2025-11-10 at 14:45

This quarter's round of minor releases is expected later this week. Note this will be the last minor release for PostgreSQL 13.

PostgreSQL 19 changes this week:

  • logging: wal_fpi_bytes value shown in log entries for VACUUM and ANALYZE operations
  • logical replication: support for sequence synchronization added
  • pgbench: option --continue-on-error added
  • pg_stat_subscription_stats: column seq_sync_error_count added
  • psql: \pset variables display_true and display_false added
  • SET NULL now accepted for specifying an empty list
  • WAIT FOR SQL command added

PostgreSQL 18 articles:

  • PostgreSQL 18 enables data‑checksums by default (2025-11-03) - Josef Machytka / credativ
  • What do the new Index Searches lines in EXPLAIN mean? (2025-10-10) - Michael Christofides / pgMustard

more...

Contributions for week 45, 2025
Posted by Cornelia Biacsics in postgres-contrib.org on 2025-11-09 at 20:36

New podcast episode published by Claire Giordano from her series “Talking Postgres”: Building a dev experience for Postgres in VS Code with Rob Emanuele

Blog posts

PgPedia Week, 2025-11-02
Posted by Ian Barwick on 2025-11-08 at 23:53
PostgreSQL 19 changes this week:

  • EXPLAIN: wal_fpi_bytes value exposed in EXPLAIN (WAL)
  • pg_stat_wal: column wal_fpi_bytes added
  • pg_stat_get_backend_wal(): output column wal_fpi_bytes added
  • psql: prompt configuration option %S added, showing the current value of search_path

PostgreSQL 19 articles:

  • Waiting for PostgreSQL 19 – Add psql PROMPT variable for search_path. (2025-10-29) - Hubert 'depesz' Lubaczewski

PostgreSQL 18 articles:

  • Install PostgreSQL 18 on Ubuntu 25.10 (2025-10-31) - Paolo Melchiorre

more...

Postgres Internals Hiding in Plain Sight
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2025-11-07 at 13:00

Postgres has an awesome amount of data collected in its own internal tables. Postgres hackers know all about this - but software developers and folks working on day-to-day Postgres tasks often miss out on the good stuff.

The Postgres catalog is how Postgres keeps track of itself. Of course, Postgres would do this in a relational database with its own schema. Throughout the years, several nice features have been added on top of the internal tables, like psql tools and views, that make navigating Postgres' internal tables even easier.

Today I want to walk through some of the most important Postgres internal data catalog details. What they are, what is in them, and how they might help you understand more about what is happening inside your database.

psql’s catalog information

The easiest way to get at some of Postgres' internal catalogs is to use the built-in psql commands, which generally begin with \d. Here are some common ones users should be comfortable using:

\d {tablename}: describes a specific table. \d will do a lot of things if you qualify \d with a table or view name.

\di: list all your indexes

\dx: list installed extensions

\dp: to show access privileges

\dp+: tables and views with the roles and access details

\dconfig: your current configuration settings

\dt {tablename}: list tables (optionally restricted to a name pattern)

\dti+: tables and indexes with sizes

\dg+: show role names

\df:  show your functions

\dv {view name}: describe a view

\l: lists all your databases

Important Postgres catalog views

Postgres exposes many of the complex internals of the database system in easy-to-query views. These host a wealth of information about what is going on inside your database and give direct SQL access to answer in-the-moment emergency questions like “what is taking up all my CPU” and longer-term questions like “what are my 10 slowest queries”.
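For example, a quick (and deliberately simple) sketch against pg_stat_activity, described next, that surfaces the longest-running active queries:

SELECT pid, now() - query_start AS duration, state, left(query, 60) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY duration DESC NULLS LAST
LIMIT 10;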

pg_stat_activity

Shows current database activity, including running queries, state, and client information. Essential for troubleshoot

[...]

pg_statviz 0.8 released with PostgreSQL 18 support
Posted by Jimmy Angelakos on 2025-11-06 at 19:00

pg_statviz logo

I'm happy to announce release 0.8 of pg_statviz, the minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.

This release adds support for PostgreSQL 18, adapting to significant catalog view changes introduced in this release:

  • I/O statistics now include byte-based metrics: The pg_stat_io view now reports I/O activity in bytes (read_bytes, write_bytes, extend_bytes) rather than just operation counts, providing more granular insight into system I/O patterns.
  • WAL statistics reorganization: PostgreSQL 18 moves WAL write and sync statistics from pg_stat_wal to pg_stat_io (where object = 'wal'), consolidating I/O metrics across different subsystems.

The extension automatically detects your PostgreSQL version and uses the appropriate catalog views, maintaining backward compatibility with PostgreSQL 14 through 17 while taking advantage of the enhanced metrics available in PostgreSQL 18. The Python utility has also been updated to handle both the new byte-based I/O metrics and the operation counts from earlier versions.

This release also bumps the minimum Python requirement to 3.11+ and updates the numpy dependency to 2.3.4.

pg_statviz takes the view that everything should be light and minimal. Unlike commercial monitoring platforms, it doesn't require invasive agents or open connections to the database — it all lives inside your database. The extension is plain SQL and pl/pgSQL and doesn't require modules to be loaded, the visualization utility is separate and can be run from anywhere, and your data is free and easy to export.

  • You can download and install pg_statviz from the PostgreSQL repositories or PGXN.
  • The utility can also be installed from PyPi.
  • Manual installation is also possible.

PostgreSQL 13 Is Reaching End of Life. The Time to Upgrade is Now!
Posted by Jobin Augustine in Percona on 2025-11-06 at 15:53
PostgreSQL 13 will officially reach End-of-Life (EOL) on November 13, 2025. After this date, the PostgreSQL Global Development Group will stop releasing security patches and bug fixes for this version. That means if you’re still running PostgreSQL 13, you’ll soon be on your own with no updates, no community support, and growing security risks. Why […]

PostGIS Performance: Improve Bounding Boxes with Decompose and Subdivide
Posted by Paul Ramsey in Crunchy Data on 2025-11-06 at 13:00

In the third installment of the PostGIS Performance series, I wanted to talk about performance around bounding boxes.

Geometry data is different from most column types you find in a relational database. The objects in a geometry column can be wildly different in the amount of the data domain they cover, and the amount of physical size they take up on disk.

The data in the “admin0” Natural Earth dataset range from the 1.2-hectare Vatican City to the 1.6-billion-hectare Russia, and from the 4-point polygon defining Serranilla Bank to the 68 thousand points of the polygons defining Canada.

SELECT ST_NPoints(geom) AS npoints, name
FROM admin0
ORDER BY 1 DESC LIMIT 5;

SELECT ST_Area(geom::geography) AS area, name
FROM admin0
ORDER BY 1 DESC LIMIT 5;


As you can imagine, polygons this different will have different performance characteristics:

  • Physically large objects will take longer to work with. To pull off the disk, to scan, to calculate with.
  • Geographically large objects will cover more other objects, and reduce the effectiveness of your indexes.

Your spatial indexes are “r-tree” indexes, where each object is represented by a bounding box.


The bounding boxes can overlap, and it is possible for some boxes to cover a lot of the dataset.

For example, here is the bounding box of France.

[Figure: the bounding box of France]

What?! How is that France? Well, France is more than just the European parts; it also includes the island of Reunion, in the southern Indian Ocean, and the island of Guadeloupe, in the Caribbean. Taken together, they result in this very large bounding box.

Such a large box makes a poor addition to the spatial index of all the objects in “admin0”. I could be searching with a query key in the middle of the Atlantic, and the index would still be telling me “maybe it is in France?”.
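One of the remedies the title hints at is ST_Subdivide, which breaks large geometries into small pieces so each indexed bounding box stays tight. A rough sketch (table and column names follow the example above; the vertex limit is arbitrary):

CREATE TABLE admin0_subdivided AS
  SELECT name, ST_Subdivide(geom, 255) AS geom
  FROM admin0;

CREATE INDEX admin0_subdivided_geom_idx
  ON admin0_subdivided USING GIST (geom);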

For this testing, I have made a synthetic dataset of one million random points covering the whole world.

CREATE TABLE random_normal AS
  SELECT id,
    ST_Point(
      random_normal(0, 180),
      r
[...]

PgPedia Week, 2025-10-26
Posted by Ian Barwick on 2025-11-06 at 10:53
PostgreSQL 19 changes this week:

  • ALTER SUBSCRIPTION: REFRESH SEQUENCES syntax added
  • error_on_null() added for checking if the input is the NULL value
  • COPY TO now works with partitioned tables
  • full text search: database default collation now used for parsing
  • psql: improve tab completion for large objects

PostgreSQL 19 articles:

  • Waiting for PostgreSQL 19 – Support COPY TO for partitioned tables. (2025-10-22) - Hubert 'depesz' Lubaczewski

PostgreSQL 18 articles:

  • Say Hello to OIDC in PostgreSQL 18! (2025-10-22) - Jan Wieremjewicz / Percona

more...

Seamless PostgreSQL subscriber upgrades: Preserving Logical Replication state
Posted by vignesh C in Fujitsu on 2025-11-05 at 23:59

Upgrading to a new major version of PostgreSQL has become faster and safer with pg_upgrade, but for teams using logical replication, a persistent pain point remained: the state of their subscribers was lost, forcing a tedious and high-risk rebuild of their replication setups.

Producing UUIDs Version 7 disguised as Version 4 (or 8)
Posted by Daniel Vérité on 2025-11-05 at 14:13
When communicating UUID-v7 values to third parties, the creation dates inside them are leaked. In this post, let's see how to encrypt the timestamp to solve that problem.

PgPedia Week, 2025-10-19
Posted by Ian Barwick on 2025-11-05 at 13:30

Due to an unfortunate recent visitation by the Virus of the Decade (so far), I have a backlog of these which I'm trying to work through, so in the remote chance anyone is waiting with bated breath for the newest editions, my apologies. Normal service will be resumed as soon as humanly possible.

PostgreSQL 19 changes this week:

  • Add log_autoanalyze_min_duration

PostgreSQL 19 articles:

  • Waiting for PostgreSQL 19 – Add IGNORE NULLS/RESPECT NULLS option to Window functions. (2025-10-13) - Hubert 'depesz' Lubaczewski

PostgreSQL 18 articles:

  • Benchmarking Postgres 17 vs 18 (2025-10-14) - Ben Dickens / PlanetScale

more...

October PUG Recording
Posted by Henrietta Dombrovskaya on 2025-11-05 at 11:28

Almost a month late, but I hope you enjoy it!

NUMA, Linux, and PostgreSQL before libnuma Support
Posted by Chris Travers in Stormatics on 2025-11-05 at 09:26

PostgreSQL and NUMA, part 2 of 4

This series covers the specifics of running PostgreSQL on large systems with many processors. My experience is that people often spend months learning the basics when confronted with the problem. This series tries to ease these difficulties by providing a clear background on the topics in question. The hope is that future generations of database engineers and administrators don't have to spend months figuring things out through trial and error.

This article builds on the previous entry, “An Introduction to NUMA” and assumes a general working knowledge of NUMA not only in definition but in hardware and software implementation as well.

This entry in this series covers the general interaction of PostgreSQL and Linux on NUMA systems. This topic is complex and so there are cases of simplification. However this distills the general information about running PostgreSQL 17 and below (or Postgres 18 without libnuma support) on NUMA systems, with Linux as a reference point. By the end of this blog entry, you should both be able to run Postgres on a NUMA system and also understand why the libnuma support in PostgreSQL 18 is so important.

A Quick Recap

NUMA represents an effort at creating an architecture that localizes parallel memory access by allocating memory to specific controllers, which serialize access to the memory they control.  Memory controllers can ask other memory controllers for memory, and this is slower than asking for local memory.

The Linux kernel has several strategies available for allocating memory (including the executable memory for new threads and processes), and these can be tuned for specific processes by an administrator using the program numactl, though systemd also supports setting NUMA policies in the unit files. These include membind, interleave, and prefer, with a default of “prefer.”

The Linux kernel can also migrate processes to

[...]

PostgreSQL Partition Pruning: The Role of Function Volatility
Posted by Deepak Mahto on 2025-11-04 at 14:39

In one of our earlier blogs, we explored how improper volatility settings in PL/pgSQL functions — namely using IMMUTABLE, STABLE, or VOLATILE — can lead to unexpected behavior and performance issues during migrations.

Today, let’s revisit that topic from a slightly different lens. This time, we’re not talking about your user-defined functions, but the ones PostgreSQL itself provides — and how their volatility can quietly shape your query performance, especially when you’re using partitioned tables.

⚙ When Partitioning Doesn’t Perform

When designing partitioning strategies, performance is usually the first thing that comes to mind.
But what if, despite all your tuning, the queries still fail to leverage partition pruning effectively?

The answer sometimes lies not in your schema or indexes, but in how your filters are evaluated.

Every function you use in a filter has its own volatility behavior that determines how it’s planned and executed. You might pick the right filter logically — but if the function is volatile, it can completely change how PostgreSQL builds and optimizes the plan.
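You can check the volatility of any built-in function directly in pg_proc; a small sketch (note that some functions, like date_trunc, have several overloads with different volatility):

SELECT oid::regprocedure AS function,
       provolatile          -- 'i' = immutable, 's' = stable, 'v' = volatile
FROM pg_proc
WHERE proname IN ('now', 'clock_timestamp', 'date_trunc');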

Let's say you've designed a table with daily partitions, use date functions as partition filters, and check the underlying execution plan.

CREATE TABLE IF NOT EXISTS events_daily (
    id bigint generated always as identity ,
    created_at timestamptz NOT NULL,
    created_date date NOT NULL,      
    user_id int NOT NULL,
    amount numeric(10,2) NOT NULL,
    status text,
    payload text
) PARTITION BY RANGE (created_date);

--create a year of partitions as daily
DO
$$
DECLARE
    start_date date := current_date - INTERVAL '364 days';
    d date;
    part_name text;
BEGIN
    d := start_date;
    WHILE d <= current_date LOOP
        part_name := format('events_p_%s', to_char(d, 'YYYYMMDD'));
        EXECUTE format($sql$
            CREATE TABLE IF NOT EXISTS %I PARTITION OF events_daily
            FOR VALUES FROM (%L) TO (%L);
        $sql$, part_name, d::text, (d + INTERVAL '1 day')::da
[...]

Counting Customers in PostgreSQL
Posted by Hans-Juergen Schoenig in Cybertec on 2025-11-04 at 06:00

As a database consulting company, we are often faced with analytics- and reporting-related tasks which seem easy on the surface but are in reality not that trivial. The list of those seemingly simple things is longer than one might think, especially in the area of reporting.

The purpose of this post is not to share some fancy SQL techniques, some super cool magic AI tool that does whatever new hot thing or some crazy coding technique one can use in PostgreSQL. This is all about two simple things that are often forgotten or just neglected. We are of course talking about “awareness” and “clarity”.

Yes, it is that simple. If you don’t know what you have or if you don’t know what you want, all data is worthless (in a best case scenario) or even dangerous. 

Simple Things: Counting Customers

One of those seemingly simple things might be related to a trivial question: “How many customers do you have?”. Every CEO will proudly talk about this number but have you ever asked those guys a simple question: “What is a customer?”.

Well, just try. In my experience the process often starts with an arrogant smile that indicates “What stupid question is that?”. But just dig deeper and keep asking for details. I can assure you: In the majority of cases, people have no clue what they are talking about.

Here are some typical scenarios:

  • If a person has not ordered for a long time, when do you stop counting him as a customer?
  • If two customers merge into a single company and rebrand: How many customers do you have?
  • If a customer has ordered an item and sent it back immediately without paying for it, is that still a customer?

One could continue this list forever and a day. In other words: it is not that simple, and many “business people” tend to think that displaying things in a pie chart has the power to change the fundamental problem: “What do you actually count?”

Example: Counting Customers in Banking

The following data structure shows a simple scenario: A bank wants t

[...]

ALTER Egos: Me, Myself, and Cursor
Posted by Mayur B. on 2025-11-04 at 05:56

I pushed the most boring change imaginable: add an index. Our CI/CD pipeline is textbook ==> spin up a fresh DB, run every migration file in one single transaction, in sequential order. If anything hiccups, the whole thing rolls back and the change never hits main. Foolproof autotests.

Enter The Drama Queen:

ERROR: cannot CREATE INDEX "index_name_xyz" on table abcdef_tab
because it is being used by active queries in this session

This session. Same PID, same transaction. No parallel runner. No second connection. Still blocked.

Our schema migration repository structure is as shown above. The CI tool will create a fresh DB → execute all scripts chronologically → one by one → one transaction → one session, as part of an autotest, before merging any new code (DML/DDL for schema migration = infra as code) to the main branch.

Naturally ignoring the last part of the error message like a seasoned DBA, I accused the CI tool of going rogue; maybe someone “optimized” migrations to run in parallel after reading an LLM pep talk about “unleashing concurrency.”

I printed the PID at the start and end of every file.

-- Stamp the migration transaction
SELECT 'start', pg_backend_pid(), txid_current();
-- ... run all migration steps ...
SELECT 'end', pg_backend_pid(), txid_current();

Same PID. Same transaction. Same sadness.
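For the curious, one simple way to reproduce this class of error in a single session is an open cursor on the table (a sketch with throwaway names, not necessarily what bit our pipeline):

BEGIN;
CREATE TABLE demo_tab (id int);
DECLARE c CURSOR FOR SELECT * FROM demo_tab;
-- the open cursor keeps the relation "in use by active queries in this session"
CREATE INDEX demo_idx ON demo_tab (id);   -- fails with that error
ROLLBACK;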

One of the least talked about but best features of Postgres as a FOSS project is that you can search for an error message in the code base, and the documentation is excellent.

  • In indexcmds.c, PostgreSQL calls CheckTableNotInUse(rel, "CREATE INDEX"); before proceeding. The comment explains why: otherwise an in-progress INSERT/UPDATE in this same session could have already picked its list of target indexes and would not update the new one. doxygen.postgresql.org
  • CheckTableNotInUse() (in tablecmds.c) raises
    ERROR: cannot %s
[...]

Do you really need tsvector column?
Posted by Hubert 'depesz' Lubaczewski on 2025-11-03 at 12:34
When using tsearch one usually creates a tsvector column to put data in, and then creates an index on it. But do you really need the index? I wrote once already that you don't have to, but then a person talked with me on IRC and pointed to this section of the docs: One advantage of the … Continue reading "Do you really need tsvector column?"

PostgreSQL 18 enables data‑checksums by default
Posted by Josef Machytka in credativ on 2025-11-03 at 09:30

As I explained in my talk at PostgreSQL Conference Europe 2025, data corruption can be silently present in any PostgreSQL database and will remain undetected until we physically read the corrupted data. There can be many reasons why some data blocks in tables or other objects become damaged. Even modern storage hardware is far from infallible. Binary backups done with the pg_basebackup tool – a very common backup strategy in PostgreSQL environments – leave these problems hidden, because they do not check the data but copy whole data files as they are. With the release of PostgreSQL 18, the community decided to turn on data‑checksums by default – a major step toward early detection of these failures. This post examines how PostgreSQL implements checksums, how it handles checksum failures, and how we can enable them on existing clusters.
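Before diving in, here is a quick sketch for checking where a given cluster stands (the pg_stat_database counters exist since PostgreSQL 12):

-- is the cluster running with checksums enabled?
SHOW data_checksums;

-- have any checksum failures been detected so far, and when?
SELECT datname, checksum_failures, checksum_last_failure
FROM pg_stat_database
WHERE checksum_failures > 0;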

Why PostgreSQL Checksums matter


A PostgreSQL table or index is stored in 8 KB pages. When a page is written to disk, PostgreSQL computes a 16‑bit checksum using every byte of the page (except the checksum field itself) and the page's physical block address. The checksum is stored in the page header. On every read, PostgreSQL recalculates the checksum and compares it against the stored value. Because the block address is part of the calculation, the system detects both bit flips within the page and pages written to the wrong place. Checksums are not maintained while the page sits in shared buffers – they are computed only when the page is flushed from the buffer cache to the operating system page cache. Consequently, an incorrect in‑memory page cannot be detected until it is written and read again. PostgreSQL uses a fast FNV‑1a hash (with CRC32C on WAL records) that is optimized for performance. On typical hardware the cost of calculating checksums seems to be small. Benchmarking studies found the penalty is usually less than 2 % for normal workloads. PostgreSQL 18's release notes acknowledge that the overhead is non‑zero but accept it for the benefit of data integrit

[...]

Beyond Start and End: PostgreSQL Range Types
Posted by Radim Marek on 2025-11-02 at 20:45

Beyond Start and End columns with PostgreSQL range types

One of the most read articles at boringSQL is Time to Better Know The Time in PostgreSQL where we dived into the complexities of storing and handling time operations in PostgreSQL. While the article introduced the range data types, there's so much more to them. And not only for handling time ranges. In this article we will cover why to consider range types and how to work with them.

Bug Not Invented Here

But before we can talk about range types, let's try to understand why we should look at them in the first place. Let's imagine a booking platform for a large flash sale of seats, which goes live at 10pm and will be taken by storm by thousands of people who want to get their tickets.

CREATE TABLE seat_holds (
    hold_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,

    seat_id INTEGER NOT NULL REFERENCES seats(id),
    user_session_id UUID NOT NULL,
    -- define the hold period explicitly
    hold_started_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    hold_expires_at TIMESTAMPTZ NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX seat_holds_on_seat_id ON seat_holds(seat_id);
CREATE INDEX seat_holds_on_expiration ON seat_holds(hold_expires_at);

While the table design looks perfectly reasonable, it has one serious flaw - there's no database-level atomicity guarantee to prevent two holds for the same seat_id at the same time. The design relies on application logic to check for existing holds before inserting a new hold, and at the same time it does not provide any high-concurrency guarantee.
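This is exactly where a range type plus an exclusion constraint shines; a minimal sketch of the direction the article takes (assuming the btree_gist extension is available, and leaving out the foreign key for brevity):

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE seat_holds_ranged (
    hold_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    seat_id INTEGER NOT NULL,
    user_session_id UUID NOT NULL,
    hold_period TSTZRANGE NOT NULL,
    -- no two holds for the same seat may overlap in time
    EXCLUDE USING GIST (seat_id WITH =, hold_period WITH &&)
);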

If all you have in your toolbelt are those two columns, you will end up increasing the complexity to make it work. You started with one problem, and soon your application developers might ask you to add a caching layer (most likely an external K/V store) to place the holds there, and very soon you have N problems as you resort to building a complex, custom application

[...]

Contributions for week 44, 2025
Posted by Cornelia Biacsics in postgres-contrib.org on 2025-11-02 at 13:32

PostgreSQL received attention through the following contributions at Data Stack Conf 2025 on Oct 29, 2025:

Speakers

  • Radoslav Stanoev
  • Pavlo Golub
  • Lætitia Avrot
  • Valeria Bogatyreva
  • Devrim Gündüz

PostgreSQL Booth Staff

  • Devrim Gündüz
  • Pavlo Golub

Gabriele Quaresima spoke at Cloud Native Bergen on Tuesday, October 28, 2025.

Community Blog Posts

Migration From MySQL To PostgreSQL In Five Steps Using DBeaver
Posted by Dave Stokes on 2025-11-01 at 15:21

 I wrote a post in my MySQL blog on migrating from MySQL to PostgreSQL using DBeaver. You can pass it along to your acquaintances who want to get off the Dolphin and on the Elephant. 

Not only will DBeaver move your tables and data, but you can compare them afterwards. In the post, I outline the process in five steps. DBeaver will let you do it in four. 

Please share this blog with your friends to promote PostgreSQL. 

Meeting High Availability Requirements in Non-Distributed PostgreSQL Deployments
Posted by Antony Pegg in pgEdge on 2025-10-31 at 18:25

High availability in PostgreSQL doesn't always require a globally distributed architecture. Sometimes you need reliable failover and replication within a single datacentre or region. pgEdge Enterprise Postgres handles this scenario with a production-ready PostgreSQL distribution that includes the tools you need for high availability out of the box.

What You Get

pgEdge Enterprise Postgres bundles PostgreSQL with components you'd otherwise install and configure separately. The distribution includes pgBouncer for connection pooling, pgBackRest for backup and restore, pgAudit for audit logging, and pgAdmin for database management. You also get PostGIS and pgVector if you need geographic or vector comparison capabilities. High availability support is added through open-source PostgreSQL extensions from pgEdge (all available to work with on GitHub), like spock, lolor, and snowflake sequences. The package supports PostgreSQL versions 16, 17, and 18 running on Red Hat Enterprise Linux v9 and v10 (including Rocky, Alma, and Oracle Enterprise) on both x86 and ARM architectures. You can currently deploy it as a VM, and we'll be adding container and managed cloud editions soon (keep an eye on our social channels for updates along the way).

High Availability Through Physical and Logical Replication

pgEdge Enterprise Postgres supports physical replication with automated failover for traditional high availability setups. This works for the standard scenario where you need a primary database with standby replicas ready to take over if the primary fails. For more advanced scenarios, Spock comes bundled out-of-the-box to enable logical multi-master replication. Unlike physical replication, which copies entire database clusters at the block level, Spock uses PostgreSQL's logical decoding to replicate individual table changes between nodes. This enables active-active deployments where multiple nodes can accept writes simultaneously.

Spock: Multi-Master Replication for High Availability

Spock provides multi[...]

Don't give Postgres too much memory
Posted by Tomas Vondra on 2025-10-31 at 13:00

From time to time I get to investigate issues with some sort of a batch process. It’s getting more and more common that such processes use very high memory limits (maintenance_work_mem and work_mem). I suppose some DBAs follow the logic that “more is better”, not realizing it can hurt the performance quite a bit.

Let me demonstrate this using an example I ran across while testing a fix for parallel builds of GIN indexes. The bug is not particularly interesting or complex, but it required a fairly high value for maintenance_work_mem (the initial report used 20GB).

What Are “Dirty Pages” in PostgreSQL?
Posted by Umair Shahid in Stormatics on 2025-10-31 at 07:52

PostgreSQL stores data in fixed‑size blocks (pages), normally 8 KB. When a client updates or inserts data, PostgreSQL does not immediately write those changes to disk. Instead, it loads the affected page into shared memory (shared buffers), makes the modification there, and marks the page as dirty. A “dirty page” means the version of that page in memory is newer than the on‑disk copy.

Before returning control to the client, PostgreSQL records the change in the Write‑Ahead Log (WAL), ensuring durability even if the database crashes. However, the actual table file isn’t updated until a checkpoint or background writer flushes the dirty page to disk. Dirty pages accumulate in memory until they are flushed by one of three mechanisms:

  • Background Writer (BGWriter) – a daemon process that continuously writes dirty pages to disk when the number of clean buffers gets low.
  • Checkpointer – periodically flushes all dirty pages at a checkpoint (e.g., checkpoint_timeout interval or WAL exceeds max_wal_size).
  • Backend processes – in emergency situations (e.g., shared buffers are full of dirty pages), regular back‑end processes will write dirty pages themselves, potentially stalling user queries.

Understanding and controlling how and when these dirty pages are flushed is key to good PostgreSQL performance.
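A quick way to see which of these mechanisms is doing the flushing is the bgwriter statistics view; a sketch using the PostgreSQL 15-era column layout (in PG17+ the checkpoint columns moved to pg_stat_checkpointer and backend writes are reported in pg_stat_io):

SELECT buffers_checkpoint,   -- pages flushed by the checkpointer
       buffers_clean,        -- pages flushed by the background writer
       buffers_backend       -- pages flushed by ordinary backends (a warning sign)
FROM pg_stat_bgwriter;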

Why Dirty Pages Matter

Dirty pages affect performance in multiple ways:

  1. I/O Spikes During Checkpoints: When a checkpoint occurs, every dirty page must be flushed to disk. If many pages are dirty, this flush can produce huge I/O spikes that degrade performance for other queries. The checkpoint_timeout, checkpoint_completion_target, and max_wal_size parameters control how often and how aggressively checkpoints flush dirty pages.
  2. Backend Writes: When the shared buffer cache fills up with dirty pages and the BGWriter can’t keep up, bac
[...]

#PostgresMarathon 2-011: Prepared statements and partitioned tables — the paradox, part 3
Posted by Nikolay Samokhvalov in Postgres.ai on 2025-10-30 at 23:59

In #PostgresMarathon 2-009 and #PostgresMarathon 2-010, we explored why execution 6 causes a lock explosion when building a generic plan for partitioned tables — the planner must lock all 52 relations because it can't prune without parameter values.

Today we'll test what actually happens with different plan_cache_mode settings.

Let's test empirically on Postgres 18 with the 12-partition table from the previous posts. This time, we'll insert 1M rows into each partition to get realistic query plans with index scans:

-- Insert 1M rows into each partition
do $$
declare
  i int;
  start_date date;
  partition_name text;
begin
  for i in 0..11 loop
    start_date := '2024-01-01'::date + make_interval(months => i);
    partition_name := 'events_' || to_char(start_date, 'YYYY_MM');

    execute format(
      'insert into %I (event_id, event_time, event_data)
       select s, %L::timestamptz + (s * interval ''1 second''), ''data_'' || s
       from generate_series(1, 1000000) s',
      partition_name, start_date
    );
  end loop;
end $$;

vacuum analyze events;

Now let's see what different plan_cache_mode settings do.

Test 1: auto mode

As we already saw, with the default auto mode setting, Postgres decides whether to use custom or generic plans based on a cost comparison. For the first 5 runs it uses a custom plan; then, on the important 6th call, it builds the generic plan (and we get the "lock explosion" we already studied); and from the 7th call onward, it uses the generic plan.

But that was on empty tables. Here things will be different, because we have data. Let's see.

As before, prepare a statement and execute it multiple times, observing the number of relation-level locks and plan type counters:

prepare test (timestamptz) as
select event_id, event_data from events where event_time = $1;

-- Run this snippet to test each execution
begin;
explain (verbose) execute test(timestamptz '2024-06-06 00:00:00+00');

select
count(*) as lock_count,
array_agg(
distinct relation::reg
[...]
