Latest Blog Posts

fresh dll of orafce and plpgsql_check for PostgreSQL 17 and PostgreSQL 18
Posted by Pavel Stehule on 2025-12-20 at 06:25

I compiled and uploaded zip files with the latest orafce and plpgsql_check for PostgreSQL 17 and PostgreSQL 18, built with Microsoft Visual C++ 2022.

Setup:

  1. download orafce-4.16.3-x86_64-windows.zip or plpgsql_check-2.8.5-x86_64-windows.zip and extract the files
  2. copy the related dll file to the PostgreSQL lib directory (NN is the number of the PostgreSQL release)
    orafce-NN.dll -> "c:\Program Files\PostgreSQL\NN\lib"
  3. remove the suffix "x64-16" or "x64-17" from the dll file name
    orafce-NN.dll -> orafce.dll
  4. copy the *.sql and *.control files to the extension directory
    *.sql, *.control -> "c:\Program Files\PostgreSQL\NN\share\extension"
  5. execute the SQL command CREATE EXTENSION with superuser rights
    CREATE EXTENSION orafce;
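
Once the files are in place, a quick sanity check from psql confirms that PostgreSQL can see the extensions and which versions are available:

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('orafce', 'plpgsql_check');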




The OOM-Killer Summoning Ritual: “Just Increase work_mem”
Posted by Mayur B. on 2025-12-19 at 21:57

You’ve probably seen the incident pattern:

  1. Postgres backends start disappearing.
  2. dmesg / journalctl -k shows the kernel OOM killer reaping postgres.
  3. Someone spots “out of memory” and reflexively recommends: “Increase work_mem.”

That recommendation is frequently backwards for OS OOM kills.

The linguistic trap: “Out of memory” sounds like “not enough work_mem”

work_mem is not “memory for the query.” It is a base, per-operation budget for executor nodes like sorts and hash tables before they spill to temporary files. PostgreSQL’s own docs explicitly warn that a complex query can run multiple sort/hash operations concurrently, and that many sessions can do this at the same time, so the total memory used can be many times work_mem.

If you raise work_mem globally, you are raising the ceiling on many potential concurrent memory consumers. That can turn “rare spike” into “frequent OOM kill.”
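
A safer pattern than a global bump is to raise work_mem only for the session or query that actually needs it. A minimal sketch (the table name and the value are placeholders, not recommendations):

-- Raise the per-operation budget for this session only, run the heavy query, then reset
SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, sum(amount)
FROM orders                      -- placeholder table
GROUP BY customer_id
ORDER BY sum(amount) DESC;
RESET work_mem;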

OS OOM kill vs “Postgres OOM”: two different failure modes

There are two scenarios people accidentally conflate:

  • Executor spills to disk (healthy): Postgres hits a per-node memory budget and writes temp files.
  • Kernel OOM kill (host-level failure): Linux cannot satisfy memory demands (often influenced by overcommit behavior) and terminates a process to keep the system alive.

In other words: if the kernel is killing Postgres, “give Postgres permission to use even more memory next time” is not a stabilizing strategy.
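
One quick way to tell the two apart from inside the database: healthy spills accumulate in the temp-file counters of pg_stat_database, while kernel OOM kills only show up in the server log and in dmesg. For example:

-- Per-database temp file activity since the last stats reset
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
ORDER BY temp_bytes DESC;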

The source says what the docs say: work_mem is “allowed memory,” then temp files

The tuplesort implementation is blunt about the design: it keeps tuples in memory up to the limit and then switches to temporary “tapes” (temp files) for an external sort.

A tiny excerpt from src/backend/utils/sort/tuplesort.c (Doxygen):

/* ... memory allowe
[...]

Code Signing fun and games for pgAdmin
Posted by Dave Page in pgEdge on 2025-12-19 at 12:53
For years now, pgAdmin builds on Windows have been code signed using a certificate installed on a build machine in AWS. That certificate is expiring, so I had to look for a new one (I no longer work at a company that supports Windows and could provide one), and much to my annoyance I found that the requirements have changed: a secure physical device is now required for key storage. Keys are now generally much more expensive, and whilst there are options to have them installed in virtual keystores at AWS or similar, that adds thousands of dollars to the cost(!) for a supervised installation. Thankfully, Certum offer an Open Source Developer option, which is much more reasonably priced, albeit only valid for a year and only offering basic levels of assurance.

The card and reader arrived this week, and once I was approved, a certificate was issued. The first problem I hit was that you cannot run the card manager application via Remote Desktop - you have to do it on the machine's console, or via VNC or similar. I'm running on Proxmox (having passed through the card reader device to the VM), so that meant using the console in the Proxmox UI.

Once the card manager was running, I was able to install the key through the Certum website. So far, so good - until I tried to sign a test binary. That's when the fun began. After a morning of back-and-forth, working with Claude Desktop, we finally got it working with a test binary. In the hope that it will help others in the future (including myself and the rest of the pgAdmin team when the certificate expires in a year), the Claude-generated notes on what we did are below.

My next steps are to figure out how to automate PIN entry, and then incorporate all the changes into the pgAdmin CI/CD pipelines so I don't have to enter a PIN multiple times every day when the team commit changes. Fun times!

Certum Open Source Code Signing Certificate - Windows Setup Reference

Date: December 2025
Hardware: Certum cryptographic card with ACS ACR40T ICC Reader


[...]

pgEdge Support for Large Object Logical Replication
Posted by Ahsan Hadi in pgEdge on 2025-12-19 at 12:12

RAG Servers vs MCP Servers: Choosing the Right Approach for AI-Powered Database Access
Posted by Dave Page in pgEdge on 2025-12-19 at 11:31

As AI capabilities continue to evolve and integrate more deeply into our applications, we’re faced with interesting architectural decisions about how to expose our data to large language models (LLMs). Two approaches that have gained significant traction are Retrieval Augmented Generation (RAG) servers (such as pgEdge RAG Server) and Model Context Protocol (MCP) servers (such as pgEdge Natural Language Agent). Both have their place, but they serve quite different purposes and come with vastly different security implications – particularly when it comes to database access.

What is a RAG Server?

RAG servers are designed to enhance LLM responses by providing relevant context from a knowledge base. The basic flow is straightforward: when a user asks a question, the RAG server searches for relevant documents or data chunks, retrieves them, and passes them to the LLM along with the original question. The model then generates a response based on both its training and the provided context.

The key characteristic of a RAG server is that it acts as a carefully controlled intermediary. The server’s API defines exactly what operations are possible, what data can be retrieved, and how that data is formatted before being passed to the model. The LLM never directly touches your database; it only sees what the RAG server chooses to show it.

What is an MCP Server?

MCP (Model Context Protocol) servers take a fundamentally different approach. Rather than providing pre-defined retrieval operations, MCP exposes a set of tools that the LLM can invoke directly. In the context of database access, this might include tools to execute SQL queries, browse schemas, or interact with stored procedures.

The power of MCP lies in its flexibility. Instead of being limited to whatever retrieval logic was baked into a RAG server, an LLM connected to an MCP server can dynamically construct queries based on what it needs. This makes it exceptionally useful for exploratory data analysis, ad-hoc reporting, and other scenarios where the questions[...]

Dev Container for pgrx PostgreSQL Extensions: Lessons Learned
Posted by Pavlo Golub in Cybertec on 2025-12-19 at 06:00

I like reproducible development. I also like short feedback loops. Combining both for pgrx was… educational. 🙂 In this post, I share the mistakes, the small pains, and the fixes I used to get a working VS Code dev container for a Rust project that builds PostgreSQL extensions with pgrx. If you’re writing extensions or using pgrx in a team, this will save you a few grey hairs.

TL;DR:

  • PostgreSQL refuses to run as root. Don’t run your devcontainer as root if cargo pgrx test must work.
  • Install user-specific Rust tools as the non-root user; remember that order matters.
  • Run cargo pgrx init after the container starts (use postCreateCommand) so the config persists.

A quick story

I was setting up a devcontainer for an etcd_fdw project that uses pgrx to produce a PostgreSQL extension. At first, it seemed straightforward: start from the official Rust image, install build deps, add cargo-pgrx, and be done.

But real life is noisy. I hit permission errors, failing tests, and an annoying “config.toml not found. Have you run 'cargo pgrx init' yet?” error. After some digging, I discovered three core issues that you should watch out for.

The PostgreSQL root problem

PostgreSQL will not run as root by design. cargo pgrx test launches PostgreSQL instances for integration testing. If your devcontainer runs as root (or if you initialize pgrx as root and then switch user incorrectly), tests will fail.

First, I tried setting remoteUser: "root" in devcontainer.json and installing everything as root. The build failed with errors when the tests attempted to run PostgreSQL.

The fix looks like this (simplified Dockerfile snippet):

# Create a non-root user early in the image build
RUN useradd -m -s /bin/bash -u 1000 vscode && \
    echo "vscode ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Switch to that user for subsequent steps that create user-owned files
USER vscode

The key takeaway is to think about runtime actions (what cargo pgrx test will do) when you design the container user. Create and use a non-root user earl

[...]

🐏 Taming PostgreSQL GUC “extra” Data
Posted by David Wheeler on 2025-12-18 at 18:04

New post up on the ClickHouse blog:

I wanted to optimize away parsing the key/value pairs from the pg_clickhouse.session_settings GUC for every query by pre-parsing it on assignment and assigning the result to a separate variable. It took a few tries, as the GUC API requires quite specific memory allocation for "extra" data to work properly, before I landed on a workable and correct solution.

Struggling to understand, making missteps, and ultimately coming to a reasonable design and solution satisfies me so immensely that I always want to share. This piece gets down in the C coding weeds; my fellow extension coders might enjoy it.

Enhancing PostgreSQL OIDC with pg_oidc_validator
Posted by Jan Wieremjewicz in Percona on 2025-12-17 at 11:00

With PostgreSQL 18 introducing built-in OAuth 2.0 and OpenID Connect (OIDC) authentication, tools like pg_oidc_validator have become an essential part of the ecosystem by enabling server-side verification of OIDC tokens directly inside PostgreSQL. If you’re new to the topic, make sure to read our earlier posts explaining the underlying concepts and the need for external validators:

This release builds on the initial version announced in October and continues our mission to make OIDC adoption in PostgreSQL reliable, fast, and accessible for all users.

The Road to Deploy a Production-Grade, Highly Available System with Open-Source Tools
Posted by semab tariq in Stormatics on 2025-12-17 at 10:48

Everyone wants high availability, and that’s completely understandable. When an app goes down, users get frustrated, business stops, and pressure builds.

But here’s the challenge: high availability often feels like a big monster. Many people think, “If I need to set up high availability, I must master every tool involved.” And there’s another common belief too: “Open-source tools are not enough for real HA, so I must buy paid tools.”

These assumptions make high availability seem far more complex than it really is, and in this series, we are going to address them.

This Is a 2-Part Series.

  • Part 1 (this one): We will lay the foundation by answering the most important questions you should consider before going hands-on with HA systems.
  • Part 2: We will go fully hands-on. I will walk through the architecture diagram, the tool stack, and provide the exact commands and step-by-step instructions to deploy the cluster based on your requirements.

The “Number of Nines”, RTO, and RPO

These are the foundations of a high-availability cluster. If you understand them and answer them clearly, you are already very close to building your HA setup.

Imagine you have a main site (your primary system). Things are working fine. Life is good. But one day, a disaster happens, maybe a server fails, a region goes down, or your database crashes.

At that moment, three questions decide everything.

1) How much downtime can you accept?

This is where the number of nines comes in (like 99.9% uptime, 99.99%, and so on). More nines usually mean less downtime, but also more effort and architectural cost.
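
If you want to put numbers on it, the arithmetic is simple; here is a back-of-the-envelope calculation you can run directly in Postgres:

-- Allowed downtime per year for common "nines" targets
SELECT nines,
       round(((1 - nines / 100.0) * 365 * 24 * 60)::numeric, 1) AS downtime_minutes_per_year
FROM (VALUES (99.0), (99.9), (99.99), (99.999)) AS t(nines);
-- 99.9% allows roughly 8.8 hours per year, 99.99% roughly 53 minutes, 99.999% roughly 5 minutes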

Here’s a simple table to help you choose the right architecture based on your needs.

Target uptime (“nines”) Allowed downtime (per year) A setup that usually fits Notes / what y
[...]

pgBackRest preview - simplifying manual expiration of oldest backups
Posted by Stefan Fercot in Data Egret on 2025-12-17 at 10:35

A useful new feature was introduced on 11 December 2025: Allow expiration of the oldest full backup regardless of current retention. Details are available in commit bf2b276.

Before this change, it was awkward to expire only the oldest full backup while leaving the existing retention settings untouched. Users had to temporarily adjust retention (or write a script) to achieve the same result. Expiring the oldest full backup is particularly helpful when your backup repository is running low on disk space.

Let’s see how this works in practice with a simple example.


Example of expiring the oldest full backup

Let’s use a very basic demo setup.

$ pgbackrest info
...
wal archive min/max (18): 000000010000000000000056/000000010000000300000038

full backup: 20251217-095505F
    wal start/stop: 000000010000000000000056 / 000000010000000000000056

incr backup: 20251217-095505F_20251217-095807I
    wal start/stop: 00000001000000010000001B / 00000001000000010000001B
    backup reference total: 1 full

full backup: 20251217-095902F
    wal start/stop: 000000010000000100000090 / 000000010000000100000090

incr backup: 20251217-095902F_20251217-100124I
    wal start/stop: 000000010000000200000013 / 000000010000000200000013
    backup reference total: 1 full

This gives us two full backups, each of them with a linked incremental backup. Note that I have deliberately reduced the output of the info command to the information relevant to our test case.

$ pgbackrest expire --stanza=demo --dry-run
P00   INFO: [DRY-RUN] expire command begin 2.57.0: ...
P00   INFO: [DRY-RUN] repo1: 18-1 no archive to remove
P00   INFO: [DRY-RUN] expire command end: completed successfully

If we run the expire command, nothing is expired, as I have set repo1-retention-full=2, meaning that two full backups should be kept.

Let’s now imagine that your backup repository is running out of disk space and you want to remove the oldest backup to free up some space. How would you do that?

Once you have identified the back

[...]

Introducing The pgEdge Postgres MCP Server
Posted by Phillip Merrick in pgEdge on 2025-12-17 at 07:02

One of the principal and most powerful components of the pgEdge Agentic AI Toolkit is the pgEdge Postgres MCP Server. In just over a year, MCP (Model Context Protocol), initially developed by Anthropic, has become the standard way to connect LLMs to external data sources and tools. Some people describe it as being like a USB for LLMs. The pgEdge Postgres MCP server makes it extremely easy to connect Claude Code, Claude Desktop, Cursor and other AI development tools directly to a Postgres database – not just pgEdge distributions, but standard community Postgres, Amazon RDS, and pretty much any other relatively standard version of Postgres (so long as it is v14 or newer).

Let’s walk through how to set up the pgEdge Postgres MCP Server, what makes it different, and how to use it with Claude Desktop and AI code generators like Claude Code and Cursor. Our discussion is somewhat Claude-centric, but the pgEdge MCP Server also works with OpenAI GPT-5 and local models such as Ollama.

What Makes Our MCP Server for Postgres Different

Perhaps your first reaction to seeing our announcement was “wait, what? Another Postgres MCP Server”. But it turns out that until today there was no dedicated Postgres vendor offering a fully featured and fully supported MCP Server that works with all your existing Postgres databases. Most of the available Postgres MCP Servers are tied to the vendor's own products, and in particular their cloud database offering.

The pgEdge MCP Server provides flexible deployment options: on-premises, in self-managed cloud accounts, or soon in our pgEdge Cloud managed cloud service. Additionally, when used with pgEdge Distributed Postgres, applications making use of the pgEdge MCP Server can meet enterprise requirements for high availability, multi-region failover and data sovereignty. Most Postgres MCP servers out there give you basic read-only access—enough to query schemas and run SELECT statements. We built ours to be production-grade from day one:

  • The MCP server doesn't just list your tables. It pulls det[...]

Building AI Agents on Postgres: Why We Built the pgEdge Agentic AI Toolkit
Posted by Phillip Merrick in pgEdge on 2025-12-16 at 17:57

We are delighted today to be announcing the beta release of our pgEdge Agentic AI Toolkit for Postgres. We’ve had the benefit of collaborating on real-world Postgres-based AI applications the past two years with a number of leading customers, and this product announcement is the outgrowth of this learning.

We listened to customers as they refined their AI strategies in response to the rapid evolution of LLMs, Agentic AI and integration technologies such as the Model Context Protocol (MCP), and as we did so a few things stood out to us.

First and foremost, many of the newly available tools and technologies are not suited to the needs of the enterprise, particularly in highly regulated industries or major government agencies. Many of the new AI application builders and code generators – and the database platforms supporting them – do not adequately address enterprise requirements for high availability, data sovereignty, global deployment, security and compliance, and the need in some cases to run on-premises or in self-managed cloud accounts. As one CIO in financial services put it to us recently: “We’ve got a couple of dozen AI generated applications end users really want to put into production, but first we’ve got to figure out how to deploy them on our own internal compliant infrastructure.”

Secondly, as compelling as it is to automate workflows with Agentic AI, or to generate new applications with tools like Claude Code, Replit, Cursor or Lovable, the biggest need is to work with existing databases and applications. While newer Postgres-based cloud services work well with Agentic AI and AI app builders for brand new applications, they cannot accommodate existing databases and applications without a costly migration. And such a migration may well be to an environment that doesn’t meet the organization’s strict security and compliance requirements. Enterprise customers need AI tooling – including an MCP Server – that can operate against their existing databases.

Additionally we saw there was no dedicated Postg

[...]

Which indexes can be corrupted after an operating system upgrade?
Posted by Laurenz Albe in Cybertec on 2025-12-16 at 06:00

Dialog between two men: "We had some corruption after the operating system upgrade." — "What, Windows 11 and a different codepage in Notepad?" — "No, PostgreSQL and a natural language collation."
© Laurenz Albe 2025

Most major Linux distributions have upgraded to the GNU C library version 2.28 or later. Therefore, there is a growing awareness that an operating system upgrade can lead to index corruption. However, rebuilding all your indexes can slow down the upgrade process considerably. In this article, I want to discuss how you can avoid rebuilding more indexes than necessary.

A management summary of the problem

For natural language collation support (the rules to compare and sort strings in natural languages), PostgreSQL largely relies on external libraries. Depending on how you ran initdb, that could be the C library or the ICU library. Sometimes bugfixes in the new library versions that you install during an operating system upgrade lead to a change in a collation. Since database indexes are pre-sorted data structures, a change in the rules for sorting strings can leave such indexes sorted incorrectly. This index corruption can lead to wrong query results and violated unique constraints.

The solution is to rebuild an affected index with the REINDEX command.
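
On recent releases you can also check whether the collation version recorded for a database still matches what the library reports. A minimal sketch, assuming PostgreSQL 15 or later; the index name in the REINDEX example is a placeholder:

-- Databases whose default collation changed underneath their indexes
SELECT datname, datcollate,
       datcollversion AS recorded_version,
       pg_database_collation_actual_version(oid) AS actual_version
FROM pg_database
WHERE datcollversion IS DISTINCT FROM pg_database_collation_actual_version(oid);

-- Rebuild an affected index without blocking writes
REINDEX INDEX CONCURRENTLY some_index;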

Keeping the down time for an operating system upgrade short

For many users, it is important that any down time of the database is as short as possible. For such users, the need to rebuild indexes is a problem. If you want to play it safe, you cannot start working with the database before rebuilding the indexes is completed. You could also take a more relaxed attitude and risk starting the application right away. Then you would use REINDEX INDEX CONCURRENTLY to rebuild problematic indexes while your application is running. The risk is that your index scans may return wrong results for a while, and few people are willing to take such a risk.

One way to perform an operating system upgrade that avoids the problem of corrupted indexes and can keep the down time very short is logical replication. Here, the subscriber builds its indexes based on logical information (similar to SQL statements). Consequently, th

[...]

From PGDays to PGCon/fs
Posted by Valeria Kaplan in Data Egret on 2025-12-15 at 22:51

A Journey Through PostgreSQL Events

• Introduction
• Community recognition (transparency, inclusivity, organisational balance)
• If I organise a PostgreSQL event — does it have to be recognised?
• Are community-recognised events better than those that aren’t?
• Conferences — a quick flyover (pgDays, pgConfs, FOSDEM, PGConf.dev etc.)
• PostgreSQL User Groups / meet-ups
• Key takeaways
• Which PostgreSQL event to attend?

Introduction

Working at a PostgreSQL-focused company for over 10 years takes you places. Specifically, it takes you to PostgreSQL community events that are very different from your standard commercial conference.

The dynamic of community events is different. They feel like a village fête, where people have known each other for ages. There’s familiarity, shared purpose, and a genuine sense of belonging.

At the same time, there are many different Postgres-focused events — community and commercial ones, small meet-ups, bigger pgDays, and even larger pgConfs - each with a slightly different audience and scope in mind.

So when I was recently asked, “Which Postgres conference would you recommend going to?”, I realised this might be a good topic for a post.

A quick note — here, I’ll mention a handful of events, predominantly in Europe, but there are, of course, many other Postgres-focused events out there. From PG Down Under in Australia to pgDay Mexico to PGConf NYC 2025 to growing user group meetings in Madagascar and Armenia, and the virtual ones, such as POSETTE and Postgres meetup for all — Postgres is definitely having a moment.

When you see 🔔 below, those are actions or deadlines worth noting.

Community recognition

The majority of PostgreSQL conferences can be found on the PostgreSQL.org events page, and of those some are community recognised (note a small gold medal next to their titles) while others aren’t.

Screenshot of conferences listed on postgresql.org; some conferences have small golden medals next to them. (Source: postgresql.org)

Community event recognition indicates that an event complies with the PostgreSQL event recog

[...]

Improved Markdown Parsing
Posted by David E. Wheeler on 2025-12-15 at 15:55

Quick announcement to say that I’ve replaced the ancient markdown parser with a new one, discount, which supports tables, code fences, definition lists, and more. I reindexed pg_clickhouse this morning and it’s sooo nice to see the table properly formatted.

New uploads will use this parser for Markdown (but not MultiMarkdown) files from now on. I think I’ll start working on reindexing all existing extensions, too. Give me a holler if you don’t see an improvement in your extensions in the next few days.

PostgreSQL 18 Asynchronous Disk I/O – Deep Dive Into Implementation
Posted by Josef Machytka in credativ on 2025-12-15 at 09:53

PostgreSQL 17 introduced streaming I/O – grouping multiple page reads into a single system call and using smarter posix_fadvise() hints. That alone gave up to ~30% faster sequential scans in some workloads, but it was still strictly synchronous: each backend process would issue a read and then sit there waiting for the kernel to return data before proceeding. Before PG17, PostgreSQL typically read one 8kB page at a time.

PostgreSQL 18 takes the next logical step: a full asynchronous I/O (AIO) subsystem that can keep multiple reads in flight while backends keep doing useful work. Reads become overlapped instead of only serialized. The AIO subsystem is deliberately targeted at operations that know their future block numbers ahead of time and can issue multiple reads in advance:
  • Heap sequential scans, like plain SELECT and COPY operations that stream lots of data
  • VACUUM on big tables and indexes
  • ANALYZE sampling
  • Bitmap heap scans

Autovacuum benefits from this change too, since its workers share the same VACUUM/ANALYZE code paths. Other operations still remain synchronous for now:

  • B‑tree index scans / index‑only scans
  • Recovery & replication
  • All write operations (INSERT, UPDATE, DELETE, WAL writes)
  • Small OLTP lookups that touch a single heap page

Future work is expected to widen coverage, especially index‑only scans and some write‑path optimizations.
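
A few settings and views show how the new subsystem is configured on a PostgreSQL 18 server; a quick psql session, with the caveat that availability of io_uring depends on how the server was built:

SHOW io_method;                 -- 'worker' by default; 'io_uring' and 'sync' are the alternatives
SHOW io_workers;                -- number of I/O worker processes used when io_method = 'worker'
SHOW effective_io_concurrency;  -- cap on concurrent I/O issued by scans that can read ahead
SELECT * FROM pg_aios;          -- asynchronous I/Os currently in flight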

Significant improvements for cloud volumes

Community benchmarks show that PostgreSQL 18 AIO significantly improves cold-cache data reads in cloud setups with network-attached storage where latency is high. AWS documentation states that the average latency of Block Express volumes is “under 500 microseconds for 16 KiB I/O size”, while the latency of General Purpose volumes can exceed 800 microseconds. Some articles suggest that under high load each physical block read from disk can cost around 1ms, while page processing in PostgreSQL is much cheaper. By combining many pages into one read, all

[...]

PostgreSQL Contributor Story: Nishant Sharma
Posted by Floor Drees in EDB on 2025-12-15 at 07:56
Earlier this year we started a program (“Developer U”) to help colleagues who show promise for PostgreSQL Development to become contributors. Meet: Nishant Sharma, Staff SDE, who maintains a list of places in the world he has yet to visit.

Anonymising PII in PostgreSQL with pgEdge Anonymizer
Posted by Dave Page in pgEdge on 2025-12-15 at 06:13

Data privacy regulations such as GDPR, CCPA, and HIPAA have made it increasingly important for organisations to protect personally identifiable information (PII) in their databases. Whether you're creating a development environment from production data, sharing datasets with third parties, or simply trying to minimise risk, you'll often need to anonymise sensitive data whilst maintaining the structure and relationships within your database.

I've been working on a tool to address this need: . It's a command-line utility that replaces PII in PostgreSQL databases with realistic but fake values, all whilst preserving referential integrity and data consistency.

The Problem

Consider a typical scenario: you have a production database containing customer records, and you need to create a copy for your development team. The data includes names, email addresses, phone numbers, National Insurance numbers, and credit card details. You can't simply hand over the production data as that would be a compliance nightmare, but you also need the development database to contain realistic data that exercises the same code paths as production.

Manually anonymising this data is tedious and error-prone. You need to ensure that:
  • The same customer email appears consistently across all tables
  • Foreign key relationships remain intact
  • The anonymised data looks realistic (not just "XXXX" or "test@test.com")
  • The process is repeatable and auditable

Enter pgEdge Anonymizer

pgEdge Anonymizer addresses these challenges with a simple YAML-based configuration approach. You define which columns contain PII and what type of data they hold, and the tool handles the rest.

Installation

Building from source is straightforward: This produces a single binary in the directory that you can copy wherever you need it.

Configuration

The configuration file defines your database connection and the columns to anonymise. Here's a typical example:Each column is specified using its fully-qualified name () and assigned a pattern [...]

Contributions for week 51, 2025
Posted by Cornelia Biacsics in postgres-contrib.org on 2025-12-14 at 20:57

On December 11, the PGDay CERN 2026 CfP committee met to finalize the talk selection. The committee members are listed here:

  • Tobias Bussmann, SCNAT (chair, non voting)
  • Abel Cabezas Alonso, CERN
  • Maurizio De Giorgi, CERN
  • Andreas Geppert, ZKB
  • Julia Gugel, migrolino
  • Tom Hagel, Volue
  • Svitlana Lytvynenko, CYBERTEC
  • Gülçin Yıldırım Jelinek, Xata

IT Tage Frankfurt, Dec 8-11 2025. Workshops:

  • Hans-Jürgen Schönig

Talks:

  • Thomas Koch
  • Raphael Salguero Aragón
  • Robert Baric
  • Hans-Jürgen Schönig
  • Adrien Obernesser
  • Bernd Patolla
  • Michael Mühlbeyer
  • Daniel Westermann
  • Dirk Krautschick
  • Chris Norman Bischoff
  • Jonas Gassenmeyer
  • Markus Dahm
  • Dr. Veikko Krypczyk

Meetup Frankfurt on December 10 2025 organised by Ilya Kosmodemiansky and Diego Calvo de No

Speakers

  • Andreas Scherbaum
  • Dirk Aumueller

A new podcast episode, “What Postgres developers can expect from PGConf.dev”, was published by Claire Giordano in her series “Talking Postgres”, with Melanie Plageman.

On December 11 2025, Jimmy Angelakos organized the first PostgreSQL Edinburgh meetup. Chris Ellis and Jimmy Angelakos spoke at this meetup.

The Swiss PostgreSQL Users had a meetup in Zurich; the speakers were Stefan Keller and Lars Herrmann.

The Talk Selection Committee from the Prague PostgreSQL Developer Day finalized the schedule for P2D2 - the team consists of:

  • Pavlo Golub
  • Pavel Hák
  • Hana Litavská
  • Teresa Lopes
  • Mayur B
  • Esther Miñano
  • Josef Šimánek
  • Pavel Stěhule
  • Tomáš Vondra

How does the PostgreSQL Buildfarm check upgrades across versions?
Posted by Andrew Dunstan in EDB on 2025-12-12 at 19:47

From time to time I see questions from otherwise well informed people about how the PostgreSQL Buildfarm checks pg_upgrade across versions, e.g. how it checks upgrading from release 9.5 to release 18. I realize that this isn't well documented anywhere, so here is a description of the process.

All of the code referenced here can be found at https://github.com/PGBuildFarm/client-code.

The principal buildfarm client script is run_build.pl, which builds and tests a single branch (e.g. REL_17_STABLE). There is another script called run_branches.pl, which calls this run_build

Zero-Downtime PostgreSQL Maintenance with pgEdge
Posted by Antony Pegg in pgEdge on 2025-12-12 at 06:02

PostgreSQL maintenance doesn't have to mean downtime anymore. With pgEdge's zero-downtime node addition, you can perform critical maintenance tasks like version upgrades, hardware replacements, and cluster expansions without interrupting production workloads. Your applications stay online. Your users stay connected. Your business keeps running.

This capability is available across both single-primary deployments (with integration of the open source extension Spock) and globally distributed deployments (by default), giving you the same operational advantages whether you're running a single-region deployment or a globally distributed system. Now, for an incredibly quick and easy approach to zero-downtime node addition, you can use the pgEdge Postgres Control Plane (hosted on GitHub). This approach provides drastically simplified management and orchestration of Postgres databases using a declarative API, whether you're running a single-primary or a globally distributed deployment of PostgreSQL clusters. Spock and other high-availability components come built in alongside community PostgreSQL for quick database administration with simple commands.

And because pgEdge and all associated components are 100% open source under the PostgreSQL license, using 100% core community PostgreSQL, you get to leverage the high-availability components that enable zero-downtime maintenance without vendor lock-in or compatibility concerns.

What Is the Spock Extension?

Spock is pgEdge's advanced logical replication extension for PostgreSQL that enables active-active (multi-master) replication in clusters with row filtering, column projection, conflict handling, and more. Even though Spock originated from earlier projects like pgLogical and BDR 1, it has seen enormous growth. Our dedicated team of PostgreSQL experts continues to push Spock forward, making it a high-performance, enterprise-grade replication system built for distributed environments.

Zero-Downtime Node Addition: Maintenance Without Interruption

Adding a new node [...]

VACUUM Is a Lie (About Your Indexes)
Posted by Radim Marek on 2025-12-11 at 23:30

There is a common misconception that troubles most developers using PostgreSQL: tune VACUUM or run VACUUM, and your database will stay healthy. Dead tuples will get cleaned up. Transaction IDs recycled. Space reclaimed. Your database will live happily ever after.

But there are a couple of dirty "secrets" people are not aware of. The first of them: VACUUM is lying to you about your indexes.

The anatomy of storage🔗

When you delete a row in PostgreSQL, it is just marked as a 'dead tuple': invisible to new transactions but still physically present. Only when all transactions that could still see the row have finished can VACUUM come along and actually remove it, reclaiming the space in the heap (table) storage.

To understand why this matters differently for tables versus indexes, you need to picture how PostgreSQL actually stores your data.

Your table data lives in the heap - a collection of 8 KB pages where rows are stored wherever they fit. There's no inherent order. When you INSERT a row, PostgreSQL finds a page with enough free space and slots the row in. Delete a row, and there's a gap. Insert another, and it might fill that gap - or not - it might fit somewhere else entirely.

This is why SELECT * FROM users without an ORDER BY can return rows in order initially, then in a seemingly random order after some updates, and that order can change over time. The heap is like Tetris: rows drop into whatever space is available, leaving gaps when they are deleted.

Heap Page

When VACUUM runs, it removes those dead tuples and compacts the remaining rows within each page. If an entire page becomes empty, PostgreSQL can reclaim it entirely.
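
You can watch this at the page level with the pgstattuple extension; a minimal sketch, assuming the extension is available and using a placeholder table name (its sibling function pgstatindex() gives the equivalent view for B-tree indexes, which matters for what follows):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT tuple_count, dead_tuple_count, dead_tuple_percent, free_percent
FROM pgstattuple('users');      -- 'users' is a placeholder table name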

And while indexes are, on the surface, the same kind of collection of 8 KB pages, they are different. A B-tree index must maintain sorted order - that's the whole point of its existence and the reason why WHERE id = 12345 is so fast. PostgreSQL can binary-search down the tree instead of scanning every possible row. You can learn more about the fundamentals of B-Tree Indexes and what makes th

[...]

pgBackRest PITR in Docker: a simple demo
Posted by Stefan Fercot in Data Egret on 2025-12-11 at 13:25

While moving production database workloads towards cloud-native (Kubernetes) environments has become very popular lately, plenty of users still rely on good old Docker containers. Compared to running PostgreSQL on bare metal, on virtual machines, or via a Kubernetes operator, Docker adds a bit of complexity, especially once you want to go beyond simple pg_dump / pg_restore for backups, upgrades, and disaster recovery.

A few years back, it was common to find open-sourced Dockerfiles from trusted companies bundling PostgreSQL with the extensions and tooling you needed. Most of those projects are now deprecated, as the ecosystem’s focus has shifted hard towards cloud-native patterns.

In many of my conversations with pgBackRest users, one theme comes up regularly: deploying pgBackRest for backups in Docker is straightforward, but restoring from those backups feels much harder. The usual advice was to “use a trusted Docker image and follow their guidelines”, but those images are mostly out-of-date now. These days you typically need to maintain your own image, and more importantly you need a reliable recovery playbook.

So I asked myself: how hard is point-in-time recovery (PITR) with pgBackRest for a PostgreSQL 18 Docker container? Turns out: not that hard, once you’ve seen it end-to-end.

This post is a small lab you can run locally. You’ll:

  • build a PostgreSQL 18 + pgBackRest image
  • take a full backup
  • create restore points
  • delete data on purpose
  • restore to the moment just before the delete

The tiny lab setup

The lab image is deliberately small. It just layers pgBackRest on top of the official PostgreSQL 18 image, adds a minimal config, and enables WAL archiving on first init.

Dockerfile

FROM postgres:18

# Install PGDG repository
RUN apt-get update && apt-get install -y \
wget \
gnupg \
lsb-release \
&& wget --quiet -O /usr/share/keyrings/postgresql-archive-keyring.asc https://www.postgresql.org/media/keys/ACCC4CF8.asc \
&& echo "deb [signed-by=/usr/share/keyri
[...]

Postgres 18 New Default for Data Checksums and How to Deal with Upgrades
Posted by Greg Sabino Mullane in Crunchy Data on 2025-12-11 at 13:00

With a recent Postgres patch authored by Greg Sabino Mullane, Postgres takes a new step forward for data integrity: data checksums are now enabled by default.

This appears in the release notes as a fairly minor change but it significantly boosts the defense against one of the sneakiest problems in data management - silent data corruption.

Let’s dive into what this feature is, what the new default means for you, and how it impacts upgrades.

What is a data checksum?

A data checksum is a simple but powerful technique to verify the integrity of data pages stored on disk. It's like a digital fingerprint for every 8KB block of data (a "page") in your database.

  • Creation: When Postgres writes a data page (table and indexes) to disk, it runs an algorithm on the page's contents to calculate a derived, small value—the checksum.
  • Storage: This checksum is stored in the page header alongside the data.
  • Verification: Whenever Postgres reads that page back from disk, it immediately recalculates the checksum from the data and compares it to the stored value.

If the two values do not match, it means the data page has been altered or corrupted since it was last written. This is important because data corruption can happen silently. By detecting a mismatch, Postgres can immediately raise an error and alert you to a potential problem. Checksums are also integral to pgBackRest, which uses them to verify backups.
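
To see whether checksums are enabled on an existing cluster, you can simply ask the server (the pg_checksums utility can also report this, and can enable checksums on a cleanly shut-down cluster):

SHOW data_checksums;            -- 'on' when the cluster was initialized with checksums or they were enabled later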

What is initdb and why does it matter?

The initdb command in Postgres is the utility used to create a new Postgres database cluster; it initializes the data directory where Postgres stores all the permanent data. When you run initdb, it does things like:

  1. create the directory structure
  2. create the template databases like template1 and postgres
  3. populate the initial system catalog tables
  4. create the initial version of the server configuration files
  5. enable and start keeping track of checksums

The syntax often looks something lik

[...]

PGIBZ 2025: An Event for the Postgres Community in Ibiza
Posted by Evan Stanton in Data Bene on 2025-12-11 at 00:00

Postgres Ibiza (PGIBZ): An open source conference designed to bring together people with a love for PostgreSQL in Ibiza, a relaxed place for fresh and innovative discussions. An international event run by the nonprofit PostgreSQL España.

This was the first time that the Data Bene team attended the event, and we’re happy to share that it was a very positive experience.

The Conference

Location and Venue

As its name suggests, this conference takes place on the Mediterranean island of Ibiza. For those who are less familiar, the Official Tourism Site provides a nice overview that you may refer to. While it should go without saying, this UNESCO World Heritage Site has incredible offerings outside of the conference itself. Combined with a potentially long-haul flight, it’s strongly recommended to plan an extra day or two (or more) to explore the island and enjoy the local experience.

The event itself is hosted at the Palacio de Congresos de Ibiza, a two-story conference center with multiple rooms and catering. Though located outside of the capital city, the venue can be reached by car in good time (~15 minutes). In addition, the venue is situated within walking distance of numerous hotels, restaurants, and a beach! We happened to stay at Hotel Tres Torres, which we can recommend, though admittedly it's difficult to make a wrong decision, especially following a sprinkle of background research.

Briefly, on the topic of extracurriculars, our go-to recommendation would be to visit the capital city, Ibiza. There you will find a surprisingly large amount of shopping, restaurants, and sightseeing. Add to that a walk through the historic neighborhood and you've got quite the memorable experience.

Attendance

This year’s PGIBZ welcomed about 30-40 attendees from across the globe, including China and the United States. And while there are certainly two sides to the coin, I consider this group size as more of a positive. Indeed, during both the presentation Q/A sessions, as well

[...]

Introducing pg_clickhouse
Posted by David Wheeler on 2025-12-10 at 16:34
PostgreSQL Logo ⇔ pg_clickhouse ⇔ ClickHouse Logo

The ClickHouse blog has posted a piece by yours truly introducing pg_clickhouse, a PostgreSQL extension to run ClickHouse queries from PostgreSQL:

While clickhouse_fdw and its predecessor, postgres_fdw, provided the foundation for our FDW, we set out to modernize the code & build process, to fix bugs & address shortcomings, and to engineer it into a complete product featuring near-universal pushdown for analytics queries and aggregations.

Such advances include:

  • Adopting standard PGXS build pipeline for PostgreSQL extensions
  • Adding prepared INSERT support to and adopting the latest supported release of the ClickHouse C++ library
  • Creating test cases and CI workflows to ensure it works on PostgreSQL versions 13-18 and ClickHouse versions 22-25
  • Support for TLS-based connections for both the binary protocol and the HTTP API, required for ClickHouse Cloud
  • Bool, Decimal, and JSON support
  • Transparent aggregate function pushdown, including for ordered-set aggregates like percentile_cont()
  • SEMI JOIN pushdown

I’ve spent most of the last couple of months working on this project, learning a ton about ClickHouse, foreign data wrappers, C and C++, and query pushdown. Interested? Try out the Docker image:

docker run --name pg_clickhouse -e POSTGRES_PASSWORD=my_pass \
       -d ghcr.io/clickhouse/pg_clickhouse:18
docker exec -it pg_clickhouse psql -U postgres -c 'CREATE EXTENSION pg_clickhouse'

Or install it from PGXN (requires C and C++ build tools, cmake, and the openssl libs, libcurl, and libuuid):

pgxn install pg_clickhouse

Or download it and build it yourself from:

Let me know what you think!

More about…
[...]

What you should know about constraints in PostgreSQL
Posted by Gülçin Yıldırım Jelínek in Xata on 2025-12-10 at 12:00
In this blog, we explore Postgres constraints through the pg_constraint catalog, covering table vs. column constraints, constraint triggers, domains and more.
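
As a small taste of what that catalog exposes, a query like the following lists the constraints on tables in the public schema together with their type codes:

SELECT conrelid::regclass AS table_name,
       conname,
       contype   -- p = primary key, f = foreign key, u = unique, c = check, x = exclusion
FROM pg_constraint
WHERE connamespace = 'public'::regnamespace
ORDER BY 1, 2;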

Building a RAG Server with PostgreSQL - Part 3: Deploying Your RAG API
Posted by Dave Page in pgEdge on 2025-12-10 at 09:24

In Part 1 we loaded our documentation into PostgreSQL. In Part 2 we chunked those documents and generated vector embeddings. Now it's time to put it all together with an API that your applications can use.
In this final post, we'll deploy the pgEdge RAG Server to provide a simple HTTP API for asking questions about your content. By the end, you'll have a working RAG system that can answer questions using your own documentation.

What the RAG Server Does

The RAG server sits between your application and the LLM, handling the retrieval part of Retrieval-Augmented Generation. When a query comes in, it:
  • Converts the query to a vector embedding
  • Searches for relevant chunks using both semantic (vector) and keyword (BM25) matching
  • Combines and ranks the results
  • Formats the top results as context for the LLM
  • Sends the context and query to the LLM
  • Returns the generated answer
This hybrid search approach - combining vector similarity with traditional keyword matching - tends to give better results than either method alone. Vector search catches semantically related content even when the exact words differ, while BM25 ensures you don't miss obvious keyword matches.
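
The vector half of that retrieval step looks roughly like the sketch below, assuming a pgvector embedding column and a hypothetical chunks table; the pgEdge RAG Server's actual schema, and the way it blends in BM25 keyword scores, will differ:

-- Nearest chunks by cosine distance (smaller is more similar); the query
-- vector would come from embedding the user's question
SELECT id, content,
       embedding <=> '[0.11, 0.52, 0.98]'::vector AS cosine_distance
FROM chunks
ORDER BY embedding <=> '[0.11, 0.52, 0.98]'::vector
LIMIT 5;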

Prerequisites

Before we start, you'll need:
  • The database we set up in Parts 1 and 2, with documents and embeddings
  • An API key for your chosen LLM provider (Anthropic, OpenAI, or local Ollama)
  • Go 1.23 or later for building from source

Installing the RAG Server

Clone and build the server: This creates the binary at .

Configuration

The RAG server uses a YAML configuration file. Here's a basic setup: Save this as . Let's break down the key sections: - Where the API listens. Default is port 8080 on all interfaces. - Paths to files containing your API keys. Each file should contain just the key, nothing else. Make sure they have restrictive permissions (chmod 600). - This is where it gets interesting. A pipeline defines a complete RAG configuration: which database to query, which tables to sear[...]

How to Transition Easily to PostgreSQL from Oracle
Posted by Stéphane Carton in Data Bene on 2025-12-10 at 00:00

If you need to transition easily from Oracle to PostgreSQL without worrying about type conversions or other Oracle packages that require modifications to be PostgreSQL-compatible, a useful solution is IvorySQL!

Introduction

More and more companies are seeking to free themselves from the costs and constraints of Oracle by migrating to PostgreSQL. But how can this transition be achieved without spending months on it? Here is a proven, fast and secure method, based on feedback and open source tools.

Start by assessing and mapping

Before migrating, it is essential to take stock of the situation: start by listing the Oracle objects by performing a complete extraction (including tables, views, sequences, procedures, functions, and packages).

It is also important to communicate regularly with the client to refine findings and validate directions, as well as to identify dependencies (applications connected with which technology, jobs, interfaces, etc.).

Finally, identify the client’s Oracle-specific features (specific functions such as ROWID, DUAL, DBMS_XXX, etc.) and any other keywords requiring special attention.
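
As a small illustration of the kind of transposition involved, take DUAL: Oracle requires a FROM clause, while PostgreSQL does not (and compatibility layers such as orafce typically ship a dual relation so the Oracle spelling keeps working):

-- Oracle:      SELECT sysdate FROM dual;
-- PostgreSQL:  no FROM clause needed
SELECT current_timestamp AS now;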

Choosing the right open source tools

The success of a transition to PostgreSQL – because it is indeed a gradual transition and not a migration – depends as much on strategy as it does on tools.

It is not enough to simply copy objects: you need to understand the subtleties of the Oracle engine and intelligently transpose them into the PostgreSQL universe.

Among the essential tools, Ora2Pg plays a central role. Not only does it generate clear inventory reports, it also automates the conversion of schemas, data and PL/SQL code. It is a valuable accelerator for well-structured projects.

But in some contexts, the reality is more complex. Many customers accumulate significant technical debt, with thousands of lines of procedural code. Attempting to rewrite everything in PL/pgSQL in a short period of time would exhaust teams and compromise the migration itself. Maint

[...]
