With the postgis_raster extension, it is possible to access gigabytes of raster data from the cloud, without ever downloading the data.
How? The venerable postgis_raster extension (released 13 years ago) already has the critical core support built in!
Rasters can be stored inside the database, or outside the database, on a local file system or anywhere they can be accessed by the underlying GDAL raster support library. The storage options include S3, Azure, Google, Alibaba, and any HTTP server that supports RANGE requests.
As long as the rasters are in the cloud-optimized GeoTIFF (aka "COG") format, network access to the data will be optimized, and access performance will be limited mostly by the speed of the connection between your database server and the cloud storage.
Set up a database named raster with the postgis and postgis_raster extensions.
CREATE EXTENSION postgis;
CREATE EXTENSION postgis_raster;
ALTER DATABASE raster
SET postgis.gdal_enabled_drivers TO 'GTiff';
ALTER DATABASE raster
SET postgis.enable_outdb_rasters TO true;
COG is still a new format for public agencies, so finding a large public example can be tricky. Here is a 56GB COG of medium resolution (30m) elevation data for Canada. Don't try and download it, it's 56GB!
You can see some metadata about the file using the gdalinfo utility to read the headers.
gdalinfo /vsicurl/https://datacube-prod-data-public.s3.amazonaws.com/store/elevation/mrdem/mrdem-30/mrdem-30-dsm.tif
Note that we prefix the URL to the image with /vsicurl/ to tell GDAL to use virtual file system access rather than direct download.
There is a lot of metadata!
Driver: GTiff/GeoTIFF
Files: /vsicurl/https://datacube-prod-data-public.s3.amazonaws.com/store/elevation/mrdem/mrdem-30/mrdem-30-dsm.tif
Size is 183687, 159655
Coordinate System is:
PROJCRS["NAD83(CSRS) / Canada Atlas Lambert",
BASEGEOGCRS["NAD83(CSRS)",
DATUM["NAD83 C
FOSDEM has long been a key event for the PostgreSQL community, and 2025 was no exception. Every year, PostgreSQL developers, contributors, and users worldwide gather in Brussels for a week packed with talks, meetings, and discussions. Our community has a strong tradition at FOSDEM, with a dedicated developer room, a booth, and numerous side events.
One of the most recognizable FOSDEM traditions is FOSDEM Weather — a perfect mix of rain, cold, and clouds that always greets us in Brussels. Over the years, we even started selling PostgreSQL-branded umbrellas at our booth, and you could track the weather by watching the sales spike! But this year, for the first time in a while, the weather was sunny and perfect — so we joked that the weather was trying to lower our sales. 😄
FOSDEM week is always busy for the PostgreSQL community, with multiple events happening alongside the main conference:
Held earlier in the week, the Prague PostgreSQL Developer Day (P2D2) is a major PostgreSQL event that gathers experts for in-depth discussions. For a detailed experience of the event, check out Ants Aasma’s blog post.
On Thursday, January 30th, PostgreSQL core developers and contributors held a private developer meeting to discuss ongoing projects, significant challenges, and future PostgreSQL development. More details can be found on the official event page.
On Friday, January 31st, we had FOSDEM PGDay, a full-day PostgreSQL conference. The schedule featured talks on a wide range of PostgreSQL topics, from performance tuning to backups and new features. The complete schedule is available here.
The PostgreSQL Devroom took place on Sunday, February 2nd, featuring sessions on database performance, replication, analytics, AI/ML, and more. This year, we had great discussions about PostgreSQL internals, best practices, and real-world use cases. You can fin
[...]The Citus 13.0 release is out and includes PostgreSQL 17.2 support! We know you’ve been waiting, and we’ve been hard at work adding features we believe will take your experience to the next level, focusing on bringing the exciting Postgres 17 improvements to you at distributed scale.
The Citus database is an open-source extension of Postgres that brings the power of Postgres to any scale, from a single node to a distributed database cluster. Since Citus is an extension, using Citus means you're also using Postgres, giving you direct access to the Postgres features. And the latest of those features came with the Postgres 17 release! In addition, Citus 13 will be made available on the elastic clusters (preview) feature on Azure Database for PostgreSQL - Flexible Server, along with PostgreSQL 17 support, in the near future.
PostgreSQL 17 highlights include performance improvements in query execution for indexes, a revamped memory management system for vacuum, new monitoring and analysis features, expanded functionality for managing data in partitions, optimizer improvements, and enhancements for high-concurrency workloads. PostgreSQL 17 also expands on SQL syntax that benefits both new workloads and mission-critical systems, such as the addition of the SQL/JSON JSON_TABLE() command for developers, and the expansion of the MERGE command. For those of you who are interested in upgrading to Postgres 17 and scaling these new features of Postgres: you can upgrade to Citus 13.0!
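For a flavour of the SQL/JSON syntax mentioned above, here is a minimal JSON_TABLE() sketch (illustrative values only, not taken from the release notes):

-- Turn a JSON array into rows with typed columns (PostgreSQL 17+).
SELECT *
FROM JSON_TABLE(
  '[{"name":"alice","qty":1},{"name":"bob","qty":2}]',
  '$[*]'
  COLUMNS (
    name text PATH '$.name',
    qty  int  PATH '$.qty'
  )
) AS jt;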
Along with Postgres 17 support, Citus 13.0 also fixes important bugs, and we are happy to say that we had many community contributions here as well. These bugfixes focus on data integrity and correctness, crash and fault tolerance, and cluster management, all of which are critical for ensuring reliable operations and user confidence in a distributed PostgreSQL environment.
Let's take a closer look at what's new in Citus 13.0:
[...]For several years now, there has been a PGDay in Brussels on the Friday before FOSDEM, organized by PostgreSQL Europe.
The talks that I attended were good. Only the speakers have not been very diverse.
The same was true of the audience. I counted only four women, and three of them are members of the PostgreSQL Europe Diversity Committee.
And all three members of the Diversity Committee gave a lightning talk.
For the first time ever, I gave a lightning talk at a conference. The idea is based on the Books paragraph in a blog post I wrote about sources for learning PostgreSQL.
The idea came from Claire Giordano, and Karen Jex encouraged me to submit it.
A big thank you goes to Floor Drees for bringing lightning talks to Brussels.
https://fosdem.org/2025/schedule/track/postgresql/
I extended the book list on the blog with some additional information; you can find the slides on GitLab.
FOSDEM is the biggest open source conference in Europe scheduled on the first weekend of February each year. It takes place in Brussels at the UNIVERSITÉ LIBRE DE BRUXELLES (ULB).
This year was the 25th birthday of the conference, and it had more than 700 talks over two days.
For several years now, there has been a so-called devroom, where PostgreSQL Europe organizes the talks and manages the audience. There is a video host, a room host, and several room bouncers, aka Elephant Herders.
In addition, a stand has to be staffed, where people ask questions and can buy PostgreSQL merchandise.
Everything has to be taken to Brussels, which has been handled by Andreas Scherbaum and Ilya Kosmodemiansky. They both came by car, packed with shirts, hoodies, pins, stickers, and small plush Sloniks.
Not to forget that there is a Slonik costume that has to be worn by people. And as one doesn’t see a lot from inside the costume, another person is needed
[...]Originally, PL/pgSQL was a very simple language, and the integration of the procedural language with SQL was very primitive. A very specific feature of PL/pgSQL is the translation of every expression to SQL. Thanks to this translation, PL/pgSQL is very simple and fully integrated with the SQL engine. The bad side is the possibility of writing dirty or broken code that goes undetected.
Every expression is translated to a query, so the expression `10` is translated to `SELECT 10`. PL/pgSQL allows dirty expressions like `a FROM foo WHERE x = 10`. Although this syntax is ugly, it was very popular, and it is supported and will be supported forever.
Unfortunately, support for this syntax, combined with the removal of some limits in PostgreSQL 14, allows a new kind of bug (see https://www.postgresql.org/message-id/CALyvM2bp_CXMH_Gyq87pmHJRuZDEhV40u9VP8rX=CAnEt2wUXg@mail.gmail.com).
When somebody writes code like this:
DECLARE var int;
BEGIN
var := 10
DELETE FROM tab WHERE varx = 20;
END;
On PostgreSQL 13 and older this code fails, but on PostgreSQL 14 and higher it just quietly does nothing. Why?
The problem is the missing semicolon after `10`. PL/pgSQL translates the code to the query
`SELECT 10 DELETE FROM tab WHERE varx = 20`. This query fails on older Postgres, because `DELETE` is a keyword and requires the `AS` keyword in this position. But PostgreSQL 14 doesn't need it - it allows keywords to be used as column labels without the `AS` keyword.
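To see the mechanics, here is what PostgreSQL 14+ does with a bare keyword after an expression - a small illustration, not from the original post:

-- On PostgreSQL 14 and newer this parses fine: "delete" becomes the output
-- column name, as if you had written SELECT 10 AS delete.
-- On PostgreSQL 13 and older it is a syntax error.
SELECT 10 delete;

That is exactly why the mistyped assignment above silently turns into a harmless single-row SELECT instead of an error.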
I wrote a patch for Postgres that implements a new extra check that can detect this issue: https://commitfest.postgresql.org/52/5044/. But this patch has been waiting for review for half a year, and I am not sure how many plpgsql developers use extra checks.
Today I found a way to detect this issue without modifying the SQL parser, and I wrote a new check for plpgsql_check (it is merged in the master branch):
(2025-02-05 22:17:04) postgres=# create or replace function foofoo()
returns void as $$
declare x int;
begin
x := 1 delete from foo where x = 10
Cover photo by Mayukh Karmakar
An issue I often see folks missing when reviewing query plans is that all of their scans involve indexes, so they think that the query is likely already as fast (or efficient) as it can be.
Many know that a Seq Scan with a high number of Rows Removed by Filter is a sign an index could help. But the reason an index speeds things up there applies just as much to other scans with large filters, where a better suited index could speed things up! While often we will see index filters show as “Rows Removed by Filter”, sadly there are cases that aren’t currently reported by EXPLAIN (but we’ll get to that later).
In this post we’ll look through several examples, some things to look out for, and how to then go about making things faster.
It’s worth noting that there are other ways that index use can still be optimised further. One such example that we also highlight via the “Index Efficiency” tips in pgMustard are a high number of Rows Removed by Index Recheck. But this post will be long enough already, so we’ll take a look at those another day.
Let’s create a simple example table to demonstrate:
CREATE TABLE example (
    id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    a smallint NOT NULL,
    b smallint NOT NULL
);

INSERT INTO example (a, b)
SELECT random(1, 1_000), random(1, 1_000)
FROM generate_series(1, 1_000_000) AS i;

VACUUM ANALYZE example;

SELECT * FROM example LIMIT 5;

 id |  a  |  b
----+-----+-----
  1 | 803 | 627
  2 | 978 | 702
  3 |  15 | 506
  4 | 966 | 335
  5 | 247 | 172
So we have a million rows in total, with an integer primary key, and two extra columns a and b, which contain random integers between 1 and 1000.
Let’s take a look at several indexing options, for optimising a simple query that filters on both a and b:
CREATE INDEX example_a_idx ON example (a);

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT id FROM example WHERE a = 42 AND b = 42;
[...]
Introduction When building production-grade applications in Go that interact with PostgreSQL, efficient database connection management is crucial for performance and reliability. While you might start with single connections using the pgx driver, as your application scales, you’ll need a more sophisticated approach to handle multiple concurrent database operations. This is where connection pools, specifically pgx/v5/pgxpool, […]
The post PostgreSQL Client-side connection pooling in Golang using pgxpool appeared first on HexaCluster.
Some of you have been asking for advice about what to submit to the CFP for POSETTE: An Event for Postgres 2025. So this post aims to give you ideas that might help you submit a talk proposal (or 2, or 3) before the upcoming CFP deadline.
If you’re not yet familiar with this conference, POSETTE: An Event for Postgres 2025 is a free & virtual developer event now in its 4th year, organized by the Postgres team at Microsoft.
I love the virtual aspect of POSETTE because the conference talks are so accessible—for both speakers and attendees. If you’re a speaker, you don’t need travel budget $$—and you don’t have to leave home. Also, the talk you’ve poured all that energy into is not limited to the people in the room, and has the potential to reach so many more people. If you’re an attendee, well, all you need is an internet connection
The CFP for POSETTE: An Event for Postgres will be open until Sunday Feb 9th at 11:59pm PST. So as of the publication date of this blog post, you still have time to submit a CFP proposal (or 2, or 3, or 4)—and to remind your Postgres teammates and friends of the speaking opportunity.
If you have a Postgres experience, success story, failure, best practice, “how-to”, collection of tips, lesson about something that's new, or deep dive to share—not just about the core of Postgres, but about anything in the Postgres ecosystem, including extensions, and tooling, and monitoring—maybe you should consider submitting a talk proposal to the CFP for POSETTE.
If you’re not sure about whether to give a conference talk, there are a boatload of reasons why you should. And there’s also a podcast episode with Álvaro Herrera, Boriss Mejías, and Pino de Candia that makes the case for why giving conference talks matters. For inspiration, you can also take a look at the playlist of POSETTE 2024 talks.
And if you’re looking for even more CFP ideas, you’ve come to the right place! Read on…
On the CFP page there is a list of
[...]While answering support issues on pgBackRest, I regularly see some users falling into the infinite archive retention trap and asking the same question: Why are my old WAL archives not being expired?
This is pretty much always linked to a bad configuration of the repo-retention-archive setting. For example, using --repo1-retention-archive=7 --repo1-retention-diff=7 --repo1-retention-full=1.
Let’s see what this setting means and how it should be used.
According to its documentation, the repo-retention-archive option specifies the number of backups worth of continuous WAL to retain.
If this value is not set and repo-retention-full-type is count (default), then the archive to expire will default to the repo-retention-full (or repo-retention-diff) value corresponding to the repo-retention-archive-type if set to full (or diff). This will ensure that WAL is only expired for backups that are already expired.
Let’s use a very basic configuration for this example:
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
start-fast=y
log-level-console=info
log-level-file=detail
compress-type=zst

[my_stanza]
pg1-path=/var/lib/pgsql/17/data
The repository is stored locally for the purpose of this demo, with the configuration set to keep two full backups.
Let’s generate some data using pgbench:
$ pgbench -i -s 50 test
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 5.36 s...

$ pgbackrest info
stanza: my_stanza
    status: error (no valid backups)
    cipher: none

    db (current)
        wal archive min/max (17): 000000010000000000000008/00000001000000000000002E

$ ./walNb.pl --min=0[...]
I arrived in the beautiful city of Prague on Tuesday evening to take part in Prague PostgreSQL Developer Day (P2D2) for the first time. Here are some impressions this wonderful event left me with.
The event started out with a speaker dinner, where I had the chance to enjoy a delicious meal and chat through the evening with Robert Haas, Esther Minano, Josef Machytka and Alfredo Rodriguez. I'm especially thankful to Josef, who had 3D-printed some beautiful elephants as gifts for the speakers!
After dinner, I headed back to the hotel to get rest for the upcoming eventful day. In the morning after a quick breakfast, it was time to make my way to the conference venue. Though it was a bit cold, the beautiful sunny morning walk towards the conference was quite refreshing. When I arrived, the place was already bustling with PostgreSQL people—a mix of familiar faces from other conferences and many new ones. I later learned that nearly 300 participants had registered for the event.
After registering and grabbing a coffee, it was time for the first talks. There were two tracks to choose from with so many great talks that it was hard to pick. I decided to start the day with Jan Suchánek’s talk on analytical functions in PostgreSQL. It was a practical introduction into window functions in SQL, explaining when and how to use them. The hands-on examples were neat to see.
The following talk I attended was “Advanced Performance Tuning in PostgreSQL” with Somdyuti Paul from Google. He shared insights on setting up PostgreSQL for best performance and robust operation, clearly based on his extensive experience with helping organisations run databases at scale.
Next up was Gülçin Yıldırım Jelinek’s talk on the anatomy of table-level locks. Besides explaining different lock levels, she shared great tips on avoiding unnecessary locks and introduced best practices for schema migrations, including the expand-contract pa
[...]In this post, I would like to discuss the stability of standard Postgres statistics (distinct, MCV, and histogram over a table column) and introduce an idea for one more extension - an alternative to the ANALYZE command.
My interest in this topic began while wrapping up my previous article, when I noticed something unusual: the results of executing the same Join Order Benchmark (JOB) query across a series of consecutive runs could differ by several times, and even by orders of magnitude - both in the execution time and in the pages read.
This was puzzling, as all variables remained constant - the test script, laptop, settings, and even the weather outside were the same. This prompted me to investigate the cause of these discrepancies…
In my primary activity, which is highly connected to query plan optimisation, I frequently employ JOB to assess the impact of my features on the planner. At a minimum, this practice enables me to identify shortcomings and ensure that there hasn't been any degradation in the quality of the query plans produced by the optimiser. Therefore, benchmark stability is crucial, making the time spent analysing the issue worthwhile. After briefly examining the benchmark methodology, I identified the source of the instability: the ANALYZE command.
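If you want to see this effect for yourself, a quick illustrative check (table and column names below are placeholders) is to re-run ANALYZE and watch the sampled estimates wobble between runs:

ANALYZE my_table;
SELECT n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'my_table' AND attname = 'my_column';
-- repeat both statements a few times and compare the estimates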
In PostgreSQL, statistics are computed using basic techniques like Random Sampling with Reservoir, calculating the number of distinct values (ndistinct), and employing HyperLogLog for streaming statistics - for instance, to compute distinct values in batches during aggregation or to decide whether to use abbreviated keys for optimisation. Given that the nature of statistics calculation is stochastic, fluctuations are expected. However, the test instability raises the following questions: How significant are these variations? How can they be minimised? And what impact do they have on query plans? Most importantly, how can we accurately compare benchmark results when such substantial deviations are present, even in the baseline
[...]I have been watching the codification of spatial data types into GeoParquet and now GeoIceberg with some interest, since the work is near and dear to my heart.
Writing a disk serialization for PostGIS is basically an act of format standardization – albeit a standard with only one consumer – and many of the same issues that the Parquet and Iceberg implementations are thinking about are ones I dealt with too.
Here is an easy one: if you are going to use well-known binary for your serialization (as GeoPackage and GeoParquet do) you have to wrestle with the fact that the ISO/OGC standard for WKB does not describe a standard way to represent empty geometries.
Empty geometries come up frequently in the OGC/ISO standards, and they are simple to generate in real operations – just subtract a big thing from a small thing.
SELECT ST_AsText(ST_Difference(
'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))',
'POLYGON((-1 -1, 3 -1, 3 3, -1 3, -1 -1))'
))
If you have a data set and are running operations on it, eventually you will generate some empties.
Which means your software needs to know how to store and transmit them.
Which means you need to know how to encode them in WKB.
And the standard is no help.
But I am!
All WKB geometries start with a 1-byte “byte order flag” followed by a 4-byte “geometry type”.
enum wkbByteOrder {
wkbXDR = 0, // Big Endian
wkbNDR = 1 // Little Endian
};
The byte order flag signals which “byte order” all the other numbers will be encoded with. Most modern hardware uses “least significant byte first” (aka “little endian”) ordering, so usually the value will be “1”, but readers must expect to occasionally get “big endian” encoded data.
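You can see both pieces directly in the hex output PostGIS produces - a small illustration, not from the original post (the type codes are listed just below):

SELECT ST_AsBinary('POINT(1 2)'::geometry);
-- returns \x0101000000000000000000f03f0000000000000040, which breaks apart as:
--   01               byte order flag = 1 (NDR, little endian)
--   01000000         geometry type = 1 (Point), little-endian uint32
--   000000000000f03f x = 1.0, little-endian double
--   0000000000000040 y = 2.0, little-endian double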
enum wkbGeometryType {
wkbPoint = 1,
wkbLineString = 2,
wkbPolygon = 3,
wkbMultiPoint = 4,
wkbMultiLineString = 5,
wkbMultiPolygon = 6,
wkbGeometryCollection = 7
};
The type number is an integer from 1 to 7, in the indicated byte order
[...]Materialized views are widely used in Postgres today. Many of us are working with connected systems through foreign data wrappers, separate analytics systems like data warehouses, and merging data from different locations with Postgres queries. Materialized views let you precompile a query or partial table, for both local and remote data. Materialized views are static and have to be refreshed.
One of the things that can be really important for using materialized views efficiently is indexing.
Adding indexes to Postgres in general is critical for operation and query performance. Adding indexes for materialized views is also generally recommended for a few different reasons.
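As a teaser, here is a minimal sketch (hypothetical table and view names, not from the original article) of a materialized view with an index; a unique index also unlocks concurrent refreshes:

-- Hypothetical example: precompute a monthly rollup and index it.
CREATE MATERIALIZED VIEW monthly_totals AS
  SELECT date_trunc('month', created_at) AS month, sum(amount) AS total
  FROM orders
  GROUP BY 1;

CREATE UNIQUE INDEX ON monthly_totals (month);

-- Later, refresh without blocking readers (requires the unique index):
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_totals;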
In case you have not thought about Postgres views and Postgres materialized views recently, let’s just do a quick refresher.
A view is a saved query. It is not stored on the disk. It dynamically fetches data from the underlying tables whenever queried. Since views do not have their own storage, views cannot have indexes.
Materialized views do not dynamically fetch data from underlying tables - they are stored on disk - and must be explicitly refreshed to update the contents. This makes them ideal for scenarios involving complex queries or frequent access to relatively static datasets. Because they can be stored on disk, materialized views
Backups are crucial for any mission-critical application as they protect against unforeseen disasters. Regular backups help minimize the Recovery Point Objective (RPO), allowing systems to recover quickly with minimal data loss. However, it's equally important to store backups safely. If backups are kept in the same location as the primary site and something goes wrong, you may have no way to recover, leading to complete data loss.
To reduce these risks, many organizations choose fully managed servers to host their databases. One popular option is Azure Flexible Server for PostgreSQL, which offers a reliable, scalable, and managed solution.
Azure provides backup redundancy at three levels, and you can recover backups using the same three methods. These are:
Locally Redundant Storage
Zone Redundant Storage
Geo-Redundant Storage
Each level of redundancy offers unique advantages when it comes to restoring backups. In today's blog, we will explore all three types of backups and recovery methods. We will dive into the differences between each type and learn how to restore your backup if your primary site goes down.
The post A Guide to Restoring a PostgreSQL Database from Disaster Using Azure Flexible Server appeared first on Stormatics.
CommitFest 2025-01 came to an end this week, with a final batch of user-visible changes:
ALTER TABLE ... ALTER COLUMN ...: in case of an error, the position of the affected column will now be shown in the error message
log_rotation_size: maximum value changed to 2TB
pg_stat_all_tables etc.: cumulative time spent in [auto]vacuum and [auto]analyze now tracked
Unusually, there was also one reversion of a small improvement previously added to PostgreSQL 17: Revert "Speed up tail processing when hashing aligned C strings, take two" (commit 235328ee). See the pgsql-hackers thread "Re: Change GUC hashtable to use simplehash?" for details.
Going forward, there will be two changes affecting CommitFests:
Commitfest app release on Feb 17 with many improvements: 15 user-visible changes are listed, all of which look to be very useful.
New commitfest transition guidance: pending patches should now only be moved forward by someone involved in the patch, and only if the patch is active. This is to prevent dormant patches being shunted forward indefinitely.
PostgreSQL 18 articles:
PostgreSQL 18: Per-relation cumulative statistics for [auto]vacuum and [auto]analyze (2025-01-31) - Daniel Westermann / dbi services
Waiting for PostgreSQL 18 – Add OLD/NEW support to RETURNING in DML queries (2025-01-30) - Hubert 'depesz' Lubaczewski
PostgreSQL 18: Add OLD/NEW support to RETURNING in DML queries (2025-01-22) - Daniel Westermann / dbi services

Slides and transcript from my 5 minute lightning talk about the PostgreSQL Europe Diversity Task Force at FOSDEM PGDay 2025.
This was a tough one to fit into 5 minutes, but I'm pleased with how much I managed to share.
Did you know that PGEU now has a Diversity Task Force?
First, briefly, what is diversity?
In a nutshell: variety.
Especially
The inclusion of people of different races, cultures etc. in a group.
Diversity is about more than gender or skin colour, although those are important.
it’s also about:
There are so many attributes that bring variety to our community, and make each of us unique.
Why do we have a diversity task force?
It’s no secret that there are challenges in terms of diversity across the entire tech industry, and Postgres is no exception.
But diversity has been shown to be good for everyone.
We know that diverse people bring unique skills and viewpoints, making the Postgres project better for all of us.
It’s important that everyone feels welcomed, represented, and valued within our community, and there are things that all of us can do better to make that a reality.
For example, take a look around you.
If most of the people around you look a lot like you,
try to imagine how it might feel to be one of the people who doesn’t look a lot like you.
When the team organising an event looks like this [picture on slide]
and when the speaker lineup looks like this [picture on slide]
it’s easy to understand why not everyone feels represented or included.
It’s also easy to see how we might unconsciously be giving more opportunities to people like us, and continually getting feedback that supports our existing biases.
Note, this is not a criticism of the folks that organise
[...]PostgreSQL is already known for its reliability, extensibility, and open-source pedigree, and it continues to grow and evolve with each release. PostgreSQL 17 introduces several performance improvements and features that make it a powerhouse for OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) workloads. This blog will explore advanced performance tuning techniques for PostgreSQL 17 and highlight key improvements compared to versions 15 and 16.
Starting with version 17, PostgreSQL provides a feature many Oracle users have been waiting for: the LOGON trigger. The idea is to make the database launch a function as soon as the user tries to log in.
This post explains how this works and how such a LOGIN trigger (as it is called in the PostgreSQL world) can be implemented and used in real life.
A LOGIN trigger is defined for a database (not for an instance). To demonstrate how this works, we first create a simple database:
local_machine:~ hs$ psql postgres
psql (17.2)
Type "help" for help.
postgres=# CREATE DATABASE login_example;
CREATE DATABASE
postgres=# \c login_example
You are now connected to database "login_example" as user "hs".
The idea of this example is to write a simple log entry for each successful login attempt. Here is the definition of the table:
login_example=# CREATE TABLE t_user_login (
id serial,
tstamp timestamptz DEFAULT now(),
who text
);
CREATE TABLE
Like any trigger in PostgreSQL, we first need to create a function before defining the trigger itself. The special aspect here is that the trigger has to be enabled to make it fire during the login process.
The following code demonstrates how this works:
BEGIN;
CREATE OR REPLACE FUNCTION on_login_proc()
RETURNS event_trigger AS
$$
BEGIN
INSERT INTO t_user_login (who)
VALUES (SESSION_USER);
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER on_login_event
ON login
EXECUTE FUNCTION on_login_proc();
ALTER EVENT TRIGGER on_login_event ENABLE ALWAYS;
COMMIT;
The important observation is that the function returns event_trigger, a special data type intended specifically for this purpose. The function itself is simple PL/pgSQL code.
Next, we define the event trigger. The event we want to listen for is "login"—a new feature available starting from PostgreSQL 17.
Finally, the event
[...]Google Summer of Code 2025 is here, and PostgreSQL is once again proud to participate in this amazing program! Over the years, many talented contributors have started their PostgreSQL journey through GSoC, with some becoming long-term members of our community. Their commitment has helped improve PostgreSQL and its ecosystem, making this program an essential part of our growth and innovation.
Looking ahead, we are excited for new ideas, fresh contributors, and more diversity in projects, mentors, and mentees. PostgreSQL welcomes everyone who wants to contribute and learn, and we encourage both new and experienced community members to get involved.
This year, the GSoC team is particularly looking to support more security and AI/ML projects. If you know of an AI/ML or security-focused open-source project, encourage them to apply for GSoC 2025!
GSoC is open to students and newcomers to open-source development, making it an excellent opportunity for those looking to get involved with PostgreSQL. Whether you’re interested in database internals, extensions, drivers, performance tools, or UI development, there’s a place for you in the PostgreSQL ecosystem.
Key Dates:
The most important part of our application as a mentoring organization is having a well-defined list of project ideas. These projects should take approximately:
We encourage all PostgreSQL community members to submit project ideas and volunteer as mentors. Mentorship is essential
[...]It’s time for the fourth episode of Postgres Café, a podcast from our teams at Data Bene and Xata where we discuss PostgreSQL contribution and extension development. In this latest episode, Sarah Conway and Gülçin Yıldırım Jelinek meet with Stéphane Carton to cover Citus Data, a completely open-source extension from Microsoft that provides a solution for deploying distributed PostgreSQL at scale.
The Citus database has experienced 127 releases since Mar 24, 2016 when it was first made freely open-source for open use and contributions. It’s a powerful tool that works natively with PostgreSQL, and seamlessly integrates with all Postgres tools and extensions. Continue reading for a summary of what we covered in this podcast episode!
So why does Citus Data exist, and what problems does it solve? Let’s delve into this by category.
Citus is designed to solve the distributed data modeling problem by providing methods to map workloads onto distributed data models, such as sharding tables based on primary keys (especially useful for microservices and high-throughput workloads).
By distributing data across multiple nodes, you’re able to enable the horizontal scaling of PostgreSQL databases.
This allows developers to combine CPU, memory, storage, and I/O capacity across multiple machines for handling large datasets and high traffic workloads. It’s simple to add more worker nodes to the cluster and rebalance the shards as your data volume grows.
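As a rough sketch of what that looks like in practice (hypothetical table and column names; Citus installed as an extension):

CREATE EXTENSION citus;

CREATE TABLE events (
    device_id bigint NOT NULL,
    event_id  bigint NOT NULL,
    payload   jsonb,
    PRIMARY KEY (device_id, event_id)
);

-- Shard the table across the worker nodes by device_id.
SELECT create_distributed_table('events', 'device_id');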
The distributed query engine in Citus is used to maximize efficiency, parallelizing queries and batching execution across multiple worker nodes.
Even in cases where there are thousands to millions of statements being executed per second, data ingestion is still optimized through finding the right shard placements, connecting to the appropriate worker nodes, and performing operations in parallel. All of th
[...]I delivered my presentation "Databases in the AI Trenches" today as part of a tutorial at Prague PostgreSQL Developer Day, so I have placed my slides online. It focuses on semantic search, generative AI, and RAG, and builds on my discriminative AI talk, "Postgres and the Artificial Intelligence Landscape."
PostgreSQL version 12 introduced a new option on the VACUUM command, INDEX_CLEANUP. You should (almost) never use it.
First, a quick review of how vacuuming works on PostgreSQL. The primary task of vacuuming is to find dead tuples (tuples that still exist on disk but can’t ever be visible to any transaction anymore), and reclaim them as free space. At a high level, vacuuming scans the heap and collects the dead tuples it finds in memory (up to maintenance_work_mem, or autovacuum_work_mem for autovacuum), removes their entries from the indexes, and then removes the tuples from the heap itself.
Most of the time in vacuuming is spent removing the dead tuples from the indexes. It has to do this first because, otherwise, you would have tuple references in the indexes that are now invalid… or, worse, look like they are valid but point to the wrong tuples!
INDEX_CLEANUP OFF turns off removing dead tuples from the indexes. This means it also can’t remove dead tuples from the heap! In effect, turning INDEX_CLEANUP OFF defeats the whole purpose of vacuuming.
(The documentation does explain this, but uses terminology that can be unfamiliar to a lot of PostgreSQL users.)
Well… um, why would you ever use it, then? The other thing that vacuuming does is mark any tuples which can always be visible to any transaction (until modified or deleted) as “frozen.” This is an important operation to prevent data corruption due to PostgreSQL’s 32-bit transaction ID (you can get more information about freezing here).
Sometimes, for one reason or another, PostgreSQL has been unable to completely “freeze” a table, and the database can get dangerously close to the point that the wraparound data corruption can occur. Since vacuuming the indexes takes most of the time, sometimes you want to tell PostgreSQL to skip that step, and just vacuum the heap so that the wraparound danger has passed. Thus, the INDEX_CLEANUP option.
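In that emergency scenario, the invocation might look something like this (illustrative table name, not from the original post):

-- Skip index vacuuming and aggressively freeze, purely to push back the
-- wraparound horizon; a normal VACUUM should follow once the danger has passed.
VACUUM (INDEX_CLEANUP OFF, FREEZE, VERBOSE) big_table;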
But you still
[...]
During training sessions, I tell my students how they can tune transactional workloads by disabling the parameter synchronous_commit. It is easy to demonstrate the striking effects of that measure, but the possibility to lose committed transactions during an operating system crash makes it a non-starter for many applications. So I tell the students about commit_delay and commit_siblings, but when they ask me for a demonstration, I have to tell them that that would be too complicated. So I decided to try it out and write about it.
To make sure that committed transactions cannot get lost, PostgreSQL has to make sure that the WAL for the transaction is flushed to disk before it can report success to the client. If the database workload is dominated by small data modifications, the IOPS generated by these transactions can saturate the disk, even though the amount of data written is moderate.
The parameter pair commit_delay and commit_siblings can relax the bottleneck by reducing the number of IOPS necessary for those WAL flushes.
How do commit_delay and commit_siblings work?
You activate the feature by setting commit_delay to a value greater than zero. Whenever a transaction reaches the point where it would flush the WAL to disk during a commit, it first examines how many other transactions are currently active. If there are at least commit_siblings other transactions open and not waiting for a lock, PostgreSQL doesn't flush the WAL immediately, but waits for commit_delay microseconds. After that delay, some other transactions may have reached the point when they are ready to flush the WAL. All these backends can then perform their WAL flush in a single I/O operation.
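For reference, both parameters are set like any other GUC; here is a sketch with purely illustrative values, not a tuning recommendation:

-- commit_delay is in microseconds; 0 (the default) disables the feature.
ALTER SYSTEM SET commit_delay = 1000;
ALTER SYSTEM SET commit_siblings = 5;   -- only wait if 5 other transactions are active
SELECT pg_reload_conf();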
commit_delay is not easy to tune, because the delay will make the transaction take longer. On the other hand, if you choose a value that is too low, no other transaction might be ready by the time the delay has passed, and you c
It is rare that a Postgres table keeps the exact same structure year after year. New columns get added. Old columns get dropped. Column data types need to change. Those are all done with the ALTER TABLE command. One big drawback to these changes is that they may force a complete table rewrite. A rewrite means a completely new copy of the table is created, and then the old one is dropped. This can take a very long time for large tables. Worse, everything else is blocked/locked from using the table, so your application may need downtime.
So which commands need a full table rewrite? Which only need a split-second lock? The alter table documentation has some guidelines, but at the end of the day, you want to know for sure if your production application is going to require major downtime or not. Presented here is a quick recipe for safely determining if a rewrite will happen. In short: make a copy of the table, modify that copy, and see if the underlying file on disk for that table has changed.
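Here is roughly what that recipe looks like, as a sketch (one possible way to check; the column change is just an example):

-- Make a throwaway copy of the table.
CREATE TABLE rewrite_test (LIKE pgbench_accounts INCLUDING ALL);
SELECT pg_relation_filenode('rewrite_test');   -- note this number

-- Try the change you are worried about on the copy.
ALTER TABLE rewrite_test ALTER COLUMN abalance TYPE bigint;
SELECT pg_relation_filenode('rewrite_test');   -- a new number means a full rewrite

DROP TABLE rewrite_test;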
Let’s look at what happens when a table is modified. For this, we will use the good old reliable pgbench_accounts table, which gets created when you run pgbench -i. We can simulate the effects of a long-running table rewrite by putting our statement into a transaction. For this first one, let’s add a new column to it and see what locks are being held:
greg=> begin;
BEGIN
greg=*> alter table pgbench_accounts add chocolates int;
ALTER TABLE
greg=*> select locktype, mode from pg_locks
where relation::regclass::text = 'pgbench_accounts';
locktype | mode
----------+---------------------
relation | AccessExclusiveLock
The AccessExclusiveLock is on the entire table, and is a very, very strong lock that blocks almost all other access to the table from other processes. Let’s start another process and see what happens:
-- Without the lock_timeout, this update would hang forever,
-- or until the other transaction commits:
greg=> set lock_timeout TO '5s';
SET
greg=> update pgbench_accounts set bid = bid;
ERR
[...]
In this blog (the third in my series), I'd like to present yet another new feature in the PostgreSQL 17 release: an enhancement to logical replication functionality in PostgreSQL. The blog will also provide a small script that demonstrates how to use this feature when upgrading from Postgres 17 to a future version. In my prior blogs (also published on Planet PostgreSQL and DZone), I have written about other PG-17 features, which you can read about:
In "an awkward gesture in a moment of enthusiasm", we have decided it is no longer appropriate to post on the website formerly known as "Twitter". Let that sink in. Please follow us on BlueSky (or even via RSS).
PostgreSQL 18 changes:
casefold() function and Unicode case folding support added
pgcrypto function fips_mode() added, making it possible to detect if FIPS mode is enabled
GUC pgcrypto.builtin_crypto_enabled added, making it possible to disable pgcrypto's built-in cryptographic support
to_number() support for Roman numeral format added
NOT VALID foreign key constraints now permitted on partitioned tables
PostgreSQL 18 articles:
Allow changing autovacuum_max_workers without restarting (2025-01-24) - Hubert 'depesz' Lubaczewski details the change to autovacuum_max_workers
Implementing thread-safe scanners and parsers in PostgreSQL (2025-01-21) - Peter Eisentraut discusses in detail an important set of changes in PostgreSQL 18

And one more tale from the frontline!
When I encounter cases like this, I start wondering whether I am the first person who ever tried to do “this” for real, whatever “this” is.
Here is a story. When anyone gives a talk about partitions, they always bring up an example of archiving: there is a partitioned table, and you only keep “current” partitions, whatever “current” means in that context, and after two weeks or a month, or whatever interval works for you, you detach the oldest partition from the “current” table and attach it to the “archived” table, so that the data is still available when you need it, but it does not slow down your “current” queries.
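In SQL, that archiving step is roughly the following sketch (the partition bounds here are hypothetical; the table names match the setup described below):

-- Move the oldest partition from the "current" table to the "archived" table.
ALTER TABLE current.a_table DETACH PARTITION current.a_table_2024_12_17;
ALTER TABLE current.a_table_2024_12_17 SET SCHEMA archive;
ALTER TABLE archive.a_table ATTACH PARTITION archive.a_table_2024_12_17
    FOR VALUES FROM ('2024-12-17') TO ('2024-12-18');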
So here is Hettie confidently suggesting that a customer implement this technique to avoid querying a terabyte-plus-size table. The customer happily agrees, and life is great until one day an archiving job reports a “name already exists” error for an index name.
current.a_table is a partitioned table.
current.a_table_2024_12_17, current.a_table_2024_12_18, ... current.a_table_2024_12_31 are partitions.
a_table_useful_index_idx is a global index on (col1, col2, col3, col8).
archive.a_table is an archived partitioned table.
archive.a_table_2024_12_01, archive.a_table_2024_12_02, ... archive.a_table_2024_12_15, archive.a_table_2024_12_16 are partitions.
When I looked at the conflicting name, I saw the name of the index:
a_table_col1_col2_col3_col8_idx
Somehow, this name was coming from the December 17, 2024 partition that we were trying to archive, and I wondered why this name had suddenly reappeared when I had clearly named the global index differently.
I decided that I had overlooked it in the very beginning and renamed all indexes in the archive.a_table partitions so that there would be no possible collisions.
However, in two weeks, the problem reappeared. Once again, I thought that it was all because I neglected to create a global index on the archive.a_table and rebuild-renamed everything, and surely…
[...]Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.