We are pleased to provide binaries for file_textarray_fdw and odbc_fdw for PostgreSQL 17 Windows 64-bit.
To use these, copy the files into the same-named folders of your PostgreSQL 17 Windows 64-bit install and then run CREATE EXTENSION as usual in the databases of your choice. More details are in the packaged README.txt.
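For example, once the binaries are in place, enabling the wrappers in a given database looks like this (a minimal sketch using the extension names shipped in the packages):

CREATE EXTENSION file_textarray_fdw;
CREATE EXTENSION odbc_fdw;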
These were compiled against PostgreSQL 17.2 using msys2 / mingw64 and tested against PostgreSQL 17.2 EDB windows distribution.
I am unsure whether others have this problem, but I find myself explaining the benefits (and what is not a benefit) of partitioning tables in Postgres over and over again.
Most times, developers have unrealistic expectations that if they partition a gigantic table, “it will be faster to select from it.” I always have to explain that the goal and the purpose of partitioning is improving maintenance, not query optimization, and if the execution speed of short queries remains the same as before partitioning, they should consider it a success (yes, there are no rules without exceptions, and there are cases of performance improvement, but those are exceptions indeed).
The next thing I tell developers is that for the queries not to slow down, they need to explicitly include the values of the attributes used for range partitioning. Quite often, this requirement is not obvious, and moreover, it requires a significant rewrite of the SQL statements in use.
From an application developer’s perspective, they do just that:
SELECT a, b, c
FROM partitioned_table
WHERE p IN
(SELECT p FROM other_table WHERE r between 1 and 100)
You can’t imagine how many times I’ve heard: But I am selecting from a specific partition! And I had to explain that there is no way for Postgres to know before the execution starts which partition it is going to be!
When developers ask me what they should do, I advise them to calculate the value first and use it in a query. This approach works all the time, but to be honest, that's the question I have for the community: why can't the query planner do it? Let me be clear: I know why it can't do it now, but what exactly fundamentally prevents the smart people who make Postgres better from making this change in the planner's behavior? I mean, if I can figure it out, why can't Postgres?
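For the query above, a minimal sketch of that approach looks like this (the literal 42 is purely hypothetical, standing in for whatever value the first query returns):

-- step 1: resolve the partition key value(s) first
SELECT p FROM other_table WHERE r BETWEEN 1 AND 100;
-- step 2: use the resulting literal so the planner can prune at plan time
SELECT a, b, c FROM partitioned_table WHERE p = 42;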
Postgres Conference Seattle 2024 partnered up with PASS this year to present a united database front. They accepted my “Kubernetes Killed the High Availability Star” talk, which I graciously gave on the last day of the conference. The next talk in that room wasn’t for another hour, so I had plenty of time to talk shop with attendees, about the future of Postgres, high availability, and Kubernetes in general.
If you weren’t there and missed out on the fun, this is your chance to catch up and enjoy a few of my notorious bad puns along the way. Let me tell you why the concept of Postgres HA is dead.
The call for proposals for 2025.pgconf.dev has been extended to January 6, 2025, otherwise known as "very soon". I'm writing this post to encourage you to submit, if you haven't done so yet, regardless of whether you have submitted to 2024.pgconf.dev or its predecessor, PGCon, in the past. The event will only be as good as the content you (collectively) submit, and having found much value in these events over the years that I've been participating, I very much want our future events to be as good as those in the past, or, hopefully, even better. But what makes a good event, and what kind of talk should you submit?
I have been playing around with the powercap framework. I wrote a Postgres extension that shows the energy usage of a query. Postgres has a hook mechanism that allows an extension to override the default executor. This implementation is very simple: the extension records the current energy reading when a query starts and then calls the actual executor that runs the query. When the query finishes, a second hook records the current energy reading. The overall energy usage of this query is the difference between the two values.
Recently I had a customer come to me with a poorly performing query that had a perfectly fine execution time on Oracle, but once migrated to Postgres the query was slow. Upon investigation, it was found that even though the “WHERE” and “JOIN” clauses had explicitly specified partition keys and were joined on the partition key, “pruning” was only happening on one part of the query.
The setup for the test is at the bottom of the blog post: Jump to Test Case Setup
The tables in question are all partitioned by "hire_date" and, as you can see in the "WHERE" clause below, there should be partition pruning because the "hire_date" from the driving table is equal to that of the joined table, yet we did not get partition pruning:
WITH top_emp AS (
SELECT
DISTINCT
id,
first_name,
last_name,
hire_date,
birth_date
FROM
employees.employee_part emp
INNER JOIN employees.salary sal on (sal.employee_id = emp.id)
WHERE
emp.hire_date = '1985-01-01'::date
and emp.gender = 'M'
and emp.birth_date between '1957-01-01'::date and '1970-01-01'::date
and sal.amount > 50000
)
SELECT
dept.employee_id,
dept.department_id,
dept.hire_date
FROM
top_emp emp1,
employees.department_employee_part dept
WHERE
emp1.hire_date = dept.hire_date and
emp1.id = dept.employee_id;
/* Explain Plan */
Nested Loop (cost=1467.16..2646.20 rows=307 width=17) (actual time=4.565..4.613 rows=4 loops=1)
Output: dept.employee_id, dept.department_id, dept.hire_date
Buffers: shared hit=330
-> Unique (cost=1466.87..1471.11 rows=37 width=31) (actual time=4.537..4.560 rows=4 loops=1)
Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
Buffers: shared hit=318
-> Sort (cost=1466.87..1467.72 rows=339 width=31) (actual time=4.534..4.539 rows=69 loops=1)
Output: emp.id, emp.first_name, emp.last_name, emp.hire_date, emp.birth_date
So [...]
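One workaround that often helps in a case like this (a hedged sketch, not necessarily the fix the author ultimately applied) is to repeat the partition-key literal directly against the joined partitioned table so the planner can prune it at plan time:

WHERE
    emp1.hire_date = dept.hire_date and
    dept.hire_date = '1985-01-01'::date and  /* repeated literal enables plan-time pruning of dept */
    emp1.id = dept.employee_id;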
Federico Campoli and Vik Fearing organised PGDay/MED 2025 in Naples, Italy. Speakers included Gabriele Quaresima, Ryan Booz, Gianluca Padovani, Patrick Lauer, Stefanie Janine Stölting, Fabiana Farias, Pavlo Golub, and Devrim Gündüz.
Prepare for the next great event by PgTraining!
PgTraining is organizing next year's event, OpenDay 2025, which will be held on April 11th in Bolzano, Italy.
The event will be totally free, but registration is required because the assigned room has a limited number of seats.
Please note that all the talks will be in Italian.
The event will be held at the NOI Techpark.
We are working on the schedule, but the day will be organized into a talk session and a laboratory/practical session, the former in the morning and the latter in the afternoon.
Please see the official event page for more details and stay tuned for updates!
In the spirit of the holiday season, I thought I would write a quick post regarding some of my favorite queries that I use on a day to day basis working on Postgres. Some of these queries I have developed and others were found on the internet (hat tip to those who have previously posted) and further refined.
Many more are found on my github site:
https://github.com/shane-borden/sqlScripts/tree/master/postgres
Hope these queries can also help you in your day to day quest to make Postgres run better!
The first three queries rank top SQL from pg_stat_statements by execution count, mean_exec_time, or total_exec_time. I love to use these queries to get a quick glance at what I should focus on tuning. Given that pg_stat_statements tracks lots of things, you can filter out certain "query text" as necessary so that it does not influence the ranking.
WITH
hist AS (
SELECT queryid::text,
SUBSTRING(query from 1 for 1000) query,
ROW_NUMBER () OVER (ORDER BY mean_exec_time::numeric DESC) rn,
SUM(mean_exec_time::numeric) mean_exec_time
FROM pg_stat_statements
WHERE queryid IS NOT NULL
AND query::text not like '%pg_%'
AND query::text not like '%g_%'
/* Add more filters here */
GROUP BY
queryid,
SUBSTRING(query from 1 for 1000),
mean_exec_time::numeric
),
total AS (
SELECT SUM(mean_exec_time::numeric) mean_exec_time FROM hist
)
SELECT DISTINCT
h.queryid::text,
ROUND(h.mean_exec_time::numeric,3) mean_exec_time,
ROUND(100 * h.mean_exec_time / t.mean_exec_time, 1) percent,
h.query
FROM hist h,
total t
WHERE h.mean_exec_time >= t.mean_exec_time / 1000 AND rn <= 14
UNION ALL
SELECT 'Others',
ROUND(COALESCE(SUM(h.mean_exec_time), 0), 3) mean_exec_time,
COALESCE(ROUND(100 * SUM(h.mean_exec_time) / AVG(t.mean_exec_time), 1), 0) percent,
NULL sql_text
FROM hist h,
total t
WHERE h.mean_exec_time < t.mean_exec_time / 1000 [...]
Today I’m excited to introduce pg_incremental, a new open source PostgreSQL extension for automated, incremental, reliable batch processing. This extension helps you create processing pipelines for append-only streams of data, such as IoT / time series / event data workloads.
Notable pg_incremental use cases include:
After you set up a pg_incremental pipeline, it runs forever until you tell Postgres to stop. There's a lot you can do with pg_incremental, and we have a lot of thoughts on why we think it's valuable. To help you navigate, you can jump directly to whichever of the examples feels most relevant to you:
My team has been working on handling data-intensive workloads in PostgreSQL for many years. The most data-intensive workloads are usually the ones with a machine-generated stream of event data, and we often find that the best solution for handling those workloads in PostgreSQL involves incremental data processing.
For example, a common pattern in PostgreSQL is to periodically pre-aggregate incoming event data into a summary table. In that model, writes (esp. batch loads) are fast because they do not trigger any immediate processing. The incremental aggregation is fast because it only processes new rows, and queries from dashboards are fast because they hit an indexed summary table. I originally developed pg_cron for this purpose, but creating an end-to-end pipeline sti
[...]PostgreSQL comes with a variety of functions that allow you to group rows into a “window” and perform calculations on that window. By using these functions, you can create more advanced and efficient queries for analyzing your database.
In early 2023, I contributed to a project that converts data models to PostgreSQL, called db_migrator. On this occasion, I (re)discovered the power of these window functions with the SQL language. In this article, I revisit a specific case of transforming the upper bounds of a partitioned table into an array of boundaries.
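To give a flavour of what that looks like, here is a minimal sketch (the partition_bounds table and its columns are hypothetical, not the actual db_migrator objects) that derives each partition's lower bound and a running array of boundaries using window functions:

SELECT partition_name,
       upper_bound,
       lag(upper_bound) OVER (ORDER BY upper_bound) AS lower_bound,
       array_agg(upper_bound) OVER (ORDER BY upper_bound) AS boundaries_so_far
FROM partition_bounds;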
Version 17 of PostgreSQL has been released for a while. One of the many features is a change by Tom Lane called "Rearrange pg_dump's handling of large objects for better efficiency". In the past, we have seen several of our customers run into performance problems with dump/restore when a large number of large objects is involved. The main reason for this is that large objects are quite unlike TOAST (The Oversized Attribute Storage Technique): while TOASTed data is completely transparent to the user, large objects are stored out-of-line in the pg_largeobject table, and the user table only stores an OID that references the corresponding rows in that table.
Here is an example on how large objects can be used:
postgres=# CREATE TABLE test(id BIGINT, blob OID);
CREATE TABLE
postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1
postgres=# SELECT * FROM test;
id | blob
----+-------
1 | 33280
(1 row)
postgres=# SELECT * FROM pg_largeobject;
loid | pageno | data
-------+--------+--------------------------------------------
33280 | 0 | \x44656269616e20474e552f4c696e75782031320a
(1 row)
postgres=# SELECT lo_export(test.blob, '/tmp/foo') FROM test;
lo_export
-----------
1
(1 row)
postgres=# SELECT pg_read_file('/tmp/foo');
pg_read_file
---------------------
Debian GNU/Linux 12+
(1 row)
postgres=# INSERT INTO test VALUES (1, lo_import('/etc/issue.net'));
INSERT 0 1
Now if we dump the database in custom format with both version 16 and 17 of pg_dump and then use pg_restore -l to display the table of contents (TOC), we see a difference:
$ for version in 16 17; do /usr/lib/postgresql/$version/bin/pg_dump -Fc -f lo_test_$version.dmp; \
> pg_restore -l lo_test_$version.dmp | grep -v ^\; > lo_test_$version.toc; done
$ diff -u lo_test_{16,17}.toc
--- lo_test_16.toc 2024-12-11 09:05:46.550667808 +0100
+++ lo_test_17.toc 2024-12-11 09:05:46.594670235 +0100
@@ -1,5 +1,4 @@
215; 1259 33277 TABLE public test postgres
-3348; 2613
[...]
The first-ever official PostgreSQL conference was held in Italy. Years later, PGDay/MED 2024 brought us back to this inspiring country. I was excited to visit Naples, often called the cradle of pizza and home to some of the finest cuisines in the world. As a self-proclaimed gastronomic geek, this was a bucket-list experience both professionally and personally!
This was the second iteration of the Mediterranean PGDay series following Malta. We are still working hard to build strong events in this region as a community. Napoli’s conference was a one-track event with not so many attendees, creating a warm and cozy atmosphere. Each talk felt like a friendly discussion filled with comments, questions, and even jokes. It was a refreshing contrast to the larger, bustling conferences like PostgreSQL Europe or FOSDEM.
The conference venue was perfectly located right in front of Naples’ Central Station. The entire top level of the hotel was divided into a lecture room and a restaurant. Adding to the charm was a beautiful terrace offering views of Mount Vesuvius and the sea. It was truly a spectacular setting.
My presentation focused on implementing the Wordle game inside PostgreSQL. Given the location, I added an Italian vocabulary to the demo, allowing us to have some fun with the local attendees. It was a huge success! We even found a bug in the Italian Hunspell dictionary and another issue with the comparison functions during the demo. I included a few local-themed slides featuring Adriano Celentano and Pippo Franco for added flair. The audience laughed and snapped pictures, making me feel like a stand-up comedian. After this experience, I am seriously considering writing a full database-themed stand-up routine. 😄
We had productive discussions with local community members about the future of PostgreSQL in the region. I don’t want to spoil anything, but I hope you’ll hear some
[...]We're happy to announce that PostgreSQL Conference Europe 2025 will be held in Riga, Latvia on October 21-24 2025. The conference format and schedule will be announced later in 2025.
In the meantime, mark your calendars, and we look forward to seeing you in Riga in October!
While a user notices the positive aspects of a technology, a developer, who usually encounters its limitations, shortcomings, or bugs, looks at the product from a completely different perspective. The same thing happened this time: after the publication of the comparative testing results, where Join-Order-Benchmark queries were run against a database with and without partitions, I couldn't shake the feeling that I had missed something. In my mind, Postgres should build a worse plan with partitions than without them. And this should not be just a bug but a technological limitation. After a second thought, I found a weak spot - queries with limits.
In the presence of a LIMIT clause in the SQL query, unlike the case of plain tables, the optimiser immediately faces many questions: How many rows may be extracted from each partition? Will only a single partition be used? If so, which one will it be? None of this is apparent when pruning may only happen at execution time ...
What if we scan partitions by index, and the result is obtained by merging? In that case, it is entirely unclear how to estimate the number of rows that should be extracted from each partition and, therefore, which type of partition scan operator to apply. And what if, with a partitionwise join, we have an intricate subtree under the Append - knowledge of the limits should be crucial in that case, for example when choosing the JOIN type, shouldn't it?
Such a pack of questions about planning over partitions led to a compromise solution when choosing a query plan for Append's subpaths: for picking the optimal fractional path, two plan options are considered - the minimum total cost path and the minimum startup cost path. Roughly speaking, the plan will be optimal if we have LIMIT 1 or some considerable LIMIT value in the query. But what about intermediate options? Let's look at specific examples (thanks to Alexander Pyhalov).
DROP TABLE IF EXISTS parted,plain CASCADE;
CREATE TEMP TABLE parted (x integ
[...]
One of the biggest problems open source projects face today is the bus factor problem.
I've been thinking a lot about this lately and how it applies to my PostGIS, pgRouting, and OSGeo System Administration (SAC) teams.
Recently, I attended a conference talk about the vibrant ecosystem of PostgreSQL. It was a fascinating presentation, showcasing the incredible breadth of innovation in the PostgreSQL community. Extensions that push the boundaries of functionality, AI integrations within PostgreSQL, advanced connection pools, and robust cluster management tools were all highlighted.
But something vital was conspicuously absent. Can you guess? That’s right—not a single mention of client connectors.
Let’s pause for a moment and consider this: without clients to connect to PostgreSQL, how do you actually use any of these features?
Of course, one could argue, “We have libpq!” True, libpq is a core PostgreSQL client library written in C. However, most modern enterprise applications aren’t developed in C or C++—they’re built in higher-level languages like Java, Python, or .NET. That’s where the ecosystem of client connectors becomes indispensable.
Now, I’ll admit, I’m a bit biased. Having worked on the PostgreSQL JDBC driver for over 24 years, my perspective may naturally lean toward this critical layer of the stack. But that bias only drives my curiosity. So, I decided to do a little research.
Here’s what I found—a snapshot of the client connectors ecosystem, complete with the number of lines of code for each project:
Project | Lines of Code | Language
---|---|---
psqlodbc | 112,886 | C++
psycopg | 52,970 | Python
pgx | 52,905 | Go
crystal-pg | 3,858 | Crystal
node-postgres | 18,838 | Node.js
rust-postgres | 20,448 | Rust
If you have worked with PostgreSQL for a while, you have probably come across the command VACUUM FULL. At first glance, it might seem like a silver bullet for reclaiming disk space and optimizing tables. After all, who would not want to tidy things up and make their database more efficient, right?
But here is the thing: while VACUUM FULL can be useful in some situations, it is not the hero it might seem. In fact, it can cause more problems than it solves if you are not careful.
Let us dive into:
- What VACUUM FULL actually does
- When you should use it
- Why it is not the best solution for most cases
- And what to do instead
What Does VACUUM FULL Actually Do?
PostgreSQL uses something called Multi-Version Concurrency Control (MVCC). Without getting too technical, MVCC keeps multiple versions of rows around to handle updates and deletes efficiently. These older versions of rows - called dead tuples - are cleaned up by a process called vacuuming.
A regular VACUUM removes those dead tuples so the space can be reused.
VACUUM FULL, however, goes further. It rewrites the entire table to remove dead space completely. It also rebuilds all the indexes on the table. Essentially, it is like dumping all your clothes out of the closet, refolding everything, and putting it back in neatly.
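In SQL terms, the difference looks roughly like this (my_table is just a placeholder):

VACUUM (VERBOSE) my_table;  -- removes dead tuples so the space can be reused; normal reads and writes continue
VACUUM FULL my_table;       -- rewrites the table and rebuilds its indexes; holds an ACCESS EXCLUSIVE lock the whole time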
Sounds great, right? So, why not use it all the time?
When Should You Actually Use VACUUM FULL?
There are a few very specific situations where VACUUM FULL makes sense:
After Massive Deletions
Imagine you delete millions of rows from a table. Regular vacuuming might not reclaim that disk space immediately, and the table could still look bloated. In this case, VACUUM FULL can shrink the table and give you that disk space back.
Disk Space Crunch
If your database server is running out of disk space and you need to reclaim it fast, VACUUM FULL can help (though it is still not ideal—more on that later).
Post-Migration Cleanup
If you have migrated a large table or reorganized your data, VACUUM FULL [...]
Next month, I'll be hosting 2 or 3 discussions of Andres Freund's talk, NUMA vs PostgreSQL, given at PGConf.EU 2024. You can sign up using this form. I anticipate that both Andres and I will be present for the discussions, and I'd like to thank Andres and all of the other presenters who have made time to join the discussions and answer questions (so far: Melanie Plageman, Thomas Munro, Andrey Borodin). It has been absolutely great having them join the workshops.
Better late than never – here is a recording of Devrim Gunduz's talk at the November meetup – enjoy!
As announced earlier this month [1], PostgreSQL Conference Germany 2025 will take place from May 08-09, 2025 in Berlin.
We now invite you to join us as a sponsor and become an integral part of this exciting event!
The Call for Sponsors opens on December 16th, 2024, at 2 PM UTC.
The different sponsorship levels cover various aspects of exposure, with all levels including free tickets to the event. A limited number of Platinum and Gold sponsorships are available, so sign up to secure your spot now.
We would appreciate it if you could forward this message to decision-makers in your company and encourage them to support us.
Thank you for your support of the PostgreSQL community. See you in Berlin in May 2025!
[1] https://www.postgresql.org/message-id/173321378104.2556768.16039369174125999081%40wrigleys.postgresql.org
This recipe shows how to perform an online major PostgreSQL upgrade using the new declarative approach to logical replication introduced in CloudNativePG 1.25. By leveraging the Publication and Subscription CRDs, users can set up logical replication between PostgreSQL clusters with ease. I will walk you through configuring a PostgreSQL 15 publisher, importing schemas into a PostgreSQL 17 subscriber, and verifying data synchronisation, with the broader goal of highlighting the benefits of a repeatable and testable upgrade process.
As databases grow in size and complexity, performance issues inevitably arise. Whether it is slow query execution, lock contention, or disk I/O bottlenecks, identifying the root cause of these issues is often the most challenging aspect of database management. One way to understand performance bottlenecks is to determine what the database is waiting for.
Wait events in PostgreSQL provide detailed insights into what a database backend process is waiting for when it is not actively executing queries. Understanding and analyzing these events enables DBAs to resolve bottlenecks with precision.
What Are Wait Events in PostgreSQL?
Wait events represent the specific resources or conditions that a PostgreSQL backend process is waiting on while it is idle. When a process encounters a delay due to resource contention, input/output (I/O) operations, or other reasons, PostgreSQL logs the wait event to help you understand the source of the problem.
Why Wait Events Matter
Wait events can help reveal the underlying cause for slow query execution. For example:
- When a query waits for a lock held by another transaction, it logs a Lock event.
- When a process is waiting for disk reads, it logs an I/O event.
- When a replication delay occurs, it logs a Replication event.
By analyzing and acting on wait events, DBAs can:
- Reduce query execution times.
- Optimize hardware utilization.
- Improve user experience by minimizing delays.
How PostgreSQL Tracks Wait Events
PostgreSQL backend processes constantly update their current state, including any associated wait events. These states are exposed through system views such as pg_stat_activity (and, in PostgreSQL 17, the pg_wait_events view that documents every wait event). By querying these views, you can see which events are impacting performance in real time.
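For example, a quick way to see what currently active sessions are waiting on (this only relies on pg_stat_activity):

SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND pid <> pg_backend_pid();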
Let’s stipulate that PostgreSQL has grown significantly in popularity over the last 20 years. I don’t know by how much, but certainly at least one order of magnitude, probably two or more.
It used to be the case that if you were a PostgreSQL user and wanted to interact with other PostgreSQL users and perhaps developers (which was a more fluid distinction back then), you'd hang out on mailing lists such as pgsql-general and maybe lurk a bit and answer questions and ask your own. There are a few such mailing lists, such as pgsql-general, pgsql-sql, pgsql-admin, but back then, that was basically it; that was pretty much the whole world of PostgreSQL users hanging out and sharing knowledge. There was no other place.
One advantage of having this concentrated in one or a few related places is that it was easy to pick the place to hang out in. You’d be sure that if there was anywhere to get an answer or to learn something, this was the one. Also, for the developers, this was an easy and useful way to monitor user feedback and to engage with users, because you could just read along on a few mailing lists to get the entirety of user discussions about the product.
But I don’t think it works like that anymore.
Traffic on user mailing lists is down. In November 2004, there were 1507 messages on pgsql-general. In November 2024, there were only 445. (By contrast, traffic on pgsql-hackers was up 2279 versus 1300.) This is just one example; you can pick any combination of time intervals of the above-mentioned mailing lists to get similar results. Make a chart out of it to make it more depressing.
Where are all the users now?
Certainly, users still want to ask questions and other users still want to hang out with users and share knowledge.
Of course, we know the answer: They are on Stack Overflow, Reddit, Slack, Telegram, Discord, Mastodon, and so on and so on. I don’t even know the whole list. Is there a whole list? There are also non-text media that didn’t really exist in that way in say 2004
[...]
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.