Latest Blog Posts

PostgreSQL Hacking Workshop - February 2025
Posted by Robert Haas in EDB on 2025-01-17 at 19:02

Please consider joining us next month (February 2025) for a discussion of Heikki Linnakangas's talk on The Wire Protocol, from PGCONF.EU 2024. For those not familiar with the concept, this hacking workshop is basically a virtual meetup: you watch the talk, and then you sign up to participate in one of two or three Zoom meetings where we discuss it. Usually, we're able to get the original author of the talk to join us; thanks to Heikki for agreeing to join us this month.

Read more »

The importance of testing with not-so-usual setups
Posted by Luca Ferrari on 2025-01-16 at 00:00

How we discovered a trivial bug in pgagroal

The importance of testing with not-so-usual setups

This week we found a trivial and silly bug in pgagroal (https://github.com/agroal/pgagroal).

This post is a brief description of the bug, not because it is important in itself, but because the way we discovered it emphasizes how important it is to randomize the configuration of a system. It is a well-known concept; however, we all still tend to fail at it, partly due to the lack of time to configure and test all the possibilities (thankfully, there is automation!).

As often happens, the bug was caused by a memory allocation problem.

As often happens in these cases, the fix is a very short, trophy-worthy patch.

Again, the aim of this post is not to discuss a one-line patch, but rather the importance of running and testing with different tools and setups.

The memory bug

The bug is described in a dedicated issue. What is interesting, as often happens when dealing with bugs, is how long it went unnoticed.

While working on and testing other in-progress features of pgagroal, I was encouraged to compile the project using clang instead of my usual gcc. The result was discouraging, since I was no longer able to start the program:



% pgagroal
pgagroal: Unknown key  with value  in section [pgagroal] (line 46 of file )
2025-01-13 12:38:09 WARN  configuration.c:482 pgagroal: max_connections (20) is greater than allowed (8)
2025-01-13 12:38:09 DEBUG configuration.c:3074 PID file automatically set to: [/tmp/pgagroal.54322.pid]
=================================================================
==17659==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x502000009495 at pc 0x559726597f10 bp 0x7ffc9c1e2360 sp 0x7ffc9c1e1b00
WRITE of size 6 at 0x502000009495 thread T0
    #0 0x559726597f0f in vsprintf (/usr/local/bin/pgagroal+0x58f0f) (BuildId: 16ffc1dab018cfa8eed6b5cc7e6981bc6e861325)
    #1 0x55972659900e in sprint
[...]

Waiting for PostgreSQL 18 – Enable BUFFERS with EXPLAIN ANALYZE by default
Posted by Hubert 'depesz' Lubaczewski on 2025-01-15 at 19:53
On 11th of December 2024, David Rowley committed patch: Enable BUFFERS with EXPLAIN ANALYZE by default   The topic of turning EXPLAIN's BUFFERS option on with the ANALYZE option has come up a few times over the past few years. In many ways, doing this seems like a good idea as it may be more … Continue reading "Waiting for PostgreSQL 18 – Enable BUFFERS with EXPLAIN ANALYZE by default"
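
A quick hedged illustration of the new default (PostgreSQL 18, per the commit; BUFFERS OFF restores the old behaviour):

-- PostgreSQL 18: ANALYZE now implies BUFFERS by default
EXPLAIN (ANALYZE) SELECT count(*) FROM pg_class;
-- plan nodes now include lines like "Buffers: shared hit=..."
-- opt out explicitly with: EXPLAIN (ANALYZE, BUFFERS OFF) ...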

Cut Cloud Costs with Smarter PostgreSQL CPU Core Allocation
Posted by Umair Shahid in Stormatics on 2025-01-15 at 07:57

Cloud costs can quickly spiral out of control if resources are not optimized. One of the most significant contributors to these costs is CPU core allocation, which forms the basis of the instance size with every major cloud provider. Many organizations over-provision cores for their PostgreSQL databases, paying for unused capacity, or under-provision them, leading to poor performance and missed SLAs.

This blog will explore strategies to allocate CPU cores effectively for PostgreSQL databases, ensuring optimal performance while keeping cloud expenses in check.

The Cost-Performance Tradeoff in the Cloud

Cloud providers charge based on resource usage, and CPU cores are among the most expensive components. Allocating too many cores leads to wasted costs, while too few can cause performance bottlenecks.

PostgreSQL databases are particularly sensitive to CPU allocation, as different workloads—OLTP (Online Transaction Processing) vs. OLAP (Online Analytical Processing)—place varying demands on processing power. Finding the right balance is essential to achieving both cost-efficiency and performance reliability.

How CPU Core Allocation Impacts PostgreSQL

PostgreSQL can leverage multi-core systems effectively, but how you allocate cores depends on your workload:

- OLTP Workloads: High concurrency workloads benefit from multiple cores, allowing PostgreSQL to process many small transactions simultaneously.
- OLAP Workloads: Analytical queries often rely on parallel execution, utilizing a few powerful cores to handle complex operations like aggregations and joins.

Additionally, PostgreSQL supports parallel query execution, which can distribute certain operations across multiple cores. However, parallelism primarily benefits large analytical queries and can sometimes degrade performance for small or simple queries due to overhead. It is critical to assess your workload before over-allocating resources.
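
For illustration, a minimal sketch of the parallelism knobs involved; the values below are assumptions for a mixed workload, not recommendations:

-- Hedged sketch: tune per-query parallelism to the workload
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;  -- OLAP: let large scans fan out
ALTER SYSTEM SET max_parallel_workers = 8;             -- instance-wide parallel worker budget
-- For pure OLTP, parallel overhead can outweigh the benefit:
-- ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();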

The post Cut Cloud Costs with Smarter PostgreSQL CPU Core Allocation appeared first

[...]

PostgreSQL Post Statistics for 2025
Posted by Stefanie Janine on 2025-01-14 at 23:00

As I have been on several CfP committees (CfPC) for PostgreSQL conferences, I would like to share my experiences.

Submissions

When you submit more than one talk, think about the additional work for the CfPC. Every member has to read almost all submissions and vote on each of them. The exceptions are the ones they submitted themselves or the ones submitted by coworkers.

All the members of CfP committees of PostgreSQL community conferences do this duty mostly without compensation; at most, they are sometimes offered a free ticket to the conference.

Therefore, also keep your abstract as short as possible and to the point. You don't gain anything with a long text full of unnecessary details. Keep in mind that the abstract is not only used by the CfPC, but is also shown later on the website, where it is the deciding factor for conference visitors choosing whether to attend your talk. The same goes for the description of yourself.

If you gave a talk at a single-track conference in the previous year, think twice about submitting at all. The content of your talk has to stand out exceptionally from the other talks, in both abstract and topic.

After Voting

Once every CfP member has voted on every submission, meetings are usually scheduled to select talks and put together a good program for the conference. The speakers who have been approved get emails asking them to reconfirm their talk.

Some acceptances take longer, as the selected speakers have to ask for their attendance to be sponsored.

Some speakers don't get their talk financed by their companies. They then have a few choices: decline to give the talk, find another sponsor, give the talk and finance the travel costs themselves, or ask the organizers for funding.

In the latter two cases, I would advise updating the talk slides to remove everything mentioning the employer; they haven't earned the mention.

Declined talks have to be replaced. The CfP committee has again to decide which other talk to approve. The circl

[...]

What Hurts in PostgreSQL Part One: Temporary Tables
Posted by Jobin Augustine in Percona on 2025-01-14 at 14:49
PostgreSQL is one of the most powerful database systems in the world. I have always been passionate about its great power, especially its modern SQL language features. However, that doesn’t mean everything is great. There are areas where it hurts. Novice users unaware of the problem might get into trouble, and I see such things […]

The power of open source in PostgreSQL
Posted by Laurenz Albe in Cybertec on 2025-01-14 at 05:00

Superhero Slonik - the power of open source in PostgreSQL
© Laurenz Albe 2024

During a training session on tuning autovacuum, a participant asked me,
“Why is autovacuum_vacuum_scale_factor set to 0.2?”
Rather than replying with a guess or a wisecrack (“Because that's the way somebody implemented it!” or “If you don't like it, submit a patch!”), I decided to use the opportunity to demonstrate the true power of open source. That was such a nice experience that I decided to share it with a wider audience.

The dreadful “why” question

Everybody who has children has learned to dread “why” questions. Very often, they are hard or impossible to answer, and after a while you learn that seeing an adult struggle for words is one of the reasons for children to ask them. Over the years, I have come to the conclusion that when children ask “why”, they usually mean something else; namely:

  • Tell me more about this! It is interesting, and I don't want to go to bed yet!
  • This is really annoying, and it should be different!

As a mathematician and computer engineer, I tend to take questions too literally. Knowing the above interpretation has helped me come up with more appropriate answers. Adults are not that much different from children. So when people start asking “why” instead of “how”, I try to figure out what they really mean. In the case I related in the introduction, the “why” was rather of the interested kind, which is why I decided to pursue the question.

The true power of open source

I hadn't given much thought to the power of open source before I heard Simon Phipps speak at the European PGDay 2010. In this section, he is the giant on whose shoulders I am standing!

Is cheapness the main reason for using open source software?

Many people think that the main advantage of open source software is that you don't have to pay a license fee. This can be a substantial advantage. However, the total cost of ownership is not zero when you run open source software:

  • you need to integrate the software into your landscape; at least set u
[...]

Contributions for the week 51 (2024) and the week 2 (2025)
Posted by Boriss Mejias in postgres-contrib.org on 2025-01-13 at 13:50

Catching up with the last weeks of 2024 and entering 2025, these are the latest contributions we learned about:

Anatomy of Table-Level Locks in PostgreSQL
Posted by Gülçin Yıldırım Jelínek in Xata on 2025-01-13 at 00:00
This blog explains locking mechanisms in PostgreSQL, focusing on table-level locks that are required by Data Definition Language (DDL) operations.

PgPedia Week, 2025-01-12
Posted by Ian Barwick on 2025-01-11 at 10:30

2025 is picking up speed, and with CommitFest 2025-01 underway we're seeing some more PostgreSQL 18 changes again. In news closer to home, the PgPedia Week format has been expanded to include links to recent articles on PostgreSQL 18 , other PostgreSQL-related newsletters published in the preceding week, as well as to general PostgreSQL announcements. Oh, and there's also a new-ish section listing commits of interest from 25 years ago , in case anyone wants an occasional trip down memory lane.

PostgreSQL 18 changes

  • autovacuum_max_workers can now be changed without a server restart; however, the new value will be ineffective if it exceeds that of the new GUC autovacuum_worker_slots
  • support for NOT ENFORCED in CHECK constraints has been added
  • the passwordcheck contrib module now has a min_password_length GUC

PostgreSQL 18 articles

  • PostgreSQL 18: Change the maximum number of autovacuum workers on the fly (2025-01-08) - Daniel Westermann / dbi Services: discusses improvements to autovacuum_max_workers and the new GUC autovacuum_worker_slots
  • Postgres backend statistics (Part 1): I/O statistics (2025-01-06) - Bertrand Drouvot: discusses pg_stat_get_backend_io()
  • Waiting for PostgreSQL 18 – psql: Add more information about service name (2025-01-06) - Hubert 'depesz' Lubaczewski
  • Waiting for PostgreSQL 18 – Add UUID version 7 generation function (2024-12-31) - Hubert 'depesz' Lubaczewski: review of the uuidv7() function
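
As a hedged sketch of the first item (PostgreSQL 18, assuming the behaviour described above):

-- autovacuum_max_workers is now reloadable; no restart required,
-- as long as the value stays within autovacuum_worker_slots
ALTER SYSTEM SET autovacuum_max_workers = 6;
SELECT pg_reload_conf();
SHOW autovacuum_worker_slots;  -- values above this remain ineffective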

more...

Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations
Posted by Hubert 'depesz' Lubaczewski on 2025-01-10 at 17:02
On 27th of November 2024, Peter Eisentraut committed patch: Support LIKE with nondeterministic collations   This allows for example using LIKE with case-insensitive collations. There was previously no internal implementation of this, so it was met with a not-supported error. This adds the internal implementation and removes the error. The implementation follows the specification of … Continue reading "Waiting for PostgreSQL 18 – Support LIKE with nondeterministic collations"
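
A minimal hedged sketch of what this enables (PostgreSQL 18; the collation settings here are illustrative):

-- a case-insensitive (nondeterministic) ICU collation
CREATE COLLATION case_insensitive (
  provider = icu, locale = 'und-u-ks-level2', deterministic = false
);
-- before this patch, the next query failed with
-- "nondeterministic collations are not supported for LIKE"
SELECT 'FooBar' LIKE 'foobar%' COLLATE case_insensitive;  -- now returns true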

Your Data’s Not Safe Until It’s TDE-Safe — Here’s How
Posted by Jan Wieremjewicz in Percona on 2025-01-09 at 14:04
If you’re managing a PostgreSQL database and handling sensitive data or PII, the answer is simple: You need data-at-rest encryption. This isn’t just a “nice-to-have” feature—it’s often a legal or regulatory requirement. Compliance auditors, security officers, and privacy-conscious customers all expect it.  But is this enough?  We think NO! The reality check: No native TDE […]

Idle Transactions Cause Table Bloat? Wait, What?
Posted by Umair Shahid in Stormatics on 2025-01-09 at 12:30

Yup, you read that right. Idle transactions can cause massive table bloat that the vacuum process may not be able to address. Bloat degrades performance and keeps consuming disk space with dead tuples.
This blog delves into how idle transactions cause table bloat, why this is problematic, and practical strategies to avoid it.

What Is Table Bloat?
Table bloat in PostgreSQL occurs when unused or outdated data, known as dead tuples, accumulates in tables and indexes. PostgreSQL uses a Multi-Version Concurrency Control (MVCC) mechanism to maintain data consistency. Each update or delete creates a new version of a row, leaving the old version behind until it is cleaned up by the autovacuum process or manual vacuuming.
Bloat becomes problematic when these dead tuples pile up and are not removed, increasing the size of tables and indexes. The larger the table, the slower the queries, leading to degraded database performance and higher storage costs.

How Idle Transactions Cause Table Bloat
Idle transactions in PostgreSQL are sessions that are connected to the database but not actively issuing queries. There are two primary states of idle transactions:

Idle: The connection is open, but no transaction is running.
Idle in Transaction: A transaction has been opened (e.g., via BEGIN) but has neither been committed nor rolled back.
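
A hedged sketch of how to spot and limit the problematic state:

-- find sessions sitting "idle in transaction", which pin old row
-- versions and prevent vacuum from removing dead tuples
SELECT pid, usename, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- optional guard rail: end transactions that stay idle for too long
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';
SELECT pg_reload_conf();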

The post Idle Transactions Cause Table Bloat? Wait, What? appeared first on Stormatics.

Summary of PostgreSQL in 2024
Posted by Avi Vallarapu on 2025-01-09 at 08:10

HexaCluster is back with its Summary of PostgreSQL in 2024, like our article in 2023. We would first like to thank all PostgreSQL users, contributors, organizations, and sponsors who have directly or indirectly contributed to the exponential year-over-year growth in PostgreSQL adoption across the planet. PostgreSQL continues its legacy as one of the […]

The post Summary of PostgreSQL in 2024 appeared first on HexaCluster.

Postgres Per-Connection Statistics
Posted by Jeremy Schneider on 2025-01-09 at 06:21

I’ve had a wish list for a few years now of observability-related things I’d love to see someday in community/open-source Postgres. A few items from my wish list:

  • Wait event counters and cumulative time
  • Wait event arguments (like object, block, etc – specific argument depends on the wait event)
  • Comprehensive tracking of CPU time (ie. capture/track POSIX rusage data and expose the kernel’s perspective on per-connection CPU usage as a metric)
  • Stop putting “COMMIT/END” in pg_stat_activity when it’s the currently executing statement. It is endlessly frustrating because for 99% of applications you will have no way to know which transactions are committing – and what part of your application code is triggering this – when you get a pile-up of sessions running COMMIT/END at the same time. It would be more useful to just leave the previous SQL in pg_stat_activity, and expose the fact that it’s a commit elsewhere (eg. state==commit or state==active_commit instead of state==active). It’s also usually pretty clear from the current wait events when you’re committing. To troubleshoot, you usually also want to know what you’re committing – not just when you’re committing (see the sketch after this list).
  • On-CPU state
    • SQL execution stage (parse/plan/execute/fetch)
    • SQL execution plan identifier in pg_stat_statements and pg_stat_activity (there are some promising patches and discussions on the hackers email list about this right now!)
    • Currently executing node of the execution plan in pg_stat_activity
  • Progress on long operations (eg. a large seqscan) – there have been improvements here in recent years
  • Better runtime visibility into procedural languages
  • Per-connection statistics (almost all stats today are at the instance or database level) – I wrote a tool to snapshot statistics before and after a query so that you could get a report on exactly what the query did, and a lot of the stats are only useful if you run the tool on an otherwise idle system.
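
To illustrate the COMMIT/END point above, a hedged sketch of what a pile-up looks like today, with no hint of what is being committed:

-- during a commit pile-up, this is about all pg_stat_activity can tell you
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE state = 'active' AND query IN ('COMMIT', 'END')
GROUP BY 1, 2;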

As I’ve noted in

[...]

Celebrating 5,000 GitHub Stars for CloudNativePG
Posted by Gabriele Bartolini in EDB on 2025-01-08 at 10:24

CloudNativePG has surpassed 5,000 stars on GitHub! More than just a number, this achievement reflects the trust, enthusiasm, and collaboration of the Postgres and Kubernetes open-source community. I look back at the journey, acknowledge the incredible contributions from users and maintainers, and invite everyone to join us in shaping the future of cloud-native PostgreSQL.

Postgres backend statistics (Part 1): I/O statistics
Posted by Bertrand Drouvot on 2025-01-07 at 05:26

Introduction

PostgreSQL 18 will normally include this commit (there is always a risk of something being reverted before the GA release): Add backend-level statistics to pgstats.

commit 9aea73fc61d4e77e000724ce0b2f896590a10e03
Author: Michael Paquier 
Date:   Thu Dec 19 13:19:22 2024 +0900

Add backend-level statistics to pgstats

This adds a new variable-numbered statistics kind in pgstats, where the
object ID key of the stats entries is based on the proc number of the
backends.  This acts as an upper-bound for the number of stats entries
that can exist at once.  The entries are created when a backend starts
after authentication succeeds, and are removed when the backend exits,
making the stats entry exist for as long as their backend is up and
running.  These are not written to the pgstats file at shutdown (note
that write_to_file is disabled, as a safety measure).

Currently, these stats include only information about the I/O generated
by a backend, using the same layer as pg_stat_io, except that it is now
possible to know how much activity is happening in each backend rather
than an overall aggregate of all the activity.  A function called
pg_stat_get_backend_io() is added to access this data depending on the
PID of a backend.  The existing structure could be expanded in the
future to add more information about other statistics related to
backends, depending on requirements or ideas.

Auxiliary processes are not included in this set of statistics.  These
are less interesting to have than normal backends as they have dedicated
entries in pg_stat_io, and stats kinds of their own.

This commit includes also pg_stat_reset_backend_stats(), function able
to reset all the stats associated to a single backend.

This means that, thanks to the new pg_stat_get_backend_io() function, we can see the I/O activity of each backend (in addition to the overall aggregate still available through the pg_stat_io view).
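
Ahead of the post's own examples, a minimal hedged sketch of the kind of query this enables (the column list is an assumption modelled on pg_stat_io; the final PostgreSQL 18 catalog may differ):

-- per-backend I/O for all client backends
SELECT a.pid, a.application_name, io.object, io.context, io.reads, io.writes
FROM pg_stat_activity AS a,
     LATERAL pg_stat_get_backend_io(a.pid) AS io
WHERE a.backend_type = 'client backend';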

Let’s look at some examples

Thanks to this new pg_stat_get_backend_io

[...]

Running an Async Web Query Queue with Procedures and pg_cron
Posted by Paul Ramsey in Crunchy Data on 2025-01-06 at 14:30

The number of cool things you can do with the http extension is large, but putting those things into production raises an important problem.

The amount of time an HTTP request takes, 100s of milliseconds, is 10 to 20 times longer than the amount of time a normal database query takes.

This means that potentially an HTTP call could jam up a query for a long time. I recently ran an HTTP function in an update against a relatively small 1000 record table.

The query took 5 minutes to run, and during that time the table was locked to other access, since the update touched every row.

This was fine for me on my developer database on my laptop. In a production system, it would not be fine.

Geocoding, For Example

A really common table layout in a spatially enabled enterprise system is a table of addresses with an associated location for each address.

CREATE EXTENSION postgis;

CREATE TABLE addresses (
  pk serial PRIMARY KEY,
  address text,
  city text,
  geom geometry(Point, 4326),
  geocode jsonb
);

CREATE INDEX addresses_geom_x
  ON addresses USING GIST (geom);

INSERT INTO addresses (address, city)
  VALUES ('1650 Chandler Avenue', 'Victoria'),
         ('122 Simcoe Street', 'Victoria');

New addresses get inserted without known locations. The system needs to call an external geocoding service to get locations.

SELECT * FROM addresses;
 pk |       address        |   city   | geom | geocode
----+----------------------+----------+------+---------
  8 | 1650 Chandler Avenue | Victoria |      |
  9 | 122 Simcoe Street    | Victoria |      |

When a new address is inserted into the system, it would be great to geocode it. A trigger would make a lot of sense, but a trigger will run in the same transaction as the insert. So the insert will block until the geocode call is complete. That could take a while. If the system is under load, inserts will pile up, all waiting for their geocodes.

Procedures to the Rescue

A better performing approach would be to insert the address right away

[...]
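
The post builds toward the pattern named in the title; a rough hedged sketch of that idea follows (geocode_one() is a hypothetical wrapper around the http extension's call to the geocoding service):

-- a procedure can COMMIT between rows, so each address is its own
-- short transaction instead of one long table-locking update
CREATE OR REPLACE PROCEDURE geocode_pending()
LANGUAGE plpgsql AS $$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT pk, address, city FROM addresses WHERE geom IS NULL LOOP
    UPDATE addresses
       SET geocode = geocode_one(r.address, r.city)  -- hypothetical helper
     WHERE pk = r.pk;
    COMMIT;
  END LOOP;
END;
$$;

-- pg_cron runs the procedure in the background every minute
SELECT cron.schedule('geocode', '* * * * *', 'CALL geocode_pending()');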

Waiting for PostgreSQL 18 – psql: Add more information about service name
Posted by Hubert 'depesz' Lubaczewski on 2025-01-06 at 09:43
On 18th of December 2024, Michael Paquier committed patch: psql: Add more information about service name   This commit adds support for the following items in psql, able to show a service name, when available: - Variable SERVICE. - Substitution %s in PROMPT{1,2,3}.   This relies on 4b99fed7541e, that has made the service name available … Continue reading "Waiting for PostgreSQL 18 – psql: Add more information about service name"
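
A tiny hedged sketch of the additions (PostgreSQL 18), assuming a connection made through a service definition such as psql "service=prod":

-- the new SERVICE psql variable holds the service name
\echo :SERVICE
-- the new %s substitution shows it in the prompt
\set PROMPT1 '%s=# '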

PostgreSQL Post Statistics for 2025
Posted by Stefanie Janine on 2025-01-05 at 23:00

A short review of the PostgreSQL post rankings in 2024. The counting is limited, as there are no server logs and no cookies used on ProOpenSource websites.

The only tracking used is images hosted on another instance running Matomo. As some more protective browsers block third-party content, those visits are not counted.

Overall Statistics

As you can see in the image below, traffic has gone up every year since the blogs started in 2021.

Overall statistics since 2021

First Guest Post

Last year the blog had its first guest post: Emma Saroyan wrote about her experiences at PGConf NYC.

Emma Saroyan presenting at PGConf NYC

Top Three Posts in 2024

Number 1

Handling BLOBs In PostgreSQL was published on October 16, 2024; it is a bit unexpected to see it at No. 1.

Number 2

PostgreSQL with PostGIS on Android was published on February 13, 2023. It had nearly the same number of visits in 2023 as in 2024.

Number 3

PostgreSQL Connection Poolers, published on July 29, 2024, seems to be interesting to a lot of people.

Browser Statistics

The pie chart shows that most people use Chrome and Chrome Mobile to read the blog.

Browser statistics as pie chart for 2024

Country Statistics

Most visits in 2024 came from the USA.

World map with countries visiting the blog in 2024

Most Visited Entry Page

The conference calendar has been the most successful entry page in 2024.

PgPedia Week, 2025-01-05
Posted by Ian Barwick on 2025-01-05 at 20:42

A very short "Week" this week, as the end-of-year holiday season inevitably sees a global lull in activity - we're all only human, after all. Wishing everyone a happy new PostgreSQL year!

PostgreSQL 18 changes

No user-visible features or other changes were added this week.

more...

Investigating Memoize's Boundaries
Posted by Andrei Lepikhov in Postgres Professional on 2025-01-03 at 14:01

During the New Year holiday week, I want to glance at one of Postgres' most robust features: the internal caching technique for query trees, also known as memoisation.

Introduced with commit 9eacee2 in 2021, the Memoize node fills the performance gap between HashJoin and parameterised NestLoop: with a couple of big tables, we sometimes need to join only small row subsets from them. In that case, the parameterised NestLoop algorithm does the job much faster than HashJoin. However, the size of the outer side is critical for performance and may cause NestLoop to be rejected just because of massive repetitive scans of the inner input.

When predicting multiple duplicates in the outer column that participate as a parameter in the inner side of a join, the optimiser can insert a Memoize node. This node caches the results of the inner query subtree scan for each parameter value and reuses these results if the known value from the outer side reappears later.

This feature is highly beneficial. However, user migration reports indicate that there are still some cases in PostgreSQL where this feature does not apply, leading to significant regressions in query execution time. In this post, I will compare the caching methods for intermediate results in PostgreSQL and SQL Server.

Memoisation for SEMI/ANTI JOIN

Let me introduce a couple of tables:

DROP TABLE IF EXISTS t1,t2;
CREATE TABLE t1 (x integer);
INSERT INTO t1 (x)
  SELECT value % 10 FROM generate_series(1,1000) AS value;
CREATE TABLE t2 (x integer, y integer);
INSERT INTO t2 (x,y)
  SELECT value, value%100 FROM generate_series(1,100000) AS value;
CREATE INDEX t2_idx ON t2(x,y);
VACUUM ANALYZE t1,t2;

In Postgres, a simple join of these tables prefers parameterised NestLoop with memoisation:

EXPLAIN (COSTS OFF)
SELECT t1.* FROM t1 JOIN t2 ON (t1.x = t2.x);
/*
 Nested Loop
   ->  Seq Scan on t1
   ->  Memoize
         Cache Key: t1.x
         Cache Mode: logical
         ->  Index Scan using t2_idx on t2
               Index Cond: (x = t1.x)
*/

The smalle

[...]

SQL/JSON Path Playground Update
Posted by David Wheeler in Tembo on 2024-12-31 at 20:40

Based on the recently-released Go JSONPath and JSONTree playgrounds, I’ve updated the design of the SQL/JSON Playground. It now comes populated with sample JSON borrowed from RFC 9535, as well as a selection of queries that randomly populate the query field on each reload. I believe this makes the playground nicer to start using, not to mention more pleasing to the eye.

The playground has also been updated to use the recently-released sqljson/path v0.2 package, which replicates a few changes included in the PostgreSQL 17 release. Notably, the .string() function no longer uses a time zone or variable format for dates and times.

Curious to see it in action? Check it out!

Challenges of Postgres Containers
Posted by Jeremy Schneider on 2024-12-31 at 10:52

Many enterprise workloads are being migrated from commercial databases like Oracle and SQL Server to Postgres, which brings anxiety and challenges for mature operational teams. Learning a new database like Postgres sounds intimidating. In practice, most of the concepts directly transfer from databases like SQL Server and Oracle. Transactions, SQL syntax, explain plans, connection management, redo (aka transaction/write-ahead logging), backup and recovery – all have direct parallels. The two biggest differences in Postgres are: (1) vacuum and (2) the whole “open source” and decentralized development paradigm… once you learn those, the rest is gravy. Get a commercial support contract if you need to, try out some training; there are several companies offering these. Re-kindle the curiosity that got us into databases originally, take your time learning day-by-day, connect with other Postgres people online where you can ask questions, and you’ll be fine!

Nonetheless: the anxiety is compounded when you’re learning two new things: both Postgres and containers. I pivoted to Postgres in 2017, and I’m learning containers now. (I know I’m 10 years late getting off the sidelines and into the containers game, but I was doing lots of other interesting things!)

Postgres was already one of the most-pulled images on Docker Hub back in 2019 (10M+) and unsurprisingly it continues to be among the most-pulled images today (1B+). Local development and testing with Postgres has never been easier. For many developers, docker run -e POSTGRES_PASSWORD=mysecret postgres has replaced installers and package managers and desktop GUIs in their local dev & test workflows.

With the widespread adoption of kubernetes, the maturing of its support for stateful workloads, and the growing availability of Postgres operators – containers are increasingly being used throughout the full lifecycle of the database. They aren’t just for dev & test: they’re for production too.

Containers will dominate the future of Postgres, if only becaus

[...]

Waiting for PostgreSQL 18 – Add UUID version 7 generation function.
Posted by Hubert 'depesz' Lubaczewski on 2024-12-31 at 09:37
On 11th of December 2024, Masahiko Sawada committed patch: Add UUID version 7 generation function.   This commit introduces the uuidv7() SQL function, which generates UUID version 7 as specified in RFC 9562. UUIDv7 combines a Unix timestamp in milliseconds and random bits, offering both uniqueness and sortability.   In our implementation, the 12-bit sub-millisecond … Continue reading "Waiting for PostgreSQL 18 – Add UUID version 7 generation function."
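
A minimal hedged sketch (PostgreSQL 18):

-- uuidv7() yields time-ordered UUIDs
SELECT uuidv7();
-- unlike gen_random_uuid() (version 4), consecutive uuidv7() values
-- sort roughly by generation time, which is friendlier to btree indexes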

Can we use this index, please? – Why not?
Posted by Henrietta Dombrovskaya on 2024-12-30 at 02:30

It’s Christmas time and relatively quiet at my day job, so let’s make it story time again! One more tale from the trenches: how wrong can you go with one table and one index?

Several weeks ago, a user asked me why one of the queries had an “inconsistent performance.” According to the user, “Sometimes it takes three minutes, sometimes thirty, or just never finishes.” After taking a look at the query, I could tell that the actual problem was not the 30+ minutes, but 3 minutes – when you have a several hundred million row table and your select yields just over a thousand rows, it’s a classical “short query,” so you should be able to get results in milliseconds.

The original query was over a view with a self-join, and at first, I suspected that something was wrong with the view itself, but then I got it down to one SELECT from one table, which was indeed super-slow: taking minutes when it should have taken seconds. The “inconsistency” was due to high I/O and depended on what was in the shared buffers at execution time. The query looked like this:

SELECT * FROM large_table
  WHERE col1='AAA'
  AND col2='BCD'
  AND created_at BETWEEN '01-01-2012' AND '12-31-2012'
  AND extract (hour FROM created_at)=16
  AND extract (minute FROM created_at)=15

There was an index on all of the attributes which were referenced in the query:

CREATE INDEX large_table_index ON large_table (col1, col2, created_at);

The query plan looked perfect: INDEX SCAN using that index; however, the query was incredibly slow because, for each fetched record, the hour and minute had to be verified (and you’ve already guessed that the table was not only large but also wide).

According to the execution plan, the number of rows selected during the index scan was about 30M, and subsequent filtering reduced it to a little bit over 1K. I started to think that as ridiculous as it sounds, it could be a good idea to create an additional partial index or to include the “hour” and “minute” parts into the index. (Un)fortunately, both of

[...]
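
The post's conclusion is cut off above; for reference, a hedged sketch of the second option being considered (an index that bakes in the hour and minute expressions, so the filter can be applied inside the index scan):

-- expression columns let the planner check hour/minute without
-- fetching the wide heap rows first
CREATE INDEX large_table_hm_idx
  ON large_table (col1, col2,
                  (extract(hour FROM created_at)),
                  (extract(minute FROM created_at)),
                  created_at);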

PgPedia Week, 2024-12-29
Posted by Ian Barwick on 2024-12-29 at 20:21

Another calendar year draws to an end, so this will be the last PgPedia Week of 2024.

PostgreSQL 18 changes

As always, the end-of-year holidays mean commit log activity is lower than usual, but we did see one new potential performance improvement, with commit 58a359e5 (" Speedup tuple deformation with additional function inlining ") promising query performance increases of around 5-20% in deform-heavy OLAP-type workloads.

more...

CloudNativePG in 2024: Milestones, Innovations, and Reflections
Posted by Gabriele Bartolini in EDB on 2024-12-29 at 11:27

2024 was a transformative year for CloudNativePG, marked by significant contributions to the Kubernetes ecosystem, increased adoption, and a growing community. This article reflects on key milestones, including the integration of advanced Kubernetes features, conference highlights, and personal insights, while looking ahead to the opportunities awaiting PostgreSQL in the cloud-native era.

Name Collision of the Year: Vector
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2024-12-26 at 13:30

I can’t get through a Zoom call, a conference talk, or an afternoon scroll through LinkedIn without hearing about vectors. Do you feel like the term vector is everywhere this year? It is. Vector actually means several different things, and it’s confusing. Vector means AI data, GIS locations, digital graphics, a type of query optimization, and more. The terms and uses are related, sure. They all stem from the same original concept. However, their practical applications are quite different. So “vector” is my choice for name collision of the year.

In this post I want to break down the vector. The history of the vector, how vectors were used in the past and how they evolved to what they are today (with examples!).

The original vector

The idea that vectors are based on goes back to the 1600s, when René Descartes developed the Cartesian XY coordinate system to represent points in space. Descartes didn't use the word vector, but he did develop a numerical representation of location and direction. Numerical location is the foundational concept of the vector – used for measuring spatial relationships.

The first use of the term vector was in the 1840s by an Irish mathematician named William Rowan Hamilton. Hamilton defined a vector as a quantity with both magnitude and direction in three-dimensional space. He used it to describe geometric directions and distances, like arrows in 3D space. Hamilton combined his vectors with several other math terms to solve problems with rotation and three dimensional units.


The word Hamilton chose, vector, comes from the Latin word vehere, meaning ‘to carry’ or ‘to convey’ (yes, the same origin as the word vehicle). We assume Hamilton chose this Latin word to emphasize the idea of a vector carrying a point from one location to another.

There’s a book about the history of vectors published just this year, and a nice summary here. I’ve already let Santa know this is on my list this year.

Mathematical vectors

Building upon Hamilto

[...]

CNPG Recipe 16 - Balancing Data Durability and Self-Healing with Synchronous Replication
Posted by Gabriele Bartolini in EDB on 2024-12-26 at 09:57

CloudNativePG 1.25 enhances control of PostgreSQL synchronous replication with a new dataDurability option, allowing you to choose between prioritising data consistency or self-healing capabilities. This article explains the feature, contrasts it with previous approaches, and provides guidance on migrating to the new API format.
