Tablespaces play an important role in database management systems, as they determine where and how database objects like tables and indexes are stored. Both Oracle and PostgreSQL have the concept of tablespaces, but they implement them differently based on the overall architecture of each database.
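On the PostgreSQL side, a tablespace is essentially a named directory that objects can be placed into. A minimal sketch (the path and names are illustrative):

CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';
-- tables and indexes can be placed into it explicitly:
CREATE TABLE measurements (id bigint, reading numeric) TABLESPACE fast_ssd;
-- an index can even live in a different tablespace than its table:
CREATE INDEX ON measurements (id) TABLESPACE pg_default;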
With the introduction of failover slots in PostgreSQL 17, logical replication can now be effectively utilized in high-availability environments. This feature ensures that logical replication continues seamlessly even if the publisher node fails and its physical standby is promoted as the new publisher.
To maintain continuous replication after a failover, you need to configure the following parameters and options for both logical and physical replication setups:
1. Enable Failover Property on Slots:
• Set the failover property for the slots corresponding to the subscriptions that need to continue receiving changes after a failover. This can be done using the failover option in CREATE SUBSCRIPTION or the failover parameter in the pg_create_logical_replication_slot() SQL API. These slots will be referred to as failover slots.
2. Synchronize Replication Slots:
• Set the sync_replication_slots parameter to on on the standby. This ensures that the failover slots are periodically synchronized by the slotsync worker.
3. Configure Physical Replication Slot:
• Ensure a physical replication slot exists between the primary and the standby. Set primary_slot_name on the standby and ensure a slot with that name exists on the primary. Also, enable hot_standby_feedback on the standby to prevent the removal of rows still needed by the standby.
4. Prevent Subscriber Overrun:
• On the primary, set the synchronized_standby_slots parameter to include the physical replication slot. This prevents the subscriber from consuming changes faster than the hot standby can handle.
5. Update Connection Info:
• Modify primary_conninfo on the standby to include dbname=postgres. This is necessary for the slot sync worker to connect to the primary.
Example Setup
Here's an example demonstrating logical replication continuity after failover using three nodes: Node-A (primary server/publisher), Node-B (physical standby for Node-A), and Node-C (subscriber for Node-A).
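A minimal sketch of the settings above applied to these three nodes (slot, publication, and subscription names are illustrative):

-- On Node-C (subscriber): create the subscription with the failover option
CREATE SUBSCRIPTION sub_a
    CONNECTION 'host=node-a dbname=postgres'
    PUBLICATION pub_a
    WITH (failover = true);

-- On Node-A (primary/publisher): physical slot used by the standby Node-B
SELECT pg_create_physical_replication_slot('node_b_slot');
-- postgresql.conf on Node-A:
--   synchronized_standby_slots = 'node_b_slot'

-- postgresql.conf on Node-B (physical standby):
--   primary_slot_name = 'node_b_slot'
--   hot_standby_feedback = on
--   sync_replication_slots = on
--   primary_conninfo = 'host=node-a user=repl dbname=postgres'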
As usual, this project was prompted by multiple user reports with typical complaints, like 'SQL Server executes the query times faster' or 'Postgres doesn't pick up my index'. The underlying issue that united these reports was frequently used VALUES sequences, typically transformed in the query tree into a SEMI JOIN.
I also want to raise one general question: should an open-source DBMS correct user errors? I mean optimising a query even before the search for an optimal plan begins, by eliminating self-joins and subqueries and simplifying expressions, all of which could be achieved by proper query tuning. The question is not that simple, since DBAs point out that the cost of query planning in Oracle grows quickly with the complexity of the query text, which is most likely caused, among other things, by its extensive range of optimisation rules.
Now, let's turn our attention to the VALUES construct. Interestingly, it's not just used with the INSERT command but also frequently appears in SELECT queries in the form of a test of inclusion in a set:
SELECT * FROM something WHERE x IN (VALUES (1), (2), ...);
and in the query plan, this syntactic construct is transformed into a SEMI JOIN. To demonstrate the essence of the problem, let's generate a test table with an uneven distribution of data in one of the columns:
CREATE EXTENSION tablefunc;
CREATE TABLE norm_test AS
SELECT abs(r::integer) AS x, 'abc'||r AS payload
FROM normal_rand(1000, 1., 10.) AS r;
CREATE INDEX ON norm_test (x);
ANALYZE norm_test;
Here, the value x in the norm_test table has a normal distribution with a mean of 1 and a standard deviation of 10 [1]. There are not too many distinct values, and all of them will be included in the MCV statistics. As a result, the number of duplicates can be calculated accurately for each value despite the uneven distribution. We also introduced an index on this column, which makes scanning the table cheaper. Now, let's execute the query:
EXPLAIN ANALYZE
SELECT * FROM norm_test W
[...]
In one of the recent PostgresFM episodes, Michael Christofides and Nikolay discussed planning time and what can affect it. One of the obvious negative factors we discussed is the large number of partitions a partitioned table might have.
In this post, we're using our Postgres.AI assistant to see how planning time depends on the number of partitions.
This is the very first blog post of its kind: it has an integrated AI that you can use to explore the topic discussed here further, to repeat the experiment, alter it in any direction, and study the Postgres behavior more effectively.
If you, like some of us, prefer reading summaries, please jump straight to the AI chat section and ask for the "TL;DR" version. We don't mind.
It's quite a well-known fact that with the growth in the number of partitions, planning time also grows. The official documentation says:
It is important to consider the overhead of partitioning during query planning and execution. The query planner is generally able to handle partition hierarchies with up to a few thousand partitions fairly well, provided that typical queries allow the query planner to prune all but a small number of partitions. Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning.
There is a long thread in the pgsql-hackers mailing list where this problem is discussed and where patches have been proposed to reduce this overhead.
This work is still in progress (currently, the patches need another round of review). Meanwhile, fast-growing OLTP projects need to find a way to deal with multi-terabyte tables, and partitioning is widely used. We thought it would be a good idea to conduct a simple database experiment to show how exactly the planning time depends on the
[...]
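To see the effect for yourself, here is a minimal sketch of such an experiment (table and partition names are illustrative; the "Planning Time" line in the EXPLAIN output is the measurement of interest):

CREATE TABLE events (id bigint, created_at date) PARTITION BY RANGE (created_at);
DO $$
BEGIN
   -- create 1000 daily partitions
   FOR i IN 0..999 LOOP
      EXECUTE format(
         'CREATE TABLE events_p%s PARTITION OF events FOR VALUES FROM (%L) TO (%L)',
         i, DATE '2000-01-01' + i, DATE '2000-01-01' + i + 1);
   END LOOP;
END $$;
ANALYZE events;
-- "Planning Time" in the output grows as the partition count grows:
EXPLAIN (SUMMARY) SELECT * FROM events WHERE created_at = DATE '2001-06-01';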
There are multiple tools for running benchmarks on Postgres, but pgbench is probably the most widely used one. Its workload is very simple and perhaps a bit synthetic, but almost everyone is familiar with it and it's a very convenient way to do quick tests and assessments. It has been improved in various ways (e.g. to support partitioning), but the initial data load is still serial: only a single process does the COPY. Which annoys me, because it may take a lot of time before I can start with the benchmarks themselves.
This week’s “first patch” idea is to extend pgbench -i to allow the data load to happen in parallel, with multiple clients generating and sending the data.
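Until such a patch lands, a rough manual workaround illustrates the idea: create the schema without data, then load disjoint aid ranges in parallel sessions. A sketch (row layout matches pgbench's standard schema; a scale factor of 50 means 5,000,000 accounts, and the ranges are illustrative):

# create just the schema, skipping the serial data load (init steps: drop, tables)
$ pgbench -i -I dt bench
-- then, in several parallel psql sessions, load disjoint aid ranges:
INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
SELECT aid, (aid - 1) / 100000 + 1, 0, ''
FROM generate_series(1, 2500000) AS aid;        -- session 1: first half
INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
SELECT aid, (aid - 1) / 100000 + 1, 0, ''
FROM generate_series(2500001, 5000000) AS aid;  -- session 2: second half
-- (pgbench_branches and pgbench_tellers are tiny and can be loaded serially)
# finish with the remaining init steps: vacuum and primary keys
$ pgbench -i -I vp bench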
Every year in fall, there is a new PostgreSQL release. After looking at the highlights of PostgreSQL v17, you may think, “what's the big deal?” Quite a few people might even be unhappy about the reminder that they should really upgrade some time soon. Time to explain how wonderful PostgreSQL v17 is!
Well, there are — I'm going to rant about them later. But there is certainly no eye-catcher like “automatic sharding for friction-less horizontal scaling” or “built-in automatic fail-over for high availability”. That's not because PostgreSQL has lost its momentum: in fact, there are more contributors today than ever before. There are several explanations for this seeming lack of innovation.
Over the decades, PostgreSQL has grown a lot. If I think back to release 8.1, the first release I worked with: autovacuum was still something new and somewhat experimental, replication was something scary you did with Slony, and so on. The average DBA had never heard of PostgreSQL. It is amazing to think of all the new features that have come since. How could we ever live without them?
Many smart people have contributed many great things over the years. Most of the easy, obvious improvements (and some difficult ones!) have already been made. The remaining missing features are the really hard ones.
Over the years, as the number of contributors and the world-wide importance of PostgreSQL have grown, so have the demands on new contributions. Today, each code contribution has to go through a peer review process. An ever-increasing number of patches vie for the interest of reviewers and committers. Spending time to improve and merge somebody else's work is much less attractive than working on your own cool feature. This narrow bottleneck means that you need a lot of time and determination if you want to get your contribution c[...]
In previous blogs, we talked about an overview of PostgreSQL’s table access method API here, how sequential scan is handled within this API here, and how data insertion is handled here. Today in this blog, we will look closely into how PostgreSQL handles updates. A successful update in PostgreSQL can be viewed as “inserting a new record” while “marking the old record as invisible”, thanks to the MVCC technique that PostgreSQL employs. It sounds simple enough, but quite a lot of considerations are in place to make a successful update. Let’s dive in.
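As a quick illustration of that "insert new, hide old" behavior (just the visible effect, not the internals discussed below), one can watch a row's physical location and transaction IDs change across an update:

CREATE TABLE mvcc_demo (id int, val text);
INSERT INTO mvcc_demo VALUES (1, 'old');
SELECT ctid, xmin, xmax, val FROM mvcc_demo;  -- note the row's ctid and xmin
UPDATE mvcc_demo SET val = 'new' WHERE id = 1;
-- the updated row is a new physical tuple: ctid and xmin have changed,
-- while the old version stays on the page, marked invisible via xmax
SELECT ctid, xmin, xmax, val FROM mvcc_demo;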
Two table access method APIs are primarily involved when performing an update
To perform an update, PostgreSQL runs through a series of checks and considerations before it can carry out the update. This process is illustrated in the diagram below:
The very first check the main update routine performs is to determine the columns to be updated, in particular to find out whether any identity key columns have been updated. This could be a primary key, index key, or partition key. It needs t
[...]
The source code is available on GitLab; a mirror is hosted on GitHub.
One could install the whole package, or just copy what is needed from the source code.
The extension is also available on PGXN.
Because pgTAP uses the same function name, is_empty, a conflict arose when pgTAP was installed before pgsql-tweaks.
To solve this problem without breaking existing code, I decided to install the function under a different name, is_empty_b, when pgTAP is already installed.
When pgTAP is not installed, the function keeps the name is_empty, so existing code using it continues to work.
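A minimal sketch of how such a conditional installation can be expressed (the function body is a simplified stand-in, not the actual pgsql-tweaks implementation):

DO $$
BEGIN
   IF EXISTS (SELECT 1 FROM pg_proc WHERE proname = 'is_empty') THEN
      -- pgTAP has already claimed is_empty: use the alternative name
      CREATE FUNCTION is_empty_b(s text) RETURNS boolean
         LANGUAGE sql IMMUTABLE AS $f$ SELECT s = '' $f$;
   ELSE
      CREATE FUNCTION is_empty(s text) RETURNS boolean
         LANGUAGE sql IMMUTABLE AS $f$ SELECT s = '' $f$;
   END IF;
END
$$;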
There is also a difference between the two implementations: while the pgTAP function returns text, the pgsql-tweaks function returns a boolean result.
I have also created an issue with the pgTAP project, because I can only handle the case where pgTAP has been installed before pgsql-tweaks.
There have been no new or changed features; therefore, this is only a minor release.
The source code is available on GitLab; a mirror is hosted on GitHub.
One could install the whole package, or just copy what is needed from the source code.
The extension is also available on PGXN.
In this release the recommended PostgreSQL version has been changed to 17.
PostgreSQL 11 has been removed from the list of supported versions.
There have been no new or changed features; therefore, this is only a minor release.
Upgrading your PostgreSQL cluster is an important task to keep your database running smoothly and securely. With each new release, PostgreSQL introduces performance improvements, security patches, and new features that can benefit your system. However, upgrading can be a bit tricky, especially if you're working in a Windows environment, where certain challenges like permissions, service management, and file handling may differ from Linux setups. In this blog, we’ll walk you through the process of performing an upgrade on a PostgreSQL cluster in Windows, covering the key steps to ensure everything goes smoothly without causing data loss.
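As a preview of the kind of invocation involved, here is a hedged sketch of a pg_upgrade run on Windows (paths and version numbers are illustrative; --check performs a dry run and should come first):

"C:\Program Files\PostgreSQL\17\bin\pg_upgrade.exe" ^
  --old-datadir "C:\PGDATA\16" ^
  --new-datadir "C:\PGDATA\17" ^
  --old-bindir "C:\Program Files\PostgreSQL\16\bin" ^
  --new-bindir "C:\Program Files\PostgreSQL\17\bin" ^
  --check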
In my last blog post, Evolution of Logical Replication, I mentioned the future development of a feature to allow "upgrades of logical replication nodes." The upcoming release of PostgreSQL 17 includes this feature. Previously, after major version upgrades (via pg_upgrade), users couldn't immediately connect and write data to logical replication nodes. This was because the slots were lost during upgrades, preventing replication from continuing. As a result, new writes wouldn't get replicated, causing data on both nodes to become out of sync. As explained in this blog post, users had to block applications from writing until the replication setup was re-enabled after the upgrade.
With PostgreSQL 17, logical replication nodes can be upgraded without blocking writes or requiring users to manually detach/attach subscriptions or create slots. The migration of logical slots is supported only when the old cluster is version 17.0 or later, so users will benefit from this feature when upgrading from 17.0 to 18.0 (or later versions). Refer to the PostgreSQL documentation for the prerequisites for upgrading publisher and subscriber nodes.
This blog will delve into the internals of this feature and provide an example of upgrading a two-node logical replica setup. The later part of this blog will cover online upgrades of physical replication setups.
Let's start with an example of upgrading a logical replica setup where the old and new versions of databases are PostgreSQL 17.
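Before running pg_upgrade in such a setup, it is worth verifying the documented prerequisites on both sides; a minimal sketch (settings taken from the PostgreSQL documentation, values illustrative):

-- On the old publisher: every logical slot must be usable, with all
-- changes already consumed by its subscriber
SELECT slot_name, plugin, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
-- On the new cluster, postgresql.conf must be able to host the migrated slots:
--   wal_level = logical
--   max_replication_slots = <at least the number of slots listed above>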
The PostGIS Team is pleased to release PostGIS 3.5.0! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.
This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.
This release is a feature release that includes bug fixes since PostGIS 3.4.3, new features, and a few breaking changes.
A couple of days ago I had a bit of free time in the evening, and I was bored, so I decided to play with BOLT a little bit. No, not the dog from the Disney movie, but the BOLT tool from the LLVM project, aimed at optimizing binaries. It took me a while to get it working, but the results are unexpectedly good, in some cases up to 40%. So let me share my notes and benchmark results, and maybe there's something we can learn from it. We'll start by going through a couple of rabbit holes first, though.
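For context, the typical BOLT workflow looks roughly like this (a sketch based on the LLVM BOLT documentation; the binary must be linked with relocations, the profiling needs an LBR-capable CPU, and all paths and workloads are illustrative):

# rebuild postgres with relocations kept in the binary
$ ./configure CFLAGS="-O2" LDFLAGS="-Wl,--emit-relocs"
$ make -j8
# collect a branch profile of a representative workload
# (-a records system-wide, so the server backends are captured too)
$ perf record -e cycles:u -j any,u -a -o perf.data -- pgbench -S -T 60
$ perf2bolt -p perf.data -o perf.fdata ./postgres
# rewrite the binary using the profile
$ llvm-bolt ./postgres -o postgres.bolt -data=perf.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions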
As I mentioned in my last blog post, as your cluster grows with multiple standby servers and potentially automated failover (using tools like Patroni), it becomes more practical to set up a dedicated repository host, also known as a dedicated backup server. This backup server can then trigger backups and automatically select the appropriate node in case of failover, eliminating the need for manual intervention.
In this post, I’ll show you how easy it is to add a repository host to an existing cluster. I’ll also give you a sneak peek at a new feature expected to be included in the next pgBackRest release 😉
In this example, we pick up from where we left off last time: a primary server (pg1) with a standby (pg2), both already configured to use pgBackRest (with an NFS mount) for backups taken from the standby. Now, we will add a new node, repo1, to take over pgBackRest backups.
The pgBackRest user guide provides a comprehensive overview of how to set up a repository host. Since pgBackRest needs to interact with local processes on each node, we must enable communication between the hosts, either through passwordless SSH or TLS with client certificates. While SSH is generally easier to set up, TLS offers better performance. If you’re interested in an example of the TLS setup, I wrote this blog post when the feature was first introduced.
Let’s return to our repository host setup. The first step, of course, is to install pgBackRest:
$ sudo dnf install pgbackrest -y
Any user can own the repository, but it's best to avoid using the postgres user (if it exists) to prevent confusion. Instead, let's create a dedicated system user for this purpose:
$ sudo groupadd pgbackrest
$ sudo adduser -g pgbackrest -n pgbackrest
$ sudo chown -R pgbackrest: /var/log/pgbackrest/
The SSH setup is up to you, but usually it is as simple as creating SSH keys and authorizing them on the other nodes. Example:
# From repo1
[pgbackre
As a senior database consultant and developer at CYBERTEC PostgreSQL International, I enjoyed speaking at PGDay Hyderabad 2024. This event marked two firsts for me: my first time visiting India and the first-ever PostgreSQL conference held in Hyderabad. I was filled with excitement and curiosity as I started this journey. I didn't know what to expect, but I was eager to experience it all.
I arrived in Hyderabad late at night, around 2:00 AM. While I was a little tired from the long flight, I was extremely excited. However, I made two rookie mistakes. First, I didn't buy a local SIM card, which left me without mobile internet for the whole trip, a particularly unpleasant experience in India, where OTPs are required even for something as simple as Uber. Second, I didn't exchange money for local currency, leading to small logistical challenges. And the driving culture? Well, let's just say that seeing it live was a real surprise!
Being in India for the first time was a cultural shock — but in a good way! Everything here is different from what you're used to in Europe. The tempo of life is wild, and the cuisine is an adventure of its own. Yes, it's spicy, but I handled it! If you ever have the opportunity to visit India, please do so. It's an unforgettable experience.
Despite the rain pouring down the entire day, I was glad I took a full-day tour of Hyderabad. I visited Golconda Fort, The Charminar, Laad Bazaar, and Mecca Masjid Mosque. One of the highlights was stopping at the famous Nimrah Cafe, where the owner welcomed me and gave me a kitchen tour. I sampled an assortment of delicious pastries and tasted their renowned tea. Spending time there, soaking in the atmosphere, was a beautiful experience. I highly recommend it!
In general, I had no particular expectations for my first PostgreSQL conference in India. I just wanted to live in the moment and take everything in as it un[...]
PostgreSQL 17 RC1 is out, and we are on the home run towards the official PostgreSQL release, scheduled for September 26, 2024.
Letʼs take a look at the patches that came in during the March CommitFest. Previous articles about PostgreSQL 17 CommitFests: 2023-07, 2023-09, 2023-11, 2024-01.
Together, these give an idea of what the new PostgreSQL will look like.
Unfortunately, some previously accepted patches didn't make it in after all, including some notable ones.
Now, letʼs get to the new stuff.
With RC1 freshly cut, the release of Postgres 17 is right on the horizon, giving us a host of features, improvements, and optimizations to look forward to.
As a backend developer, one in particular pops off the page, distinguishing itself amongst the dozens of new release items:
Allow btree indexes to more efficiently find a set of values, such as those supplied by IN clauses using constants (Peter Geoghegan, Matthias van de Meent)
The B-tree is Postgres' overwhelmingly most common and best optimized index, used for lookups on a table's primary key or secondary indexes, and undoubtedly powering all kinds of applications all over the world, many of which we interact with on a daily basis.
During lookups, a B-tree is scanned, with Postgres descending through its hierarchy from the root until it finds a target value on one of its leaf pages. Previously, multi-value lookups like id IN (1, 2, 3) or id = any(1, 2, 3) would require that process be repeated multiple times, once for each of the requested values. Although not perfectly efficient, it wasn't a huge problem because B-tree lookups are very fast. It'd take an extremely performance-sensitive user to even notice the deficiency.
As of a Postgres 17 enhancement to nbtree's ScalarArrayOp execution, that's no longer always the case. Any particular scan with multiple scalar inputs will consider all those inputs as it traverses the B-tree, and where multiple values land on the same leaf page, they're retrieved together to avoid repeated traversals.
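The effect is easy to observe with EXPLAIN; a sketch (table and column names are illustrative) in which the buffer counts reveal how many index pages were touched:

-- one multi-value lookup on an indexed primary key
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM accounts WHERE id IN (1, 2, 3);
-- On Postgres 17, values that share a leaf page are fetched in a single
-- descent, so "Buffers: shared hit=N" comes out lower than on Postgres 16.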
A narrowly focused script demonstrating the original problem shows a dramatic performance increase before and after the ScalarArrayOp improvement, so we already know the gains are very real. With Postgres 17 so close to hand, we wanted to measure what kind of gain a realistic web app might expect from the optimization by testing it against the real API service that powers Crunchy Bridge.
In our experiment we saw roughly a 30% improvement in throughput and a 20% drop in average r[...]
Floor Drees wrote an article about how the Postgres community migrated to Mastodon and which tools can be used.
Elodie Jex (Instagram: elodie.s_art_) designed the three proposals for the PostgreSQL Europe Diversity Task Force, from which one was selected as the final logo.
Several people contributed to the translation of the press release for the upcoming PostgreSQL v17 version.
Recently, we’ve received many questions about how to take backups from a standby server using pgBackRest. In this post, I’d like to clarify one of the most frequently asked questions and address a common misconception for new users.
First of all, it’s important to understand that taking a backup exclusively from the standby server is not currently possible. When you trigger a backup from the standby, pgBackRest creates a standby backup that is identical to a backup performed on the primary. It does this by starting/stopping the backup on the primary, copying only files that are replicated from the standby, then copying the remaining few files from the primary.
For this setup to work, both the primary and standby servers must share a common backup repository. This can be any supported repository type.
Let’s take an example, using an NFS mount point.
Both the primary (pg1) and the standby (pg2) see the same content on the mentioned NFS mount:
[postgres@pg1 ~]$ ls /shared/
[postgres@pg1 ~]$ touch /shared/test_write_from_the_primary
[postgres@pg1 ~]$ mkdir /shared/pgbackrest
[postgres@pg2 ~]$ ls /shared
pgbackrest  test_write_from_the_primary
And we’ve got a working replication connection:
postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 27773
usename | replicator
application_name | pg2
state | streaming
sent_lsn | 0/500AD6C8
write_lsn | 0/500AD6C8
flush_lsn | 0/500AD6C8
replay_lsn | 0/500AD6C8
sync_state | async
Let’s configure pgBackRest on pg1
and pg2
:
$ pgbackrest version
pgBackRest 2.53.1
$ cat<<EOF | sudo tee "/etc/pgbackrest.conf"
[global]
repo1-path=/shared/pgbackrest
repo1-retention-full=4
repo1-bundle=y
repo1-block=y
start-fast=y
log-level-console=info
log-level-file=detail
delta=y
process-max=2
compress-type=zst
[mycluster]
pg1-path=/var/lib/pgsql/16/dat
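The configuration excerpt cuts off here, but for completeness, a hedged sketch of the options that make backups from the standby work (this is not the post's exact configuration; host and path values are illustrative, and backup-standby is the key setting from the pgBackRest documentation). On the standby node that triggers the backup:

[mycluster]
pg1-host=pg1
pg1-path=/var/lib/pgsql/16/data
pg2-path=/var/lib/pgsql/16/data
backup-standby=y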
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.