For database experts well-versed in Oracle, moving to PostgreSQL opens up new indexing methods that differ significantly in terms of structure, management, and optimization. While both databases leverage indexing to enhance query speed, their approaches vary, particularly in terms of available types, performance tuning, and maintenance. This guide clarifies key differences and provides practical strategies for effectively handling indexes in PostgreSQL.
Understanding Indexing in Databases: The Basics
Indexes reduce query time by creating a more accessible data structure, limiting the need to scan entire tables. Think of them as a ‘Table of Contents’ of sorts to quickly look up the relevant data. However, indexes consume storage and require careful planning—creating too many or inefficient indexes can degrade performance. Both Oracle and PostgreSQL offer various index types, each suited for specific tasks. Here is where they align and where PostgreSQL introduces unique indexing options.
Types of Indexes in Oracle vs. PostgreSQL
B-tree Indexes
Oracle: The default index type, suitable for common lookup operations, range conditions, and queries using comparison operators.
PostgreSQL: B-tree indexes are also default in PostgreSQL, optimized for single and range lookups, and offer operator class flexibility for more precise control.
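As a quick illustration (table and column names here are hypothetical, not from the guide), B-tree is the default access method, and an operator class can be specified for finer control:
-- A plain B-tree index; B-tree is the default access method
CREATE INDEX idx_orders_created_at ON orders (created_at);
-- An explicit operator class such as text_pattern_ops lets the index
-- support LIKE 'prefix%' searches even under non-C collations
CREATE INDEX idx_customers_name_pattern ON customers (name text_pattern_ops);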
Bitmap Indexes
Oracle: Bitmap indexes optimize performance for low-cardinality columns with complex WHERE clauses.
PostgreSQL: While bitmap indexes are not available, PostgreSQL’s query planner can use B-tree indexes with bitmap heap scans to achieve a similar effect. This approach is typically used in complex AND/OR queries but doesn’t fully replicate Oracle’s bitmap capabilities.
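For example (a sketch with hypothetical table and index names), two separate B-tree indexes can be combined by the planner at query time:
CREATE INDEX idx_tickets_status   ON tickets (status);
CREATE INDEX idx_tickets_priority ON tickets (priority);
-- For an OR condition the planner may combine both indexes with a
-- BitmapOr and fetch the matching rows through a Bitmap Heap Scan
EXPLAIN
SELECT * FROM tickets
WHERE status = 'open' OR priority = 'high';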
Hash Indexes
Oracle: Limited in application and typically used in specialized cases as hash clusters.
PostgreSQL: Offers hash indexes but with restricted use cases. They support only equality operations and require careful selection to avoid unnecessary bloat.
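A minimal sketch (hypothetical table name) of a hash index, which only helps equality predicates:
CREATE INDEX idx_sessions_token ON sessions USING hash (token);
-- The hash index can serve this equality lookup...
SELECT * FROM sessions WHERE token = 'abc123';
-- ...but not range or pattern predicates such as token > 'abc' or token LIKE 'abc%'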
GIN and GiST Indexes
[...]Kubernetes (and OpenShift) have become highly popular ways to run PostgreSQL at scale. Kubernetes is an excellent way of running workloads, and it also makes goals such as consistency, compliance and much more easier to achieve. In this post you will learn how to deploy the Zalando operator for PostgreSQL and get things working quickly and easily.
To make things simple for you, I have used Minikube, which is an easy way to run Kubernetes locally on your development environment or even on your laptop.
If you are new to Minikube, I highly recommend taking a look at the "Getting started" guide. It explains the most basic steps in detail.
Let us assume Minikube is already deployed on your machine. You can easily start the system by calling the following command:
$ # We start Minikube with:
$ minikube start
We can verify if things are working as follows:
$ # kubectl should be working, e.g.:
$ kubectl get pod
...
$ # should return the list of pods in the 'default' namespace
Once we have ensured that Kubernetes is indeed running we can download the Zalando operator from the Github repository which is easy to do:
$ # Clone of postgresql operator
$ git clone https://github.com/zalando/postgres-operator.git
$ cd postgres-operator
The repository is public and can be cloned directly. In the next step we can simply apply templates. Such templates basically describe what we want from Kubernetes; all the vital information about the service we want to deploy is there.
Before looking at the content of the file we can simply apply the template:
The first thing to throw at Kubernetes is the configmap:
$ # We apply default postgres operator configuration with
$ kubectl create -f manifests/configmap.yaml
The next thing to set up is identities and permissions, which can be stored in a simple YAML file as well. Here is how it works:
$ # We apply manifest for identity and permissions
[...]
psql is a CLI tool that ships with Postgres. It's incredibly powerful for working with Postgres, and doesn't take too much of a learning curve to get comfortable with, so you can really feel like an expert working with your database.
Just a rundown of a few things to get you started:
Once connected in psql you can get an idea of all utility commands available with:
\?
A handy thing I use all the time is \d.
\d will describe the tables within the database. You can also add a table/index/etc. onto it to describe that specific object, such as:
\d accounts
There are a number of options you can set in your psqlrc (config) file to customize your CLI experience. But you can also toggle those when directly working in psql.
\timing will give you the time it took to run your query
\x auto will autoformat your text output
\pset pager 0 turns off your pager or 1 to turn it back on
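If you want those as defaults, the same settings can go into your psqlrc, something along these lines (a sketch; adjust to taste):
\timing on
\x auto
\pset pager off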
Oh, and for editing a query in your editor of choice: make sure you set your $EDITOR environment variable to the editor of your choice, though the only right choice is vim:
\e
Just a few things to get you started working with psql.
The PL/pgSQL language, available as a default extension in PostgreSQL, provides powerful tools and flexibility for application developers to build complex, enterprise-scale functionality within the database. Through PL/pgSQL’s functions and procedures, developers can choose different volatility categories—IMMUTABLE, STABLE, or VOLATILE—that offer varying performance benefits, especially in terms of result caching and data state awareness. For a deeper dive, refer to the official documentation. Choosing the appropriate volatility for a function is crucial; as they say, “With great power comes great responsibility.“
During one of my migration projects, we ran into unexpected behavior while debugging code converted from Oracle. Below is a recreated example using mock functions to illustrate the issue:
drop table if exists user_status;
create table user_status(col1 bigint GENERATED BY DEFAULT AS IDENTITY, status text);
create or replace function upd_status(text)
returns bigint language plpgsql
as $$
declare var1 bigint;
begin
with alias1 as (insert into user_status(status) values($1) returning col1)
select * into var1 from alias1;
return var1;
end;$$;
create or replace function lookup_status(text)
returns boolean language plpgsql
immutable parallel safe
as $$
begin return (select exists (select 1 from user_status where status = $1)); end;$$;
create or replace function check_status(text)
returns void language plpgsql
as $$
declare var1 bigint;
begin
var1 := upd_status($1);
if lookup_status($1) then
raise notice 'Status exists as part of immutable check - %', $1;
else
raise notice 'Status does not exist - %', $1;
end if;
raise notice 'Actual Result for status % is %', $1,(select exists (select 1 from user_status where status = $1 and col1 = var1));
end;$$;
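The call that produces the output discussed below isn't shown above; a hypothetical invocation matching the text would be:
select check_status('Open');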
If you observe the output, the Open status is inserted into user_status via the check_status call, but any call to lookup_status within the same snapshot doesn’t detect this data change.
It took some time to diagnose the
[...]Whether you are designing a standalone application or a microservice, you will inevitably encounter the topic of sharing identifiers. Whether it’s URLs of web pages, RESTful API resources, JSON documents, CSV exports, or something else, the identifier of specific resources will be exposed.
/orders/123
/products/345/variants/1
While an identifier is just a number and does not carry any negative connotations, there are valid reasons why you might want to avoid exposing identifiers directly.
These and other reasons (like SEO optimisation) have led to the increased use of text-based identifiers. Their readability and versatility make them ideal for external data sharing.
However, in database (or data model) design, the advantages of text identifiers are often overshadowed by the problems they introduce. While text identifiers improve interoperability, they frequently come with performance and storage trade-offs. In contrast, integers are naturally faster and more efficient to process, resulting in lower storage requirements and faster indexing, sorting, and searching (tasks that computers are optimised for).
In this article, we will explore scenarios where using text identifiers directly in the database design might seem natural and discuss strategies for using them effectively.
Let’s be honest: text identifiers are popular f
[...]It is never immediately obvious how to go from a simple SQL query to a complex one -- especially if it involves intricate calculations. One of the “dangers” of SQL is that you can create an executable query but return the wrong data. For example, it is easy to inflate the value of a calculated field by joining to multiple rows.
Let’s take a look at a sample query. This appears to look for a summary total of invoice amounts across teams. If you look closely, you might see that the joins would inflate a team’s yearly invoice spend for each team member.
SELECT
teams.id,
json_agg(accounts.email),
SUM(invoices.amount)
FROM teams
INNER JOIN team_members ON teams.id = team_members.team_id
INNER JOIN accounts ON accounts.id = team_members.account_id
INNER JOIN invoices ON teams.id = invoices.team_id
WHERE lower(invoices.period) > date_trunc('year', current_date)
GROUP BY 1;
The query is joining invoices to teams after already joining team_members to teams. If a team has multiple members and multiple invoices, each invoice amount could be counted multiple times in the SUM(invoices.amount) calculation.
The above error may not be immediately obvious. This is why it’s better to start small and use building blocks.
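For illustration only (not part of the original post, and assuming team_members carries an account_id column), one building-block fix is to pre-aggregate invoices per team before joining, so each team contributes a single total:
SELECT
  teams.id,
  json_agg(accounts.email),
  team_invoices.total
FROM teams
INNER JOIN team_members ON teams.id = team_members.team_id
INNER JOIN accounts ON accounts.id = team_members.account_id
INNER JOIN (
  -- one row per team, so the sum can no longer be inflated by the member join
  SELECT invoices.team_id, SUM(invoices.amount) AS total
  FROM invoices
  WHERE lower(invoices.period) > date_trunc('year', current_date)
  GROUP BY invoices.team_id
) AS team_invoices ON teams.id = team_invoices.team_id
GROUP BY 1, team_invoices.total;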
Writing complex SQL isn’t as much “writing a query” as it is “building a query.” By combining the building blocks, you get the data that you think you are getting. Building a query is a continuous loop of small steps that you repeat until the result is what you expect.
Let’s step through the query example above, getting sum aggregate totals, to learn my method for building a query.
It is okay if it changes, because data exploration may mean the data is differ
[...]PostgreSQL is widely known for its Multi-Version Concurrency Control (MVCC) model, which allows multiple transactions to occur simultaneously without interfering with each other.
However, one side effect of MVCC is the creation of dead tuples—old versions of data rows that are no longer needed but still occupy space.
Dead tuples also lead to a phenomenon known as table bloat, which refers to the excessive unused space in a table caused by dead tuples that haven't been cleaned up, resulting in inefficient storage and reduced performance.
To address the issues of dead tuples and table bloat, autovacuum comes into play. It's an automatic process designed to clean up these dead tuples and maintain optimal database performance.
In this blog, we will explore the main scenarios that trigger autovacuum to run.
The post Scenarios That Trigger Autovacuum in PostgreSQL appeared first on Stormatics.
A key aspect of maintaining backup integrity is understanding data checksums. Without proper checksum validation, detecting data corruption becomes virtually impossible. Therefore, we will start with The Importance of Data Checksums.
Ensuring data integrity is crucial for the reliability of any database system, and data checksum validation is essential to achieving it. Checksums help detect data corruption caused by hardware issues.
1. During Page Writes: PostgreSQL calculates the checksum each time a data page is written to disk. This ensures that any corruption occurring after the write can be detected during future reads.
2. Verification During Reads: Every time a page is read from disk into memory—whether by user queries or maintenance operations—PostgreSQL verifies the checksum to confirm data integrity.
If the checksum does not match, a warning is raised and recorded in the log. The log entry below indicates that a data page is corrupt, and the checksum does not match the expected value. Such logs can be conveniently captured and monitored using the logerrors extension, which helps identify and track these errors for further analysis.
WARNING: page verification failed, calculated checksum 24693 but expected 58183
Although enabling checksums does add extra overhead to the database, the general rule of thumb is to enable this option on every cluster via initdb. This practice is considered so important that, in PostgreSQL 18, there is a strong possibility that data checksums will be enabled by default.
However, remember that checksums are disabled by default in PostgreSQL through version 17, so you must enable them when creating a cluster. If data checksums are not enabled, PostgreSQL provides a tool called pg_checksums. This tool allows you to enable, disable, or verify checksums, but it only works when the cluster is offline.
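For reference, and assuming a placeholder data directory path, the relevant commands look roughly like this:
$ # Enable checksums when the cluster is created
$ initdb --data-checksums -D /path/to/datadir
$ # Later, with the cluster shut down, verify or toggle them
$ pg_checksums --check -D /path/to/datadir
$ pg_checksums --enable -D /path/to/datadir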

While enabling checksums provides a baseline for detecting corru
[...]This year, the 14th annual PostgreSQL Conference Europe (PGConf.EU for short) took place from October 22nd to October 25th in the historic city of Athens, Greece, featuring awesome content and awesome Mediterranean weather.
It was my honour to serve on the organising committee, alongside Andreas Scherbaum, Chris Ellis, Dave Page, Ilya Kosmodemiansky, Karen Jex, Magnus Hagander, Tomas Vondra, and Valeria Kaplan. More details on everyone who helped out with running the conference can be found here on postgres-contrib.org.
On October 22nd, before the main conference, there was a full day of training delivered by expert PostgreSQL professionals, and also new and exciting fringe events took place.
One was the inaugural meeting of the PostgreSQL Europe Diversity Committee that I was privileged to attend, chaired by the force of nature that is Karen Jex. Goals, plans and benchmarks were set in a kick-off session that's very promising for the future. The first early results of the initiative were already visible before the meeting, with the achievement of the greatest percentage of female speakers yet at a PGConf.EU. The diversity pinboard in the main conference hallway was a great success and it was very heartening to receive all your messages of encouragement and support.
The PostgreSQL Extension Ecosystem Summit also took place on that day, sponsored by Tembo, Percona, Xata and Timescale, and organized by Floor Drees and David Wheeler. I was invited to speak about the extension I maintain, pg_statviz, so I shared some backstory, details and insights. I find that the current effort to revitalise and strengthen extension building for Postgres is hugely important, and fully support the initiative.
The main conference was kicked off on the 23rd with Stacey Haysler's keynote address "The PostgreSQL License Fee" (video here), where it was explained how the PostgreSQL Project works, why you don't have to pay a license fee, and how the contributors (and their employers) are the main driving force behind
[...]The shared presentations are online, as are a couple of recordings and turtle-loading have-a-cup-of-tea locally stored photos.
Using the well known and broadly spread technique of inductive reasoning, we came to the conclusion that this fourth PGConf.be conference was a success, as was the artwork. No animals or elephants were hurt during this event.
The statistics are:
67 participants
depending on the session, an extra 80 to 100 students attended as well
11 speakers
4 sponsors
This conference wouldn’t have been possible without the help of volunteers.
To conclude a big thank you to all the speakers, sponsors and attendants.
Without them a conference is just like a tea party.
TL;DR Kubernetes was built for stateless apps, but as more stateful applications (like databases) run on it, operators ship quite heavy implementations to work around deficits in the Kubernetes workload management API (such as StatefulSets). While creating custom methods allows flexibility and faster time to market, it also leads to inconsistency and complexity. In this blog, I want to […]
As databases grow, managing large tables becomes more challenging. Table partitioning is a tried-and-tested approach that helps break down large tables into smaller, more manageable segments, enhancing performance, maintainability, and scalability.
What is Table Partitioning?
Table partitioning is a database design technique that divides a large table into smaller, more manageable sub-tables called partitions. Each partition holds a subset of the data based on specific criteria, such as date ranges, categories, or hash values. While partitioning makes it seem like you’re working with a single large table, behind the scenes, queries and operations are distributed across multiple partitions.
This approach serves several key purposes:
- Performance Improvement: Partitioning allows databases to focus operations (like SELECT, UPDATE, or DELETE) on relevant partitions instead of scanning the entire table. For instance, when querying a sales table for a specific month, only the partition corresponding to that month is accessed, significantly reducing the I/O load and boosting performance.
- Better Manageability: By splitting large tables into smaller segments, maintenance tasks such as indexing, backups, and archiving can be performed on individual partitions. This keeps operations manageable, even for tables with billions of rows.
- Efficient Data Retention and Archiving: Data retention policies are easier to enforce when using partitioning. For example, old partitions can be quickly archived or dropped when data is no longer needed, without affecting the rest of the table.
In both Oracle and PostgreSQL, partitioning is a crucial feature for DBAs managing high-volume databases. Although both systems offer range, list, and hash partitioning methods, the implementation and management vary, which is why understanding the nuances is critical for a seamless transition.
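As a small PostgreSQL-side sketch (table and partition names are illustrative, not from the post), declarative range partitioning and partition-level retention look like this:
CREATE TABLE sales (
    sale_id   bigint,
    sale_date date NOT NULL,
    amount    numeric
) PARTITION BY RANGE (sale_date);
-- One partition per month; queries for January only touch this partition
CREATE TABLE sales_2024_01 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
-- Retention: archive or drop a whole month without touching the rest of the table
ALTER TABLE sales DETACH PARTITION sales_2024_01;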
The post Transitioning from Oracle to PostgreSQL: Partitioning appeared first on Stormatics.
A note about migrating from other databases…
Well, let’s turn that around: SQLite3 is somehow too flexible in allowing you to store data!
We all know that.
And we all have been fighting situations where we have a well defined structure in SQLite3 and, once we try to migrate to PostgreSQL, a bad surprise arrives! As an example, today I was trying to migrate a Django project with the built-in loaddata from a dumpdata, and sadly:
django.db.utils.DataError:
Problem installing fixture '/home/luca/git/respi/respiato/sql/data.respi.json':
Could not load respi.PersonPhoto(pk=30647):
value too long for type character varying(20)
So in my SQLite3 tables some fields (at least one) have exceeded the size of the varchar(20), and while PostgreSQL correctly refuses to store such value(s), SQLite3 happily gets them into the database without warning you!
The fix, in this particular case, is quite simple: issuing an ALTER TABLE personphoto ALTER COLUMN file_path TYPE varchar(50) suffices. I could also have used text, but I prefer to keep crazy values coming in from my application under control.
The point is: sooner or later, you will run into a constraint your stack is not honoring, so be prepared for some trouble.
Using PostgreSQL in the first place would have made long-term maintenance easier, in my opinion.
PGConf.EU 2024, held in Athens from October 22-25, was the biggest PostgreSQL event in the world yet! It was organized by:
The Call for Papers Committee for PGConf.EU 2024:
The Code of Conduct Committee for PGConf.EU 2024:
The following people (in random order) volunteered to help run the conference:
I wrote a blog post in the past about a generic approach on how to expose databases in Kubernetes with Ingress controllers. Today, we will dive deeper into PostgreSQL with ingress and also review two different ways that can be taken to expose the TCP service. The goal is to expose multiple PostgreSQL clusters through […]
Floor Drees and Andreas Scherbaum took the stage during the PGConf.EU lightning talks to present postgres-contrib.org and the PostgreSQL Event Calendar! Slides are here
Recently, in the Official PostgreSQL Slack, a user posted a PL/pgSQL block (shown below) and was surprised that it did not generate a syntax error:
DO $$
DECLARE i INT;
BEGIN
i = 0;
IF i = 0 AND THEN
RAISE NOTICE 'i = 0';
END IF;
END; $$;
At first glance, this code block seems incomplete. Notice the IF condition: it appears to be missing an additional condition after the AND operator. Logically, this should cause an exception due to the incomplete condition following AND.
IF i = 0 AND THEN
However, during PL/pgSQL execution, the condition is evaluated without any syntax errors or warnings. This raises a critical question:
How does PostgreSQL internally process this condition?
What allows this seemingly incomplete expression to work?
When reviewing such PL/pgSQL, the code looks incomplete, and anyone assuming it should fail will be surprised.
In this blog, we’ll dive into the internals of PL/pgSQL to understand how this conditional statement is processed and why PostgreSQL does not flag it as erroneous.
This blog covers the topic both in text and through an accompanying video for a more in-depth look. Scroll down to watch the video or read on for the full written guide.
One of the greatest advantages of open-source software is the ability to examine the code base directly. This gives us a foundation for understanding how things work or, in some cases, why they don’t break as expected.
Our investigation begins with the PLPGSQL_STMT_IF structure, tracing through the call stack in the pl_exec.c file.
By exploring the code, we find that IF statements and their conditions are evaluated using exec_run_select, which effectively executes a SELECT statement that returns a boolean result.
Ready to enhance your PostgreSQL development skills? My course on PL/pgSQL will hel[...]
I have pushed Release 18 of the PostgreSQL Buildfarm client.
In addition to numerous minor tweaks and bug fixes, a number of notable changes are included.
The release can be downloaded from
https://github.com/PGBuildFarm/client-code/releases/tag/REL_18 or
https://buildfarm.postgresql.org/downloads
Upgrading is highly recommended.
I wrote a blog post a couple of weeks ago entitled Is pg_dump a Backup Tool?. In that post, I argued in the affirmative, but also said that it probably shouldn't be your primary backup mechanism. For that, you probably shouldn't directly use anything that is included in PostgreSQL itself, but rather a well-maintained third-party backup tool such as barman or pgbackrest. But today, I want to talk a little more about why I believe that pg_dump is both amazingly useful for solving all kinds of PostgreSQL-related problems and also just a great piece of technology.
PostgreSQL is one of the most loved databases, especially by developers, for its simplicity, easy configuration, and massive community support. It's an open-source powerhouse known for handling everything from small projects to large-scale applications.
While major cloud providers like AWS, Google, and Microsoft offer robust solutions for hosting databases in the cloud, not all businesses can or want to go this route.
Many companies choose to store their databases in secure, closed environments: machines without internet access or outside the cloud. This is often done to maintain tight control over sensitive data and to meet strict security requirements. However, installing PostgreSQL in a restricted, offline environment can be a real challenge, as it limits access to typical installation tools.
Recently, I worked on a client project with a similar setup—a secure, offline environment without internet access—where we needed to install and configure PostgreSQL from scratch. If you’re facing the challenge of setting up PostgreSQL in a closed environment, this blog will guide you through the process step-by-step.
The post Offline PostgreSQL Installation on RHEL 9: Solving the No Internet Challenge appeared first on Stormatics.
The event was organized by:
- Floor Drees
- David Wheeler
(A quick interlude in our series on locales and character encodings in PostgreSQL.)
C/POSIX locale is only useful if you know that you will never have characters in your strings other than 7-bit ASCII, or you are 100% sure that you are never going to have mixed client encodings. If you do have non-7-bit-ASCII characters with mixed client encodings, zaniness can result:
c=# \l
List of databases
Name | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges
-----------+------------------+----------+-----------------+-------------+-------------+-----------+-----------+-------------------
c | xof | UTF8 | libc | POSIX | POSIX | | |
(17 rows [others deleted])
c=# set client_encoding='UTF8';
SET
c=# CREATE TABLE m (v varchar);
CREATE TABLE
c=# INSERT INTO m VALUES('Í');
INSERT 0 1
c=# set client_encoding='LATIN1';
SET
c=# INSERT INTO m VALUES('é');
INSERT 0 1
c=# SELECT * FROM m;
v
----
?
é
(2 rows)
c=# set client_encoding='UTF8';
SET
c=# SELECT * FROM m;
v
----
Í
é
(2 rows)
c=# INSERT INTO m VALUES('‰');
INSERT 0 1
c=# SELECT * FROM m;
v
----
Í
é
‰
(3 rows)
c=# set client_encoding='LATIN1';
SET
c=# SELECT * FROM m;
ERROR: character with byte sequence 0xe2 0x80 0xb0 in encoding "UTF8" has no equivalent in encoding "LATIN1"
Even worse (in this regard) than C locale is SQL_ASCII locale. Please never use this locale.
For the first time we have a guest writer, Emma Saroyan.
I recently traveled from Armenia to New York City to speak at PGConf NYC 2024, organized by PgUS. PostgreSQL, originally called Postgres, is the #1 open source relational database in the world. I gave a talk about my journey into discovering Postgres, and it was an inspiring experience engaging with the PostgreSQL community, sharing insights, and learning about the latest advancements in the field.
The 3-day conference, held from September 30 to October 2, 2024, at Convene 237 Park Avenue in New York City, welcomed hundreds of attendees from across the US and around the world.
The speakers shared insights on their lessons learned in the Postgres world, along with the latest developments in PostgreSQL’s security, performance, new features, best practices, challenges, community, and more.
Some of the world’s largest software companies, including Microsoft, Google, AWS, and Fujitsu were present at the conference. They are heavily invested in PostgreSQL, making its adoption a great success in the global database market.
Many companies are migrating from alternative database platforms to PostgreSQL due to its free and open-source nature. Companies like EDB, Percona, Citadel, pganalyze, Notion, and Xata showcased prominent use cases for PostgreSQL in areas such as financial services, analytics, and AI. These examples highlight not only why PostgreSQL is the top choice among open-source databases, but also reflect how its strong, collaborative community is a key factor in its widespread adoption.
With over 35 years of active development, PostgreSQL has become the world’s most advanced open source database, excelling in various areas and supported by a vibrant, dedicated community that continuously drives its growth and innovation. There are a number of ways to get involved in the Postgres community: user groups, submitting patches, documentation, events, conferences, blogs,
[...]