
If you’ve built your application on Postgres, you already know why so many people love Postgres.

And if you’re new to Postgres, the list of reasons people love Postgres is loooong—and includes things like: 3 decades of database reliability baked in; rich datatypes; support for custom types; myriad index types from B-tree to GIN to BRIN to GiST; support for JSON and JSONB from early days; constraints; foreign data wrappers; rollups; the geospatial capabilities of the PostGIS extension, and all the innovations that come from the many Postgres extensions.

But what to do if your Postgres database gets very large?

What if all the memory and compute on a single Postgres server can’t meet the needs of your application?

In this post, let’s walk through when you might want to scale out Postgres horizontally. Specifically, when to use Hyperscale (Citus), a built-in deployment option in our Azure Database for PostgreSQL managed service. But first: what exactly is Hyperscale (Citus)?

What is Hyperscale (Citus) in Azure Database for PostgreSQL?

Citus is an open source extension to Postgres that transforms Postgres into a distributed database.

Citus uses sharding and replication to distribute your Postgres tables and queries across multiple machines—parallelizing your workload and enabling you to use the memory, compute, and disk of a multi-machine database cluster.
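
To make this concrete, here is a minimal sketch (mine, not from the post) of how a table is distributed once the Citus extension is installed; the events table and its tenant_id distribution column are purely illustrative:

-- Enable the extension and shard an example table across the worker nodes
CREATE EXTENSION citus;

CREATE TABLE events (
    tenant_id bigint NOT NULL,
    event_id  bigserial,
    payload   jsonb
);

-- Rows are placed in shards by hashing tenant_id; queries that filter on
-- tenant_id are routed to the worker holding the matching shard
SELECT create_distributed_table('events', 'tenant_id');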

Hyperscale (Citus) is the integration of the Citus extension with our managed Postgres service on Azure. When you go to provision an Azure Database for PostgreSQL server on the Azure portal, you’ll see Hyperscale (Citus) is one of the built-in deployment options available to you. (See Figure 1 below.) Under the covers, the Citus open source extension is at the core of Hyperscale (Citus).

In short: Hyperscale (Citus) = managed Postgres service on Azure + Citus

Why would you want to use Hyperscale (Citus) to scale out Postgres on Azure?

  • Performance: Because your single node Postgres is not performant enough and cann
[...]

Introduction

On November 17-20, 2020, PostgresConf.CN & PGConf.Asia 2020 (referred to as the 2020 PG Asia Conference) was held online for the very first time! The conference was jointly organized by the PG China Open Source Software Promotion Alliance, the PostgresConf International Conference Organization, and the PGConf.Asia Asian Community. It was broadcast exclusively via the Modb Technology Community platform in China, with a record-high number of viewers streaming the conference events. With the great support of these PostgreSQL communities, the conference was a great success, bringing together the Chinese PG community, major Asian PG contributors, and many PostgreSQL experts worldwide to build the largest PG ecosystem in Asia.

About the Conference

Also known as Asia's largest open source relational database ecosystem conference!

PostgresConf.CN and PGConf.Asia, for the very first time, were hosted together as one conference online accompanied by additional offline sessions hosted at several reputable university campuses in China.

PostgresConf.CN is an annual conference held by the China PostgreSQL Association for PostgreSQL users and developers. It is also one of the conference series held by the PostgresConf Organization. PostgresConf.CN 2019 took place in Beijing and was very well attended by PostgreSQL users and community members from across the globe.

PGConf.Asia is also an annual PostgreSQL event; it took place in Bali, Indonesia in 2019, continuing from the PGConf.Asia event held in Tokyo, Japan in 2018. The first PGConf.Asia conference took place in 2016 in Tokyo. The conference acts as a hub for PostgreSQL-related development and technical discussion among PostgreSQL users and developers in the region, as well as experts from around the globe.

Learn more about these conferences and the organizers from these resources:

Sponsors

This conference was sponsored b

[...]
Posted by Egor Rogov in Postgres Professional on 2020-12-04 at 00:00

After having discussed isolation problems and having made a digression regarding the low-level data structure, last time we explored row versions and observed how different operations changed tuple header fields.

Now we will look at how consistent data snapshots are obtained from tuples.

What is a data snapshot?

Data pages can physically contain several versions of the same row. But each transaction must see only one (or none) version of each row, so that all of them make up a consistent picture of the data (in the sense of ACID) as of a certain point in time.

Isolation in PostgreSQL is based on snapshots: each transaction works with its own data snapshot, which "contains" data that were committed before the moment the snapshot was created and does not "contain" data that were not yet committed by that moment. We've already seen that although the resulting isolation appears stricter than required by the standard, it still has anomalies.
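
As a quick, hedged illustration of the machinery described here (the accounts table is hypothetical), you can peek at the current snapshot and at the hidden system columns that tie row versions to transactions:

-- The current snapshot: lowest unfinished xid, first unassigned xid, in-progress xids
SELECT txid_current_snapshot();

-- xmin/xmax record which transactions created and (if any) deleted each row version
SELECT xmin, xmax, ctid, * FROM accounts LIMIT 3;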

Posted by Mark Wong in 2ndQuadrant on 2020-12-03 at 17:50
This post continues from my report on Random Numbers. I have begun working on a random data generator so I want to run some tests to see whether different random number generators actually impact the overall performance of a data generator. Let’s say we want to create random data for a table with 17 columns, […]
Posted by Hans-Juergen Schoenig in Cybertec on 2020-12-03 at 09:00

PostgreSQL offers a nice BLOB interface which is widely used. However, recently we came across problems faced by various customers, and it makes sense to reflect a bit and figure out how PostgreSQL handles BLOBs – and especially BLOB cleanup.

Using the PostgreSQL BLOB interface

In PostgreSQL, you can use various means to store binary data. The simplest form is definitely to make use of the “bytea” (= byte array) data type. In this case a binary field is basically seen as part of a row.
Here is how it works:


test=# CREATE TABLE t_image (id int, name text, image bytea);
CREATE TABLE
test=# \d t_image
Table "public.t_image"
Column | Type    | Collation | Nullable | Default
-------+---------+-----------+----------+---------
id     | integer |           |          |
name   | text    |           |          |
image  | bytea   |           |          |

As you can see, this is a normal column that can be used just like any other column. The only thing worth mentioning is the encoding one has to use on the SQL level. PostgreSQL uses a variable to configure this behavior:


test=# SHOW bytea_output;
bytea_output
--------------
hex
(1 row)

The bytea_output variable accepts two values: “hex” tells PostgreSQL to send the data in hex format. “escape” means that data has to be fed in as an octal string. There is not much the application has to worry about here, apart from the maximum size of 1 GB per field.
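
As a small, hedged illustration (mine, not from the post), inserting binary data into the t_image table created above can be done with decode(), which turns a hex string into bytea; with bytea_output = hex the value comes back in the \x... form:

test=# INSERT INTO t_image (id, name, image)
       VALUES (1, 'tiny.png', decode('89504e470d0a1a0a', 'hex'));
INSERT 0 1
test=# SELECT id, name, image FROM t_image;
 id |   name   |       image
----+----------+--------------------
  1 | tiny.png | \x89504e470d0a1a0a
(1 row)
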
However, PostgreSQL has a second interface to handle binary data: The BLOB interface. Let me show an example of this powerful tool in action:


test=# SELECT lo_import('/etc/hosts');
lo_import
-----------
80343
(1 row)

In this case, the content of /etc/hosts has been imported into the database. Note that PostgreSQL has a copy of the data – it is not a link to the filesystem. What is noteworthy here is that the database will return the OID (object ID) of the new entry. To keep track of these OIDs, some developers do the following:


test=# CREATE TABLE t_file (
id int,
name text,
object_id o
[...]

Greetings readers, today we're going to take a semi-break from my "doing data science in SQL" series to cover a really cool use case I just solved with regular expressions (regex) in Postgres. For those of you who have a bad taste in your mouth from earlier run-ins with regexes, this will be more use-case focused, and I will do my best to explain the search patterns I used.

Posted by Tatsuo Ishii in SRA OSS, Inc. on 2020-12-02 at 05:50

Real time, fast update of materialized views

In this blog entry I introduced the ongoing project Incremental View Maintenance (IVM), which allows real-time, fast updates of materialized views. Since that blog, IVM can now handle outer joins and self joins. The outcome of the project is public in the form of a set of patches against the git master branch of PostgreSQL. A GitHub repository for this project is public as well.

Docker image for IVM

So skilled PostgreSQL developers can test IVM today. However, it's not that easy for people who do not regularly compile and build PostgreSQL and IVM from source code. Fortunately, one of the IVM developers, Yugo Nagata, has stepped up and provides a complete Docker image of IVM. I am going to demonstrate how to use it step by step.

# Pull the docker image
docker pull yugonagata/postgresql-ivm

# Run the docker container using port mapping 15432:5432
docker run -d --name pgsql-ivm -e POSTGRES_PASSWORD=postgres -p 15432:5432 yugonagata/postgresql-ivm

# Run psql to connect to PostgreSQL and create a database named "test"
docker run -it --rm --link pgsql-ivm yugonagata/postgresql-ivm psql -h pgsql-ivm -U postgres
# create database test;
\q

# Run pgbench to initialize the "test" database
docker run -it --rm --link pgsql-ivm yugonagata/postgresql-ivm pgbench -i -h pgsql-ivm -U postgres test

# Run psql to try IVM
docker run -it --rm --link pgsql-ivm yugonagata/postgresql-ivm psql -h pgsql-ivm -U postgres test

# Create a materialized view "mv1" for the query "select count(*) from pgbench_accounts".

# Note that "pgbench_accounts" was created by pgbench
test=# create incremental materialized view mv1 as select count(*) from pgbench_accounts;
SELECT 1

# select the result from the view
test=# select * from mv1;
 count  
--------
 100000
(1 row)

# Start a transaction

test=# begin;
BEGIN

# delete 1 row from pgbench_accounts table
test=*# delete from pgbench_accounts where aid = 1;
DELETE 1

[...]
Posted by Brandur Leach on 2020-12-01 at 20:06

Big data has an unfortunate tendency to get messy. A few years in, a growing database that used to be small, lean, and well-designed has better odds than not of becoming something large and bloated, with best practices tossed aside and now considered unsalvageable.

There are a few common reasons that this happens, some better than others:

  • Technological limitation: The underlying tech doesn’t support the scale. Say transactions or referential integrity across partitions in a sharded system.
  • Stability: Certain operations come to be considered too risky. e.g. Batch update operations that have unpredictable performance properties.
  • Cost/effort: Doing things the right way is too hard or too expensive. e.g. Back-migrating a large amount of existing data.
  • Convenience: Similar to the previous point, poor data practice is simply by far the easiest thing to do, and gets your immediate project shipped more quickly, even if it makes future projects more difficult.

The loss of these features is unfortunate because they’re the major reason we’re using sophisticated databases in the first place. In the most extreme cases, advanced databases end up as nothing more than glorified key/value stores, and the applications they power lose important foundations for reliability and correctness.

ACID transactions tend to be one of the first things to go, especially since the value they provide isn't immediately obvious in a new system that's not yet seeing a lot of traffic or trouble. Between that and the fact that they add some friction to writing code quickly and can lead to locking problems in production, they're often put on the chopping block early, especially when less experienced engineers are involved.

Losing transactions is bad news for an application's future operability, but as this subject's already covered extensively elsewhere (including by me), I won't go into depth here.

[...]

AWS announces Babelfish: open source Postgres with SQL Server compatibility

At AWS re:Invent’s first keynote, Andy Jassy has announced a major breakthrough for Postgres: SQL Server compatibility for Postgres. Open Source.

Andy Jassy announces Babelfish

Postgres is a fantastic database: reliable, trustworthy, featureful. But has Postgres made a significant dent in the market share of commercial databases? It has, but not at the level I would love to see. Key to making Postgres more mainstream is widening the Postgres user base. There are many ways to achieve this goal, including developing better tools and making Postgres more accessible to non-experts. Another means to achieve this goal is what this announcement is doing today: adding compatibility with other databases, in this case the commercial SQL Server database.

From a technical perspective, this is a significant achievement. First of all, it adds the capability to bridge the SQL differences and features between SQL Server and Postgres. While SQL is a standard, almost all databases deviate from it, adding their own capabilities. The Modern SQL website by Markus Winand tracks in detail the different SQL features of the most common relational databases, if you are interested. This compatibility layer is, thus, doable but not easy.

Second, SQL Server users frequently use a stored procedure language called T-SQL. It allows you to push "computation to the data" and run data logic on the database side. It is so pervasive that a compatibility capability would not be complete without supporting T-SQL. Well, Babelfish comes with it, too.

As if that weren't enough, SQL Server-equivalent catalogs and also the SQL Server wire protocol (TDS) have been implemented. This means that you will be able to talk to Postgres Babelfish as if it were SQL Server: the same drivers, the same SQL, the same T-SQL, and the same apps should work unmodified.

This is how Babelfish Postgres works:

Babelfish Postgres architecture

Source: babelfish-for-postgresql.github.io

I'm waiting to get a hold of it. Compatibility will

[...]
Slow PostgreSQL WAL Archiving

In one of my previous blog posts, Why PostgreSQL WAL Archival is Slow, I tried to explain three of the major design limitations of PostgreSQL's WAL archiver, which is not so great for a database with high WAL generation. In this post, I want to discuss how pgBackRest addresses one of those problems (cause number two in the previous post) using its asynchronous WAL archiving feature.

Let me explain the problem in a bit more detail here.

The PostgreSQL archiver process has an internal function, pgarch_archiveXlog(), which calls the system() system call, which in turn executes the archive_command, a shell command or script.

So the execution is one after another, without any parallelism or batching. Each execution starts up a separate process.
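
For reference, the kind of archive_command this refers to (matching the cp example below; the destination directory is illustrative) is configured roughly like this:

-- archive_mode must already be 'on' (changing it requires a restart);
-- archive_command itself only needs a configuration reload
ALTER SYSTEM SET archive_command = 'cp -f %p /home/postgres/archived/%f';
SELECT pg_reload_conf();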

Let's look at a simple WAL archive using the Unix/Linux cp command:
postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AC
postgres 2954 2639 1 09:37 ? 00:00:00 cp -f pg_wal/0000000700000000000000AC /home/postgres/archived/0000000700000000000000AC

The archiver process (PID 2639) started a cp process (PID 2954). Then it starts another process for the next WAL segment (PID 2957):
postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AD
postgres 2957 2639 0 09:37 ? 00:00:00 cp -f pg_wal/0000000700000000000000AD /home/postgres/archived/0000000700000000000000AD

Then another one for the next WAL segment:

postgres 2639 1551 0 08:49 ? 00:00:00 postgres: stampede: archiver archiving 0000000700000000000000AE
postgres 2960 2639 0 09:38 ? 00:00:00 cp -f pg_wal/0000000700000000000000AE /home/postgres/archived/0000000700000000000000AE

We can argue that starting up a cp command can be light.

But what if we need to use scp, where the SSH key exchanges and authentication negotiations need to happen? The startup time can easily go higher than the time for transferring a 16MB file. And what if we want to take the WAL to cloud storage like Amazon S3 or Azure Storage? The start[...]
Posted by Edco Wallet on 2020-12-01 at 10:08
PostgreSQL 13 was released over a month ago. The most important changes and new features have already been announced in various ways and on various (online) stages, mostly by well-known contributors or influential Postgres community players. In this blog we take a look at those new features, changes, and handy facts about PostgreSQL 13. We give an overview of interesting articles about the important major, but also perhaps neglected non-major, changes.
Posted by Lætitia AVROT on 2020-12-01 at 06:27
As a consultant, I frequently need to install Postgres for customers. Of course I send them a list of prerequisites beforehand but, I don't know why, there's always a problem with one of them and, most often, the problem concerns port opening. Tools? Who needs tools? Sysadmins will recommend you use either nmap or netcat or netstat, and you might need to combine those tools (whose syntax you will keep forgetting) with grep to find out if your port is listed.
Posted by Andreas 'ads' Scherbaum on 2020-11-30 at 14:00
PostgreSQL Person of the Week Interview with Stéphane Schildknecht: I live in a small town in the East of France. After graduating in Computational Chemistry, I jumped into consulting at the beginning of the century. I was already convinced by OpenSource. I had the opportunity to discover MySQL and PostgreSQL at that time. I worked on a big PostgreSQL project, and understood PostgreSQL will be a big game changer in the database industry.
In my previous blog post, I introduced how to monitor Pgpool-II and PostgreSQL clusters using Pgpool-II Exporter. As I mentioned in the previous post, Pgpool-II Exporter exposes metrics that Prometheus can collect. Prometheus is one of the most popular monitoring tools used on Kubernetes. In this post, I would like to introduce how to deploy Pgpool-II and Pgpool-II Exporter on Kubernetes.

Architecture

A PostgreSQL operator is required to manage PostgreSQL clusters on Kubernetes. Therefore, you need to combine Pgpool-II with a PostgreSQL operator. Below is the architecture:

Prerequisites

Before you start deploying Pgpool-II, please check the following prerequisites:

  • Make sure you have a Kubernetes cluster, and kubectl is installed.
  • PostgreSQL Operator and a PostgreSQL cluster are installed. In this blog, I used the Crunchy PostgreSQL Operator to create a PostgreSQL cluster.

Deploy Pgpool-II and Pgpool-II Exporter

We deploy a Pgpool-II pod that contains a Pgpool-II container and a Pgpool-II Exporter container. You need to specify the Docker images (for more information, see the Pgpool-II Dockerfile and the Pgpool-II Exporter Dockerfile).

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pgpool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pgpool
  template:
    metadata:
      labels:
        app: pgpool
    spec:
      containers:
      - name: pgpool
        image: pgpool/pgpool:4.2
        ...
      - name: pgpool-stats
        image: pgpool/pgpool2_exporter:1.0
        ...

Configure Pgpool-II

Pgpool-II's health check, automatic failover, Watchdog and online recovery features aren't required on Kubernetes. You only need to enable load balancing and connection pooling.

Here, we configure Pgpool-II using environment variables. If you are using a production environment, we recommend using a ConfigMap to configure Pgpool-II's config files, i.e. pgpool.conf, pcp.conf, pool_passwd and

[...]
Posted by Ryan Lambert on 2020-11-29 at 05:01

A few months ago I started experimenting with a few project ideas involving data over space and time. Naturally, I want to use Postgres and PostGIS as the main workhorse for these projects, the challenge was working out exactly how to pull it all together. After a couple false starts I had to put those projects aside for other priorities. In my free time I have continued working through some related reading on the topic. I found why you should be using PostGIS trajectories by Anita Graser and recommend reading that before continuing with this post. In fact, read Evaluating Spatio-temporal Data Models for Trajectories in PostGIS Databases while you're at it! There is great information in those resources with more links to other resources.

This post outlines examples of how to use these new PostGIS trajectory tricks with OpenStreetMap data I already have available (load and prepare). Often, trajectory examples assume data collected from our new age of IoT sensors sending GPS points and timestamps. This example approaches trajectories from a data modeling perspective instead, showing how to synthesize trajectory data using pgRouting. Visualization of data is a critical component of sharing information, and QGIS has long been my favorite GIS application to use with PostGIS data.

What is your first association with the concept of a B-tree? Mine is 'old and well researched, or in other words boring'. And indeed, apparently it was first introduced in 1970! Not only that, already by 1979 they were ubiquitous. Does it mean there is nothing exciting left any more? It turns out that there is a multitude of interesting ideas and techniques around B-trees. They all come from the desire to cover different (often incompatible) needs, as well as to adapt to emerging hardware. In this blog post I'll try to show this, and we will be concerned mostly with the B-tree as a data structure.
Posted by Lætitia AVROT on 2020-11-27 at 16:27
I was looking for something to add to my Christmas wish list (yes, my parents are retired and they asked for this wish list for the last 3 months) and I remembered I had seen a blog post a few years ago where you could print your own code on clothes. I had a good memory of a particularly nice dress (and I found out later it even had pockets! Yay!
Posted by Dimitri Fontaine in CitusData on 2020-11-27 at 12:10
Among a lot of other changes, the year 2020 brings online conferences to us. In the Postgres community too, we now record our talks at home and send a video file to be played to a virtual audience, and sometimes shared later on a platform online. So this year I participated in Postgres Vision 2020, where I delivered a talk about The Art of PostgreSQL. This is a talk all about the book that I have written and self-published, The Art of PostgreSQL: learn how to turn thousands of lines of code into simple SQL queries.
Posted by Michał Mackiewicz on 2020-11-26 at 11:13

Debezium is a popular open source change data capture (CDC) solution. It's based on a set of well-established components (Kafka, Kafka Connect, ZooKeeper) and used successfully in production by companies like Delhivery or BlaBlaCar. However, the setup is not as straightforward as one may think, and you may encounter some obstacles. In this post, I will share some tips originating from hooking up Debezium to a quite big and busy OLTP Postgres database.

Originally, Debezium was connected to a Postgres DB using the wal2json or decoderbufs plugins. Don't use them – since native logical replication became available in Postgres 10, a better option called "pgoutput" was introduced in Debezium. From the database perspective, the Debezium connector will be treated just like another logical replication subscriber, and the usual rules of logical replication will apply.

Check wal_level

Logical decoding – on which both logical replication and Debezium depend – requires the wal_level configuration parameter to be set to "logical", and changing it requires a Postgres restart. Check this beforehand and schedule a database restart if necessary.

Pick the right Debezium version

NEVER, EVER use Debezium versions between 2.2 and 3.0. Those versions come with a bug which is hard to understand for anyone who isn't a Postgres expert, but its consequences are simple and dangerous: using such a buggy Debezium version will take the master database down sooner or later. This is because Debezium will consume data changes, but won't confirm the consumption to the database server. The server will retain all WAL segments since replication slot creation, and you will eventually run out of disk space (or money, if you use some pay-as-you-go storage like Amazon EFS or keep adding new disks).
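
A hedged way to spot this situation early (not from the original post) is to check how much WAL each replication slot is still holding back:

-- Amount of WAL retained on disk for each replication slot
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

If the retained amount keeps growing for the Debezium slot, the connector is not confirming what it consumes.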

Create slot and publication manually

While Debezium is capable of creating the required database entities (logical replication slot and publication) itself, it's a bad idea. First – it requires superuser privileges for the Debezium user. Second – it will creat

[...]

My previous blog on "PostgreSQL High Availability: Considerations and Candidates" mostly talked about defining HA considerations for PostgreSQL, RPO and RTO, and briefly touched on some of the open source solutions available. Granted, I didn't list them all, and people shared some additional ones that I may end up reviewing as well. In case you missed my previous blog, here is the link: https://www.highgo.ca/2020/08/10/postgresql-high-availability-the-considerations-and-candidates/

I think it would be unfair to assess the available HA solutions before laying out all the core requirements and considerations. So in this blog, without discussing any particular solution, I am going to mainly discuss setup, indirection, and strategies for minimizing downtime.

Quorum and Avoiding Split Brain

An HA system is designed to not have any single point of failure. However, in a single-primary PostgreSQL cluster, the primary node becomes just that. So what happens when it fails? Another node previously acting as a standby (replica) must be promoted to the primary role. However, it's not that straightforward.

Quorum helps answer the following questions and enables us to strategize an acceptable solution. Quorum is essentially a voting system where the majority vote wins. This comes into play when there is a failure related to the primary node. We must know:

  • whether the primary node really failed;
  • which standby to promote;
  • what to do if the old primary node rejoins the cluster.

Quorum Size

An even number of nodes in a cluster could potentially end up as two equal-sized cluster partitions. To understand this, let's consider a two-node cluster and assume that the replication between the primary and the secondary has failed. This raises the question: which node has really failed? Is it the primary or the standby?

Therefore, it is a good practice to have an odd number of nodes in a cluster and no less than three. Whether the third node is part of the PostgreSQL cluster or simply a witness

[...]

In the last two blog posts on data science in Postgres, we got our data ready for regression analysis and had predictive variables that are on wildly different scales. Another example of data on different scales would be annual income versus age. The former is usually at least tens of thousands while age rarely gets to a hundred. 

If you do the regression with non-transformed variables, it becomes hard to compare the effect of the different variables. Statisticians account for this by converting raw data values into a Z-score for each explanatory variable. Other names for this technique are standardizing your data or centering and standardizing. As Wikipedia explains, the formula for a z-score is z = (x − μ) / σ, where x is the raw value, μ is the mean, and σ is the standard deviation of the variable.
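
For illustration (this sketch is mine, not part of the series, and the houses table with its price column is hypothetical), the same standardization can be computed directly in Postgres with window functions:

-- z-score: subtract the column mean and divide by the standard deviation
SELECT id,
       price,
       (price - avg(price) OVER ()) / stddev_samp(price) OVER () AS price_z
FROM houses;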

This is my third blog about stored procedure support in PostgreSQL; the previous two blogs are reachable from the HighGo CA blogs site https://www.highgo.ca/author/ahsan-h/. The first blog was an introduction to the usage of stored procedures and their differences from stored functions, and the second blog focused on creating and using procedures with definer and invoker rights. The purpose of this blog is to demonstrate how to execute a stored procedure from Java. Java is an important platform for developing enterprise-level applications, and it is really important to have the capability to call stored procedures from Java.

Let's start with a short intro to stored procedures:

"A stored procedure or a stored function is a module or a program that is stored in the database; it extends the database functionality by creating and reusing user-defined programs in supported SQL/PL languages. They can be created in multiple languages (details to follow), and once compiled they become schema objects which can be executed or referenced by multiple applications. Stored procedures/functions are a very useful component of a database application, as they underpin the complex application logic and database operations that need to be executed and reused multiple times by the database application. Without this feature, such repeated database operations would have to be carried out with several complex SQL queries and round trips, instead of a single function within the database."
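
As a minimal, hypothetical example of the kind of stored procedure such an application might call (the accounts table and the logic are illustrative, not taken from the blog series):

CREATE PROCEDURE update_balance(p_account int, p_amount numeric)
LANGUAGE plpgsql
AS $$
BEGIN
    -- Apply the amount to the given account inside the database
    UPDATE accounts SET balance = balance + p_amount WHERE id = p_account;
END;
$$;

-- Procedures are invoked with CALL; a Java application would issue the same
-- CALL statement through JDBC
CALL update_balance(1, 100.00);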

Setup and Configuration

In order to call a stored procedure from Java, you need to have the Java development environment set up on your machine. There are plenty of blogs that will show you how to do that; I will just list the key steps that need to be followed to create the environment.

Start by installing the JDK on your machine; you can do that by getting the installer from the official site or installing it using a package manager. I was doing this on a Mac, so I did the installation with brew using this co

[...]

A few years ago, I wrote a short post on a similar topic; since then, I've often seen that the whole concept of suggesting more INSERT-s in critical parts of the code seems pretty strange to most customers. It's even alien to those who are not new to databases in general. So I thought that the technique could use another go-round, since I think it's quite a nice design trick that can alleviate certain problems when used correctly.

Insert-only data modelling

So what is “insert-only data modelling” about? Basically, it’s very simple – it’s about reducing the amount of UPDATE-s and doing more INSERT-s instead 🙂 Of course, there’s a limit somewhere – you cannot endlessly generate new data if your objects are mutating a lot, so the whole concept is not really well-suited for most use-cases – BUT, it can very well help to avoid specific “peaks” with PostgreSQL if applied correctly… since in the current implementation, the UPDATE operation is a bit of an expensive one, and basically a DELETE + INSERT is happening behind the scenes. Not to forget about the much better HOT-updates though, but they’re out of reach when we, for example, change some indexed columns.
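
A minimal sketch of the pattern (mine, with illustrative table and column names): instead of updating a status column in place, every status change is appended as a new row, and the current state is read back with DISTINCT ON:

CREATE TABLE object_status (
    object_id  bigint      NOT NULL,
    status     text        NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now()
);

-- Record a status change: an INSERT instead of an UPDATE on the object row
INSERT INTO object_status (object_id, status) VALUES (42, 'processing');

-- Read the latest status per object
SELECT DISTINCT ON (object_id) object_id, status, changed_at
FROM object_status
ORDER BY object_id, changed_at DESC;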

A problematic customer use case

The example at hand was a problematic application — basically, a kind of workflow engine. It had objects going through a pipeline of sorts over a week or so, changing statuses along the way — and every status change was recorded in the database. It’s a pretty typical situation.

The transaction volumes were not too high, a couple of hundred transactions per second on average — which is not a lot at all nowadays. However, given the customer’s modest hardware, it was a problem. They had already tried to upgrade the hardware, but the problem persisted due to a special nuance; their load problems only appeared for an hour or so. Most of the time, it was very quiet… so they found it difficult to justify the costs of scaling-up hardware and wanted to tune Postgres or the schema instead.

Some modest tuni

[...]
The question was asked relatively recently on IRC. And it proved to be non-trivial. Surely, if you want to have one row per category (one address per user), it's trivial – add a user_id column to addresses, make it unique, and we're done. But what if we want to allow five addresses? Or five thousand? Let's … Continue reading "How to limit rows to at most N per category"
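
The trivial one-address-per-user case mentioned in the teaser looks something like this (table definitions are illustrative): a UNIQUE constraint on user_id allows at most one row per user, but offers no obvious way to say "at most five":

CREATE TABLE users (
    id bigserial PRIMARY KEY
);

CREATE TABLE addresses (
    id      bigserial PRIMARY KEY,
    user_id bigint NOT NULL UNIQUE REFERENCES users (id),
    address text   NOT NULL
);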

At most of the places I've worked, the primary language used was not what I gravitated to naturally. If you're going to ask for my personal language of choice, it's Python. I appreciate the explicit nature, that it's often pseudocode that can execute, and that it has a rich ecosystem of libraries (though that's most languages these days). But as much as anything, I latched onto Django in its early days. If you're not familiar with Django, it's a popular framework for web apps written in Python. What I loved about it in particular was one of its taglines early on: batteries included.

For Django batteries included meant it had:

  • Built-in ORM
  • Authentication
  • Admin tooling
  • Template engine
  • Middleware layer
  • and more

Just recently I was on the Changelog podcast talking about "What's so exciting about Postgres?" I talked for a while and probably could have kept going; a lot of these features are top of mind in what we're building with Crunchy Bridge. In reflecting on things a bit, I can see a big personal parallel between what I appreciated in Django and Postgres. To me, Postgres is the batteries-included database. So what's in the PostgreSQL box:

Posted by Andreas 'ads' Scherbaum on 2020-11-23 at 14:00
PostgreSQL Person of the Week Interview with Pavel Stehule: I am from the Czech Republic, and live in my home village in Central Czech. I am an independent developer, consultant and lecturer. My work is 100% related to Postgres - training, solving customer performance issues, writing patches, bug fixing.

Stored procedures are widely used in commercial relational databases. You write most of your application logic in PL/SQL and achieve notable performance gains by pushing this logic into the database. As a result, customers who are looking to migrate from other databases to PostgreSQL usually make heavy use of stored procedures.

When migrating from a large database, using the Citus extension to distribute your database can be an attractive option, because you will always have enough hardware capacity to power your workload. The Hyperscale (Citus) option in Azure Database for PostgreSQL makes it easy to get a managed Citus cluster in minutes.

In the past, customers who migrated stored procedures to Citus often reported poor performance because each statement in the procedure involved an extra network round trip between the Citus coordinator node and the worker nodes. We also observed this ourselves when we evaluated Citus performance using the TPC-C-based workload in HammerDB (TPROC-C), which is implemented using stored procedures.

The good news is that this all changed in Citus 9.0, which introduced a feature called “distributed functions.” Distributed functions can take away most of the network overhead in stored procedures for an important class of workloads—and we have further improved distributed Postgres functions in subsequent Citus releases. In HammerDB, these changes added up to give an order of magnitude speed up!

This blog post will show you how we achieved such a dramatic performance improvement in Citus—and why stored procedures are now a powerful technique for scaling transactional workloads on Postgres, with Citus.
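
A hedged sketch of what that delegation looks like in practice (the table, procedure, and parameter names here are illustrative, not the ones used in the benchmark): the procedure is distributed and co-located with its table, so each CALL is routed in one hop to the worker that holds the matching shard:

SELECT create_distributed_table('accounts', 'account_id');

SELECT create_distributed_function(
    'add_payment(int, numeric)',
    distribution_arg_name := 'p_account',
    colocate_with         := 'accounts'
);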

HammerDB performance of Citus with and without Postgres stored procedure delegation
Figure 1: Performance of a Hyperscale (Citus) cluster on Azure Database for PostgreSQL as measured by the HammerDB TPROC-C benchmark. All these HammerDB benchmark results were run on Citus 9.4 using either a) regular Postgres tables & regular Postgres functions—only on coordinator; b) distributed tables & regular functions—faster by using worker nodes, but si
[...]

Summarizing data against a fixed grid is a common way of preparing data for analysis. Fixed grids have some advantages over natural and administrative boundaries:

pgBackRest is a well-known powerful backup and restore tool. It offers a lot of possibilities.

In this post, we’ll see how to setup a dedicated repository host to backup a PostgreSQL 3-nodes cluster.


The repository host will be called backup-srv and the 3 PostgreSQL nodes in Streaming Replication: pg1-srv, pg2-srv, pg3-srv. All the nodes will be running on CentOS 7.

If you’re familiar with Vagrant, here’s a simple Vagrantfile to initiate 4 virtual machines using those names:

# Vagrantfile
Vagrant.configure(2) do |config|
    config.vm.box = 'centos/7'
    config.vm.provider 'libvirt' do |lv|
        lv.cpus = 1
        lv.memory = 1024
    end

    nodes  = 'backup-srv', 'pg1-srv', 'pg2-srv', 'pg3-srv'
    nodes.each do |node|
        config.vm.define node do |conf|
            conf.vm.hostname = node
        end
    end
end

Installation

On all the servers, first configure the PGDG yum repositories:

$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/\
pgdg-redhat-repo-latest.noarch.rpm

Then, install PostgreSQL on pg1-srv, pg2-srv, pg3-srv:

$ sudo yum install -y postgresql13-server postgresql13-contrib

The pgBackRest package will require some dependencies located in the EPEL repository. If not already done on your system, add it:

$ sudo yum install -y epel-release

Install pgBackRest and check its version:

$ sudo yum install -y pgbackrest
$ sudo -u postgres pgbackrest version
pgBackRest 2.30

Finally, create a basic PostgreSQL instance on pg1-srv:

$ export PGSETUP_INITDB_OPTIONS="--data-checksums"
$ /usr/pgsql-13/bin/postgresql-13-setup initdb
$ sudo systemctl enable postgresql-13
$ sudo systemctl start postgresql-13

Create a dedicated user on the repository host

The pgbackrest user is created to own the backups and archives repository. Any user can own the repository but it is best not to use postgres to avoid confusion.

Create the user and the repository location on backup-srv:

$ sudo groupa
[...]
Posted by Greg Smith in Crunchy Data on 2020-11-20 at 17:23

This week Apple started delivering Macs using their own Apple Silicon chips, starting with a Mac SOC named the M1. M1 uses the ARM instruction set and claims some amazing acceleration for media workloads. I wanted to know how it would do running PostgreSQL, an app that's been running on various ARM systems for years. The results are great!