Posted by Kat Batuigas in Crunchy Data on 2020-09-18 at 16:30

PL/Python can be a convenient and powerful way to take advantage of your PostgreSQL database. In an earlier post, I talked a little bit about how you can use modules included in the Python standard library with PL/Python functions. In this post, we'll try running NumPy in a simple user-defined function which also takes advantage of PL/Python database access functions. The function will show a working example of how to easily convert a data table in Postgres to a NumPy array.
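
As a rough sketch of the idea (assuming plpython3u is installed, NumPy is available to the server-side Python, and a hypothetical table samples(val double precision) exists), a function along these lines fetches a column via plpy and hands it to NumPy:

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION samples_stats()
RETURNS text
AS $$
    import numpy as np
    # plpy.execute returns dict-like rows; build a NumPy array from one column
    # of the hypothetical "samples" table
    rows = plpy.execute("SELECT val FROM samples")
    arr = np.array([row["val"] for row in rows], dtype=float)
    return "mean=%s stddev=%s" % (arr.mean(), arr.std())
$$ LANGUAGE plpython3u;

SELECT samples_stats();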

I recently wrote a presentation, Postgres and the Artificial Intelligence Landscape, which covers the basics of artificial intelligence and shows how Postgres can be used for this purpose. This week I presented it at the Chicago PostgreSQL Meetup Group so I am now publishing the slides.

I released pspg 3.1.4, an almost bug-fix-only release. There is a new light theme, Flatwhite, based on Dmitry Biletskyy's flatwhite theme.



I recently gave a talk about the Citus extension to Postgres at the Warsaw PostgreSQL Users Group. Unfortunately, I did not get to go in person to beautiful Warsaw, but it was still a nice way to interact with the global Postgres community and talk about what Citus is, how it works, and what it can do for you.

If you are already familiar with Postgres then this talk should be a good introduction to all the powerful capabilities that Citus gives you. The tl;dr is this: Citus is an open source extension to Postgres that transforms Postgres into a distributed database. Citus uses sharding and replication to distribute your data and your Postgres queries across a distributed database cluster.

Shining a light on the performance speedups of Citus (via demo)

Every so often, I try to rethink how I talk about Citus, especially as Postgres evolves and the needs of applications change, too. One thing we have not done much of is talk directly about the performance improvements in Citus. Sometimes Citus is actually slower, but at scale it can be *a lot* faster. So in this talk, I introduced every Citus feature with benchmarks that show its performance compared to a (large) single Postgres server.

The talk is also worth watching for the demo (the demo starts at 46:52) where I compare the performance of Hyperscale (Citus) on Azure Database for PostgreSQL against a single Postgres server. For the demo, I use GitHub archive data in an analytics use case, and the demo shows >250x speedups for analytical queries with Citus!

Video of my talk at Warsaw PostgreSQL Users Group, on Citus: PostgreSQL at any Scale. Demo starts at 46:52, but the introductory discussion should be useful, too.

Props to the organizers of the Warsaw PostgreSQL Users Group—especially Alicja Kucharczyk—for the time they spend organizing Postgres talks for their community. And for inviting me to give a talk to their Postgres users group. I really appreciated all the good questions, too.

If this demo is your first intro to Citus & you

[...]
Posted by Bruce Momjian in EDB on 2020-09-16 at 14:00

Postgres has made great strides in adding features to match proprietary databases, and it has many complex features that other databases don't have. However, that doesn't mean it is the best fit for every organization. There are still reasons not to use Postgres:

  • Skills in another relational database, and no desire or value to learn Postgres
  • Custom applications written for another database that you don't want to modify to work with Postgres
  • Using externally-developed applications, tools, or frameworks that don't support Postgres
  • Storage of non-transactional or cache data where Postgres's overhead is significant
  • Multi-host workloads with simple queries where NoSQL is a win
  • Small, single-user systems, where SQLite is best

This email thread had a lot of discussion on the topic. What is interesting is that decades-old complaints about missing features, reliability, and performance are no longer mentioned.

Posted by Álvaro Herrera in 2ndQuadrant on 2020-09-16 at 09:00
A simple description of the new WITH TIES standard feature in PostgreSQL 13.
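
For readers who have not seen the syntax yet, a minimal illustration (hypothetical players table): WITH TIES returns the requested number of rows plus any further rows that tie with the last one under the ORDER BY.

SELECT name, score
FROM players
ORDER BY score DESC
FETCH FIRST 3 ROWS WITH TIES;
-- returns the top 3 scores, plus any additional rows tied with the third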

HOT updates are for free!
© Laurenz Albe 2020

HOT updates are not a new feature. They were introduced by commit 282d2a03dd in 2007 and first appeared in PostgreSQL 8.3.

But since HOT is not covered by the PostgreSQL documentation (although there is a README.HOT in the source tree), it is not as widely known as it should be: Hence this article that explains the concept, shows HOT in action and gives tuning advice.

What is HOT?

HOT is an acronym for “Heap Only Tuple” (and that made a better acronym than Overflow Update CHaining). It is a feature that overcomes some of the inefficiencies of how PostgreSQL handles UPDATEs.

Problems with PostgreSQL’s implementation of UPDATE

PostgreSQL implements multiversioning by keeping the old version of the table row in the table – an UPDATE adds a new row version (“tuple”) and marks the old version as invalid.
In many respects, an UPDATE in PostgreSQL is not much different from a DELETE followed by an INSERT.

 

[Figure: normal update]

 

This has a lot of advantages:

  • no need for an extra storage area where old row versions are kept
  • ROLLBACK does not have to undo anything and is very fast
  • no overflow problem with transactions that modify many rows

But it also has some disadvantages:

  • old, obsolete (“dead”) tuples have to be removed from the table eventually (VACUUM)
  • heavily updated tables can become “bloated” with dead tuples
  • every update requires new index entries to be added, even if no indexed attribute is modified, and modifying an index is much more expensive than modifying the table (order has to be maintained)

Essentially, UPDATE-heavy workloads are challenging for PostgreSQL. This is the area where HOT updates help.
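
A quick, hedged way to check whether a given workload benefits from HOT at all is the statistics view pg_stat_user_tables, which counts HOT updates separately from ordinary updates (here using the mytable created in the example below):

SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'mytable';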

An UPDATE example

Let’s create a simple table with 235 rows:

CREATE TABLE mytable (
   id  integer PRIMARY KEY,
   val integer NOT NULL
) WITH (autovacuum_enabled = off);

INSERT INTO mytable
SELECT *, 0
FROM generate_series(1, 235) AS n;

This table is slightly more than one 8KB block long

[...]
Sometimes, PostgreSQL databases need to import large quantities of data in a single or a minimal number of steps. This process can be sometimes unacceptably slow. In this article, we will cover some best practice tips for bulk importing data into PostgreSQL databases.
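
As one hedged illustration of the most basic of those tips (a hypothetical items table and CSV file), COPY moves data in bulk far faster than issuing individual INSERT statements:

COPY items (id, name, price)
FROM '/tmp/items.csv'
WITH (FORMAT csv, HEADER true);
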
Posted by Jonathan Katz in Crunchy Data on 2020-09-15 at 08:40

PostgreSQL 13 is a lucky release! There are some very impactful new features and the release is coming out at its regularly scheduled time in the fall.

One of my favorite parts around the PostgreSQL release process is when I can take a step back, review and discuss the features going into a new release, and learn how they can make a positive impact on our users. For example, in a past hallway track at PGCon, I learned how PostgreSQL 12 introduced the ability to do a form of multi-factor authentication!

Each release has a lot of these "hidden gems" -- features that may not jump off the page, but can have a big impact when you actually need them. Postgres 13 is no exception: some of these features make it easier to write queries, add additional layers of security, or help you to avoid downtime.

So what are the hidden gems of PostgreSQL 13?

PostgreSQL creates catalog caches and other caches for each connection to store queried data locally, which speeds up query processing. If a process performs a DDL operation that changes a catalog, the other running processes need to be notified that some of their cached data is invalid. We call this notification mechanism the SI (Shared Invalidation) mechanism.

I have made use of this feature in projects I have worked on before, but never studied it in depth to understand it completely or appreciate its power. Recently, however, I got a chance to experiment with it on a project and was able to learn more about it. This prompted me to write this blog so I can share my knowledge and explain the effectiveness of this feature.

1. Brief description of PostgreSQL cache

PostgreSQL’s cache is mainly the catalog cache. You can see the enum structure SysCacheIdentifier of the catalog cache in syscache.h. The catalog caches are mainly used to cache the contents of the catalogs, and there are some non-system caches too; these can be seen in the src/backend/utils/cache directory of the source code.

[Figure: describe_cache_in_postgres]

As shown in the figure above, there is a simple query on the t1 table. Before the query, the backend first needs to learn what t1 is: whether a relation named t1 exists, what attributes it has, and so on. This information is stored in the catalogs. When a backend needs catalog information, it generally does not scan the catalog's data on disk directly, but tries to fetch it from the relevant cache. If the data the backend wants is not in the cache, it is read from disk and then cached.
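
The kind of information these caches hold can, of course, also be inspected through the catalogs themselves; for example, for a hypothetical table t1:

SELECT c.oid, c.relname, a.attname, a.atttypid::regtype
FROM pg_class c
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE c.relname = 't1'
  AND a.attnum > 0;   -- skip system columns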

2. Implementation of cache consistency

Each PostgreSQL backend caches some catalog information, which creates the problem of cache consistency. Suppose two backends A and B have cached the ‘t1’ tuple in RELNAMENSP; if A then makes some changes to the structure of t1, B needs to be notified to invalidate this

[...]
Posted by Bruce Momjian in EDB on 2020-09-14 at 14:45

Database applications are initially written using the simplest queries possible. During testing and in production, some application tasks might have unacceptable performance. This is where re-architecting happens, and where simple queries and data schema layouts can get complex. They might get complex because that is required to accomplish the task, or because of limitations in how the database software handles certain queries. Database and tooling upgrades can require further complex additions.

When switching to a new database like Postgres, all that complexity comes along for the ride. Sometimes complexity added to work around deficiencies in other databases works fine in Postgres, but often that complexity has to be removed to get good performance in Postgres. There also might be cases where complexity has to be added to get good performance in Postgres.

The bottom line is that complexity is bad for applications, so add complexity only when necessary. Wise application developers remove unnecessary complexity periodically, but it can be hard to know if database upgrades have made some complexity unnecessary. Porting to a new database is an ideal time to reevaluate if application simplification is possible.

Posted by Andreas 'ads' Scherbaum on 2020-09-14 at 14:00
PostgreSQL Person of the Week Interview with Anthony Nowocien: Hi, I’m Anthony Nowocien. My name comes from my Polish side and some of my looks from my Trinidadian side. I live and work in Paris as a PostgreSQL DBA.
Posted by Pavel Stehule on 2020-09-13 at 05:16

I tested Fedora 33 on my old Lenovo T510 (upgraded to 16GB RAM). It looks pretty good. Interactive processes remain quick under high load (which was sometimes a problem in older releases). It is very nice and comfortable.

At the core of the “10 Things I Hate About PostgreSQL” blog post sits one particular observation that seems like a simple annoyance at first glance. MVCC is presented merely as item 4 in the list, but like XIDs, it serves as a star upon which an entire constellation of related stellar objects converges. While […]
Posted by Bruce Momjian in EDB on 2020-09-11 at 15:15

I recently wrote a presentation, Postgres in the Cloud: The Hard Way, which shows how to create a cloud instance, and install and run Postgres, purely from the command line. This helps show how all the pieces fit together. I recently presented this at pgDay Israel so I am now publishing the slides.

My introduction to databases  and PostgreSQL was for web application development and statistical analysis. I learned just enough SQL to get the queries to return the right answers. Because of my work with PostGIS (and FOSS4G) I became friends with Paul Ramsey. We are now co-workers at Crunchy Data and he is helping me up my SQL-fu. One of the first lessons he taught me was "Try to use joins rather than subqueries."

Today's post is going to work through this advice, as Paul and I walk through some SQL.
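
To make the advice concrete, here is a small sketch of the kind of rewrite it suggests, using hypothetical cities and countries tables:

-- subquery form
SELECT name
FROM cities
WHERE country_id IN (SELECT id FROM countries WHERE continent = 'Europe');

-- equivalent join form (assuming countries.id is the primary key)
SELECT ci.name
FROM cities ci
JOIN countries co ON co.id = ci.country_id
WHERE co.continent = 'Europe';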

Posted by Ibrar Ahmed in Percona on 2020-09-10 at 17:39

Indexes are one of the core features of all database management systems (DBMS). Indexes have a very long history in PostgreSQL, which has quite a rich set of index features. PostgreSQL has B-Tree, Hash, GIN, GiST, and BRIN indexes. And because development around indexes is still going on, PostgreSQL 13 provides some enhancements.

We can divide the overall improvements into two categories:

  1. Changes that are transparent to the user. There won't be any visible changes, but users get the benefit automatically after the upgrade, probably after a rebuild of the index. No application change is required.
  2. Changes that require the user to explicitly use the new feature.

Both types of improvements have been introduced in PostgreSQL 13. Sometimes it is difficult for an end user to extract this information from the release notes and comprehend it, or to be convinced by it, so this blog contains examples of the improvements done in PostgreSQL 13.

1. Deduplication of B-Tree Index [1]

Non-unique/primary B-Tree indexes generally contain a lot of duplicate values. The storage of duplicates plays a vital role, especially in B-Tree, which allows aggregate operations like COUNT or GROUP BY to use these indexes. However, a smart way of packing can eliminate storing the actual duplicate value by just maintaining the reference.

The process of deduplication eliminates the redundant/duplicate values from the tree. This “Deduplication” process reduces the storage requirement of the index, as each key will be stored only once to save space.

postgres=# CREATE TABLE foo(id INTEGER, name TEXT);

postgres=# INSERT INTO foo VALUES(generate_series(1, 1000000), 'value');
INSERT 0 1000000

postgres=# select pg_size_pretty(pg_relation_size('foo'));
 pg_size_pretty 
----------------
 42 MB
(1 row)

postgres=# CREATE INDEX idx ON foo (name);
postgres=# SELECT pg_size_pretty(pg_relation_size('idx'));
 pg_size_pretty 
----------------
 21 MB
(1 row)

 

postgres=# CREATE T
[...]
Posted by Magnus Hagander in Redpill Linpro on 2020-09-10 at 10:53

I'm sure everybody is aware you can have PostgreSQL fetch data live across the internet (or locally of course) as part of your queries and use the data directly. In fact there are a large number of drivers available to access different kinds of data for download. But in the simplest case, we can also just use the file_fdw wrapper that's included in the standard PostgreSQL packages, together with everybody's http-swiss-army-knife, curl.

In adapting this post for the public, what more timely dataset to work with in these pandemic-affected times than the open data provided by the ECDC, which is both open and very current. In particular for this example, they provide public datasets with COVID numbers from across the world (actual public data, requiring no registration to read, and actual data, not just a limited API).

So, let's see how we can access this data from PostgreSQL:

CREATE EXTENSION file_fdw;

CREATE SERVER curly FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE _rawdata (
 daterep text not null,
 day int not null,
 month int not null,
 year int not null,
 cases int not null,
 deaths int not null,
 countries text not null,
 geoid text not null,
 countrycode text null,
 popdata int null,
 continent text not null,
 cumulative14days float null
)
SERVER curly
OPTIONS (
 PROGRAM 'curl -s https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/',
 FORMAT 'csv',
 HEADER 'on'
);

And to use this we can simply query the foreign table, which will then trigger a http GET of the data:

covid=# SELECT count(*) FROM _rawdata;
 count 
-------
 41837
(1 row)

However, there is an unfortunate interaction with LIMIT. So if we for example try to get just the first 10 rows:

covid=# SELECT * FROM _rawdata LIMIT 10;
ERROR:  program "curl -s https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/" failed
DETAIL:  child process exited with exit code 23
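
Exit code 23 is curl reporting a write error: as soon as the LIMIT is satisfied, the server stops reading from the program, and curl can no longer write its output. One simple workaround (an assumption on my part, not necessarily the fix the post goes on to describe) is to materialize the feed into a local table and query that instead:

covid=# CREATE TABLE rawdata AS SELECT * FROM _rawdata;

covid=# SELECT daterep, countries, cases FROM rawdata LIMIT 10;
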
If you are enjoying working with PostgreSQL declarative partitioning, you might be wondering how to check which partition contains a specific record. While it is quite obvious in the cases of list or range partitioning, it is a bit trickier with hash partitioning. Don’t worry. Here you can find a quick way to determine which […]
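
One common trick, which works for any partitioning strategy including hash (and is not necessarily the approach that post describes), is to ask each row which partition it is physically stored in via the hidden tableoid column; hypothetical table and column names:

SELECT tableoid::regclass AS partition, *
FROM measurements
WHERE device_id = 42;
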
Posted by Jobin Augustine in Percona on 2020-09-09 at 13:31

It is very common to see customer cases where a sudden increase in disk space usage is caused by a lot of WAL segments filling up the WAL directory (pg_wal). Panicking customers generally ask “Why isn’t PostgreSQL deleting them?”. The most common reasons we used to find were:

  1. Failing WAL archival
  2. A slot holding old WAL

However, a different type of case started appearing in recent years, which is the subject of this post. Obviously, “slow” is subjective, and users mostly mean “slow” compared to the speed at which WAL segments are being generated. The recent increase in such cases is mainly triggered by increases in processing power per host server, the ever-increasing scalability of PostgreSQL (e.g., recent advancements in partitioning features, bulk data loading improvements, etc.), and faster, new-generation storage. Basically, more work is getting done per server, and as a consequence, a huge amount of WAL generation is becoming the new normal. The reason why WAL compression is becoming a pressing need is no different.

How backup solutions like WAL-G and pgBackRest solve this with built-in compression features is also discussed in that blog post.

Meanwhile, remote cloud storage is becoming a more attractive choice for storing archive WAL due to the price advantage over costly backup appliances and time-tested reliability. Moreover, users/organizations are becoming increasingly familiar with cloud storage, and this increase in comfort level is the major driving force in the decision-making towards cloud storage for backups.

But this rapid generation of WAL segments + slow/remote location of storage as the archive location is a deadly combination for the overall WAL archival process. Unless monitored and handled properly, it can lead to disaster.

In this blog, we are going to look at a bit of the internals of how the archiver process works and how it deals with the external shell command specified in archive_command in a synchronous fashion. [...]
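
For context, the archiver invokes that command once per completed WAL segment and waits for it to return, so the command's latency directly caps archival throughput. A minimal illustrative configuration (a plain local copy, shown only as a sketch, not a recommendation) looks like this:

# postgresql.conf
archive_mode = on
archive_command = 'cp %p /mnt/wal_archive/%f'   # %p = path of the segment, %f = its file name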

SQL and PostgreSQL are perfect tools to analyze data. However, they can also be used to create sample data which has to possess certain statistical properties. One thing many people need quite often is a normal distribution. The main question therefore is: How can one create this kind of sample data?

Tablefunc: Creating normal distributions

The first thing you have to do is to enable the tablefunc extension, which is actually quite simple to do:

test=# CREATE EXTENSION tablefunc;
CREATE EXTENSION

“tablefunc” is there by default if “postgresql-contrib” has been installed. Once the module has been enabled the desired functions will already be there:

test=# \df *normal_rand*
                                         List of functions
 Schema |    Name     |    Result data type    |             Argument data types             | Type 
--------+-------------+------------------------+---------------------------------------------+------
 public | normal_rand | SETOF double precision | integer, double precision, double precision | func
(1 row)

 

The normal_rand function takes 3 parameters:

  • Number of rows to be generated
  • Average value
  • Desired standard deviation

If you want to run the function, you can simply put it into the FROM-clause and pass the desired
parameters to the function:

test=# SELECT row_number() OVER () AS id, x 
        FROM normal_rand(10, 5, 1) AS x 
        ORDER BY 1;
 id |         x          
----+--------------------
  1 |  4.332941804386073
  2 |  4.905662881624426
  3 |  3.661038976418651
  4 | 6.0087510163144415
  5 |  4.934066454147052
  6 |  5.909371874123449
  7 |  5.016528121699469
  8 |  4.640932937572484
  9 |  7.695984939477616
 10 |  5.647677569953539
(10 rows)

In this case 10 rows were created. The average value is 5 and the standard deviation has been set to 1. At first glance the data looks ok.

Testing your sample data

Let us test the function and see if it really does what it promises. 10 rows won’t be enough for that so I d

[...]
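
A hedged sketch of the kind of check the post is heading toward: generate a much larger sample and let ordinary aggregates confirm that the mean and standard deviation come out close to the requested 5 and 1:

test=# SELECT count(*),
              round(avg(x)::numeric, 3) AS mean,
              round(stddev_samp(x)::numeric, 3) AS stddev
       FROM normal_rand(1000000, 5, 1) AS x;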

tl;dr -- If you are not actively monitoring your PostgreSQL clusters, and alerting on anomalous conditions, you are "flying blind". Don't do that!

At Crunchy Data, we distribute and support PostgreSQL as a product, in multiple forms, to many enterprise customers. We also make our products available via GitHub, our developer portal, and other places.

In particular, our team is well known for our Container Suite and Kubernetes Postgres Operator. Also pertinent to this blog, if less well known, is our pgMonitor project.

Recently we decided to bolster our monitoring capability as deployed by our Operator to include better node metrics. Specifically the goals were to:

  1. Gain access to certain Host/Operating System, and Kubernetes Pod (i.e. cgroup v1) specific, data
  2. Do so entirely via the SQL interface to PostgreSQL itself

There are various existing methods to achieve these goals, but we decided to create a purpose-built PostgreSQL extension to facilitate the gathering of these metrics. We will not debate the wisdom of that decision here, although personally I think it has worked out extremely well.

The extension is called pgnodemx. I also had additional goals for the extension to provide support for:

3. Non-containerized PostgreSQL clusters
4. Hosts running cgroup v2

The purpose of today's blog is to discuss this new extension in some detail.

Anatomy of the Solution

On the 8th of September 2020, Michael Paquier committed a patch: Add support for partitioned tables and indexes in REINDEX. Until now, REINDEX was not able to work with partitioned tables and indexes, forcing users to reindex partitions one by one. This extends REINDEX INDEX and REINDEX TABLE so as they can accept a partitioned index … Continue reading "Waiting for PostgreSQL 14 – Add support for partitioned tables and indexes in REINDEX"
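
In practice this means that, starting with PostgreSQL 14, a single statement can rebuild all partitions at once (hypothetical partitioned table and index names):

REINDEX TABLE measurements;          -- rebuilds the indexes of every partition
REINDEX INDEX measurements_ts_idx;   -- rebuilds one partitioned index across all partitions
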
Posted by Andreas 'ads' Scherbaum on 2020-09-07 at 14:00
PostgreSQL Person of the Week Interview with Regina Obe: I live in Boston, Massachusetts (USA). I was born in Lagos, Nigeria in 1971 and spent the first 7 years of my life there. My father is Nigerian and mother is American. I moved to the US at age of 7ish when my mother started her medical residency. I think I’m still considered a dual citizen though I haven’t been back to Nigeria since.

PostgreSQL 13 development is coming along nicely; Postgres 13 Beta 3 was released on 8/13/2020, and Beta 1 and Beta 2 were released in May and June 2020. One of the features that has my interest in Postgres 13 is the B-Tree deduplication effort. B-Tree indexes are the default indexing method in Postgres, and are likely the most-used indexes in production environments. Any improvements to this part of the database are likely to have wide-reaching benefits. Removing duplication from indexes keeps their physical size smaller, reduces I/O overhead, and should help keep SELECT queries fast!

Our latest release of the Citus extension to Postgres is Citus 9.4. If you’re not yet familiar, Citus transforms Postgres into a distributed database, distributing your data and your SQL queries across multiple nodes. This post is basically the Citus 9.4 release notes.

If you’re ready to get started with Citus, it’s easy to download Citus open source packages for 9.4.

I always recommend people check out docs.citusdata.com to learn more. The Citus documentation has rigorous tutorials, details on every Citus feature, explanations of key concepts—things like choosing the distribution column—tutorials on how you can set up Citus locally on a single server, how to install Citus on multiple servers, how to build a real-time analytics dashboard, how to build a multi-tenant database, and more…

A more advanced resource for you is the actual source code which you can find on the Citus open source repo, so you can see what we’re doing and what’s going to be in future releases of the Citus extension to Postgres.

What’s new in Citus 9.4

Citus 9.4 open source packages and docs have already rolled out, so it seemed like a good time to walk through everything new and notable in Citus 9.4. And if you want to catch up on other recent releases, take a peek at the release notes for Citus 9.3 as well as all the HTAP performance goodness in Citus 9.2.

Before diving into what’s new in Citus 9.4, here’s an overview of the new capabilities:

  • EXPLAIN ANALYZE improvements 
  • INSERT INTO local SELECT .. FROM distributed 
  • Joins between local tables and CTEs 
  • COPY connection management 
  • Calculating percentiles at scale with t-digest 
  • Modifying CTE pushdown in router queries
  • Optional support for binary protocol
  • Citus downgrade scripts

Understanding performance with EXPLAIN ANALYZE

The Postgres EXPLAIN ANALYZE command is essential for you to understand your query performance. In Citus 9.4, we improved how EXPLAIN ANALYZE works with Citus to give you more insights about yo

[...]
Posted by Nazli Ugur Koyluoglu in CitusData on 2020-09-04 at 05:51

Many of you rely on databases to return correct results for your SQL queries, however complex your queries might be. And you probably place your trust with no questions asked—since you know relational databases are built on top of proven mathematical foundations, and since there is no practical way to manually verify your SQL query output anyway.

Since it is possible that a database’s implementation of the SQL logic could have a few errors, database developers apply extensive testing methods to avoid such flaws. For instance, the Citus open source repo on GitHub has more than twice as many lines related to automated testing as lines of database code. However, checking correctness for all possible SQL queries is challenging because of the lack of a “ground truth” to compare their outputs against, and the infinite number of possible SQL queries.

Even when we do know the result a SQL query is supposed to give and we get that result as expected, that does not mean the implementation is 100% correct in all possible variants of the query and input data. The challenge for every database team is in finding subtle edge cases where bugs might be introduced.

What are logic bugs in SQL, and why are they hard to detect?

There are 2 primary types of errors in databases:

  • “raised” errors
  • logic errors

“Raised” errors include syntax errors, panics, faults, and other crashes. Raised errors terminate the process abnormally or give you some signal of undesired behavior, making them immediately recognizable to you. In order to automatically detect (and then fix) raised errors, database providers widely use fuzzing tools for SQL. Fuzzing tools are random SQL query generators that strain the target database with complex commands until an error is raised.

Logic errors, on the other hand, are latent (or silent) errors that cause the database to produce inaccurate results without emitting any sign of unexpected behavior. Logic bugs are dangerous—neither the user, nor the database provider, nor the f

[...]

pgBackRest is a well-known powerful backup and restore tool. It offers a lot of possibilities.

While pg_basebackup is commonly used to set up the initial database copy for Streaming Replication, it can be interesting to reuse a previous database backup (e.g. taken with pgBackRest) to perform this initial copy.

This content updates one of my old posts, using PostgreSQL 13 and the latest pgBackRest version.


For the purpose of this post, we’ll use 2 nodes called primary and secondary. Both are running on CentOS 7.

We’ll cover some pgBackRest tips but won’t go deeper in the PostgreSQL configuration, nor in the Streaming Replication best practices.


Installation

On both primary and secondary server, first configure the PGDG yum repositories:

$ sudo yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/\
EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

Since PostgreSQL 13 is still in Beta version, we need to enable the testing repository:

sudo yum -y install yum-utils
sudo yum-config-manager --enable pgdg13-updates-testing
sudo yum search postgresql13

Then, install PostgreSQL and pgBackRest:

$ sudo yum install -y postgresql13-server postgresql13-contrib
$ sudo yum install -y pgbackrest

Check that pgBackRest is correctly installed:

$ pgbackrest
pgBackRest 2.29 - General help

Usage:
    pgbackrest [options] [command]

Commands:
    archive-get     Get a WAL segment from the archive.
    archive-push    Push a WAL segment to the archive.
    backup          Backup a database cluster.
    check           Check the configuration.
    expire          Expire backups that exceed retention.
    help            Get help.
    info            Retrieve information about backups.
    restore         Restore a database cluster.
    stanza-create   Create the required stanza data.
    stanza-delete   Delete a stanza.
    stanza-upgrade  Upgrade a stanza.
    start           Allow pgBackRest processes to run.
    stop            Stop pgBackRest processes from
[...]
Posted by Luca Ferrari on 2020-09-04 at 00:00

A little contribution in spreading the PostgreSQL word!

Hey there! I’m using PostgreSQL!

A few weeks ago I changed my old mobile phone, and so I had to install again all my applications, including something I personally hate: WhatsApp.
While checking the configuration of the application, correctly and automatically cloned from my old phone, I came across the standard status that WhatsApp places for you:





The standard phrase is Hey there! I'm using WhatsApp!.
I hate these automatically placed sentences, so I was trying to think of something different, and then I decided that I did not want something different because, after all, I don’t think many people spend their time reading your status.
And then, I decided to let the world know I’m using PostgreSQL:





It’s not a very big contribution, but it is just a quick and easy way to let the world know about PostgreSQL.
If you like PostgreSQL and the idea, please update your status too!

Posted by Asif Rehman in Highgo Software on 2020-09-03 at 17:56

In a database production environment, a backup plays quite an essential role. The database server can fail for a number of reasons. It can happen because of hardware failure, software malfunction or just because of user error. Whatever the reason, when a live database goes down, a backup is essential in fixing and recovering it.

In a database system, the data is stored in binary files. Every database provider offers some kind of backup tooling with which database files may be backed up. The PostgreSQL database server also provides a comprehensive set of tools.

Logical and Physical Backups

In general, backups are categorized into logical and physical backups. Logical backups are human-readable backups, i.e. a user can see and understand the content of the data in the backup files, e.g. SQL statements.

Physical backups, on the other hand, consist of binary data; these backups are generally filesystem-level copies of the database files. Both of these options have some pros and cons. In this blog, we’ll look at how to use each of them and understand their pros and cons.

Logical Backups – SQL Dumps

SQL Dumps are the logical backups of your database. This type of backup creates a text file that consists of SQL statements. These statements, when fed back to the database system, will recreate the database in the same state it was in when the backup was taken.

In PostgreSQL, we have pg_dump and pg_dumpall tools available to take this type of backup, while we have pg_restore to recreate the database from the backup.

pg_dump and pg_dumpall

The PostgreSQL database system can consist of multiple databases; pg_dump can take a backup of a single database at a time, while pg_dumpall can back up all databases at once.

 

The caveat with pg_dumpall is that it only takes the backup in plain format; you cannot use the compression or custom formats that are available with pg_dump. For example, if you wish to keep the backup in a tarball, you can do that with pg_dump but not with pg_dumpall.

[...]
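
A small, hedged illustration of that difference (hypothetical database name appdb):

$ pg_dump -Fc appdb -f appdb.dump    # custom (compressed) format, restored with pg_restore
$ pg_dump -Ft appdb -f appdb.tar     # tar format
$ pg_dumpall -f all.sql              # plain SQL only, restored by feeding it to psql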