Percona Monitoring and Management (PMM) boasts many functionalities that support its extension, be it by using Custom Queries, Custom Scripts, or by collecting data from already available External Exporters.
In this short blog post, we will see how to quickly (and easily) monitor a PostgreSQL cluster managed by Patroni. I will assume you already have an environment set up and running in which the PMM client is already monitoring the PostgreSQL database. I will assume a three-node PostgreSQL cluster, but it will, of course, work with other setups. If needed, refer to the online documentation on this:
https://docs.percona.com/percona-monitoring-and-management/setting-up/client/postgresql.html
Since version 2.1.0, Patroni exposes metrics in a Prometheus-compatible way via the /metrics endpoint:
https://patroni.readthedocs.io/en/latest/releases.html#version-2-1-0
This means that we can instantly benefit from them by using the PMM External Exporters feature:
https://docs.percona.com/percona-monitoring-and-management/setting-up/client/external.html
In a nutshell, we just need to run the following in each client node:
shell> pmm-admin add external --listen-port=8008 --service-name=pg-nodeX-patroni
Where --service-name can be whatever naming scheme you want that lets you easily identify which Patroni node it refers to.
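You can also quickly confirm that Patroni is actually serving metrics on each node before wiring it into PMM. This is just a sanity check; 8008 is Patroni's default REST API port, so adjust it if your setup differs:

shell> curl -s http://localhost:8008/metrics | grep ^patroni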
After running these commands and waiting some seconds, we should see metrics incoming in the Advanced Data Exploration dashboard, like the patroni_primary one:
Until now, there was no Patroni dashboard that used these metrics, so I took the chance while working on a customer ticket to improve this and created my own dashboard (which is proof of the power behind PMM’s extensibility, in my opinion).
I have uploaded the dashboard to Grafana Labs so it’s easy for everyone to import and use. You can use the ID 18870 to do so in three easy steps.
1- Click on the Import dashboard button in PMM:
[...]Crunchy Data is proud to support the pgBackRest project, an essential production grade backup tool used in our fully managed and self managed Postgres products. pgBackRest is also available as an open source project.
pgBackRest provides:
pgBackRest recently released v2.46 with support for block incremental backup, which saves space in the repository by storing only changed parts of files. File bundling, released in v2.39, combines smaller files together for speed and cost savings, especially on object stores.
Efficiently storing backups is a major priority for the pgBackRest project but we also strive to balance this goal with backup and restore performance. The file bundling and block incremental backup features improve backup and, in many cases, restore performance while also saving space in the repository.
In this blog we will provide working examples to help you get started with these exciting features.
To demonstrate these features we will create two repositories. The first repository will use defaults. The second will have file bundling and block incremental backup enabled.
Configure both repositories:
[global]
log-level-console=info
start-fast=y
repo1-path=/var/lib/pgbackrest/1
repo1-retention-full=2
repo2-path=/var/lib/pgbackrest/2
repo2-retention-full=2
repo2-bundle=y
repo2-block=y
[demo]
pg1-path=/var/lib/postgresql/12/demo
Create the stanza on both repositories:
$ pgbackrest --stanza=demo stanza-create
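With the stanza created, a full backup can be taken into each repository and the results compared with the info command; a minimal sketch, assuming the two-repository configuration shown above:

$ pgbackrest --stanza=demo --repo=1 --type=full backup
$ pgbackrest --stanza=demo --repo=2 --type=full backup
$ pgbackrest --stanza=demo info

The info output reports backup sizes per repository, which is a handy way to see the space savings from file bundling and block incremental backup.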
The block incremental backup feature is best demonstrated with a larger dataset. In particular, we would prefer to have at least one table that is near the maximum segment size of
[...]Continuous Integration / Continuous Delivery (CI/CD) is an automated approach in which incremental code changes are made, built, tested and delivered. Organizations want to get their software solutions to market as quickly as possible without sacrificing quality or stability. While CI/CD is often associated with application code, it can also be beneficial for managing changes to PostgreSQL database clusters.
GitOps plays an important part in enabling CI/CD. If you are unfamiliar with GitOps, I recommend starting with my previous post on Postgres GitOps with Argo and Kubernetes.
Today I have a working sample to walk through for implementing basic CI/CD using Crunchy Postgres for Kubernetes, ArgoCD, and a self test container. In the following steps we will:
Deployment.spec.template.spec.containers.env section of the manager.yaml file located in the postgres-operator-examples/kustomize/install/manager directory.
- name: PGO_FEATURE_GATES
value: 'InstanceSidecars=true'
resources:
# - argocd-token.yaml
- hippo-self-test-config.yaml
Another Phriday in the books, and it's time to see what all happened:
Thanks everyone for participating, and look forward to Alicja's invitation coming around the end of the month!
This blog will go over the use cases of setting up NGINX (Engine X) as both a reverse proxy and load balancer for PostgreSQL. NGINX is an excellent feature-rich and open-source project that can help make an application more cohesive. It does this by exposing a single port to the user and proxying requests to different applications in a project based on the URI. This can make it appear as though our entire application is hosted on a single node, while in reality it can be distributed anywhere around the globe.
For the examples below we assume that the PostgreSQL server has some HTTP server sitting in front of it that can accept REST API requests and translate them to PostgreSQL-compatible statements.
Figure 1 shows a block diagram with an overview of how NGINX will proxy different requests to different applications. The file nginx.conf controls the configuration parameters of NGINX and is typically located at /etc/nginx/nginx.conf. To create a proxy like the one in Figure 1, we could add the following to our nginx.conf:
events {
worker_connections 1024;
}
http {
server {
listen localhost:55432;
location / {
proxy_pass http://localhost:5433;
}
location /auth {
proxy_pass http://localhost:8080;
}
location /stats {
proxy_pass http://localhost:8081;
}
}
}
Once the configuration is set, run:
$ sudo nginx -s reload
This will reload NGINX and apply the changes we’ve made.
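NGINX can also act as a load balancer in front of several identical API servers, which is the second use case mentioned above. A minimal sketch using an upstream block follows; the backend addresses are placeholders, not part of the original configuration:

events {
    worker_connections 1024;
}
http {
    # pool of identical REST API servers in front of PostgreSQL
    upstream pg_api {
        server 10.0.0.11:5433;
        server 10.0.0.12:5433;
    }
    server {
        listen localhost:55432;
        location / {
            proxy_pass http://pg_api;
        }
    }
}

By default, requests are distributed round-robin; directives such as least_conn switch to a different balancing strategy.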
From top to bottom, let’s go over all the pieces of the NGINX configuration file.
First, at the very top is the “events” context, which is used to set global parameters that dictate how NGINX runs at a general level. Within this context is the worker_connections parameter, which dictates how many simultaneous connections an NGINX worker process can have.
Next is the “http” context, which co
[...]One way to organize data in PostgreSQL is to make use of schemas. What is a schema in PostgreSQL? And more importantly: What is the purpose of a schema and how can schemas be used to make your life easier? Let’s dive in and find out.
Before you figure out how to use schemas, you need to know what the purpose of a schema is in the first place. To understand that, first take a look at how PostgreSQL is structured:
An “instance” is basically what you start when you deploy PostgreSQL. The next layer is a database. In reality this is what you connect to: in PostgreSQL a connection is always bound to a database inside an instance, which happens early on, right after user authentication.
What is important is the next layer down, between databases and tables: Schemas.
Basically, schemas are a way to group tables together. Let’s assume there’s a fairly large data structure: Having 500 tables in one place is certainly harder to manage and understand than to have 10 buckets containing 50 tables each.
It’s simply like organizing pictures: You wouldn’t put all of them into the same folder, but rather group them by year, location, etc. The same logic can be applied to tables.
Now we can focus on how this concept can be applied to PostgreSQL. The first thing we have to look at is the public schema.
The beauty of PostgreSQL is that it doesn’t matter much if you know nothing at all about schemas. The reason is the existence of the public schema, which is there by default. How can we find out which schemas there are in PostgreSQL? psql provides the \dn command to display this information:
demo=# \dn
      List of schemas
  Name  |       Owner
--------+-------------------
 public | pg_database_owner
(1 row)
In a default scenario, a table will end up in the public schema. Here’s an example:
demo=# CREATE TABLE t_product ([...]
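To see the grouping idea in action beyond the default schema, here is a small sketch; the sales schema and the column list are invented for illustration:

CREATE SCHEMA sales;

CREATE TABLE sales.t_product (
    id   bigint PRIMARY KEY,
    name text
);

-- the table now lives in "sales" rather than "public"
SELECT * FROM sales.t_product;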
From time to time, situations occur where unusual circumstances dictate out-of-the-box thinking.
For example, suppose you have a system where you’ve installed multiple data clusters onto a single host. What you end up with is a host dividing and sharing its valuable system resources, i.e., CPU, RAM, disk, etc., between multiple instances of PostgreSQL data clusters. While easy to do in a development environment, it does represent technical debt when it reaches production. Sooner or later, one must address this issue; otherwise, you can suffer the consequences of handicapping your entire database infrastructure.
Let’s now move forward in time: your system has scaled, of course, and this shortcut of using multiple data clusters on a single host has now become a performance bottleneck. The problem is you either don’t or can’t refactor your application servers; something, maybe, about not having enough time in the day. And, as you may already know, while Postgres can sit on a UNIX domain socket as well as IPv4 and IPv6 ports, it is nevertheless constrained to listen on just the one port.
So what do you do?
For the experienced sysadmin, there are actually quite a number of “magical” techniques. However, in this case, with a little help from systemd, which manages all service processes, we will solve this little puzzle using PgBouncer with a concoction of configuration files.
Configure the system such that Postgres resides on its default port of 5432 and PgBouncer sits on three ports, i.e., 6432, 6433, and 6434, accessing the resident Postgres server.
The PgBouncer connection pooler will use an administrative account, the ROLE PgBouncer, for the purpose of user authentication. Authentication is to be achieved by accessing the Postgres server’s pg_shadow table and comparing the resultant hash to all incoming connections (this won’t work for cloud setups such as, for example, Amazon RDS).
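As a sketch of what one of the three PgBouncer configuration files could contain (names and paths are illustrative; essentially only listen_port differs between the three instances):

[databases]
* = host=127.0.0.1 port=5432

[pgbouncer]
listen_addr = *
listen_port = 6432          ; 6433 and 6434 in the other two files
auth_type   = md5
auth_user   = pgbouncer
auth_query  = SELECT usename, passwd FROM pg_shadow WHERE usename = $1
pool_mode   = session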
A set of Systemd configuration files will be created and edited in order to manage the P
[...]We have been talking a lot here about using Postgres for metrics, dashboards, and analytics. One of my favorite Postgres tools that makes a lot of this work easy and efficient is Hyperloglog (HLL). Hyperloglog is like regex: once you understand it, you feel like it's a superpower. Also, like regex, it can't solve everything. In this post I’ll take you through how to get started with HLL, build some sample queries, and do some simple tuning.
Hyperloglog is a compression and extraction algorithm for counting distinct values. It gets the name from its disk size characteristic of log(log(x)) -- hence the "loglog". Because it's a compression algorithm, results are an approximation. During storage time, it converts values using a hash, adds those to a summation, and converts them to binary. Then, during read time, it groups the binary values to approximate values based on the length of zeroes.
Because it’s an approximation, the use case should accept approximate calculations. Billing systems? No! Visitor tracking? Yes!
To use Hyperloglog, you must have data that counts distinct values across a time series or set of values. Typically, you can use it where you are currently using COUNT(DISTINCT ...). The canonical example for Hyperloglog is product behavior metrics, where you are looking to scale something like the following query:
SELECT
date_trunc('week', received_at) AS query_week,
COUNT(DISTINCT customer_id) AS active_customer_count
FROM activities
WHERE received_at > '2023-01-01'
GROUP BY 1
ORDER BY 1;
Running that query will find the matching set of activities, store the result in memory, group by iterating over the set, and then iterate again to sort the output. The most common solution to this problem is output caching. The problem with output caching is that it can only answer one question: the cached value for this query can only answer the same question the query answered. Thus, if you wanted to find the
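For reference, a sketch of how the same rollup could look with the postgresql-hll extension; the rollup table is my own invention and customer_id is assumed to be a bigint:

CREATE EXTENSION IF NOT EXISTS hll;

-- one hll per week, storing hashed customer ids
CREATE TABLE activity_rollup (
    query_week date PRIMARY KEY,
    customers  hll NOT NULL
);

INSERT INTO activity_rollup
SELECT date_trunc('week', received_at)::date,
       hll_add_agg(hll_hash_bigint(customer_id))
FROM activities
WHERE received_at > '2023-01-01'
GROUP BY 1;

-- approximate distinct customers per week
SELECT query_week, hll_cardinality(customers) AS active_customer_count
FROM activity_rollup
ORDER BY query_week;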
[...]Introduction We are thrilled to announce the release of pg_cirrus! First of all, you might be wondering what “cirrus” means. The term refers to the thin and wispy clouds that are often seen at high altitudes. pg_cirrus is a simple and automated solution to deploy highly available 3-node PostgreSQL clusters with auto failover. It is […]
The post Setting Up a PostgreSQL 3-Node HA Cluster using pg_cirrus appeared first on Stormatics.
I recently released the PostGIS 3.3.3 bundle for Windows, which is available on the application stackbuilder and the OSGeo download site for PostgreSQL 11 - 15. If you are running PostgreSQL 12 or above, you get an additional bonus extension, MobilityDB, which leverages PostGIS geometry and geography types and introduces several more spatial-temporal types and functions specifically targeted at managing objects in motion.
What kind of management? Think of getting the average speed a train is moving at along a segment in time, or collisions in time, without any long SQL code. Just use a function on the trip path, and voilà. Think about storing GPS data very compactly in a single row/column with time, and being able to ask very complex questions with very little SQL. True, PostGIS can do some of this using geometry with Measure (geometryM) geometry types, but you have to deal with the craziness of converting M back to timestamps, whereas MobilityDB temporal types encode time as true PostgreSQL timestamp types.
Anita Graser, of QGIS and Moving Pandas fame, has written several posts about it such as: Visualizing Trajectories with QGIS and mobilitydb and Detecting close encounters using MobilityDB 1.0.
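As a rough sketch of the kind of query this enables (the table, columns, and units are assumptions of mine; trip is a MobilityDB tgeompoint tracking position over time):

-- time-weighted average speed of each trip, in distance units per second
SELECT vehicle_id, twAvg(speed(trip)) AS avg_speed
FROM trips;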
Continue reading "PostGIS Bundle 3.3.3 for Windows with MobilityDB"
This month’s PGSQL Phriday topic is “database change management”, proposed by Dian Fay.
While I’ll try to make this topic about performance (as I always do) let me first embark on a quick trip down memory lane.
Back in 2010, my first full-time job was at a company called Redgate, whose flagship product at the time was SQL Compare. To this day, that tool allows folks to quickly compare SQL Server databases, review the differences, and generate a script to make one match the other. Fun fact: since then, an excellent former teammate of mine, Neil Anderson, has built Postgres Compare. Anyway, back in 2010 when I joined Redgate, they also had a team working on the first version of SQL Source Control, which uses the same engine to sync databases with files in source control and vice versa — pretty cool! I was lucky enough to work with them, and then a few years later, found myself running a wonderful team tasked with building an equivalent tool, but for Oracle.
We also built tooling using these engines to enable continuous integration and even continuous deployment of databases. As such, I’ve seen everything from teams with no source control in place for databases, all the way through to folks who had the vast majority of the process automated!
A lot of the Postgres shops I come across these days use Rails or Django, who already have in-built migrations support. Others use tools such as sqitch, liquibase, or flyway. But honestly, as I mentioned in our podcast episode on version control for databases, I still see a gap between how people manage their databases vs how they manage their application code.
As such, if you already have the following in place for your databases, I think you’re ahead of the curve:
Version control, with the ability to branch
Continuous integration, with unit tests
Automated deployment pipeline being the primar
The invitation this month for #PGSqlPhriday comes from Dian Fay. The topic is pretty simple, database change management. Now, I may have, once or twice, spoken about database change management, database DevOps, automating deployments, and all that sort of thing. Maybe. Once or twice. OK. This is my topic. I’ve got some great examples on […]
The post PGSQL Phriday #009: On Rollback appeared first on Grant Fritchey.
One of the big topics in Postgres is bloat. It is something that every DBA comes across and in fact we have a few good posts on how to work with bloat in our blog.
Update-heavy tables are a special case of data that, due to business requirements, needs to be updated frequently to stay relevant. Autovacuum may be a good solution for table bloat (you can check this post describing how to do that), however, it does not help with index bloat.
Pg_index_watch resolves this issue by automatically rebuilding indexes when needed.
How does it work?
With the introduction of REINDEX CONCURRENTLY in PostgreSQL 12, there was finally a safe way to rebuild indexes without heavy locks. At the same time, it was still unclear how to identify a criterion based on which we can decide whether the index is bloated (and should be rebuilt) or not.
What was missing was a simple and cheap-to-run statistical model that would allow us to estimate the index bloat ratio without having to read and analyze the whole index.
Now, PostgreSQL allows you to access the following:
1) number of rows in the index (in pg_class.reltuples for the index), and
2) index size.
Assuming that the ratio of index size to the number of entries is almost always constant (this is correct in 99.9% of cases), we can speculate that if that ratio has doubled compared to its ideal state, the index is almost certainly bloated by a factor of two.
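As an illustrative sketch (not part of pg_index_watch itself), the raw inputs for such an estimate can be pulled straight from the catalogs:

-- estimated rows, current size, and bytes per row for every index
SELECT c.relname                                         AS index_name,
       c.reltuples                                       AS estimated_rows,
       pg_relation_size(c.oid)                           AS index_bytes,
       pg_relation_size(c.oid) / NULLIF(c.reltuples, 0)  AS bytes_per_row
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relkind = 'i';

Comparing bytes_per_row against the value recorded right after the index was created (or last rebuilt) gives the bloat ratio described above.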
Based on this assumption, we developed a system, similar to autovacuum, that automatically tracks the bloat level of indexes and rebuilds them when bloat goes over a threshold.
Read more on GitHub and try it out!
Let me know what you think.
Maxim
I've used several migration frameworks in my time. Most have been variations on a common theme dating back lo these past fifteen-twenty years: an ordered directory of SQL scripts with an in-database registry table recording those which have been executed. The good ones checksum each script and validate them every run to make sure nobody's trying to change the old files out from under you. But I've run into three so far, and used two in production, that do something different. Each revolves around a central idea that sets it apart and makes developing and deploying changes easier, faster, or better-organized than its competition -- provided you're able to work within the assumptions and constraints that idea implies.
The first time I used sqitch, I screwed up by treating it like any other manager of an ordered directory of SQL scripts with fancier verification and release management capabilities. It does have those, but they weren't why I used sqitch the second and subsequent times.
sqitch wants you to treat your schema and the statements that define it as a supergraph of all your inter-database-object dependencies. Tables depend on types, on other tables via foreign keys, on functions with triggers or constraints; views depend on tables and functions; functions depend on tables, on other functions, on extensions. Each one, roughly, is a single named migration -- more on that in a bit.
So shipments depend on warehouses, since you have to have a source and a destination for the thing you're shipping, and warehouses depend on regions, because they exist at a physical address subject to various laws and business requirements. shipments also have no meaning independently from the thing being shipped, so in the case I'm filing the serial numbers from, that table also maintains a dependency on weather-stations. Both shipments and warehouses depend on the existence of the set_last_updated audit trigger function. The plan file looks like this:
trigger-set-updated-at 2020-03-19T17:
[...]
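Dependencies like these are declared when a change is added; a sketch using the object names from the example above (the note text is mine):

sqitch add shipments --requires warehouses --requires trigger-set-updated-at \
    -n 'Add shipments, which depend on warehouses and the audit trigger'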
Postgres 16 is hot off the press with the beta release last week. I am really excited about the new feature that allows logical replication from standbys, allowing users to:
A second relevant and exciting new feature coming in 16 is that replication slots on a given standby persist across the promotion of that standby to a primary. This means that in the event of primary server failure and promotion of a standby to primary, the replication slots will persist and subscribers to the former standby will not be affected.
These two together give PostgreSQL a huge boost in performance for big data operations: applications moving data around across physical locations, and folks doing data warehousing, analytics, data integration, and business intelligence. I’m going to walk through an example schema and database setup and offer some sample settings and code for creating logical replication from a standby.
At a high level, PostgreSQL supports two main types of replication - streaming/physical and logical. With streaming replication, the Write-Ahead Log streams all changes to the database's physical files through a connection, representing the entire database on disk. Logical replication offers a more fine-grained approach, where you can specify individual database objects, such as tables or even specific rows, to be replicated to a remote server. You can read more about logical replication basics in Data To Go: Postgres Logical Replication.
A standby server in PostgreSQL is created by taking a base backup of the primary server and continuously applying all changes made on the primary. A hot standby is a standby server that can be promoted to become the primary server. PostgreSQL saves modifications in WAL (Write-Ahead Log) records and replicates them from the primary server to
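As a rough sketch of the moving parts, not the post's actual walkthrough (hosts and object names are placeholders):

-- on the primary: logical decoding must be enabled
ALTER SYSTEM SET wal_level = 'logical';

-- on the primary: define what to publish
CREATE PUBLICATION analytics_pub FOR TABLE orders;

-- on the subscriber: point the connection at the *standby*, not the primary
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=standby.example.com dbname=appdb user=replicator'
    PUBLICATION analytics_pub;

The standby will also need hot_standby_feedback enabled so that the rows its logical slots still need are not vacuumed away on the primary.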
[...]The news today is that SQL:2023, the new version of the SQL standard, has been published by ISO.
Here are links to the parts on the ISO web site:
(Note, if you want to purchase a copy, you probably only want part 2, unless you intend to work in the particular areas covered by the other parts.)
Here is my previous article on what is new in SQL:2023, and here is the article on the status of SQL:2023 support in PostgreSQL.
Incidentally, the next SQL working group meeting is in less than two weeks’ time, and work on the next version will start then.
If nodes in a database cluster get out of sync, the inconsistency can cause data corruption. This blog describes what to watch out for.
The post Data inconsistency in highly available PostgreSQL clusters appeared first on Stormatics.
PostGIS excels at storing, manipulating and analyzing geospatial data. At some point it's usually desired to convert raw spatial data into a two-dimensional representation to utilize the integrative capabilities of the human visual cortex. In other words, to see things on a map.
PostGIS is a popular backend for mapping technology, so there are many options to choose from to create maps. Data can be rendered to a raster image using a web map server like GeoServer or MapServer; it can be converted to GeoJSON or vector tiles via servers such as pg_featureserv and pg_tileserv and then shipped to a Web browser for rendering by a library such as OpenLayers, MapLibre or Leaflet; or a GIS application such as QGIS can connect to the database and create richly-styled maps from spatial queries.
What these options have in common is that they require external tools which need to be installed, configured and maintained in a separate environment. This can introduce unwanted complexity to a geospatial architecture.
This post presents a simple way to generate maps entirely within the database, with no external infrastructure required.
A great way to display vector data is to use the Scalable Vector Graphic (SVG) format. It provides rich functionality for displaying and styling geometric shapes. SVG is widely supported by web browsers and other tools.
By including CSS and Javascript it's possible to add advanced styling, custom popups, dynamic behaviour and interaction with other web page elements.
pg-svg
Generating SVG "by hand" is difficult. It requires detailed knowledge of the SVG specification, and constructing a complex text format in SQL is highly error-prone. While PostGIS has had the function ST_AsSVG for years, it only produces the SVG path data attribute value. Much more is required to create a fully-styled SVG document.
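For instance, on its own ST_AsSVG yields only the raw path data, which still has to be wrapped in an <svg> element and styled by hand (the table name here is illustrative):

-- returns something like 'M 10 -20 L 30 -40', i.e. just the path "d" attribute
SELECT ST_AsSVG(geom, 0, 2) AS path_data
FROM roads
LIMIT 1;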
The PL/pgSQL library pg-svg solves this problem! It makes it easy to convert PostGIS data into styled SVG documents. The library provid
It’s annoying to get error messages caused by encoding problems. But it is more annoying not to get error messages and end up with bad data in the database. I’ll show you how to fix bad encoding.
Each PostgreSQL database has a server encoding. You can inspect (but not change) the PostgreSQL parameter server_encoding to see the encoding of the database you are connected to. You can choose the database encoding when you create a database, but you cannot change the server encoding after creating the database. This restriction is necessary because changing the server encoding will render string data corrupt.
Each database session has a client encoding, which you can view and change using the PostgreSQL parameter client_encoding. If your client uses the C client library libpq, you can determine the client encoding with the environment variable PGCLIENTENCODING. In addition, psql has the command \encoding to view and change the client encoding. psql tries to guess a good client encoding from the current locale. If the client encoding isn’t specified anywhere, PostgreSQL will set it to the server encoding.
PostgreSQL expects that character data sent by a client are encoded in the client encoding and converts them to the server encoding. Conversely, PostgreSQL converts query results to the client encoding.
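A quick way to see this at work in a session is to check and change the client encoding directly:

SHOW client_encoding;
SET client_encoding = 'LATIN1';  -- tell the server that this client sends LATIN-1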
PostgreSQL supports many encodings. All supported server encodings must be a superset of ASCII.
The server encoding of my database is UTF8. We’ll use this simple table:
CREATE TABLE texts (
   id bigint PRIMARY KEY,
   t  text
);
Using Linux’ echo command, I create an SQL script in UTF-8 encoding:
echo -e "INSERT INTO texts VALUES (1, 'sch\xc3\xb6n');" \ > insert_utf8.sql
Then I create a second script, with the same string in LATIN-1 (= ISO 8859-1) encoding:
echo -e "INSERT INTO texts VALUES (2, 'sch\xf6n');" \ > insert_latin1.sql
Both strings contain the German letter “ö”. LATIN-1 encodes it as a
[...]Disaster recovery is not optional for businesses operating in the digital age. With the ever-increasing reliance on data, system outages or data loss can be catastrophic, causing significant business disruptions and financial losses.
With multi-cloud or multi-regional PostgreSQL deployments, the complexity of managing disaster recovery only amplifies. This is where the Percona Operators come in, providing a solution to streamline disaster recovery for PostgreSQL clusters running on Kubernetes. With the Percona Operators, businesses can manage multi-cloud or hybrid-cloud PostgreSQL deployments with ease, ensuring that critical data is always available and secure, no matter what happens.
In this article, you will learn how to set up disaster recovery with Percona Operator for PostgreSQL version 2.
Operators automate routine tasks and remove toil. For standby, the Operator provides the following options:
We will review the repo-based standby as the simplest one:
1. Two Kubernetes clusters in different regions, clouds, or running in hybrid mode (on-prem + cloud). One is Main, and the other is Disaster Recovery (DR).
2. In each cluster, there are the following components:
3. pgBackRest on the Main site streams backups and Write-Ahead Logs (WALs) to the object storage.
4. pgBackRest on the DR site takes these backups and streams them to the standby cluster.
Use your favorite method to deploy the Operator from our documentation. Once installed, configure the Custom Resource manifest so that pgBackrest starts using the Object Storage of your choice. Skip this step if you already have it configured.
Configure the backups.pgbackrest.repos section by adding the necessary configuration. The below example is for Google Cloud Storage (GCS):
spec: backups: c[...]
What’s the fastest compiler for PostgreSQL? Let’s take a look.
| OS | Compiler | time make -s |
|---|---|---|
| macOS 13 | gcc-13 | 3:59.29 |
| | gcc-12 | 3:42.19 |
| | gcc-11 | 3:33.35 |
| | clang-16 | 3:05.05 |
| | clang-15 | 2:19.71 |
| | clang-14 | 2:21.03 |
| | clang-13 | 2:20.72 |
| | Apple clang (14) | 1:55.87 |
| Ubuntu 22.04 | gcc-12 (default) | 2:57.87 |
| | gcc-11 | 2:24.20 |
| | gcc-10 | 2:18.28 |
| | clang-15 | 2:21.62 |
| | clang-14 | 2:24.23 |
| | clang-13 | 2:25.68 |
Compilers keep getting slower, it appears! Maybe don’t use the latest one right away!
Also, gcc vs clang? Not sure, it depends.
The above uses the default optimization level -O2. Let’s see if we can squeeze out more:
| Optimization | Apple clang | gcc-13 |
|---|---|---|
| -O3 | 1:55.10 | 4:26.59 |
| -O2 | 1:52.13 | |
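For anyone who wants to reproduce this kind of measurement, the shape of the invocation looks roughly like the following; the compiler and flags are placeholders, and the exact configure options used in the post are not shown:

$ ./configure CC=clang-15 CFLAGS='-O2' --quiet
$ make -s clean
$ time make -s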
Number of posts in the past two months
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.