
Our beloved Structured Query Language may be the lingua franca for relational databases—but like many languages, SQL is in a state of constant evolution. The first releases of SQL didn’t even have a notation for joins. At the time, SQL only supported inner joins.

Cross Joins and Where Filters

As a result, back in the early eighties, the only way to express a join condition between tables was in the WHERE clause.

select companies.name, campaigns.name, campaigns.monthly_budget
  from companies, campaigns 
 where companies.id = campaigns.company_id
   and company_id = 5;

In that query, the PostgreSQL planner realizes that we are doing an INNER JOIN between the companies table and the campaigns table, even though what we actually wrote is a CROSS JOIN between the tables. In the PostgreSQL documentation chapter Table Expressions we can read more about the CROSS JOIN operation:

For every possible combination of rows from T1 and T2 (i.e., a Cartesian product), the joined table will contain a row consisting of all columns in T1 followed by all columns in T2. If the tables have N and M rows respectively, the joined table will have N * M rows.

FROM T1 CROSS JOIN T2 is equivalent to FROM T1 INNER JOIN T2 ON TRUE (see below). It is also equivalent to FROM T1, T2.

In my opinion, a cross join is so rarely what you actually want that cross joins should fail your SQL unit testing. After all, code should always express the intentions of the programmer, and if your intention is not to implement a CROSS JOIN, then using a comma-separated list of tables in your FROM clause looks highly suspicious.
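
And when a Cartesian product really is what you want, you can spell it out so the intent is unmistakable; a minimal sketch against the same two tables:

select companies.name, campaigns.name
  from companies
       cross join campaigns;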

Inner And Outer Joins in SQL

The previous SQL query can be written properly with an inner join syntax:

select companies.name, campaigns.name, campaigns.monthly_budget
  from            companies
       inner join campaigns 
               on companies.id = campaigns.company_id
 where company_id = 5;

Lucky for us, the PostgreSQL planner is smart enough to realize that both queries are specifying the same res

[...]

Brendan Gregg’s USE (Utilization, Saturation, Errors) method for monitoring is quite well known, and there are even some monitoring dashboard templates shared on the Internet. There’s also Tom Wilkie’s RED (Rate, Errors, Durations) method, which is said to be better suited than USE for monitoring microservices.

We at okmeter.io recently updated our PgBouncer monitoring plugin, and while doing that we tried to comb through everything, using USE and RED as frameworks to do so.

Why did we need both, rather than just sticking with USE, which is more widely known? To answer that, we need to understand their applicability first. While both methods are known, I don’t think they are systematically applied in practice when covering IT systems with monitoring.

USE

Using Brendan Gregg’s own words:

For every resource, check utilization, saturation, and errors.

Here a resource is any physical server functional component (CPUs, disks, buses, …), but also some software resources, or software-imposed limits/resource controls (containers, cgroups, etc.).

Utilization: the average time that the resource was busy servicing work. So a CPU or disk I/O utilization of 90% means that the resource is idle, not doing work, only 10% of the time, and busy 90% of the time. For resources such as memory, where one can’t apply the idea of a “non idle percentage of time”, one can instead measure the proportion of the resource that is used.

Either way, 100% utilization means no more “work” can be accepted. Either it can’t be accepted at all (when memory is full, it is full, and you can’t do anything about it), or the resource is 100% utilized only at this moment (as with CPU), and new work can be put into a waiting list or queue. These two scenarios are covered by the remaining two USE metrics:

Saturation: the degree to which the resource has extra work which it can’t service, often queued.

Errors: the count of error events, such as “resource is busy”, “Cannot allocate memory”, “Not enough space”. While those usually do not produce performance impact

[...]

The video of my presentation below walks you through ‘Indexing in PostgreSQL’ – a key component to database performance.

This presentation covers the following topics:

  • Various ways to access data in PostgreSQL
  • Different types of indexes supported by PostgreSQL
  • Internals of BTree and BRIN indexes
  • Overview of GIN and GiST indexes
  • How to find missing indexes
  • How to find unused indexes
PostgreSQL enterprise-grade backup and recovery

In this post we cover the methods used to achieve an enterprise-grade backup strategy for a PostgreSQL® cluster. In setting up our demonstration system, we use pg_basebackup and continuous archiving. The size of the database and your database environment—whether it is on colo or cloud—matter. It is very important to understand how we can ensure minimal or no data loss at the time of a disaster using our preferred backup tools.

As discussed in the introductory blog post, in our webinar of October 10, 2018 we highlight important aspects an enterprise should consider for their PostgreSQL environments. In our last blog post we looked at security.

Backing up…

The ability to recover a database to a certain point in time is always the ultimate aim of a backup strategy. Without a recoverable backup, you have no backup! So there are always two parts to a backup strategy: backup AND recovery. The backup tool you use to achieve your aims may vary. Let’s discuss some of the options.

pg_basebackup

The community version of PostgreSQL comes with a trusted backup tool called pg_basebackup. This utility can help you take a consistent online binary backup that is very fast, can be used for point-in-time-recovery, and also for setting up slaves/replicas.
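
As an illustration only (the host, user, and target directory below are placeholders), a typical invocation could look like this:

pg_basebackup -h db1.example.com -U replicator -D /backups/base \
    -F tar -z -X stream -P

Here -F tar and -z produce compressed tar archives, -X stream includes the WAL needed to make the backup consistent, and -P reports progress.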

Archiving and continuous archiving

Postgres does not flush data pages to disk upon every transaction commit. To ensure durability and allow crash recovery, PostgreSQL writes changes to transaction logs (a.k.a. WALs, Write-Ahead Logs) stored on disk.

WALs in PostgreSQL are similar to transaction log files in the InnoDB storage engine for MySQL. However, WAL files are recycled according to the values set for the parameters wal_keep_segments and max_wal_size. Hence, if WAL files are not copied to a safe location—such as a backup server or another file system—it won’t be possible to achieve point-in-time-recovery (PITR).
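
As a minimal sketch (the archive directory below is just a placeholder, and in practice the command should ship WAL off the host), continuous archiving is typically enabled with settings along these lines in postgresql.conf:

wal_level = replica       # or higher
archive_mode = on
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'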

In order to archive WAL segments to a safe location, the parameter archive_mode must be set to ‘ON’ and we must pass an appropriate shell comma

[...]
We are pleased to announce that PostgresConf 2019 will be held at the Sheraton Times Square March 18th - 22nd, 2019. 


Following on the success of PostgresConf 2018, 2019 will include five days with added training and partner summits.



March 18th and 19th will have immersive training. Instructors are encouraged to submit half and full day material for consideration. The preferred topics are centered around Postgres but anything People, Postgres, or Data related will be considered.

Monday, March 18th through Friday, March 22nd will host several partner summits, including popular and upcoming topics within the Postgres community and the annually hosted Regulated Industry Summit. Breakout sessions will be held from Wednesday through Friday.

Important Dates:
  • Call for Papers Open: 09/12/2018 
  • Call for Papers Close: 01/11/2019 
  • Confirmation/Acceptance: 01/18/2019
Can't wait until March? Join us at our West Coast event, PostgresConf Silicon Valley, October 15th and 16th, 2018 at the Hilton San Jose.

About PostgresConf:

PostgresConf is a global nonprofit conference series with a focus on growing community through increased awareness and education of Postgres and related technologies. PostgresConf is known for its highly attended national conference with the mission of:


Posted by Vasilis Ventirozos in OmniTI on 2018-09-25 at 13:51
Recently Amazon announced that Postgres 10 on RDS supports logical replication and that it can work as either a publisher or a subscriber. The announcement can be found here.

In order to test it out I made two 10.4 RDS instances and set them up to be in the same security,
subnet and parameter groups, so I ended up with something like this:

user : vasilis
password : Ap4ssw0rd
db instance : lrnode1 & 2
dbname : lrtest
port : 5432

The only thing I had to change in order to enable logical replication was to set rds.enable_logical_replication to 1 in my parameter group.
 

After verifying connectivity

psql -h lrnode1.random.us-east-1.rds.amazonaws.com -U vasilis -d lrtest -W
psql -h lrnode2.random.us-east-1.rds.amazonaws.com -U vasilis -d lrtest -W

I created a dummy table on my publisher and subscriber (lrnode1 and 2):

create table lr_test_tbl (
    id int primary key,
    name text
);

On lrnode1 (the publisher) I created a publication:
CREATE PUBLICATION lr_pub FOR TABLE lr_test_tbl ;


On lrnode2, as vasilis, I created the subscription:
CREATE SUBSCRIPTION lr_sub CONNECTION 'host=lrnode1.cqrcyb3h2bq2.us-east-1.rds.amazonaws.com 
dbname=lrtest user=vasilis password=Ap4ssw0rd' PUBLICATION lr_pub ;
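
A quick sanity check is to insert a row on the publisher and confirm it shows up on the subscriber, something like:

-- on lrnode1 (publisher)
insert into lr_test_tbl (id, name) values (1, 'replicated row');

-- on lrnode2 (subscriber)
select * from lr_test_tbl where id = 1;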

There are only three things you need to know: you need to adjust your subnet / security groups to allow
the instances to see each other, you have to change the rds.enable_logical_replication parameter,
and the user you are supposed to use for this is the user you set when you created the instance.
 
That was it, pretty fast and simple.

-- 
Vasilis Ventirozos
OmniTI Computer Consulting Inc.

Posted by Hubert 'depesz' Lubaczewski on 2018-09-25 at 10:04
I just released the first version of pg_terminator. It's a tool that is supposed to be run on a PostgreSQL db server, monitor a database, and cancel or terminate offending queries/connections. It can be used, for example, to: kill leftover psql sessions that are not doing anything, cancel too-long queries on production servers, kill connections that […]
Posted by Douglas Hunley on 2018-09-24 at 14:58

After a lively discussion at work today about monitoring tools and use cases, I decided to see if I could use tail_n_mail, which I already use to monitor my PostgreSQL logs, to monitor my pgBackRest logs. It turns out that it can, and can do so fairly trivially.

For reference, our .tail_n_mail.conf looks like this:

## Config file for the tail_n_mail program
## This file is automatically updated
LOG_LINE_PREFIX: %t P%p
EMAIL: you@gme.com
MAILSUBJECT: HOST pgBackRest errors NUMBER

INCLUDE: WARN:
INCLUDE: ERROR:
INCLUDE: FATAL:
INCLUDE: PANIC:


FILE1: /var/log/pgbackrest/pgbackrest.log

You would need to change the EMAIL and FILE1 and the rest should just work.

To actually invoke the monitoring, simply add a cron entry that looks like:

* * * * * /path/to/tail_n_mail ~/.tailnmail.conf --pgmode=0

(Obviously, you should adjust the periodicity. I highly doubt you need to check every minute.)

Once you’ve saved the crontab, you should be all good. If you doubt the setup, you can add --mailzero to the invocation to get an email even if everything is OK.

See? Easy.

Posted by Luca Ferrari on 2018-09-23 at 13:01

I have already written about a very useful and powerful small pearl by theory: pgenv. Now the tool supports user configuration!

pgenv get configuration!

I spent some time implementing a very rudimentary approach to configuration for pgenv. The idea was simple: since the program is a single Bash script, the configuration can be done using a single file from which variables are sourced.

But before this was possible, I had to do a little refactoring here and there in order to make all the commands behave smoothly with the configuration. At least it seems to work, with some parts that can be improved and implemented better (as always!). I designed it from scratch to support every single version of PostgreSQL, which means the configuration can differ depending on the specific version you are running. This allows you, for example, to set particular flags for ancient versions without having to go crazy when switching to more recent ones.

Now pgenv supports a config command that, in turn, provides several subcommands (a brief example follows the list):

  • write to store the configuration in a hidden file named after the PostgreSQL version (e.g., .pgenv.10.5.conf);
  • edit just to launch your $EDITOR to manipulate the configuration;
  • delete to remove a configuration file;
  • show to dump the configuration.
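
For example, assuming the subcommands accept the PostgreSQL version they apply to (the exact invocation may differ), inspecting and editing the 10.5 configuration could look like:

pgenv config show 10.5    # dump the configuration used for the 10.5 build (assumed syntax)
pgenv config edit 10.5    # open the same file in $EDITOR (assumed syntax)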

The idea is simple: each time a new PostgreSQL version is built, a configuration file is created for that instance. You can then customize the file in order to make pgenv behave differently for that particular version of PostgreSQL. As an example, you can set different languages (e.g., PL/Perl) or different startup/stop modes. If the configuration file for a particular version is not found, a global configuration is loaded. If neither that is...

Posted by Regina Obe in PostGIS on 2018-09-23 at 00:00

The PostGIS development team is pleased to release PostGIS 2.5.0.

Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta4+ and GEOS 3.7.0 which were released recently.

Best served with PostgreSQL 11 beta4 and pgRouting 2.6.1.

WARNING: If compiling with PostgreSQL+JIT, LLVM >= 6 is required.
Supported PostgreSQL versions for this release are: PostgreSQL 9.4 - PostgreSQL 12 (in development); GEOS >= 3.5.

2.5.0

Continue Reading by clicking title hyperlink ..
It came out a while ago but I haven't promoted it much yet.


This is the recorded version of the PostgreSQL at 20TB and Beyond talk.  It covers a large, 500TB analytics pipeline and how we manage data.

For those wondering how well PostgreSQL actually scales, this talk is worth watching.
My overall perspective here is that the PostgreSQL community needs a code of conduct, and one which allows the committee to act in some cases for off-infrastructure activity, but that the current code of conduct has some problems which could have been fixed if better efforts had been taken to ensure that feedback was gathered while it was actionable.

This piece discusses what I feel was done poorly but also what was done well and why, despite a few significant missteps, I think PostgreSQL as a project is headed in the right direction in this area.

A second important point here is to defend the importance of a code of conduct to dissenters, explain why we need one, why the scope needs to extend where it does, and why we should not be overly worried about this going in a very bad direction. The reason for this direction is in part that I found myself defending the need for a code of conduct to folks I collaborate with in Europe, and the context had less to do with PostgreSQL than with the Linux kernel. But in this regard the projects are far more different than they are similar.

Major Complaint:  Feedback Could Have Been Handled Better (Maybe Next Time)


In early May there was discussion about the formation of a code of conduct committee, in which I argued (successfully) that it was extremely important that the committee be geographically and culturally diverse so as to avoid one country's politics being unintentionally internationalized through a code of conduct.  This was accepted and as I will go into below this is the single most important protection we have against misuse of the code of conduct to push political agendas on the community.  However after this discussion there was no further solicitation for feedback until mid-September.

In mid-September, the Code of Conduct plan was submitted to the list. The new code of conduct contained a surprising amendment, made the previous month, expanding the code of conduct to all interactions between community members unless anot
[...]
Posted by Shaun M. Thomas in 2ndQuadrant on 2018-09-21 at 15:00

When working with database capacity planning, there are a lot of variables to consider, and Postgres is no different in this regard. One of the elements which requires management is storage. However, there’s an aspect of storage that escapes inspection almost without exception, and it’s hiding in the shadows between the columns themselves.

Alignment Basics

In most low-level computer languages like C, in which the venerable Postgres is written, data types are addressed by their maximum size, regardless of how large they actually are. Thus a standard 32-bit integer, which can store a value of just over 2 billion, must be read as a whole unit. This means even the value 0 requires 4 bytes of storage.

Further, Postgres is designed such that its own internal natural alignment is 8 bytes, meaning consecutive fixed-length columns of differing size must be padded with empty bytes in some cases. We can see that with this example:

SELECT pg_column_size(row()) AS empty,
       pg_column_size(row(0::SMALLINT)) AS byte2,
       pg_column_size(row(0::BIGINT)) AS byte8,
       pg_column_size(row(0::SMALLINT, 0::BIGINT)) AS byte16;

 empty | byte2 | byte8 | byte16 
-------+-------+-------+--------
    24 |    26 |    32 |     40

This suggests that an empty Postgres row requires 24 bytes of various header elements, a SMALLINT is 2 bytes, a BIGINT is 8 bytes, and combining them is… 16 bytes? That’s no mistake; Postgres is padding the smaller column to match the size of the following column for alignment purposes. Instead of 2 + 8 = 10, our math becomes 8 + 8 = 16.
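
One way to see the padding disappear is to flip the order so the larger column comes first; following the same arithmetic, this should report 34 bytes instead of 40:

SELECT pg_column_size(row(0::SMALLINT, 0::BIGINT)) AS smallint_first,
       pg_column_size(row(0::BIGINT, 0::SMALLINT)) AS bigint_first;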

Intensity Intervals

By itself, this may not necessarily be a problem. But consider a contrived ordering system with this table:

CREATE TABLE user_order (
  is_shipped    BOOLEAN NOT NULL DEFAULT false,
  user_id       BIGINT NOT NULL,
  order_total   NUMERIC NOT NULL,
  order_dt      TIMESTAMPTZ NOT NULL,
  order_type    SMALLINT NOT NULL,
  ship_dt       TIMESTAMPTZ,
  item_ct       INT NOT NULL,
  ship_cost     NUMERIC,
  receive_dt    TIMESTAMPTZ,
  tracking_cd   
[...]
PostgreSQL enterprise-grade security

In this post, we review how you can build an enhanced and secure PostgreSQL database environment using community software. We look at the features that are available in PostgreSQL that, when implemented, provide improved security.

As discussed in the introductory blog post of this series, in our webinar of October 10, 2018 we highlight important aspects an enterprise should consider for their PostgreSQL environments. This series of blogs addressing particular aspects of the enterprise-grade postgres environment complements the webinar. This post addresses security.

Authentication Layer

Client connections to PostgreSQL Server using host based authentication

PostgreSQL uses a host based authentication file (pg_hba.conf) to authorize incoming connections. This file contains entries with a combination of 5 categories: type, database, user, address, and method. A client is allowed to connect to a database only when the combination of username, database and the hostname of the client matches an entry in the pg_hba.conf file.

Consider the following entry in pg_hba.conf file :

# TYPE DATABASE USER ADDRESS METHOD
host percona pguser 192.168.0.14/32 md5

This entry says that connections from server 192.168.0.14 are only allowed from user pguser and only to the database percona. The method md5 forces password authentication.

The order of the entries in the pg_hba.conf file matters. If you have an entry that rejects connections from a given server followed by another that allows connections from it, the first entry in the order is considered. So, in this case, the connection is rejected.
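
To illustrate (the addresses here are arbitrary examples), consider these two entries:

# TYPE DATABASE USER ADDRESS METHOD
host percona pguser 192.168.0.14/32 reject
host percona pguser 192.168.0.0/24 md5

Although the second entry would allow pguser to connect with a password from anywhere in 192.168.0.0/24, a client at 192.168.0.14 matches the reject entry first, so its connection is refused.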

This is the first layer of protection in authentication. If these criteria are not satisfied by this Access Control List (ACL), PostgreSQL will discard the request without even considering server authentication.

Server Authentication

Historically, PostgreSQL has used an MD5 digest as its password hash by default. The problem with pure MD5 hashing is that this function will always return the same hash for a given password, which renders an MD5 digest

[...]
Posted by Craig Kerstiens in CitusData on 2018-09-20 at 15:00

Postgres 11 is almost here; in fact the latest beta shipped today, and it features a lot of exciting improvements. If you want the full list of features it is definitely worth checking out the release notes, but for those who don’t read the release notes I put together a rundown of some of what I consider the highlight features.

Quitting Postgres

This is a small usability feature, but so long overdue. Now you can quit the psql terminal by simply typing quit or exit. Previously you had to use Ctrl + D or \q. As a beginner it’s one thing to jump into a psql terminal, but once in, if you can’t figure out how to quit, it’s a frustrating experience. Small usability features, such as this and \watch in an earlier release, are often lost among the highlighted features which talk about performance or new data types. Improvements like this really go a long way toward making Postgres a better database for everyone.

Fear column addition no more

Brandur had a great in-depth write-up on this feature already, but it falls somewhere into the category of the above as well as being a performance improvement. Previously, when you added a new column that was NOT NULL with a default value, Postgres would have to take a lock and re-write the entire table. In a production environment, on any sizable table, the result was for all practical purposes an outage. The workaround was to break your migrations apart into a several-step process.

With Postgres 11 you can add a new NOT NULL column with a default value to a table, and the default is filled in for existing rows without requiring a full table re-write. Here’s to having to think less about your migrations.
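
For example (the table and column names here are made up), a migration like the following no longer rewrites the table on Postgres 11:

ALTER TABLE orders ADD COLUMN currency text NOT NULL DEFAULT 'USD';

Existing rows simply return the stored default when they are read.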

Of course performance is a highlight

No Postgres release would be complete without some performance improvements. This release there are really two areas that feature key improvements around performance.

Parallelism continuing to mature

We first saw parallelism support back in PostgreSQL 9.6. At the time it was primarily for sequential scans, which if you used parallelism for your sequential sc

[...]

A partitioning system in PostgreSQL was first added in PostgreSQL 8.1 by 2ndQuadrant founder Simon Riggs. It was based on relation inheritance and used a novel technique to exclude tables from being scanned by a query, called “constraint exclusion”. While it was a huge step forward at the time, it is nowadays seen as cumbersome to use as well as slow, and thus needing replacement.

In version 10, it was replaced thanks to heroic efforts by Amit Langote with modern-style “declarative partitioning”. This new tech meant you no longer needed to write code manually to route tuples to their correct partitions, and no longer needed to manually declare correct constraints for each partition: the system did those things automatically for you.

Sadly, in PostgreSQL 10 that’s pretty much all it did. Because of the sheer complexity and the time constraints, there were many things in the PostgreSQL 10 implementation that were lacking. Robert Haas gave a talk about it in Warsaw’s PGConf.EU.

Many people worked on improving the situation for PostgreSQL 11; here’s my attempt at a recount. I split these in three areas:

  1. New partitioning features
  2. Better DDL support for partitioned tables
  3. Performance optimizations.

New Partitioning Features

In PostgreSQL 10, your tables can be partitioned in RANGE and LIST modes. These are powerful tools to base many real-world databases on, but for many other designs you need the new mode added in PostgreSQL 11: HASH partitioning. Many customers need this, and Amul Sul worked hard to make it possible. Here’s a simple example:

CREATE TABLE clients (
    client_id INTEGER,
    name      TEXT
) PARTITION BY HASH (client_id);

CREATE TABLE clients_0 PARTITION OF clients
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE clients_1 PARTITION OF clients
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE clients_2 PARTITION OF clients
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);

It is not mandatory to use the same modulus value for all partitions; this lets you create more partitions later and r

[...]

Citus extends Postgres to be a horizontally scalable database. By horizontally scalable, we mean the data is spread across multiple machines, and you’re able to scale not only storage but also memory and compute—thus providing better performance. Without using something like Citus to transform PostgreSQL into a distributed database, sure you can add read replicas to scale, but you’re still maintaining a single copy of your data. When you run into scaling issues with your Postgres database, adding a read replica and offloading some of your traffic to your read replica is a common bandaid to slow down the bleeding, but it is only a matter of time until even that doesn’t work any further. Whereas with Citus, scaling out your database is as simple as dragging a slider and rebalancing your data.

Are read replicas still useful with horizontally scalable databases?

But that leaves a question, are read-replicas still useful? Well, sure they are.

In Citus Cloud (our fully-managed database as a service), we have support for read replicas, in our case known as followers. Follower clusters leverage much of our same underlying disaster recovery infrastructure that forks leverage, but support a very different set of use cases.

Previously we talked about how forks can be used in Citus Cloud to get a production set of data over to staging that can help with testing migrations, rebalancing, or SQL query optimization. Forks are often used as a one-off for a short period of time. In contrast, followers are often long running Citus database clusters that can be a key mechanism to run your business, helping you get insights when you need them.

Follower cluster is often only a few seconds (if any) behind your primary database cluster

A follower cluster receives all updates from the primary Citus cluster in an asynchronous fashion. Often followers are only a few seconds (if any) behind your primary database cluster, though they can lag at times by a few minutes. Followers have a full copy of your data, but reside on a separate cluster.

[...]

Better correlation helps in real life too

After you ANALYZE a PostgreSQL table to collect value distribution statistics, you will find the gathered statistics for each column in the pg_stats system view. This article will explain the meaning of the correlation column and its impact on index scans.

Physical vs. logical ordering

Most common PostgreSQL data types have an ordering: they support the operators <, <=, =, >= and >.
Such data types can be used with a B-tree index (the “standard” index type).

The values in a column of such a type provide a logical ordering of the table rows. An index on this column will be sorted according to that ordering.

A PostgreSQL table consists of one or more files of 8KB blocks. The order in which the rows are stored in the file is the physical ordering.
You can examine the physical ordering of the rows by selecting the ctid system column: it contains the block number and the item number inside the block, which describe the physical location of the table row.

Correlation

The correlation for a column is a value between -1 and 1. It tells how good the match between logical and physical ordering is.

  • If the correlation is 1, the rows are stored in the table file in ascending column order; if it is -1, they are stored in descending order.
  • Values between -1 and 1 mean a less perfect match.
  • A value of 0 means that there is no connection between the physical and the logical order.
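
You can look up the value for your own tables directly from pg_stats; for a hypothetical table mytable:

SELECT attname, correlation
  FROM pg_stats
 WHERE tablename = 'mytable';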

Why should I care?

You will create indexes on your tables for faster access (but not too many!).
The correlation of a column has an impact on the performance of an index scan.

During an index scan, the whole index or part of it is read in index sequential order. For each entry that is found, the corresponding row is fetched from the table (this is skipped in an “index only scan”, but that is a different story).

If the correlation of the indexed column is close to zero, the fetched rows will be from all over the table. This will result in many randomly distributed reads of many different table blocks.

However, if the correlation is close to 1 or -1,

[...]

Graphs are important, as they are your window onto your monitored systems. ClusterControl comes with a predefined set of graphs for you to analyze; these are built on top of the metric sampling done by the controller and are designed to give you, at first glance, as much information as possible about the state of your database cluster. You might have your own set of metrics you’d like to monitor, though. Therefore ClusterControl allows you to customize the graphs available in the cluster overview section and in the Nodes -> DB Performance tab. Multiple metrics can be overlaid on the same graph.

Cluster Overview tab

Let’s take a look at the cluster overview - it shows the most important information aggregated under different tabs.

Cluster Overview Graphs

You can see graphs like “Cluster Load” and “Galera - Flow Ctrl” along with a couple of others. If this is not enough for you, you can click on “Dash Settings” and then pick the “Create Board” option. From there, you can also manage existing graphs: you can edit a graph by double-clicking on it, or delete it from the tab list.

Dashboard Settings

When you decide to create a new graph, you’ll be presented with an option to pick metrics that you’d like to monitor. Let’s assume we are interested in monitoring temporary objects - tables, files and tables on disk. We just need to pick all three metrics we want to follow and add them to our new graph.

New Board 1

Next, pick a name for the new graph and pick a scale. Most of the time you want the scale to be linear, but in some rare cases, like when you mix metrics containing large and small values, you may want to use a logarithmic scale instead.

New Board 2

Finally, you can pick if your template should be presented as a default graph. If you tick this option, this is the graph you will see by default when you enter the “Overview” tab.

Once we save the new graph, you can enjoy the result:

New Board 3

Node Overview tab

In addition to the graphs on our cluster, we can also use this functionality on each of o

[...]

The PostgreSQL 11 release is nearly here (maybe in the next couple of weeks?!), and while a lot of the focus will be on the improvements to the overall performance of the system (and rightly so!), it's important to notice some features that, when used appropriately, will provide noticeable performance improvements to your applications.

One example of such a feature is the introduction of "covering indexes" for B-tree indexes. A covering index allows a user to perform an index-only scan if the select list in the query matches the columns that are included in the index. You can specify the additional columns for the index using the "INCLUDE" keyword, e.g.

CREATE INDEX a_b_idx ON x (a,b) INCLUDE (c);

Theoretically, this can reduce the amount of I/O your query needs to use in order to retrieve information (traditionally, I/O is the biggest bottleneck on database systems). Additionally, the data types included in a covering index do not need to be B-tree indexable; you can add any data type to the INCLUDE part of a CREATE INDEX statement.
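
With the index above, a query whose select list stays within the key and INCLUDE columns can, in principle, be answered by an index-only scan, for example:

-- a, b and c are all available from a_b_idx, so no heap fetch is needed
-- as long as the visibility map is largely up to date
SELECT a, b, c FROM x WHERE a = 100 AND b < 10;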

However, you still need to be careful about how you deploy covering indexes: each column you add to the index still takes up space on disk, and there is still a cost for maintaining the index, for example on row updates.

Understanding these trade offs, you can still apply covering indexes in very helpful ways that can significantly help your applications.

A Simple Example: Tracking Coffee Shop Visits

A follow-up to the previous post where the performance of queries with many joins is investigated.

Great discussion on hacker news and r/programming brought up a couple ideas I hadn't considered.

  1. What about enums?
  2. What about tables with more columns?

I also figured it would be neat if other people could run these benchmarks with their own parameters / hardware.

So I adjusted my script to support enums and wider tables, and packaged it up into a tool anyone can use. It supports three different join types: enum, foreign keys, and what I'm calling "chained". The benchmark definitions are described in a json file which looks like this:

$ cat input.json
[
    {
        "join-type": "chained",
        "max-tables": 10,  # Queries will start by joining 2 tables, increasing by one until all tables are joined.  Number of tables joined will be the X axis on the plot.
        "max-rows": 10000,  # Benchmarks will be performed at 10 rows, 100 rows, etc. until max-rows is reached, creating a separate line on the plot for each.
        "extra_columns": 2,
        "max_id": 5,
        "create-indexes": true,
        "output-filename": "benchmark_1",
        "plot-title": "My Chained Benchmark Title"
    },
    {
        "join-type": "enums",
        "max-rows": 10000,  # Benchmarks will be performed at 10 rows in the primary table, increasing by a factor of 10 until max-rows is reached
        "max-enums": 100,  # Queries will start by selecting (and optionally filtering by) 1 enum column, increasing by one until max-enums is reached
        "possible-enum-values": 10,
        "extra-columns": 2,
        "where-clause": true,
        "output-filename": "benchmark_1",
        "plot-title": "My Enum Benchmark Title"
    },
    {
        "join-type": "foreign-keys",
        "max-primary-table-rows": 10000,  # Benchmarks will be performed at 10 rows in the primary table, increasing by a factor of 10 until max-rows is reached
        "max-fk-tables": 100,  # Queries will start by selecting from (and optionally filtering by) 1 fo
[...]
Posted by Regina Obe in PostGIS on 2018-09-16 at 00:00

The PostGIS development team is pleased to release PostGIS 2.5.0rc2.

Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta3+ and GEOS 3.7.0 which were released recently.

Best served with PostgreSQL 11beta3.

2.5.0rc2

Changes since PostGIS 2.5.0rc1 release are as follows:

  • 4162, ST_DWithin documentation examples for storing geometry and radius in table (Darafei Praliaskouski, github user Boscop).
  • 4163, MVT: Fix resource leak when the first geometry is NULL (Raúl Marín)
  • 4172, Fix memory leak in lwgeom_offsetcurve (Raúl Marín)
  • 4164, Parse error on incorrectly nested GeoJSON input (Paul Ramsey)
  • 4176, ST_Intersects supports GEOMETRYCOLLECTION (Darafei Praliaskouski)
  • 4177, Postgres 12 disallows variable length arrays in C (Laurenz Albe)
  • 4160, Use qualified names in topology extension install (Raúl Marín)
  • 4180, installed liblwgeom includes sometimes getting used instead of source ones (Regina Obe)

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

— if you use the other extensions packaged with postgis — make sure to upgrade those as well

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

Posted by Bruce Momjian in EnterpriseDB on 2018-09-14 at 18:30

Now that I have given a presentation about Postgres 11 features in New York City, I have made my slides available online.

Enterprise PostgreSQL built with open source tools

Please join Percona’s PostgreSQL Support Technical Lead, Avinash Vallarapu; Senior Support Engineer, Fernando Laudares; and Senior Support Engineer, Jobin Augustine, on Wednesday, October 10th, 2018 at 7:00 AM PDT (UTC-7) / 10:00 AM EDT (UTC-4), as they demonstrate an enterprise-grade PostgreSQL® environment built using a combination of open source tools and extensions.

“We built our application on top of PostgreSQL. It works great but only now that we took it to the market and it became a hit we realize how much it relies on the database. How can we “harden” PostgreSQL? How can we make the solution we built around PostgreSQL enterprise-grade?”

“I migrated from a proprietary database software to PostgreSQL. I am curious to know whether I can get the same features I used to have in the proprietary database software.”

You’ll find the answer to these questions and more in a series of blog posts we will be publishing on this topic, which will be followed by a live demo we planned for our webinar on October 10th, 2018.

The market coined the term “enterprise grade” or “enterprise ready” to differentiate products and service offerings for licensed database software. For example: there may be a standard database software or an entry-level package that delivers the core functionality and basic features. Likewise, there may be an enterprise version, a more advanced package which goes beyond the essentials to include features and tools indispensable for running critical solutions in production. With such a differentiation found in commercial software, we may wonder whether a solution built on top of an open source database like PostgreSQL can satisfy all the enterprise requirements.

It starts with building a secured PostgreSQL environment, tuning the database for the production workload, building a high availability strategy that avoids single-point-of-failures, scaling PostgreSQL using connection poolers to avoid excessive usage of server resources, and finally load balancing the reads between master and all

[...]

Introduction

Nowadays, high availability is a requirement for many systems, no matter what technology we use. This is especially important for databases, as they store data that applications rely upon. There are different ways to replicate data across multiple servers, and to fail over traffic when, for example, a primary server stops responding.

Architecture

There are several architectures for PostgreSQL high availability, but the basic ones would be master-slave and master-master architectures.

Master-Slave

This may be the most basic HA architecture we can set up, and oftentimes the easiest to set up and maintain. It is based on one master database with one or more standby servers. These standby databases will remain synchronized (or almost synchronized) with the master, depending on whether the replication is synchronous or asynchronous. If the main server fails, the standby contains almost all of the data of the main server, and can quickly be turned into the new master database server.

We can have two categories of standby databases, based on the nature of the replication:

  • Logical standbys - The replication between the master and the slaves is made via SQL statements.
  • Physical standbys - The replication between the master and the slaves is made via the internal data structure modifications.

In the case of PostgreSQL, a stream of write-ahead log (WAL) records is used to keep the standby databases synchronized. This can be synchronous or asynchronous, and the entire database server is replicated.
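
As a rough sketch (the host name and user are placeholders, and this uses the recovery.conf style of PostgreSQL 11 and earlier), a physical standby is typically configured along these lines:

# on the master, postgresql.conf
wal_level = replica
max_wal_senders = 5

# on the standby, recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'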

From version 10, PostgreSQL includes a built-in option to set up logical replication, which is based on constructing a stream of logical data modifications from the information in the WAL. This replication method allows the data changes from individual tables to be replicated without the need to designate a master server. It also allows data to flow in multiple directions.

But a master-slave setup is not enough to effectively ensure high availability, as we also need to handle failures. To handle failures, we need to be able t

[...]

Database security is an increasingly critical topic for any business handling personal data. A data breach can have serious ramifications for an organization, especially if the proper security protocols are not in place.

There are many ways to harden your database. As an example, PostgreSQL addresses security using firewalls, encryption, and authentication levels, among other approaches.

2ndQuadrant hosted a webinar on Database Security in PostgreSQL to highlight security concepts, features and architecture. The webinar was presented by Kirk Roybal, Principal Consultant at 2ndQuadrant – the recording is now available here.

Some of the questions that Kirk responded to are listed below:

Q1: What are your thoughts on performance of row-level security vs. doing that filtering via WHERE at the application level and how that affects development? I.E. now that you’re filtering via DB capabilities you lose the visibility of that being done at the application level – it becomes a bit of a black box of “it just works” for the development team.

A1: The PostgreSQL query parser is involved in evaluating the constraint either way. Since this is mostly dependent on PostgreSQL, there will be very little or no measurable difference in performance. Putting the security in the database has the advantage of being modifiable without changes to the application layer.

Q2: Do you have any suggestions for encrypting data at rest?

A2: PostgreSQL provides pgcrypto as an extension. PostgreSQL also allows you to create your own datatypes, operators and aggregates. Put the two together and you have encryption at rest.
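
As a small illustration of the pgcrypto part (the key handling here is deliberately simplistic; in practice the passphrase would live outside the database):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- symmetric encryption and decryption of a value
SELECT pgp_sym_encrypt('my secret value', 'passphrase');
SELECT pgp_sym_decrypt(pgp_sym_encrypt('my secret value', 'passphrase'),
                       'passphrase');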

Q3: Is it possible to configure Azure AD authentication too?

A3: Yes, if you create a bare Linux machine, you can configure anything you want.

Q4: Do you support performance tuning on AWS RDS Postgres?

A4: Yes, we do provide the Performance Tuning service for RDS. Because of the closed nature of the system, however, there might be some advanced settings that we won’t be able to tune.

Q5: What are the main differences between the PostgreS

[...]

The slides from my presentation, Securing PostgreSQL at PDXPUG PostgreSQL Day 2018 are now available.

Posted by Joe Abbate on 2018-09-13 at 01:55

Over two years ago, I lamented that this blog had remained silent for too long and about the lack of development activity on the Pyrseas project. I also announced immediate and longer term plans. Pyrseas 0.8 was finally released before the end of last year.

From time to time we get interest in Pyrseas, sometimes from people who use it in unexpected ways. However, my own interest in developing it further and actual commits have been declining considerably. I probably spend less than four hours a week on it. Therefore, I’ve decided to put the project on life support. I will attempt to notify owners of open issues via GitHub as to this status. I may continue to work on some issues or enhancements but on an even more reduced scale. If requested, I’ll make maintenance releases as needed. Should any enterprising developer want to take it over, I’ll gladly consider handing over the reins.

Thanks to everyone who has contributed, inquired or shown interest in Pyrseas over the past eight years. In particular, I’d like to extend my gratitude to Daniele Varrazzo (not only for his contributions to Pyrseas but also for his work on Psycopg2), Roger Hunwicks and Josep Martínez Vila.

Posted by Bruce Momjian in EnterpriseDB on 2018-09-12 at 17:00

You have probably looked at logical dumps as supported by pg_dump and restores by pg_restore or, more simply, psql. What you might not have realized are the many options for dumping and restoring when multiple computers are involved.

The most simple case is dumping and restoring on the same server:

$ pg_dump -h localhost -Fc test > /home/postgres/dump.sql
$ pg_restore -h localhost -d test < /home/postgres/dump.sql

Continue Reading »

Announcing TimescaleDB 1.0: The first enterprise-ready time-series database to support full SQL and scale

Over 1M downloads; production deployments at Comcast, Bloomberg, Cray, and more; native Grafana integration; first-class Prometheus support; and dozens of new features signify positive momentum for TimescaleDB and the future of the time-series market

Today, we are excited to officially announce the first release candidate for TimescaleDB 1.0.

If you work in the software industry, you already know that 1.0 announcements generally signify that your product is “production-ready.”

Ironically, just last week we got this question on Twitter from a TimescaleDB user, who founded a weather mapping company:

Yes, our 1.0 release is a little overdue as we’ve actually been production-ready for quite some time now.

Today, just a year and a half after our launch in April 2017, businesses large and small all over the world trust TimescaleDB for powering mission-critical applications including industrial data analysis, complex monitoring systems, operational data warehousing, financial risk management, geospatial asset tracking, and more.

“At Bloomberg, we have millions of data feeds and trillions of data points dating back over 100 years. My team and I have been extremely pleased with TimescaleDB’s capability to accommodate our workload while simplifying geo-financial analytics and data visualization. If you are looking to support large scale time-series datasets, then TimescaleDB is a good fit.”
Erik Anderson, Lead Software Engineer at Bloomberg

From 0 to over 1 million downloads in less than 18 months

Since our launch, we’ve experienced some significant momentum:

[...]