Enterprise PostgreSQL Q&A

Hello everyone, and thank you to those who attended our webinar on Building an Enterprise-grade PostgreSQL setup using open source tools last Wednesday. You'll find the recordings, as well as the slides we used during the presentation, here.

We had over forty questions during the webinar but were only able to tackle a handful in the time available, so most remained unanswered. We address the remaining ones below, grouped into categories for better organization. Thank you for sending them over! We have merged related questions and kept some of our answers concise, but please leave us a comment if you would like to see a particular point addressed further.

Backups

Q: In our experience, pg_basebackup with compression is slow due to single-thread gzip compression. How to speed up online compressed full backup?

Single-thread operation is indeed a limitation of pg_basebackup, and this is not limited to compression only. pgBackRest is an interesting alternative tool in this regard as it does have support for parallel processing.

Q: Usually one sets up database backups on the primary DB in an HA setup. Is it possible to automatically activate backups on the new primary DB after a Patroni failover (or with other HA solutions)?

Yes. This can be done transparently by pointing your backup system to the “master-role” port in the HAProxy instead – or to the “replica-role” port; in fact, it’s more common to use standby replicas as the backup source.

Q: Do backups and WAL backups work with third party backup managers like NetBackup for example?

Yes, as usual it depends on how good the vendor support is. NetBackup supports PostgreSQL, and so does Zmanda to mention another one.

Security and auditing

Q: Do you know a TDE solution for PostgreSQL? Can you talk a little bit about the encryption at rest solution for Postgres PCI/PII applications from Percona standpoint.

At this point PostgreSQL does not provide native Transparent Data Encryption (TDE) functionality, relying instead on the underlying file system for data

[...]

In a disaster recovery plan, your Recovery Point Objective (RPO) is a key recovery parameter that dictates how much data you can afford to lose. RPO is expressed as a length of time, anywhere from seconds to days, and is directly dependent on your backup system: it marks the maximum age of the backup data you must recover in order to resume normal operations.

If you do a nightly backup at 10 p.m. and your database system crashes beyond repair at 3 p.m. the following day, you lose everything that was changed since your last backup. Your recovery point in this particular context is the previous day's backup, which means you accept losing up to one day's worth of changes.

The below diagram from our Whitepaper on Disaster Recovery illustrates the concept.

For a tighter RPO, though, a backup might not be enough. When backing up your database, you are actually taking a snapshot of the data at a given moment, so when you restore that backup you will miss all the changes that happened between the last backup and the failure.

This is where the concept of Point In Time Recovery (PITR) comes in.

What is PITR?

Point In Time Recovery (PITR), as the name states, involves restoring the database to any given moment in the past. To be able to do this, we need to restore a backup and then apply all the changes that happened after the backup, right up until just before the failure.

For PostgreSQL, the changes are stored in the WAL logs (for more details about WALs and the data they store, you can check out this blog).

So there are two things we need to ensure to be able to perform PITR: the backups and the WALs (we need to set up continuous archiving for them).
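
On the WAL side, you can quickly confirm from psql that continuous archiving is in place, and optionally mark a known-good moment to use later as a recovery target (a minimal sketch; the restore point name is made up):

-- check that WAL archiving is enabled and where segments are being copied to
SHOW archive_mode;
SHOW archive_command;

-- optionally label a known-good moment by name, to use as a recovery target later
SELECT pg_create_restore_point('before_app_upgrade');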

For performing the PITR, we will need to recover the backup and th

[...]
Posted by Bruce Momjian in EnterpriseDB on 2018-10-19 at 13:30

People new to the Postgres community are often confused by the deliberateness of the community on some issues, and the willingness to break backward compatibility in other areas. The source code was created 32 years ago, and many of us have been involved with Postgres for 10-20 years. With that length of involvement, we are acutely aware of the long-term consequences of our decisions. Many proprietary and open source developers don't think years in advance, but with Postgres, it is our normal behavior.

This leads to some unusual Postgres behaviors:

  • Weekly source code comment improvements
  • Yearly source code reformatting
  • Exhaustive discussions about even minor changes, particularly user-visible ones
  • API redesigns

This focus on detail often strikes new users as unusual, but it makes sense when a multi-year view is considered.

Posted by Regina Obe in PostGIS on 2018-10-19 at 07:41
Updated October 19th, 2018: 64-bit package for PostgreSQL 11, http extension v1.2.4 release

For those folks on Windows who want to do HTTP GETs and POSTs directly from your PostgreSQL server, we've made binaries for the http extension v1.2.2 release for PostgreSQL 10, 9.6, 9.5, and 9.4 on Windows 64-bit.

These should work fine on both the PostgreSQL EDB Windows distributions as well as the BigSQL Windows 64-bit distributions. The ReadMe.txt has instructions on where to copy the files for each distribution. Caution should be taken, especially with BigSQL, not to overwrite existing BigSQL files: many of the files in this package already exist in the BigSQL distribution, since we both compile using the Mingw64 chain.

If you already have PostGIS installed, you will already have many of these files, since libraries such as libcurl and PCRE are also packaged with PostGIS.

Continue reading "http extension for windows 64 and 32-bit"
After 1 year of hard work by Pgpool-II developers,  Pgpool-II 4.0 is finally out.
This version celebrates the 15th anniversary of Pgpool-II, and introduces rich new features. I will explain them one by one in a series of blogs.

Support for SCRAM authentication

SCRAM authentication is a modern, secure authentication method supported by PostgreSQL 10 or later. Pgpool-II 4.0 supports the method by providing multiple new functionalities.

  • pool_hba.conf now has a "scram-sha-256" authentication method.
  • If this method is specified, users connecting to Pgpool-II are required to respond properly to the SCRAM authentication exchange.
  • Passwords stored in pool_passwd can be used to verify a user's password before a session starts.
  • Passwords can be stored in pool_passwd in multiple ways: clear text or AES256 encrypted (md5 hashed passwords can also be stored in pool_passwd, but in that case SCRAM cannot use the password).
  • AES256 encryption is preferred, and the key to decrypt it is stored in .pgpoolkey under the home directory of the user who started Pgpool-II, or in a file specified by the PGPOOLKEYFILE environment variable.
  • For AES256 encryption, the new tool pg_enc can be used.
Of course, Pgpool-II can authenticate itself using SCRAM while connecting to PostgreSQL, if PostgreSQL requires SCRAM authentication from Pgpool-II.

A good thing about Pgpool-II 4.0 is that it allows users to choose different authentication methods between users and Pgpool-II, and between Pgpool-II and PostgreSQL. For example, you can use SCRAM between users and Pgpool-II, while using the md5 authentication method between Pgpool-II and PostgreSQL.

SCRAM authentication can also be used for the health check and replication delay check. Before 4.0, the user name and password for these were stored in clear text in pgpool.conf, which is not the most secure way. Pgpool-II 4.0 allows AES 256 encrypted passwords to be stored in pgpool.conf; .pgpoolkey is also used for decrypting these passwords.

Next time I will explain about other features of authentication method in Pgpool-II 4.0.
Earlier today (Oct 18, 2018), the PostgreSQL community announced the release of PostgreSQL 11. Having done multiple software releases in the past, I appreciate the hard work by all contributors to get yet another major release out on schedule. It is hard to do a major release every year, and the community has been doing it since PostgreSQL 8.4, making this the 10th release in the last decade.

Everybody has their favorite feature in PostgreSQL 11, and the one at the top of my list is transactional support in stored procedures. 2ndQuadrant first announced that feature at the end of last year, and it instantly became my favorite, as I see it as a giant leap for PostgreSQL: it allows people to write long data routines, such as ETL jobs, broken down into multiple transactions. Of course, many users will certainly enjoy the improvements in the table partitioning system, query parallelism, and just-in-time (JIT) compilation for accelerating the execution of expressions in queries. However, developers will certainly get more freedom from the stored procedure improvements.
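
As a rough sketch of what this enables (the table name is hypothetical), a procedure can now commit its work in batches instead of holding one huge transaction open:

CREATE PROCEDURE load_in_batches()
LANGUAGE plpgsql
AS $$
BEGIN
    FOR i IN 1..10 LOOP
        INSERT INTO staging_events (batch_no) VALUES (i);  -- staging_events is hypothetical
        COMMIT;  -- transaction control inside a procedure, new in PostgreSQL 11
    END LOOP;
END;
$$;

CALL load_in_batches();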

With the release of PostgreSQL 11, there are now 6 major releases supported: PostgreSQL 9.3, 9.4, 9.5, 9.6, 10 and 11. It is definitely a good time to start thinking about upgrading your PostgreSQL 9.3 databases. As per the versioning policy, the final minor release for PostgreSQL 9.3 will be on November 8th, 2018. PostgreSQL 9.3 is also the last major version that does not support logical replication, which was first introduced in PostgreSQL 9.4. Hence, I expect this will be the last painful upgrade, because from PostgreSQL 9.4 onwards you can always leverage logical replication to minimize the downtime while switching to a new version. All is not lost for PostgreSQL 9.3: while the experience is not exactly the same, there are still tools available using the older trigger-based replication to help, or just bite the bullet and upgrade once with a small maintenance window, as later versions will give you more options for your next major vers
[...]

You may be aware that the new major version of PostgreSQL has been released today. PostgreSQL 11 is going to be one of the most vibrant releases in recent times. It incorporates many features found in proprietary, industry-leading database systems, further qualifying PostgreSQL as a strong open source alternative.

Without further ado, let’s have a look at some killer features in this new release.

Just In Time (JIT) Compilation of SQL Statements

This is a cutting edge feature in PostgreSQL: SQL statements can get compiled into native code for execution. It's well known how much Google's V8 JIT revolutionized the JavaScript language. JIT in PostgreSQL 11 accelerates two important operations, expression evaluation and tuple deforming during query execution, and helps CPU-bound queries perform faster. Hopefully this is a new era in the SQL world.
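
If the server was built with LLVM support, you can try it out along these lines (a sketch; in version 11 JIT is off by default and normally only kicks in for expensive plans):

SET jit = on;
SET jit_above_cost = 0;  -- force JIT even for a cheap query, purely for demonstration
EXPLAIN (ANALYZE) SELECT sum(x * 2) FROM generate_series(1, 1000000) AS t(x);
-- the plan output then ends with a "JIT:" section reporting compiled functions and timings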

Parallel B-tree Index build

This could be the feature most sought after by DBAs, especially those migrating large databases from other database systems to PostgreSQL. Gone are the days when a lot of time was spent on building indexes during data migration. Index maintenance (rebuild) for very large tables can now make effective use of multiple cores in the server by parallelizing the operation, taking considerably less time to complete.
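
The degree of parallelism is capped by the new max_parallel_maintenance_workers setting; a rough sketch (table and column names are hypothetical):

SET max_parallel_maintenance_workers = 4;
CREATE INDEX orders_customer_id_idx ON orders (customer_id);
-- the sort phase of the B-tree build can now use parallel workers, up to the limit set above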

Lightweight and super fast ALTER TABLE for NOT NULL column with DEFAULT values

As applications are continuously enhanced with new features, schema changes to the database are common, and most such changes involve adding new columns to a table. This can be a nightmare if a new column needs to be added to a large table with a default value and a NOT NULL constraint, because the ALTER statement can hold a write lock on the table for a long period and can also involve excessive IO due to a table rewrite. PostgreSQL 11 addresses this issue by ensuring that adding a column with a default value and a NOT NULL constraint avoids a table rewrite.
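
In practice, that means a statement like the following (hypothetical table) now returns almost immediately on PostgreSQL 11, instead of rewriting every row:

ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';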

Stored procedures with transaction control

Post

[...]

In PostgreSQL version 10 or earlier, if you add a new column to a table without specifying a default value, then no change is made to the actual values stored: any existing row simply reads back NULL for that column. But if you specify a default value, the entire table gets rewritten with the default value filled in on every row. That rewriting behavior changes in PostgreSQL 11.

In a new feature I worked on and committed, the default value is just stored in the catalog and used where needed in rows existing at the time the change was made. New rows, and new versions of existing rows, are written with the default value in place, as happens now. Any row that doesn’t have that column must have existed before the table change was made, and uses this value stored in the catalog when the row is fetched. The great advantage of this is that adding a column with a default value is now quite a fast and cheap operation, whereas before for very large tables it has been horribly, often intolerably slow. Rewriting a whole multi-terabyte table is really something you want to avoid.

The default value doesn't have to be a static expression. It can be any non-volatile expression, e.g. CURRENT_TIMESTAMP. Volatile expressions such as random() will still result in table rewrites. A non-volatile expression is evaluated at the time the statement runs, and the result is what is stored in the catalog for use with existing rows. Of course, for new rows the expression will be re-evaluated at the time the row is created, as happens now.
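
To make the distinction concrete (the table name is hypothetical):

-- no rewrite: CURRENT_TIMESTAMP is non-volatile, so the value evaluated when the
-- ALTER runs is stored in the catalog and reused for pre-existing rows
ALTER TABLE measurements ADD COLUMN loaded_at timestamptz DEFAULT CURRENT_TIMESTAMP;

-- still a rewrite: random() is volatile, so every existing row must get its own value
ALTER TABLE measurements ADD COLUMN sample_weight float8 DEFAULT random();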

Any time that the table does get rewritten, say by VACUUM FULL, all this is cleaned up, as there will be no rows left needing the value from the catalog. Likewise, any time that a row is updated the column’s value is filled in on the new row version, so over time the proportion of rows using the value fetched from the catalog will decrease.

Posted by Andrea Cucciniello in 2ndQuadrant on 2018-10-18 at 12:34

After the success of last year’s event, the second PGDay held in Australia, we’re back this year with PGDay Down Under. The name “Down Under” refers to Australia and New Zealand, due to the fact these countries are located in the lower latitudes of the southern hemisphere.

The conference is a one-day community event organized by the newly formed PostgreSQL Down Under Incorporated (also known as PGDU), a not-for-profit association established to support the growth and learning of PostgreSQL, the world's most advanced open source database, in Australia and New Zealand.

PGDay Down Under aims to satisfy a large audience of PostgreSQL users and enthusiasts by selecting a wide range of talks and presentations that are of interest to:

  • Database administrators that are already using PostgreSQL, or use a different database and are interested in learning about PostgreSQL
  • Team leaders, architects and decision makers considering PostgreSQL as an alternative database to more traditional proprietary databases
  • Developers that have data persistence requirements or just want to expand their knowledge
  • IT enthusiasts who enjoy getting involved in the open source community

Don't wait until the last minute: early bird tickets are now available, and you can register for PGDay Down Under here. To complete the registration, log in with your community account or preferred third party platform.

The Call for Papers is open until October 21, at midnight (AEDT). We have already received many submissions and we hope to get some more. To submit your proposal for a presentation please visit 2018.pgdu.org and click on Call For Papers on the top right corner.

This event would not be possible without the generous sponsorship and support from many different organisations, such as 2ndQuadrant. Any proceeds from the conference will be used to organize more community events in other locations, creating a virtuous cycle to better support the growth of the PostgreSQL community in the region.

If you are interested in sponsoring and/or volunteering at the event

[...]
Posted by Bruce Momjian in EnterpriseDB on 2018-10-17 at 14:45

As you might know, Postgres is old-school in its communication methods, relying on email for the bulk of its development discussion, bug reports, and general assistance. That's the way it was done in 1996, and we are still doing it today. Of course, some projects have moved on to GitHub or Slack.

As a project, we try to take advantage of new technologies while retaining technologies that are still optimal, and email is arguably one of them. Email hasn't changed much since 1996, but email clients have. Previously all email was composed using a text-editor-like interface, which allowed for complex formatting and editing. New email tools, like Gmail, provide a more simplified interface, especially on mobile devices. This simplified interface is great for composing emails while commuting on a train, but less than ideal for communicating complex topics to thousands of people.

This email captures the requirements of communicating technical points to thousands of people, and the need for fine-grained composition. Basically, when you are emailing thousands of people, taking time to make the communication clear is worthwhile. However, this email explains the reality that many email tools are more tailored for effortless communication on devices with limited features.

Continue Reading »

Posted by Craig Kerstiens in CitusData on 2018-10-17 at 14:36

At Citus, whether it's looking at our own data or helping a customer debug a query, I end up writing a lot of SQL. When I write SQL I do my best to make sure it's readable in case others need to come along and understand or modify it, but admittedly I do have some bad habits from time to time, such as using implicit joins. Regardless of my bad habits, I still try to make my SQL and database as easy as possible to understand for someone not already familiar with it. One of the biggest tools for that is comments.

Even early on in learning to program we take advantage of comments to explain and describe what our code is doing, even when it seems obvious. I see this less commonly in SQL and databases, which is a shame: the data is just as valuable, so making it easier to reason about and work with seems logical. Postgres has a few great mechanisms you can start leveraging when it comes to commenting, so you can better document things.
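
One of those mechanisms is the COMMENT ON command, which attaches a description directly to a database object; for example (the comment text is made up, reusing the opportunities table from the query below):

COMMENT ON TABLE opportunities IS 'Sales opportunities synced nightly from the CRM';
COMMENT ON COLUMN opportunities.amount IS 'Expected deal size in USD';

-- the descriptions then show up in \d+ output, or can be queried directly:
SELECT obj_description('opportunities'::regclass, 'pg_class');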

Inline commenting your queries

The place I most use comments is in the larger queries I write. Yes, common table expressions can be an optimization fence, but they also allow you to create building blocks within your SQL, making a query easier to understand. You shouldn't stop there, though. Within SQL you can have a line comment wherever you preface it with --. You can do this at the start of a line, or at the end of a line, with what follows becoming a comment. We can see this in action in the query example from our earlier CTE blog post:

-- Getting a list of all opportunities opened longer than 30 days ago, but earlier than 60 days ago
WITH opp_list AS (
  SELECT opportunities.id as opportunity_id,
         account_id,
         accounts.name as account_name,
         opportunities.amount as opportunity_amount,
         opportunities.created_at as opportunity_created
  FROM opportunities,
       accounts
  WHERE opportunities.created_at <= now() - '30 days'::interval
    AND opportunities.created_at >= now() - '60 days'::interval
    AND opportunities.account_id = accounts.id
),

-- Get a list of all contacts we
[...]
PostgreSQL row level locks

An understanding of PostgreSQL locking is important for building scalable applications and avoiding downtime. Modern computers and servers have many CPU cores, and it's possible to execute multiple queries in parallel. A database contains many structures that must stay consistent, and if queries or background processes running in parallel changed them without coordination, the database could crash or data could even be corrupted. Thus we need the ability to block concurrent access while changing shared memory structures or rows: either one thread updates the structure while all others wait (exclusive lock), or multiple threads read the structure while all writes wait. The side effect of these waits is lock contention and wasted server resources. Thus it's important to understand why waits happen and what locks are involved. In this article, I review PostgreSQL row level locking.

In follow up posts, I will investigate table-level locks and latches protecting internal database structures.

Row locks – an overview

PostgreSQL has many locks at different abstraction levels. The most important locks for applications are related to the MVCC implementation: row level locking. In second place are locks that appear during maintenance tasks (backups, database migrations, schema changes): table level locking. It's also possible, though rare, to see waits on low level PostgreSQL locks. More often the symptom is high CPU usage, with many concurrent queries running, but overall server performance reduced in comparison with the normal number of queries running in parallel.

Example environment

To follow along, you need a PostgreSQL server with a single-column table containing several rows:

postgres=# CREATE TABLE locktest (c INT);
CREATE TABLE
postgres=# INSERT INTO locktest VALUES (1), (2);
INSERT 0 2

Row locks

Scenario: two concurrent transactions are trying to select a row for update.

PostgreSQL uses row-level locking in this case. Row level locking is tightly integrated with the MVCC implementation, and uses the hidden xmin and xmax fields. xmin and xmax store the transaction id. All statement[...]
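
A minimal way to reproduce the scenario with the locktest table created above, using two separate psql sessions:

-- session 1: lock the row; the locking transaction's id is recorded in xmax
BEGIN;
SELECT xmin, xmax, c FROM locktest WHERE c = 1 FOR UPDATE;

-- session 2: this statement blocks until session 1 commits or rolls back
SELECT c FROM locktest WHERE c = 1 FOR UPDATE;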

After a recent chat with a DBA making his first steps towards migrating a big database away from Oracle to Postgres, I thought there must be just too few articles on PostgreSQL features and "hacks" that help achieve scalability and squeeze the last drop out of the hardware, so that decent multi-terabyte databases can be accommodated safely. Especially as there are quite a few options out there, I was very surprised that there was so much fear that Postgres is somehow very limited in scaling matters. Well, maybe it was indeed once so (I started with Postgres in 2011), but in the year 2018 things are actually pretty solid, so please do read on for some ideas on how to juggle terabytes with ease.

Standard Postgres facilities

If you don't like to sweat too much and do some pioneering, then the safest way to scale would of course be to stick with the proven out-of-the-box features of Postgres. So first I'd recommend taking a look at the following keywords, with some short explanations; maybe that is all you need.

  • Light-weight / special purpose indexes

For a complex OLTP system, supporting hundreds of freaky queries, it is very common that the indexes actually take much more disk space than the table files holding the data. To improve on that (especially for indexes that are used infrequently) one can reduce the index sizes drastically with appropriate use of partial, BRIN, GIN or even a bit experimental BLOOM indexes. In total there are 7 different index types supported…but mostly people only know about and use the default B-tree – a big mistake in a multi-TB setting!

Partial indexes allow indexing only a subset of the data – for example in a sales system we might not be interested in fast access to orders in status “FINISHED” (some nightly reports deal with that usually and they can take their time), so why should we index such rows?
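
For instance, in the sales system just mentioned (table and column names are illustrative):

-- index only the orders that still need fast lookups
CREATE INDEX orders_active_idx ON orders (created_at)
    WHERE status <> 'FINISHED';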

GIN, perhaps the best known non-default index type, has actually been around for ages (full-text search) and in short is perfect for indexing columns where there are a lot of repeating values – th

[...]

It would be rather nice if the exact SQL definition (well almost, see below) supplied with the “create view” statement could be held in the PostgreSQL System Catalog, rather than the re-written optimised version that is currently held. This has come up before, see:

http://www.postgresql-archive.org/idea-storing-view-source-in-system-catalogs-td1987401.html
http://www.postgresql-archive.org/Preserving-the-source-code-of-views-td5775163.html

Back then this idea got stomped on for good reasons but I am not suggesting quite the same.

We would prefer PostgreSQL to store view definitions in the system catalog *AFTER* column expansion (unlike in the discussion above) but *BEFORE* internal parser rewrites. So, currently for example:

CREATE TABLE bar ( x INTEGER, y TEXT );

CREATE VIEW foo(x, y) AS
SELECT * FROM bar WHERE y IN ( 'a', 'b' ) AND x IS NOT DISTINCT FROM 0;

SELECT view_definition FROM information_schema.views
WHERE table_name = 'foo';

SELECT bar.x, bar.y
FROM bar
WHERE ((bar.y = ANY (ARRAY['a'::text, 'b'::text])) AND (NOT (bar.x IS DISTINCT FROM 0)));
(1 row)

The column expansion and corresponding rewrite is mandatory at the time of creation (see below), but the other changes in the definition are not. They make comparing the current definition of a view as it is in a database against an externally maintained definition difficult, which is what we in particular want to be able to do.

The SQL standard (at least the ancient version I have access to) for “information_schema” seems rather open to interpretation on this issue. It says that the view definition column:

“contains a representation of the view descriptors”

What do we take "representation" to be? Presumably a completely mangled, internally optimised "blob" would be compliant (but otherwise useless). The spirit of the statement might be more reasonably taken to be "a human legible representation". In that case, how much rewriting should actually be allowed? To me the rewritten definition above is already getting less legible than the original, but may

[...]
Posted by Bruce Momjian in EnterpriseDB on 2018-10-15 at 14:45

In the early years of Postgres's open source development, we focused on features that could be accomplished in a few weekends. Within a few years, we had completed many of those, and were challenged trying to accomplish big projects with a mostly volunteer workforce. Soon, however, large companies started to sponsor developers' time and we launched into big feature development again.

Currently, the calendar is our only real challenge. We have major releases every year, but many features take multiple years to fully implement. We have seen this multi-year process with:

  • Windows port
  • Point-in-time recovery
  • JSON
  • Streaming replication

and are in the process of completing even more:

  • Parallelism
  • Partitioning
  • JIT
  • Sharding

Continue Reading »

PgBouncer monitoring improvements in recent versions

As I wrote in my previous article "USE, RED and real world PgBouncer monitoring", there are some nice commands in PgBouncer's admin interface that let you collect stats on how things are going and spot problems, if you know where to look.

This post is about the new stats added to these commands in recent PgBouncer versions.

So as you know, SHOW STATS shows cumulative stats for each proxied DB:

Since PgBouncer version 1.8 there are a couple of new columns in its output.

The first one, total_xact_time, is the total number of microseconds spent by PgBouncer when connected to PostgreSQL in a transaction, either idle in transaction or executing queries.

This will allow us to chart db pool utilization in terms of time spent in transactions and compare it to the query time utilization:

We see two totally different situations: while the database is busy serving queries only 5 to 25% of the time, PgBouncer connections around 8:00 am spend up to 70% of their time in transactions!

But this total_xact_time is useful in one more very important way.

There's a known anti-pattern in Postgres usage: your application opens a transaction, makes a query, and then starts doing something else, for example a CPU-heavy calculation on the result or a query to some other resource/service/database, while the transaction keeps hanging. Later the app will probably return to this transaction and might, for example, update something and commit it. The bad thing in that case is that there's a corresponding Postgres backend process that sits there doing nothing while the transaction is idling. And Postgres backends are somewhat expensive.

Your app should avoid such behavior.

This idle in transaction state can be monitored in Postgres itself, via the state column of the pg_stat_activity system view. But pg_stat_activity provides us only with a snapshot of current states, which can lead to false negatives when reporting occurrences of such cases. Using PgBouncer's stats we can calculate a percenta

[...]
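
For reference, the pg_stat_activity snapshot mentioned above can be taken with a query along these lines:

-- sessions currently sitting idle inside an open transaction, and for how long
SELECT pid, usename, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;
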
Posted by pgCMH - Columbus, OH on 2018-10-15 at 04:00

The Oct meeting will be held at 18:00 EST on Tues, the 23rd. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

What

Our very own Douglas will be presenting this month. He's going to tell us all about how you can join the PostgreSQL community and contribute to the growth and success of PostgreSQL.

Where

CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

Community space as seen from the stage

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

Posted by Bruce Momjian in EnterpriseDB on 2018-10-12 at 15:00

PL/v8 is the JavaScript server-side language for Postgres. It has been available for several years, but a change in the way Google packages the v8 languages has made it burdensome for packagers to build PL/v8 packages.

Therefore, few package managers still distribute PL/v8. This is disappointing since it undercuts some of our NoSQL story. We previously supported JSON storage and a JSON-specific server-side language. This second option is effectively no longer available, and those users who are using PL/v8 will need to find alternatives. This highlights the risk of software significantly relying on other software that it does not control and cannot maintain itself.

PostgreSQL encryption and authorization

Authorisations and encryption/decryption within a database system establish the basic guidelines for protecting your database, guarding against malicious structural or data changes.

What are authorisations?

Authorisations are the access privileges that mainly control what a user can and cannot do on the database server, for one or more databases. Consider this like granting a key that unlocks specific doors. Think of it as your five star hotel smart card: it gives you access to all the facilities that are meant for you, but doesn't let you open every door. Privileged staff, on the other hand, have master keys which let them open any door.

Similarly, in the database world, granting permissions secures the system by allowing specific actions by specific users or user groups, while still letting the database administrator perform whatever actions on the database he/she wishes. PostgreSQL provides user management where you can create users, and grant and revoke their privileges.
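
A small sketch of what that looks like in practice (role, schema and table names are made up):

-- a hypothetical read-only reporting role
CREATE ROLE reporting LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA sales TO reporting;
GRANT SELECT ON ALL TABLES IN SCHEMA sales TO reporting;

-- and taking a privilege away again
REVOKE SELECT ON sales.salaries FROM reporting;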

Encryption

Encryption and decryption can protect your data, obfuscate the schema structure and help hide code from prying eyes. Encryption/decryption hides valuable information and ensures that there are no mischievous changes to the code or data that might be harmful. In almost all cases, data encryption and decryption happen on the database server. This is more like hiding your stuff somewhere in your room so that nobody can see it, while also making it difficult to access.

PostgreSQL also provides encryption using pgcrypto (PostgreSQL extension). There are some cases where you don’t want to hide the data, but don’t want people to update it either. You can revoke the privileges to modify the data.
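
To give a flavour of the pgcrypto route mentioned above (a minimal sketch; real deployments need proper key management rather than a literal passphrase):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- symmetric encryption of a value; the result is bytea
SELECT pgp_sym_encrypt('my sensitive value', 'a-strong-passphrase');

-- decrypting it again with the same passphrase returns the original text
SELECT pgp_sym_decrypt(
         pgp_sym_encrypt('my sensitive value', 'a-strong-passphrase'),
         'a-strong-passphrase');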

Data modifications

But what if an admin user modifies the data? How can you identify that the data has changed? If somebody changes the data and you don't know about it, that is more dangerous than losing your data, because you are relying on data which may no longer be valid.

Logs in database systems allow us to track back changes and “po

[...]

Studying Stored Procs in Postgres 11

With Postgres 11 looming on the near horizon, it’s only appropriate to check out a recent beta and kick the tires a few times. Whether it’s improvements in parallelism, partitions, stored procedures, JIT functionality, or any number of elements in the release page, there’s a lot to investigate.

It just so happens that I ran across a fortuitous event on Twitter when deciding on an appropriate topic. Behold!

Wasting XIDs has never been this fun

Wait! No! That’s not what stored procedures are for!

I felt so good like anything was possible

When confronted with such a blatant corruption of such a cool new feature, it's only natural to question the wisdom of doing so. It is, after all, not a great idea to programmatically consume transaction IDs. I said as much and moved on with life, certain the worst was over.

Then this happened.

Magnus helpfully recommends proceeding

Now, Magnus is infamous for two things: his technical acumen, and giddy malevolence. His advocacy of a stored procedure named “waste_xid” only proved nobody anywhere should ever run this anywhere, lest they immolate whatever system hosted the database instance.

But hey, VMs are cheap; let’s break things. How else can we learn the limits of our new toys, but by virtually atomizing them?

I hit cruise control and rubbed my eyes

Before we saddle our poor innocent Postgres 11 installation with an inadvisable stored procedure designed specifically to underhandedly reap its transaction lifespan, we should probably make the routine as evil as possible.

One thing stands out immediately: calling EXECUTE is unnecessary overhead. According to the information function documentation, txid_current will assign a new transaction ID if there isn't one already. Since the stored procedure is constantly committing, that's extremely handy. And since this is PL/pgSQL, we can use direct assignment instead.

Our new procedure looks something like this:

CREATE OR REPLACE PROCEDURE waste_xid(cnt int)
AS $$
DECLARE
    i INT;
    x BIGINT;
B
[...]
Posted by Pavel Stehule on 2018-10-11 at 14:40
I am working on a new release of the plpgsql_check extension - https://github.com/okbob/plpgsql_check.

An interesting new function is the possibility to return the list of relations and functions used by a routine. With this information it is easy to generate a dependency graph:

postgres=# \sf fx
CREATE OR REPLACE FUNCTION public.fx()
RETURNS void
LANGUAGE plpgsql
AS $function$
begin
perform upper(((plus(a) + 200) * 100)::text) from xx;
end;
$function$
postgres=# select * from plpgsql_show_dependency_tb('fx()');
┌──────────┬───────┬────────┬──────┬───────────┐
│   type   │  oid  │ schema │ name │  params   │
╞══════════╪═══════╪════════╪══════╪═══════════╡
│ FUNCTION │ 18310 │ public │ plus │ (integer) │
│ RELATION │ 24576 │ public │ xx   │           │
└──────────┴───────┴────────┴──────┴───────────┘
(2 rows)
Posted by Bruce Momjian in EnterpriseDB on 2018-10-10 at 18:45

Libpq is used by many client interface languages to communicate with the Postgres server. One new feature in Postgres 10 is the ability to specify multiple servers for connection attempts. Specifically, it allows the connection string to contain multiple sets of host, hostaddr, and port values. These are tried until one connects.

NoSQL solutions have used this method of multi-host connectivity for a while, so it is good Postgres can now do it too. It doesn't have all the features of a separate connection pooler, but it doesn't have the administrative or performance overhead of a separate connection pooler either, so it certainly fits a need for some environments.

Many applications these days want us to know how close we are to things:

  • What are the three closest coffee shops to my current location?
  • Which is the nearest airport to the office?
  • What are the two closest subway stops to the restaurant?

and countless more examples.

Another way of asking these questions is to say “who are my nearest neighbors to me?” This maps to a classic algorithmic problem: efficiently finding the K-nearest neighbors (or K-NN), where K is a constant. For example, the first question would be a 3-NN problem as we are trying to find the 3 closest coffee shops.

(If you are interested in learning more about K-NN problems in general, I highly recommend looking at how you can solve this using n-dimensional Voronoi diagrams, a wonderful data structure developed in the field of computational geometry.)

How can we use PostgreSQL to help us quickly find our closest neighbors? Let’s explore.
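
As a small teaser of one built-in approach (the table is hypothetical; the post explores the options in more depth), the point type's distance operator <-> can be paired with a GiST index so that an ORDER BY ... LIMIT query finds the K nearest rows efficiently:

CREATE TABLE coffee_shops (
    id   bigserial PRIMARY KEY,
    name text NOT NULL,
    geo  point NOT NULL
);

-- a KNN-capable GiST index lets the <-> distance operator drive the ORDER BY
CREATE INDEX coffee_shops_geo_idx ON coffee_shops USING gist (geo);

-- the 3 closest coffee shops to the current location (1.0, 2.0)
SELECT name
FROM coffee_shops
ORDER BY geo <-> point '(1.0, 2.0)'
LIMIT 3;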

Posted by Laurenz Albe in Cybertec on 2018-10-10 at 08:01
A foreign key is no wrong key!
© Laurenz Albe 2018

 

Foreign key constraints are an important tool to keep your database consistent while also documenting relationships between tables.

A fact that is often ignored is that foreign keys need proper indexing to perform well.

This article will explain that and show you how to search for missing indexes.

Index at the target of a foreign key

In the following, I will call the table on which the foreign key constraint is defined the source table and the referenced table the target table.

The referenced columns in the target table must have a primary key or unique constraint. Such constraints are implemented with unique indexes in PostgreSQL. Consequently, the target side of a foreign key is automatically indexed.

This is required so that there is always a well-defined row to which the foreign key points. The index also comes in handy if you want to find the row in the target table that matches a row in the source table.

Index at the source of a foreign key

In contrast to the above, PostgreSQL requires no index at the source of a foreign key.

However, such an index is quite useful for finding all source rows that reference a target row. The typical cases where you need that are:

1. You perform a join between the two tables where you explicitly search for the source rows referencing one or a few target rows. If there is an index on the columns at the source, PostgreSQL can use an efficient nested loop join.

This is well known and pretty obvious.

2. You delete rows or update key columns in the target table.

Then PostgreSQL has to check if the foreign key constraint is still satisfied. It does so by searching if there are rows in the source table that would become orphaned by the data modification. Without an index, this requires a sequential scan of the source table.

An example

Let’s build a source and a target table:

-- to make the plans look simpler
SET max_parallel_workers_per_gather = 0;
-- to speed up CREATE INDEX
SET maintenance_work_mem = '512MB';

CREATE TABLE target (
   t_id integer NOT NULL,
 
[...]
Posted by Andrew Dunstan in 2ndQuadrant on 2018-10-10 at 07:26

Postgres contains a moving event horizon, which is in effect about 2 billion transactions ahead of or behind the current transaction id. Transactions more than 2 billion ahead of or more than 2 billion behind the current transaction id are considered to be in the future, and will thus be invisible to current transactions.

Postgres avoids this catastrophic data loss by specially marking old rows so that no matter where they are in relation to the current transaction id they will be visible.

Freezing is this process of marking old live tuples (i.e. database rows) so that they don’t get run over by the moving event horizon that would otherwise make them appear to be in the future. This is in contrast to vacuuming, which is the freeing up of space consumed by old dead tuples that are no longer visible to any transaction.

Both processes are managed by vacuum.

There are a number of settings that govern how freezing is done.

First, vacuum_freeze_min_age governs whether or not a tuple will be frozen while vacuum is already looking at a page to see if it has dead tuples that can be cleaned up. Tuples older than vacuum_freeze_min_age will be frozen in this case. Setting this low means that there will be less work to do later on, but at the possible cost of extra effort both in CPU and IO or WAL activity. Generally you probably want this set to at least a few hours' worth of transactions. Let's say you're expecting to do up to 2000 transactions per second as a sustained rate. 2000 TPS is 7.2m transactions per hour. Thus a fairly aggressive setting for this case might be, say, 20m. The default setting is 50m. Similarly for vacuum_multixact_freeze_min_age. Note that the transaction_id and multixid counters are independent – you need to keep track of both of them.
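
One handy way to keep an eye on both counters is to check how old each table's oldest unfrozen tuples are (a sketch; what counts as "old" depends on the settings above):

SELECT relname,
       age(relfrozenxid)    AS xid_age,
       mxid_age(relminmxid) AS multixact_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;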

Second, there are vacuum_freeze_table_age and vacuum_multixact_freeze_table_age. These settings govern when autovacuum will not just look at pages that might have dead rows, but any page that might have unfrozen rows. The defaults for these settings are 150m. If you hav

[...]
PostgreSQL Monitoring

This is the last post in our series on building an enterprise-grade PostgreSQL setup using open source tools, and we'll be covering monitoring.

The previous posts in this series discussed aspects such as security, backup strategy, high availability, connection pooling and load balancing, extensions, and detailed logging in PostgreSQL. Tomorrow, Wednesday, October 10 at 10AM EST, we will be reviewing these topics together and showcasing them in practice in a webinar format: we hope you can join us!

 

Monitoring databases

The importance of monitoring the activity and health of production systems is unquestionable. When it comes to the database, with its high number of customizable settings, the ability to track its various metrics (status counters and gauges) allows for the maintenance of a historical record of its performance over time. This can be used for capacity planning, troubleshooting and validation.

When it comes to capacity planning, a monitoring solution helps you assess how the current setup is faring. At the same time, it can help predict future needs based on trends, such as the increase of active connections, queries, and CPU usage. For example, an increase in CPU usage might be due to a genuine increase in workload, but it could also be a sign of unoptimized queries growing in popularity. In that case, comparing CPU usage with disk access might provide a more complete view of what is going on.

Being able to easily correlate data like this helps you catch minor issues and plan accordingly, sometimes allowing you to avoid the easier but more costly solution of scaling up to mitigate problems like these. But having the right monitoring solution is truly invaluable when it comes to investigative work and root cause analysis: trying to understand a problem that has already taken place is a rather complicated, and often unenviable, task unless you have kept a continuous, watchful eye on the setup the whole time.

Finally, a monitoring solution can help you valida

[...]
Posted by Bruce Momjian in EnterpriseDB on 2018-10-09 at 14:30

Postgres support for hot standby servers allows read-only queries to be run on standby servers, but how are read-only sessions handled when promoting a standby server to primary? After a standby is promoted to primary, new connections are read/write, but existing connections also change to read/write:

SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
 
SHOW transaction_read_only;
 transaction_read_only
-----------------------
 on
 
\! touch /u/pg/data2/primary.trigger
 
-- wait five seconds for the trigger file to be detected
CREATE TABLE test (x INTEGER);
 
SELECT pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
 
SHOW transaction_read_only;
 transaction_read_only
-----------------------
 off

Continue Reading »

Detailed Logging in PostgreSQL

In this penultimate post from our series on building an enterprise-grade PostgreSQL environment, we cover the parameters we have enabled to configure detailed logging in the demo setup we will showcase in our upcoming webinar.

Detailed logging in PostgreSQL and log analyzer

Like other RDBMSs, PostgreSQL allows you to maintain a log of activities and error messages. Until PostgreSQL 9.6, log files were generated in the pg_log directory (inside the data directory) by default. Since PostgreSQL 10, pg_log has been renamed to simply log. The location can be changed to somewhere else entirely by modifying the log_directory parameter.

Unlike MySQL, PostgreSQL writes the error and activity log to the same log file, so it may grow to several GBs when detailed logging is enabled. In these cases logging becomes IO-intensive, so it is recommended to store log files on different storage from the one hosting the data directory.

Parameters to enable detailed logging

Here’s a list of parameters used to customize logging in PostgreSQL. All of them need to be modified in the postgresql.conf or postgresql.auto.conf files.

logging_collector: in order to log any activity in PostgreSQL this parameter must be enabled. The backend process responsible for logging database activity is called the logger; it gets started when logging_collector is set to ON. Changing this parameter requires a PostgreSQL restart.

log_min_duration_statement: this parameter is used primarily to set a time threshold: queries running longer than it are logged (as "slow queries"). Setting it to -1 disables the logging of statements. Setting it to 0 enables the logging of every statement running in the database, regardless of its duration. The time unit should follow the actual value, for example: 250ms, 250s, 250min, 1h. Changing this parameter does not require a PostgreSQL restart; a simple reload of the configuration is enough. For example:

log_min_duration_statement = 5s
  logs every statement running for 5 seconds o[...]

Monitoring PostgreSQL can, at times, be like trying to wrangle cattle in a thunderstorm. Applications connect and issue queries so quickly that it's hard to see what's going on, or even to get a good overview of the performance of the system beyond the typical developer complaint of 'things are slow, help!'

In previous articles, we’ve discussed how to Get to the Source when PostgreSQL is acting slow, but when the source is specifically queries, basic level monitoring may not be enough for assessing what’s going on in an active live environment.

Enter pg_top, a PostgreSQL specific program to monitor real time activity in a database, as well as view basic information for the database host itself. Much like the linux command ‘top’, running it brings the user into a live interactive display of database activity on the host, refreshing automatically in intervals.

Installation

Installing pg_top can be done in the generally expected ways: package managers and source install. The most recent version as of this article is 3.7.0.

Package Managers

Based on the distribution of Linux in question, search for pgtop or pg_top in the package manager; it's likely available in some form for the installed version of PostgreSQL on the system.

Red Hat based distros:

# sudo yum install pg_top

Debian based distros:

# sudo apt-get install pgtop

Source

If desired, pg_top can be installed via source from the PostgreSQL git repository. This will provide any version desired, even newer builds that are not yet in the official releases.

Features

Once installed, pg_top provides a very accurate real time view into the database it is monitoring, and running 'pg_top' from the command line launches the interactive PostgreSQL monitoring tool.

The tool itself can help shed light on all processes currently connected to the database.

Running pg_top

Launching pg_top works just like the Unix/Linux 'top' command itself, supplemented with connection information for the database.

To run pg_top on a local database host:

pg_top -h localhost
[...]
Posted by Pavel Stehule on 2018-10-08 at 08:03
Any Unix-like system has a great feature: a pager. The history of pagers is pretty old; originally they were invoked manually. The output of a program can be redirected to another program with the pipe operator.

ls -la | more

The simplest pager is more. It can scroll in only one direction. Usually this is the default pager on Unix systems.

A much more powerful pager is less. It has a lot of advantages and functionality, mainly the ability to scroll in both directions and strong navigation inside the result. It can display long lines and allows horizontal scrolling. This pager is necessary for comfortable use of the psql console.

export PAGER="less -S"
psql postgres
postgres=>select * from pg_class;

It is good to know some less commands:
  • g - move to document begin,
  • G - move to document end,
  • / - search string,
  • n - next occurrence of string,
  • N - previous occurrence of string,
  • q - quit.
less is a very rich pager and has special support for displaying man pages, but we cannot use this feature for database data. For tabular data, there is a special pager, pspg. This pager can freeze the first n columns and the first m rows. It knows almost all keyboard commands of the less pager, and adds some new ones (based on mcedit key bindings).
  • Alt k - new bookmark
  • Alt i - previous bookmark
  • Alt j - next bookmark
  • Alt n - show row numbers
  • Alt l - go to line
  • Ctrl Home - move to document begin
  • Ctrl End - move to document end
  • F9 - menu
  • F10 - quit
The usage of pspg is the same:
export PAGER=pspg
psql postgres
...
This pager is available from the PostgreSQL community repository or from git: https://github.com/okbob/pspg.