PostgreSQL
The world's most advanced open source database

On August 15th, Silicon Valley Postgres held their first Postgresql.org Patch Review workshop. The workshop was hosted by PostgresConf partner Pivotal. Attendees’ feedback indicates that we have found a great way to show how companies within the community can benefit Postgresql.org and Postgres user groups, and help us fulfill our mission:


There were 20 attendees, including Melanie Plageman of Pivotal presenting and coordinating “Postgres Patch Review,” and Postgresql.org hacker Jeff Davis providing hands-on help. This was the first workshop of its kind for Silicon Valley Postgres, and with the feedback received we are planning similar events at least twice a year to provide an opportunity that is currently not available elsewhere. (Sneak peek: there may be one at PostgresConf Silicon Valley on October 15th and 16th, which is right around the corner!)

Eight patches were reviewed, with three receiving specific feedback sent to the patch authors.
We’re excited to be a part of this path of building Postgres contributors. Hopefully it will be a model for other meetups and community partners to follow in continuing to contribute to the Most Advanced Open Source Database ecosystem.

Silicon Valley Postgres is the fastest-growing Postgres user group in North America. It was started just 12 months ago and will likely reach 700 members before Monday. The group exists to help all facets of Postgres users and the Postgres ecosystem, including, but not limited to, PostgreSQL, Greenplum, Aurora Postgres, AgensGraph, Yugabyte, ZomboDB, and TimescaleDB.

Silicon Valley Postgres is also working with PostgresConf Silicon Valley to bring a high-quality and cost-effective education solution to Postgres users within Silicon Valley. Early Bird tickets are now on sale. You will want to hurry, as the Early Bird price ends on September 1st.
[...]

In my previous post we looked at various partitioning techniques in PostgreSQL for efficient IoT data management. The basic objective behind time-based partitions is to achieve better performance, especially in IoT environments, where active data is usually the most recent data. New data is usually append-only, and it can grow pretty quickly depending on the frequency of the data points.

Some might ask why one would have multiple write nodes (as would be inherently needed in a BDR cluster) when a single node can effectively handle incoming IoT data using features such as time-based partitioning. Gartner estimated 8.4 billion connected devices in 2017, and it expects that number to grow to over 20 billion by 2020. At the scale at which connected devices operate, it becomes imperative to introduce additional write nodes into the cluster at some point just to handle the sheer number of devices connecting to the databases.

In a conventional master-standby setup, adding additional write nodes isn’t trivial. There are techniques one might use to introduce additional write nodes (as a separate cluster), but those techniques introduce many complexities that time-critical IoT applications cannot afford. As an example, consider a city’s smart grid station that needs to run analytics on past data to make adjustments to power for a specific time of the year.

A couple of key concepts with respect to data in an IoT environment are:

  1. Data localization
  2. Local and global analytics

Edge devices write to their local data store. Data security regulations might dictate data localization in many cases and therefore the ability to store data locally is important. Other important reasons include reducing read/write latency. However, it is equally important to be able to run analytics on the reported data to make various decisions.

Enterprises are increasingly operating at global levels and the ability to handle geographically distributed data is becoming increasingly important. This is where PostgreSQ

[...]
Posted by Mark Wong on 2018-08-15 at 03:25

When: 6-8pm Thursday August 16, 2018
Where: iovation
Who: Mark Wong
What: Sneak peek at stored procedures

Stored procedure support is coming to PostgreSQL 11.  Come hear about what a stored procedure is and how it differs from the existing user-defined functions.
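
For a taste of what the talk will cover, here is a minimal sketch of the new PostgreSQL 11 syntax (the table and procedure names are made up): unlike a function, a procedure is invoked with CALL and may issue transaction control commands such as COMMIT.

CREATE TABLE ledger (entry text);

CREATE PROCEDURE record_entries()
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO ledger VALUES ('first entry');
    COMMIT;                      -- allowed in a procedure, not in a function
    INSERT INTO ledger VALUES ('second entry');
END;
$$;

CALL record_entries();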

Mark leads the 2ndQuadrant performance practice as a Performance Consultant for English-speaking territories, based out of Oregon in the USA. He is a long-time contributor to PostgreSQL, co-organizer of the Portland PostgreSQL User Group, and serves as a Director and Treasurer for the United States PostgreSQL Association.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 3rd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry! For access to the 3rd floor of the plaza, please either take the lobby stairs to the third floor or take the plaza elevator (near Subway and Rabbit’s Cafe) to the third floor. There will be signs directing you to the meeting room. All attendees must check in at the iovation front desk.

See you there!

This month we are spending some time talking about the importance of database monitoring and auditing of your database environments. Over the past year (and in some cases even longer!) we have posted many technical resources to help you monitor and audit your PostgreSQL setups and we wanted to consolidate them into one blog for you to bookmark.

And make sure to join us on August 21st for our free webinar “An Introduction to Performance Monitoring for PostgreSQL” to learn how to operate PostgreSQL efficiently and keep it running at optimal levels.

Most Popular PostgreSQL Monitoring Resources

A Performance Cheat Sheet for PostgreSQL

Performance tuning is not trivial, but you can go a long way with a few basic guidelines. In this blog, we will discuss how to analyze the workload of the database, and then review a list of important configuration parameters to improve the performance of PostgreSQL.

Read More

Key Things to Monitor in PostgreSQL - Analyzing Your Workload

This blog provides an overview of key things you should know when monitoring your PostgreSQL database and its workload.

Read More

PostgreSQL Audit Logging Best Practices

This blog provides an introduction to audit logging, why you should do it, and specific tips for doing it on your PostgreSQL database environment.

Read More

How to Decode the PostgreSQL Error Logs

This blog provides the basic information you need to know to understand the PostgreSQL error logs and what to do when you find errors.

Read More

The Best Alert and Notification Tools for PostgreSQL

This blog highlights the top alert and notification systems available for PostgreSQL database deployments.

Read More

Monitoring your Databases with ClusterControl

Observability is a critical piece of the operations puzzle - you have to be able to tell the state of your systems based on trending data presented in the form of graphs and alerts. Ideally, this data will be available from a single location. This blog explains how ClusterControl can work as a monitoring hub for all your database systems.

R

[...]

PostgreSQL is a relational database management system. It’s even the world’s most advanced open source relational database. At its core, Postgres solves concurrent access to a set of data and maintains consistency while allowing concurrent operations.

In the PostgreSQL Concurrency series of articles here, we have covered several aspects of how to handle concurrent use cases in your application design with PostgreSQL. The main thing to remember is that a database management system’s first task is to handle concurrent access to the data for you.

Posted by pgCMH - Columbus, OH on 2018-08-14 at 04:00

The Aug meeting will be held at 18:00 EST on Tues, the 28th. Once again, we will be holding the meeting in the community space at CoverMyMeds. Please RSVP on MeetUp so we have an idea on the amount of food needed.

What

Our very own Arlette Garay (https://www.linkedin.com/in/arlette-garay-28075a2/) is going to share with us what she learned about the PostgreSQL database’s concept of a ‘cluster’. This will include discussing the object hierarchy and the on-disk locations of database objects.

We’re very excited for this talk, as Arlette has only recently made the jump from MySQL to PostgreSQL and this will be her very first public presentation! We hope everyone will come out and show their support for her.

Where

CoverMyMeds has graciously agreed to validate your parking if you use their garage so please park there:

You can safely ignore any sign saying to not park in the garage as long as it’s after 17:30 when you arrive.

Park in any space that is not marked ‘24 hour reserved’.

Once parked, take the elevator/stairs to the 3rd floor to reach the Miranova lobby. Once in the lobby, the elevator bank is in the back (West side) of the building. Take a left and walk down the hall until you see the elevator bank on your right. Grab an elevator up to the 11th floor. (If the elevator won’t let you pick the 11th floor, contact Doug or CJ (info below)). Once you exit the elevator, look to your left and right; one side will have visible cubicles, the other won’t. Head to the side without cubicles. You’re now in the community space:

Community space as seen from the stage

The kitchen is to your right (grab yourself a drink) and the meeting will be held to your left. Walk down the room towards the stage.

If you have any issues or questions with parking or the elevators, feel free to text/call Doug at +1.614.316.5079 or CJ at +1.740.407.7043

On 1st of August 2018, Peter Eisentraut committed patch: Allow multi-inserts during COPY into a partitioned table. CopyFrom allows multi-inserts to be used for non-partitioned tables, but this was disabled for partitioned tables. The reason for this appeared to be that the tuple may not belong to the same partition as the previous tuple did. […]

Rules are a powerful mechanism by which PostgreSQL allows a statement to be transformed into another. And PostgreSQL itself does use rules in order to make your life easier.

An example of PostgreSQL rules: updating pg_settings

When asked for a quick and sweet example about rules I often answer with the pg_settings example.

The special view pg_settings offers a tabular representation of the current cluster settings; in other words, it allows you to see postgresql.conf (and friends) as a table to run queries against.

But there is more than that: you can also issue UPDATE commands against such a table and get the configuration updated on the fly (this does not mean applied; that depends on the parameter context). Internally, PostgreSQL uses a very simple rule to cascade updates to pg_settings into the run-time configuration. The rule can be found in the system_views.sql file inside the backend source code and is implemented as:

CREATE RULE pg_settings_u AS
    ON UPDATE TO pg_settings
    WHERE new.name = old.name DO
    SELECT set_config(old.name, new.setting, 'f');

It simply reads as: whenever there is an update that leaves the parameter name untouched, invoke the special function set_config with the parameter name and its new value (the 'f' flag means the change is not local to the current transaction, so it applies to the rest of the session). For more information about set_config, see the function’s official documentation.
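
As a quick, hedged example of the rule in action (any parameter with a user-settable context works; enable_seqscan is just a convenient one):

UPDATE pg_settings SET setting = 'off' WHERE name = 'enable_seqscan';
-- the rule rewrote the UPDATE into set_config(), so the session now reflects the change
SELECT current_setting('enable_seqscan');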

How cool!

Posted by Regina Obe in PostGIS on 2018-08-11 at 00:00

The PostGIS development team is pleased to release PostGIS 2.5.0beta2.

This release is a work in progress. Remaining time will be focused on bug fixes and documentation until PostGIS 2.5.0 release. Although this release will work for PostgreSQL 9.4 and above, to take full advantage of what PostGIS 2.5 offers, you should be running PostgreSQL 11beta3+ and GEOS 3.7.0beta2 which were released recently.

Best served with PostgreSQL 11beta3 which was recently released.

Changes since PostGIS 2.5.0beta1 release are as follows:

  • 4115, Fix a bug that created MVTs with incorrect property values under parallel plans (Raúl Marín).
  • 4120, ST_AsMVTGeom: Clip using tile coordinates (Raúl Marín).
  • 4132, ST_Intersection on Raster now works without throwing TopologyException (Vinícius A.B. Schmidt, Darafei Praliaskouski)
  • 4109, Fix WKT parser accepting and interpreting numbers with multiple dots (Raúl Marín, Paul Ramsey)
  • 4140, Use user-provided CFLAGS in address standardizer and the topology module (Raúl Marín)
  • 4143, Fix backend crash when ST_OffsetCurve fails (Dan Baston)
  • 4145, Speedup MVT column parsing (Raúl Marín)

View all closed tickets for 2.5.0.

After installing the binaries or after running pg_upgrade, make sure to do:

ALTER EXTENSION postgis UPDATE;

If you use the other extensions packaged with postgis, make sure to upgrade those as well:

ALTER EXTENSION postgis_sfcgal UPDATE;
ALTER EXTENSION postgis_topology UPDATE;
ALTER EXTENSION postgis_tiger_geocoder UPDATE;

If you use legacy.sql or legacy_minimal.sql, make sure to rerun the version packaged with these releases.

2.5.0beta2

Posted by Dan Langille on 2018-08-09 at 22:03

Users and Roles in PostgreSQL are very similar. When I set up the FreshPorts database back in mid-2000, I was using PostgreSQL 7.0.3 (that’s my best guess based on my blog entry). I suspect roles were not available then and were introduced with PostgreSQL 8. I am positive someone will correct me if that’s wrong.

I now have a need to convert a user into a role, then add users to that role. Let’s see what happens.

I’m doing this on my development server, so there’s no concurrent access issue. I’ll just turn stuff off (disable the webserver, the scripts, etc).

Creating the new users

begin;
ALTER ROLE www NOLOGIN;
CREATE USER www_dev  WITH LOGIN PASSWORD '[redacted]' IN ROLE www;
CREATE USER www_beta WITH LOGIN PASSWORD '[redacted]' IN ROLE www;

That went well, so I issued a COMMIT.

The two new users will have the same permissions as the original user.

Changing the login

The login credentials will need to be changed. This is my update:

#       $db = pg_connect("host=pg02.example.org dbname=freshports user=www password=oldpassword sslmode=require");
        $db = pg_connect("host=pg02.example.org dbname=freshports user=www_beta password=newpassword sslmode=require");

Access rights

I also updated pg_hba.conf for this server.

#hostssl freshports      www          10.0.0.1/32             md5
hostssl  freshports      www_beta     10.0.0.1/32             md5

After changing pg_hba.conf, you have to tell PostgreSQL about it. This is the FreeBSD command for that:

sudo service postgresql reload

It just worked

I was impressed with how straightforward this was. https://beta.freshports.org/ came right up.

I have three other users to convert to roles but if it’s as easy as the above, I should be finished in time for dinner.

I spent a couple of days in São Paulo, Brazil last week, for the top-notch PGConf.Brazil 2018 experience. This year I gave a talk about improvements in the declarative partitioning area in the upcoming PostgreSQL 11 release — a huge step forward from what PostgreSQL 10 offers. We have some new features, some DDL handling enhancements, and some performance improvements, all worth checking out.

I’m told that the organization is going to publish video recordings at some point; for the time being, here’s my talk slides.

I’m very happy that they invited me to talk once again in Brazil. I had a great time there, even if they won’t allow me to give my talk in Spanish! Like every time I go there, I regret it once it’s time to come home, because it’s so easy to feel at home with the Brazilian gang. Next time, I promise I’ll make time for Sunday’s churrasco.

For our amusement, they posted a teaser video to share the community conference spirit.

Posted by Christophe Pettus in pgExperts on 2018-08-09 at 17:06

pg_rewind is a utility included with PostgreSQL since 9.5. It’s used to “rewind” a server so that it can be attached as a secondary to a primary. The server being rewound could be the former primary of the new primary, or a secondary that was a peer of the new primary.

In pg_rewind terminology, and in this post, the “source” server is the new primary that the old server is going to be attached to, and the “target” is the server that will be attached to the source as a secondary.
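
To make the terminology concrete, a typical invocation looks roughly like the following (a sketch only: the data directory path and connection string are placeholders, and the target must be cleanly shut down before rewinding):

# on the target (the server being rewound)
pg_ctl -D /path/to/target/data stop -m fast

pg_rewind --target-pgdata=/path/to/target/data \
          --source-server="host=new-primary.example.com port=5432 user=postgres dbname=postgres"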

Step One: Have a WAL Archive

While pg_rewind does not require that you have a WAL archive, you should have one. pg_rewind works by “backing up” the target server to a state before the last shared checkpoint of the two servers. Then, when the target starts up, it uses WAL information to replay itself to the appropriate point at which it can connect as a streaming replica to the source. To do that, it needs the WAL information from the rewind point onwards. Since the source had no reason to “know” that it would be used as a primary, it may not have enough WAL information in its pg_xlog / pg_wal directory to bring the target up to date. If it doesn’t, you are back to rebuilding the new secondary, the exact situation that pg_rewind is meant to avoid.

Thus, make sure you have a WAL archive that the target can consult as it is coming up.

Step Two: Properly Promote the Source Server

The source server, which will be the new primary, needs to be properly promoted. Use the pg_ctl promote option, or the trigger_file option in recovery.conf so that the source promotes itself, and starts a new timeline. Don’t just shut the source down, remove recovery.conf, and bring it back up! That doesn’t create a new timeline, and the source won’t have the appropriate divergence point from the target for pg_rewind to consult.

Step Three: Wait for the Forced Checkpoint to Complete

When a secondary is promoted to being a primary, it starts a forced checkpoint when it exits recovery mode. This checkpoint is a “fast” checkpoint, but it can still take a while, depending

[...]

This week we’re continuing our fun with SQL series. In past posts we’ve looked at generate_series, window functions, and recursive CTEs. This week we’re going to take a step backward and look at standard CTEs (common table expressions) within Postgres.

Admittedly SQL isn’t always the most friendly language to read. It’s a little more friendly to write, but even still not as naturally readable as something like Python. Despite its shortcomings, though, it is the lingua franca when it comes to data: SQL is the language and API that began with relational databases, and now even non-traditional databases are aiming to imitate it with their own SQL-like thing. With CTEs, though, our SQL, even queries hundreds of lines long, can become readable to someone without detailed knowledge of the application.

CTEs (common table expressions), often referred to as WITH clauses/queries, are essentially views that are valid during the course of a single query. They can reference earlier CTEs within that same query, essentially giving you separate building blocks from which you compose your queries. It is of note that CTEs are an optimization boundary, so in some cases they may have worse performance than their non-CTE alternatives. Even still, they’re incredibly useful for readability and should be considered when constructing large, complex queries. Let’s dig in with an example.
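
Before the CRM example below, here is a tiny made-up illustration of CTEs building on one another (the table and column names are invented):

WITH monthly_sales AS (
  SELECT date_trunc('month', created_at) AS month,
         sum(amount) AS total
  FROM orders
  GROUP BY 1
),
ranked_months AS (
  SELECT month,
         total,
         rank() OVER (ORDER BY total DESC) AS sales_rank
  FROM monthly_sales
)
SELECT month, total
FROM ranked_months
WHERE sales_rank <= 3;

Each CTE reads like a named step, which is exactly what makes long queries easier to follow.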

We’re going to assume a basic CRM schema where we have organizations that have contacts, which are tied to accounts, and those accounts have opportunities. In this CRM world we want to create a report that has all opportunities that were opened between 30 and 60 days ago and have a contact that was contacted within the last 30 days, grouped by the sales rep owner. The goal is to see that our sales reps are actively chasing the deals that they said exist.

Query for opportunities created 1-2 months ago

First we’re going to construct our query that gives us all opportunities opened in that range:

WITH opp_list AS (
  SELECT opportunities.id as o
[...]

AWS PostgreSQL services fall under the RDS umbrella, which is Amazon’s DBaaS offering for all known database engines.

Managed database services offer certain advantages that are appealing to the customer seeking independence from infrastructure maintenance, and highly available configurations. As always, there isn’t a one size fits all solution. The currently available options are highlighted below:

Aurora PostgreSQL

The Amazon Aurora FAQ page provides important details that need to be considered before diving into the product. For example, we learn that the storage layer is virtualized and sits on a proprietary virtualized storage system backed up by SSD.

Pricing

In terms of pricing, it must be noted that Aurora PostgreSQL is not available in the AWS Free Tier.

Compatibility

The same FAQ page makes it clear that Amazon doesn’t claim 100% PostgreSQL compatibility. Most (my emphasis) of the applications will be fine, e.g. the AWS PostgreSQL flavor is wire-compatible with PostgreSQL 9.6. As a result, the Wireshark PostgreSQL Dissector will work just fine.

Performance

Performance is also linked to the instance type, for example the maximum number of connections is by default configured based on the instance size.

Also important when it comes to compatibility is the page size that has been kept at 8KiB which is the PostgreSQL default page size. Speaking of pages it’s worth quoting the FAQ: “Unlike traditional database engines Amazon Aurora never pushes modified database pages to the storage layer, resulting in further IO consumption savings.” This is made possible because Amazon changed the way the page cache is managed, allowing it to remain in memory in case of database failure. This feature also benefits the database restart following a crash, allowing the recovery to happen much faster than in the traditional method of replaying the logs.

According to the

[...]

PostgreSQL has a variety of ways of telling time: now(), statement_timestamp(), and clock_timestamp(). Each has a different sense of when “now” is:

  • now() is the time at the start of the transaction; it never changes while the current transaction is open.
  • statement_timestamp() is the time that the current statement started running. It changes from statement to statement, but is constant within a statement (which means it is constant within a PL/pgSQL function).
  • clock_timestamp() changes each time it is called, regardless of context.

Each has its uses:

  • For a predicate in a WHERE clause, you want either now() or statement_timestamp(). These work properly with indexes, because they are constant within the execution of a statement.
  • If you need the time to update within a single transaction, use statement_timestamp(); otherwise, use now().
  • Generally, you only use clock_timestamp() inside of a programming language procedure so you can get the current timestamp.
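
A quick way to see the difference for yourself (a minimal sketch to run in psql):

BEGIN;
SELECT now(), statement_timestamp(), clock_timestamp();
SELECT pg_sleep(2);
-- now() is unchanged, statement_timestamp() has advanced,
-- and clock_timestamp() advances even between calls within one statement
SELECT now(), statement_timestamp(), clock_timestamp();
COMMIT;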
Posted by Douglas Hunley on 2018-08-07 at 12:53

I’ve noticed several individuals inquiring lately about pgBouncer and how they can avoid putting all users and their passwords in its auth_file. After the most recent such inquiry (hi Richard!) I decided I’d write this post to hopefully make it clearer how to use ‘pass-thru auth’ and avoid maintaining your users and their passwords in an external file. So let’s see what this takes, shall we?
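
For orientation, the end state we are working toward relies on pgBouncer’s auth_user and auth_query settings rather than a fully populated userlist.txt; a rough sketch of the relevant pgbouncer.ini pieces (the database name, lookup role, and paths are placeholders):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
; only the lookup role needs to live in the auth_file
auth_file = /usr/local/etc/userlist.txt
auth_user = pgbouncer_lookup
; pgBouncer connects as auth_user and runs this query to fetch password hashes for everyone else
auth_query = SELECT usename, passwd FROM pg_shadow WHERE usename = $1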

First, install pgBouncer as per your OS (yum, apt, brew, etc):

doug@ReturnOfTheMac ~> brew install pgbouncer
Updating Homebrew...
==> Auto-updated Homebrew!
Updated 1 tap (homebrew/core).
==> Updated Formulae
<snip>
==> Downloading https://homebrew.bintray.com/bottles/pgbouncer-1.8.1.high_sierra
######################################################################## 100.0%
==> Pouring pgbouncer-1.8.1.high_sierra.bottle.tar.gz
==> Caveats
The config file: /usr/local/etc/pgbouncer.ini is in the "ini" format and you
will need to edit it for your particular setup. See:
https://pgbouncer.github.io/config.html

The auth_file option should point to the /usr/local/etc/userlist.txt file which
can be populated by the /usr/local/opt/pgbouncer/bin/mkauth.py script.

To have launchd start pgbouncer now and restart at login:
  brew services start pgbouncer
Or, if you do not want/need a background service you can just run:
  pgbouncer -q /usr/local/etc/pgbouncer.ini
==> Summary
🍺  /usr/local/Cellar/pgbouncer/1.8.1: 17 files, 399.9KB

Great, so now we have pgBouncer installed.

To make life easier on ourselves, we’re going to temporarily enable trusted local socket connections in our pg_hba.conf:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all trust

Right now, this is the only line in my pg_hba.conf. Let’s SIGHUP the postmaster so it takes effect:

doug@ReturnOfTheMac ~> pg_ctl -D $PGDATA reload
server signaled

And test it:

doug@ReturnOfTheMac ~> unset PGPASSWORD ; psql -U doug -d doug -c "select now();"
┌────────────────────────
[...]

Observability is a critical piece of the operations puzzle - you have to be able to tell the state of your system based on data. Ideally, this data will be available from a single location. Having multiple applications, each handling separate pieces of data, is a direct path to serious trouble. When issues start, you have to be able to tell what is going on quickly rather than trying to analyze and merge reports from multiple sources.

ClusterControl, among other features, provides users with a single point from which to track the health of their databases. In this blog post, we will show some of the observability features in ClusterControl.

Overview Tab

The Overview section is a single place to track the state of one cluster, including all the cluster nodes as well as any load balancers.

It provides easy access to multiple pre-defined dashboards which show the most important information for the given type of cluster. ClusterControl supports different open source datastores, and different graphs are displayed based on the vendor. ClusterControl also gives an option to create your own, custom dashboards:

One key feature is that graphs are aggregated across all cluster nodes. This makes it easier to track the state of the whole cluster. If you want to check graphs from each of the nodes, you can easily do that:

By ticking “Show Servers”, all nodes in the cluster will be shown separately allowing you to drill down into each one of them.

Nodes Tab

If you would like to check a particular node in more details, you can do so from the “Nodes” tab.

Here you can find metrics related to a given host - CPU, disk, network, memory. All the important bits of data which define how a given server behaves and how loaded it is.

Nodes tab also gives you an option to check the database metrics for a given node:

All of those graphs are customizable, you can easily add more of them:

The Nodes tab also contains metrics related to nodes other than databases. For example, for ProxySQL, ClusterControl provides an extensive list of graphs to t

[...]
Posted by Luca Ferrari on 2018-08-07 at 00:00

pgxnclient is a wonderful cpan-like tool for the PGXN extension network. Unluckily, the client cannot handle PostgreSQL beta versions, so I submitted a really small patch to fix the issue.

pgxnclient and beta version

If you, like me, are addicted to terminal mode, you surely love a tool like pgxnclient that allows you to install extensions into PostgreSQL from the command line, much like cpan (and friends) does for Perl.

A few days ago, I ran into a problem: the load command does not work against a PostgreSQL 11 beta 2 server. At first I reported it with a ticket (https://github.com/dvarrazzo/pgxnclient/issues/29), but then curiosity hit me and I decided to have a look at the very well written source code.

Warning: I’m not a Python developer, or better, I’m a Python-idiot! This means the work I’ve done, even if it seems it works, could be totally wrong, so reviews are welcome.

First I got to the regular expression used to parse a version() output:

m = re.match(r'\S+\s+(\d+)\.(\d+)(?:\.(\d+))?', data) 

where data is the output of SELECT version();. Now, this works great for a version like 9.6.5 or 10.3, but does not work for 11beta2. Therefore, I decided to implement a two-level regular expression check: first search for two or three numbers, and if that fails, search for two numbers separated by the beta text string.

m = re.match(r'\S+\s+(\d+)\.(\d+)(?:\.(\d+))?', data)
if m is None:
    m = re.match(r'\S+\s+(\d+)beta(\d+)', data)
    is_beta...

One of the primary challenges with scaling SaaS applications is the database. While you can easily scale your application by adding more servers, scaling your database is a way harder problem. This is particularly true if your application benefits from relational database features, such as transactions, table joins, and database constraints.

At Citus, we make scaling your database easy. Over the past year, we added support for distributed transactions, made Rails and Django integration seamless, and expanded on our SQL support. We also documented approaches to scaling your SaaS database to thousands of customers.

Today, we’re excited to announce the latest release of our distributed database—Citus 7.5. With this release, we’re adding key features that make scaling your SaaS / multi-tenant database easier. If you’re into bulleted lists, these features include the following.

What’s New in Citus 7.5

  • Foreign key constraints from distributed to reference tables
  • SELECT .. FOR UPDATE
  • Per-customer (per-tenant) query statistics (aka Landlord)
  • Advanced Security: Row and column level access controls
  • Native integration with PostgreSQL extensions: HLL and TopN

To try these new features, you can download Citus packages on your local machine or create a Citus distributed database cluster on Citus Cloud. Or, keep reading to learn more about 7.5.

Example SaaS Application

Let’s write an ad analytics app which companies can use to view, change, and analyze their ads. Such an application has the characteristics of a typical SaaS / multi-tenant application. Data from different tenants is stored in a central database, and each tenant has an isolated view of their data.

First, you create two distributed tables using the standard PostgreSQL syntax.

CREATE TABLE companies (
    id bigint NOT NULL,
    name text NOT NULL,
    image_url text,
    CONSTRAINT pk_company_id PRIMARY KEY (id)
);

CREATE TABLE campaigns (
    id bigint NOT NULL,
    company_id bigint NOT NULL,
    country_id int NOT NULL,
    name text NOT NULL,
    cost_mo
[...]

Managing databases is no small task, and it can easily be frustrating without knowing what’s happening under the covers. Whether trying to find out if new indexes are helpful, the transaction count on a database at a certain time, or who’s connected to the database at any given time, data that allows administrators to truly know how their databases are performing is king. Luckily, with PostgreSQL, all of this data is available in the PostgreSQL system catalog.

The PostgreSQL System Catalog is a schema with tables and views that contain metadata about all the other objects inside the database and more. With it, we can discover when various operations happen, how tables or indexes are accessed, and even whether or not the database system is reading information from memory or needing to fetch data from disk.

Here we will give an overview of the system catalog, highlight how to read it, and show how to pull useful information from it. Some of the metadata is straightforward, and other pieces take a bit of digesting to generate really useful information. Either way, PostgreSQL gives us a great platform to build whatever information we need about the database itself.

The PostgreSQL Catalog

PostgreSQL stores the metadata information about the database and cluster in the schema ‘pg_catalog’. This information is partly used by PostgreSQL itself to keep track of things, but it is also exposed so that external people / processes can understand the inside of the databases too.

The PostgreSQL Catalog has a pretty solid rule: Look, don’t touch. While PostgreSQL stores all this information in tables like any other application would, the data in the tables is fully managed by PostgreSQL itself and should not be modified except in an absolute emergency, and even then a rebuild is likely in order afterwards.
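
As a quick taste of the kind of question the catalog can answer, here is a hedged example that addresses the “who’s connected right now” question from above:

-- who is connected, from where, and what are they running?
SELECT pid, usename, datname, client_addr, state, query
FROM pg_stat_activity
WHERE state IS NOT NULL
ORDER BY query_start;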

We will go over a few useful catalog tables, how to read the data, and clever things we can do with the data itself. There are quite a few tables in the catalog that we won’t go over, but all information for the

[...]
Posted by David Wheeler on 2018-08-02 at 04:31

For years, I’ve managed multiple versions of PostgreSQL by regularly editing and running a simple script that builds each major version from source and installs it in /usr/local. I would shut down the current version, remove the symlink to /usr/local/pgsql, symlink the one I wanted, and start it up again.

This is a pain in the ass.

Recently I wiped my work computer (because reasons) and started reinstalling all my usual tools. PostgreSQL, I decided, no longer needs to run as the postgres user from /usr/local. What would be much nicer, when it came time to test pgTAP against all supported versions of Postgres, would be to use a tool like plenv or rbenv to do all the work for me.

So I wrote pgenv. To use it, clone it into ~/.pgenv (or wherever you want) and add its bin directories to your $PATH environment variable:

git clone https://github.com/theory/pgenv.git
echo 'export PATH="$HOME/.pgenv/bin:$HOME/.pgenv/pgsql/bin:$PATH"' >> ~/.bash_profile

Then you’re ready to go:

pgenv build 10.4

A few minutes later, it’s there:

$ pgenv versions
pgsql-10.4

Let’s use it:

$ pgenv use 10.4
The files belonging to this database system will be owned by user "david".
This user must also own the server process.
#    (initdb output elided)
waiting for server to start.... done
server started
PostgreSQL 10.4 started

Now connect:

$ psql -U postgres
psql (10.4)
Type "help" for help.

postgres=# 

Easy. Each version you install – as far back as 8.0 – has the default super user postgres for compatibility with the usual system-installed version. It also builds all contrib modules, including PL/Perl using /usr/bin/perl.

With this little app in place, I quickly built all the versions I need. Check it out:

$ pgenv versions
     pgsql-10.3
  *  pgsql-10.4
     pgsql-11beta2
     pgsql-8.0.26
     pgsql-8.1.23
     pgsql-8.2.23
     pgsql-8.3.23
     pgsql-8.4.22
     pgsql-9.0.19
     pgsql-9.1.24
     pgsql-9.2.24
     pgsql-9.3.23
     pgsql-9.4.18
     pgsql-9.5.13
     pgsql-9.6.9

Other commands include start, stop, and restart, which

[...]

Crunchy Data recently announced the publication of the CIS PostgreSQL Benchmark by the Center for Internet Security, a nonprofit organization that provides publications around standards and best practices for securing technology systems. This CIS PostgreSQL Benchmark builds on earlier work that Crunchy started when it helped to publish the PostgreSQL Security Technical Implementation Guide (PostgreSQL STIG) and provides guidance and steps to help secure your PostgreSQL databases.

What is a CIS Benchmark?

A CIS Benchmark is a set of guidelines and best practices for securely configuring a target system. Authoring a CIS Benchmark is a collaborative process as CIS involves considerable peer reviews and discussion before a major version is published, to ensure there is a general consensus on the best practices for deploying a secure system.

A previous article in the PostgreSQL Concurrency series covered how to manage concurrent retweets in an efficient way: in Computing and Caching, we learnt how to maintain a cache right in your PostgreSQL database, using MATERIALIZED VIEWS. We also had a look at how to take care of Batch Updates and Concurrency.

While in the first case we are providing a solution to a technical problem where we want to solve performance issues while keeping the same semantics, in the second case we are actually implementing a part of the application’s Business Logic as a scheduled job.

Today’s article shows a modern technique to handle the scheduling of those business oriented activities that are not tied to any user activity. When thinking about it this way, you certainly don’t want to implement the backbone of your business logic in a shell script that’s directly maintained in the production environment, do you?

You will want to mark your calendars, folks: on August 15th the Call for Papers for PostgresConf Silicon Valley will close. That is just two weeks away!

So let's point your Firefox, Google Chrome, or Safari app to this link right here and get your presentation submitted.

This is the inaugural Silicon Valley conference and from the current submitted papers it is shaping up to be a fantastic event. We can't wait to see everyone and continue our mission of:




Postgres keeps getting better and better. In recent years, the Postgres community has added JSONB support, improved performance, and added so many usability enhancements. The result: you can work even more powerfully with your database. Over the past 8 years, my two favorite enhancements have been JSONB and pg_stat_statements. pg_stat_statements is a built-in extension that allows you to get high-level insights into queries that are being run as well as their performance—without having to be an expert and without needing a PhD in databases.

Introducing the new landlord feature in Citus 7.5

With Citus 7.5, we’ve gone one step beyond the awesomeness of pg_stat_statements and Postgres, with the new landlord feature in Citus—to give you per-tenant stats.

If you’re not (yet) familiar with our Citus distributed database, we serve a number of use cases from transactional SaaS applications (often for multi-tenant apps that need to grow, and grow, and grow) to powering real-time analytics for customer-facing dashboards.

Many of our multi-tenant customers often ask questions like:

  • Which tenant is most active?
  • How many of my queries are spanning shards?
  • Are any tenants creating load hotspots for me?

With the new landlord feature in Citus (citus_stat_statements), you can get clear insight into what your tenants are doing.

Getting started with landlord in Citus

Landlord, like pg_stat_statements, parameterizes every query that is run against your database and tracks how long it takes, along with more information about the various internal activities of each query. Let’s start at the high level by running a SELECT * against citus_stat_statements:

SELECT * 
FROM citus_stat_statements;
  queryid   | userid | dbid  |                          query                          |   executor    | partition_key | calls 
------------+--------+-------+---------------------------------------------------------+---------------+---------------+-------
 1175789717 |     10 | 12558 | select count(*) from emails;                            | real-time     |               |     1
 
[...]
In the previous blog, we saw that parallel index scan significantly improves the performance of quite a few TPC-H queries. It is natural to analyse whether its sister operator, namely index-only scan, will benefit similarly when parallelised.

Before getting into that, we will briefly discuss the utility of an index-only scan. In PostgreSQL, indexes are stored at a different location than the tables. For a regular index scan on a table, first the index created on that table is scanned to find the relevant leaf nodes, and then the table is scanned for those locations only. Now, if there is a query in which we need only the values of columns that have an index, then we can scan just the index tree and return the required data, since there is nothing extra we need to retrieve from the table; this type of scan is called an index-only scan. To be precise, index-only scans are a special type of access method which uses the index alone and does not need to fetch data from the heap.

For example, an index-only scan is likely to show a performance improvement over a regular index scan for a query such as SELECT count(*) FROM countries WHERE country_area <= <some value>, assuming we have an index on the column country_area. Here, we can find the tuples with less than the required country area from the index alone, hence saving the I/O needed to read the table.
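
A hedged sketch of how to observe this in a plan (the table is made up; with enough rows and a freshly vacuumed table, the planner should pick an index-only scan):

CREATE TABLE countries (name text, country_area bigint);
CREATE INDEX idx_country_area ON countries (country_area);
-- an up-to-date visibility map is what lets the scan skip the heap entirely
VACUUM ANALYZE countries;

EXPLAIN SELECT count(*) FROM countries WHERE country_area <= 100000;
-- expected (data permitting): an Aggregate over an Index Only Scan using idx_country_area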

The design and implementation of parallel index-only scan is heavily dependent on the machinery developed for scanning B-tree in parallel. There are no new GUC parameters or configuration settings required for this scan.

Performance of parallel index-only scan

For the industry-strength benchmark TPC-H at scale factor 300, the performance of Q13 is improved by almost 5x with the usage of parallel index-only scan. For this experiment we used the TPC-H inspired benchmark for PostgreSQL. We used scale factor 300, which gives a 300+ GB database, depending on the available indexes, etc. Additional indexes we created were on columns (l_shipmode, l_shipdate,
[...]

Note from Severalnines: This blog is being published posthumously as Berend Tober passed away on July 16, 2018. We honor his contributions to the PostgreSQL community and wish peace for our friend and guest writer.

Introduction

In a previous article we discussed the PostgreSQL serial pseudo-type, which is useful for populating synthetic key values with incrementing integers. We saw that employing the serial data type keyword in a table data definition language (DDL) statement is implemented as an integer type column declaration that is populated, upon a database insert, with a default value derived from a simple function call. This automated behavior of invoking functional code as part of the integral response to data manipulation language (DML) activity is a powerful feature of sophisticated relational database management systems (RDBMS) like PostgreSQL. In this article we delve further into another, more capable way to automatically invoke custom code, namely the use of triggers and stored functions.
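
To make the idea concrete before looking at use cases, here is the general shape of a trigger backed by a stored function (a minimal sketch; all table and column names are invented):

CREATE TABLE items (
    item_id integer PRIMARY KEY,
    price   numeric NOT NULL
);

CREATE TABLE price_audit (
    item_id    integer,
    old_price  numeric,
    new_price  numeric,
    changed_at timestamptz DEFAULT now()
);

CREATE FUNCTION log_price_change() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- record every price change alongside a timestamp
    INSERT INTO price_audit (item_id, old_price, new_price)
    VALUES (OLD.item_id, OLD.price, NEW.price);
    RETURN NEW;
END;
$$;

CREATE TRIGGER price_change_audit
    AFTER UPDATE OF price ON items
    FOR EACH ROW EXECUTE PROCEDURE log_price_change();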

Use Cases for Triggers and Stored Functions

Let’s talk about why you might want to invest in understanding triggers and stored functions. By building DML code into the database itself, you can avoid duplicate implementation of data-related code in multiple separate applications that may be built to interface with the database. This ensures consistent execution of DML code for data validation, data cleansing, or other functionality such as data auditing (i.e., logging changes) or maintaining a summary table independently of any calling application. Another common use of triggers and stored functions is to make views writable, i.e., to enable inserts and/or updates on complex views or to protect certain column data from unauthorized modification. Additionally, data processed on the server rather than in application code does not cross the network, so there is some lesser risk of data being exposed to eavesdropping as well as a reduction in network congestion. Also, in PostgreSQL stored functions can be configure

[...]

In this world of all things digital where so many of us are online so much of the time—what with architecting, coding, QA'ing, blogging, and slacking—it’s kind of refreshing to step away from our devices and talk to other humans face-to-face at an event.

Especially when it’s a conference chock full of PostgreSQL open source people, from users to developers to community leaders.

Especially when it’s right in our own backyard here in San Francisco.

Especially when it’s PostgresOpen SV 2018.

While Citus Data is a global team—our engineering team is distributed across 5 countries from Canada to the US to Amsterdam to France to Turkey—our headquarters is South of Market in San Francisco, and our SF office is practically walking distance from the conference.

Ok, well, maybe it’s a bit too far to walk. But the point is, you could walk if you wanted to.

PostgresOpen SV 2018 will be in San Francisco on Sep 5-7

Just like last year, PostgresOpen SV will take place at the Parc 55 Hilton Hotel in SF—and bills itself as two days packed with rich PostgreSQL talks and sessions, plus another day of tutorials and trainings.

San Francisco is lovely in September, with amazing food at a million different restaurants, beautiful sunsets and views of the Golden Gate if you go for a walk at Crissy Field, and hills with trolley cars for your hill-riding pleasure. The weather is, well, I’m not a meteorologist, so who knows what it’ll be, but if you’re lucky you’ll get to see some of the ethereal fog that Karl the Fog’s twitter feed raves about.

The reason most of us like PostgresOpen SV: the people and the conversations.

Oh, and the Parc 55 Hilton Hotel is right off BART (one of our public transportation options in the Bay Area, that connects you to the SFO airport as well as Oakland and the East Bay) and within a 5 minute walk from shopping at Union Square. Once you arrive at the venue, you don’t even need to take a taxi, nor a Lyft, nor an Uber.

We’re looking forward to all the talks about Postgres'y things

Like you’d expect, pretty muc

[...]
In the last blog entry, I talked about creating two instances of PostgreSQL 11 in the Amazon RDS Database Preview Environment and setting up native logical replication. Today, Amazon RDS announced support for PostgreSQL 10.4 with native logical replication. Let's see how to use this new support to replicate data from PostgreSQL 10 in Amazon RDS into PostgreSQL 11 instances in the preview environment.

I started with a new PostgreSQL 10.4 instance in Amazon RDS and populated it with data from an older IMDB dataset.

benchdb-> \d
                          List of relations
 Schema |                 Name                  |   Type   |  Owner
--------+---------------------------------------+----------+---------
 public | acted_in                              | table    | pgadmin
 public | acted_in_idacted_in_seq               | sequence | pgadmin
 public | actors                                | table    | pgadmin
 public | aka_names                             | table    | pgadmin
 public | aka_names_idaka_names_seq             | sequence | pgadmin
 public | aka_titles                            | table    | pgadmin
 public | aka_titles_idaka_titles_seq           | sequence | pgadmin
 public | genres                                | table    | pgadmin
 public | keywords                              | table    | pgadmin
 public | movies                                | table    | pgadmin
 public | movies_genres                         | table    | pgadmin
 public | movies_genres_idmovies_genres_seq     | sequence | pgadmin
 public | movies_keywords                       | table    | pgadmin
 public | movies_keywords_idmovies_keywords_seq | sequence | pgadmin
 public | series                                | table    | pgadmin
(15 rows)

This "production" PostgreSQL 10 database also has data in it.

benchdb=> select count(*) from acted_in;
 count
--------
 618706
(1 row)

benchdb=> select count(*) from movies;
 count
--------
 183510
(1 row)

benchdb=> select count(*) from series;
 count
--------
 162498
(1 row)
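
The next step is to publish the tables on the PostgreSQL 10 side and subscribe to them from the PostgreSQL 11 side; a minimal sketch of what native logical replication setup looks like (the endpoint, database, and credential values are placeholders):

-- on the PostgreSQL 10.4 "production" instance: publish the tables
CREATE PUBLICATION imdb_pub FOR ALL TABLES;

-- on the PostgreSQL 11 preview instance: subscribe to the publication
CREATE SUBSCRIPTION imdb_sub
    CONNECTION 'host=pg10-instance.example.com dbname=benchdb user=pgadmin password=********'
    PUBLICATION imdb_pub;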


In ord
[...]
Posted by Marco Slot in CitusData on 2018-07-25 at 15:20

One of the many unique abilities of SQL databases is to transform data using advanced SQL queries and joins in a transactional manner. Commands like UPDATE and DELETE are commonly used for manipulating individual rows, but they become truly powerful when you can use subqueries to determine which rows to modify and how to modify them. It allows you to implement batch processing operations in a thread-safe, transactional, scalable manner.

Citus recently added support for UPDATE/DELETE commands with subqueries that span across all the data. Together with the CTE infrastructure that we’ve introduced over the past few releases, this gives you a new set of powerful distributed data transformation commands. As always, we’ve made sure that queries are executed as quickly and efficiently as possible by spreading out the work to where the data is stored.

Let’s look at an example of how you can use UPDATE/DELETE with subqueries.

Batching data transformations using DML with subqueries

Let’s say we have a website where people can review movies. The review score is always shown alongside the title of the movie, so we want looking up the score to be fast, and the fastest way is to store it alongside the movie itself.

CREATE TABLE movies (
    movie_id bigint primary key,
    movie_name text not null,
    release_date date,
    movie_details jsonb,
    review_score double precision,
    review_count bigint not null default 0
);

SELECT create_distributed_table('movies', 'movie_id');

INSERT INTO 
    movies (movie_id, movie_name, release_date) 
VALUES
    (1, 'Star Wars: Episode I – The Phantom Menace', '1999-05-19'),
    (4, 'Star Wars: Episode IV - A New Hope', '1977-12-15');

It’s important to realise that updating the review score whenever a review is added is often a bad idea. An UPDATE takes a row-level lock, which blocks concurrent UPDATEs on the same row. If many people review a movie at the same time (e.g. when it was just released), then they will all have to wait for each other. Moreover, Postgres implements updates as a

[...]