With more than 200 events submitted and approximately 80 slots to be filled, this has been the most difficult schedule to arrange in the history of PostgresConf. We wanted to include the vast majority of the content we received. It is that level of community support that we work so hard to achieve, and we are thankful to the community for supporting PostgresConf. There is no doubt that the number one hurdle the community must overcome is effective access to education on Postgres. The US 2018 event addresses this with two full days of training and three full days of breakout sessions, including the Regulated Industry Summit and Greenplum Summit.

For your enjoyment and education here is our almost granite schedule!

See something you like? Then it is time to buy those tickets!

This event would not be possible without the continued support from the community and our ecosystem partners:

A couple of times, in various places, I was asked: what is the benefit of upgrading to some_version? So far, I just read the release docs and compiled a list of what has changed. But this is not necessarily simple – consider an upgrade from 9.3.2 to 10.2. That's a lot of changes. So, to be able to answer […]

Last week I wrote about locking behaviour in Postgres, which commands block each other, and how you can diagnose blocked commands. Of course, after the diagnosis you may also want a cure. With Postgres it is possible to shoot yourself in the foot, but Postgres also offers you a way to stay on target. These are some of the important do’s and don’ts that we’ve seen as helpful when working with users to migrate from their single node Postgres database to Citus or when building new real-time analytics apps on Citus.

1: Never add a column with a default value

A golden rule of PostgreSQL is: When you add a column to a table in production, never specify a default.

Adding a column takes a very aggressive lock on the table, which blocks reads and writes. If you add a column with a default, PostgreSQL will rewrite the whole table to fill in the default for every row, which can take hours on large tables. In the meantime, all queries will block, so your database will be unavailable.

Don’t do this:

-- reads and writes block until it is fully rewritten (hours?)
ALTER TABLE items ADD COLUMN last_update timestamptz DEFAULT now();

Do this instead:

-- select, update, insert, and delete block until the catalog is updated (milliseconds)
ALTER TABLE items ADD COLUMN last_update timestamptz;
-- select and insert go through, some updates and deletes block while the table is rewritten
UPDATE items SET last_update = now();

Or better yet, avoid blocking updates and deletes for a long time by updating in small batches, e.g.:

do {
  numRowsUpdated = executeUpdate(
    "UPDATE items SET last_update = ? " +
    "WHERE ctid IN (SELECT ctid FROM items WHERE last_update IS NULL LIMIT 5000)",
    now);
} while (numRowsUpdated > 0);

This way, you can add and populate a new column with minimal interruption to your users.

2: Beware of lock queues, use lock timeouts

Every lock in PostgreSQL has a queue. If a transaction B tries to acquire a lock that is already held by transaction A with a conflicting lock level, then transaction B will wait in the l

[...]
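
To illustrate the lock-timeout advice in the heading above, here is a minimal sketch (not taken from the post; it reuses the items table from the earlier examples):

-- cap how long the ALTER TABLE may wait in the lock queue; if it cannot get the
-- lock within 2 seconds it fails fast instead of stalling every query queued behind it
SET lock_timeout = '2s';
ALTER TABLE items ADD COLUMN last_update timestamptz;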

One of the many reasons that PostgreSQL is fun to develop with is its robust collection of data types, such as the range type. Range types were introduced in PostgreSQL 9.2 with out-of-the-box support for numeric (integers, numerics) and temporal ranges (dates, timestamps), with infrastructure in place to create ranges of other data types (e.g. inet/cidr type ranges). Range data is found in many applications, from science to finance, and being able to efficiently compare ranges in PostgreSQL can take the onus off of application workloads.
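
For instance, a quick illustrative query (not from the post) comparing two integer ranges with the overlap operator:

-- && tests whether two ranges overlap; an index on a range column can speed this up
SELECT int4range(10, 20) && int4range(15, 25);   -- returns true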

As more and more posts on PostgreSQL 11 appear on the web, the more outdated you may feel when using Postgres 9. Although PostgreSQL 10 was released only a few months ago, people are already talking about the next version. Things are moving, so you don't want to be left behind. In this blog we will discuss what you need to know to upgrade to the latest version, Postgres 10.

Upgrade Options

The first thing you should be aware of before you start is that there are several ways of doing the upgrade:

  1. Traditional pg_dumpall(pg_dump) / pg_restore(psql)
  2. Traditional pg_upgrade
  3. Trigger based replication (Slony, self-written)
  4. Using pglogical replication

Why is there such a variety? Because each has a different history, requiring different efforts to be set up and offering different services. Let's look closer at each of them.

Traditional Dump/Restore

pg_dump t > /tmp/f        # dump database "t" from the old cluster
psql -p 5433 -f /tmp/f    # restore it into the new cluster listening on port 5433

Traditional dump/restore takes the longest time to complete and yet it is often a popular choice for those who can afford the downtime. First, it's as easy as taking a logical backup and restoring it to a new, higher version of the database. You could say it's not an upgrade, really, as you "import" your data to a "new structure". As a result you will end up with two setups - one old (lower version) and the newly upgraded one. If the restoration process finishes without error, you are pretty much there. If not, you have to modify the existing old cluster to eliminate any errors and start the process over again.

If you use psql for import, you might also need to create some preload scripts yourself to execute on the new setup prior to migration. For example, you would want to run pg_dumpall -g to get a list of the roles needed in the new setup, or, conversely, run pg_dump -x to skip permissions from the old one. This process is pretty simple on small databases; the complexity grows with the size and complexity of your db structure and depends on what features you have set up. Basically, for this method to be successf

[...]
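
A minimal sketch of that preparation step might look like the following (port 5433 stands for the new cluster, as in the earlier snippet):

pg_dumpall -g > /tmp/globals.sql   # dump roles and other global objects from the old cluster
psql -p 5433 -f /tmp/globals.sql   # create them in the new cluster before importing the data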

The Citus distributed database scales out PostgreSQL through sharding, replication, and query parallelization. For replication, our database as a service (by default) leverages the streaming replication logic built into Postgres.

When we talk to Citus users, we often hear questions about setting up Postgres high availability (HA) clusters and managing backups. How do you handle replication and machine failures? What challenges do you run into when setting up Postgres HA?

The PostgreSQL database follows a straightforward replication model. In this model, all writes go to a primary node. The primary node then locally applies those changes and propagates them to secondary nodes.

In the context of Postgres, the built-in replication (known as “streaming replication”) comes with several challenges:

  • Postgres replication doesn’t come with built-in monitoring and failover. When the primary node fails, you need to promote a secondary to be the new primary. This promotion needs to happen in a way where clients write to only one primary node, and they don’t observe data inconsistencies.
  • Many Postgres clients (written in different programming languages) talk to a single endpoint. When the primary node fails, these clients will keep retrying the same IP or DNS name. This makes failover visible to the application.
  • Postgres replicates its entire state. When you need to construct a new secondary node, the secondary needs to replay the entire history of state change from the primary node. This process is resource intensive—and makes it expensive to kill nodes in the head and bring up new ones.

The first two challenges are well understood. Since the last challenge isn’t as widely recognized, we’ll examine it in this blog post.

Three approaches to replication in PostgreSQL

Most people think that when you have a primary and secondary architecture, there’s only one way to set up replication and backups. In practice, Postgres deployments follow one of three approaches.

  1. PostgreSQL streaming replication to replicate data from primary
[...]
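
For reference, a bare-bones streaming replication setup on PostgreSQL 10 might look roughly like this (hostnames, paths, and the replication role are placeholders of mine, not taken from the post):

# on the primary: postgresql.conf (plus a 'replication' entry for the role in pg_hba.conf)
wal_level = replica
max_wal_senders = 5

# on the secondary: clone the primary and have -R write a recovery.conf pointing back at it
pg_basebackup -h primary-host -U replicator -D /var/lib/pgsql/10/data -R
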
Some time ago, I discovered Netdata. This tool allows you to collect many metrics (CPU, network, etc.). The strength of Netdata is that it is quite light and easy to use. It collects data every second; everything is stored in memory, and there is actually no history. The goal is to have access to a set of metrics in real time. I added several charts for PostgreSQL (autovacuum, bgwriter), taking a lot of ideas from check_pgactivity.
Posted by Simon Riggs in 2ndQuadrant on 2018-02-20 at 16:44

Various limits on the PostgreSQL database are listed here: https://www.postgresql.org/about/

One of those limits is the Maximum Table Size, listed as 32TB. It’s been that way for many years now.

Only problem is that it has always been wrong, slightly. And now it's got much, much bigger.

The table size is limited by the maximum number of blocks in a table, which is 2^32 blocks. The default block size is 8192 bytes, hence the default limit was 32TB as listed. That was wrong in two ways, because PostgreSQL has always had a configurable block size which allows up to 32768 bytes per block, which would give a maximum size of 128TB. Only problem is that requires an unload/reload to change the block size, so the effective limit per table was 32TB as advertised.

PostgreSQL has always supported Table Inheritance, which has been used in some cases to implement something similar to Table Partitioning in other databases. The big blocker there was that this wasn't handled well in the Optimizer, so wasn't easily usable. PostgreSQL's initial attempt at that was by myself in PostgreSQL 8.1 in 2005, where we introduced constraint_exclusion, though by my own admission that needed more major work. Various tweaks helped, but didn't change the game significantly. Major work came in the form of two failed attempts to add Partitioning, the first one using Rules and the second one using Triggers, neither of which was very practical. Luckily the next attempt was some years in the planning and started from a good design before it was written, leading to a successful implementation of Declarative Partitioning in PostgreSQL 10. There is still work to be done and I'm pleased to say it looks like many of the issues will be solved in PostgreSQL 11, with much work contributed by a great many folk from the big names in PostgreSQL new feature development: 2ndQuadrant, EnterpriseDB, NTT Data (listed alphabetically).

Oh yeah, maximum table size. Partitioning theoretically allows you to have one big table made up of many smaller tables. We store th

[...]
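
As a quick illustration of the declarative partitioning syntax that arrived in PostgreSQL 10 (table names are mine, not from the post):

-- the parent table is partitioned by range; each partition holds one year of data
CREATE TABLE measurements (
    logdate  date NOT NULL,
    reading  numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2018 PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');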

PostgreSQL error reporting follows a style guide aimed at providing the database administrator with the information required to efficiently troubleshoot issues. Error messages normally contain a short description, followed by some detailed information, and a hint, if applicable, suggesting the solution. There are other fine details, explained in the guide, such as the use of past or present tense to indicate if the error is temporary or permanent.

Types of Errors and Severity Levels

When reporting errors, PostgreSQL will also return an SQLSTATE error code, and errors are therefore classified into several classes. When reviewing the list of classes, note that success and warning messages are also logged by PostgreSQL to the error log — that is because logging_collector, the PostgreSQL process responsible for logging, sends all messages to stderr by default.

When the logging collector has not been initialized, errors are logged to the system log. For example, when attempting to start the service following the package installation:

[root@omiday ~]# systemctl start postgresql
Job for postgresql.service failed because the control process exited with error code.
See "systemctl  status postgresql.service" and "journalctl  -xe" for details.
[root@omiday ~]# systemctl status postgresql
● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2018-01-24 19:10:04 PST; 8s ago
Process: 1945 ExecStartPre=/usr/libexec/postgresql-check-db-dir postgresql (code=exited, status=1/FAILURE)

Jan 24 19:10:04 omiday.can.local systemd[1]: Starting PostgreSQL database server...
Jan 24 19:10:04 omiday.can.local postgresql-check-db-dir[1945]: Directory "/var/lib/pgsql/data" is missing or empty.
Jan 24 19:10:04 omiday.can.local postgresql-check-db-dir[1945]: Use "/usr/bin/postgresql-setup --initdb"
Jan 24 19:10:04 omiday.can.local postgresql-check-db-dir[1945]: to initialize the database cluster.
Jan 24 19:10:04 omi
[...]

PostgreSQL is a powerful RDBMS which can store data as well as perform some manipulations on it, such as searching or calculating analytical information, and one of these techniques is FTS. Full Text Search (or FTS for short) was introduced into PostgreSQL first as the tsearch extension and added to PostgreSQL 8.3 as a core feature. Unfortunately, FTS has changed little since PostgreSQL 8.3 and has some issues that were discovered over the years of working with it. I'm working on a patch to extend the abilities and flexibility of the FTS configuration mechanism in PostgreSQL, and I want to describe it in more detail in this post.
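
For readers new to FTS, a minimal illustrative query (not from the post) matching a document against a search term:

-- to_tsvector normalizes the document, to_tsquery the search term; @@ tests for a match
SELECT to_tsvector('english', 'PostgreSQL full text search is flexible')
       @@ to_tsquery('english', 'flexible & search');   -- returns true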

“How does the PostgreSQL optimizer handle views?” or “Are views good or bad?” I assume that every database consultant and every SQL performance expert has heard this kind of question in the past. Most likely this does not only hold true for PostgreSQL experts but is also true for Oracle, DB2 and MS SQL consultants. Given the fact that views are a really essential feature of SQL, it makes sense to take a closer look at the topic in general and hopefully help some people to write better and faster code.

PostgreSQL: What it does to views

Let us create a simple table containing just 10 rows, which can be used throughout the blog to show how PostgreSQL works and how the optimizer treats things:

test=# CREATE TABLE data AS
        SELECT  *
        FROM    generate_series(1, 10) AS id;
SELECT 10

Then I have created a very simple view:

test=# CREATE VIEW v1 AS
        SELECT  *
        FROM    data
        WHERE   id < 4;
CREATE VIEW

The idea here is simply to filter some data and return all the columns.

Inlining and flattening

The key thing is: The optimizer will process the view just like a “preprocessor” directive. It will try to inline the code and flatten it. Here is an example:

test=# explain SELECT * FROM v1;
                      QUERY PLAN                       
-------------------------------------------------------
 Seq Scan on data  (cost=0.00..41.88 rows=850 width=4)
   Filter: (id < 4)
(2 rows)

When we try to read from the view it is just like running the SQL statement directly. The optimizer will perform the following steps:

SELECT  *
FROM    (SELECT         *
                FROM    data
                WHERE   id < 4
        ) AS v1;

In the next step the subselect will be flattened out completely, which leaves us with:

SELECT * FROM data WHERE id < 4;

We could have done this transformation ourselves but it is easier to make the optimizer do it for us and enjoy the convenience offered by the view.

Joining views

Basically the same mechanism applies, when you are working with joins. Postg

[...]
Posted by Oleg Bartunov in Postgres Professional on 2018-02-19 at 08:56
Nikita Glukhov has managed to compile the postgresql (master branch) + sql/json patches for Termux, so now it's possible to play with sql/json on an Android phone, using the deb-file postgresql-sqljson_11.0-dev-0_aarch64.deb.

You can download the sample database to play with examples in Jsonpath Introduction.



In case you need more details:


0. Install Termux from Google Play. I also installed "Hacker keyboard" to be comfortable with the command line.
1. Make sure Termux has access to /sdcard (check under "Settings" -> "Apps" -> "Termux" -> "Permissions" -> "Storage").
2. Download the deb-file postgresql-sqljson_11.0-dev-0_aarch64.deb to your phone. Another option is to install curl (pkg install curl) and then download the deb-file directly to your home directory.
3. Run 'dpkg -i /sdcard/Download/postgresql-sqljson_11.0-dev-0_aarch64.deb'.
4. Initialize cluster 'initdb -D ~/data'
5. Run postgres 'pg_ctl -D ~/data start'
6. psql -d postgres


Example table house.sql:

CREATE TABLE house(js) AS SELECT jsonb '
{
  "info": {
    "contacts": "Postgres Professional\n+7 (495) 150-06-91\ninfo@postgrespro.ru",
    "dates": ["01-02-2015", "04-10-1957 19:28:34 +00", "12-04-1961 09:07:00 +03"]
  },
  "address": {
    "country": "Russia",
    "city": "Moscow",
    "street": "117036, Dmitriya Ulyanova, 7A"
  },
  "lift": false,
  "floor": [
    {
      "level": 1,
      "apt": [
        {"no": 1, "area": 40, "rooms": 1},
        {"no": 2, "area": 80, "rooms": 3},
        {"no": 3, "area": null, "rooms": 2}
      ]
    },
    {
      "level": 2,
      "apt": [
        {"no": 4, "area": 100, "rooms": 3},
        {"no": 5, "area": 60, "rooms": 2}
      ]
    }
  ]
}

';



Last week Tom Lane pushed a few significant and big patches that impact the plpgsql engine. Probably any plpgsql-related tool will need to be fixed a lot. plpgsql_check is fixed already; I rechecked support for PostgreSQL 9.4, 9.5, 9.6 and 10.
I was faced with an interesting problem. Which schema, in my DB, uses the most disk space? Theoretically it's trivial, as we have a set of helpful functions: pg_column_size, pg_database_size, pg_indexes_size, pg_relation_size, pg_table_size, pg_tablespace_size, pg_total_relation_size. But in some cases it becomes more of a problem. For example – when you have thousands of tables … For my sample […]
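
One illustrative way to answer the "which schema uses the most disk space" question (my own sketch, not the query from the post) is to sum pg_total_relation_size per namespace:

SELECT n.nspname AS schema,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'm')          -- ordinary tables and materialized views
GROUP BY n.nspname
ORDER BY sum(pg_total_relation_size(c.oid)) DESC;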

Database indexing is the use of special data structures that aim at improving performance by achieving direct access to data pages. A database index works like the index section of a printed book: by looking in the index section, it is much faster to identify the page(s) which contain the term we are interested in. We can easily locate the pages and access them directly, instead of scanning the pages of the book sequentially until we find the term we are looking for.

Indexes are an essential tool in the hands of a DBA. Using indexes can provide great performance gains for a variety of data domains. PostgreSQL is known for its great extensibility and the rich collection of both core and 3rd party addons, and indexing is no exception to this rule. PostgreSQL indexes cover a rich spectrum of cases, from the simplest b-tree indexes on scalar types to geospatial GiST indexes to fts or json or array GIN indexes.

Indexes, however, as wonderful as they seem (and actually are!) don’t come for free. There is a certain penalty that goes with writes on an indexed table. So the DBA, before examining her options to create a specific index, should first make sure that the said index makes sense in the first place, meaning that the gains from its creation will outweigh the performance loss on writes.

PostgreSQL basic index terminology

Before describing the types of indexes in PostgreSQL and their use, let’s take a look at some terminology that any DBA will come across sooner or later when reading the docs.

  • Index Access Method (also called the access method): the index type (B-tree, GiST, GIN, etc.)
  • Type: the data type of the indexed column
  • Operator: a function between two data types
  • Operator Family: a group of operators on types with similar behaviour, enabling cross-data-type comparisons
  • Operator Class (also referred to as the index strategy): defines the operators to be used by the index for a column

In PostgreSQL’s system catalog, access methods are stored in pg_am, operator classes in pg_opclass, operator families in pg_opfamily.
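
For example, you can peek at these catalogs directly (illustrative queries, not from the post):

SELECT amname FROM pg_am;                -- available index access methods
SELECT opcname, opcintype::regtype       -- operator classes and their input types
FROM pg_opclass
ORDER BY opcname;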

[...]

Transaction control in PL procedures

A couple of months back, I wrote about how we now have the ability to write Stored Procedures in PostgreSQL. This post follows up on that and talks about the next step that was implemented: transaction control in PL procedures. The feature was committed on 22-Jan-2018.

With this addition, you now have the ability to call COMMIT and ROLLBACK commands in PL/pgSQL from within a procedure. To illustrate:

CREATE TABLE test1 (a int);

CREATE PROCEDURE transaction_test1()
 AS $$
 BEGIN
     FOR i IN 0..9 LOOP
         INSERT INTO test1 (a) VALUES (i);
         IF i % 2 = 0 THEN
             RAISE NOTICE 'i=%, txid=% will be committed', i, txid_current();
             COMMIT;
         ELSE
             RAISE NOTICE 'i=%, txid=% will be rolledback', i, txid_current();
             ROLLBACK;
         END IF;
     END LOOP;
 END
 $$
 LANGUAGE PLPGSQL;

The results are as follows:

test=# CALL transaction_test1();
 NOTICE:  i=0, txid=723 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=1, txid=724 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=2, txid=725 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=3, txid=726 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=4, txid=727 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=5, txid=728 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=6, txid=729 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=7, txid=730 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 NOTICE:  i=8, txid=731 will be committed
 CONTEXT:  PL/pgSQL function transaction_test1() line 6 at RAISE
 NOTICE:  i=9, txid=732 will be rolledback
 CONTEXT:  PL/pgSQL function transaction_test1() line 9 at RAISE
 CALL

test=# SELECT 
[...]
We have started the Pgpool-II 3.8 development cycle.

https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=fb831287a2ed40505cb9a37136d1f0d0c5d5b22f

New items for 3.8 are summarized in the  Pgpool-II development wiki page.

In the wiki page, pinkish items are possible new features for 3.8. Discussions are in pgpool-hackers mailing list as usual.

There are some interesting features including SCRAM authentication in Pgpool-II. So please join the discussion!

At Citus Data, we engineers take an active role in helping our customers scale out their Postgres database, be it for migrating an existing application or building a new application from scratch. This means we help you with distributing your relational data model—and also with getting the most out of Postgres.

One problem I often see users struggle with when it comes to Postgres is locks. While Postgres is amazing at running multiple operations at the same time, there are a few cases in which Postgres needs to block an operation using a lock. You therefore have to be careful about which locks your transactions take, but with the high-level abstractions that PostgreSQL provides, it can be difficult to know exactly what will happen. This post aims to demystify the locking behaviors in Postgres, and to give advice on how to avoid common problems.

Postgres is pretty amazing with its support for complex, concurrent, ACID transactions

PostgreSQL is unique among open source databases in its support for complex, concurrent, ACID transactions.

To make sure complex transactions can safely run at the same time, PostgreSQL uses several layers of locks to serialise changes to critical sections of the database. Transactions run concurrently until they try to acquire a conflicting lock, for example when they update the same row. In that case, the first transaction to acquire the lock can proceed, and the second one waits until the first transaction commits or aborts.

Table locks block reads &/or writes during DDL

Whenever you run a command or a query on a table, you take a lock on the table. The primary purpose of table-level locks is to block reads and/or writes when changes to the underlying table structure are made during DDL commands such as ALTER TABLE. However, not all DDL commands need to block reads or writes; some only block each other.

Whether a transaction can acquire a lock depends on whether its “lock level” conflicts with that of the holder(s) of the lock. The PostgreSQL documentation gives a detailed overview

[...]
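
As a quick illustration (my own sketch, not from the post), you can see which sessions are currently waiting on a lock, and what they are running, via pg_stat_activity:

SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';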

Pgbench is a very well known and handy built-in tool that Postgres DBAs can use for quick performance benchmarking. The main functionality/flow is super simple, but it also has some great optional features like running custom scripts and specifying different probabilities for them. One can also use bash commands to fill query variables, for example.

But the thing that has always annoyed me a bit is the fact that one cannot specify the desired database or table size, but has to think in so-called “scaling factor” numbers. And we know from the documentation that a scaling factor of “1” means 100k rows in the main data table “pgbench_accounts”. But how does a scaling factor of 100 (i.e. 10 million bank accounts) translate to disk size? Which factor do I need to input when wanting to quickly generate a database of 50 GB to see how random updates would perform in case the dataset does not fit into RAM/shared buffers? Currently there is a bit of trial and error involved 🙁 So how to get rid of the guesswork and be a bit more deterministic? Well, I guess one needs a formula that translates an input of DB size to a scaling factor!

Test data generation

One way to arrive at the magic formula would be to generate a lot of sample data for various scaling factors, measure the resulting on-disk sizes, and derive a formula from them. That's exactly what I've done.

So I hatched up the below script and booted up a GCE instance, destined to churn the disks nonstop for a bit more than a whole day as it appeared – I’m sorry little computer 🙂 The script is very simple – it runs the pgbench schema initialization with different scaling factors from 1 to 50k and stores the resulting DB/table/index sizes in a table, so that later some statistics/maths could be applied to infer a formula. NB! Postgres version used was 10.1.

psql -c "create table if not exists init_results (scale int not null, table_size_mb int not null, index_size_mb int not null, db_size_mb int not null);"

SCALES="1 10 50 100 150 200 250 300 350 400 450 500 750 1000 1500 2000 2500 3000 3500 400
[...]

The purpose of benchmarking a database is not only to check the capability of the database, but also the behavior of a particular database against your application. Different hardware provides different results based on the benchmarking plan that you set. It is very important to isolate the server (the actual one being benchmarked) from other elements like the servers driving the load, or the servers used to collect and store performance metrics. As part of the benchmarking exercise, you must understand the application characteristics, such as: a) is the application read or write intensive? b) what is the read/write split (e.g. 80:20)? c) how large is the dataset, and are the data and structure representative of the actual production database?

PostgreSQL is the world's most advanced open source database. If any enterprise RDBMS customer wants to migrate their database to open source, then PostgreSQL would be the first option to evaluate.

This post covers the following:

  • How to benchmark PostgreSQL
  • What are the key performance factors in PostgreSQL?
  • What levers can you pull to increase performance?
  • What performance pitfalls should you avoid?
  • What are common mistakes people make?
  • How do you know if your system is performing? What tools can you use?

How to benchmark PostgreSQL

The standard tool to benchmark PostgreSQL is pgbench. By default, pgbench tests are based on TPC-B. It involves 5 SELECT, INSERT, and UPDATE commands per transaction. However, depending on your application behavior, you can write your own script files. Let us look into the default and some script oriented test results. We are going to use the latest version of PostgreSQL for these tests, which is PostgreSQL 10 at the time of writing. You can install it using ClusterControl, or using the instructions here: https://www.openscg.com/bigsql/package-manager/.

Specs of machine

Version: RHEL 6 - 64 bit
Memory : 4GB
Processors: 4
Storage: 50G
PostgreSQL version: 10.0
Database Size: 15G

Before you run benchmarking with the pgbench tool, you would need to initialize it bel

[...]
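
For orientation, a typical initialization and benchmark run looks roughly like this (the database name and the numbers are placeholders of mine, not taken from the post):

pgbench -i -s 100 benchdb            # initialize the pgbench schema at scaling factor 100
pgbench -c 10 -j 2 -T 600 benchdb    # 10 client sessions, 2 worker threads, run for 10 minutes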

Having a unified view when combining data from different sources is essential for every business. Whether you want to connect to a cloud application, import/export data to/from a text file, or sync data between your datasource and a 3rd party platform – having a database platform that provides a seamless and effective data integration process is critical.

This week on Tuesday, 13 February, 2ndQuadrant hosted the second webinar in their PostgreSQL webinar series. Principal Consultant, Shaun Thomas, conducted the webinar, covering various topics around integration of PostgreSQL with other data sources and database technologies.

The recording of the Webinar is now available. Those who couldn’t make it to the live session can now view Data Integration with PostgreSQL here.

Below, Shaun provides answers to the questions that he was unable to respond to during the live webinar.

Q1: Can all FDWs read/write to their respective backends?

A: Unfortunately not. The Mongo FDW wasn’t able to write until relatively recently. Every FDW has a different answer for this, but good maintainers are upfront about it.

Q2: Why would we want a database with none of its own data?

A: To give it to anyone, use it for reports, drop and recreate it to capture external DDL changes.

Q3: Is it possible to read from Kafka?

A: It is not without a separate wrapper to decode the Kafka topic contents.

Q4: What is the database IDE (tool) that you are using to display the colored output of commands? (like SQL Developer in Oracle)

A: The IDE is Geany, my primary text editor.

Q5: How does postgres FDW handle lost connections? Is there any caching available other than materialized views?

A: Lost connections will be reconnected when a new query is submitted. Any in-transit activity is lost until then.

Q6: Is there an established FDW for Cassandra?

A: The Cassandra FDW available on Github is the latest known native wrapper, but it hasn’t been updated in two years. The others are based on Multicorn, and are consequently using Python to interface with Cassand

[...]
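
As a point of reference for the FDW questions above, a minimal postgres_fdw setup looks like this (server name, host, and credentials are placeholders of mine):

CREATE EXTENSION postgres_fdw;
CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote-host', dbname 'otherdb');
CREATE USER MAPPING FOR CURRENT_USER SERVER remote_pg
    OPTIONS (user 'remote_user', password 'secret');
IMPORT FOREIGN SCHEMA public FROM SERVER remote_pg INTO public;   -- foreign tables only, no local data
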
Posted by Jean-Jerome Schmidt on 2018-02-14 at 08:08

Are you wondering what Postgresql schemas are, why they are important, and how you can use schemas to make your database implementations more robust and maintainable? This article will introduce the basics of schemas in Postgresql and show you how to create them with some basic examples. Future articles will delve into examples of how to secure and use schemas for real applications.

Firstly, to clear up potential terminology confusion, let’s understand that in the Postgresql world, the term “schema” is maybe somewhat unfortunately overloaded. In the broader context of relational database management systems (RDBMS), the term “schema” might be understood to refer to the overall logical or physical design of the database, i.e., the definition of all the tables, columns, views, and other objects that constitute the database definition. In that broader context a schema might be expressed in an entity-relationship (ER) diagram or a script of data definition language (DDL) statements used to instantiate the application database.

In the Postgresql world, the term “schema” might be better understood as a “namespace”. In fact, in the Postgresql system tables, schemas are recorded in table columns called “name space”, which, IMHO, is more accurate terminology. As a practical matter, whenever I see “schema” in the context of Postgresql I silently reinterpret it as saying “name space”.

But you may ask: “What's a name space?” Generally, a name space is a rather flexible means of organizing and identifying information by name. For example, imagine two neighboring households, the Smiths, Alice and Bob, and the Joneses, Bob and Cathy (cf. Figure 1). If we used only first names, it might get confusing as to which person we meant when talking about Bob. But by adding the surname, Smith or Jones, we uniquely identify which person we mean.

Oftentimes, name spaces are organiz

[...]
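
To make the name-space analogy concrete, here is a small illustrative example (schema and table names are mine, not from the article):

CREATE SCHEMA smith;
CREATE SCHEMA jones;
CREATE TABLE smith.contacts (name text);
CREATE TABLE jones.contacts (name text);   -- same table name, different name space
SET search_path TO smith, public;
SELECT * FROM contacts;                    -- resolves to smith.contacts
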
I have started writing a document about the upcoming SQL/JSON support in PostgreSQL. Currently, I have mostly concentrated on JSONPATH. Read "Jsonpath introduction" on github. Consider it a WIP; more info will be added.

My intention is to make the standard more understandable for users. I appreciate any help.

On 7th of February 2018, Tom Lane committed patch: Support all SQL:2011 options for window frame clauses. This patch adds the ability to use "RANGE offset PRECEDING/FOLLOWING" frame boundaries in window functions. We'd punted on that back in the original patch to add window functions, because it was not clear how to do it in […]
Support parallel btree index builds. To make this work, tuplesort.c and logtape.c must also support parallelism, so this patch adds that infrastructure and then applies it to the particular case of parallel btree index builds. Testing to date shows that this can often be 2-3x faster than a serial index build. The model for deciding […]

Row Level Security, aka "RLS," allows a database administrator to define if a user should be able to view or manipulate specific rows of data within a table according to a policy. Introduced in PostgreSQL 9.5, row level security added another layer of security for PostgreSQL users who have additional security and compliance considerations for their applications.

At Crunchy Data, we care a lot about data security and supporting PostgreSQL. When we discovered an issue with creating a large row level security policy, we worked to find a workaround that provided a secure outcome but avoided creating a custom fork of PostgreSQL.  As of the publication of this post, the Crunchy Data team is working to address the issue with a patch that will be submitted for inclusion upstream.

To understand the issue and how our team found a solution, I'll first give a quick primer of how row level security works in PostgreSQL.
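
For context, enabling row level security and attaching a simple policy looks like this (a minimal sketch with made-up table and column names, not the policy from the post):

ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_rows ON accounts
    FOR SELECT
    USING (owner = current_user);   -- each user sees only the rows they own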

Is there a single significantly large and important database in the world which does not suffer from performance problems once in a while? I bet that there are not too many. Therefore every DBA (database administrator) in charge of PostgreSQL should know how to track down potential performance problems to figure out what is really going on.

Improving PostgreSQL performance beyond parameter tuning

Many people think that changing parameters in postgresql.conf is the real way to success. However, this is not always the case. Sure, more often than not good database config parameters are highly beneficial. Still, in many cases the real problems will come from some strange query hidden deep in some application logic. It is even quite likely that those queries causing real issues are not the ones you happen to focus on. The natural question now arising is: how can we track down those queries and figure out what is really going on? My favorite tool to do that is pg_stat_statements, which in my judgement should always be enabled if you are using PostgreSQL 9.2 or higher (please do not use it in older versions).

Enabling pg_stat_statements

To enable pg_stat_statements on your server change the following line in postgresql.conf and restart PostgreSQL:

shared_preload_libraries = 'pg_stat_statements'

Once this module has been loaded into the server, PostgreSQL will automatically start to collect information. The good thing is that the overhead of the module is really, really low (the overhead is basically just “noise”).

Then run the following command to create the necessary view to access the data:

CREATE EXTENSION pg_stat_statements;

The extension will deploy a view called pg_stat_statements and make the data easily accessible.

Detecting slow queries in PostgreSQL

The easiest way to find the most interesting queries is to sort the output of pg_stat_statements by total_time:

SELECT * FROM pg_stat_statements ORDER BY total_time DESC;

The beauty here is that the type of query, which is consuming most of the time, will

[...]
Posted by Pavel Stehule on 2018-02-12 at 06:23
bugfix release - fixed crash after searching https://github.com/okbob/pspg/releases/tag/0.9.3
Posted by Andrew Gierth on 2018-02-11 at 15:57

Another quick one from the IRC channel. The problem: to import a json data file (array of objects) on the client system into PostgreSQL without needing to mess with external scripting for the job (and without relying on Unix tools). This approach works for files under 1GB only, of course.

begin;
-- import the json to a lo object and note the OID
\lo_import :filename
\set obj :LASTOID

-- change this to an insert or whatever you need.
-- the 'UTF8' is the encoding to assume that the json data is
-- already in, it will be converted to the database encoding if that
-- differs.
-- the "as r(a integer, ...)" part should be edited to match the json
-- object key names (remember to "quote" names that aren't lowercase)
-- and the desired column types.
create table testtab as
  select *
    from json_to_recordset(convert_from(lo_get(:'obj'),'UTF8')::json)
           as r(a integer, b boolean, c text);

-- clean up the object
\lo_unlink :obj
commit;

Put the above in a file, e.g. import.sql and run as:

psql -v filename="myfile.json" -d mydb -f import.sql
Posted by Federico Campoli on 2018-02-10 at 12:10
An ancient Italian proverb says "Il backup è quella cosa che andava fatta prima." A rough translation could be something like this: the backup is the thing that should have been done before. The concept of backup is quite confusing. Some people wrongly believe that executing a local backup is sufficient to ensure the recovery of the data. In the real world a valid backup is present only if three conditions are true.