PostgreSQL
The world's most advanced open source database
Posted by Andreas Scherbaum on 2018-11-16 at 21:05

The PostgreSQL Project participates in Google Code-In (GCI) 2018. This is a program which allows pre-university students to pick up tasks defined by the partnering open source projects, learn about these projects, and also win prizes (certificates, t-shirts, hoodies, but also a trip to Google HQ).

Every project creates a number of different tasks: some technical, some design-based, some about updating documentation, some about validating bugs; whatever is useful in order to get to know the project better. Students can select tasks and submit their work. Mentors from the project then evaluate the work, and either approve it or send it back to the student because more work is needed.

Now that we are halfway into this year's competition, it's time to run the numbers.

Continue reading "Google Code-In 2018 - Halftime"
Posted by Denish Patel on 2018-11-16 at 16:14

Postgres has supported table partitioning based on inheritance and triggers for more than a decade. Declarative partitioning support was added in the Postgres 10 release in October 2017, which made it easier to create partitions, but you still needed triggers to handle updates that move rows between partitions, and you could not add primary keys or foreign keys on partitioned tables. The recent Postgres 11 release solves all of these problems.

Postgres 11 adds a lot more partitioning features, making partitioned tables easier to manage than ever! Below is a comparison of partitioning features across Postgres releases:

Feature                                                               | Postgres 11 | Postgres 10 | 9.6
----------------------------------------------------------------------+-------------+-------------+----
Declarative table partitioning                                        | Yes         | Yes         | No
Default partition (stores rows that match no other partition)         | Yes         | No          | No
Partitioning by a HASH key                                            | Yes         | No          | No
PRIMARY KEY, FOREIGN KEY, indexes, and triggers on partitioned tables | Yes         | No          | No
UPDATE on the partition key moves the row to the matching partition   | Yes         | No          | No
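
As a quick, hedged illustration of two of the Postgres 11 features in the table above (the default partition and row movement on a partition-key update); the table and values here are made up for this sketch, not from the original post:

-- hypothetical RANGE-partitioned table
CREATE TABLE measurements (logdate date, reading int) PARTITION BY RANGE (logdate);
CREATE TABLE measurements_2018 PARTITION OF measurements
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');
CREATE TABLE measurements_default PARTITION OF measurements DEFAULT;  -- PG 11: catch-all partition

INSERT INTO measurements VALUES ('2020-05-01', 42);                   -- lands in measurements_default
UPDATE measurements SET logdate = '2018-05-01' WHERE reading = 42;    -- PG 11 moves the row to measurements_2018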

Postgres 11 supports RANGE, LIST and HASH partition types, and you can also create sub-partitions on child tables.

Let's take an example of partitioning a table using the RANGE and LIST partition types.

RANGE Partitioning:
 -- create parent table to store SMS campaign subscribers 
app=# CREATE TABLE sms_campaign_subscribers (id bigint not null, sms_campaign_id bigint not null) PARTITION BY RANGE (sms_campaign_id);
CREATE TABLE

-- create child table to store campaign with sms_campaign_id >= 111 and < 112
app=# CREATE TABLE sms_campaign_subscriber_111 PARTITION OF sms_campaign_subscribers FOR VALUES FROM (111) TO (112) ;
CREATE TABLE

-- Describe parent table
app=# \d+ sms_campaign_subscribers
 Table "public.sms_campaign_subscribers"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+
[...]
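
The excerpt above is cut short before the LIST example. As a hedged sketch of what LIST partitioning with a sub-partition might look like (table and column names are hypothetical, not the post's original code):

-- LIST-partition subscribers by country, then sub-partition one country by HASH
CREATE TABLE subscribers (id bigint NOT NULL, country text NOT NULL)
    PARTITION BY LIST (country);

CREATE TABLE subscribers_us PARTITION OF subscribers
    FOR VALUES IN ('US')
    PARTITION BY HASH (id);                 -- a child table can itself be partitioned

CREATE TABLE subscribers_us_0 PARTITION OF subscribers_us
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE subscribers_us_1 PARTITION OF subscribers_us
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);
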
Posted by Douglas Hunley on 2018-11-16 at 11:12

pgBackRest 2.07 was announced today. As usual, I immediately downloaded it and tried to get it up and running on my MacBook (currently running Mojave). It wasn’t as straightforward as one might hope, and the online instructions assume a Linux system, so I figured I’d write this up for anyone else attempting the same.

Since this is OSX, we have to do some work to make things right before we even start with the pgBackRest code. First up, get a real OpenSSL install. We’ll use Homebrew for this:

> brew install openssl
[output snipped]
> openssl version -a
LibreSSL 2.6.4
built on: date not available
platform: information not available
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) blowfish(idx)
compiler: information not available
OPENSSLDIR: "/private/etc/ssl"
> /usr/local/opt/openssl/bin/openssl version -a
OpenSSL 1.0.2p  14 Aug 2018
built on: reproducible build, date unspecified
platform: darwin64-x86_64-cc
options:  bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) idea(int) blowfish(idx)
compiler: clang -I. -I.. -I../include  -fPIC -fno-common -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 -O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
OPENSSLDIR: "/usr/local/etc/openssl"

As you can see, the default SSL from OSX is in /usr/bin while the newly installed OpenSSL is in /usr/local/opt/openssl. In my testing, this is enough to proceed with pgBackRest but I prefer to have the openssl binary match the libs and I’m a glutton for punishment, so I replace the OSX binary with the Homebrew one:

> sudo mv /usr/bin/openssl /usr/bin/openssl.old
> sudo ln -s /usr/local/opt/openssl/bin/openssl /usr/bin
> ls -ld /usr/bin/openssl*
lrwxr-xr-x 1 root wheel   34 Nov 16 11:39 /usr/bin/openssl -> /usr/local/opt/openssl/bin/openssl*
-rwxr-xr-x 1 root wheel 1.2M Sep 21 00:16 /usr/bin/openssl.old*

OK, so now we h

[...]

I spent a week in the magnificent city of Lisbon attending the annual European PostgreSQL Conference. This marked the 10th anniversary of the first European PostgreSQL conference and my sixth time attending.

First Impressions

The city was great, the atmosphere was great and it seemed that it would be a very productive and informative week full of interesting conversations with intelligent and friendly people. So basically the very first cool thing I learned in Lisbon is how great Lisbon and Portugal are, but I guess you came here for the rest of the story!

Shared Buffers

We attended the training session “PostgreSQL DBA toolbelt for day-to-day ops” by Kaarel Moppel (Cybertec). One thing I noted was the setting of shared_buffers. Since shared_buffers actually competes with (or complements) the system's cache, it shouldn't be set to any value between 25% and 75% of the total RAM available. So while, in general, the recommended setting for typical workloads is 25% of RAM, it could be set to >= 75% for special cases, but not in between.
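
For example, on a machine with 64 GB of RAM one might follow the 25% rule like this (the value is purely illustrative, and shared_buffers only changes after a server restart):

SHOW shared_buffers;                        -- check the current value
ALTER SYSTEM SET shared_buffers = '16GB';   -- roughly 25% of 64 GB; takes effect after a restart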

Other things we learned in this session:

  • unfortunately, easy online (or offline) activation/enablement of data checksums did not make it into 11 (initdb or logical replication remains the only option)
  • beware of vm.overcommit_memory: you had better disable overcommit by setting it to 2, and set vm.overcommit_ratio to about 80

Advanced Logical Replication

In the talk by Petr Jelinek (2ndQuadrant), one of the original authors of logical replication, we learned about more advanced uses of this exciting new technology:

  • Centralized Data Collection: we may have multiple publishers and then a central system with a subscriber to each of those publishers, making data from various sources available in a central system. (typical use: OLAP)
  • Shared global data: in other words, a central system for maintaining global data and parameters (such as currencies, stocks, market/commodity values, weather, etc.) publishes to one or more subscribers. These data are then maintained in only one system but are available in all subscribers.
  • Log
[...]
Announcing PostgresConf Silicon Valley 2019, September 18th - 20th at the Hilton San Jose! An absolute perfect pairing of training, breakout sessions, and a fantastic weekend break to enjoy the valley for every speaker, attendee, volunteer, and organizer. 

Didn't you just host the community at PostgresConf Silicon Valley, you ask? Why yes we did! That event was October 15th and 16th of 2018. The event was such an unexpected success that we immediately started working with the hotel to lock in our dates for 2019. We requested mid-October to early November. Unfortunately, the only week they had available was the week of September 15th, 2019. We are again working with the Silicon Valley Postgres Meetup, the fastest growing Postgres meetup in the United States.

As we continue to be the fastest growing, non-profit, inclusive, and volunteer-organized event, we are providing you with the breakdown of the Silicon Valley 2018 financials:


PostgresConf Silicon Valley is much more cost effective than the "big" conference in Manhattan, and that is exactly what we want for a development or "local" conference. We are targeting 50% growth for 2019, and we want to do so in a way that is inviting to new community members and won't overwhelm them. We succeeded with that in 2018 and we are going to continue the mission of People, Postgres, Data!

The global non-profit Postgres Conference Series


Posted by Robert Treat in OmniTI on 2018-11-13 at 02:12

TLDR; This evening I put the final blotches on to a new release of phpPgAdmin 5.6. This release adds official support for all recent Postgres versions, fixes a number of smaller bugs, and includes several language updates. While I think upstream packagers need not worry about absorbing this release, I've made downloads generally available from the Github project page, or you can just pull from git to get the latest code. Note this release is designed to run on PHP 5.6.

Now for the backstory...

After much hoopla a few years back about new admin clients and talk of the pgAdmin rewrite, most of the regular contributors had pretty much moved on from the project, hoping to see a clearly better admin tool surface as a replacement. Instead, I saw multiple projects launch, none of which captured the hearts and minds so to speak, and saw the number of pull requests on an ever more abandoned-looking project continue to pile up, not to mention thousands of downloads.

As for me, while not doing much publicly, privately I was still maintaining two private copies of the code: one which had support for newer Postgres servers, and one which had support for PHP 7; both in rough shape. While my schedule doesn't leave much time for random hacking, about a month ago I saw an upcoming block where I would be conferencing three weeks in a row and suspected I could probably find some time during my travels to do some updates. After a little bit of thought, I decided to do two releases. The first would add support up through Postgres 11, the most recently released version of the server software, and the second would add the aforementioned PHP 7 support. Granted, it's taken longer than I had hoped, probably mostly because that's how software engineering works, but also because I had to literally relearn how it is we were running this project, but I think I've got most of that worked out now.

I suspect the two releases might annoy some people, given that PHP 5.6 is years old and in many people's minds EOL. But it turns out that a lot of

[...]

PostgreSQL 11 has been officially released and it's packed with exciting changes. Native declarative partitioning is more robust, parallelism now allows more operations to work in parallel, transactions are now supported in stored procedures, and there is just-in-time compilation for expressions; these are just some of the new features.

With every major version, Postgres gets better, providing more and more tools to developers and a better quality of life to DBAs, making it harder to resist an upgrade. That said, while new major versions come with all these exciting improvements, they also come with internal system changes that make database clusters incompatible across major versions. This means that, up to Postgres 10, where logical replication was introduced, upgrades were impossible without downtime or third-party replication tools.


Story time,

Before we get into how to upgrade using logical replication, let's see how upgrading and replication evolved over the years. pg_dump and pg_restore is probably the most traditional way to do an upgrade. This method requires all database writes to be suspended, making it impractical for any reasonably sized production database. pg_upgrade was introduced in Postgres 8.4 and is still the most common way to do upgrades. It works under the assumption that the internal storage format rarely changes, allowing it to create new system catalog tables and simply reuse the old data files. This means that upgrades are safe and fast. It still requires the database to be down, and by default it will copy the data files to a new data directory. This can take a significant amount of time, but it can easily be bypassed by using the hard-link option provided by pg_upgrade itself. Hard links are only valid within the same filesystem; with that in mind, this method not only massively reduces downtime but also eliminates the need for a second copy of the database cluster. On rare occasions, for example an upgrade changing storage options, like floating point to int64 date/times, pg_upgrade won't

[...]
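
As a hedged sketch of the logical-replication upgrade path the post describes (connection details and names are hypothetical): publish the tables on the old cluster, subscribe from the new-version cluster, let it catch up, then switch the application over.

-- On the old cluster (e.g. Postgres 10): publish everything
CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

-- On the new cluster (e.g. Postgres 11), after restoring the schema with pg_dump --schema-only:
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary dbname=app user=replicator'
    PUBLICATION upgrade_pub;

-- Once the subscriber has caught up, repoint the application and drop the subscription
DROP SUBSCRIPTION upgrade_sub;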

Announcing Release 9 of the PostgreSQL Buildfarm client.

Along with numerous fixes of minor bugs and a couple of not so minor bugs, this release has the following features:

  • new command line parameter --run-parallel for run_branches.pl runs
    all branches in parallel, possibly across animals as well
  • new config setting max_load_avg inhibits a run if the load average
    is higher than the setting
  • new config_option archive_reports saves that number of generations
    of the report sent to the server
  • new command line parameter --show-error-log which outputs the error
    log if any on stdout
  • automatically rerun 3 hours after a git failure, useful on back
    branches where commits can be infrequent
  • automatically convert old pgbuildfarm.org URLs to
    buildfarm.postgresql.org
  • better logic to detect when temp installs are unnecessary
  • better valgrind processing
  • new module to check core perl code for style and syntax
  • allow upstream repos to be rebased
  • add animal name and branch to verbose traces, useful in parallel runs
  • remove old if $branch eq 'global' processing in config file,
    replace with a simple global stanza, the legacy use is still supported.

If you want to run in parallel and you are just running a single animal, changing --run-all to --run-parallel in the command line should be all you need to do. Parallel runs are not run all at once. By default they are launched every 60 seconds. You can also limit the maximum number of parallel runs. The default is 10. I will be adding some notes to the Buildfarm Howto on how to use this feature.

The max_load_avg setting only works on Unix, and requires the installation of the non-standard perl module Unix::Uptime. If this is set to a non-zero value and the module is not present, the script will die. The setting is compared to the load average over the last minute and the last 5 minutes; if either is higher, the run is cancelled.

The release can be downloaded from https://github.com/PGBuildFarm/client-code/releases/tag/REL_9 or https://buildfarm.postgresql.org/downloads

SQL is a language of databases and PostgreSQL is our chosen one. Oftentimes, storing data is but one facet of the process. Typically, in any data-centered endeavor, you will: view and read data, take action or implement changes on the data, garner decision-making information (analytics), or manipulate the stored data in some form or fashion.

SQL is composed of a combination of keywords, commands, and clauses. SQL seems simple. Just a few 'easy' commands here and there. No big deal, right?

But, there is more to SQL than meets the eye. SQL can trip you up on those 'easy' queries.

One challenge (that I must routinely revisit) is understanding that SQL execution order is definitely different from that of its syntax.

In this blog post, I visit, at a high level, the major SQL clauses as they apply to PostgreSQL. There are many dialects of SQL but PostgreSQL's interpretation is the focus here. (Some characteristics of each clause very well may apply to other SQL dialects.)

SQL clauses form the foundation for basic, often-used commands and queries. That being said, advanced queries and examples utilizing Window Functions, CTEs, Derived Tables, etc. will not be covered in this post.

As we will see, not all clauses are created equal. Yet, they do operate in tandem, providing query results seamlessly (or not).

Allow me to clarify...

I will periodically make mention of an execution order throughout the blog post as it applies to many of the clauses. But, this is generalized.

To my understanding, more often than not, the optimizer chooses and decides the best query plan for execution.
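
As a hedged illustration of how the logical evaluation order differs from the written order (table and column names are hypothetical), here is a query in syntax order, with comments marking the approximate order of evaluation:

SELECT   dept, count(*) AS staff     -- 5. compute the SELECT list
FROM     employees                   -- 1. pick the source data set
WHERE    hired_on >= '2018-01-01'    -- 2. filter individual rows
GROUP BY dept                        -- 3. form groups
HAVING   count(*) > 5                -- 4. filter the groups
ORDER BY staff DESC                  -- 6. sort the final result
LIMIT    10;                         -- 7. trim the output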

SELECT - The 'picky' Clause Used to Query the Database

SELECT is one busy clause. It is everywhere. Used more than all the other clauses. Certain clauses you may not need at all. Not so much the case with SELECT, for it is a mandatory clause.

The SELECT clause is typically used for querying the database, containing (at a basic level):

  1. A SELECT list - The columns of data you want.
  2. the source data set(s) - named in the FROM clause. Tables, Vie
[...]
Here is a series of articles that will focus on a new feature in version 11. During the development of this version, a feature caught my attention. It can be found in the release notes (https://www.postgresql.org/docs/11/static/release-11.html): "Allow heap-only-tuple (HOT) updates for expression indexes when the values of the expressions are unchanged (Konstantin Knizhnik)". I admit that this is not very explicit, and this feature requires some knowledge about how Postgres works, which I will try to explain through several articles:
Posted by Dimitri Fontaine in CitusData on 2018-11-10 at 14:40

An SQL Injection is a security breach, one made famous by the Exploits of a Mom xkcd comic episode in which we read about little Bobby Tables:

PostgreSQL implements a protocol level facility to send the static SQL query text separately from its dynamic arguments. An SQL injection happens when the database server is mistakenly led to consider a dynamic argument of a query as part of the query text. Sending those parts as separate entities over the protocol means that SQL injection is no longer possible.
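
A hedged SQL-level illustration of the same idea (drivers do this for you at the protocol level with prepared statements; the table and values here are hypothetical):

-- The query text is fixed once; the value travels as a bound parameter,
-- so it can never be parsed as part of the SQL.
PREPARE find_student (text) AS
    SELECT id, name FROM students WHERE name = $1;

EXECUTE find_student ('Robert''); DROP TABLE students;--');  -- treated as a plain string, nothing is dropped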


We’re going to store data the way it’s stored naturally in the brain.

This is a phrase being heard more often today. This blog post is inspired by a short rant that Babak Tourani (@2ndhalf_oracle) and I had on Twitter today.

How cool is that!!

This phrase is used by companies like MongoDB or Graph Database vendors to explain why they choose to store information / data in an unstructured format. It is new, it is cool, hip and happening. All the new compute power and storage techniques enable doing this.
How cool is that!!
Well, it is… for the specific use-cases that can benefit from such techniques. Thinking of analytical challenges, where individual bits of information basically have no meaning. If you are analyzing a big bunch of captured data, which is coming from a single source like a machine, or a click-stream or social media, for instance, one single record basically has no meaning. If that is the case, and it is really not very interesting if you have and retain all individual bits of information, but you are interested in “the bigger picture”, these solutions can really help you!

How cool is it, actually?

If it comes to the other situations where you want to store and process information… where you do care about the individual records (I mean, who wants to repopulate their shopping cart on a web-shop 3 times before all the items stick in the cart) there are some historical things that you should be aware of.
Back in the day when computers were invented, all information on computers was stored “the way it’s stored naturally in the brain”.
Back in the day when computers were invented, all we had were documents to store information.
This new cool hip and happening tech is, if anything, not new at all…
Sure, things changed over the last 30 years and with all the new compute power and storage techniques, the frayed ends of data processing have significantly improved. This makes the executing of data analysis, as described above, actually so much better!! Really, we can do things to data, using these co
[...]

It's been over 10 years since PostgreSQL Europe got started in Prato, just outside Florence, and it's time to return to our roots! PostgreSQL Conference Europe 2019 will be held in Milan, Italy, at the Milan Marriott Hotel, on October 15-18, 2019.

More details will be shared as things progress and we are not yet ready to open for sponsorship, call for papers or registrations, but it's time to mark your calendars and block out the week!

Follow us on twitter at @pgconfeu for notifications of when news is posted, check our website or subscribe to our RSS feed for the latest news!

We had only one correct guess in our "guess the location" contest at the closing session of this year's conference. This attendee will be contacted personally with information about how to claim their free ticket for next year.

Machine Learning in 10 Lines

Every person who reads newspapers, magazines or any other media of general interest has at least a basic idea of what Machine Learning is. And this is not only a fashion: Machine Learning is already part of our everyday life and will be much more so in the future, from personalized advertisements on the Internet to robot dentists or autonomous cars. Machine Learning seems to be some kind of super power capable of everything.

But, what is Machine Learning really? It is mainly a set of statistical algorithms that, based on existing data, are capable of deriving insights out of it. These algorithms are basically divided into two families, supervised and unsupervised learning. In supervised learning, the objective is to perform some kind of prediction, such as, for example, whether an e-mail message is spam or not (classification), or how many beers will be sold next week in a supermarket (regression). Unsupervised learning, on the contrary, focuses on answering the question: how are my cases divided into groups? What these algorithms do (each of them with its particularities) is to bring similar items as close as possible and to keep items that differ from each other as far apart as possible.

The popularisation of Machine Learning revolutionized the way we do business. Regardless of whether you are talking about a company with 10 or 10,000 employees, if you do not make use of your data to make decisions, you are definitely running behind your competitors.

Machine Learning without leaving the Database

Relational Databases are definitely the most essential tools when it comes to data persistence. Although there are other alternatives which could be suitable for certain purposes, there is probably no company with at least a minimal IT Infrastructure that doesn’t have a database.

So if every company has a database, it contains data that is worth using. This means that every company has the opportunity to improve its decision-making process with minimal effort through the use of machine learning. However, the drawbac

[...]
Posted by William Ivanski in 2ndQuadrant on 2018-11-06 at 13:58

PostgreSQL 11 was released recently, with exciting new features. One of them is the ability to write SQL procedures that can perform full transaction management, enabling developers to create more advanced server-side applications. SQL procedures can be created using the CREATE PROCEDURE command and executed using the CALL command. Since OmniDB 2.3.0 it has been possible to debug PostgreSQL PL/pgSQL functions. Support for PostgreSQL 11 functions and procedures was added in OmniDB 2.11.0.

Last week we released OmniDB 2.12.0 with nice new features and a revamped look, so I'm going to show you how OmniDB 2.12.0 can debug PostgreSQL 11 procedures.

First of all, if you have not done that already, download and install a binary PostgreSQL library called omnidb_plugin and enable it in PostgreSQL’s config file. The debugger also uses a special schema with special tables to control the whole debugging process. This can be manually created or with an extension. For more details on the installation, please refer to the instructions. You can also refer to the documentation about the debugger.

Creating some tables in OmniDB

For our tests, let’s create 2 simple tables, foo and bar. Let’s do that using the OmniDB Console Tab:

CREATE TABLE public.foo (
    a INTEGER PRIMARY KEY
);
 
CREATE TABLE public.bar (
    a INTEGER,
    b INTEGER
);


Creating a procedure with transaction management

Note that OmniDB has a Procedures node in the tree view. Right-click on it, then click on Create Procedure. It will open a Query Tab with a SQL template showing basic SQL syntax to create a procedure.


If you want to know more about procedures, you can read the online documentation without leaving OmniDB. Simply click on Procedures -> Doc: Procedures and a browser tab will be opened for you, already pointing to the documentation page:

Now let’s go back to the Create Procedure tab and change the code to actually create a procedure, like this:

CREATE OR REPLACE PROCEDURE public.prc_test
(
    p INTEGER
)
LANGUAGE plpgsql
AS
$procedure$
BEGIN
    F
[...]
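
The excerpt is cut off here. As a hedged sketch of what a procedure with transaction management might look like (the body is illustrative, not the post's original code), using the foo and bar tables created above:

CREATE OR REPLACE PROCEDURE public.prc_test
(
    p INTEGER
)
LANGUAGE plpgsql
AS
$procedure$
BEGIN
    FOR i IN 1 .. p LOOP
        INSERT INTO public.foo VALUES (i);
        COMMIT;                              -- PG 11: transaction control inside a procedure
        INSERT INTO public.bar VALUES (i, i * 10);
        ROLLBACK;                            -- the bar insert is undone, the foo insert stays
    END LOOP;
END;
$procedure$;

CALL public.prc_test(3);                     -- procedures are invoked with CALL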

I recently ran into the need with a customer to track the usage of a specific key throughout the schema. Basically, "what are all the tables and columns referencing this key, directly or indirectly". Luckily, with a little bit of catalog query, that's not hard:

WITH RECURSIVE what (tbl) AS (
   VALUES ('public.tt')
),
t (oid, key, constrid) AS (
 SELECT tbl::regclass::oid, conkey, NULL::oid
  FROM what INNER JOIN pg_constraint ON (contype='p' AND conrelid=tbl::regclass)
UNION ALL
 SELECT conrelid, conkey, c.oid
 FROM pg_constraint c
 INNER JOIN t ON (c.confrelid=t.oid AND c.confkey=t.key)
 WHERE contype='f'
)
SELECT nspname, relname, key, ARRAY(
    SELECT attname FROM pg_attribute a WHERE a.attrelid=t.oid AND attnum=ANY(key)
  )
FROM t
INNER JOIN pg_class cl ON cl.oid=t.oid
INNER JOIN pg_namespace n ON n.oid=cl.relnamespace

The output can be similar to:

 nspname | relname | key | array 
---------+---------+-----+-------
 public  | tt      | {1} | {ttt}
 public  | foo1    | {1} | {a}
 public  | foo2    | {3} | {z}

for a single column key (tt being the table with the primary key in, and the foo1 and foo2 tables referencing it directly or through the other one), or:

 nspname | relname |  key  | array 
---------+---------+-------+-------
 public  | m1      | {1,2} | {a,b}
 public  | m2      | {1,2} | {a,b}

for a multi-column foreign key.

In this particular use-case, it was an efficient way to track down key usage where naming standards for using the key had not always been followed. And of course, we also found a couple of cases where the column had the correct name but lacked the actual FOREIGN KEY definition, but that was done by just looking at the column names.

Posted by Bruce Momjian in EnterpriseDB on 2018-11-05 at 10:00

Having attended many conferences, I have a few suggestions on how to submit successful conference talks. First, determine the type of conference. Then, try to submit talks that match the conference type; possible topics include:

  • New Postgres features
  • User case studies
  • Internals
  • New workloads
  • Performance
  • Application development

Of course, only some of these topics match specific types of conferences.

Second, submit multiple talks. It is very possible that someone better known than you, or someone with a better abstract, will also submit to the conference. By submitting more than one topic, you increase your chances of submitting something unique and interesting.

Continue Reading »

Posted by Laurenz Albe in Cybertec on 2018-11-05 at 09:00
1000 killed index tuples
© Laurenz Albe 2018

 

Since I only recently learned about the concept of “killed index tuples”, I thought there might be some others who are not yet familiar with this interesting PostgreSQL concept.

This may give you an explanation the next time you encounter wildly varying execution times for the same execution plan of the same PostgreSQL query.

Before we look more closely at the index, let’s review the life cycle of a table row version (“heap tuple”).

Life, death and visibility in the table heap

It is widely known that the visibility of heap tuples is determined by the system columns xmin and xmax (though there is more to xmax than meets the eye). A heap tuple is “dead” if its xmax is less than the xmin of all active transactions.

Now xmin and xmax are only valid if the respective transactions have been marked committed in the “commit log”. Consequently, any transaction that needs to know if it can see a tuple has to consult the commit log. To save future readers that extra work, the first one that consults the commit log will save the information in the tuple’s “hint bits”.

Dead tuples are eventually reclaimed by VACUUM.
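
As a quick, hedged illustration of these system columns (the table is hypothetical):

CREATE TABLE vis_demo (id integer);
INSERT INTO vis_demo VALUES (1);

SELECT xmin, xmax, id FROM vis_demo;   -- xmin is the inserting transaction, xmax is 0 while the row is live

DELETE FROM vis_demo;                  -- sets xmax; once the delete commits, the tuple is dead
VACUUM vis_demo;                       -- dead tuples are eventually reclaimed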

This is all fairly well known, but how is the situation with index entries?

Life, death and visibility in the index

To avoid redundancy and to keep index tuples small, the visibility information is not stored in the index.
The status of an index tuple is determined by the heap tuple it points to, and both are removed by VACUUM at the same time.

As a consequence, an index scan has to inspect the heap tuple to determine if it can “see” an entry. This is the case even if all the columns needed are in the index tuple itself. Even worse, this “heap access” will result in random I/O, which is not very efficient on spinning disks.

This makes index scans in PostgreSQL more expensive than in other database management systems that use a different architecture. To mitigate that, several features have been introduced over the years:

  • PostgreSQL 8.1 introduced the “bitmap index scan”. This scan method fi
[...]
Previously I tested performance of pl/PgSQL coded foreign keys to partitioned table. Now, let's see if I can make creation of them a bit easier. Using the same schema as before, I see that adding actual fkeys is pretty complicated. I need to create two separate functions, and four triggers, remembering what goes where. This … Continue reading "Foreign Key to partitioned table – part 3"
Posted by Andreas Scherbaum on 2018-11-03 at 21:00

For a long time I was using a Makefile to quickly build, start, stop and then wipe a predefined PostgreSQL version. That comes in handy if you just want to test something on an older version, without actually installing the software. Everything happens in a single directory; even a different port is assigned.

When I needed that setup recently, I ran into unrelated build errors:

relpath.c:21:10: fatal error: catalog/pg_tablespace_d.h: No such file or directory
 #include "catalog/pg_tablespace_d.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Can't be - pg_tablespace_d.h is included in the tarball I'm using.

Continue reading "Using Makefiles to build PostgreSQL"
Posted by Abdul Yadi on 2018-11-03 at 06:01

What is my favourite PostgreSQL GUI admin tool? pgAdmin3. I love its lightweight user interface and simple navigation. Thanks to the BigSQL Development Team for rescuing the tool from the freeze.

With the PostgreSQL 11.0 release, here is my patch file corresponding to the catalog table changes: pgadmin3-patch-text-file

First, clone the pgAdmin3 project: git clone https://bitbucket.org/openscg/pgadmin3-lts.git

Then, apply the patch: patch -p0 -i [patch-text-file]

Oldies but goldies.

Posted by Rafia Sabih in EnterpriseDB on 2018-11-02 at 04:58
It was my first time at PGConf Europe this year, like many other firsts it was special, hence the blog.

Let's start with some of the basics. PostgreSQL conferences are held on a somewhat regional basis. There are many of them, like PGConf India, PGConf USA, PGConf Europe, and PGConf Asia, and then there are other one-day events called PgDays. Coming back to PGConf Europe 2018, it was organised from 23-26 October at the Lisbon Marriott, Lisbon.

My talk 'Parallel Query in PG: how not to (mis)use it?' was scheduled in the first slot of the last day. So, I had enough time to analyse and study the audience and prepare accordingly. But, first things first...

The conference started with a one-day training session on 22 Oct; one has to buy different tickets for the training and the conference. You get a free registration for the conference only if you're a speaker. I wasn't part of the training session, hence will not be discussing anything about it. This was my day to rest and try the Portuguese cuisine.

The next day was the start of the conference. It was opened by Magnus Hagander covering the logistics and introducing us to the conference halls, etc.; I must say it was one entertaining start. Next was the keynote by Paul Ramsey. The keynote was my first comprehensive introduction to PostGIS. Further, there was a nice snack buffet arranged in the lobby, and this was my time to get to know more people, the most exciting part of any conference. I happened to catch Tom Lane!

Henceforth, I was forced to take some difficult decisions, like which talk to attend, since there were three parallel sessions going on. There was such a variety of areas covered in the conference, and most of them had amazing presentations, that it made me greedy and made me hate the idea of parallel sessions.

To keep the discussion short, I enjoyed being exposed to some of the new areas and uses of Postgres, like the challenges of using Postgres in the cloud, multi-column indexes, pluggable storage, benchmarking, efficient query planning in the latest PG, new and old features of pos
[...]
Posted by Bruce Momjian in EnterpriseDB on 2018-11-01 at 11:45

Some open source projects have a distinction between the developers of the open source software and its users. Since Postgres was originally developed in a university, and none of the university developers continued when Internet-based development started in 1996, all our active developers see themselves as stewards of code developed before we arrived. This causes a flatter organizational structure and helps to forge closer user/developer ties.

(translated from original French version)

The Rencontres Hivernales du Libre (RHL) (Winter Meeting of Freedom) takes place 25-27 January 2019 at St-Cergue.

Swisslinux.org invites the free software community to come and share workshops, great meals and good times.

This year, we celebrate the 5th edition with the theme «Exploit».

Please think creatively and submit proposals exploring this theme: lectures, workshops, performances and other activities are all welcome.

RHL'19 is situated directly at the base of some family-friendly ski pistes suitable for beginners and more adventurous skiers. It is also a great location for alpine walking trails.

Why, who?

RHL'19 brings together the forces of freedom in the Leman basin, Romandy, neighbouring France and further afield (there is an excellent train connection from Geneva airport). Hackers and activists come together to share a relaxing weekend and discover new things with free technology and software.

If you have a project to present (in 5 minutes, an hour or another format) or activities to share with other geeks, please send an email to rhl-team@lists.swisslinux.org or submit it through the form.

If you have any specific venue requirements please contact the team.

You can find detailed information on the event web site.

Please ask if you need help finding accommodation or any other advice planning your trip to the region.

Materialized views were a long-awaited feature in Postgres for a number of years. They finally arrived in Postgres 9.3, though at the time they were limited: refreshing a materialized view held a lock on it while the refresh ran. If your workload was strictly business-hours based this could work, but if you were powering something for end users it was a deal breaker. In Postgres 9.4, Postgres gained the ability to refresh materialized views concurrently. With this we now have fully baked materialized view support, but even so, we've seen they may not always be the right approach.

What is a view?

For those of you that aren't database experts, we're going to back up a little bit. To know what a materialized view is, we're first going to look at a standard view. A view is a defined query that you can query against as if it were a table. Views are especially helpful when you have complex data models that often combine for some standard report/building block. We'll look at an example in just a moment as we get to materialized views.

Views are great for simplifying copy/paste of complex SQL. The downside is that each time a view is executed the results are recomputed. For large datasets this can cause scanning of a lot of data, invalidate your cache, and in general just be slow. Enter materialized views.

Materializing your views

Let’s start with an example schema that could contain a lot of raw data. In this case a very basic web analytics tool that records pageview, the time it occurred, and the session id of the user.

CREATE TABLE pageviews (
  id bigserial,
  page text,
  occurred_at timestamptz,
  session_id bigint
);

There are a number of different views that could be very common based on this raw data. And if we have a real-time dashboard we’re powering it can quickly become unfeasible to query this raw data as a query would take too long. Instead we can do some rollups with materialized views:

CREATE MATERIALIZED VIEW rollups AS 
SELECT date_trunc('day') a
[...]
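
The excerpt stops mid-query. A hedged sketch of what such a rollup, plus a concurrent refresh, might look like (the view definition is illustrative, not the post's original code):

CREATE MATERIALIZED VIEW rollups AS
SELECT date_trunc('day', occurred_at) AS day,
       page,
       count(*) AS views
FROM pageviews
GROUP BY 1, 2;

-- REFRESH ... CONCURRENTLY (9.4+) requires a unique index on the materialized view
CREATE UNIQUE INDEX ON rollups (day, page);
REFRESH MATERIALIZED VIEW CONCURRENTLY rollups;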

Managing traffic to the database can get harder and harder as the amount increases, especially when the database is distributed across multiple servers. PostgreSQL clients usually talk to a single endpoint. When a primary node fails, the database clients will keep retrying the same IP. In case you have failed over to a secondary node, the application needs to be updated with the new endpoint. This is where you would want to put a load balancer between the applications and the database instances. It can direct applications to available/healthy database nodes and fail over when required. Another benefit is increased read performance by using replicas effectively: it is possible to create a read-only port that balances reads across replicas. In this blog, we will cover HAProxy: what it is, how it works and how to deploy it for PostgreSQL.

What is HAProxy?

HAProxy is an open source proxy that can be used to implement high availability, load balancing and proxying for TCP and HTTP based applications.

As a load balancer, HAProxy distributes traffic from one origin to one or more destinations and can define specific rules and/or protocols for this task. If any of the destinations stops responding, it is marked as offline, and the traffic is sent to the rest of the available destinations.

How to install and configure HAProxy manually

To install HAProxy on Linux you can use the following commands:

On Ubuntu/Debian OS:

$ apt-get install haproxy -y

On CentOS/RedHat OS:

$ yum install haproxy -y

And then we need to edit the following configuration file to manage our HAProxy configuration:

$ /etc/haproxy/haproxy.cfg

Configuring our HAProxy is not complicated, but we need to know what we are doing. We have several parameters to configure, depending on how we want HAProxy to work. For more information, we can follow the documentation about the HAProxy configuration.

Let's look at a basic configuration example. Suppose that you have the following database topology:

Database Topology Example

We want to create a HA

[...]
Parallel sequential scan is the first parallel access method in PostgreSQL; it was introduced in version 9.6. The committer of this feature and my colleague at EnterpriseDB, Robert Haas, wrote an awesome blog on it, and there is another great blog by another PostgreSQL committer and colleague, Amit Kapila. Both of these blogs explain this access method, its design, usage, and related parameters.

Still, I could not help but notice that there are curiosities around the usage of this access method. Every now and then I see a complaint saying that parallel sequential scan is not getting selected or that it is degrading the performance of a query. So, I decided to write this blog to cover more practical scenarios and specifically focus on its less talked about aspect -- where parallel sequential scan would (should) not improve the performance.

Before diving into the details of parallel SeqScan, let's first understand the basic infrastructure and terminology related to it in PostgreSQL. The processes that run in parallel and scan the tuples of a relation are called parallel workers, or workers in short. There is one special worker, namely the leader, which coordinates and collects the output of the scan from each of the workers. This worker may or may not participate in scanning the relation, depending on its load in dividing and combining the results. End users can also control the involvement of the leader in the relation scan with the GUC parameter parallel_leader_participation, which is a boolean parameter.
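
As a hedged illustration of the knobs involved (the table name is hypothetical and the settings are per-session):

SET max_parallel_workers_per_gather = 2;           -- allow up to two workers under a Gather node
SET parallel_leader_participation = off;           -- keep the leader out of the scan itself
EXPLAIN (ANALYZE) SELECT count(*) FROM big_table;  -- big_table stands in for any large relation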

Now, let's understand the concept of parallel scan in PostgreSQL by a simple example.
  • Let there be a table T (a int, b int) containing 100 tuples
  • Let's say we have two workers and one leader,
  • Cost of scanning one tuple is 10
  • Cost of communicating a tuple from worker to leader is 20
  • Cost of dividing the tuples among workers is 30
  • For simplicity, let's assume that the leader gives 50 tuples to each of the workers
Now, let's analyse whether a parallel scan will be faster than a non-parallel scan:

Cost of SeqScan = 10*100 = 1000
Cost of Parallel SeqSc
[...]

PostgreSQL is referred to as “The world’s most advanced open source database” – but what does PostgreSQL have that other open source relational databases don’t?  

2ndQuadrant recently hosted a webinar on this very topic: PostgreSQL is NOT your traditional SQL database, presented by Gülçin Yıldırım Jelínek, Cloud Services Manager at 2ndQuadrant.

The recording of the webinar is now available here.

Questions that Gülçin couldn’t respond to during the live webinar have been answered below.

Q1: What exactly is the role of postgresql for a marketplace like ebay or rakuten?

A1: This question is not very clear. If the question is about whether Postgres can be used in an e-commerce website, the answer is yes.

 

Q2: I'm in the process of switching from MS SQL Server to Postgres and I have an issue:
Simple search in text columns with diacritic chars.

Ex: table Person
Name
——-
Ștefan
ștefan
Stefan

When I search:

 SELECT * FROM pers WHERE Name LIKE 'ste%';

I want to retrieve all records from above.
In SQL Server there's a simple way to accomplish this – I use COLLATE Latin1_General_100_CI_AI when I define the column, and that's it.
Do you have recommendations to accomplish the same task in Postgres?

A2: The unaccenting collations are not supported in PostgreSQL. You can query like this to get the same result:

SELECT * FROM pers WHERE unaccent(Name) ILIKE 'ste%';
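
Note that unaccent() comes from the contrib extension of the same name, so it has to be installed in the database first; a minimal sketch:

CREATE EXTENSION IF NOT EXISTS unaccent;

SELECT * FROM pers WHERE unaccent(Name) ILIKE 'ste%';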


For any questions, comments, or feedback, please visit our website or send an email to webinar@2ndquadrant.com.

Posted by Quinn Weaver in pgExperts on 2018-10-31 at 05:56
PostgreSQL keeps track of which WAL files go with which timelines in small history files. Each time you make a base backup, a history file is born. The file is written once and never updated. It's a simple system, and it works well and silently.

In fact, sometimes it works a little too silently.

At PostgreSQL Experts we've run into the problem where a client's history files disappear because they are stored in S3, and there's a lifecycle configuration in place that says to move everything over a certain age to Glacier. That's a good policy for WAL files!

Unfortunately, it's not a good policy for history files: without the latest history file you can't restore the latest backup, and without past history files, you are unable to do PITR to certain points in time.

The solution we used was to move the whole WAL archive to S3 Standard-Infrequent Access storage, dissolving the problem with lifecycle configurations while controlling costs. But you could also fix this by editing the lifecycle configuration.

The important thing is this: hold on to all history files. They're tiny text files, and when you need them, you really need them. This is also a good reason to test restores, not just of the latest backup, but of database states at arbitrary points in time.

*    *    *

Addendum: another very common problem we see is WAL archives that become corrupted because a client accidentally pointed two primaries at the same WAL archive (for instance, they might have copied a postgresql.conf file by hand, or via a DevOps tool like Puppet). In this case, the whole archive is corrupted, and you're best off starting with a fresh S3 bucket or an empty directory and doing a new base backup immediately.

One of the many nice features of pgBackRest is that it will notice this and prevent you from doing it. Fewer footguns → better backups.
Previously I wrote about how to create foreign key pointing to partitioned table. Final solution in there required four separate functions and four triggers for each key between two tables. Let's see how fast it is, and if it's possible to make it simpler. First bit – performance test (for making it simpler you will … Continue reading "Foreign Key to partitioned table – part 2"