Posted by Federico Campoli on 2017-11-20 at 06:51
It has been a while since I wrote a post on this blog.
I've been quite busy coding pg_chameleon version 2.
I finally managed to release alpha1 on the 11th of November, but that release had several issues which prevented users from running real tests.
However, after a week of debugging, I've released pg_chameleon v2.0.0.alpha2, which is now quite usable.


For a complete command line reference and a usage example, click here.

The documentation now has a dedicated page explaining the configuration file in detail.

Please note this is a pre-release: despite the many fixes in the changelog, there is still a lot of work to do.
Do not use it in production.

So, why pg_chameleon v2, and why can't version 1 be fixed?

Well, pg_chameleon started as an experiment, and version 1 was built without a clear idea of the architecture or of what the final result should be.

This left version 1 with several issues, making it very difficult to improve without dismantling the existing logic.

Building up from a fresh start took less time than reorganising the original code correctly would have.
The code is now more modular, but it is still not very pythonic.
My bad, I'm not a developer after all.

The major changes from version 1 are the following.

  • Python 3 only development
  • Support for replicating multiple MySQL schemas within a single MySQL instance. The source and target schema names can be different.
  • Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica.
  • Daemonised init_replica process.
  • Daemonised replica process with two separate subprocesses, one for reading and one for replaying.
  • Soft replica initialisation. The tables are locked only when needed and stored with their log coordinates. The replica daemons will gradually bring the database to a consistent state.
  • Rollbar integration for a simpler error detection.

Version 2 also improves the display of the show_status command when a source name is specified.

There is also an error log in the replica schema which saves th[...]
(This post is a continuation of my previous post regarding Initializing RDS Postgres Instance.) This simple SQL "initializes" the EBS volume linked to an RDS instance, something which otherwise isn't possible without sending a workload (and experiencing high latency on the first run). Key scenarios where this is really helpful are: creating a Read-Replica (or Hot Standby in Postgres terms), restoring a
Posted by Adrien Nayrat in Dalibo on 2017-11-19 at 14:30
Posted by Pavel Stehule on 2017-11-19 at 10:52
I made a few changes, and pspg should now support pgcli and mysql too.

Please, test it.
Posted by Pavel Stehule on 2017-11-17 at 20:54
I fixed some issues, and now pspg can be used with MySQL:
MariaDB [(none)]> use sakila
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

MariaDB [sakila]> pager ~/src/pspg-master/pspg -s 2 -X
PAGER set to '~/src/pspg-master/pspg -s 2 -X'

MariaDB [sakila]> select * from nicer_but_slower_film_list limit 10;
Posted by Pavel Golub in MicroOLAP on 2017-11-17 at 12:59

1. Windows 10 supports Subsystem for Linux.
2. psql has some issues with non-English locales under Windows, see Notes for Windows Users.

So I installed WSL, then chose Ubuntu, started Bash and executed:

pasha_golub@PASHA:~$ sudo apt install postgresql-client

Cool. Now I have native Linux psql and, of course, I want to try Pavel Stehule’s Postgres pager!


Filed under: PostgreSQL Tagged: Linux, psql, window
Posted by Pavel Golub in MicroOLAP on 2017-11-17 at 12:24
Posted by REGINA OBE in PostGIS on 2017-11-16 at 23:51

To commemorate PostGIS Day 2017, I put together a 3D scene listing my favorite functions.

You can see it at PostGIS Day in 3D.

For more than a decade, the Postgres community has released new major versions almost annually to meet the evolving needs of the Database Industry. The first great release of Postgres was 8.3 in 2008, with a clean and evolutionary release of 8.4 eighteen months later. The 2017 v10 release of Postgres is the first version that I would consider truly "Enterprise Ready."

PostgreSQL has seen great success in the commercial enterprise for years. However, that success has largely been departmental; e.g., the installation is solving a specific problem for a specific department. Often it's the public-facing data source for a larger Oracle instance, or it's used to manipulate GIS data. In these roles, PostgreSQL has been a de facto choice for enterprises for a very long time.

As major cloud providers such as AWS, Microsoft Azure, and Google continue to implement and upgrade their PostgreSQL support you can expect to hear more about migrations. The desire to escape Oracle has never been stronger. We will start to see not just small businesses but also large businesses migrating their CRM and ERP systems.

With full ACID compliance, mature parallel scan support, native partitioning, and logical replication, PostgreSQL provides more features and performance for platforms such as Odoo and industry specific providers such as Timescale and Brytlyt. These and other features are why V10 is going to build the ecosystem.

Since August I have presented on v10 several times, most recently to a full room at PGConf Seattle. I have also presented at multiple groups in Vancouver B.C. and Colorado. The overwhelming consensus, even among those who don’t currently use PostgreSQL, is that the database is on their list of migration possibilities. V10 is the defining version of PostgreSQL and we are now in a time where we will begin to see mass adoption across many industries.

It is going to be years before a company such as Oracle starts to feel more than pinpricks at the number of migrations companies are performing but each one of t

[...]

For anybody following this blog, you'll know I do this every year. PGConf.EU completed several weeks ago, and we have now collected the statistics, and I'd like to share some numbers.

Let me start with some statistics that are not based on the feedback, but instead based on the core contents of our registration database. I've had several people ask exactly how we count our attendees when we say it's the largest PGConf.EU ever, so here are the numbers:

Our total number of attendees registered was 437. This includes all regular attendees, speakers, training attendees, sponsor tickets and exhibitor only tickets. Of these 437 people, 12 never showed up. This was a mix of a couple of sponsor tickets and regular attendees, and 3 training attendees. This means we had 425 people actually present.

We don't take attendance each day. Right after the keynote on the first day there were just over 20 people who had not yet shown up, and by the end of the conference that number was down to 12. There were definitely fewer than 400 people remaining late on Friday afternoon for the closing sessions, but at lunchtime the crowd was approximately the same size.

On top of the 437 actual attendees, we also had 5 further sponsor tickets that were never claimed. And we had another 59 people still on the waitlist, since we were unfortunately up against venue limits and were not able to sell all the requested tickets.

Now, aligned scrolling in expanded mode is implemented. Some bugs are fixed too. Please, test it.

Source: https://github.com/okbob/pspg
Posted by Pavel Golub in MicroOLAP on 2017-11-14 at 15:15

 

Some more on Twitter:


Filed under: PostgreSQL Tagged: community, fun, pgconf.eu, photo, PostgreSQL
Posted by Bruce Momjian in EnterpriseDB on 2017-11-13 at 20:30

I previously mentioned my visit to Hangzhou, China. A video interview from that trip is now available, as well as my presentation in English and Chinese (starts at 153 minutes).

The most interesting part of my visit to Shenzhen, China was an evening meeting with 30 Postgres community members discussing how to increase Postgres adoption in China.

In our article Exploring a Data Set in SQL we discovered a data set related to music: the Chinook sample database.

Our discovery led us to find albums containing tracks of multiple genres, and for the analytics we were then pursuing, we wanted to clean the data set and assign a single genre per album. We did that in SQL of course, and didn’t actually edit the data.

Finding the most frequent input value in a group is a job for the mode() WITHIN GROUP (ORDER BY sort_expression) Ordered-Set Aggregate Function, as documented in the PostgreSQL page about Aggregate Functions.
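
As a rough illustration (assuming the Chinook track and genre tables with albumid and genreid columns, which may be named differently in your import), picking a single genre per album could look like this:

-- mode() is an ordered-set aggregate, so it requires WITHIN GROUP (ORDER BY ...);
-- for each album, the most frequent genre name wins
select track.albumid,
       mode() within group (order by genre.name) as main_genre
  from track
       join genre using (genreid)
 group by track.albumid;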

An alternative way to upgrade to PostgreSQL 10 native partitioning using ALTER TABLEs and PLPGSQL.

Today's post is a bit spontaneous, because a few days ago I read the post about PostgreSQL's new native table partitioning explaining how to migrate from an old inheritance partitioning to the new one.

The author of the post proposed using the pg_dump and pg_restore utilities. This approach is quite simple and requires minimal effort. However, in some instances, especially in the case of large partitions, dumping the partition data, restoring it and creating indexes might take a long time.


In this post, I would like to offer an alternative way to upgrade partitioning. This method is based on ALTER commands wrapped into a PLPGSQL procedure. Of course, I should note this technique only works if you have already upgraded to PostgreSQL 10.
Let's consider a classic example, with a master table and several child tables partitioned using a timestamp column. Using the following commands it's possible to create our example tables.
# CREATE TABLE events (
    id serial primary key,
    cta_point integer,
    decay_factor integer,
    created_at timestamp without time zone);
# CREATE TABLE events_201708 (LIKE events INCLUDING ALL) INHERITS (events);
# CREATE TABLE events_201709 (LIKE events INCLUDING ALL) INHERITS (events);
# CREATE TABLE events_201710 (LIKE events INCLUDING ALL) INHERITS (events);
# CREATE TABLE events_201711 (LIKE events INCLUDING ALL) INHERITS (events);

The next step is to create the trigger function and the trigger itself, which will use our function when new records are inserted into the table.

# CREATE OR REPLACE FUNCTION fn_insert_events() RETURNS TRIGGER AS
    $function$
    BEGIN 
    EXECUTE format($$INSERT INTO %I VALUES ($1.*)$$, 'events_'||to_char(NEW.created_at, 'YYYYMM')) USING NEW;
    RETURN NULL;
    END;
    $function$ LANGUAGE plpgsql;
# CREATE TRIGGER events_trigger BEFORE INSERT ON events FOR EACH ROW EXECUTE PROCEDURE fn_insert_events();

As a last step, let's add some generated data into our partitions.

# INSERT INTO eve
[...]
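
A hedged sketch of the ALTER-based idea using the tables above (my own illustration, not necessarily the exact procedure from the post): each child is detached from the old inheritance parent and attached to a new native partitioned table.

# BEGIN;
# ALTER TABLE events RENAME TO events_old;
# CREATE TABLE events (
    id integer,
    cta_point integer,
    decay_factor integer,
    created_at timestamp without time zone)
    PARTITION BY RANGE (created_at);
# ALTER TABLE events_201708 NO INHERIT events_old;
# ALTER TABLE events ATTACH PARTITION events_201708
    FOR VALUES FROM ('2017-08-01') TO ('2017-09-01');
# COMMIT;

The NO INHERIT/ATTACH PARTITION pair is then repeated for the remaining monthly children. Note that a partitioned parent in PostgreSQL 10 cannot carry the primary key, so id is declared as a plain integer here, and ATTACH PARTITION will scan each child to validate the new bounds unless a matching CHECK constraint already exists.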

In our last article, we explored how to run Postgres in some very basic Docker scenarios. Based on our experiments back then, we can obtain images, create containers, and mount to various types of storage.

Boring!

It’s not just boring, it’s mundane. It doesn’t do anything. Sure we can run Postgres in a container, but that’s true about a lot of things. You know what’s exciting? Setting up Postgres streaming replication between two docker containers.

Let’s get started.

He say “I know you, you know me”

The first thing we need to do is create a container that will be our primary host. Postgres requires a user with REPLICATION permission, as well as a special entry in the pg_hba.conf file for the “replication” pseudo-database. We could start a regular Postgres container, connect to it, and set all of that up ourselves, but it’s much cooler to create our own image which does that for us.

The Postgres Docker Hub image has excellent instructions for extending the official image, so let’s start there.

Since we know the two modifications we need to make, let's create a basic script to automate the steps:

#!/bin/bash

if [ $(grep -c "replication repuser" ${PGDATA}/pg_hba.conf) -gt 0 ]; then
    exit 0
fi

psql -U "$POSTGRES_USER" \
     -c "CREATE USER repuser WITH REPLICATION" postgres

echo "host replication repuser all trust" >> ${PGDATA}/pg_hba.conf
pg_ctl -D ${PGDATA} reload

Since we could be attaching to a pre-existing Postgres volume, we need to check whether or not we already modified it so the repuser user could connect. Aside from that, all we do is create the user and add a very insecure host entry so it can connect to initiate streaming replication.

He got hair down to his knee

The next step is to create a Dockerfile to define our image. We didn’t do much, so that’s a thankfully easy task. Here’s how that looks:

FROM postgres:latest

COPY scripts/enable_replication.sh /docker-entrypoint-initdb.d

We saved our previous script as scripts/enable_replication.sh to keep things together. The official Postgres i

[...]
On 9th of November 2017, Robert Haas committed patch: Add hash partitioning. Hash partitioning is useful when you want to partition a growing data set evenly. This can be useful to keep table sizes reasonable, which makes maintenance operations such as VACUUM faster, or to enable partition-wise join. At present, we still depend on constraint […]
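
A minimal sketch of the syntax this adds (table and partition names are mine):

CREATE TABLE clicks (id bigint, payload text) PARTITION BY HASH (id);
-- each partition receives the rows whose hashed key modulo 4 equals its remainder
CREATE TABLE clicks_p0 PARTITION OF clicks FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE clicks_p1 PARTITION OF clicks FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE clicks_p2 PARTITION OF clicks FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE clicks_p3 PARTITION OF clicks FOR VALUES WITH (MODULUS 4, REMAINDER 3);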

Corrective updates are available for all supported PostgreSQL branches: 10.1, 9.6.6, 9.5.10, 9.4.15, 9.3.20 and 9.2.24, which contain a batch of bug fixes, including fixes for problems that could lead to data corruption.

Updates for the 9.3 branch will be released until September 2018, 9.4 until December 2019, 9.5 until January 2021, 9.6 until September 2021, and 10 until October 2022. PostgreSQL version 9.2 is now End-of-Life (EOL).

Among the fixes, we can highlight a serious problem with BRIN indexes (some rows could fail to be included in the index), several fixes for logical replication, a crash when logical decoding is invoked from a PL language function, a low-probability crash in the processing of nested trigger firings, fixes for parallel query execution, and fixes to some json(b) functions.

In addition to the bug fixes, the new releases also address three security vulnerabilities:

  • CVE-2017-12172: Start scripts permit database administrator to modify root-owned files
  • CVE-2017-15098: Memory disclosure in JSON functions
  • CVE-2017-15099: INSERT … ON CONFLICT DO UPDATE fails to enforce SELECT privileges

 

It is with great pleasure that we announce the preliminary program for PGConf Local: Austin!


We have received a plethora of positive feedback from the local Postgres and data communities, and we are proud to host a second PGConf event in Austin in 2017.

Training Opportunity:
Breakout Sessions:
  • Event Sourcing with a Postgres Event Store by Scott Bellware
  • Using GIS in PostgreSQL by Lloyd Albin
  • Trees/Hierarchical Data in the SQL Database by Ryan Murphy
  • Amazon Aurora with PostgreSQL compatibility by James Finnerty
  • Deep Dive into the RDS PostgreSQL Universe by Grant McAlister
  • Open Source Communities as Biological Ecosystems by Debra Cerda
  • The Power of Postgres Replication by Joshua D. Drake
  • Multi-cloud deployment of PostgreSQL in minutes. by Stephen Holt

PGConf Local: Austin is made possible by the wonderful team of volunteers including the Austin PostgreSQL User Group and our sponsors:




Platinum: Compose.IO, OpenSCG, 2ndQuadrant, and Microsoft

In this article, we will focus on upgrade from 9.4 (and above) versions to PostgreSQL 10, leveraging the full features of native partitioning. For elaborate details on PostgreSQL 10 partitioning, please refer to our KB article Postgresql 10 partitioning.

Test Environment: OS: CentOS 7; PostgreSQL versions: 9.6.3 and 10.0; pg_partman version: 3.1.0

Migrating from 9.x version (partitioned table with inheritance) to pg10 (native partitioning)

Consider a database with several tables, some of which are quite huge. As is well known, PostgreSQL 9.6 improves performance on huge tables through what is called "partitioning." These huge tables are managed internally as partitioned tables using the inheritance feature. Envisage we have a table "Inventory" with a million records. This table has four child tables (inv_quarter1, inv_quarter2, inv_quarter3, and inv_quarter4) which inherit their properties from the parent table Inventory.

postgres=# \d+ test.inventory
                                  Table "test.inventory"
    Column    |         Type          | Modifiers | Storage  | Stats target | Description
--------------+-----------------------+-----------+----------+--------------+-------------
 product      | character varying(10) |           | extended |              |
 units_sold   | character varying(10) |           | extended |              |
 invoice_date | date                  |           | plain    |              |
 units_remain | character varying(10) |           | extended |              |
Triggers:
    orders_insert_trigger BEFORE INSERT ON test.inventory FOR EACH ROW EXECUTE PROCEDURE orders_insert_simple()
Child tables: test.inv_quarter1,
              test.inv_quarter2,
              test.inv_quarter3,
              test.inv_quarter4

postgres=# \d+ test.inv_quarter1
                                Table "test.inv_quarter1"
    Column    |         Type          | Modifiers | Storage  | Stats target | Description
--------------+-----------------------+-----------+----------+--------------+-------------
 product  
[...]
Posted by Dimitri Fontaine on 2017-11-09 at 08:34

Now that my book Mastering PostgreSQL in Application Development is released (and selling well, thanks guys!), I’ve had some questions about the title.

The idea is that to become good at anything, we need to practice. We practice a lot, and it's even better when we are actively trying to learn, following what is known as deliberate practice.

If you’ve used a relational database, you understand basic INSERT statements. Even if you come from a NoSQL background, you likely grok inserts. Within the Postgres world, there is a utility that is useful for fast bulk ingestion: \copy. Postgres \copy is a mechanism for you to bulk load data in or out of Postgre.

First, let's pause. Do you even need to bulk load data, and what does it have to do with Citus? We see customers leverage Citus for a number of different uses. When looking at Citus for a transactional workload, say as the system of record for some kind of multi-tenant SaaS app, your app is mostly performing standard inserts/updates/deletes.

But when you’re leveraging Citus for real-time analytics, you may already have a separate ingest pipeline. In this case you might be looking at event data, which can be higher in volume than a standard web app workload. If you already have an ingest pipeline that reads off Apache Kafka or Kinesis, you could be a great candidate for bulk ingest.

Back to our feature presentation: Postgres copy. Copy is interesting because you can achieve much higher throughput than with single row inserts.

Postgres \copy:

  • Can be executed from within psql or an application language
  • Supports CSV as well as binary data
  • Is transactional
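
As a quick illustration (table and file names here are invented, not from the post), a client-side load and unload from psql could look like:

-- runs inside psql; the files live on the client machine, not the server
\copy github_events from 'github_events.csv' with (format csv)
\copy github_events to 'github_events_export.csv' with (format csv)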

A look at Postgres \copy performance for bulk ingest

Let’s take a look at both how Postgres copy performs over single row inserts, then we’ll move onto how you can leverage \copy. There are essentially 3 primary ways to scale out your writes in single-node Postgres. And if you need to scale out ingest even further beyond a single node you can look to Citus as each node in a Citus database cluster is able to help you at scaling writes.

Starting simple with inserts, using sample GitHub event data

First let’s see how we perform on a standard insert.

To perform this test, I created a set of 1,000,000 inserts from the GitHub event dataset. I then connected to Postgres with psql and ran \i single_row_inserts.sql. This command executed all the insert queries.

The re

[...]
As many of you know, AWS RDS Postgres uses EBS, which has an interesting feature called Lazy Loading that allows it to instantiate a disk (the size of which can be anything from 10GB to 6TB) that comes online within a matter of minutes. Although a fantastic feature, this can lead to unexpected outcomes when high-end production load is thrown at a newly launched RDS Postgres

As of PostGIS 2.3, the postgis extension was changed to no longer allow relocation. All function calls within the extension are now schema qualified.

While this change fixed some issues with database restore, it created a new problem: if you installed PostGIS in a schema other than the one you wanted, it is not intuitive how to move it to a different schema. Luckily there is a way to do this.

For this exercise, I will install PostGIS in the default schema and then demonstrate how to move it into another schema location.

You can run these steps using psql or pgAdmin or any other PostgreSQL tool you want.

Continue Reading by clicking title hyperlink ..
Posted by Paul Ramsey in PostGIS on 2017-11-06 at 20:00

Just to give my two cents, here is a problem I had a week ago, when one of our databases repeatedly entered recovery mode because of a query.

Let’s put some emotion and start from the beginning.

Suddenly everyone starts to complain that the ERP is down, the company is losing money, the end of the world is near and some heads are gonna roll.

With 15 people around you, you calmly check the space left, load average, database connections and so on.

After some digging in the log, you could spot a query reaching for an invalid memory address. After some debugging, this is what I found:

To avoid rewriting a function, a native PostgreSQL feature was used to get the IP and port of the database, so the move from the test server to the production server would be easier. The functions are inet_server_addr() and inet_server_port().
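
Both are ordinary SQL-callable functions, so their behaviour is easy to check from any session; note they return NULL when the connection arrives over a Unix-domain socket, which is a common way to sit behind a local PgBouncer:

-- address and port of the server end of the current connection;
-- NULL for Unix-domain socket connections
SELECT inet_server_addr() AS server_ip,
       inet_server_port() AS server_port;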

It’s not a problem when your test and/or dev environments are equal to the production. As we don’t have to treat too many connections in the test environment, we do not use PgBouncer, a very light connection pooler.

Because of a very specific demand, we need to connect outside of a transaction to get some values inside a running function, so we use a method from a C++ library to connect to the database, get a value and update it, returning an integer to the original function.

When that function was called in the production environment, PgBouncer returned nothing or some garbage, and the C++ function, for some reason, tried to reach an invalid memory address; the database, to preserve itself, entered recovery mode.

To fix the whole problem, we have several options:

  • Rewrite the C++ function to validate the parameters, which will not resolve the PgBouncer thing;
  • Put fixed values for IP and port in the functions;
  • Create a user for this specific call and connect to the main port, instead of the PgBouncer port, which is by far the worst solution;
  • Reduce this type of problem by creating dev/test environments identical to production.

Not a difficult problem to fix, nor to spot,

[...]

An age-old question is whether it is better to put data in a database or a file system. Of course, the answer is "it depends," but let's look at what it depends on.

First, Postgres stores its data in the file system, so the file system must be good for something. Postgres also stores some of its configuration files in the file system, e.g. pg_hba.conf, so it isn't simply the case that once you have a database, everything is better in a database.

It usually comes down to evaluating database positives vs. negatives — first the database storage positives:

  • Do you need to view the data in multiple ways?
  • Is synchronizing database data and file system data going to be difficult?
  • Do you need multi-object commit synchronization?
  • Do applications need a single, guaranteed-consistent view of their data?
  • Is a file system API unreasonable or inaccessible to clients?

Continue Reading »

Posted by Pavel Golub in MicroOLAP on 2017-11-06 at 11:46
Serial-identity

As you probably know, PostgreSQL 10 introduced IDENTITY columns. At first sight they are pretty much like serial columns. To learn about the differences, one may refer to the excellent blog post by Peter Eisentraut.

So I was wondering: what happens if I mix serial with identity? I executed this simple query:

CREATE TABLE foo(
id SERIAL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
);

Of course, I got an error:

ERROR: both default and identity specified for column "id" of table "foo"
SQL state: 42601

"SQL state: 42601" stands for a syntax error. But the interesting thing is that SERIAL, as you know, is converted into CREATE SEQUENCE and SET DEFAULT.
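
Roughly, a plain SERIAL column is rewritten into something like this (a sketch, ignoring schema qualification and exact naming):

CREATE SEQUENCE foo_id_seq;
CREATE TABLE foo(
id integer PRIMARY KEY DEFAULT nextval('foo_id_seq')
);
ALTER SEQUENCE foo_id_seq OWNED BY foo.id;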

So now I know exactly the reason for the error: you cannot have both DEFAULT and IDENTITY for a column.


Filed under: PostgreSQL Tagged: grammar, PostgreSQL, release, SQL, trick
Today is the day my book Mastering PostgreSQL in Application Development launches! I’m all excited that everybody interested is now able to actually read my book! Mastering PostgreSQL in Application Development targets application developers who want to learn SQL properly, and actually master this programming language. Most developers don’t think of SQL as a programming language, mainly because they don’t have full control of the execution plan of their queries.