Posted by Greg Sabino Mullane in End Point on 2015-07-01 at 18:22:00

Back in the old days, upgrading Postgres required doing a pg_dump and loading the resulting logical SQL into the new database. This could be a very slow, very painful process, requiring a lot of downtime. While there were other solutions (such as Bucardo) that allowed little (or even zero) downtime, setting them up was a large complex task. Enter the pg_upgrade program, which attempts to upgrade a cluster with minimal downtime. Just how fast is it? I grew tired of answering this question from clients with vague answers such as "it depends" and "really, really fast" and decided to generate some data for ballpark answers.

Spoiler: it's either about 3.5 times as fast as pg_dump, or insanely fast at a flat 15 seconds or so. Before going further, let's discuss the methodology used.

I used the venerable pgbench program to generate some sample tables and data, and then upgraded the resulting database, going from Postgres version 9.3 to 9.4. The pgbench program comes with Postgres, and simply requires an --initialize argument to create the test tables. There is also a --scale argument you can provide to increase the amount of initial data - each increment increases the number of rows in the largest table, pgbench_accounts, by one hundred thousand rows. Here are the scale runs I did, along with the number of rows and overall database size for each level:

Effect of --scale
--scale    Rows in pgbench_accounts    Database size
100        10,000,000                  1418 MB
150        15,000,000                  2123 MB
200        20,000,000                  2829 MB
250        25,000,000                  3535 MB
300        30,000,000                  4241 MB
350        35,000,000                  4947 MB
400        40,000,000                  5652 MB
450        45,000,000                  6358 MB
500        50,000,000                  7064 MB
550        55,000,000                  7770 MB
600        60,000,000                  8476 MB
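For reference, each data set came from an initialization along these lines, using the options described above (the database name here is just the default postgres database):

$ pgbench --initialize --scale 100 postgres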

To test the speed of the pg_dump program, I used this simple command:

$ pg_dump postgres | psql postgres -q -p 5433 -f -

I did make one important optimization, which was to set fsync off on the target database (version 9.4). Although this setting should never be turned off in production - or anytime y

[continue reading]

PostGIS 2.2 is planned to reach feature freeze June 30th 2015 so we can make the September PostgreSQL 9.5 curtain call with confidence. There are great KNN enhancements for PostgreSQL 9.5-only users. I've been busy getting all my ducks lined up: a lot on the tiger geocoder and address standardizer extensions to be shipped with the windows builds, a story for later. One other feature we plan to ship with the windows PostGIS 2.2 builds is the ogr_fdw Foreign Data Wrapper extension. I've been nagging Paul Ramsey a lot about issues with it, this one in particular https://github.com/pramsey/pgsql-ogr-fdw/issues/25, and after some prodding he finally put his nose in and fixed them, and pinged Even Rouault for some help on a GDAL-specific item.

Needless to say, I've been super happy with the progress and support I've gotten with ogr_fdw development and am really enjoying my ogr_fdw use. Reading an XLSX file that was saved after the connection was opened required a fix in the GDAL 2.0 branch (which missed the GDAL 2.0.0 release), so this new package contains a GDAL 2.0.1-ish library. Hopefully GDAL 2.0.1 will be out before PostGIS 2.2.0 comes out so I can release this fix without guilt.
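For anyone who hasn't played with ogr_fdw yet, basic usage looks roughly like this sketch (the datasource/format/layer option names follow the ogr_fdw README; the file path, table and column names are placeholders):

CREATE EXTENSION ogr_fdw;

CREATE SERVER spreadsheet_svr
  FOREIGN DATA WRAPPER ogr_fdw
  OPTIONS (datasource '/data/budget.xlsx', format 'XLSX');

CREATE FOREIGN TABLE budget_2015 (
  dept    text,
  amount  numeric
) SERVER spreadsheet_svr
  OPTIONS (layer 'Sheet1');

SELECT * FROM budget_2015;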


Continue reading "PostgreSQL OGR FDW update and PostGIS 2.2 news"

jsonb is coming up with a set of new features in Postgres 9.5. Most of them have been introduced by the following commit:

commit: c6947010ceb42143d9f047c65c1eac2b38928ab7
author: Andrew Dunstan <andrew@dunslane.net>
date: Tue, 12 May 2015 15:52:45 -0400
Additional functions and operators for jsonb

jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.

Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.

Note that some slight modifications have been made since this commit, so the list of new operators and functions presented here is not exactly the one listed in this commit log but the one that will be included in Postgres 9.5 alpha 1, to be released next week. Also worth mentioning is that a portion of this work is available as the extension jsonbx, which is compatible even with 9.4 installations (see here).
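As a quick taste of the removal operators listed in the commit message above, here is a minimal sketch (outputs follow the usual jsonb text representation; exact spacing may differ):

=# SELECT '{"a1":"v1","a2":"v2"}'::jsonb - 'a1' AS field;
    field
--------------
 {"a2": "v2"}
(1 row)
=# SELECT '["v1","v2","v3"]'::jsonb - 1 AS field;
    field
--------------
 ["v1", "v3"]
(1 row)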

So, 4 new operators have been added in the existing jsonb set in 9.5.

jsonb || jsonb concatenates two jsonb values, and two things can be noted here. First, keys are ordered by name (this is not surprising, as the on-disk format is a parsed tree). Second, when the same key appears on both sides, the last value wins, since jsonb enforces key uniqueness; this holds even when the values are themselves of json type.

=# SELECT '{"a1":"v1","a3":"v3"}'::jsonb || '{"a2":"v2"}'::jsonb AS field;
                 field
--------------------------------------
 {"a1": "v1", "a2": "v2", "a3": "v3"}
(1 row)
=# SELECT '{"a1":{"b1":"y1","b2":"y2"},"a2":"v2"}'::jsonb ||
          '{"a1":"v1"}'::jsonb AS field;
          field

[continue reading]

Posted by Josh Berkus in pgExperts on 2015-06-26 at 21:36:00
It's boomtime in San Francisco, which means we're also full swing into The Cult of the NDA.  This includes many of our venues for SFPUG meetups; they require signing a confidentiality disclaimer before entering their office.  While I question the utility of this, since these hosts are providing us with free space, food, and drink, I'm really not in a position to argue.  So I launched a survey a month ago to see how much of a problem this is for our members.  I thought it might be useful to share the results with other PUG leaders so that they can also develop policies around this.

Here are the results. First, let me give you the overall results in a pie chart.  Scroll down to the bottom of my post for my suggested policy conclusions.


Incidentally, these are all graphed using IPython Notebook and Pylab, which is an awesome way to do one-off graphs.  Here's the code for that graph:

    %matplotlib inline
    import psycopg2
    from pylab import *

    conn=psycopg2.connect('dbname=sfpug host=127.0.0.1')
    cur = conn.cursor()
    cur.execute("""SELECT att, count(*) as members
        FROM ndasurvey GROUP BY att ORDER BY att""");

    labels = []
    fracs = []
    explode = []
    for rec in cur:
        labels.append(rec[0])
        fracs.append(rec[1])
        explode.append(0.05)

    figure(1, figsize=(6,6))
    pie(fracs, explode=explode, labels=labels,
                    autopct='%1.0f%%', shadow=True, startangle=90)
    title('Attitudes Towards Venue NDAs: Overall')
    show()


So overall we have a somewhat split distribution.  BTW, here are the definitions of the attitudes:

  • won't attend: I won't go to a meetup which requires signing
  • needs review: I/my employer must review the agreement first
  • depends on text: depends on what agreement says
  • go anyway: I don't like them, but I'll still go
  • don't care: I don't care, whatever
Does this differ between our three divisions (San Francisco, East Bay, and South Bay)?

So, East Bay attendees don't seem to care in general, and South Bay attendees are m

[continue reading]

Posted by Josh Berkus on 2015-06-26 at 18:25:22


In the wake of DockerCon, the July meetup in San Francisco will be all about Linux Containers:

  • Chris Winslett of Compose.io will present Governor, an auto-failover system for managing a cluster of PostgreSQL replicas.
  • Madhuri Yechuri of ClusterHQ will explain Flocker, which uses the brand-new plugin interface for Docker in order to support volume management and data migration for containers.

If you're not familiar with all of this containerish stuff, Josh Berkus will do a 5-minute primer on using Postgres with Docker at the beginning of the session. Hosted by Salesforce.com in their new Rincon Cafe.

(note: Salesforce will require attendees to RSVP with their full names, and sign a confidentiality statement, which will be supplied to attendees in advance)

As a database, PGDB (PostgreSQL) is fairly standard in its use of SQL. Developers of all colors, however, might have trouble switching gears and thinking in set operations, since so many language constructs focus on conditionals and looping. Last week in the performance pitfalls series, we discussed a bit of Set Theory, and how ignorance of its implications can be disastrous. But what about the more mundane?

What happens, for instance, when we treat a database like a programming language?

This time, the example tables will emulate a pretty terrible ordering system with a user table of 1000 users, and a million orders distributed over roughly the last three years. Here it is, in all its glory:

CREATE TABLE sys_user
(
    user_id      SERIAL       NOT NULL,
    username     VARCHAR      NOT NULL,
    password     VARCHAR      NOT NULL,
    last_order   TIMESTAMPTZ  NULL,
    created_dt   TIMESTAMPTZ  NOT NULL DEFAULT now(),
    modified_dt  TIMESTAMPTZ  NOT NULL DEFAULT now()
);
 
INSERT INTO sys_user (username, password, created_dt, modified_dt)
SELECT 'user' || a.id,
       md5('use-bcrypt-instead' || 'user' || a.id || 'somepassword'),
       now() - (a.id % 1000 || 'd')::INTERVAL,
       now() - (a.id % 100 || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id);
 
ALTER TABLE sys_user ADD CONSTRAINT pk_user_id
      PRIMARY KEY (user_id);
 
ANALYZE sys_user;
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    user_id      INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (product_id, item_count, user_id, order_dt, valid_dt)
SELECT (a.id % 100000) + 1, (a.id % 100) + 1, (a.id % 1000) + 1,
       now() - (a.id % 1000 || 'd')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CON

[continue reading]

I am back from PGCon 2015 and was fortunate enough to present my first paper, on "Native Compilation" technology. I also got the opportunity to meet the cream of the PostgreSQL community, and found everyone to be very polite and easygoing.

As part of this Native Compilation technology, I mostly focused on the Native Compilation of Relations, which we call Schema Binding.

Some of the details from the presentation are below (for the complete presentation, please visit Go Faster With Native Compilation):

Native Compilation:
Native Compilation is a methodology to reduce CPU instructions by executing only the instructions specific to a given query/object, unlike interpreted execution. The steps are:

  • Generate C-code specific to objects/query.
  • Compile C-code to generate DLL and load with server executable.
  • Call specialized function instead of generalized function.
Schema Binding:
Native Compilation of a relation is called Schema Binding. Since most of the properties of a relation remain the same once it is created, its data gets stored and accessed in the same pattern irrespective of the data itself. So instead of accessing tuples for the relation in a generic way, we create a specialized access function for the relation at creation time, and that function is used for any further query involving that table.


It gives a performance improvement of up to 30% on the standard TPC-H benchmark.


The nseq datatype allows storing DNA and RNA sequences, consisting of the letters AGCT or AGCU respectively, in PostgreSQL.

By encoding four bases per Byte, it uses 75 percent less space on disk than text. While this idea is obvious and far from being novel, it still is one of the most efficient compression schemes for DNA/RNA, if not the most efficient.

As of now, nseq only supports very basic native operations. The main shortcoming is that it has to be decompressed into text, hence expanding by 4x, for e.g. substring operations and the like.

This will change.

Enough said - here it is...
A short blog post on enabling the writeable mongo_fdw extension in PostgreSQL 9.4. PostgreSQL provides a powerful feature called Foreign Data Wrappers (FDW), which allows DBAs to connect to other data sources from within PostgreSQL. The Foreign Data Wrapper implementation is based on SQL/MED, supported from PostgreSQL 9.1 onwards, which means we can now access remote databases through PostgreSQL seamlessly. Today we have a variety of FDWs available; in this blog post, we will be compiling the latest version of the writeable FDW "mongo_fdw" to access MongoDB.

The latest mongo_fdw extension is based on mongo-c-driver and libbson. To implement mongo_fdw, we first need to compile all the dependencies required by the extension. Below is the step-by-step execution on my CentOS 7 (64-bit) machine with PostgreSQL 9.4 installed.

Step 1. First install dependency packages required by Mongo-c-Driver and Libbson.
yum install git automake autoconf libtool gcc
Step 2. Clone mongo_fdw repository from Github.
git clone https://github.com/EnterpriseDB/mongo_fdw.git
Step 3. Pre-compilation requires pkgconfig/pkg-config (installed in Step 1) and the PostgreSQL pg_config location set in the path.
[root@localhost ~]# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
[root@localhost ~]# export PATH=/opt/PostgreSQL/9.4/bin:$PATH

[root@localhost mongo_fdw]# type pg_config
pg_config is /opt/PostgreSQL/9.4/bin/pg_config
Step 4. mongo_fdw compilation can be done manually or with the help of the auto-compilation script (autogen.sh) provided in the bundle. Here, I will be using the auto-compilation script, which will download and install the required mongo-c-driver and libbson libraries in the default location (/usr/local/lib). For more details on the compilation script, refer to the documentation here.
cd mongo_fdw/
./autogen.sh --with-master
make
make install
After compilation, we can notice the files created in PostgreSQL home directory.
-bash-4.2$ find $PWD -name "mongo*"
/opt/PostgreSQL/9.4/lib/postgresql/mongo_fdw.so
/opt/PostgreSQL/9.4/share/postgresql/exte

[continue reading]

Posted by Shaun M. Thomas on 2015-06-23 at 20:30:24

Well, I’ve returned from PGCon 2015 in Canada, and after a couple days to decompress, it’s time to share. I wrote about the PGCon 2014 unconference after returning to Chicago last year, so I felt it was only fitting that I start there. I feel as strongly now as I did a year ago, that directly interacting with the PostgreSQL maintainers at this level helps the community thrive. Even though PGCon is generally a more developer-focused conference, being able to brainstorm with the bigwigs, even if nothing comes of it, means the ideas have been given a fair shake.

The format this year was quite a bit different than last year. I attribute this to Josh Berkus once again, in what I see as his attempt to formalize the process and give it more weight. Indeed, it’s hard to argue with his results. Just take a look at the 2015 Unconference Wiki page. It’s a deluge of information I wish we had about the 2014 talks, from attendees and talk summaries, to relevant external links and the all-important schedule. I’m a bit biased in that regard because I tend to produce and consume vast swaths of excessive information, but it’s an excellent reflection on how much the PostgreSQL community values documentation in general.

Unfortunately due to inclement weather, I missed the voting process and the first day of talks entirely. I desperately missed watching the talk selection process, though Josh said they did a lot of that electronically because several people would be late to the conference. I’m not sure how I missed that, so I’ll blame email; it deserves it anyway. Regardless, after witnessing the 2014 talk selection, I stand by my earlier assertion that it’s a sight to behold. It warms my crooked old heart to watch people so excited about technology and the amicable interaction they have with the friendly developers.

Despite the setbacks, I did attend several chats on Wednesday, which is another departure from last year’s format. In 2014, the selection process and every talk were constrained to one long Saturday and was very ad-ho

[continue reading]

Posted by Bruce Momjian in EnterpriseDB on 2015-06-20 at 18:07:50

This nice photo was taken during this year's PgCon Developer Meeting. My family was surprised at how many other developers they have met over the years. A signed copy of the photo was sold for USD ~$500 at the conference's charity auction.

The meeting was very productive, and in fact the entire conference has been productive.

Here are my slides from pgCon. First, my slides and Dmitry Dolgov's slides from the presentation on 9.5 Json Features, and also those from my lightning talk on the Rotfang Foreign Data Wrapper.
Posted by Andrew Dunstan in pgExperts on 2015-06-20 at 01:54:00
The other day I gave a lightning talk at pgCon about a Foreign Data Wrapper called Rotfang, which is for generating arbitrary amounts of random data. This is intended for generating test cases. The software supports a small number of data types natively, but also allows you to use user-supplied functions to generate data. It's available on bitbucket. A short example:
CREATE EXTENSION rotfang_fdw;

CREATE FUNCTION random_tstz (typmod int)
RETURNS timestamptz
LANGUAGE SQL as
$$
SELECT now() - (random() * 20.0) * interval '1 year'
$$;

CREATE FOREIGN TABLE rand2 (b boolean, ts timestamptz)
SERVER rotfang
OPTIONS (maxrows '10', func_ts 'random_tstz');

SELECT * FROM rand2;

Functions are great. Having cut my teeth on a database that didn’t even provide the ability to define functions, I’ve come to almost take them for granted in PGDB (PostgreSQL). However, with this kind of ubiquity, sometimes they can be overused in ways that don’t seem to be part of the common programmer lexicon. In this week’s PG Phriday series on performance-killing missteps, I’m going to talk a bit about set theory, and how a certain amount of familiarity is necessary to properly interact with a database.

One of the axioms of set theory states that f(x) = y, and x !~ y. That is, a function applied to some value produces a value that may not be equivalent to the original. Put another way, a spayed pet does not have the same reproductive capacity as a regular pet, and is thus not equal to the original. In the context of a database, this is extremely relevant because equivalence is what makes calculations and restrictions possible.

Given this, consider what a database index does. When creating an index, the database takes the value of one or more columns and essentially builds a pointer tree to quickly locate a record in logarithmic time. This is much faster than reading every record in a table and applying a filter to remove unwanted records. That’s something even novice developers tend to know. But they don’t know how functions modify this scenario.

Given the very simplified introduction to set theory, applying a function to the column value means the database must discard equality. Remember: x !~ y. The database has indexed x, not y. So when a function is used in a WHERE clause, any index on x will be ignored. This makes perfect sense, considering there are an infinite number of possible functions, and the output of each is indeterminate. It’s not possible to predict the end result of every possible function.
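A minimal illustration of the point, as a sketch (the table and index names here are made up, not taken from the post's examples):

CREATE TABLE sys_msg
(
    msg_id  SERIAL   NOT NULL,
    author  VARCHAR  NOT NULL
);
 
CREATE INDEX idx_msg_author ON sys_msg (author);
 
-- The index on author can satisfy this filter:
SELECT * FROM sys_msg WHERE author = 'Shaun';
 
-- It cannot satisfy this one: lower(author) is a different value than the
-- one indexed, so every row must be read and the function applied to it.
SELECT * FROM sys_msg WHERE lower(author) = 'shaun';
 
-- An expression index stores lower(author) itself, restoring the match.
CREATE INDEX idx_msg_author_lower ON sys_msg (lower(author));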

To further illustrate this, we’ll reuse one of our tried-and-true examples:

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT        

[continue reading]

After seeing an email on pgsql-general from a user who wanted to archive WAL from a standby, storing the segments locally and saving bandwidth by receiving them only through a WAL stream, let's talk about a new feature of Postgres 9.5 that introduces exactly what this user was looking for: the ability to archive WAL from a standby, allowing more complicated archiving strategies. This feature has been introduced by this commit:

commit: ffd37740ee6fcd434416ec0c5461f7040e0a11de
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Fri, 15 May 2015 18:55:24 +0300
Add archive_mode='always' option.

In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.

As mentioned in the commit message, setting archive_mode = 'always' will make a standby receiving WAL from a primary server archive the segments whose reception has been completed. While it can be interesting even for a set of nodes running on the same host to have each of them archive WAL segments independently to different partitions, this becomes more interesting when the nodes are on separate hosts, for example to reduce bandwidth usage: the bandwidth needed to archive the WAL segments on the standby host is already included in the WAL stream that the standby gets from its root node, saving resources at the same time.

Let's have a look at how this actually works with a simple set of nodes, one master and one standby running on the same host, listening respectively to ports 5432 and 5433 for example. Each node runs the following archiving configuration:

$ psql -At -c 'show archive_command' -p 5432
cp -i %p /path/to/archive/5432/%f.master
$ psql -At -c 'show archive_command' -p 5433
cp -i %p /path/to/archive/5432/%f.standby
$ psql -At -c 'show archive_mode' -p 5432
always
$ psql -At -c 'show archive_mode' -p 5433
always
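For reference, getting a node into that state is just a matter of setting both parameters and restarting, for example with something like this (archive_mode is only read at server start, so a plain reload is not enough; the archive path is whatever fits your layout):

ALTER SYSTEM SET archive_mode = 'always';
ALTER SYSTEM SET archive_command = 'cp -i %p /path/to/archive/5432/%f.master';
-- then restart the node; archive_command alone can be changed with a reload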

So with that, both the standby and its primary node will archive their WAL segments once they are cons

[continue reading]

Posted by Denish Patel in OmniTI on 2015-06-18 at 19:02:59
Postgres 9.4 introduced an awesome feature: Replication Slots. This allows you to implement Postgres replication without using any external archive management tools. Yesterday, I presented a three-hour-long tutorial on “Out of the box Postgres 9.4 Replication using Replication slots” at PgCon (the Postgres conference) in Ottawa, Canada. If you want to follow along slides with VM, you … Continue reading Postgres Replication using Replication Slots
Posted by Andrew Dunstan in pgExperts on 2015-06-14 at 15:44:00
Yesterday I proposed an Unconference talk about Set types, and I've had a couple of people ask me about what Set types are and what they would be for, so here is a brief rundown.

Say you have a table of things that you want to be able to apply some set of tags to. It might be blog posts, for example. The tags might be subject classifications, or reader ratings. Let's say for the sake of argument that it's going to be subject classifications, and that we actually have a (large) enum type to standardize that.

The classic way to do this is to have a table of {blog_post_id, tag} which will be unique on the combination. Up to now adding a tag is a nuisance - you need to make sure the tag doesn't already exist or you'll get a uniqueness violation. In 9.5 that's improved with INSERT ... ON CONFLICT IGNORE. Even then it's a bit more work than I really want to do. What I really want is something like this:
UPDATE blog_posts
SET subject_tags = subject_tags + 'my_new_subject'
WHERE post_id = 12345
and then I'll be able to do
SELECT * FROM blog_posts
WHERE subject_tags ? 'some subject'
It's also possible to do this, somewhat inefficiently, by using hstore or jsonb fields (in jsonb you'd just have a top level object). In both cases you would use dummy values - say make everything have a value of 1. But that's ugly, and error prone, and rather hackish.
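To make that concrete, the hstore flavor of this workaround would look something like the following sketch, assuming subject_tags is an hstore column and using 1 as the dummy value:

UPDATE blog_posts
   SET subject_tags = subject_tags || hstore('my_new_subject', '1')
 WHERE post_id = 12345;

SELECT * FROM blog_posts
 WHERE subject_tags ? 'some subject';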

Now I don't know if there is enough extra value here to justify the work and code maintenance involved. I've managed to live without it all these years without attacking what little hair I have left. But it's something where I have occasionally thought "I could do this so much more simply and elegantly if I had a set type." So I'd like to see if there is enough interest to make it into a real project.
You may have noticed that we've had a lot of churn in PostgreSQL update releases lately, doing three updates in less than two months.  We expect that we're done with that now, so it's time to get on updating all of your servers with today's update release.

Users of PostgreSQL 9.3 and 9.4 should update this weekend if possible.  This cumulative update fixes several problems with our "multixact" mechanism which have been plaguing Postgres since the release of 9.3.0.  While most users will not hit these issues, if you do, the bugs cause unrecoverable data corruption.  In other words, waiting "until you see a problem" is a really, really bad idea.

Additionally, one of the problems fixed requires cleaning up some garbage left by the version of pg_upgrade which shipped with Postgres versions 9.3.0 to 9.3.4.  If you used pg_upgrade to upgrade to one of those versions -- even if you subsequently upgraded to 9.4 -- then the server is liable to trigger a database-wide all-table autovacuum as soon as you restart after applying 9.3.9 or 9.4.2. 

If you are prepared for minutes to hours of degraded performance after applying the update, this is not a problem.  If it is a problem, though, you can do precautionary vacuuming before you apply the update.  Detailed instructions are supplied in the release notes on how to do this.

Now, go update those servers!
Posted by Bruce Momjian in EnterpriseDB on 2015-06-12 at 18:45:01

The draft Postgres 9.5 release notes are now online. This document will be adjusted regularly until the final release, which is expected later this year.

There are a lot of database engines out there. As such, a developer or DBA will naturally have varying levels of experience with each, and some of this might conflict with how PGDB (PostgreSQL) operates. These kinds of embedded misunderstandings can cause potential issues by themselves, but in this particular case, corrective action is fairly simple.

So this week, I’d like to talk about indexes. Many people treat them as a “make query faster” button, and this often results in egregious misuse. Indexes take space, require CPU resources to maintain, and bring overhead that adversely affects INSERT and UPDATE performance. Most know these drawbacks and generally acknowledge that too many indexes can be detrimental. However, with some databases, even column order can make a drastic difference. PGDB is among these.

To illustrate the point, let’s make a simple test case of one-million records in an inventory tracking table of one thousand products over the course of about three years.

CREATE TABLE sys_inventory_snapshot
(
    product_id   SERIAL       NOT NULL,
    record_dt    TIMESTAMPTZ  NOT NULL,
    item_count   INT          NOT NULL,
    order_count  INT          NOT NULL
);
 
INSERT INTO sys_inventory_snapshot (
       product_id, item_count, order_count, record_dt
)
SELECT b.id, a.id, a.id,
       now() - (a.id || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id),
       generate_series(1, 1000) b(id);
 
ALTER TABLE sys_inventory_snapshot ADD CONSTRAINT pk_inventory_snapshot
      PRIMARY KEY (product_id, record_dt);

At first glance, this looks pretty good. If we want information based on date or product, it’s right there. Things are even better if we have both values available! The query plan is also encouraging:

EXPLAIN ANALYZE
SELECT *
  FROM sys_inventory_snapshot
 WHERE record_dt >= CURRENT_DATE - INTERVAL '1 day';
 
                            QUERY PLAN                             
--------------------------------------------------------------------
 Bitmap Heap Scan ON sys_inventory_snap

[continue reading]

As the title suggests, if you are running Postgres in your environment, it is very important that you plan to upgrade Postgres to the latest announced release.

I upgraded Postgres in the last couple of weeks. Should I upgrade again? Why?

Yes, you should plan to upgrade again.

Postgres released a data corruption and security bug fix release on May 22nd and a follow-up release on June 4th to fix some of the issues introduced in the previous release. However, the June 4th release did not fix the multixact wraparound bugs (Thread 1, Thread 2) introduced in recent versions. The wraparound bug could cause problems starting Postgres after a crash, so it is critical to apply this update. The wraparound bug is more critical for 9.3 & 9.4, but it is a very good idea to upgrade other versions too, to make sure there isn’t any problem. Read the release notes for further details/explanation.

Special care should be taken regarding vacuuming if you ever ran Postgres 9.3 and used pg_upgrade for your production databases.

What if I run Postgres 8.X (8.1, 8.2, 8.3, 8.4)?

Yes, absolutely!!  You are running an EOL, unsupported version of Postgres. It is very important that you upgrade your Postgres database to a supported release, because you are likely to hit any or all of the security and/or data loss bugs fixed in recent Postgres releases.

What if I run Postgres 9.X but have not applied the latest minor release?

Yes, you should upgrade to the latest minor release (the 3rd digit in the version numbering). A minor release only requires installing new binaries and restarting the database. It does _not_ require using pg_dump/restore or pg_upgrade.

You should be upgrading to the following latest relevant Postgres release for your environment ASAP:

  • 9.4.4
  • 9.3.9
  • 9.2.13
  • 9.1.18
  • 9.0.22
How can I keep track of the supported Postgres releases?

Postgres release support can be found here. It is worth noting that, if you are running Postgres 9.0, you should plan for a major version upgrade (most likely to the latest Postgres 9.4.X release) before the holiday

[continue reading]

Temporary tables are a core feature of SQL and are commonly used by people around the globe. PostgreSQL provides a nice implementation of temporary tables, which has served me well over the years. An interesting question arises when using a temporary table: What if a temporary table has the same name as a “real” table? What […]
Posted by gabrielle roth on 2015-06-12 at 00:00:00

When: 6-8pm Thursday June 18, 2015
Where: Iovation
Who: Mark Wong
What: Pg Performance

Come have a chat with our local performance expert, Mark Wong.

This presentation will take a look at the performance changes in the development version of PostgreSQL. A selection of features, some proposed and some already committed, are evaluated using synthetic open source workloads. One of the workloads used is the former OSDL’s Database Test 3, which consists of business-oriented ad-hoc queries with concurrent data modification.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.


See you there!


PG Partition Manager has been the most popular project I’ve ever done a significant amount of work on and I really appreciate everyone’s feedback for the roughly 2 years it’s been out there. I’ve got plenty more ideas for development and features and look forward to being able to move forward on them with this new major version released.

PostgreSQL 9.3 introduced the ability for user created, programmable background workers (BGW). 9.4 then introduced the ability to dynamically start & stop these with an already running cluster. The first thing that popped into my mind when I heard about this was hopefully having some sort of built-in scheduling system. There still hasn’t been a generalized version of anything like this, so in the mean time I studied the worker_spi contrib module. This is a very simple BGW example with a basic scheduler that runs a process with a configurable interval. This is basically all pg_partman needs for partition maintenance, and what required an external scheduler like cron before.

pg_partman_bgw.dbname
    * Required. The database(s) that `run_maintenance()` will run on. If more than one, use a comma separated list. If not set, BGW will do nothing.

pg_partman_bgw.interval
    * Number of seconds between calls to `run_maintenance()`. Default is 3600 (1 hour).
    * See further documentation on suggested values for this based on partition types & intervals used.

pg_partman_bgw.role
    * The role that `run_maintenance()` will run as. Default is "postgres". Only a single role name is allowed.

pg_partman_bgw.analyze
    * Same purpose as the p_analyze argument to `run_maintenance()`. Set to 'on' for TRUE. Set to 'off' for FALSE. Default is 'on'.

pg_partman_bgw.jobmon
    * Same purpose as the p_jobmon argument to `run_maintenance()`. Set to 'on' for TRUE. Set to 'off' for FALSE. Default is 'on'.
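Pulled together in postgresql.conf, a working setup might look something like this sketch (the library also has to be loaded via shared_preload_libraries, and the dbname and role values below are placeholders):

shared_preload_libraries = 'pg_partman_bgw'   # changing this requires a restart
pg_partman_bgw.dbname = 'mydb'
pg_partman_bgw.interval = 3600
pg_partman_bgw.role = 'partman'
pg_partman_bgw.jobmon = 'on'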

The above are the new postgresql.conf options that pg_partman v2.0.0 can use to control the background worker it now comes with. These options can be changed at any time with a simple reload, but

[continue reading]

Posted by Bruce Momjian in EnterpriseDB on 2015-06-10 at 12:30:01

A long-requested pg_upgrade feature is the ability to do major-version upgrades of standby servers without requiring recreation of the standby servers. Thanks to Stephen Frost, it is now possible to perform such upgrades using rsync, as outlined in the 9.5 pg_upgrade documentation, step 9. This works not only for 9.5, but for all supported versions of pg_upgrade, back to 9.0.

Posted by Bruce Momjian in EnterpriseDB on 2015-06-09 at 23:15:01

There has been some confusion by old and new community members about the purpose of the core team, and this lack of understanding has caused some avoidable problems. Therefore, the core team has written a core charter and published it on our website. Hopefully this will be helpful to people.

Greetings PostgreSQL Gurus,

We are delighted to have Bruce Momjian from EnterpriseDB for the August meeting!  Our thanks to EnterpriseDB for loaning us this premier “teacher of all things PostgreSQL.”

You can reply to the Dallas PUG here: DFW PUG on Meetup.

YeSQL: Battling the NoSQL Hype Cycle with Postgres

Wednesday, Aug 12, 2015, 7:00 PM

Realpage
4000 International Pkwy Carrollton, TX


Please note the exception to our regular schedule.  This meeting will be on the second Wednesday of August!  Realpage has been kind enough to sponsor us for this  event.  The meeting location will be in the Realpage offices.  Be sure to let them know that we appreciate it.

Abstract:

Relational databases are regularly challenged by new technologies that promise to make SQL obsolete, but the new technologies often fade into obscurity.  This talk explores how new NoSQL technologies are unique, and how existing relational database systems like Postgres are adapting to handle NoSQL workloads.

About Bruce:

Bruce Momjian is a co-founder of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996 as a committer and community leader.

He is the author of PostgreSQL: Introduction and Concepts, published by Addison-Wesley. Bruce is employed by EnterpriseDB as a Senior Database Architect.

This is the meeting you won’t want to miss!  You’ll get to meet Bruce, along with everyone in Dallas that is anybody at all in the PostgreSQL community.   Now is your chance, don’t miss it.
See you there,

Kirk Roybal

Posted by Quinn Weaver in pgExperts on 2015-06-09 at 20:08:00

A client complained of mysterious connection spikes. "We think we're getting hit by bots, but we're not sure," they said; "Can you look into it?"

So I edited postgresql.conf, turned on log_connections and log_disconnections, did a 'pg_ctl -D MY_DATA_DIR reload', and waited (in my case I also set log_min_duration_statement = 0, but I don't recommend that unless you can watch to make sure the logs aren't filling the log partition — and make sure you're not going to hammer a slow log device).
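In postgresql.conf terms, that change amounts to something like this (the last line is the optional, noisy part mentioned above):

log_connections = on
log_disconnections = on
#log_min_duration_statement = 0   # optional; can flood the log partition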

A while later, I did

ls -t data/pg_log/*.csv | head -5 | xargs grep -c connection

(That says "List log files in order from most recent to least recent, take the five most recent, and count the lines in each file containing 'connection'.")

I got lucky. A spike was obviously in progress; the current log file showed a count two orders of magnitude higher than the other log files'. Since the problem was in progress, no post mortem was required; I just mailed the client and asked "Do your web logs show bot traffic right now?"

The client mailed right back saying, "Yes, we see a bunch of connections from Baidu's web crawler; it's going nuts hitting our site."

Baidu is a Chinese search engine. It turns out its crawling bot is well-known for being aggressive. As that link explains, the bot doesn't respect Crawl-delay in robots.txt (that's annoying but unsurprising, since Crawl-delay is a non-standard extension).

So if this happens to you, you have these options:

  1. Register at Baidu's site (Chinese-only) to request less-aggressive crawling.
  2. Try to tune PostgreSQL connection pooling to handle the onslaught.
  3. Block Baidu entirely. This is an extreme option — not advised if you do business in China, or if you have Chinese-speaking customers elsewhere who are reaching you through Baidu. The latter requires analytics to figure out.
Posted by Jim Nasby on 2015-06-08 at 17:31:24

On June 30, 2015, a leap second will occur to synchronize atomic clocks with the Earth’s slowing rotation. The International Earth Rotation and Reference Systems Service (IERS) has used this method 26 times since 1972. The leap second of 2012 caused brief disruptions in computer networks for large sites including LinkedIn, Gawker, Mozilla and Reddit. Most importantly, critical systems including flight check-ins for Qantas Airlines were disrupted.

On October 1, 2015, Nasdaq will increase the granularity of timestamps on some of its datafeeds to reflect nanosecond-level timing increments as part of the rollout of Versions 2.1 of each feed. According to NasdaqTrader.com’s Data Technical News #2015 – 11, a planned parallel period for 2.0 and 2.1 will occur October 1, 2015, through January 29, 2016 for those feeds which will have new versions including NASDAQ Best Bid and Offer (QBBO), Nasdaq Last Sale (NLS), and NLS Plus.

It would seem that the world is getting more precise in measuring time, but there’s something innately flawed with the rationale behind these efforts.

You think you know time — but you really don’t

The crux of the problem with timestamps is that people mistake measurements of time (i.e. a timestamp) for the flow of time that we experience in the real world.  The time that we experience in the real world is immutable, irrevocable, and infinitely divisible. It flows forever onward in a single direction as a series of events. You can never undo something (as much as we might sometimes wish we could!).

You can’t actually pin down real-world time, because you can’t measure an actual instant in time. We like to think that we talk about an instant of time, but what we’re really talking about is a duration. 6:14 PM isn’t an instant, it’s a duration of time between 6:14 PM and 6:15 PM. Increasing your precision doesn’t change this. A second is always a second long. A nanosecond is still just a duration. No matter how finely you slice it, multiple events can happen within a single second, nanosecond, femtosecond, and so

[continue reading]

Today's highlight is about a new system view that will land in Postgres 9.5, that has been introduced by this commit:

commit: 9029f4b37406b21abb7516a2fd5643e0961810f8
author: Magnus Hagander <magnus@hagander.net>
date: Sun, 12 Apr 2015 19:07:46 +0200
Add system view pg_stat_ssl

This view shows information about all connections, such as if the
connection is using SSL, which cipher is used, and which client
certificate (if any) is used.

Reviews by Alex Shulgin, Heikki Linnakangas, Andres Freund & Michael
Paquier

pg_stat_ssl is a system view showing statistics about the SSL usage of a given connection, with one row per connection. The view is shaped as follows, with information about whether SSL is enabled, which protocol version is used, the cipher, etc.

=# \d pg_stat_ssl
   View "pg_catalog.pg_stat_ssl"
   Column    |  Type   | Modifiers
-------------+---------+-----------
 pid         | integer |
 ssl         | boolean |
 version     | text    |
 cipher      | text    |
 bits        | integer |
 compression | boolean |
 clientdn    | text    |

SSL is not enabled by default in a Postgres build, even if most of the Postgres packages distributed with many OSes do enable it. Still, if the --with-openssl switch has not been used at configure time, all the connections will be reported with NULL columns, and SSL is of course reported as disabled. If SSL is simply disabled on the server, this system view reports the same information:

=# SELECT * FROM pg_stat_ssl;
  pid  | ssl | version | cipher | bits | compression | clientdn
-------+-----+---------+--------+------+-------------+----------
 12533 | f   | null    | null   | null | null        | null
(1 row)
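For completeness, turning SSL on for a server built with --with-openssl is a matter of a couple of parameters plus a restart, along these lines (the file names shown are the defaults and assume a certificate and key already sit in the data directory):

ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'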

Now, when SSL is enabled on a server, one row is reported per connection. Here is for example the case of two connections currently running on a server, one using SSL to connect, and another not using SSL:

=# SHOW ssl;
 ssl
-----
 on
(1 row)
=# SELECT ssl, version, bits, compression FROM pg_stat_ssl;
 ssl | version | bits | compression
-----+---------+------+-----

[continue reading]