
A few weeks ago I posted a series of articles explaining the ideas and goals behind memory contexts, the basics of their implementation in PostgreSQL, and some common issues you may run into when using them.

I mentioned there are two main reasons why memory contexts were introduced. The first is to simplify memory management - tracking the life cycle of all the allocated pieces, making it less burdensome for developers, and preventing some common memory-leak scenarios.

The second goal is (of course) improving performance, because everyone knows (or rather believes) that the generic allocators (the malloc/free implementations provided by your OS - kernel/libc/...) are slow. But is that still true?

After all, there are papers like Reconsidering Custom Memory Allocation essentially demonstrating that most custom allocators don't really perform any better than a generic one (although they claim that allocators based on regions - aka blocks - are a notable exception).

Also, we're getting a new kernel version every few months, presumably with improvements to memory management too. So maybe there have been significant improvements that render the additional complexity of a custom allocator pointless?

We can't just zap the memory contexts completely, as we'd lose the first benefit (tracking the allocated pieces). But maybe we could change the implementation so that it really is just a thin wrapper around malloc/free with simple tracking ... so let's try that.
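To make that concrete, here is a minimal sketch (my illustration only, not the actual PostgreSQL code or the patch discussed below; all names are invented) of what such a thin tracking wrapper could look like. Every chunk is malloc'd individually and linked into its context, so it can be free()'d directly while the context can still release everything at once:

#include <stdlib.h>

typedef struct WrapperChunk
{
    struct WrapperChunk *next;      /* next chunk tracked by this context */
    /* the requested memory follows this header */
} WrapperChunk;

typedef struct WrapperContext
{
    WrapperChunk *chunks;           /* head of the list of live chunks */
} WrapperContext;

static void *
wrapper_alloc(WrapperContext *cxt, size_t size)
{
    WrapperChunk *chunk = malloc(sizeof(WrapperChunk) + size);

    if (chunk == NULL)
        return NULL;

    chunk->next = cxt->chunks;      /* remember it so the context can free it later */
    cxt->chunks = chunk;
    return (void *) (chunk + 1);
}

static void
wrapper_reset(WrapperContext *cxt)
{
    WrapperChunk *chunk = cxt->chunks;

    while (chunk != NULL)
    {
        WrapperChunk *next = chunk->next;

        free(chunk);                /* each chunk was malloc'd individually, so free() works */
        chunk = next;
    }
    cxt->chunks = NULL;
}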

As explained here, the default implementation is based on blocks, which are then sliced into chunks in response to palloc() calls. All this happens in PostgreSQL code, and the consequence is that it's impossible to free the chunks directly (by calling free()) because it's the whole block that gets allocated by malloc().

So let's rip out all the block-related stuff, leaving us with contexts looking like this:

typedef struct AllocSetContext
{
    MemoryContextData header;   /* Standard memory-context fields */
    AllocChunk chunks;          /* d

[continue reading]

On 11th of November, Fujii Masao committed patch: Add generate_series(numeric, numeric). Платон Малюгин. Reviewed by Michael Paquier, Ali Akbar and Marti Raudsepp. generate_series() is one of the most commonly used functions – at least for me. If you're not familiar with it – it generates a set of rows with values based on its arguments. For […]
On 7th of November, Alvaro Herrera committed patch: BRIN is a new index access method intended to accelerate scans of very large tables, without the maintenance overhead of btrees or other traditional indexes. They work by maintaining "summary" data about block ranges. Bitmap index scans work by reading each summary tuple and comparing them with […]
Posted by Leo Hsu and Regina Obe on 2014-11-21 at 21:16:00

Yesterday was PostGIS day, or as some may call it, Post GIS day, and a couple of interesting things happened that day:

  • PostgreSQL 9.4 RC1 came out.
  • There were parties and unconferences, many summarized on http://2014.postgisday.rocks
  • I managed to entertain myself with Conway's Game of Life, PostGIS raster map algebra style, and pondered how wonderful PostGIS would be if it could generate animated GIFs with some sort of aggregate function; for which I was politely called crazy by some of my fellow PSC friends.

But what was greatest of all and took the cake were these pictures:


Continue reading "PostGIS Day synopsis"

I recently updated the PostBooks packages in Debian and Ubuntu to version 4.7. This is the version that was released in Ubuntu 14.10 (Utopic Unicorn) and is part of the upcoming Debian 8 (jessie) release.

Better prospects for Fedora and RHEL/CentOS/EPEL packages

As well as getting the packages ready, I've been in contact with xTuple helping them generalize their build system to make packaging easier. This has eliminated the need to patch the makefiles during the build. As well as making it easier to support the Debian/Ubuntu packages, this should make it far easier for somebody to create a spec file for RPM packaging too.

Debian wins a prize

While visiting xTupleCon 2014 in Norfolk, I was delighted to receive the Community Member of the Year award which I happily accepted not just for my own efforts but for the Debian Project as a whole.

Steve Hackbarth, Director of Product Development at xTuple, myself and the impressive Community Member of the Year trophy

This is a great example of the productive relationships that exist between Debian, upstream developers and the wider free software community and it is great to be part of a team that can synthesize the work from so many other developers into ready-to-run solutions on a 100% free software platform.

Receiving this award really made me think about all the effort that has gone into making it possible to apt-get install postbooks and all the people who have collectively done far more work than myself to make this possible:

[continue reading]

This is corner-case advice for anyone looking for a solution as to why a sqsh / tsql / FreeTDS combination works perfectly with one SQL2012 instance but is unable to log in to a newly configured SQL2012 instance; the details just came in. Symptoms: sqsh / tsql / FreeTDS is perfectly configured, the setup logs in to another SQL Server perfectly well, and all this when you are able to log in

A couple of days ago the following feature related to pg_dump has been committed and will be in Postgres 9.5:

commit: be1cc8f46f57a04e69d9e4dd268d34da885fe6eb
author: Simon Riggs <simon@2ndQuadrant.com>
date: Mon, 17 Nov 2014 22:15:07 +0000
Add pg_dump --snapshot option

Allows pg_dump to use a snapshot previously defined by a concurrent
session that has either used pg_export_snapshot() or obtained a
snapshot when creating a logical slot. When this option is used with
parallel pg_dump, the snapshot defined by this option is used and no
new snapshot is taken.

Simon Riggs and Michael Paquier

First, let's talk briefly about exported snapshots, a feature introduced in PostgreSQL 9.2. With it, it is possible to export a snapshot from a first session with pg_export_snapshot, and by reusing this snapshot in transactions of other sessions, all the transactions can share exactly the same state image of the database. When using this feature, something like this needs to be done in the first session exporting the snapshot:

=# BEGIN;
BEGIN
=# SELECT pg_export_snapshot();
 pg_export_snapshot
--------------------
 000003F1-1
(1 row)

Then other sessions in parallel can use SET TRANSACTION SNAPSHOT to import back the snapshot and share the same database view as all the other transactions using this snapshot (be it the transaction exporting the snapshot or the other sessions that already imported it).

=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
=# SET TRANSACTION SNAPSHOT '000003F1-1';
SET
=# -- Do stuff
[...]
=# COMMIT;
COMMIT

Note that the transaction that exported the snapshot needs to remain active as long as the other sessions have not consumed it with SET TRANSACTION. This snapshot export and import dance has actually been used by pg_dump since 9.3 for parallel dumps, to make the dump acquisition consistent across the threads (whose number is defined by --jobs) doing the work.
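With the new option, the snapshot exported above can be handed directly to pg_dump, for example like this (a sketch; the database and file names are illustrative, and the exporting session must keep its transaction open while the dump runs):

pg_dump --snapshot=000003F1-1 -f dump_at_snapshot.sql mydb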

Now, this commit adding the option --snapshot is simply what a transaction importing a snapshot does: caller can

[continue reading]

First of all, we need to know why such a requirement is needed. IMO, it is absolutely a business necessity to maintain some kind of historical data on the target database (slave node). In particular, out of multiple slave nodes, one slave node should retain the very first form of the data as it was initially written into the database.

To accomplish this requirement, we should come up with some kind of filter, like TRIGGERs/RULEs, on the slave node so that it avoids replaying DELETE and UPDATE statements. Since we are dealing with Slony-I, there is no such built-in mechanism to filter DML while replaying it on the slave node, even though it has gathered all the events from the master node. (AFAIK MySQL, Oracle and SQL Server do support filters.)

To get this straight: the traditional Slony-I way maintains uniqueness of rows across all the nodes through its core requirement that tables must have primary keys. In such an architecture it's hard to exclude DELETE/UPDATE statements. Take the example of the primary key column "orderid" of an "orders" table: a first INSERT statement with value 100 is replicated, in its first form, to the filtered slave node. Later a DELETE statement is executed for "orderid=100" and the row is deleted; now, if any INSERT or UPDATE statement attempts to use "orderid=100", the slave node hits a duplicate key violation and it simply breaks the replication.
ERROR:  duplicate key value violates unique constraint "reptest_pkey"
DETAIL: Key (id)=(2) already exists.
CONTEXT: SQL statement "INSERT INTO "public"."reptest" ("id", "name") VALUES ($1, $2);"
.....
or
....
CONTEXT: SQL statement "UPDATE ONLY "public"."reptest" SET "id" = $1 WHERE "id" = $2;"
2014-11-17 23:18:53 PST ERROR remoteWorkerThread_1: SYNC aborted
Thus, implementing a rule is not an issue, yet one should be extremely cautious when it is in place. In reality, however, applying these filters on a Slony-I slave node is very fragile; in particular, the application/developer should always keep in mind that any duplicate entry of a row by INSERT or UPDATE could break the replication.
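For illustration, the kind of filter being described could look like this on the filtered slave, using the reptest table from the error above (a sketch only; since Slony applies changes with session_replication_role = replica, the rules would likely also have to be enabled as replica rules to fire during replication):

CREATE RULE reptest_skip_delete AS
    ON DELETE TO public.reptest
    DO INSTEAD NOTHING;

CREATE RULE reptest_skip_update AS
    ON UPDATE TO public.reptest
    DO INSTEAD NOTHING;

-- assuming the slave session runs with session_replication_role = replica
ALTER TABLE public.reptest ENABLE REPLICA RULE reptest_skip_delete;
ALTER TABLE public.reptest ENABLE REPLICA RULE reptest_skip_update;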

As D

[continue reading]

Posted by Josh Berkus in pgExperts on 2014-11-19 at 21:05:00
A month ago I got into an argument on IRC with Sergey about telling people to avoid kernel 3.2.  This turned out to be a very productive argument, because Sergey then went and did a battery of performance tests against various Linux kernels on Ubuntu. Go read it now, I'll wait.

My takeaways from this:

  • Kernel 3.2 is in fact lethally bad.
  • Kernel 3.13 is the best out of kernel 3.X so far.  I hope that this can be credited to the PostgreSQL team's work with the LFS/MM group.
  • No 3.X kernel yet has quite the throughput of 2.6.32, at least at moderate memory sizes and core counts.
  • However, kernel 3.13 has substantially lower write volumes at almost the same throughput.  This means that if you are write-bound on IO, 3.13 will improve your performance considerably.
  • If your database is mostly reads and highly concurrent, consider enabling
    kernel.sched_autogroup_enabled (one way to do that is shown below).
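For example, on most Linux systems this can be flipped at runtime and persisted with sysctl (a generic sketch, not taken from Sergey's write-up):

sysctl -w kernel.sched_autogroup_enabled=1                      # enable immediately
echo 'kernel.sched_autogroup_enabled = 1' >> /etc/sysctl.conf   # persist across reboots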
    
Thanks a lot to Sergey for doing this testing, and thanks even more to the LFS/MM group for improving IO performance so much in 3.13.

Posted by Christophe Pettus in pgExperts on 2014-11-19 at 06:36:54

One common source of query problems in PostgreSQL is an unexpectedly bad query plan when a LIMIT clause is included in a query. The typical symptom is that PostgreSQL picks an index-based plan that actually takes much, much longer than if a different index, or no index at all, had been used.

Here’s an example. First, we create a simple table and an index on it:

xof=# CREATE TABLE sample (
xof(#   i INTEGER,
xof(#   f FLOAT
xof(# );
CREATE TABLE
xof=# CREATE INDEX ON sample(f);
CREATE INDEX

And fill it with some data:

xof=# INSERT INTO sample SELECT 0, random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# ANALYZE;
ANALYZE

Then, for about 5% of the table, we set i to 1:

UPDATE sample SET i=1 WHERE f < 0.05;
ANALYZE;

Now, let’s find all of the entires where i is 1, in descending order of f.

xof=# EXPLAIN ANALYZE SELECT * FROM sample WHERE i=1 ORDER BY f DESC;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=399309.76..401406.04 rows=838509 width=12) (actual time=1415.166..1511.202 rows=499607 loops=1)
   Sort Key: f
   Sort Method: quicksort  Memory: 35708kB
   ->  Seq Scan on sample  (cost=0.00..316811.10 rows=838509 width=12) (actual time=1101.836..1173.262 rows=499607 loops=1)
         Filter: (i = 1)
         Rows Removed by Filter: 9500393
 Total runtime: 1542.529 ms
(7 rows)

So, 1.5 seconds to do a sequential scan on the whole table. So, just getting the first 10 entries from that should be much faster, right?

xof=# EXPLAIN ANALYZE SELECT * FROM sample WHERE i=1 ORDER BY f DESC LIMIT 10;
                                                                        QUERY PLAN                                                                        
------------------------------------------------------------------------------------------------------------------------------------

[continue reading]

Posted by Josh Berkus on 2014-11-18 at 22:55:24

Use Postgres? In the San Francisco Bay Area? Maybe you're planning to attend FOSS4G-NA or EclipseCon? Well, join us as well! pgDay SF 2015 will be on March 10th in Burlingame, CA. We are currently looking for speakers and sponsors for the event.

pgDaySF will be a one-day, one-track event held alongside FOSS4G North America and EclipseCon, allowing for cross-pollination among geo geeks, Java programmers, and PostgreSQL fans. We are looking for both user-oriented and advanced database talks, with a slant towards PostGIS. Interested? Submit a talk now. Submissions for full talks close on December 9th.

If your company uses or supports PostgreSQL, or markets products to PostgreSQL and PostGIS users, then you may want to sponsor the pgDay as well. Currently we're looking for one Sponsor and up to five Supporters. This is especially good for companies looking to hire PostgreSQL DBAs.

pgDay SF 2015 is sponsored by Google and PostgreSQL Experts Inc.

Posted by Josh Berkus on 2014-11-18 at 06:28:51

Tuesday night David Fetter and I are going to present all about the cool new aggregation features in 9.4 and 9.5, including WITHIN GROUP, FILTER, GROUPING SETS, and more.

Posted by Tomas Vondra on 2014-11-18 at 03:00:00

On Linux, a "compiler" is usually a synonym to gcc, but clang is gaining more and more adoption. Over the years, phoronix published several articles comparing of performance of various clang and gcc versions, suggesting that while clang improves over time, gcc still wins in most benchmarks - except maybe "compilation time" where clang is a clear winner. But none of the benchmarks is really a database-style application, so the question is how much difference can you get by switching a compiler (or a compiler version). So I did a bunch of tests, with gcc versions 4.1-4.9, clang 3.1-3.5, and just for fun with icc 2013 and 2015. And here are the results.

I did the two usual types of tests - pgbench, representing a transactional workload (lots of small queries), and a subset of the TPC-DS benchmark, representing analytical workloads (a few queries chewing through large amounts of data).

I'll present results from a machine with an i5-2500k CPU, 8GB RAM and an SSD drive, running Gentoo with kernel 3.12.20. I did rudimentary PostgreSQL tuning, mostly by tweaking postgresql.conf like this:

shared_buffers = 1GB
work_mem = 128MB
maintenance_work_mem = 256MB
checkpoint_segments = 64
effective_io_concurrency = 32

I do have results from another machine, but in general they confirm the results presented here. PostgreSQL was compiled like this:

./configure --prefix=...
make install

i.e. nothing special (no custom tweaks, etc.). The rest of the system is compiled with gcc 4.7.

pgbench

I ran pgbench with three dataset sizes - small (~150MB), medium (~25% of RAM) and large (~200% of RAM). For each scale I ran pgbench with 4 clients (the number of cores on the CPU) for 15 minutes, repeated 3x, and averaged the results. All this in both read-write and read-only mode.
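For reference, runs of that shape can be driven roughly like this (a sketch; the scale factor and database name are illustrative, not the exact commands used for these tests):

pgbench -i -s 100 bench                 # initialize a dataset (-s controls the size)
pgbench -c 4 -j 4 -T 900 bench          # read-write run: 4 clients, 15 minutes
pgbench -c 4 -j 4 -T 900 -S bench       # read-only run (-S = SELECT-only)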

The first observation is that once you start hitting the drives, the compiler makes absolutely no measurable difference. That makes the results from all the read-write tests (for all scales) uninteresting, as well as the read-only test on the large dataset - for all these tests the I/O i

[continue reading]

Posted by Josh Berkus on 2014-11-17 at 23:49:12


SFPUG has a new home page, here. This page will be used to post our meetups, as well as local events including the upcoming pgDay SF. It runs on Ghost and PostgreSQL 9.3.

Stay tuned!

JD wrote:

A couple of weeks ago I spoke at the Bellingham Linux User Group with a talk entitled An Evening with PostgreSQL. It was an enlightening talk for me because it was the first time, in a long time, that I had spoken to a non-PostgreSQL community. Most of the people in attendance were Linux users, of course, but there were also a few Mongo and MySQL users. I was asked questions such as, "Why would I use PostgreSQL over MySQL?". To be honest, I didn't even realize that was still a question, but it opens up a huge advocacy opportunity.

read more

I have seen many customers come up with the errors below and ask for the root cause. They wonder about the reasons behind them and say, "Ah, it's because of bad hardware or a kernel... I hate it, I just want to know how to avoid these."

Let's start with this:
ERROR: could not read block 4285 in file "base/xxxxx/xxxx": read only 0 of 8192 bytes

... have rarely been known to be caused by bugs in specific Linux kernel versions.  Such errors are more often caused by bad hardware, anti-virus software, improper backup/restore procedures, etc.

One very common cause of such corruption lately seems to be incorrect backup and restore. (For example, failure to exclude or delete all files from the pg_xlog directory can cause problems like this, as can using filesystem "snapshots" which aren't really atomic.) The history of the database, including any recoveries from backup or promotions of replicas to primary, can indicate whether this is a possible cause. Faulty hardware is another fairly common cause, including SANs. If fsync or full_page_writes were ever turned off for the cluster, that could also explain it.

It is good to establish the cause where possible so that future corruption can be avoided, but to recover, the cluster should normally be dumped with pg_dumpall and/or pg_dump and restored to a freshly created (via initdb) cluster on a machine which is not suspected of causing corruption. It may be possible to fix up, or drop and recreate, individual damaged objects, but when doing that it can be hard to be sure that the last of the corruption (or the cause of the initial corruption) has been eliminated.
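In outline, that recovery path might look like this (a sketch only; paths and ports are placeholders):

pg_dumpall -p 5432 > full_dump.sql            # dump everything from the suspect cluster
initdb -D /path/to/fresh/data                 # brand-new cluster, ideally on trusted hardware
pg_ctl -D /path/to/fresh/data -o "-p 5433" -w start
psql -p 5433 -d postgres -f full_dump.sql     # restore into the fresh cluster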

Here is a nice article to find why-is-my-database-corrupted from Robert Haas.

Errors like this:
ERROR: unexpected data beyond EOF in block xxxx of relation pg_tblspc/xxxx
HINT: This has been seen to occur with buggy kernels; consider updating your system.

... are most often caused by Linux kernel bugs. Seeing both types of errors together suggests that a hardware problem (like bad RAM) is the likely cause.

[continue reading]

Posted by Magnus Hagander in Redpill Linpro on 2014-11-16 at 13:56:00

It's that time of the year again - we've wrapped PGConf.EU 2014, and I've just closed the feedback system, so it's time to take a look at what's been said.

We're keeping fairly consistent numbers with previous years, which is something we are definitely happy with. We did have a slight drop in the "overall view" score: this year 8% ranked us as 3, a worse score than we saw last year, and a couple fewer people voted 5. There was also a slight shift from 5 to 4 on the programme. The numbers are still good of course, but since we had a tiny drop last year as well, we need to step our game back up for next year!

http://photos.smugmug.com/photos/i-hd7xTCK/0/O/i-hd7xTCK.png http://photos.smugmug.com/photos/i-XgSPD3S/0/O/i-XgSPD3S.png

This year we had a slightly bigger spread in how attendees identify themselves, with most categories chipping away a little at DBAs and Developers, but those are still definitely the dominating categories. We also have a lot of returning developers - it's cool to see so many people who have been to every one of our events so far, combined with a full 25% being first-time attendees!

http://photos.smugmug.com/photos/i-RgCv9Gs/0/O/i-RgCv9Gs.png http://photos.smugmug.com/photos/i-MVQcnbt/0/O/i-MVQcnbt.png


Continue reading "PGConf.EU 2014 - feedback is in"

PGDay 2015 San Francisco will be held March 10th, 2015 at the Hyatt, San Francisco Airport, in Burlingame, CA (just outside of San Francisco). This year PGDay will be hosted alongside the Free and Open Source Geospatial North America (FOSS4GNA) 2015 conference, which runs March 9th-12th, and EclipseCon NA 2015. Speaker submissions for FOSS4GNA 2015 and EclipseCon NA 2015 will end this Monday, November 17th, 2014.


Continue reading "FOSS4GNA 2015 and PGDay San Francisco March 2015"
It's a great tool for replication, used by many organizations for replicating between different versions of Postgres; it can do upgrades with minimum downtime (almost zero sometimes?), and it's a popular tool that I was weak at ... yeah, it's SLONY. It was always on my TODO list. So finally, I learned it and got some hands-on experience.

And I found the answer to my own question: "when was the last time you did something NEW for the first time".

I see a lot of posts on installing and configuring Slony; however, when I had to do a switchover and failover of Slony for a customer, I had a hard time with Google. OK, so it's better I have it written down somewhere, and why not here!

The customer wanted to upgrade their database from PostgreSQL 8.4 to 9.2. They have around 397 tables and wanted one set for each table, hence 397 sets. Just for convenience, I'm taking 10 tables/sets to explain.

Create the tables using the below script in the source (8.4) and target (9.3) databases:
source=# select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);';
Insert values using the below script in the source database:
source=# select 'insert into slony_tab_'||a||' values (generate_series(1,100));' from generate_series(1,10) a;
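Note that these SELECTs only generate the SQL text; one simple way to actually run it is to pipe the output back into psql (a sketch, using the source port from the slonik scripts below; repeat against the target on port 5432):

psql -At -p 5434 -d source -c "select 'create table slony_tab_'||generate_series(1,10)||'(t int primary key);'" | psql -p 5434 -d source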

Configure Slony using the below scripts:

1. Init cluster script
#################################################################################################

cluster name = shadow;
node 1 admin conninfo='host=127.0.0.1 dbname=source user=postgres port=5434';
node 2 admin conninfo='host=127.0.0.1 dbname=target user=postgres port=5432';
init cluster (id = 1 , comment = 'Primary Node For the Slave postgres');
#Setting Store Nodes ...
store node (id = 2, event node = 1 , comment = 'Slave Node For The Primary postgres');
#Setting Store Paths ...
echo 'Stored all nodes in the slony catalogs';
store path(server = 1 , client = 2, conninfo = 'host=127.0.0.1 dbname=source user=postgres port=5434');
store path(server = 2, client = 1 , conninfo = 'host=127.0.0.1 dbname=target user=postgres port=5432');
echo 'Stored all Store Paths for Failover and Switchover into slony catalog

[continue reading]

Recently Josh Berkus published an article Why you need to avoid Linux Kernel 3.2

Group photo, PGDay.IT 2014

November 7th 2014 was the eighth Italian PostgreSQL Day (PGDay.IT), the national event dedicated to the promotion of the world's most advanced open source database. The Italian edition is one of the most enduring in the whole Postgres community (the first one took place in July 2007) and the result of the activity of a very established non-profit organisation such as ITPUG (the Italian PostgreSQL Users Group).

The Italian PGDay took place in Prato, the historical location for this event, on the premises of the Prato campus (PIN) of the University of Florence. And for the first time, attendance went over 100 people, with a final count of 124 registered people (including speakers and staff). I was also extremely happy to notice a significant presence of women at PGDay – I believe around 10%.

It was a pleasure to have international speakers like Magnus and Simon, in Prato for the nth time, as well as a new entry like Mladen Marinovic from Croatia. There were 14 talks in total, spread across two parallel sessions, and an interactive training session (ITPUG labs) in the second room.

I was delighted to deliver the opening keynote, a summary of my experience and relationship with PostgreSQL on both a community and a professional level. It was focused on us, knowledge workers, who can decide to invest in open source for our continuous improvement. And what better than studying (as well as teaching in schools) software like Linux and PostgreSQL? I then quickly outlined the most common objections towards the adoption of PostgreSQL (including the funniest or most depressing ones) that I have encountered so far in my career (e.g.: "I just want to know: can Postgres manage millions of records?"). Then I unrolled the reasons why I believe choosing to adopt PostgreSQL now is the wisest and most strategic decision that can be made for a data management solution.

I also want to thank some important Italian companies that decided to come out and publicly say why Postgres is the right choice for their daily management of data, thei

[continue reading]

Posted by Joshua Tolley in EndPoint on 2014-11-12 at 23:33:00

From Flickr user Jitze Couperus

When debugging a problem, it's always frustrating to get sidetracked hunting down the relevant logs. PostgreSQL users can select any of several different ways to handle database logs, or even choose a combination. But especially for new users, or those getting used to an unfamiliar system, just finding the logs can be difficult. To ease that pain, here's a key to help dig up the correct logs.

Where are log entries sent?

First, connect to PostgreSQL with psql, pgadmin, or some other client that lets you run SQL queries, and run this:
foo=# show log_destination ;
 log_destination 
-----------------
 stderr
(1 row)
The log_destination setting tells PostgreSQL where log entries should go. In most cases it will be one of four values, though it can also be a comma-separated list of any of those four values. We'll discuss each in turn.
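For example, a postgresql.conf entry that combines two destinations could look like this (a sketch; syslog is one of the other valid values and is covered next):

log_destination = 'stderr,syslog'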

SYSLOG

Syslog is a complex beast, and if your logs are going here, you'll want more than this blog post to help you. Different systems have different syslog daemons, those daemons have different capabilities and require different configurations, and we simply can't cover them all here. Your syslog may be configured to send PostgreSQL logs anywhere on the system, or even to an external server. For your purposes, though, you'll need to know what "ident" and "facility" you're using. These values tag each syslog message coming from PostgreSQL, and allow the syslog daemon to sort out where the message should go. You can find them like this:
foo=# show syslog_facility ;
 syslog_facility 
-----------------
 local0
(1 row)

foo=# show syslog_ident ;
 syslog_ident 
--------------
 postgres
(1 row)
Syslog is often useful, in that it allows administrators to collect logs from many applications into one place, to relieve the database server of logging I/O overhead (which may or may not actually help anything), or any number of other interesting rearrangements of log data.
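Given the facility and ident shown above, a classic syslog.conf (or rsyslog) rule sending those messages to their own file might look like this (a sketch; the file path is an assumption):

local0.*    /var/log/postgresql/postgres.log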

EVENTLOG

For PostgreSQL systems running on Windows, you can send log entries to the Windo

[continue reading]

Posted by Joshua Drake in CommandPrompt on 2014-11-12 at 16:00:08
I am not a big fan of AWS. It is a closed platform. It is designed to be the Apple of the Cloud to the Eve of Postgres users. That said, customers drive business and some of our customers use AWS, even if begrudgingly. Because of these factors we are getting very good at getting PostgreSQL to perform on AWS/EBS, albeit with some disclosures:
  1. That high IO latency is an acceptable business requirement.
  2. That you are willing to spend a lot of money to get performance you can get for less money using bare metal: rented or not. Note: This is a cloud issue not an AWS issue.

Using the following base configuration (see adjustments for each configuration after the graphic):

port = 5432                             
max_connections = 500                   
ssl = true                              
shared_buffers = 4GB                    
temp_buffers = 8MB                      
work_mem = 47MB                         
maintenance_work_mem = 512MB            
wal_level = hot_standby                 
synchronous_commit = on         
commit_delay = 0                        
commit_siblings = 5                     
checkpoint_segments = 30               
checkpoint_timeout = 10min              
checkpoint_completion_target = 0.9      
random_page_cost = 1.0                  
effective_cache_size = 26GB

Each test was run using pgbench against 9.1 except for configuration 9 which was 9.3:

pgbench -F 100 -s 100 postgres -c 500 -j10 -t1000 -p5433

Here are some of our latest findings:

The AWS configuration is:

16 Cores
30G of memory (free -h reports 29G)
(2) PIOPS volumes at 2000 IOPS a piece.
The PIOPS volumes are not in a RAID and are mounted separately.
The PIOPS volumes are formatted with xfs and default options
The PIOPS volumes were warmed.
  1. Configuration 1:
    $PGDATA and pg_xlog on the same partition
    synchronous_commit = on
  2. Configuration 2:
    $PGDATA and pg_xlog on the same partition
    synchronous_commit = off
  3. Configuration 3:
    $PGDATA and pg_xlog on the same partition
    synchronous_commit = off
    commit_delay = 100000
    commit_sib

[continue reading]

A new index type called BRIN, or Block Range INdex, is showing up in PostgreSQL 9.5, introduced by this commit:

commit: 7516f5259411c02ae89e49084452dc342aadb2ae
author: Alvaro Herrera <alvherre@alvh.no-ip.org>
date: Fri, 7 Nov 2014 16:38:14 -0300
BRIN: Block Range Indexes

BRIN is a new index access method intended to accelerate scans of very
large tables, without the maintenance overhead of btrees or other
traditional indexes.  They work by maintaining "summary" data about
block ranges.  Bitmap index scans work by reading each summary tuple and
comparing them with the query quals; all pages in the range are returned
in a lossy TID bitmap if the quals are consistent with the values in the
summary tuple, otherwise not.  Normal index scans are not supported
because these indexes do not store TIDs.

By nature, using a BRIN index for a query scan is a kind of mix between a sequential scan and an index scan, because what such an index stores is a range of values for a given fixed number of data blocks. So this type of index finds its advantage in very large relations that cannot sustain the size of, for example, a btree where all values are indexed, and it works even better with data that has a high ordering across the relation's blocks. For example, let's take the case of a simple table where the data is completely ordered across data pages, like this one with 100 million tuples:

=# CREATE TABLE brin_example AS SELECT generate_series(1,100000000) AS id;
SELECT 100000000
=# CREATE INDEX btree_index ON brin_example(id);
CREATE INDEX
Time: 239033.974 ms
=# CREATE INDEX brin_index ON brin_example USING brin(id);
CREATE INDEX
Time: 42538.188 ms
=# \d brin_example
Table "public.brin_example"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "brin_index" brin (id)
    "btree_index" btree (id)

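A quick way to see the other side of this trade-off is to compare the on-disk size of the two indexes (a follow-up sketch; no numbers are shown here, but the BRIN index should come out dramatically smaller):

=# SELECT pg_size_pretty(pg_relation_size('btree_index')) AS btree_size,
          pg_size_pretty(pg_relation_size('brin_index')) AS brin_size;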
Note that the creation of the BRIN index was largely faster: it has fewer index entries to write, so it generates less traffic. By default, 128 blocks are used to calculate a range of value

[continue reading]

Posted by Brian Dunavant in OmniTI on 2014-11-11 at 22:37:47

I wrote an article on my company blog on using Postgres’ writable CTE feature to improve performance and write cleaner code. The article is available at:

http://omniti.com/seeds/writable-ctes-improve-performance

I’ve been told by a number of people I should expand it further to include updates and the implications there and then consider doing a talk on it at a Postgres conference.   Hrm….

Posted by Bruce Momjian in EnterpriseDB on 2014-11-11 at 22:30:01

I just returned from two weeks in Russia, and I am happy to report that Postgres is experiencing strong growth there. I have regularly complained that Russian Postgres adoption was lagging, but the sanctions have tipped the scales and moved Russia into Postgres-hyper-adoption mode. New activities include:

As part of my visit I spoke at EnterpriseDB partner LANIT. The hour-long presentation was recorded and covers:

  • The Postgres community development process
  • Comparison of Postgres to proprietary databases
  • The future direction of Postgres

Seriously, you should. You should submit a talk proposal to PGConf US 2015 – the worst thing that will happen is the talk committee will say “no” and offer a bunch of reasons to help you get your talk approved next year! Believe it or not, speaking at a PostgreSQL conference is a great way to help the community at large, and I hope this personal story I am going to share will shed some light as to why.

read more

Posted by Feng Tian on 2014-11-11 at 05:31:00
Today is the biggest internet shopping day. If you read Chinese, you may find that Alibaba posts job openings for PostgreSQL DBAs from time to time. It would be interesting to find out how many of the transactional and/or analytic workloads inside Alibaba are handled by PostgreSQL.

Back to TPCH.  Q16 is particularly tough for us to optimize.  Again, it is better to look at a simpler example,

ftian=# create table t as select x, x % 10 as i, x % 100 as j, 'random string' || (x % 100) as s from generate_series(1, 1000000) x;
SELECT 1000000
Time: 784.034 ms

ftian=# select count(distinct i) from t group by s, j;
Time: 7991.994 ms

Grouping one million rows in 8 seconds is kind of slow. So let's try:

ftian=# select count(i) from t group by s, j;
Time: 99.029 ms

So it must be count(distinct). Well, we wasted quite some time chasing the distinct. Profile before optimizing - we should've known better. The real reason is that count(distinct) triggers a sort agg instead of a hash agg. Distinct needs some resources in the aggregate function, and it is very hard to cap that consumption in a hash agg, so a sort agg is used, which actually is a fine plan.

ftian=# explain select count(distinct i) from t group by s, j;
                               QUERY PLAN                               
------------------------------------------------------------------------
 GroupAggregate  (cost=117010.84..127110.84 rows=10000 width=23)
   ->  Sort  (cost=117010.84..119510.84 rows=1000000 width=23)
         Sort Key: s, j
         ->  Seq Scan on t  (cost=0.00..17353.00 rows=1000000 width=23)
(4 rows)

Is sort that slow?

ftian=# select count(distinct i) from t group by j, s;
Time: 1418.938 ms

5x faster. What is going on? Note that we switched the grouping order from group by s, j to group by j, s. These two queries are equivalent, except for the order of the results -- well, if you really care about ordering, you should have an order by clause.

The true cost lies in string comparison with collation.   The database uses utf-8 encoding a

[continue reading]