Posted by Craig Kerstiens on 2014-10-01 at 07:00:00

Most web applications will add/remove columns over time. This is extremely common early on and even mature applications will continue modifying their schemas with new columns. An all too common pitfall when adding new columns is setting a not null constraint in Postgres.

Not null constraints

What happens when you add a column with a not null constraint (and a default to satisfy it) is that Postgres re-writes the entire table. Under the covers Postgres is really just an append-only log, so when you update or delete data it's really just writing new data. This means that when you add a column with a value for every existing row, it has to write a new version of each record. If you do this while requiring the new column to not be null, you're re-writing your entire table.

Where this becomes problematic for larger applications is that the rewrite holds a lock, preventing you from writing new data during that time.

A better way

Of course you may want to disallow nulls and you may want to set a default value; the problem simply comes when you try to do it all at once. The safest approach, at least in terms of uptime for your table (and thus your data and your application), is to break these steps apart.

  1. Start by adding the column, allowing nulls but setting a default value
  2. Run a background job that retroactively updates the new column to your default value
  3. Add your not null constraint (a SQL sketch of all three steps follows below).
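For illustration, here is roughly what those three steps might look like in SQL, using a hypothetical users table and a new status column (names are made up, and the default is set in a separate statement so the ADD COLUMN itself doesn't force a rewrite on the Postgres versions current at the time):

-- 1. Add the column as nullable, then set a default for newly inserted rows
ALTER TABLE users ADD COLUMN status text;
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';

-- 2. Backfill existing rows from a background job (batch this on large tables)
UPDATE users SET status = 'active' WHERE status IS NULL;

-- 3. Once every row has a value, add the constraint
ALTER TABLE users ALTER COLUMN status SET NOT NULL;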

Yes, it's a few extra steps, but having walked through this with a number of developers and their apps, I can say it makes for a much smoother process when making changes to your apps.

Posted by Payal Singh in OmniTI on 2014-09-30 at 20:12:00
A while ago I got a task to change the owner of a group of functions. While the number of functions wasn't too high, it was still enough that I began looking at ways to change the owner in a batch, instead of having to change it manually for each function.
In the case of other database objects, changing owners is fairly simple. It can be accomplished in two steps:

1. Get a list of all tables/sequences/views:

payal@testvagrant:~$ psql -qAt -c "SELECT 'ALTER TABLE '||schemaname||'.'||tablename||' OWNER TO new_owner;' FROM pg_tables WHERE schemaname = 'payal'" > test.txt

This will give us the following file:

payal@testvagrant:~$ cat test.txt
ALTER TABLE payal.new_audit_users OWNER TO new_owner;
ALTER TABLE payal.v_count_states OWNER TO new_owner;
ALTER TABLE payal.test OWNER TO new_owner;
ALTER TABLE payal.old_audit_users OWNER TO new_owner;
ALTER TABLE payal.old_audit OWNER TO new_owner;
ALTER TABLE payal.adwords_dump OWNER TO new_owner;
ALTER TABLE payal.affiliate OWNER TO new_owner;
ALTER TABLE payal.new_affiliate OWNER TO new_owner;
ALTER TABLE payal.partest OWNER TO new_owner;
ALTER TABLE payal.audit_test OWNER TO new_owner;
ALTER TABLE payal.batatawada OWNER TO new_owner;
ALTER TABLE payal.dup_key_err OWNER TO new_owner;
ALTER TABLE payal.new_audit OWNER TO new_owner;


2. Now all that is needed is to run this file with psql:

payal@testvagrant:~$ psql < test.txt
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE



It's that simple! An alternate solution can be found here.

However, things get a little tricky with functions due to argument specifications. Basically, one needs to specify a function's arguments along with the function name to alter it. For example:

ALTER FUNCTION hstore.tconvert(text, text) OWNER TO hstore;

Using the method described above for changing table owners, you cannot simply get the function arguments from pg_proc. Instead, Postgres has a function pg_get_function_identity_arguments(func_oid) t

[continue reading]
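The excerpt cuts off above, but as a rough, hedged sketch (not the author's exact query), the same batch trick could be driven by pg_get_function_identity_arguments() to generate the ALTER FUNCTION statements:

SELECT 'ALTER FUNCTION ' || n.nspname || '.' || p.proname || '('
       || pg_get_function_identity_arguments(p.oid) || ') OWNER TO new_owner;'
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'payal';

The output can then be fed back through psql exactly like the table example earlier.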

Posted by Tomas Vondra on 2014-09-30 at 15:00:00

This is the post I promised last week, explaining a few common issues with memory contexts. The issues mostly lead to excessive overhead, i.e. excessive usage of resources - usually memory. And that's exactly what this post is about, so whenever I say "overhead" you can read "excessive memory usage." I will also try to give advice on how to avoid those issues or minimize their impact.

As I briefly mentioned when explaining allocation set internals, there are three main sources of overhead:

  • chunk headers - for large chunks, this gets negligible
  • unused space (because of 2^N chunks) - expected ~25% for randomly sized chunks, but can get much worse
  • reuse not working efficiently - we'll see some examples how this can happen

If you haven't read that post, it's probably the right time to do that. Also, if you want to learn more about memory management and allocator implementations, there's a great post at IBM developerWorks explaining it quite well and also listing many interesting additional resources (e.g. various malloc implementations).

So let's see some usual (but somewhat unexpected) examples of palloc overhead.

Allocating many tiny pieces of memory

Say we need to store a lot of small elements, a few bytes each - e.g. 64-bit integers, words in an ispell dictionary or something like that. You could do a separate palloc call for each element (and keep the pointer), but in that case you'll pay the 'header' price for each element. For example by doing this

int     i;
int64  *items[1024];

for (i = 0; i < 1024; i++)
    items[i] = (int64 *) palloc(sizeof(int64));

you might think you allocated ~8kB of memory (1024 x 8B), but in fact each value gets prepended with a 16B header. So you end up with about 24kB (not counting the items array, which needs an additional 8kB), which is ~3x the requested amount. If we asked for smaller values (e.g. 4B integers), the overhead would be even larger, because 8B is the smallest chunk allocated by palloc (as mentioned in the previous post).

Let's assume all the elements have about the same life span / scop

[continue reading]

Posted by gabrielle roth on 2014-09-30 at 01:58:19
Yep, I’m headed to Madrid! I’ll be reprising my Autovacuum talk from SCALE, and am really looking forward to meeting some new folks. I’ll be helping out at the conference in some capacity, so come say hello. For reference, the conference schedule is here: http://www.postgresql.eu/events/schedule/pgconfeu2014/ Other talks I plan to attend: Wednesday: Performance Archaeology sounds […]
Posted by Josh Berkus in pgExperts on 2014-09-29 at 23:49:00
In fact, you really need to avoid every kernel between 3.0 and 3.8.  While RHEL has been sticking to the 2.6 kernels (which have their own issues, but not as bad as this), Ubuntu has released various 3.X kernels for 12.04.  Why is this an issue?  Well, let me give you two pictures.

Here's private benchmark workload running against PostgreSQL 9.3 on Ubuntu 12.04 with kernel 3.2.0.  This is the IO utilization graph for the replica database, running a read-only workload:



Sorry for cropping; had to eliminate some proprietary information.  The X-axis is time. This graph shows MB/s data transfers -- in this case, reads -- for the main database disk array.  As you can see, it goes from 150MB/s to over 300MB/s.  If this wasn't an SSD array, this machine would have fallen over. 

Then we upgraded it to kernel 3.13, and ran the same exact workload as the previous test.  Here's the new graph:



Bit of a difference, eh?  Now we're between 40 and 60MB/s for the exact same workload: an 80% reduction in IO.   We can thank the smart folks in the Linux FS/MM group for hammering down a whole slew of performance issues.

So, check your Postgres servers and make sure you're not running a bad kernel!
EDB recently blogged new results from benchmarking PostgreSQL 9.4 and MongoDB:

The newest round of performance comparisons of PostgreSQL and MongoDB produced a near repeat of the results from the first tests that proved PostgreSQL can outperform MongoDB. The advances Postgres has made with JSON and JSONB have transformed Postgres’ ability to support a document database.

[Chart: 50-million-record benchmark results]

That blog post motivated me to write this one, to point EDB to another set of benchmarks that include more operators, provided by jsquery.

After PGCon 2014, where we presented the first version of jsquery, we made several enhancements worth mentioning (see my slides from Japan (PDF)).

1) We added a simple built-in jsquery optimizer, which recognizes the non-selective part of a query and pushes it to recheck, so the recheck works like a FILTER.
2) If you don't like how the optimizer works, you may use HINTING (well, jsquery is an extension after all).

We understand that this is just a temporary solution for impatient people who want to use jsonb in 9.4, which, honestly, has rather primitive query support. We simply didn't have time to do everything we wanted; we even missed several useful functions we had written for nested hstore. Hopefully we'll have contrib/jsonbx soon. Jsquery was our experiment to play with indexes, and the set of operations was chosen especially from that point of view. We are working on a better approach, where jsquery will be implemented at the SQL level (see this post, in Russian), and eventually, after someone implements statistics for jsonb, the optimizer will do its work!

More details are below.

1) Optimizer. Jsquery is opaque to the optimizer, so the original version had a serious problem (we demonstrated this at PGCon 2014):
select count(*) from jr where jr @@ ' similar_product_ids && ["B000089778"] 
AND product_sales_rank( $ > 10000 AND $  < 20000)';


runs in 129.309 ms, while
 
select count(*) from jr where jr @@  ' similar_product_ids && ["B000089778"]';


took only 0.394 ms!

product_sales_rank($ > 10000 AND $ < 20000) is non-selective, so it's better not to use the index for it

[continue reading]

The following simple commit has improved the verbose logs of pg_dump (the ones that can be invoked with the option -v and that are useful for keeping a log trace when pg_dump is kicked off by cron jobs), by making the schema names of the dumped relations show up as well:

commit: 2bde29739d1e28f58e901b7e53057b8ddc0ec286
author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
date: Tue, 26 Aug 2014 11:50:48 +0300
Show schema names in pg_dump verbose output.

Fabrízio de Royes Mello, reviewed by Michael Paquier

Let's take the case of a simple schema, with the same table name used on two different schemas:

=# CREATE SCHEMA foo1;
CREATE SCHEMA
=# CREATE SCHEMA foo2;
CREATE SCHEMA
=# CREATE TABLE foo1.dumped_table (a int);
CREATE TABLE
=# CREATE TABLE foo2.dumped_table (a int);
CREATE TABLE

With the pg_dump bundled with 9.4 and older versions, each relation cannot really be identified (think about the case of having multiple versions of an application schema stored in the same database, but with different schema names):

$ pg_dump -v 2>&1 >/dev/null | grep dumped_table | grep TABLE
pg_dump: creating TABLE dumped_table
pg_dump: creating TABLE dumped_table
pg_dump: setting owner and privileges for TABLE dumped_table
pg_dump: setting owner and privileges for TABLE dumped_table
pg_dump: setting owner and privileges for TABLE DATA dumped_table
pg_dump: setting owner and privileges for TABLE DATA dumped_table

Now with 9.5, the following logs are shown instead:

$ pg_dump -v 2>&1 >/dev/null | grep dumped_table | grep TABLE
pg_dump: creating TABLE "foo1"."dumped_table"
pg_dump: creating TABLE "foo2"."dumped_table"
pg_dump: setting owner and privileges for TABLE "foo1"."dumped_table"
pg_dump: setting owner and privileges for TABLE "foo2"."dumped_table"
pg_dump: setting owner and privileges for TABLE DATA "foo1"."dumped_table"
pg_dump: setting owner and privileges for TABLE DATA "foo2"."dumped_table"

Note as well the quotes put around the relation and schema names, making this output more consistent with the other utilities in PostgreS

[continue reading]

Posted by Andrew Dunstan in pgExperts on 2014-09-26 at 19:38:00
I see that a new release of MySQL has been made, and they are touting the fact that they are allowing the omission of unaggregated items in a SELECT list from a GROUP BY clause, if they are functionally dependent on the items in the GROUP BY clause. This would happen, for example, where the items in the GROUP BY list form a primary key. It's a nice feature.

It's also a feature that PostgreSQL has had for three years.
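For a concrete (hypothetical) example: with orders.id as a primary key, the unaggregated customer_name column is functionally dependent on it and may be left out of the GROUP BY in PostgreSQL 9.1 and later:

SELECT o.id, o.customer_name, sum(li.amount) AS order_total
FROM orders o
JOIN line_items li ON li.order_id = o.id
GROUP BY o.id;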
Posted by Andrew Dunstan in pgExperts on 2014-09-26 at 12:54:00
Say you have a file that consists of one JSON document per line. How can you import it into a table easily? This is a problem I was called on to help a colleague with yesterday. Using COPY is the obvious answer, but this turns out not to be quite so simple to do.

In text mode, COPY will be simply defeated by the presence of a backslash in the JSON. So, for example, any field that contains an embedded double quote mark, or an embedded newline, or anything else that needs escaping according to the JSON spec, will cause failure. And in text mode you have very little control over how it works - you can't, for example, specify a different ESCAPE character. So text mode simply won't work.

CSV mode is more flexible, but poses different problems. Here, instead of backslash causing a problem, QUOTE characters can cause a problem. First, JSON itself uses the default QUOTE character (double quote) to quote all string values. But if we switch to an alternative like single quote, then the presence of any single quote in the JSON leads us into difficulties. Second, JSON also uses the default DELIMITER (comma) extensively. So, clearly we need to use something else for the QUOTE and DELIMITER options. (By default, in CSV mode, the ESCAPE character is the same as the QUOTE character, so we don't need to worry about it separately.)

What we in fact want is  to specify QUOTE and DELIMITER characters that can't appear at all in the JSON. Then the whole line will be seen as a single unquoted datum, which is exactly what we want. There is a small set of single-byte characters that happen to be illegal in JSON, so we can be sure that choosing them for these options should do the right thing with any legal JSON. These are the control characters. So the solution we came up with looks like this:
copy the_table(jsonfield) 
from '/path/to/jsondata'
csv quote e'\x01' delimiter e'\x02';
Of course, if the JSON has embedded newlines as punctuation, this won't work. So it's important that you configure whatever is producing the JSON not to i

[continue reading]


pg_xlog with a dummy file
(image by Andrew Malone)

Running out of disk space in the pg_xlog directory is a fairly common Postgres problem. This important directory holds the WAL (Write Ahead Log) files. (WAL files contain a record of all changes made to the database - see the link for more details). Because of the near write‑only nature of this directory, it is often put on a separate disk. Fixing the out of space error is fairly easy: I will discuss a few remedies below.

When the pg_xlog directory fills up and new files cannot be written to it, Postgres will stop running, try to automatically restart, fail to do so, and give up. The pg_xlog directory is so important that Postgres cannot function until there is enough space cleared out to start writing files again. When this problem occurs, the Postgres logs will give you a pretty clear indication of the problem. They will look similar to this:


PANIC:  could not write to file "pg_xlog/xlogtemp.559": No space left on device
STATEMENT:  insert into abc(a) select 123 from generate_series(1,12345)
LOG:  server process (PID 559) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: insert into abc(a) select 123 from generate_series(1,12345)
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally an
d possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2014-09-16 10:36:47 EDT
LOG:  database system was not properly shut down; automatic recovery in progress
FATAL:  the database system is in recovery mode
LOG:  redo starts at 0/162FE44
LOG:  redo done at 0/1FFFF78
LOG:  last completed transaction was at log time 2014-09-16 10:38:50.010177-04
PANIC

[continue reading]

PostGIS raster has so many functions, and probably at least 10 ways of doing any given thing, some much slower than others. Suppose you have a raster, or a raster area of interest (say an elevation raster, for example), and you want to know the distinct pixel values in the area. The temptation is to reach for the ST_Value function in raster, but there is a much more efficient function to use, and that is the ST_ValueCount function.

ST_ValueCount function is one of many statistical raster functions available with PostGIS 2.0+. It is a set returning function that returns 2 values for each row: a pixel value (val), and a count of pixels in the raster that have that value. It also has variants that allow you to filter for certain pixel values.
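As a rough sketch (not from the original tip), assuming a hypothetical elev raster table with a rast column, the distinct pixel values and their counts for band 1 could be pulled like this:

SELECT (pvc).value, sum((pvc).count) AS pixel_count
FROM (SELECT ST_ValueCount(rast, 1) AS pvc FROM elev) AS t
GROUP BY (pvc).value
ORDER BY (pvc).value;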

This tip was prompted by the question on stackexchange How can I extract all distinct values from a PostGIS Raster?

Continue Reading by clicking title hyperlink ..
Posted by Josh Berkus in pgExperts on 2014-09-25 at 18:01:00
For the past couple of years, SFPUG has tried (stress on "tried") to share our meetups and talks with the world via streaming and archival video.  At first, this was wonderful because it allowed folks without a good local user group to tune in and participate, among other things helping launch our Beijing PUG.  This is now coming to an end because it is simply too hard to do, and nobody is stepping up to make it happen.

First, we have the issue that there simply aren't good platforms for streaming for a low-budget nonprofit anymore.  JustinTV is off the air, Ustream has scads of obnoxious ads which interrupt the talk (or costs $100/month for "pro"), and Google Hangouts on Air simply don't work.  For proof of the latter, try to watch to the end of this presentation.  Alternatives requires setting up your own streaming website gateway and video archives, and I simply don't have time.

And there we're getting to the big reason why SFPUG video will stop: nobody has volunteered to do it.  I have my hands full scheduling and running the meetings.  I've called for video volunteers from the meetup group, but nobody has stepped forwards.  So, no more video.

That out of the way, we will be reposting a bunch of archival videos from  JustinTV onto YouTube.  Announcements when they're all up.



I inserted rows from a few child tables of a large partitioned table into a new, single table, preserving the column ordering. I found that although the child tables as a whole were larger than the single non-partitioned table holding the same data, the difference was only in the KB range, and hence not very significant.

First, I inserted rows from 3 child tables (each ~700-800MB) into my new non-partitioned table:

$ insert into payal.hits3 select * from tracking.hits_p2014_08_25;
INSERT 0 11992623
$ insert into payal.hits3 select * from tracking.hits_p2014_08_26;
INSERT 0 13127131
$ insert into payal.hits3 select * from tracking.hits_p2014_08_27;
INSERT 0 13095656

Then, I did a vacuum full for each of the child tables, and my new non-partitioned table:

$ vacuum full verbose tracking.hits_p2014_08_26;
INFO:  vacuuming "tracking.hits_p2014_08_26"
INFO:  "hits_p2014_08_26": found 0 removable, 13127131 nonremovable row versions in 111578 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 3.61s/8.79u sec elapsed 39.03 sec.
VACUUM
$ vacuum full verbose tracking.hits_p2014_08_27;
INFO:  vacuuming "tracking.hits_p2014_08_27"
INFO:  "hits_p2014_08_27": found 0 removable, 13095656 nonremovable row versions in 111268 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 15.85s/8.12u sec elapsed 39.19 sec.
VACUUM
$ vacuum full verbose tracking.hits_p2014_08_25;
INFO:  vacuuming "tracking.hits_p2014_08_25"
INFO:  "hits_p2014_08_25": found 0 removable, 11992623 nonremovable row versions in 99752 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 2.15s/7.57u sec elapsed 49.11 sec.
VACUUM
$ vacuum full verbose payal.hits3;
INFO:  vacuuming "payal.hits3"
INFO:  "hits3": found 0 removable, 38215410 nonremovable row versions in 317945 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU 11.78s/27.62u sec elapsed 137.90 sec.
VACUUM

Let's see what pg_size_pretty returns:

$ select pg_size_

[continue reading]
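The excerpt ends before the actual command, but a hypothetical sketch of the kind of size comparison it leads into would be:

SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('hits3', 'hits_p2014_08_25',
                  'hits_p2014_08_26', 'hits_p2014_08_27');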

Posted by US PostgreSQL Association on 2014-09-24 at 19:03:44

In an effort to utilize infrastructure that is far more capable than what we as volunteers can provide, PgUS has moved to Google Apps. As a 501c3, we are offered a host of benefits that will help us grow the organization as well as increase collaboration and communication among the board and corporation members. We expect to be holding technical and advocacy hangouts often, as well as configuring a PgUS YouTube channel to provide video resources for those wanting to know and learn more about PostgreSQL.

Look for more news in the future. It is exciting times!

As United States PostgreSQL continues its support for PostgreSQL User Groups, it is my pleasure to announce that the Dallas/Fort Worth PostgreSQL User Group has decided to become part of the PgUS family. The resolution passed with 100% consent and it is great to see them on board. You may view the resolution here:

https://postgresql.us/node/144

read more

Posted by John Graber in EnterpriseDB on 2014-09-24 at 14:58:00
Last week, I had the opportunity to attend my first Postgres conference, that being +Postgres Open 2014.  Two days packed with a ridiculous amount of useful information and fantastic Postgres community interaction.  And now that the conference & sponsor t-shirts are washed, many new social media connections have been made, and the backlog of work email has been addressed —

Many Thanks

First off, the program committee deserves a veritable heap of congratulations and gratitude piled upon them from the attendees.  From the quality of the speakers to the well planned (and orchestrated) schedule, and even to the menu — everything was top notch. And therefore, I am calling you out by name, +Kris Pennella, +Gabrielle R, +Cindy Wise, +Stephen Frost, +Jonathan Katz, & +Selena Deckelmann.  Thanks for your time and effort!

The Talks

As with any (great) conference, the difficult part is the inability to attend multiple presentations at the same time, and thus being forced to choose.  The schedule this year offered no shortage of these situations.  But when your choices include community figures of the likes of +Bruce Momjian presenting on "Explaining the Postgres Query Optimizer" and +Simon Riggs with "How VACUUM works, and what to do when it doesn't", you simply can't go wrong.  Nonetheless, I wish I hadn't had to miss out on the talks given by +denish patel , +Álvaro Hernández Tortosa, +Gurjeet Singh, and  +Vibhor Kumar, to name but a few.

Fortunately, the talks were recorded and will be uploaded to the Postgres Open YouTube channel in about a month or so.  In the meantime, the presenters' slides are available on the Postgres Open wiki.


Not Just for DBAs

I particularly want to call +Dimitri Fontaine's talk, "PostgreSQL for developers", to the attention of the developer crowd as an excellent example of why these conferences are not just for DBAs.  As a consultant, on virtually every job I see examples of poorly written SQL as the root cause of a poorly performing database (even after proper tuning of config p

[continue reading]

I’ve always worked on PgJDBC, the JDBC Type 4 driver for PostgreSQL, with just a terminal, ant and vim. I recently had occasion to do some PgJDBC debugging work on Windows specifics so I set up Eclipse to avoid having to work on the Windows command prompt.

As the process isn’t completely obvious, here’s how to set up Eclipse Luna to work with the PgJDBC sources.


If you don’t have it already, download JDK 1.8 and Eclipse Luna.

Now download JDK 1.6 from Oracle (or install the appropriate OpenJDK). You’ll need an older JDK because the JDK doesn’t offer a way to mask out new classes and interfaces from the standard library when compiling for backward compatibility – so you may unwittingly use Java 7 or Java 8-only classes unless you target the Java 6 VM.

(If you want to work with the JDBC3 driver you need Java 5 instead, but that’s getting seriously obsolete now).

Now it’s time to import the sources and configure Eclipse so it knows how to work with them:

  1. Register JDK 1.6 with Eclipse:
    • Window -> Preferences, Java, Installed JREs, Add…
    • Set JRE Name to “JDK 1.6”
    • Set JRE Home to the install directory of the JDK – not the contained JRE. It’ll be a directory named something like jdk1.6.0_45 and it’ll have a jre subdirectory. Select the jdk1.6.0_45 directory not the jre subdir.
    • Finish…
    • and OK the preferences dialog.

  2. Import from git and create a project:
    • File -> Import, Git, Projects from Git, then follow the prompts for the repo
    • After the clone completes, Eclipse will prompt you to “Select a wizard for importing projects from git”. Choose Import as a General Project
    • Leave the project name as pgjdbc and hit Finish
  3. Configure ant
    • Get properties on the “pgjdbc” project
    • Go to the Builders tab
    • Press New…
    • In the resulting dialog’s Main tab set the Buildfile to the build.xml in the root of the PgJDBC checkout and set the base directory to the checkout root, e.g. buildfile = ${workspace_loc:/pgjdbc/build.xml} and base directory = ${workspace_loc:/pgjdbc}
    • In the Classpath tab, Add… the maven-ant-tasks jar, lib/maven-ant-t

[continue reading]

Posted by Leo Hsu and Regina Obe on 2014-09-23 at 18:55:00

At FOSS4G we gave two presentations. The videos from the other presentations are in the FOSS4G 2014 album. I have to commend the organizers for gathering such a rich collection of videos. Most of the videos turned out very well and are also downloadable as MP4 in various resolutions. It really was a treat being able to watch all I missed. I think there are still some videos that will be uploaded soon. As mentioned, there were lots of PostGIS/PostgreSQL talks (or at least people using PostGIS/PostgreSQL in GIS).


Continue reading "FOSS4G and PGOpen 2014 presentations"
Posted by Josh Berkus in pgExperts on 2014-09-23 at 17:51:00
One thing which we do a lot for clients is moving databases from one server to another via pg_dump and pg_restore.  Since this process often occurs during a downtime, it's critical to do the pg_dump and pg_restore as quickly as possible.  Here's a few tips:
  • Use the -j multiprocess option for pg_restore (and, on 9.3, for pg_dump as well).  Ideal concurrency is generally two less than the number of cores you have, up to a limit of 8.  Users with many ( > 1000) tables will benefit from even higher levels of concurrency.
  • Doing a compressed pg_dump, copying it (with speed options), and restoring on the remote server is usually faster than piping output unless you have a very fast network.
  • If you're using binary replication, it's faster to disable it while restoring a large database, and then reclone the replicas from the new database.  Assuming there aren't other databases on the system in replication, of course.
  • You should set some postgresql.conf options for fast restore.
"What postgresql.conf options should I set for fast restore?" you ask?  Well, let me add a few caveats first:
  • The below assumes that the restored database will be the only database running on the target system; they are not safe settings for production databases.
  • It assumes that if the pg_restore fails you're going to delete the target database and start over.
  • These settings will break replication as well as PITR backup.
  • These settings will require a restart of PostgreSQL to get to production settings afterwards.
shared_buffers = 1/2 of what you'd usually set
maintenance_work_mem = 1GB-2GB
fsync = off
synchronous_commit = off
wal_level = minimal
full_page_writes = off
wal_buffers = 64MB
checkpoint_segments = 256 or higher
max_wal_senders = 0
wal_keep_segments = 0
archive_mode = off
autovacuum = off
all activity logging settings disabled

Some more notes:
  • you want to set maintenance_work_mem as high as possible, up to 2GB, for building new indexes.  However, since we're doing concurrent restore, you don't want to get carried away; your limit sh

[continue reading]

Posted by Tomas Vondra on 2014-09-23 at 15:00:00

Last week I explained (or attempted to) the basics of memory contexts in PostgreSQL. It was mostly about the motivation behind the concept of memory contexts, and some basic overview of how it all works together in PostgreSQL.

I planned to write a follow-up post about various "gotchas" related to memory contexts, but as I was writing that post I ended up explaining more and more details about internals of AllocSet (the only memory context implementation in PostgreSQL). So I've decided to split that post into two parts - first one (that you're reading) explains the internals of allocation sets. The next post will finally deal with the gotchas.

The level of detail should be sufficient for understanding the main principles (and some tricks) used in the AllocSet allocator. I won't explain all the subtle details - if you're interested in that (and the code is quite readable, if you understand the purpose), please consult the actual code in aset.c. Actually, it might be useful to keep that file opened and read the related functions as I explain what palloc and pfree do (at the end of this post).

blocks

The introductory post mentioned that Allocation Sets are based on "block pools" - that means the allocator requests memory in large blocks (using malloc), and then slices these blocks into smaller pieces to handle palloc requests.

The block sizes are somewhat configurable, but by default the allocator starts with 8kB blocks and every time it needs another block it doubles the size, up to 8MB (see the constants at the end of memutils.h). So it first allocates an 8kB block, then 16kB, 32kB, 64kB, ... 8MB (and then keeps allocating 8MB blocks). You may choose different sizes, but the minimum allowed block size is 1kB, and the block sizes are always 2^N bytes.

To make the management easier, each block is decorated with a header, represented by this structure defined in aset.c:

typedef struct AllocBlockData
{
    AllocSet    aset;   /* aset that owns this block */
    AllocBlock  next;   /* next block in aset's blocks list *

[continue reading]

Posted by gabrielle roth on 2014-09-23 at 03:19:19
Many thanks to the speakers, my fellow conference committee members, and especially our chair, Kris Pennella, for organizing the best PgOpen yet. (Speakers: please upload your slides or a link to your slides to the wiki.) I came back with a big to-do/to-try list: check out Catherine Devlin’s DDL generator, familiarize myself with the FILTER […]

Tonight I presented a talk on using JSON in Postgres at the Toronto Postgres users group. Pivotal hosted the talk at their lovely downtown Toronto office. Turnout was good with a little over 15 people attending (not including the construction workers banging against some nearby windows).

I talked about the JSON and JSONB datatypes in Postgres and some idea for appropriate uses of NoSQL features in a SQL database like Postgres.

My slides are available for download

We are thinking of having lightning and ignite talks for the next meetup. If anyone is in the Toronto area and wants to give a short (5 minute) talk on a Postgres related topic, let me know.


Posted by Paul Ramsey on 2014-09-22 at 23:30:00

A specially extended feature frenzy for FOSS4G 2014 in Portland. Usually I only frenzy for 25 minutes at a time, but they gave me an hour long session!

PostGIS Feature Frenzy — Paul Ramsey from FOSS4G on Vimeo.

Thanks to the organizers for giving me the big room and big slot!

Posted by Josh Berkus in pgExperts on 2014-09-19 at 00:48:00
Recently a client asked us to help them find and weed out duplicate indexes.  We had some old queries to do this, but they tended to produce a lot of false positives, and in a database with over 2000 indexes that wasn't going to cut it.  So I rewrote those queries to make them a bit more intelligent and discriminating, and to supply more information to the user on which to base decisions about whether to drop an index.

Here's the first query, which selects only indexes that have exactly the same columns (the embedded query doesn't come through in this excerpt; a rough stand-in is sketched after the list below). Let me explain the columns of output it produces:
  • schema_name, table_name, index_name: the obvious
  • index_cols: a comma-delimited list of index columns
  • indexdef: a CREATE statement for how the index was created, per pg_indexes view
  • index_scans: the number of scans on this index per pg_stat_user_indexes
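A rough stand-in (not the author's exact query, and without the indexdef and index_scans columns) that groups indexes on the same table with identical column lists:

SELECT n.nspname            AS schema_name,
       t.relname            AS table_name,
       x.indkey::text       AS index_cols,
       array_agg(i.relname) AS index_names
FROM pg_index x
JOIN pg_class i ON i.oid = x.indexrelid
JOIN pg_class t ON t.oid = x.indrelid
JOIN pg_namespace n ON n.oid = t.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'pg_toast')
GROUP BY n.nspname, t.relname, x.indkey::text
HAVING count(*) > 1;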
Now, go run it on your own databases.  I'll wait.

So, you probably noticed that we still get some false positives, yes?  That's because an index can have all the same columns but still be different.  For example, it could use varchar_pattern_ops, GiST, or be a partial index.  However, we want to see those because often they are functionally duplicates of other indexes even though they are not exactly the same.  For example, you probably don't need both an index on ( status WHERE cancelled is null ) and on ( status ).

What about indexes which contain all of the columns of another index, plus some more?  Like if you have one index on (id, name) you probably don't need another index on just (id).  Well, here's a query to find partial matches.

This second query looks for indexes where one index contains all of the same columns as a second index, plus some more, and they both share the same first column.  While a lot of these indexes might not actually be duplicates, a lot of them will be.
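Again as a hedged stand-in for the embedded query, a containment check on pg_index.indkey can find such pairs (same table, same leading column, one column list containing the other):

SELECT t.relname  AS table_name,
       ia.relname AS containing_index,
       ib.relname AS contained_index
FROM pg_index a
JOIN pg_index b ON b.indrelid = a.indrelid
               AND b.indexrelid <> a.indexrelid
JOIN pg_class t  ON t.oid  = a.indrelid
JOIN pg_class ia ON ia.oid = a.indexrelid
JOIN pg_class ib ON ib.oid = b.indexrelid
WHERE a.indnatts > b.indnatts
  AND string_to_array(a.indkey::text, ' ') @> string_to_array(b.indkey::text, ' ')
  AND split_part(a.indkey::text, ' ', 1) = split_part(b.indkey::text, ' ', 1);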

Obviously, you could come up with other variations on this, for example searching for all multicolumn indexes with the same columns in a different order, or indexes with the same first two columns but others different.  To cr

[continue reading]

Posted by gabrielle roth on 2014-09-19 at 00:21:00

The Portal Project hosted at PSU is near & dear to our hearts here at PDXPUG. (It’s backed by an almost 3TB Postgres database.) We’ve had several talks about this project over the years:

eXtreme Database Makeover (Episode 2): PORTAL – Kristin Tufte
Metro Simulation Database – Jim Cser
R and Postgres – Chris Monsere (I think this is where we first heard about bus bunching)
Extreme Database Makeover – Portal Edition – William van Hevelingin

Kristin Tufte most recently spoke at the PDXPUG PgDay about current development on this project, which is now in its 10th year. Future plans include data from more rural locations, more detailed bus stats, and possibly a new bikeshare program. We look forward to hearing more about it!


Posted by Tomas Vondra on 2014-09-19 at 00:00:00

If I had to name one thing that surprised me the most back when I started messing with C and PostgreSQL, I'd probably name memory contexts. I had never met this concept before, so it seemed rather strange, and there's not much documentation introducing it. I recently read an interesting paper summarizing the architecture of a database system (by Hellerstein, Stonebraker and Hamilton), and it actually devotes a whole section (7.2 Memory Allocator) to memory contexts (aka allocators). The section explicitly mentions PostgreSQL as having a fairly sophisticated allocator, but sadly it's very short (only ~2 pages) and describes only the general ideas, without discussing the code and challenges - which is understandable, because there are many possible implementations. BTW the paper is very nice; I definitely recommend reading it.

But this blog is a good place to present details of the PostgreSQL memory contexts, including the issues you'll face when using them. If you're a seasoned PostgreSQL hacker, chances are you know all of this (feel free to point out any inaccuracies), but if you're just starting hacking PostgreSQL in C, this blog post might be useful for you.

Now, when I said there's not much documentation about memory contexts, I was lying a bit. There are plenty of comments in memutils.h and aset.c, explaining the internals quite well - but who reads code comments, right? Also, you can only read them once you realize how important memory contexts are (and find the appropriate files). Another issue is that the comments only explain "how it works" and not some of the consequences (like palloc overhead, for example).

Motivation

But, why do we even need memory contexts? In C, you simply call malloc whenever you need to allocate memory on heap, and when you're done with the memory, you call free. It's simple and for short programs this is pretty sufficient and manageable, but as the program gets more complex (passing allocated pieces between functions) it becomes really difficult to track all those little pieces of

[continue reading]

I was doing some research for a customer who wanted to know where the real price/performance value is. Here are some pricing structures for GCE, AWS and Softlayer. For comparison, Softlayer is bare metal while the others are virtual.

GCE: 670.00
16 CPUS
60G Memory
2500GB HD space

GCE: 763.08
16 CPUS
104G Memory
2500GB HD space

Amazon: 911.88
16 CPUS
30G Memory
3000GB HD Space

Amazon: 1534.00
r3.4xlarge
16 CPUS
122.0 Memory
SSD 1 x 320
3000GB HD Space

Amazon: 1679.00
c3.8xlarge
32 CPUS
60.0 Memory
SSD 2 x 320
3000GB HD Space

None of the above include egress bandwidth charges. Ingress is free.

Softlayer: ~815 (with 72GB memory ~ 950)
16 Cores
RAID 10
4TB (4 2TB drives)
48GB Memory

Softlayer: ~1035 (with 72GB memory ~ 1150)
16 Cores
RAID 10
3TB (6 1TB drives, I also looked at 8-750GB and the price was the same. Lastly I also looked at using 2TB drives but the cost is all about the same)
48GB Memory

Found a great walkthrough on setting up WAL-E to use python-swiftclient for storage in Rackspace Cloud Files: https://developer.rackspace.com/blog/postgresql-plus-wal-e-plus-cloudfiles-equals-awesome/

Unfortunately by default, your backups use the public URL for Cloud Files and eat into metered public bandwidth.

The way to work around this is to set the endpoint_type to internalURL instead of the default publicURL.

You do that by setting the following environment variable:

SWIFT_ENDPOINT_TYPE='internalURL'

That allows WAL-E to use Servicenet for base backups and WAL archiving which will be much faster and not eat into your metered public bandwidth.
Posted by Denish Patel in OmniTI on 2014-09-18 at 18:42:40

Today, I presented on “Postgres in Amazon RDS” topic at Postgres Open Conference in Chicago. Here is the slide deck:

Preface


This article demonstrates the simplest cases regarding autoserialization and deserialization to the database of objects in PGObject.   It also demonstrates a minimal subset of the problems that three valued logic introduces and the most general solutions to those problems.  The next article in this series will address more specific solutions and more complex scenarios.

The Problems


Often we want database fields automatically turned into object types which are useful to an application. The example here turns SQL numeric fields into Perl Math::BigFloat objects. However, the transformation isn't perfect and if not carefully done can be lossy. Most application types don't support database nulls properly, and therefore a NULL making a round trip may end up with an unexpected value if we aren't careful. Therefore we have to create our type in a way which can make round trips in a proper, lossless way.

NULLs introduce another subtle problem with such mappings, in that object methods are usually not prepared to handle them properly.  One solution here is to try to follow the basic functional programming approach and copy on write.  This prevents a lot of problems.  Most Math::BigFloat operations do not mutate the objects so we are relatively safe there, but we still have to be careful.

The simplest way to address this is to build into one's approach a basic sensitivity into three value logic.  However, this poses a number of problems, in that one can accidentally assign a value which can have other values which can impact things elsewhere.

A key principle on all our types is that they should handle a null round trip properly for the data type, i.e. a null from the db should be turned into a null on database insert.  We generally allow programmers to check the types for nulls, but don't explicitly handle them with three value logic in the application (that's the programmer's job).
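As a quick SQL-level illustration (not from the article) of the three-valued logic that the type mapping has to respect:

SELECT NULL::numeric = NULL::numeric AS null_eq_null,  -- NULL, not true
       NULL::numeric + 1             AS null_plus_one, -- NULL propagates
       0::numeric + 1                AS zero_plus_one; -- 1, which is what a lossy NULL-to-zero mapping would give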

The Example Module and Repository


This article follows the code of PGObject::Type::BigFloat. The code is li

[continue reading]