PostgreSQL
The world's most advanced open source database
Top posters
Number of posts in the past month
Top teams
Number of posts in the past month
Feeds
Twitter
Planet
  • Policy for being listed on Planet PostgreSQL.
  • Add your blog to Planet PostgreSQL.
  • List of all subscribed blogs.
  • Manage your registration.
Contact
  • Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.
Posted by Andrew Dunstan in pgExperts on 2012-01-28 at 14:18:57
Someone asked on IRC how to build an external module using Microsoft C. I have done a lot of building core Postgres with various versions of Microsoft C, and written or committed substantial parts of the build system we use for it, but sadly I don't know the answer, and couldn't find one anywhere in our docs or on the wiki. Maybe there isn't a simple answer. If so, it's long since time we should remedy that. Unix people tend to look down on Windows, but in fact some of the largest deployments of Postgres anywhere are on Windows. It should not be beyond our wits to support this better.

There are an almost unlimited number of articles on the web about how to find gaps in sequences in SQL. And it doesn't have to be very hard. Doing it in a "partitioned sequence" makes it a bit harder, but still not very hard. But when I turned to a window aggregate to do that, I was immediately told "hey, that's a good example of a window aggregate to solve your daily chores, you should blog about that". So here we go - yet another example of finding a gap in a sequence using SQL.

I have a database that is very simply structured - it's got a primary key made out of (groupid, year, month, seq), all integers. On top of that it has a couple of largish text fields and an fti field for full text search. (Initiated people will know right away which database this is). The sequence in the seq column resets to zero for each combination of (groupid, year, month). And I wanted to find out where there were gaps in it, and how big they were, to debug the tool that wrote the data into the database. This is really easy with a window aggregate:


SELECT * FROM (
   SELECT
      gropid,
      year,
      month,
      seq,
      seq-lag(seq,1) OVER (PARTITION BY groupid, year, month ORDER BY seq) AS gap FROM mytable
) AS t
WHERE NOT (t.gap=1)
ORDER BY groupid, year, month, seq
 

One advantage to using a window aggregate for this is that we actually get the whole row back, and not just the primary key - so it's easy enough to include all the data you need to figure something out.

What about performance? I don't really have a big database to test this on, so I can't say for sure. It's going to be a sequential scan, since I look at the whole table,and not just parts of it. It takes about 4 seconds to run over a table of about a million rows, 2.7Gb, on a modest VM with no actual I/O capacity to speak of and a very limited amount of memory, returning about 100 rows. It's certainly by far fast enough for me in this case.

And as a bonus, it found me two bugs in the loading script and at least one bug in somebody elses code that I'm now wa

[continue reading]

This video presentation by Dwight Merriman (MongoDB) at OSCON Data 2011 explores the role of NoSQL databases. He makes some very interesting points:

  • NoSQL was created to allow horizontal, read-write scaling
  • NoSQL uses a dynamic schema
  • NoSQL is similar to dynamically-typed languages, e.g. Perl, PHP
  • NoSQL has serious limitations compared to relational data storage, e.g. reporting

What I found most interesting about his presentation was the comparison to dynamically-typed languages. When dynamically-typed scripting languages started to become popular years ago, there was a sense that they were only for trivial applications, compared to compiled language that were for serious work. As CPUs became faster, scripting languages were increasingly used for production applications, particularly web applications that are modified frequently. There might be a similar pattern now in the (improper?) dismissal of NoSQL for serious applications.

Continue Reading »

Posted by David Wheeler in pgExperts on 2012-01-27 at 07:32:00

In my previous post, I outlined the basics of a configuration-file and dependency-tracking SQL deployment architecture. But I left it off wanting to eliminate the need for such a file and still have it all work. This post outlines just how to do that by relying on VCS history to determine what changes need to be deployed or reverted.

Read More »

Posted by Jim Smith in Bull Inf Sys on 2012-01-26 at 19:46:59
We want to set up a Postgres Users group in the Arizona/Southwest area. Are you interested? We're in metro Phoenix. Any local Postgres users out there?
Posted by Bruce Momjian in EnterpriseDB on 2012-01-26 at 16:15:01

The term "scalability" gets injected into database conversations often, but scalability discussions often end without a clear consensus. I believe this is because "scalability", like "replication", actually means different things to different people.

There are two dimensions to database scalability; the first dimension specifies if additional servers are involved:

  • vertical: The hardware capabilities of a single database server are increased to increase throughput
  • horizontal: The number of a database servers is increased to increase throughput

Continue Reading »

Posted by Andrew Dunstan in pgExperts on 2012-01-26 at 15:25:30
I have a (non-business) request to install and support WordPress. Now this is a project that doesn't seem to want to support PostgreSQL. Not for any good reason that I can tell. They just can't be bothered. So I found that there's a little plugin you can get that is a slightly clever hack. It apparently acts like a shim that intercepts the queries written in MySQL dialect SQL and rewrites them in the Postgres dialect.

When I had a look in it, however, it didn't look like there was any sort of use of parameterized statements. So then I had a look inside the Wordpress code. And they have a prepare method, but it doesn't actually use parameterized statements. What it does is interpolate values, supposedly properly quoted, into an SQL statement. Argh! Little Bobby Tables should have taught us that this is the wrong way to go. Use placeholders! There's no excuse for not doing it properly, no matter what your database.

I looked at the PHP API that Wordpress is using. It looks like it's the original mysql API that doesn't have a prepare method, unlike the postgres module. The "improved" Mysql extension for PHP does, however. Pity they aren't using it.
Posted by David Wheeler in pgExperts on 2012-01-26 at 05:00:00

I've been thinking a lot about SQL change management. I've written about this before, implemented a dubious implementation of SQL migrations, and used a dependency-tracking solution with its own set of challenges. Nothing has satisfied me. But I think I may finally have cracked this thing wide open.

Read More »

Posted by Leo Hsu and Regina Obe on 2012-01-25 at 20:55:00

PostgreSQL has aggregate functions called bool_and and bool_or which it's had for as far back as I can remember. What do they do? given rows of premises (things that resolve to booleans), bool_and will return true if all of the premises are true. Similarly bool_or will return true if any of the premises in the set of rows is true. What if however your boolean expressions are not in rows, but instead passed in as a sequence of arbitrary statements of questionable fact. We want a function like bool_or or bool_and that takes an arbitrary number of boolean arguments. Are there functions that fit the bill. Indeed there are, but they don't scream out and say I work with booleans because they fit into a class of function we discussed in The wonders of Any Element and that also happen to be variadic functions. These are none other than greatest and least and they are old timer functions that you can find in most versions of PostgreSQL. We'll demonstrate how to use all 4 with booleans in this article. It must be said that greatest and least are much more useful when applied to other data types like dates and numbers, but we were amused at the parallel with booleans.

Side note: we've started to write our book on PostgreSQL that will be published by O'Reilly. We hope to finish this book within the next 5-7 months but have preliminary e-Book drafts before then for early purchasers to scan. The focus of the book will be PostgreSQL 9.1 with some highlights the upcoming PostgreSQL 9.2. Of course oldie but goodie topics are in the book too. It's a thrill to be writing again.

Continue reading "True or False every which way"
Posted by Andrew Dunstan in pgExperts on 2012-01-25 at 16:16:16
From time to time I hear supposed purists sneer at the PostgreSQL Enum feature. I usually try to restrain myself from replying to these people. But it's nice to know that they are appreciated, as this tweet shows.
Posted by Andrew Dunstan in pgExperts on 2012-01-25 at 16:09:10
I recently gave the Perl guys a bit of stick about the state of their bug tracker. However, I must give Ricardo Signes (who I am told is the current Perl Pumpking) and our own David Wheeler kudos for persisting to ensure that the bug I tried to file actually got filed, and seems to have some attention. Let's hope it gets fixed.
Posted by Bruce Momjian in EnterpriseDB on 2012-01-25 at 14:15:01

While database software can be the cause of outages, for Postgres, it is often not the software but the hardware that causes failures — and storage is often the failing component. Magnetic disk is one of the few moving parts on a computer, and hence prone to breakage, and solid-state drives (SSDs) have a finite write limit.

While waiting for storage to start making loud noises or fail is an option, a better option is to use some type of monitoring that warns of storage failure before it occurs, e.g. enter SMART. SMART is a system developed by storage vendors that allows the operating system to query diagnostics on the drive and warn of unusual storage behavior before failure occurs. While read/write failures are reported by the kernel, SMART parameters often warn of danger before failure occurs. Below is the SMART output from a Western Digital (WDC) WD20EARX magnetic disk drive:

Continue Reading »


Joomla, one of the world’s most popular open source content management systems (CMS) used for everything from websites to blogs to Intranets, today announces the immediate availability of Joomla 2.5. Along with new features such as advanced search and automatic notification of Joomla core and extension updates, the Joomla CMS for the first time includes multi-database support with the addition of Microsoft SQL Server. Previous versions of Joomla were compatible exclusively with MySQL databases.

Way to go Joomla! But why don’t you guys mention PostgreSQL database in the main release story? Do you really think that MSSQL is more common choice for the database layer for any CMS? Seriously?


Filed under: Announces, PostgreSQL Tagged: CMS, Joomla!, microsoft sql server, PostgreSQL, release

My presentation from SCALE 10x, “PostgreSQL Performance When It’s Not Your Job” is now available for download.

Postgres consists of roughly 1.1 million lines of C code, which is compiled into an executable with millions of CPU instructions. Of the many CPU machine-language instructions in the Postgres server executable, which one is the most important? That might seem like an odd question, and one that is hard to answer, but I think I know the answer.

You might wonder, "If Postgres is written in C, how would we find the most important machine-language instruction?" Well, there is a trick to that. Postgres is not completely written in C. There is a very small file (1000 lines) with C code that adds specific assembly-language CPU instructions into the executable. This file is called s_lock.h. It is an include file that is referenced in various parts of the server code that allows very fast locking operations. The C language doesn't supply fast-locking infrastructure, so Postgres is required to supply its own locking instructions for all twelve supported CPU architectures. (Operating system kernels do supply locking instructions, but they are much too slow to be used for Postgres.)

Continue Reading »

Posted by Andrew Dunstan in pgExperts on 2012-01-23 at 17:32:00
Fedora 16 ships with v8 (I'm not sure how far it goes back, Fedora 15 at least), which makes installing PLV8 extremely easy. Here's what I did earlier today. It took about a minute. I already had an installed and running instance of Postgres where I wanted PLV8 installed. So I did this:
cd inst.json
sudo yum install v8 v8-devel
hg clone https://code.google.com/p/plv8js/
cd plv8js
PATH=../bin:$PATH make USE_PGXS=1
PATH=../bin:$PATH make USE_PGXS=1 install
cd ..
bin/createdb testplv8
bin/psql -c 'create extension plv8; create language plv8;' testplv8

Pretty simple, very quick.
Posted by Bruce Momjian in EnterpriseDB on 2012-01-23 at 15:30:02

The new server is 2-10 times faster than my old 2003 server, but that 10x speedup is only possible for applications that:

  • Do lots of random I/O, thanks to the SSDs. Postgres already supports tablespace-specific random_page_cost settings, but it would be interesting to see if there are cases that can be optimized for low random pages costs. This is probably not an immediate requirement because the in-memory algorithms already assume a low random page cost.
  • Can be highly parallelized. See my previous blog entry regarding parallelism. The 16 virtual cores in this server certainly offer more parallelism opportunities than my old two-core system.

Other observations:

  • It takes serious money to do the job right, roughly USD $4k — hopefully increased productivity and reliability will pay back this investment.
  • I actually started the upgrade two years ago by adjusting my scripts to be more portable; this made the migration go much smoother. The same method can be used for migrations to Postgres by rewriting SQL queries to be more portable before the migration. Reliable hardware is often the best way to ensure Postgres reliability.
  • My hot-swappable SATA-2 drive bays allow for a flexible hard-drive-based backup solution (no more magnetic tapes). File system snapshots allow similar backups for Postgres tablespaces, but it would be good if this were more flexible. It would also be cool if you could move a drive containing Postgres tablespaces from one server to another (perhaps after freezing the rows).

Continue Reading »

Posted by Andrew Dunstan in pgExperts on 2012-01-23 at 12:34:00
People seem to be getting very excited about JSON in 9.2. So I just tried using the new type in combination with PLV8. It seems to work pretty well. Let's say we want to index a field in our JSON object called "x", so when we query on it the query will run fast. No problem. Here's a very simple function that gets the member out of the JSON:
CREATE or replace FUNCTION jmember (j json, key text )
 RETURNS text
 LANGUAGE plv8 
 IMMUTABLE
AS $function$

  var ej = JSON.parse(j);
  if (typeof ej != 'object')
        return NULL;
  return JSON.stringify(ej[key]);

$function$;
In reality we'd want something a bit more sophisticated than this, but you can get the idea from this. Armed with this function we could now create our index, using the functional index feature of PostgreSQL:
CREATE INDEX x_in_json ON mytable (jmember(jsonfield,'x'));
Now, when we issue a query like
SELECT *
FROM mytable
WHERE jmember(jsonfield,'x') = 'foo';
It should be able to use the index. This is reasonably analogous to a very simple use of MongoDB's ensureIndex() function.

We could make this somewhat nicer by providing some operators, and maybe building in a function like this, but the fundamental idea should work pretty much the same.
During last months I wrote some tools to help me with my daily duties. I’d like to let you know you about them, as you might find them useful. So, here we go: pg.logs.tail Available from OmniTI SVN. It’s a smarter “tail -f” for PostgreSQL logs. Smarter in a way, that it knows that PostgreSQL [...]
On 19t of January, Magnus Hagander committed patch: Separate state from query string in pg_stat_activity   This separates the state (running/idle/idleintransaction etc) into it's own field ("state"), and leaves the query field containing just query text.   The query text will now mean "current query" when a query is running and "last query" in other [...]
I am one of the supporters of keeping as much business logic in the database itself. This reduces the access layer of the application to mostly dumb transport and data transformation logic that can be implemented using different technologies and frameworks, without the need to re-implement critical data consistency and data distribution logic in several places, and gives an easy possibility to control what you are doing with your data and how your applications are allowed to access this data or even change exchange the underlying data structures transparently from the upper level of the code and without the need of a downtime. It also gives a possibility to add an additional layer of security allowing access to the data only through stored procedures, that can change their security execution context as needed (SECURITY DEFINER feature of PostgreSQL).

This approach has some disadvantages of course. One of the biggest technical problems, that is very easily becoming an organizational problem if you have a relatively big teem of developers, a problem of how to rapidly rollout new features without touching old functioning stored procedures, so that old versions of your upper level applications can still access the previous versions of stored procedures, and newly rolled out nodes with new software stack on them, access new stored procedures doing something more, or less, or returning some other data sets compared to their previous versions. And of course hundreds of stored procedures that are there to access and manipulate data are enough to make any attempt to keep all new versions of them backwards compatible, a nightmare.

Classical way to do this, would be to keep all the changes backwards compatible and if it is not possible, then create a new version of a stored procedure with some version suffix like _v2, mark the previous version as deprecated and after all your software stack is rolled out to use that new function, just drop the previous version.  But if you are rolling out new version of the whole stack on

[continue reading]

Posted by Selena Deckelmann on 2012-01-22 at 17:17:57

Slides (as of this moment) are here: Mistakes were made. I changed quite a bit of the beginning and end, given how bit the audience is. Previous talks, we’ve usually ended with a fun “omg, here’s the craziest story I know” session. I imagine we’ll get a little bit of that today.

Postgres folks will note a relevant picture on slide 13. :)

This is my first keynote! Thanks so much to SCALE for inviting me. There were at least 1500 registered attendees as of Friday, so looking forward to a big crowd.

On 19t of January, Heikki Linnakangas committed patch: Make pg_relation_size() and friends return NULL if the object doesn't exist.   That avoids errors when the functions are used in queries like "SELECT pg_relation_size(oid) FROM pg_class", and a table is dropped concurrently.   Phil Sorber This patch on its own is not very visible, but it [...]
Posted by Marc Balmer in micro systems on 2012-01-22 at 11:23:02
Most open source applications that use a PostgreSQL database follow the same scheme for authorization: A single role is created in the database, which owns all objects in database and has full access to them. As soon as the application is launched, it connects to the database with this role, using a password that is stored somewhere, e.g. in a config file.

From a users perspective, this is nice: The user does not even need to know there is a database under the hood. From a security perspective, this is a nightmare, for mostly two reasons.

Continue reading "Get Database Security Right"
Posted by Mark Wong on 2012-01-21 at 03:39:23

11 people showed up for our first meeting of the new year.  Thanks to Iovation for providing a comfortable space with pizza.

We will be having another PRP soon, as well as a YAMS hackathon.  We may combine the two into one event.  Watch this space for details.

Here are Tim’s slides from his Database Trending talk last night.  I can’t wait to try this at home!

Database Trending

I converted his .odp slides to .ppt so’s I could a) upload them to WP (no .odp allowed!) and b) include his notes.


Posted by Bruce Momjian in EnterpriseDB on 2012-01-20 at 14:30:01

A few weeks ago, I finally replaced my eight-year-old home server. The age of my server, and its operating system, (BSD/OS, last officially updated in 2002) were a frequent source of amusement among Postgres community members. The new server is:

Super Micro 7046A-T 4U Tower Workstation 2 x Intel Xeon E5620 2.4GHz Quad-Core Processors Crucial 24GB Dual-Rank PC3-10600 DDR3 SDRAM Intel 160GB 320 Series SSD Drive 4 x Western Digital Caviar Green 2TB Hard Drives

Continue Reading »

Posted by Bruce Momjian in EnterpriseDB on 2012-01-19 at 20:00:01

As a followup to my previous blog entry, I want to show queries that allow users to analyze TOAST tables. First, we find the TOAST details about the test heap table:

SELECT oid, relname, reltoastrelid, reltoastidxid FROM pg_class where relname = 'test';
  oid  | relname | reltoastrelid | reltoastidxid
-------+---------+---------------+---------------
 17172 | test    |         17175 |             0

Continue Reading »

Posted by Keith Fiske in OmniTI on 2012-01-19 at 15:21:06

I've finally gotten Git support added into pg_extractor. This works pretty much the same as the SVN option did already. One important difference is that there are two options for committing

--git

This just does a local commit to a locally maintained repository

--gitpush

This does a local commit as well as push to an already configured remote repository

You use either one option or the other, not both. The Git options also expects a proper .gitconfig file for your environment to be set up for the user running pg_extractor. There is no option for passing the git username like SVN has (and I don't see a need for one). Remote repositories will also have to be configured in advance of using the push option.

An important thing to note about using svn or git options with pg_extractor is that it does not do any initial VCS setups on the folders it creates and outputs too. It's best to run it first without any VCS options to get an initial dump and perform a manual commit (and/or push with git). Then for any future runs of pg_extractor, use the VCS options to track changes.

As always, please report any bugs or issues!

Tags: 

Posted by John DeSoi on 2012-01-19 at 05:14:31

The source for the pgEdit TextMate bundle is now available on GitHub at https://github.com/desoi/pgedit-textmate.

Posted by Andrew Dunstan in pgExperts on 2012-01-18 at 18:22:45
We don't provide any special indexing for XML on PostgreSQL - in fact there are no comparison operators defined for the type at all - and the current JSON patch won't provide anything special there either. But I have been wondering exactly what sort of indexing might be useful for tree structured objects. For the most part I'm inclined to think that these should be treated as singleton objects where we don't need to search on them (and if we do then the database is probably very badly structured). At least that's how I use them. For example, I can imagine storing a web session object as XML or JSON. But I'm going to know the session ID and store that as the key of the session table. I should never need to search for a session by its content, only by its ID. But let's say we did need to. What sort of operators would we use to index the data?