Running accurate database benchmark tests is hard. I’ve managed to publish a good number of them without being embarrassed by errors in procedure or results, but today I have a retraction to make. Last year I did a conference talk called “Seeking PostgreSQL” that focused on worst case situations for storage. And that, it turns out, had a giant error. The results for the Intel 320 Series SSD were much lower in some cases than they should have been, because the drive’s NCQ feature wasn’t working properly. When presenting this talk I had a few people push back that the results looked weird, and I was suspicious too. I have a correction to publish now, and I think the way this slipped by me is itself interesting. The full updated SeekingPostgres talk is also available, with all of the original graphs followed by an “Oops!” section showing the next data.
Native Command Queueing is an important optimization for seek heavy workloads. When trying to optimize work for a mechanical disk drive, it’s very important to know where the drive is currently at when deciding where to go next. If you have a read for that same area of the drive in the queue, you want to read that one now, get the I/O out of the way while you’re nearby, and then move to another physical area of the disk.
However, on a SSD, you might think that re-ordering commands isn’t that important. If reads are always inexpensive, taking a constant and small period of time on a flash device, their order doesn’t matter, right? Well, that’s wrong on a few counts. The idea that reads always take the same amount of time on SSD is a popular misconception. There’s a bit of uncertainty around what else is happening in the drive. Flash cells are made of blocks larger than a single database read. What happens if you are reading 8K of a cell that is being rewritten right now, because someone is updating another 8K section? Coordinating that is likely to pause your read for a moment. It doesn’t take much lag at SSD speeds to result in a noticable slowdown.
Thanks to Shaun M. Thomas, I have been offered a numeric copy of the “Instant PostgreSQL Backup” book from Packt publishing, and was provided with the “Instant PostgreSQL Starter” book to review. Considering my current work-situation, doing a lot of PostgreSQL advertising and basic teaching, I was interested in reviewing this one…
Like the Instant collection ditto says, it’s short and fast. I kind of disagree with the “focused” for this one, but it’s perfectly fine considering the aim of that book.
Years ago, when I was a kid, I discovered databases with a tiny MySQL-oriented book. It teaches you the basis : how to install, basic SQL queries, some rudimentary PHP integration. This book looks a bit like its PostgreSQL-based counterpart. It’s a quick travel through installation, basic manipulation, and the (controversy) “Top 9 features you need to know about”. And that’s exactly the kind of book we need.
So, what’s inside ? I’d say what you need to kick-start with PostgreSQL.
The installation part is straight forward : download, click, done. Now you can launch pgadmin, create an user, a database, and you’re done. Next time someone tells you PostgreSQL ain’t easy to install, show him that book.
The second part is a fast SQL discovery, covering a few PostgreSQL niceties. It’s damn simple : Create, Read, Update, Delete. You won’t learn about indexes, functions, advanced queries here. For someone discovering SQL, it’s what needs to be known to just start…
The last part, “Top 9 features you need to know about”, is a bit more hard to describe. PostgreSQL is a RDBMS with included batteries, choosing 9 features must have been a really hard time for the author, and I think nobody can be blamed for not choosing that or that feature you like : too much choice… The author spends some time on pg_crypto, the RETURNING clause with serial, hstore, XML, even recursive queries… This is, from my point of view, the troublesome part of the book : mentioning all these features means introducing complicated SQL queries. I would never te
I’ve been hacking on a tool to allow resynchronizing an old master server after failover. Please take a look: https://github.com/vmware/pg_rewind.
We just concluded the PgCon Developer Meeting. The two big items from me were that EnterpriseDB has dedicated staff to start work on parallelizing Postgres queries, particularly in-memory sorts. I have previously expressed the importance (and complexity) of parallelism. Second, EnterpriseDB has also dedicated staff to help improve Pgpool-II. Pgpool is the swiss army knife of replication tools, and I am hopeful that additional development work will further increase its popularity.
The Developer Meeting meeting notes (summary) has lots of additional information about the big things coming from everyone next year.
One of the
clients of OmniTI requested
help to provide sample application to insert
JSON data into Postgres using Java JDBC driver . I’m not Java expert
so it took a while for me to write a simple java code to insert
data. TBH, I took help to write test application from one of our
Java engineers at OmniTI. Now, test application is ready and next
step is to make it work with JSON datatype ! After struggling a
little to find out work around for string escaping in JAVA code, I
stumbled upon data type issue!
Here is the test application code to connect to
my local Postgres installation and insert JSON data into sample
table:
postgres=# \d sample
Table "public.sample"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
data | json |
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH
PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
Something exploded running the insert: ERROR: column "data" is of
type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 42
After some research , I
found out that there is no standard JSON type on java side
so adding support for json to postgres jdbc is not straight forward
! StackOverflow
answer helped me for testing out the JSON datatype
handling at psql level. As Craig mentioned in the answer that
the correct way to solve this problem is to write a custom Java
mapping type that uses the JDBC setObject method. This can be a
tricky though. A simpler workaround is to tell PostgreSQL to
cast implicitly from text to json:
postgres=# create cast (text as json) without function as
implicit;
CREATE CAST
The WITHOUT FUNCTION clause is used because text and json have
the same on-disk and in-memory representation, they’re basically
just aliases for the same data type. AS IMPLICIT tells PostgreSQL
it can convert without being explicitly told to, allowing things
like this to work:
postgres=# prepare test(tex
One of the things that really frustrated me about the KNN GIST distance box box centroid operators that came in PostgreSQL 9.1 and PostGIS 2.0 was the fact that one of the elements needed to be constant to take advantage of the index. In PostGIS speak, this meant you couldn't put it in the FROM clause and could only enjoy it in one of two ways.
Having recently posted some thoughts on Shaun Thomas' " "PostgreSQL Backup and Restore How-to" review ", Packt asked me if I'd like to review the new " Instant PostgreSQL Starter " by Daniel K. Lyons and kindly provided me with access to the ebook version. As I'm happily in a situation where I may need to introduce PostgreSQL to new users, I was interested in taking a look and here's a quick overview.
It follows the same "Instant" format as the backup booklet, which I quite like as it provides a useful way of focussing on particular aspects of PostgreSQL without being bogged down in reams of tl;dr documentation. " Instant Pg Starter " is divided into three sections:
Installation Quick start – creating your first table Top 9 features you need to know aboutIt occurs to me I forgot to congratulate the winners of the free ebooks. So without further adieu:
Congrats to the winners. But more, I call upon them to pay it
forward by contributing to the community, either by corresponding
with the excellent PostgreSQL
mailing lists, or maybe submitting a patch or two to the code.
There’s a lot of ground to cover, and more warm bodies always
helps. ![]()
Thanks again, everyone!
There’s a fantastic set of blog posts about distributed databases and network partitioning, starting with this post explaining the perils of trying to “communicate with someone who doesn’t know you’re alive.”
The next post is about Postgres and 2-phase commit. And there are four additional posts in the series.
The whole series worth reading for anyone interested in data
stores, consistency and Postgres! ![]()
In our previous article we went through describing what retention policies are and how they can be enforced on your PostgreSQL server backups with Barman 1.2. In this post, we will go through the configuration aspects.
For the sake of simplicity, we assume a typical scenario which involves taking full backups once a week through the “barman backup” command. Suppose you want to automatically keep the latest 4 backups and let Barman automatically delete the old ones (obsolete).
The main configuration option for retention policies in Barman is “retention_policy” which can be defined both at global or server level. If you want all your servers by default to keep the last 4 periodical backups, you need to add in the general section of Barman’s configuration file the following line:
[barman] ... // General settings retention_policy: REDUNDANCY 4
When the next “barman cron” command is executed (every minute if you installed Barman using RPMs or Debian/Ubuntu packages), Barman checks for the number of available full periodical backups for every server, order them in descending chronological order (from the most recent to the oldest one) and deletes backups from the 5th position onwards.
In case you have several servers backed up on the same Barman host and you want to differentiate the retention policy for a specific server, you can simply edit that server configuration section (or file, see “Managing the backup of several PostgreSQL servers with Barman“) and define a different setting:
[malcolm] description = Malcolm Rocks ssh_command = ssh malcolm conninfo = host=malcolm port=5432 user=postgres dbname=postgres retention_policy: REDUNDANCY 8
However, Barman allows systems administrators to manage retention policies based on time, in terms of recovery window and point of recoverability. For example, you can set another server to allow to recover at any point in time in the last 3 months:
[angus] description = Angus Rocks ssh_command = ssh angus conninfo = host=angus port=5432 user=postgres dbname=postgres retention
I’ve got a Beer & Tell to give about alembic. Alembic is a migration tool that works with SQLAlchemy. I’m using it for database migrations with PostgreSQL.
So, here’s what I want to say today:
alembic revision -m "bug XXXXXX Add a new table"
--autogenerateThe most difficult thing to deal with so far are the many User Defined Functions that we use in Socorro. This isn’t something that any migration tools I tested deal well with.
Happy to answer questions! And I’ll see about making a longer talk about this transition soon.
Do you use PostgreSQL and truly believe it’s “the world’s most advanced open source database” and that its upcoming 9.3 release will make it even more awesome?
Do you also use Python and believe it’s “an easy to learn, powerful programming language” with “elegant syntax” that makes it an ideal language for developing applications and tools around PostgreSQL, such as Pyrseas?
Then we could use your help. For starters, we want to add support for the MATERIALIZED VIEWs and EVENT TRIGGERs coming up in PG 9.3.
We have also been requested to add the capability to load and maintain “static data” (relatively small, unchanging tables) as part of yamltodb, so that it can be integrated more easily into database version control workflows.
And for the next release, Pyrseas 0.7, we’d like to include the first version of the database augmentation tool which will support declarative implementation of business logic in the database–starting off with audit trail columns. Some work has been done on this already, but it needs integration with the current code and tests.
Or perhaps coding is not your forte, but you’re really good at explaining and documenting technical “stuff”. Then you could give us a hand with revamping the docs, maybe writing a tutorial so that users have a smooth ride using our tools.
Or maybe you have your own ideas as to how improve the PostgreSQL version control experience. We’d love to hear those too.
If you’d like to help, you can fork the code on GitHub, join the mailing list and introduce yourself, or leave a comment below.
Backup and recovery in Postgres-XC has some parallels to PostgreSQL, but with its own wrinkles.
initdb [OPTION]... [DATADIR]Now, any data corruption found will be notified as below:
........
-k, --data-checksums use data page checksums
initdb -D data_directory -k
postgres=# select * from corruption_test;In earlier version,just an error message.
WARNING: page verification failed, calculated checksum 63023 but expected 48009
ERROR: invalid page in block 0 of relation base/12896/16409
postgres=# select * from corruption_test where id=1;That's cool right....
ERROR: invalid page header in block 0 of relation base/12870/18192
$ export PGDATA=/usr/local/pg93beta/dataSome points on Disk page checksums:
$ pg_controldata
....
....
....
Data page checksum version: 1
The Call for Papers for DjangoCon US 2013 is now open.