Tonight I presented a talk on using JSON in Postgres at the Toronto Postgres users group. Pivotal hosted the talk at their lovely downtown Toronto office. Turnout was good, with a little over 15 people attending (not including the construction workers banging against some nearby windows).
I talked about the JSON and JSONB datatypes in Postgres and some ideas for appropriate uses of NoSQL features in a SQL database like Postgres.
My slides are available for download.
We are thinking of having lightning and Ignite talks for the next meetup. If anyone is in the Toronto area and wants to give a short (5 minute) talk on a Postgres-related topic, let me know.
The Portal Project hosted at PSU is near & dear to our hearts here at PDXPUG. (It’s backed by an almost 3TB Postgres database.) We’ve had several talks about this project over the years:
eXtreme Database Makeover (Episode 2): PORTAL –
Metro Simulation Database – Jim Cser
R and Postgres – Chris Monsere (I think this is where we first heard about bus bunching)
Extreme Database Makeover – Portal Edition – William van Hevelingin
Kristin Tufte most recently spoke at the PDXPUG PgDay about current development on this project, which is now in its 10th year. Future plans include data from more rural locations, more detailed bus stats, and possibly a new bikeshare program. We look forward to hearing more about it!
If I had to name one thing that surprised me the most back when I started messing with C and PostgreSQL, I'd probably name memory contexts. I had never met this concept before, so it seemed rather strange, and there's not much documentation introducing it. I recently read an interesting paper summarizing the architecture of a database system (by Hellerstein, Stonebraker and Hamilton), which actually devotes a whole section (7.2 Memory Allocator) to memory contexts (aka allocators). The section explicitly mentions PostgreSQL as having a fairly sophisticated allocator, but sadly it's very short (only ~2 pages) and describes only the general ideas, without discussing the code and challenges - which is understandable, because there are many possible implementations. BTW the paper is very nice, I definitely recommend reading it.
But this blog is a good place to present details of the PostgreSQL memory contexts, including the issues you'll face when using them. If you're a seasoned PostgreSQL hacker, chances are you know all of this (feel free to point out any inaccuracies), but if you're just starting hacking PostgreSQL in C, this blog post might be useful for you.
Now, when I said there's not much documentation about memory contexts, I was lying a bit. There are plenty of comments in aset.c, explaining the internals quite well - but who reads code comments, right? Also, you can only read them once you realize how important memory contexts are (and find the appropriate files). Another issue is that the comments only explain "how it works" and not some of the consequences.
But why do we even need memory contexts? In C, you simply call malloc whenever you need to allocate memory on the heap, and when you're done with the memory, you call free. It's simple, and for short programs this is perfectly sufficient and manageable, but as the program gets more complex (passing allocated pieces between functions) it becomes really difficult to track all those little pieces of memory.
2500GB HD space
2500GB HD space
3000GB HD space
SSD 1 x 320
3000GB HD space
SSD 2 x 320
3000GB HD space
None of the above include egress bandwidth charges. Ingress is free.
Softlayer: ~815 (with 72GB memory ~ 950)
4TB (4 2TB drives)
Softlayer: ~1035 (with 72GB memory ~ 1150)
3TB (6 1TB drives; I also looked at 8 750GB drives and the price was the same. Lastly, I also looked at using 2TB drives, but the cost is all about the same)
Today, I presented on the topic of “Postgres in Amazon RDS” at the Postgres Open conference in Chicago. Here is the slide deck:
At FOSS4G this year, I wanted to take a run at the decision process around open source, with particular reference to the decision to adopt PostGIS: what do managers need to know before they can get comfortable with the idea of making the move?
# import to vertica
zcat data.sql | pv -s 16986105538 -p -t -r | vsql
0:13:56 [4.22MB/s] [==============> ] 14%
I have yet to run PostgreSQL on GCE in production. I am still testing it, but I have learned the following:
Either disk can be provisioned as a raw device, allowing you to use Linux software RAID to build a RAID 10, which further increases speed and reliability. Think about that: 4 SSD provisioned disks in a RAID 10...
The downside I see, outside of the general arguments against cloud services (shared tenancy, all your data in big brother's hands, lack of control over your resources, general distaste for $vendor, or whatever else we in our right minds can think up), is that GCE is currently limited to 16 virtual CPUs and 104GB of memory.
What does that mean? Well, it means that GCE is likely perfect for 99% of PostgreSQL workloads. By far the majority of PostgreSQL deployments need less than 104GB of memory. Granted, we have customers that have 256GB, 512GB and even more, but those are few and far between.
It also means that EC2 is no longer your only choice for dynamically provisioned cloud VMs for PostgreSQL. Give it a shot; the more competition in this space, the better.
It has been a little quiet on the U.S. front of late. Alas, the summer of 2014 has come and gone and it is time to strap on the gaiters and get a little muddy. Although we have been relatively quiet, we have been doing some work. In 2013 the board appointed two new board members, Jonathan S. Katz and Jim Mlodgenski. We also affiliated with multiple PostgreSQL User Groups:
Thanks to the Bucardo team for responding to my previous post. My cascaded slave replication works as expected.
Today I noticed there is still something to do related to the delta and track tables.
Single table replication scenario:
Db-A/Tbl-T1 (master) => Db-B/Tbl-T2 (slave) => Db-C/Tbl-T3 (cascaded slave)
Every change on table T1 is replicated to T2, then from T2 to T3. After a while, VAC successfully cleans the delta and track tables on Db-A, but not on Db-B.
I detect 2 issues:
1. If the cascaded replication from T2 to T3 succeeds, the delta table on Db-B is not cleaned up by VAC.
2. If the cascaded replication from T2 to T3 fails before the VAC schedule runs, the delta table on Db-B will be cleaned up by VAC anyway, and the cascaded replication from T2 to T3 loses those changes.
I fixed it by modifying the SQL inside bucardo.bucardo_purge_delta(text, text):
Need advice from Bucardo team.
Postgres 9.5 will come with an additional logging option making it possible to log replication commands received by a node. It has been introduced by this commit:
commit: 4ad2a548050fdde07fed93e6c60a4d0a7eba0622
author: Fujii Masao <email@example.com>
date: Sat, 13 Sep 2014 02:55:45 +0900
Add GUC to enable logging of replication commands.

Previously replication commands like IDENTIFY_COMMAND were not logged even when log_statements is set to all. Some users who want to audit all types of statements were not satisfied with this situation. To address the problem, this commit adds new GUC log_replication_commands. If it's enabled, all replication commands are logged in the server log.

There are many ways to allow us to enable that logging. For example, we can extend log_statement so that replication commands are logged when it's set to all. But per discussion in the community, we reached the consensus to add separate GUC for that.

Reviewed by Ian Barwick, Robert Haas and Heikki Linnakangas.
The new parameter is called log_replication_commands and needs to be set in postgresql.conf. The default is off, so as not to surprise existing users with new log output after an upgrade to 9.5 or newer versions. Note that replication commands received by a node were actually already logged by the server, at DEBUG1 level. A last thing to note is that if log_replication_commands is enabled, all the commands are printed as LOG and not as DEBUG1, which is kept for backward-compatibility purposes.
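For reference, enabling it is then a single line in postgresql.conf (the parameter is superuser-settable, so a configuration reload should be enough):

```
log_replication_commands = on   # default: off
```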
Now, a server enabling this logging mode...
$ psql -At -c 'show log_replication_commands'
on
... is able to show replication commands at LOG level. Here is, for example, the set of commands sent by a standby starting up:
LOG:  received replication command: IDENTIFY_SYSTEM
LOG:  received replication command: START_REPLICATION 0/3000000 TIMELINE 1
This will certainly help utilities and users auditing replication, so I am looking forward to seeing log-parsing tools like pgBadger produce some nice output using this.
A while ago I wrote about compiling PostgreSQL extensions under Visual Studio – without having to recompile the whole PostgreSQL source tree.
I just finished the pg_sysdatetime extension, which is mainly for Windows but also supports compilation with PGXS on *nix. It’s small enough that it serves as a useful example of how to support Windows compilation in your extension, so it’s something I think is worth sharing with the community.
The actual Visual Studio project creation process took about twenty minutes, and would've taken less if I hadn't been working remotely over Remote Desktop on an AWS EC2 instance. Most of the time was taken by the simple but fiddly and annoying process of adding the include paths and library path for the x86 and x64 configurations. That's necessary because MSVC can't just get them from pg_config and doesn't seem to have user-defined project variables to let you specify a $(PGINSTALLDIR) in one place.
Working on Windows isn’t always fun – but it’s not as hard as it’s often made out to be either. If you maintain an extension but haven’t added Windows support it might be easier than you expect to do so.
Packaging it for x86 and x64 versions of each major PostgreSQL release, on the other hand… well, let's just say we could still use PGXS support for Windows with a “make installer” target.
We had about 50 folks attend PDXPUGDay 2014 last week, between DjangoCon and FOSS4G. A lot of folks were already in town for one of the other confs, but several folks also day-tripped from SeaPUG! Thanks for coming on down.
Thanks again to our speakers:
(Plus our lightning talk speakers: Josh B, Mark W, and Basil!)
And of course, PSU for hosting us.
Videos are linked from the wiki.
If you weren't able to make it to FOSS4G 2014 this year, you can still experience the event live. All the tracks are being televised live and the reception is pretty good: https://2014.foss4g.org/live/. Lots of GIS users are using PostGIS and PostgreSQL. People seem to love Node.js too.
After hearing enough about Node.js from all these people, and this guy (Bill Dollins), I decided to try it out for myself.
A co-worker of mine did a blog post last year that I’ve found incredibly useful when assisting clients with getting shared_buffers tuned accurately.
You can follow his queries there for using pg_buffercache to find out how your shared_buffers are actually being used. But I had an incident recently that I thought would be interesting to share, showing how shared_buffers may not need to be set nearly as high as you believe it should. Or it can equally show you that you definitely need to increase it. Object names have been sanitized to protect the innocent.
To set the stage, the database's total size is roughly 260GB and the use case is high data ingestion with some reporting done on just the most recent data at the time. shared_buffers is set to 8GB. The other thing to note is that this is the only database in the cluster. pg_buffercache is installed on a per-database basis, so you'll have to install it on each database in the cluster and do some additional totalling to figure out your optimal setting in the end.
database=# SELECT c.relname
  , pg_size_pretty(count(*) * 8192) AS buffered
  , round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name='shared_buffers')::integer, 1) AS buffers_percent
  , round(100.0 * count(*) * 8192 / pg_relation_size(c.oid), 1) AS percent_of_relation
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.oid, c.relname
ORDER BY 3 DESC
LIMIT 10;

               relname               | buffered | buffers_percent | percent_of_relation
-------------------------------------+----------+-----------------+---------------------
 table1                              | 7479 MB  |            91.3 |                 9.3
 table2                              | 362 MB   |             4.4 |               100.0
 table3                              | 311 MB   |             3.8 |                 0.8
 table4