Posted by Yann Larrivee on 2016-08-23 at 16:30
ConFoo Montreal: March 8th-10th 2017

Want to get your web development ideas in front of a live audience? The call for papers for the ConFoo Montreal 2017 web developer conference is open! If you have a burning desire to hold forth about PHP, Java, Ruby, Python, or any other web development topics, we want to see your proposals. The window is open only from August 21 to September 20, 2016, so hurry. An added benefit: If your proposal is selected and you live outside of the Montreal area, we will cover your travel and hotel.

You’ll have 45 minutes to wow the crowd, with 35 minutes for your topic and 10 minutes for Q&A. We can’t wait to see your proposals. Knock us out!

ConFoo Montreal will be held on March 8-10, 2017. For those of you who already know about our conference, be aware that this annual tradition will still be running in addition to ConFoo Vancouver. Visit our site to learn more about both events.

Posted by gabrielle roth on 2016-08-23 at 01:16
We’re currently redesigning our data warehouse, and we’re experimenting with a clock (or time, or time-of-day) dimension to represent the time of day, separate from the date dimension we already use. This table should contain a record for each minute of a day, allowing us to easily determine business hours in various timezones, etc etc. I’m […]
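
As a rough sketch of the idea (table and column names here are made up, not the design from the post), a minute-grained clock dimension can be generated in a single statement:

CREATE TABLE clock_dim AS
SELECT n                             AS minute_of_day,   -- 0 .. 1439
       n / 60                        AS hour_of_day,
       n % 60                        AS minute_of_hour,
       make_time(n / 60, n % 60, 0)  AS time_of_day,
       n / 60 BETWEEN 9 AND 16       AS is_business_hour -- example rule only
FROM generate_series(0, 1439) AS s(n);

Timezone-specific business-hour flags would then simply be additional columns on the same 1440-row table.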

Walbouncer was covered on the blog already when it was first announced, but that was almost 2 years ago – so it would be a good time to echo it out again, especially in light of a compatibility update to support PostgreSQL 9.5, the addition of a helper script for bootstrapping replicas called Walbouncer-companion, and making the […]

The post Walbouncer refreshed – a proxy for selective PostgreSQL physical replication appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Pavel Stehule on 2016-08-21 at 07:40
I finished the patch for XMLTABLE in PostgreSQL. It is working well. The examples I found on the net work:

postgres=#   SELECT t.id, x.*
FROM employees t,
XMLTABLE ('/Employees/Employee[age>40]'
PASSING t.data
COLUMNS firstname VARCHAR(30) PATH 'firstname',
lastname VARCHAR(30) PATH 'lastname',
age VARCHAR(30) PATH 'age') x
WHERE t.id = 1;
┌────┬───────────┬──────────┬─────┐
│ id │ firstname │ lastname │ age │
╞════╪═══════════╪══════════╪═════╡
│  1 │ Jim       │ Moriarty │ 52  │
│  1 │ Mycroft   │ Holmes   │ 41  │
└────┴───────────┴──────────┴─────┘
(2 rows)

Time: 1.619 ms
postgres=# SELECT t.id, x.*
FROM employees t,
XMLTABLE ('/Employees/Employee[@emplid=2222]'
PASSING t.data
COLUMNS firstname VARCHAR(30) PATH 'firstname',
lastname VARCHAR(30) PATH 'lastname') x
WHERE t.id = 1;
┌────┬───────────┬──────────┐
│ id │ firstname │ lastname │
╞════╪═══════════╪══════════╡
│  1 │ Sherlock  │ Homes    │
└────┴───────────┴──────────┘
(1 row)

Time: 1.606 ms
postgres=# SELECT emp.id, x.*
FROM employees emp,
XMLTABLE ('/Employees/Employee'
PASSING emp.data
COLUMNS firstname VARCHAR(30) PATH 'firstname',
type VARCHAR(30) PATH '@type') x;
┌────┬───────────┬───────┐
│ id │ firstname │ type  │
╞════╪═══════════╪═══════╡
│  1 │ John      │ admin │
│  1 │ Sherlock  │ admin │
│  1 │ Jim       │ user  │
│  1 │ Mycroft   │ user  │
└────┴───────────┴───────┘
(4 rows)

Time: 1.556 ms

Please test it and check it.
Posted by Shaun M. Thomas on 2016-08-19 at 18:42

All database engines, even Postgres, occasionally present a seemingly intractable problem that will drive everyone insane while attempting to isolate it. All it takes is the perfect storm of situational circumstances, and even a perfectly running stack of software will grind to a screeching halt. It’s situations like this that we must turn to various Postgres forensic tools to track down the issue before management starts firing people in frustration.

We’ve already discussed how connections can be blocked by improper operation ordering. But what if things aren’t so straightforward? When there are more than two actors involved, the potential for weird interactions increases to a point where replicating the issue in a clean environment is often difficult or impossible. So let’s make sure a few things are in place.

First things first: logging. We should tweak log settings from the defaults, as they provide a wealth of after-the-fact diagnostic information. All of these should be in postgresql.conf:

log_checkpoints = on
log_statement = ddl
log_min_duration_statement = 1000
log_line_prefix = '%p|%u|%d|%r|%t|'

There are a lot of other log settings, but these are the ones I personally consider essential modifications. Why? Let’s examine them one by one.

  • log_checkpoints: A checkpoint happens when Postgres writes pending modifications to table files. If this setting is enabled, we learn when that process started, how long it took, how much data was written, how much time was spent syncing to disk and thus waiting for the underlying OS to flush the data, and so on. Enabling this can single-handedly track down insufficient write performance of a storage device, or reveal times of heavy write activity that may suggest better transaction log coverage, and so on. It’s a crime this isn’t enabled by default.
  • log_statement: The ddl setting will log any modification to a database object. Create a table? Logged. Modify a view? Logged. Drop an index? Logged. This is the minimal level of activity auditing a DBA should expect.
[...]
Posted by Chris Travers on 2016-08-19 at 05:43
So one of the folks I do work with is moving a large database from PostgreSQL to Hadoop.  The reasons are sound: volume and velocity are major issues for them, PostgreSQL is not going away in their data center, and in their industry there is a lot more Hadoop usage and tooling than there is PostgreSQL tooling for life science analytics (Hadoop is likely to replace both PostgreSQL and, hopefully, a massive amount of data on NFS).  However, this has provided an opportunity to think about big data problems and solutions and their implications.  At the same time I have seen as many people moving from Hadoop to PostgreSQL as the other way around.  No, LedgerSMB will likely never use Hadoop as a backend.  It is definitely not the right solution to any of our problems.

Big data problems tend to fall into three categories, namely managing ever increasing volume of data, managing increasing velocity of data, and dealing with greater variety of data structure.  It's worth noting that these are categories of problems, not specific problems themselves, and the problems within the categories are sufficiently varied that there is no solution for everyone.  Moreover these solutions are hardly without their own significant costs.  All too often I have seen programs like Hadoop pushed as a general solution without attention to these costs and the result is usually something that is overly complex and hard to maintain, may be slow, and doesn't work very well.

So the first point worth noting is that big data solutions are specialist solutions, while relational database solutions for OLTP and analytics are generalist solutions.  Usually those who are smart start with the generalist solutions and move to the specialist solutions unless they know out of the box that the specialist solutions address a specific problem they know they have.  No, Hadoop does not make a great general ETL platform.....

One of the key things to note is that Hadoop is built to solve all three problems simultaneously.  This means that you effectively bu[...]
Posted by Simon Riggs in 2ndQuadrant on 2016-08-18 at 15:30

Postgres-BDR has now reached 1.0 production status.

Over the last 2 years, Postgres-BDR has been used daily for mission critical production systems.

As you might imagine, it’s been improved by both bug fixes and feature enhancements that allow it to be used smoothly, so it’s mature, robust, and feature-rich.

The BDR Project introduced logical replication for PostgreSQL, now available as pglogical. In addition, it introduced replication slots, background workers and many other features. But is it still relevant?

Postgres-BDR delivers all of these features that aren’t yet in PostgreSQL 9.6, and likely won’t all be in PostgreSQL 10.0 either:

  • Automatic replication of DDL, reducing maintenance costs for DBAs
  • Automatic replication of sequences
  • Conflict resolution and logging
Posted by Ernst-Georg Schmid on 2016-08-18 at 12:10
Did you know that you can make a Hexastore from an RDF triple in just one line of SQL? (This needs PostgreSQL 9.4 or better, because of the multi-array unnest.)

CREATE OR REPLACE FUNCTION hexify(
    IN sub text,
    IN pred text,
    IN obj text)
  RETURNS TABLE(ord text, a text, b text, c text) AS
$$select A.t || B.t || C.t as ord, A.v, B.v, C.v from (select * from unnest(ARRAY[sub, pred, obj],ARRAY['s', 'p', 'o'])) as A(v, t) cross join (select * from unnest(ARRAY[sub, pred, obj],ARRAY['s', 'p', 'o'])) as B(v, t) cross join (select * from unnest(ARRAY[sub, pred, obj],ARRAY['s', 'p', 'o'])) as C(v, t) where a.v != b.v and a.v != c.v and b.v != c.v order by ord desc$$
  LANGUAGE sql IMMUTABLE STRICT
  COST 100
  ROWS 6;

SELECT * FROM hexify('subject','predicate','object');


Sometimes, PostgreSQL SQL is just awesome...

More on Hexastores here and here.

Part six of six in our annual PyCon Must-See Series, a weekly highlight of talks our staff especially loved at PyCon. With so many fantastic talks, it’s hard to know where to start, so here’s our short list.

Coming from a heavy database admin background, I found Craig Kerstiens’s “Postgres Present and Future” to be incredibly well organized and engaging. Of particular interest to me, because I am somewhat new to Postgres (while having more background history with MS SQL), was the deep dive into indexes in Postgres.

Check out 5:44-8:39 to find out when to use different types of indexes, outside of the standard B-Tree. For instance, GIN indexes are helpful when searching multiple values for a single column, i.e. an array field or a JSONB field.
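
As a minimal sketch of that use case (table and column names are made up), a GIN index on an array or JSONB column supports containment searches:

CREATE TABLE articles (id serial PRIMARY KEY, tags text[], doc jsonb);

CREATE INDEX articles_tags_gin ON articles USING gin (tags);
CREATE INDEX articles_doc_gin  ON articles USING gin (doc);

-- Both of these predicates can use the GIN indexes:
SELECT * FROM articles WHERE tags @> ARRAY['postgres'];
SELECT * FROM articles WHERE doc  @> '{"status": "published"}';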

Click over to 17:22-19:33 to learn about the new Bloom Filter in Postgres 9.6, which is coming out in a few months. This extension seems like it will be incredibly useful to speed up queries on wide tables with a bunch of different options.
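
For reference, a bloom index from the 9.6 contrib extension looks roughly like this (table, columns, and the length parameter are illustrative):

CREATE EXTENSION bloom;

CREATE TABLE wide_table (i1 int, i2 int, i3 int, i4 int);

-- One signature-based index covers equality searches on any combination of the columns.
CREATE INDEX wide_table_bloom ON wide_table
    USING bloom (i1, i2, i3, i4) WITH (length = 80);

SELECT * FROM wide_table WHERE i2 = 42 AND i4 = 7;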


More in the annual PyCon Must-See Talks Series.

PostgreSQL is an awesome project and it evolves at an amazing rate. We’ll focus on the evolution of fault tolerance capabilities in PostgreSQL throughout its versions in a series of blog posts. This is the fourth post of the series and we’ll talk about synchronous commit and its effects on fault tolerance and dependability of PostgreSQL.

If you would like to witness the evolution progress from the beginning, please check the first three blog posts of the series below. Each post is independent, so you don’t actually need to read one to understand another.

  1. Evolution of Fault Tolerance in PostgreSQL 
  2. Evolution of Fault Tolerance in PostgreSQL: Replication Phase 
  3. Evolution of Fault Tolerance in PostgreSQL: Time Travel


Synchronous Commit

By default, PostgreSQL implements asynchronous replication, where data is streamed out whenever convenient for the server. This can mean data loss in case of failover. It’s possible to ask Postgres to require one (or more) standbys to acknowledge replication of the data prior to commit; this is called synchronous replication (synchronous commit).

With synchronous replication, the replication delay directly affects the elapsed time of transactions on the master. With asynchronous replication, the master may continue at full speed.

Synchronous replication guarantees that data is written to at least two nodes before the user or application is told that a transaction has committed.

The user can select the commit mode of each transaction, so that it is possible to have both synchronous and asynchronous commit transactions running concurrently.

This allows flexible trade-offs between performance and certainty of transaction durability.

Configuring Synchronous Commit

For setting up synchronous replication in Postgres we need to configure the synchronous_commit parameter in postgresql.conf.
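
A minimal sketch of the relevant settings (the standby name is hypothetical and must match the standby's application_name):

# postgresql.conf on the primary
synchronous_commit = on
synchronous_standby_names = 'standby1'

An individual transaction can still opt out with SET LOCAL synchronous_commit TO off, which matches the per-transaction flexibility described above.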

The parameter specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a success indication to the client. Valid values are on, remote_apply, remote_w

[...]

I have updated my Rich in the Jungle presentation with new pricing for AWS vs. Softlayer. Things haven't changed much: in terms of raw performance per dollar (which is not the only qualifier), Softlayer is clearly the winner.

During recent years there has been quite a lot of buzz about “insert-only” approaches, and some database-like products (Datomic, Apache Samza) have emerged, inspired by the idea of having an immutable datastore. In light of cheap storage and powerful hardware, I see the idea definitely having potential for certain use cases. So why not […]

The post Insert-only data modeling with PostgreSQL? appeared first on Cybertec - The PostgreSQL Database Company.

In almost every release of Pgpool-II, we import the latest version of PostgreSQL's SQL parser (more precisely, the raw parser). This time, our new Pgpool-II developer was faced with an interesting problem while importing PostgreSQL 9.6's parser.

In PostgreSQL, the SQL parser is written in bison, a general-purpose parser generator. The particular file including the SQL grammar rules is named "gram.y". gram.y used to include "scan.l", which is a lexical scanner written in flex.

In reality, gram.y is translated into a C source file by bison, then compiled. The same can be said of scan.l, which is translated by flex.

So the main part of the SQL parser source was a single big file made up of gram.y and scan.l.

From PostgreSQL 9.6, however, the PostgreSQL developers decided to keep gram.y and scan.l separate.

Build backend/parser/scan.l and interfaces/ecpg/preproc/pgc.l standalone.

This caused enough confusion for the developer in charge of the work, and it took some time before he realized the change. I would say that's a fun part of working on an OSS project:-)  However, I cannot help saying that it would be nice if the SQL parser were exported as a separate library, so that we would not need this kind of work in every release of Pgpool-II.
So for those of you who know, I now spend most of my time doing more general PostgreSQL consulting and a fair bit of time still on LedgerSMB.  One of my major projects lately has been on a large scientific computing platform currently run on PostgreSQL, but which, due to the volume and velocity of data, is being moved to Hadoop (the client maintains other fairly large PostgreSQL instances with no intention of moving, btw).  However this has provided an opportunity to think about big data problems and solutions and their implications.

With this client's permission I have decided to take a lot of the work I have done in optimizing their job queue system and create an extension under PostgreSQL for it.  The job queue currently runs tens of millions of jobs per day (meaning twice that number of write queries, and a fair number of read queries too) and is one of the most heavily optimized parts of the system, so this will be based on a large number of lessons learned on what is a surprisingly hard problem.

It is worth contrasting this to pg_message_queue, of which I am also the author.  pg_message_queue is intended as a light-weight, easy to use message queue extension that one can plug into other programs to solve common problems where notification and message transfer are the main concerns.  This project will be an industrial scale job queuing system aimed at massive concurrency.  As a result, simplicity and ease of use take second place to raw power and performance under load.  In other words, here I am not afraid to assume the dba and programming teams know what they are doing and have the expertise to read the manual and implement appropriately.

The first version (1.x) will support all supported versions of PostgreSQL and make the following guarantees:


  1. Massively multiparallel, non-blocking performance (we currently run with 600+ connections to PostgreSQL from worker processes).
  2. Partitioning, coalescing, and cancelling of jobs similar in some ways to TheSchwartz
  3. Exponential pushback based on number of times a job has failed
  4. Jobs may be issued again after deletion, but this can always be detected and bad jobs pruned
  5. Optionally jo
[...]
Posted by Shaun M. Thomas on 2016-08-12 at 16:48

“Hey! That row shouldn’t be in that table! How the heck did that get there!? Alright, who wrote the application client filters, because you’re fired!”

Good application developers know never to trust client input, but not all realize that a single app is rarely the only vector into a database. Databases don’t just preserve data with various levels of paranoia; they’re also the central nexus of a constellation of apps, scripts, APIs, GUIs, BMIs, HMOs, and STDs. As such, unless every single one of those shares a common ancestor that sanitizes, boils, renders, and formats content before storage, there’s bound to be inconsistencies. That’s just how things work.

One of the major benefits of using an RDBMS is that engines like Postgres provide several mechanisms for ensuring data integrity beyond crash durability and transaction control. Continuing our discussion on database objects from last week, we vaguely referred to other types of constraint. So, what other mechanisms are available aside from primary keys and unique constraints?

Let’s start with foreign keys, since they illustrate how tables are related and enforce that relation to reject invalid content. Here are two simply related tables and a couple of rows:

CREATE TABLE pet_type
(
  type_id  SERIAL   PRIMARY KEY,
  animal   VARCHAR  UNIQUE NOT NULL
);
 
CREATE TABLE pet
(
  pet_id      SERIAL   PRIMARY KEY,
  type_id     INT      NOT NULL,
  pet_name    VARCHAR  NOT NULL,
  owner_name  VARCHAR  NOT NULL
);
 
ALTER TABLE pet
  ADD CONSTRAINT fk_pet_pet_type FOREIGN KEY (type_id)
      REFERENCES pet_type (type_id);
 
INSERT INTO pet_type (animal) VALUES ('cat'), ('dog');
 
INSERT INTO pet
  (type_id, pet_name, owner_name)
VALUES
  (1, 'Meow Meow Fuzzyface', 'Cedric'),
  (2, 'Mr. Peanutbutter', 'Paul');

Foreign keys provide a buffer between pending modifications by enforcing the relationship. In this case, we can’t remove “dog” as a pet type because at least one pet references it. We also can’t insert a pet fish, because there’s no corresponding type.
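
To make that concrete, both of the following statements would be rejected by the fk_pet_pet_type constraint (the fish values are made up for illustration):

-- Fails: at least one pet still references the 'dog' type.
DELETE FROM pet_type WHERE animal = 'dog';
 
-- Fails: there is no pet_type row with type_id = 3, so a pet fish can't be added this way.
INSERT INTO pet (type_id, pet_name, owner_name)
VALUES (3, 'Bubbles', 'Todd');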

T

[...]
Posted by Craig Ringer in 2ndQuadrant on 2016-08-12 at 05:22

I’m pleased to say that we’ve just released Postgres-BDR 1.0, based on PostgreSQL 9.4.9.

This release contains significant improvements to DDL replication locking, global sequences, documentation, performance, and more. It also removes the deprecated UDR component in favour of pglogical.

It’s taken a lot of work to get to this point. This release sets the foundation to port BDR to PostgreSQL 9.6 and to enhance its high-availability capabilities, and I’m excited to be pushing BDR forward.

Posted by gabrielle roth on 2016-08-11 at 23:10

When: 6-8pm Thursday August 18, 2016
Where: iovation
Who: Brian Panulla, Marty Zajac, Gabrielle Roth
What: ETL Throwdown (or up)

For August, we’re having a panel discussion on various ETL tools we’ve tried.
Brian: moving from bulk ETL to near real time ETL with Tungsten Replicator to replicate from MySQL to PostgreSQL and Solr
Marty: Pentaho
Gabrielle: CloverETL, home-grown solution using postgres_fdw, and a brief rant about Informatica if we have time.


If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

iovation provides us a light dinner (usually sandwiches or pizza).

Elevators open at 5:45 and building security closes access to the floor at 6:30.

See you there!


PostgreSQL has a bagful of server configuration parameters (249 according to my count for version 9.5) at your disposal, which is mostly a good thing as it enables you to get the maximum out of your hardware if you’re willing to put in the necessary time. But some of the parameters might leave the door open […]

The post Logging of data modifications and the “log_statement” configuration parameter appeared first on Cybertec - The PostgreSQL Database Company.

Posted by Paul Ramsey in PostGIS on 2016-08-10 at 16:05
Before doing a deep dive into the subject, a short outline about PgBouncer: it's a lightweight connection pooler for PostgreSQL that dramatically reduces the processing time and resources needed to maintain a large number of client connections to one or more databases. It is typically used to increase the number of user connections that can be handled in a high performance environment. For more details on installing/configuring PgBouncer, refer to the documentation here.
Like other tools, PgBouncer has a stderr/syslog logging architecture to record connections, disconnections, and pooler_errors with different verbosity levels. As of now, the greater part of the logging goes to one single file, "pgbouncer.log", which grows endlessly. Sometimes this can risk making a system unresponsive due to lack of disk space at the log file location. At present, PgBouncer logging has no built-in configuration to rotate logs on the basis of age or size, hence it forces users to choose alternative methods. IMO, there are two approaches to handle it:
  1. Configure PgBouncer's "syslog" method to rely on OS log rotation, or
  2. Configure log rotation of the "pgbouncer.log" file using OS utilities.

Method 1:

It's pretty straightforward to configure syslog in PgBouncer: set "syslog" to 1 (default 0), give a name to begin the log lines with in the OS logs via "syslog_ident" (default 'pgbouncer'), and specify the facility details in "syslog_facility" (default 'daemon'). A sample output from my OS logs (/var/log/messages):
Aug 5 16:54:27 raghavt pgbouncer[62549]: C-0x1cdfe60: postgres/postgres@unix(62621):6432 login attempt: db=postgres user=postgres tls=no
Aug 5 16:54:27 raghavt pgbouncer[62549]: S-0x1ce4b10: postgres/postgres@127.0.0.1:5432 new connection to server (from 127.0.0.1:38947)
Aug 5 16:54:27 raghavt pgbouncer[62549]: C-0x1cdfe60: postgres/postgres@unix(62621):6432 closing because: client close request (age=0)
Note: If "syslog" is enabled, comment out or blank the "logfile" parameter, otherwise you will get additional logging.
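
Putting those parameters together, the relevant part of pgbouncer.ini would look something like this (a sketch based on the settings described above):

[pgbouncer]
;logfile = /var/log/pgbouncer/pgbouncer.log  ; commented out, since syslog is used instead
syslog = 1
syslog_ident = pgbouncer
syslog_facility = daemon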

Method 2:

Logrotate is
[...]
Posted by Tomas Vondra in 2ndQuadrant on 2016-08-08 at 15:05

I’ve published multiple benchmarks comparing different PostgreSQL versions, as for example the performance archaeology talk (evaluating PostgreSQL 7.4 up to 9.4), and all those benchmarks assumed a fixed environment (hardware, kernel, …), which is fine in many cases (e.g. when evaluating the performance impact of a patch), but in production those things do change over time – you get hardware upgrades, and from time to time you get an update with a new kernel version.

For hardware upgrades (better storage, more RAM, faster CPUs, …), the impact is usually fairly easy to predict, and moreover people generally realize they need to assess the impact by analyzing the bottlenecks on production and perhaps even testing the new hardware first.

But what about kernel updates? Sadly we usually don’t do much benchmarking in this area. The assumption is mostly that new kernels are better than older ones (faster, more efficient, scale to more CPU cores). But is it really true? And how big is the difference? For example what if you upgrade a kernel from 3.0 to 4.7 – will that affect the performance, and if yes, will the performance improve or not?

From time to time we get reports about serious regressions with a particular kernel version, or sudden improvements between kernel versions. So clearly, kernel versions may affect performance.

I’m aware of a single PostgreSQL benchmark comparing different kernel versions, made in 2014 by Sergey Konoplev in response to recommendations to avoid 3.0 – 3.8 kernels. But that benchmark is fairly old (the last kernel version available ~18 months ago was 3.13, while nowadays we have 3.19 and 4.6), so I’ve decided to run some benchmarks with current kernels (and PostgreSQL 9.6beta1).

PostgreSQL vs. kernel versions

But first, let me discuss some significant differences between policies governing commits in the two projects. In PostgreSQL we have the concept of major and minor versions – major versions (e.g. 9.5) are released roughly once a year, and include various new features. Minor versions

[...]
It's possible to use EXPLAIN output in plpgsql. In the default text format, the result comes back as a set of text values, so you can process them in a loop like this:

declare
   exp text;
begin
   for exp in explain myquery
   loop
      raise notice '%', exp;
   end loop;
end;


If you use json format output, however, the result comes back as a single json document rather than a set of lines. You still might need a loop - I haven't found another way yet of getting the output from EXPLAIN into a variable - but the loop will only have one iteration. Here is an example taken from a function I wrote the other day that lets you get the estimated number of rows from the plan:


declare
   exp json;
begin
   for exp in explain (format json) myquery
   loop
      raise notice 'rows: %', exp#>>'{0,Plan,Plan Rows}';
   end loop;
end;
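
For context, the same idea can be wrapped into a reusable function; this is only a sketch (the function name is made up), using the dynamic form of the loop:

create or replace function estimated_rows(query text) returns bigint as $$
declare
   exp json;
begin
   for exp in execute 'explain (format json) ' || query
   loop
      return (exp#>>'{0,Plan,Plan Rows}')::bigint;
   end loop;
   return null;  -- not reached: explain always produces a row
end;
$$ language plpgsql;

-- usage: select estimated_rows('select * from my_table');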

Sometimes it happens that a query slows down dramatically for no obvious reason. In other cases queries go south when a certain set of input parameters is used. In some of those situations, wrong optimizer estimates can be the root cause of all evil. One of my “favorite” issues with wrong planner estimates: Underestimated […]

The post Exploding runtime: How nested loops can destroy speed appeared first on Cybertec - The PostgreSQL Database Company.

This is a technical post!

What if you want to search for a %asd% pattern and do it fast? The Wildspeed extension provides GIN index support for wildcard search with the LIKE operator.


Authors

Oleg Bartunov, Moscow, Moscow University, Russia
Teodor Sigaev, Moscow, Moscow University, Russia

License

Stable version, included into PostgreSQL distribution, released under BSD license. Development version, available from this site, released under the GNU General Public License, version 2 (June 1991)

Downloads

Latest version of wildspeed is available from
git clone git://sigaev.ru/wildspeed

Wildspeed provides an opclass (wildcard_ops) and uses the partial match feature of GIN, available since 8.4. It also supports full index scan.
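
For illustration, usage looks roughly like this (the column name is assumed; the words table is the one used in the examples below):

CREATE INDEX words_word_idx ON words USING gin (word wildcard_ops);

SELECT * FROM words WHERE word LIKE '%asd%';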

The size of the index can be very big, since it contains entries for all permutations of the original word; see [1] for details. For example, the word hello will be indexed along with all its permutations:

=# select permute('hello');
               permute                
--------------------------------------
 {hello$,ello$h,llo$he,lo$hel,o$hell}


Notice that the symbol '$' is used only for visualization; in the actual implementation the null symbol '\0' is used.

Search queries are rewritten as prefix searches:

*X  -> X$*
X*Y -> Y$X*
*X* -> X*


For example, search for 'hel*o' will be rewritten as 'o$hel'.

A special function, permute(TEXT), which returns all permutations of its argument, is provided for test purposes.

Performance of wildspeed depends on the search pattern. Basically, wildspeed is less effective than a btree index with text_pattern_ops for prefix search (the difference is greatly reduced for long prefixes) and much faster for wildcard search.

Wildspeed by default uses an optimization (skip short patterns if there are longer ones), which can be turned off in the Makefile by removing the define -DOPTIMIZE_WILDCARD_QUERY.

References:

1. http://www.cs.wright.edu/~tkprasad/courses/cs499/L05TolerantIR.ppt, see also, http://nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html


Examples:

Table words contains 747358 records, w[...]
Posted by Pavel Stehule on 2016-08-07 at 09:32
Parsing XML is not simple in Postgres. For more complex data I prefer external procedures in PL/Perlu or PL/Pythonu based on XmlReader. I hope that this will soon be history. With the ANSI SQL XMLTABLE function, transforming any XML document into a table is simple and fast:

postgres=# SELECT * FROM xmldata;
┌──────────────────────────────────────────────────────────────────┐
│ data │
╞══════════════════════════════════════════════════════════════════╡
│ <ROWS> ↵│
│ <ROW id="1"> ↵│
│ <COUNTRY_ID>AU</COUNTRY_ID> ↵│
│ <COUNTRY_NAME>Australia</COUNTRY_NAME> ↵│
│ <REGION_ID>3</REGION_ID> ↵│
│ </ROW> ↵│
│ <ROW id="2"> ↵│
│ <COUNTRY_ID>CN</COUNTRY_ID> ↵│
│ <COUNTRY_NAME>China</COUNTRY_NAME> ↵│
│ <REGION_ID>3</REGION_ID> ↵│
│ </ROW> ↵│
│ <ROW id="3"> ↵│
│ <COUNTRY_ID>HK</COUNTRY_ID> ↵│
│ <COUNTRY_NAME>HongKong</COUNTRY_NAME> ↵│
│ <REGION_ID>3</REGION_ID> ↵│
│ </ROW> ↵│
│ <ROW id="4"> ↵│
│ <COUNTRY_ID>IN</COUNTRY_ID> ↵│
│ <COUNTRY_NAME>India</COUNTRY_NAME> ↵│
│ <REGION_ID>3</REGION_ID> ↵│
│ </ROW> ↵│
│ <ROW id="5"> ↵│
│ <COUNTRY_ID>JP</COUN
[...]
Posted by Shaun M. Thomas on 2016-08-05 at 17:38

Occasionally with a lot of similar-sounding terminology, there’s ample room for misunderstandings. This is especially relevant with overloaded terms like ‘index’, which can be anything from a data lookup to a term used in mutual funds. This is further complicated if a developer’s first experience with databases is with another variant with sufficiently divergent syntax, methodology, or assumptions. To prevent future code refactors born of misunderstandings, let’s build a basic Postgres glossary to act as an executive summary for app devs.

Let’s consider this a continuation of our exploration of databases and schemas. We need a solid glossary of terms centered around frequently conflated concepts.

Tables

At its core, a Postgres table is just a container for columns of various types. Each record in a table is stored as a row of one or more columns. We can see that easily enough here:

CREATE TABLE animal_sound
(
  animal  VARCHAR,
  sound   VARCHAR
);
 
INSERT INTO animal_sound (animal, sound) VALUES ('Cow', 'Moo');
INSERT INTO animal_sound (animal, sound) VALUES ('Cat', 'Meow');
 
SELECT * FROM animal_sound;
 
 animal | sound 
--------+-------
 Cow    | Moo
 Cat    | Meow

Nothing groundbreaking here, right? Even the most rank database newbie should know this much. It’s when we start adding elements on top that things start to go sideways. Let’s take this one step at a time to prevent that from happening.

Sequences

The next thing a developer will probably want is an auto-incrementing field of some kind. MySQL does this with their AUTO_INCREMENT decorator. MS SQL uses a function called IDENTITY that needs parameters to determine operation. Oracle requires something called a SEQUENCE that must either be used explicitly or tied to the table with a TRIGGER. Postgres uses sequences too, but they tend to be hidden behind the SERIAL data type.

DROP TABLE animal_sound;
 
CREATE TABLE animal_sound
(
  id      SERIAL,
  animal  VARCHAR,
  sound   VARCHAR
);
 
INSERT INTO animal_sound (animal, sound) VALUES ('Co
[...]
With the discussion of CVE-2016-1238, a quick and easy fix that has been suggested for broken code is to add the following line to the top of broken Perl scripts. Note this applies to Perl as run anywhere, whether pl/perlU, plain perl, or something else:

use lib '.';

In some corners, this has become the goto solution for the problem (pun quite deliberate).  It works, gets the job done, and introduces subtle, hidden, and extremely dangerous problems in the process.

For those interested in the concerns specific to PostgreSQL, these will be discussed near the end of this article.

Security and the Garden Path


I am borrowing an idea here from linguistics, the idea of the garden path, as something that I think highlights a lot of subtle security problems.  Consider the newspaper headline "Number of Lothian patients made ill by drinking rockets."  Starts off simple enough and you get to the end, realizing you must have misread it (and you did, probably, since the number of patients increased who were made ill by drinking, not that some people got sick because they drank hydrazine).  The obvious reading and the logical reading diverge and this leads to a lot of fun in linguistic circles.

The same basic problem occurs with regard to security problems.  Usually security problems occur for one of two reasons: either people do something obviously insecure (plain text authentication for ftp users where it matters), or they do something that looks on the surface like it is secure but behind the scenes does something unexpected.

Perl has a few surprises here because parsing and running a script is a multi-pass process, but we tend to read it as a single pass. Normally these don't cause real problems, but in certain cases there are very subtle dangers lurking.  Here, with use lib '.', it is possible to inject code into a running program as long as an attacker can get a file placed in the current working directory of the program.

Relative paths, including the current working directory, do have legitimate use cases, bu[...]
Posted by Simon Riggs in 2ndQuadrant on 2016-08-05 at 06:30

In a recent blog, I described features for PostgreSQL Core that we’ve been working on
http://blog.2ndquadrant.com/postgresql-10-roadmap/

Many people have asked for a similar roadmap for BDR and Postgres-XL. I can confirm that both are under active development and in active use.

Postgres-XL (link)

  • XL 9.5 v1.2 is now available, with more updates coming.
  • XL 9.6 has begun development work, together with active consideration of how to merge parts of that back into Postgres Core and/or minimize the overall set of changes.

BDR (link)

  • BDR 9.4 v0.9.3 is the current version. We’re continuing to work on BDR 9.4 and will be publishing v1.0 sometime soon.
  • BDR 9.6 is the next objective.
  • We’re also working to merge Logical Replication into PostgreSQL 10 and hopefully the rest of BDR functionality into PostgreSQL 10 or 11.

More details on the contents of those releases will be available separately.

It happens on a quite regular basis that people contact the Cybertec PostgreSQL Support Desk to get support for a pretty common thing: paginators on websites. What is the main issue? How hard can it be to display simple tables, after all? A typical scenario with paginators: Suppose you want to display a table on […]

The post paginators – The story about voluntary pain appeared first on Cybertec - The PostgreSQL Database Company.