DevHeads.net

big database resulting in small dump

I have a PostgreSQL 8.4 database (installed on Ubuntu 10.04 x86_64). It holds a
Zabbix database. The database takes 10GB on disk, but the SQL dump takes only
2GB. I've gone through
http://archives.postgresql.org/pgsql-general/2008-08/msg00316.php and got
some hints. Naturally, the biggest table is history (the second biggest is
history_uint; together they make up about 95% of the total size). I tried to
run CLUSTER on it, but it seemed to be taking forever (3 hours and still
not completed), so I cancelled it and went with a database drop and restore.
That resulted in the database taking up 6.4GB instead of 10GB. This is a good
improvement, but still isn't quite what I expected. I would appreciate some
clarification.

Comments

Re: big database resulting in small dump

By Craig Ringer at 07/20/2012 - 21:41

On 07/21/2012 02:05 AM, Ilya Ivanov wrote:
To elaborate on the answers already posted:

Plain text dumps only contain the data itself. In many databases the
table contents are a small part of the overall database size.
Additionally, data is stored on disk in a structure optimised for speed
of access, not disk space consumption, so the same data can be much more
compact in the dump format.

Finally, data in dumps is much, much more efficiently compressed than it
is in the tables. In tables the main rows aren't compressed at all, and
TOASTed values like big text fields are individually compressed, which
is immensely less space efficient than compressing them all together. On
the other hand, it allows random access where the dump format just
doesn't - kind of important for a database!

The rest of the space is used by:

- Indexes, which can get quite big.

- Free space in tables from deleted rows that haven't yet been replaced
by newly inserted rows.
If you have a non-default FILLFACTOR there can be lots of this.

- "bloat" - wasted space in tables and indexes, typically caused by
insufficiently frequent autovacuum

- ... probably more I've forgotten

When you dump and reload you not only get rid of any bloat in your
tables and indexes, but you effectively REINDEX your database.
PostgreSQL can often create much more compact and efficient index
structures when it does a CREATE INDEX on a full table (like when
restoring a dump) than when it does a CREATE INDEX on an empty table
followed by lots of inserts.
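
To see how the on-disk size splits between table data and everything else, something along the lines of the sketch below might help. It only builds a query string; pg_relation_size(), pg_total_relation_size() and pg_size_pretty() are built-in PostgreSQL functions, while the helper name size_breakdown_query and the default limit are made up for illustration. The "indexes_and_toast" column is simply total minus heap, so it lumps indexes and TOAST storage together.

```python
# Sketch only: builds SQL text, it does not connect to a database.
# size_breakdown_query and its default limit are illustrative assumptions;
# the pg_* size functions are PostgreSQL built-ins.

def size_breakdown_query(limit: int = 10) -> str:
    """Return SQL listing the largest ordinary tables by total size."""
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT c.relname,\n"
        "       pg_size_pretty(pg_relation_size(c.oid)) AS heap_only,\n"
        "       pg_size_pretty(pg_total_relation_size(c.oid)\n"
        "                      - pg_relation_size(c.oid)) AS indexes_and_toast,\n"
        "       pg_size_pretty(pg_total_relation_size(c.oid)) AS total\n"
        "FROM pg_class c\n"
        "WHERE c.relkind = 'r'\n"  # ordinary tables only
        "ORDER BY pg_total_relation_size(c.oid) DESC\n"
        f"LIMIT {int(limit)};"
    )


# Print the query so it can be pasted into psql.
print(size_breakdown_query(5))
```

Running the printed query before and after a dump and reload should show how much of the difference came from the tables themselves and how much from indexes.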

Re: big database resulting in small dump

By Lonni J Friedman at 07/20/2012 - 14:09

On Fri, Jul 20, 2012 at 11:05 AM, Ilya Ivanov < ... at ngs dot ru> wrote:
It's not entirely clear what behavior you expect here. Assuming that
you're referring to running pg_dump, then you should just about never
expect the size of the resulting dump to be equal to the amount of
disk space the database server files consume on disk. For example,
when I pg_dump a database that consumes about 290GB of disk, the
resulting dump is about 1.3GB. This is normal & expected behavior.

Re: big database resulting in small dump

By Tom Lane at 07/20/2012 - 14:23

Lonni J Friedman < ... at gmail dot com> writes:
The fine manual says someplace that databases are commonly about 5X the
size of a plain-text dump, which is right in line with Ilya's results.
Lonni's DB sounds a bit bloated :-(, though maybe he's got an atypically
large set of indexes.
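
For comparison, the ratios implied by the figures quoted in this thread can be worked out as in the rough sketch below. The helper name size_ratio is made up for illustration, and the sketch assumes every dump size mentioned is an uncompressed plain-text dump, which the thread does not confirm for Lonni's case.

```python
# Sketch: on-disk size divided by dump size, using figures from this thread.
# size_ratio is an illustrative helper, not part of any PostgreSQL tooling.

def size_ratio(on_disk_gb: float, dump_gb: float) -> float:
    """Return how many times larger the on-disk database is than its dump."""
    if dump_gb <= 0:
        raise ValueError("dump size must be positive")
    return on_disk_gb / dump_gb


# (on-disk GB, dump GB) as reported by the posters.
figures = {
    "Ilya, before reload": (10.0, 2.0),
    "Ilya, after reload": (6.4, 2.0),
    "Lonni": (290.0, 1.3),
}

for label, (disk, dump) in figures.items():
    print(f"{label}: {size_ratio(disk, dump):.1f}x")
```

Ilya's database comes out at 5x before the reload and a bit over 3x afterwards, while Lonni's is in the hundreds.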

regards, tom lane

Re: big database resulting in small dump

By Lonni J Friedman at 07/20/2012 - 14:26

On Fri, Jul 20, 2012 at 11:23 AM, Tom Lane < ... at sss dot pgh.pa.us> wrote:
I do have a lot of indices. Also, I'm using a lot of partitions, so
there are a relatively large number of tables.

Re: big database resulting in small dump

By Ilya Ivanov at 07/20/2012 - 14:37

Well, it'd be good to have a link to the resource that mentions the 5x ratio,
but in general I'm satisfied with that explanation. Thank you.

Re: big database resulting in small dump

By Tom Lane at 07/20/2012 - 14:46

Ilya Ivanov < ... at ngs dot ru> writes:
[ digs around ... ] It's at the bottom of this page:
http://www.postgresql.org/docs/9.1/static/install-requirements.html

which I will grant is maybe not the best place for it anymore, since
relatively few people do their own builds from source these days.

regards, tom lane