~
I have been searching for a PostgreSQL-derived project with a
"less-is-best" philosophy. Even though I have read about quite a few
PG forks out there, what I have in mind is more like a baseline
than a fork.
~
My intention is not to wrap the same thing in a different package or
to code add-ons/value-added features on top of PG, but to rid PG of
quite a bit of its internal capabilities and just use its very
baseline.
~
All I would need PG for is raw data warehousing, memory,
I/O-subsystem management, MVCC/transaction management ... No fanciness
whatsoever. Why do you need to, say, format dates in the database if
formatting/pretty-printing and internationalization can be taken care
of more appropriately in the calling environment, say Python or Java?
All that is needed is to store a long representing the date. Why are
arrays needed in the DB proper when serialization and
marshaling/casting can be taken care of in the calling environment?
If you are using, say, Java, all you need PG to do is to faithfully
store a sequence of bytes and you would do the (de)serialization very
naturally indeed.
~
There used to be a postgresql-base-<version> package with the bare
minimum of source code to build and run PostgreSQL, which I think would
be a good starting point, but I can't find it on the mirror sites
anymore:
~
http://wwwmaster.postgresql.org/download/mirrors-ftp
~
Where can I find it?
~
I know the result will not be a SQL-compliant DBMS anymore, yet I
wonder how much faster SQL+client code doing such things as
formatting "on-the-fly" would work.
~
Do you know of such tests even in a regular PG installation?
~
Do you see any usefulness in such a project?
~
Do you know of such a project? Anyone interested? Any suggestions for
someone embarking on it?
~
It would be great if PG developers saw any good in it and did it themselves ;-)
~
lbrtchx
Comments
Re: (another ;-)) PostgreSQL-derived project ...
By Uwe Schroeder at 09/25/2011 - 00:43
Maybe you could let us in a little more on what you're trying to accomplish. What it
looks like to me right now is that you're looking for a non-SQL-compliant SQL
database where a lot of the data integrity is actually coded in the
application :-)
I bet there are database systems out there that do exactly or nearly what you
want. Maybe an object oriented one may suit you better than a relational
system?
For me, I sure don't use all that postgresql has to offer, but I like that it
does a lot of things for me and I code most of what my application does inside
the database using views, stored procedures and triggers. That approach cuts
down on application complexity. My apps don't have to do any post-processing
of the data - I query the records I need and the app merely displays them.
Yes, sometimes I wish postgresql was more "high performance" - but then, I
drive an old, paid for, practical car and not a formula one racer without a
boot or spare tire. My point being: postgresql does what it does very reliably
and although not the best performer on the market, it is a database I would
trust my payroll with - and there are few where I'd make that statement.
Never had any data loss ever, never had it crash on me. Give good hardware to
postgresql and you will get good performance with exceptional stability and
integrity.
Uwe
Re: (another ;-)) PostgreSQL-derived project ...
By Albretch Mueller at 09/25/2011 - 02:11
~
Well, at least I thought you would tell me where the postgresql-base
is to be found. The last version I found is:
~
http://freebsd.csie.nctu.edu.tw/pub/distfiles/postgresql/postgresql-base-8.3beta2.tar.bz2
~
and I wondered what that is and why there are no postgresql-base
tarballs after "8.3beta2".
~
Re: (another ;-)) PostgreSQL-derived project ...
By Chris Travers at 09/25/2011 - 15:25
My own experience here is that while it is generally possible to
create additional overhead by misuse of advanced features, *in
general* you save more overhead and get clearer code by pushing what
you can into the database, within reason.
I can give you a good example. Some years ago, I was working on an
accounting application someone else wrote which stored all monetary
values as double-precision floats and then handled arbitrary precision
math in the front-end of the application. This meant:
1) To detect if an invoice was closed, it would retrieve all GL lines
associated with the invoice and an AR/AP account and see if these
totalled to 0 in the middleware. This performed OK for a small
database, but for a large one, it didn't work so well. Had the
application used NUMERIC types, this could have been done more easily
with a HAVING clause, and far more efficiently on the db server.
2) It made the application relatively sensitive to rounding errors:
sum() with group by would return different numbers with different
groupings in sufficiently large databases.
So here you get a case where the application was made less robust and
performed quite a bit worse by not using the arbitrary-precision math
capabilities of PostgreSQL.
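A minimal Python sketch of both points, using made-up amounts (the values and variable names are illustrative assumptions, not data from the application described above). Python floats are IEEE 754 doubles, and `Decimal` stands in for what NUMERIC would do in the database:

```python
from decimal import Decimal

# Hypothetical ledger amounts, kept as text so both types parse the same input.
amounts = ["0.10", "0.20", "0.30"]

# Double-precision floats: the total depends on how the additions are grouped.
a, b, c = (float(x) for x in amounts)
float_left = (a + b) + c
float_right = a + (b + c)

# Exact decimal arithmetic: grouping does not matter.
da, db, dc = (Decimal(x) for x in amounts)
dec_left = (da + db) + dc
dec_right = da + (db + dc)

# "Is the invoice closed?" check: two charges and one payment that should net to zero.
float_balance = (float("0.10") + float("0.20")) + float("-0.30")
decimal_balance = (Decimal("0.10") + Decimal("0.20")) + Decimal("-0.30")

print("float totals agree:  ", float_left == float_right)
print("decimal totals agree:", dec_left == dec_right)
print("float balance is zero:  ", float_balance == 0)
print("decimal balance is zero:", decimal_balance == 0)
```

With doubles, the two groupings disagree and the balance does not come out as exactly zero; with exact decimals both checks hold.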
In general my experience is that it is far easier to tune performance
of an app as is described here (where all presentation is done in db)
than it is to tune an app where a lot of it is done in middleware or
front-end.
For example, consider the following: I need to determine all of the
years that have dates in a database table with, say, 50M records. If
I have a database query which does this all at once, when it performs
badly, I can tune it, and there are fewer tradeoffs I have to make.
I'd add it performs remarkably well IMHO as well as reliably.
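To make the comparison concrete, here is a rough sketch using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption made only so the example runs without a server). The table `events`, its column `happened_on`, and the sample rows are hypothetical. In PostgreSQL the query would more likely use something like `extract(year from happened_on)` rather than SQLite's `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; dates are stored as ISO-8601 text for SQLite's benefit.
conn.execute("CREATE TABLE events (happened_on TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events (happened_on) VALUES (?)",
    [("2009-03-14",), ("2010-12-31",), ("2010-01-02",), ("2011-09-25",)],
)

# Approach 1: one query; only the distinct years leave the database.
years_in_db = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT strftime('%Y', happened_on) AS yr FROM events ORDER BY yr"
    )
]

# Approach 2: fetch every row and do the work in the application.
years_in_app = sorted(
    {row[0][:4] for row in conn.execute("SELECT happened_on FROM events")}
)

print(years_in_db)
print(years_in_app)
conn.close()
```

Both approaches give the same answer, but the second one has to move every row to the client first, which is the part that hurts with 50M records.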
Best Wishes,
Chris Travers
Re: (another ;-)) PostgreSQL-derived project ...
By Albretch Mueller at 09/25/2011 - 17:41
On 9/25/11, David Johnston < ... at yahoo dot com> wrote:
Re: (another ;-)) PostgreSQL-derived project ...
By Uwe Schroeder at 09/25/2011 - 22:48
Well, politicians and Microsoft, Oracle etc. :-)
So you're keeping a lot in memory, which to me suggests plenty of hardware is
available. One of my current apps chews up 8GB of memory just for the app and
I can't afford to get a 64GB or more server. If I wanted to keep permanently
accessed data in memory, I'd need somewhere around 1/2 a terabyte of memory -
so obviously not an option (or maybe really bad database design :-) )
That said, just considering the cost/effort it takes to strip PostgreSQL down,
why don't you go with a server that has 1TB of solid-state disks? That strips
down the I/O bottleneck considerably without any effort.
In my experience "data formatting" goes both ways, in and out. Out is
obviously not a major issue because errors don't cause data corruption. In,
however, is a different issue. Errors in "inwards" conversion will cause data
corruption. So unless you have an application that does very little "in" and a
lot of "out", you still have to code a lot of data conversion which otherwise
someone else (the PostgreSQL developers) has already done for you.
Maybe it does. I never coded Java because I don't like to use technology where
Oracle can come sue me :-) I do know, however, that a lot of languages have
quirks with dates and internationalization (Python, which you mentioned
earlier, being one of them).
Yes, a long value - which can represent pretty much any valid and invalid date
ever devised, so again you don't really know what's in the database when you
leave the validation to the application.
Which still depends on your use case. Your assumption is that every piece of
code is coded in Java - which is fine if that's what your application calls
for. It's going to be a major hassle when you ever have to re-code in a
different language though.
I agree to disagree on this one. The date value the database stores in this
case is a long. Any "long" can be converted into a valid date - but is it
really the date that was entered in the first place? If I give a date
representation, i.e. 12/31/2010 to the database, I personally don't really
care how the database stores the date underneath. All that interests me is
that the next time I ask for that field I get 12/31/2010 back. There is no
error that can be made other than user error if you ask the database to store
a specific date representation. There are errors you can make in your own
conversion code which can lead to a different "long" stored than intended. So
again data integrity is at least partially in the application and not the
database.
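A small Python sketch of that argument (the function names and sample values are assumptions for illustration). Almost any integer decodes to some valid date, so a conversion bug goes unnoticed, whereas an impossible date written out as text is rejected when it is parsed:

```python
from datetime import date, datetime, timezone

def date_from_epoch_seconds(value: int) -> str:
    """Decode a stored long (seconds since the Unix epoch, UTC) as MM/DD/YYYY."""
    return datetime.fromtimestamp(value, tz=timezone.utc).strftime("%m/%d/%Y")

def parse_us_date(text: str) -> date:
    """Parse MM/DD/YYYY, raising ValueError for impossible dates."""
    return datetime.strptime(text, "%m/%d/%Y").date()

# The long the application meant to store for 12/31/2010.
intended = int(datetime(2010, 12, 31, tzinfo=timezone.utc).timestamp())

# A hypothetical conversion bug: a dropped digit still yields a "valid" date.
corrupted = intended // 10

print(date_from_epoch_seconds(intended))
print(date_from_epoch_seconds(corrupted))

# An impossible date given as text is caught at parse time.
try:
    parse_us_date("02/30/2010")
    rejected = False
except ValueError:
    rejected = True
print("02/30/2010 rejected:", rejected)
```

The corrupted long decodes without complaint to a date in 1974; nothing in the stored value says it was meant to be 12/31/2010.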
With the right design, you will have to rewrite the visual layer, not the
application logic. Errors in the visual layer are of little consequence
(except disgruntled users). So yes, if you use some kind of middleware that
does all the converting and validating for you, the difference is negligible.
But then, why write your own when the database already provides that
functionality?
This one I can agree on. My background is government and financial industry and
neither use a system with client side validation - at least not that I have
seen. Actually I've seen military systems which handle 98% of the
"application" inside the database and just import/export text files without any
GUI whatsoever.
As I've said earlier, it all depends on what you're trying to do, what the
requirements are and how sensitive the data is. In my way of thinking - being
a bit paranoid as it is - I'd rather have nothing in memory and everything on at
least 10 different computer systems spread out over the planet, just so that
one of the systems survives the next asteroid impact with all my data intact
:-) (i.e. one of my webservers doesn't even have the session id in memory -
everything is in PostgreSQL and replicated to different servers. Granted, a big
I/O hog because there are multiple records for every page ... but any cleaning
woman can pull the plug and nothing will happen because the hot standbys
someplace else will simply take over - people don't even have to log in again,
the session is still valid on the standby)
Re: (another ;-)) PostgreSQL-derived project ...
By John R Pierce at 09/26/2011 - 01:16
On 09/25/11 7:48 PM, Uwe Schroeder wrote:
It's the old hammer and nail thing [1]. A pure Java programmer wants to
see everything in Java as it's the tool he knows.
[1] - If your only tool is a hammer, the whole world looks like a nail.
Re: (another ;-)) PostgreSQL-derived project ...
By Karsten Hilbert at 09/25/2011 - 18:04
Quite obviously you have got no clue and didn't bother
checking either.
Karsten
Re: (another ;-)) PostgreSQL-derived project ...
By Albretch Mueller at 09/26/2011 - 01:25
On 9/25/11, Karsten Hilbert <Karsten. ... at gmx dot net> wrote:
Re: (another ;-)) PostgreSQL-derived project ...
By Albretch Mueller at 09/26/2011 - 01:55
On 9/26/11, Uwe Schroeder < ... at oss4u dot com> wrote:
Re: (another ;-)) PostgreSQL-derived project ...
By Uwe Schroeder at 09/25/2011 - 14:57
Take dates for example: you'd have to code very carefully to catch all the
different ways dates are represented on this planet. Your application has to
handle this since all the database knows at this point is an absolute time
(i.e. seconds since epoch or something like that) and your app has to convert
every occurrence of a date to this format or the database won't find anything
or, even worse, will store something wrong.
Same goes for numbers: if everything is stored as a byte sequence, how does
the database know you are trying to compare the number 1 with the character "1"?
Again, your app would have to handle that.
To me, that's moving data integrity into the application.
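As a rough illustration in Python (the two encodings are assumptions picked for the example, not anything PostgreSQL does internally), a store that only sees bytes cannot tell that these represent the same value, and byte-wise ordering does not match numeric ordering either:

```python
import struct

# Two hypothetical encodings an application might pick for "the value one".
number_one = struct.pack(">q", 1)   # 8-byte big-endian signed integer
text_one = "1".encode("utf-8")      # the single character "1"

# To a store that only sees bytes, these are unrelated values.
same_bytes = number_one == text_one

# Only application code that knows both encodings can relate them.
same_value = struct.unpack(">q", number_one)[0] == int(text_one.decode("utf-8"))

# Byte-wise ordering also disagrees with numeric ordering for text digits.
bytes_order = b"10" < b"2"
numeric_order = 10 < 2

print(same_bytes, same_value, bytes_order, numeric_order)
```

Every comparison rule therefore has to live in the application, and every client has to implement it the same way.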
Yes, I have, as have many others. Simple example: program a website like, say,
Facebook. So you have thousands of users from all over the world. Your website
code handles all the data conversions. Now Apple comes along and sells an
iPhone which, silly enough, a lot of people like and try to use to access your
website. You now face the problem that you need a second website doing the
same thing as the first website except solely made for touch-screen devices.
You will be forced to rewrite a lot of your code because all the data
conversion is in the code. Even worse, if you had to make an iPhone or
Android app in lieu of the second website, you'd have to recode everything you
did in a different language - i.e. Objective-C.
If you leave these things to the database, you "simply" write a second client
for a different platform and you don't have to fuss around to get the
conversions correct because the application receives the data already
converted.
Sure, this all depends on what application you need this specialized database
engine for. If it's an application for a very defined environment you can
dictate how data is to be input and train users. If it's an application for
the big wild world you will have problems with users doing stupid things
beyond your control like writing "P.O. Box 1" into the zipcode field where you
expected a 10-digit number. I'd rather have the database catch those cases and
reject storing the bad input. That saves me a lot of validation code in my
app.
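A minimal sketch of letting the database do the rejecting, using Python's built-in `sqlite3` module as a stand-in so it runs without a server (the table name `addresses` and the exact rule are assumptions for illustration). PostgreSQL supports CHECK constraints as well; there the digit test would more likely be written with a regular expression match such as `zipcode ~ '^[0-9]{5,10}$'`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rule: 5 to 10 characters, digits only.
conn.execute(
    """
    CREATE TABLE addresses (
        zipcode TEXT NOT NULL
            CHECK (length(zipcode) BETWEEN 5 AND 10
                   AND zipcode NOT GLOB '*[^0-9]*')
    )
    """
)

def try_insert(zipcode: str) -> bool:
    """Return True if the database accepted the value, False if it rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO addresses (zipcode) VALUES (?)", (zipcode,))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_good = try_insert("12345")
accepted_bad = try_insert("P.O. Box 1")
stored = [row[0] for row in conn.execute("SELECT zipcode FROM addresses")]
print(accepted_good, accepted_bad, stored)
```

The rule is declared once, next to the data, and every client that writes to the table gets it for free.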
Re: (another ;-)) PostgreSQL-derived project ...
By Tom Lane at 09/25/2011 - 12:25
Albretch Mueller < ... at gmail dot com> writes:
We stopped bothering because the split tarballs weren't really good for
anything separately. They were never independently buildable pieces,
and were only meant to ease downloading the distribution over unreliable
internet connections. That concern was obsolete some years ago. The
only part of the PG distribution that's ever been meant to be separately
buildable is libpq and some of the client-side tools. If you want to
start stripping down the server, you're on your own.
Now, having said that, there has been some interest in pushing
lesser-used chunks like the geometric datatypes out into extensions.
I don't see how that's going to result in any significant performance
gain, though.
regards, tom lane
Re: (another ;-)) PostgreSQL-derived project ...
By Martijn van Oos... at 09/25/2011 - 11:52
On Sun, Sep 25, 2011 at 06:11:36AM +0000, Albretch Mueller wrote:
Notwithstanding the rest of your post, I'm surprised you missed the
website:
http://www.postgresql.org/download/
There's a source code link, as well as several others.
Have a nice day,
Re: (another ;-)) PostgreSQL-derived project ...
By Scott Ribe at 09/25/2011 - 11:22
What on earth makes you think the db engine compares numbers as strings???
Re: (another ;-)) PostgreSQL-derived project ...
By Alban Hertroys at 09/25/2011 - 10:14
Data types aren't stored in the database as character strings (unless you define your columns as text, of course).
When data in the database gets compared to data in a query (for example, when you use a WHERE clause that compares a date column to a given date), the data in the query gets transformed to the appropriate type (text to date, in this case) - just once. That's efficient enough that the difference in performance between a numerical value and the string representation doesn't matter.
I don't know what you're trying to say in the above, but you seem to base your hypothesis on wrong assumptions.
Alban Hertroys
Re: (another ;-)) PostgreSQL-derived project ...
By David Johnston at 09/25/2011 - 09:21
No; not worth my effort.
The ARRAY_AGG() function in particular has been very useful in queries I write.
Your whole post implies this; otherwise there is no meaningful reason to look for something excluding features (assuming proper and correct implementation).
Is this the best response you can come up with? The crux of the counter-argument is that by having PostgreSQL handle 'advanced' features application code avoids the need to do so. The principle of code-reuse and the fact the features are executed by the same program holding the data make this a de-facto truth (and yes, one that we are probably taking for granted). But, if you really feel a bare-bones implementation of PostgreSQL is worthwhile you are the one that needs to test (and state explicitly) your own underlying assumptions to see whether they hold and thus make such an endeavor worthwhile.
David J.
Re: (another ;-)) PostgreSQL-derived project ...
By Darren Duncan at 09/25/2011 - 00:11
Based on your description, I suggest you might want to look at SQLite. It
provides a number of compile-time options where you can exclude various features
you don't want from the binary, when simply ignoring the extra features isn't
good enough. -- Darren Duncan
Albretch Mueller wrote:
Re: (another ;-)) PostgreSQL-derived project ...
By David Johnston at 09/24/2011 - 23:39
On Sep 24, 2011, at 22:54, Albretch Mueller < ... at gmail dot com> wrote:
I can't tell if you mean this as a humorous post or if you just have something in your eye ;-)
I cannot imagine you would benefit that much by removing these capabilities compared to simply ignoring them.
As a developer I still have to deal with dates and arrays, so while PostgreSQL could do less, the work still has to be done. I am happier having a group of programmers more skilled than myself doing it instead of me. Plus, by having it in the DB I avoid considerable overhead and can now use those features within my SQL statements/queries.
I can see where adding complexity could weigh into a do/ignore decision but once it's in and tested why would you want to remove a feature? If it had serious performance implications maybe, but even then arguing for a runtime enable/disable flag would be best so everyone could decide based upon their unique circumstances.
Not a project developer but unless and until you can identify meaningful areas of performance degradation you are simply guessing that in simply being feature rich PostgreSQL has sub-optimal performance; but if most/all of the overhead is in areas that you deem critical/core then nothing you would do would have a meaningful impact; and improving the core areas would improve not only your own situation but the core project as well.
"Premature optimization is the root of all evil." (someone not me)
David J.
Re: (another ;-)) PostgreSQL-derived project ...
By Mike Christensen at 09/24/2011 - 23:32
Doesn't Yahoo! have some super modified mega-high-performant version
of Postgres? Last I heard (which was like 2008) they were planning on
putting it online. I think it involved a column-oriented table
format or something. Did this ever happen?