Ok, so since someone has been singing the praises of MongoDB, and others
have been mentioned, I figured I'd provide a contrarian view and see if you
can convince me otherwise.
I'm a big fan of relational databases. I have been using them since I
graduated from college in 1993, starting with DB2, followed by MySQL [and
boy was THAT interesting. DB2 was always about 2 years behind all the neat
features in other relational databases. Then I went to MySQL and not only
did it lack those features, it lacked a lot of what solid, dependable DB2
had! And it was on purpose! They deliberately chose to keep MySQL lean
and mean and avoid things like foreign keys, stored procedures, and such.]
My experience is that almost any application can be broken up and thought of
as tables. Especially in the business world, people naturally think in
terms of spreadsheets since the spreadsheet is king there. And a
spreadsheet is nothing but a table.
And by putting everything in well documented [ha ha!] tables with consistent
column and table naming schemes, even power users can use query tools such
as Navicat to build their own queries and reports easily. So by keeping
everything in a well understood industry standard format, we lower the skill
level needed to access and create reports on the underlying data - always a
good thing since I personally hate it when someone asks me to create a
report on sales from last year "just like this other one except we need to
include wholesale prices". There is no challenge there, no fun. Just pure
grunt work.
So all this talk of moving away from SQL makes me nervous. Will clueful
users still be able to envision the data so they can pull reports? Heck,
are there even user friendly point and click tools for them to do
so? [Personally I never use the query builder in Navicat and find it tedious,
but I know plenty of power users who CAN do that].
To me, it looks like migrating to this new method of storing data will end
up "locking" the business data up in a format that raises the cost to access
the data. It reminds me of the way Magento is designed, with those oh so
cool tables for storing field values without creating new table fields.
Sure, it may make it easier to expand/change the system, but having to do
multiple joins to the same dang table to get different pieces of data makes
the data harder to get to for non programmers!
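
To make the complaint concrete, here is a minimal sketch of that pattern in Python, using only the standard-library sqlite3 module. The table and column names (product, product_attr and so on) are made up for illustration and are not Magento's real schema. The point is only that each attribute a report needs costs one more join to the same table.

```python
import sqlite3

# Hypothetical schema for illustration only; NOT Magento's actual tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE product_attr (
        product_id INTEGER REFERENCES product(id),
        attr_name  TEXT,
        attr_value TEXT
    );
    INSERT INTO product VALUES (1, 'MUG-01'), (2, 'TEE-02');
    INSERT INTO product_attr VALUES
        (1, 'name', 'Coffee mug'), (1, 'retail_price', '9.50'),
        (1, 'wholesale_price', '4.00'),
        (2, 'name', 'T-shirt'), (2, 'retail_price', '15.00'),
        (2, 'wholesale_price', '6.25');
""")

# One join to product_attr per attribute the report needs.
eav_report = conn.execute("""
    SELECT p.sku, n.attr_value, r.attr_value, w.attr_value
    FROM product p
    JOIN product_attr n ON n.product_id = p.id AND n.attr_name = 'name'
    JOIN product_attr r ON r.product_id = p.id AND r.attr_name = 'retail_price'
    JOIN product_attr w ON w.product_id = p.id AND w.attr_name = 'wholesale_price'
    ORDER BY p.sku
""").fetchall()

for row in eav_report:
    print(row)
```

With an ordinary column-per-field table the same report would be a single SELECT with no joins, which is about what a point and click query builder can manage.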
My feeling on business data is that business data belongs TO the business
creating it. Not to some programmer who is the only one who can access
it [or worse, to some company that stores it in a proprietary format and
won't allow the data to be exported!] - and at the moment, I'm not seeing
that sort of access for data in MongoDB. Command line pseudo queries are
not enough. I want to know the data is easy to get out for a power user -
not just for me.
--
----
Hudson Valley Sudbury School
What GPL is for application users
Our school is for students
Help your children grow, change, and learn
Let your child direct, control, amend
Check out http://www.sudburyschool.org
http://www.nyphp.org/Show-Participation
I kind of think they are different tools for different jobs. NoSQL has
risen out of the need for massively scalable databases. For the vast
majority of apps, using one is overkill, and probably leads to the
kind of messy data access and reporting scenarios you envision.
In other words, that "How do I query it?" "You write a map reduce
function in Erlang." cartoon is right on the money.
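
For anyone who has not seen the pattern the cartoon is poking at, here is a rough sketch of a map/reduce style report in plain Python. The sales documents and function names are invented for illustration and this is not the API of any particular NoSQL product. In SQL the same report would be roughly SELECT year, SUM(amount) FROM sales GROUP BY year.

```python
from collections import defaultdict

# Hypothetical sales documents; fields are invented for illustration.
sales = [
    {"year": 2008, "sku": "MUG-01", "amount": 19.00},
    {"year": 2009, "sku": "TEE-02", "amount": 15.00},
    {"year": 2009, "sku": "MUG-01", "amount": 9.50},
]

def map_sale(doc):
    """Map step: emit a (key, value) pair for each document."""
    return doc["year"], doc["amount"]

def reduce_amounts(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

# Group the emitted values by key, then reduce each group.
grouped = defaultdict(list)
for key, value in map(map_sale, sales):
    grouped[key].append(value)

sales_by_year = {year: reduce_amounts(vals) for year, vals in grouped.items()}
print(sales_by_year)
```

The logic is not hard, but it is code rather than a query, which is the reporting concern raised at the start of the thread.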
You may enjoy playing with Core Data using ObjC on an iPhone. Kind of the
next evolution I think.
I stick to SQLite these days. It guides the cruise missile and probably your
watch. I guess after 20 years playing with DBs (and still learning
something new every day), I'm going back to my roots now. ;-)
http://cocoadevcentral.com/articles/000085.php
http://www.sqlite.org/
:-)
--
IM/iChat: ejpusa
Links: http://del.icio.us/ejpusa
Follow me: http://www.twitter.com/ejpusa
Karma: http://www.coderswithconscience.com
DISCLAIMER: I'm a certified Oracle8 DBA, so don't think I'm some young
web developer who doesn't know data modeling or relational theory.
Here's my passionate plea ;-)
For CMS/WCM purposes, there's nothing better. I know of a site that is
in the millions of views daily that has nothing but MongoDB on the
backend, and not only is performance great but so is ease of code
maintenance. The performance benefits have been so significant that
they implemented real-time analytics as well - and they don't cache
anything coming in from MongoDB, as there's no need.
For commerce systems, you still have separate "tables" for users,
products and orders. What you don't have is a third normal form schema
with 45,000 tables and a massive performance and maintenance headache.
I've launched a commerce site myself very recently that sustains 900
requests per second. I'm absolutely sure this would not have worked as
well with a relational engine under the hood.
I'd go as far as to say that the next-generation document databases
are doing a lot to challenge the norms. Remember when foreign keys
were considered essential for data integrity? Remember when a database
was only considered ready when it was fully ACID and ANSI SQL
compliant?
Think again. Part of me hates this transition, because I've spent the
past 20 years really mastering the horse and buggy; but these jet
planes sure are fast, and they do change the game. Just don't try to
ride one like a horse!
-- Mitch
This might be a bit of a side topic, but does anyone have any good resources
to read about this noSQL stuff? I've looked VERY briefly at MongoDB, but
didn't think it was anything that amazingly fantastical that everyone seems
to be talking about. But if it is the next generation, I'd like to be on
board.
Thanks,
Brian
--
Brian O'Connor
Maybe http://nosql-databases.org/ :-)
You also might want to see if there's a NOSQL meeting taking place
nearby (or a talk featuring Mongo, Hadoop, Tokyo Cabinet or Couch).
What I like is that a lot of these databases use REST interfaces and some
of them use JSON (which scales really well).
--
Aj.
You can see Dwight Merriman, founder of 10gen (MongoDB), and his most
recent presentation online in HD:
http://www.mefeedia.com/watch/25943056
In it he discusses the different categories of noSQL databases, as
well as the differences, strengths/weaknesses of each implementation
approach. Actually a good overview of noSQL, not just about MongoDB.
In the end you should play with one or more just to get a feel for how
they would impact your development. I think it is very understated how
going non-relational can benefit the development process.
Ajai's point about JSON is also a killer feature of sorts, as you can
really leverage rich data structures (and nested data) via Ajax and
RIA with zero transformation needed.
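
A small sketch of what "zero transformation" can mean in practice, in Python with only the standard json module. The order document and its field names are invented for this example and do not come from any real system or from MongoDB's API.

```python
import json

# Hypothetical order document; field names are invented for illustration.
order = {
    "order_id": 1001,
    "customer": {"name": "Pat", "email": "pat@example.com"},
    "lines": [
        {"sku": "MUG-01", "qty": 2, "price": 9.50},
        {"sku": "TEE-02", "qty": 1, "price": 15.00},
    ],
}

# The nested structure serializes as-is; no flattening into rows is needed.
payload = json.dumps(order)

# The receiving side (for example, browser JavaScript) gets the same shape back.
received = json.loads(payload)
order_total = sum(line["qty"] * line["price"] for line in received["lines"])
print(order_total)
```

The same nested shape is stored, sent over the wire and consumed, so there is no mapping step between rows and objects.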
My previous post ended with a somewhat silly analogy of horse
carriages and jet planes. However that is why most folks are usually
unimpressed when they evaluate - as they make the mistake of forcing a
relational model on a document database.
-- Mitch
Thanks for all the well thought out replies.
My takeaway is that these databases have 2 important benefits.
1) Performance - they perform much much better than relational databases
2) Ease of programming - they make it faster and easier to code your apps
The big downside seems to be reporting: they lock you into needing a
programmer for reporting. Which, if it makes it easier to program, is not
so bad. And I can definitely see the attraction for internal applications
for a company that has an IT staff that can include a couple of part time
programmers [a couple for turnover reasons].
I'm not seeing it as a good fit for the company that wants to hire "experts"
to come in and do their work. Or the small business setting up a site where
they will have a consultant set it up, and they will use it. The former
mainly because until it is in widespread enough use, it makes it much harder
to find someone else to work on it - and being locked to a single provider
is never a good thing. The latter for the same reason.
So I can see the use..... I'm just not a "cutting edge" person. I use tried
and true solutions where at the end of the day Gary is not a requirement for
future changes to the system.
Hear, hear! I think that context is everything and the points you made
are spot on.....so why this huge interest in non-relational DBs now?
I'd say it in 2 words: Web 2.0 (well, actually 1 word and 1 integer).
Could Facebook, Twitter and any of the others have any idea of what
their db should look like or evolve to? I doubt it, and so for these
cases where the industry is not mature the non-relational makes perfect
sense. But for mature industries, then organizing the data with clearly
defined attributes and organization will give the biggest bang for the
buck to the business who's inevitably using it (and paying the bills).
It'll be interesting to see as these new industries mature and the next
generations have a better idea of what they'll be/need to do whether
there will be a migration away from the non-relational...
Anyway, just my 2 cents from a neophyte who knows just enough to be
dangerous.
Peter
Funny, but some (valid? fair? not sure) points about NoSQL databases:
http://highscalability.com/blog/2009/11/25/brian-akers-hilar[..]
accompanying slides: http://www.slideshare.net/brianaker/no-sql-talk
He's trying to be funny, at the expense of being less than 20%
accurate. In all fairness the term noSQL is generic enough to no
longer be about specific features, as something like Project Voldemort
has a totally different feature set and target than MongoDB, which in
turn is not trying to solve the exact same problems as CouchDB or
HBase. I'd try to avoid sweeping generalizations about noSQL, but I
think he was just trying to be funny and not misinform.
-- Mitch
Nobody else thought it was very revealing when someone shouted out "I like
my job" (presumably a DBA job) as a reason not to use NoSQL?! I love MySQL,
and NoSQL DBs certainly do not fit all projects, but in the ones where they
do fit they save a huge amount of development time and make the
dedicated-DBA position somewhat obsolete.
On that note, all you PHP + MongoDB users should check out phpMoAdmin, a GUI
tool to manage your Mongo data. It is open-source, AJAX-based, has nothing
to configure and is self-contained in a single 75 KB file! Begin using it in
10 seconds: simply place the moadmin.php file anywhere on your web site and
it just works!
Learn more, download & use:
http://www.phpMoAdmin.com
Contribute your code to the phpMoAdmin project:
http://github.com/MongoDB-Rox/phpMoAdmin-MongoDB-Admin-Tool-[..]
I don't follow this one. How does it make the dedicated DBA job obsolete?
Here is my experience with the DBA world [full disclosure, I started as a
programmer, moved into Windows Admin and DB2 DBA, then moved to Lotus Notes
Admin and Programming, before moving into PHP/MySQL programming].
The role of the DBA is to keep the system running, to provide a check on the
developers who tend to throw any old query together and blame the network,
the os, or the database for their lousy performance choices, to recover the
system when it inevitably has some weird failure, and to provide a central
resource for all things data.
Programmers who butt heads with DBA's would rather just have them out of the
way so they can finish their work. Of course, chances are that programmer
will be long gone when the crud hits the fan and so doesn't give a damn
about recovering from stupid choices.
Over the years, again and again I've seen the "this eliminates the DBA
position" - MySQL......Lotus Notes.....everything. What they really mean
is that you don't need a DBA to start throwing code up and together and
rolling it out.
Plus when you have a small team of programmers....one of them becomes the
DBA in effect....doing the small bits of work for it in the initial phases
and providing that central check.
Where you store the data doesn't matter, you still need someone for all
those functions once your system achieves a certain level of complexity and
commercial value. When having the system down for more than an hour is a
crisis.
Whether you need someone full time for that, or a maintenance contract with
a consulting team which built the system, is irrelevant - you need that
person there. Monitoring, checking performance, catching problems BEFORE
they occur.
If all coders were like me...knowing a good bit of network admin, some
amount of systems admin, database admin, and programming - sure you don't
need that. But my experience is that this is rare, most people specialize
in one of those skills.....which leads to a tendency for finger pointing
when things go wrong [it's the network...no it's the system...no it's the
code....]. Of course, this always kept me busy with Lotus Notes,
troubleshooting problems and implementing solutions when they crossed
specialities [I especially loved the discussions where everyone thought
the "ideal" solution was a fix involving days of effort by one
person....when everyone also acknowledged that there were workarounds
taking an hour or so that would make it work.....but that required THEM to
do the work rather than pawn it off on someone else. So they were all too
happy to cost the company days of manpower to avoid an hour's work.]
Sorry....hotbutton here. I suspect that the "wit" who responded that way
was not a DBA but a programmer speaking as if they were the programmer
stereotype of a DBA....and I have little patience for crap programmers who
only care about their code and are willing to torpedo the business rather
than follow a little process.
A database is a database...they all have similarities, and the SQL part is
the least important part of being a DBA.
What matters is understanding tuning, memory, file access, client
configuration, backups, restores, backup strategies [do you want a specific
point in time, do you need rolling logs], redundancy strategies, etc.
All of this is irrelevant to the underlying system, whether it is a file
server, a DB2 database, or a MongoDB.
Granted, I started this thread complaining that I want nice GUI tools to
manage and explore my data...but that is my own sheer laziness since I am
primarily a developer and not a DBA.
If I was a DBA, I'd want a great command line api and I'd tend to script my
own tools rather than relying on canned crud. At least that is what every
DBA....and every SysAdmin outside of Windows admins that I know of does...[for
some reason...Windows Admins don't have this tendency... whereas I would
always throw perl on any windows box I was fiddling with and script stuff
rather than count on a GUI tool.]
I dunno... I really read that comment as a continuation of the Admin vs
Programmer war...... a senseless war that destroys business productivity,
in my opinion.
Oh, and yes, I agree that at some level one needs a dedicated DBA..... the
whole thing is it's not really based on the complexity of the systems, but
rather it is a business decision. When you can afford it, you should have a
half time DBA who can be on call at other times. Sure, we can all think of
at what technical level a business should hire a DBA......but the truth is
the world runs on money - so unless that DBA is working for free, the
decision is more likely to be based on how much money is coming into the
business rather than how complex the system is.
This becomes more of a systems administrator thing though, doesn't it?
MongoDB is a great example, as you can run it without a configuration
file - the only understanding anyone really needs is the development
team using MongoDB, as they obviously need to know what works (and
consequently what doesn't).
There's no way I could justify a full salary for someone to just sit
and watch MongoDB instances over on EC2 or the datacenter. That's
basically all they would do.
Part of the push to go non-relational is the desire to push away from
overly complex and convoluted proprietary platforms. I look at it like
this:
1) In the beginning, there were relational databases. They were big
and full of features, and it was desired to put as much "business
logic" in the database as possible - therefore a genuine need for
specialized support staff.
2) Hello, World Wide Web! Scaling these relational databases was hard,
and they were the main source of consternation and frustration for
development teams of high-traffic sites.
3) Facebook (among others) learns that to really scale, you need to do
your joins at the app layer, and everyone starts pulling all that
logic out of the database and back into the application.
4) So why are we using a relational database again?
Not saying this was a smart path to go, or even the right one; however
it is where we are, and there are reasons we've started down the route
to modern databases: They think like modern languages do (objects),
they have additional features for scale as part and parcel of their
base functionality (sharding, mapreduce), and take advantage of modern
systems for minimal configuration needs and best performance (memory
mapped files).
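
A rough sketch of what "joins at the app layer" in step 3 can look like, written in Python. The two lists stand in for results fetched separately (from two shards, two collections, or a cache). The data and the function name are assumptions for illustration, not anything Facebook or MongoDB ships.

```python
# Hypothetical data standing in for two separate query results; the names
# and values are invented for illustration.
users = [
    {"id": 1, "name": "Pat"},
    {"id": 2, "name": "Sam"},
]
orders = [
    {"order_id": 1001, "user_id": 1, "total": 34.00},
    {"order_id": 1002, "user_id": 2, "total": 12.50},
    {"order_id": 1003, "user_id": 1, "total": 8.25},
]

def join_orders_to_users(orders, users):
    """Attach the user's name to each order using an in-memory lookup."""
    users_by_id = {u["id"]: u for u in users}  # build the index once
    joined = []
    for o in orders:
        user = users_by_id.get(o["user_id"])
        if user is None:
            continue  # inner-join behaviour: skip orders with no matching user
        joined.append({**o, "user_name": user["name"]})
    return joined

joined_orders = join_orders_to_users(orders, users)
for row in joined_orders:
    print(row["order_id"], row["user_name"], row["total"])
```

The database only ever answers simple lookups, and the work of matching records up moves into application code that can be scaled out alongside the web tier.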
-- Mitch
Well...based on my background I don't see a difference between sys admin and
DBA......
I've seen any number of relational databases that run "without a config
file"....all that meant was that they used all the defaults.... Is it any
different with MongoDB?
Keeping an eye on memory usage, server load, network load....checking
reports and ensuring that if you have a sharded system that everything is
correctly loading to the right places and no changes need to be made....
checking file access and ensuring there are no problems there.... tuning
any caches to ensure they work efficiently.... maintaining the
documentation and data maps for the system so new apps which are going to
use the user profile or make changes to it know what already exists.
To me...a DBA is the central part of a project. He or that team is the one
everyone should go to when adding things to ensure it isn't already done
somewhere else.
If the system crashes and the company is losing a quarter of a million
dollars in revenue a day because the data became corrupted and no one is
quite sure when the last good backup was taken [since it was the nonexistent
DBA's job to simulate crashes and restore data from backups and ensure
everything works]....I think that not having a DBA or Systems Admin or
whatever you want to call him was foolish.
Now....if you're losing 1000 dollars a day...well then a DBA isn't justified.
It's a business cost benefit analysis. Not a technical one. Can the
company afford a complete crash and start over from some random point in
time...and if not do they have the money to pay for an admin to keep
everything running smoothly? Sometimes you live with the
risk.....especially when everything is bright, shiny and new and the
developers who set everything up know it all like the backs of their hands
and are interested in it.
But years down the line...when you have hundreds of cobbled little sub
projects on it...the original developers have moved on to the next big thing
and don't want to touch that old dinosaur? Dang straight you get a DBA.
That doesn't mean you need a dedicated DBA, like you would with a
heavyweight like Oracle, DB2 or even MySQL.
This *does* mean your developers need to be competent in the
technologies that they use, which unfortunately doesn't seem like a
standard practice. Actually, regardless of what database you use, your
developers need to be competent.
As well, your operations team needs competence in the technologies
that they are supposed to support, but again that won't require a
dedicated specialist for just the database portion of your
architecture.
That's the crux of my points, which I think are either misunderstood
or I failed to be clear, and I apologize for that.
As to the "can run without a config file" comment, you need to try
MongoDB for yourself to understand what I mean. MongoDB does not have
the need for preallocation of RAM, disk, etc. Unlike relational
databases that need all kinds of tweaking to make run well in your
particular scenario, databases like MongoDB simply don't need all that
additional help to be extremely fast.
This actually points back at my above sentiment that using a modern
database like MongoDB does mean there's zero justification for a
dedicated database person. Yes of course you still need operations
folks, but the need for a one-trick-pony should obviously be
diminished.
-- Mitch
Memcache is another great example, even if it is not persistent it is
still a data store. Have you ever heard of a company hiring a
"Memcache Administrator"? I mean, other than Facebook? :-)
Looking around at the others, I get the same feeling - CouchDB, HBase,
MemcacheDB... These systems expect the bulk of computation to
live in the application layer, and they are focused on storing data...
Hence the low administration requirements.
And please note that I'm a certified Oracle DBA - went through
certification back in the 8i years. I've spent the last 15 years
really getting good at this, and should be the last guy calling for
change ;-)
-- Mitch
Anders Nawroth Mon, 14 Dec 2009 04:07:16 -0800
Regarding tables as an abstraction that everyone understands: When domain experts explain their domain on the whiteboard, they would rather draw boxes and lines between them than start defining tables. What they create is actually a graph model of the domain, and one nice thing about graph databases is that this model can be used almost directly as the implementation model of the domain.
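
As a rough illustration of the boxes-and-lines idea, here is a sketch in plain Python. The domain (customers, orders, products, wholesalers) and the helper names are invented for this example, and the code uses ordinary lists and tuples rather than the API of any particular graph database.

```python
# Hypothetical whiteboard model: boxes are nodes, labelled lines are edges.
# Plain Python structures, not the API of any particular graph database.
edges = [
    ("Customer", "PLACES", "Order"),
    ("Order", "CONTAINS", "Product"),
    ("Product", "SUPPLIED_BY", "Wholesaler"),
]

def neighbours(node, edges):
    """Return (relationship, target) pairs for lines leaving a box."""
    return [(rel, dst) for src, rel, dst in edges if src == node]

def reachable(start, edges):
    """Walk outgoing lines from a box and collect every box reached."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for _, dst in neighbours(node, edges):
            if dst not in seen:
                seen.append(dst)
                stack.append(dst)
    return seen

print(neighbours("Order", edges))
print(reachable("Customer", edges))
```

The edge list reads almost exactly like the whiteboard drawing, which is the appeal being described: the model the domain expert sketches and the model the code walks are the same thing.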