
Relational Database, Why Bother?

I'm sure there are actual very good answers to the "Why Bother?" portion of this post's title.  But this post is more or less a response to Scaling Out MySQL from Nati Shalom's blog.  The argument, essentially, is that you should augment the relational database layer with an IMDG (In-Memory Data Grid) for transactional activities and use the relational database as a back-end persistent data store.  It is also a nice rundown of the various things one might have to do to keep a MySQL relational database layer scaling and performing as load increases to insane levels where vertical scaling becomes impossible or cost prohibitive.

In reading that post I just could not stop thinking about all the hoops we all jump through to get around the fact that current implementations of Relational Databases just do not seem to be able to provide the performance and scale that successful modern web applications demand.  

Using in-memory data grids like Coherence, or in-memory distributed cache technology like memcached, gives me the scalability and performance I need to handle modern web application transaction loads on the systems I design. I use them for two main reasons:

1. Protect the database from meltdown
2. Enable shared access to data across horizontally scalable clusters of machines

I have considered that the work being done on columnar databases like Vertica might be interesting to apply to web applications, but I have not had a chance to really dig into that idea.

So, because of the limitations of my primary permanent relational data store, I am forced to take the transactions out of the database.  Which makes me ask, over and over again, why I need the relational database at all when I often don't use or need referential integrity (I can see DBAs shivering everywhere when I say that).  I really think that things like Mnesia, CouchDB, SimpleDB, HBase, Bigtable, and other technologies along those lines are coming in fast and furious to replace the relational database in its entirety as the persistent data store anyway.  This is especially true if you need to do major heavy-lifting data mining of the data store, or fancy things like Rackspace's log parsing with Hadoop or the NYT creating 11 million PDFs in 24 hours.

Resources:

Scaling Out MySQL by Nati Shalom
http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html

Vertica
http://www.vertica.com/

Memcached
http://www.productionscale.com/display/Search?searchQuery=memcached&moduleId=1481658
http://www.danga.com/memcached/

Oracle Coherence
http://www.oracle.com/technology/products/coherence/index.html

Article about Technology Choice and Scalability

Kendall Miller has written an interesting post at the Reliable Systems blog.  To summarize: it's not the technology implementation that determines the ultimate scalability of a system, it is the software architecture itself.

This is a recurring theme that I hear time and time again from well-researched web systems implementers.  In my experience, something like 80% or more of the problems companies have operationally when trying to scale a system efficiently trace back to the work and care taken by the developers, software architects, and systems architects very early in the project's life cycle.  This dramatically affects the TCO (Total Cost of Ownership) of a system over time.

I'm not necessarily saying scale early, because you don't have to over-engineer the software and infrastructure up front.   But you should be thinking about scalability early in the overall design and systems implementation or you will suffer later.  It's just good sense, and I think Kendall Miller's article and the subsequent comments echo this as well.

If you like this article and would like to read more by this author, you can subscribe to this site's RSS feed with your favorite reader and/or follow the author's Twitter feed for related posts and information.

Resources: 

Original Article:  http://kendall.srellim.org/development/technology-is-not-scalable


 

Cloud Service Pricing Model Thoughts

I ran across Nirvanix a few days ago thanks to a joint press release they had with 3Tera.  I liked what they had built, but I wasn't totally on board with their pricing model.  So, this is a very brief set of thoughts about cloud service pricing models, primarily in the context of Nirvanix, a cloud storage service company.

Nirvanix has created an API-accessible, globally distributed, highly available storage area network.  When properly integrated, in any of a variety of ways, into an application or workflow, it might be very useful and cost effective indeed.  Some of the uses they list on their site are CDN origin storage, digital lockers, online archiving, backups, extending storage services to managed services and SaaS, etc.  However, the problem I ran into when looking at the service wasn't the service itself, but the pricing.

I looked at their pricing, and projecting costs month over month at scale with any degree of certainty would be difficult at the higher levels of complexity.  I'd have to estimate and add up bandwidth, media, the primary storage itself, geographic co-location instances, level of extended service, methods of access, support, etc.  You get the idea.  It looks like they've built something reasonably resembling a utility service but then strayed far from a utility pricing model, which makes it harder to cost than necessary.  I just want to use the service and have the utility pricing match the utility service.

Pricing should not be this complicated.  Relative to Amazon Web Services, for projects that use several AWS services, this one isn't so bad.  But both, in my opinion, fail to create a reasonable utility pricing model to go along with their utility services.  Ideally, I'd prefer a simpler tiered rate structure based on some overall system transaction/unit-of-work usage metric (SPU? Storage Processing Unit?).  I would expect the rate to decline as my usage volume goes up to a point, then probably plateau or even climb again at very high levels.  Within that model I'd be able to use any of the services provided if I am paying, or a reasonable subset if I'm using a starter or free level of service for a while.  The point is that in just the way the cloud abstracts away some of the complexities of infrastructure, it should also strive to abstract away the complexities of infrastructure pricing and offer an easily understandable model.
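To make the tiered "SPU" idea concrete, here is a sketch of how such graduated pricing could be computed. The tier boundaries and per-unit rates are entirely hypothetical numbers invented for illustration; only the shape (declining, then climbing again at very high volume) comes from the text above.

```python
# Hypothetical graduated "SPU" pricing: each tier's rate applies only
# to the units that fall inside that tier, so the bill is a single
# easily understandable function of one usage metric.

TIERS = [  # (tier ceiling in SPUs, $ per SPU) -- illustrative only
    (1_000,        0.15),
    (10_000,       0.10),
    (100_000,      0.07),
    (float("inf"), 0.09),  # rate climbs again at very high volume
]

def monthly_cost(spus):
    """Price one month's usage by walking the graduated tiers."""
    cost, floor = 0.0, 0
    for ceiling, rate in TIERS:
        if spus > floor:
            cost += (min(spus, ceiling) - floor) * rate
        floor = ceiling
    return cost

# 5,000 SPUs: 1,000 at $0.15 plus 4,000 at $0.10 = $550
print("%.2f" % monthly_cost(5_000))
```

The appeal is that a customer can project next month's bill from one number, instead of estimating bandwidth, media, storage, co-location instances, and access methods separately.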

If you like this article you can subscribe to the site's RSS feed with your favorite reader and/or follow the author's Twitter feed for related posts.

Resources:

Nirvanix Pricing: http://www.nirvanix.com/gettingStarted.aspx

AWS Pricing: http://www.amazon.com/aws (click on various links in the infrastructure services side bar)

3Tera: http://www.3tera.com

Throwing Hardware at Software Problems

Here is a summary quote from an article by the Pythian Group worth reading to help people remember that bad design can hobble even the most insane hardware rigs.

"There is a time and place for hardware upgrades. The problem is that most people find hardware easier and just aren’t aware of how expensive their app logic really is. I find tuning easier and more cost effective."

The article walks through an example of a database query and data model that brought a 10-node Oracle RAC cluster on a SAN to its knees.  After proper tuning and indexing, the need for the 10-node cluster disappeared and it was replaced with a mere two nodes.  I'd say that saved someone a few hundred thousand dollars a year.
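The tuning-over-hardware point is easy to demonstrate on any database by asking its query planner what it intends to do before and after an index exists. Here is a small sketch using SQLite from Python's standard library; the `orders` schema is made up, but the before/after pattern is the same one the article describes on Oracle RAC.

```python
# Show the query planner switching from a full table scan to an index
# lookup once a suitable index exists -- the same class of fix that
# shrank the 10-node cluster in the article. Schema is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)"
)

def plan(sql):
    """Return SQLite's query-plan description for a statement."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

query = "SELECT total FROM orders WHERE customer_id = 7"

before = plan(query)  # full table scan: every row examined
conn.execute(
    "CREATE INDEX idx_orders_customer ON orders (customer_id)"
)
after = plan(query)   # index search: only matching rows touched

print(before)  # a SCAN of the whole table
print(after)   # a SEARCH using idx_orders_customer
```

A full scan costs time proportional to the whole table on every query; the index lookup doesn't, which is why tuning can retire eight nodes of hardware.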

Resources:

Good Database Design is Mightier than Hardware

 

Architecture Quality: Operational Manageability by Dan Pritchett

The featured topic on InfoQ today is "Performance and Scalability."  In particular, I enjoyed watching a video/slideshow presentation by Dan Pritchett of eBay called Architecture Quality: Operational Manageability.  Here are some of my comments, takeaways, and thoughts.

This is a great watch for any developers and executives who want to understand a little more about what technology operations teams face from day to day.

One of my favorite quotes: "Reality: Inefficient software has driven data centers to the brink of municipal power delivery capabilities."  I can tell you that almost no one understands this concept at all.  So, it's refreshing to hear someone say it out loud anyway.  What if your software were "charged" by the unit for the power your code used?  How would that change your software design?  I think if we look into the past, we'll see things have been this way already.  In a way, pay-by-the-hour utility computing resources like EC2 can push you toward a more efficient design.  If you need a large instance instead of a small one, it costs quite a lot more per hour, for example.
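A quick back-of-envelope calculation shows how pay-by-the-hour pricing puts a number on inefficiency. The hourly rates below are illustrative placeholders, not actual EC2 prices.

```python
# Back-of-envelope: what inefficient code costs per month when compute
# is billed by the hour. Rates are hypothetical stand-ins, not real
# EC2 pricing.

SMALL_RATE = 0.10       # $/hour, hypothetical small instance
LARGE_RATE = 0.40       # $/hour, hypothetical large instance
HOURS_PER_MONTH = 24 * 30

def monthly_bill(rate_per_hour, instances):
    """Cost of running `instances` machines all month at a flat rate."""
    return rate_per_hour * instances * HOURS_PER_MONTH

# Code that needs a large instance vs. tuned code on a small one:
wasteful = monthly_bill(LARGE_RATE, 1)
tuned = monthly_bill(SMALL_RATE, 1)
print("monthly premium for inefficiency: $%.2f" % (wasteful - tuned))
```

When the premium shows up as a line item on the bill every month, efficiency stops being an abstract virtue and becomes a design constraint, which is exactly the "charged by the unit" scenario above.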

"Grid provides the framework for operational scalability."  Remember, in general and in my opinion, grid equals cloud.  Essentially though, it's not a silver bullet.  You still have to architect your software properly to make it all work.

Another one: "Developers' bonuses take a hit when site availability misses."  Developers carry pagers and also have to get up in the middle of the night with operations to "feel the pain" they cause.

The biggest takeaway, and I have a lot more to say about this, is that operations, development, and business must communicate and work together from the very beginning.

Design for failure and rollback.

Resources:

The Topic at InfoQ: http://www.infoq.com/performance-scalability/

The Presentation (41:27) - http://www.infoq.com/presentations/operational-manageability