Kent Langley

March 9, 2009

Article: Building cloud-ready, multicore-friendly applications, Part 1: Design principles

I ran across this article at the Java World website while reading up on Appistry blogs. I often find myself explaining why application XYZ can almost certainly be moved to the cloud but might not exactly be the most cloud-friendly application. As a result, it might not be able to achieve things like elasticity (the ability to scale up/down according to demand) out of the box.

The article goes into good detail starting with the ideas:

Atomicity
Statelessness
Idempotence
Parallelism

These are all very important. I won't be repeating all the various fine definitions of each here.
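
As a small illustration of just one of these ideas, idempotence, here is a minimal Python sketch. It is not taken from the Java World article; the `Account` class, its `deposit` method, and the request ids are hypothetical names made up for this example. The point is only that replaying the same request leaves the result unchanged, which is what makes retries safe when work is spread across many machines.

```python
# Hypothetical example: an idempotent deposit handler.
# Replaying a request with the same request_id must not change the result.

class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self.seen = set()  # ids of requests that were already applied

    def deposit(self, request_id, amount):
        """Apply a deposit once; replays of the same request_id are no-ops."""
        if request_id in self.seen:
            return self.balance
        self.seen.add(request_id)
        self.balance += amount
        return self.balance


account = Account()
account.deposit("req-1", 50)
account.deposit("req-1", 50)  # retry of the same request: no double credit
account.deposit("req-2", 25)
print(account.balance)
```

In a real system the set of seen request ids would have to live in shared storage rather than in the process, otherwise the handler would stop being stateless.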

Kent Langley

October 23, 2008

scale

Things to Consider When Planning Your Application System and Software Architecture for Scalability Over Time

What follows is a pretty basic checklist that I've recently factored out of many experiences in my day-to-day work as a "Scale Consultant." When I'm reviewing sites and talking to developers who are just starting out, who have been in operation and have just started to grow, or who run well-established sites that are migrating to Cloud Services, I find that I talk about these same things over and over.

This isn't intended to be an exhaustive treatment of the subject matter by any means. It is quite literally a "Was this considered when I was deciding on my software and systems architecture?" checklist. So, without further ado.

ORM for Data Partitioning and Query Splitting - Most databases are easily overwhelmed by Slashdot or media effects. Having a way to split read queries from writes such as updates and deletes from the start is a very wise move. This is often easily done now with the ORM layers that some frameworks use.
Monitoring process, resources, and uptime - I could probably write an entire article just about this subject. Monitoring done right is generally divided into three parts.
1. Process Monitoring - This makes sure things are running and stay running within certain tolerances. Examples are God, Monit, SMF.
2. Resource Monitoring - This is fine-grained monitoring of CPU, memory, disk space, disk IO, networking, etc. Examples are Nagios, Ganglia, Munin, ZenOSS. Choosing correctly depends on your specific situation.
3. Uptime Monitoring - This is the only monitoring people usually do, if they do any at all. It should be done by a disinterested 3rd party to provide accountability, and what I call a 3rd party eye in the sky, should any dispute about uptime arise. I like webmetrics and pingdom at the moment as reasonably priced services that add good value.
Performance Testing and Capacity Planning - You can't drive a car if you can't see the road (or at least some representation of it). It is the same with your internet application. You can't make good decisions without doing some degree of Performance Testing and Capacity Planning.
Static vs. Dynamic Content splitting / CDN - There is just no argument that this must be done at scale. There are many ways to do it: a reverse proxy, splitting static and dynamic content in a variety of ways, and more. Make sure your framework or application supports this feature. Some make it easy, some make it difficult.
Bundling and Compressing JS and CSS - Sites have a huge number of CSS and JS files these days. It's critical that you learn to bundle them, compress them, version them, and then properly cache those bundles. This can have a dramatic effect on page load time.
Logging - Log appropriately and monitor those logs. I never tire of sending developers back to their desk to check the logs when they tell me the server is broken. It's fun. Check your logs for common errors. In fact, perhaps you should write a small script to monitor the log files for segfaults, 500s, 404s, and other types of errors. Proactive rules.
Pragmatic Caching - There are many, many types and layers of caching. Most current web applications will have at least 3 to 5 layers of caching to maintain acceptable performance and scalability of critical services. Learn everything you can about caching at the various layers in your technology stack.
Functional Decomposition - This was once overly expensive for many people. Now, with virtualization and cloud computing you can easily decompose your entire application into functional silos that are independently scalable and speak to one another as required. For example, app servers, monitoring, log aggregation, databases, message queues, upload servers, video encoding servers, and many more. Don't shove everything into one box anymore. Break it down by function.
Deployment - At the very beginning of your development lifecycle you should integrate your deployment process. It should be efficient, it should have a rollback capability, and it should be almost entirely automated to development, staging, and production environments. Of course, in some cases humans should gate the deploy.
Asynchronous Practices - Remember that functional decomposition? For every task where one function talks to another, ask yourself: does that REALLY have to happen in real time, or can it happen over time? Learn the CAP theorem. You will learn quickly that in most cases work can be queued and done by a separate process, apart from the event that caused the work to need to be done in the first place. A good example is logging. I saw an application framework that kept every single logged event in a relational database and did all those inserts in real time for reporting purposes. Is that really necessary? Probably not. Put them into a file or even a cache and process the file elsewhere as a batch job, out of band from the user experience.
Make sure your application processes are as lean as possible. I'll demonstrate by way of a bad example. If your application server requires 30-90MB for a single request thread to be processed and can only seem to hammer out 6-7 of these requests per second, then you're going to be hurting seriously in the wallet down the line. That's just way too expensive for most applications. On a reasonably sized application server you'd only be able to support a handful of concurrent requests. So you'd need 1000s of servers to handle millions of requests. I don't care what service you use, 1000s of servers are expensive!
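
To put rough numbers on that bad example, here is a back-of-the-envelope sketch in Python. The 30-90MB and 6-7 requests per second figures come from the example above; the 1024MB of usable memory per server and the 10,000 requests per second peak are assumptions picked purely for illustration, so plug in your own.

```python
import math

# Figures taken from the example above.
MB_PER_REQUEST = (30, 90)      # memory per request thread, low and high
REQUESTS_PER_SECOND = (6, 7)   # throughput of one server, low and high

# Assumptions for illustration only (not from the article).
USABLE_RAM_MB = 1024           # memory left for request threads on one server
PEAK_REQUESTS_PER_SECOND = 10_000


def concurrent_requests(ram_mb, mb_per_request):
    """How many request threads fit in memory at once."""
    return ram_mb // mb_per_request


def servers_needed(target_rps, rps_per_server):
    """Servers required to sustain target_rps, rounded up."""
    return math.ceil(target_rps / rps_per_server)


for mb in MB_PER_REQUEST:
    fit = concurrent_requests(USABLE_RAM_MB, mb)
    print(f"{mb} MB/request -> {fit} concurrent requests per server")

for rps in REQUESTS_PER_SECOND:
    count = servers_needed(PEAK_REQUESTS_PER_SECOND, rps)
    print(f"{rps} req/s per server -> {count} servers")
```

Under these assumed numbers a server holds somewhere between 11 and 34 requests in memory at once, and the assumed peak needs roughly 1,400 to 1,700 servers. Halving the memory footprint or doubling per-server throughput changes the bill directly, which is the whole point of keeping processes lean.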

Since I do this daily, these things seem obvious to me. They don't all fit every situation because every situation is unique. I hope that by writing them down they help someone else out who is just starting to feel the pressure of growth. If you do or think about most or all of these things up front, things will be a little bit better for you down the line if things heat up.
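
The Logging item in the checklist suggests a small script that watches log files for errors. Here is a minimal sketch of what that could look like in Python. It assumes access log lines in the common log format (status code right after the quoted request) and treats any line mentioning "segfault" or "segmentation fault" as a crash; the `scan` function, the regular expression, and the sample lines are all made up for this example, so adjust them to whatever your servers actually write.

```python
import re
from collections import Counter

# Assumption: the HTTP status follows the quoted request, as in the
# common log format: ... "GET / HTTP/1.1" 200 512
STATUS_RE = re.compile(r'" (\d{3}) ')


def scan(lines):
    """Count 404s, 5xx responses, and segfault mentions in log lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            status = match.group(1)
            if status == "404" or status.startswith("5"):
                counts[status] += 1
        lowered = line.lower()
        if "segfault" in lowered or "segmentation fault" in lowered:
            counts["segfault"] += 1
    return counts


# Made-up sample lines standing in for a real log file.
sample = [
    '10.0.0.1 - - [23/Oct/2008:10:00:00 -0700] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [23/Oct/2008:10:00:01 -0700] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.3 - - [23/Oct/2008:10:00:02 -0700] "POST /save HTTP/1.1" 500 64',
    'kernel: app[1234]: segfault at 0 ip 0000 sp 0000 error 4',
]

counts = scan(sample)
for kind, count in sorted(counts.items()):
    print(f"{kind}: {count}")
```

To run it against a real file, pass an open file object to `scan` instead of the sample list, for example from a cron job that mails you when any count goes above a threshold you choose.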

Update on 2008-11-06 17:23 by Kent Langley

This article was translated to French. Here is the link:

http://www.haute-disponibilite.net/2008/10/28/fiabiliser-votre-architecture/

Update on 2008-12-08 00:04 by Kent Langley

This article was just republished by Sys-Con.com. So, here's a little link love back to them.

I found that it quickly became one of my most-read articles. So, I've been working on a follow-up to flesh things out a bit more. It'll either be a single large document or a series. I'm not sure which I'd prefer to do yet. If there are any opinions or requests please let me know.

Update on 2009-01-02 21:13 by Kent Langley

One of the blogs I read, AKF Partners, posted a top 10 things for 2009 that is quite related to this post. So, it's worth a read as well I think.

Develop the ability to rollback
Break changes into smaller pieces
Remove SPOFs
Remove synchronous calls
Incent a culture of excellence
Develop a disaster recovery plan
Develop quality into the product from the start
Split your application or database
Start Logging
Celebrate your success

Full Article: http://akfpartners.com/techblog/2009/01/02/new-year%E2%80%99s-tech-resolutions/

There is a fair bit of overlap w/ my list in this article and another I published earlier about launching a website. All together they make a nice guide.

10 Simple Rules for Launching a Web Site

http://blog.solutionset.com/wpmu/2008/07/23/10-simple-rules-for-how-to-launch-a-web-site-successfully/

Kent Langley

September 24, 2008

scale

Cloud Computing with Java

I just posted on the company blog where I work about my work and testing using Gigaspaces XAP. In summary, we did a very cool series of Monte Carlo simulations as a test case for cloud computing with Java using Joyent's Infrastructure as a Service (IaaS), Gigaspaces Platform as a Service (PaaS), and Owen Taylor's simulation software application. We did this to test scalability and ease of use of the platform. It was a fun test series.

Most important for me is that, in my opinion, we did some real honest-to-goodness cloud computing. To take an excerpt from the blog, I said,

Cloud Computing is the act of deploying, elastically scaling, managing, and running Cloud Computing Applications on Cloud Computers. Cloud Computing Applications are those applications that are well designed to run on Cloud Computers. In this case, Joyent provided infrastructure as a service (IaaS). Gigaspaces provided the Gigaspaces XAP Platform as a Service (PaaS). A savvy developer, Owen, created a cloud computing software application, the Monte Carlo simulation. We definitely did some Cloud Computing in this test!

This ties back, more or less, to a previous series of articles I have written here. Now, instead of just writing about it, I'm finding good examples and doing it for real.

Full article is available here:

Cloud Computing using Gigaspaces XAP on Joyent Accelerators

http://www.joyeur.com/2008/09/24/cloud-computing-using-gigaspaces-xap-on-joyent-accelerators

Previous Related Articles:

http://www.productionscale.com/home/category/cloud-computing

Kent Langley

September 16, 2008

scale

Drupal: Billions of Page Views per Month

I am in fact still here. I have just been busy having started a new job recently. One of the more interesting projects I've been working on lately is actually a Drupal 6.2 project.

As a Scale Consultant, one of my first new projects was working w/ a client to scale and test their Drupal installation to be capable of well over 2 BILLION page views per month! Well over... That's been quite fun.

In the spirit of sharing, I put the info up on the wiki at Joyent.

http://wiki.joyent.com/all-accelerators:kb:drupal

There's a nice diagram at the end that yours truly made. Here's a smaller version.

[Diagram: Drupal]

I have some other really neat projects I've been working on in a new program I'm running called Joyent Labs. This is probably by far where I get to have the most fun during the week. In running the labs I'm essentially getting to work with some really cutting edge cloud computing companies, CMS developers, business partners, and much more. I've got some great things brewing in the labs and I can't wait to unleash the creations!

I have several articles on the back burner for here that I've been working on for a while as well. Now that I think I'm settling into my new groove a bit more, I should be able to get back to blogging more consistently. I miss it!

So, that's where I've been. Sadly, not on vacation, but having fun nonetheless.

Kent Langley

July 25, 2008

scale

Scalaris: One to Watch

Scalaris is a scalable and fault-tolerant structured storage with strong data consistency for online databases or Web 2.0 services.

Without system interruption it scales from a few PCs to thousands of servers. According to the demands, add or take away computers from a production system at anytime without risking any service downtime. Scalaris does all the rest: fail-over, data distribution, load-balancing, replication, strong consistency, and transactions.

I ran into this during a marathon, most-of-the-night research session last night on a number of topics related to scalability, concurrent programming, and message passing systems architecture. I was impressed. This is not lightweight reading and I will definitely have to go over it again to fully grok it. But I'm excited about some of the possibilities a system like this might enable.

Sources: http://www.onscale.de/scalaris.html

And that is the news for now...