As a consultant specializing in enterprise Drupal, I often find myself answering a relatively small (and consistent) set of questions about the Drupal platform early in any engagement. Many of the larger Drupal sites are media and publishing oriented, which makes these questions even more common across projects. As a result, I’ve started to compile a list of the most common Drupal questions and the answers I’ve been able to provide. First up is the most common concern enterprises have about Drupal: scalability.

To be fair, scalability and performance mean different things to different organizations, so this question can be a bit misleading. Still, it’s Drupal’s biggest perceived issue, so it’s generally the thing that gets asked first.

To begin with, let’s take a look at some numbers. On projects in which I’ve been involved, I’ve seen the following stats achieved:

  • Over 10 million page views per month
  • Over 2 million page views in a single day
  • Over 1 million nodes on a large site (5 million+ PV/month)
  • Over 9000 concurrent logged in users
  • Over 30,000 registrations in a single day

While those numbers might not represent the top tier of high-traffic sites, clearly many businesses’ needs can effectively be met by a well-built Drupal site. Of course, “well-built” is the key. Fortunately, when it comes to scalability, many of the challenges that need to be overcome for a Drupal site are the same ones that crop up on any typical LAMP stack site.

Some Tips

Tuning

Current Drupal best practice is to generally build your site for its features and then allocate time to tune modules, queries, database setup, hardware, caching, etc. Overall this practice works well, but it does rely on the development team to be experienced and disciplined throughout the project so that any refactoring needed at the end of the project doesn’t become too onerous. A few things to watch out for along the way:

  • Choose contributed modules carefully – this means reading issue queues, reviewing code and particularly looking at database interaction (data model, query efficiency, etc.) for any modules with which you’re not familiar. The good news is that this gets easier as your familiarity with Drupal improves.
    • CCK – just as one example, CCK is an excellent module that is nearly part of Drupal core and a part of every Drupal project I’ve worked on. However, simply sharing fields between multiple content types (something that’s almost encouraged in the module’s admin) can lead to a fragmented data model and significant performance hits on high-traffic sites. Avoiding (or at least reducing) this practice can significantly improve your chances of scaling success.
  • Don’t underestimate theming – much of the Drupal theming system exists via convention and it’s certainly easy to ignore best practices. Drupal theming is not just a matter of slapping some HTML on a fully baked site. It’s actually one of the more complex (and underestimated) portions of the Drupal architecture and is a place where even experienced Drupal developers can get into trouble. This is a big topic and one I’ll pursue more in future posts.

Database

The database layer is generally where most scaling bottlenecks occur (again, as is typical in LAMP stack web apps) and there are a few quick and easy steps to take as a first pass at tuning up MySQL.

  • InnoDB vs. MyISAM – although there is some debate about exactly which tables to switch, the overall sentiment in the Drupal community is that changing the majority of your tables to InnoDB will vastly improve performance by reducing table locks (MyISAM locks the whole table on a write, while InnoDB locks at the row level). In my experience, this is one simple task that provides immediate and significant improvements on underperforming sites.
  • Query caching – tuning the MySQL query cache will also significantly improve performance. While this is a well-documented practice (in general, as well as for Drupal), there is, unfortunately, no silver bullet here. Trial and error in a development environment, with continued tuning in production, is the best way to go.
  • Slow query logs – at Optaros, we’re lucky to have some great MySQL experts who can help out with even the most challenging query issues using all the typical tools, plus some great patches like the msl (microslow) patch, which logs slow queries with a microtime resolution.
  • Hardware, memory and configuration – this is another area that varies by installation, but making sure your MySQL install has the best possible hardware and as much memory as possible available to it (and that it’s properly configured to use them!) will provide huge benefits.
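
As a rough illustration of the InnoDB switch mentioned above, here is a minimal Python sketch that only generates the `ALTER TABLE ... ENGINE=InnoDB` statements for a list of tables, so they can be reviewed before anything is run. The function name, the example table list and the choice of which table to skip are assumptions made for illustration, not a recommendation about which tables to convert.

```python
from typing import Iterable, List


def innodb_statements(tables: Iterable[str], skip: Iterable[str] = ()) -> List[str]:
    """Build one ALTER TABLE statement per table that should move to InnoDB.

    `skip` holds tables you have decided to leave on MyISAM; which tables
    belong there is a per-site judgment call (hypothetical values below).
    """
    skipped = set(skip)
    statements = []
    for name in tables:
        if name in skipped:
            continue
        # Table names are wrapped in backticks, so reject names containing one.
        if "`" in name:
            raise ValueError(f"unsafe table name: {name!r}")
        statements.append(f"ALTER TABLE `{name}` ENGINE=InnoDB;")
    return statements


if __name__ == "__main__":
    # Example table names only; a real site would read its own table list.
    for statement in innodb_statements(["node", "users", "watchdog"], skip=["watchdog"]):
        print(statement)
```

Keep in mind that `ALTER TABLE ... ENGINE=InnoDB` rebuilds the table, so on large tables it is worth trying against a copy of the database first and scheduling the real change for a quiet period.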

Caching

Because Drupal is, in the end, a database-intensive application, the best way to improve performance is through aggressive caching. APC, memcache, CDNs, MySQL Proxy and static page caching all play a role in speeding up an enterprise Drupal deployment. Typical solutions include a master/slave database setup, optionally with MySQL Proxy to split database reads and writes, using Drupal’s Cache Router module to implement caching with memcache, and offloading static content to a CDN (something that can be done without hacking Drupal, despite what you might read).
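
To make two of these ideas concrete, here is a small Python sketch of the general patterns involved: a cache-aside read (check the cache, fall back to the database on a miss) and a naive read/write split. It is not Drupal, Cache Router or MySQL Proxy code. `TTLCache` is an in-process stand-in for a memcache-style store, and every name in it (`TTLCache`, `cached_fetch`, `pick_server`) is hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-process stand-in for a memcache-style key/value store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._items[key] = (self._clock() + ttl, value)


def cached_fetch(cache: TTLCache, key: str, loader: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Cache-aside read: use the cached value, or load and store it on a miss."""
    value = cache.get(key)
    if value is None:
        value = loader()  # the expensive database work happens only here
        cache.set(key, value, ttl)
    return value


def pick_server(sql: str) -> str:
    """Send statements that start with SELECT to a replica, the rest to the master."""
    words = sql.split()
    first_word = words[0].upper() if words else ""
    return "replica" if first_word == "SELECT" else "master"
```

A real read/write split needs more care than this: statements such as `SELECT ... FOR UPDATE`, and reads that must see a write that has just been made, should still go to the master because replicas can lag behind it.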

So the final word on Drupal scaling is that, while it may not be ready to run Amazon and Yahoo, there’s not too much left that Drupal can’t do performance-wise. Of course, that doesn’t mean it’s completely ready to go out of the box, but as long as adequate attention is paid to performance-related best practices, Drupal can be a solid option for high-volume sites.
