Lanyrd’s Big Move

Last week we pulled off two major changes to the infrastructure that runs Lanyrd, both with no downtime - but what exactly did we do, and why?

The first major change was perhaps the tougher of the two - we changed the main database that Lanyrd runs on from MySQL to PostgreSQL.

For those unfamiliar with the intricacies of databases, this is tough as it means we need to convert the entire site's data format in one go - and if it goes wrong, we have to roll back or, if things aren't monitored closely, there's a small risk of losing some data.

The move was mostly motivated by database features - MySQL lacks a full transaction model and fast column addition, and it's quite bad at using multiple CPU cores, whereas PostgreSQL handles all of these well. That will allow us to make changes to the site in the future with no downtime or read-only mode at all.

The second major change was moving from Amazon Web Services - EC2 and RDS in particular - to running on dedicated hardware, which we rent through SoftLayer.

There's nothing wrong with AWS - indeed, we still run a staging environment there - but our database benefits greatly from the low latencies of physical disks, and there aren't very many hosted PostgreSQL services on EC2 that fit our needs.

As part of the move, we also rearranged our services so that we have no single point of failure - everything is either running on multiple servers (like our Django code) or has a warm standby (the databases and load-balancer).

Read-only mode

Our read-only mode banner

Both changes required us to stop saving new data to Lanyrd during the move, so we opted to do them at the same time to minimise the time we spent in read-only mode (in fact, this is only the second time we've used it since Lanyrd's launch). There's some risk involved here, of course - doing two major changes at once requires more careful planning and rehearsal.
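For readers curious what a read-only mode can look like in practice, here is a minimal sketch. It assumes the switch is enforced at the HTTP layer by a small WSGI wrapper that lets safe requests through and turns writes away; the post doesn't describe how Lanyrd's own read-only mode is built, so the class name, the flag and the response text are all hypothetical.

```python
from wsgiref.util import setup_testing_defaults

# HTTP methods that never modify data, so they stay available in read-only mode.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


class ReadOnlyMiddleware:
    """Hypothetical WSGI wrapper: rejects write requests while read_only is True."""

    def __init__(self, app, read_only=False):
        self.app = app
        self.read_only = read_only

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if self.read_only and method not in SAFE_METHODS:
            body = b"The site is in read-only mode; please try again shortly.\n"
            start_response(
                "503 Service Unavailable",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                    ("Retry-After", "3600"),  # illustrative value, in seconds
                ],
            )
            return [body]
        # Reads (and everything else when the flag is off) pass straight through.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Stand-in for the real site, used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"ok\n"]


def status_for(app, method):
    """Call a WSGI app once with the given method and return its status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    seen = []
    body = app(environ, lambda status, headers: seen.append(status))
    b"".join(body)  # consume the response body
    return seen[0]
```

Wrapping the site as `ReadOnlyMiddleware(demo_app, read_only=True)` keeps pages readable while any POST is answered with a 503; flipping `read_only` back to `False` restores normal behaviour without a restart.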

During the week before the move, I scripted as much of the move process as possible, giving us one command that would sync our main database, our Redis data and our search data. I then did several dry runs onto a test environment, with Tom and Simon helping out with checking and ideas over the week.

We caught a few bugs, mostly with the database conversion. The conversion was performed by a dump converter I'd written myself, and we had a few problems with escaping and missing indexes, but those were both spotted by the eagle-eyed Lanyrd team during the testing phase.
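To give a flavour of the escaping problem, here is a small sketch of one way to translate a string literal from a MySQL dump into PostgreSQL's form. It is not the converter's actual code: the function names are made up for illustration, and it assumes the target server has standard_conforming_strings turned on (the default since PostgreSQL 9.1), so backslashes inside a plain quoted string are taken literally and a single quote is written as two.

```python
# Backslash escapes that MySQL recognises inside a quoted string literal.
MYSQL_ESCAPES = {
    "0": "\0",
    "'": "'",
    '"': '"',
    "b": "\b",
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "Z": "\x1a",
    "\\": "\\",
}


def decode_mysql_literal(literal):
    """Turn a single-quoted MySQL literal into the plain string it represents."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("expected a single-quoted literal")
    inner = literal[1:-1]
    out = []
    i = 0
    while i < len(inner):
        ch = inner[i]
        if ch == "\\" and i + 1 < len(inner):
            nxt = inner[i + 1]
            if nxt in "%_":
                # MySQL keeps the backslash for \% and \_ outside LIKE patterns.
                out.append("\\" + nxt)
            else:
                # Unknown escapes simply drop the backslash.
                out.append(MYSQL_ESCAPES.get(nxt, nxt))
            i += 2
        elif ch == "'" and inner[i + 1 : i + 2] == "'":
            # A doubled quote is the other way MySQL spells a single quote.
            out.append("'")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def encode_postgres_literal(value):
    """Quote a plain string for PostgreSQL with standard_conforming_strings on."""
    if "\0" in value:
        raise ValueError("PostgreSQL text values cannot contain NUL characters")
    return "'" + value.replace("'", "''") + "'"


def convert_literal(literal):
    """Hypothetical helper: MySQL literal in, PostgreSQL literal out."""
    return encode_postgres_literal(decode_mysql_literal(literal))
```

Decoding to the real string first and then re-quoting keeps the two sets of escaping rules separate, which is easier to check than trying to rewrite one escaped form directly into the other.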

If you're interested in the database conversion script we used, you can find it in our GitHub repository - there are some more technical details about how it works in my blog post.

Running the gauntlet

We'd analysed our traffic and picked Tuesday morning as the time that would affect the fewest people. One of the advantages of being a UK startup is that the time difference means the US and Canada are asleep during our morning, giving us a nice low-traffic window that's still in working hours.

With that set, we spent the Monday on a final dry run and a quick load test of the new site using a traffic-replaying system we have, and then the move took place on the Tuesday, at 10am.

Apart from one minor hiccup with getting read-only mode turned on, the move went quite smoothly, and we were back up and out of read-only mode before midday. Lanyrd stayed available throughout the move, and read-only mode did its job admirably.

With only one more minor problem during last week - which we were able to deal with swiftly - things do seem to have gone rather well, and we're eager to start putting our more powerful servers and new database features to good use!

(If you're interested in more precise technical details on how the move went, there's a more in-depth article on Andrew's blog)

myfreeweb commented…

> there aren't very many hosted PostgreSQL services on EC2 that fit our needs

Heroku Postgres doesn't fit your needs? (It's not just for Heroku PaaS users)

Commented at 4:56pm on 13th November 2012

Gavin Towey commented…

Curious, what kind of throughput were you running through MySQL and at what levels of concurrency?

Commented at 5:22pm on 13th November 2012

Dario ♛ Galvis commented…

Gavin: +1

Commented at 7:35pm on 13th November 2012

Rob Bergin commented…

And if you want VMware to buy you - you need to be on PostgreSQL.

Commented at 7:43pm on 13th November 2012

James Moss commented…

Did you find you had to rewrite many queries going from MySQL > Postgres? How did you test that?

Commented at 8:10pm on 13th November 2012

Simon Willison commented…

Gavin/Dario - I'm afraid we can't release performance metrics like that. MySQL's performance was fine for us, the problem was altering tables.

James: almost all of our database queries are handled by Django's ORM, which abstracts away most of the differences. We had to fix a couple of places that were using group by and attempting to order by a column that wasn't grouped (MySQL is OK with this, PostgreSQL isn't) - they were .annotate() queries that had a default ordering, and the fix was to add .order_by() to the end. We had a few hand-written queries which needed fixing. We tested using our existing unit tests and by running PostgreSQL on our staging server for a few weeks before the move.

Commented at 11:37am on 14th November 2012

Steve Holden commented…

I'd be interested to know what Plan B was. Which is to say, if the migration had failed, what was the fallback plan?

Commented at 4:30pm on 14th November 2012

Simon Willison commented…

We still had the existing site running on MySQL on EC2, and since it was in read-only mode there were no changes being made to that database - if the migration to PostgreSQL/SoftLayer had failed, we would have switched the old site out of read-only mode and continued as we had before (then figured out what went wrong and scheduled another migration window).

Commented at 10:12am on 15th November 2012

When

Time 2:43pm

Date 13th November 2012