Coming soon: back up and restore the data in your Google App Engine applications

Google App Engine Backup Restore Datastore

Today, I made a full backup of the data on the Singularity web site on Google App Engine and restored it on my local machine running the SDK.

If you're not familiar with Google App Engine, you may be thinking, "So what? Big deal, Aral, I can do backups with a click of a button in phpMyAdmin." Unfortunately, though, there isn't currently a publicly available data export feature for Google App Engine, much less a way to back up and restore your data easily. (One of the top criticisms aimed at Google App Engine is that you cannot back up or download your data.)

As far as I know, this is the first time a backup and restore has been done on Google App Engine (though Google engineers may have successfully tested their own solutions internally).

My solution works by backing up the datastore incrementally into Python code (I ran into every possible limit on App Engine while developing this, as those of you following my Twitters today will have witnessed).

Yep, that's right, the backups are stored as Python code. "What about restores?" I hear you ask. Well, a backup is pretty useless if you cannot restore it. Restoring the backups is as easy as running the generated Python code (either all at once on the local SDK, or incrementally on the deployment environment).
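To make the idea of "backups as Python code" a bit more concrete, here is a minimal sketch. It is not the actual Singularity code: it uses a plain dict as a hypothetical stand-in for the datastore, and the names generate_backup_code, restore, and put are made up for illustration. It also assumes every property value is a simple literal (strings, numbers, dicts) whose repr() is valid Python; real datastore types such as dates, keys, and references would need extra handling.

```python
# Hypothetical stand-in for a datastore: a dict mapping key -> properties.
# None of these names come from the App Engine SDK.

def generate_backup_code(entities, batch_size=2):
    """Return a list of Python source strings, one per batch of entities.

    Each line of generated code is a call to put(key, properties), so
    running the code against an empty store recreates the data.
    """
    items = sorted(entities.items())
    chunks = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # repr() of simple literals is valid Python source.
        lines = ["put(%r, %r)" % (key, props) for key, props in batch]
        chunks.append("\n".join(lines) + "\n")
    return chunks


def restore(chunks, datastore):
    """Restore by running the generated code, one chunk at a time."""
    def put(key, props):
        datastore[key] = props

    for source in chunks:
        # The generated code can only see put(); builtins are withheld.
        exec(source, {"__builtins__": {}, "put": put})


# Made-up sample data standing in for the live datastore.
live = {
    "speaker:1": {"name": "Ada", "talks": 2},
    "speaker:2": {"name": "Grace", "talks": 1},
    "speaker:3": {"name": "Linus", "talks": 3},
}

chunks = generate_backup_code(live)   # the backup, as Python source
local = {}
restore(chunks, local)                # the restore: just run the code
```

Splitting the output into chunks mirrors the "all at once or incrementally" choice: locally you can run every chunk in one go, while on a deployment with request limits you could run one chunk per request. Only ever run backup code you generated yourself, since executing it is executing arbitrary Python.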

The four use cases I see for this are as follows:

  1. Back up your data (data safety)
  2. Back up your data and restore to the local SDK (local testing with real data)
  3. Back up your data and restore to a different App Engine instance (staging server)
  4. Back up your data and restore to your live application instance (data recovery)

I've already successfully handled use case 1 and I'm currently almost done with a generic restore feature that should correctly handle use cases 2-4.

When I've got restores working properly for the Singularity app, I'm going to decouple these handlers from the app and create a separate Django app that you can include in your own projects to give you backup and restore functionality in Google App Engine.

I will be releasing this as an open source project as part of Singularity's Open Source Initiative, alongside OpenCountryCodes, The European VAT Number Validation API, and The GAE SWF Project.

I've also got plans to make the whole process entirely seamless, but my priority is working on the Singularity web conference, so don't expect a very polished solution immediately. I will be working on this as my "20% project" of sorts, though, as it is essential to have a solid backup/restore solution for commercial apps on Google App Engine.

To read more about this, including some of the challenges, check out the following thread on the forums: Datastore backup solution (almost ready).

Based on forum postings, it would appear that Google is also working on the data export issue and I'm talking with Pete Koomen at the moment regarding my solution. Check out Pete's thread on backups here: Feedback on data exporter design?

Finally, one of the challenges in working with real data on the local SDK was that the SDK would grind to a halt after you populated a certain number of rows in the local datastore; see performance issue with SDK datastore with large volumne (>1000 rows). Thankfully, Baptiste Lepilleur created a very timely patch (see Issue 390) to work around the problem. This means that restoring a backup to your local machine will not take forever (it's still not blisteringly fast, taking 0.1 seconds per put, but at least the duration does not increase linearly like it used to).
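For a rough feel of what 0.1 seconds per put means in practice, here is a back-of-the-envelope sketch. The row counts are made-up examples, and the estimate assumes the per-put cost stays constant with the patch applied and that entities are put one at a time.

```python
def estimated_restore_seconds(row_count, seconds_per_put=0.1):
    """Estimate a local restore, assuming a constant cost per put."""
    return row_count * seconds_per_put


# Hypothetical datastore sizes, purely for illustration.
for rows in (1000, 10000, 100000):
    seconds = estimated_restore_seconds(rows)
    print("%d rows: about %.0f s (%.1f min)" % (rows, seconds, seconds / 60))
```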

I plan to get restores fully working tomorrow (along with my other, more social responsibilities for Singularity and Pistach.io). Follow me on Twitter or keep an eye on the blog and/or the relevant thread on the forums for updates.

We're close to having a working data backup and restore solution on Google App Engine! :)