Google App Engine Backup and Restore (Gaebar) released

Update: Beta 2 Released! See the Gaebar Beta 2 announcement.

Here's my Christmas present for the Google App Engine community: Google App Engine Backup and Restore (or Gaebar, for short).

Gaebar is an easy-to-use, standalone Django application that you can plug into your existing Google App Engine Django or app-engine-patch-based Django applications to give them datastore backup and restore functionality.

For a quick overview of Gaebar, watch the screencast, above. For the impatient, links to the project pages where you can download Gaebar follow.

Downloads

Gaebar is hosted on GitHub. You can either download archives or clone the repository (or install Gaebar as a git submodule) via git. Alongside the Gaebar project itself are two sample applications — one built on Google App Engine Django and the other on app-engine-patch — that contain the Gaebar functional test suite. The functional test suite tests every datatype supported by Google App Engine as well as references, Expandos, and ancestor relationships.

Please make sure you read the readme files after downloading the projects for installation and usage instructions.

What you can do with Gaebar

(You can, of course, also back up and restore your local development datastore as well as your staging application, etc.)

I've got a huge datastore, will Gaebar work for me?

Congratulations on the impressive size of your datastore! The answer should be "yes!" The largest datastore I've tested it with is the <head> web conference datastore. The latest backup had 18,969 rows from 14 models backed up into 225 code shards.

What's a staging application?

A staging application is a new Google App Engine concept made possible by Gaebar. Basically, if your application is myapp.appspot.com, you can use a separate application (say, myapp-staging.appspot.com) in the same way as you would use a staging server in traditional development.

Your staging application can let you try out new features and test with real data without having your users see your changes until you are ready to deploy to your main application.

In fact, it's a perfect staging environment since it is identical to your deployment environment. For an example of this, see the screencast.

How Gaebar works

Gaebar backs up the data in your datastore to Python code. It restores your data by running the generated Python code.
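To make the "data as code" idea concrete, here is a minimal sketch of the round trip. It is not Gaebar's actual generator (which pickles values, as the generated example further down shows); it uses repr() instead, and the Profile class, generate_row_function and run_generated are hypothetical names introduced only for illustration.

```python
class Profile:
    """Hypothetical stand-in for a datastore model; records what gets saved."""
    saved = []

    def __init__(self, **properties):
        self.properties = properties

    def put(self):
        Profile.saved.append(self.properties)


def generate_row_function(index, model_name, key_name, properties):
    """Return Python source for a function that recreates one row."""
    # repr() turns each value into a literal that Python can read back.
    args = ", ".join(
        "%s=%r" % (name, value) for name, value in sorted(properties.items())
    )
    return (
        "def row_%d(app_name):\n"
        "    entity = %s(key_name=%r, %s)\n"
        "    entity.put()\n"
    ) % (index, model_name, key_name, args)


def run_generated(source, function_name, app_name):
    """Execute generated source and call one of its row functions."""
    namespace = {"Profile": Profile}
    exec(source, namespace)
    namespace[function_name](app_name)


source = generate_row_function(0, "Profile", "id1", {"full_name": "Paul Booth"})
run_generated(source, "row_0", "myapp")
```

Backing up is then a matter of writing such source to a file, and restoring is a matter of importing and running it.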

Since a backup is a long-running process, and since Google App Engine doesn't support long-running processes, Gaebar fakes one by breaking up the backup and restore processes into bite-sized chunks and repeatedly hitting the server via Ajax calls.
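The chunking pattern can be sketched roughly as follows. This is an illustration under assumptions, not Gaebar's code: backup_step stands in for one short server request, run_backup stands in for the browser's Ajax loop, and the state dictionary is a hypothetical way of carrying progress between calls.

```python
def backup_step(rows, state, batch_size=5):
    """Process one bite-sized batch and return the updated state.

    `state` is a plain dict, so it could travel between client and
    server as JSON.
    """
    start = state["next_index"]
    batch = rows[start:start + batch_size]
    state["backed_up"].extend(batch)  # stand-in for writing backup code
    state["next_index"] = start + len(batch)
    state["done"] = state["next_index"] >= len(rows)
    return state


def run_backup(rows, batch_size=5):
    """Play the role of the browser: call the step repeatedly until done."""
    state = {"next_index": 0, "backed_up": [], "done": False}
    requests_made = 0
    while not state["done"]:
        state = backup_step(rows, state, batch_size)
        requests_made += 1
    return state, requests_made


# Twelve rows at five rows per request take three requests.
final_state, requests_made = run_backup(list(range(12)))
```

Each call does a small, bounded amount of work, so no single request comes near the call duration limit.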

By default, Gaebar backs up 5 rows at a time to stay within the short-term CPU and 10-second call duration quotas, and splits the generated code into code shards of approximately 300KB to stay under the 1MB limit on objects. You can change these defaults in the views.py file if your app has higher quotas and you want faster backups and restores.
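One way to picture the sharding is the sketch below. The SHARD_SIZE_LIMIT constant and the split_into_shards function are hypothetical names, not the settings found in Gaebar's views.py; the sketch only shows the general idea of closing a shard before it grows past a size limit.

```python
SHARD_SIZE_LIMIT = 300 * 1024  # bytes, mirroring the approx. 300KB default


def split_into_shards(row_sources, limit=SHARD_SIZE_LIMIT):
    """Group generated row-function sources into shards of at most `limit` bytes.

    A single row larger than the limit still gets a shard of its own.
    """
    shards, current, current_size = [], [], 0
    for source in row_sources:
        size = len(source.encode("utf-8"))
        # Close the current shard if this row would push it over the limit.
        if current and current_size + size > limit:
            shards.append("".join(current))
            current, current_size = [], 0
        current.append(source)
        current_size += size
    if current:
        shards.append("".join(current))
    return shards


# Seven 100-byte rows with a 250-byte limit end up in four shards (2 + 2 + 2 + 1).
example_shards = split_into_shards(["a" * 100] * 7, limit=250)
```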

Once the remote server has finished its part of the backup, Gaebar automatically hits your local development server. From there on, the local development server makes a series of calls to the remote server to download the backup files (code shards). Once the download is complete, you will see a new backup folder in gaebar/backups with the contents of your backup.

Here's an example of some generated backup code from the Google App Engine Django test application (updated for the upcoming Beta 2 release):

import pickle
from google.appengine.api.datastore import datastore_types
from app1.models import *
from app2.models import *

def row_0(app_name):
  existing_entity = Profile.get(datastore_types.Key.from_path('Profile', 1, _app=app_name))
  if existing_entity:
    existing_entity.delete()
  profile_0 = Profile(key_name="id1", friends = pickle.loads('(lp0\n.'), in_relationship_with = pickle.loads('N.'), full_name = pickle.loads('VPaul Booth\np0\n.'))
  profile_0.put()

def row_1(app_name):
  existing_entity = Profile.get(datastore_types.Key.from_path('Profile', 2, _app=app_name))
  if existing_entity:
    existing_entity.delete()
  profile_1 = Profile(key_name="id2", full_name = pickle.loads('VAral Balkan\np0\n.'), friends = [datastore_types.Key.from_path('Profile', u'stephalicious', _app=app_name), datastore_types.Key.from_path('Profile', 'id1', _app=app_name)], in_relationship_with = datastore_types.Key.from_path('Profile', u'stephalicious', _app=app_name))
  profile_1.put()

To restore, you simply deploy your application, along with the backup folder, to your deployment environment and hit the Restore button in Gaebar. (If you have a large datastore, both the backup and restore processes will take a long time, especially when restoring to a local development server.)

The restore process simply calls each of the generated row functions and each row function restores a single row into the datastore.
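As a rough sketch of that loop, the code below loads a shard and calls its row functions in order. It is an assumption-laden illustration rather than Gaebar's restore view: FAKE_DATASTORE, load_shard and restore_shard are hypothetical, and a dictionary stands in for the datastore so that the example runs anywhere.

```python
import types

FAKE_DATASTORE = {}  # in-memory stand-in for the real datastore

SHARD_SOURCE = '''
def row_0(app_name):
    datastore[(app_name, "Profile", "id1")] = {"full_name": "Paul Booth"}

def row_1(app_name):
    datastore[(app_name, "Profile", "id2")] = {"full_name": "Aral Balkan"}
'''


def load_shard(source, datastore):
    """Build a module object from shard source, as if it had been imported."""
    module = types.ModuleType("shard_0")
    module.datastore = datastore
    exec(source, module.__dict__)
    return module


def restore_shard(shard_module, app_name):
    """Call row_0, row_1, ... in numeric order and return how many ran."""
    names = [
        name for name in dir(shard_module)
        if name.startswith("row_") and name[4:].isdigit()
    ]
    for name in sorted(names, key=lambda n: int(n[4:])):
        getattr(shard_module, name)(app_name)
    return len(names)


shard = load_shard(SHARD_SOURCE, FAKE_DATASTORE)
restored = restore_shard(shard, "myapp")
```

Sorting by the numeric suffix keeps row_10 after row_9, which a plain alphabetical sort would not.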

A note on the screencast

When mentioning how to install Gaebar, I left out that you also need to add the URL mapping for Gaebar to your application's urls.py. That, along with the rest of the installation instructions, is in the readme.txt file, which I highly recommend that you peruse.

Have your say!

As in all things, my approach to blog posts is that they should evolve over time and your feedback is invaluable in achieving this by helping me fix factual errors, fill in details, and expand the original post.

What do you think of Gaebar? Have you run into any issues that need fixing? Do you have other suggestions on how to improve it? Or do you, perhaps, have a patch to send me that adds Webapp support or some other feature? Leave me a comment and let me know!

Gaebar is a Naklab™ production released under GNU GPL v3 and sponsored by the <head> web conference.
