Why Google App Engine is broken and what Google must do to fix it.

I've been working a lot with Google App Engine in the past few months. I still maintain, as I reiterated recently in my Boagworld interview with Paul, that it's a great idea and has great potential. And the Google App Engine team has been wonderful in working with me and helping me out. Unfortunately, there are a couple of fundamental issues that must be addressed, and addressed properly, before Google App Engine can be taken seriously as a web application platform.

(And no, Java support is most definitely not one of them.)

Rather than writing a lengthy diatribe, I am going to succinctly list the main showstoppers that Google needs to fix to make Google App Engine work:

1MB limit on data structures

Although this is hinted at in various bits of the documentation, it is not stated plainly anywhere, so let me state it for the record:

Google App Engine does not support any data structure that is larger than 1MB in size.

This includes files. So you can't host that 1.2MB PDF you want to offer for download as part of your site.

The limit affects blobs, text, and any other fields in the datastore, as well as variables in Python (so you cannot get around the limitation by breaking things up into smaller pieces in the datastore and stitching them together later in code).

The only way to get around this limit is to stitch the data structure together on the client by making several Ajax calls. The number of use cases where this is useful, of course, is severely limited.
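Here is a minimal sketch of the bookkeeping that client-side workaround involves. It assumes the file is split into pieces before it ever reaches Google App Engine (say, by a local upload script), that each piece is stored as its own datastore entity and served by its own Ajax request, and that the browser joins the pieces back together. The dictionary stands in for the datastore and `download` stands in for client-side JavaScript. All names and the 900KB piece size are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: keep each piece comfortably below the 1MB limit.
CHUNK_SIZE = 900 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into pieces no larger than chunk_size bytes.

    Meant to run *before* upload (e.g. in a local script), because the
    whole file cannot be held in a single variable on App Engine.
    """
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# Stand-in for the datastore: one entry per piece, keyed by (file name, index).
fake_store: Dict[Tuple[str, int], bytes] = {}


def store_chunks(name: str, chunks: List[bytes]) -> int:
    """Store each piece as a separate record; return the number of pieces."""
    for index, chunk in enumerate(chunks):
        fake_store[(name, index)] = chunk
    return len(chunks)


def handle_chunk_request(name: str, index: int) -> bytes:
    """What a single (hypothetical) Ajax request handler would return."""
    return fake_store[(name, index)]


def download(name: str, count: int) -> bytes:
    """Stands in for the client: one request per piece, stitched locally."""
    return b"".join(handle_chunk_request(name, i) for i in range(count))
```

A 1.2MB PDF becomes two pieces. No single request or server-side variable ever holds more than one piece; only the client sees the whole file.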

Among other things, the 1MB limit on data structures makes it nigh on impossible to run reports. It also makes an administrative shell on the deployment environment pretty much useless, as the serialized results of a query with hundreds of items will quickly hit the limit.
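One way to see how quickly serialized results approach the limit is to measure them. This sketch uses Python's `pickle` module as the serializer and treats 1MB as 1,048,576 bytes. Both are assumptions, as are the made-up records of roughly 3KB each.

```python
import pickle
from typing import Any, Dict, List

# Assumption: "1MB" taken as 2**20 bytes.
LIMIT_BYTES = 1024 * 1024


def serialized_size(obj: Any) -> int:
    """Size in bytes of obj once pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def fits_under_limit(obj: Any, limit: int = LIMIT_BYTES) -> bool:
    """True if the pickled form of obj is no larger than limit."""
    return serialized_size(obj) <= limit


def make_records(count: int, payload_bytes: int) -> List[Dict[str, Any]]:
    """Hypothetical query results: each record carries a distinct text payload."""
    return [{"id": i, "notes": "%06d" % i + "x" * payload_bytes}
            for i in range(count)]
```

With these illustrative records, a hundred results fit comfortably, but five hundred do not.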

1,000 item limit on query offsets

The datastore has always had a 1,000 item limit on query results. I don't have a problem with this.

However, Google recently also introduced a 1,000 item limit on offsets. Combined with the 1,000 item limit on results, this means that the furthest you can reach into a given kind using offsets is the 2,000th entity.

Again, this isn't documented anywhere.

Unless you have been keeping a sortable field (like a numeric index that you were manually saving) in your datastore, this new limitation effectively locks you out of your data beyond the 2,000th item for each kind.

The fact that such a radical change was introduced without any forewarning or even an announcement is, in and of itself, troubling.

Currently, the only way around this limitation is to make sure that you have a sortable field that you can filter your queries on. I highly recommend saving a numeric key for every entity. This does mean, however, that you will be doing two datastore writes for every entity: one to save the record and get its numeric id, and another to write that id into a separate field that you can sort on. This is more cumbersome, and the extra writes will no doubt raise the risk of your running into another Google App Engine limit: the short-term high CPU quota.
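The pattern can be sketched without any App Engine code at all. Below, a hypothetical in-memory `FakeDatastore` stands in for a single kind. `save_with_index` performs the two writes described above. `iterate_all` pages through every entity by filtering on the sortable `index` field (index greater than the last value seen, ordered by index) instead of using an offset. None of these names are part of the App Engine API; they only illustrate the idea.

```python
from typing import Dict, Iterator, List, Optional

MAX_RESULTS = 1000  # the per-query result limit described above


class FakeDatastore:
    """In-memory stand-in for one datastore kind (not the real API)."""

    def __init__(self) -> None:
        self._next_id = 1
        self.entities: Dict[int, dict] = {}

    def put(self, entity: dict) -> int:
        """Save an entity, assigning a numeric id on its first save."""
        if "id" not in entity:
            entity["id"] = self._next_id
            self._next_id += 1
        self.entities[entity["id"]] = dict(entity)
        return entity["id"]

    def query_after(self, last_index: Optional[int], limit: int) -> List[dict]:
        """Entities whose index is greater than last_index, sorted by index.

        Mimics a query filtered and ordered on the sortable field;
        no offset is involved, so the offset limit never applies.
        """
        matches = [e for e in self.entities.values()
                   if "index" in e
                   and (last_index is None or e["index"] > last_index)]
        matches.sort(key=lambda e: e["index"])
        return matches[:min(limit, MAX_RESULTS)]


def save_with_index(store: FakeDatastore, entity: dict) -> int:
    """Two writes: save to obtain an id, then save the id as 'index'."""
    new_id = store.put(entity)   # write 1: get the numeric id
    entity["index"] = new_id
    store.put(entity)            # write 2: copy it into a sortable field
    return new_id


def iterate_all(store: FakeDatastore,
                batch_size: int = MAX_RESULTS) -> Iterator[dict]:
    """Walk every entity in batches, resuming after the last index seen."""
    last_index: Optional[int] = None
    while True:
        batch = store.query_after(last_index, batch_size)
        if not batch:
            return
        yield from batch
        last_index = batch[-1]["index"]
```

Because each batch starts from the last index seen rather than from an offset, the 2,001st entity and beyond are as reachable as the first. The cost is the doubled write for every entity.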

(From what I've heard, the Google team is working on adding sortable keys in a future update.)

The short-term high CPU quota

Not only must your calls return within 10 seconds, but if you try to do anything too stressful within those calls, you will quickly hit the short-term high CPU quota.

Thankfully, Google has raised this limit (and the other limits) for my app, but plain vanilla apps are not as lucky. In my tests, I've found that the high CPU limit can be triggered seemingly at random, even in calls that return within a second. To tell you the truth, I don't know what causes these warnings. I memcache nearly everything and try my hardest not to stress the system, and yet I routinely see high CPU warnings in my logs for even the most mundane calls.

(I've also seen strange high CPU errors in my logs informing me that I am 1.0x above the high CPU limit, which makes no sense at all.)

The short-term high CPU quotas must be removed. Not only that, but Google must review how it handles quotas in general.

Quotas in general

When was the last time you saw an "over quota" error on one of your favorite web applications?

Google's handling of quotas is a major step backwards to the days of Geocities and "this user has used too much bandwidth" errors. That's probably acceptable if you're hosting pictures of Little Timmy and Sally Jo's Summer Camp Adventure on Geocities, but not so acceptable if you're hosting your next big web app on what you thought was Google's infinitely scalable Cloud solution.

Simply put, they couldn't have come up with a worse PR campaign for Google App Engine if they had hired Steve Ballmer to handle the job.

Think about it:

You build an awesome new app on Google App Engine. You tell your friends. They tell 1,000 of their friends on Twitter, who tell 1,000 of their friends, and then, suddenly, you have all these developers hitting Google App Engine for the first time to see your app. Paradoxically, by doing that, they trigger the "intelligent throttling" "feature" in Google App Engine, which freaks out and shuts down your app with an "Over Quota" error, effectively making the "Over Quota" message the first impression most of your audience has of Google App Engine.

Not good.

Especially not good when your unique selling point is that your system can scale.

We don't care that it can scale. We care that it does scale. And that it scales when you need it the most.

But that's not the worst bit.

Admin? What's that?

Currently Google App Engine does only one half of what a web application needs (and does that only half well). As it stands today, Google App Engine is a highly scalable request/response system that is tuned to handle lots of tiny calls.

A typical web application, however, needs more than that.

While concentrating on making applications scalable, Google App Engine entirely ignores a crucial use case of any web application: administration. The administrative features of your web application may not be consumer-facing but they are just as important. They may include features for running reports or mailing all of your users. Key, essential tasks for any modern web application.

With Google App Engine today, if you have more than a couple of thousand members/records in your datastore, you can forget about running any sort of admin task.

The total lack of long-running processes, coupled with the 1MB limit on data structures and the 1,000 item limit on query offsets, means that you cannot run reports or backups. (Not that there is a data backup system currently available for Google App Engine. I wrote one myself, which used to work but is currently crippled by the 1,000 item limit on offsets.)

Similarly, unless you knew to plan ahead and create a sortable key to query on, you will find yourself locked out of certain records in your datastore (and unable, for example, to email all of your members).
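Until long-running processes exist, one stopgap for a task like emailing every member is to split it across many short requests. Each request works until it approaches the 10-second limit and then reports where the next request should resume. The sketch below assumes that `members` is a batch already fetched via a sortable key and that `send_email` is whatever your app uses to send mail. It also assumes something (a hypothetical follow-up request) calls `run_batch` again with the returned position. The 8-second budget is an arbitrary safety margin, and nothing here helps with the short-term high CPU quota.

```python
import time
from typing import Callable, List, Optional

# Assumption: leave a safety margin below the 10-second request limit.
TIME_BUDGET_SECONDS = 8.0


def run_batch(members: List[str],
              start: int,
              send_email: Callable[[str], None],
              budget: float = TIME_BUDGET_SECONDS,
              clock: Callable[[], float] = time.monotonic) -> Optional[int]:
    """Process members[start:] until the time budget is used up.

    Returns the position a follow-up request should resume from,
    or None once every member has been handled.
    """
    deadline = clock() + budget
    position = start
    while position < len(members):
        if clock() >= deadline:
            return position      # hand the remainder to the next request
        send_email(members[position])
        position += 1
    return None
```

The `clock` parameter exists only so the function can be exercised with a fake clock. In a real handler, the default `time.monotonic` measures the elapsed time of the request.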

These are core showstoppers for anyone considering building a real-world application on Google App Engine, and I can only hope that they are at the top of the engineering team's list of new features.

25% ready for prime-time

As things stand, Google App Engine is about 25% ready for prime time. Once quotas are handled properly and the 1MB limit removed, it will be about 50% ready. The other 50% has nothing to do with scalability and everything to do with everything else that a typical web application needs: specifically, long-running processes and the ability to run reports, aggregate data, and perform operations on large data sets.

In effect, Google App Engine is entirely missing a separate mode of operation and this glaring omission must be addressed before Google App Engine can be deemed a serious web application platform.

Priorities and showstoppers

I really hope that Google is not working on adding support for other languages to Google App Engine (as they mentioned they were at the Google Developer Day in London) while there are such fundamentally crippling issues with the platform to address first. Adding support for other languages to Google App Engine today is like sewing a new set of drapes for a house that doesn't have any walls yet.

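In summary, before Google App Engine can be used for real web applications, the following issues have to be addressed: the 1MB limit on data structures, the 1,000 item limit on query offsets, the short-term high CPU quota, the handling of quotas in general, and the lack of support for administrative tasks such as reports, backups, and long-running processes.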

It may just be that Google has to implement two modes of operation for each application on Google App Engine: one that is request/response-only and scales (basically what we have today, with better quota handling), and a separate admin mode that has long-running processes and isn't crippled by data structure size limits and short-term high CPU quotas.

As things stand today, running a real-world application on Google App Engine is a nightmare, because the system completely ignores the essential administration-related use cases that we take for granted on other platforms.

Update: Several people have pointed out that I forgot to mention another major showstopper for most applications: the lack of SSL. I ran into this early on, did my research, and found that PayPal was the only viable e-commerce solution at the time if your application needed to receive notification callbacks on purchases (ironically, Google Checkout requires SSL for its notifications API).
