Tag Archives: Amazon AWS

Cloud storage features wish list

This is a follow-up to my previous article about DynamoDB shortcomings. Here I’m talking about the solutions I’m familiar with: AWS DynamoDB, MS Azure Storage Tables, and Google AppEngine Datastore. As far as I know, there are no other solutions of comparable scale / maturity out there.

Transparent support for data compression.

All of these storage services impose some limit on item size or attribute size. So when storing large objects like texts or JSON, it makes sense to compress them as one way to mitigate the limit. This looks like a totally user-land problem, however life would be much easier if one could specify a content-encoding for each attribute. This way one could have, let’s say, a data attribute and serialize it with or without compression based on specific conditions or software version. Then one can dream of having this natively supported by the official client libraries: your cloud is only as good as your client library. Just let the user set an optional is_compressible flag for each attribute, let the library handle the rest, and do transparent decompression on read based on content-encoding. And as a third level of support, the web management console should understand and support decompression too.
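To make the idea concrete, here is a minimal sketch of what a client library could do under the hood. It is only an illustration: the function names, the min_size threshold and the stored dictionary layout are my own assumptions, not part of any existing SDK.

```python
import json
import zlib


def encode_attribute(value, is_compressible=False, min_size=256):
    """Serialize a value and record which content-encoding was applied.

    Hypothetical helper: min_size and the returned layout are assumptions.
    """
    raw = json.dumps(value).encode("utf-8")
    if is_compressible and len(raw) >= min_size:
        packed = zlib.compress(raw)
        # Keep the compressed form only when it actually saves space.
        if len(packed) < len(raw):
            return {"content_encoding": "deflate", "data": packed}
    return {"content_encoding": "identity", "data": raw}


def decode_attribute(stored):
    """Reverse encode_attribute, driven only by the stored content-encoding."""
    encoding = stored["content_encoding"]
    data = stored["data"]
    if encoding == "deflate":
        data = zlib.decompress(data)
    elif encoding != "identity":
        raise ValueError("unknown content-encoding: " + encoding)
    return json.loads(data.decode("utf-8"))
```

Because the reader looks only at the stored content-encoding, old and new software versions can write the same attribute with or without compression and still read each other's data.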

Transparent support for “overflow” storage

This may be familiar to those who know how Postgres et al. store data. While a value is short it is stored inline in a single row record; once it becomes too large it “overflows” to a separate table. I understand the technical reasons for the 400KB / 1MB limits and the reasoning that “nobody will ever need more” ;)) However, occasionally people do need more. In our use case 99.9% or 99.99% of the data fits into the limit (with good headroom), but then there is that 0.01–0.1% which does not, and it is a headache. Please, do support large objects transparently. Make it slow, make it expensive, but do support large objects.
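Until the services do this themselves, the overflow logic has to live in user code. Below is a rough sketch of the pattern, with two in-memory dictionaries standing in for the size-limited table and for a separate large-object store. The class name, the field names and the default limit are assumptions made for illustration.

```python
import uuid

INLINE_LIMIT = 400 * 1024  # bytes; mirrors the 400KB figure mentioned above


class OverflowStore:
    """Keeps small payloads inline and moves large ones to a second store."""

    def __init__(self, inline_limit=INLINE_LIMIT):
        self.inline_limit = inline_limit
        self.table = {}  # stand-in for the size-limited table
        self.blobs = {}  # stand-in for a large-object (BLOB) store

    def put(self, key, payload):
        self._drop_old_blob(key)
        if len(payload) <= self.inline_limit:
            self.table[key] = {"inline": payload}
        else:
            blob_id = uuid.uuid4().hex
            self.blobs[blob_id] = payload
            # The table row only keeps a pointer to the overflowed data.
            self.table[key] = {"overflow": blob_id}

    def get(self, key):
        item = self.table[key]
        if "inline" in item:
            return item["inline"]
        return self.blobs[item["overflow"]]

    def _drop_old_blob(self, key):
        # Avoid leaking the previous large object when a key is overwritten.
        old = self.table.get(key)
        if old and "overflow" in old:
            self.blobs.pop(old["overflow"], None)
```

The caller never sees which path was taken, which is exactly the kind of transparency I would like the service itself to provide. A real implementation would also have to deal with partial failures between the two writes, which this sketch ignores.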

Transparent support for encryption

Some people/businesses are really nervous (or conservative, or cautious in a good sense) about storing sensitive data in the cloud. DynamoDB does not even provide an encryption-at-rest guarantee. But many would want not only at-rest but end-to-end encryption too. Similar to the compression support described above, it would be great to be able to tag sensitive attributes as encrypted and provide an AES key for the encode and decode operations. The storage may also support storing a key thumbprint together with the encrypted attribute to simplify decryption and key rotation.
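The thumbprint part is easy to sketch. The snippet below shows how a stored thumbprint lets a client pick the right key after a rotation. Python's standard library has no AES, so the cipher is passed in as a pair of functions; a real setup would plug in an authenticated AES mode from a crypto library. The KeyRing class, the truncated SHA-256 thumbprint and the field names are all assumptions for illustration.

```python
import hashlib


def key_thumbprint(key):
    """Short, non-secret identifier for a key (assumed: truncated SHA-256)."""
    return hashlib.sha256(key).hexdigest()[:16]


class KeyRing:
    """Holds current and retired keys, indexed by thumbprint."""

    def __init__(self):
        self._keys = {}
        self.current = None

    def add(self, key, make_current=True):
        thumbprint = key_thumbprint(key)
        self._keys[thumbprint] = key
        if make_current:
            self.current = thumbprint
        return thumbprint

    def seal(self, plaintext, encrypt):
        # encrypt(key, data) is supplied by the caller, e.g. an AES wrapper.
        key = self._keys[self.current]
        return {
            "key_thumbprint": self.current,
            "ciphertext": encrypt(key, plaintext),
        }

    def unseal(self, stored, decrypt):
        # The stored thumbprint selects the key, so old items stay readable
        # after rotation as long as the old key is still on the ring.
        key = self._keys[stored["key_thumbprint"]]
        return decrypt(key, stored["ciphertext"])
```

Rotation then amounts to adding a new key as current and re-sealing items lazily whenever they are read with an old thumbprint.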

API versioning scheme for cloud REST services

Traditional approach

Traditionally it is considered good practice to version a REST API in the path, like “/api/v1/xxx”, and when you transition to the next version of your API you introduce “/api/v2/xxx”. It works well for traditional web application deployments where an HTTP reverse proxy / load balancer front end exists. In such a setup the application-level load balancer can also work as an HTTP router, and you can configure where requests go. For example, you can send “v1” requests to one rack, “v2” requests to another, and static files to a third one.
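The routing decision made by such a front end boils down to a prefix match on the request path. Here is a toy sketch of it; the rack names and the /static/ prefix are made up for the example, and a real reverse proxy would express the same thing in its own configuration language.

```python
# Hypothetical routing table: path prefix -> backend pool.
ROUTES = [
    ("/api/v1/", "rack-a"),
    ("/api/v2/", "rack-b"),
    ("/static/", "rack-c"),
]


def pick_backend(path, routes=ROUTES, default=None):
    """Return the backend pool for a request path, or default if none match."""
    for prefix, backend in routes:
        if path.startswith(prefix):
            return backend
    return default
```

This is exactly the capability that a TCP-level load balancer lacks, because it never looks at the request path.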

Cloud services tend to simplify things, and they often don’t have path-based HTTP routing capabilities. For example, Microsoft Azure does load balancing at the TCP level. Any request can hit any VM, so your application needs to be prepared to handle all types of requests on each VM instance. This introduces a lot of issues:

  • No isolation. If there is an exploitable bug in the ‘v1’ code, all your instances are vulnerable;
  • Development limitations. You need to support the whole code base, making sure ‘v1’ and ‘v2’ live nicely together. Consider the situation when some dependency is used in both ‘v1’ and ‘v2’ but each requires a different specific version;
  • Technology lock-in. It is either impossible or very hard to have ‘v1’ in C# and ‘v2’ in Python;
  • Deployment hurdle. You need to update all VMs in your cluster each time;
  • Capacity planning and monitoring. It is hard to understand how many resources are consumed by ‘v1’ calls vs. ‘v2’ calls.

Overall it is very risky, and eventually can become absolutely unmanageable.

Cloud aware approach

To overcome these and probably other difficulties, I suggest separating APIs at the domain name level: e.g. “v1.api.mysite.com”, “v2.api.mysite.com”. It is then fairly easy to set up DNS mapping of the names to the particular cloud clusters handling the requests. This way your clusters for “v1” and “v2” are completely independent in deployment and monitoring; they can even run on different platforms (JS, Python, .NET) or in different clouds (Amazon and Azure, for example). You can scale them independently: scale the v1 cluster down as its load reduces, and scale the v2 cluster up as its load increases. Then you can deploy “v2” into multiple locations and enable geo load balancing while still keeping the “v1” legacy setup compact and cheap. You can have ‘v1’ in house and ‘v2’ hosted in the cloud. Overall such a setup is much more flexible and elastic, and easier to manage and maintain.

Of course this technique applies not only to HTTP REST API services but to others as well – HTTP, RTMP etc.
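As a small illustration of the domain-level scheme, the sketch below generates one CNAME record per API version and builds the base URL a client would use. The cluster host names, the TTL and the helper names are invented for the example; only the “v1.api.mysite.com” naming comes from the text above.

```python
# Hypothetical mapping: API version -> host name of the cluster serving it.
CLUSTERS = {
    "v1": "legacy-cluster.inhouse.example.net",
    "v2": "api-v2.cloud.example.net",
}


def cname_records(domain, clusters, ttl=300):
    """Render one zone-file style CNAME line per API version."""
    return [
        f"{version}.{domain}. {ttl} IN CNAME {target}."
        for version, target in sorted(clusters.items())
    ]


def base_url(version, domain="api.mysite.com", scheme="https"):
    """Base URL a client uses for a given API version."""
    return f"{scheme}://{version}.{domain}"


for line in cname_records("api.mysite.com", CLUSTERS):
    print(line)
```

Moving a version to another cluster or another cloud is then a one-line DNS change, and no application code has to know about it.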

Comparison of BLOB cloud storage services

In the table below I’m trying to compare Amazon AWS S3, Microsoft Azure Storage, Google Cloud Storage, Rackspace CloudFiles.

Hope this is useful.

Amazon AWS S3 MS Azure Google Cloud Storage Rackspace Cloud Files
Max account size Unlimited? 200 TB per storage account (by default you have 20 storage accounts per subscription) = 4 PB Unlimited? ??
Geo redundancy +extra price +extra price ?
SLA Availability SLA: 99.9%. Designed to provide 99.999999999% durability and 99.99% availability. Reduced Redundancy: designed to provide 99.99% durability and 99.99% availability. Availability SLA: 99.9% Availability SLA: 99.9% Availability SLA: 99.9%
HTTP + + +
HTTPS + + + +
Container namespace Global :( (name collision is possible) Storage account Global :( Account
Container (bucket) levels single single single single
Root container - (but with HTTP each bucket can have a DNS name; not possible with HTTPS) + - (but each bucket can have a DNS name)
Cross origin CORS support + +experimental +
Max number of containers (buckets) 100 unlimited Unlimited? 500,000
Container name length 3-63; a few limitations, basic form is “my.aws.bucket”; in US region: 255 bytes, [A-Za-z0-9.-_], but DNS-compliant is recommended 3-63 characters, valid DNS name 3 to 63 characters, valid DNS name; if name is dot-separated, each component <63 characters and 222 total 256 after URL encoding, no ‘/’ character
Max files per container Unlimited, no performance penalty unlimited unlimited Unlimited, but API throttling begins at 50000 files.
List files, max results 1000 5000 ? 10,000
List files returns metadata +(configurable)
File (BLOB/object) name 1024 1024 1024 1024 bytes after URL encoding
Max file size 5TB 200GB (block blob), 1TB (page blob) 5TB Unlimited. Multiple files in the same container with the same name prefix can be joined into a single virtual file.
Max upload size 5 GB 64 MB 5 MB? 5GB
HTTP Chunked upload + +
Parallel upload + + ?(Resumable Uploads of Unknown Size) +
Upload Data checksumming + + +
Server-side copy +(up to 5 GB?) +supports external sources + +
File metadata 2KB, ASCII, no update 8KB + (size limit not specified), no update 90 pairs or 4096 bytes total
# data centers 8 8 ? 2: ORD, DFW
CDN-enabled + + ? +
Authentication HMAC-SHA1 signing HMAC-SHA256 signing OAuth 2.0, Google cookie, 1-hr JWT token 24-hr token obtained from Cloud Authentication Service.
Time-limited shared access + + + +
File auto expiration + +
Server-side encryption +AES-256
Storage per GB a month $0.055 – $0.095 for fully redundant

$0.037 – $0.076 for reduced redundancy


$0.055 – $0.095 for geo redundant

$0.037 – $0.07 for locally redundant storage

$0.054 – $0.085 for full redundancy

$0.042 – $0.063 reduced redundancy

Outgoing bandwidth per GB $0.05 – $0.12 (US)

1GB /mo is free

$0.05 – $0.12 (US/EU)

$0.12-$0.19 (Asia)

$0.08-$0.12 (US)

$0.15-$0.21 (Asia-Pacific)

Incoming bandwidth $0 $0 $0 $0
PUT, POST, LIST $0.005 per 1000 $0.01 per 100 000 storage transactions $0.01 per 1000 $0
HEAD, GET, DELETE  $0.004 per 10 000(no charge for delete) $0.01 per 100 000 storage transactions $0.01 per 10 000(no charge for delete) $0
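The request prices in the last two rows are quoted per different batch sizes, which makes them hard to compare by eye. Here is a small sketch that normalizes them for a given workload. The prices are copied from the rows above (and so predate any later price reductions); the "write" and "read" labels, the function name and the sample workload are my own, and free DELETE calls are simply not counted.

```python
# (price in USD, number of requests that price covers), taken from the
# "PUT, POST, LIST" and "HEAD, GET, DELETE" rows of the table.
REQUEST_PRICES = {
    "Amazon AWS S3": {"write": (0.005, 1_000), "read": (0.004, 10_000)},
    "MS Azure": {"write": (0.01, 100_000), "read": (0.01, 100_000)},
    "Google Cloud Storage": {"write": (0.01, 1_000), "read": (0.01, 10_000)},
    "Rackspace Cloud Files": {"write": (0.0, 1), "read": (0.0, 1)},
}


def request_cost(provider, writes, reads):
    """Request charges in USD for a number of write-type and read-type calls."""
    write_price, write_batch = REQUEST_PRICES[provider]["write"]
    read_price, read_batch = REQUEST_PRICES[provider]["read"]
    return writes / write_batch * write_price + reads / read_batch * read_price


# Sample workload (an assumption): 1M writes and 10M reads.
for name in REQUEST_PRICES:
    print(f"{name}: ${request_cost(name, 1_000_000, 10_000_000):.2f}")
```

For that sample workload the request charges come to about $9.00 on S3, $1.10 on Azure, $20.00 on Google Cloud Storage and $0.00 on Rackspace, before storage and bandwidth.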

Update (Nov 23, 2013): About 30% cost reduction by Google; up to 24% price reduction by Amazon; same 24% reduction by Microsoft.

UPDATE (Nov 5, 2012): Azure Storage now has higher limits, a much better internal network, and higher performance targets; see announcement.

2000 IOPS per partition, 20000 IOPS per storage account.