This is a follow up to my previous article about DynamoDB shortcomings. Here I’m talking about solutions I’m familiar with: AWS DynamoDB, MS Azure Storage Tables, Google AppEngine Datastore. As far as I know there is no other solutions of comparable scale / maturity out there.
Transparent support for data compression.
All the storages impose some limit on item size or attribute size. So when storing large objects like texts or JSON it makes sense to compress them as one way to mitigate the limit. This looks like totaly user-land problem, however life would be much easier if one could specify content-encoding for each attribute. This way one could have let’s say data attribute and serialize it with or without compression based on the specific conditions or software version. Then one can dream of having this natively supported by official client libraries — your cloud is only as good as your client library. Just let user set is_compressible optional flag for each attribute, and let library handle the rest, and do transparent decompression on read based on content-encoding. And as a third level support web management console should understand and support decompression too.
Transparent support for “overflow” storage
This may be familiar to those who knows how Postgres et al store data. While it is short it is stored as a single row record, once it becomes too large it “overflows” to separate table. I understand technical reasons of 400KB / 1MB limits and reasoning that “nobody will ever need more” ;)) However occasionally people do need more. In our use case 99.9% or 99.99% of data fits into the limit (with a good headroom), but then there is that 0.01–0.1% which does not, and it is a headache. Please, do transparent support for large objects. Make it slow, make it expensive, but do support large object.
Transparent support for encryption
Some people/businesses are really nervous (or conservative, or cautious in a good sense) about storing sensitive data in the cloud. DynamoDB does not even provide an encryption-at-rest guaranty. But many would want not only at-rest, but end-to-end encryption to. Similar to compression support described above, it would be great to be able to tag sensitive attributes as encrypted and provide AES key for encode and decode operation. Storage may also support storing key thumbprint together with encrypted attribute to simplify decryption and key rotation.
We at workato.com have been using DynamoDB for two years, some lessons learned. At that time there were no other option — MS Azure tables had less features; Google DataStore did not have REST API yet. Now we are migrating to Google Datastore, so expect similar article from me in two years 🙂
So without longer forewords my complaints about DynamoDB:
- Small item size limit. When we started it was mere 64KB, later on it was increased to 400KB, still too small by modern requirements. We had to implement our own chunking scheme, which makes it very hard to operate on data and add new indexes.
- Small number of secondary indexes you can have (5 local + 5 global).
- No multi-column indexes which means you need to design your schema in advance and create artificial columns merging several attributes to emulate such multi-column index. This is just a no go. Your requirement evolve, you data query needs evolve, if you cannot add index without massive (unfeasible) data migration you are screwed.
- No support for batch delete, you can not delete by index. You need first to query, and then delete in batches. (this is true for most cloud storages though)
- Poor elasticity toward traffic spikes. DynamoDB does support temporary traffic bursts, but you still need to be below provisioned throughput in 5-minute window average. (And I believe it was silently changes last year making us into troubles. I can not remember exact details now, but it was either 15 minutes to 5 minutes change, or even 5 minutes to 1 minute) Roughly you need to provision to 99.5% percentile level.
- As developer you don’t see the costs unless you have access to whole AWS account billing. If you have account without access to whole AWS account billing, you don’t know what charges are. All numbers are there, but you need to go through all the tables and indexes, sum up provisioned throughput and consumed volume, go to pricing page and calculate the number. But why so much headache? The number does not need to be precise as in billing report, but approximate number (split by table plus total) would be very useful.
- No support for multiple namespaces or databases. If you need to run multiple deployments (production, staging etc), you either need to implement prefixing of table names + access rules based on table names, or set up separate AWS accounts with separate billing, replicating admin access permissions etc.
Hot partition issue:
- Each partition gets only a fraction of total provisioned throughput. If you have 10 partitions, you throughput per partition is 1/10th of total one.
- And you don’t know how many partitions you have. I repeat it, just internalize it, you don’t know how many partitions you currently have in your system. So you don’t know your per-partition limit. There is a formula in documentation, but it does not work in practice. When I was troubleshooting that formula suggested I should have 2 or 3 partitions, talking to support revealed we had a dozen. And (see below) number of partitions is defined not by current settings, but by historical ones.
- No visibility into partition load. AWS support has internal tools which help to visualize partitions heatmap, but you don’t have access to them. Only if support sends you a screenshot.
- Poor monitoring / manifestation of throughput errors. So you see throttling errors, telling you your are over your provisioned throughput limits. You go to AWS monitoring to see that consumed throughput is 1/10th or 1/20th of provisioned. You are lost, you are puzzled, you panic, you cry. There is zero indication it is a hot partition problem.
- No scale down. Once you increased your throughput to say accomodate ETL job, number of partitions increase accordingly. And it never shrinks. This extremely counter-intuitive. You increase your throughput (per table), but it decreases (per partition). Or you pay more, but receive less.
- As time goes and your table grows, so number of partitions, and so per-partition throughput shrinks. So yesterday provisioned throughput was enough, tomorrow it may be not.
- Even if you shard well on your primary key, it does not prevent hot partition issue completely. One write operation is only 1KB, so write 400KB item and you are consuming 400 write units instantly. Dynamo may or may not accommodate this based on the other traffic.
- Hot partition problem is (almost) unavoidable. Even if you shard your data perfectly using hash-like key (and how you implement chunking then?) and into small (1KB) items, chances are you will need some secondary index to query data on, be it user_id or updated_at or something else. Global secondary indexes are eventually consistent, and are updated based on the queue of requests as far as I understand, so they are a bit more elastic. But only marginally. Eventually (sorry;) you will get into the same hot partition problem, now with indexes table which is even harder to diagnose.
Sidenote: I think MS Azure Tables approach is more straightforward: limited throughput per whole table, and limited throughput per partition. That’s it. As you table grows, your per-partition throughput remains constant.
As a summary: DynamoDB may be good as low-level key-value storage if you understand all complications and can design access patterns around it. It does not work as generic object storage.
Cross-posted to medium.com/@timanovsky