API versioning scheme for cloud REST services

Traditional approach

Traditionally it is considered good practice to version a REST API in the URL path, like “/api/v1/xxx”, and to introduce “/api/v2/xxx” when you move to the next version of the API. This works well for traditional web application deployments where an HTTP reverse proxy / load balancer front end exists. In such a setup the application-level load balancer can also act as an HTTP router, and you can configure where requests go: for example, “v1” requests go to one rack, “v2” requests to another, and static files to a third.
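For illustration, here is a minimal sketch (Python, with made-up backend pool names) of the kind of path-prefix dispatch an application-level load balancer performs:

# A toy path-prefix router, illustrating what an application-level
# HTTP load balancer does when it routes by URL path.
# The backend pool names below are purely hypothetical.

BACKEND_POOLS = {
    "/api/v1/": "rack-1",      # legacy API servers
    "/api/v2/": "rack-2",      # current API servers
    "/static/": "static-pool", # static file servers
}

def pick_backend(path: str) -> str:
    """Return the backend pool for a request path (longest prefix wins)."""
    for prefix in sorted(BACKEND_POOLS, key=len, reverse=True):
        if path.startswith(prefix):
            return BACKEND_POOLS[prefix]
    return "default-pool"

if __name__ == "__main__":
    print(pick_backend("/api/v1/users/42"))   # rack-1
    print(pick_backend("/api/v2/users/42"))   # rack-2
    print(pick_backend("/static/logo.png"))   # static-pool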

Cloud services tend to simplify things, and they often lack path-based HTTP routing capabilities. Microsoft Azure, for example, does load balancing at the TCP level, so any request can hit any VM, and your application has to be prepared to handle every type of request on every VM instance. This introduces a number of issues:

  • No isolation. If the v1 code has an exploitable bug, all your instances are vulnerable;
  • Development limitations. You need to maintain the whole code base and make sure ‘v1’ and ‘v2’ live nicely together. Consider the situation where some dependency is used by both ‘v1’ and ‘v2’ but each requires a different version;
  • Technology lock-in. It is either impossible or very hard to have ‘v1’ in C# and ‘v2’ in Python;
  • Deployment hurdle. You need to update every VM in your cluster on each deployment;
  • Capacity planning and monitoring. It is hard to tell how many resources are consumed by ‘v1’ calls vs. ‘v2’ calls.

Overall this is very risky and can eventually become completely unmanageable.

Cloud-aware approach

To overcome these (and probably other) difficulties I suggest separating APIs at the domain name level, e.g. “v1.api.mysite.com” and “v2.api.mysite.com”. It is then fairly easy to set up DNS mappings from these names to the particular cloud clusters handling the requests. This way your “v1” and “v2” clusters are completely independent – in deployment and monitoring, they can even run on different platforms – JS, Python, .NET – or even in different clouds – Amazon and Azure, for example. You can scale them independently: scale the v1 cluster down as its load decreases, and scale the v2 cluster up as its load increases. You can deploy “v2” into multiple locations and enable geo load balancing while keeping the legacy “v1” setup compact and cheap. You can have ‘v1’ in house and ‘v2’ hosted in the cloud. Overall such a setup is much more flexible and elastic, and easier to manage and maintain.
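On the client side this scheme is trivial to consume. The sketch below (Python; the hostname pattern simply follows the example names above) builds the base URL from the API version, so each version resolves to its own cluster via DNS:

import urllib.request

# Hypothetical hostname pattern matching the scheme described above:
# each API version lives on its own subdomain, hence its own cluster.
API_HOST_TEMPLATE = "https://{version}.api.mysite.com"

def api_url(version: str, path: str) -> str:
    """Build a request URL for a given API version and resource path."""
    return API_HOST_TEMPLATE.format(version=version) + path

def get_user(version: str, user_id: int) -> bytes:
    # DNS decides which cluster serves this call; the client code is
    # identical for v1 and v2.
    with urllib.request.urlopen(api_url(version, f"/users/{user_id}")) as resp:
        return resp.read()

print(api_url("v1", "/users/42"))  # https://v1.api.mysite.com/users/42
print(api_url("v2", "/users/42"))  # https://v2.api.mysite.com/users/42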

Of course this technique applies not only to HTTP REST API services but to other kinds of services as well – plain HTTP, RTMP, etc.

Comparison of BLOB cloud storage services

In the table below I compare Amazon AWS S3, Microsoft Azure Storage, Google Cloud Storage and Rackspace Cloud Files.

Hope this is useful.

Feature | Amazon AWS S3 | MS Azure | Google Cloud Storage | Rackspace Cloud Files
Max account size | Unlimited? | 200 TB per storage account (by default you have 20 storage accounts per subscription, i.e. 4 PB) | Unlimited? | ??
Geo redundancy | + (extra price) | + (extra price) | ? |
SLA | Availability SLA: 99.9%; designed to provide 99.999999999% durability and 99.99% availability; Reduced Redundancy: designed to provide 99.99% durability and 99.99% availability | Availability SLA: 99.9% | Availability SLA: 99.9% | Availability SLA: 99.9%
HTTP | + | + | + |
HTTPS | + | + | + | +
Container namespace | Global (name collisions possible) | Storage account | Global (name collisions possible) | Account
Container (bucket) levels | single | single | single | single
Root container | – (but with HTTP each bucket can have a DNS name; not possible with HTTPS) | + | – (but each bucket can have a DNS name) |
Cross-origin (CORS) support | + | | + (experimental) | +
Max number of containers (buckets) | 100 | unlimited | Unlimited? | 500,000
Container name length | 3-63 characters; a few limitations, basic form is “my.aws.bucket”; in the US region: 255 bytes, [A-Za-z0-9.-_], but DNS-compliant names are recommended | 3-63 characters, valid DNS name | 3-63 characters, valid DNS name; if the name is dot-separated, each component <63 characters and 222 total | 256 after URL encoding, no ‘/’ character
Max files per container | Unlimited, no performance penalty | unlimited | unlimited | Unlimited, but API throttling begins at 50,000 files
List files, max results | 1000 | 5000 | ? | 10,000
List files returns metadata | | + (configurable) | |
File (BLOB/object) name length | 1024 | 1024 | 1024 | 1024 bytes after URL encoding
Max file size | 5 TB | 200 GB (block blob), 1 TB (page blob) | 5 TB | Unlimited; multiple files in the same container with the same name prefix can be joined into a single virtual file
Max upload size | 5 GB | 64 MB | 5 MB? | 5 GB
HTTP chunked upload | | | + | +
Parallel upload | + | + | ? (Resumable Uploads of Unknown Size) | +
Upload data checksumming | + | + | + |
Server-side copy | + (up to 5 GB?) | + (supports external sources) | + | +
File metadata | 2 KB, ASCII, no update | 8 KB | + (size limit not specified), no update | 90 pairs or 4096 bytes total
# data centers | 8 | 8 | ? | 2 (ORD, DFW)
CDN-enabled | + | + | ? | +
Authentication | HMAC-SHA1 signing | HMAC-SHA256 signing | OAuth 2.0, Google cookie, 1-hr JWT token | 24-hr token obtained from the Cloud Authentication Service
Time-limited shared access | + | + | + | +
File auto expiration | + | | | +
Server-side encryption | + (AES-256) | | |
PRICING
Storage per GB per month | $0.055 – $0.095 fully redundant; $0.037 – $0.076 reduced redundancy | $0.055 – $0.095 geo redundant; $0.037 – $0.07 locally redundant | $0.054 – $0.085 full redundancy; $0.042 – $0.063 reduced redundancy | $0.10
Outgoing bandwidth per GB | $0.05 – $0.12 (US); 1 GB/mo is free | $0.05 – $0.12 (US/EU); $0.12 – $0.19 (Asia) | $0.08 – $0.12 (US); $0.15 – $0.21 (Asia-Pacific) | $0.18
Incoming bandwidth | $0 | $0 | $0 | $0
PUT, POST, LIST | $0.005 per 1,000 | $0.01 per 100,000 storage transactions | $0.01 per 1,000 | $0
HEAD, GET, DELETE | $0.004 per 10,000 (no charge for DELETE) | $0.01 per 100,000 storage transactions | $0.01 per 10,000 (no charge for DELETE) | $0

Update (Nov 23, 2013): about a 30% cost reduction by Google, up to a 24% price reduction by Amazon, and the same 24% reduction by Microsoft.

UPDATE (Nov 5, 2012): Azure Storage now has higher limits, a much better internal network, and higher performance targets; see the announcement.

2,000 IOPS per partition, 20,000 IOPS per storage account.

Brief reference on cloud storage

This is a very brief and shallow comparison of the data model and partitioning principles in Amazon S3 and Azure Storage. Please also see my feature comparison of various storage platforms: https://timanovsky.wordpress.com/2012/10/26/comparison-of-cloud-storage-services/
Amazon S3
Getting the most out of Amazon S3: http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
Their storage directory is lexicographically sorted, and the leftmost characters of the key are used as the partition key. It is not stated explicitly, but it looks like your prefix tree needs to be balanced for partition balancing to work optimally, i.e. if you prefix keys with 0-9A-F as suggested in the article, the number of requests going to each of the 16 prefixes must be roughly the same. Underneath, this might mean that the key space is always partitioned evenly – split into a fixed number of equal key ranges. That is purely my speculation, but otherwise I cannot explain why such prefixes would matter.
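As an illustration of the prefixing trick from the article, here is a small sketch (Python; the key names are made up) that derives a short hex prefix from a hash of the object name, so keys spread evenly across the 16 prefixes:

import hashlib

def prefixed_key(original_key: str, prefix_len: int = 1) -> str:
    """Prepend a few hex characters derived from the key's hash.

    Keys such as 'events/2012-10-26/000123.json' are sequential and would all
    land in the same partition; hashing spreads them across 16 (or 256, ...)
    prefixes while keeping the mapping deterministic.
    """
    digest = hashlib.md5(original_key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{original_key}"

for i in range(4):
    # prints keys with pseudo-random hex prefixes,
    # e.g. '7/events/2012-10-26/000000.json' (the prefix depends on the hash)
    print(prefixed_key(f"events/2012-10-26/{i:06d}.json"))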

Microsoft Azure Storage
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/12/30/windows-azure-storage-architecture-overview.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2011/11/20/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx
Having glanced over the MS docs, I’m under the impression that Azure Storage can split key ranges dynamically, based on load and size.
Update: The following quote shows that Azure is similar to S3, and I was wrong:

A downside of range partitioning is scaling out access to sequential access patterns. For example, if a customer is writing all of their data to the very end of a table’s key range (e.g., insert key 2011-06-30:12:00:00, then key 2011-06-30:12:00:02, then key 2011-06-30:12:00:10), all of the writes go to the very last RangePartition in the customer’s table. This pattern does not take advantage of the partitioning and load balancing our system provides. In contrast, if the customer distributes their writes across a large number of PartitionNames, the system can quickly split the table into multiple RangePartitions and spread them across different servers to allow performance to scale linearly with load (as shown in Figure 6). To address this sequential access pattern for RangePartitions, a customer can always use hashing or bucketing for the PartitionName, which avoids the above sequential access pattern issue.
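A minimal sketch of the bucketing idea from the quote (Python; the bucket count and key format are arbitrary choices of mine): instead of using the raw timestamp as the PartitionName, prepend a hash-derived bucket so sequential inserts fan out over several RangePartitions:

import hashlib

N_BUCKETS = 16  # arbitrary; more buckets allow more parallel RangePartitions

def bucketed_partition_name(timestamp: str) -> str:
    """Turn a monotonically increasing timestamp key into a bucketed PartitionName.

    The key gets a two-digit bucket prefix (which bucket depends on the hash),
    so consecutive timestamps no longer all hit the last RangePartition.
    """
    bucket = int(hashlib.md5(timestamp.encode("utf-8")).hexdigest(), 16) % N_BUCKETS
    return f"{bucket:02d}-{timestamp}"

for ts in ("2011-06-30:12:00:00", "2011-06-30:12:00:02", "2011-06-30:12:00:10"):
    print(bucketed_partition_name(ts))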

Enabling ARCHIVE storage engine in IUS MySQL 5.1

IUS is a great repo which allows seamless integration of MySQL 5.1 and Python 2.6 into CentOS systems (which ship with versions 5.0 and 2.4). The only issue is that if you run ‘SHOW ENGINES’ it will only show you the MRG_MYISAM, CSV, FEDERATED, InnoDB, MEMORY and MyISAM engines. I wanted to experiment with the ARCHIVE storage engine for storing raw input BI events, which are basically JSON. The ARCHIVE engine seems to be a good fit for this – it supports compression (of our highly redundant data) and auto increment, which is necessary to implement queue-like processing; how that works should be a topic for a separate post.

So I was puzzled when I didn’t see the ARCHIVE storage engine in the IUS build of MySQL. Initial googling only suggested that the ARCHIVE engine is enabled at compile time, which was pretty sad, and I couldn’t understand why on earth they had omitted it. Later I found this post suggesting that the ARCHIVE engine can be installed as a plugin and that I needed to install separate YUM packages. Searching for those packages in the current repo gave no results. Finally I found this bug report revealing that a few plugins are actually installed as part of the mysql51-server package, and you only need to enable them! So I went ahead and ran:

mysql> INSTALL PLUGIN archive SONAME 'ha_archive.so';
Query OK, 0 rows affected (0.02 sec)

And then

mysql> show engines \G

*************************** 7. row ***************************
Engine: ARCHIVE
Support: YES
Comment: Archive storage engine
Transactions: NO
XA: NO
Savepoints: NO
7 rows in set (0.00 sec)

Voila!

The following plugins are installed into the OS but not into MySQL:


ha_archive.so - ARCHIVE
ha_blackhole.so - BLACKHOLE
ha_example.so - EXAMPLE
ha_innodb_plugin.so - InnoDB Plugin

Enjoy!

Update: An experiment on 24M rows shows an 11x compression ratio! From 437 bytes/row in InnoDB (no indexes) down to 38 bytes per row in ARCHIVE.

SSH: Convert OpenSSH to SSH2 and vice versa (via //burnz.blog)

The SSH2 key format is used by PuTTY on Windows. If you need to migrate your SSH keys from Windows to Linux / Mac OS, then this article is useful.

Connecting two servers running different types of SSH can be a nightmare if you do not know how to convert the key. In this tutorial, I will try to explain how to convert the public key from OpenSSH to SSH2 and from SSH2 to OpenSSH. To convert the key, it must be done on the OpenSSH server. Convert OpenSSH key to SSH2 key: run the OpenSSH version of ssh-keygen on your OpenSSH public key to convert it into the format needed by SSH2 on the remote machine. T … Read More

via //burnz.blog

Some myths about H.264 vs. Theora and their use in the HTML5 video tag.

I had been planning to write this post for some time; now I’ve got the last piece of the puzzle [7].

Myth: H.264 is a proprietary codec, Theora is open.
Facts: H.264 is an open international standard, ISO/IEC 14496-10 [8]; open/free reference code and several open source implementations exist. Theora is a non-standardized codec with an unpredictable roadmap, distributed as open source [5].

Myth: One has to pay royalties to use H.264 on a web site.
Facts: No, not if your video content is free (i.e. you are not selling videos on a title-by-title or subscription basis), at least until the end of 2015 [10]. This date has been moved several times, and chances are it will be moved yet again. After that date one will have to pay $2,500/year if the service has fewer than half a million users, $5K if fewer than 1M users, and $10K if over 1M users. The bill is very reasonable even for a startup. An additional protection is that license terms are fixed for five years, and royalties cannot be increased by more than 10% at renewal. [4]

Myth: One has to pay for H.264 encoding and decoding applications. One has to pay to use a browser with H.264.
Facts: Yes, each developer of an H.264 encoding/decoding application or library has to pay $0.20 for each copy sold. If you are an end user there are no grounds to worry; it is the application developer’s/distributor’s responsibility to pay the royalties. Paid applications have the royalties already included in the price. Free applications like IE, Safari, iTunes and Flash will obviously remain free, as the corporations behind them will cover those royalties for their users. Moreover, some of these software companies are among the H.264 licensors, so they probably have special agreements and pay much less, if anything. It is not clear, but companies like Mozilla, the Firefox maker, may be among the losers, as they don’t have a way to pay royalties if it turns out they have to. [4]

Myth: Theora is an open source codec, therefore there can be no problems with patent infringement.
Facts: Video (and audio) codecs are a knowledge area with extremely high patent density. Everything one can imagine has already been patented; that’s what multi-million-dollar research budgets are spent on. So the probability of Theora being patent-clear is zero.

Myth: On2 donated the original (VP3) code on which Theora is based and promised not to take legal action against anybody using it or its modifications. So no patent worries.
Facts: A) There is only protection from legal action by On2 itself [6]. B) Theora’s development has diverged quite a bit from the original VP3 code, so it may now infringe some patents unintentionally. C) There are no warranties that the original VP3 technology does not violate third-party patents, including ones in the H.264 patent pool. D) Another fact to keep in mind: even obtaining an MPEG-LA license does not guarantee full coverage; there could be other patents outside the MPEG-LA pool [3]. E) We’ve got the first confirmation that legal action against Theora is possible – from Steve Jobs [7].

Myth: With broad support for the video tag in browsers, Flash video will die.
Facts: Not that soon. The video tag gives the developer far fewer controls – e.g. no caching or buffer size control. Live streaming is not supported yet and is not on the roadmaps.

Who is in which camp among the software giants?
Fact: Apple and Microsoft are in the H.264 licensor pool; Adobe and Google are not [2]. Google seems to be trying to build its own video patent portfolio by acquiring On2 [9].

Why does the industry seem to like the H.264 codec more than free open source solutions?
The codec is now supported by Adobe Flash, MS Silverlight and Apple iTunes/QuickTime, as well as by a lot of mobile phones, camcorders, digital television equipment, etc., and its support will only become wider. For all industry players it is very beneficial to have a single universal standard codec: it allows great technology reuse, simplifies development and makes it cheaper, and helps technology convergence. And after all, it is simply a very good codec, delivering very good quality across a wide range of bitrates.

Conclusion:
We will see major perturbations if Theora’s success starts threatening the business of the big guys.

Update
1. Changed spelling to Firefox, as was required 🙂
2. Actually, H.264 use for free internet broadcast will remain royalty-free till the end of 2015 [10], as pointed out in
3. One more good article: http://www.engadget.com/2010/05/04/know-your-rights-h-264-patent-licensing-and-you/

Update 2
Another very good article, this time on the On2/Google VP8 codec: http://x264dev.multimedia.cx/?p=377 It nicely describes the overall quality of the VP8 code, and the patent issues with VP8 and Microsoft’s VC-1. The latter was also initially thought to be patent-free.

References
1. H.264 page at the MPEG LA site: http://www.mpegla.com/main/programs/AVC/Pages/Intro.aspx
2. List of H.264 patent pool licensors: http://www.mpegla.com/main/programs/AVC/Pages/Licensors.aspx
3. MPEG LA FAQ: http://www.mpegla.com/main/programs/AVC/Pages/FAQ.aspx
4. Summary of H.264 license terms by MPEG LA: http://www.mpegla.com/main/programs/AVC/Documents/AVC_TermsSummary.pdf
5. Theora codec page: http://www.theora.org
6. On2 statement about the VP3 license: http://svn.xiph.org/trunk/theora/LICENSE
7. Apple prepares to go after open source codecs: http://blogs.fsfe.org/hugo/2010/04/open-letter-to-steve-jobs/
8. Official H.264 standard: http://www.iso.org/iso/catalogue_detail.htm?csnumber=52974
9. Google acquires On2 Technologies: http://www.google.com/intl/en/press/pressrel/ir_20090805.html
10. H.264 will remain free for internet broadcast till the end of 2015: http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/226/n-10-02-02.pdf

Fast IP->location resolution in SQL

Abstract
Many web services nowadays rely on users’ geolocation information. It can be needed purely for statistics, or to add value to the service itself. In most cases the only practical way to detect a user’s location is the IP address of the originating request. IP addresses are not hierarchical in a geographical sense: large blocks of addresses are allocated to ISPs and then broken down for smaller companies or individuals, and there is not much order in the system. So a fast lookup database is required in order to be able to make millions of lookups.

Geo location data
Sure enough, there are companies which provide such IP-based geolocation data. Maxmind is one of them, and fortunately they provide a free version of their database under some legal constraints. There are other providers too; I recently came across IpInfoDB. So far so good; now we need a system to make use of the data.

Physical database design
The data consists of (startIp, endIp, country) tuples: if an IP address falls into the range, it originated from that country. There are unassigned ranges, there are private ranges, and as said above very many ranges map to the same country. My table contains more than 100K distinct IP ranges.

Physical design in MySQL:
CREATE TABLE LookupIpToCountry (
  startIp int(10) unsigned NOT NULL,      -- first IP of the range, as a 32-bit integer
  endIp int(10) unsigned NOT NULL,        -- last IP of the range, as a 32-bit integer
  countryId tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (startIp),                  -- InnoDB clusters the data by startIp
  KEY fk_countryId (countryId),
  CONSTRAINT LookupIpToCountry_ibfk_1 FOREIGN KEY (countryId) REFERENCES Countries (id)
) ENGINE=InnoDB;

Another database or engine can be used, provided that startIp is the clustering index, i.e. the data in the table is physically sorted by startIp in ascending order.
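The startIp/endIp columns store IPv4 addresses as unsigned 32-bit integers, which is what MySQL’s INET_ATON() produces. A quick sketch of the same conversion on the application side (plain Python, no extra libraries):

import socket
import struct

def ip_to_int(ip: str) -> int:
    """Convert a dotted-quad IPv4 address to an unsigned 32-bit integer,
    equivalent to MySQL's INET_ATON()."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def int_to_ip(value: int) -> str:
    """Inverse conversion, equivalent to MySQL's INET_NTOA()."""
    return socket.inet_ntoa(struct.pack("!I", value))

print(ip_to_int("74.125.19.99"))   # 1249710947
print(int_to_ip(1249693272))       # 74.124.206.88 (startIp of the matching range)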

Fast lookup query
mysql> select * from LookupIpToCountry where startIp<=INET_ATON('74.125.19.99') and endIp>=INET_ATON('74.125.19.99') order by startIp desc limit 1;
+------------+------------+-----------+
| startIp    | endIp      | countryId |
+------------+------------+-----------+
| 1249693272 | 1249773023 |       230 |
+------------+------------+-----------+
1 row in set (0.00 sec)

The idea here is to find the first range into which the IP of interest falls. SELECT ... WHERE startIp <= X ... ORDER BY startIp DESC navigates the B-tree index to the last row satisfying the startIp criterion and sets up a scan iterator in descending order (because it can only validate the second condition, endIp >= X, by scanning). But the iteration ends immediately, thanks to LIMIT 1, if the second condition is satisfied. So effectively the whole query is reduced to a B-tree lookup of a single value. Of course, in the case of a miss (the IP address does not fall into any of the known ranges), there is a performance hit:

mysql> select * from LookupIpToCountry where startIp<=INET_ATON('255.255.255.255') and endIp>=INET_ATON('255.255.255.255') order by startIp desc limit 1;
Empty set (0.39 sec)

In order to fix this you can simplify the query to select * from LookupIpToCountry where startIp<=INET_ATON('255.255.255.255') order by startIp desc limit 1; and check the second condition, endIp>=INET_ATON('255.255.255.255'), at the application level, if you think it is worth it.
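Here is a rough sketch of that application-level variant (Python; it assumes a MySQL driver with the standard DB-API interface, e.g. MySQLdb/mysqlclient, and an existing connection – adjust to your environment):

import socket
import struct

def ip_to_int(ip: str) -> int:
    """Equivalent of MySQL's INET_ATON() for IPv4."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]

def lookup_country_id(conn, ip: str):
    """Return countryId for the given IP, or None on a miss.

    Only the startIp condition goes to MySQL (a single B-tree lookup);
    the endIp check is done here, so misses stay cheap.
    """
    value = ip_to_int(ip)
    cur = conn.cursor()
    cur.execute(
        "SELECT startIp, endIp, countryId FROM LookupIpToCountry "
        "WHERE startIp <= %s ORDER BY startIp DESC LIMIT 1",
        (value,),
    )
    row = cur.fetchone()
    if row is None:
        return None
    start_ip, end_ip, country_id = row
    return country_id if end_ip >= value else None  # app-level range check

# Usage (with a hypothetical connection):
# import MySQLdb
# conn = MySQLdb.connect(host="localhost", user="geo", passwd="...", db="geo")
# print(lookup_country_id(conn, "74.125.19.99"))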