Many web services nowadays rely on users’ geo location information. It can be required purely for statistics purposes, or to add value to provided service itself. In most cases the only possible way to detect user location is to use IP address of originating request. IP addresses are not hierarchical in a geographical sense, large block of IP addresses is reserved to an ISP, and then broken down for smaller companies or individuals. There is no much order in the system. So a quick database is required in order to be able to make millions of lookups.
Geo location data
Surely enough companies exist which provide such IP-based geo location data. Maxmind is one of them. Fortunately they provide free version of their database under some legal constraints. There are other providers too, I’ve recently came across IpInfoDB. So far so good, now we need a system to make use of the data.
Physical database design
Data consists of (startIp, endIp, country) tuples. If IP address is in the range, it is originated from that country. There blind unassigned ranges, there are private ranges, and as said above multiple (very many) ranges map to a Country. In my table there is > 100K distinct IP ranges.
Physical design in MySQL:
CREATE TABLE LookupIpToCountry (
startIp int(10) unsigned NOT NULL,
endIp int(10) unsigned NOT NULL,
countryId tinyint(3) unsigned NOT NULL,
PRIMARY KEY (startIp),
KEY fk_countryId (countryId),
CONSTRAINT LookupIpToCountry_ibfk_1 FOREIGN KEY (countryId) REFERENCES Countries (id)
Other database or engine can be used, provided that
startIp is clustering index, i.e. data in the table is physically sorted according to startIp in ascending order.
Fast lookup query
mysql> select * from LookupIpToCountry where startIp<=INET_ATON('22.214.171.124') and endIp>=INET_ATON('126.96.36.199') order by startIp desc limit 1;
| startIp | endIp | countryId |
| 1249693272 | 1249773023 | 230 |
1 row in set (0.00 sec)
The idea here is to find the first region into which IP of interest falls.
SELECT ... WHERE startIp<= X ... order by startIp desc will navigate through B-tree index to the last row satisfying
startIp criterion and setup scan iterator in descending order. (Because it can only validate second condition
endIp>= X by scanning). But the iteration ends immediately because of
LIMIT 1 if second condition is satisfied. So effectively whole query is reduced to B-tree lookup of one value. Of course, in case of the miss (IP address does not fall to any of known ranges), there is a performance hit:
mysql> select * from LookupIpToCountry where startIp<=INET_ATON('255.255.255.255') and endIp>=INET_ATON('255.255.255.255') order by startIp desc limit 1;
Empty set (0.39 sec)
In order to fix it you can simplify the query to
select * from LookupIpToCountry where startIp<=INET_ATON('255.255.255.255') order by startIp desc limit 1; and check second condition
endIp>=INET_ATON('255.255.255.255') at application level, if you think it is worth.