Tag Archives: MySQL

Enabling ARCHIVE storage engine in IUS MySQL 5.1

IUS is great repo which allows seamless integration of MySQL 5.1 and Python 2.6 into CentOS systems (which have 5.0 and 2.4 versions). The only issue is that if you run ‘SHOW ENGINES’ it will only show you MRG_MYISAM, CSV, FEDERATED, InnoDB, MEMORY, MyISAM engines. I wanted to experiment with ARCHIVE storage engine for storing raw input BI events, which is basically JSON. ARCHIVE engine seems to be a good hit for this – it supports compression (of our highly redundant data) and auto increment, which is necessary to implement queue-like processing, how it goes should be a topic for the separate post. So I was puzzled when I didn’t see archive storage engine in MySQL by IUS. Initial googling only suggested that ARCHIVE storage engine is enabled at compile time, which was pretty sad and I couldn’t understand why on the Earth had they omitted it. Later I found this post suggesting that ARCHIVE storage engine can be installed as plugin, and I need to install separate YUM packages. Finding for those packages in the current repo gave no results. So I finally found this bugreport revealing taht a few plugins are actually installed as a part of mysql51-server package and you only need to enable it! So I went a head and

mysql> INSTALL PLUGIN archive SONAME 'ha_archive.so';
Query OK, 0 rows affected (0.02 sec)

And then

mysql> show engines \G

*************************** 7. row ***************************
Engine: ARCHIVE
Support: YES
Comment: Archive storage engine
Transactions: NO
XA: NO
Savepoints: NO
7 rows in set (0.00 sec)

Voila!

The following plugins are installed into the OS but not into MySQL:


ha_archive.so - ARCHIVE
ha_blackhole.so - BLACKHOLE
ha_example.so - EXAMPLE
ha_innodb_plugin.so - InnoDB Plugin

Enjoy!

Update: Experiment on 24M rows shows 11X compression ratio! From 437 bytes/row in InnoDB (no indexes) down to 38 bytes per row.

Fast IP->location resolution in SQL

Abstract
Many web services nowadays rely on users’ geo location information. It can be required purely for statistics purposes, or to add value to provided service itself. In most cases the only possible way to detect user location is to use IP address of originating request. IP addresses are not hierarchical in a geographical sense, large block of IP addresses is reserved to an ISP, and then broken down for smaller companies or individuals. There is no much order in the system. So a quick database is required in order to be able to make millions of lookups.

Geo location data
Surely enough companies exist which provide such IP-based geo location data. Maxmind is one of them. Fortunately they provide free version of their database under some legal constraints. There are other providers too, I’ve recently came across IpInfoDB. So far so good, now we need a system to make use of the data.

Physical database design
Data consists of (startIp, endIp, country) tuples. If IP address is in the range, it is originated from that country. There blind unassigned ranges, there are private ranges, and as said above multiple (very many) ranges map to a Country. In my table there is > 100K distinct IP ranges.

Physical design in MySQL:
CREATE TABLE LookupIpToCountry (
startIp int(10) unsigned NOT NULL,
endIp int(10) unsigned NOT NULL,
countryId tinyint(3) unsigned NOT NULL,
PRIMARY KEY (startIp),
KEY fk_countryId (countryId),
CONSTRAINT LookupIpToCountry_ibfk_1 FOREIGN KEY (countryId) REFERENCES Countries (id)
) ENGINE=InnoDB;

Other database or engine can be used, provided that startIp is clustering index, i.e. data in the table is physically sorted according to startIp in ascending order.

Fast lookup query
mysql> select * from LookupIpToCountry where startIp<=INET_ATON('74.125.19.99') and endIp>=INET_ATON('74.125.19.99') order by startIp desc limit 1;
+------------+------------+-----------+
| startIp | endIp | countryId |
+------------+------------+-----------+
| 1249693272 | 1249773023 | 230 |
+------------+------------+-----------+
1 row in set (0.00 sec)

The idea here is to find the first region into which IP of interest falls. SELECT ... WHERE startIp<= X ... order by startIp desc will navigate through B-tree index to the last row satisfying startIp criterion and setup scan iterator in descending order. (Because it can only validate second condition endIp>= X by scanning). But the iteration ends immediately because of LIMIT 1 if second condition is satisfied. So effectively whole query is reduced to B-tree lookup of one value. Of course, in case of the miss (IP address does not fall to any of known ranges), there is a performance hit:

mysql> select * from LookupIpToCountry where startIp<=INET_ATON('255.255.255.255') and endIp>=INET_ATON('255.255.255.255') order by startIp desc limit 1;
Empty set (0.39 sec)

In order to fix it you can simplify the query to select * from LookupIpToCountry where startIp<=INET_ATON('255.255.255.255') order by startIp desc limit 1; and check second condition endIp>=INET_ATON('255.255.255.255') at application level, if you think it is worth.

Get Unix timestamp in Java, Python, Ruby, Erlang, JavaScript, Go, MySQL

Mostly a note for myself. To Get Unix timestamp value in seconds

Java:

long timestamp = System.currentTimeMillis()/1000

Python:

import time
timestamp = int(time.time())

JavaScript:

var ts = Math.floor(Date.now()/1000); // You can also use new Date().getTime()/1000 but this one is faster

Ruby:

require 'time'
ts = Time.now.to_f

Go:

package main
import (
 "time"
 "fmt"
)

func main() {
 // Get and print integer timestamp
 timestamp := time.Now().Unix()
 fmt.Printf("unixtime: %d\n", timestamp)
 // get and print floating point timestamp with sub-second precision
 timestampFloat := float64(time.Now().UnixNano()) / 1.0e9
 fmt.Printf("unixtime: %f\n", timestampFloat)
//Convert integer timestamp back to Time:
 t1 := time.Unix(timestamp, 0)
 fmt.Printf("Time: %v\n", t1)</pre>
//Convert floating point timestamp back to Time
 t2 := time.Unix(int64(timestampFloat), int64((timestampFloat - float64(int64(timestampFloat)))*1e9))
 fmt.Printf("Time: %v\n", t2)
 }

Erlang:

{Mega, Secs, _} = now(),
Timestamp = Mega*1000000 + Secs,

Working with timestamps in MySQL


mysql> SELECT UNIX_TIMESTAMP('1997-10-04 22:23:00');
-> 875996580
mysql> SELECT FROM_UNIXTIME(1111885200);
+---------------------------+
| FROM_UNIXTIME(1111885200) |
+---------------------------+
| 2005-03-27 03:00:00 |
+---------------------------+

Disappointment about Hibernate and MySQL

One-to-one unidirectional relation implementation in Hibernate sucks. You implement it using <many-to-one ... unique="true">. This creates two indexes in the table for the same field, one for uniqueness constraint, one for foreign key constraint.

Class inheritance when using table-per-subclass is implemented in similar way, again two indexes per field (ID) in child table, one created as primary key constraint, second as foreign key constraint referencing parent class table.

And finally dropping an index in MySQL InnoDB table is very expensive operation – it creates temporary table and reindexes it with remaining indexes. So if you want to drop 4 indexes it will re-create and re-index table 4 times. This can take many hours on any reasonably large database.

I’m disappointed.