Monthly Archives: February 2009

Disappointment about Hibernate and MySQL

One-to-one unidirectional relation implementation in Hibernate sucks. You implement it using <many-to-one ... unique="true">. This creates two indexes in the table for the same field, one for uniqueness constraint, one for foreign key constraint.

Class inheritance when using table-per-subclass is implemented in similar way, again two indexes per field (ID) in child table, one created as primary key constraint, second as foreign key constraint referencing parent class table.

And finally dropping an index in MySQL InnoDB table is very expensive operation – it creates temporary table and reindexes it with remaining indexes. So if you want to drop 4 indexes it will re-create and re-index table 4 times. This can take many hours on any reasonably large database.

I’m disappointed.

Correct implementation of fast server logging in Erlang

Making long story short: just use disk_log.

Actual history with concrete figures (I hate abstract discussions):
I first tried to implement logging as a separate singleton process (gen_server). Formatting was heavily optimized but performed in server thread, the idea was to offload main service threads as much as possible, as I was working on latency-critical application. Output was plain text using raw file output. The best performance I could get was about 300 events/sec (on MacBook pro). And I came to infamous problem that due to scheduling implementation particularities in Erlang, mailbox of logger thread was continuously growing eventually leading to hard crash of Erlang VM (with malloc exception).

I moved all formatting work to calling threads. I used buffered writing at Erlang driver level (delayed_write) and at logger process level (accumulating 1000 events before writing to disk). It worked slightly better – about 1000 events/sec but eventually got to the same problem of malloc out of memory exception due to mailbox memory overflow.

So I decided to give disk_log a try. I used internal format (binary), I also changed to semi-synchronous model (using disk_log:log and disk_log:log_terms) so that multiple processes can be served by log module, but each process have to wait till it’s message is processed. The results are great : about 100K events/sec and memory overflow is inherently impossible. There are of course implications on parsing this logs but they absolutely worth the performance gain it gives.