Long story short: just use disk_log.
The actual history, with concrete figures (I hate abstract discussions):
I first tried to implement logging as a separate singleton process (a gen_server). Formatting was heavily optimized but performed in the server process; the idea was to offload the main service processes as much as possible, since I was working on a latency-critical application. Output was plain text, written with raw file output. The best throughput I could get was about 300 events/sec (on a MacBook Pro). I also ran into the infamous problem that, due to particularities of Erlang's scheduling implementation, the logger process's mailbox kept growing, eventually leading to a hard crash of the Erlang VM (with a malloc exception).
Next, I moved all formatting work to the calling processes. I used buffered writing at the Erlang driver level (delayed_write) and at the logger process level (accumulating 1000 events before writing to disk). It worked slightly better, at about 1000 events/sec, but eventually hit the same problem: a malloc out-of-memory exception caused by the logger's mailbox overflowing.
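For illustration only, here is a minimal sketch of the two buffering levels from that second attempt: the delayed_write option of file:open/2, plus batching 1000 pre-formatted events into a single write. This is not my original code. The module name, file name, event format, and the buffer size and delay values are all made up for the example. Note that it does nothing about the mailbox growth, which is what actually killed this approach.

```erlang
-module(buffered_file_sketch).
-export([demo/0]).

%% Hypothetical example: write a batch of 1000 events through a raw file
%% opened with driver-level buffering, then count the lines that landed.
demo() ->
    File = "buffered_sketch.log",                 % illustrative file name
    %% {delayed_write, Size, Delay}: buffer up to Size bytes or Delay ms
    %% before the data is actually written. Values here are illustrative.
    {ok, Fd} = file:open(File, [write, raw, binary,
                                {delayed_write, 64 * 1024, 2000}]),
    %% Formatting is done by the caller, not by a logger process.
    Batch = [format_event(N) || N <- lists:seq(1, 1000)],
    ok = file:write(Fd, Batch),                   % one write for the whole batch
    ok = file:close(Fd),                          % close flushes buffered data
    {ok, Bin} = file:read_file(File),
    ok = file:delete(File),
    length(binary:split(Bin, <<"\n">>, [global, trim])).

%% Made-up plain-text event format, returned as iodata.
format_event(N) ->
    [<<"event ">>, integer_to_binary(N), $\n].
```

Calling buffered_file_sketch:demo() writes the batch, reads the file back, and returns the number of lines found.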
So I decided to give disk_log a try. I used the internal (binary) format and also switched to a semi-synchronous model (using disk_log:log and disk_log:log_terms), so that multiple processes can be served by the log module, but each process has to wait until its message is processed. The results are great: about 100K events/sec, and memory overflow is inherently impossible. There are of course implications for parsing these logs, but they are absolutely worth the performance gain.
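A minimal sketch of what this looks like, assuming a halt log in internal format. The module name, log name, file name, and event tuples are invented for the example. Both disk_log:log/2 and disk_log:log_terms/2 return only after the log has accepted the terms, which is what keeps callers from outrunning the logger. Reading the log back goes through disk_log:chunk/2 rather than a text viewer, which is the parsing cost mentioned above.

```erlang
-module(disk_log_sketch).
-export([demo/0]).

%% Hypothetical example: open a log, write terms synchronously, read them back.
demo() ->
    File = "events_sketch.LOG",                   % illustrative file name
    _ = file:delete(File),                        % start from a fresh file
    {ok, Log} = disk_log:open([{name, events_sketch},
                               {file, File},
                               {type, halt},
                               {format, internal},
                               {mode, read_write}]),
    %% Synchronous calls: the caller waits until the term is accepted.
    ok = disk_log:log(Log, {event, 1, <<"started">>}),
    ok = disk_log:log_terms(Log, [{event, 2, <<"a">>},
                                  {event, 3, <<"b">>}]),
    ok = disk_log:sync(Log),                      % make sure data is on disk
    Terms = read_all(Log, start, []),
    ok = disk_log:close(Log),
    ok = file:delete(File),
    Terms.

%% Walk the log chunk by chunk and return all terms in write order.
read_all(Log, Cont, Acc) ->
    case disk_log:chunk(Log, Cont) of
        eof ->
            lists:append(lists:reverse(Acc));
        {Cont2, Terms} ->
            read_all(Log, Cont2, [Terms | Acc]);
        {Cont2, Terms, _BadBytes} ->              % only in read_only mode
            read_all(Log, Cont2, [Terms | Acc])
    end.
```

In a real service, many processes would call disk_log:log/2 on the same open log. The asynchronous variants (disk_log:alog/2 and disk_log:alog_terms/2) also exist, but they give up the back-pressure that makes the synchronous calls safe here.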