Monthly Archives: January 2009

Profiling a Running Erlang Server

Say you want to profile a running (maybe even production) Erlang server. You would do it with fprof. Profiling a single function is relatively easy, just follow the documentation; let's see how to profile a whole application.

  1. Start profiling for all processes of interest
    fprof:trace([start, {procs, [whereis(pid1), whereis(pid2)]}]).
    pid1, pid2, etc. are registered process names. Fprof will profile them and all processes they spawn, so depending on your architecture it may be enough to include the single process which listens on the socket and accepts connections. The documentation states that whereis is not necessary, but it doesn't work for me otherwise.
  2. After a while, stop profiling. Note that trace files are really big, and processing them in the subsequent steps takes quite a while, so the first time you wouldn't want to run profiling for the whole day 🙂 Just try 30-60 seconds to begin with. Also keep in mind that load will increase 5-10 times, so if you test it on a production server, make sure you have enough resources 🙂
  3. Process the data with fprof:profile(). This reads the raw trace, which was saved to the 'fprof.trace' file during tracing (you can give it another name so you can find and load it later), and builds profile data in memory.
  4. Analyse and save the data to a human-readable text file, 'fprof.analysis':
    fprof:analyse([totals, {dest, "fprof.analysis"}]).
  5. Clear all memory with fprof:stop(). This makes sense if you want to let the server continue running.
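Putting the steps together, a complete session from the server's Erlang shell might look like this (a sketch: pid1 and pid2 are placeholder registered names, and the 30-second sleep is just an example duration):

```erlang
%% Sketch of the full fprof session described above.
fprof:trace([start, {procs, [whereis(pid1), whereis(pid2)]}]),  %% step 1
timer:sleep(30000),                 %% let it run 30-60 seconds (step 2)
fprof:trace(stop),                  %% stop tracing; raw data is in fprof.trace
fprof:profile(),                    %% step 3: process the raw trace
fprof:analyse([totals, {dest, "fprof.analysis"}]),              %% step 4
fprof:stop().                       %% step 5: free fprof's memory
```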

I will not explain how to read the resulting text file; it is a bit cryptic and you have to read the fprof documentation. Frankly speaking, I don't understand it fully 🙂

Toward a million-user long-poll HTTP application – nginx + erlang + mochiweb :)

First, this post and its title are inspired by the following great article, which you absolutely must read if you are interested in the subject:

I'm quite far from 1M users, but I'm still getting a load that the out-of-the-box configuration cannot cope with. What I'm building is a long-poll HTTP server which implements chat and some other real-time notifications for web clients. Nginx is used as a reverse proxy (basically an HTTP router). Let's say we want to handle N connections simultaneously (N should be much bigger than 1000 🙂 ). What parameters need to be changed? Note: all numbers are approximate.

1. Nginx
First, the number of file descriptors provided by the OS. It is configured in the script which launches nginx; in my case it is /etc/init.d/nginx. So I just add

ulimit -n N*2

Mind that proxy naturally uses two sockets per connection (uplink and downlink).
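As a concrete sketch, assuming a target of N = 50000 client connections (the number is illustrative, not from the post), the launch-script fragment would compute the limit like this:

```shell
#!/bin/sh
# Illustrative fragment for the nginx launch script (e.g. /etc/init.d/nginx).
N=50000                     # assumed target number of client connections
FD_LIMIT=$((N * 2))         # two descriptors per proxied connection
echo "$FD_LIMIT"            # prints 100000
# In the real script you would apply it before starting nginx:
# ulimit -n "$FD_LIMIT"
```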

Now let’s take a look at Nginx config file (/etc/nginx/nginx.conf for me). Here is extract from Nginx manual on events module:

The worker_connections and worker_processes from the main section allow you to calculate the max_clients value:
max_clients = worker_processes * worker_connections
In a reverse proxy situation, max_clients becomes
max_clients = worker_processes * worker_connections/4

So we have to set

events {
    worker_connections N*4;
}
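Plugging in numbers, assuming N = 50000 clients and a single worker process (both values are illustrative), the reverse-proxy formula works out as follows:

```shell
# max_clients = worker_processes * worker_connections / 4  (reverse proxy)
# => worker_connections = 4 * N / worker_processes
N=50000
WORKER_PROCESSES=1
echo $(( 4 * N / WORKER_PROCESSES ))   # prints 200000
```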

Then, since we use Nginx as a proxy not for a regular HTTP server but for a long-polling one, we have to tune the proxy parameters:

location /my_url {
    proxy_pass http://localhost:8000;
    proxy_buffering off;
    proxy_read_timeout 3600;
}

So we forward all requests to "/my_url" to the HTTP server running on localhost port 8000. We disable buffering in Nginx and tell it that it may take a while for the server to respond, so Nginx should wait and not time out.

OK, we are done with Nginx; let's move on to the server.

2. Erlang
Again, in the start script we set the number of available file descriptors:

ulimit -n N

Then we pass two additional parameters to erlang:

erl +K true +P N ...your params here

The +K option enables kernel polling, which saves some CPU cycles when handling a multitude of connections. +P says how many parallel processes the Erlang VM can have; by default it is 32768. My application is based on the mochiweb library and uses one process per connection (the usual case for Erlang server applications), so we need to allow at least N processes. Note that some recommend setting it to the maximum possible value, roughly 134 million, but that makes the Erlang VM allocate about 1GB of memory. That is not a big deal per se, as it is in virtual address space, but I can imagine that the internal reference tables for this memory heap and the Erlang process tables/mailboxes are also big and can cause some overhead. So I'd prefer to be on the safe side and set it to the actually required value.
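Once the node is up, you can check that the limit took effect from the Erlang shell (erlang:system_info/1 is a standard BIF; the value it returns is whatever was set with +P):

```erlang
%% Returns the maximum number of simultaneously existing processes,
%% i.e. the value configured with +P.
erlang:system_info(process_limit).
```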

3. Application
The last thing we want is to tell the mochiweb server that we want more than the default 2048 simultaneous connections, so I change the start parameters:

mochiweb_http:start([{max, N}, {name, ?MODULE}, {loop, Loop} | Options]).
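For context, the Loop passed to mochiweb_http:start/1 is where the long-polling actually happens: each connection's process simply blocks in receive until a notification arrives. A rough sketch (the message shape, timeout, and response details are illustrative, based on the old parameterized-module mochiweb request API, not taken from the post):

```erlang
%% Illustrative long-poll loop: one Erlang process per connection blocks
%% here until someone sends it a notification, or until a timeout.
loop(Req) ->
    receive
        {notify, Body} ->
            Req:ok({"text/plain", Body})
    after 60000 ->
            %% nginx's proxy_read_timeout (3600s) is far longer than this,
            %% so the application-level timeout fires first.
            Req:ok({"text/plain", "no news"})
    end.
```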