
Clocks are not monotonic in Java

It turns out that neither System.currentTimeMillis() nor System.nanoTime() guarantees monotonic clock behaviour.

Refer to http://stackoverflow.com/questions/2978598/will-sytem-currenttimemillis-always-return-a-value-previous-calls
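
If you only need elapsed-time measurements that never go backwards within a single process, one workaround is to clamp the raw clock yourself. Below is a minimal defensive sketch of my own (not from the linked thread); it does not fix the underlying platform problem, it merely hides backward jumps:

import java.util.concurrent.atomic.AtomicLong;

public class MonotonicClock {
    private static final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    // Returns System.nanoTime(), clamped so that consecutive calls never decrease.
    public static long nanos() {
        long now = System.nanoTime();
        while (true) {
            long prev = last.get();
            if (now <= prev) {
                return prev; // clock stood still or jumped backwards – reuse the last value
            }
            if (last.compareAndSet(prev, now)) {
                return now;
            }
        }
    }

    public static void main(String[] args) {
        long t0 = nanos();
        long t1 = nanos();
        System.out.println("elapsed ns (never negative): " + (t1 - t0));
    }
}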


HTTP benchmark suites

A brief reference on the HTTP benchmark tools I know of:

  • ab – ApacheBench, fairly simple
  • Tsung – written in Erlang, supports multiple worker nodes
  • JMeter – written in Java, supports multiple worker nodes
  • Yandex-Tank – written in Python. Good documentation on how to configure the Linux kernel – https://github.com/yandex-load/yandex-tank
  • Loader.io – cloud-based service

Note: I have yet to find my ideal tool. Creating a calibrated high load is not a trivial task. A simple hand-written load generator in Node.js gave 10–20% better performance than ab. The OS also plays a role here: I found it very hard to generate a stable outgoing load from a Linux machine. Occasionally an outgoing TCP connection handshake would freeze for multiples of the 3 s retransmission interval, or the kernel would run out of ephemeral ports because thousands of sockets were sitting in FIN_WAIT state. I'm talking about high request rates here, above 1000 rps per core.
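
For illustration, here is a rough sketch of such a hand-written generator. Mine was in Node.js; this is the same closed-loop idea in Java, assuming Java 11+ and the built-in java.net.http client. The target URL, concurrency and duration are placeholders, and the load is only as calibrated as the semaphore cap makes it.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.LongAdder;

public class SimpleLoadGen {
    public static void main(String[] args) throws Exception {
        URI target = URI.create("http://localhost:8080/"); // placeholder target
        int concurrency = 200;        // in-flight requests at any moment
        long durationMs = 10_000;     // run for 10 seconds

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(target).GET().build();

        Semaphore inFlight = new Semaphore(concurrency);
        LongAdder completed = new LongAdder();
        long deadline = System.nanoTime() + durationMs * 1_000_000;

        while (System.nanoTime() < deadline) {
            inFlight.acquire(); // cap concurrency so the client does not overrun itself
            client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                  .whenComplete((resp, err) -> {
                      completed.increment(); // counts both successes and failures
                      inFlight.release();
                  });
        }
        inFlight.acquire(concurrency); // wait for the tail of in-flight requests
        System.out.printf("Completed %d requests in %d ms%n", completed.sum(), durationMs);
    }
}

Capping the in-flight count should also help with the ephemeral-port problem, since the client reuses pooled connections instead of opening thousands of short-lived ones.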


Brief reference on cloud storage

This is a very brief and shallow comparison of the data models and partitioning principles of Amazon S3 and Azure Storage. Please also see my feature comparison post on various storage platforms: https://timanovsky.wordpress.com/2012/10/26/comparison-of-cloud-storage-services/
Amazon S3
Getting most out of Amazon S3: http://aws.typepad.com/aws/2012/03/amazon-s3-performance-tips-tricks-seattle-hiring-event.html
Their storage index is lexicographically sorted, and the leftmost characters of the key are used as the partition key. It is not stated explicitly, but it looks like your prefix tree needs to be balanced for partition balancing to work optimally. I.e., if you prefix keys with 0-9A-F as suggested in the article, the amount of requests going to all 16 prefixes must be roughly the same. Underneath, this might mean that the key space is always partitioned evenly – split into a fixed number of equal key ranges. That is purely my speculation, but otherwise I cannot explain why such prefixes would matter.
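
To make the prefixing idea concrete, here is a small sketch of my own (not code from the AWS article): derive the 0-9A-F prefix from a hash of the natural key, so that requests spread roughly evenly over the 16 prefixes. The MD5 choice and the object-name layout are illustrative assumptions only.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class S3KeyPrefixer {

    // Prepend one hex character derived from a hash of the natural key,
    // so that requests spread roughly evenly across 16 lexicographic prefixes.
    static String prefixedKey(String naturalKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
            char prefix = "0123456789ABCDEF".charAt(digest[0] & 0x0F);
            return prefix + "/" + naturalKey;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        // Hypothetical object name; the leading hex character is what the partitioning keys on.
        System.out.println(prefixedKey("2012/10/26/photo-0001.jpg"));
    }
}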

Microsoft Azure Storage
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/05/10/windows-azure-storage-abstractions-and-their-scalability-targets.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/12/30/windows-azure-storage-architecture-overview.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2011/11/20/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx
Having glanced over the MS docs, I'm under the impression that Azure Storage can split key ranges dynamically, based on load and size.
Update: The following quote shows that Azure is similar to S3, and I was wrong:

A downside of range partitioning is scaling out access to sequential access patterns. For example, if a customer is writing all of their data to the very end of a table's key range (e.g., insert key 2011-06-30:12:00:00, then key 2011-06-30:12:00:02, then key 2011-06-30:12:00:10), all of the writes go to the very last RangePartition in the customer's table. This pattern does not take advantage of the partitioning and load balancing our system provides. In contrast, if the customer distributes their writes across a large number of PartitionNames, the system can quickly split the table into multiple RangePartitions and spread them across different servers to allow performance to scale linearly with load (as shown in Figure 6). To address this sequential access pattern for RangePartitions, a customer can always use hashing or bucketing for the PartitionName, which avoids the above sequential access pattern issue.
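
The suggested workaround translates into something like the following sketch (my own illustration, not code from the paper): hash the sequential timestamp key into a fixed number of buckets and use the bucket as the PartitionName, keeping the original timestamp as the RowKey. The bucket count of 64 is an arbitrary assumption.

public class PartitionKeyBucketing {

    private static final int BUCKETS = 64; // illustrative; pick based on expected load

    // Spread sequential timestamp keys across BUCKETS partitions so that
    // inserts do not all land on the last RangePartition.
    static String partitionName(String timestampKey) {
        int bucket = Math.floorMod(timestampKey.hashCode(), BUCKETS);
        return String.format("%02d", bucket); // PartitionName, e.g. "17"
    }

    public static void main(String[] args) {
        String rowKey = "2011-06-30:12:00:00"; // the original sequential key becomes the RowKey
        System.out.println("PartitionName=" + partitionName(rowKey) + ", RowKey=" + rowKey);
    }
}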

SSH: Convert OpenSSH to SSH2 and vice versa (via //burnz.blog)

The SSH2 format is used by PuTTY on Windows. If you need to migrate your SSH keys from Windows to Linux/Mac OS, then this article is useful.

Connecting two servers running different types of SSH can be a nightmare if you do not know how to convert the key. In this tutorial, I will try to explain how to convert the public key from OpenSSH to SSH2 and from SSH2 to OpenSSH. The conversion must be done on the OpenSSH server. Convert OpenSSH key to SSH2 key: run the OpenSSH version of ssh-keygen on your OpenSSH public key to convert it into the format needed by SSH2 on the remote machine. T … Read More

via //burnz.blog

Some myths about H.264 vs Theora and their use in the HTML5 video tag.

I had been planning to write this post for some time; now I've got the last piece of the puzzle [7].

Myth: H.264 is a proprietary codec, while Theora is open.
Facts: H.264 is an open international standard, ISO/IEC 14496-10 [8]; open/free reference code and several open source implementations exist. Theora is a non-standardized codec with an unpredictable roadmap, distributed as open source [5].

Myth: One has to pay royalties to use H.264 on a web site.
Facts: Not if your video content is free (i.e. you are not selling videos on a title-by-title or subscription basis), at least until the end of 2015 [10]. This date has been moved several times, and chances are it will be moved yet again. After that date one will have to pay $2,500/year if the service has fewer than half a million users; $5K if fewer than 1M users; $10K if over 1M users. The bill is very reasonable even for a startup. Additional protection is that the license terms are fixed for five years, and royalties cannot be increased by more than 10% at renewal. [4]

Myth: One has to pay for H.264 encoding and decoding applications. One has to pay to use a browser with H.264 support.
Facts: Yes, each developer of an H.264 encoding/decoding application or library has to pay $0.20 for each copy sold. If you are an end user there are no grounds for worry; it is the application developer's/distributor's responsibility to pay the royalties. Paid applications have the royalties already included. Free applications like IE, Safari, iTunes and Flash will obviously remain free, as the corporations behind them will cover those royalties for their users. Moreover, some of these software companies are among the H.264 licensors, so they probably have special agreements and pay much less, if anything. It is not clear, but companies like Mozilla, the maker of Firefox, may be among the losers, as they don't have a way to pay royalties if they turn out to have to. [4]

Myth: Theora is an open source codec, therefore there can be no problems with patent infringement.
Facts: Video codecs (and audio codecs too) are a field with extremely high patent density. Everything one can imagine has already been patented; that is what multi-million-dollar research budgets are spent on. So the probability of Theora being patent-clear is essentially zero.

Myth: On2 donated the original (VP3) code that Theora is based on and warranted that it would take no legal action against anybody using it or its modifications, so there are no patent worries.
Facts: A) That only protects against legal action by On2 [6]. B) Theora's development has diverged quite a bit from the original VP3 code, so it may now infringe some patents unintentionally. C) There is no warranty that the original VP3 technology does not violate third-party patents, including ones in the H.264 patent pool. D) Another fact to keep in mind: even obtaining an MPEG-LA license does not guarantee full coverage; there could be other patents outside the MPEG-LA pool [3]. E) We've got the first confirmation that legal action against Theora is possible – from Steve Jobs [7].

Myth: With broad support for the video tag in browsers, Flash video will die.
Facts: Not that soon. The video tag gives the developer far fewer controls – e.g. no caching or buffer size control. Live streaming is not supported yet and is not on the roadmaps.

Who is in which camp among the software giants?
Fact: Apple and Microsoft are in the H.264 licensor pool; Adobe and Google are not [2]. Google seems to be trying to build its own video patent portfolio by acquiring On2 [9].

Why does the industry seem to like the H.264 codec more than free open source alternatives?
The codec is now supported by Adobe Flash, MS Silverlight and Apple iTunes/QuickTime, as well as by a lot of mobile phones, camcorders, digital television, etc., and its support will only grow wider. For all industry players it is very beneficial to have a single universal standard codec: it allows for great technology reuse, simplifies development and makes it cheaper, and it aids technology convergence. And after all, it is simply a very good codec, delivering very good quality across a wide range of bitrates.

Conclusion:
We will see major perturbations if Theora's success starts to threaten the business of the big players.

Update
1. Changed spelling to Firefox, as was required 🙂
2. Actually, H.264 use for free internet broadcast will remain royalty-free till the end of 2015 [10], as pointed out in
3. One more good article: http://www.engadget.com/2010/05/04/know-your-rights-h-264-patent-licensing-and-you/

Update 2
Another very good article, this time on On2/Google's VP8: http://x264dev.multimedia.cx/?p=377 It nicely describes the overall quality of the VP8 code, and the patent issues with VP8 and Microsoft's VC-1. The latter was also initially thought to be patent-free.

References
1. H.264 page at MPEG LA site: http://www.mpegla.com/main/programs/AVC/Pages/Intro.aspx
2. List of H.264 patent pool licensors – http://www.mpegla.com/main/programs/AVC/Pages/Licensors.aspx
3. MPEG LA FAQ http://www.mpegla.com/main/programs/AVC/Pages/FAQ.aspx
4. Summary of H.264 license terms by MPEG LA – http://www.mpegla.com/main/programs/AVC/Documents/AVC_TermsSummary.pdf
5. Theora codec page: http://www.theora.org
6. On2 statement about VP3 license: http://svn.xiph.org/trunk/theora/LICENSE
7. Apple prepares to go after open source codecs: http://blogs.fsfe.org/hugo/2010/04/open-letter-to-steve-jobs/
8. Official H.264 standard http://www.iso.org/iso/catalogue_detail.htm?csnumber=52974
9. Google acquires On2 Technologies – http://www.google.com/intl/en/press/pressrel/ir_20090805.html
10. H.264 will remain free for internet broadcast till end of 2015 http://www.mpegla.com/Lists/MPEG%20LA%20News%20List/Attachments/226/n-10-02-02.pdf

Fix Eclipse Config to Start with Broken Perspective

I happened to switch to the Ruby Browsing perspective in Eclipse, and for whatever reason it crashed Eclipse with an Out of Memory exception. When I tried to restart, it wouldn't start, because it tried to load the same perspective again. The solution is to edit one of Eclipse's config files. The file of interest is "workspace/.metadata/.plugins/org.eclipse.ui.workbench/workbench.xml" (in my case on a Mac it is ~/Documents/workspace/.metadata/.plugins/org.eclipse.ui.workbench/workbench.xml)

The default perspective is configured by the following directive:
<perspectives activePart="org.rubypeople.rdt.ui.EditorRubyFile" activePerspective="org.rubypeople.rdt.ui.RubyBrowsingPerspective">

So what I had to do was replace org.rubypeople.rdt.ui.RubyBrowsingPerspective with org.eclipse.jdt.ui.JavaPerspective

You can find the exact IDs of the available perspectives in the same file. A perspective declaration looks like:

<perspective editorAreaTrimState="2" editorAreaVisible="1" fixed="0" version="0.016">
<descriptor class="org.eclipse.jdt.internal.ui.JavaPerspectiveFactory" id="org.eclipse.jdt.ui.JavaPerspective" label="Java"/>
...

The first line opens the perspective declaration, and the <descriptor> tag contains the id attribute, which is what you want. If you search through workbench.xml you will find all available perspective names.

Tuning Linux firewall connection tracker ip_conntrack

Overview
If your Linux server has to handle lots of connections, you can run into a problem with the ip_conntrack iptables module. It limits the number of simultaneous connections your system can track. The default value (in CentOS and most other distros) is 65536.

To check how many entries in the conntrack table are occupied at the moment:

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_count

Or you can dump the whole table:

cat /proc/net/ip_conntrack

The conntrack table is a hash table (hash map) with a fixed number of buckets (8192 by default), which is used for the primary lookup. The bucket found in the table points to a list of conntrack structures, so the secondary lookup is done by list traversal. 65536/8192 gives 8 – the average list length. You may want to experiment with this ratio on heavily loaded systems.

Modifying conntrack capacity
To see the current conntrack capacity:

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max

You can modify it by echoing new value there:

# echo 131072 > /proc/sys/net/ipv4/netfilter/ip_conntrack_max
# cat /proc/sys/net/ipv4/netfilter/ip_conntrack_max
131072

The change is immediate, but temporary – it will not survive a reboot.

Modifying number of buckets in the hash table
As mentioned above, just raising this parameter will give you some relief if your server was at the cap, but it is not an ideal setup. For 1M connections the average list length becomes 1048576 / 8192 = 128, which is a bit too much.

To see current size of hash table:

cat /proc/sys/net/ipv4/netfilter/ip_conntrack_buckets

which is a read-only alias for the module parameter:

cat /sys/module/ip_conntrack/parameters/hashsize

You can change it on the fly as well:

# echo 32768 > /sys/module/ip_conntrack/parameters/hashsize
# cat /sys/module/ip_conntrack/parameters/hashsize
32768

Persisting the changes
Making these changes persistent is a bit tricky.
For the total number of connections, just edit /etc/sysctl.conf (CentOS, Red Hat, etc.) and you are done:

# conntrack limits
net.ipv4.netfilter.ip_conntrack_max = 131072

It is not so easy with the hash table size. You need to pass a parameter to the kernel module at boot time. Add to /etc/modprobe.conf:

options ip_conntrack hashsize=32768

Memory usage
You can find out how much kernel memory each conntrack entry occupies by grepping /var/log/messages:

ip_conntrack version 2.4 (8192 buckets, 65536 max) - 304 bytes per conntrack

1M connections would require 304MB of kernel memory.