Sunday, December 23, 2012

Node.js video


Watched the video below on Node and hadn't heard the term "green threads" before. Green threads are just another name for user-space threads, which I was already familiar with.

He also mentions coroutines.

http://en.wikipedia.org/wiki/Coroutine

http://en.wikipedia.org/wiki/Continuation

http://jsconf.eu/2009/video_nodejs_by_ryan_dahl.html

Node.js Asynchronous i/o under the hood


I had a conversation with a friend at lunch about how node does async i/o under the covers.

We were basically arguing over whether node is truly single threaded or whether some operations launch a background thread to handle the operation. Looks like the answer is yes, in some cases another thread is launched. For instance:

libeio (by Marc Lehmann) is a fantastic idea: it relies on POSIX threads to provide an asynchronous version of the POSIX API: open, close, stat, unlink, fdatasync, mknod, etc. It would be nice if UNIX kernels provided this asynchronism natively, because the overhead of using a thread for a stat() call when the inode data is already cached in memory is significant.

http://blog.zorinaq.com/?e=34

ZeroMQ Internal Architecture


Does ZeroMQ simply retry the endpoint every so many milliseconds during disconnected operation?
Answer: Yes, I believe so, with exponential backoff.

http://od-eon.com/blogs/stefan/checking-availability-zeromq-endpoint/

Two cool things I got out of this article.

1. ØMQ's concurrency model may be a bit confusing at first. The reason is that we eat our own dogfood and use message passing to achieve concurrency and internal scalability. Thus, even though ØMQ is a multithreaded application you won't find mutexes, condition variables or semaphores meant to orchestrate the parallel processing. Instead, each object will live in its own thread and no other thread will ever touch it (that's why mutexes are not needed). Other threads will communicate with the object by sending it messages (called 'commands' to distinguish them from user-level ØMQ messages). Same way the object can speak to other objects — potentially running in different threads — by sending them 'commands'.

2. The requirements for messages are rather complex. The main reason for complexity is that the implementation should be very efficient for both very small and very large messages. The requirements are as follows:
For very small messages it's cheaper to copy the message than keep the shared data on the heap. These messages thus have no associated buffer and the data are stored directly in the zmq_msg_t structure — presumably on the stack. This has huge impact on performance as it almost entirely avoids need to do memory allocations/deallocations.
When using inproc transport, message should never be copied. Thus, the buffer sent in one thread, should be received in the other thread and get deallocated there.
Messages should support reference counting. Thus, if a message is published to many different TCP connections, all the sending I/O threads access the same buffer rather than copying the buffer for each I/O thread or TCP connection.
The same trick should be accessible to the user, so that he can send the same physical buffer to multiple ØMQ sockets without need to copy the content.
User should be able to send buffer that was allocated by application-specific allocation mechanism without need to copy the data. This is especially important with legacy applications which allocate large amounts of data.

http://www.zeromq.org/whitepapers:architecture

Considerations in Building a Large Infrastructure


Analyze your types of servers. For instance, Facebook had 5 types of servers:
web page servers, database computers, data storage systems, news feed servers, and memcache servers.

Possible infrastructure I could imagine being implemented

DNS (round robin with health checks) --> load balancers --> groups of web servers --> databases and data storage
- Session storage instead of sticky sessions on a web server?
- Databases: SQL or NoSQL, distributed for failover and for the size of the data
- Finally, batch processing and analytics

Good article on load balancing

DNS roundrobin is excellent to increase capacity, by distributing the load across multiple points (potentially geographically distributed). But it does not provide fail-over. You must first describe what type of failure you are trying to cover. A server failure must be covered locally using a standard IP address takeover mechanism (VRRP, CARP, ...). A switch failure is covered by resilient links on the server to two switches. A WAN link failure can be covered by a multi-link setup between you and your provider, using either a routing protocol or a layer2 solution (eg: multi-link PPP). A site failure should be covered by BGP: your IP addresses are replicated over multiple sites and you announce them to the net only where they are available.

http://serverfault.com/questions/101053/is-round-robin-dns-good-enough-for-load-balancing-static-content
http://stackoverflow.com/questions/1472214/load-balancing-dns-round-robin-in-front-of-hardware-load-balancers-how-to-shar

Mechanical Turk


Farm out work to humans that cannot easily be done by a computer.

One worry is the quality of the information, but they solve this by duplicating the work across multiple individuals and comparing their outcomes. If the outcomes don't match, you can send the two outcomes to a third worker for rectification.

What is HMAC Authentication and why is it useful?


HMAC - Hash-based message authentication code

Doesn't by itself protect against replay attacks, but could be extended to include an incrementing number in requests. Then if a request is received with a number that has already been seen, the request will not be satisfied.
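A minimal sketch of the idea in Python (the key, names, and counter scheme are my own illustration, not from the linked article):

import hmac, hashlib

SECRET_KEY = b"shared-secret"  # agreed out of band (hypothetical)

def sign_request(body, counter):
    # Prefix the body with an incrementing counter so a replayed
    # request (same body, old counter) can be rejected.
    msg = counter.to_bytes(8, "big") + body
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()

def verify_request(body, counter, signature, last_seen):
    if counter <= last_seen:  # replay: the counter must always move forward
        return False
    expected = sign_request(body, counter)
    return hmac.compare_digest(expected, signature)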

http://wolfe.id.au/2012/10/20/what-is-hmac-and-why-is-it-useful/

Python module search path


Recently a coworker came to me asking why, after installing a module via pip, he couldn't import it in Python.

Turns out pip was installing to a directory that wasn't in the search path aka sys.path

You can check your sys.path by saying "import sys" followed by "sys.path"

Make sure the directory pip is installing your module into is in sys.path
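For example (the site-packages path below is just an illustration):

import sys
print(sys.path)  # the list of directories Python searches for modules

# If pip installed the module somewhere outside this list,
# you can add that directory at runtime:
sys.path.append("/usr/local/lib/python2.7/site-packages")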

Pass "-v" to pip to find out where it installed the module, like so:

sudo pip install nltk -v

http://docs.python.org/2/tutorial/modules.html

Two-factor authentication For Increased Security


Two-factor authentication (TFA, T-FA or 2FA) is an approach to authentication which requires the presentation of two or more of the three authentication factors: a knowledge factor ("something the user knows"), a possession factor ("something the user has"), and an inherence factor ("something the user is").

Maybe with something like Yubikey from http://www.yubico.com/

http://en.wikipedia.org/wiki/Multi-factor_authentication

Hashing Passwords: Security Attacks


"Salt doesn't prevent dictionary attacks, just precalculated dictionary attacks. In particular, it protects against rainbow tables (http://en.wikipedia.org/wiki/Rainbow_table) and also ensures that cracking one user's password doesn't automatically let you crack any user who shares that password."

http://stackoverflow.com/questions/1111494/are-hashed-and-salted-passwords-secure-against-dictionary-attacks

Dictionary and Brute Force Attacks
Lookup Tables
Reverse Lookup Tables
Rainbow Tables

http://crackstation.net/hashing-security.htm
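A minimal sketch of salted hashing in Python. The scheme is illustrative; the crackstation article recommends a slow, salted KDF such as PBKDF2 rather than a bare hash:

import os, hmac, hashlib

def hash_password(password):
    salt = os.urandom(16)  # unique random salt per user
    # PBKDF2 adds a work factor on top of the salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return salt, digest    # store both alongside the user record

def check_password(password, salt, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100000)
    return hmac.compare_digest(candidate, digest)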

Here's how to avoid those problems:
Set-Cookie: userName=Alice; authCode=eeba95a4...
Where: authCode=HMAC(ROWID, userName + ipAddr)
When you receive this cookie, look up the user in the database, recompute/verify the authCode in the cookie, using ROWID and ip address of the request. No need to store cookies in the database.
For extra crypto points, throw a salt parameter into the mix:
Set-Cookie: userName=Alice; salt=59843...; authCode=eeba9...
Where: authCode=HMAC(ROWID, userName + ipAddr + salt)
Salt value is generated randomly for every cookie you produce. There's no need to keep it a secret.
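In Python, the recipe above might look like this (the server key stands in for ROWID; names are illustrative):

import hmac, hashlib, os

SERVER_KEY = b"per-user secret"  # the quoted scheme keys the HMAC with ROWID

def make_auth_cookie(user_name, ip_addr):
    salt = os.urandom(8).hex()
    auth_code = hmac.new(SERVER_KEY,
                         (user_name + ip_addr + salt).encode(),
                         hashlib.sha256).hexdigest()
    return "userName=%s; salt=%s; authCode=%s" % (user_name, salt, auth_code)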

http://stackoverflow.com/questions/8529196/cookie-security?rq=1

Here is what you do need to store on the server - in order to authenticate each request.
UserId (obvious)
CookieHash (made out of userId, some secret private key and crypto randomly generated number)
LastLogin
SessionRenewed (useful for when to cancel someone's session eg. renew cookieHash every 10 min, otherwise log out user)
LastIP
What I store in cookie is following
UserId
CookieHash

http://stackoverflow.com/questions/6010567/how-to-remember-users-with-cookies-in-a-secure-way

Sharding Ids at Instagram


Generate IDs in web application
Generate IDs through dedicated service
  Twitter Snowflake
  DB Ticket Servers
Their Solution
  Uses PostgreSQL schemas and a time component along with their custom epoch
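The layout from the post (41 bits of milliseconds since their custom epoch, 13 bits of shard id, 10 bits of a per-shard sequence) is easy to sketch in Python; the epoch value is the example from the post:

import time

CUSTOM_EPOCH_MS = 1314220021721  # Instagram's example epoch

def make_id(shard_id, seq):
    ms_since_epoch = int(time.time() * 1000) - CUSTOM_EPOCH_MS
    return (ms_since_epoch << 23) | (shard_id << 10) | (seq % 1024)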

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

Writing endian-independent code in C


I came across this article after reading that ØMQ considers your message to be a binary blob: if, for instance, you are exchanging integers over the network between machines of different endianness, you must make sure to convert back and forth.

I think the simple solution is to always convert integers to network byte order when they leave a machine. Then on the receiving end you convert back to host byte order. Problem solved.
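For example, in Python (the article itself is about C, where htonl()/ntohl() do this; struct's "!" prefix means network byte order):

import struct

value = 0xDEADBEEF
wire = struct.pack("!I", value)  # sender: host order -> network (big-endian)
assert struct.unpack("!I", wire)[0] == value  # receiver: network -> host order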

http://www.ibm.com/developerworks/aix/library/au-endianc/index.html?ca=drs-

Protocol Buffers


I was led to this by the following statement. "ØMQ doesn't know anything about the data you send except its size in bytes. That means you are responsible for formatting it safely so that applications can read it back. Doing this for objects and complex data types is a job for specialized libraries like Protocol Buffers."

Protocol Buffers are widely used at Google for storing and interchanging all kinds of structured information. Protocol Buffers serve as a basis for a custom remote procedure call (RPC) system that is used for nearly all inter-machine communication at Google.[3]

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python.
https://developers.google.com/protocol-buffers/docs/overview
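Usage is roughly like this in Python, assuming a hypothetical person.proto with name and id fields, compiled with protoc to person_pb2.py:

import person_pb2  # generated by: protoc --python_out=. person.proto

p = person_pb2.Person()
p.name = "Alice"
p.id = 123
data = p.SerializeToString()  # compact binary, not self-describing

q = person_pb2.Person()
q.ParseFromString(data)  # both sides share the .proto schema
assert q.name == "Alice"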

A friend of mine asked why I would use this versus something like Java serialization. Here are a couple of good reasons to prefer protocol buffers over Java serialization:
http://stackoverflow.com/questions/647779/high-performance-serialization-java-vs-google-protocol-buffers-vs

ZeroMQ True Peer Connectivity (Harmony Pattern)




Since ØMQ is designed to make distributed messaging easy, people often ask how to interconnect a set of true peers (as compared to obvious clients and servers). It is a thorny question and ØMQ doesn't really provide a single clear answer.

TCP, which is the most commonly-used transport in ØMQ, is not symmetric; one side must bind and one must connect and though ØMQ tries to be neutral about this, it's not. When you connect, you create an outgoing message pipe. When you bind, you do not. When there is no pipe, you cannot write messages (ØMQ will return EAGAIN).

Developers who study ØMQ and then try to create N-to-N connections between sets of equal peers often try a ROUTER-to-ROUTER flow. It's obvious why: each peer needs to address a set of peers, which requires ROUTER. It usually ends with a plaintive email to the list.

My conclusion after trying several times from different angles is that ROUTER-to-ROUTER does not work. And the ØMQ reference manual does not allow it when it discusses ROUTER sockets in zmq_socket(). At a minimum, one peer must bind and one must connect, meaning the architecture is not symmetrical. But also because you simply can't tell when you are allowed to safely send a message to a peer. It's Catch-22: you can talk to a peer after it's talked to you. But the peer can't talk to you until you've talked to it. One side or the other will be losing messages and thus has to retry, which means the peers cannot be equal.

I'm going to explain the Harmony pattern, which solves this problem, and which we use in Zyre.

We want a guarantee that when a peer "appears" on our network, we can talk to it safely, without ØMQ dropping messages. For this, we have to use a DEALER or PUSH socket which connects out to the peer so that even if that connection takes some non-zero time, there is immediately a pipe, and ØMQ will accept outgoing messages.

A DEALER socket cannot address multiple peers individually. But if we have one DEALER per peer, and we connect that DEALER to the peer, we can safely send messages to a peer as soon as we've connected to it.

Now, the next problem is to know who sent us a particular message. We need a reply address, that is the UUID of the node who sent any given message. DEALER can't do this unless we prefix every single message with that 16-byte UUID, which would be wasteful. ROUTER does, if we set the identity properly before connecting to the router.

And so the Harmony pattern comes down to:

One ROUTER socket that we bind to an ephemeral port, which we broadcast in our beacons.
One DEALER socket per peer that we connect to the peer's ROUTER socket.
Reading from our ROUTER socket.
Writing to the peer's DEALER socket.
Next problem is that discovery isn't neatly synchronized. We can get the first beacon from a peer after we start to receive messages from it. A message comes in on the ROUTER socket and has a nice UUID attached to it, but no physical IP address and port. We have to force discovery over TCP. To do this, our first command to any new peer we connect to is an OHAI command with our IP address and port. This ensures that the receiver connects back to us before trying to send us any command.

Breaking this down into steps:

If we receive a UDP beacon we connect to the peer.
We read messages from our ROUTER socket, and each message comes with the UUID of the sender.
If it's an OHAI message we connect back to that peer if not already connected to it.
If it's any other message, we must already be connected to the peer (a good place for an assertion).
We send messages to each peer using a dedicated per-peer DEALER socket, which must be connected.
When we connect to a peer we also tell our application that the peer exists.
Every time we get a message from a peer, we treat that as a heartbeat (it's alive).
If we were not using UDP but some other discovery mechanism, I'd still use the Harmony pattern for a true peer network: one ROUTER for input from all peers, and one DEALER per peer for output. Bind the ROUTER, connect the DEALER, and start each conversation with an OHAI equivalent that provides the return IP address and port. You would need some external mechanism to bootstrap each connection.
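A rough pyzmq sketch of the socket setup (discovery and the OHAI exchange are elided; names are illustrative):

import uuid
import zmq

ctx = zmq.Context()

# One ROUTER, bound to an ephemeral port, for input from all peers.
router = ctx.socket(zmq.ROUTER)
port = router.bind_to_random_port("tcp://*")  # advertise this port in beacons

# One DEALER per peer for output; setting our UUID as the identity lets
# the peer's ROUTER see who each message came from.
my_uuid = uuid.uuid4().bytes

def connect_to_peer(ip, peer_port):
    dealer = ctx.socket(zmq.DEALER)
    dealer.setsockopt(zmq.IDENTITY, my_uuid)
    dealer.connect("tcp://%s:%d" % (ip, peer_port))
    return dealer  # safe to send as soon as we've connected

# Receiving: ROUTER prefixes each message with the sender's identity.
# sender_uuid, payload = router.recv_multipart()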

Easy ØMQ Patterns



Request/Reply:
Publish-Subscribe:

A subscriber can connect to more than one publisher, using one 'connect' call each time. Data will then arrive and be interleaved ("fair-queued") so that no single publisher drowns out the others.
If a publisher has no connected subscribers, then it will simply drop all messages.
If you're using TCP, and a subscriber is slow, messages will queue up on the publisher. We'll look at how to protect publishers against this, using the "high-water mark" later.
From ØMQ 3.x, filtering happens at the publisher side, when using a connected protocol (tcp: or ipc:). Using the epgm:// protocol, filtering happens at the subscriber side. In ØMQ/2.x, all filtering happened at the subscriber side.
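A minimal pyzmq illustration of the subscriber side (endpoints are hypothetical):

import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://localhost:5556")  # a subscriber can connect to
sub.connect("tcp://localhost:5557")  # several publishers; messages are fair-queued
sub.setsockopt(zmq.SUBSCRIBE, b"weather")  # filter prefix; b"" = everything

while True:
    print(sub.recv())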

Pipeline:

Avoiding Classloader leaks


"OutOfMemoryError: PermGen" is a very common message to see after a few redeploys. The reason why it's so common is that it's amazingly easy to leak a class loader. It's enough to hold a single outside reference to an object instantiated from a class loaded by the said class loader to prevent that class loader from being GC-d.

http://java.dzone.com/articles/classloaderlocal-how-avoid

How to become a better programmer?




The article below is okay; I don't completely agree with everything he says, but it's worthwhile...

1. It doesn’t matter how many years experience in carpentry you have had or how well you can design furniture or cabinetry if every time you try to cut wood you struggle with making the cuts.
Cutting wood is a base skill of carpentry, just like problem solving is the base skill of software development.

2. There is probably no more important skill in life than learning to learn.

3. When people ask me what I do all day, I mostly say “read things other people name and name things.”

http://java.dzone.com/articles/4-most-important-skills

You may have heard someone say there is a difference between a programmer with 10 years of experience and a programmer with 1 year of experience 10 times.

http://simpleprogrammer.com/2010/04/02/so-you-want-to-become-a-better-programmer-topcoder/

http://simpleprogrammer.com/2010/10/06/why-hard-interviews-are-good/


Different types of garbage collection -- quite interesting.


Mark and sweep (MS)

Starting from a known root set, the GC traces all reachable memory objects by following pointers. Objects reached in this way, and therefore visible for use by the program, are alive. Objects which are not reached in the trace are marked dead. In the second stage, sweep, all dead objects are destroyed and reclaimed.

Tri-color mark and sweep

Instead of a simple separation of marked (as live) and unmarked (dead), the object set is divided into three parts: white, gray, and black. The white objects are presumed dead. The gray objects have been marked as live by some other object, but haven't yet marked the objects they refer to. The black objects are live, and have marked all objects they directly refer to.

In the initial run, all objects start as white and the root set is marked gray. The marking process changes white objects to gray (marking them from another gray object), and gray objects to black (when all objects they refer to are marked). When the gray set is empty, all live objects have been marked and the white set can be collected. After a collection run, all black objects are reset to white, the root set to gray, and the process begins again.

The advantage of a tri-color mark over a simple mark is that it can be broken into smaller stages.
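A toy sketch of tri-color marking in Python (objects are nodes in a reference graph; a real collector interleaves this loop with the running program):

def tricolor_collect(roots, heap):
    # heap maps each object to the list of objects it references
    white = set(heap) - set(roots)  # presumed dead until reached
    gray = set(roots)               # reached, children not yet scanned
    black = set()                   # reached, children scanned
    while gray:
        obj = gray.pop()
        for child in heap[obj]:
            if child in white:      # first time this object is reached
                white.discard(child)
                gray.add(child)
        black.add(obj)
    return white                    # still white = unreachable = garbage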

Copying collection

A copying GC copies objects from one memory region to another during the mark phase. At the end of the mark, all memory in the old region is dead and the whole region can be reclaimed at once.

Compacting collection

The compacting GC moves live objects close together in a single region in memory. This helps to eliminate fragmented free space and allows the allocation of large live objects. Compacting and copying collectors are often similar or even identical in implementation.

Uncooperative

An uncooperative GC is implemented as a separate module, often without affecting the remainder of the program. The programmer can write software without needing to be aware of the operations or implementation of the GC. The alternative is a cooperative GC, which is often implemented as a reference counting scheme and requires GC-related logic to be dispersed throughout the entire program.

Stop-the-world

A common disadvantage of a simple mark implementation is that the entire system (including all threads that use the same memory pools) must be suspended while the whole memory set is examined during marking and collection. Normal operation continues only after the whole GC cycle is performed. This can lead to arbitrarily long pauses during program execution.

Incremental

In order to alleviate the arbitrarily long pauses in a stop-the-world GC, the incremental GC breaks the mark and sweep process up into smaller, shorter phases. Each GC phase may still require the entire program to pause, but the pauses are shorter and more frequent.

Real-time

The pauses caused by GC don't exceed a certain limit.

Generational

The object space is divided between a young generation (short-lived temporaries) and one or more old generations. Only young generations are reset to white (presumed dead). The older generations are scanned less often because it is assumed that long-lived objects tend to live longer.

Concurrent

GC marking and collection runs as a separate thread, sometimes with multiple threads participating in GC. On a multi-processor machine, concurrent GC may be truly parallel.

Conservative

A conservative GC traces through memory looking for pointers to living objects. The GC does not necessarily have information about the layout of memory, so it cannot differentiate between an actual pointer and an integral value which has the characteristics of a pointer. The Conservative GC follows a policy of "no false negatives" and traces any value which appears to be a pointer.

Precise

A precise GC has intimate knowledge of the memory layout of the system and knows where to find pointers. In this way the precise collector never has any false positives.

http://docs.parrot.org/parrot/latest/html/docs/pdds/pdd09_gc.pod.html

Running the JVM with large amounts of RAM.


Got me thinking about the cost of GCing a large heap, which depending on the size could take minutes when a full GC occurs.

Read about Azul and their pauseless collector, along with their hardware and the specialized instructions that go with it.

If your application is not interactive, and GC pauses are not an issue for you, there shouldn't be any problem for 64-bit Java to handle very large heaps, even in hundreds of GBs. We also haven't noticed any stability issues on either Windows or Linux.

However, when you need to keep GC pauses low, things get really nasty:

Forget the default throughput, stop-the-world GC. It will pause your application for several tens of seconds for moderate heaps (< ~30 GB) and several minutes for large ones (> ~30 GB). And buying faster DIMMs won't help.

The best bet is probably the CMS collector, enabled by -XX:+UseConcMarkSweepGC. The CMS garbage collector stops the application only for the initial marking and remarking phases. For very small heaps like < 4 GB this is usually not a problem, but for an application that creates a lot of garbage and a large heap, the remarking phase can take quite a long time - usually much less than a full stop-the-world, but still a problem for very large heaps.

When the CMS garbage collector is not fast enough to finish operation before the tenured generation fills up, it falls back to standard stop-the-world GC. Expect ~30 second or longer pauses for heaps of size 16 GB. You can try to avoid this by keeping your application's long-lived garbage production rate as low as possible. Note that the more cores running your application, the bigger this problem gets, because the CMS utilizes only one core. Obviously, beware there is no guarantee the CMS does not fall back to the STW collector. And when it does, it usually happens at peak load, and your application is dead for several seconds. You would probably not want to sign an SLA for such a configuration.

Well, there is that new G1 thing. It is theoretically designed to avoid the problems with CMS, but we have tried it and observed that:

Its throughput is worse than that of CMS.
It theoretically should avoid collecting the popular blocks of memory first, however it soon reaches a state where almost all blocks are "popular", and the assumptions it is based on simply stop working.
Finally, the stop-the-world fallback still exists for G1; ask Oracle when that code is supposed to be run. If they say "never", ask them why the code is there. So IMHO G1 really doesn't make the huge heap problem of Java go away, it only makes it (arguably) a little smaller.
If you have bucks for a big server with big memory, you probably also have bucks for a good, commercial, hardware-accelerated, pauseless GC technology, like the one offered by Azul. We have one of their servers with 384 GB RAM and it really works fine - no pauses, 0 lines of stop-the-world code in the GC.

Write the damn part of your application that requires lots of memory in C++, like LinkedIn did with social graph processing. You still won't avoid all the problems by doing this (e.g. heap fragmentation), but it would be definitely easier to keep the pauses low.

http://stackoverflow.com/questions/214362/java-very-large-heap-sizes


Oracle is trying to address the problem

Java 9

"At JavaOne 2011, Oracle discussed features they hope to have in Java 9, including better support for multi-gigabyte heaps, better native code integration, and a self-tuning JVM."

http://en.wikipedia.org/wiki/Java_version_history


JVM Fixed Upper Limit for Memory Usage


Well, there is a paper from Sun (eh, Oracle) that explains a lot about GC internals: oracle.com/technetwork/java/gc-tuning-5-138395.html . There it says By default, the virtual machine grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects at each collection within a specific range. (section 4.1). So it's not really "full GC before heap increase", rather it's "heap increase if full GC does not free up enough"

http://stackoverflow.com/questions/3358328/why-does-the-sun-jvm-have-a-fixed-upper-limit-for-memory-usage-xmx

Why is memory management so visible in Java VM?


Java gives you a bit more control about memory -- strike one for people wanting to apply that control there, vs Ruby, Perl, and Python, which give you less control on that. Java's typical implementation is also very memory hungry (because it has a more advanced garbage collection approach) wrt the typical implementations of the dynamic languages... but if you look at JRuby or Jython you'll find it's not a language issue (when these different languages use the same underlying VM, memory issues are pretty much equalized). I don't know of a widespread "Perl on JVM" implementation, but if there's one I'm willing to bet it wouldn't be measurably different in terms of footprint from JRuby or Jython!

Python/Perl/Ruby allocate their memory with malloc() or an optimization thereof. The limit to the heap space is determined by the operating system rather than the VM, so there's no need for options like -Xmxn. Also, the garbage collection is simpler, based mostly on reference counting. So there's a lot less to fine-tune.

Furthermore, dynamic languages tend to be implemented with bytecode interpreters rather than JIT compilers, so they aren't used for performance-critical code anyway.

"So why does the JVM have (need?) a ceiling at all? Why can't it be flexible enough to request more memory from the OS when the need arises?" The Sun JVM can be easily configured to do just that. It's not the default because you have to be careful that your process doesn't cause the OS to thrash



Changing JVM is not a panacea. You can get new unexpected issues (e.g. see an article about launching an application under 4 different JVMs).

You can have a class leak (e.g. via classloaders) that most often happens on redeploy. Frankly, I've never seen hot redeploy work on Tomcat (hope to see it one day).
You can have incorrect JVM parameters (e.g. for Sun JDK 6 64-bit, the -XX:+UseParNewGC switch leads to a leak in the PermGen segment of memory. If you add the additional switches -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled the situation is resolved. Funny, but I never met the above-mentioned leak with Sun JDK 6 32-bit). Link to an article "Tuning JVM Garbage Collection for Production Deployments".
The PermGen chunk can be too small to load classes and related information (this actually most often happens after a redeploy under Tomcat: old classes stay in memory while new ones are loading).

http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html

http://stackoverflow.com/questions/1009042/what-free-jvm-implementation-has-the-best-permgen-handling

http://stackoverflow.com/questions/2550529/why-is-memory-management-so-visible-in-java-vm

http://java.dzone.com/articles/tale-four-jvms-and-one-app

http://java.dzone.com/articles/classloaderlocal-how-avoid -- Great Article!!

Ming ODM


Ming is a Python toolkit providing schema enforcement, an object/document mapper, an in-memory database, and various other goodies developed at SourceForge during our rewrite of the site from a PHP/Postgres stack to a Python/MongoDB one.

While this dynamic behavior is handy in a rapid development environment where you might delete and re-create the database many times a day, it starts to be a problem when you need to make guarantees about the type of data in a collection (because your code depends on it). The goal of Ming is to allow you to specify the schema for your data in Python code and then develop in confidence, knowing the format of the data you get from a query.

http://blog.mongodb.org/post/27907941873/using-the-python-toolkit-ming-to-accelerate-your

Spring Integration vs. Apache Camel for EIP



After reading the comparison below I am leaning toward Camel for integration patterns. I have used Spring Integration some in the past and the configuration was quite verbose. Another benefit of Camel is its wide range of endpoints, from Twitter all the way to Hadoop.

http://stackoverflow.com/questions/3034054/when-to-use-spring-integration-vs-camel

http://camel.apache.org/enterprise-integration-patterns.html

Some other messaging technologies


RabbitMQ

OpenAMQ - seems this was made right around the time ZeroMQ started

JMS

ZeroMQ Benefits and History


iMatix is behind ZeroMQ and looks like a consulting company for it, specializing in the financial industry. "ZeroMQ is used by around 160 firms, estimates FastMQ CEO Martin Sustrik, in everything from game servers to scientific computing to the sector it's really meant for, financial market data. He continues, people appreciate how light and fast this software is. It runs on practically every system, speaks every language, and is very, very fast."

ZeroMQ had major incompatibilities between versions of the API, even making changes to the wire protocol. These major changes have stopped though, and the project was forked to create http://www.crossroads.io/

ZeroMQ should now be much more stable between versions.

Pros:
You don't have to make sure that the server is bound to the socket before clients connect.

Con, apparently?

What happens if a disconnect occurs between the send and the recv?
Answer: It seems you can set a timeout or use a Poller.


http://stackoverflow.com/questions/7538988/zeromq-how-to-prevent-infinite-wait

http://zguide.zeromq.org/page:all#reliable-request-reply - This part of the guide talks a lot about the problems this guy was having.

http://lucumr.pocoo.org/2012/6/26/disconnects-are-good-for-you/

Monday, December 17, 2012

Load testing Web Applications and Client Side Javascript


Two topics here.
 load testing on the server
 load testing on the front end

Load testing on the Server.
http://jmeter.apache.org/
http://grinder.sourceforge.net/

This one is kind of in the middle.
For load testing the server, verifying responses, and interacting with the returned page, Multi-Mechanize looks pretty sweet. It's a Python library that reminds me somewhat of Selenium.
http://testutils.org/multi-mechanize/scripts.html

Load testing on the front end

Jiffy is an end-to-end real-world web page instrumentation and measurement suite. Jiffy is a novel idea among load testing tools: instead of measuring the performance of the web server, we measure the time it takes to load the web page on the client and run the JavaScript.
http://code.google.com/p/jiffy-web/

This would probably be a nice setup at a company if you had a great, well-maintained set of Selenium tests. They would most likely take a while to run, and by running them in parallel you could definitely speed up the process.
http://selenium-grid.seleniumhq.org/
http://selenium-grid.seleniumhq.org/setting_up_selenium_grid_on_ec2.html

Selenium RC is a project providing language bindings for Selenium. http://seleniumhq.org/projects/remote-control/

If you have money to throw at the problem, this may be of interest. http://www.crunchbase.com/company/browsermob

Latent Dirichlet Allocation and Bayes' Theorem

Bayesian Filtering - Uses a Naive Bayes classifier to determine the likelihood of an email being spam or non-spam based upon the statistical likelihood of tokens in the email.

In probability theory and statistics, Bayes' theorem (alternatively Bayes' law) is a theorem with two distinct interpretations. In the Bayesian interpretation, it expresses how a subjective degree of belief should rationally change to account for evidence. In the frequentist interpretation, it relates inverse representations of the probabilities concerning two events. In the Bayesian interpretation, Bayes' theorem is fundamental to Bayesian statistics, and has applications in fields including science, engineering, economics (particularly microeconomics), game theory, medicine and law. The application of Bayes' theorem to update beliefs is called Bayesian inference.

Good explanation of Bayes' theorem along with an example: http://en.wikipedia.org/wiki/Bayes%27_theorem

Still not 100% sure how Latent Dirichlet Allocation is related to Bayesian filtering? http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
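The theorem as applied to spam filtering (a standard formulation, not from the articles above):

P(spam | word) = P(word | spam) * P(spam) / P(word)

For example, if "viagra" appears in 30% of spam and 1% of ham, and half of all mail is spam, then P(spam | "viagra") = (0.30 * 0.5) / (0.30 * 0.5 + 0.01 * 0.5) ≈ 0.97.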

Git grep

http://gitster.livejournal.com/27674.html

Great in depth article on browser internals

http://taligarsiel.com/Projects/howbrowserswork1.htm

Tools and Ideas I would use or look at if I were developing my own project

-github
-travis ci
-depot_tools
gclient: a meta-checkout tool managing both Subversion and Git checkouts. It is similar to the repo tool except that it works on Linux, OS X, and Windows and supports both svn and git. This is nice since people waste a lot of time checking out multiple repos to get started on a project.
Think about how exceptions/errors will be handled, i.e. one of the following:
1. Simply let them occur and search logs for them
2. Maybe stick them in some kind of database so you can easily look up common errors
3. Always catch them and just log them, vs. option 1 where they would probably cancel the request once they occurred in a web app
http://dev.chromium.org/developers/how-tos/depottools

Major Browsers and Layout Engines Quick Reminder

Mozilla Firefox uses the Gecko layout engine. IE since 4.0 uses the Trident engine. Safari, Chrome, iPad, iPhone, and Android web browsers all use WebKit.

Leave off the scheme in a URI


<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>


http://stackoverflow.com/questions/550038/is-it-valid-to-replace-http-with-in-a-script-src-http

Steve Yegge Notes from the Mystery Machine Bus


Assembly language: Batshit liberal.

Perl, Ruby, PHP, shell-script: Extremist liberal.

JavaScript, Visual Basic, Lua: Hardcore liberal.

Python, Common Lisp, Smalltalk/Squeak: Liberal.

C, Objective-C, Scheme: Moderate-liberal.

C++, Java, C#, D, Go: Moderate-conservative.

Clojure, Erlang, Pascal: Conservative.

Scala, Ada, OCaml, Eiffel: Hardcore conservative.

Haskell, SML: Extremist conservative.

https://plus.google.com/110981030061712822816/posts/KaSKeg4vQtz

As much as some commenters are (weirdly?) railing against this classification scheme I think the underlying idea that software conservatism is about risk aversion is essentially accurate.
Perhaps another way of framing this is to ask the question: are you optimizing for the best case or the worst case? This ultimately is a form of risk management. And I'm not talking in the algorithmic sense, meaning complexity expressed as the asymptotically worst case. I'm talking about people, software and ecosystems.
Let me illustrate this idea with Java.
- C++ has operator overloads. Java does not. Why? Because people might abuse them. That's optimizing for the worst case (ie bad or inexperienced programmers). Properly used, operator overloading can lead to extremely readable code;
- Java has checked exceptions and uses them liberally (pun intended). C#, as one example, only has unchecked exceptions. Why? Philosophically the Java language designers (and many of its users) feel that this forces callers to deal with exceptions. Pragmatically (IMHO) it does not and leads to more cases of exceptions being simply swallowed. But again this is optimizing for the worst case ie programmers who should deal with a particular error condition but won't;
- Java has no multiple inheritance. Same story: it can be abused ("it is known"). But also mixins can be a powerful metaphor.
- Rinse and repeat for duck typing, extension methods, etc.
Putting Python two steps from Ruby strikes me as an interesting choice. I'd say the difference is at most one.
I'll also agree that Google as a company (based on my own much more limited experience than Yegge's) is firmly conservative. The style of writing Javascript that he refers to is about writing Google Closure code with all sorts of directives to aid the Closure Compiler (I describe Closure as putting the Java back into Javascript).

http://news.ycombinator.com/item?id=4365255

Paul Graham Reading


Nerds don't just happen to dress informally. They do it too consistently. Consciously or not, they dress informally as a prophylactic measure against stupidity.

A nerd, in other words, is someone who concentrates on substance.

The prospect of technological leverage will of course raise the specter of unemployment. I'm surprised people still worry about this. After centuries of supposedly job-killing innovations, the number of jobs is within ten percent of the number of people who want them. This can't be a coincidence. There must be some kind of balancing mechanism.

http://paulgraham.com/bubble.html
http://paulgraham.com/articles.html

LZW Compression front to back for JSON communication


Doing LZW compression front to back now; it mostly just obfuscates, but does provide some compression. I saw the fingerprint go down from ~6k to ~4k after compression. Big win: it's fast, on the order of < 10ms to run the compression algorithm. The code to do the compression is also really small.

http://rosettacode.org/wiki/LZW_compression#JavaScript

Vim vertical edits


Type a command, say '8x' to delete 8 characters starting at the cursor
Then enter visual block mode with Ctrl-v and select the text you want, but do not leave this mode
Press ':' then type 'normal .' - this will repeat your last command on each line of the selected block

Useful resource: Vim Character Patterns

Cookies are disabled?


A standard way of checking for cookie support is via a redirect.

For reasons I'll explain below, I think it's best to do a cookie check only when the user initiates an action that would require a cookie such as attempting to log in, or adding something to their cart.

First, the server checks the login data as normal - ie if the login data is wrong the user receives that feedback as normal. It immediately responds with a cookie, and a redirect to a page which is designed to check for cookie preferences - which may just be the same URL but with some flag added to the query string. This next page will then check to see if the client sent any cookie. If not, then the user receives a message stating that a cookie was not received and they should probably try to enable cookies if they want to log in.
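A minimal sketch of that redirect check in Python/Flask (the framework choice and route names are mine, not the answer's):

from flask import Flask, request, redirect

app = Flask(__name__)

@app.route("/login", methods=["POST"])
def login():
    # ...validate credentials as normal, then:
    resp = redirect("/login/cookie-check")
    resp.set_cookie("cookie_test", "1")
    return resp

@app.route("/login/cookie-check")
def cookie_check():
    if "cookie_test" not in request.cookies:
        return "No cookie received - please enable cookies to log in."
    return "Logged in."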

Now for why I only do a cookie test after a user-initiated action other than simply loading a page. I have seen sites implement a cookie test on every single page, not realising that this is going to have effects on things like search engines trying to crawl the site. That is, if a user has cookies enabled, then the test cookie is set once, so they only have to endure a redirect on the first page they request and from then on there are no redirects. However, for any browser or other user-agent, like a search engine, that doesn't return cookies, every single page could have a redirect. While it'll still work and a lot of the time users won't see any difference, it is a lot more overhead and load than necessary.

Another method of checking for cookie support is with Javascript - this way, no redirect is necessarily needed - you can write a cookie and read it back virtually immediately to see if it was stored and then retrieved. The downside to this is it runs in script - ie if you still want the message about whether cookies are supported to get back to the server, then you still have to organise that - such as with an Ajax call.

For my own application, I implement some protection for 'Login CSRF' attacks, a variant of CSRF attacks, by setting a cookie containing a random token on the login screen before the user logs in, and checking that token when the user submits their login details. Read more about Login CSRF from Google. A side effect of this is that the moment they do log in, I can check for the existence of that cookie - an extra redirect is not necessary.

http://stackoverflow.com/questions/531393/how-to-detect-if-cookies-are-disabled-is-it-possible

Unique Short URLs


Today I was trying to create a unique short URL.

I was using the UUID class provided by Java earlier for id generation. This produced ids that were too large for our purposes.
So they recommended using something they were using on another project: hash(ip + timeofvisit).
I ended up using sha1(ip + timeofvisit), cut the digest in half from a 20-byte array to ten bytes, and finally base64-encoded the bytes into a URL-safe string.

Later I got into a discussion about why I was using base64 encoding to shorten the length of the string. Here it goes:
My point was that if you start from the md5 hash (which produces a 128-bit digest) as a byte[],
then of the two representations, hex-encoded string and base64 encoding, the base64 version is the smaller string.

My partner argued the point below:
As this shows, base64-encoding a STRING (the hex digest) obviously causes it to get larger.

dhcp199:apache-tomcat-7.0.33 randy$ php -r "echo md5('123').PHP_EOL; echo base64_encode(md5('123')).PHP_EOL;"
202cb962ac59075b964b07152d234b70
MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA=

Below you see my point: by passing true for raw_output, the base64-encoded version is shorter.


randys-MacBook-Air:~ randy$ php -r "echo md5('123').PHP_EOL; echo base64_encode(md5('123', true)).PHP_EOL;"
202cb962ac59075b964b07152d234b70
ICy5YqxZB1uWSwcVLSNLcA==
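The whole pipeline in Python for comparison (illustrative; the original was Java):

import base64, hashlib, time

def short_id(ip):
    digest = hashlib.sha1((ip + str(time.time())).encode()).digest()  # 20 bytes
    return base64.urlsafe_b64encode(digest[:10]).decode()  # 10 bytes -> 16 chars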


Good Python code here for generating unique, random-looking ids from a sequential key:
https://github.com/adecker89/Tiny-Unique-Identifiers/blob/master/tuid.py

Monday, December 3, 2012

Illegal Characters in Cookies

Today I had an issue when setting a cookie in the browser where the server would simply not recognize the cookie I had set. I was running Tomcat 7, and after a bunch of debugging I realized it was because I had an @ sign in the cookie value. Interestingly, Tomcat didn't show an error; it just ignored the cookie, which was quite annoying.


setValue

public void setValue(String newValue)

Assigns a new value to a cookie after the cookie is created. If you use a binary value, you may want to use BASE64 encoding. With Version 0 cookies, values should not contain white space, brackets, parentheses, equals signs, commas, double quotes, slashes, question marks, at signs, colons, and semicolons. Empty values may not behave the same way on all browsers.

Parameters:
newValue - a String specifying the new value
See Also:
getValue()


Saturday, December 1, 2012

Intel’s Haswell is an unprecedented threat to Nvidia, AMD

http://www.extremetech.com/computing/136219-intels-haswell-is-an-unprecedented-threat-to-nvidia-amd

Transactional Synchronization eXtensions

Transactional Synchronization eXtensions (TSX) extend the x86 ISA with two new interfaces: HLE and RTM.

Restricted Transactional Memory (RTM) uses Xbegin and Xend, allowing developers to mark the start and end of a critical section. The CPU will treat this piece of code as an atomic transaction. Xbegin also specifies a fallback path in case the transaction fails. Either everything goes well and the code runs without any lock, or the shared variable(s) that the thread is working on is overwritten. In that case, the code is aborted and the transaction has failed. The CPU will now execute the fallback path, which is most likely a piece of code that does coarse-grained locking. RTM-enabled software will only run on Haswell and is thus not backwards compatible, so it might take a while before this form of Hardware Transactional Memory is adopted.


The most interesting interface in the short term is Hardware Lock Elision or HLE. It first appeared in a paper by Ravi Rajwar and James Goodman in 2001. Ravi is now a CPU architect at Intel and presented TSX together with his colleague Martin Dixon at IDF 2012.
The idea is to remove the locks and let the CPU worry about consistency. Instead of assuming that a thread should always protect the shared data from other threads, you optimistically assume that the other threads will not overwrite the variables that the thread is working on (in the critical section). If another thread overwrites one of those shared variables anyway, the whole process will be aborted by the CPU, and the transaction will be re-executed but with a traditional lock.
If the lock removing or elision is successful, all threads can work in parallel. If not, you fall back to traditional locking. So the developer can use coarse grained locking (for example locking the entire shared structure) as a "fall back" solution, while Lock Elision can give the performance that software with a well tuned fine grained locking library would get.

According to Ravi and Martin, the beauty is that the developer of your locking libraries simply has to add a few HLE instructions without breaking backwards compatibility. The developer uses the new TSX enabled library and gets the benefits of TSX if his application is run on Haswell or a later Intel CPU.

Javascript class with {SUPER: SYSTEM}

Defines some nice ways to declare classes in JavaScript with true instance variables and private properties.

http://stackoverflow.com/questions/387707/whats-the-best-way-to-define-a-class-in-javascript
http://strd6.com/2010/10/javascript-mixins/
http://strd6.com/2010/10/object-reversemerge-useful-javascript-game-extensions-19/

Javascript hoisting and scoping

http://www.adequatelygood.com/2010/2/JavaScript-Scoping-and-Hoisting

var a = 1;
function b() {
    a = 10;
    return;
    function a() {}
}
b();
alert(a);

The declaration function a() {} is hoisted to the top of b(), creating a local a inside the function. The assignment a = 10 therefore writes to that local a rather than the global, so the alert shows 1.

Jquery deferreds with pipe and when

http://joseoncode.com/2011/09/26/a-walkthrough-jquery-deferred-and-promise/
function getCustomerSSNById(customerId){
    var deferred = $.Deferred();
    setTimeout(function() {deferred.resolve({"ssn" : "111 44 9999"});}, 300);
    return deferred.pipe(function(p){
        return p.ssn;
    });
}

function getPersonAddressBySSN(ssn){
    var deferred = $.Deferred();
    console.log("ssn is " + ssn);
    setTimeout(function() {deferred.resolve({"address" : "123 blah st"});},1000);
    return deferred.pipe(function(p){
        return p.address;
    });
}

function getPersonAddressById(id){
    return getCustomerSSNById(id).pipe(getPersonAddressBySSN);
}


getPersonAddressById(123).done(function(a){
        alert("The address is " + a);
});

And using when

$.when(getCustomerSSNById(123), getPersonAddressBySSN("123 45 6789"))
    .done(function(ssn, address){
        alert("The ssn is " + ssn + " and the address is " + address);
    });

Using the google closure compiler

java -jar ~/apps/google-closure-compiler/compiler.jar --compilation_level ADVANCED_OPTIMIZATIONS --js=../src/main/webapp/resources/evercookie/evercookie.js --js_output_file=out_advanced.js

You must define externs if you plan to compile your code separately from, say, jQuery.
http://closure-compiler.googlecode.com/svn/trunk/contrib/externs/jquery-1.8.js
--externs src/main/config/jquery-1.8.js

Make sure to define global properties that you need across source files, like below, for the Closure Compiler to work properly with advanced optimizations:
window['someobject'] = someobject;

CDMA vs GSM


Five of the top seven carriers in the U.S. use CDMA: Verizon Wireless, Sprint, MetroPCS, Cricket, and U.S. Cellular.

AT&T and T-Mobile use GSM.

Wondered why Amazon ec2 virtual machine instances were so fast?

Read an article that said they use the Xen hypervisor:
http://www.codinghorror.com/blog/2006/05/virtualization-and-ring-negative-one.html
 
Take a look at Xen. It's interesting because it offers better performance than traditional virtualisation systems by modifying both host OS and guest OS. It's really interesting to learn about its optimisation techniques.
http://download.intel.com/design/intarch/PAPERS/325258.pdf
http://www.codinghorror.com/blog/2006/10/the-single-most-important-virtual-machine-performance-tip.html
http://www.codinghorror.com/blog/2005/02/virtual-pc-2004-tips.html
http://searchservervirtualization.techtarget.com/feature/Xen-vs-KVM-Linux-virtualization-hypervisors
Apparently KVM is going to be the upcoming technology.

Hypervisor and does the guest os run in protected mode?



Seems it does, and there are different levels of protected mode: ring 0 for the kernel and ring 3 for applications. Intel and AMD implement virtualization support in different ways; one approach adds a protection level of -1.

http://en.wikipedia.org/wiki/Ring_(computer_security)

More on Tor security


This article talks about Tor in relation to what it prevents, and gives a high-level overview of how sites track users:
https://www.torproject.org/projects/torbrowser/design/#privacy
evercookie
Panopticlick (EFF browser fingerprinting)
http://www.safecache.com/

Tor Inner workings


Seems to be related to how Tor works behind a NAT:
https://github.com/samyk/pwnat
http://samy.pl/pwnat/pwnat.pdf

jsPerf — JavaScript performance playground

jsPerf aims to provide an easy way to create and share test cases, comparing the performance of different JavaScript snippets by running benchmarks. 

http://jsperf.com

Writing Fast, Memory-Efficient JavaScript


http://coding.smashingmagazine.com/2012/11/05/writing-fast-memory-efficient-javascript/

HIGH-RESOLUTION TIME AND NAVIGATION TIMING API

V8 FLAGS FOR DEBUGGING OPTIMIZATIONS & GARBAGE COLLECTION
--trace-deopt looks interesting, to see what code V8 had to deoptimize

JAVASCRIPT MEMORY LEAK DETECTOR
Very interesting: a tool that detects which objects are causing leaks.
https://code.google.com/p/leak-finder-for-javascript/



Resource Timing http://w3c-test.org/webperf/specs/ResourceTiming/
Navigation Timing http://www.w3.org/TR/2012/CR-navigation-timing-20120313/

MultiLevel source mapping in Chrome


Pretty amazing stuff the Chrome developer tools team is doing: you can go from CoffeeScript down to minified JS in the browser. Source maps let you view the CoffeeScript in the browser, even though what's really running is the minified JS.

The Breakpoint Ep 3: The Sourcemap Spectacular with Paul Irish and Addy Osmani
http://www.youtube.com/watch?v=HijZNR6kc9A&feature=player_embedded

Travis CI

A continuous integration platform that integrates well with GitHub; pretty hot stuff.

https://github.com/travis-ci/travis-ci
https://travis-ci.org/