Friday, November 26, 2010

Intercepting Linux system calls: Part I

[Update] This code can now be found on github (ntrace)

Working below the application layer affords me the luxury of not having to worry about implementation details as they relate to the hosting system... mostly. Sometimes, however, I have to either measure or dictate application behavior. For the most part, the driving force is mapping network effects to application artifacts.
When this situation arises I always task myself with finding the shortest path to implementation. Usually, the options are roughly:
  • Use an existing logging facility within the application
  • Look through and edit source code (when available)
  • Use a utility such as strace/ltrace
  • Shim the application to 'spy' on or 'touch' the application
I don't regularly come across programs that offer the type of logging needed to gain the insight required to pinpoint absolute performance. This is understandable, since the logging I provide in my own programs (if any) is usually designed to let me reverse-engineer problems with the product without having to run it myself.

Source code browsing holds its own set of problems. First, reading code is much harder than writing it, and grokking another's thought process through that code takes an additional mental leap even once you are familiar with it. Matters worsen when there are invalid or missing comments and/or poorly written code. These problems are time-consuming, and the whole approach depends on having the source in the first place. With very few exceptions I avoid this approach.

The most common solution I find to be appropriate is to use strace (or ltrace) and parse the output. There are few situations where the vast functionality of a tool like strace will not suffice, but these situations do come up. One such situation is when you want to use strace with the -e trace=network flag but also need context regarding the calls being made. strace is only going to give you a fixed set of traced calls, and if the context you need is created or modified outside of that scope, you need to engineer a more specific solution.
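To make the "parse the output" step concrete, here is a minimal sketch in Python. It assumes the default single-line strace format (no -f PID prefix, no timestamps, no "<unfinished ...>" lines). The sample lines are hand-written for illustration and not captured from a real run, and the names LINE_RE, Call and parse_line are my own. Note what the parsed record contains: the call name, its arguments and its result, and nothing about what the application was doing when it made the call.

```python
import re
from typing import NamedTuple, Optional

# Matches lines of the form: name(args) = ret [ERRNO (description)]
# Assumption: decimal return values only; lines with hex or "?" results are skipped.
LINE_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>-?\d+)"
    r"(?:\s+(?P<err>E[A-Z0-9]+)\b.*)?$"
)


class Call(NamedTuple):
    name: str             # system call name, e.g. "connect"
    args: str             # raw argument text, left unparsed
    ret: int              # return value as printed by strace
    errno: Optional[str]  # symbolic errno such as "EINPROGRESS", or None


def parse_line(line: str) -> Optional[Call]:
    """Return a Call for a system call line, or None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Call(m["name"], m["args"], int(m["ret"]), m["err"])


# Illustrative input in the style of `strace -e trace=network` output.
SAMPLE = """\
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("192.0.2.10")}, 16) = -1 EINPROGRESS (Operation now in progress)
+++ exited with 0 +++
"""

calls = [c for c in map(parse_line, SAMPLE.splitlines()) if c is not None]
for c in calls:
    print(c.name, c.ret, c.errno)
```

This is enough to count calls or spot failures, but any application-level context (which request triggered the connect, for example) would have to come from somewhere else.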

In the next set of posts I will walk through a complete solution to the above problem using the Linux LD_PRELOAD environment variable.