Friday, May 12, 2017

Visualizing Network Traversal

As part of an experiment I was doing related to network traversal I started to trying to visualize coverage of a network by a single visitor traveling through the nodes. The idea was to set a gradient between two colors and as a node was visited more times it would slowly take on a change in color over the gradient. If the goal was to visit each node N times, then the visualization would indicate current progress as a 'heat map' of the network according to the number of visits to each node.

The outputs were interesting in that controlling the traversal policy would change the distribution of color over the image. This is perhaps obvious given the experimental construct but still provided an alternate mechanism for 'measuring' the accuracy of the traversal. For example, if the goal is to cover the network as evenly as possible during the traversal, the kurtosis of a distribution then becomes a measure of how close one is to this goal.

To demonstration this I set up an NxN grid network and ran three example traversals over that grid. I captured the 'heat map' when the median of all the visit values for the nodes was at half of the target visits. The results are below.

Random Choice Walk:
At any node, a visitor will choose to move up, down, left, or right by random selection.

Backpressure Walk:
Uses a notion of backpressure to control the traversal. At each node, a visitor checks for the the least visited neighboring node and travels there.

Not really a traversal, I just did it to see what it would look like. At each node, a visitor chooses a random node within the grid and jumps there directly.

Sunday, October 2, 2016

Water Bleed Image Effect

Recently, I had the occasion to play with some features of the amazingly powerful ImageMagick image manipulation tool. After playing with several of the command-line tools and manipulations I decided to look into the API provided through the MagickWand C interface to ImageMagick. To explore some of the features available at this layer I'm attempting to develop a bleed effect on an image; similar to if you had dipped a picture in water and then hung it up to dry. I'm hoping that my free time permits and I can turn this into a small series of posts detailing how I refine this tool (if I can eventually call it that).

Perhaps in a later post I'll get into what the code looks like and a dialog of what each part does but for now I'll just put up some of the early results and learning points that I've encountered. Ultimately, I'll get the code up on github and follow along with that here.

I'm using two images for this exercise: one that contains a rainbow of colors to test and another to represent the effect on a more likely picture.

Initially, I looked at just 'dragging' each color down the image while increasing the normalization scale along the vertical to give a wash out effect further down the image. This represents how the water would continue to wash color away as it 'pulled' higher colors down the hanging image. This worked well for the rainbow version of the image:

That result has a good 'bleed' in that stronger colors last longer down the vertical. However, with the colors in the house image the washout was too aggressive and wiped out much of the original content.

If you ignore the washout, though, it is clear that the draw of colors down the image is visible (especially around the windows) meaning that reducing the washout might still produce a more realistic effect if enough of the original image could be preserved.

With that in mind, I added the ability to preserve a percentage of the original image while applying the effect. This allows a user to tune the washout per image to get just the right look. For example, too much reuse (20%) on the house and the washout effect has little impact:

But taking that down to 7.5% gets closer to what I am looking for:

There is still plenty for me to do such as handling the edges and water line better and further manipulating the washed image. For example, adding additional transforms to the result to get more seep/bleed on the washed colors. The image below shows a simple kernel convolution to add blur to the pencils, for example.

Friday, May 6, 2016

Derive formula to convert Fahrenheit to Celsius

I had been revisiting linear regression the other day and as part of that review I challenged myself to use regression to derive a well known formula without actually looking it up (the formula, that is).

The first example that came to my mind was the formula for converting temperature in Fahrenheit to Celsius. I wanted to see if I could derive that formula using two sample data sets and a simple linear regression. If the data was accurate enough, I should be able to derive the exact equation for converting between the two formats. In essence, I wanted to be able to come to the following:

C = (F - 32) * 5/9

Since I didn't have a data set with both types of observations available I was faced with a little 'chicken or the egg' situation. Seeing how this is just a fun little exercise I generated my own data introducing some artificial error to stand in for true observations.

After the 'observations' were available the regression was as simple as loading the data into R and running lm. I ran through the entire manual procedure of how this works in a previous post so wont repeat it here. The result of calling lm is a list and one of the elements of that list is the coefficients - these represent the intercept and slope of:

y = mx + b

Since the Celsius observations are the response in my formula and the Fahrenheit observations are the predictors the I can create a similar equation where y represents the Celsius values and x represents the Fahrenheit values. Given that, I get the following (after plugging in the slope and intercept):

C = 0.555547 * F - 17.772318

Expanding the original equation for converting between Fahrenheit and Celsius yields:

C = (F * 5/9) - (32 * 5/9)
C = F * 0.555556 - 17.777778

So, given observations in both Celsius and Fahrenheit (for the same events, of course) it is possible to derive an equation to convert between the two using linear regression.

My observations are very highly correlated. Obviously, as this correlation falls the accuracy of the resulting equation will suffer. Fortunately there are tools to measure the correlation which helps quantify this accuracy.

You can find the code for this exercise on github.

Monday, December 1, 2014


I have written before about some of the differences between Ruby and Python and my quirks generally tend to the Ruby approach. I think readability is another dimension between the two languages that highlights this for me - especially as it applies to understanding new code.

I prefer to read code from left-to-right (LTR), top-to-bottom. This is natural for me as it models how I read other text. Code that processes right-to-left (RTL) and, in the severe case, bottom-to-top challenges my ability to easily understand intent. Method chaining highlights this quite nicely. For example, to transform the numbers of a list stored as a string in Python one might write:
','.join(map(lambda x : str(int(x) ** 2), "1,2,3,4".split(',')))
If I am reading that for the first time I need to mentally maintain a stack of operations (join, map, lambda) until I've parsed most of the statement and arrived at the object being operated on: "1,2,3,4". This is due to the RTL application of the code. I've then got to backtrack over each operation on my mental stack to understand what the type and/or result of the overall statement will be. This is complicated by the fact that Python allows for some LTR ("1,2,3,4".split(',')) mixed with the RTL.

For first-time readers of the language this process is even more difficult if the behavior of join or map are not yet well understood.

Ruby makes this significantly easier.
"1,2,3,4".split(',').map { |x| x.to_i ** 2 }.join(',')
When I read code similar to that I can store the type and result as I am parsing the statement. The initial object is immediately available and I can read the expression LTR as: split a string, apply a function to each element of the resulting array, and join that final array with the comma character. The fact that Ruby supports method chaining (on the built-in types) makes for much more readable code.

I've singled out Python above but that was only for the sake of the example. As far as RTL languages go I think Python is middle of the road. Haskell, for example, has a much nicer syntax to deal with function composition (a similar, but not identical situation). On the other end of the spectrum is Lisp which is basically a bottom-to-top, RTL language.

I can (and have) used these languages and many more; RTL vs. LTR in no way prevents one from being proficient over time. Certainly, most RTL code can be written in a way that it flows mostly LTR, top-to-bottom. Even when it isn't, well-written code can read by anyone with enough practice. For newcomers looking to read a new language however, there is less difficulty when the process more closely models how they read in general.

Tuesday, November 25, 2014

Order of Events

inet_ntoa uses static storage for the result it returns. The GNU implementation of inet_ntoa uses the following internally:

static __thread char buffer[18];

This makes the function thread-safe but this safety does not remove the need to worry about use within a single thread. Consider the following snippet of code:

#include <arpa/inet.h>
#include <stdio.h>
int main () {
    struct in_addr a = { .s_addr = 1234567 }, b = { .s_addr = 7654321 };
    return printf ("%s : %s\n", inet_ntoa (a), inet_ntoa (b));

Since inet_ntoa is used twice within the argument list the result is dependent on the order of evaluation of the arguments to printf. Regardless of which call gets evaluated first the output will always print the same IP address twice. On my system, the result is: :

This is a result of two things: arguments are evaluated before their results are used by printf; and inet_ntoa overwrites static storage on each invocation. Looking at the instructions for this C code makes this clear:

        .string "%s : %s\n"
        .globl  main
        .type   main, @function
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ebx
        andl    $-16, %esp
        subl    $32, %esp
        movl    $1234567, 24(%esp)
        movl    $7654321, 28(%esp)
        movl    28(%esp), %eax
        movl    %eax, (%esp)
        call    inet_ntoa
        movl    %eax, %ebx        ; pointer stored to static memory
        movl    24(%esp), %eax
        movl    %eax, (%esp)
        call    inet_ntoa
        movl    $.LC0, %edx
        movl    %ebx, 8(%esp)     ; arg2 to printf; pointer from above
        movl    %eax, 4(%esp)     ; arg1 to printf; new pointer, same static memory
        movl    %edx, (%esp)      ; arg0 (format string)
        call    printf
         movl    -4(%ebp), %ebx

The correct way to call inet_ntoa consecutively is to save each result to a local variable.

#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
int main () {
    struct in_addr a = { .s_addr = 1234567 }, b = { .s_addr = 7654321 };
    char ipa[18] = { 0 }, ipb[18] = { 0 };
    strcpy (ipa, inet_ntoa (a));
    strcpy (ipb, inet_ntoa (b));
    return printf ("%s : %s\n", ipa, ipb);

Sunday, October 5, 2014

Automating keystrokes via evdev

In a previous post I talked about how to capture keys out from under the X11 windowing system by reading from /dev/input/eventX. These character devices can also be useful to generate input simulating keyboard activity.

I circled back to this topic after having to automate user keyboard activity. I've accomplished similar tasks in the past with a tool named xdotool - unfortunately, in this case I did not have the luxury of being able to install software. The remainder of this post highlights the differences between consuming and producing events. (By the way, If you have the need to automate X actions I highly suggest looking at what xdotool can do for you.)

Consuming events is the easier of the two tasks: you simply read open the device and read events into the following structure:

/* See: /usr/include/linux/input.h */
struct input_event {
    struct timeval time;
    __u16 type;
    __u16 code;
    __s32 value;

Filter input with type == 1 and read the code to get the key and value to get the event (eg. press, release).

To produce a compliant event the process is a little more complicated since the input needs to be synchronized. For each event there are three distinct sets of data that are required: setup (EV_MSC); the event (EV_KEY); and event synchronize (EV_SYN). In addition to that, certain events are captured over time so this is a stateful process. An example of this is pressing Ctrl-L; the control key is held down while another key is pressed and then released.

The easiest way I found to initially grok the protocol is to capture all events while there is keyboard activity and see what the output looks like. Obviously, to produce fully compliant input you should consult API documentation or source code.

An example of automatically entering a URL in the Chrome browser (Ctrl-L [URL]) would require the following inputs (the type, code, and value members of struct input_event). The input goes to the focused window (the standard behavior for X) so you need to place focus on the Chrome window for the following example.

4,  4, 29  # Setup  
1, 29,  1  # Press Ctrl key
0,  0,  0  # Sync
4,  4, 29  # Setup  
1, 29,  2  # Ctrl (value == 2 -> autorepeat)
0,  0,  0  # Sync
4,  4, 38
1, 38,  1  # Press 'L' key
0,  0,  0
4,  4, 38
1, 38,  0  # Release 'L' key
0,  0,  0
4,  4, 29
1, 29,  0  # Release Ctrl key
0,  0,  0
# and so on for the URL string 

4,  4, 28
1, 28,  1  # Press Enter key
0,  0,  0
4,  4, 28
1, 28,  0  # Release Enter key
0,  0,  0 

Monday, August 4, 2014

Blocking ptrace

I've had occasion to change the functionality of binary programs for a variety of purposes - mostly to instrument for debugging or logging purposes. The techniques used to do this vary but can be used for both passive monitoring or actively changing the functionality of a program. I'd like to consider one of those techniques (ptrace) in a little more detail here - specifically the ability to stop and arbitrarily modify a running process (think gdb).

I'm going to walk through a few examples of how to prevent a ptrace-based approach to modifying a program. For illustrative purposes I'll use the following sample program that maintains a global variable to influence control flow at run time.

long global_flag = 1;

int main () {
    while (global_flag) {
        fprintf (stderr, "Running ...\n");
        sleep (5);
    fprintf (stderr, "Someone captured my flag!\n");
    return 0;

The goal in these examples is to prevent the global variable (global_flag) from being modified from an external process. I'm going to step through a few methods that could be used to modify this variable and how to prevent these techniques in turn.

First, I'll look to just overwrite the value directly. Since we can look at the symbols it is trivial to construct a program that will place data into the memory of our choosing within the running process using ptrace. Obviously, this case is easier than would be for most programs due to the simplicity of the example. The approach holds, however, regardless of the scale of the actual process.

Suppose our process is PID 11896; we can find the memory location to modify using nm

08048410 t frame_dummy
         U fwrite@@GLIBC_2.0   
0804a018 D global_flag
08048434 T main
         U sleep@@GLIBC_2.0    

If you don't have the program available you can still get at the symbols by looking in /proc (e.g. nm /proc/11896/exe).

The program I'm using to change memory in a particular process:

#include <sys/ptrace.h>
#include <stdio.h>
#include <stdlib.h>
#include <libgen.h>
#include <string.h>
#include <errno.h>
void usage (char * prog) {
    fprintf (stderr, "USAGE: %s <pid> <addr> <value>\n", basename (prog));
    fprintf (stderr, "-------------------------\n");
    fprintf (stderr, " pid      Process to modify\n");
    fprintf (stderr, " addr     Address to change\n");             
    fprintf (stderr, " value    Value to write\n");
    exit (42);
int main (int argc, char **argv) {
    pid_t pid = 0;
    unsigned long addr = 0;
    long value = 0, old_value = 0;                  
    if (4 != argc) { usage (argv[0]); }

    pid = strtol (argv[1], NULL, 10);
    addr = strtol (argv[2], NULL, 16);
    value = strtol (argv[3], NULL, 10);

    if (ptrace (PTRACE_ATTACH, pid, 0, 0)) {
        fprintf (stderr, "Unable to attach to PID: %d (%s)\n", 
                pid, strerror (errno));
        return 1;

    old_value = ptrace (PTRACE_PEEKDATA, pid, addr, 0);
    fprintf (stderr, "Original value: %ld\n", old_value);
    if (ptrace (PTRACE_POKEDATA, pid, addr, value)) {
        fprintf (stderr, "Unable to overwrite data @ 0x%lx (%s)\n",
                addr, strerror (errno));
        ptrace (PTRACE_DETACH, pid, 0, 0);
        return 1;

    ptrace (PTRACE_DETACH, pid, 0, 0);
    return 0;

Considering the output of nm and the PID, I'll call that as follows:

   ./modify 11896 0804a018 0

Then, in the terminal running the original process, you see the output "Someone captured my flag!" and the process ends.
To prevent the above result, we need to prevent ptrace from attaching to our running process. We can use ptrace against itself within our program to achieve this goal. Since a process can only be traced by a single process at a time we can immediately set to trace ourselves when the program starts. The new program looks like this:

long global_flag = 1;

int main () {
    ptrace (PTRACE_TRACEME, 0, 0, 0);
    while (global_flag) {
        fprintf (stderr, "Running ...\n");
        sleep (5);
    fprintf (stderr, "Someone captured my flag!\n");
    return 0;

Now, when we try to connect to the process at run time we get an error from ptrace. This is true for any process that attempts to use ptrace to this end (e.g. strace will report: "Unable to attach to PID: 11940 (Operation not permitted)"). Notice that this is also the case when trying to attach to the process as root.

Note for Ubuntu users: it is now the default behavior to prevent attaching to a process unless it is a direct child of the tracing process. The root user can still attach to arbitrary processes but other users are restricted (see /etc/sysctl.d/10-ptrace.conf or man prctl).

Unfortunately, that does not entirely solve the problem. If, instead of having the running process, a user can spawn the process within a debugger the above mechanism can still be defeated. Consider the following example.

[ezpz@mercury (ptrace)]$ gdb prevent_2 
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
(gdb) b main
Breakpoint 1 at 0x8048467      
(gdb) r
Starting program: prevent_2    
Breakpoint 1, 0x08048467 in main ()
(gdb) set {int}0x0804a01c = 0  
(gdb) c
Someone captured my flag!      
[Inferior 1 (process 12130) exited normally]

Since gdb can set a breakpoint at main control can be gained (by the debugger) prior to being able to self-trace. This situation can be identified from within the traced program, however, by looking at the return value of the call to ptrace.

--- prevent_2.c    2014-08-02 23:33:03.091366946 -0400 
+++ prevent_3.c    2014-08-02 23:33:06.939366991 -0400 
@@ -5,7 +5,10 @@
 long global_flag = 1;
 int main () {
-    ptrace (PTRACE_TRACEME, 0, 0, 0);
+    if (0 != ptrace (PTRACE_TRACEME, 0, 0, 0)) {
+        fprintf (stderr, "Tsk tsk tsk...");
+        return 1;
+    }
     while (global_flag) {     
         fprintf (stderr, "Running ...\n");
         sleep (5);

Now gdb can set the breakpoint and modify the memory but when execution continues the program will exit when the call to ptrace (from within gdb) fails.

The observant reader will realize that, from within the debugger, the return value check can also be modified. In fact, nothing prevents someone from directly modifying the binary prior to running the program. There are a variety of mechanisms - both static and dynamic - that can get around the above methods. Some can be prevented; others not. What these mechanisms do provide is a relatively cheap investment that raises the bar when trying to dynamically change program behavior.