Thursday, June 27, 2013

Walk this way

I recently found a handy mechanism for walking a directory tree in Linux. I
used to do this by writing my own recursive directory walker on top of the
facilities in dirent.h. Something similar to:

#include <stdio.h>
#include <string.h>
#include <dirent.h>

void reclist (const char* dirname) {

    DIR* dir = opendir (dirname);
    struct dirent* entry = 0;
    char name[1024] = {0};

    if (! dir) { return; }

    entry = readdir (dir);
    while (entry) {
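        /* skip only the '.' and '..' entries, not every hidden file */
        if (strcmp (entry->d_name, ".") && strcmp (entry->d_name, "..")) {
            /* build the full path once so files and directories both use it */
            snprintf (name, sizeof name, "%s/%s", dirname, entry->d_name);
            switch (entry->d_type) {
                case DT_REG:
                    printf ("%s\n", name);
                    break;
                case DT_DIR:
                    reclist (name);
                    break;
            }
        }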
        }
        entry = readdir (dir);
    }
    closedir (dir);
}

int main(int argc, char** argv) {
    const char * dir = ".";
    if (argc == 2) { dir = argv[1]; }
    reclist (dir);
    return 0;
}

While that does work, it is rather verbose (especially once you get used to
environments like Ruby and Python). It turns out that ftw.h provides a more
concise way to do the same thing, and it manages the little details like
skipping '.' and '..' and building the current path string for you. Here is
the equivalent of the above:

#include <stdio.h>
#include <ftw.h>

int handle_entry (const char *entry, const struct stat *sb, int type) {
    if (type == FTW_F) {
        printf("%s\n", entry);
    }
    return 0;
}

int main(void) {
    /* 10 is an upper bound on the file descriptors ftw() may keep open */
    ftw(".", handle_entry, 10);
    return 0;
}

I also like the fact that a callback is used to operate on each of the files
found. It makes managing changes much easier as the tree walking is separated
from the code that handles the logic associated with inspecting the files.
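
As a rough illustration of that separation, here is a sketch that reuses the same walk but swaps in a different callback to total the size of the regular files. The names add_size, tree_size and total_bytes are made up for this example, and the global is only there because ftw() gives the callback no user-data argument.

```c
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>

/* ftw() passes no user data to the callback, so the running total
 * lives in a file-scope variable (hypothetical name). */
static long long total_bytes = 0;

/* Callback: add the size of every regular file to the running total. */
static int add_size (const char *path, const struct stat *sb, int type) {
    (void) path;                     /* the path is not needed here */
    if (type == FTW_F) {
        total_bytes += sb->st_size;
    }
    return 0;                        /* 0 tells ftw() to keep walking */
}

/* Walk root and return the combined size of its regular files,
 * or -1 if the walk could not be started or failed. */
long long tree_size (const char *root) {
    assert (root != NULL);
    total_bytes = 0;
    if (ftw (root, add_size, 10) != 0) {
        perror ("ftw");
        return -1;
    }
    return total_bytes;
}
```

Calling tree_size (".") from main gives the total for the current directory. The walking code is untouched; only the callback changed.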

Sunday, June 23, 2013

That's the key

A while back I came across a post on Stephen Wolfram's blog where he presented the personal analytics of his life. As part of this post, there is a plot showing the keystroke activity of his life over the last 10 years. I want to ignore the resolve needed to conduct such an experiment for a moment and consider how he might have set something like that up.

[Update: see the corollary to this post - generating keyboard events - here]

I'm interested in data. I have a few logs of things I do on a daily basis but they are all collected proactively - I write entries into these logs in order to keep them current. I want to set up something similar to this key logger to automate this process for me. I'll mostly ignore that this is a potential security risk in that I will be capturing all keystrokes on the computer - including username and password information. To partially mitigate this I won't store the key information; I'll only keep the time the event occurred. This limits the amount of information in my database - I won't be able to see how my distribution of characters matches that of commonly used data, for instance - but it saves me from having to worry about how and where I store this information. Stephen Wolfram's post includes details about the actual keys, so if my data starts to look interesting perhaps I'll transition to keeping that information as well.

I run Linux so I figured this would be rather straightforward: somehow hook into the X windowing subsystem and register for all keyboard events. Unfortunately, such an approach is not directly possible using Xlib (depending on which Stack Overflow answer you read, it may not be possible at all). It turns out that it is rather difficult to ask X to just 'give me everything.' Events, as it were, are destined for a particular location (read: window) and asking for other windows' events doesn't make much sense in the general case. I had hoped there would be something akin to a callback list for registered components that I would be able to insert an entry into. Xlib is not designed that way (at least not in any documentation I can find).

To avoid having to hack the X window event delivery system I started to look at how these events are realized by X itself. In the guts of the device initialization configuration there is something similar to the following:

Section "InputClass"
    Identifier "evdev keyboard catchall"
    MatchIsKeyboard "on"
    MatchDevicePath "/dev/input/event*"
    Driver "evdev"
EndSection

which is using one of the /dev/input/event* devices. These are character devices set up by evdev to handle generic input events from a variety of sources: joysticks, mice, keyboards, and so on. One nice thing about these devices is they can be opened and read from as if they were regular files. So, if I can figure out which of the /dev/input/event* devices corresponds to the keyboard I should have access to the events that X is handing off to the child windows.

It turns out that two directories exist to facilitate this type of search: /dev/input/by-id/ and /dev/input/by-path/. Searching either of them for something like *-kbd turns up the symlink for the keyboard device (if you have multiple keyboards attached you will need to disambiguate further). For example, my /dev/input/by-path/ contains the following:

pci-0000:00:04.0-event-mouse -> ../event4
pci-0000:00:06.0-usb-0:1:1.0-event-mouse -> ../event3
pci-0000:00:06.0-usb-0:1:1.0-mouse -> ../js0
platform-i8042-serio-0-event-kbd -> ../event2
platform-i8042-serio-1-event-mouse -> ../event5
platform-i8042-serio-1-mouse -> ../mouse1
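
That search can be automated. The following is only a sketch: find_keyboard_link and ends_with are names invented for this example, and it assumes the keyboard entry ends in "-event-kbd" as in the listing above. It returns the first match it sees, so it does nothing to disambiguate multiple keyboards.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if name ends with suffix, 0 otherwise. */
static int ends_with (const char *name, const char *suffix) {
    size_t n = strlen (name);
    size_t s = strlen (suffix);
    return n >= s && strcmp (name + n - s, suffix) == 0;
}

/* Hypothetical helper: scan dir for the first entry whose name ends in
 * "-event-kbd" and write "dir/name" into out. Returns 0 on success and
 * -1 if dir cannot be opened, nothing matches, or the path does not fit. */
int find_keyboard_link (const char *dir, char *out, size_t outlen) {
    DIR *d;
    struct dirent *entry;
    int found = -1;

    assert (out != NULL && outlen > 0);

    d = opendir (dir);
    if (! d) { return -1; }

    while (found != 0 && (entry = readdir (d)) != NULL) {
        if (ends_with (entry->d_name, "-event-kbd")) {
            int n = snprintf (out, outlen, "%s/%s", dir, entry->d_name);
            if (n > 0 && (size_t) n < outlen) { found = 0; }
        }
    }
    closedir (d);
    return found;
}
```

Called with "/dev/input/by-path", this should hand back the path of the symlink, which can then be opened directly or passed to realpath() to see the eventN device it points at.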

According to that listing (and some mappings provided in /usr/include/linux/input.h), I can now collect all keystrokes typed on my machine from /dev/input/event2 without having to devise a way to convince X to hand them over.
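
To make that concrete, here is a minimal sketch of the reading side. It is not a finished logger: log_key_press_times is a name made up for this example, it assumes the device delivers whole struct input_event records as defined in linux/input.h, and it assumes a platform where the timestamp is exposed as ev.time (the usual case on 64-bit Linux). In keeping with the plan above, it records only when a key went down and throws the key code away.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>
#include <linux/input.h>

/* Hypothetical helper: read input events from fd until EOF, an error, or
 * a short read, and write one "seconds.microseconds" line to out for
 * each key press. The key code itself is never stored.
 * Returns the number of key presses logged. */
long log_key_press_times (int fd, FILE *out) {
    struct input_event ev;
    long presses = 0;

    assert (out != NULL);

    while (read (fd, &ev, sizeof ev) == (ssize_t) sizeof ev) {
        /* For EV_KEY events: value 1 = press, 0 = release, 2 = auto-repeat */
        if (ev.type == EV_KEY && ev.value == 1) {
            fprintf (out, "%ld.%06ld\n",
                     (long) ev.time.tv_sec, (long) ev.time.tv_usec);
            presses++;
        }
    }
    return presses;
}
```

To try it, open the device with open ("/dev/input/event2", O_RDONLY) and pass the descriptor along with stdout or a log file. Reading these devices normally requires root (or membership in whatever group owns them), and the event number will differ from machine to machine.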