tag:blogger.com,1999:blog-21913643613959635462024-02-18T18:39:49.563-05:00uu kkezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.comBlogger66125tag:blogger.com,1999:blog-2191364361395963546.post-36228890572741434092023-06-23T16:06:00.001-04:002023-06-28T11:25:57.550-04:00ASCII Art<pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">I've been looking at ASCII art recently - at least in the context of
converting images into ASCII-rendered versions of the original. This has
resulted in a few simple projects including one where I rendered a webcam
video feed as ASCII in a terminal. One of the challenges I faced in that
project was finding the correct size of the font to get reasonable results; a
standard 80 x 24 terminal left a lot to be desired in terms of the final text
rendering of a video frame. It was easy enough to simply resize the
font/terminal to get what I wanted but it wasn't a very elegant solution and
didn't translate very well to other projects (where the render target was not
a terminal, for example).
To address that limitation I ended up writing a dynamic font 'renderer'[1] -
this post describes some aspects of that project.</pre><table>
<tbody><tr>
<td>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioiQuapii3os1O8OSO0FkBNEWNwmfQHsf4fpgCRr0H5fPHP6O8LsY51zBIy9vFK7RB6E32Fb8LH9RAMwBQKWDAW5QHWAbQujCM73n89_r_RDMKQPe_fzruWaSwKMGQb2VLBk0UnaTb18fdu7E0NkEJ99yT_Ex8mgU8Fa9Eekbk1EO4sbkfutaFYkyjuE8M/s2598/wave_orig.jpg" style="display: block; padding: 1em 0px; text-align: center;"></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioiQuapii3os1O8OSO0FkBNEWNwmfQHsf4fpgCRr0H5fPHP6O8LsY51zBIy9vFK7RB6E32Fb8LH9RAMwBQKWDAW5QHWAbQujCM73n89_r_RDMKQPe_fzruWaSwKMGQb2VLBk0UnaTb18fdu7E0NkEJ99yT_Ex8mgU8Fa9Eekbk1EO4sbkfutaFYkyjuE8M/s2598/wave_orig.jpg" style="display: inline; padding: 1em 0px;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEioiQuapii3os1O8OSO0FkBNEWNwmfQHsf4fpgCRr0H5fPHP6O8LsY51zBIy9vFK7RB6E32Fb8LH9RAMwBQKWDAW5QHWAbQujCM73n89_r_RDMKQPe_fzruWaSwKMGQb2VLBk0UnaTb18fdu7E0NkEJ99yT_Ex8mgU8Fa9Eekbk1EO4sbkfutaFYkyjuE8M/s200/wave_orig.jpg" width="200" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5FouP1QyHFgaC-jRh8LO1flQ2X9Xr8IvMs6dSkeaEY5XpQyV1Bb5YNkTdkAwASORhqPU-bRfjRME5Q6YwiMsdCPx0UG5yCW3dkhlQrWg2daaDb9GB-mvkQZCG_agBEEevmu_Fl9RWoy0W2uijsbTWkA8yWrZCCMkMvmCbLQdvuuV4qCIuzDdemEfZvrB2/s2598/wave-thresh.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5FouP1QyHFgaC-jRh8LO1flQ2X9Xr8IvMs6dSkeaEY5XpQyV1Bb5YNkTdkAwASORhqPU-bRfjRME5Q6YwiMsdCPx0UG5yCW3dkhlQrWg2daaDb9GB-mvkQZCG_agBEEevmu_Fl9RWoy0W2uijsbTWkA8yWrZCCMkMvmCbLQdvuuV4qCIuzDdemEfZvrB2/s200/wave-thresh.jpg" width="200" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyeAj8wNNn0-NxfnvX5UyJY8OXYCc0VFUkjWFyk07ZceYWFfw8NdEBF2w2-fJtdWJgf5PNARZW5Gk6ODNIFAxGInElx6GxqQJ-TgevX-9vmygkWHihUiK24Fo3SvOVdnMA7q2s1DmQwfwnSQBGLP08EBr-1VJ99T6Wq1AnSjrLrxtEuhWMxGN-EQnhn08k/s2598/wave-target.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgyeAj8wNNn0-NxfnvX5UyJY8OXYCc0VFUkjWFyk07ZceYWFfw8NdEBF2w2-fJtdWJgf5PNARZW5Gk6ODNIFAxGInElx6GxqQJ-TgevX-9vmygkWHihUiK24Fo3SvOVdnMA7q2s1DmQwfwnSQBGLP08EBr-1VJ99T6Wq1AnSjrLrxtEuhWMxGN-EQnhn08k/s200/wave-target.jpg" width="200" /></a></div>
</td>
</tr>
</tbody></table><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">The interesting parts (to me) include the font rendering/scaling,
the tiling of the image parts, and the interesting combinations of rendering.</pre><h2 style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt; text-align: left;">Dealing With Fonts</h2><div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">This was mainly familiarizing myself with the FreeType font library and how
to load and render fonts in memory. I could have just taken one of the many
already existing ASCII-to-grayscale mappings but I wanted the ability to
provide a few additional features that required more than that (e.g. custom
character sets and non-fixed-width fonts).
So, rather than use an fixed set of characters I dynamically compute
the average brightness of a particular character set (for a given font) and
use that to select glyphs to replace image pixels.
In addition to which font and glyphs to use I also dynamically scale
the glyphs used based on properties of each region of an image. The FreeType
font library supports this directly allowing me to render a glyph at whatever
size I specify; I can then 'copy' those pixels directly to the translated image.</pre><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;"><br /></pre><h2 style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt; text-align: left;">Image Tiling</h2></div><div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">I used a windowing mechanism to select the size of the font to use. Given an
image, I subdivide the image into N rows and M columns and in each N,M cell
select a font size (more on <i>how</i> a size is selected later).
Then, within that cell, compute the properties of the region for each character
glyph and translate that source image region to an ASCII character that most
closely matches the average greyscale value of that region.</pre></div>
<table>
<tbody><tr>
<td>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggFOKmdZ8Pfpwg9Fflve1HFe-duxrtCuySQzkdTJ9Gk8XxFp0ITeMgVIOvEaJXjusjRJXCnq_DqNmXVyC1RpY54Q4WVXEDZgLkDWRFTUEr31OVpJi_B0aZgd6XA0Luxyi6_maQr7RSeZg3Lea-Qeo_R5fMz29ulxKkwDq2eehA895y2pSxMS79QELvGd1a/s843/tiling-01.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="669" data-original-width="843" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEggFOKmdZ8Pfpwg9Fflve1HFe-duxrtCuySQzkdTJ9Gk8XxFp0ITeMgVIOvEaJXjusjRJXCnq_DqNmXVyC1RpY54Q4WVXEDZgLkDWRFTUEr31OVpJi_B0aZgd6XA0Luxyi6_maQr7RSeZg3Lea-Qeo_R5fMz29ulxKkwDq2eehA895y2pSxMS79QELvGd1a/s200/tiling-01.jpg" width="200" /></a></div>
</td>
<td>
<div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjadqgu7K_6BO0UhtiM3BK3r9fTqpezsXVpWdd3ONDAn1xl01AUKXLkUAzDLpy23PhBZAW7QbbWOXtUozlHh2MflNNrwaP63MiltYYICOYZ6g43EuM-7kaC6r-KNHVCUycdAPdL7YOJWEjsv3I6d0vXbQBAiNavelV5GQ_dmdwwvZ6Vq2QWpCEtP53_zN3T/s741/tiling-02.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="488" data-original-width="741" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjadqgu7K_6BO0UhtiM3BK3r9fTqpezsXVpWdd3ONDAn1xl01AUKXLkUAzDLpy23PhBZAW7QbbWOXtUozlHh2MflNNrwaP63MiltYYICOYZ6g43EuM-7kaC6r-KNHVCUycdAPdL7YOJWEjsv3I6d0vXbQBAiNavelV5GQ_dmdwwvZ6Vq2QWpCEtP53_zN3T/s200/tiling-02.jpg" width="200" /></a></div>
</td>
</tr>
</tbody></table><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">This nested tiling approach allows for selecting font sizes in any number of
ways. For example, selecting a random font size within each N,M cell of the
image results in the following.</pre><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibtWCadWC2VAN0CVeb7xU5zWzuEH2wsrQ7L2r1RWnAKMvIicxsLiV_jbfQddOuuEuNEvpJZBytaOz_AmjgKQG9JRjrQAH5X8XwQsYcCVzee5n8YhxPoVxDUV356olcz-6MiBq7chRbVC-am8dNkHoNhSoPPIUtieTL_dj57CLdYr1khkN9ZwCe5Uw5O6-7/s2598/wave-random.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2047" data-original-width="2598" height="252" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibtWCadWC2VAN0CVeb7xU5zWzuEH2wsrQ7L2r1RWnAKMvIicxsLiV_jbfQddOuuEuNEvpJZBytaOz_AmjgKQG9JRjrQAH5X8XwQsYcCVzee5n8YhxPoVxDUV356olcz-6MiBq7chRbVC-am8dNkHoNhSoPPIUtieTL_dj57CLdYr1khkN9ZwCe5Uw5O6-7/s320/wave-random.jpg" width="320" /></a></div><br /><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">Zooming in on a section of that images highlights a few things about this
approach:</pre><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;"><ul style="text-align: left;"><li>there is flexibility of rendering each region in isolation</li><li>clipping at the edges of regions will start to produce artifacts in the
output image due to discontinuities between the glyphs</li><li>as you approach a glyph size of 1x1 you approach a pixel-level copy of
the original image region</li></ul></pre><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMbydEBPSIbBGfI2j9JHIiw4cFHQPZJW5qHz2Mo7TWMwOXN1xztTnKcg44OYK8ODZLMI4na3ze7GADolQjZL0Ct9dgsXMaT_S8GqYmqjNFcKl6RVr-Ucpyde8ToPni0X1LAbyXMuwJliTAv64EPCHFqKfUx3suRnRe3nCHZksWE3Yt2cUx-hvLf7_7ileb/s960/wave_zoom.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="775" data-original-width="960" height="258" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiMbydEBPSIbBGfI2j9JHIiw4cFHQPZJW5qHz2Mo7TWMwOXN1xztTnKcg44OYK8ODZLMI4na3ze7GADolQjZL0Ct9dgsXMaT_S8GqYmqjNFcKl6RVr-Ucpyde8ToPni0X1LAbyXMuwJliTAv64EPCHFqKfUx3suRnRe3nCHZksWE3Yt2cUx-hvLf7_7ileb/s320/wave_zoom.jpg" width="320" /></a></div><br /><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">Of course, just like the impetus for this project, there is something left to
be desired about the resolution of the final image. Select a size too large
and there is loss of detail; too small and the 'ascii' effect is dimished.
So I started experimenting with ways to quantify the amount of resolution
in a section of the image and how to translate that to font size.</pre><h2 style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt; text-align: left;">Rendering</h2><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglvuvPPF3d19r3k4KM9gZLMjjPHj-ReLqZwxiiXgzO-Q_Ua088tePDqLofPSMBZP_0RhEsmAaPGBJFDmdpgyK5ONYVf0_nWH_IuKFYwgbvwpMxj-OtXC9Yt_JPA1g2BhiU4o02dAAAhfXbiU7Bo_qx5qt3l5K7E8cerWJsqJR0nfwbklGT5eOTJLK7x-Gy/s1167/kitten.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="780" data-original-width="1167" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEglvuvPPF3d19r3k4KM9gZLMjjPHj-ReLqZwxiiXgzO-Q_Ua088tePDqLofPSMBZP_0RhEsmAaPGBJFDmdpgyK5ONYVf0_nWH_IuKFYwgbvwpMxj-OtXC9Yt_JPA1g2BhiU4o02dAAAhfXbiU7Bo_qx5qt3l5K7E8cerWJsqJR0nfwbklGT5eOTJLK7x-Gy/s320/kitten.jpg" width="320" /></a></div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">Finding the right way to encode resolution turned out to be a bit of a rabbit
hole for me. I started with the idea of using the entropy of each window but
broadened my search to frequency- and thresholding-based techniques.</pre><h3 style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt; text-align: left;">RMS</h3><div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">A basic root mean square approach captured some of the contrast in the image
but it wasn't sufficient for what I wanted.</pre><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCtEFPwlreKkPGIo50Volz6rWir_LlPkV4jftB_ftAoVY_5s3lhm2GJTiEp2TOvodb1ajxD-v6GdywVLMfOsUCBU1VVfgze461_qhBhmBse1zBjLSxycda9Qdpp4_bLdeFsGMRjMMe7sHWCWgEJyEC4k4Eci_f_LdQuJfTlin72ASYRY_onhu3fUxiTGu-/s1167/kitten-rms.jpg" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="780" data-original-width="1167" height="214" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCtEFPwlreKkPGIo50Volz6rWir_LlPkV4jftB_ftAoVY_5s3lhm2GJTiEp2TOvodb1ajxD-v6GdywVLMfOsUCBU1VVfgze461_qhBhmBse1zBjLSxycda9Qdpp4_bLdeFsGMRjMMe7sHWCWgEJyEC4k4Eci_f_LdQuJfTlin72ASYRY_onhu3fUxiTGu-/s320/kitten-rms.jpg" width="320" /></a></div><br /><h3 style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt; text-align: left;">Frequency Domain</h3></div><div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">After a bit of research I came across a paper "Contrast in Complex Images" by
Eli Peli which discussed how the human eye perceives contrast and various ways
to compute that spatially across an image. It fit nicely into the notion of a
subdivided image and provided an equation to compute sub-cell contrast based on
high-frequency components in an image.</pre><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">Essentially this consists of taking the DFT of an image, running that through
a high-pass filter, and computing the inverse DFT of the filtered content.
This is closely related to edge detection but maintains more high-frequency
information (as you can see in the examples below).
This worked well for some images, but the residual information created some
noise for certain image types. Consider the two images: kitten and wave.</pre></div><div><br /></div>
<table>
<tbody><tr>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheBa6ateowvsI-848JXf5ygn1HPiqlNipi3yikBOz557wd1YBDfw-3phr-CFgOiVY7XvMbgbRMoLLUpK-RxWg66CiosBnboJPB9xGTjJ3QctJz5H_GiJb1F3BQUhKVLEAN8UYcgyM9Q0RipxSfI1jtDSGpSMkySJy8wUWJUMnuHohqIjt19kUZ95O9WA2M/s1167/kitten-fft.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="780" data-original-width="1167" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheBa6ateowvsI-848JXf5ygn1HPiqlNipi3yikBOz557wd1YBDfw-3phr-CFgOiVY7XvMbgbRMoLLUpK-RxWg66CiosBnboJPB9xGTjJ3QctJz5H_GiJb1F3BQUhKVLEAN8UYcgyM9Q0RipxSfI1jtDSGpSMkySJy8wUWJUMnuHohqIjt19kUZ95O9WA2M/s320/kitten-fft.jpg" width="320" /></a></div>
</td>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlvBG1tx59zltLNaSzNFA0sBg_alX3-oicddkl1RF-mzsSygKWM666J6clquVu0XH5bv8-IXGuy4eeuxekLoHk01oyWhTm3mOsBGZ9m9m0lBCVP7P_kBY_chOfODaRdZ9MBSuGiYPP4xayD-lJg88YUmrMQQnHFmRBRmKEps1NUbb3ve_i6aen4Uk3HOG6/s2598/wave-fft.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjlvBG1tx59zltLNaSzNFA0sBg_alX3-oicddkl1RF-mzsSygKWM666J6clquVu0XH5bv8-IXGuy4eeuxekLoHk01oyWhTm3mOsBGZ9m9m0lBCVP7P_kBY_chOfODaRdZ9MBSuGiYPP4xayD-lJg88YUmrMQQnHFmRBRmKEps1NUbb3ve_i6aen4Uk3HOG6/s320/wave-fft.jpg" width="320" /></a></div>
</td>
</tr>
</tbody></table><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">The contrast is easier to identify in the kitten but the wave is a bit more
challenging.</pre><h3 style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt; text-align: left;">Thresholding</h3><div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">Finally, I ended up using a tunable threshold technique where I convert the
image to B/W based on some threshold brightness and use the proximity to this
threshold value as an indicator of the amount of contrast in each window. This
ultimately ended up working fairly well for the effect I was looking for
preserving much of the data of the frequency-based approach but without the
residual noise.</pre></div>
<table>
<tbody><tr>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXTr6jk2IC6DENl2avoeIgfhn8VWQmF2B9b5qGP3gS47PvraVyXRbT7wUEWDH86WO3171u7WEA4CK9os0vV--oz8a9dITASY_6Ugp_d5AQnKhkvZv-ykLz2vBSZRBQFHP8nzIkU76TZY06ASglXbB8_sGQ_q5Zg9Wn6UZiOeJdt_b7fKR2_IsPazOATg1a/s2598/wave-fft.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXTr6jk2IC6DENl2avoeIgfhn8VWQmF2B9b5qGP3gS47PvraVyXRbT7wUEWDH86WO3171u7WEA4CK9os0vV--oz8a9dITASY_6Ugp_d5AQnKhkvZv-ykLz2vBSZRBQFHP8nzIkU76TZY06ASglXbB8_sGQ_q5Zg9Wn6UZiOeJdt_b7fKR2_IsPazOATg1a/s320/wave-fft.jpg" width="320" /></a></div>
</td>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLkjzO_h0arZWEax_8IoW1qD6eBcynduFij5aBJVRCP71osYC6K6v3kEXqi9oSOlivFqD5BEVyTJJtFJVoUCr_-4Qp3vE4DZd0ryJfoWTLi9iBTKzg0Zxh_WwZAlFFNYOtDy6JS2qRdIAjhmMC8DQdpT_1C3Yud4MMCuF0nOc_KW1afWIyAYO8pw-nICTR/s2598/wave-thresh.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgLkjzO_h0arZWEax_8IoW1qD6eBcynduFij5aBJVRCP71osYC6K6v3kEXqi9oSOlivFqD5BEVyTJJtFJVoUCr_-4Qp3vE4DZd0ryJfoWTLi9iBTKzg0Zxh_WwZAlFFNYOtDy6JS2qRdIAjhmMC8DQdpT_1C3Yud4MMCuF0nOc_KW1afWIyAYO8pw-nICTR/s320/wave-thresh.jpg" width="320" /></a></div>
</td>
</tr>
</tbody></table><br />
<table>
<tbody><tr>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN8qQDGI3FHvoJce6VUVaoctk0J1vMSLiiy5h111pu6riG3Dk7lZAK4nEjLEwbGaaYZp6mbn1bJ9QqddZGO0sQC7Xq1d9BbcA4ZwdOdMbv16KjUmwnFnkq0iXA7oJoyxEc23GAlPe4kxf3tHAY7nhITiqXjlEhGH4LcB3cuqd8CfWA8tgVKM_yy-DjcDoi/s1167/kitten-fft.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="780" data-original-width="1167" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN8qQDGI3FHvoJce6VUVaoctk0J1vMSLiiy5h111pu6riG3Dk7lZAK4nEjLEwbGaaYZp6mbn1bJ9QqddZGO0sQC7Xq1d9BbcA4ZwdOdMbv16KjUmwnFnkq0iXA7oJoyxEc23GAlPe4kxf3tHAY7nhITiqXjlEhGH4LcB3cuqd8CfWA8tgVKM_yy-DjcDoi/s320/kitten-fft.jpg" width="320" /></a></div>
</td>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyMuHR5EFLHwqbx0w0k-mjXuOcpS5UkMqosGt8uafVeelwkgRzdJRp8Kb8qvW9xxLQN5ebCpTy-_UwefkqzUr8TaDR5s5jcuaeJvwD11B5J04KzZCzkjnlLzE3qfY9Avbu1Nei9GaLqdW5n3v1uixVi5loQ7MQxbLknVkNNybDLs4vIdd-NuR2scu9Ka-O/s1167/kitten-thresh.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="780" data-original-width="1167" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyMuHR5EFLHwqbx0w0k-mjXuOcpS5UkMqosGt8uafVeelwkgRzdJRp8Kb8qvW9xxLQN5ebCpTy-_UwefkqzUr8TaDR5s5jcuaeJvwD11B5J04KzZCzkjnlLzE3qfY9Avbu1Nei9GaLqdW5n3v1uixVi5loQ7MQxbLknVkNNybDLs4vIdd-NuR2scu9Ka-O/s320/kitten-thresh.jpg" width="320" /></a></div>
</td>
</tr>
</tbody></table><br />
<table>
<tbody><tr>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZuaBGjUoxnHeexFy6l_mtEz__RB9lRdpWjL-20DQUPi_7bBjIvULEqDi0LWzJiLX6uDwHInbci0Z56t6-ZEeFzDPLnkKdWOGLIMUdYNCEr0jhKqYtHPX1JbvBYKdMd1iNltqTpHcdPRC0bGGSmUmfSyIZAJ8w0qWsq4h0d07oePEaPCQ2JkhlfJvOiksE/s2598/wave-target.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="2047" data-original-width="2598" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgZuaBGjUoxnHeexFy6l_mtEz__RB9lRdpWjL-20DQUPi_7bBjIvULEqDi0LWzJiLX6uDwHInbci0Z56t6-ZEeFzDPLnkKdWOGLIMUdYNCEr0jhKqYtHPX1JbvBYKdMd1iNltqTpHcdPRC0bGGSmUmfSyIZAJ8w0qWsq4h0d07oePEaPCQ2JkhlfJvOiksE/s320/wave-target.jpg" width="320" /></a></div>
</td>
<td><div class="separator" style="clear: both;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDf0zKm4HtIlGrB2Tk2rEre1qM2N22UIiPJAYxahJCf3oEtUBd73JlbHcvikwOQm4HTgRitE4wHefmWV54JRaZyrV9X7EipUUDmg7C0U8FXsgdU0RFVLl8uS-RWLX7-4PD6W3MFYsnrFrADu3-kDkQKw-J8Ky7HmO60UHRy7OfxLf7wvrlCgHsec2N9sRR/s1167/kitten-target.jpg" style="display: block; padding: 1em 0px; text-align: center;"><img alt="" border="0" data-original-height="780" data-original-width="1167" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhDf0zKm4HtIlGrB2Tk2rEre1qM2N22UIiPJAYxahJCf3oEtUBd73JlbHcvikwOQm4HTgRitE4wHefmWV54JRaZyrV9X7EipUUDmg7C0U8FXsgdU0RFVLl8uS-RWLX7-4PD6W3MFYsnrFrADu3-kDkQKw-J8Ky7HmO60UHRy7OfxLf7wvrlCgHsec2N9sRR/s320/kitten-target.jpg" width="320" /></a></div>
</td>
</tr>
</tbody></table>
<br /><br />
<div><pre class="western" style="background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial; background-repeat: initial; background-size: initial; font-family: "Liberation Mono", monospace; font-size: 10pt;">[1] I say renderer but I use libfreetype to do all the heavy lifting of
pixel-level character rendering.
<br /></pre></div>ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-4070374215698668762021-12-17T21:07:00.007-05:002021-12-17T21:07:59.064-05:00Tiny Spell Checker (Part II)<p>In the previous <a href="https://uu-kk.blogspot.com/2021/11/tiny-spell-checker-part-1.html" target="_blank">post</a> I introduced the tiny spell checker challenge and some of the solutions our team discussed. In this post I'll go into a bit more detail about the solution I put forward, how well it performed, and some limitations of the approach.</p><p>I combined the use of a bloom filter with additional validation using n-grams.</p><h4 style="text-align: left;">Bloom Filter</h4><div><div>The bloom filter is a well-known data structure for checking the existence of an element in a set. The bloom filter provides good space efficiency at the cost of a probabilistic result. In other words, you can ask whether an item exists within a set and get one of two answers: <b>no</b> or <b>probably</b>. You can control how often probably is returned for an item that is not in the set through a few parameters: the number of items in the filter, the number of bits in the filter, and the number of hash functions used to compute the bit indices. </div><div><br /></div><div>In my case, the input size was fixed (the number of words in the dictionary) so I could control the number of bits and the number of hashes. Since this challenge was primarily about size I fixed the number of hashes and experimented with a bit count that would fit within the challenge constraints. </div><div><br /></div><div>That left me with a calculated probability of returning a false positive somewhere around 1 in 3. We will see how this ultimately impacts the performance of the spell checker below. However, there was still a bit of space remaining to attempt to reduce the number of false positives so I looked to using something else I was exploring in parallel: n-grams.</div></div><div><br /></div><h4 style="text-align: left;">N-gram Validation</h4><div><div>I spent a significant amount of time while exploring the solution thinking about compression. How could I compress the 6.6M to something even close to the 256K size limit for the challenge (zip gets close to 1.7M, for example). As part of that I looked into n-gram substitution - specifically, could I take advantage of known suffixes or prefixes to reduce the amount that needs to otherwise be compressed. </div><div><br /></div><div>This ultimately didn't pan out as I ended up needing to store more in metadata to decode the table than the size reduction of the approach. However, one of the challenges with bloom filters is false positives, especially when the table size is constrained and there are many inputs. So when I pivoted to using a bloom filter I returned to the n-gram approach as as a secondary validation mechanism. </div><div><br /></div><div>For each of the words in the dictionary I extracted each n-gram and added it to a set. Checking for a valid word then consisted of extracting each n-gram of the input word and verifying it was part of the set.</div></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEi0gGQxExNDFEtA8nNWnD4Nez96Kl7N_RID2VVluE5okwcFf1A7mjJo3W07xJ7_KzhV1Bo3npQOWjBC5FqPb9nrGKAZfn41ozp8xGv1oKCx2jUDxXEVPYzWfopzIQniBBe2giD0BGTsk0-XV0F9CgFjKbUaUSrCKxpTABDt8oXYMXM06JPromy8BWpCOQ=s718" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="376" data-original-width="718" height="168" src="https://blogger.googleusercontent.com/img/a/AVvXsEi0gGQxExNDFEtA8nNWnD4Nez96Kl7N_RID2VVluE5okwcFf1A7mjJo3W07xJ7_KzhV1Bo3npQOWjBC5FqPb9nrGKAZfn41ozp8xGv1oKCx2jUDxXEVPYzWfopzIQniBBe2giD0BGTsk0-XV0F9CgFjKbUaUSrCKxpTABDt8oXYMXM06JPromy8BWpCOQ=s320" width="320" /></a></div><div><br /></div><div>Much like the bloom filter this has the property of no false negatives (i.e. any n-gram not in the set must be from a misspelled word) with the potential for false positives (i.e. can construct a word from the n-grams set that is not a valid word from the original dictionary). </div><div><br /></div><div>For the 6.6M dictionary, the following table indicates the space necessary to encode all of the n-grams of a particular size. </div><div><br />
<table class="align: center;" style="margin-left: auto; margin-right: auto;">
<tbody><tr><th>N</th><th>Size</th></tr>
<tr><td style="text-align: center;">2</td><td style="text-align: center;">1,766</td>
</tr><tr><td style="text-align: center;">3</td><td style="text-align: center;">33,892</td>
</tr><tr><td style="text-align: center;">4</td><td style="text-align: center;">330,872</td>
</tr><tr><td style="text-align: center;">5</td><td style="text-align: center;">1,459,024</td>
</tr></tbody></table>
</div><div><br /></div><div>To keep within the size constrains of the problem I could only use n-grams of size 2 or 3. I wanted the larger since this was for validation and I felt there would be higher false positive rate using the shorter n-gram size (although, to be fair, I did not measure this).</div><div><br /></div><h4 style="text-align: left;">Building Metadata into the Executable</h4><div><div>As I wanted an all-in-one solution, I serialized both the bloom filter and n-gram set to disk as a separate process before building the executable. </div><div><br /></div><div>I used a <span style="font-family: courier;">std::bitset</span> for the bloom filter and made sure to set the size to something divisible by 8 so it was possible to serialize without accounting for padded bits. For the n-gram set I ended up using <span style="font-family: courier;">std::set< std::string ></span> so serialization was just a matter of writing the 3-byte sequences out to file. The order in the set didn't matter so I could simply dump the values contiguously given I knew the n-gram size.</div><div><br /></div><div>With both data sets serialized to disk I used the linker to generate object files out of the binary data so I could link them when building the final executable.</div></div><div><br /></div><blockquote><div><div><span style="font-family: courier;">$ ld -r -b binary -o ngram_data.o ngram.bin</span></div><div><span style="font-family: courier;"><br /></span></div><div><span style="font-family: courier;">$ nm ngram_data.o </span></div><div><span style="font-family: courier;">0000000000008464 D _binary_ngram_bin_end</span></div><div><span style="font-family: courier;">0000000000008464 A _binary_ngram_bin_size</span></div><div><span style="font-family: courier;">0000000000000000 D _binary_ngram_bin_start</span></div></div></blockquote><h4 style="text-align: left;"><br /></h4><h4 style="text-align: left;">Limitations</h4><div><div><b><u>Capitalization</u></b>: To achieve some of the size results I had to make some concessions. For example, all string operations (hashing and n-gram) is performed on lower case strings. This completely removes any capitalization support in this application (a necessary feature for any serious spell checker). </div><div><br /></div><div><b><u>Speed</u></b>: Because of the way the dictionary is packed, there is a large upfront cost (relative to the check itself) to load the information into usable data structures before a single word can be checked. I would have liked to have had something that could be searched directly from memory but opted for a more simplistic serialization approach instead.</div></div><div><br /></div><h4 style="text-align: left;">Results</h4><div><div>I computed the full dictionary (6.6M) for n-grams of size 3, hashed each of the 654K words for the bloom filter, and serialized and linked each of those into an executable that came in at 243K. </div><div><br /></div><div>In the worst test case (replacing 50% of the dictionary words with invalid words) the f-score was about 0.86 without using n-grams and closer to 0.88 with n-gram validation.</div></div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/a/AVvXsEjoS7o5wkG4rNjZkn1m59ilPkWzRlVNRuUDCYOW5UIeVpYhxYaN6bICUnBteF9_moF13RItHUK8aTq9fKWlZm2lVD6YNsHShXCp0UT0AHEBafyzwxJ_zSRA8Aur_spAZTf3BuiJKAA4RPycxoQgLsx--CNrDgT38gdC38zxhoWvsM04VDEr_1GkGMhZuA=s4915" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="2957" data-original-width="4915" height="386" src="https://blogger.googleusercontent.com/img/a/AVvXsEjoS7o5wkG4rNjZkn1m59ilPkWzRlVNRuUDCYOW5UIeVpYhxYaN6bICUnBteF9_moF13RItHUK8aTq9fKWlZm2lVD6YNsHShXCp0UT0AHEBafyzwxJ_zSRA8Aur_spAZTf3BuiJKAA4RPycxoQgLsx--CNrDgT38gdC38zxhoWvsM04VDEr_1GkGMhZuA=w640-h386" width="640" /></a></div><br /><div><div>Looking back at the expected false positive rate of the bloom filter (roughly 1 in 3) this is the expected output. With a 654K input, 50% of which are replaced with invalid words, we would expect somewhere around 115K false positives which is consistent with an f-score near 0.85.</div><div><br /></div><div>Adding in the n-gram effectively helps lower the probability of a false positive from 1 in 3 to just over 1 in 4. There is plenty of room here for optimizations such as fine-tuning the bloom filter to be closer to the size limit, pruning the n-gram list to the most frequent values to reduce size, and so on, but this was a decent start to thinking about the problem.</div></div>ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-71796913761022420752021-11-05T17:03:00.002-04:002021-11-05T17:03:41.438-04:00Tiny Spell Checker (part 1)<p>Inspired by a post I read about how designing a spell checker used to be a major software development effort[1] I designed a coding challenge of sorts for the teams I work with. We have these types of technical exchanges semi-regularly as a way to collaborate and explore technology problems; they also serve as a great team building exercise. </p><p>I'd like to introduce the problem here and some of the solution space we considered. I plan on providing a bit more detail about my final solution in a future post. </p><p>As an aside, we generally avoid making this competitive; it decreases participation and distracts from the main purpose of exploring the problem space and learning about others' approaches (which may be entirely non-technical).</p>
<br />
<table><tbody>
<tr><td><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvKl8xaSiR4tv0_lcOzI9BoAGS9q7VEwVcrX6_OeZNVTMkgxfP5Dj-r23-jNCfkmtPowpNcHtaM2R4aFMcgf3_Ao1hokHgU3BGktPd215UQVuXcs0uarnwugSA1PfcdT8TPXZPpfQlBx_R/s1651/tsc.png-2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1651" data-original-width="1275" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiCmyCNMyiRA96Q8LDnPmnZKr2lpByKHrqnmwQnqZF2piiwNRBHz2QuCzNGIuXo-uHSCUOvqeZYHTLdcPUBvlrlms40qqkKL2SJYtnRI-KZZR2OJaM-UXuGttkzbpcisf7XSAkIx1YHiqjB/s320/tsc.png-1.png" width="247" /></a>
</td><td><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvKl8xaSiR4tv0_lcOzI9BoAGS9q7VEwVcrX6_OeZNVTMkgxfP5Dj-r23-jNCfkmtPowpNcHtaM2R4aFMcgf3_Ao1hokHgU3BGktPd215UQVuXcs0uarnwugSA1PfcdT8TPXZPpfQlBx_R/s1651/tsc.png-2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1651" data-original-width="1275" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvKl8xaSiR4tv0_lcOzI9BoAGS9q7VEwVcrX6_OeZNVTMkgxfP5Dj-r23-jNCfkmtPowpNcHtaM2R4aFMcgf3_Ao1hokHgU3BGktPd215UQVuXcs0uarnwugSA1PfcdT8TPXZPpfQlBx_R/s320/tsc.png-2.png" width="247" /></a>
</td></tr></tbody></table>
<br /><div>Below are some of the topics we discussed during the showcase. The topics are primarily centered on various compresssion mechanisms (lossless and otherwise) as you might expect. However, there were additional conversations that considered a broader context. For example:</div><div><ul style="text-align: left;"><li>translating to another language which had better compression properties </li><li>using phonetics (stenography)</li><li>a literature survey of compression techniques</li><li>trade-offs in time, space, and accuracy of each approach</li></ul></div><div>It is those last two bullets that really highlight the benefit of these exchanges. </div><div><br /></div><div>It is one thing to be able to implement a decent compression algorithm; but learning how to consider the problem under extended or different constraints allows us to apply these ideas to the projects we are working on and generally makes us better developers.</div>
<div><br /></div><h3 style="text-align: left;">Asymmetric Numeral Systems (ANS)</h3><div>A set of entropy encoding methods that include probability distributions and finite state machines which combine to achieve performant compression. In particular, we discussed the table-based version of ANS (tANS).</div><div><br /></div><h3 style="text-align: left;">Huffman Coding</h3><div>Lossless compression technique that uses symbol frequencies (probability) to build a table for encoding symbols relative to their frequency: the more frequent a symbol the fewer the bits used to encode it.</div><div><br /></div><div>Some additional discussion was the cost of using a well-known table or building it from the input dictionary and storing it in the encoded (compressed) output.</div><div><br /></div><h3 style="text-align: left;">n-grams</h3><div>Encoding common groups of letters into a single symbol/value. For example, encoding the sequence "ing" as a single value. </div><div><br /></div><div>This is very similar to Huffman but operating on n-grams instead of a single byte.</div><div><br /></div><h3 style="text-align: left;">Rabin-Karp</h3><div>Is an algorithm for searching for a string in a body of text. At a high level it leverages a hash to do approximate string matching and only does full string matching if there is an approximate match.</div><div><br /></div><h3 style="text-align: left;">Trie</h3><div>A data structure that encodes words in a prefix tree and indicates whether a particular word exists in the tree (i.e. is valid) when all letters follow a path in the tree and end at a word terminator node.</div><div><br /></div><div><h3 style="text-align: left;">Bloom filter</h3><div>A data structure that, given a hash function, can answer either: an element may be encoded in the set; or it is definitely not in the set. That is, there is the possibility for false positives but not false negatives when checking for existence of elements.</div><div><br /></div><div>A nice property of this data structure is that it has known properties regarding the false positive rate given a size and hashing strategy used so it can be tuned to a specific accuracy given size constraints.</div></div><span></span><div><br /></div><span><a name='more'></a></span><div><br /></div><div>In a future post I'll go into a bit of detail of how I combined some of these techniques and used the linker to pack a compressed dictionary into a single executable that fit the size constraints of the problem.</div>
<p>[1] <a href="https://prog21.dadgum.com/29.html" target="_blank">https://prog21.dadgum.com/29.html</a></p>ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-19898217490557843272018-05-06T20:49:00.001-04:002018-05-06T20:53:54.261-04:00Affine TransformationI've been working on a <a href="https://github.com/ezpz/tandem" target="_blank">personal project</a> for data presentation and exploration. It is very much a work in progress but as part of the development I've come across some interesting little technical problems to solve. I hope to eventually have a longer series of posts along the lines of 'building your own data tools;' however, until I find the time to give that the attention it needs, I plan on posting some smaller entries centered around these technical problems.<br />
<br />
One of the early challenges I came across when plotting data was mapping the input values to the fixed amount of space I had on screen. For example, if I generate a window that is 600px wide I can only support a range of input values less than or equal to 600 (using a naive approach of each value in the range maps to a pixel on screen). Supporting an input range greater than 600 values requires some level of mapping between the input values and the available pixels on the screen. There are simple conversions for specific input ranges (e.g. when the range is close to a multiple of the number of pixels) but a more general solution is necessary to handle arbitrary input data sets.<br />
<br />
The way I ended up solving this was to use an <a href="https://en.wikipedia.org/wiki/Affine_transformation" target="_blank">affine transformation</a> to map the input domain to a fixed pixel range. This is how scales work in D3.js if you are familiar with that JavaScript library. For example, if my input values come from [-100, 100] but I need my output values to fall within [28,63] then applying this transformation would look something like the image below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhm5hv_zdNMqrpALtfDSRPyhlErhuul1XTVSFqvN1mOAX-BjlumZwt7X1Kky6pq_KsMLBGBNgasGAPpdv4Eqbn7xau4nlJiiBK5AnneVWtQp2jlGn6wycLy3e6mBkaE2zOTGtOUVR_VatKB/s1600/affine_xform.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="643" data-original-width="1270" height="162" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhm5hv_zdNMqrpALtfDSRPyhlErhuul1XTVSFqvN1mOAX-BjlumZwt7X1Kky6pq_KsMLBGBNgasGAPpdv4Eqbn7xau4nlJiiBK5AnneVWtQp2jlGn6wycLy3e6mBkaE2zOTGtOUVR_VatKB/s320/affine_xform.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both;">
Notice that the minimum value in the input interval (-100) maps directly to the minimum value in the output interval (28). Values along the input interval are mapped to the output interval preserving the relative position within each. The equation to do this is:</div>
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
\[ range_{x} = (domain_{x} - domain_{a}) * ((range_{b} - domain_{a}) / (domain_{b} - domain_{a})) + range_{a} \]</div>
<div class="separator" style="clear: both;">
<br /></div>
<div class="separator" style="clear: both;">
Where domain and range are defined by the intervals \([domain_{a}, domain_{b}]\) and \([range_{a},range_{b}]\), respectively. Neither the domain nor the range is required to have a 0 as one of the endpoints and the intervals can be either increasing or decreasing. This generality is helpful in a variety of contexts. For example:</div>
<div>
<br /></div>
<div>
<b><u>Setting Borders</u></b></div>
<div>
<div>
The viewport used to plot data may not always use all available pixels of the window; multi-plot windows or viewports that have a border are two examples. In the case where there is a border around the plot (allowing for labels and tick marks) that area should not be considered as part of the range of the viewport. Instead, a window with 600px width that uses 30px on either side for border should use a range of [30,570] - the equation above makes this easy.</div>
<div style="text-decoration-line: underline;">
<br /></div>
</div>
<div style="text-decoration-line: underline;">
<b>Inverted Axis</b></div>
<div>
<div>
In most window coordinate systems the origin is at the top left of the screen. That means increasing y-axis values (from the window perspective) are typically <i>decreasing</i> data set y-axis values. Using a negative range makes this translation simple. Consider a 600px window height: mapping a domain to the range [600,0] automatically handles the inverted axis problem.</div>
<div>
<br /></div>
<div>
Another way that this simplifies things is when there is a need to 'flip' the layout of a graph. For example, in the two images below the data is the same; the only difference is the range. Having the ability to easily manipulate target ranges for the input data allows me to cycle through multiple views of the same data without much work on the backend.</div>
<div style="font-weight: bold;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRaFO6EL54eMaG8sst40niG4WsCKolyHQmpJGEb_5j4IiXd1TQUZhoWswB_nyy8OKsQPHSEmoVOfXdzzP5wr_ordq_mdI3MpSo6AHq48S049su6LaG2dfwg_mFQtGdUQCUvQFFEhkd4Mln/s1600/hist.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" data-original-height="358" data-original-width="644" height="110" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgRaFO6EL54eMaG8sst40niG4WsCKolyHQmpJGEb_5j4IiXd1TQUZhoWswB_nyy8OKsQPHSEmoVOfXdzzP5wr_ordq_mdI3MpSo6AHq48S049su6LaG2dfwg_mFQtGdUQCUvQFFEhkd4Mln/s200/hist.png" width="200" /></a></div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwNi7lL3WpTuluSHsQp1U60CgWsNjl_70a-Om4EvKAU3NIP9xIL9KS2vbNFrseAyHdxqycmfLe_KpgRLJlruiRi69qxmpOT612ypIQ-Tp914JE0-vMnN5HJpolByk2Ohu3rxP1R38Ej7Vp/s1600/hist_flipped.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em; text-align: center;"><img border="0" data-original-height="360" data-original-width="642" height="111" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwNi7lL3WpTuluSHsQp1U60CgWsNjl_70a-Om4EvKAU3NIP9xIL9KS2vbNFrseAyHdxqycmfLe_KpgRLJlruiRi69qxmpOT612ypIQ-Tp914JE0-vMnN5HJpolByk2Ohu3rxP1R38Ej7Vp/s200/hist_flipped.png" width="200" /></a><br />
<br />
<br />
<div style="font-weight: bold;">
<br /></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div>
<br /></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-80157053013292396232017-05-12T23:15:00.000-04:002017-05-12T23:16:04.378-04:00Visualizing Network TraversalAs part of an experiment I was doing related to network traversal I started to trying to visualize coverage of a network by a single visitor traveling through the nodes. The idea was to set a gradient between two colors and as a node was visited more times it would slowly take on a change in color over the gradient. If the goal was to visit each node N times, then the visualization would indicate current progress as a 'heat map' of the network according to the number of visits to each node.<br />
<br />
The outputs were interesting in that controlling the traversal policy would change the distribution of color over the image. This is perhaps obvious given the experimental construct but still provided an alternate mechanism for 'measuring' the accuracy of the traversal. For example, if the goal is to cover the network as evenly as possible during the traversal, the kurtosis of a distribution then becomes a measure of how close one is to this goal.<br />
<br />
To demonstration this I set up an NxN grid network and ran three example traversals over that grid. I captured the 'heat map' when the median of all the visit values for the nodes was at half of the target visits. The results are below.<br />
<br />
Random Choice Walk:<br />
At any node, a visitor will choose to move up, down, left, or right by random selection.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgah87X4v0idI_iRINwQQLB3A_TcP3I2xlL41dWZu49IhQQYR-1zBgAV4qjMfrkBhJ_pLzh1mxIMV90h0ik4OpIt9qAAlIYa3bJjmvDXRyGKRRMSNL5knVnxWvPhkBhtgEoaKDKY0Vp-avF/s1600/bothRand.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgah87X4v0idI_iRINwQQLB3A_TcP3I2xlL41dWZu49IhQQYR-1zBgAV4qjMfrkBhJ_pLzh1mxIMV90h0ik4OpIt9qAAlIYa3bJjmvDXRyGKRRMSNL5knVnxWvPhkBhtgEoaKDKY0Vp-avF/s320/bothRand.png" width="320" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div>
<div>
Backpressure Walk:</div>
<div>
Uses a notion of backpressure to control the traversal. At each node, a visitor checks for the the least visited neighboring node and travels there.</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8Ja_mdIpIK2vRF9Szp4UyqflQNyz0gsMgbTfd6Zyi_0Q7zSOdTkg9L3JC04Sr8g2stPFFUJyNQ2-Dwhgi_CPGIcx6BK4FBRJ_oKvxnkNEVx3Q7uGqFWhWB8FsS9UZiNCPFYZroXGlGCZn/s1600/bothHill.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8Ja_mdIpIK2vRF9Szp4UyqflQNyz0gsMgbTfd6Zyi_0Q7zSOdTkg9L3JC04Sr8g2stPFFUJyNQ2-Dwhgi_CPGIcx6BK4FBRJ_oKvxnkNEVx3Q7uGqFWhWB8FsS9UZiNCPFYZroXGlGCZn/s320/bothHill.png" width="320" /></a></div>
<div>
<br /></div>
<br />
Teleportation:<br />
Not really a traversal, I just did it to see what it would look like. At each node, a visitor chooses a random node within the grid and jumps there directly.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPyjRJDuCefH5qRmgQf2VWH3w4hmVLHsd9JQa3IblbQJF9XPyeEJQiqDgwcurEVVIYgKWplDmfRJB8v3KGqHpuUe9RIKQmJ1R4oBRDs8VZ0baeZsgMUqIRr1mTqJQpzZjWpwzfw_rZAzQW/s1600/bothJump.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjPyjRJDuCefH5qRmgQf2VWH3w4hmVLHsd9JQa3IblbQJF9XPyeEJQiqDgwcurEVVIYgKWplDmfRJB8v3KGqHpuUe9RIKQmJ1R4oBRDs8VZ0baeZsgMUqIRr1mTqJQpzZjWpwzfw_rZAzQW/s320/bothJump.png" width="320" /></a></div>
<div>
<br /></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-87945301134025690642016-10-02T00:44:00.001-04:002016-10-02T00:44:24.158-04:00Water Bleed Image EffectRecently, I had the occasion to play with some features of the amazingly powerful ImageMagick image manipulation tool. After playing with several of the command-line tools and manipulations I decided to look into the API provided through the MagickWand C interface to ImageMagick. To explore some of the features available at this layer I'm attempting to develop a bleed effect on an image; similar to if you had dipped a picture in water and then hung it up to dry. I'm hoping that my free time permits and I can turn this into a small series of posts detailing how I refine this tool (if I can eventually call it that).<br />
<br />
Perhaps in a later post I'll get into what the code looks like and a dialog of what each part does but for now I'll just put up some of the early results and learning points that I've encountered. Ultimately, I'll get the code up on github and follow along with that here.<br />
<br />
I'm using two images for this exercise: one that contains a rainbow of colors to test and another to represent the effect on a more likely picture.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh71aQyGVoh9QQkEVG1dQ3kgrmSEZ_-gd2OErxaG5uSeWysiUmkLvgjtlmwJC3_Li9YSqEmfJcqN5ndCPMYLOwOeE79_tq9mLlEfefyFQY2xT00J-_L_6tvNV9s5D5XY9dGmfLmh2PEhUhu/s1600/pencils_orig.jpg" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="100" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh71aQyGVoh9QQkEVG1dQ3kgrmSEZ_-gd2OErxaG5uSeWysiUmkLvgjtlmwJC3_Li9YSqEmfJcqN5ndCPMYLOwOeE79_tq9mLlEfefyFQY2xT00J-_L_6tvNV9s5D5XY9dGmfLmh2PEhUhu/s200/pencils_orig.jpg" width="200" /></a><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQr0vX8Bm45YuDgO207YKZm-L4a21Ekjmia2QBJY0g0yjEbdLiqKMItNQJpozJxoRDbQ09vOAE5wmOtCsh-OcL01XmQHJXIE8zp7Wq2KyciO14dzHm2it1TTaC1OpOExYet1OAMDLUevwp/s1600/house_orig.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQr0vX8Bm45YuDgO207YKZm-L4a21Ekjmia2QBJY0g0yjEbdLiqKMItNQJpozJxoRDbQ09vOAE5wmOtCsh-OcL01XmQHJXIE8zp7Wq2KyciO14dzHm2it1TTaC1OpOExYet1OAMDLUevwp/s320/house_orig.jpg" width="320" /></a></div>
<br />
Initially, I looked at just 'dragging' each color down the image while increasing the normalization scale along the vertical to give a wash out effect further down the image. This represents how the water would continue to wash color away as it 'pulled' higher colors down the hanging image. This worked well for the rainbow version of the image:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQjNynxwGY8V1bwSDZSvYS3wZYsDOz3yhXw-KwdL7Ha5i2UVLUg3rCSTWPpD52EJyyQt901UOV8tjrIPRwMhk3R2nUeA8nMyCRiWufJZR72x19NHTuXVy4UhFPiF67gkGkfISp8v0BZnLe/s1600/pencils_basic.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="161" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQjNynxwGY8V1bwSDZSvYS3wZYsDOz3yhXw-KwdL7Ha5i2UVLUg3rCSTWPpD52EJyyQt901UOV8tjrIPRwMhk3R2nUeA8nMyCRiWufJZR72x19NHTuXVy4UhFPiF67gkGkfISp8v0BZnLe/s320/pencils_basic.jpg" width="320" /></a></div>
<br />
That result has a good 'bleed' in that stronger colors last longer down the vertical. However, with the colors in the house image the washout was too aggressive and wiped out much of the original content.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMmNBtRZPaYnV3e90L3mUBG25UdVNZCJlHnigZg3okle9iiOZu2o3nkzcr0X1W6MSsTEhFxAsLkWwil7L3HIJj_EYq-wMzuwhXgbCDyEA2d2wKgMsD-9ABc7HtUl3kZx4hxEyEMc0TL_Pt/s1600/house_basic.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjMmNBtRZPaYnV3e90L3mUBG25UdVNZCJlHnigZg3okle9iiOZu2o3nkzcr0X1W6MSsTEhFxAsLkWwil7L3HIJj_EYq-wMzuwhXgbCDyEA2d2wKgMsD-9ABc7HtUl3kZx4hxEyEMc0TL_Pt/s320/house_basic.jpg" width="320" /></a></div>
<br />
If you ignore the washout, though, it is clear that the draw of colors down the image is visible (especially around the windows) meaning that reducing the washout might still produce a more realistic effect if enough of the original image could be preserved.<br />
<br />
With that in mind, I added the ability to preserve a percentage of the original image while applying the effect. This allows a user to tune the washout per image to get just the right look. For example, too much reuse (20%) on the house and the washout effect has little impact:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFQLBKv5IJo6NX6SDk3dc5J-FusLH4-EOIliPWNKBLqJlMgKa5u6KPHY98vF9y1hxjDPMVQAqAtosXwwX2Dy8siQMmR_J48Z4xG8S7z_uUCro986gd7OCs8c_0yJjdm3-xv3eSKtKWDXws/s1600/house_reuse_0.2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjFQLBKv5IJo6NX6SDk3dc5J-FusLH4-EOIliPWNKBLqJlMgKa5u6KPHY98vF9y1hxjDPMVQAqAtosXwwX2Dy8siQMmR_J48Z4xG8S7z_uUCro986gd7OCs8c_0yJjdm3-xv3eSKtKWDXws/s320/house_reuse_0.2.jpg" width="320" /></a></div>
<br />
But taking that down to 7.5% gets closer to what I am looking for:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxQSU9i8cHQvEcRF1PBC1jETpkTGyeRbr6ATGGwQnJz6H2vHPr1YEGKsk78EZOscJuM94rGEi_f5htI8nKKV6SWoy7pG9-8VoeNgMLcON0MX2Ly9hygWeJXErVDI8IhA-sZkTCLlS7xU_M/s1600/house_reuse_0.075.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="180" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxQSU9i8cHQvEcRF1PBC1jETpkTGyeRbr6ATGGwQnJz6H2vHPr1YEGKsk78EZOscJuM94rGEi_f5htI8nKKV6SWoy7pG9-8VoeNgMLcON0MX2Ly9hygWeJXErVDI8IhA-sZkTCLlS7xU_M/s320/house_reuse_0.075.jpg" width="320" /></a></div>
<br />
There is still plenty for me to do such as handling the edges and water line better and further manipulating the washed image. For example, adding additional transforms to the result to get more seep/bleed on the washed colors. The image below shows a simple kernel convolution to add blur to the pencils, for example.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm21vrNh7y3bmUFA5iyrK5dLT1q-ub8TY_-WEvk1VSjneqRfAhboEfj7khPSBVpUZ0f2SZCQ9TBrvVCrHA5SGOrMMNKGWCmrk6iMl13EgS4jPdiuRYS3hWN_cm5o7NMgBYccvHJkifH-Wq/s1600/pencils_basic_blur.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="161" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjm21vrNh7y3bmUFA5iyrK5dLT1q-ub8TY_-WEvk1VSjneqRfAhboEfj7khPSBVpUZ0f2SZCQ9TBrvVCrHA5SGOrMMNKGWCmrk6iMl13EgS4jPdiuRYS3hWN_cm5o7NMgBYccvHJkifH-Wq/s320/pencils_basic_blur.jpg" width="320" /></a></div>
<br />ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-65614004773453482022016-05-06T22:40:00.001-04:002016-05-06T22:40:40.649-04:00Derive formula to convert Fahrenheit to CelsiusI had been revisiting linear regression the other day and as part of that review I challenged myself to use regression to derive a well known formula without actually looking it up (the formula, that is).<br />
<br />
The first example that came to my mind was the formula for converting temperature in Fahrenheit to Celsius. I wanted to see if I could derive that formula using two sample data sets and a simple linear regression. If the data was accurate enough, I should be able to derive the exact equation for converting between the two formats. In essence, I wanted to be able to come to the following:<br />
<br />
<div style="text-align: center;">
<span style="font-family: Courier New, Courier, monospace;">C = (F - 32) * 5/9</span></div>
<div style="text-align: left;">
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<span style="font-family: inherit;">Since I didn't have a data set with both types of observations available I was faced with a little 'chicken or the egg' situation. Seeing how this is just a fun little exercise I generated </span><span style="font-family: inherit;">my own data introducing some artificial error to stand in for true observations.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">After the 'observations' were available the regression was as simple as loading the data into R and running </span><span style="font-family: Courier New, Courier, monospace;">lm</span><span style="font-family: inherit;">. I ran through the entire manual procedure of how this works in <a href="http://uu-kk.blogspot.com/2011/04/y-mx-b.html" target="_blank">a previous post</a> s</span><span style="font-family: inherit;">o wont </span>repeat<span style="font-family: inherit;"> it here. The result of calling </span><span style="font-family: Courier New, Courier, monospace;">lm</span><span style="font-family: inherit;"> is a list and one of the elements of that list is the coefficients - these represent the intercept and slope of:</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<div style="text-align: center;">
<span style="text-align: start;"><span style="font-family: Courier New, Courier, monospace;">y = mx + b</span></span></div>
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Since the </span>Celsius<span style="font-family: inherit;"> observations are the response in my formula and the Fahrenheit observations are the predictors the I can create a similar equation where </span><span style="font-family: Courier New, Courier, monospace;">y</span><span style="font-family: inherit;"> represents the </span>Celsius<span style="font-family: inherit;"> values and </span><span style="font-family: Courier New, Courier, monospace;">x</span><span style="font-family: inherit;"> represents the Fahrenheit values. Given that, I get the following (after plugging in the slope and intercept):</span><br />
<span style="font-family: inherit;"><br /></span>
<div style="text-align: center;">
<span style="font-family: Courier New, Courier, monospace;">C = 0.555547 * F - 17.772318</span></div>
<div style="text-align: left;">
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<div style="text-align: left;">
<span style="font-family: inherit;">Expanding the </span>original<span style="font-family: inherit;"> equation for converting between Fahrenheit and </span>Celsius<span style="font-family: inherit;"> yields:</span></div>
<div style="text-align: center;">
<span style="font-family: inherit;"><br /></span></div>
<div style="text-align: center;">
<span style="font-family: Courier New, Courier, monospace;">C = (F * 5/9) - (32 * 5/9)</span></div>
<div style="text-align: center;">
<span style="font-family: Courier New, Courier, monospace;">C = F * 0.555556 - 17.777778</span></div>
<div style="text-align: left;">
<span style="font-family: Courier New, Courier, monospace;"><br /></span></div>
<span style="font-family: inherit;">So, given observations in both </span>Celsius<span style="font-family: inherit;"> and Fahrenheit (for the same events, of course) it is possible to derive an equation to convert between the two using linear regression.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">My observations are very highly correlated. Obviously, as this correlation falls the accuracy of the resulting equation will suffer. Fortunately there are tools to measure the correlation which </span><span style="font-family: inherit;">helps quantify this accuracy.</span><br />
<span style="font-family: inherit;"><br /></span>
<br />
<span style="font-family: inherit;">You can find the code for this </span>exercise<span style="font-family: inherit;"> on <a href="https://github.com/ezpz/regression" target="_blank">github</a>.</span>ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-57137461130715293592014-12-01T22:03:00.000-05:002014-12-01T22:03:20.284-05:00ReadabilityI have written <a href="http://uu-kk.blogspot.com/2011/01/closures-ruby-v-python.html" target="_blank">before</a> about some of the differences between Ruby and Python and my quirks generally tend to the Ruby approach. I think readability is another dimension between the two languages that highlights this for me - especially as it applies to understanding new code.<br />
<br />
I prefer to read code from left-to-right (LTR), top-to-bottom. This is natural for me as it models how I read other text. Code that processes right-to-left (RTL) and, in the severe case, bottom-to-top challenges my ability to easily understand intent. Method chaining highlights this quite nicely. For example, to transform the numbers of a list stored as a string in Python one might write:<br />
<div class="highlight">
<pre><span class="s">','</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span> <span class="p">:</span> <span class="nb">str</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">),</span> <span class="s">"1,2,3,4"</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">','</span><span class="p">)))</span>
</pre>
</div>
If I am reading that for the first time I need to mentally maintain a stack of operations (<span style="font-family: Courier New, Courier, monospace;">join</span>, <span style="font-family: Courier New, Courier, monospace;">map</span>, <span style="font-family: Courier New, Courier, monospace;">lambda</span>) until I've parsed most of the statement and arrived at the object being operated on: "1,2,3,4". This is due to the RTL application of the code. I've then got to backtrack over each operation on my mental stack to understand what the type and/or result of the overall statement will be. This is complicated by the fact that Python allows for some LTR (<span style="font-family: Courier New, Courier, monospace;">"1,2,3,4".split(',')</span>) mixed with the RTL.<br />
<br />
For first-time readers of the language this process is even more difficult if the behavior of <span style="font-family: Courier New, Courier, monospace;">join</span> or <span style="font-family: Courier New, Courier, monospace;">map</span> are not yet well understood.<br />
<br />
Ruby makes this significantly easier.<br />
<div class="highlight">
<pre><span class="s2">"1,2,3,4"</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span><span class="o">.</span><span class="n">map</span> <span class="p">{</span> <span class="o">|</span><span class="n">x</span><span class="o">|</span> <span class="n">x</span><span class="o">.</span><span class="n">to_i</span> <span class="o">**</span> <span class="mi">2</span> <span class="p">}</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s1">','</span><span class="p">)</span>
</pre>
</div>
When I read code similar to that I can store the type and result <i>as I am parsing the statement.</i> The initial object is immediately available and I can read the expression LTR as: split a string, apply a function to each element of the resulting array, and join that final array with the comma character. The fact that Ruby supports method chaining (on the built-in types) makes for much more readable code.<br />
<br />
I've singled out Python above but that was only for the sake of the example. As far as RTL languages go I think Python is middle of the road. Haskell, for example, has a much nicer syntax to deal with function composition (a similar, but not identical situation). On the other end of the spectrum is Lisp which is basically a bottom-to-top, RTL language.<br />
<br />
I can (and have) used these languages and many more; RTL vs. LTR in no way prevents one from being proficient over time. Certainly, most RTL code can be written in a way that it flows mostly LTR, top-to-bottom. Even when it isn't, well-written code can read by anyone with enough practice. For newcomers looking to read a new language however, there is less difficulty when the process more closely models how they read in general.<br />
<div>
<br /></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-10563532895060272452014-11-25T20:53:00.000-05:002014-11-25T20:53:39.113-05:00Order of Events<span style="font-family: Courier New, Courier, monospace;">inet_ntoa</span> uses static storage for the result it returns. The GNU implementation of <span style="font-family: Courier New, Courier, monospace;">inet_ntoa</span> uses the following internally:<br />
<br />
<div class="highlight">
<pre><span class="k">static</span> <span class="n">__thread</span> <span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="mi">18</span><span class="p">];</span>
</pre>
</div>
<br />
This makes the function thread-safe but this safety does not remove the need to worry about use within a single thread. Consider the following snippet of code:<br />
<br />
<div class="highlight">
<pre><span class="cp">#include <arpa/inet.h></span>
<span class="cp">#include <stdio.h></span>
<span class="kt">int</span> <span class="nf">main</span> <span class="p">()</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">in_addr</span> <span class="n">a</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">s_addr</span> <span class="o">=</span> <span class="mi">1234567</span> <span class="p">},</span> <span class="n">b</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">s_addr</span> <span class="o">=</span> <span class="mi">7654321</span> <span class="p">};</span>
<span class="k">return</span> <span class="n">printf</span> <span class="p">(</span><span class="s">"%s : %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">inet_ntoa</span> <span class="p">(</span><span class="n">a</span><span class="p">),</span> <span class="n">inet_ntoa</span> <span class="p">(</span><span class="n">b</span><span class="p">));</span>
<span class="p">}</span>
</pre>
</div>
<br />
Since <span style="font-family: Courier New, Courier, monospace;">inet_ntoa</span> is used twice within the argument list the result is dependent on the order of evaluation of the arguments to <span style="font-family: Courier New, Courier, monospace;">printf</span>. Regardless of which call gets evaluated first the output will <i>always</i> print the same IP address twice. On my system, the result is:<br />
<br />
<pre>135.214.18.0 : 135.214.18.0</pre>
<br />
This is a result of two things: arguments are evaluated <i>before</i> their results are used by <span style="font-family: Courier New, Courier, monospace;">printf</span>; and <span style="font-family: Courier New, Courier, monospace;">inet_ntoa</span> overwrites static storage on each invocation. Looking at the instructions for this C code makes this clear:<br />
<br />
<div class="highlight">
<pre><span class="nl">.LC0:</span>
<span class="nf">.string</span> <span class="s">"%s : %s\n"</span>
<span class="nf">.text</span>
<span class="nf">.globl</span> <span class="nv">main</span>
<span class="nf">.type</span> <span class="nv">main</span><span class="p">,</span> <span class="err">@</span><span class="nv">function</span>
<span class="nl">main:</span>
<span class="nf">pushl</span> <span class="o">%</span><span class="nb">ebp</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">esp</span><span class="p">,</span> <span class="o">%</span><span class="nb">ebp</span>
<span class="nf">pushl</span> <span class="o">%</span><span class="nb">ebx</span>
<span class="nf">andl</span> <span class="kc">$</span><span class="o">-</span><span class="mi">16</span><span class="p">,</span> <span class="o">%</span><span class="nb">esp</span>
<span class="nf">subl</span> <span class="kc">$</span><span class="mi">32</span><span class="p">,</span> <span class="o">%</span><span class="nb">esp</span>
<span class="nf">movl</span> <span class="kc">$</span><span class="mi">1234567</span><span class="p">,</span> <span class="mi">24</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span cla="" ss="p">)</span>
<span class="nf">movl</span> <span class="kc">$</span><span class="mi">7654321</span><span class="p">,</span> <span class="mi">28</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span cla="" ss="p">)</span>
<span class="nf">movl</span> <span class="mi">28</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">),</span> <span class="o">%</span><span class="nb">eax</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">)</span>
<span class="nf">call</span> <span class="nv">inet_ntoa</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="o">%</span><span class="nb">ebx</span> <span class="c1">; pointer stored to static memory</span>
<span class="nf">movl</span> <span class="mi">24</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">),</span> <span class="o">%</span><span class="nb">eax</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">)</span>
<span class="nf">call</span> <span class="nv">inet_ntoa</span>
<span class="nf">movl</span> <span class="kc">$</span><span class="nv">.LC0</span><span class="p">,</span> <span class="o">%</span><span class="nb">edx</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">ebx</span><span class="p">,</span> <span class="mi">8</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">)</span> <span class="c1">; arg2 to printf; pointer from above</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="mi">4</span><span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">)</span> <span class="c1">; arg1 to printf; new pointer, same static memory</span>
<span class="nf">movl</span> <span class="o">%</span><span class="nb">edx</span><span class="p">,</span> <span class="p">(</span><span class="o">%</span><span class="nb">esp</span><span class="p">)</span> <span clas="" s="c1">; arg0 (format string)</span>
<span class="nf">call</span> <span class="nv">printf</span>
<span class="nf">movl</span> <span class="o">-</span><span class="mi">4</span><span class="p">(</span><span class="o">%</span><span class="nb">ebp</span><span class="p">),</span> <span class="o">%</span><span class="nb">ebx</span>
<span class="nf">leave</span>
<span class="nf">ret</span>
</pre>
</div>
<br />
The correct way to call <span style="font-family: Courier New, Courier, monospace;">inet_ntoa</span> consecutively is to save each result to a local variable.<br />
<br />
<div class="highlight"><pre><span class="cp">#include <arpa/inet.h></span>
<span class="cp">#include <string.h></span>
<span class="cp">#include <stdio.h></span>
<span class="kt">int</span> <span class="nf">main</span> <span class="p">()</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">in_addr</span> <span class="n">a</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">s_addr</span> <span class="o">=</span> <span class="mi">1234567</span> <span class="p">},</span> <span class="n">b</span> <span class="o">=</span> <span class="p">{</span> <span class="p">.</span><span class="n">s_addr</span> <span class="o">=</span> <span class="mi">7654321</span> <span class="p">};</span>
<span class="kt">char</span> <span class="n">ipa</span><span class="p">[</span><span class="mi">18</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">},</span> <span class="n">ipb</span><span class="p">[</span><span class="mi">18</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">strcpy</span> <span class="p">(</span><span class="n">ipa</span><span class="p">,</span> <span class="n">inet_ntoa</span> <span class="p">(</span><span class="n">a</span><span class="p">));</span>
<span class="n">strcpy</span> <span class="p">(</span><span class="n">ipb</span><span class="p">,</span> <span class="n">inet_ntoa</span> <span class="p">(</span><span class="n">b</span><span class="p">));</span>
<span class="k">return</span> <span class="n">printf</span> <span class="p">(</span><span class="s">"%s : %s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">ipa</span><span class="p">,</span> <span class="n">ipb</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-33536128505237775232014-10-05T01:54:00.000-04:002014-10-05T01:54:07.886-04:00Automating keystrokes via evdevIn a <a href="http://uu-kk.blogspot.com/2013/06/thats-key.html" target="_blank">previous post</a> I talked about how to capture keys out from under the X11 windowing system by reading from <span style="font-family: Courier New, Courier, monospace;">/dev/input/eventX</span>. These character devices can also be useful to generate input simulating keyboard activity.<br />
<br />
I circled back to this topic after having to automate user keyboard activity. I've accomplished similar tasks in the past with a tool named <a href="http://www.semicomplete.com/projects/xdotool/" target="_blank">xdotool</a> - unfortunately, in this case I did not have the luxury of being able to install software. The remainder of this post highlights the differences between consuming and producing events. (By the way, If you have the need to automate X actions I highly suggest looking at what <span style="font-family: Courier New, Courier, monospace;">xdotool</span> can do for you.)<br />
<br />
Consuming events is the easier of the two tasks: you simply read open the device and read events into the following structure:<br />
<br />
<div class="highlight">
<pre><span class="cm">/* See: /usr/include/linux/input.h */</span>
<span class="k">struct</span> <span class="n">input_event</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">timeval</span> <span class="n">time</span><span class="p">;</span>
<span class="n">__u16</span> <span class="n">type</span><span class="p">;</span>
<span class="n">__u16</span> <span class="n">code</span><span class="p">;</span>
<span class="n">__s32</span> <span class="n">value</span><span class="p">;</span>
<span class="p">};</span>
</pre>
</div>
<br />
Filter input with <span style="font-family: Courier New, Courier, monospace;">type == 1</span> and read the <span style="font-family: Courier New, Courier, monospace;">code</span> to get the key and <span style="font-family: Courier New, Courier, monospace;">value</span> to get the event (eg. press, release).<br />
<br />
To produce a compliant event the process is a little more complicated since the input needs to be synchronized. For each event there are three distinct sets of data that are required: setup (<span style="font-family: Courier New, Courier, monospace;">EV_MSC</span>); the event (<span style="font-family: Courier New, Courier, monospace;">EV_KEY</span>); and event synchronize (<span style="font-family: Courier New, Courier, monospace;">EV_SYN</span>). In addition to that, certain events are captured over time so this is a stateful process. An example of this is pressing <span style="font-family: Courier New, Courier, monospace;">Ctrl-L</span>; the control key is held down while another key is pressed and then released.<br />
<br />
The easiest way I found to initially grok the protocol is to capture all events while there is keyboard activity and see what the output looks like. Obviously, to produce fully compliant input you should consult API documentation or source code.<br />
<br />
An example of automatically entering a URL in the Chrome browser (Ctrl-L [URL]) would require the following inputs (the <span style="font-family: Courier New, Courier, monospace;">type</span>, <span style="font-family: Courier New, Courier, monospace;">code</span>, and <span style="font-family: Courier New, Courier, monospace;">value</span> members of <span style="font-family: Courier New, Courier, monospace;">struct input_event</span>). The input goes to the focused window (the standard behavior for X) so you need to place focus on the Chrome window for the following example.<br />
<br />
<div class="highlight">
<pre>4, 4, 29 <span class="c"># Setup</span>
1, 29, 1 <span class="c"># Press Ctrl key</span>
0, 0, 0 <span class="c"># Sync</span>
4, 4, 29 <span class="c"># Setup</span>
1, 29, 2 <span class="c"># Ctrl (value == 2 -> autorepeat)</span>
0, 0, 0 <span class="c"># Sync</span>
4, 4, 38
1, 38, 1 <span class="c"># Press 'L' key</span>
0, 0, 0
4, 4, 38
1, 38, 0 <span class="c"># Release 'L' key</span>
0, 0, 0
4, 4, 29
1, 29, 0 <span class="c"># Release Ctrl key</span>
0, 0, 0
<span class="c"># and so on for the URL string</span>
4, 4, 28
1, 28, 1 <span class="c"># Press Enter key</span>
0, 0, 0
4, 4, 28
1, 28, 0 <span class="c"># Release Enter key</span>
0, 0, 0
</pre>
</div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-67348421971773206362014-08-04T22:16:00.001-04:002014-08-04T22:16:17.382-04:00Blocking ptraceI've had occasion to change the functionality of binary programs for a variety of purposes - mostly to instrument for debugging or logging purposes. The techniques used to do this vary but can be used for both passive monitoring or actively changing the functionality of a program. I'd like to consider one of those techniques (<span style="font-family: Courier New, Courier, monospace;">ptrace)</span> in a little more detail here - specifically the ability to stop and arbitrarily modify a running process (think <span style="font-family: Courier New, Courier, monospace;">gdb</span>).<br />
<br />
I'm going to walk through a few examples of how to prevent a <span style="font-family: Courier New, Courier, monospace;">ptrace</span>-based approach to modifying a program. For illustrative purposes I'll use the following sample program that maintains a global variable to influence control flow at run time.<br />
<br />
<br />
<div class="highlight">
<pre><span class="kt">long</span> <span class="n">global_flag</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span> <span class="p">()</span> <span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">global_flag</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Running ...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">sleep</span> <span class="p">(</span><span class="mi">5</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Someone captured my flag!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre>
</div>
<br />
The goal in these examples is to prevent the global variable (<span style="font-family: Courier New, Courier, monospace;">global_flag</span>) from being modified from an external process. I'm going to step through a few methods that could be used to modify this variable and how to prevent these techniques in turn.<br />
<br />
First, I'll look to just overwrite the value directly. Since we can look at the symbols it is trivial to construct a program that will place data into the memory of our choosing within the running process using <span style="font-family: Courier New, Courier, monospace;">ptrace</span>. Obviously, this case is easier than would be for most programs due to the simplicity of the example. The approach holds, however, regardless of the scale of the actual process.<br />
<br />
Suppose our process is PID 11896; we can find the memory location to modify using <span style="font-family: Courier New, Courier, monospace;">nm</span><br />
<br />
<br />
<div class="highlight">
<pre>...
08048410 t frame_dummy
U fwrite@@GLIBC_2.0
0804a018 D global_flag
08048434 T main
U sleep@@GLIBC_2.0
...
</pre>
</div>
<br />
<br />
If you don't have the program available you can still get at the symbols by looking in <span style="font-family: Courier New, Courier, monospace;">/proc</span> (e.g. <span style="font-family: Courier New, Courier, monospace;">nm /proc/11896/exe</span>).<br />
<br />
The program I'm using to change memory in a particular process:<br />
<br />
<br />
<div class="highlight">
<pre><span class="cp">#include <sys/ptrace.h></span>
<span class="cp">#include <stdio.h></span>
<span class="cp">#include <stdlib.h></span>
<span class="cp">#include <libgen.h></span>
<span class="cp">#include <string.h></span>
<span class="cp">#include <errno.h></span>
<span class="kt">void</span> <span class="nf">usage</span> <span class="p">(</span><span class="kt">char</span> <span class="o">*</span> <span class="n">prog</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"USAGE: %s <pid> <addr> <value></span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">basename</span> <span class="p">(</span><span class="n">prog</span><span class="p">));</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"-------------------------</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">" pid Process to modify</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">" addr Address to change</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">" value Value to write</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">exit</span> <span class="p">(</span><span class="mi">42</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span> <span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">)</span> <span class="p">{</span>
<span class="n">pid_t</span> <span class="n">pid</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">addr</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">long</span> <span class="n">value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">old_value</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="mi">4</span> <span class="o">!=</span> <span class="n">argc</span><span class="p">)</span> <span class="p">{</span> <span class="n">usage</span> <span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span> <span class="p">}</span>
<span class="n">pid</span> <span class="o">=</span> <span class="n">strtol</span> <span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
<span class="n">addr</span> <span class="o">=</span> <span class="n">strtol</span> <span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">16</span><span class="p">);</span>
<span class="n">value</span> <span class="o">=</span> <span class="n">strtol</span> <span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">3</span><span class="p">],</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ptrace</span> <span class="p">(</span><span class="n">PTRACE_ATTACH</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span> <span class="p">{</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Unable to attach to PID: %d (%s)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">pid</span><span class="p">,</span> <span class="n">strerror</span> <span class="p">(</span><span class="n">errno</span><span class="p">));</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">old_value</span> <span class="o">=</span> <span class="n">ptrace</span> <span class="p">(</span><span class="n">PTRACE_PEEKDATA</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Original value: %ld</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">old_value</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ptrace</span> <span class="p">(</span><span class="n">PTRACE_POKEDATA</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span> <span class="p">{</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Unable to overwrite data @ 0x%lx (%s)</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span>
<span class="n">addr</span><span class="p">,</span> <span class="n">strerror</span> <span class="p">(</span><span class="n">errno</span><span class="p">));</span>
<span class="n">ptrace</span> <span class="p">(</span><span class="n">PTRACE_DETACH</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">ptrace</span> <span class="p">(</span><span class="n">PTRACE_DETACH</span><span class="p">,</span> <span class="n">pid</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre>
</div>
<br />
Considering the output of <span style="font-family: Courier New, Courier, monospace;">nm</span> and the PID, I'll call that as follows:<br />
<br />
<pre> ./modify 11896 0804a018 0</pre>
<br />
Then, in the terminal running the original process, you see the output "Someone captured my flag!" and the process ends.<br />
<br />
To prevent the above result, we need to prevent <span style="font-family: Courier New, Courier, monospace;">ptrace</span> from attaching to our running process. We can use <span style="font-family: Courier New, Courier, monospace;">ptrace</span> against itself <i>within</i> our program to achieve this goal. Since a process can only be traced by a single process at a time we can immediately set to trace ourselves when the program starts. The new program looks like this:<br />
<br />
<br />
<div class="highlight">
<pre><span class="kt">long</span> <span class="n">global_flag</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span> <span class="p">()</span> <span class="p">{</span>
<span class="n">ptrace</span> <span class="p">(</span><span class="n">PTRACE_TRACEME</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">global_flag</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Running ...</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">sleep</span> <span class="p">(</span><span class="mi">5</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">fprintf</span> <span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Someone captured my flag!</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre>
</div>
<br />
Now, when we try to connect to the process at run time we get an error from <span style="font-family: Courier New, Courier, monospace;">ptrace</span>. This is true for any process that attempts to use <span style="font-family: Courier New, Courier, monospace;">ptrace</span> to this end (e.g. <span style="font-family: Courier New, Courier, monospace;">strace</span> will report: "Unable to attach to PID: 11940 (Operation not permitted)"). Notice that this is also the case when trying to attach to the process as root.<br />
<br />
Note for Ubuntu users: it is now the default behavior to prevent attaching to a process unless it is a direct child of the tracing process. The root user can still attach to arbitrary processes but other users are restricted (see <span style="font-family: Courier New, Courier, monospace;">/etc/sysctl.d/10-ptrace.conf</span> or <i>man <b>prctl</b></i>).<br />
<br />
Unfortunately, that does not entirely solve the problem. If, instead of having the running process, a user can spawn the process within a debugger the above mechanism can still be defeated. Consider the following example.<br />
<br />
<br />
<div class="highlight">
<pre>[ezpz@mercury (ptrace)]$ gdb prevent_2
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
(gdb) b main
Breakpoint 1 at 0x8048467
(gdb) r
Starting program: prevent_2
Breakpoint 1, 0x08048467 in main ()
(gdb) set {int}0x0804a01c = 0
(gdb) c
Continuing.
Someone captured my flag!
[Inferior 1 (process 12130) exited normally]
(gdb)
</pre>
</div>
<br />
Since <span style="font-family: Courier New, Courier, monospace;">gdb</span> can set a breakpoint at main control can be gained (by the debugger) prior to being able to self-trace. This situation can be identified from within the traced program, however, by looking at the return value of the call to <span style="font-family: Courier New, Courier, monospace;">ptrace</span>.<br />
<br />
<br />
<div class="highlight">
<pre><span class="gd">--- prevent_2.c 2014-08-02 23:33:03.091366946 -0400</span>
<span class="gi">+++ prevent_3.c 2014-08-02 23:33:06.939366991 -0400</span>
<span class="gu">@@ -5,7 +5,10 @@</span>
long global_flag = 1;
int main () {
<span class="gd">- ptrace (PTRACE_TRACEME, 0, 0, 0);</span>
<span class="gi">+ if (0 != ptrace (PTRACE_TRACEME, 0, 0, 0)) {</span>
<span class="gi">+ fprintf (stderr, "Tsk tsk tsk...");</span>
<span class="gi">+ return 1;</span>
<span class="gi">+ }</span>
while (global_flag) {
fprintf (stderr, "Running ...\n");
sleep (5);
</pre>
</div>
<br />
Now <span style="font-family: Courier New, Courier, monospace;">gdb</span> can set the breakpoint and modify the memory but when execution continues the program will exit when the call to <span style="font-family: Courier New, Courier, monospace;">ptrace</span> (from within <span style="font-family: Courier New, Courier, monospace;">gdb</span>) fails.<br />
<br />
The observant reader will realize that, from within the debugger, the return value check can also be modified. In fact, nothing prevents someone from directly modifying the binary prior to running the program. There are a variety of mechanisms - both static and dynamic - that can get around the above methods. Some can be prevented; others not. What these mechanisms <i>do</i> provide is a relatively cheap investment that raises the bar when trying to dynamically change program behavior.<br />
<div>
<br /></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-61537512575370035822014-06-29T00:09:00.000-04:002014-06-29T00:09:58.726-04:00Of Binary Bombs (the secret)So far, I've described six stages of this bomb along with their solution. These stages have built up in difficulty while describing often used programming constructs such as: <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-1.html" target="_blank">string comparison</a>, <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-2.html" target="_blank">arrays</a>, <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-3.html" target="_blank">a switch statement</a>, <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-4.html" target="_blank">recursion</a>, <a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-part-5.html" target="_blank">lookup tables</a>, <a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-part-6.html" target="_blank">linked lists</a>, and here in the final stage a binary search tree.<br />
<br />
While solving the <a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-part-6.html" target="_blank">6th phase</a> will successfully defuse the bomb there is a curious section of code executed at the end. The most important thing to notice is that we can not trigger the bomb from this point on; the entire function will only jump to a graceful exit unless we unlock the secrets. Recall the code for <span style="font-family: Courier New, Courier, monospace;">sym.phase_defused</span>:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV8QQ2TpvchCPKJqoexUv_HPQ3_b5Te-uP1kKLNS1M1G3LrYWv2uivgKlW1sSaE1Uj8SLp90xbA6zF-ZrwRA4XCzxiSp5cQsx808EaD-KzegkHNh_9NJNfMa-zylQxj68Y8OCXQ-bM7sWz/s1600/phase_defused.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV8QQ2TpvchCPKJqoexUv_HPQ3_b5Te-uP1kKLNS1M1G3LrYWv2uivgKlW1sSaE1Uj8SLp90xbA6zF-ZrwRA4XCzxiSp5cQsx808EaD-KzegkHNh_9NJNfMa-zylQxj68Y8OCXQ-bM7sWz/s1600/phase_defused.png" height="297" width="320" /></a></div>
<br />
<div>
<div>
Initially, there is a check for the total number of lines entered so far; until this point that check has failed. Here the jump is bypassed and execution proceeds to call to <span style="font-family: Courier New, Courier, monospace;">sscanf</span>. Two important arguments to <span style="font-family: Courier New, Courier, monospace;">sscanf</span> here are: the format string: <span style="font-family: Courier New, Courier, monospace;">str._d_s</span> (<span style="font-family: Courier New, Courier, monospace;">%d %s</span>) and <span style="font-family: Courier New, Courier, monospace;">0x804b770</span>. From that first argument we can infer the types that will be read and the second indicates from where we will read that data. Unlike in prior phases, there is no input line read to start this phase so <span style="font-family: Courier New, Courier, monospace;">0x804b770</span> must already have data located in it.</div>
<div>
<br /></div>
<div>
If we look at what is stored there we find nothing special - certainly not something that looks like a number followed by a string.</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRKXDfHco5lAALVqaVRzz2bRGn49tGjMnxck7vM6MqsvCuRjmJgT8tBs-3GaZvPsCVjRRDCOt1N5J_9CpNWQd5N7nIHE7bcAxmJiozsNkv29D_2Es2VSFL37FwSgE5qex1Cn5mYruMAlhb/s1600/mem1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRKXDfHco5lAALVqaVRzz2bRGn49tGjMnxck7vM6MqsvCuRjmJgT8tBs-3GaZvPsCVjRRDCOt1N5J_9CpNWQd5N7nIHE7bcAxmJiozsNkv29D_2Es2VSFL37FwSgE5qex1Cn5mYruMAlhb/s1600/mem1.png" height="31" width="320" /></a></div>
<div>
<br /></div>
<div>
<div>
This analysis is using a static binary, however, so this memory may get filled in at runtime. We have looked at each function in turn and the only changes in memory are driven by the inputs we provide. So where, <i>is</i> this address in memory? If we look for known addresses around this we see that <span style="font-family: Courier New, Courier, monospace;">0x804b770</span> is located at <span style="font-family: Courier New, Courier, monospace;">sym.input_strings+240</span>. Remember in <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-2.html" target="_blank">phase 2</a> we determined that <span style="font-family: Courier New, Courier, monospace;">sym.input_strings</span> was a global array of 80-byte character arrays to hold the inputs we provide. So 240 bytes beyond that is the 4th solution we provided (the number 9). There was no string after that but that is part of the secret...</div>
<div>
<br /></div>
<div>
The <span style="font-family: Courier New, Courier, monospace;">sym.read_line</span> grabs the entire input line and in <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-4.html" target="_blank">phase 4</a> <span style="font-family: Courier New, Courier, monospace;">sscanf</span> only looked for <span style="font-family: Courier New, Courier, monospace;">%d</span> which leaves the remainder of the buffer untouched. Nothing prevents us from providing some trailing values after the number so long as there is a space between them.</div>
<div>
<br /></div>
<div>
Supposing we did provide a trailing string the next step is to check that string against <span style="font-family: Courier New, Courier, monospace;">str.austinpowers</span>. So that is the secret to accessing the secret phase: update the 4th input to be '9 austinpowers'. </div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxxxVn257QCakZ4dgt5Zfj79UZ2csRER5Q-GdVOd1_Y_kZgLgckwNZRO5_T59iLrgDEgj1FBFnsEjFEJuzO5pTqeGFG5gF1vYU5VRgOpLZAjckGsw1nq0WSLu9LWD3lZv0Tir2BLUqe-ua/s1600/secret_phase.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxxxVn257QCakZ4dgt5Zfj79UZ2csRER5Q-GdVOd1_Y_kZgLgckwNZRO5_T59iLrgDEgj1FBFnsEjFEJuzO5pTqeGFG5gF1vYU5VRgOpLZAjckGsw1nq0WSLu9LWD3lZv0Tir2BLUqe-ua/s1600/secret_phase.png" height="298" width="320" /></a></div>
<div>
<br /></div>
<div>
<div>
The secret phase reads in an additional line from the input stream and converts it to a long value using <span style="font-family: Courier New, Courier, monospace;">strtol</span>. That value is decremented and compared against <span style="font-family: Courier New, Courier, monospace;">0x3e8</span> (1000) - the bomb is triggered if our decremented value is greater than that. If the input passes that check we enter the final function: <span style="font-family: Courier New, Courier, monospace;">sym.fun7</span>. Prior to going into detail, however, it is important to note that the return value from this function needs to be <span style="font-family: Courier New, Courier, monospace;">0x7</span> to avoid triggering the bomb. The initial value to <span style="font-family: Courier New, Courier, monospace;">sym.fun7</span> is <span style="font-family: Courier New, Courier, monospace;">sym.n1</span> (<span style="font-family: Courier New, Courier, monospace;">0x0804b320</span>).</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOZI-eV4FwehnnDPdg6qIobvndfoEMbEJBfSBHYFVdvhyphenhyphen5gcQqPhL-AcOgzuvEIuoLc1mHBl-6JbthXo8iuC3_UCPyQrzu-Be5hddv73QUefM0YvjN74zbMb2Hgjin3rbk4LmFifwN92En/s1600/fun7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOZI-eV4FwehnnDPdg6qIobvndfoEMbEJBfSBHYFVdvhyphenhyphen5gcQqPhL-AcOgzuvEIuoLc1mHBl-6JbthXo8iuC3_UCPyQrzu-Be5hddv73QUefM0YvjN74zbMb2Hgjin3rbk4LmFifwN92En/s1600/fun7.png" height="320" width="240" /></a></div>
<div>
<br /></div>
<div>
<div>
This is a recursive function very similar to the one explained in <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-4.html" target="_blank">stage 4</a>. To understand what is happening with the control flow it is important to first understand what is contained in <span style="font-family: Courier New, Courier, monospace;">sym.n1</span>. However, unlike <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-4.html" target="_blank">stage 4</a> this variable name gives us little indication of what the memory may contain.</div>
<div>
<br /></div>
<div>
Looking at the first 16 bytes of that memory location we see the values are (after adjusting for endianess and assuming 32-byte values): <span style="font-family: Courier New, Courier, monospace;">0x24</span>, <span style="font-family: Courier New, Courier, monospace;">0x0804b314</span>, <span style="font-family: Courier New, Courier, monospace;">0x0804b308</span>, <span style="font-family: Courier New, Courier, monospace;">0x0</span>. The second two look very much like memory addresses in a range very close to <span style="font-family: Courier New, Courier, monospace;">sym.n1</span>.</div>
</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoLBv6e2AO-WrKzd0hmqR9KPiRqeWnOHeRnZV3T7EspxcAIjcthbobTf46B4XqUT1U4wzGRN1bE-Zj1RV5UQxdbXICUPus-aGhKQ06YphxBP8Y7Qxi9exX-nXccfR4wl5lzUCbwYCx4aYM/s1600/n1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjoLBv6e2AO-WrKzd0hmqR9KPiRqeWnOHeRnZV3T7EspxcAIjcthbobTf46B4XqUT1U4wzGRN1bE-Zj1RV5UQxdbXICUPus-aGhKQ06YphxBP8Y7Qxi9exX-nXccfR4wl5lzUCbwYCx4aYM/s1600/n1.png" height="30" width="320" /></a></div>
<div>
<br /></div>
<div>
Following these two addresses we arrive at a very similar layout. This begins to resemble a recursive data structure most people will recognize: a binary tree. In C it is represented as:</div>
<br />
<div class="highlight">
<pre><span class="k">struct</span> <span class="n">bst</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">value</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">bst</span> <span class="o">*</span><span class="n">left</span><span class="p">,</span> <span class="o">*</span><span class="n">right</span><span class="p">;</span>
<span class="p">};</span>
</pre>
</div>
<br />
Mapping out the entire tree yields the following:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg57yFscAqG8m3L6d5lgGyc5eqLuw-Z8s_smmni0M-Pkxbe_YVfUJHBCsxaeBzlz0tK9q2BUdkJ28NXOCL-oAwVJsg5GQE9oN5XS4QfbpY4wQQvTshRkiEXsMGkhs2he5xs4HqYin2LKYyJ/s1600/btree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg57yFscAqG8m3L6d5lgGyc5eqLuw-Z8s_smmni0M-Pkxbe_YVfUJHBCsxaeBzlz0tK9q2BUdkJ28NXOCL-oAwVJsg5GQE9oN5XS4QfbpY4wQQvTshRkiEXsMGkhs2he5xs4HqYin2LKYyJ/s1600/btree.png" height="128" width="320" /></a></div>
<br />
Now, that will make it easier to follow the control flow in <span style="font-family: Courier New, Courier, monospace;">sym.fun7</span> but there are still some pieces that are needed before a solution can be derived directly. Back in <span style="font-family: Courier New, Courier, monospace;">sym.fun7</span>, there is an initial check for a nil next pointer and then the remainder of the function follows a pre-order traversal of the binary search tree.<br />
<br />
The main concern at this point is understanding how the return value is calculated. Ultimately, we need to understand when the return value will be 7 so that we can provide input that will force a return at that particular point. The control flow on the left subtree either continues down the next left subtree when the argument node value is less than the current node or the right subtree if the value is greater than the current node. If the value is equal to the current node, <span style="font-family: Courier New, Courier, monospace;">eax</span> is set to zero and the function returns.<br />
<br />
The return path from a left tree traversal simply doubles the value of <span style="font-family: Courier New, Courier, monospace;">eax</span> and returns to the caller. The return from the right subtree is a little more interesting - in addition to <span style="font-family: Courier New, Courier, monospace;">eax</span> being doubled it is also incremented by one prior to returning to the caller. Since <span style="font-family: Courier New, Courier, monospace;">eax</span> is used to hold intermediate memory addresses, the calculation probably only makes sense when the search value is found in the tree (thus setting <span style="font-family: Courier New, Courier, monospace;">eax</span> to 0).<br />
<br />
Since a found value returns 0 initially any return from a left subtree will only propagate the zero value; in order to get to seven we need to rely on the increment on the return path of the right subtree path. The only path that leads to the target return value is the one from the rightmost leaf in the tree.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEZcGFIlulIapJcLlZYNpqUcZ1CyXxZzod2GE202-uU38Z42hkLqotPPS0xJ8uIL5M2BfGjAQ0Ax7chHO7h7omCRZWn8sLi6MjnnMgr0S5KIkWAnrKDSaZM6XxV2hFbfI0vGtY9egM6EBt/s1600/btree2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEZcGFIlulIapJcLlZYNpqUcZ1CyXxZzod2GE202-uU38Z42hkLqotPPS0xJ8uIL5M2BfGjAQ0Ax7chHO7h7omCRZWn8sLi6MjnnMgr0S5KIkWAnrKDSaZM6XxV2hFbfI0vGtY9egM6EBt/s1600/btree2.png" height="142" width="320" /></a></div>
<div>
<br /></div>
<div>
To force a return value of 7 we must provide a value of 1001.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiLTddyeVyFlpGJQLraq7Gii-9BJP3cc_LozBbWy2ltAGExDfzve3uIWCFgYoEQitZTW52lB87-1ylEV_NDRuwYnKWT9N26Dv4gkQMegGxAF_ksKcvqG2w2ZOj8YtTm4O-1ee-jaEopH3H/s1600/solution_secret.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjiLTddyeVyFlpGJQLraq7Gii-9BJP3cc_LozBbWy2ltAGExDfzve3uIWCFgYoEQitZTW52lB87-1ylEV_NDRuwYnKWT9N26Dv4gkQMegGxAF_ksKcvqG2w2ZOj8YtTm4O-1ee-jaEopH3H/s1600/solution_secret.png" height="219" width="320" /></a></div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikJq5SHW2Z6bB8DB2QEwNdfxkoqUXejCs9tOoxywYlJvyroLVG0qGYnoMRkzh64IzENeppWopNCyCwP24VrRBiswYNHgZTA8BTj-Erm4yoM9aZjA0zZmu5ghVWfUYQL8olITBnyozcq5H7/s1600/austinpowers.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEikJq5SHW2Z6bB8DB2QEwNdfxkoqUXejCs9tOoxywYlJvyroLVG0qGYnoMRkzh64IzENeppWopNCyCwP24VrRBiswYNHgZTA8BTj-Erm4yoM9aZjA0zZmu5ghVWfUYQL8olITBnyozcq5H7/s1600/austinpowers.jpg" height="155" width="200" /></a></div>
<div>
<br /></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com2tag:blogger.com,1999:blog-2191364361395963546.post-17491294971431482922014-06-10T00:09:00.000-04:002014-06-29T00:12:22.850-04:00Of Binary Bombs (part 6)In the last installment (<a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-part-5.html" target="_blank">phase 5</a>) Dr. Evil used masking and a lookup table to try and defeat any secret agent. I will continue on here with the final phase of this binary bomb: phase 6. (This isn't <i>really</i> the final stage - check out the <a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-secret.html" target="_blank">secret stage</a>)<br />
<br />
Our input string is loaded into the <span style="font-family: Courier New, Courier, monospace;">edx</span> register as usual but then there is a strange reference to a <span style="font-family: Courier New, Courier, monospace;">sym.node1</span> that gets loaded into local stack space. That makes our first order of business to find what is stored in <span style="font-family: Courier New, Courier, monospace;">sym.node1</span>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmancTNupDSG0ehHeYROO1_4MixSSRHpEGvuhWmwbyFyWvodWvhH38xao4Kmgbt4UxXrpQxaw5abJetUIP9zxB431zaf_8zFx23zQJRB7O_Cu1uW00Bal_9zFa7NEdCZNHai01molbGPvJ/s1600/node1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjmancTNupDSG0ehHeYROO1_4MixSSRHpEGvuhWmwbyFyWvodWvhH38xao4Kmgbt4UxXrpQxaw5abJetUIP9zxB431zaf_8zFx23zQJRB7O_Cu1uW00Bal_9zFa7NEdCZNHai01molbGPvJ/s1600/node1.png" height="42" width="320" /></a></div>
<br />
<div>
The name <span style="font-family: Courier New, Courier, monospace;">node1</span> gives a fairly blatant hint at how we should look at this memory (without the symbols, this task would be a whole lot less straightforward). The first several bytes are pretty sparse: interpreting as 32-bit values we get <span style="font-family: Courier New, Courier, monospace;">0xfd</span> (253), <span style="font-family: Courier New, Courier, monospace;">0x01</span> (1), and then the value <span style="font-family: Courier New, Courier, monospace;">0x0804b260</span> (this is stored in little endian). That looks like another memory address; lets see.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCs2uXrjFbVojYI_dgL29MnoPXqNDq9TgSo6CBWACROeSyzSVAGs8hYZVfgxNNbyHoFAYQGLLRchBvJmglPqef-hdefAPMeUgDYv0gu3Hdy68fkOdRNZfHVTAdOzm3twXA904zs4nuckcg/s1600/node_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjCs2uXrjFbVojYI_dgL29MnoPXqNDq9TgSo6CBWACROeSyzSVAGs8hYZVfgxNNbyHoFAYQGLLRchBvJmglPqef-hdefAPMeUgDYv0gu3Hdy68fkOdRNZfHVTAdOzm3twXA904zs4nuckcg/s1600/node_2.png" height="40" width="320" /></a></div>
<div>
<br /></div>
Same structure. <span style="font-family: Courier New, Courier, monospace;">0x02d5</span> (725), <span style="font-family: Courier New, Courier, monospace;">0x02</span> (2), <span style="font-family: Courier New, Courier, monospace;">0x0804b254</span>. And the pattern continues. I'll take a leap and say that we have something that looks like the following C structure:<br />
<br />
<div class="highlight">
<pre><span class="k">struct</span> <span class="n">list_</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">value_</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">index_</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">list_</span> <span class="o">*</span><span class="n">next_</span><span class="p">;</span>
<span class="p">};</span>
</pre>
</div>
<br />
I'm going to walk the list for a while to collect the values (and verify the counter continues in order). That results in the following <span style="font-family: Courier New, Courier, monospace;">(value_,index_)</span> pairs starting from <span style="font-family: Courier New, Courier, monospace;">sym.node1</span>.<br />
<br />
<pre>(253, 1)
(725, 2)
(301, 3)
(997, 4)
(212, 5)
(432, 6)
</pre>
<br />
The list is terminated at that point with a null <span style="font-family: Courier New, Courier, monospace;">next_</span> pointer. At this point, the values of the list are known so it is appropriate to resume walking the body of <span style="font-family: Courier New, Courier, monospace;">sym.phase_6</span>.<br />
<br />
Currently, the input string is loaded into <span style="font-family: Courier New, Courier, monospace;">edx</span> and the linked list is stored in a local value; next a local buffer is loaded to <span style="font-family: Courier New, Courier, monospace;">eax</span> and <span style="font-family: Courier New, Courier, monospace;">sym.read_six_numbers </span>is called. I described this function in <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-2.html" target="_blank">phase 2</a> and we can expect that the local buffer will contain our six input numbers after the call. I have a guess at this point what they should be but I want to verify first to avoid any of Dr. Evil's tricks.<br />
<br />
The remainder of this phase can be broken down into four distinct loops. They are:<br />
<ol>
<li>Verify the input values</li>
<li>Collect the nodes of the above list according to the input values</li>
<li>Reorder the original list with that collection</li>
<li>Verify the resulting list</li>
</ol>
<div>
While the input verification has a nested loop it is the most straightforward of the steps: it checks that all values are unique and less than 7.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5ubpiD2lqChPAuyNMDuMmynCG-qZRp3WvLLwY5v-215yYyYsHwFkiCj6UIKBvb5fURg-cky0o-2TZNG23aybK-N6a4a3wwPEcvO_AD2CBzi_Cs-drR43Vu0fEdy_KbVo61Az6CC8-4GUr/s1600/loop1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5ubpiD2lqChPAuyNMDuMmynCG-qZRp3WvLLwY5v-215yYyYsHwFkiCj6UIKBvb5fURg-cky0o-2TZNG23aybK-N6a4a3wwPEcvO_AD2CBzi_Cs-drR43Vu0fEdy_KbVo61Az6CC8-4GUr/s1600/loop1.png" height="290" width="320" /></a></div>
<div>
<br /></div>
<div>
Initially, collecting nodes according to the input values is a little harder to grasp as it too is a nested loop construct but is now dealing with offsetting into structures and moving memory locations (C pointers) around.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8ZZ1HAHVqmPks1bjRxycVIdBF170BDA8tzJXtYLDyKAaTg6Th-3MYCS06_lSLgHl4E_K8SqvRkYXLnfUoyErwDJKkStjCeWGh20jAbOnCBk8XG-MOv53UfE7XI8j3JizAtj4dsw30-9XA/s1600/loop2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8ZZ1HAHVqmPks1bjRxycVIdBF170BDA8tzJXtYLDyKAaTg6Th-3MYCS06_lSLgHl4E_K8SqvRkYXLnfUoyErwDJKkStjCeWGh20jAbOnCBk8XG-MOv53UfE7XI8j3JizAtj4dsw30-9XA/s1600/loop2.png" height="231" width="320" /></a></div>
<div>
<br /></div>
<div>
Specifically, the commented line below walks the linked list. This is something that would have not been evident had I not understood the memory in <span style="font-family: Courier New, Courier, monospace;">sys.node1</span>.</div>
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">edx</span><span class="o">+</span><span class="nb">ecx</span><span class="p">]</span>
<span class="nf">lea</span> <span class="nb">esi</span><span class="p">,</span> <span class="p">[</span><span class="nb">esi</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">esi</span><span class="p">,</span> <span class="p">[</span><span class="nb">esi</span><span class="o">+</span><span class="mh">0x8</span><span class="p">]</span> <span class="c1">; this uses the 'next' pointer</span>
<span class="nf">inc</span> <span class="nb">ebx</span>
<span class="nf">cmp</span> <span class="nb">ebx</span><span class="p">,</span> <span class="nb">eax</span>
</pre>
</div>
<br />
The third step, reordering the original list, is short and looks simple enough but took me some time to fully grok. I needed to understand that the previous step was storing local copies of the nodes in the original list. From that the original list is <i>overwirtten</i> here in the order specified by the input.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkUGyUU-PqlqZCJELG_guUiH04tUAiXkvfyNnppjToGUUoec9vt6xODIyn7jC83ba_rBDSvFmdQ4OIm-qeT-F_6O2TQEB7ZZWgUPuB-vrvz8XTiPYLzNXlCIyglMDLd_z2VUwG1U77QHaS/s1600/loop3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkUGyUU-PqlqZCJELG_guUiH04tUAiXkvfyNnppjToGUUoec9vt6xODIyn7jC83ba_rBDSvFmdQ4OIm-qeT-F_6O2TQEB7ZZWgUPuB-vrvz8XTiPYLzNXlCIyglMDLd_z2VUwG1U77QHaS/s1600/loop3.png" height="117" width="320" /></a></div>
<div>
<br /></div>
<div>
Finally, the overwritten list is checked to ensure that the <span style="font-family: Courier New, Courier, monospace;">value_</span> elements are arranged in decreasing order.</div>
<div>
<br /></div>
<div>
With that final piece of information the necessary input sequence becomes clear - the solution is to provide <span style="font-family: Courier New, Courier, monospace;">index_</span> values that order the <span style="font-family: Courier New, Courier, monospace;">value_</span> members from largest to smallest.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbVwDgw6VnmAIP0gZZ_s__xL457BMWDTBkGrfQZct9H15_ktPnP1-nvSX7n9gF8tZxdy7KlngLo3XL2zlzQqQDZgDAoOIJL-1gGN_zSU8WX4zYjNdbQWoMWOFk8aCEz9psipg2RFoDT8Ji/s1600/solution_6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbVwDgw6VnmAIP0gZZ_s__xL457BMWDTBkGrfQZct9H15_ktPnP1-nvSX7n9gF8tZxdy7KlngLo3XL2zlzQqQDZgDAoOIJL-1gGN_zSU8WX4zYjNdbQWoMWOFk8aCEz9psipg2RFoDT8Ji/s1600/solution_6.png" height="171" width="320" /></a></div>
<div>
<br /></div>
<div>
Below is a mapping of this functionality to some C code that it may have come from.</div>
<br />
<br />
<div class="highlight">
<pre><span class="k">struct</span> <span class="n">list_</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">value_</span><span class="p">,</span> <span class="n">index_</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">list_</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="nf">phase_6</span> <span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span> <span class="n">input</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">list_</span> <span class="o">*</span><span class="n">list</span> <span class="o">=</span> <span class="p">...,</span> <span class="o">*</span><span class="n">node</span> <span class="o">=</span> <span class="n">list</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">values</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="k">struct</span> <span class="n">list_</span> <span class="o">*</span><span class="n">nodes</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="n">read_six_numbers</span> <span class="p">(</span><span class="n">input</span><span class="p">,</span> <span class="n">values</span><span class="p">);</span>
<span class="c1">// 0x08048db8 - 0x08048e00</span>
<span class="k">for</span> <span class="p">(;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">6</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">></span> <span class="mi">6</span><span class="p">)</span> <span class="n">explode_bomb</span> <span class="p">();</span>
<span class="k">for</span> <span class="p">(;</span> <span class="n">j</span> <span class="o"><</span> <span class="mi">6</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">values</span><span class="p">[</span><span class="n">j</span><span class="p">])</span>
<span class="n">explode_bomb</span> <span class="p">();</span>
<span class="p">}</span>
<span class="c1">// 0x08048e02 - 0x08048e42</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span> <span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">6</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">list</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">node</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">node</span><span class="o">-></span><span class="n">index_</span> <span class="o">==</span> <span class="n">values</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="p">{</span>
<span class="n">nodes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">node</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">node</span><span class="o">-></span><span class="n">next</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// 0x08048e44 - 0x08048e60</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">list</span> <span class="o">=</span> <span class="n">nodes</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">list</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">i</span> <span class="o"><=</span> <span class="mi">5</span><span class="p">)</span> <span class="p">{</span>
<span class="n">node</span><span class="o">-></span><span class="n">next</span> <span class="o">=</span> <span class="n">nodes</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">node</span><span class="o">-></span><span class="n">next</span><span class="p">;</span>
<span class="o">++</span><span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">node</span><span class="o">-></span><span class="n">next</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="c1">// 0x08048e67 - 0x08048e85</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">list</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">5</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">node</span><span class="o">-></span><span class="n">value_</span> <span class="o"><</span> <span class="n">node</span><span class="o">-></span><span class="n">next</span><span class="o">-></span><span class="n">value_</span><span class="p">)</span>
<span class="n">explode_bomb</span> <span class="p">();</span>
<span class="p">}</span></pre>
</div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com4tag:blogger.com,1999:blog-2191364361395963546.post-41246607956666027092014-06-03T23:34:00.000-04:002014-06-10T00:09:59.388-04:00Of Binary Bombs (part 5)<a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-4.html" target="_blank">Part 4</a> detailed a recursive function that calculated the nth entry into the Fibonacci sequence. Here we continue with the next stage to defeating Dr. Evil.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhpe-1bUKNrRPFjD83LtJlPp3N_Go-z5DvpbAiKVez108ebDRakvH1wSsS2Mvd0iNlXdXN3S60Ev6Xmk7wBTIsX8zseUGNAMQ1ZPiT_FFknZUs_VZ9BDf5qLTyVHEwBeetDOQJpeKHbywv/s1600/phase_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhhpe-1bUKNrRPFjD83LtJlPp3N_Go-z5DvpbAiKVez108ebDRakvH1wSsS2Mvd0iNlXdXN3S60Ev6Xmk7wBTIsX8zseUGNAMQ1ZPiT_FFknZUs_VZ9BDf5qLTyVHEwBeetDOQJpeKHbywv/s1600/phase_5.png" height="320" width="272" /></a></div>
<br />
There is a familiar face here: <span style="font-family: Courier New, Courier, monospace;">sym.string_length</span>. Recall in <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-1.html" target="_blank">phase 1</a> I glazed over <span style="font-family: Courier New, Courier, monospace;">sym.string_not_equal</span> which had buried inside a call to <span style="font-family: Courier New, Courier, monospace;">sym.string_length</span> - if you've been following along at home this is not a surprise. The result of this call (which expects our input string as an argument) should be 6.<br />
<br />
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="nb">eax</span><span class="p">,</span> <span class="mh">0x6</span>
</pre>
</div>
<br />
This is our first clue to solving the riddle.<br />
<br />
Peeking ahead a little there are two memory locations referenced directly: <span style="font-family: Courier New, Courier, monospace;">sym.array.123</span> and <span style="font-family: Courier New, Courier, monospace;">str.giants</span>. Before we get too far into the details of <span style="font-family: Courier New, Courier, monospace;">sym.phase_5</span> lets look at what each of these contain. Using the memory printing capabilities of radare2 we can do this with: <span style="font-family: Courier New, Courier, monospace;">px @ sym.array.123</span> and <span style="font-family: Courier New, Courier, monospace;">ps @ str.giants</span> to get the hex and ASCII representations, respectively.<br />
<br />
Not surprisingly <span style="font-family: Courier New, Courier, monospace;">str.giants</span> contains the string 'giants' and the content of <span style="font-family: Courier New, Courier, monospace;">sym.array.123</span> is listed below:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFHKi1NRGSAC02KncBZJ_k1BCbDttXhujSFgeBIo_rFehpekeTZFYawqjKDgLVaC1HgWtEJ9QmhyphenhyphentjYfCYX8nrFtsg8P0oeWQL1-2u4ojCbZ9oLgsH5NFpJqc6ChM8Z3jp3iTAPy1htKIZ/s1600/sym.array.123.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFHKi1NRGSAC02KncBZJ_k1BCbDttXhujSFgeBIo_rFehpekeTZFYawqjKDgLVaC1HgWtEJ9QmhyphenhyphentjYfCYX8nrFtsg8P0oeWQL1-2u4ojCbZ9oLgsH5NFpJqc6ChM8Z3jp3iTAPy1htKIZ/s1600/sym.array.123.png" height="190" width="320" /></a></div>
<br />
<div>
Alright, now that we've got some context lets continue with the code.</div>
<br />
<div class="highlight">
<pre> <span class="nf">lea</span> <span class="nb">ecx</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x8</span><span class="p">]</span> <span class="c1">; load an empty local array</span>
<span class="nf">mov</span> <span class="nb">esi</span><span class="p">,</span> <span class="nv">sym.array.123</span> <span class="c1">; set a pointer to the first element of the memory above</span>
<span class="nf">mov</span> <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">edx</span><span class="o">+</span><span class="nb">ebx</span><span class="p">]</span> <span class="c1">; target of the jump below</span>
<span class="nf">and</span> <span class="nb">al</span><span class="p">,</span> <span class="mh">0xf</span>
<span class="nf">movsx</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">al</span>
<span class="nf">mov</span> <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">eax</span><span class="o">+</span><span class="nb">esi</span><span class="p">]</span>
<span class="nf">mov</span> <span class="p">[</span><span class="nb">edx</span><span class="o">+</span><span class="nb">ecx</span><span class="p">],</span> <span class="nb">al</span>
<span class="nf">inc</span> <span class="nb">edx</span>
<span class="nf">cmp</span> <span class="nb">edx</span><span class="p">,</span> <span class="mh">0x5</span>
<span class="nf">jle</span> <span class="mh">0x8048d57</span>
</pre>
</div>
<br />
After loading the address of a local array the code enters a loop from 0 to 5 (for the six characters of our input). The body of that loop does the following:<br />
<br />
Selects the nth byte from the user input string, masks off the bottom 4 bits, and then uses that as an index into <span style="font-family: Courier New, Courier, monospace;">sym.array.123</span>. The byte at that index is then copied to the local array.<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">edx</span><span class="o">+</span><span class="nb">ebx</span><span class="p">]</span>
<span class="nf">and</span> <span class="nb">al</span><span class="p">,</span> <span class="mh">0xf</span>
<span class="nf">movsx</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">al</span>
<span class="nf">mov</span> <span class="nb">al</span><span class="p">,</span> <span class="p">[</span><span class="nb">eax</span><span class="o">+</span><span class="nb">esi</span><span class="p">]</span>
<span class="nf">mov</span> <span class="p">[</span><span class="nb">edx</span><span class="o">+</span><span class="nb">ecx</span><span class="p">],</span> <span class="nb">al</span>
</pre>
</div>
<br />
In C, that might look similar to<br />
<br />
<div class="highlight">
<pre><span class="kt">char</span> <span class="n">array123</span><span class="p">[]</span> <span class="o">=</span> <span class="s">"isrveawhobpnutfg"</span><span class="p">,</span> <span class="n">local</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">},</span> <span class="o">*</span><span class="n">input</span> <span class="o">=</span> <span class="p">...;</span>
<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">6</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="n">local</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">array123</span><span class="p">[</span><span class="n">input</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">&</span> <span class="mh">0xf</span><span class="p">];</span>
</pre>
</div>
<br />
<br />
After the loop the local array is null terminated and compared against <span style="font-family: Courier New, Courier, monospace;">str.giants</span>; matching strings avoids triggering the bomb. Now all we need is to determine what indices from <span style="font-family: Courier New, Courier, monospace;">sym.array.123</span> yield the string 'giants.'<br />
<br />
Recall the memory stored in <span style="font-family: Courier New, Courier, monospace;">sym.array.123</span> - <span style="font-family: Courier New, Courier, monospace;">isrveawhobpnutfg</span>. The necessary index sequence then becomes: <span style="font-family: Courier New, Courier, monospace;">0xf</span>, <span style="font-family: Courier New, Courier, monospace;">0x0</span>, <span style="font-family: Courier New, Courier, monospace;">0x5</span>, <span style="font-family: Courier New, Courier, monospace;">0xb</span>, <span style="font-family: Courier New, Courier, monospace;">0xd</span>, <span style="font-family: Courier New, Courier, monospace;">0x1</span>. Since our ASCII input is masked we need to find ASCII strings with lower-order bits matching these values. I list the valid combinations (for printable ASCII) below:<br />
<br />
<pre>0xf : / ? O _ o
0x0 : 0 @ P ` p
0x5 : % 5 E U e u
0xb : + ; K [ k {
0xd : - = M ] m }
0x1 : ! 1 A Q a q
</pre>
<br />
Any combination of those values should be a valid input to solve this stage. Let's try one: 'opekma'<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmjkb0K0mVEAabWv1-N2TZa7eOMuteVQ9d1w5rrdcX9iqTzasI5_yKLu-M2cpLg_siihRUrCAhzZb3XW0JCV8fOKnmPZPh-YpX_KgLI92eArbJ5rkCZTh8s4OcylnhuVSoqekonOS7pY3s/s1600/solution_5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmjkb0K0mVEAabWv1-N2TZa7eOMuteVQ9d1w5rrdcX9iqTzasI5_yKLu-M2cpLg_siihRUrCAhzZb3XW0JCV8fOKnmPZPh-YpX_KgLI92eArbJ5rkCZTh8s4OcylnhuVSoqekonOS7pY3s/s1600/solution_5.png" height="147" width="320" /></a></div>
<br />
Sweet, almost there. Next up is <a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-part-6.html" target="_blank">phase 6</a> the [supposed] last stage...ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-65067139372494555882014-05-26T22:32:00.000-04:002014-06-03T23:35:04.604-04:00Of Binary Bombs (part 4)<a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-3.html" target="_blank">Part 3</a> of this series explored <span style="font-family: Courier New, Courier, monospace;">phase_3</span> of this riddle which marked the first phase that accepted more than a single correct input sequence. Here I'll continue with the next phase: <span style="font-family: Courier New, Courier, monospace;">sym.phase_4</span>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEt6ndg4FJAVUijZbQEuFa1OXVVad5-qU-a372ITUZHS1yDKA88XejHJLAL5YuuaHHITNNQvNhKfMnKY4w53WzaPocjPyGSnpzARUshq8W2FUX1Sniz3Pr2du9fmMknssJjhHiiVMoIIJI/s1600/phase_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEt6ndg4FJAVUijZbQEuFa1OXVVad5-qU-a372ITUZHS1yDKA88XejHJLAL5YuuaHHITNNQvNhKfMnKY4w53WzaPocjPyGSnpzARUshq8W2FUX1Sniz3Pr2du9fmMknssJjhHiiVMoIIJI/s1600/phase_4.png" height="320" width="289" /></a></div>
<br />
At first, some arguments are loaded onto the stack to prep for a call to <span style="font-family: Courier New, Courier, monospace;">sscanf</span> but rather than an immediate string as the format string argument, an address is provided. Recall in in <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-2.html" target="_blank">phase_2</a> that we could get a hint of the input by printing <span style="font-family: Courier New, Courier, monospace;">str._d_d_d_d_d_d</span> (<span style="font-family: Courier New, Courier, monospace;">ps @ str._d_d_d_d_d_d</span>) - here we need to understand the arguments to <span style="font-family: Courier New, Courier, monospace;">sscanf</span> a little to know that the address <span style="font-family: Courier New, Courier, monospace;">0x8049808</span> should contain a format string. Indeed, it does (<span style="font-family: Courier New, Courier, monospace;">ps @ 0x8049808</span> yields <span style="font-family: Courier New, Courier, monospace;">%d</span>) - our input string needs to be a number. That number needs to be larger than 0.<br />
<br />
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0x0</span>
<span class="nf">jg</span> <span class="mh">0x8048d0e</span>
</pre>
</div>
<br />
After the input is checked, its value is pushed to the stack and control is passed to <span style="font-family: Courier New, Courier, monospace;">sym.func4</span>.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHEnEu7AnBf13hPWSUhiCAOjvEDUr0Vr_VJufzrb3AXOn5yDo-SgRxEnlROxd5HtPev__MTcdNWeqWKuEuw4UV9GKVJeIM6h5GhClubWTdt9uFK1PWjp0iWKkscAFgh0GVI1TNXAr7ubEV/s1600/func4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHEnEu7AnBf13hPWSUhiCAOjvEDUr0Vr_VJufzrb3AXOn5yDo-SgRxEnlROxd5HtPev__MTcdNWeqWKuEuw4UV9GKVJeIM6h5GhClubWTdt9uFK1PWjp0iWKkscAFgh0GVI1TNXAr7ubEV/s1600/func4.png" height="320" width="266" /></a></div>
<br />
The second requirement of our input is now revealed:<br />
<br />
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="nb">ebx</span><span class="p">,</span> <span class="mh">0x1</span>
<span class="nf">jle</span> <span class="mh">0x8048cd0</span>
</pre>
</div>
<br />
For values larger than 1 <span style="font-family: Courier New, Courier, monospace;">sym.func4</span> is called recursively.<br />
<br />
<div class="highlight">
<pre> <span class="nf">lea</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebx</span><span class="o">-</span><span class="mh">0x1</span><span class="p">]</span>
<span class="nf">push</span> <span class="nb">eax</span>
<span class="nf">call</span> <span class="nv">sym.func4</span>
</pre>
</div>
<br />
So the input value to <span style="font-family: Courier New, Courier, monospace;">sym.func4</span> (our input, initially) is decremented by 1 and passed to the recursive call. The result of that call is saved to <span style="font-family: Courier New, Courier, monospace;">esi</span> and then <span style="font-family: Courier New, Courier, monospace;">sym.func4</span> is called recursively yet again - this time after decrementing the input by 2.<br />
<br />
<div class="highlight">
<pre> <span class="nf">lea</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebx</span><span class="o">-</span><span class="mh">0x2</span><span class="p">]</span>
<span class="nf">push</span> <span class="nb">eax</span>
<span class="nf">call</span> <span class="nv">sym.func4</span>
</pre>
</div>
<br />
The return value of the second recursion is added to the result of the first and that sum becomes the return value of this depth of the recursive call. In C, this looks something like:<br />
<br />
<div>
<div class="highlight">
<pre><span class="kt">int</span> <span class="nf">func4</span> <span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">n</span> <span class="o"><</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span>
<span class="k">return</span> <span class="n">func4</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">func4</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span>
</pre>
<pre><span class="p">
</span></pre>
<pre><span class="p"></span></pre>
</div>
Back in <span style="font-family: Courier New, Courier, monospace;">sym.phase_4</span> there is the following check:</div>
<br />
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="nb">eax</span><span class="p">,</span> <span class="mh">0x37</span>
</pre>
</div>
<br />
The input, when fed to <span style="font-family: Courier New, Courier, monospace;">sym.func4</span>, should return 55. The recursion of <span style="font-family: Courier New, Courier, monospace;">n-1</span> + <span style="font-family: Courier New, Courier, monospace;">n-2</span> is one way to calculate the Fibonacci number at index <span style="font-family: Courier New, Courier, monospace;">n</span>. That, coupled with the fact that 55 is a Fibonacci number, allows us to derive the necessary input value of 9.
<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi15_AsQgBiGJczHdcketE5ybzeLTEjfm7KxBDBQR5BthIpBS_WWnOhhH2PrQUeLO-ouzD0boqo_Z59msA6ai8iRhXe4VAcC3wXLC-lJunGLySEiW6Q1641mlJ1hvFw-JqN5GyRcaXwtjVD/s1600/solution_4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi15_AsQgBiGJczHdcketE5ybzeLTEjfm7KxBDBQR5BthIpBS_WWnOhhH2PrQUeLO-ouzD0boqo_Z59msA6ai8iRhXe4VAcC3wXLC-lJunGLySEiW6Q1641mlJ1hvFw-JqN5GyRcaXwtjVD/s1600/solution_4.png" height="127" width="320" /></a></div>
<div>
<br />
Next up, <a href="http://uu-kk.blogspot.com/2014/06/of-binary-bombs-part-5.html" target="_blank">part 5</a>.</div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-41996400018185929092014-05-22T00:08:00.000-04:002014-05-26T22:33:36.665-04:00Of Binary Bombs (part 3)In <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-2.html" target="_blank">part 2</a> I explained both <span style="font-family: Courier New, Courier, monospace;">sym.read_line</span> and the solution to <span style="font-family: Courier New, Courier, monospace;">sym.phase_2</span>. Here I work through the third phase of Dr. Evil's nasty binary bomb.<br />
<br />
<br />
I'll assume to start directly at <span style="font-family: Courier New, Courier, monospace;">sym.phase_3</span> beyond the input handling routines previously discussed.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnggxh8KNhwKJXgp29PbUaH6DUk8pRtB42QLyCfe2JNcZkYaDsYZcbH1ha70RX_Xb7mtOvlHSGHlL819x2gvz88FsGImq9qQfwjme8mdmxmN4cRdPTvyL2aAavWBp_YlJ5UTOcjRpqymXX/s1600/phase_3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnggxh8KNhwKJXgp29PbUaH6DUk8pRtB42QLyCfe2JNcZkYaDsYZcbH1ha70RX_Xb7mtOvlHSGHlL819x2gvz88FsGImq9qQfwjme8mdmxmN4cRdPTvyL2aAavWBp_YlJ5UTOcjRpqymXX/s1600/phase_3.png" height="320" width="214" /></a></div>
<br />
<div>
<div>
Phase 3 starts with a call to <span style="font-family: Courier New, Courier, monospace;">sscanf</span> with a format string of "<span style="font-family: Courier New, Courier, monospace;">%d %c %d</span>". So we should provide a number, character, and a number. The first number should be less than or equal to 7<br />
<br /></div>
</div>
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0xc</span><span class="p">],</span> <span class="mh">0x7</span>
</pre>
</div>
<br />
and is used to calculate a jump address<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0xc</span><span class="p">]</span>
<span class="nf">jmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">eax</span><span class="o">*</span><span class="mi">4</span><span class="o">+</span><span class="mh">0x80497e8</span><span class="p">]</span>
</pre>
</div>
<br />
If we start with <span style="font-family: Courier New, Courier, monospace;">eax</span> containing 0 this jumps us to a location stored at <span style="font-family: Courier New, Courier, monospace;">0x80497e8</span>. That address is <span style="font-family: Courier New, Courier, monospace;">0x08048be0</span> - or just over the next instruction.<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x71</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0x309</span>
</pre>
</div>
<br />
This sets <span style="font-family: Courier New, Courier, monospace;">bl</span> to 0x71 (ASCII 'q') and compares the third input to 777. If the third input is 777 control jumps to<br />
<br />
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="nb">bl</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x5</span><span class="p">]</span>
</pre>
</div>
<br />
So, to avoid the bomb we can provide the following input: 0 q 777 and we have a valid solution.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0JHbKP08rl5GW9oTMEfPjCvgQ2x6xR5dE-83O3VlrQnbAWZ82Su170_hjanYKWcn4CyLweayXDuH0t6vpS8eWoqxRkJMCUEvHzFc6mFJ4Q1d0YBMcofKuUIquU9EBA9gK3palJV0R4D4w/s1600/solution_3a.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0JHbKP08rl5GW9oTMEfPjCvgQ2x6xR5dE-83O3VlrQnbAWZ82Su170_hjanYKWcn4CyLweayXDuH0t6vpS8eWoqxRkJMCUEvHzFc6mFJ4Q1d0YBMcofKuUIquU9EBA9gK3palJV0R4D4w/s1600/solution_3a.png" height="102" width="320" /></a></div>
<br />
But what about setting <span style="font-family: Courier New, Courier, monospace;">eax</span> to something other than 0 to start? Let's look at the other possible jump addresses for values less than 8 but greater than 0 for the first input. I've abbreviated the code and commented what is different from the above description.<br />
<br />
<div class="highlight">
<pre><span class="c1">; eax == 1 - 0x08048c00</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x62</span> <span class="c1">; ASCII 'b'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0xd6</span> <span class="c1">; 214</span>
<span class="c1">; eax == 2 - 0x08048c16</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x62</span> <span class="c1">; ASCII 'b'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0x2f3</span> <span class="c1">; 755</span>
<span class="c1">; eax == 3 - 0x08048c28</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x6b</span> <span class="c1">; ASCII 'k'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0xfb</span> <span class="c1">; 251</span>
<span class="c1">; eax == 4 - 0x08048c40</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x6f</span> <span class="c1">; ASCII 'o'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0xa0</span> <span class="c1">; 160</span>
<span class="c1">; eax == 5 - 0x08048c52</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x74</span> <span class="c1">; ASCII 't'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0x1ca</span> <span class="c1">; 458</span>
<span class="c1">; eax == 6 - 0x08048c64</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x76</span> <span class="c1">; ASCII 'v'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0x30c</span> <span class="c1">; 780</span>
<span class="c1">; eax == 7 - 0x08048c76</span>
<span class="nf">mov</span> <span class="nb">bl</span><span class="p">,</span> <span class="mh">0x62</span> <span class="c1">; ASCII 'b'</span>
<span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x4</span><span class="p">],</span> <span class="mh">0x20c</span> <span class="c1">; 524</span>
</pre>
</div>
<br />
So it looks as if there are several answers to this part of the riddle. Let's verify at least one of the others work.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEYnXfurWvhxLBdb5bmY2gFZ87OmydneHfE4InfRaDNKPCOJIPqBqwjReedZT4G0jvS8FL1LNOSpNX2RZU4zih29E9xNGYBtL0QZlZirhMs6iYBdYPlw2x4m2luaPmWO7Ev_Vc-OMS_1ub/s1600/solution_3b.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEYnXfurWvhxLBdb5bmY2gFZ87OmydneHfE4InfRaDNKPCOJIPqBqwjReedZT4G0jvS8FL1LNOSpNX2RZU4zih29E9xNGYBtL0QZlZirhMs6iYBdYPlw2x4m2luaPmWO7Ev_Vc-OMS_1ub/s1600/solution_3b.png" height="103" width="320" /></a></div>
<br />
Next, <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-4.html" target="_blank">phase 4</a>.ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-54324382538533802202014-05-21T21:51:00.001-04:002014-05-22T00:09:01.059-04:00Of Binary Bombs (part 2)In <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-1.html" target="_blank">part 1</a> I worked out the setup of the binary bomb and walked through the solution to the first phase of the riddle. This post continues from there to examine phase 2.<br />
<br />
After defusing phase 1 the program reads the next line from the user (<span style="font-family: Courier New, Courier, monospace;">sym.read_line</span>) and calls <span style="font-family: Courier New, Courier, monospace;">sym.phase_2</span>. In my first post I glazed over the <span style="font-family: Courier New, Courier, monospace;">sym.read_line</span> with some hand-waving so I'd like to revisit that here with a little more attention. As a reminder, in radare2 you can search for a symbol while in visual mode with <span style="font-family: Courier New, Courier, monospace;">s</span> (<span style="font-family: Courier New, Courier, monospace;">s sym.read_line</span>).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0xYy-E6Y9GVrWCETX84cxBJBRhA6IF2xjLXgnb-lMQ0kRjoZT8gGr3cMLQElsFizaUmJnS3vAW5OuqmXa2z0Xd7jU1c5BZs4cIxlKblFLc5KPBypn60o6IoymuIZyoU0YEyrsOCkdZdkA/s1600/read_line.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0xYy-E6Y9GVrWCETX84cxBJBRhA6IF2xjLXgnb-lMQ0kRjoZT8gGr3cMLQElsFizaUmJnS3vAW5OuqmXa2z0Xd7jU1c5BZs4cIxlKblFLc5KPBypn60o6IoymuIZyoU0YEyrsOCkdZdkA/s1600/read_line.png" height="261" width="320" /></a></div>
<br />
The first thing that happens in <span style="font-family: Courier New, Courier, monospace;">sym.read_line</span> (after managing the pointers and local variables) is a call to <span style="font-family: Courier New, Courier, monospace;">sym.skip</span>. <span style="font-family: Courier New, Courier, monospace;">sym.skip</span> has a little more to it than just this but basically determines which input line is to be read by looking at the global variable <span style="font-family: Courier New, Courier, monospace;">sym.num_input_strings</span> and using that as an index into <span style="font-family: Courier New, Courier, monospace;">sym.input_strings </span>(a global holding each input string in an 80-byte char array). It returns the string read in <span style="font-family: Courier New, Courier, monospace;">eax</span>.<br />
<br />
Once a line is read from the input stream the following instructions check the return value of <span style="font-family: Courier New, Courier, monospace;">sym.skip</span>.<br />
<br />
<div class="highlight">
<pre> <span class="nf">test</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">jne</span> <span class="mh">0x804925f</span>
</pre>
</div>
<br />
Only if the return value of <span style="font-family: Courier New, Courier, monospace;">sym.skip</span> is zero (when null is returned) does execution not jump to 0x0804925f. I will cover later what happens in the null return case. For now, non-null returns follow this explanation.<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nv">sym.num_input_strings</span><span class="p">]</span>
</pre>
</div>
<br />
At this point, we have entered 2 strings but the global counter <span style="font-family: Courier New, Courier, monospace;">sym.num_input_strings</span> has only been updated once so contains the value 1.<br />
<br />
<div class="highlight">
<pre> <span class="nf">lea</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">eax</span><span class="o">+</span><span class="nb">eax</span><span class="o">*</span><span class="mi">4</span><span class="p">]</span>
<span class="nf">shl</span> <span class="nb">eax</span><span class="p">,</span> <span class="mh">0x4</span>
<span class="nf">lea</span> <span class="nb">edi</span><span class="p">,</span> <span class="p">[</span><span class="nb">eax</span><span class="o">+</span><span class="nv">sym.input_strings</span><span class="p">]</span>
</pre>
</div>
<br />
The above sets <span style="font-family: Courier New, Courier, monospace;">eax</span> to <span style="font-family: Courier New, Courier, monospace;">eax * 5</span>, multiplies the result by 16, and loads the string at <span style="font-family: Courier New, Courier, monospace;">sym.input_strings+eax</span> into <span style="font-family: Courier New, Courier, monospace;">edi</span>. In this case (<span style="font-family: Courier New, Courier, monospace;">sym.num_input_strings</span> == 1) this yields the second index in the <span style="font-family: Courier New, Courier, monospace;">sym.input_strings</span> array (recall that each entry there is 80 bytes).<br />
<br />
The next set of instructions calculate the length of the input string by decrementing a counter for each non-null byte in the string. The calculation uses some shortcuts for efficiency so is not directly mapped to traditional C-like counting (google 'asm string length' for details).<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">al</span><span class="p">,</span> <span class="mh">0x0</span>
<span class="nf">cld</span>
<span class="nf">mov</span> <span class="nb">ecx</span><span class="p">,</span> <span class="mh">0xffffffff</span>
<span class="nf">repne</span> <span class="nv">scasb</span>
<span class="nf">mov</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">ecx</span>
<span class="nf">not</span> <span class="nb">eax</span>
<span class="nf">lea</span> <span class="nb">edi</span><span class="p">,</span> <span class="p">[</span><span class="nb">eax</span><span class="o">-</span><span class="mh">0x1</span><span class="p">]</span>
</pre>
</div>
<br />
Our input strings must fit within the 80-byte array so checking that the string (less newline) is less than 0x4f (79) bytes is done next<br />
<br />
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="nb">edi</span><span class="p">,</span> <span class="mh">0x4f</span>
</pre>
</div>
<br />
Finally, the newline is replaced with a zero byte<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="kt">byte</span> <span class="p">[</span><span class="nb">edi</span><span class="o">+</span><span class="nb">eax</span><span class="o">+</span><span class="mh">0x804b67f</span><span class="p">],</span> <span class="mh">0x0</span>
</pre>
</div>
<br />
the input string is placed in <span style="font-family: Courier New, Courier, monospace;">eax</span> and the global <span style="font-family: Courier New, Courier, monospace;">sym.num_input_strings</span> is incremented by 1.<br />
<br />
At this point <span style="font-family: Courier New, Courier, monospace;">eax</span> contains the input string and I can resume with the second phase of the bomb.<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh73rLehlcSWzY3n6SobXItotLulNciUVgZVtu42SBa6wucIznkjfJUVDrwP51QTeKDKoMutPWu92agEiUlclzET2z4vVH0VUBWXqNr5sAdUoqcBiJh9RWS3Maz4DXm3wmNt7CETSrQYtYF/s1600/phase_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh73rLehlcSWzY3n6SobXItotLulNciUVgZVtu42SBa6wucIznkjfJUVDrwP51QTeKDKoMutPWu92agEiUlclzET2z4vVH0VUBWXqNr5sAdUoqcBiJh9RWS3Maz4DXm3wmNt7CETSrQYtYF/s1600/phase_2.png" height="320" width="285" /></a></div>
<div>
<div>
<br /></div>
<div>
After the prologue and setting aside 32 bytes of local space the input string is copied to <span style="font-family: Courier New, Courier, monospace;">edx</span> and the local buffer address is loaded into <span style="font-family: Courier New, Courier, monospace;">eax</span>. These two arguments are pushed onto the stack and <span style="font-family: Courier New, Courier, monospace;">sym.read_six_numbers</span> is invoked. To avoid making this post too long I'll skip over the detail of <span style="font-family: Courier New, Courier, monospace;">sym.read_six_numbers</span> and just mention that the numbers in the input string are read into the local array via <span style="font-family: Courier New, Courier, monospace;">sscanf</span> with a format string of "%d %d %d %d %d %d" (Hint #1). </div>
<div>
<br /></div>
<div>
The following check verifies that the first entry in the local array has the value 1 and is the second hint to solving this phase.<br />
<br /></div>
</div>
<div class="highlight">
<pre> <span class="nf">cmp</span> <span class="kt">dword</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x18</span><span class="p">],</span> <span class="mh">0x1</span>
</pre>
</div>
<br />
The next section of code is the meat of phase 2 and how we determine what numbers to include in our string.<br />
<br />
<div class="highlight">
<pre> <span class="nf">mov</span> <span class="nb">ebx</span><span class="p">,</span> <span class="mh">0x1</span>
<span class="nf">lea</span> <span class="nb">esi</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebp</span><span class="o">-</span><span class="mh">0x18</span><span class="p">]</span>
<span class="nf">lea</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">ebx</span><span class="o">+</span><span class="mh">0x1</span><span class="p">]</span> <span class="c1">; target of jump below (0x8048b76)</span>
<span class="nf">imul</span> <span class="nb">eax</span><span class="p">,</span> <span class="p">[</span><span class="nb">esi</span><span class="o">+</span><span class="nb">ebx</span><span class="o">*</span><span class="mi">4</span><span class="o">-</span><span class="mh">0x4</span><span class="p">]</span>
<span class="nf">cmp</span> <span class="p">[</span><span class="nb">esi</span><span class="o">+</span><span class="nb">ebx</span><span class="o">*</span><span class="mi">4</span><span class="p">],</span> <span class="nb">eax</span>
<span class="nf">je</span> <span class="mh">0x8048b88</span>
<span class="nf">call</span> <span class="nv">sym.explode_bomb</span>
<span class="nf">inc</span> <span class="nb">ebx</span> <span class="c1">; target of jump above (0x8048b88)</span>
<span class="nf">cmp</span> <span class="nb">ebx</span><span class="p">,</span> <span class="mh">0x5</span>
<span class="nf">jle</span> <span class="mh">0x8048b76</span>
</pre>
</div>
<br />
This is a loop from 1 to 5 using <span style="font-family: Courier New, Courier, monospace;">ebx</span> as the counter. The initialization sets <span style="font-family: Courier New, Courier, monospace;">ebx</span> to 1 and <span style="font-family: Courier New, Courier, monospace;">esi</span> to the beginning of the array of converted numbers (the local array). On each iteration of the loop <span style="font-family: Courier New, Courier, monospace;">eax</span> is set to <span style="font-family: Courier New, Courier, monospace;">ebx + 1</span> then multiplied by the nth entry of the local array (n is <span style="font-family: Courier New, Courier, monospace;">ebx - 1</span>). The result of that is compared to the next entry in the array. In C parlance this might look like:<br />
<br />
<div class="highlight">
<pre> <span class="kt">int</span> <span class="n">xs</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{...};</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">5</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="k">if</span> <span class="p">((</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">xs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">!=</span> <span class="n">xs</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">])</span>
<span class="n">FAIL</span><span class="p">();</span>
</pre>
</div>
<br />
If we know that <span style="font-family: Courier New, Courier, monospace;">xs[0]</span> must be 1 we can now use the above to calculate the remainder of the array.<br />
<br />
<div class="highlight">
<pre> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=></span> <span class="mi">2</span>
<span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="mi">3</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=></span> <span class="mi">6</span>
<span class="n">x</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=></span> <span class="mi">24</span>
<span class="p">...</span>
</pre>
</div>
<br />
Ultimately, our input requirements become: 1 2 6 24 120 720. This works as expected and phase 2 is now solved.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrcSMQmas-bM6XuJWYXBTk25SVPaDNXLxgO5UoKVGTCQcWB5Yw5i0VuvBkKupExq3yU_lSnxLbKwby4ta034veGNcwzePt_SuKewi8xJPDWucrn_FmlPxXCS8TYE7M35ofqVYgwzgZolEE/s1600/solution_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjrcSMQmas-bM6XuJWYXBTk25SVPaDNXLxgO5UoKVGTCQcWB5Yw5i0VuvBkKupExq3yU_lSnxLbKwby4ta034veGNcwzePt_SuKewi8xJPDWucrn_FmlPxXCS8TYE7M35ofqVYgwzgZolEE/s1600/solution_2.png" height="82" width="320" /></a></div>
<br />
On to <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-3.html" target="_blank">phase 3</a>.ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-49422913023428263842014-05-06T22:09:00.002-04:002014-05-21T21:52:30.725-04:00Of Binary Bombs (part 1)I was recently turned on to radare2 (http://www.radare.org/y/) and, in an attempt to use-it-to-learn-it, I am going to try and solve the sample binary bomb provided as an example lab for the CMU computer systems course. I'm aiming to make my next few posts an account of trying to solve that puzzle (and learn this new tool).<br />
<br />
Caveat lector: I've basically no experience with binary tools and assembly language so this will be deliberate and likely full of amateur mistakes.<br />
<br />
<br />
I expect to be able to analyze this as the program would be run so I'm staring at the main entry point and assuming that there is nothing tricky happening prior to that. Load up radare2 and enter visual mode (press '<span style="font-family: Courier New, Courier, monospace;">V</span>' at the command prompt) then press '<span style="font-family: Courier New, Courier, monospace;">p</span>' until the view is one that resembles assembly instructions. Once there search for the main entry point (at the command prompt: <span style="font-family: Courier New, Courier, monospace;">s main</span>). That will bring you to a screen that looks similar to:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix4Uf1TTXzRoViK_I2AyqSQ2hDVdFLid1iKNn07bqyAG6QbBDzEeR0z0mw3c_a6xf2G0pA18wCAICn3QsLn4i9ElcBiYyLdAqCvuvNdoA2Vnm4OYhkPNhq_ZI3K_5FakCQFbReJAdstn1V/s1600/main.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEix4Uf1TTXzRoViK_I2AyqSQ2hDVdFLid1iKNn07bqyAG6QbBDzEeR0z0mw3c_a6xf2G0pA18wCAICn3QsLn4i9ElcBiYyLdAqCvuvNdoA2Vnm4OYhkPNhq_ZI3K_5FakCQFbReJAdstn1V/s1600/main.png" height="246" width="320" /></a></div>
<br />
First thing to happen in this program is to check the arguments to main to determine whether to read from <span style="font-family: Courier New, Courier, monospace;">stdin</span> or from a file. This progresses from the function prologue: saving the base pointer (<span style="font-family: Courier New, Courier, monospace;">ebp</span>); copy the stack pointer (<span style="font-family: Courier New, Courier, monospace;">esp</span>); and reserve 20 bytes for local use (<span style="font-family: Courier New, Courier, monospace;">sub esp, 0x14</span>). The program then continues to look at the value of <span style="font-family: Courier New, Courier, monospace;">argc</span> and acts accordingly eventually assigning the input stream (<span style="font-family: Courier New, Courier, monospace;">stdin</span> or otherwise) to <span style="font-family: Courier New, Courier, monospace;">sym.infile</span>.<br />
<br />
Without error, control flow reaches address <span style="font-family: Courier New, Courier, monospace;">0x08048a30</span> (in my version) which is a call to <span style="font-family: Courier New, Courier, monospace;">sym.initialize_bomb</span>. Radare2 allows you to set marks much like you would in vim so if you are familiar with that, set a mark here with <span style="font-family: Courier New, Courier, monospace;">ma</span> and hit enter at this address to follow to <span style="font-family: Courier New, Courier, monospace;">sym.initialize_bomb</span>. Alternatively, at the command prompt, search for the function symbol with <span style="font-family: Courier New, Courier, monospace;">s sym.initialize_bomb</span>. In either case you should see the following:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBPEl4j4rX8jPTQFp5TpNnTsv-eqka2sNIVbJoHj37e8rgp2sam_ucVEKcHQQ149GoGCQaaTDPOVhQ1WHeHmsAb8efh33J64L2KkD5j47gO1F8i6-7LW47YyjnHuud2HGdsIcKwBiBPtP2/s1600/initialize_bomb.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjBPEl4j4rX8jPTQFp5TpNnTsv-eqka2sNIVbJoHj37e8rgp2sam_ucVEKcHQQ149GoGCQaaTDPOVhQ1WHeHmsAb8efh33J64L2KkD5j47gO1F8i6-7LW47YyjnHuud2HGdsIcKwBiBPtP2/s1600/initialize_bomb.png" height="129" width="320" /></a></div>
<br />
Here, the program simply sets up to handle <span style="font-family: Courier New, Courier, monospace;">SIGINT</span> with the function <span style="font-family: Courier New, Courier, monospace;">sig_handler</span> and returns. If you are using marks you can return to the previous address with <span style="font-family: Courier New, Courier, monospace;">'a</span>.<br />
<br />
The program continues with a couple of generic prompts, proceeds to read a line of input (which does some internal bookkeeping), and jumps to phase 1 (<span style="font-family: Courier New, Courier, monospace;">sys.phase_1</span>). Place a local mark (<span style="font-family: Courier New, Courier, monospace;">ma</span>) and jump to the first phase code: we are now ready to start the actual process of solving Dr. Evil's nasty riddle.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii6vfpx9agi7oRtdYQA-F8qAdPq7Pb2gaxklothjIK5Xe7pjPWczwognP83Fo-N6KPCZ7-1_YhfLJQqRjUSRH2i7cMYDG2DhVQMfG851fpsqPNNU5tK2Stu4U1a-0KoR3vvTOlIcPVc-Xd/s1600/phase_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEii6vfpx9agi7oRtdYQA-F8qAdPq7Pb2gaxklothjIK5Xe7pjPWczwognP83Fo-N6KPCZ7-1_YhfLJQqRjUSRH2i7cMYDG2DhVQMfG851fpsqPNNU5tK2Stu4U1a-0KoR3vvTOlIcPVc-Xd/s1600/phase_1.png" height="154" width="320" /></a></div>
<br />
Phase 1 is intended to be easy. The first line of input provided to the program is copied to <span style="font-family: Courier New, Courier, monospace;">eax</span>, and then two arguments are set up: a global string (at address <span style="font-family: Courier New, Courier, monospace;">0x080497c0</span>) and <span style="font-family: Courier New, Courier, monospace;">eax</span>. These arguments are sent to <span style="font-family: Courier New, Courier, monospace;">strings_not_equal</span> which returns 0 if the two arguments are the same length and contain the same strings. The check <span style="font-family: Courier New, Courier, monospace;">test eax, eax</span> (true when <span style="font-family: Courier New, Courier, monospace;">eax</span> is 0) avoids exploding the bomb so we want <span style="font-family: Courier New, Courier, monospace;">eax</span> (the first input string) to be the same as whatever is in <span style="font-family: Courier New, Courier, monospace;">0x080497c0</span>.<br />
<br />
The radare2 command to print memory as a string is <span style="font-family: Courier New, Courier, monospace;">ps</span>. Entering command mode (<span style="font-family: Courier New, Courier, monospace;">:</span>) and typing <span style="font-family: Courier New, Courier, monospace;">ps @ 0x080497c0</span> yields: 'Public speaking is very easy.' (the period is significant). Entering that provides:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgObMBwsUo9JOAW91SEB4qdXAhprHK96slM28qQIXnL7O_sPqqmMeH-UrVizdbkrSEj4YHWeGl9W91XyeHmp13774HB2kJPBOM_SZ_r3JV-LdzuqsJA7A_ZlOz1faC2_Rwm2Q5WVH5mrAbk/s1600/solution_1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgObMBwsUo9JOAW91SEB4qdXAhprHK96slM28qQIXnL7O_sPqqmMeH-UrVizdbkrSEj4YHWeGl9W91XyeHmp13774HB2kJPBOM_SZ_r3JV-LdzuqsJA7A_ZlOz1faC2_Rwm2Q5WVH5mrAbk/s1600/solution_1.png" height="57" width="320" /></a></div>
<br />
<div>
Next, I'll tackle <a href="http://uu-kk.blogspot.com/2014/05/of-binary-bombs-part-2.html" target="_blank">phase 2</a>.</div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-25376983401280495592014-03-29T22:27:00.003-04:002014-03-29T22:28:21.025-04:00Something's not writeThis is a recount of a recent high pressure debugging session I helped with. As with most difficult debugging sessions it ended up being educational and fun. I'm posting here mostly for posterity and to enlighten the future me.<br />
<br />
Two coworkers and I were called in to support a project (not our own) that was going to demo in two days but was experiencing a variety of system-halting problems. This was a system with which I had almost no experience and to complicate matters, since demo was so close, there was no chance we could take down the system to investigate. Our instructions were to engage the production system while the entire user base was attempting to squeeze in last minute testing. Time to go spelunking.<br />
<br />
The tasks were broken up between the three of us and my focus was a master-slave setup supporting system automation through a web interface. The machines were running FreeBSD with code written in a variety of languages including C, Python, Perl, and PHP all tied together and coordinated with databases and XML-RPC. The symptom I had to go on was that every once in a while when trying to create a data capture (a file that could ultimately be several Gbs) the resulting file would end up being 0 bytes on disk representing a complete loss of data for that day. There was also a message in the log file that appeared when the data capture failed: '<span style="font-family: Courier New, Courier, monospace;">write failed: Permission Denied</span>'. The strange thing was that this message would appear approximately half way through the transfer so by that time the file had to exist and would have already been written to.<br />
<br />
I started by looking at the source code for the offending process. The error was being reported as a result of a failed <span style="font-family: Courier New, Courier, monospace;">write</span> call. Strangely, 'Permission Denied' (<span style="font-family: Courier New, Courier, monospace;">EACCES</span>) is not listed as a valid error in the <span style="font-family: Courier New, Courier, monospace;">write</span> man pages. I wanted to avoid tearing apart the build system to recompile a single source file so I started investigating ways other than editing source to try and figure out why this was happening.<br />
<br />
I'm quite unfamiliar with FreeBSD but on Linux I would use something like <span style="font-family: Courier New, Courier, monospace;">strace</span> to try an establish a baseline of what is happening. Luckily, FreeBSD has <span style="font-family: Courier New, Courier, monospace;">truss</span> which provided close to the same functionality. Unfortunately, the process I wanted to examine was <span style="font-family: Courier New, Courier, monospace;">exec</span>'d from deep within some Perl code that kept failing when I tried to prefix the command with <span style="font-family: Courier New, Courier, monospace;">truss -o truss.out</span>. Since it was a long running process I ended up attaching to the running process with <span style="font-family: Courier New, Courier, monospace;">truss -o truss.out -p 12345</span>.<br />
<br />
Now I had a running process that I could monitor and all I needed was to trigger the effect - enter: the heisenbug. Several hours of creating large files without issue forced me to look elsewhere.<br />
<br />
I started collecting what information I could about the system and <span style="font-family: Courier New, Courier, monospace;">df</span> showed that the data capture file was being written to an NFS mounted location. Since nothing else was out of the ordinary I tried to understand why a write to an NFS file would fail. My first instinct was that there was an overload so I tried several parallel writes of substantial size without issues - no dice. Setting up a <span style="font-family: Courier New, Courier, monospace;">tcpdump</span> between the master and slave would have also been tough since all communication throughout the system (including NFS) was handled between these two machines.<br />
<br />
Since the process I was monitoring was writing to the master machine and the local code seemed fine I tried digging into why mounted locations would experience random interruptions. I ended up finding a thread on a FreeBSD mailing list that described this <a href="http://lists.freebsd.org/pipermail/freebsd-current/2012-July/034963.html" target="_blank">exact problem</a>. According to the dialog there, <span style="font-family: Courier New, Courier, monospace;">mountd</span> could cause this problem if it received a <span style="font-family: Courier New, Courier, monospace;">SIGHUP</span> while the transfer was taking place. I ran a test where I tried transferring a large file while sitting on the destination machine executing <span style="font-family: Courier New, Courier, monospace;">kill -s HUP `cat /var/run/mountd.pid`</span>. It took three attempts but I finally triggered the bug - it seems to be a race condition even when doing the very thing that will cause the error.<br />
<br />
The only thing left was to determine why <span style="font-family: Courier New, Courier, monospace;">mountd</span> might be receiving a <span style="font-family: Courier New, Courier, monospace;">SIGHUP</span>. This would happen when calling <span style="font-family: Courier New, Courier, monospace;">mount</span> by hand and, as it turns out, some of the tests in the system mount new locations as part of spinning up a test environment. So in the very rare case of creating a data capture while another test mounts a new location and that mount happens to be at the precice moment to interfere with network writes to NFS you get issues. In the general operation this might not ever occur; however, given the load on the system during prep for demo it became a more noticeable (and repeatable) problem.<br />
<br />
Considering the the mailing list thread was discussing the issue in FreeBSD 8.3 and this system is still seeing it in 9.1, the ultimate solution was to modify the C code to handle the <span style="font-family: Courier New, Courier, monospace;">EACESS</span> case on error from <span style="font-family: Courier New, Courier, monospace;">write</span>. As to why the man pages for <span style="font-family: Courier New, Courier, monospace;">write</span> do not list <span style="font-family: Courier New, Courier, monospace;">EACESS</span> as an error condition... they probably aren't allowed.ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-5991293092753393182014-02-27T00:29:00.001-05:002014-02-27T00:32:06.937-05:00Vertical HistogramIn the process of munging data for my current project I came across the need to compare (visually) the difference between two modes within the same dataset. I was using a simple scatterplot and setting the alpha in the hopes that the over-plotting would indicate which was the major mode. Unfortunately, the size of the data overwhelmed this approach.<br />
<br />
I only wanted to use a single image and it was important that I keep the scatterplot to show other features of the data. I started looking for a way to combine a histogram (rotated 90 degrees) with the scatterplot to help describe the density within the plot. A quick search for how to do this in R turned up empty so I decided to implement my own version of such a plot.<br />
<br />
Certainly, there are other ways to describe the features that I am trying to present here but in this particular case the following code worked out nicely. Hopefully it proves useful to others as well.<br />
<br />
<br />
<div class="highlight">
<pre>plot.vertical.hist <span class="o"><-</span> <span class="kr">function</span><span class="p">(</span>data<span class="p">,</span>breaks<span class="o">=</span><span class="m">500</span><span class="p">)</span> <span class="p">{</span>
agg <span class="o"><-</span> aggregate<span class="p">(</span>data<span class="p">$</span>Y<span class="p">,</span> by<span class="o">=</span>list<span class="p">(</span>xs<span class="o">=</span>data<span class="p">$</span>X<span class="p">),</span> FUN<span class="o">=</span>mean<span class="p">)</span>
hs <span class="o"><-</span> hist<span class="p">(</span>agg<span class="p">$</span>x <span class="o">/</span> <span class="m">10000</span><span class="p">,</span> breaks<span class="o">=</span>breaks<span class="p">,</span> plot<span class="o">=</span><span class="kc">FALSE</span><span class="p">)</span>
old.par <span class="o"><-</span> par<span class="p">(</span>no.readonly<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
mar.default <span class="o"><-</span> par<span class="p">(</span><span class="s">'mar'</span><span class="p">)</span>
mar.left <span class="o"><-</span> mar.default
mar.right <span class="o"><-</span> mar.default
mar.left<span class="p">[</span><span class="m">4</span><span class="p">]</span> <span class="o"><-</span> <span class="m">0</span>
mar.right<span class="p">[</span><span class="m">2</span><span class="p">]</span> <span class="o"><-</span> <span class="m">0</span>
<span class="c1"># Main plot </span>
par <span class="p">(</span>fig<span class="o">=</span>c<span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0.8</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1.0</span><span class="p">),</span> mar<span class="o">=</span>mar.left<span class="p">)</span>
plot <span class="p">(</span>agg<span class="p">$</span>xs<span class="p">,</span> agg<span class="p">$</span>x <span class="o">/</span> <span class="m">10000</span><span class="p">,</span>
xlab<span class="o">=</span><span class="s">"X"</span><span class="p">,</span> ylab<span class="o">=</span><span class="s">"Y"</span><span class="p">,</span>
main<span class="o">=</span><span class="s">"Vertical Histogram Side Plot"</span><span class="p">,</span>
pch<span class="o">=</span><span class="m">19</span><span class="p">,</span> col<span class="o">=</span>rgb<span class="p">(</span><span class="m">0.5</span><span class="p">,</span><span class="m">0.5</span><span class="p">,</span><span class="m">0.5</span><span class="p">,</span>alpha<span class="o">=</span><span class="m">0.5</span><span class="p">))</span>
grid <span class="p">()</span>
<span class="c1"># Vertical histogram of the same data</span>
par <span class="p">(</span>fig<span class="o">=</span>c<span class="p">(</span><span class="m">0.8</span><span class="p">,</span><span class="m">1.0</span><span class="p">,</span><span class="m">0.0</span><span class="p">,</span><span class="m">1.0</span><span class="p">),</span> mar<span class="o">=</span>mar.right<span class="p">,</span> new<span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span>
plot <span class="p">(</span><span class="kc">NA</span><span class="p">,</span> type<span class="o">=</span><span class="s">'n'</span><span class="p">,</span> axes<span class="o">=</span><span class="kc">FALSE</span><span class="p">,</span> yaxt<span class="o">=</span><span class="s">'n'</span><span class="p">,</span>
xlab<span class="o">=</span><span class="s">'Frequency'</span><span class="p">,</span> ylab<span class="o">=</span><span class="kc">NA</span><span class="p">,</span> main<span class="o">=</span><span class="kc">NA</span><span class="p">,</span>
xlim<span class="o">=</span>c<span class="p">(</span><span class="m">0</span><span class="p">,</span>max<span class="p">(</span>hs<span class="p">$</span>counts<span class="p">)),</span>
ylim<span class="o">=</span>c<span class="p">(</span><span class="m">1</span><span class="p">,</span>length<span class="p">(</span>hs<span class="p">$</span>counts<span class="p">)))</span>
axis <span class="p">(</span><span class="m">1</span><span class="p">)</span>
arrows<span class="p">(</span>rep<span class="p">(</span><span class="m">0</span><span class="p">,</span>length<span class="p">(</span>hs<span class="p">$</span>counts<span class="p">)),</span><span class="m">1</span>:length<span class="p">(</span>hs<span class="p">$</span>counts<span class="p">),</span>
hs<span class="p">$</span>counts<span class="p">,</span><span class="m">1</span>:length<span class="p">(</span>hs<span class="p">$</span>counts<span class="p">),</span>
length<span class="o">=</span><span class="m">0</span><span class="p">,</span>angle<span class="o">=</span><span class="m">0</span><span class="p">)</span>
par<span class="p">(</span>old.par<span class="p">)</span>
invisible <span class="p">()</span>
<span class="p">}</span>
</pre>
</div>
<br />
Results look similar to the following:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjza-t4xM6Fm6bUYA6rhyphenhyphenxQ_Z7bz1TK08puj8Tq9Y7TYffzERcoZUFQIzwFlouyRkypDjzvWPXu6xZXJImQdv2GSNM3dtgCfqAhy-JSwu30cJoOmu-vkbgMJO4yredA3jrBGpEjV0JrK9H7/s1600/vertHist.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjza-t4xM6Fm6bUYA6rhyphenhyphenxQ_Z7bz1TK08puj8Tq9Y7TYffzERcoZUFQIzwFlouyRkypDjzvWPXu6xZXJImQdv2GSNM3dtgCfqAhy-JSwu30cJoOmu-vkbgMJO4yredA3jrBGpEjV0JrK9H7/s1600/vertHist.png" height="236" width="320" /></a></div>
<br />
<br />
Initially, I experimented with <span style="font-family: Courier New, Courier, monospace;">rug</span> or <span style="font-family: Courier New, Courier, monospace;">barplot(..., horiz=TRUE)</span>. Unfortunately, <span style="font-family: Courier New, Courier, monospace;">rug</span> isn't available on the left or right side and would suffer from the same problem that the alpha settings did and I was unable to get the alignment worked out when using <span style="font-family: Courier New, Courier, monospace;">barplot</span>.<br />
<div>
<br /></div>
<br />ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-59485324183944260852014-01-24T23:14:00.005-05:002014-01-24T23:25:07.305-05:00Probably not the BestProbabilities are tricky - for me, at least. The fact that I can not use them like other numbers although they are sometimes represented as floating point numbers often trips me up.<br />
<br />
Case in point: I'm designing an approach to a packet forwarding problem that includes calculating the probability of a packet making it through a portion of a network. An example that helps describe the solution is in the following diagram:<br />
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7g-csdy6OSNirsrV1K-ejGk2T9lpmhpUn2A6yrV85ktDMQXaXvhVEiBV_s6Hh-2nV50h7qkysYyg3YLKQ55WmNwmseo5JzMEqmNDtGo52yUQZfS6ULcQWrR98JqsEUHExR15-TgFg0X0T/s1600/g1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi7g-csdy6OSNirsrV1K-ejGk2T9lpmhpUn2A6yrV85ktDMQXaXvhVEiBV_s6Hh-2nV50h7qkysYyg3YLKQ55WmNwmseo5JzMEqmNDtGo52yUQZfS6ULcQWrR98JqsEUHExR15-TgFg0X0T/s1600/g1.png" height="142" width="320" /></a></div>
<div>
<br /></div>
<div>
<br /></div>
<div>
<div>
Where \(I\) is the ingress node, \(E\) is the egress node and the other nodes are represented by the probability that a given packet will be forwarded (\(P_{f}\)) to \(E\) when received from \(I\).</div>
<div>
<br /></div>
<div>
For any given path \(I\) to \(E\) through a single node, the probability the packet will be delivered (\(P(I \Rightarrow E)\)) is equal to the probability that the intermediary node will forward the packet (ignoring all other details of the medium).</div>
<div>
<br /></div>
<div>
What I [incorrectly] claimed in my presentation of the solution was: if you use all nodes simultaneously and discard duplicates at \(E\), the probability of a packet successfully passing from \(I\) to \(E\) then becomes \(max(N)\). Where \(N\) is the set of nodes between \(I\) and \(E\). Considering the example above that equation results in \(P(I \Rightarrow E) = 0.82\). This is better than choosing the node with \(P_{f} = 0.21\) but it is <i>not</i> the true probability of a packet getting through. </div>
<div>
<br /></div>
<div>
This design made several rounds of internal review where this error was not noticed; perhaps there is some solace to be had in knowing I am not the only one who is tripped up by this.</div>
<div>
<br /></div>
<div>
When it finally came time to implement this I was discussing the solution with another engineer and he quickly pointed out that my calculation was incorrect. In fact, my estimation was much lower than the actual likelihood of getting a packet through. The true probability of <i>not</i> getting a packet through is the probability of all intermediate nodes dropping the packet.<br />
<br />
\[\prod_{i=0}^{n} (1 - P_{f_{n}})\]</div>
<div>
<br /></div>
<div>
The ultimate effect of my proposed approach is the probability of <i>not</i> having that happen or:</div>
<div>
<br /></div>
<div>
\[1 - (\prod_{i=0}^{n} (1 - P_{f_{n}}))\]</div>
<div>
<br /></div>
<div>
For this specific example that is:</div>
<div>
<br /></div>
<div>
1 - ((1 - 0.76) * (1 - 0.21) * (1 - 0.82) * (1 - 0.33))</div>
<div>
<br /></div>
<div>
or 0.977. A much higher value than my originally proposed 0.82.</div>
</div>
<div>
<br /></div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-12797023604751994902013-08-16T01:06:00.004-04:002014-02-25T23:22:28.025-05:00Visualizing Internet ActivityI've recently been playing with the idea of visualizing the network demand for various activities related to everyday Internet use. Most times the browser presents a view that abstracts this behind-the-scenes activity to provide a seamless experience to the user. Even so, it can be enlightening to peel back the layers just a bit to peek at what is really happening.<br />
<br />
In the images below I've tried to present this underlying activity in a way that is intuitive without drowning the reader with information. This is still a work in progress but I feel as though this is a decent start toward that goal.<br />
<br />
The graphs below represent network traffic related to three types of activity one might typically engage in while online: browsing facebook, watching a youtube video, and downloading a large file. Each 'bubble' represents a packet; radius is the relative packet size; individual connections are annotated with unique colors; and I add jitter in an attempt to alleviate over-plotting.<br />
<br />
<div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizEOO3FNAFeC-1ltLO3ZO1KB8P-rTTWhWaWCqnDGnDneBcFWxiQLJ8NV13ljavoHcyJUIrxTAEmpITgZV2d62Tv1gGJaaDT0cBXo3Twv_Rxvbd4-idFjFSvdfMi3v3n9x9EYkcVN9fCNxg/s1600/facebook.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEizEOO3FNAFeC-1ltLO3ZO1KB8P-rTTWhWaWCqnDGnDneBcFWxiQLJ8NV13ljavoHcyJUIrxTAEmpITgZV2d62Tv1gGJaaDT0cBXo3Twv_Rxvbd4-idFjFSvdfMi3v3n9x9EYkcVN9fCNxg/s400/facebook.png" height="173" width="400" /></a></div>
<div>
<br /></div>
<div>
The above is the graph for facebook. There is an initial burst of activity to load all the components when you first log in; as you scroll down the page more content is loaded to extend the page on-demand (these are the narrow vertical impulses throughout the graph); then, around time 250 there is a large amount of activity related to opening a photo album and browsing through the pictures. As can be seen by the transitioning colors there are a variety of individual connections used over the course of the facebook session.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhltWEgRF9ZGZpXDCvn7wJ3Y5BwXVnCweMxTmz_ZIpfua6c_YnnMqTZMWzuKNGBU1y2-9bQTEmlm7RRQVuSAkI-9Neg0VPMGQTqCzWrjlyym9SKG2znHAODCqEawXsEySWlt1N9pQnZ5bDc/s1600/youtube.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhltWEgRF9ZGZpXDCvn7wJ3Y5BwXVnCweMxTmz_ZIpfua6c_YnnMqTZMWzuKNGBU1y2-9bQTEmlm7RRQVuSAkI-9Neg0VPMGQTqCzWrjlyym9SKG2znHAODCqEawXsEySWlt1N9pQnZ5bDc/s400/youtube.png" height="173" width="400" /></a></div>
<div>
<br /></div>
<div>
This next graph is a view of what happens when watching a youtube video. Again, there is a quick burst to load the page and several connections are used to do this. Once the video is playing, however, there is only very periodic bursts of activity. In fact, if you observe the video progress bar as you watch the video you will notice that youtube front-loads a portion of the video immediately and then, as you start to need more content, requests the next section of the video.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjulUVGj0zbIOf-LbpmADji4A-_TNwU1NgEfPC28RkGt9JzJYv5H5GX7-gJYInhX2LsNj4tdjEhyAVsfbWmyYW272U82LCTzgrMGmjwwo3jTUS1BURqwsoKNPRjdB6s_670PNFmAhH07nCk/s1600/kernel.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjulUVGj0zbIOf-LbpmADji4A-_TNwU1NgEfPC28RkGt9JzJYv5H5GX7-gJYInhX2LsNj4tdjEhyAVsfbWmyYW272U82LCTzgrMGmjwwo3jTUS1BURqwsoKNPRjdB6s_670PNFmAhH07nCk/s400/kernel.png" height="173" width="400" /></a></div>
<div>
<br /></div>
<div>
Finally, this is a view of what a large file download requires: very few connections and a consistent, heavy flow of packets across the network.</div>
<div>
<br /></div>
<div>
From an end user point-of-view, the experience related to each of these activities is the same: open a browser, click a few buttons and receive content. The resources and demand to deliver this content, however, is entirely distinct from that experience.</div>
ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-90065285319883927622013-07-19T23:15:00.000-04:002013-07-19T23:15:45.786-04:00Column-Major ConfusionAs a programmer coming from a language like C I am used to understanding multidimensional arrays in the following way:<br />
<div class="highlight">
<pre><span class="kt">int</span> <span class="n">matrix</span><span class="p">[</span><span class="mi">3</span><span class="p">][</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span> <span class="p">},</span>
<span class="p">{</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span> <span class="p">},</span>
<span class="p">{</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">9</span> <span class="p">}</span>
<span class="p">};</span>
</pre>
</div>
<br />
Understandably, this is a little bit of a setback when trying to grok how a language such as R handles matrices. The example above is 'row-major' but R uses 'column-major' by default. Note that I'm not describing memory layout here - which coincidentally is the same as my description - I referring to how the matrices are presented to the programmer. To complete the example, here is how I would create the same matrix as above in R:<br />
<div class="highlight">
<pre><span class="o">></span> matrix<span class="p">(</span><span class="m">1</span>:<span class="m">9</span><span class="p">,</span>ncol<span class="o">=</span><span class="m">3</span><span class="p">)</span>
<span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span> <span class="p">[,</span><span class="m">3</span><span class="p">]</span>
<span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">1</span> <span class="m">4</span> <span class="m">7</span>
<span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">2</span> <span class="m">5</span> <span class="m">8</span>
<span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">3</span> <span class="m">6</span> <span class="m">9</span></pre>
</div>
<br />
There is no problem with this per se. In fact, I'd imagine that R programmers that come to something like C may feel similar unease in the paradigm shift. I'm finding that this fact is glazed over in some texts offering an introduction to R. This happens in non-obvious ways. In particular, I found this example today:<br />
<div class="highlight">
<pre><span class="o">></span> matrix<span class="p">(</span>c<span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">),</span>ncol<span class="o">=</span><span class="m">3</span><span class="p">)</span>
<span class="p">[,</span><span class="m">1</span><span class="p">]</span> <span class="p">[,</span><span class="m">2</span><span class="p">]</span> <span class="p">[,</span><span class="m">3</span><span class="p">]</span>
<span class="p">[</span><span class="m">1</span><span class="p">,]</span> <span class="m">1</span> <span class="m">0</span> <span class="m">0</span>
<span class="p">[</span><span class="m">2</span><span class="p">,]</span> <span class="m">0</span> <span class="m">1</span> <span class="m">0</span>
<span class="p">[</span><span class="m">3</span><span class="p">,]</span> <span class="m">0</span> <span class="m">0</span> <span class="m">1</span>
</pre>
</div>
<br />
The identity matrix is a particularly bad choice in this regard as it gives no indication of the true layout being used. It is probably good for any tutorial using matrices to cover an obvious simple case first to set the stage before moving directly to something like the identity matrix (or any other symmetric matrix for that matter).ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-89738020320905758232013-07-13T13:13:00.000-04:002014-02-25T23:24:04.976-05:0010 points: 10 plotsAs an exercise in expanding my ability to display data I challenged myself to present 10 data points in 10 ways that were as distinct as possible. The idea was simple: use 10 random data points; minimize the axis and other ancillary information so as to focus on the data as much as possible; and try to minimize the overlap between each of the approaches.<br />
<br />
Initially, I expected this would be a trivial task - something that would take a single sitting and a little bit of thought. A few attempts later and I kept circling back on a few common ideas while considering just how many approaches I'd <i>not</i> considered. What exists below is a collection of the results of that exercise with explanation if necessary.<br />
<br />
1 - Standard Cartesian (scatterplot)<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvJcZtaCgmcJGh2i0CA1dV7L1T58SWoxIKSLaBKh9UnJDP0ZG7sQ6i5M0sw9I_wSMmaRddZRwJ6B0ow1-8dAV2UliRkvR1T0XzbL_vOF4y9NNzaTZ5IXhLIyUwmUz5Bu8iXJqP6BfYqKZc/s1600/plot1.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgvJcZtaCgmcJGh2i0CA1dV7L1T58SWoxIKSLaBKh9UnJDP0ZG7sQ6i5M0sw9I_wSMmaRddZRwJ6B0ow1-8dAV2UliRkvR1T0XzbL_vOF4y9NNzaTZ5IXhLIyUwmUz5Bu8iXJqP6BfYqKZc/s320/plot1.jpeg" height="188" width="320" /></a></div>
<br />
<br />
2 - Derivative Cartesian: uses labels instead of points to eliminate the need for tick marks on the x-axis.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQwsxYLvwORlXCkUO9vA9H8QPRXFOlux6WbIn2NJKPbAZ6XLIWyp5GOAC9mNW7Gb4zkb30znISTTkiRxE0LyajTImEAmRZLGTNHzE1t6vO6x9Jf9ovts4JTc7j9usURdHeMAxVB9MOMY_T/s1600/plot2.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQwsxYLvwORlXCkUO9vA9H8QPRXFOlux6WbIn2NJKPbAZ6XLIWyp5GOAC9mNW7Gb4zkb30znISTTkiRxE0LyajTImEAmRZLGTNHzE1t6vO6x9Jf9ovts4JTc7j9usURdHeMAxVB9MOMY_T/s320/plot2.jpeg" height="188" width="320" /></a></div>
<br />
3 - Impulses. Mixing the number and characters on the x-axis tick marks is questionable and could just as well have been labels at the top of each impulse<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmHEFuqV2mtQfdwA39yMRW4Zjx2wqdupwspdWNGFaWXXJUgHpvAcJsXyn-YWVHwFqr3bmuyNeldxkjUZLA1tH2Hhr2KAu8C4jzr1z0vi77pgamKdP_YELv54GnnJTcpJRgtd8e4U01oCRn/s1600/plot3.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmHEFuqV2mtQfdwA39yMRW4Zjx2wqdupwspdWNGFaWXXJUgHpvAcJsXyn-YWVHwFqr3bmuyNeldxkjUZLA1tH2Hhr2KAu8C4jzr1z0vi77pgamKdP_YELv54GnnJTcpJRgtd8e4U01oCRn/s320/plot3.jpeg" height="188" width="320" /></a></div>
<br />
4 - Sorted derivative Cartesian<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilGslIRdNfJdroAuc2W6OR4MVpce7-bPc-rP75_TLLa8D5F9EPqU68k60-H2Q1x9YtPBueHLRgNzddssalwhSH0DRRnhu1s2gDREa0aFm3ZJBA_l-HyBaDiLFJI3-2lU8fmuFC0TT__1jq/s1600/plot4.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilGslIRdNfJdroAuc2W6OR4MVpce7-bPc-rP75_TLLa8D5F9EPqU68k60-H2Q1x9YtPBueHLRgNzddssalwhSH0DRRnhu1s2gDREa0aFm3ZJBA_l-HyBaDiLFJI3-2lU8fmuFC0TT__1jq/s320/plot4.jpeg" height="188" width="320" /></a></div>
<br />
5 - Boxplot<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOackRlbbjVQOrkmCB4PbW9B2V2mX3PeVpHQ8l78xAmySYsQPRkReeSmcUCr4Y0f6DHABUGoUeExC4h7KQDJpC6XvD09yLPxO5KqYJ_7PPw8oxPJeoVa3-y-240-QHmyUidly2Ej8WKlie/s1600/plot5.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgOackRlbbjVQOrkmCB4PbW9B2V2mX3PeVpHQ8l78xAmySYsQPRkReeSmcUCr4Y0f6DHABUGoUeExC4h7KQDJpC6XvD09yLPxO5KqYJ_7PPw8oxPJeoVa3-y-240-QHmyUidly2Ej8WKlie/s320/plot5.jpeg" height="188" width="320" /></a></div>
<br />
6 - Barplot<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyzsqGmOzUKgHUJ549ohubod_S1-4ChLWnXBCuT-6Y8Ep6M-lIA1frgDIpXsa3DvLM2eoZJ4ABl7g9RrOxhW-oi3ESpL4OIrTar1qnkMN1SGTSw54LLtnMK7Wb8uoRoQSPbJATQYKkOqSz/s1600/plot6.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyzsqGmOzUKgHUJ549ohubod_S1-4ChLWnXBCuT-6Y8Ep6M-lIA1frgDIpXsa3DvLM2eoZJ4ABl7g9RrOxhW-oi3ESpL4OIrTar1qnkMN1SGTSw54LLtnMK7Wb8uoRoQSPbJATQYKkOqSz/s320/plot6.jpeg" height="188" width="320" /></a></div>
<br />
7 - Radial. Points are interpreted as radians and placed starting from 0 radians<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKVMF2RiKKD3Z7-zj2A_JMvY3onCKAlJnELw0_UB4AstrXckpsNcziGfJTZqNl1eVEImQozPtiCkD3c8g6i0lr_XRiwL5x634zN8EKgc_doSS5K6Ge7_zdsnV8dY20ssDYCwKsPWYKw6xx/s1600/plot7.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKVMF2RiKKD3Z7-zj2A_JMvY3onCKAlJnELw0_UB4AstrXckpsNcziGfJTZqNl1eVEImQozPtiCkD3c8g6i0lr_XRiwL5x634zN8EKgc_doSS5K6Ge7_zdsnV8dY20ssDYCwKsPWYKw6xx/s320/plot7.jpeg" height="188" width="320" /></a></div>
<br />
8 - Heatmap<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEif1kS4Ao3ZjLopWByqLDgj6KUBGxbTmUopdlPLmyezrbWgwVLXu1ubE4q6maGG_mvt1TWsbP-mNDiylfGYNrQ26P5pfakSMGQZszSguiZqjvefZfL5FaFqGRhs0rmK3PAyAL6vSEXL0ihk/s1600/plot8.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEif1kS4Ao3ZjLopWByqLDgj6KUBGxbTmUopdlPLmyezrbWgwVLXu1ubE4q6maGG_mvt1TWsbP-mNDiylfGYNrQ26P5pfakSMGQZszSguiZqjvefZfL5FaFqGRhs0rmK3PAyAL6vSEXL0ihk/s320/plot8.jpeg" height="188" width="320" /></a></div>
<br />
9 - Cumulative Sum<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijZfTeqqXhG93DACp8p1cCIXG9wPM3ZpBcxfXh3GFNmx6ttECb5AfBA_mmGNCVv93YXQ8caUhdGfGHEzlf7Ih8wi3ElEdrO4p1uwfJ33QOaVjKUs74vI5kEXlDMa1zf13JrksWyzGAu3Q_/s1600/plot9.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEijZfTeqqXhG93DACp8p1cCIXG9wPM3ZpBcxfXh3GFNmx6ttECb5AfBA_mmGNCVv93YXQ8caUhdGfGHEzlf7Ih8wi3ElEdrO4p1uwfJ33QOaVjKUs74vI5kEXlDMa1zf13JrksWyzGAu3Q_/s320/plot9.jpeg" height="188" width="320" /></a></div>
<br />
10 - Financial/Intensity: Positive values are blue, negative are red. Absolute values define the radius of the circle used.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyIkBjCbKL8uTgXHmVvqNFaWx_lXgwXRJb-MHZ5UbUDLFDaKxif73brJCMhoDhNLAAB_6Bkgd_At6KUyB9LMaFfuGUlWnyaW39yEvtaYW9Vt2GMvDhUOuiCChlMEm9kx3Dd12J3kosPXOp/s1600/plot10.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiyIkBjCbKL8uTgXHmVvqNFaWx_lXgwXRJb-MHZ5UbUDLFDaKxif73brJCMhoDhNLAAB_6Bkgd_At6KUyB9LMaFfuGUlWnyaW39yEvtaYW9Vt2GMvDhUOuiCChlMEm9kx3Dd12J3kosPXOp/s320/plot10.jpeg" height="188" width="320" /></a></div>
<br />
<br />
I considered others such as LOESS fit but they either needed the points to accompany them (to show what was being fitted) which made them too close to the Cartesian plot, or they were too complex for just 10 points.<br />
<br />
It was interesting to see how difficult it turned out to be to stretch 10 points into 10 distinct presentation approaches.ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0tag:blogger.com,1999:blog-2191364361395963546.post-87799918035700481332013-06-27T22:23:00.002-04:002013-06-27T22:23:28.603-04:00Walk this wayI recently found a handy mechanism for walking a directory tree in Linux. In<br />
general, the way I used to do this was to use facilities found in <span style="font-family: Courier New, Courier, monospace;">dirent.h</span> and<br />
write my own recursive directory walker. Something similar to:<br />
<br />
<div class="highlight">
<pre><span class="cp">#include <stdio.h></span>
<span class="cp">#include <string.h></span>
<span class="cp">#include <dirent.h></span>
<span class="kt">void</span> <span class="nf">reclist</span> <span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">dirname</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">DIR</span><span class="o">*</span> <span class="n">dir</span> <span class="o">=</span> <span class="n">opendir</span> <span class="p">(</span><span class="n">dirname</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">dirent</span><span class="o">*</span> <span class="n">entry</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">name</span><span class="p">[</span><span class="mi">1024</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span> <span class="n">dir</span><span class="p">)</span> <span class="p">{</span> <span class="k">return</span><span class="p">;</span> <span class="p">}</span>
<span class="n">entry</span> <span class="o">=</span> <span class="n">readdir</span> <span class="p">(</span><span class="n">dir</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">entry</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strncmp</span> <span class="p">(</span><span class="n">entry</span><span class="o">-></span><span class="n">d_name</span><span class="p">,</span> <span class="s">"."</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">entry</span><span class="o">-></span><span class="n">d_type</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">DT_REG</span>:
<span class="n">printf</span> <span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">entry</span><span class="o">-></span><span class="n">d_name</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="n">DT_DIR</span>:
<span class="n">snprintf</span> <span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="s">"%s/%s"</span><span class="p">,</span> <span class="n">dirname</span><span class="p">,</span> <span class="n">entry</span><span class="o">-></span><span class="n">d_name</span><span class="p">);</span>
<span class="n">reclist</span> <span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">entry</span> <span class="o">=</span> <span class="n">readdir</span> <span class="p">(</span><span class="n">dir</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">closedir</span> <span class="p">(</span><span class="n">dir</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">char</span> <span class="o">*</span> <span class="n">dir</span> <span class="o">=</span> <span class="s">"."</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argc</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="n">dir</span> <span class="o">=</span> <span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span> <span class="p">}</span>
<span class="n">reclist</span> <span class="p">(</span><span class="n">dir</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre>
</div>
<br />
While that does work, it is rather verbose (especially once you get used to<br />
environments like Ruby and Python). It turns out that <span style="font-family: Courier New, Courier, monospace;">ftw.h</span> provides a more<br />
concise way to do the above while managing all the little details like<br />
avoiding '.' and '..' and managing the current path string. Here is what that<br />
looks like to do the same as the above:<br />
<br />
<div class="highlight">
<pre><span class="cp">#include <stdio.h></span>
<span class="cp">#include <ftw.h></span>
<span class="kt">int</span> <span class="nf">handle_entry</span> <span class="p">(</span><span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">entry</span><span class="p">,</span> <span class="k">const</span> <span class="k">struct</span> <span class="n">stat</span> <span class="o">*</span><span class="n">sb</span><span class="p">,</span> <span class="kt">int</span> <span class="n">type</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">type</span> <span class="o">==</span> <span class="n">FTW_F</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">entry</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">ftw</span><span class="p">(</span><span class="s">"."</span><span class="p">,</span> <span class="n">handle_entry</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre>
</div>
<br />
I also like the fact that a callback is used to operate on each of the files<br />
found. It makes managing changes much easier as the tree walking is separated<br />
from the code that handles the logic associated with inspecting the files.ezpzhttp://www.blogger.com/profile/01475146505316478870noreply@blogger.com0