## Friday, January 24, 2014

### Probably not the Best

Probabilities are tricky - for me, at least. The fact that I can not use them like other numbers although they are sometimes represented as floating point numbers often trips me up.

Case in point: I'm designing an approach to a packet forwarding problem that includes calculating the probability of a packet making it through a portion of a network. An example that helps describe the solution is in the following diagram:

Where $$I$$ is the ingress node, $$E$$ is the egress node and the other nodes are represented by the probability that a given packet will be forwarded ($$P_{f}$$) to $$E$$ when received from $$I$$.

For any given path $$I$$ to $$E$$ through a single node, the probability the packet will be delivered ($$P(I \Rightarrow E)$$) is equal to the probability that the intermediary node will forward the packet (ignoring all other details of the medium).

What I [incorrectly] claimed in my presentation of the solution was: if you use all nodes simultaneously and discard duplicates at $$E$$, the probability of a packet successfully passing from $$I$$ to $$E$$ then becomes $$max(N)$$. Where $$N$$ is the set of nodes between $$I$$ and $$E$$. Considering the example above that equation results in $$P(I \Rightarrow E) = 0.82$$. This is better than choosing the node with $$P_{f} = 0.21$$ but it is not the true probability of a packet getting through.

This design made several rounds of internal review where this error was not noticed; perhaps there is some solace to be had in knowing I am not the only one who is tripped up by this.

When it finally came time to implement this I was discussing the solution with another engineer and he quickly pointed out that my calculation was incorrect. In fact, my estimation was much lower than the actual likelihood of getting a packet through. The true probability of not getting a packet through is the probability of all intermediate nodes dropping the packet.

$\prod_{i=0}^{n} (1 - P_{f_{n}})$

The ultimate effect of my proposed approach is the probability of not having that happen or:

$1 - (\prod_{i=0}^{n} (1 - P_{f_{n}}))$

For this specific example that is:

1 - ((1 - 0.76) * (1 - 0.21) * (1 - 0.82) * (1 - 0.33))

or 0.977. A much higher value than my originally proposed 0.82.