Friday, January 24, 2014

Probably not the Best

Probabilities are tricky - for me, at least. The fact that I can not use them like other numbers although they are sometimes represented as floating point numbers often trips me up.

Case in point: I'm designing an approach to a packet forwarding problem that includes calculating the probability of a packet making it through a portion of a network. An example that helps describe the solution is in the following diagram:

Where \(I\) is the ingress node, \(E\) is the egress node and the other nodes are represented by the probability that a given packet will be forwarded (\(P_{f}\)) to \(E\) when received from \(I\).

For any given path \(I\) to \(E\) through a single node, the probability the packet will be delivered (\(P(I \Rightarrow E)\)) is equal to the probability that the intermediary node will forward the packet (ignoring all other details of the medium).

What I [incorrectly] claimed in my presentation of the solution was: if you use all nodes simultaneously and discard duplicates at \(E\), the probability of a packet successfully passing from \(I\) to \(E\) then becomes \(max(N)\). Where \(N\) is the set of nodes between \(I\) and \(E\). Considering the example above that equation results in \(P(I \Rightarrow E) = 0.82\). This is better than choosing the node with \(P_{f} = 0.21\) but it is not the true probability of a packet getting through. 

This design made several rounds of internal review where this error was not noticed; perhaps there is some solace to be had in knowing I am not the only one who is tripped up by this.

When it finally came time to implement this I was discussing the solution with another engineer and he quickly pointed out that my calculation was incorrect. In fact, my estimation was much lower than the actual likelihood of getting a packet through. The true probability of not getting a packet through is the probability of all intermediate nodes dropping the packet.

\[\prod_{i=0}^{n} (1 - P_{f_{n}})\]

The ultimate effect of my proposed approach is the probability of not having that happen or:

\[1 - (\prod_{i=0}^{n} (1 - P_{f_{n}}))\]

For this specific example that is:

1 - ((1 - 0.76) * (1 - 0.21) * (1 - 0.82) * (1 - 0.33))

or 0.977. A much higher value than my originally proposed 0.82.