Monday, March 16, 2015

WiFiLine's synthetic probability algorithm revealed

As I promised, I'm posting a short explanation of the built-in "synthetic probability" location algorithm. To start with it let us consider how WiFi data looks like. When you scan a building you get a list of geographical points with WiFi measurements, each of them contains signals from several hotspots, and every hotspot signal is normally detected many times demonstrating varying levels - so called RSSI (Received Signal Strength Indication). All this data is stored in the map, and can be viewed - just switch to the "Scan and Edit" mode, open the list of points and expand any of them - you'll see the names of hotspots detected in this point. Tap a hotspot to open a dialog presenting this spot's RSSI distribution.

Usually it's bumpy, but the more scans you perform in a point the more smooth distribution you get. According to the Central Limit Theorem the distribution tends to be Normal (Gaussian), so one can estimate its mean and standard deviation. This is where the word "synthetic" in the algorithm name comes from: it implies that we can build a synthetic (near to real, yet artificial) probability distribution (mass) function from acquired WiFi data.

Of course, this is just a probability of getting specific signal level of a hotspot S in a point X. We need to combine all such probabilities somehow in order to calculate current user location. And here is the Bayes theorem comes in handy.

P(A | B) * P(B) = P(B | A) * P(A), where

  • P(A) and P(B) are the probabilities (priors) of A and B independent of each other;
  • P(A | B) is the conditional probability of A given that B is true;
  • P(B | A) is the conditional probability of B given that A is true;
Now let us add some suffixes to bring it into our context:

P(Asi | Bxj) * P(Bxj) = P(Bxj | Asi) * P(Asi), where

suffixes 'si' and 'xj' stand for i-th hotspot and j-th point respectively, so:

  • P(Asi) is the prior probability of i-th hotspot;
  • P(Asi | Bxj) is the conditional probability of observing i-th hotspot in j-th point;
  • P(Bxj) is the prior probability of j-th point;
  • P(Bxj | Asi) is the conditional probability of being in j-th point provided that i-th hotspot signal is detected;
Apparently, it's the last probability which we are after:

P(Bxj | Asi) = P(Asi | Bxj) * P(Bxj) / P(Asi)

In the right side of the equation, P(Asi) can be estimated as a percentage of points where the hotspot Si was detected. P(Bxj) is initially 1/N for every point, where N is the total number of points, but later it can be refined by applying our knowledge about current user location and reducing N to a reasonable neighbourhood. At last P(Asi | Bxj) is the probability obtained from the synthesized probability distribution of signals described above.

Having P(Bxj | Asi) for different i-s in every point j, we should combine them by hotspots (by i) in order to get the total probability of point Xj. According to the formula of mutually non-exclusive events:

P(Xj) = P(Σi(Bxj | Asi)) = 1 - Πi(1 - P(Bxj | Asi))

The point with largest value (or an average of K points with largerst values) is most likely the current location of the user.

This is a simplified description of the "synthetic probability" algorithm. It does not cover many nuances such as signal level calibration and in-motion scanning but gives you an idea of how WiFiLine works behind the scenes.

No comments:

Post a Comment