Usually it's bumpy, but the more scans you perform in a point the more smooth distribution you get. According to the Central Limit Theorem the distribution tends to be Normal (Gaussian), so one can estimate its mean and standard deviation. This is where the word "synthetic" in the algorithm name comes from: it implies that we can build a synthetic (near to real, yet artificial) probability distribution (mass) function from acquired WiFi data.
Of course, this is just a probability of getting specific signal level of a hotspot S in a point X. We need to combine all such probabilities somehow in order to calculate current user location. And here is the Bayes theorem comes in handy.
P(A | B) * P(B) = P(B | A) * P(A), where
- P(A) and P(B) are the probabilities (priors) of A and B independent of each other;
- P(A | B) is the conditional probability of A given that B is true;
- P(B | A) is the conditional probability of B given that A is true;
P(Asi | Bxj) * P(Bxj) = P(Bxj | Asi) * P(Asi), where
suffixes 'si' and 'xj' stand for i-th hotspot and j-th point respectively, so:
- P(Asi) is the prior probability of i-th hotspot;
- P(Asi | Bxj) is the conditional probability of observing i-th hotspot in j-th point;
- P(Bxj) is the prior probability of j-th point;
- P(Bxj | Asi) is the conditional probability of being in j-th point provided that i-th hotspot signal is detected;
P(Bxj | Asi) = P(Asi | Bxj) * P(Bxj) / P(Asi)
In the right side of the equation, P(Asi) can be estimated as a percentage of points where the hotspot Si was detected. P(Bxj) is initially 1/N for every point, where N is the total number of points, but later it can be refined by applying our knowledge about current user location and reducing N to a reasonable neighbourhood. At last P(Asi | Bxj) is the probability obtained from the synthesized probability distribution of signals described above.
Having P(Bxj | Asi) for different i-s in every point j, we should combine them by hotspots (by i) in order to get the total probability of point Xj. According to the formula of mutually non-exclusive events:
P(Xj) = P(Σi(Bxj | Asi)) = 1 - Πi(1 - P(Bxj | Asi))
The point with largest value (or an average of K points with largerst values) is most likely the current location of the user.
This is a simplified description of the "synthetic probability" algorithm. It does not cover many nuances such as signal level calibration and in-motion scanning but gives you an idea of how WiFiLine works behind the scenes.