Monday, March 16, 2015

WiFiLine's synthetic probability algorithm revealed

As I promised, I'm posting a short explanation of the built-in "synthetic probability" location algorithm. To start with it let us consider how WiFi data looks like. When you scan a building you get a list of geographical points with WiFi measurements, each of them contains signals from several hotspots, and every hotspot signal is normally detected many times demonstrating varying levels - so called RSSI (Received Signal Strength Indication). All this data is stored in the map, and can be viewed - just switch to the "Scan and Edit" mode, open the list of points and expand any of them - you'll see the names of hotspots detected in this point. Tap a hotspot to open a dialog presenting this spot's RSSI distribution.

Usually it's bumpy, but the more scans you perform in a point the more smooth distribution you get. According to the Central Limit Theorem the distribution tends to be Normal (Gaussian), so one can estimate its mean and standard deviation. This is where the word "synthetic" in the algorithm name comes from: it implies that we can build a synthetic (near to real, yet artificial) probability distribution (mass) function from acquired WiFi data.

Of course, this is just a probability of getting specific signal level of a hotspot S in a point X. We need to combine all such probabilities somehow in order to calculate current user location. And here is the Bayes theorem comes in handy.

P(A | B) * P(B) = P(B | A) * P(A), where

  • P(A) and P(B) are the probabilities (priors) of A and B independent of each other;
  • P(A | B) is the conditional probability of A given that B is true;
  • P(B | A) is the conditional probability of B given that A is true;
Now let us add some suffixes to bring it into our context:

P(Asi | Bxj) * P(Bxj) = P(Bxj | Asi) * P(Asi), where

suffixes 'si' and 'xj' stand for i-th hotspot and j-th point respectively, so:

  • P(Asi) is the prior probability of i-th hotspot;
  • P(Asi | Bxj) is the conditional probability of observing i-th hotspot in j-th point;
  • P(Bxj) is the prior probability of j-th point;
  • P(Bxj | Asi) is the conditional probability of being in j-th point provided that i-th hotspot signal is detected;
Apparently, it's the last probability which we are after:

P(Bxj | Asi) = P(Asi | Bxj) * P(Bxj) / P(Asi)

In the right side of the equation, P(Asi) can be estimated as a percentage of points where the hotspot Si was detected. P(Bxj) is initially 1/N for every point, where N is the total number of points, but later it can be refined by applying our knowledge about current user location and reducing N to a reasonable neighbourhood. At last P(Asi | Bxj) is the probability obtained from the synthesized probability distribution of signals described above.

Having P(Bxj | Asi) for different i-s in every point j, we should combine them by hotspots (by i) in order to get the total probability of point Xj. According to the formula of mutually non-exclusive events:

P(Xj) = P(Σi(Bxj | Asi)) = 1 - Πi(1 - P(Bxj | Asi))

The point with largest value (or an average of K points with largerst values) is most likely the current location of the user.

This is a simplified description of the "synthetic probability" algorithm. It does not cover many nuances such as signal level calibration and in-motion scanning but gives you an idea of how WiFiLine works behind the scenes.

Friday, March 6, 2015

New WiFiLine version has been released

New version of WiFiLine - 1.1.3 - has been just released. The main changes are:

  • debug marks with current user position(s) detected by all built-in algorithms are now optional and disabled by default; in order to enable the marks, one should enter "debug:on" in the processing command line (described some time ago in the blog); "debug:off" disables the marks;
  • the naive multiplier algorithm has been changed a bit, but it's still an experimental thing;
According to latest tests in fields, recommended methods are extended fingerprint and synthetic probability again, as they were before. Position accuracy is 10 meters. I plan to shed some light on the probabilistic method in a future. As for neural network, it requires manual finetuning of input data to get best results. For example, it could be worth removing some non-discriminative hotspots from the map.

Готовится новый выпуск WiFiLine

В разработке и тестировании находится следующая версия WiFiLine. Вот основные изменения:

  • Отладочный вывод на карту меток с результатами работы всех алгоритмов позиционирования сделан опциональным; по-умолчанию, он отключен; чтобы его включить, необходимо ввести в командной строке "debug:on";
  • слегка изменен алгоритм naive multiplier, он все еще является экспериментальным;
По итогам полевого тестирования, рекомендуемыми по-прежнему являются методы расширенного отпечатка и синтетической вероятности. Точность позиционирования - 10 метров. В одной из следующих публикаций я расскажу более подробно о вероятностном методе. Что же касается нейронной сети, то для получения с ней хороших результатов, требуется тонкая ручная подготовка исходных данных, в частности, удаление некоторых хотспотов с малоинформативными сигналами из карты.