Sharing your ideas is important because you may never know how your contributions will be put to productive use.
Vladimir Losifovich Levenshtein, a Russian scientist, might not have known his work would ultimately prevent an on-call surgeon from having her car towed. But as we will show in this blog entry, Levenshtein’s contributions help handle this type of problem thousands of times a day.
Prof. Vladimir Levenshtein was born in 1935, the same year Amelia Earhart became the first person to fly solo from Hawaii to California and when the first canned bear was ever sold. Here’s a photo of him at a conference in 2002. He’s the white-haired guy.
Over his career, Levenshtein thought about letters (characters) a lot and how you could compare words (strings).
His work at the Russian Academy of Sciences in Moscow would eventually lead to the discovery of “Levenshtein Distance” or “edit distance” which became the root of spell-checking software. (Yes, that squiggly red line you see under your Facebook comments more often than you want to admit.)
Levenshtein’s Edit Distance calculation measures the difference between two words so you can tell mathematically how similar they are. For example, let’s take “KITTEN” and “SITTING”. How are these two words similar compared to any other word?
- kitten → sitten (substitution of “s” for “k”)
- sitten → sittin (substitution of “i” for “e”)
- sittin → sitting (insertion of “g” at the end).
This “edit distance” comes to three, since there are three edits to change one word to the other and there is no way to do it with less than three edits.
It turns out, Levenshtein Distance is a fine way to compare license plate numbers. Often, LPR (license plate reading) scans are imprecise. A person may be scanning from an odd angle or the lighting may be particularly bad.
The plate number we have to work with in software may be off by a character or two or even more. But using Levenshtein’s edit distance, we can still make thoughtful use of the information.
For example, assume in the photo above that the black car on the right was read as “AWST4” when in fact the plate is “AWN5T4.” Using Levenshtein Distance we can calculate that these plates are actually 73% similar.
Now say this car is parked in an employee lot, and we’re using EasyALPR Parking Enforcer. to patrol the parking lot for cars that don’t belong. We’ve uploaded a spreadsheet of employee cars, which includes a certain black Kia Soul with the license plate AWN5T4 that belongs to an on-call surgeon named Julie Harding, MD.
We would hope that EasyALPR would figure out that even though the sunlight is shining right into the iPhone’s camera–and the angle is not great–the software would figure out this is Dr. Harding’s car and not to bother with it.
Thanks to Professor Levenshtein, EasyALPR can decide in the blink of an eye that the detected plate number and the one on the employee list are a 73% match. EasyALPR correctly assumes this is Dr. Harding’s whitelisted vehicle and allows the patrol person to continue walking their route. This is all without any correction done by the person on patrol.
Being able to calculate similarity between plates quickly means even in bad lighting we can make good matches between license plate numbers and save facility managers a great deal of time patrolling parking lots for violations.
Levenshtein earned a medal from IEEE for this contribution and he should be thanked thousands of times a day because his work is both detecting and preventing parking violations.
We’ve added an interactive Fuzzy String Matching Tool that lets you test Levenshtein Distance between plate numbers. This demonstrates a Python library with the whimsical name of fuzzywuzzy which does a great job of efficiently implementing Professor Levenshtein’s algorithm. Here’s a preview of the simple interface:
Sadly, Professor Levenshtein passed away in 2017. But his work lives on both in EasyALPR and many other software products. Do you have another interesting use for Levenshtein Distance, or a suggestion for improving plate similarity detection? Please tell us about it in the comments.