+ 8

How to find out which string matches better than the others? Python

If I have a list of strings:

lst = []  # list of strings
string = input()

How do I find out which list index best matches the string variable? It should pick the best match and always return something.

lst = ["solo", "sololearn"]
string = "solo"

That would return index 0.

lst = ["solo", "sololearn"]
string = "sololrn"

That would return index 1.

And if there's only one item, it would of course return index 0, since there are no other strings that could match better.

11th Jul 2018, 11:27 PM
Toni Isotalo
9 Answers
+ 10
The usual way to do this is to calculate the Levenshtein distance between your word and each word in the list. The smallest Levenshtein distance means the best match. The Levenshtein distance counts how many single-character edits (substitutions, insertions, or deletions) you would need to turn your word into the target word. Look it up, it's not too hard to implement!
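A minimal sketch of that idea, assuming the standard dynamic-programming formulation of the distance. The names `levenshtein` and `best_match_index` are made up for this example, and it assumes the list is not empty:

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character edits needed to turn a into b."""
    # previous[j] = distance between the prefix of a processed so far
    # and the first j characters of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if the characters match
            ))
        previous = current
    return previous[-1]


def best_match_index(candidates, target):
    # min() keeps the first minimum it sees, so ties go to the lowest index
    return min(range(len(candidates)),
               key=lambda i: levenshtein(target, candidates[i]))


print(best_match_index(["solo", "sololearn"], "solo"))
print(best_match_index(["solo", "sololearn"], "sololrn"))
```

With the lists from the question this picks index 0 for "solo" and index 1 for "sololrn" ("sololrn" is 3 edits away from "solo" but only 2 away from "sololearn").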
12th Jul 2018, 3:11 AM
Schindlabua
+ 7
Schindlabua Wow, thanks. I have learned something. First time I've heard of Levenshtein distance. https://code.sololearn.com/cUCYOx3cC6rd/?ref=app
12th Jul 2018, 4:54 AM
Louis
+ 5
Do you mean the best match, or an exact match with a string in your list?
11th Jul 2018, 11:43 PM
LONGTIE👔
+ 5
hmmm idk Toni Isotalo
12th Jul 2018, 12:13 AM
LONGTIE👔
+ 5
from difflib import SequenceMatcher as SM

lst = ["solo", "sololearn"]

string = "solo"
# Pick the list item with the highest similarity ratio to the string
a = max(lst, key=lambda x: SM(a=string, b=x).ratio())
print(lst.index(a))

string = "sololrn"
a = max(lst, key=lambda x: SM(a=string, b=x).ratio())
print(lst.index(a))
12th Jul 2018, 2:27 PM
Mert Yazıcı
+ 3
Take substrings of the search string and match them against the list, longest substring first, so the first hit is the best match.
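One possible reading of this, as a rough sketch: try every substring of the search string from longest to shortest and return the first list item that contains it. The function name `best_match_by_substring` is made up for this example, the fallback to index 0 when nothing matches is an assumption based on the "always returns something" requirement, and the list is assumed to be non-empty:

```python
def best_match_by_substring(candidates, target):
    # Longest substrings first, so the first hit shares the longest common piece
    for length in range(len(target), 0, -1):
        for start in range(len(target) - length + 1):
            piece = target[start:start + length]
            for index, candidate in enumerate(candidates):
                if piece in candidate:
                    return index
    # Assumption: nothing in common at all, so fall back to the first item
    return 0


print(best_match_by_substring(["solo", "sololearn"], "solo"))
print(best_match_by_substring(["solo", "sololearn"], "sololrn"))
```

This only looks at the single longest shared piece, so it can be fooled more easily than an edit-distance approach when several items share the same substring.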
12th Jul 2018, 2:52 AM
Zoetic_Zeel
+ 3
Extremely good question! It should be a challenge :) My try: https://code.sololearn.com/cngfC80PsFuC/?ref=app
13th Jul 2018, 1:52 PM
Sahil Danayak
+ 3
Sahil Danayak Change your list to list = ['apple', 'banana', 'bandana', 'mango', 'manga']. It has difficulty deciding between similar words.
13th Jul 2018, 3:01 PM
Louis
+ 2
Not sure if this is still relevant, but the other day I found out that apparently Levenshtein distance is super old and outdated, and the cool kids use the "Sørensen–Dice coefficient" for finding similar strings these days. I'm not sure how easy or hard it is to implement though.
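For what it's worth, a small sketch of the Sørensen–Dice coefficient, assuming the common variant that compares character bigrams: twice the number of shared bigrams divided by the total number of bigrams in both strings. The function names are made up for this example, and the handling of strings shorter than two characters is an assumption:

```python
from collections import Counter


def bigrams(text):
    # Multiset of adjacent character pairs, e.g. "solo" -> so, ol, lo
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice_coefficient(a, b):
    first, second = bigrams(a), bigrams(b)
    total = sum(first.values()) + sum(second.values())
    if total == 0:
        # Assumption: with no bigrams at all, only identical strings count as similar
        return 1.0 if a == b else 0.0
    overlap = sum((first & second).values())  # bigrams present in both strings
    return 2 * overlap / total


def best_match_index_dice(candidates, target):
    # max() keeps the first maximum it sees, so ties go to the lowest index
    return max(range(len(candidates)),
               key=lambda i: dice_coefficient(target, candidates[i]))


print(best_match_index_dice(["solo", "sololearn"], "solo"))
print(best_match_index_dice(["solo", "sololearn"], "sololrn"))
```

A higher score means more similar (1.0 for identical strings), so the best match is the maximum here rather than the minimum as with Levenshtein distance.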
24th Sep 2018, 10:29 PM
Schindlabua