String comparison and fuzzy logic
To implement fuzzy matching when comparing two strings, we can use the Levenshtein distance.
Here is the code to compute this distance. Nothing special; this code can be found all over the internet.
import numpy as np

def levenshtein_ratio_and_distance(s, t, ratio_calc=False):
    """ levenshtein_ratio_and_distance:
        Calculates levenshtein distance between two strings.
        If ratio_calc = True, the function computes the
        levenshtein distance ratio of similarity between two strings
        For all i and j, distance[i,j] will contain the Levenshtein
        distance between the first i characters of s and the
        first j characters of t
    """
    # Initialize matrix of zeros
    rows = len(s) + 1
    cols = len(t) + 1
    distance = np.zeros((rows, cols), dtype=int)

    # Populate the first column and the first row with the indices of each character of both strings
    for i in range(1, rows):
        distance[i][0] = i
    for k in range(1, cols):
        distance[0][k] = k

    # Iterate over the matrix to compute the cost of deletions, insertions and/or substitutions
    for col in range(1, cols):
        for row in range(1, rows):
            if s[row-1] == t[col-1]:
                cost = 0  # If the characters are the same in the two strings in a given position [i,j] then the cost is 0
            else:
                # In order to align the results with those of the Python Levenshtein package, if we choose to calculate the ratio
                # the cost of a substitution is 2. If we calculate just distance, then the cost of a substitution is 1.
                cost = 2 if ratio_calc else 1
            distance[row][col] = min(distance[row-1][col] + 1,       # Cost of deletions
                                     distance[row][col-1] + 1,       # Cost of insertions
                                     distance[row-1][col-1] + cost)  # Cost of substitutions

    # The bottom-right cell holds the result; indexing it directly also works when s or t is empty
    edits = int(distance[rows-1][cols-1])
    if ratio_calc:
        # Computation of the Levenshtein Distance Ratio (two empty strings are treated as identical)
        total = len(s) + len(t)
        return (total - edits) / total if total else 1.0
    else:
        # print(distance) # Uncomment if you want to see the matrix showing how the algorithm computes the cost of deletions,
        # insertions and/or substitutions
        # This is the minimum number of edits needed to convert string s to string t
        return "The strings are {} edits away".format(edits)
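The matrix version keeps every intermediate value, which is handy for inspection but not needed for the final result. As a minimal sketch (the name `levenshtein_distance` is ours and does not come from any package), the same recurrence can be computed with only two rows, without NumPy, and returning a plain integer:

```python
def levenshtein_distance(s, t):
    """Levenshtein distance with unit costs, keeping only two rows (hypothetical helper)."""
    # previous[j] = distance between the empty prefix of s and the first j characters of t
    previous = list(range(len(t) + 1))
    for i, s_char in enumerate(s, start=1):
        # current[0] = distance between the first i characters of s and the empty string
        current = [i]
        for j, t_char in enumerate(t, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "sitting"))  # 3
```

Returning an integer rather than a formatted sentence makes the result easier to reuse, for instance to sort candidates by distance.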
There is also a package called Levenshtein that makes life easier.
>>> import Levenshtein as lev
>>> string_1 = "Lean Deep."
>>> string_2 = "leandeep"
>>> distance = lev.distance(string_1.lower(), string_2.lower())
>>> print(distance)
2
>>> ratio = lev.ratio(string_1.lower(), string_2.lower())
>>> print(ratio)
0.8888888888888888
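To make the link between the distance and the ratio explicit: following the comment in the hand-written function earlier, the ratio is (len(a) + len(b) - d) / (len(a) + len(b)), where d is the edit distance with a substitution counted as 2. The sketch below reimplements that formula in pure Python so it runs without the package. The names `indel_distance`, `similarity_ratio` and `is_fuzzy_match`, as well as the 0.85 threshold, are illustrative assumptions and not part of the Levenshtein package.

```python
def indel_distance(a, b):
    """Edit distance where a substitution costs 2 (one deletion plus one insertion)."""
    previous = list(range(len(b) + 1))
    for i, a_char in enumerate(a, start=1):
        current = [i]
        for j, b_char in enumerate(b, start=1):
            cost = 0 if a_char == b_char else 2
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_ratio(a, b):
    """Ratio in [0, 1]; two empty strings are treated as identical."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - indel_distance(a, b)) / total


def is_fuzzy_match(a, b, threshold=0.85):
    """Case-insensitive yes/no decision; the threshold is an arbitrary example value."""
    return similarity_ratio(a.lower(), b.lower()) >= threshold


print(round(similarity_ratio("lean deep.", "leandeep"), 3))  # 0.889
print(is_fuzzy_match("Lean Deep.", "leandeep"))              # True
print(is_fuzzy_match("Lean Deep.", "deep"))                  # False
```

The right threshold depends on the data: short strings change ratio quickly with a single edit, so a value that works for company names may be too strict or too loose elsewhere.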
Above all, there is the fuzzywuzzy package, which provides advanced functions that the Levenshtein package does not have.
They let us handle more complex cases such as substring matching, which is very useful for extracting snippets of information from text.
>>> from fuzzywuzzy import fuzz
>>> string_1 = "Lean Deep"
>>> string_2 = "deep"
>>> ratio = fuzz.ratio(string_1.lower(), string_2.lower())
>>> partial_ratio = fuzz.partial_ratio(string_1.lower(), string_2.lower())
>>> print(ratio)
62
>>> print(partial_ratio)
100
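To give an intuition for why `partial_ratio` reaches 100 here, the sketch below slides the shorter string across the longer one and keeps the best window score. It is a simplification under stated assumptions: it relies on `difflib.SequenceMatcher` from the standard library rather than on fuzzywuzzy's own code, and `simple_ratio` and `sliding_partial_ratio` are hypothetical names, so scores may differ from fuzzywuzzy's on other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity score from 0 to 100 based on difflib's matching blocks."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sliding_partial_ratio(a, b):
    """Best score of the shorter string against every same-length window of the longer one."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        # Nothing to search for: only two empty strings count as a match
        return 100 if not longer else 0
    width = len(shorter)
    return max(simple_ratio(shorter, longer[start:start + width])
               for start in range(len(longer) - width + 1))


print(simple_ratio("lean deep", "deep"))           # whole-string score, penalized by the extra characters
print(sliding_partial_ratio("lean deep", "deep"))  # 100
```

Because "deep" appears verbatim inside "lean deep", one window matches exactly, which is the behaviour that makes substring matching useful for pulling short fragments out of longer text.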