Fuzzy Bitap Algorithm

This is a fuzzy string matching version of bitap algorithm. The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. The algorithm tells whether a given text contains a substring which is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance — if the substring and pattern are within a given distance k of each other, then the algorithm considers them equal. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. Then it is able to do most of the work with bitwise operations, which are extremely fast.



									Public Shared Function SearchString(text As String, pattern As String, k As Integer) As Integer
	Dim result As Integer = -1
	Dim m As Integer = pattern.Length
	Dim R As Integer()
	Dim patternMask As Integer() = New Integer(127) {}
	Dim i As Integer, d As Integer

	If String.IsNullOrEmpty(pattern) Then
		Return 0
	End If
	If m > 31 Then
		Return -1 'Error: The pattern is too long!
	End If

	R = New Integer((k + 1) * 4 - 1) {}
	For i = 0 To k
		R(i) = Not 1
	Next

	For i = 0 To 127
		patternMask(i) = Not 0
	Next

	For i = 0 To m - 1
		patternMask(Convert.ToInt32(pattern(i))) = patternMask(Convert.ToInt32(pattern(i))) And Not (1 << i)
	Next

	For i = 0 To text.Length - 1
		Dim oldRd1 As Integer = R(0)

		R(0) = R(0) Or patternMask(Convert.ToInt32(text(i)))
		R(0) <<= 1

		For d = 1 To k
			Dim tmp As Integer = R(d)

			R(d) = (oldRd1 And (R(d) Or patternMask(Convert.ToInt32(text(i))))) << 1
			oldRd1 = tmp
		Next

		If 0 = (R(k) And (1 << m)) Then
			result = (i - m) + 1
			Exit For
		End If
	Next

	Return result
End Function
								


Example

									Dim index As Integer = SearchString("The quick brown foax jumps over the lazy dog", "fox", 1)
								


Output

									index: 16