Sørensen–Dice Coefficient

The Sørensen–Dice coefficient, also known as the Sørensen–Dice index, Sørensen index, Dice's coefficient, or Soerenson index, is a simple and elegant measure of the similarity of two strings. The values it produces are bounded between zero and one: identical strings score one, and strings with no character pairs in common score zero. The algorithm works by counting the adjacent character pairs (bigrams) that the two strings have in common, relative to the total number of bigrams in both strings.
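
For reference, the usual bigram formulation can be written as follows, with X and Y denoting the collections of adjacent character pairs of the two strings. The symbols s, X and Y are notation introduced here for illustration:

```latex
s = \frac{2\,|X \cap Y|}{|X| + |Y|}
```

For example, "night" has the bigrams ni, ig, gh, ht and "nacht" has na, ac, ch, ht. They share only ht, so s = 2 · 1 / (4 + 4) = 0.25.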



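Public Shared Function DiceMatch(string1 As String, string2 As String) As Double
	If String.IsNullOrEmpty(string1) OrElse String.IsNullOrEmpty(string2) Then
		Return 0
	End If

	If string1 = string2 Then
		Return 1
	End If

	' Strings shorter than two characters contain no bigrams.
	If string1.Length < 2 OrElse string2.Length < 2 Then
		Return 0
	End If

	' Number of bigrams (overlapping character pairs) in each string.
	Dim length1 As Integer = string1.Length - 1
	Dim length2 As Integer = string2.Length - 1

	' Collect the bigrams of each string and sort them, so that equal
	' bigrams can be paired up in one merge pass regardless of position.
	Dim bigrams1(length1 - 1) As String
	Dim bigrams2(length2 - 1) As String
	For k As Integer = 0 To length1 - 1
		bigrams1(k) = string1.Substring(k, 2)
	Next
	For k As Integer = 0 To length2 - 1
		bigrams2(k) = string2.Substring(k, 2)
	Next
	Array.Sort(bigrams1, StringComparer.Ordinal)
	Array.Sort(bigrams2, StringComparer.Ordinal)

	Dim matches As Double = 0
	Dim i As Integer = 0
	Dim j As Integer = 0

	While i < length1 AndAlso j < length2
		Dim cmp As Integer = String.CompareOrdinal(bigrams1(i), bigrams2(j))

		If cmp = 0 Then
			' Shared bigram: it counts once for each string.
			matches += 2
			i += 1
			j += 1
		ElseIf cmp < 0 Then
			i += 1
		Else
			j += 1
		End If
	End While

	Return matches / (length1 + length2)
End Function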
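
The bigram overlap can also be computed by counting each string's bigrams in a dictionary and summing, for every bigram, the smaller of its two counts. The following is a minimal sketch of that approach under stated assumptions, not part of the listing above: the names DiceSketch, CountBigrams and DiceMatchCounted are hypothetical, and the comparison is assumed to be ordinal (case-sensitive).

```vbnet
Imports System
Imports System.Collections.Generic

Public Module DiceSketch
	' Hypothetical helper: counts how often each bigram occurs in s.
	Private Function CountBigrams(s As String) As Dictionary(Of String, Integer)
		Dim counts As New Dictionary(Of String, Integer)(StringComparer.Ordinal)
		For k As Integer = 0 To s.Length - 2
			Dim bigram As String = s.Substring(k, 2)
			Dim n As Integer = 0
			counts.TryGetValue(bigram, n) ' n stays 0 if the bigram is new
			counts(bigram) = n + 1
		Next
		Return counts
	End Function

	' Hypothetical variant of DiceMatch based on bigram counts.
	Public Function DiceMatchCounted(string1 As String, string2 As String) As Double
		If String.IsNullOrEmpty(string1) OrElse String.IsNullOrEmpty(string2) Then
			Return 0
		End If
		If String.Equals(string1, string2, StringComparison.Ordinal) Then
			Return 1
		End If
		If string1.Length < 2 OrElse string2.Length < 2 Then
			Return 0
		End If

		Dim counts1 As Dictionary(Of String, Integer) = CountBigrams(string1)
		Dim counts2 As Dictionary(Of String, Integer) = CountBigrams(string2)

		' A bigram present in both strings contributes the smaller of its two counts.
		Dim common As Integer = 0
		For Each entry As KeyValuePair(Of String, Integer) In counts1
			Dim other As Integer = 0
			If counts2.TryGetValue(entry.Key, other) Then
				common += Math.Min(entry.Value, other)
			End If
		Next

		' Each string of length n has n - 1 bigrams.
		Return 2.0 * common / ((string1.Length - 1) + (string2.Length - 1))
	End Function

	Public Sub Main()
		' "night" and "nacht" share only "ht": 2 * 1 / (4 + 4) = 0.25
		Console.WriteLine(DiceMatchCounted("night", "nacht"))
	End Sub
End Module
```

Because this sketch counts bigrams wherever they occur, the result does not depend on where a shared bigram sits within either string. It trades the sorting step for two dictionary lookups per bigram.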
								


Example

	Dim result As Double = DiceMatch("algorithms are fun", "logarithms are not")
								


Output

	result: 0.58823529411764708
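
As a check on this value: each string has 18 characters and therefore 17 bigrams, and the two strings have 10 bigrams in common (ri, it, th, hm, ms, "s ", " a", ar, re, "e "), which gives:

```latex
\frac{2 \cdot 10}{17 + 17} = \frac{20}{34} \approx 0.5882
```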