Sørensen–Dice Coefficient

The Sørensen–Dice coefficient, also known as the Sørensen–Dice index, Sørensen index, Dice's coefficient or Soerenson index, is a simple way to measure the similarity of two strings. The values produced are bounded between zero and one: zero means the strings share no character pairs, and one means every pair matches. The algorithm works by counting the adjacent character pairs (bigrams) that the two strings have in common. The implementation below compares the pairs found at the same position in each string, ignoring case.
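
For reference, the coefficient is commonly written for two sets X and Y. For strings, X and Y are assumed here to be the bigrams of each string; the symbols s, X and Y are notation chosen for this note and do not appear in the listing.

```latex
s = \frac{2\,|X \cap Y|}{|X| + |Y|}
```

The factor of two in the numerator is what lets s reach one when both strings contribute exactly the same pairs.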



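function DiceMatch($string1, $string2)
{
	// Work on strings so that null or numeric arguments behave predictably.
	$string1 = (string) $string1;
	$string2 = (string) $string2;

	// An empty string has nothing to compare.
	// A strict check is used because empty("0") is true in PHP.
	if ($string1 === '' || $string2 === '')
		return 0;

	// Identical strings are a perfect match.
	// Strict comparison avoids numeric-string juggling ("1e3" == "1000").
	if ($string1 === $string2)
		return 1;

	$strlen1 = strlen($string1);
	$strlen2 = strlen($string2);

	// Fewer than two characters means no character pair can be formed.
	if ($strlen1 < 2 || $strlen2 < 2)
		return 0;

	// A string of n characters contains n - 1 adjacent pairs.
	$length1 = $strlen1 - 1;
	$length2 = $strlen2 - 1;

	$matches = 0;
	$limit = min($length1, $length2);

	// Compare the pairs at the same offset in both strings, ignoring case.
	for ($i = 0; $i < $limit; ++$i)
	{
		$a = substr($string1, $i, 2);
		$b = substr($string2, $i, 2);

		// A matching pair counts twice, once for each string.
		if (strcasecmp($a, $b) == 0)
			$matches += 2;
	}

	// Matched pairs (already doubled) over the total number of pairs.
	return $matches / ($length1 + $length2);
}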


Example

$result = DiceMatch("algorithms are fun", "logarithms are not");
echo "result: " . $result;


Output

result: 0.58823529411765
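
The result can be checked by hand: each input string has 18 characters and therefore 17 character pairs, and the 10 pairs covering "rithms are " sit at the same offsets in both strings, giving 2 × 10 / (17 + 17) ≈ 0.588. Because DiceMatch only compares pairs at the same offset, shifting one string by a single character removes every match: DiceMatch("abcd", "xabcd") returns 0. The following is a minimal sketch of a position-independent variant that counts shared pairs wherever they occur, which is closer to the usual set-based definition of the coefficient. The names bigramCounts and diceBigramSimilarity are hypothetical and not part of the listing above, and the sketch assumes single-byte text, as the original does.

```php
<?php
// Hypothetical helper (not in the original listing):
// maps each lowercased character pair of $s to how often it occurs.
function bigramCounts(string $s): array
{
	$s = strtolower($s);
	$counts = [];

	// A string of n bytes has n - 1 pairs; the loop is skipped when n < 2.
	for ($i = 0, $n = strlen($s) - 1; $i < $n; ++$i)
	{
		$pair = substr($s, $i, 2);
		$counts[$pair] = ($counts[$pair] ?? 0) + 1;
	}

	return $counts;
}

// Hypothetical variant of DiceMatch: pairs are matched regardless of position.
function diceBigramSimilarity(string $a, string $b): float
{
	$countsA = bigramCounts($a);
	$countsB = bigramCounts($b);

	// Total number of pairs in both strings.
	$total = array_sum($countsA) + array_sum($countsB);
	if ($total === 0)
		return 0.0;

	// A pair is shared as many times as it occurs in both strings.
	$shared = 0;
	foreach ($countsA as $pair => $n)
		$shared += min($n, $countsB[$pair] ?? 0);

	return 2 * $shared / $total;
}

// Usage: "abcd" and "xabcd" share the pairs ab, bc and cd, so the score is 6/7.
// echo diceBigramSimilarity("abcd", "xabcd");
```

For the example strings above this variant also finds 10 shared pairs and so returns the same value as DiceMatch. Unlike DiceMatch, it returns 0 for two identical one-character strings, since neither contains a pair.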