LAST_NAME
your query might look like this
SELECT *
FROM CONTACTS
WHERE LAST_NAME='theName'
Seems simple enough, but not without problems. Take my last name - Wolf - for example. Let's say you knew what my last name was, but you didn't know how it was spelled - so it could be Wolf, Wolfe, Wolff, Woolf, etc. The above query would require me to try every possible spelling to find a result. So maybe as a precaution, you would try to match the beginning of the word using like:
SELECT *
FROM CONTACTS
WHERE LAST_NAME like 'wolf%'
This would do a little better, but it has two major shortcomings - It will exclude names that should be included like Woolf, for example, and will include names that shouldn't like Wolfish or Wolfson. Of course, my last name is a limited example. If we were searching for, say, Smith, searching under
Smith
will also return Smithson
but not alternative spellings of Smith, such as Smythe.Fortunately, there are two SQL functions that can help in this endeavour:
SOUNDEX
and DIFFERENCE
.SOUNDEX
takes a string as an argument and returns a value that is based on the way that the string sounds (I am not 100% how this is caluculated, but it still seems to work). So re-writing our earlier query:
SELECT *
FROM CONTACTS
WHERE SOUNDEX(LAST_NAME)=SOUNDEX('WOLF')
This query will give us the desired effect of matching similarly sounding names with different spellings without excluding most spellings and without including different names.
DIFFERENCE
takes this one step further. It essentially takes two strings as arguments and then returns the difference of their SOUNDEX
values. The difference is between 0-4, with 4 being the closest match. Using difference we can re-write the query one more time, expanding our horizons to include not-so-exact matches, but also ranking them by closeness:
SELECT *,DIFFERENCE(LAST_NAME) as RANK
FROM CONTACTS
WHERE DIFFERENCE(LAST_NAME,'WOLF')>=3
ORDER BY DIFFERENCE(LAST_NAME) DESC
What this query does is it will find names that are not quite 'Wolf', but closely related and then rank them in order of closeness (i.e. it will match, Wolf, Wolfe, and Wolfson and they will be ranked in that order respectively).
So the next time you're in a matching bind, give
DIFFERENCE
and SOUNDEX
a try.
No comments:
Post a Comment