Posted November 20, 2015
mrkgnao: More or less. Never heard of Levenshtein, unless you count my ex-girlfriend, who had that surname.
Considered detecting a change in title with no other change, but thought it wasn't worth the trouble.
ssokolow: Well, there are plenty of implementations kicking around, so it shouldn't be too onerous if you feel like it. Just feed in two strings and it'll give you a number representing the total number of characters added, removed, and substituted to transform one into the other. (e.g. "Windowx -> Windows" would be 1)
Just decide on a rule for the maximum allowable distance and you're ready to go. (It could be as simple as "if edit_distance < floor_division(len(title), 4)")
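A minimal sketch of that suggestion, assuming Python: the classic dynamic-programming Levenshtein distance plus the "quarter of the title length" threshold. The names `levenshtein` and `is_probable_typo_fix` are hypothetical, not taken from MaGog, and measuring the threshold against the old title is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # The distance is symmetric, so put the shorter string in b to keep
    # the rolling DP row small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_probable_typo_fix(old_title: str, new_title: str) -> bool:
    # Hypothetical rule from the post: the edit distance must be below
    # a quarter of the title length (integer floor division).
    return levenshtein(old_title, new_title) < len(old_title) // 4
```

For example, `levenshtein("Windowx", "Windows")` is 1, and `is_probable_typo_fix("Baldur's Gate Manul", "Baldur's Gate Manual")` is `True` (distance 1, threshold 19 // 4 = 4). Note that for a short title like "Windowx" the threshold is 7 // 4 = 1, so the strict `<` rejects even a one-character fix; using `<=` or a minimum threshold may be preferable.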
You see, what will happen in effect is that:
1) MaGog has a hash (dictionary) of existing files (say 10 files per game), indexed by title
2) It identifies that a new file is not in the dictionary
3) Now it needs to iterate through all the dictionary files and compare the metadata to the new file (minimum one comparison, maximum four)
4) If a match is found, apply your string comparison.
5) Repeat this for every file that is added.
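The five steps above can be sketched roughly as follows. This is a hypothetical Python illustration, not MaGog's actual code: `FileEntry`, its `size` and `md5` fields (stand-ins for whatever metadata is really compared) and `find_renamed` are made-up names, and the title check is passed in as a function, such as a Levenshtein-based rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical record; size and md5 are placeholder metadata fields.
    title: str
    size: int
    md5: str


def find_renamed(
    existing: Dict[str, FileEntry],            # step 1: files indexed by title
    incoming: Iterable[FileEntry],
    titles_match: Callable[[str, str], bool],
) -> List[Tuple[FileEntry, Optional[FileEntry]]]:
    """Pair each new file with the old entry it probably renames, or None."""
    results = []
    for new in incoming:                       # step 5: repeat for every file
        if new.title in existing:              # step 2: title already known
            continue
        match = None
        for old in existing.values():          # step 3: scan every old entry
            if (old.size, old.md5) != (new.size, new.md5):
                continue                       # metadata differs: skip
            if titles_match(old.title, new.title):  # step 4: string check
                match = old
                break
        results.append((new, match))
    return results
```

The inner loop is the cost the post objects to: every genuinely new file is compared against every existing entry, even though almost none of those comparisons will ever reach the title check.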
Considering that probably >99% of the added files are genuinely new files and not title typo fixes, we will have hundreds if not thousands of unnecessary comparisons in step 3 that almost never lead to step 4. Add to this the slight extra code complexity, and that is what I would consider "not worth the trouble".
It's not as if any information is lost this way. It is just presented in an inconvenient manner.