There are a few cases where it's possible to impersonate a user or website by making a name that looks like the legitimate user or site, but actually isn't. I am wondering if there are any tools that would allow one to check whether two Unicode strings look alike when displayed, even if they aren't exactly the same.

For example, if there's a user named fables22, it should not be possible to register a name like fab1es22 or fables22 .

(Note: That extra space before the period is intentional.)

Is there a way to do this?
Spaces are a special case because they shouldn't be in an internal account name. They may be displayed to someone on a website looking at information but that's almost always a separate field. Spaces get encoded and replaced in many databases but you run into too many issues and it's probably best just to not have them in there.

Comparing two strings really depends on the programming language. Most languages have a function for exact comparison, like C's strcmp. Doing a "How close is it?" comparison is a bit harder.
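
As a rough illustration of the "How close is it?" idea, here is a minimal Python sketch of the classic Levenshtein edit distance (the number of single-character insertions, deletions and substitutions needed to turn one string into the other). The function name edit_distance is made up for this example, and this is only one of several possible closeness measures.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i chars of a into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) the character
        prev = curr
    return prev[-1]


print("fables22" == "fab1es22")               # exact comparison: False
print(edit_distance("fables22", "fab1es22"))  # 1
print(edit_distance("fables22", "fables22 ")) # 1
```

A small distance only says the strings are close in spelling; it does not say whether they look alike on screen.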

Even Google has issues with this, judging by the suggestions it throws out. I could fill pages on how bad I think they are with that.
I do not know of any existing libraries that do this.
But for forum names, usually only limited characters are allowed.
What happens if a user has custom fonts when viewing the website? Even the same font could have differences when rendered in separate engines and platforms. I do not think it would be possible to cover all cases when the rendering of the string occurs on the client side. I could even have a font where every character looks the same.

Common cases might be found by rasterising both strings with a set of well-known fonts and styles, and then comparing the resulting renders.
dtgreene: There are a few cases where it's possible to impersonate a user or website by making a name that looks like the legitimate user or site, but actually isn't. I am wondering if there are any tools that would allow one to check whether two Unicode strings look alike when displayed, even if they aren't exactly the same.

For example, if there's a user named fables22, it should not be possible to register a name like fab1es22 or fables22 .

(Note: That extra space before the period is intentional.)

Is there a way to do this?
Sounds like someone has read the recent WhatsApp news...
kbnrylaec: But for forum names, usually only limited characters are allowed.
Depends on the software, actually. The couple we deal with will support special characters. They just encode into Unicode beforehand and rely on user ID numbers for their internal workings.

Too many folks have umlauts and whatever else....
nightcraw1er.488: Sounds like someone has read the recent WhatsApp news...
That may be what got me thinking about this issue, but it's not the only case of this I have heard of; I have also heard of it being done to make a phishing site look like the real one, even when you look at the URL (and even when you have a secure connection to it).
Xzaril: What happens if a user has custom fonts when viewing the website? Even the same font could have differences when rendered in separate engines and platforms. I do not think it would be possible to cover all cases when the rendering of the string occurs on the client side. I could even have a font where every character looks the same.

Common cases might be found by rasterising both strings with a set of well-known fonts and styles, and then comparing the resulting renders.
I think it makes sense to focus on the default fonts used first, or at least the most common fonts.

In the ASCII character set, there are some well-known look-alike characters, like 'l' and '1', '5' and 'S', '0' and 'O', for example. Perhaps a list of look-alikes like this could work (though Unicode makes things more complicated with things like combining characters).
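
A minimal Python sketch of that look-alike list idea, under some loud assumptions: the LOOKALIKES table holds only the three pairs mentioned above, the names skeleton and looks_alike are invented for illustration, and whitespace is simply dropped. It does nothing about combining characters or non-ASCII look-alikes.

```python
# Hypothetical look-alike table: each character maps to one representative.
# Only the pairs named in the post are listed; a real table would be far larger.
LOOKALIKES = {
    "1": "l",
    "5": "S",
    "0": "O",
}


def skeleton(name: str) -> str:
    """Drop whitespace and replace each character with its representative."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name if not ch.isspace())


def looks_alike(a: str, b: str) -> bool:
    """Two names collide when their skeletons are identical."""
    return skeleton(a) == skeleton(b)


print(looks_alike("fables22", "fab1es22"))   # True
print(looks_alike("fables22", "fables22 "))  # True
print(looks_alike("fables22", "tables22"))   # False
```

At signup, the skeleton of the requested name would be compared against the stored skeletons of existing names, so the check stays a simple equality lookup.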

Another idea would be to apply machine learning: Use a CAPTCHA, and have the program study the most common mistakes people make. This only works for characters that are commonly used by the audience, and might not, for example, catch similar kanji on a US-focused site. Also, machine learning can make mistakes and has false positives.
Post edited November 09, 2017 by dtgreene
A turnkey solution probably doesn't exist. If it existed, Taylor Swift on Twitter wouldn't be writing long-ass regular expressions to catch phishers.
Having just walked a few miles and thought about this and other things today, maybe a practical example of what environment we're talking about would be helpful.

I ask because I know the problem with fake staff and well known user accounts in chat has come up a couple of times recently here on the forums.

Having just looked at the chat screen, I'm rather surprised not to see any sort of tooltip or popup like we have here on the forums when we hover over the person's name. It's easy to steal an avatar but it's not easy to steal rep points or the join date.
dtgreene: There are a few cases where it's possible to impersonate a user or website by making a name that looks like the legitimate user or site, but actually isn't. I am wondering if there are any tools that would allow one to check whether two Unicode strings look alike when displayed, even if they aren't exactly the same.

For example, if there's a user named fables22, it should not be possible to register a name like fab1es22 or fables22 .

(Note: That extra space before the period is intentional.)

Is there a way to do this?
No, because, as already said, what something looks like on your screen depends entirely on which browser you are using and which font you are viewing it in. If you have, for some strange reason, a handwriting-style font as your default font, 1 and l may look completely different, while s and f, for instance, can look kind of similar.

There is a way to check how different one Unicode string is from another, though. This even happens on some servers where you need to change your password at certain intervals, and the new password must be "different enough" from the previous password. I guess there might be some code examples of that somewhere.

Good luck, have fun searching.
PixelBoy: Good luck, have fun searching.
Got me curious so I went looking:

pam_cracklib

https://linux.die.net/man/8/pam_cracklib

Never seen it before.

Gone to read....

edit: Granted it's for passwords....

reedit: Comparison of 2 strings:

https://stackoverflow.com/questions/577463/finding-how-similar-two-strings-are
Post edited November 09, 2017 by drmike
Failsafe? No. You could feed the names to an n-gram "fuzzy search" engine like Lucene and go by the score it yields.
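
To give a feel for n-gram scoring without setting up a search engine, here is a small standard-library Python sketch that compares the sets of character trigrams of two names (Jaccard similarity). It is a stand-in for illustration only and is not how Lucene computes its scores; the function names and any cut-off you might pick are assumptions.

```python
def ngrams(text: str, n: int = 3) -> set:
    """Set of all runs of n consecutive characters in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets: 0.0 (disjoint) to 1.0 (same set)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        # Both strings are shorter than n; fall back to exact comparison.
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


print(ngram_similarity("fables22", "fab1es22"))  # 3 shared trigrams out of 9
print(ngram_similarity("fables22", "gandalf"))   # 0.0
```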
Post edited November 09, 2017 by AlienMind
Won't help much out of the box. You shouldn't be looking for just any edits but specifically for those which cause confusion, preferably malicious confusion. General distance metrics don't do this. Ideally, you need a sort of soundex, but for glyphs: given a font and a string, compute a function of the string's visual representation, then compare. OCR software probably uses something like that.

Also, malice depends on community norms and expectations; a variation that's extremely suspicious on a forum might be perfectly acceptable in many MMORPGs.
Starmaker: Also, malice depends on community norms and expectations; a variation that's extremely suspicious on a forum might be perfectly acceptable in many MMORPGs.
That's why I asked about the environment we were talking about. Are we talking about seeing a username on a webpage? Are we talking about during signup of a new account? Different answers would apply in each case.
Hehe, good times, reminds me of the first time I saw this back in 2013.

https://labs.spotify.com/2013/06/18/creative-usernames/


Which btw is entirely different from an attack involving "fab1es22 or fables22 .". That example has nothing to do with the plethora of problems introduced by Unicode.
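
To make that distinction concrete, here is a minimal Python sketch using the standard unicodedata module. The canonical helper is a made-up name, and NFKC normalization plus casefold is just one plausible canonical form, not a claim about what any particular site uses. A name written in fullwidth Unicode letters collapses to the plain ASCII name, while the ASCII-only "fab1es22" is untouched because no Unicode normalization is involved.

```python
import unicodedata


def canonical(name: str) -> str:
    """Hypothetical canonical form: NFKC compatibility normalization, then casefold."""
    return unicodedata.normalize("NFKC", name).casefold()


# "fables22" spelled with fullwidth letters and digits (U+FF46, U+FF41, ...).
fullwidth = "\uff46\uff41\uff42\uff4c\uff45\uff53\uff12\uff12"

print(fullwidth == "fables22")              # False: different code points
print(canonical(fullwidth) == "fables22")   # True: normalization catches it
print(canonical("fab1es22") == "fables22")  # False: digit 1 vs letter l is not a Unicode issue
```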
Post edited November 10, 2017 by onarliog