Programming question: Check if two Unicode strings look the same, page 1 - Forum

dtgreene

vaccines work she/her

Registered: Jan 2010

From United States

Posted November 09, 2017

1

There are a few cases where it's possible to impersonate a user or website by making a name that looks like the legitimate user or site, but actually isn't. I am wondering if there are any tools that would allow one to check whether two Unicode strings look alike when displayed, even if they aren't exactly the same.

For example, if there's a user named fables22, it should not be possible to register a name like fab1es22 or fables22 .

(Note: That extra space before the period is intentional.)

Is there a way to do this?

drmike

Why yes, I am a Major General

Registered: Jan 2012

From United States

Posted November 09, 2017

2

Spaces are a special case because they shouldn't be in an internal account name. They may be displayed to someone on a website looking at information but that's almost always a separate field. Spaces get encoded and replaced in many databases but you run into too many issues and it's probably best just to not have them in there.
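
As a rough illustration of keeping spaces out of account names, a signup check could reject any name that contains whitespace or invisible characters. This is only a sketch in Python; the function name `has_hidden_or_space` and the choice of Unicode categories are assumptions for illustration, not something any particular forum software is known to do.

```python
import unicodedata

# Unicode general categories treated as "not allowed in a name" (an assumption):
# Zs/Zl/Zp = space, line and paragraph separators, Cc = control, Cf = format
# characters such as ZERO WIDTH SPACE.
BLOCKED_CATEGORIES = {"Zs", "Zl", "Zp", "Cc", "Cf"}


def has_hidden_or_space(name: str) -> bool:
    """Return True if the name contains whitespace or invisible characters."""
    for ch in name:
        if ch.isspace() or unicodedata.category(ch) in BLOCKED_CATEGORIES:
            return True
    return False
```

With a check like this, the trailing-space name from the first post ("fables22 ") would be rejected at registration, as would a name hiding a zero-width space between two letters.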

Comparing two strings really depends on the programming language. Most languages have a function for exact comparison, like C's strcmp. Doing a "How close is it?" comparison is a bit harder.

Even Google has issues with this in how it throws out suggestions. I could fill pages on how bad I think they are at that.

kbnrylaec

Asuka Tanaka

Registered: Nov 2011

From Taiwan

Posted November 09, 2017

3

I do not know of any existing libraries that do this.
But for forum names, usually only a limited set of characters is allowed.

Xzaril

New User

Registered: Aug 2010

From Sweden

Posted November 09, 2017

4

What happens if a user has custom fonts when viewing the website? Even the same font could have differences when rendered in separate engines and platforms. I do not think it would be possible to cover all cases when the rendering of the string occurs on the client side. I could even have a font where every character looks the same.

Common cases might be found by rasterising both strings with a set of well-known fonts and styles, and then comparing the resulting renders.

nightcraw1er.488

Pale & Bitter

Registered: Apr 2012

From United Kingdom

Posted November 09, 2017

5

dtgreene: There are a few cases where it's possible to impersonate a user or website by making a name that looks like the legitimate user or site, but actually isn't. I am wondering if there are any tools that would allow one to check whether two Unicode strings look alike when displayed, even if they aren't exactly the same.

For example, if there's a user named fables22, it should not be possible to register a name like fab1es22 or fables22 .

(Note: That extra space before the period is intentional.)

Is there a way to do this?

Sounds like someone has read the recent WhatsApp news...

drmike

Why yes, I am a Major General

Registered: Jan 2012

From United States

Posted November 09, 2017

6

kbnrylaec: But for forum names, usually only limited characters are allowed.

Depends on the software, actually. The couple we deal with will support special characters. They just encode into Unicode beforehand and rely on user ID numbers for their internal workings.

Too many folks have umlauts and whatever else....
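
Accented names bring their own look-alike problem: the same visible "ü" can be stored as a single code point or as "u" followed by a combining mark. Below is a minimal sketch, using Python's standard `unicodedata` module, of reducing names to one form before comparing them. The helper name `canonical` and the choice of NFKC plus case folding are assumptions for illustration, not a complete canonicalisation scheme.

```python
import unicodedata


def canonical(name: str) -> str:
    """Reduce a name to a single comparable form (a sketch, not a full scheme)."""
    # NFKC composes base letter + combining mark where a composed form exists
    # and folds compatibility variants (such as fullwidth letters) to their
    # ordinary counterparts. casefold() is a stronger lower() intended for
    # caseless comparison.
    return unicodedata.normalize("NFKC", name).casefold()


composed = "m\u00fcller"      # 'ü' stored as one code point
decomposed = "mu\u0308ller"   # 'u' followed by COMBINING DIAERESIS

# The two strings render the same but are not equal until canonicalised.
same_before = composed == decomposed
same_after = canonical(composed) == canonical(decomposed)
```

Comparing the canonical forms, rather than the raw strings, is one way a site could allow umlauts while still treating the two spellings above as the same name.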

dtgreene

vaccines work she/her

Registered: Jan 2010

From United States

Posted November 09, 2017

7

nightcraw1er.488: Sounds like someone has read the recent WhatsApp news...

That may be what got me thinking about this issue, but it's not the only case of this happening that I have heard of; I have also heard of it being done to make a phishing site look like the real one, even when you look at the URL (and even when you have a secure connection to it).

Xzaril: What happens if a user has custom fonts when viewing the website? Even the same font could have differences when rendered in separate engines and platforms. I do not think it would be possible to cover all cases when the rendering of the string occurs on the client side. I could even have a font where every character looks the same.

Common cases might be found by rasterising both strings with a set of well-known fonts and styles, and then comparing the resulting renders.

I think it makes sense to focus on the default fonts used first, or at least the most common fonts.

In the ASCII character set, there are some well-known look-alike characters, like 'l' and '1', '5' and 'S', '0' and 'O', for example. Perhaps a list of look-alikes like this could work (though Unicode makes things more complicated with things like combining characters).
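
The look-alike list idea can be sketched as a "skeleton" function: map every character to one representative of its look-alike group, then compare the results. The table below contains only the three pairs mentioned above and is purely illustrative; the names `LOOKALIKES`, `skeleton` and `looks_alike` are hypothetical, and it does nothing about combining characters.

```python
# Hand-picked, illustrative table: each key is replaced by the character it is
# easily mistaken for. A real list would need to be far larger and tested
# against the fonts actually in use.
LOOKALIKES = {
    "1": "l",
    "5": "S",
    "0": "O",
}


def skeleton(name: str) -> str:
    """Replace each character by the representative of its look-alike group."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name)


def looks_alike(a: str, b: str) -> bool:
    """Two names are treated as confusable if their skeletons are identical."""
    return skeleton(a) == skeleton(b)
```

Under this sketch, `looks_alike("fables22", "fab1es22")` is true because both names reduce to the same skeleton, while an unrelated name such as "tables22" does not match.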

Another idea would be to apply machine learning: use a CAPTCHA, and have the program study the most common mistakes people make. This only works for characters that are commonly used by the audience, and might not, for example, catch similar kanji on a US-focused site. Also, machine learning can make mistakes and produce false positives.

Post edited November 09, 2017 by dtgreene

Starmaker

go Clarice!

Registered: Sep 2010

From Russian Federation

Posted November 09, 2017

8

A turnkey solution probably doesn't exist. If it existed, Taylor Swift on Twitter wouldn't be writing long-ass regular expressions to catch phishers.

drmike

Why yes, I am a Major General

Registered: Jan 2012

From United States

Posted November 09, 2017

9

Having just walked a few miles and thought about this and other things today, maybe a practical example of what environment we're talking about would be helpful.

I ask because I know the problem with fake staff and well known user accounts in chat has come up a couple of times recently here on the forums.

Having just looked at the chat screen, I'm rather surprised not to see any sort of tooltip or popup like we have here on the forums when we hover over the person's name. It's easy to steal an avatar but it's not easy to steal rep points or the join date.

PixelBoy

New Loser

Registered: Jun 2009

From Finland

Posted November 09, 2017

10

dtgreene: There are a few cases where it's possible to impersonate a user or website by making a name that looks like the legitimate user or site, but actually isn't. I am wondering if there are any tools that would allow one to check whether two Unicode strings look alike when displayed, even if they aren't exactly the same.

For example, if there's a user named fables22, it should not be possible to register a name like fab1es22 or fables22 .

(Note: That extra space before the period is intentional.)

Is there a way to do this?

No, because as already said, what something looks like on your screen depends entirely on which browser you are using and which font you are viewing it in. If you have, for some strange reason, a handwriting-style font as your default font, 1 and l may look completely different, while s and f, for instance, can look rather similar.

There is a way to check how different one Unicode string is from another, though. This even happens on some servers where you need to change your password at certain intervals, and the new password must be "different enough" from the previous password. I guess there might be some code examples of that somewhere.

Good luck, have fun searching.

drmike

Why yes, I am a Major General

Registered: Jan 2012

From United States

Posted November 09, 2017

11

PixelBoy: Good luck, have fun searching.

Got me curious so I went looking:

pam_cracklib

https://linux.die.net/man/8/pam_cracklib

Never seen it before.

Gone to read....

edit: Granted it's for passwords....

reedit: Comparison of 2 strings:

https://stackoverflow.com/questions/577463/finding-how-similar-two-strings-are
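
For a quick feel of what a general "how similar are two strings" score gives, here is a minimal sketch using `difflib.SequenceMatcher` from Python's standard library. The 0.85 threshold and the function names are arbitrary assumptions for illustration; the measure knows nothing about which characters actually look alike.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    # The first argument is the optional "junk" predicate; None disables it.
    return SequenceMatcher(None, a, b).ratio()


def too_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag a new name that is very close to an existing one (threshold is a guess)."""
    return similarity(a, b) >= threshold
```

For "fables22" and "fab1es22", seven of the eight characters line up, so the ratio is 0.875 and the pair would be flagged at this threshold. The same score would result from any other single-character substitution, confusing or not.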

Post edited November 09, 2017 by drmike

AlienMind

GOG sells DRM+microtransactions + accept is a joke

Registered: Oct 2012

From Germany

Posted November 09, 2017

12

Failsafe? No. You could feed the names to an n-gram "fuzzy search" engine like Lucene and go by the score it yields.
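
Lucene itself is a Java library and is not used here. The sketch below only illustrates the underlying idea of scoring two names by the character n-grams they share, using a Jaccard overlap of bigrams in Python. The function names and the "^" and "$" padding markers are hypothetical choices, and the scores are not comparable to Lucene's.

```python
def ngrams(text: str, n: int = 2) -> set:
    """Return the set of character n-grams of text, padded at both ends."""
    padded = "^" + text + "$"  # markers so that the first and last letters count
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_score(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets: 0.0 = disjoint, 1.0 = identical."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

A site would still have to pick a cut-off score by experiment, which is part of why this approach is not failsafe.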

Post edited November 09, 2017 by AlienMind

Starmaker

go Clarice!

Registered: Sep 2010

From Russian Federation

Posted November 09, 2017

13

drmike: reedit: Comparison of 2 strings:

https://stackoverflow.com/questions/577463/finding-how-similar-two-strings-are

Won't help much out of the box. You shouldn't be looking for just any edits but specifically for those which cause confusion, preferably malicious confusion. General distance metrics don't do this. Ideally, you need something like Soundex, but for glyphs: given a font and a string, compute a function of the string's visual representation, then compare. OCR software probably uses something like that.
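
One font-free approximation of "only count the edits that cause confusion" is an edit distance in which swapping visually similar characters is much cheaper than any other change. This is a sketch of that substitute technique in Python, not the glyph-based function described above; the pair list, the 0.1 cost and all names are assumptions for illustration.

```python
# Hypothetical look-alike pairs; substituting one for the other is "cheap".
CONFUSABLE = {frozenset(pair) for pair in [("l", "1"), ("S", "5"), ("O", "0")]}


def sub_cost(x: str, y: str) -> float:
    """Cost of replacing character x by y."""
    if x == y:
        return 0.0
    if frozenset((x, y)) in CONFUSABLE:
        return 0.1  # arbitrary small cost for a visually confusing swap
    return 1.0


def visual_distance(a: str, b: str) -> float:
    """Levenshtein distance with reduced cost for look-alike substitutions."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [float(i)]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1.0,               # delete x
                cur[j - 1] + 1.0,            # insert y
                prev[j - 1] + sub_cost(x, y) # substitute x by y
            ))
        prev = cur
    return prev[-1]
```

A small but non-zero distance then signals a suspicious near-duplicate: "fab1es22" ends up far closer to "fables22" than "tables22" does, even though both differ by one character.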

Also, malice depends on community norms and expectations; a variation that's extremely suspicious on a forum might be perfectly acceptable in many MMORPGs.

drmike

Why yes, I am a Major General

Registered: Jan 2012

From United States

Posted November 09, 2017

14

Starmaker: Also, malice depends on community norms and expectations; a variation that's extremely suspicious on a forum might be perfectly acceptable in many MMORPGs.

That's why I asked about the environment we were talking about. Are we talking about seeing a username on a webpage? Are we talking about during signup of a new account? Different answers would apply in each case.

wbmatic

New User

Registered: Mar 2014

From United States

Posted November 10, 2017

15

Hehe, good times, reminds me of the first time I saw this back in 2013.

https://labs.spotify.com/2013/06/18/creative-usernames/

Which btw is entirely different from an attack involving "fab1es22 or fables22 .". That example has nothing to do with the plethora of problems introduced by Unicode.

Post edited November 10, 2017 by onarliog
