nothing revelatory here, but some interesting background pointers.

i’ve been catching up on a massive reading backlog as of late. one of the topics egregiously in the backlog has been digesting the information associated with internationalized domain names (IDNs) and poking at some of the attendant follow-on considerations. given that there’s a huge hunk of the world that doesn’t use latin character sets, this is an increasingly interesting and relevant topic. particularly for network infrastructure dweebs.

for those looking for a good place to start on the topic of internationalization i highly recommend geoff huston’s writeup on the topic, Internationalizing the Internet. he provides a reasonable primer on interesting topics such as digraphs, glyphs, etc.

fortunately, localization of content presentation is an area which has received a considerable amount of attention within the computer industry. further, it benefits from the fact that there’s been a bit of give and take, both socially and from a development perspective, to accommodate various localization requirements. e.g.: japanese writing and layout have undergone a bit of accommodation to “modern” publishing capabilities and computer interfaces.

internationalization of the Internet is another matter. of considerably greater difficulty is enabling the infrastructure to support the variety of localizations that are out there. first among the pieces that have to cope is the DNS infrastructure. this leads you down a winding path of different encoding mechanisms and a whole host of additional security implications. of note: a number of interesting variants on homograph attacks, where a name is built from look-alike characters borrowed from another script.
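
to make the homograph problem concrete, here’s a minimal sketch in python (standard library only). the `scripts_used` helper is my own hypothetical name, and guessing a script from the first word of a character’s unicode name is a crude assumption for illustration, not how browsers or registries actually decide what to flag.

```python
import unicodedata


def scripts_used(label):
    """Crude script guess: the first word of each letter's Unicode name.

    Illustrative heuristic only; real checks use proper script data.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


latin = "paypal"
# Same glyphs to the eye, but both "a"s are U+0430 CYRILLIC SMALL LETTER A.
spoof = "p\u0430yp\u0430l"

print(latin == spoof)        # the strings differ even though they look alike
print(scripts_used(latin))   # a single script
print(scripts_used(spoof))   # two scripts mixed in one label
```

a label that mixes scripts like this is one hint that something is off, though plenty of legitimate names mix scripts and plenty of spoofs stay within one, so treat it as a signal rather than a verdict.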

looking at this from the perspective of network engineering, we’re really moving into a world where there will be new stresses and strains placed upon the DNS infrastructure. what was previously a relatively low bandwidth infrastructure service will rapidly explode in terms of bandwidth utilization and processing requirements going forward. considerable attention will need to be given to application design, and to verification mechanisms in the background that alert users to a host of new attacks. it’s unclear what the implications will be for service / application developers over the near term, given that most of the infrastructure elements associated with web services are ascii oriented.

misc. background reading:

  • punycode - a means of encoding unicode into the ASCII character space.
  • Phishing defense against IDN address spoofing attacks - **abstract:** Address spoofing is a common trick used in phishing scams to confuse unsuspecting users about a Web site’s real origin. With the introduction of Unicode characters into domain names, also known as Internationalized Domain Names (IDN), the risk has significantly increased even for the most cautious users. The author explores the various types of address spoofing attacks focusing on IDN, and presents a novel client-side Web browser plug-in Quero which implements several techniques—including highlighting—to protect the user against visually undistinguishable address manipulations.
  • RFC 4690 - **abstract:** This note describes issues raised by the deployment and use of Internationalized Domain Names. It describes problems both at the time of registration and for use of those names in the DNS. It recommends that IETF should update the RFCs relating to IDNs and a framework to be followed in doing so, as well as summarizing and identifying some work that is required outside the IETF.
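
for a quick feel of what punycode does to a name, here’s a small sketch using the `punycode` and `idna` codecs that ship with python. the `to_ascii` / `to_unicode` wrappers and the `bücher.example` name are just illustrative choices of mine, and note the built-in `idna` codec follows the older IDNA 2003 rules, so take it as a demo rather than a reference for what a registry would accept.

```python
def to_ascii(name):
    """Encode a Unicode domain name to its ASCII (xn--) form."""
    return name.encode("idna").decode("ascii")


def to_unicode(name):
    """Decode an ASCII (xn--) domain name back to Unicode."""
    return name.encode("ascii").decode("idna")


# Raw punycode of a single label: ASCII letters first, then the encoded rest.
print("bücher".encode("punycode"))          # b'bcher-kva'

# The idna codec works per label and adds the "xn--" prefix where needed.
print(to_ascii("bücher.example"))           # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```

the point for DNS folks: what travels on the wire is still plain ascii, so the resolver path is untouched, and all the interesting (and risky) parts live in the conversion and display at the edges.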