Domain names are the face of the web, and any time I had to register one my brain always defaulted to using Latin letters. For my own personal site here, despite its domain name being my first name, I opted to use the romanized version of it, never thinking that there was actually another option. That was the case until recently, when I realized that I could register a domain name entirely in my native Thai script.
This post chronicles the journey I took to get an internationalized domain registered and configured for this website.
Goals
I am by no means an expert in this domain (excuse the pun), so this is more of a documentation of my learning and the problems I ran into when trying to set this up for myself.
Here are the things that I'll go over:
- The basics of internationalized domain names and how they work
- Domain registration and configurations
- Challenges and pitfalls I encountered
- Support for them on the web
The Internet's Great Oversight
The creators of the web were mostly English-speaking Americans, and as a result most of its standards and practices were only thought out from that standpoint. The effect lasts to this day: the vast majority of domain names we see on the web use the limited ASCII character set, whose only letters are the 26 unaccented Latin ones. This is because the Domain Name System (DNS) was designed to only support ASCII, which leaves out many languages and their native alphabets and scripts. While the Latin alphabet is the most widely used script in the world, it certainly should not be the only one supported for domain names. To correct this oversight, the standard for internationalized domain names was introduced.
Internationalized Domain Name
An internationalized domain name (IDN) is a domain name that contains one or more non-ASCII characters. This means it can contain Unicode characters, which allows for domain names in various non-Latin alphabets and scripts. Since only ASCII characters are supported by the DNS due to its design, supporting IDNs is just a mechanism to work around this limitation without having to overhaul the deep-rooted infrastructure of the internet. The important goal the designers behind this standard had was to ensure that IDNs are interoperable with the existing infrastructure, so that their introduction would not break existing user-facing applications such as web browsers or email clients. This standard was approved by ICANN and deployed in 2003.
The solution is just a matter of converting the Unicode domain name into its ASCII representation before submitting the DNS query. Doing so involves using an algorithm called Punycode which, as described in RFC 3492, "uniquely and reversibly transforms a Unicode string into an ASCII string". Everything from that point on remains the same. This means that browsers don't necessarily have to support this standard and they should still be able to locate resources specified at the ASCII version of the IDNs.
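To make the "uniquely and reversibly" part concrete, here is a minimal sketch using the punycode codec that ships with Python's standard library. It encodes a single label (the Thai TLD label is used purely as sample input) and decodes it back.

```python
# Round-trip one Unicode label through Punycode (RFC 3492).
label = "ไทย"

encoded = label.encode("punycode")    # Unicode -> ASCII bytes
decoded = encoded.decode("punycode")  # ASCII bytes -> Unicode

print(encoded)           # b'o3cw4h'
print(decoded == label)  # True
```

Note that this is only the raw Punycode step for one label. It does not add the prefix or handle a full domain name.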
Internationalized Country Code Top-level Domain
Country code top-level domains (ccTLDs) have been around since the early days of the internet. You have probably come across them as domain hacks with the likes of .io (British Indian Ocean Territory), .fm (Federated States of Micronesia), and .ly (Libya) among many others. These are subject to each country's requirements, which means they can limit who can register them and for what purpose. A number of these ccTLDs also have non-Latin counterparts in their country's native script, such as .cn + .中國 (China), .eg + مصر. (Egypt), and of course .th + .ไทย (Thailand). These are referred to as internationalized country code top-level domains (IDN ccTLDs). Surprisingly, they are quite recent additions to the internet, having only been available since 2010.
Converting Unicode to ASCII in a Domain Name
Converting an IDN into its ASCII counterpart involves a few steps:
- Split up the domain name into individual labels.
- Encode each label that contains non-ASCII characters using the Punycode algorithm.
- Add a special prefix xn-- to each encoded label.
- Put the full domain name back together using . to separate the labels.
As an example: for an IDN คน.ไทย, the individual labels are คน and ไทย. Encoding those labels yields 42c6b and o3cw4h, respectively. Adding the prefix and putting them together into a full domain name, we get xn--42c6b.xn--o3cw4h.
You can visit http://xn--42c6b.xn--o3cw4h in your browser now and you should see that it gets decoded back to the Unicode form of คน.ไทย. Even though your browser displays the domain name in Unicode to you, in the background it first converts it to ASCII before submitting the DNS query, and everything from that point works the same way as any ASCII domain.
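The conversion steps in this section can be sketched in a few lines of Python. The function name to_ascii is my own, and this is a simplified illustration only: it skips the normalization and validation that full IDNA processing performs before encoding, and it leaves ASCII-only labels untouched.

```python
def to_ascii(domain: str) -> str:
    """Convert an IDN to its ASCII form, label by label (simplified sketch)."""
    ascii_labels = []
    for label in domain.split("."):            # 1. split into labels
        if label.isascii():
            ascii_labels.append(label)         # plain ASCII labels stay as they are
        else:
            encoded = label.encode("punycode").decode("ascii")  # 2. Punycode
            ascii_labels.append("xn--" + encoded)               # 3. add the prefix
    return ".".join(ascii_labels)              # 4. put the name back together


print(to_ascii("คน.ไทย"))  # xn--42c6b.xn--o3cw4h
```

For real use, Python's built-in idna codec ("คน.ไทย".encode("idna")) covers the same ground, including normalization, though it implements the original 2003 version of the standard.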
Registering วัทธิกร.ไทย
Having an uncommon name in the English-speaking world, the domain name of this site is just my romanized first name: Vatthikorn. I would say that in itself is already pretty cool. But I think what's even cooler is to also have the domain name of my actual first name in Thai script with a Thai IDN ccTLD: วัทธิกร.ไทย. Not many people can say that they have not only one, but two domain names for their site that are literally just their first name.
To find the registrar for a ccTLD, you can of course just do a quick internet search; there's a list of every single one of them, each with its own Wikipedia page. You can also visit IANA's Root Zone Database page, which lists all of the available TLDs with more details than you'll ever need. But a fun trick I discovered is to simply go to your terminal and use the whois command for your TLD:
whois ไทย
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object
domain: ไทย
domain-ace: XN--O3CW4H
[...contact details omitted for brevity...]
whois: whois.thnic.co.th
status: ACTIVE
remarks: Registration information: http://www.thnic.co.th
created: 2010-08-19
changed: 2020-08-24
source: IANA
# whois.thnic.co.th
Whois Server Version 2.1.6
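If you'd rather script this lookup, WHOIS is a very simple protocol (RFC 3912): open a TCP connection to port 43, send the query followed by a line break, and read until the server closes the connection. Below is a rough sketch. The function names are my own, and I'm assuming the IANA server is reachable at whois.iana.org and accepts the ASCII form of the TLD as the query.

```python
import socket


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query over TCP port 43 and return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def find_field(reply: str, field: str):
    """Return the first value of a 'field: value' line in a reply, or None."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == field:
            return value.strip()
    return None


# Example usage (needs network access):
#   reply = whois_query("whois.iana.org", "xn--o3cw4h")
#   print(find_field(reply, "whois"))
```

The whois field in IANA's reply names the registry's own WHOIS server, which is the next place to ask.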
So to thnic.co.th I went, and sure enough, my Thai name was not yet taken. To register it, I had to provide both the Thai name as well as its Latin counterpart. So I'm basically getting two domain names at once. They offer a few TLD options such as .co.th + .ธุรกิจ.ไทย (business entities) and .ac.th + .ศึกษา.ไทย (academic institutions). But the one I'm eligible for is .in.th + .ไทย, which is designed to be used by Thai citizens.
Because they have to verify my identity and eligibility, the registration was technically just a request that needed human review and approval before they handed the domain name over to me. After submitting the request, I had to email them proof of payment, which had to be made via PayPal. The price was 856 baht/year (around 27 USD). And a mere five hours later (on a Sunday morning in Thailand, too) I received an email back from one of their representatives saying that my request was approved!
Setting up DNS Provider
My intention with this newly acquired Thai domain name was to set up a basic 301 redirect to the main site. The forwarding service is not included with the purchase of the domain and they charge an extra 428 baht/year (around 14 USD) for it, which is ridiculous if you ask me. Having already spent a bit more than I wanted to on the domain name, I had to take matters into my own hands.
The DNS provider for this site is currently AWS Route 53. But in trying to set up วัทธิกร.ไทย on it, I learned that it doesn't accept domain names with Unicode characters and you have to convert them to ASCII first. That was a bit disappointing, as I expected AWS to be more global and inclusive than this. On top of that, using Route 53 was going to cost me an additional 50 cents a month. After some searching, I found that Cloudflare supports IDNs directly in their UI without requiring you to convert them to ASCII. Setting up an account and adding the domain name was smooth sailing compared to AWS. Best of all, it's completely free for what I'm using it for.
Setting up Forwarding
Setting up forwarding using only the DNS can be done with a CNAME record but there are a couple of caveats. First, you can't do this on the apex of a domain. So while you can do:
www.example.com IN CNAME another-domain.com
You can't do:
example.com IN CNAME another-domain.com
This on its own was already a no-go for me since I didn't want to have to use www in my site's URL.
Second, this method cannot perform a proper redirect where the path and/or query components from the original domain are appended to the target domain. Say you want example.com/about to forward to another-domain.com/about: this is not possible with a CNAME record, which only aliases one host name to another at the DNS level. Handling paths is a web server's responsibility.
While you could absolutely go with a DIY route and set up a web server (like Apache or Nginx) to just do HTTP redirects with all the customizations you want, to me that seems overkill for what boils down to just a vanity domain redirect for my own amusement. There are also several free URL redirection services out there but I didn't want to add another link in the chain that could potentially break my setup.
As it turned out, Netlify, where this site is hosted, provides a domain alias feature that I can leverage to make this work the way I wanted to. Unfortunately, they also don't natively support IDNs in their UI so it has to be in its ASCII form.
The next step is to add an A record for this domain to point to Netlify's load balancer IP address. Since I only want Cloudflare to act as a DNS provider, I made sure that this record is marked as "DNS only" instead of "Proxied". This keeps it strictly a DNS record and bypasses Cloudflare's other functionalities.
After the A record was propagated, trying to load https://วัทธิกร.ไทย resulted in an error as the subject name in the TLS certificate returned didn't match the requested Thai domain name:
curl -I https://xn--12c7bd9bq4dxa.xn--o3cw4h
curl: (60) SSL: no alternative certificate subject name matches target host name 'xn--12c7bd9bq4dxa.xn--o3cw4h'
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
The problem here was that after adding a domain alias to a Netlify site, the TLS certificate needs to be renewed so that it includes the alias.
After that was regenerated, วัทธิกร.ไทย finally loaded the content of this site. However, it didn't perform a redirect to vatthikorn.com (วัทธิกร.ไทย remained in the URL field in the browser). For this you can certainly set up redirect rules with Cloudflare, but since I already have an existing _redirects file to have Netlify handle them for me, I wanted to keep all the configurations in one place. Doing this is just a matter of adding an entry for the domain name (of course it needs to be in ASCII, as Unicode characters aren't allowed here either):
https://xn--12c7bd9bq4dxa.xn--o3cw4h/* https://vatthikorn.com/:splat 301!
So now sending a request with curl to วัทธิกร.ไทย correctly returns a 301 response code, with the location header pointing to the main domain name with the same path:
curl -I https://xn--12c7bd9bq4dxa.xn--o3cw4h/wwdc-2021-wish-list
HTTP/2 301
cache-control: public, max-age=0, must-revalidate
content-length: 64
content-type: text/plain; charset=utf-8
date: Thu, 02 Dec 2021 19:48:08 GMT
strict-transport-security: max-age=31536000
server: Netlify
location: https://vatthikorn.com/wwdc-2021-wish-list
x-nf-request-id: 01FNYB1CN3F46QVZ4B5RS7T51Q
age: 6
Some Finishing Touches
To finish this off, I figured why not add a little more fun to this by also creating a special URL for my About page all in Thai: วัทธิกร.ไทย/เกี่ยวกับ. The goal is to have this redirect to vatthikorn.com/about. This was easy enough to do using the same _redirects file, though it requires those Unicode characters in the path to be URL-encoded first.
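URL-encoding (also called percent-encoding) writes each byte of the UTF-8 form of a character as %XX. If you don't want to do that by hand, a quick sketch with Python's standard library does the job:

```python
from urllib.parse import quote, unquote

path = "เกี่ยวกับ"

# quote() encodes the string as UTF-8 and percent-encodes every byte.
encoded = quote(path)
print("/" + encoded)

# unquote() reverses it, which is a handy sanity check.
print(unquote(encoded) == path)  # True
```

Each Thai character takes three bytes in UTF-8, which is why nine characters turn into 27 percent-encoded bytes.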
Now the final configurations for these redirects look like the following:
https://xn--12c7bd9bq4dxa.xn--o3cw4h/%E0%B9%80%E0%B8%81%E0%B8%B5%E0%B9%88%E0%B8%A2%E0%B8%A7%E0%B8%81%E0%B8%B1%E0%B8%9A https://vatthikorn.com/about 301!
https://xn--12c7bd9bq4dxa.xn--o3cw4h/* https://vatthikorn.com/:splat 301!
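To show how these two rules interact, here is a toy model in Python. It is not Netlify's actual implementation, just my simplified reading of the behavior: rules are checked from top to bottom, the first match wins, and whatever the trailing * matched is substituted for :splat. The resolve function and RULES list are hypothetical names for this illustration.

```python
ABOUT_PATH = "%E0%B9%80%E0%B8%81%E0%B8%B5%E0%B9%88%E0%B8%A2%E0%B8%A7%E0%B8%81%E0%B8%B1%E0%B8%9A"
THAI_HOST = "https://xn--12c7bd9bq4dxa.xn--o3cw4h"

# (source pattern, target, status), in the same order as the _redirects file.
RULES = [
    (THAI_HOST + "/" + ABOUT_PATH, "https://vatthikorn.com/about", 301),
    (THAI_HOST + "/*", "https://vatthikorn.com/:splat", 301),
]


def resolve(url: str):
    """Return (status, location) for the first matching rule, or None."""
    for source, target, status in RULES:
        if source.endswith("/*"):
            prefix = source[:-1]           # drop the *, keep the trailing slash
            if url.startswith(prefix):
                splat = url[len(prefix):]  # the part matched by *
                return status, target.replace(":splat", splat)
        elif url == source:
            return status, target
    return None


print(resolve(THAI_HOST + "/wwdc-2021-wish-list"))
print(resolve(THAI_HOST + "/" + ABOUT_PATH))
```

The order matters: if the wildcard rule came first, it would also catch the About URL and send it to a percent-encoded path on the main domain instead of /about.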
And that's all there is to it! You can now visit วัทธิกร.ไทย and it should take you to vatthikorn.com, and วัทธิกร.ไทย/เกี่ยวกับ to the About page.
Adoption and Support in the Wild
As I mentioned earlier, IDNs were first deployed back in 2003, so they've been around long enough to vote now. Thankfully, web browsers have had support for them since very early on. But what I wanted to know is whether some of the popular social sites allow my newly configured Thai domain to be added to my profile.
Instagram isn't having any of it. (But hey, you can put Unicode characters in the bio. So yay for emoji, right?)
Twitter isn't even trying and just throws up a badly formatted error message.
LinkedIn accepts it but converts it to the ASCII form for you, which is just lovely to look at.
But GitHub and Letterboxd work very nicely on the web (though both of their mobile apps won't display it).
All in all, if there's one thing I took away from this exercise, it's that when developing software we should really consider diversity and inclusion from the beginning, and not just take the path of least resistance and only support what we're familiar with. At the very least, making sure our apps are localized and have proper accessibility support should be at the top of that list. For IDNs, the fact that they were an afterthought made adding support for them a hack that is neither ideal nor elegant. The internet, with its great promise of allowing everyone equal access to information, should have been designed to work for everyone, not just those who speak English.
Thanks to Indira for proofreading and helping improve this post.