I'm Ayesh Karunaratne, a freelance PHP/Drupal web developer and a solo traveler.
I develop back-ends of awesome Drupal websites, built with performance, security and best practices in mind. Being a freelancer, I have plenty of time to travel. Currently living in Kandy, Sri Lanka, but the actual location may vary.
Fri, 2016-11-11 22:00
I went full Emoji today!
Remember the times when you had to transliterate "weird characters" to "proper characters", and run unnecessarily strict validations on URLs? Well, the world has changed, and you can use pretty much any character in URLs, and even in domain names. Today, I did an experiment with an Emoji domain name and an Emoji URL. Domain registration, DNS, SSL certificates, and URLs all have to support these characters and play well together for this to work.
Emojis are only a subset of the characters that you can use. From here on, I will focus on the whole vast range of Unicode characters, most importantly the multi-byte ones.
We didn't actually fix our DNS to accept the entire Unicode character set. Instead, we convert the Unicode characters to ASCII and back (using an encoding called Punycode), and the browser at the end displays the appropriate characters. This is a key point, because it means most of the software we already use can work with these characters without a lot of modifications.
For example, the Emoji part of the URL above translates to xn--qei5ga86518bxdc. You can't register a regular domain name like this, because two consecutive hyphens in the third and fourth positions are reserved for encoded names. You can see how they are encoded from here, and here is a great online converter for that.
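To make the back-and-forth conversion concrete, here is a minimal Python sketch of the round trip for a single label. It leans on Python's built-in punycode codec and adds the xn-- prefix by hand. The function names are made up for illustration, the snowman label in the comments is just a sample input, and the sketch skips the normalization step that a full IDNA implementation performs.

```python
ACE_PREFIX = "xn--"  # marks a label as Punycode-encoded


def to_ascii_label(label: str) -> str:
    """Encode one domain label; plain ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_unicode_label(label: str) -> str:
    """Decode one label back to Unicode if it carries the xn-- prefix."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


if __name__ == "__main__":
    # The label from this post: decode it, list its code points, encode it again.
    emoji = to_unicode_label("xn--qei5ga86518bxdc")
    print(emoji, [f"U+{ord(ch):04X}" for ch in emoji])
    print(to_ascii_label(emoji))

    # A sample single-character label (not from this post).
    print(to_ascii_label("☃"))  # xn--n3h
```

Encoding the decoded string again should give back exactly the ASCII label you started with, which is why DNS and web servers never need to see anything but ASCII.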
Top Level Domains
Among the popular .com, .net, .org, .me, and hundreds of other top level domains, there are over 50 TLDs that do not follow the a-z pattern. Every single one of them is assigned to a particular language or script, mostly from India, and 2 from Sri Lanka. Some of them are not in use, but technically, it is possible nowadays to have domains like ayesh.ලංකා. You will have to contact the relevant authority to register a domain with these TLDs.
Many of the regular A-Z TLD registrars allow you to register domain names that contain Unicode characters. Check if your registrar supports them. If they don't, simply shop around.
The .ws TLD is particularly popular for Emoji domains. You will be late to the party, though: Coca-Cola used one for their happiness campaign, Norwegian Airlines used one too, and there even exists a URL shortener with only Emojis.
When registering the domain name, you will have to use the Punycode form. As of now, Name.com and NameCheap.com support these domains, and possibly several other registrars do as well.
DNS for IDNs
Configuring DNS for the domain was surprisingly easy. Check with your DNS host to confirm that you can create DNS records with the Punycode name. Since you use the Punycode form, it should not be a problem for them. Network tools you use, such as dig, should work without any issues either.
In my case, I made a CNAME record to the base domain (ayeshious.com is the domain I use for all the experiments).
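As a rough illustration of what ends up in the zone, the Python sketch below converts a Unicode hostname to its ASCII form label by label and prints a BIND-style CNAME line. The helper names, the snowman subdomain, and the zone-file layout are assumptions for the example; your DNS host's interface will likely look different, and the conversion skips IDNA normalization.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert each non-ASCII label to its xn-- form; leave ASCII labels alone."""
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def cname_record(alias: str, target: str) -> str:
    """Format a zone-file style CNAME line using fully qualified ASCII names."""
    return f"{to_ascii_hostname(alias)}. IN CNAME {to_ascii_hostname(target)}."


if __name__ == "__main__":
    # Hypothetical subdomain; the base domain is the one used in this post.
    print(cname_record("☃.ayeshious.com", "ayeshious.com"))
    # xn--n3h.ayeshious.com. IN CNAME ayeshious.com.
```

The ASCII name produced here is also what you would hand to dig or any other lookup tool.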
Configure web server for IDNs
With your DNS server now responding to your IDN queries, it is time we configure the web server to properly handle them.
For both Apache and Nginx (and, I believe, for other servers too), you simply use the Punycode name in place of an otherwise normal domain. Just make sure your server software is up to date.
server_name ayeshious.com xn--qei5ga86518bxdc.ayeshious.com;
HTTPS for the IDNs
You and I can't be friends if you don't understand how important HTTPS is today, and you have no excuse not to use it. Getting a TLS certificate works pretty much the same, except that you use the Punycode name. Enter the Punycode name when you generate your Certificate Signing Request.
Let's Encrypt's official client and many other clients (my favorite is kelunik/acme-client, BTW) support IDNs as well. In fact, the certificate I'm using on the above IDN subdomain is from Let's Encrypt.
As for EV certificates, I suppose CAs will accept your requests as long as your IDN reflects an actual organization that you are part of. Emoji domains will have a hard time getting approved, though.
If you are using a framework or CMS, such as WordPress 🤔 or Drupal 😍, check whether it supports these characters. Two things that you should look for:
1. Database support.
This basically boils down to using an appropriate character set and collation. As for MySQL, use charset utf8mb4 and collation utf8mb4_general_ci. You will need to convert any existing columns to the new charset and collation if they are different. Obviously, make sure you back up everything before going ahead.
2. Proper encoding and validation.
You will need to change your validation logic if you have any regular expressions that validate URLs, path aliases, etc., because patterns written for ASCII will reject these characters. Also, when you print the data back, make sure you sanitize user input properly.
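Both points can be checked with a few lines of Python. This is only a sketch with made-up helper names. The first function flags text that needs four bytes per character in UTF-8, which MySQL's older utf8 charset cannot store (it tops out at three bytes per character). The others validate a path alias, percent-encode it for a URL, and escape it for HTML output. The validation rule here (reject whitespace and control or format characters instead of whitelisting ASCII) is one possible policy, not what any particular framework does.

```python
import html
import unicodedata
from urllib.parse import quote


def needs_utf8mb4(text: str) -> bool:
    """True if any character takes 4 bytes in UTF-8 (code point above U+FFFF)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


def is_valid_alias(alias: str) -> bool:
    """Accept any non-empty alias without whitespace or category C* characters.

    Note: this also rejects the zero-width joiner used in some Emoji
    sequences; loosen the rule if you need those.
    """
    if not alias:
        return False
    return not any(
        ch.isspace() or unicodedata.category(ch).startswith("C") for ch in alias
    )


def alias_to_url_path(alias: str) -> str:
    """Percent-encode the UTF-8 bytes of the alias, keeping slashes readable."""
    return "/" + quote(alias, safe="/")


def render_link(alias: str) -> str:
    """Build an anchor tag with both the href and the link text escaped."""
    href = html.escape(alias_to_url_path(alias), quote=True)
    return f'<a href="{href}">{html.escape(alias)}</a>'


if __name__ == "__main__":
    print(needs_utf8mb4("ලංකා"))    # False: Sinhala fits in 3 bytes per character
    print(needs_utf8mb4("🔒"))      # True: needs utf8mb4
    print(alias_to_url_path("😀"))  # /%F0%9F%98%80
    print(render_link("<script>"))
```

The last call shows why escaping matters: the angle brackets come out percent-encoded in the href and as HTML entities in the link text, so nothing is interpreted as markup.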
IDN handling at browsers
All browsers tend to support IDNs quite well. However, I could not get Internet Explorer to show the Emoji images in URLs. Firefox and Chrome worked very well.
You will not see the Emoji icons in the domain name part of the URL; browsers show the Punycode there as a security precaution. Without this protection, one could use certain characters to imitate a padlock, for example, or otherwise go rogue. The path portion of the URL should not be a problem, but for domain names browsers keep a whitelist and only display characters that look innocent.
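To give a feel for that kind of check, here is a deliberately simplified Python sketch. It is not how any real browser decides (their rules involve scripts, mixed-script detection and per-TLD policies); it only assumes a toy rule where a label is shown in Unicode if it consists of letters, combining marks, digits and hyphens, and falls back to Punycode otherwise. The function names are hypothetical.

```python
import unicodedata


def looks_like_text(label: str) -> bool:
    """Toy rule: only letters (L*), marks (M*), numbers (N*) and hyphens."""
    return all(
        ch == "-" or unicodedata.category(ch)[0] in "LMN" for ch in label
    )


def display_form(unicode_label: str) -> str:
    """Return the label as typed if it passes the toy rule, else its xn-- form."""
    if looks_like_text(unicode_label):
        return unicode_label
    return "xn--" + unicode_label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    print(display_form("ලංකා"))  # shown as typed: letters and combining marks
    print(display_form("🔒"))    # symbol, so the xn-- form is shown instead
```

Under this toy rule a padlock Emoji is a symbol rather than a letter, so it would never be rendered in the address bar's domain part.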
Search engine considerations
I am not sure if search engines use the meanings of Emojis to weigh the relevance of a URL, but they don't have any problem indexing IDN URLs for sure. You can even see the Emoji icons in SERPs (example).
With all that said, I do not see a lot of advantages in using Emoji URLs, other than making them look funny and memorable. Practical considerations will beat them in most cases, but Emoji URLs (and the entire set of IDN features) are a great option for marketing campaigns, memorable URLs, and publishing a blog post so you sound like a smart-ass.