Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.
[homepage]  [blog]  [jschauma@netmeister.org]  [@jschauma]  [RSS]

TLDs -- Putting the '.fun' in the top of the DNS

August 12th, 2021

The Domain Name System or DNS is a never-ending source of amusement and amazement. If you have been dealing with just about anything related to operations on the internet, you know that it's always the DNS in the end, what with its almost 100 different resource records and, uhm, shall we say, "interesting" security threat model.

But today, let's talk about Top-Level Domains, or TLDs. You know, .com, .org, .net, .gov, .vermögensberatung and .香港 - those guys. As you know, the entire domain name space consists of a tree of domain names; the (common) root of the DNS tree is . (dot), and the tree sub-divides into zones consisting of domains and sub-domains:

[Figure: A tree graph of the DNS space, showing the common TLDs and select sub-domains.]

Ok, so far, so good. With RFC920, we got the initial set of top level domains:

.gov Government, any government related domains meeting the second level requirements.
.edu Education, any education related domains meeting the second level requirements.
.com Commercial, any commercial related domains meeting the second level requirements.
.mil Military, any military related domains meeting the second level requirements.
.org Organization, any other domains meeting the second level requirements.
.net Initially intended for network organizations; not mentioned in RFC920, but created in 1985

 

Oh, and:

.arpa Temporary; The current ARPA-Internet hosts.


That's right: .arpa was supposed to be temporary:

"After a short period of initial experimentation, all current ARPA-Internet hosts will select some domain other than ARPA for their future use. The use of ARPA as a top level domain will eventually cease." -- RFC920

Yeah, well, we all know how temporary temporary solutions are. And so today, we continue to use .arpa for e.g., reverse mapping of IP addresses to names via the .in-addr.arpa and .ip6.arpa second-level domains. But .arpa is used for a lot more: as112.arpa (RFC7535, effectively RFC1918 reverse resolution; see also https://www.as112.net/), e164.arpa (RFC6116 / NAPTR records), home.arpa (RFC8375, non-unique use in residential home network), in-addr-servers.arpa and ip6-servers.arpa (RFC5855, name servers for the in-addr.arpa and ip6.arpa domains), ipv4only.arpa (RFC7050, detecting DNS64 and IPv6 Prefixes), iris.arpa (RFC4698, for locating Internet Registry Information Services), as well as uri.arpa and urn.arpa (RFC3405 for resolving Uniform Resource Identifiers / NAPTR).
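To make the reverse mapping a bit more concrete: for IPv4, the octets of the address are reversed and in-addr.arpa. is appended. A minimal sketch in shell follows; the function name reverse_v4 is made up for this illustration, and it assumes a well-formed dotted quad without validating it. (ip6.arpa works along the same lines, but reverses the individual hex nibbles of the fully expanded address instead.)

```shell
# reverse_v4 ADDRESS: print the in-addr.arpa name used for the PTR lookup
# of an IPv4 address. Hypothetical helper; assumes a well-formed dotted quad.
reverse_v4() {
    printf '%s\n' "$1" |
        awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa." }'
}
```

For example, reverse_v4 192.0.2.1 prints 1.2.0.192.in-addr.arpa., which is the name a PTR query for that address is sent to.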

Note: the arpa zone is served from all root servers except the J Root, which, per RFC2870, should not "provide secondary service for any zones other than the root and root-servers.net zones". Noted on dns-operations@dns-oarc.net.

ccTLDs

In addition to these original TLDs, we also got the country code top-level domains, or ccTLDs:

The English two letter code (alpha-2) identifying a country according the the ISO Standard for "Codes for the Representation of Names of Countries". -- RFC920

And this is where the fun begins, because of course you are always operating on Layer 9, and this list is necessarily somewhat fluid, as countries change, are born, divided, or cease to exist:

[Figure: The OSI Stack with the 'financial' and 'political' layers added and an arrow pointing to the top: 'you are here'.]

  • .ss, the ccTLD for South Sudan was allocated in August 2011, but not added to the root zone until February 2019, with general availability of names in that domain only starting in September 2020.
  • .ge, the ccTLD for Georgia (the country, not the US state) uses the ISO-3166-1 Alpha 2 country code that was previously used to represent the Gilbert and Ellice Islands; the Ellice Islands became Tuvalu, which, per country code designation, got the rather valuable .tv ccTLD.

    This TLD became so valuable that at some point 10% of the country's revenue came from royalties of .tv domains; Tuvalu used the money from the marketing rights to pay its membership dues when it joined the United Nations in 2000!
  • .eh has been reserved (but not been assigned yet) as the ccTLD for the disputed "Western Sahara" territory; in 2013, on April 1st, CIRA, the Canadian Internet Registration Authority (responsible for .ca), announced that it would offer .eh names, because, you know, Canadians would like that, eh?
  • Some ccTLDs represent countries that other countries don't acknowledge as existing. .ps is the ccTLD for Palestine (not a sponsored TLD for the (Turing-complete!) PostScript programming language), recognized by 138 of the 193 UN members; .tw is assigned for the Republic of China, aka Taiwan, which a mere 14 countries recognize.
  • Not all ccTLDs represent actual countries: Hong Kong has a ccTLD, .hk, as a special administrative region of China (much like .mo for Macao); .uk represents the entire United Kingdom, while the scarcely used .gb is assigned to Great Britain; England doesn't even get a ccTLD, and neither does Northern Ireland!

    Similarly, .eu counts as a ccTLD, representing, obviously, not a single country. But due to Brexit, British citizens who had registered .eu domains had their domains suspended on January 1st, 2021, requiring proof of European Economic Area (EEA) citizenship to avoid them being deleted in March 2021.
  • When a country ceases to exist, its ccTLD is normally retired: .cs (Czechoslovakia) became .cz (Czech Republic) and .sk (Slovakia); .dd (East Germany) disappeared after the reunification of Germany; .yu (Yugoslavia) became .si (Slovenia), .hr (Croatia), and Serbia and Montenegro, which had had .cs assigned (but never used that, instead continuing to use .yu) before they split into .rs (Serbia) and .me (Montenegro); .zr (Zaire) became .cd (Democratic Republic of the Congo).

    However, .su, the ccTLD for the former Soviet Union, assigned back in 1990, a mere 15 months before that Union was dissolved, still remains in active use.

ccTLD Domain Hacks and Governance

With ccTLDs having appealing two-letter names (gTLDs are a minimum of three characters), they lend themselves to so-called "domain hacks" to create words, to shorten URLs, or as a convenient way to jump on a popular trend, and many people began registering names in other countries' ccTLDs:

  • .ag, the ccTLD for Antigua and Barbuda is often used in German speaking countries, where "AG" is an abbreviation of "Aktiengesellschaft" (a private limited / joint stock company), and use of .ag names for other entities may even carry legal risks.
  • .ai, the ccTLD for Anguilla, is used for extra leet effect in artificial intelligence marketing. .ai also is notable in that as a TLD it nevertheless has both an A and MX record, meaning you could have a functional email address like hal@ai. (Email addresses are difficult to validate, it turns out.)
  • .am (Armenia) is used for domain hacks such as instagr.am
  • .at (Austria) is used for things like donteat.at
  • .be (Belgium) is used by e.g., Google to shorten youtu.be links.
  • .by (Belarus) is frequently used for sites relating to the German state of Bavaria (Bayern)
  • .cm (Cameroon) and .co (Colombia) are frequently used by typo-squatters to catch traffic from people fat-fingering ".com".
  • .cx was assigned to Christmas Island, and appears currently to be defunct, but it did have the significant glory of once having been the home of goatse.cx (Wikipedia).
  • .im (Isle of Man) is used for various instant messaging domain hacks.
  • .io, assigned to the British Indian Ocean Territory is almost exclusively used by annoying startups for content completely unrelated to the islands.
  • .la (Laos) is commonly used for Louisiana or Los Angeles related domains as well as random domain hacks, like e.g., Mozilla's link shortener mzl.la or Tesla's ts.la
  • .me (Montenegro, which up until 2007 had been using cg.yu) became one of the most popular TLDs and is used for link shorteners like Facebook's fb.me, Google's g.me or GoDaddy's go.me.

    Yahoo used to use me.me for its "Yahoo! Meme microblogging site"; after it shut that service, it returned the domain to the registry, and it's now, what else, a Meme search engine.
  • .ms (Montserrat) is, of course, used by Microsoft, sites in the US state of Mississippi, and by e.g., the New York Times for its nyti.ms link shortener.
  • Python nerds on the internet register names in Paraguay's ccTLD (.py), Rust nerds in Serbia's .rs.
  • The editor wars have been decided at the TLD level: .vi exists (U.S. Virgin Islands), but .emacs does not (emacs.vi, however, does).

Now one noteworthy aspect here is that since the ccTLDs are administered by the given country, they may be subject to (and enforce) different requirements. Some domains can only be registered by entities residing within the given country, others, like the .cat domain sponsored by the dotCAT foundation to promote the Catalan language, may stipulate the language or content of the domains.

Libya, with the ever so popular .ly ccTLD, did in 2010 shut down Violet Blue's vb.ly domain, objecting to the content. In a similar manner, Colombia could choose to break just about all of Twitter (which uses the t.co domain name to wrap every single link on its platform); Greenland could shut down Google's goo.gl links.

Generic TLDs (gTLDs)

In addition to the original TLDs and the ccTLDs, in the late 1980s InterNIC added .nato, but that was later replaced by .nato.int, with the new .int TLD being added in 1988 for intergovernmental organizations.

In 2000, ICANN, who had by then taken over the administration of domain names, added seven more TLDs: .aero, .biz, .coop, .info, .museum, .name, and .pro. It then began soliciting proposals for "sponsored top-level domains" (sTLDs), but only received a handful of proposals, ultimately adding .asia, .cat, .jobs, .mobi, .post, .tel, .travel, and .xxx.

Sponsored TLDs being somewhat restricted in scope and use, ICANN then went for another round of accepting proposals for new, generic TLDs (gTLDs), this time with a price tag of $185,000 per TLD. In 2012, it processed 1,930 applications: 101 from Google (under the name Charleston Road Registry Inc.), including .lol, .google, .dog, and .foo (of those, .lol is the only one not assigned); 76 from Amazon; 11 from Microsoft; and 307 from the "Donuts" domain name registry.

The list of ultimately approved domains included a number of geographic TLDs (geoTLDs), adding domains for certain cities (e.g., .berlin, .london, .nyc, .paris, or .tokyo), countries that previously did not have a ccTLD (e.g., .cymru, .scot, and .wales, although England still doesn't get its own TLD, while e.g., New Zealand (.nz) now got a second: .kiwi), and broader geographic regions (e.g., .africa or .lat).

But of course people went a bit nuts, too: many brands applied for .<brand> and got into various arguments over who should own the given TLD. For example, Amazon applied for (and was given) .amazon over the objection of several nations of, well, the Amazon; and multiple applications for entirely generic terms had to be sorted out.

One of those was the .secure domain, which had been proposed by one Alex Stamos of (then) Artemis Internet as a TLD that would enforce certain minimum security requirements; ultimately, .secure was assigned to Amazon.

Eventually, ICANN added 1239 new TLDs to the DNS, bestowing upon us such important TLDs as e.g., .beer, .cloud, .dot, .duck, .foo, .google, .rocks and .sucks, .travelersinsurance, and .yahoo.

But of course some TLDs then go under again: .wed, for example, was delegated, but the company that had applied for this name apparently didn't pay up, and ICANN terminated the registry agreement. However, the TLD remains in the root; it appears to now be operated by ICANN EBERO and some names remain in use (e.g., get.wed, albeit with an invalid certificate).

Internationalized TLDs

Even before the landrush for the new gTLDs, ICANN approved the introduction of internationalized domain name (IDN) TLDs, and many ccTLDs added TLDs using their respective languages and alphabets (including right-to-left!), represented within the DNS using Punycode.

DNS name         IDN ccTLD  Country/Region  Language              Other ccTLD
xn--lgbbat1ad8j  .الجزائر   Algeria         Arabic                .dz
xn--fiqs8s       .中国      China           Chinese (Simplified)  .cn
xn--qxa6a        .ευ        European Union  Greek                 .eu
xn--4dbrk0ce     .ישראל     Israel          Hebrew                .il
xn--o3cw4h       .ไทย       Thailand        Thai                  .th

(See Wikipedia's full table for all IDN ccTLDs.)

But IDNs are not only for ccTLDs: many of the new gTLDs also include various Unicode characters, such as e.g., .сайт ("website"), .大众汽车 ("volkswagen"), .ファッション ("fashion"), ابوظبي‎. ("Abu Dhabi"), and, of course, .vermögensberatung ("wealth management / advice").

Note that with IDNs, you can mix an IDN second-level with a non-IDN top-level or vice versa. Due to the resulting IDN Homograph Attack vector, browsers became much more conservative about rendering IDNs and will display names they consider suspicious (for example, labels mixing different scripts) as Punycode instead.

Special Use Domains

In addition to all that, there is also a small number of so-called "special use domains", of which .arpa (already discussed above) is just one. These are:

  • .example -- intended for use in documentation, tutorials, and testing; defined, together with example.com, example.net, and example.org in RFC6761.
  • .invalid and .test -- for testing and documentation, originally defined in RFC2606.
  • .local -- usually used for zero-configuration networking (RFC6762).
  • .localhost -- reserved since traditionally .localhost existed in e.g., /etc/hosts for the loopback address (RFC2606). Note: .localdomain is not reserved, and use of localhost.localdomain can lead to unexpected results if your stub resolver expands this.
  • .onion -- used by Tor (.onion service address) and defined in RFC7686. Note: this "TLD" is not entered into the DNS, but following work by Jim McCoy and Alec Muffett leading to CA/B Forum Ballot 144, you can get a valid x509 certificate from public CAs. (For a while, Tor also used to use the .exit pseudo TLD; this is no longer supported.)
  • Not all networks may use the standard DNS root (ICANN is not pleased), and so of course all bets are off if you are on a network using an alternative DNS root. Some TLDs in such networks include(d) .bitnet, .csnet, .oz (from ACSnet, now moved into .oz.au), .uucp (if you remember that), and .i2p (the aptly named "Invisible Internet Project").
  • Some TLDs are effectively split-horizon, only exposing some parts to the public internet. .kp, the ccTLD assigned for North Korea serves the North Korea internal-only Kwangmyong network.
  • China uses the .chn domain internally for its Internet of Things. This domain relies on the use of an alternate DNS root as well, and is not found in the common root.

TLD Zone files

The DNS is an inherently public system (modulo alternate root shenanigans or split-horizon games). The root zone itself continues to be available for download via FTP or HTTPS and so we can easily extract the full count of all TLDs:

$ curl https://www.internic.net/domain/root.zone |
    awk '{if ($4 == "NS") { print $1;}}' | sort -u | wc -l
1499

Processing the simple zone file, we find that most TLDs are two- (248) or three- (222) letter TLDs; that there are 154 IDN TLDs; that there are TLDs starting with every letter of the alphabet ('s' being the most popular one); that the longest TLD is vermögensberatung (24 characters in punycode: xn--vermgensberatung-pwb).
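Numbers like these fall out of a few lines of awk. The following is a sketch rather than the exact commands used for the numbers above: the function names tld_list and tld_summary are made up, the root zone's own NS records (owner ".") are skipped, and an IDN TLD is simply assumed to be anything starting with "xn--".

```shell
# tld_list: read a root zone file on stdin and print each delegated TLD
# once, lowercased and without the trailing dot. Hypothetical helper.
tld_list() {
    awk '$4 == "NS" && $1 != "." { sub(/\.$/, "", $1); print tolower($1) }' |
        sort -u
}

# tld_summary: read a list of TLDs (one per line) on stdin and summarize it.
tld_summary() {
    awk '
        { len[length($1)]++ }
        length($1) > length(longest) { longest = $1 }
        /^xn--/ { idn++ }
        END {
            printf "total: %d\n", NR
            printf "two-letter: %d\n", len[2]
            printf "three-letter: %d\n", len[3]
            printf "idn: %d\n", idn
            printf "longest: %s (%d)\n", longest, length(longest)
        }'
}
```

Feeding it the real thing would then look like: curl -s https://www.internic.net/domain/root.zone | tld_list | tld_summary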

But what about all the individual TLD zone files? Since that data is also public in nature, we should be able to get and process it as well. And for the ICANN assigned new gTLDs, this is indeed the case: ICANN offers the Centralized Zone Data Service, where you can apply to gain access to all gTLD zone files. For some domains the access is granted almost instantly, for others it takes a few days.

Now for the ccTLDs, however, there unfortunately is no equivalent service, although there's a (rather short) list of ccTLD zone sources here; some registries let you AXFR the domain (e.g., .ee, .ch and .li, .se and .nu), some provide a list of names (e.g., .sk or .gov), but otherwise it's up to you to contact the registry in question and plead your case. Yes, for each of the over 300 domains -- good luck!

Given how difficult it is to get to all the public data, it's then no surprise that several businesses are making good money by selling you that access or by providing TLD reports.

Some stats

After having requested access to all gTLD zone files and having received most of them (several are still pending), I looked around a bit, seeking entertaining stats. One thing to note is that a large number of zones (230) do not have any names defined (other than, say, a NIC NS record) -- TLDs registered purely as a brand or placeholder, I suspect. Over 360 zones have fewer than 10 records, over 470 fewer than 100.

Zones that are actually used include the expected variety of silly names, including very long domain names:

accountantaccountantaccountantaccountantaccountantaccountant.accountant
artartartartartartartartartartartartartartartartartartartartart.art
yoyoyodogillbestraightwithyouicanttellifthatsatattoooranartisti.art
barbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbarbar.bar
clickclickclickclickclickclickclickclickclickclickclickclick.click
ahndung-von-verkehrsordnungswidrigkeiten-mit-unfallfolge.cologne.
0-------------------------------------------------------------0.com.
thelongestdomainnameintheworldliterallynobodycangetalongeronexd.community
you-know-you-are-pretty-gosh-darned-cute-do-you-wanna-go-on-a.date.
lololololololololololololololololololololololololololololololol.fun.
gayfriendlyconvenientaffordabletrendyhairsalonsindowntowntoront.mobi
wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww.org
partypartypartypartypartypartypartypartypartypartypartyparty.party
runrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrunrun.run
thehighestthemostvaluableandthemostexpensivedomainnameofalltime.top
this-crazy-url-is-definitely-one-of-the-longest-adresses-in-the.world.
rindfleischetikettierungsuberwachungsaufgabenubertragungsgesetz.xyz
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xyz.
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.zone

...and so on and so on. Per RFC1035, the maximum size of a DNS label is 63 octets (note: octets, not characters; the full name is limited to 255 octets on the wire, which is why the maximum length of a domain name in text form is 253 characters), which explains why there are no second-level domains longer than that, although it doesn't explain why people insist on registering over 1700 such names.
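Those two limits are easy enough to check mechanically. Here's a minimal sketch, assuming names in text form with an optional trailing dot; the function name check_name is made up, and forcing the C locale is what makes awk's length() count octets rather than characters.

```shell
# check_name NAME: succeed if NAME respects the RFC1035 size limits
# (labels of 1 to 63 octets, at most 253 octets in text form); otherwise
# print what is wrong and fail. Hypothetical helper.
check_name() {
    printf '%s\n' "${1%.}" | LC_ALL=C awk -F. '
        length($0) > 253 {
            print "name is " length($0) " octets"
            bad = 1
        }
        {
            for (i = 1; i <= NF; i++) {
                if (length($i) == 0 || length($i) > 63) {
                    print "label " i " is " length($i) " octets"
                    bad = 1
                }
            }
        }
        END { exit (bad ? 1 : 0) }'
}
```

Note that for IDNs the limits apply to the Punycode form of the name, so you'd want to convert first and check second.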

Of the 987 zones I looked at, the top ten zones based on number of domains were:

Rank  TLD      # of domains
1     .com      155,883,253
2     .net       13,291,304
3     .org       10,424,321
4     .info       3,859,083
5     .xyz        3,128,897
6     .online     1,811,807
7     .top        1,200,953
8     .site       1,067,408
9     .shop         907,239
10    .app          722,140

(Note that not all TLDs are treated the same across the internet. Despite being rather popular, the .xyz domain appears to have a poor score in many automated domain reputation systems, which may lead to all sorts of unexpected problems for your business.)

For my own entertainment, I wrote a shabby little perl script to run over a zone file and produce some additional numbers:

$ gzcat net.txt.gz | perl -T zonestats.pl

Total number of records: 34658946
Total number of names: 13291304

Total number of different record types: 7
ns: 32819414
rrsig: 759035
ds: 414744
nsec3: 379518
a: 270671
aaaa: 15543
soa: 1

Top ten name lengths:
9: 2839977
10: 2836099
8: 2783467
11: 2648213
7: 2496883
12: 2404541
6: 2205730
13: 2159827
14: 1886160
15: 1585087

Longest name: 000000000000000000000000000000000000000000000000000000000000001.net. (63)
There are 134 names with 63 chars in this domain.

Total number of unique name servers: 689703
The three most popular name servers found in this zone are:
dns1.registrar-servers.com.: 298617
dns2.registrar-servers.com.: 298352
jm2.dns.com.: 239693

The most popular domains in which the nameservers are:
domaincontrol.com: 6200836
googledomains.com: 1485364
dns.com: 908420

This domain contains names including the following dirty words:
shit: 8732
fuck: 8057
tits: 2351
piss: 844
cunt: 575
motherfucker: 86
cocksucker: 16
$ 

The "seven dirty words" domains are of course full of mismatches, but it looks like most zones contain more or less the same percentage of dirty domain names: somewhere between 0.006% and 0.008% of the total; .xxx predictably ranks a bit higher here, but not all that much at only 0.1% of all names.
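The record type counts and the name length ranking in the output above can be approximated with standard tools. This is a rough sketch and not the zonestats.pl used above: it assumes the zone file layout of owner name, TTL, class, type, and rdata separated by whitespace, it only looks at names directly below the TLD, and the function names are made up.

```shell
# record_types: read a zone file on stdin and count records per type,
# most frequent first. Hypothetical helper.
record_types() {
    awk '{ print tolower($4) }' | sort | uniq -c | sort -rn |
        awk '{ print $2 ": " $1 }'
}

# name_lengths: rank the lengths of the delegated second-level labels
# (e.g., 7 for example.net.) by how often they occur; top ten only.
name_lengths() {
    awk 'tolower($4) == "ns" && split($1, l, ".") == 3 {
             print length(l[1]) "\t" tolower($1)
         }' |
        sort -u | cut -f1 | sort -n | uniq -c | sort -rn |
        awk '{ print $2 ": " $1 }' | head -n 10
}
```

Run as, for example, gzcat net.txt.gz | record_types; on a zone the size of .net, the sort steps will take a while.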

Public Suffix List

Now all of the above is good fun, but why would you want to know whether a given string is a TLD? Wouldn't it be trivially the right-most label of the fully-qualified domain name (FQDN)?

Strictly speaking: yes. However, consider that many TLDs are not generic in nature, meaning people cannot simply register any name under the given TLD. ccTLDs, being managed by individual registries, each may have unique requirements and regulations, and it is a common practice for these registries to enforce a second-level domain hierarchy, replicating or mirroring to some degree the top-level hierarchy.

For example, and perhaps most widely known, the .uk TLD uses .ac.uk (for academic institutions), .co.uk (for commercial entities), .gov.uk, .net.uk, .org.uk, and so on. How many such second-level domains are reserved depends on each TLD; Brazil (.br), for example, has over 100.

[Figure: XKCD Comic 2347 modified: A complex infrastructure resting on a fragile pole labeled 'A flimsy txt file manually maintained and copied into place from some random location on the internet'.]

Now within the context of, for example, HTTP cookies or x509 TLS certificates, it's rather important that an entity cannot use a wildcard to match an entire TLD, but how does a browser know whether foo.example is a reserved second-level domain, or simply a normal domain registered by some entity? Should a website be able to set a cookie for foo.example? Should it be able to get a certificate for *.foo.example? There is no programmatic way to determine this.

To solve this problem, the good folks over at Mozilla started putting together a list of these TLDs and "effective TLDs", known as the Public Suffix List. That's right, it's another one of those manually compiled and maintained text files we like to build the internet infrastructure on!

This list consists of over 9,000 suffixes, and is used by all of the popular browsers to restrict cookie scope as well as for various UI features.
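How a consumer of the list finds the public suffix of a given name can be sketched in a few lines of awk. This is a simplified take on the matching rules published at publicsuffix.org (the longest matching rule wins, "*." wildcard rules match one extra label, "!" exception rules override wildcards, and a name matching no rule falls back to its right-most label). It is not how any particular browser implements it, the function name public_suffix is made up, and it assumes the hostname is given in the same form (Unicode vs. Punycode) as the rules in the file.

```shell
# public_suffix RULES_FILE HOSTNAME: print the public suffix of HOSTNAME
# according to the rules in RULES_FILE (public_suffix_list.dat format).
# Hypothetical helper, simplified from the algorithm on publicsuffix.org.
public_suffix() {
    awk -v host="$2" '
        # Skip comments and blank lines.
        /^\/\// || /^[ \t]*$/ { next }
        {
            rule = $1
            if (rule ~ /^!/)         exception[substr(rule, 2)] = 1
            else if (rule ~ /^\*\./) wildcard[substr(rule, 3)] = 1
            else                     plain[rule] = 1
        }
        END {
            host = tolower(host)
            sub(/\.$/, "", host)          # drop the root dot, if any
            n = split(host, label, ".")
            best = 1                      # default rule "*": right-most label
            suffix = ""
            for (i = n; i >= 1; i--) {
                suffix = (suffix == "") ? label[i] : label[i] "." suffix
                k = n - i + 1             # labels in the current suffix
                if (suffix in exception) { best = k - 1; break }
                if ((suffix in plain) && k > best) best = k
                if ((suffix in wildcard) && i > 1 && k + 1 > best) best = k + 1
            }
            out = label[n]
            for (i = n - 1; i > n - best; i--) out = label[i] "." out
            print out
        }' "$1"
}
```

With the real list, public_suffix public_suffix_list.dat www.example.co.uk should then come back with co.uk rather than uk, which is exactly the distinction a cookie or wildcard certificate check needs.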

Google uses similar heuristics based on a domain name's TLD to determine whether to offer users different language versions of their content and other geo-targeting. Within that context, Google treats some ccTLDs (such as .io, .me, and .tv) as if they were gTLDs rather than as indicators of geographic location.

Finally, the HSTS Preload list baked into browsers like Chrome and Firefox to enforce HTTP Strict Transport Security includes a number of TLDs and public suffixes:

$ curl -O https://publicsuffix.org/list/public_suffix_list.dat
$ curl -O https://hg.mozilla.org/mozilla-central/raw-file/tip/security/manager/ssl/nsSTSPreloadList.inc
$ grep -v '^/' public_suffix_list.dat | grep . | sed -e 's/$/\./' | sort > psl
$ sed -n -e 's/^\([^, ]*\), .*/\1\./p' nsSTSPreloadList.inc > hsts
$ comm -1 -2 hsts psl | wc -l
      73
$ 

That is, websites registered under any of these 73 suffixes, such as .app or .dev, will always use HTTPS when using the common, popular browsers that consume this list.

Summary

Well, there you go. Top-level domains are, it turns out, a lot more complicated than what we commonly think of. The internet being a truly global network of networks with varied jurisdictions being in control of parts of the whole continues to provide for curious challenges and -- as anybody working in tech knows -- you regularly run into weird scenarios that trace back to the DNS.

Sometimes all the way to the
toptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptoptop.top.
