| # tldts - Blazing Fast URL Parsing | |
| `tldts` is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs. | |
| **Features**: | |
| 1. Tuned for **performance** (order of 0.1 to 1 μs per input) | |
| 2. Handles both URLs and hostnames | |
| 3. Full Unicode/IDNA support | |
| 4. Support parsing email addresses | |
| 5. Detect IPv4 and IPv6 addresses | |
| 6. Continuously updated version of the public suffix list | |
| 7. **TypeScript**, ships with `umd`, `esm`, `cjs` bundles and _type definitions_ | |
| 8. Small bundles and small memory footprint | |
| 9. Battle tested: full test coverage and production use | |
| # Install | |
| ```bash | |
| npm install --save tldts | |
| ``` | |
| # Usage | |
| Using the command-line interface: | |
| ```js | |
| $ npx tldts 'http://www.writethedocs.org/conf/eu/2017/' | |
| { | |
| "domain": "writethedocs.org", | |
| "domainWithoutSuffix": "writethedocs", | |
| "hostname": "www.writethedocs.org", | |
| "isIcann": true, | |
| "isIp": false, | |
| "isPrivate": false, | |
| "publicSuffix": "org", | |
| "subdomain": "www" | |
| } | |
| ``` | |
| Programmatically: | |
| ```js | |
| const { parse } = require('tldts'); | |
| // Retrieving hostname related informations of a given URL | |
| parse('http://www.writethedocs.org/conf/eu/2017/'); | |
| // { domain: 'writethedocs.org', | |
| // domainWithoutSuffix: 'writethedocs', | |
| // hostname: 'www.writethedocs.org', | |
| // isIcann: true, | |
| // isIp: false, | |
| // isPrivate: false, | |
| // publicSuffix: 'org', | |
| // subdomain: 'www' } | |
| ``` | |
| Modern _ES6 modules import_ is also supported: | |
| ```js | |
| import { parse } from 'tldts'; | |
| ``` | |
| Alternatively, you can try it _directly in your browser_ here: https://npm.runkit.com/tldts | |
| # API | |
| - `tldts.parse(url | hostname, options)` | |
| - `tldts.getHostname(url | hostname, options)` | |
| - `tldts.getDomain(url | hostname, options)` | |
| - `tldts.getPublicSuffix(url | hostname, options)` | |
| - `tldts.getSubdomain(url, | hostname, options)` | |
| - `tldts.getDomainWithoutSuffix(url | hostname, options)` | |
| The behavior of `tldts` can be customized using an `options` argument for all | |
| the functions exposed as part of the public API. This is useful to both change | |
| the behavior of the library as well as fine-tune the performance depending on | |
| your inputs. | |
| ```js | |
| { | |
| // Use suffixes from ICANN section (default: true) | |
| allowIcannDomains: boolean; | |
| // Use suffixes from Private section (default: false) | |
| allowPrivateDomains: boolean; | |
| // Extract and validate hostname (default: true) | |
| // When set to `false`, inputs will be considered valid hostnames. | |
| extractHostname: boolean; | |
| // Validate hostnames after parsing (default: true) | |
| // If a hostname is not valid, not further processing is performed. When set | |
| // to `false`, inputs to the library will be considered valid and parsing will | |
| // proceed regardless. | |
| validateHostname: boolean; | |
| // Perform IP address detection (default: true). | |
| detectIp: boolean; | |
| // Assume that both URLs and hostnames can be given as input (default: true) | |
| // If set to `false` we assume only URLs will be given as input, which | |
| // speed-ups processing. | |
| mixedInputs: boolean; | |
| // Specifies extra valid suffixes (default: null) | |
| validHosts: string[] | null; | |
| } | |
| ``` | |
| The `parse` method returns handy **properties about a URL or a hostname**. | |
| ```js | |
| const tldts = require('tldts'); | |
| tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv'); | |
| // { domain: 'amazonaws.com', | |
| // domainWithoutSuffix: 'amazonaws', | |
| // hostname: 'spark-public.s3.amazonaws.com', | |
| // isIcann: true, | |
| // isIp: false, | |
| // isPrivate: false, | |
| // publicSuffix: 'com', | |
| // subdomain: 'spark-public.s3' } | |
| tldts.parse( | |
| 'https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv', | |
| { allowPrivateDomains: true }, | |
| ); | |
| // { domain: 'spark-public.s3.amazonaws.com', | |
| // domainWithoutSuffix: 'spark-public', | |
| // hostname: 'spark-public.s3.amazonaws.com', | |
| // isIcann: false, | |
| // isIp: false, | |
| // isPrivate: true, | |
| // publicSuffix: 's3.amazonaws.com', | |
| // subdomain: '' } | |
| tldts.parse('gopher://domain.unknown/'); | |
| // { domain: 'domain.unknown', | |
| // domainWithoutSuffix: 'domain', | |
| // hostname: 'domain.unknown', | |
| // isIcann: false, | |
| // isIp: false, | |
| // isPrivate: true, | |
| // publicSuffix: 'unknown', | |
| // subdomain: '' } | |
| tldts.parse('https://192.168.0.0'); // IPv4 | |
| // { domain: null, | |
| // domainWithoutSuffix: null, | |
| // hostname: '192.168.0.0', | |
| // isIcann: null, | |
| // isIp: true, | |
| // isPrivate: null, | |
| // publicSuffix: null, | |
| // subdomain: null } | |
| tldts.parse('https://[::1]'); // IPv6 | |
| // { domain: null, | |
| // domainWithoutSuffix: null, | |
| // hostname: '::1', | |
| // isIcann: null, | |
| // isIp: true, | |
| // isPrivate: null, | |
| // publicSuffix: null, | |
| // subdomain: null } | |
| tldts.parse('tldts@emailprovider.co.uk'); // email | |
| // { domain: 'emailprovider.co.uk', | |
| // domainWithoutSuffix: 'emailprovider', | |
| // hostname: 'emailprovider.co.uk', | |
| // isIcann: true, | |
| // isIp: false, | |
| // isPrivate: false, | |
| // publicSuffix: 'co.uk', | |
| // subdomain: '' } | |
| ``` | |
| | Property Name | Type | Description | | |
| | :-------------------- | :----- | :---------------------------------------------- | | |
| | `hostname` | `str` | `hostname` of the input extracted automatically | | |
| | `domain` | `str` | Domain (tld + sld) | | |
| | `domainWithoutSuffix` | `str` | Domain without public suffix | | |
| | `subdomain` | `str` | Sub domain (what comes after `domain`) | | |
| | `publicSuffix` | `str` | Public Suffix (tld) of `hostname` | | |
| | `isIcann` | `bool` | Does TLD come from ICANN part of the list | | |
| | `isPrivate` | `bool` | Does TLD come from Private part of the list | | |
| | `isIP` | `bool` | Is `hostname` an IP address? | | |
| ## Single purpose methods | |
| These methods are shorthands if you want to retrieve only a single value (and | |
| will perform better than `parse` because less work will be needed). | |
| ### getHostname(url | hostname, options?) | |
| Returns the hostname from a given string. | |
| ```javascript | |
| const { getHostname } = require('tldts'); | |
| getHostname('google.com'); // returns `google.com` | |
| getHostname('fr.google.com'); // returns `fr.google.com` | |
| getHostname('fr.google.google'); // returns `fr.google.google` | |
| getHostname('foo.google.co.uk'); // returns `foo.google.co.uk` | |
| getHostname('t.co'); // returns `t.co` | |
| getHostname('fr.t.co'); // returns `fr.t.co` | |
| getHostname( | |
| 'https://user:password@example.co.uk:8080/some/path?and&query#hash', | |
| ); // returns `example.co.uk` | |
| ``` | |
| ### getDomain(url | hostname, options?) | |
| Returns the fully qualified domain from a given string. | |
| ```javascript | |
| const { getDomain } = require('tldts'); | |
| getDomain('google.com'); // returns `google.com` | |
| getDomain('fr.google.com'); // returns `google.com` | |
| getDomain('fr.google.google'); // returns `google.google` | |
| getDomain('foo.google.co.uk'); // returns `google.co.uk` | |
| getDomain('t.co'); // returns `t.co` | |
| getDomain('fr.t.co'); // returns `t.co` | |
| getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns `example.co.uk` | |
| ``` | |
| ### getDomainWithoutSuffix(url | hostname, options?) | |
| Returns the domain (as returned by `getDomain(...)`) without the public suffix part. | |
| ```javascript | |
| const { getDomainWithoutSuffix } = require('tldts'); | |
| getDomainWithoutSuffix('google.com'); // returns `google` | |
| getDomainWithoutSuffix('fr.google.com'); // returns `google` | |
| getDomainWithoutSuffix('fr.google.google'); // returns `google` | |
| getDomainWithoutSuffix('foo.google.co.uk'); // returns `google` | |
| getDomainWithoutSuffix('t.co'); // returns `t` | |
| getDomainWithoutSuffix('fr.t.co'); // returns `t` | |
| getDomainWithoutSuffix( | |
| 'https://user:password@example.co.uk:8080/some/path?and&query#hash', | |
| ); // returns `example` | |
| ``` | |
| ### getSubdomain(url | hostname, options?) | |
| Returns the complete subdomain for a given string. | |
| ```javascript | |
| const { getSubdomain } = require('tldts'); | |
| getSubdomain('google.com'); // returns `` | |
| getSubdomain('fr.google.com'); // returns `fr` | |
| getSubdomain('google.co.uk'); // returns `` | |
| getSubdomain('foo.google.co.uk'); // returns `foo` | |
| getSubdomain('moar.foo.google.co.uk'); // returns `moar.foo` | |
| getSubdomain('t.co'); // returns `` | |
| getSubdomain('fr.t.co'); // returns `fr` | |
| getSubdomain( | |
| 'https://user:password@secure.example.co.uk:443/some/path?and&query#hash', | |
| ); // returns `secure` | |
| ``` | |
| ### getPublicSuffix(url | hostname, options?) | |
| Returns the [public suffix][] for a given string. | |
| ```javascript | |
| const { getPublicSuffix } = require('tldts'); | |
| getPublicSuffix('google.com'); // returns `com` | |
| getPublicSuffix('fr.google.com'); // returns `com` | |
| getPublicSuffix('google.co.uk'); // returns `co.uk` | |
| getPublicSuffix('s3.amazonaws.com'); // returns `com` | |
| getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // returns `s3.amazonaws.com` | |
| getPublicSuffix('tld.is.unknown'); // returns `unknown` | |
| ``` | |
| # Troubleshooting | |
| ## Retrieving subdomain of `localhost` and custom hostnames | |
| `tldts` methods `getDomain` and `getSubdomain` are designed to **work only with _known and valid_ TLDs**. | |
| This way, you can trust what a domain is. | |
| `localhost` is a valid hostname but not a TLD. You can pass additional options to each method exposed by `tldts`: | |
| ```js | |
| const tldts = require('tldts'); | |
| tldts.getDomain('localhost'); // returns null | |
| tldts.getSubdomain('vhost.localhost'); // returns null | |
| tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost' | |
| tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost' | |
| ``` | |
| ## Updating the TLDs List | |
| `tldts` made the opinionated choice of shipping with a list of suffixes directly | |
| in its bundle. There is currently no mechanism to update the lists yourself, but | |
| we make sure that the version shipped is always up-to-date. | |
| If you keep `tldts` updated, the lists should be up-to-date as well! | |
| # Performance | |
| `tldts` is the _fastest JavaScript library_ available for parsing hostnames. It is able to parse _millions of inputs per second_ (typically 2-3M depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library depending on the kind of inputs you are dealing with (e.g.: if you know you only manipulate valid hostnames you can disable the hostname extraction step with `{ extractHostname: false }`). | |
| Please see [this detailed comparison](./comparison/comparison.md) with other available libraries. | |
| ## Contributors | |
| `tldts` is based upon the excellent `tld.js` library and would not exist without | |
| the many contributors who worked on the project: | |
| <a href="graphs/contributors"><img src="https://opencollective.com/tldjs/contributors.svg?width=890" /></a> | |
| This project would not be possible without the amazing Mozilla's | |
| [public suffix list][]. Thank you for your hard work! | |
| # License | |
| [MIT License](LICENSE). | |
| [badge-ci]: https://secure.travis-ci.org/remusao/tldts.svg?branch=master | |
| [badge-downloads]: https://img.shields.io/npm/dm/tldts.svg | |
| [public suffix list]: https://publicsuffix.org/list/ | |
| [list the recent changes]: https://github.com/publicsuffix/list/commits/master | |
| [changes Atom Feed]: https://github.com/publicsuffix/list/commits/master.atom | |
| [public suffix]: https://publicsuffix.org/learn/ | |