| text-encoding | |
| ============== | |
| This is a polyfill for the [Encoding Living | |
| Standard](https://encoding.spec.whatwg.org/) API for the Web, allowing | |
| encoding and decoding of textual data to and from Typed Array buffers | |
| for binary data in JavaScript. | |
| By default it adheres to the spec and does not support *encoding* to | |
| legacy encodings, only *decoding*. It is also implemented to match the | |
| specification's algorithms, rather than for performance. The intended | |
| use is within Web pages, so it has no dependency on server frameworks | |
| or particular module schemes. | |
| Basic examples and tests are included. | |
| ### Install ### | |
| There are a few ways you can get and use the `text-encoding` library. | |
| ### HTML Page Usage ### | |
| Clone the repo and include the files directly: | |
| ```html | |
| <!-- Required for non-UTF encodings --> | |
| <script src="encoding-indexes.js"></script> | |
| <script src="encoding.js"></script> | |
| ``` | |
| This is the only use case the developer cares about. If you want those | |
| fancy module and/or package manager things that are popular these days | |
| you should probably use a different library. | |
| #### Package Managers #### | |
| The package is published to **npm** and **bower** as `text-encoding`. | |
| Use through these is not really supported, since they aren't used by | |
| the developer of the library. Using `require()` in interesting ways | |
| probably breaks. Patches welcome, as long as they don't break the | |
| basic use of the files via `<script>`. | |
| ### API Overview ### | |
| Basic Usage | |
| ```js | |
| var uint8array = new TextEncoder().encode(string); | |
| var string = new TextDecoder(encoding).decode(uint8array); | |
| ``` | |
| Streaming Decode | |
| ```js | |
| var string = "", decoder = new TextDecoder(encoding), buffer; | |
| while (buffer = next_chunk()) { | |
| string += decoder.decode(buffer, {stream:true}); | |
| } | |
| string += decoder.decode(); // finish the stream | |
| ``` | |
| ### Encodings ### | |
| All encodings from the Encoding specification are supported: | |
| utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 | |
| iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 | |
| iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 | |
| windows-1250 windows-1251 windows-1252 windows-1253 windows-1254 | |
| windows-1255 windows-1256 windows-1257 windows-1258 x-mac-cyrillic | |
| gb18030 hz-gb-2312 big5 euc-jp iso-2022-jp shift_jis euc-kr | |
| replacement utf-16be utf-16le x-user-defined | |
| (Some encodings may be supported under other names, e.g. ascii, | |
| iso-8859-1, etc. See [Encoding](https://encoding.spec.whatwg.org/) for | |
| additional labels for each encoding.) | |
| Encodings other than **utf-8**, **utf-16le** and **utf-16be** require | |
| an additional `encoding-indexes.js` file to be included. It is rather | |
| large (596kB uncompressed, 188kB gzipped); portions may be deleted if | |
| support for some encodings is not required. | |
| ### Non-Standard Behavior ### | |
| As required by the specification, only encoding to **utf-8** is | |
| supported. If you want to try it out, you can force a non-standard | |
| behavior by passing the `NONSTANDARD_allowLegacyEncoding` option to | |
| TextEncoder and a label. For example: | |
| ```js | |
| var uint8array = new TextEncoder( | |
| 'windows-1252', { NONSTANDARD_allowLegacyEncoding: true }).encode(text); | |
| ``` | |
| But note that the above won't work if you're using the polyfill in a | |
| browser that natively supports the TextEncoder API natively, since the | |
| polyfill won't be used! | |
| You can force the polyfill to be used by using this before the polyfill: | |
| ```html | |
| <script> | |
| window.TextEncoder = window.TextDecoder = null; | |
| </script> | |
| ``` | |
| To support the legacy encodings (which may be stateful), the | |
| TextEncoder `encode()` method accepts an optional dictionary and | |
| `stream` option, e.g. `encoder.encode(string, {stream: true});` This | |
| is not needed for standard encoding since the input is always in | |
| complete code points. | |