
utils/data-structures

Custom data structures.

These are only used internally, meaning an end-user shouldn't need to access anything here.

utils/data-structures
- static
  - .PriorityQueue
    - new PriorityQueue(comparator)
    - .size
    - .isEmpty() ⇒ boolean
    - .peek() ⇒ any
    - .push(...values) ⇒ number
    - .extend(values) ⇒ number
    - .pop() ⇒ any
    - .replace(value) ⇒ *
    - ._siftUpFrom(node)
  - .CharTrie
    - .extend(texts)
    - .push(text)
    - .commonPrefixSearch(text)
  - .TokenLattice
    - new TokenLattice(sentence, bosTokenId, eosTokenId)
    - .insert(pos, length, score, tokenId)
    - .viterbi() ⇒ Array.<TokenLatticeNode>
    - .piece(node) ⇒ string
    - .tokens() ⇒ Array.<string>
    - .tokenIds() ⇒ Array.<number>
  - .DictionarySplitter
    - new DictionarySplitter(dictionary)
    - .split(text) ⇒ Array.<string>
  - .LRUCache
    - new LRUCache(capacity)
    - .get(key) ⇒ any
    - .put(key, value)
    - .clear()
- inner
  - ~CharTrieNode
    - new CharTrieNode(isLeaf, children)
    - .default() ⇒ CharTrieNode
  - ~TokenLatticeNode
    - new TokenLatticeNode(tokenId, nodeId, pos, length, score)
    - .clone() ⇒ TokenLatticeNode

utils/data-structures.PriorityQueue

An efficient heap-based implementation of a priority queue. It uses an array-based binary heap in which the root is at index 0 and the children of node i are at indices 2i + 1 and 2i + 2.

Adapted from the following sources:

https://stackoverflow.com/a/42919752/13989043 (original)
https://github.com/belladoreai/llama-tokenizer-js (minor improvements)

Kind: static class of utils/data-structures
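The index arithmetic described above can be sketched in a few lines. The helper names (`parent`, `left`, `right`, `isMaxHeap`) are hypothetical and exist only for this illustration; they are not part of the library.

```javascript
// Index helpers for an array-based binary heap with the root at index 0.
// All names here are hypothetical and for illustration only.
const parent = (i) => Math.floor((i - 1) / 2);
const left = (i) => 2 * i + 1;
const right = (i) => 2 * i + 2;

// Returns true if no node is smaller than either of its children
// (the max-heap property).
function isMaxHeap(heap) {
  for (let i = 1; i < heap.length; ++i) {
    if (heap[parent(i)] < heap[i]) return false;
  }
  return true;
}

const heap = [9, 7, 8, 3, 5];
console.log(left(1), right(1)); // 3 4
console.log(parent(4)); // 1
console.log(isMaxHeap(heap)); // true
```

Because the layout is implicit in the indices, the heap needs no per-node pointers; a plain array is enough.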

.PriorityQueue
- new PriorityQueue(comparator)
- .size
- .isEmpty() ⇒ boolean
- .peek() ⇒ any
- .push(...values) ⇒ number
- .extend(values) ⇒ number
- .pop() ⇒ any
- .replace(value) ⇒ *
- ._siftUpFrom(node)

`new PriorityQueue(comparator)`

Create a new PriorityQueue.

| Param | Type | Description |
| --- | --- | --- |
| comparator | `function` | Comparator function to determine priority. Defaults to a MaxHeap. |
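To illustrate how a comparator decides which element counts as the highest priority, here is a minimal heap sketch. `MiniHeap` is a hypothetical stand-in written for this example, not the library's class. It assumes the comparator returns true when its first argument should sit above its second, and that the default behaves as a max-heap.

```javascript
// Hypothetical stand-in for illustration; not the library's PriorityQueue.
// Assumption: comparator(a, b) returns true when `a` has higher priority than `b`.
class MiniHeap {
  constructor(comparator = (a, b) => a > b) {
    this.heap = [];
    this.comparator = comparator;
  }
  get size() {
    return this.heap.length;
  }
  peek() {
    return this.heap[0];
  }
  push(...values) {
    for (const value of values) {
      this.heap.push(value);
      this.siftUp(this.heap.length - 1);
    }
    return this.size;
  }
  pop() {
    const top = this.heap[0];
    const last = this.heap.pop();
    if (this.heap.length > 0) {
      // Move the last element to the root and restore the heap property.
      this.heap[0] = last;
      this.siftDown(0);
    }
    return top;
  }
  swap(i, j) {
    [this.heap[i], this.heap[j]] = [this.heap[j], this.heap[i]];
  }
  siftUp(i) {
    while (i > 0) {
      const p = Math.floor((i - 1) / 2);
      if (!this.comparator(this.heap[i], this.heap[p])) break;
      this.swap(i, p);
      i = p;
    }
  }
  siftDown(i) {
    for (;;) {
      const l = 2 * i + 1;
      const r = 2 * i + 2;
      let best = i;
      if (l < this.size && this.comparator(this.heap[l], this.heap[best])) best = l;
      if (r < this.size && this.comparator(this.heap[r], this.heap[best])) best = r;
      if (best === i) break;
      this.swap(i, best);
      i = best;
    }
  }
}

const maxHeap = new MiniHeap();
maxHeap.push(3, 10, 7);
console.log(maxHeap.pop()); // 10

const minHeap = new MiniHeap((a, b) => a < b);
minHeap.push(3, 10, 7);
console.log(minHeap.pop()); // 3
```

Flipping the comparator is all it takes to turn the same structure into a min-heap.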

`priorityQueue.size`

The number of elements in the queue.

Kind: instance property of PriorityQueue

`priorityQueue.isEmpty()` ⇒ boolean

Check if the queue is empty.

Kind: instance method of PriorityQueue
Returns: boolean - true if the queue is empty, false otherwise.

`priorityQueue.peek()` ⇒ any

Return the element with the highest priority in the queue.

Kind: instance method of PriorityQueue
Returns: any - The highest priority element in the queue.

`priorityQueue.push(...values)` ⇒ number

Add one or more elements to the queue.

Kind: instance method of PriorityQueue
Returns: number - The new size of the queue.

| Param | Type | Description |
| --- | --- | --- |
| ...values | `any` | The values to push into the queue. |

`priorityQueue.extend(values)` ⇒ number

Add multiple elements to the queue.

Kind: instance method of PriorityQueue
Returns: number - The new size of the queue.

| Param | Type | Description |
| --- | --- | --- |
| values | `Array.<any>` | The values to push into the queue. |

`priorityQueue.pop()` ⇒ any

Remove and return the element with the highest priority in the queue.

Kind: instance method of PriorityQueue
Returns: any - The element with the highest priority in the queue.

`priorityQueue.replace(value)` ⇒ *

Replace the element with the highest priority in the queue with a new value.

Kind: instance method of PriorityQueue
Returns: * - The replaced value.

| Param | Type | Description |
| --- | --- | --- |
| value | `*` | The new value. |
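One way to replace the top element is to overwrite the root and sift it down once, which avoids a separate pop followed by a push. The sketch below shows that idea on a plain array. `replaceTop` and `higher` are hypothetical names, and the code assumes a non-empty array that already satisfies the heap property.

```javascript
// Hypothetical helper for illustration; not the library's implementation.
// `heap` must be non-empty and already satisfy the heap property for `higher`,
// where higher(a, b) is true when `a` has higher priority than `b`.
function replaceTop(heap, value, higher = (a, b) => a > b) {
  const replaced = heap[0];
  heap[0] = value;
  let i = 0;
  for (;;) {
    const l = 2 * i + 1;
    const r = 2 * i + 2;
    let best = i;
    if (l < heap.length && higher(heap[l], heap[best])) best = l;
    if (r < heap.length && higher(heap[r], heap[best])) best = r;
    if (best === i) break;
    [heap[i], heap[best]] = [heap[best], heap[i]];
    i = best;
  }
  return replaced;
}

const heap = [9, 7, 8, 3, 5];
console.log(replaceTop(heap, 4)); // 9
console.log(heap[0]); // 8
```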

`priorityQueue._siftUpFrom(node)`

Helper function to sift up from a given node.

Kind: instance method of PriorityQueue

| Param | Type | Description |
| --- | --- | --- |
| node | `number` | The index of the node to start sifting up from. |

utils/data-structures.CharTrie

A trie structure to efficiently store and search for strings.

Kind: static class of utils/data-structures

.CharTrie
- .extend(texts)
- .push(text)
- .commonPrefixSearch(text)

`charTrie.extend(texts)`

Adds one or more texts to the trie.

Kind: instance method of CharTrie

| Param | Type | Description |
| --- | --- | --- |
| texts | `Array.<string>` | The strings to add to the trie. |

`charTrie.push(text)`

Adds text to the trie.

Kind: instance method of CharTrie

| Param | Type | Description |
| --- | --- | --- |
| text | `string` | The string to add to the trie. |

`charTrie.commonPrefixSearch(text)`

Searches the trie for all stored strings that are a prefix of text.

Kind: instance method of CharTrie

| Param | Type | Description |
| --- | --- | --- |
| text | `string` | The text whose prefixes are looked up in the trie. |
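The three methods above can be sketched with a small trie built on `Map`. `MiniTrie` is a hypothetical stand-in, not the library's class. In particular, it assumes that a common prefix search yields every stored string that is a prefix of the query, and that the search is exposed as a generator.

```javascript
// Hypothetical sketch for illustration; not the library's CharTrie.
class MiniTrie {
  constructor() {
    this.root = { isLeaf: false, children: new Map() };
  }
  // Add a single string, creating nodes along its path as needed.
  push(text) {
    let node = this.root;
    for (const ch of text) {
      let child = node.children.get(ch);
      if (child === undefined) {
        child = { isLeaf: false, children: new Map() };
        node.children.set(ch, child);
      }
      node = child;
    }
    node.isLeaf = true;
  }
  // Add several strings.
  extend(texts) {
    for (const text of texts) this.push(text);
  }
  // Assumption: yields every stored string that is a prefix of `text`.
  *commonPrefixSearch(text) {
    let node = this.root;
    let prefix = "";
    for (const ch of text) {
      node = node.children.get(ch);
      if (node === undefined) return;
      prefix += ch;
      if (node.isLeaf) yield prefix;
    }
  }
}

const trie = new MiniTrie();
trie.extend(["a", "ab", "abc", "b"]);
console.log([...trie.commonPrefixSearch("abd")]); // [ 'a', 'ab' ]
```

The search walks a single path from the root, so its cost depends on the length of the query rather than on the number of stored strings.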

utils/data-structures.TokenLattice

A lattice data structure to be used for tokenization.

Kind: static class of utils/data-structures

.TokenLattice
- new TokenLattice(sentence, bosTokenId, eosTokenId)
- .insert(pos, length, score, tokenId)
- .viterbi() ⇒ Array.<TokenLatticeNode>
- .piece(node) ⇒ string
- .tokens() ⇒ Array.<string>
- .tokenIds() ⇒ Array.<number>

`new TokenLattice(sentence, bosTokenId, eosTokenId)`

Creates a new TokenLattice instance.

| Param | Type | Description |
| --- | --- | --- |
| sentence | `string` | The input sentence to be tokenized. |
| bosTokenId | `number` | The beginning-of-sequence token ID. |
| eosTokenId | `number` | The end-of-sequence token ID. |

`tokenLattice.insert(pos, length, score, tokenId)`

Inserts a new token node into the token lattice.

Kind: instance method of TokenLattice

| Param | Type | Description |
| --- | --- | --- |
| pos | `number` | The starting position of the token. |
| length | `number` | The length of the token. |
| score | `number` | The score of the token. |
| tokenId | `number` | The token ID of the token. |

`tokenLattice.viterbi()` ⇒ Array.<TokenLatticeNode>

Implements the Viterbi algorithm to compute the most likely sequence of tokens.

Kind: instance method of TokenLattice
Returns: Array.<TokenLatticeNode> - The most likely sequence of tokens.
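As a rough illustration of the idea behind a Viterbi search over a lattice, the sketch below picks the highest-scoring sequence of candidate tokens that covers a sentence. `bestPath` and `piece` are hypothetical helpers, and the candidate objects only borrow the `pos`, `length`, and `score` fields named in this section. It is a simplified dynamic program under those assumptions, not the library's implementation, and it omits the beginning-of-sequence and end-of-sequence nodes.

```javascript
// Hypothetical sketch for illustration; not the library's TokenLattice.
// Each candidate covers characters [pos, pos + length) of the sentence and
// carries a score (for example a log-probability); higher totals are better.
function bestPath(sentence, candidates) {
  const n = Array.from(sentence).length;
  // best[i]: highest total score of any token sequence covering the first i characters.
  const best = new Array(n + 1).fill(-Infinity);
  // back[i]: the last token on that best sequence.
  const back = new Array(n + 1).fill(null);
  best[0] = 0;
  for (let end = 1; end <= n; ++end) {
    for (const tok of candidates) {
      if (tok.length < 1 || tok.pos + tok.length !== end) continue;
      const score = best[tok.pos] + tok.score;
      if (score > best[end]) {
        best[end] = score;
        back[end] = tok;
      }
    }
  }
  if (back[n] === null) return []; // no sequence of candidates covers the sentence
  // Follow the back-pointers from the end of the sentence to its start.
  const path = [];
  for (let i = n; i > 0; i = back[i].pos) path.push(back[i]);
  return path.reverse();
}

// Returns the text covered by a token.
const piece = (sentence, tok) =>
  Array.from(sentence).slice(tok.pos, tok.pos + tok.length).join("");

const sentence = "abc";
const candidates = [
  { pos: 0, length: 1, score: -1.0 }, // "a"
  { pos: 1, length: 1, score: -1.0 }, // "b"
  { pos: 2, length: 1, score: -1.0 }, // "c"
  { pos: 0, length: 2, score: -1.5 }, // "ab"
  { pos: 1, length: 2, score: -2.5 }, // "bc"
];
console.log(bestPath(sentence, candidates).map((tok) => piece(sentence, tok)));
// [ 'ab', 'c' ]
```

Here "ab" + "c" totals -2.5, which beats "a" + "b" + "c" (-3.0) and "a" + "bc" (-3.5).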

`tokenLattice.piece(node)` ⇒ string

Kind: instance method of TokenLattice
Returns: string - The piece of the sentence covered by the given node.

| Param | Type |
| --- | --- |
| node | `TokenLatticeNode` |

`tokenLattice.tokens()` ⇒ Array.<string>

Kind: instance method of TokenLattice
Returns: Array.<string> - The most likely sequence of tokens.

`tokenLattice.tokenIds()` ⇒ Array.<number>

Kind: instance method of TokenLattice
Returns: Array.<number> - The most likely sequence of token ids.

utils/data-structures.DictionarySplitter

A data structure which uses a trie to split a string into tokens based on a dictionary. It can also use a regular expression to preprocess the input text before splitting.

NOTE: To ensure multi-byte characters are handled correctly, we operate at byte-level instead of character-level.

Kind: static class of utils/data-structures

.DictionarySplitter
- new DictionarySplitter(dictionary)
- .split(text) ⇒ Array.<string>

`new DictionarySplitter(dictionary)`

| Param | Type | Description |
| --- | --- | --- |
| dictionary | `Array.<string>` | The dictionary of words to use for splitting. |

`dictionarySplitter.split(text)` ⇒ Array.<string>

Splits the input text into tokens based on the dictionary.

Kind: instance method of DictionarySplitter
Returns: Array.<string> - An array of tokens.

| Param | Type | Description |
| --- | --- | --- |
| text | `string` | The input text to split. |
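The sketch below illustrates the general idea of dictionary-based splitting. It makes two assumptions that the text above does not state: at each position the longest dictionary entry wins, and text between matches is returned as its own segment. For brevity it scans a length-sorted list instead of using a trie and works on JavaScript string indices rather than bytes, so `splitByDictionary` is a hypothetical simplification and not the library's implementation.

```javascript
// Hypothetical sketch for illustration; not the library's DictionarySplitter.
// Assumptions: the longest dictionary entry wins at each position, and text
// between matches is returned as its own segment.
function splitByDictionary(text, dictionary) {
  // Try longer entries first so that the longest match is preferred.
  const entries = [...dictionary].sort((a, b) => b.length - a.length);
  const result = [];
  let pending = ""; // text seen since the last dictionary match
  let i = 0;
  while (i < text.length) {
    const match = entries.find((entry) => entry.length > 0 && text.startsWith(entry, i));
    if (match === undefined) {
      pending += text[i];
      i += 1;
      continue;
    }
    if (pending) {
      result.push(pending);
      pending = "";
    }
    result.push(match);
    i += match.length;
  }
  if (pending) result.push(pending);
  return result;
}

console.log(splitByDictionary("<s>hello</s>", ["<s>", "</s>"]));
// [ '<s>', 'hello', '</s>' ]
```

A trie replaces the linear scan over `entries` with a single walk per position, which matters once the dictionary grows large.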

utils/data-structures.LRUCache

A simple Least Recently Used (LRU) cache implementation in JavaScript. This cache stores key-value pairs and evicts the least recently used item when the capacity is exceeded.

Kind: static class of utils/data-structures

.LRUCache
- new LRUCache(capacity)
- .get(key) ⇒ any
- .put(key, value)
- .clear()

`new LRUCache(capacity)`

Creates an LRUCache instance.

| Param | Type | Description |
| --- | --- | --- |
| capacity | `number` | The maximum number of items the cache can hold. |

`lruCache.get(key)` ⇒ any

Retrieves the value associated with the given key and marks the key as recently used.

Kind: instance method of LRUCache
Returns: any - The value associated with the key, or undefined if the key does not exist.

| Param | Type | Description |
| --- | --- | --- |
| key | `any` | The key to retrieve. |

`lruCache.put(key, value)`

Inserts or updates the key-value pair in the cache. If the key already exists, it is updated and marked as recently used. If the cache exceeds its capacity, the least recently used item is evicted.

Kind: instance method of LRUCache

| Param | Type | Description |
| --- | --- | --- |
| key | `any` | The key to add or update. |
| value | `any` | The value to associate with the key. |

`lruCache.clear()`

Clears the cache.

Kind: instance method of LRUCache
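The `get`, `put`, and `clear` behaviour described above can be sketched on top of a JavaScript `Map`, which iterates its keys in insertion order. `MiniLRU` is a hypothetical stand-in for illustration; whether the library's class is built the same way is an assumption.

```javascript
// Hypothetical sketch for illustration; not the library's LRUCache.
// A Map iterates in insertion order, so its first key is the least recently used.
class MiniLRU {
  constructor(capacity) {
    this.capacity = capacity;
    this.cache = new Map();
  }
  get(key) {
    if (!this.cache.has(key)) return undefined;
    const value = this.cache.get(key);
    // Re-insert to move the key to the most recently used position.
    this.cache.delete(key);
    this.cache.set(key, value);
    return value;
  }
  put(key, value) {
    this.cache.delete(key); // no-op if the key is absent
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // Evict the least recently used entry (the first key in the Map).
      this.cache.delete(this.cache.keys().next().value);
    }
  }
  clear() {
    this.cache.clear();
  }
}

const cache = new MiniLRU(2);
cache.put("a", 1);
cache.put("b", 2);
cache.get("a"); // "a" is now the most recently used key
cache.put("c", 3); // evicts "b"
console.log(cache.get("b")); // undefined
console.log(cache.get("a")); // 1
```

Deleting and re-inserting a key on every access keeps the recency order in the `Map` itself, so no separate linked list is needed.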

utils/data-structures~CharTrieNode

Represents a node in a character trie.

Kind: inner class of utils/data-structures

~CharTrieNode
- new CharTrieNode(isLeaf, children)
- .default() ⇒ CharTrieNode

`new CharTrieNode(isLeaf, children)`

Create a new CharTrieNode.

| Param | Type | Description |
| --- | --- | --- |
| isLeaf | `boolean` | Whether the node is a leaf node or not. |
| children | `Map.<string, CharTrieNode>` | A map containing the node's children, where the key is a character and the value is a CharTrieNode. |

`CharTrieNode.default()` ⇒ CharTrieNode

Returns a new CharTrieNode instance with default values.

Kind: static method of CharTrieNode
Returns: CharTrieNode - A new CharTrieNode instance with isLeaf set to false and an empty children map.

utils/data-structures~TokenLatticeNode

Kind: inner class of utils/data-structures

~TokenLatticeNode
- new TokenLatticeNode(tokenId, nodeId, pos, length, score)
- .clone() ⇒ TokenLatticeNode

`new TokenLatticeNode(tokenId, nodeId, pos, length, score)`

Represents a node in a token lattice for a given sentence.

| Param | Type | Description |
| --- | --- | --- |
| tokenId | `number` | The ID of the token associated with this node. |
| nodeId | `number` | The ID of this node. |
| pos | `number` | The starting position of the token in the sentence. |
| length | `number` | The length of the token. |
| score | `number` | The score associated with the token. |

`tokenLatticeNode.clone()` ⇒ TokenLatticeNode

Returns a clone of this node.

Kind: instance method of TokenLatticeNode
Returns: TokenLatticeNode - A clone of this node.
