	# utils/data-structures

	Custom data structures.

	These are only used internally, meaning an end-user shouldn't
	need to access anything here.

	* [utils/data-structures](#module_utils/data-structures)
	    * _static_
	        * [.PriorityQueue](#module_utils/data-structures.PriorityQueue)
	            * [`new PriorityQueue(comparator)`](#new_module_utils/data-structures.PriorityQueue_new)
	            * [`.size`](#module_utils/data-structures.PriorityQueue+size)
	            * [`.isEmpty()`](#module_utils/data-structures.PriorityQueue+isEmpty) ⇒ boolean
	            * [`.peek()`](#module_utils/data-structures.PriorityQueue+peek) ⇒ any
	            * [`.push(...values)`](#module_utils/data-structures.PriorityQueue+push) ⇒ number
	            * [`.extend(values)`](#module_utils/data-structures.PriorityQueue+extend) ⇒ number
	            * [`.pop()`](#module_utils/data-structures.PriorityQueue+pop) ⇒ any
	            * [`.replace(value)`](#module_utils/data-structures.PriorityQueue+replace) ⇒ *
	            * [`._siftUpFrom(node)`](#module_utils/data-structures.PriorityQueue+_siftUpFrom)
	        * [.CharTrie](#module_utils/data-structures.CharTrie)
	            * [`.extend(texts)`](#module_utils/data-structures.CharTrie+extend)
	            * [`.push(text)`](#module_utils/data-structures.CharTrie+push)
	            * [`.commonPrefixSearch(text)`](#module_utils/data-structures.CharTrie+commonPrefixSearch)
	        * [.TokenLattice](#module_utils/data-structures.TokenLattice)
	            * [`new TokenLattice(sentence, bosTokenId, eosTokenId)`](#new_module_utils/data-structures.TokenLattice_new)
	            * [`.insert(pos, length, score, tokenId)`](#module_utils/data-structures.TokenLattice+insert)
	            * [`.viterbi()`](#module_utils/data-structures.TokenLattice+viterbi) ⇒ Array.<TokenLatticeNode>
	            * [`.piece(node)`](#module_utils/data-structures.TokenLattice+piece) ⇒ string
	            * [`.tokens()`](#module_utils/data-structures.TokenLattice+tokens) ⇒ Array.<string>
	            * [`.tokenIds()`](#module_utils/data-structures.TokenLattice+tokenIds) ⇒ Array.<number>
	        * [.DictionarySplitter](#module_utils/data-structures.DictionarySplitter)
	            * [`new DictionarySplitter(dictionary)`](#new_module_utils/data-structures.DictionarySplitter_new)
	            * [`.split(text)`](#module_utils/data-structures.DictionarySplitter+split) ⇒ Array.<string>
	        * [.LRUCache](#module_utils/data-structures.LRUCache)
	            * [`new LRUCache(capacity)`](#new_module_utils/data-structures.LRUCache_new)
	            * [`.get(key)`](#module_utils/data-structures.LRUCache+get) ⇒ any
	            * [`.put(key, value)`](#module_utils/data-structures.LRUCache+put)
	            * [`.clear()`](#module_utils/data-structures.LRUCache+clear)
	    * _inner_
	        * [~CharTrieNode](#module_utils/data-structures..CharTrieNode)
	            * [`new CharTrieNode(isLeaf, children)`](#new_module_utils/data-structures..CharTrieNode_new)
	            * [`.default()`](#module_utils/data-structures..CharTrieNode.default) ⇒ CharTrieNode
	        * [~TokenLatticeNode](#module_utils/data-structures..TokenLatticeNode)
	            * [`new TokenLatticeNode(tokenId, nodeId, pos, length, score)`](#new_module_utils/data-structures..TokenLatticeNode_new)
	            * [`.clone()`](#module_utils/data-structures..TokenLatticeNode+clone) ⇒ TokenLatticeNode

	* * *

	## utils/data-structures.PriorityQueue

	Efficient heap-based implementation of a priority queue.
	It uses an array-based binary heap, where the root is at index `0`, and the
	children of node `i` are located at indices `2i + 1` and `2i + 2`, respectively.

	Adapted from the following sources:
	- https://stackoverflow.com/a/42919752/13989043 (original)
	- https://github.com/belladoreai/llama-tokenizer-js (minor improvements)

	Kind: static class of [utils/data-structures](#module_utils/data-structures)

	* [.PriorityQueue](#module_utils/data-structures.PriorityQueue)
	    * [`new PriorityQueue(comparator)`](#new_module_utils/data-structures.PriorityQueue_new)
	    * [`.size`](#module_utils/data-structures.PriorityQueue+size)
	    * [`.isEmpty()`](#module_utils/data-structures.PriorityQueue+isEmpty) ⇒ boolean
	    * [`.peek()`](#module_utils/data-structures.PriorityQueue+peek) ⇒ any
	    * [`.push(...values)`](#module_utils/data-structures.PriorityQueue+push) ⇒ number
	    * [`.extend(values)`](#module_utils/data-structures.PriorityQueue+extend) ⇒ number
	    * [`.pop()`](#module_utils/data-structures.PriorityQueue+pop) ⇒ any
	    * [`.replace(value)`](#module_utils/data-structures.PriorityQueue+replace) ⇒ *
	    * [`._siftUpFrom(node)`](#module_utils/data-structures.PriorityQueue+_siftUpFrom)
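
The array layout described for this class can be checked with a small standalone sketch. The helper names (`parent`, `left`, `right`, `isMaxHeap`) are hypothetical and are not part of the library; they only illustrate the index arithmetic for a heap whose root sits at index `0`.

```javascript
// Index helpers for a binary heap stored in a flat array (root at index 0).
// These names are illustrative assumptions, not library functions.
const parent = (i) => Math.floor((i - 1) / 2);
const left = (i) => 2 * i + 1;
const right = (i) => 2 * i + 2;

// Returns true if every element is <= its parent (max-heap property).
function isMaxHeap(heap) {
  for (let i = 1; i < heap.length; ++i) {
    if (heap[parent(i)] < heap[i]) return false;
  }
  return true;
}

const heap = [9, 7, 8, 3, 5, 6];
console.log(left(1), right(1));   // children of node 1 live at indices 3 and 4
console.log(isMaxHeap(heap));     // true
console.log(isMaxHeap([1, 7, 8])); // false: the root is smaller than its children
```

Storing the tree implicitly in an array avoids per-node allocations, since parent and child positions are computed rather than stored.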

	* * *

	### `new PriorityQueue(comparator)`

	Create a new PriorityQueue.



	| Param | Type | Description |
	| --- | --- | --- |
	| comparator | `function` | Comparator function to determine priority. Defaults to a MaxHeap. |
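
To illustrate how a comparator can switch a queue between max-first and min-first ordering, here is a minimal sketch. `TinyPriorityQueue` is a hypothetical stand-in, not the library class, and it assumes a comparator of the form `(a, b) => boolean` that returns `true` when `a` should come out before `b`; the real comparator contract may differ.

```javascript
// Hypothetical stand-in for illustration only; not the library class.
class TinyPriorityQueue {
  // Assumption: comparator(a, b) is true when `a` has higher priority than `b`.
  constructor(comparator = (a, b) => a > b) {
    this._heap = [];
    this._comparator = comparator;
  }
  get size() { return this._heap.length; }
  peek() { return this._heap[0]; }
  push(value) {
    this._heap.push(value);
    // Sift the new element up until its parent has higher or equal priority.
    let i = this._heap.length - 1;
    while (i > 0) {
      const p = Math.floor((i - 1) / 2);
      if (!this._comparator(this._heap[i], this._heap[p])) break;
      [this._heap[i], this._heap[p]] = [this._heap[p], this._heap[i]];
      i = p;
    }
    return this.size;
  }
  pop() {
    const top = this._heap[0];
    const last = this._heap.pop();
    if (this._heap.length > 0) {
      // Move the last element to the root and sift it down.
      this._heap[0] = last;
      let i = 0;
      for (;;) {
        const l = 2 * i + 1;
        const r = 2 * i + 2;
        let best = i;
        if (l < this.size && this._comparator(this._heap[l], this._heap[best])) best = l;
        if (r < this.size && this._comparator(this._heap[r], this._heap[best])) best = r;
        if (best === i) break;
        [this._heap[i], this._heap[best]] = [this._heap[best], this._heap[i]];
        i = best;
      }
    }
    return top;
  }
}

const maxFirst = new TinyPriorityQueue();                // default: largest first
const minFirst = new TinyPriorityQueue((a, b) => a < b); // smallest first
for (const v of [3, 1, 4, 1, 5]) {
  maxFirst.push(v);
  minFirst.push(v);
}
console.log(maxFirst.peek()); // 5
console.log(minFirst.peek()); // 1
```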



	* * *

	### `priorityQueue.size`

	The number of elements in the queue.

	Kind: instance property of [PriorityQueue](#module_utils/data-structures.PriorityQueue)

	* * *

	### `priorityQueue.isEmpty()` ⇒ boolean

	Check if the queue is empty.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)
	Returns: boolean - `true` if the queue is empty, `false` otherwise.

	* * *

	### `priorityQueue.peek()` ⇒ any

	Return the element with the highest priority in the queue.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)
	Returns: any - The highest priority element in the queue.

	* * *

	### `priorityQueue.push(...values)` ⇒ number

	Add one or more elements to the queue.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)
	Returns: number - The new size of the queue.



	| Param | Type | Description |
	| --- | --- | --- |
	| ...values | `any` | The values to push into the queue. |



	* * *

	### `priorityQueue.extend(values)` ⇒ number

	Add multiple elements to the queue.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)
	Returns: number - The new size of the queue.



	| Param | Type | Description |
	| --- | --- | --- |
	| values | `Array.<any>` | The values to push into the queue. |



	* * *

	### `priorityQueue.pop()` ⇒ any

	Remove and return the element with the highest priority in the queue.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)
	Returns: any - The element with the highest priority in the queue.

	* * *

	### `priorityQueue.replace(value)` ⇒ *

	Replace the element with the highest priority in the queue with a new value.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)
	Returns: * - The replaced value.



	| Param | Type | Description |
	| --- | --- | --- |
	| value | `*` | The new value. |
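
Replacing the top element is commonly done by overwriting the root and sifting it down once, which is cheaper than a separate pop followed by a push. The sketch below shows that idea on a plain array; `replaceTop` and the `before` comparator are hypothetical names, and the heap is assumed to be non-empty.

```javascript
// Overwrite the root of an array-based heap and restore the heap property.
// Assumption: before(a, b) is true when `a` has higher priority than `b`.
function replaceTop(heap, value, before = (a, b) => a > b) {
  const replaced = heap[0];
  heap[0] = value;
  let i = 0;
  for (;;) {
    const l = 2 * i + 1;
    const r = 2 * i + 2;
    let best = i;
    if (l < heap.length && before(heap[l], heap[best])) best = l;
    if (r < heap.length && before(heap[r], heap[best])) best = r;
    if (best === i) break; // both children have lower or equal priority
    [heap[i], heap[best]] = [heap[best], heap[i]];
    i = best;
  }
  return replaced;
}

const heap = [9, 7, 8, 3, 5];
const old = replaceTop(heap, 4);
console.log(old);     // 9
console.log(heap[0]); // 8
```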



	* * *

	### `priorityQueue._siftUpFrom(node)`

	Helper function to sift up from a given node.

	Kind: instance method of [PriorityQueue](#module_utils/data-structures.PriorityQueue)



	| Param | Type | Description |
	| --- | --- | --- |
	| node | `number` | The index of the node to start sifting up from. |



	* * *

	## utils/data-structures.CharTrie

	A trie structure to efficiently store and search for strings.

	Kind: static class of [utils/data-structures](#module_utils/data-structures)

	* [.CharTrie](#module_utils/data-structures.CharTrie)
	    * [`.extend(texts)`](#module_utils/data-structures.CharTrie+extend)
	    * [`.push(text)`](#module_utils/data-structures.CharTrie+push)
	    * [`.commonPrefixSearch(text)`](#module_utils/data-structures.CharTrie+commonPrefixSearch)

	* * *

	### `charTrie.extend(texts)`

	Adds one or more `texts` to the trie.

	Kind: instance method of [CharTrie](#module_utils/data-structures.CharTrie)



	| Param | Type | Description |
	| --- | --- | --- |
	| texts | `Array.<string>` | The strings to add to the trie. |



	* * *

	### `charTrie.push(text)`

	Adds text to the trie.

	Kind: instance method of [CharTrie](#module_utils/data-structures.CharTrie)



	| Param | Type | Description |
	| --- | --- | --- |
	| text | `string` | The string to add to the trie. |



	* * *

	### `charTrie.commonPrefixSearch(text)`

	Searches the trie for all strings with a common prefix of `text`.

	Kind: instance method of [CharTrie](#module_utils/data-structures.CharTrie)



	| Param | Type | Description |
	| --- | --- | --- |
	| text | `string` | The common prefix to search for. |
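
The description above can be read in more than one way. The sketch below assumes the reading that is typical for tokenizers: the search walks `text` character by character and yields every stored string that is a prefix of `text`. `TinyTrie` is a hypothetical class written for this example, not the library implementation.

```javascript
// Hypothetical minimal trie, for illustration only.
class TinyTrie {
  constructor() {
    this.root = { isLeaf: false, children: new Map() };
  }
  push(text) {
    let node = this.root;
    for (const ch of text) {
      let child = node.children.get(ch);
      if (child === undefined) {
        child = { isLeaf: false, children: new Map() };
        node.children.set(ch, child);
      }
      node = child;
    }
    node.isLeaf = true; // marks the end of a stored string
  }
  extend(texts) {
    for (const text of texts) this.push(text);
  }
  // Assumed semantics: yields each stored string that is a prefix of `text`,
  // shortest first, and stops as soon as the walk leaves the trie.
  *commonPrefixSearch(text) {
    let node = this.root;
    let prefix = "";
    for (const ch of text) {
      node = node.children.get(ch);
      if (node === undefined) return;
      prefix += ch;
      if (node.isLeaf) yield prefix;
    }
  }
}

const trie = new TinyTrie();
trie.extend(["a", "ab", "abc", "b", "abd"]);
const matches = [...trie.commonPrefixSearch("abcd")];
console.log(matches); // [ 'a', 'ab', 'abc' ]
```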



	* * *

	## utils/data-structures.TokenLattice

	A lattice data structure to be used for tokenization.

	Kind: static class of [utils/data-structures](#module_utils/data-structures)

	* [.TokenLattice](#module_utils/data-structures.TokenLattice)
	    * [`new TokenLattice(sentence, bosTokenId, eosTokenId)`](#new_module_utils/data-structures.TokenLattice_new)
	    * [`.insert(pos, length, score, tokenId)`](#module_utils/data-structures.TokenLattice+insert)
	    * [`.viterbi()`](#module_utils/data-structures.TokenLattice+viterbi) ⇒ Array.<TokenLatticeNode>
	    * [`.piece(node)`](#module_utils/data-structures.TokenLattice+piece) ⇒ string
	    * [`.tokens()`](#module_utils/data-structures.TokenLattice+tokens) ⇒ Array.<string>
	    * [`.tokenIds()`](#module_utils/data-structures.TokenLattice+tokenIds) ⇒ Array.<number>

	* * *

	### `new TokenLattice(sentence, bosTokenId, eosTokenId)`

	Creates a new TokenLattice instance.



	| Param | Type | Description |
	| --- | --- | --- |
	| sentence | `string` | The input sentence to be tokenized. |
	| bosTokenId | `number` | The beginning-of-sequence token ID. |
	| eosTokenId | `number` | The end-of-sequence token ID. |



	* * *

	### `tokenLattice.insert(pos, length, score, tokenId)`

	Inserts a new token node into the token lattice.

	Kind: instance method of [TokenLattice](#module_utils/data-structures.TokenLattice)



	| Param | Type | Description |
	| --- | --- | --- |
	| pos | `number` | The starting position of the token. |
	| length | `number` | The length of the token. |
	| score | `number` | The score of the token. |
	| tokenId | `number` | The token ID of the token. |



	* * *

	### `tokenLattice.viterbi()` ⇒ Array.<TokenLatticeNode>

	Implements the Viterbi algorithm to compute the most likely sequence of tokens.

	Kind: instance method of [TokenLattice](#module_utils/data-structures.TokenLattice)
	Returns: Array.<TokenLatticeNode> - The most likely sequence of tokens.
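
As a rough illustration of the idea, the sketch below runs a Viterbi-style dynamic program over a simplified lattice. It is not the library's implementation: `bestPath` and the plain candidate objects are hypothetical, BOS/EOS nodes are omitted, every token is assumed to have length at least 1, and scores are assumed to be additive (for example, log-probabilities).

```javascript
// Hypothetical, simplified lattice search. Each candidate token is a plain
// object { pos, length, score, tokenId } covering sentence[pos, pos + length).
function bestPath(sentenceLength, candidates) {
  // endNodes[p] holds the candidates that end exactly at position p.
  const endNodes = Array.from({ length: sentenceLength + 1 }, () => []);
  for (const c of candidates) endNodes[c.pos + c.length].push(c);

  // best[p]: best total score of any path covering [0, p).
  // back[p]: the last token on that best path.
  const best = new Array(sentenceLength + 1).fill(-Infinity);
  const back = new Array(sentenceLength + 1).fill(null);
  best[0] = 0;

  for (let p = 1; p <= sentenceLength; ++p) {
    for (const c of endNodes[p]) {
      const total = best[c.pos] + c.score;
      if (total > best[p]) {
        best[p] = total;
        back[p] = c;
      }
    }
  }

  if (best[sentenceLength] === -Infinity) return []; // no full segmentation exists

  // Follow the back-pointers from the end of the sentence to the start.
  const path = [];
  for (let p = sentenceLength; p > 0; p = back[p].pos) path.push(back[p]);
  return path.reverse();
}

const sentence = "abc";
const candidates = [
  { pos: 0, length: 1, score: -1.0, tokenId: 0 }, // "a"
  { pos: 1, length: 1, score: -1.0, tokenId: 1 }, // "b"
  { pos: 2, length: 1, score: -1.0, tokenId: 2 }, // "c"
  { pos: 0, length: 2, score: -1.5, tokenId: 3 }, // "ab"
  { pos: 1, length: 2, score: -3.0, tokenId: 4 }, // "bc"
];
const path = bestPath(sentence.length, candidates);
const pieces = path.map((n) => sentence.slice(n.pos, n.pos + n.length));
console.log(pieces); // [ 'ab', 'c' ]
```

Here the segmentation `ab` + `c` scores -2.5, which beats `a` + `b` + `c` (-3.0) and `a` + `bc` (-4.0).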

	* * *

	### `tokenLattice.piece(node)` ⇒ string

	Kind: instance method of [TokenLattice](#module_utils/data-structures.TokenLattice)
	Returns: string - The piece of the sentence covered by `node`.



	| Param | Type |
	| --- | --- |
	| node | `TokenLatticeNode` |


	* * *

	### `tokenLattice.tokens()` ⇒ Array.<string>

	Kind: instance method of [TokenLattice](#module_utils/data-structures.TokenLattice)
	Returns: Array.<string> - The most likely sequence of tokens.

	* * *

	### `tokenLattice.tokenIds()` ⇒ Array.<number>

	Kind: instance method of [TokenLattice](#module_utils/data-structures.TokenLattice)
	Returns: Array.<number> - The most likely sequence of token ids.

	* * *

	## utils/data-structures.DictionarySplitter

	A data structure which uses a trie to split a string into tokens based on a dictionary.
	It can also use a regular expression to preprocess the input text before splitting.

	NOTE: To ensure multi-byte characters are handled correctly, we operate at byte-level instead of character-level.

	Kind: static class of [utils/data-structures](#module_utils/data-structures)

	* [.DictionarySplitter](#module_utils/data-structures.DictionarySplitter)
	    * [`new DictionarySplitter(dictionary)`](#new_module_utils/data-structures.DictionarySplitter_new)
	    * [`.split(text)`](#module_utils/data-structures.DictionarySplitter+split) ⇒ Array.<string>

	* * *

	### `new DictionarySplitter(dictionary)`



	| Param | Type | Description |
	| --- | --- | --- |
	| dictionary | `Array.<string>` | The dictionary of words to use for splitting. |



	* * *

	### `dictionarySplitter.split(text)` ⇒ Array.<string>

	Splits the input text into tokens based on the dictionary.

	Kind: instance method of [DictionarySplitter](#module_utils/data-structures.DictionarySplitter)
	Returns: Array.<string> - An array of tokens.



	| Param | Type | Description |
	| --- | --- | --- |
	| text | `string` | The input text to split. |
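
One plausible way to implement trie-based splitting is a greedy longest-match scan, sketched below. `buildTrie` and `splitWithDictionary` are hypothetical helpers, the greedy strategy is an assumption rather than a statement about the library, and for brevity the sketch iterates by Unicode code point instead of at byte level.

```javascript
// Hypothetical helpers for illustration; not the library implementation.
function buildTrie(dictionary) {
  const root = { end: false, children: new Map() };
  for (const word of dictionary) {
    let node = root;
    for (const ch of word) {
      if (!node.children.has(ch)) node.children.set(ch, { end: false, children: new Map() });
      node = node.children.get(ch);
    }
    node.end = true;
  }
  return root;
}

function splitWithDictionary(root, text) {
  const chars = Array.from(text); // iterate by code point (simplification)
  const result = [];
  let start = 0; // start of the pending run of unmatched characters
  let i = 0;
  while (i < chars.length) {
    // Walk the trie from position i and remember the longest dictionary word.
    let node = root;
    let matchEnd = -1;
    for (let j = i; j < chars.length; ++j) {
      node = node.children.get(chars[j]);
      if (node === undefined) break;
      if (node.end) matchEnd = j + 1;
    }
    if (matchEnd === -1) {
      ++i; // no word starts here; the character stays in the unmatched run
      continue;
    }
    if (i > start) result.push(chars.slice(start, i).join(""));
    result.push(chars.slice(i, matchEnd).join(""));
    i = matchEnd;
    start = matchEnd;
  }
  if (start < chars.length) result.push(chars.slice(start).join(""));
  return result;
}

const root = buildTrie(["<s>", "</s>", "<mask>"]);
const parts = splitWithDictionary(root, "Hello<s>world</s>");
console.log(parts); // [ 'Hello', '<s>', 'world', '</s>' ]
```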



	* * *

	## utils/data-structures.LRUCache

	A simple Least Recently Used (LRU) cache implementation in JavaScript.
	This cache stores key-value pairs and evicts the least recently used item
	when the capacity is exceeded.

	Kind: static class of [utils/data-structures](#module_utils/data-structures)

	* [.LRUCache](#module_utils/data-structures.LRUCache)
	    * [`new LRUCache(capacity)`](#new_module_utils/data-structures.LRUCache_new)
	    * [`.get(key)`](#module_utils/data-structures.LRUCache+get) ⇒ any
	    * [`.put(key, value)`](#module_utils/data-structures.LRUCache+put)
	    * [`.clear()`](#module_utils/data-structures.LRUCache+clear)

	* * *

	### `new LRUCache(capacity)`

	Creates an LRUCache instance.



	| Param | Type | Description |
	| --- | --- | --- |
	| capacity | `number` | The maximum number of items the cache can hold. |



	* * *

	### `lruCache.get(key)` ⇒ any

	Retrieves the value associated with the given key and marks the key as recently used.

	Kind: instance method of [LRUCache](#module_utils/data-structures.LRUCache)
	Returns: any - The value associated with the key, or undefined if the key does not exist.



	| Param | Type | Description |
	| --- | --- | --- |
	| key | `any` | The key to retrieve. |



	* * *

	### `lruCache.put(key, value)`

	Inserts or updates the key-value pair in the cache.
	If the key already exists, it is updated and marked as recently used.
	If the cache exceeds its capacity, the least recently used item is evicted.

	Kind: instance method of [LRUCache](#module_utils/data-structures.LRUCache)



	| Param | Type | Description |
	| --- | --- | --- |
	| key | `any` | The key to add or update. |
	| value | `any` | The value to associate with the key. |
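
The get/put behaviour described above can be sketched with a JavaScript `Map`, which iterates its keys in insertion order. `TinyLRUCache` is a hypothetical class written for this example; whether the library uses the same internal representation is an assumption.

```javascript
// Hypothetical sketch built on Map, whose keys iterate in insertion order.
class TinyLRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.cache = new Map();
  }
  get(key) {
    if (!this.cache.has(key)) return undefined;
    const value = this.cache.get(key);
    // Re-insert so the key moves to the most-recently-used end.
    this.cache.delete(key);
    this.cache.set(key, value);
    return value;
  }
  put(key, value) {
    if (this.cache.has(key)) this.cache.delete(key);
    this.cache.set(key, value);
    if (this.cache.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      this.cache.delete(this.cache.keys().next().value);
    }
  }
  clear() {
    this.cache.clear();
  }
}

const cache = new TinyLRUCache(2);
cache.put("a", 1);
cache.put("b", 2);
cache.get("a");    // "a" becomes the most recently used key
cache.put("c", 3); // capacity exceeded, so "b" is evicted
console.log(cache.get("b")); // undefined
```

Deleting and re-inserting a key on every access keeps the `Map` ordered from least to most recently used, so eviction only needs the first key.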



	* * *

	### `lruCache.clear()`

	Clears the cache.

	Kind: instance method of [LRUCache](#module_utils/data-structures.LRUCache)

	* * *

	## utils/data-structures~CharTrieNode

	Represents a node in a character trie.

	Kind: inner class of [utils/data-structures](#module_utils/data-structures)

	* [~CharTrieNode](#module_utils/data-structures..CharTrieNode)
	    * [`new CharTrieNode(isLeaf, children)`](#new_module_utils/data-structures..CharTrieNode_new)
	    * [`.default()`](#module_utils/data-structures..CharTrieNode.default) ⇒ CharTrieNode

	* * *

	### `new CharTrieNode(isLeaf, children)`

	Create a new CharTrieNode.



	| Param | Type | Description |
	| --- | --- | --- |
	| isLeaf | `boolean` | Whether the node is a leaf node or not. |
	| children | `Map.<string, CharTrieNode>` | A map containing the node's children, where the key is a character and the value is a CharTrieNode. |



	* * *

	### `CharTrieNode.default()` ⇒ CharTrieNode

	Returns a new `CharTrieNode` instance with default values.

	Kind: static method of [CharTrieNode](#module_utils/data-structures..CharTrieNode)
	Returns: CharTrieNode - A new `CharTrieNode` instance with `isLeaf` set to `false` and an empty `children` map.

	* * *

	## utils/data-structures~TokenLatticeNode

	Kind: inner class of [utils/data-structures](#module_utils/data-structures)

	* [~TokenLatticeNode](#module_utils/data-structures..TokenLatticeNode)
	    * [`new TokenLatticeNode(tokenId, nodeId, pos, length, score)`](#new_module_utils/data-structures..TokenLatticeNode_new)
	    * [`.clone()`](#module_utils/data-structures..TokenLatticeNode+clone) ⇒ TokenLatticeNode

	* * *

	### `new TokenLatticeNode(tokenId, nodeId, pos, length, score)`

	Represents a node in a token lattice for a given sentence.



	| Param | Type | Description |
	| --- | --- | --- |
	| tokenId | `number` | The ID of the token associated with this node. |
	| nodeId | `number` | The ID of this node. |
	| pos | `number` | The starting position of the token in the sentence. |
	| length | `number` | The length of the token. |
	| score | `number` | The score associated with the token. |



	* * *

	### `tokenLatticeNode.clone()` ⇒ TokenLatticeNode

	Returns a clone of this node.

	Kind: instance method of [TokenLatticeNode](#module_utils/data-structures..TokenLatticeNode)
	Returns: TokenLatticeNode - A clone of this node.

	* * *
