# tokenizers-training/src/tokenizers_analysis.py
def calculate_oov(text, vocabulary):
    """Return the fraction of whitespace-separated words not found in `vocabulary`."""
    # split() (no argument) handles repeated spaces, tabs, and newlines without
    # producing empty-string "words", unlike split(' ')
    words = text.split()
    if not words:
        return 0.0  # avoid ZeroDivisionError on empty or whitespace-only text
    oov_count = sum(1 for word in words if word not in vocabulary)
    return oov_count / len(words)
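A quick sanity check of the function above (the sample sentence and vocabulary are illustrative, not from the repo; a `set` is used for O(1) membership tests):

```python
def calculate_oov(text, vocabulary):
    """Return the fraction of whitespace-separated words not found in `vocabulary`."""
    words = text.split()
    if not words:
        return 0.0
    return sum(1 for word in words if word not in vocabulary) / len(words)

vocab = {"the", "cat", "sat"}
rate = calculate_oov("the cat sat on the mat", vocab)
# 2 of 6 words ("on", "mat") are out of vocabulary -> 2/6
print(round(rate, 3))  # 0.333
```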