
token_tortoise / README.md

Guilherme Favaron

Add application file

bbb1e4b about 1 year ago


	---
	title: Token Tortoise
	emoji: 🐢
	colorFrom: pink
	colorTo: yellow
	sdk: gradio
	sdk_version: 5.10.0
	app_file: app.py
	pinned: false
	license: mit
	short_description: Bulk Document Token Counter
	---

	# Token Tortoise - Bulk Document Token Counter

	A tool for counting tokens across multiple documents of different types in a single run. It is aimed at content creators, developers, and AI practitioners who need to track token counts for large-scale text processing.

	## Features

	- Multi-Format Support: Process multiple files simultaneously in various formats:
	  - PDF (.pdf)
	  - Microsoft Word (.docx)
	  - PowerPoint (.pptx)
	  - Excel (.xlsx, .xls)
	  - CSV (.csv)
	  - Text files (.txt)

	- Bulk Processing: Upload multiple files at once for efficient token counting
	- Accurate Counting: Uses the `tiktoken` `cl100k_base` encoding, so counts match what models tokenized with that encoding see
	- Clear Results: Get detailed token counts per file and total count
	- User-Friendly Interface: Clean, intuitive design with instant results
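
The per-file and total counts described above can be sketched as follows. This is a minimal illustration, not the Space's actual `app.py`: the function names `load_cl100k_encoder` and `count_tokens` are hypothetical, and the encoder is passed in as a parameter so the counting logic can be exercised without `tiktoken` installed.

```python
from typing import Callable, Dict, List, Tuple


def load_cl100k_encoder() -> Callable[[str], List[int]]:
    """Return the encode function of tiktoken's cl100k_base encoding."""
    import tiktoken  # third-party; imported lazily so the rest works without it

    return tiktoken.get_encoding("cl100k_base").encode


def count_tokens(
    texts: Dict[str, str],
    encode: Callable[[str], List[int]],
) -> Tuple[Dict[str, int], int]:
    """Count tokens per document and in total.

    `texts` maps a file name to its already-extracted text (hypothetical
    input shape chosen for this sketch).
    """
    per_file = {name: len(encode(text)) for name, text in texts.items()}
    return per_file, sum(per_file.values())
```

With `tiktoken` installed, `count_tokens({"notes.txt": "hello world"}, load_cl100k_encoder())` returns a dictionary of per-file counts together with their sum.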

	## Usage

	1. Visit [Token Tortoise on Hugging Face](https://huggingface.co/spaces/guifav/token_tortoise)
	2. Click the "Upload Files" button or drag and drop your files
	3. View the token count results for each file and the total count

	## Technical Details

	- Token Encoding: Uses OpenAI's `tiktoken` with cl100k_base encoding
	- Document Processing:
	  - PDFs: PyPDF2 for text extraction
	  - Word: python-docx for .docx parsing
	  - PowerPoint: python-pptx for .pptx parsing
	  - Excel/CSV: pandas for structured data handling
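
One way the per-format extraction listed above could be wired together is a dispatch table keyed on file extension. The sketch below is an assumption about structure, not the code the Space runs: `EXTRACTORS`, `extract_text`, and the `_read_*` helpers are hypothetical names, and each third-party library is imported only when its format is actually requested.

```python
from pathlib import Path


def _read_txt(path: Path) -> str:
    return path.read_text(encoding="utf-8", errors="replace")


def _read_pdf(path: Path) -> str:
    from PyPDF2 import PdfReader

    reader = PdfReader(str(path))
    # extract_text() may yield nothing for scanned or image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def _read_docx(path: Path) -> str:
    from docx import Document  # provided by python-docx

    return "\n".join(p.text for p in Document(str(path)).paragraphs)


def _read_pptx(path: Path) -> str:
    from pptx import Presentation  # provided by python-pptx

    parts = []
    for slide in Presentation(str(path)).slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                parts.append(shape.text_frame.text)
    return "\n".join(parts)


def _read_table(path: Path) -> str:
    import pandas as pd

    if path.suffix.lower() == ".csv":
        frame = pd.read_csv(path)
    else:
        frame = pd.read_excel(path)  # reads only the first sheet by default
    # Serialising as CSV text is a choice made for this sketch
    return frame.to_csv(index=False)


EXTRACTORS = {
    ".txt": _read_txt,
    ".pdf": _read_pdf,
    ".docx": _read_docx,
    ".pptx": _read_pptx,
    ".csv": _read_table,
    ".xlsx": _read_table,
    ".xls": _read_table,
}


def extract_text(path) -> str:
    """Pick an extractor by (case-insensitive) extension and run it."""
    path = Path(path)
    try:
        reader = EXTRACTORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"Unsupported file type: {path.suffix!r}") from None
    return reader(path)
```

How spreadsheet rows are turned into text directly affects the token count, so the CSV serialisation here will not necessarily reproduce the numbers the Space reports. Note also that pandas reads `.xlsx` through `openpyxl` but typically needs `xlrd` for legacy `.xls` files.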

	## Installation for Local Development

	```bash
	git clone https://huggingface.co/spaces/guifav/token_tortoise
	cd token_tortoise
	pip install -r requirements.txt
	python app.py
	```

	## Requirements

	```
	gradio
	tiktoken
	pandas
	PyPDF2
	python-docx
	python-pptx
	openpyxl
	```

	## Limitations

	- Maximum file size: 100 MB per file
	- Text extraction quality depends on document formatting
	- Some complex document formatting may affect token count accuracy
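
For local batch runs, the per-file size limit can be checked before any extraction starts. The sketch below is illustrative only: `split_by_size` and `MAX_FILE_BYTES` are hypothetical names, and reading "100MB" as 100 × 1024 × 1024 bytes is an assumption, since the README does not say which definition of megabyte applies.

```python
import os

# Assumption: "100MB" interpreted as 100 MiB
MAX_FILE_BYTES = 100 * 1024 * 1024


def split_by_size(paths, limit=MAX_FILE_BYTES):
    """Partition paths into (accepted, rejected) by on-disk size in bytes."""
    accepted, rejected = [], []
    for path in paths:
        if os.path.getsize(path) <= limit:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

Files in the rejected list can then be reported to the user instead of failing partway through a batch.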

	## Contributing

	Contributions are welcome! Please feel free to submit a Pull Request.

	## License

	MIT License - see LICENSE file for details

	## About

	Created by [Guilherme Favaron](https://www.guilhermefavaron.com.br).
	Part of the [MindApps.ai](https://mindapps.ai) suite of AI tools.

	## Support

	For issues and feature requests, please visit:
	[GitHub Issues](https://github.com/GuilhermeFavaron/token-tortoise/issues)

	Contact the developer: falecom_guilhermefavaron@googlegroups.com

	More information about AI & Business: www.guilhermefavaron.com.br

	🐢 Token Tortoise: Count with confidence, process with precision.

	---