---
title: Token Tortoise
emoji: 🐢
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.10.0
app_file: app.py
pinned: false
license: mit
short_description: Bulk Document Token Counter
---
# Token Tortoise - Bulk Document Token Counter
A powerful and reliable tool for counting tokens across multiple document types simultaneously. Perfect for content creators, developers, and AI practitioners who need to manage token counts for large-scale text processing.
## Features
- **Multi-Format Support**: Process multiple files simultaneously in various formats:
  - PDF (.pdf)
  - Microsoft Word (.docx)
  - PowerPoint (.pptx)
  - Excel (.xlsx, .xls)
  - CSV (.csv)
  - Text files (.txt)
- **Bulk Processing**: Upload multiple files at once for efficient token counting
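- **Accurate Counting**: Uses the `tiktoken` encoder with the `cl100k_base` encoding, so counts match exactly for models that use that encoding (such as GPT-4 and GPT-3.5-turbo); models with a different tokenizer may count the same text differently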
- **Clear Results**: Get detailed token counts per file and total count
- **User-Friendly Interface**: Clean, intuitive design with instant results
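The per-file and total counting described above can be sketched in Python as follows. This is a minimal sketch, not the app's actual code: it assumes text has already been extracted from each file, and the names `Encoder`, `make_cl100k_encoder`, and `count_tokens` are hypothetical. It uses tiktoken's `encode_ordinary` so that strings such as `<|endoftext|>` inside a document are counted as plain text instead of raising an error.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical alias: any function that turns text into a sequence of tokens.
Encoder = Callable[[str], Sequence[object]]


def make_cl100k_encoder() -> Encoder:
    """Return tiktoken's cl100k_base encoder (requires `pip install tiktoken`)."""
    import tiktoken  # imported lazily so another tokenizer can be used without it

    enc = tiktoken.get_encoding("cl100k_base")
    # encode_ordinary treats special-token strings like "<|endoftext|>" as
    # ordinary text rather than rejecting them.
    return enc.encode_ordinary


def count_tokens(
    documents: Dict[str, str], encode: Optional[Encoder] = None
) -> Tuple[Dict[str, int], int]:
    """Return (tokens per file, total tokens) for already-extracted texts.

    `documents` maps each file name to its extracted text.
    """
    if encode is None:
        encode = make_cl100k_encoder()
    per_file = {name: len(encode(text)) for name, text in documents.items()}
    return per_file, sum(per_file.values())
```

For example, `count_tokens({"a.txt": text_a, "b.txt": text_b})` returns a dict of per-file counts plus their sum. The `encode` parameter exists so the counting logic can be exercised with a stand-in tokenizer (for instance `str.split`, which gives a rough word count) when tiktoken is not installed.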
## Usage
1. Visit [Token Tortoise on Hugging Face](https://huggingface.co/spaces/guifav/token_tortoise)
2. Click the "Upload Files" button or drag and drop your files
3. View the token count results for each file and the total count
## Technical Details
- **Token Encoding**: Uses OpenAI's `tiktoken` with cl100k_base encoding
- **Document Processing**:
  - PDFs: PyPDF2 for text extraction
  - Word: python-docx for .docx parsing
  - PowerPoint: python-pptx for .pptx parsing
  - Excel/CSV: pandas for structured data handling (openpyxl is the engine for .xlsx files; legacy .xls files additionally need the xlrd package)
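A rough sketch of how extension-based extraction with these libraries might look is shown below. It is illustrative only: `extract_text` is a hypothetical helper, and flattening spreadsheets to CSV text is an assumption about serialization that directly affects the resulting token count. Each third-party library is imported only inside its own branch, so plain-text files work with the standard library alone.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Extract plain text from a file based on its extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="replace")
    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # PyPDF2 2.x/3.x API

        reader = PdfReader(path)
        # Image-only (scanned) pages have no text layer and yield an empty string.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if suffix == ".docx":
        import docx  # installed by the python-docx package

        document = docx.Document(path)
        # Only body paragraphs are read; tables are skipped in this sketch.
        return "\n".join(p.text for p in document.paragraphs)
    if suffix == ".pptx":
        from pptx import Presentation  # installed by the python-pptx package

        parts = []
        for slide in Presentation(path).slides:
            for shape in slide.shapes:
                # Group shapes and tables are not traversed in this sketch.
                if shape.has_text_frame:
                    parts.append(shape.text_frame.text)
        return "\n".join(parts)
    if suffix == ".csv":
        import pandas as pd

        # Assumption: the table is re-serialized as CSV text before counting.
        return pd.read_csv(path).to_csv(index=False)
    if suffix in (".xlsx", ".xls"):
        import pandas as pd

        # sheet_name=None loads every sheet into a dict of DataFrames.
        # .xlsx needs openpyxl; .xls needs xlrd.
        sheets = pd.read_excel(path, sheet_name=None)
        return "\n".join(df.to_csv(index=False) for df in sheets.values())
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Dispatching on the lower-cased extension keeps the check case-insensitive (`REPORT.PDF` is handled like `report.pdf`), and unsupported types fail loudly rather than silently counting zero tokens.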
## Installation for Local Development
```bash
git clone https://huggingface.co/spaces/guifav/token_tortoise
cd token_tortoise
pip install -r requirements.txt
python app.py
```
## Requirements
```
gradio
tiktoken
pandas
PyPDF2
python-docx
python-pptx
openpyxl
```
## Limitations
- Maximum file size: 100MB per file
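- Text extraction quality depends on document formatting: complex layouts such as multi-column pages, nested tables, or text embedded in images may be extracted incompletely, which affects the token count
- Scanned PDFs without a text layer yield little or no text, because PyPDF2 extracts embedded text and does not perform OCR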
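To avoid uploading files that would exceed the 100MB limit, you could check sizes locally first. This is a small sketch with a hypothetical helper, `split_by_size`; it assumes "100MB" means 100,000,000 bytes (the stricter of the decimal and binary readings), which may differ from how the Space enforces the limit.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: "100MB" is read as 100,000,000 bytes, the stricter of the
# decimal (MB) and binary (MiB) interpretations.
MAX_FILE_BYTES = 100_000_000


def split_by_size(
    paths: Iterable[str], max_bytes: int = MAX_FILE_BYTES
) -> Tuple[List[str], List[str]]:
    """Partition paths into (within_limit, too_large) based on size on disk."""
    within_limit: List[str] = []
    too_large: List[str] = []
    for path in paths:
        if os.path.getsize(path) <= max_bytes:
            within_limit.append(path)
        else:
            too_large.append(path)
    return within_limit, too_large
```

Files in the second list can be split into smaller parts before uploading.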
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
MIT License - see LICENSE file for details
## About
Created by [Guilherme Favaron](https://www.guilhermefavaron.com.br), as part of the [MindApps.ai](https://mindapps.ai) suite of AI tools.
## Support
For issues and feature requests, please visit [GitHub Issues](https://github.com/GuilhermeFavaron/token-tortoise/issues).

- Contact the developer: falecom_guilhermefavaron@googlegroups.com
- More information about AI & Business: [www.guilhermefavaron.com.br](https://www.guilhermefavaron.com.br)
🐢 Token Tortoise: Count with confidence, process with precision.