title: Token Tortoise
emoji: 🐢
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.10.0
app_file: app.py
pinned: false
license: mit
short_description: Bulk Document Token Counter
Token Tortoise - Bulk Document Token Counter
A tool for counting tokens across many documents of different types in a single batch. Designed for content creators, developers, and AI practitioners who need to track token counts when processing text at scale.
Features
Multi-Format Support: Process multiple files simultaneously in various formats:
- PDF (.pdf)
- Microsoft Word (.docx)
- PowerPoint (.pptx)
- Excel (.xlsx, .xls)
- CSV (.csv)
- Text files (.txt)
Bulk Processing: Upload multiple files at once for efficient token counting
Accurate Counting: Uses the tiktoken encoder (cl100k_base), which gives exact counts for models that use that encoding, such as GPT-4 and GPT-3.5-turbo
Clear Results: Get detailed token counts per file and a total count
User-Friendly Interface: Clean, intuitive design with instant results
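The per-file and total counts described above can be sketched as follows. This is a minimal illustration assuming tiktoken's cl100k_base encoding, not the app's actual code; count_tokens and summarize_counts are hypothetical helper names.

```python
from typing import Callable, Dict, Tuple


def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens in `text` under a tiktoken encoding."""
    import tiktoken  # imported lazily so summarize_counts works with any counter

    enc = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() encodes strings such as "<|endoftext|>" as plain
    # text instead of raising, which matters for arbitrary uploaded documents.
    return len(enc.encode(text, disallowed_special=()))


def summarize_counts(
    texts: Dict[str, str],
    counter: Callable[[str], int] = count_tokens,
) -> Tuple[Dict[str, int], int]:
    """Count tokens per file name and return (per_file_counts, total)."""
    per_file = {name: counter(text) for name, text in texts.items()}
    return per_file, sum(per_file.values())
```

Passing a dictionary of file names to extracted text returns the per-file counts and their sum; the counter parameter makes it easy to swap in a different tokenizer if you target a model that does not use cl100k_base.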
Usage
- Visit Token Tortoise on Hugging Face
- Click the "Upload Files" button or drag and drop your files
- View the token count results for each file and the total count
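A minimal sketch of how an upload-and-count interface like this could be wired in Gradio, assuming Gradio 4 or later (where gr.File with file_count="multiple" passes a list of file paths to the handler). It only handles plain-text files; format_report, build_demo, and handle_upload are hypothetical names, not taken from the project's app.py.

```python
import os
from typing import Dict, List, Optional


def format_report(per_file: Dict[str, int]) -> str:
    """Render one line per file plus a total line."""
    lines = [f"{name}: {count:,} tokens" for name, count in per_file.items()]
    lines.append(f"Total: {sum(per_file.values()):,} tokens")
    return "\n".join(lines)


def build_demo():
    """Build a Gradio interface that counts cl100k_base tokens in uploaded .txt files."""
    import gradio as gr  # imported lazily so format_report has no heavy dependencies
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    def handle_upload(paths: Optional[List[str]]) -> str:
        if not paths:
            return "No files uploaded."
        per_file = {}
        for path in paths:
            # Plain text only in this sketch; other formats need a text extractor first
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            per_file[os.path.basename(path)] = len(enc.encode(text, disallowed_special=()))
        return format_report(per_file)

    return gr.Interface(
        fn=handle_upload,
        inputs=gr.File(file_count="multiple", label="Upload Files"),
        outputs=gr.Textbox(label="Token counts"),
        title="Token Tortoise",
    )
```

Calling build_demo().launch() starts a local server. Keeping the report formatting in a separate function lets it be tested without Gradio or tiktoken installed.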
Technical Details
- Token Encoding: Uses OpenAI's tiktoken with cl100k_base encoding
- Document Processing:
  - PDFs: PyPDF2 for text extraction
  - Word: python-docx for .docx parsing
  - PowerPoint: python-pptx for .pptx parsing
  - Excel/CSV: pandas for structured data handling
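The per-format parsing listed above could look roughly like this. It is a sketch assuming PyPDF2 2.x or later (for PdfReader), python-docx, python-pptx, and pandas; extract_text is a hypothetical helper, and serializing spreadsheets back to CSV text is an illustrative choice that affects the resulting count. Reading legacy .xls files through pandas also requires the xlrd package, which is not in the requirements list.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a supported document, chosen by file extension."""
    suffix = Path(path).suffix.lower()

    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # PyPDF2 >= 2.0

        reader = PdfReader(path)
        # extract_text() can return an empty string for image-only pages
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if suffix == ".docx":
        from docx import Document  # provided by python-docx

        # Body paragraphs only; text inside tables is not included here
        return "\n".join(p.text for p in Document(path).paragraphs)

    if suffix == ".pptx":
        from pptx import Presentation  # provided by python-pptx

        parts = []
        for slide in Presentation(path).slides:
            for shape in slide.shapes:  # top-level shapes only
                if shape.has_text_frame:
                    parts.append(shape.text_frame.text)
        return "\n".join(parts)

    if suffix in (".csv", ".xlsx", ".xls"):
        import pandas as pd

        if suffix == ".csv":
            frames = [pd.read_csv(path)]
        else:
            # sheet_name=None loads every sheet as {sheet_name: DataFrame}
            frames = list(pd.read_excel(path, sheet_name=None).values())
        # Serialize each table back to CSV text so cell contents can be tokenized
        return "\n".join(df.to_csv(index=False) for df in frames)

    raise ValueError(f"Unsupported file type: {suffix or path}")
```

Dispatching on the extension keeps each parser import local, so a missing optional library only fails for the format that needs it.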
Installation for Local Development
git clone https://huggingface.co/spaces/guifav/token_tortoise
cd token_tortoise
pip install -r requirements.txt
python app.py
Requirements
gradio
tiktoken
pandas
PyPDF2
python-docx
python-pptx
openpyxl
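Several of these packages are imported under a different name than the one pip installs (python-docx is imported as docx, python-pptx as pptx). A small, hypothetical check like the following can confirm the environment after pip install -r requirements.txt; missing_packages and REQUIRED_MODULES are illustrative names, not part of the project.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in import statements
REQUIRED_MODULES: Dict[str, str] = {
    "gradio": "gradio",
    "tiktoken": "tiktoken",
    "pandas": "pandas",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose modules cannot be found."""
    return [pkg for pkg, module in required.items() if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print(("Missing: " + ", ".join(missing)) if missing else "All requirements found.")
```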
Limitations
- Maximum file size: 100MB per file
- Text extraction quality depends on document formatting
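- Because tokens are counted on the extracted text, complex formatting (tables, multi-column layouts, embedded images) can make the count differ from what the document appears to contain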
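A pre-check that enforces the per-file limit before counting could be sketched like this. It assumes "100MB" means 100 × 1024 × 1024 bytes, which the README does not specify; partition_by_size and MAX_FILE_BYTES are hypothetical names.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: "100MB" interpreted as 100 MiB; the README does not say which.
MAX_FILE_BYTES = 100 * 1024 * 1024


def partition_by_size(
    paths: Iterable[str], limit: int = MAX_FILE_BYTES
) -> Tuple[List[str], List[str]]:
    """Split paths into (accepted, rejected) by on-disk size in bytes."""
    accepted: List[str] = []
    rejected: List[str] = []
    for path in paths:
        if os.path.getsize(path) <= limit:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

Checking sizes up front avoids loading an oversized file into a parser only to reject it afterwards.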
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details
About
Created by Guilherme Favaron. Part of the MindApps.ai suite of AI tools.
Support
For issues and feature requests, please visit: GitHub Issues
Contact the developer: falecom_guilhermefavaron@googlegroups.com
More information about AI & Business: www.guilhermefavaron.com.br
🐢 Token Tortoise: Count with confidence, process with precision.