---
title: Token Tortoise
emoji: 🐢
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.10.0
app_file: app.py
pinned: false
license: mit
short_description: Bulk Document Token Counter
---

# Token Tortoise - Bulk Document Token Counter

A tool for counting tokens across multiple documents of different types in a single pass. It is aimed at content creators, developers, and AI practitioners who need to track token counts for large-scale text processing.

## Features

- **Multi-Format Support**: Process multiple files simultaneously in various formats:
  - PDF (.pdf)
  - Microsoft Word (.docx)
  - PowerPoint (.pptx)
  - Excel (.xlsx, .xls)
  - CSV (.csv)
  - Text files (.txt)
- **Bulk Processing**: Upload multiple files at once for efficient token counting
- **Accurate Counting**: Uses the `tiktoken` encoder (cl100k_base) for precise token counting
- **Clear Results**: Get a token count for each file plus the total across all files
- **User-Friendly Interface**: Clean, intuitive design with instant results

## Usage

1. Visit [Token Tortoise on Hugging Face](https://huggingface.co/spaces/guifav/token_tortoise)
2. Click the "Upload Files" button or drag and drop your files
3. View the token count for each file and the total count

## Technical Details

- **Token Encoding**: Uses OpenAI's `tiktoken` with the cl100k_base encoding
- **Document Processing**:
  - PDFs: PyPDF2 for text extraction
  - Word: python-docx for .docx parsing
  - PowerPoint: python-pptx for .pptx parsing
  - Excel/CSV: pandas for structured data handling

## Installation for Local Development

```bash
git clone https://huggingface.co/spaces/guifav/token_tortoise
cd token_tortoise
pip install -r requirements.txt
python app.py
```

## Requirements

```
gradio
tiktoken
pandas
PyPDF2
python-docx
python-pptx
openpyxl
```

## Limitations

- Maximum file size: 100MB per file
- Text extraction quality depends on document formatting
- Complex document formatting may affect token count accuracy

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## License

MIT License - see LICENSE file for details

## About

Created by [Guilherme Favaron](https://www.guilhermefavaron.com.br)

Part of the [MindApps.ai](https://mindapps.ai) suite of AI tools

## Support

For issues and feature requests, please visit: [GitHub Issues](https://github.com/GuilhermeFavaron/token-tortoise/issues)

Meet the developer: falecom_guilhermefavaron@googlegroups.com

More information about AI & Business: www.guilhermefavaron.com.br

🐢 Token Tortoise: Count with confidence, process with precision.

---
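
## Example: Counting Tokens Locally

The counting step described under Technical Details can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: the function names `tiktoken_encoder` and `count_tokens` are hypothetical, and the sketch assumes the text has already been extracted from each document. Only `tiktoken.get_encoding` and `Encoding.encode` are real library calls.

```python
from typing import Callable, Dict, Sequence, Tuple


def tiktoken_encoder(encoding_name: str = "cl100k_base") -> Callable[[str], Sequence]:
    """Return an encode function backed by tiktoken (hypothetical helper).

    tiktoken is imported lazily so the counting logic below can also be
    exercised with any other tokenizer function.
    """
    import tiktoken  # third-party: pip install tiktoken

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() makes strings such as "<|endoftext|>" that occur
    # in a document count as ordinary text instead of raising ValueError.
    return lambda text: encoding.encode(text, disallowed_special=())


def count_tokens(
    texts: Dict[str, str],
    encode: Callable[[str], Sequence],
) -> Tuple[Dict[str, int], int]:
    """Return (tokens per file, total tokens) for already-extracted text.

    `texts` maps a file name to the text extracted from that file.
    """
    per_file = {name: len(encode(text)) for name, text in texts.items()}
    return per_file, sum(per_file.values())
```

A call might look like `count_tokens({"notes.txt": open("notes.txt").read()}, tiktoken_encoder())`. Passing the encoder as a parameter is a design choice made for this sketch: it keeps the per-file and total bookkeeping independent of the tokenizer, so it can be checked without installing `tiktoken`.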