| title: Token Tortoise | |
| emoji: 🐢 | |
| colorFrom: pink | |
| colorTo: yellow | |
| sdk: gradio | |
| sdk_version: 5.10.0 | |
| app_file: app.py | |
| pinned: false | |
| license: mit | |
| short_description: Bulk Document Token Counter | |
| # Token Tortoise - Bulk Document Token Counter | |
| A tool for counting tokens across many documents of different types in a single run. It is aimed at content creators, developers, and AI practitioners who need to manage token counts for large-scale text processing. | |
| ## Features | |
| - **Multi-Format Support**: Process multiple files simultaneously in various formats: | |
| - PDF (.pdf) | |
| - Microsoft Word (.docx) | |
| - PowerPoint (.pptx) | |
| - Excel (.xlsx, .xls) | |
| - CSV (.csv) | |
| - Text files (.txt) | |
| - **Bulk Processing**: Upload multiple files at once for efficient token counting | |
| - **Accurate Counting**: Uses the `tiktoken` `cl100k_base` encoding, so counts are exact for models that use this encoding and approximate for models that use a different one | |
| - **Clear Results**: Get detailed token counts per file and total count | |
| - **User-Friendly Interface**: Clean, intuitive design with instant results | |
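The per-file and total counts described above could be computed roughly as follows. This is a minimal sketch, not the code in `app.py`: the names `get_cl100k_encoder` and `count_tokens_per_file` are hypothetical, and the input is assumed to be text that has already been extracted from each file.

```python
from typing import Callable, Dict, Optional, Tuple


def get_cl100k_encoder() -> Callable[[str], list]:
    """Return a function that encodes text with tiktoken's cl100k_base."""
    import tiktoken  # imported lazily so the helper below also works with a custom encoder

    encoding = tiktoken.get_encoding("cl100k_base")
    # disallowed_special=() makes strings such as "<|endoftext|>" count as plain text
    # instead of raising an error.
    return lambda text: encoding.encode(text, disallowed_special=())


def count_tokens_per_file(
    texts: Dict[str, str],
    encode: Optional[Callable[[str], list]] = None,
) -> Tuple[Dict[str, int], int]:
    """Return ({file name: token count}, total token count).

    `texts` maps a file name to its already-extracted text (an assumption
    of this sketch). `encode` defaults to the cl100k_base encoder.
    """
    if encode is None:
        encode = get_cl100k_encoder()
    per_file = {name: len(encode(text)) for name, text in texts.items()}
    return per_file, sum(per_file.values())
```

Calling `count_tokens_per_file({"notes.txt": "some text"})` uses `cl100k_base`; passing a different `encode` callable makes it easy to compare against another tokenizer.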
| ## Usage | |
| 1. Visit [Token Tortoise on Hugging Face](https://huggingface.co/spaces/guifav/token_tortoise) | |
| 2. Click the "Upload Files" button or drag and drop your files | |
| 3. View the token count results for each file and the total count | |
| ## Technical Details | |
| - **Token Encoding**: Uses OpenAI's `tiktoken` with cl100k_base encoding | |
| - **Document Processing**: | |
| - PDFs: PyPDF2 for text extraction | |
| - Word: python-docx for .docx parsing | |
| - PowerPoint: python-pptx for .pptx parsing | |
| - Excel/CSV: pandas for structured data handling | |
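The library choices listed above suggest a dispatch on file extension. The sketch below shows one way this might look; `extract_text` is a hypothetical helper, not the function used in `app.py`, and it assumes PyPDF2 2.x or later (which provides `PdfReader`). Reading legacy `.xls` files through pandas additionally needs `xlrd`, which is not in the requirements list.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return the plain text of a supported document (illustrative sketch)."""
    suffix = Path(path).suffix.lower()

    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8", errors="replace")

    if suffix == ".pdf":
        from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

        reader = PdfReader(path)
        # extract_text() can return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in reader.pages)

    if suffix == ".docx":
        from docx import Document  # provided by python-docx

        return "\n".join(p.text for p in Document(path).paragraphs)

    if suffix == ".pptx":
        from pptx import Presentation  # provided by python-pptx

        return "\n".join(
            shape.text_frame.text
            for slide in Presentation(path).slides
            for shape in slide.shapes
            if shape.has_text_frame
        )

    if suffix == ".csv":
        import pandas as pd

        return pd.read_csv(path).to_string(index=False)

    if suffix in (".xlsx", ".xls"):
        import pandas as pd

        # sheet_name=None loads every sheet as {sheet name: DataFrame}
        sheets = pd.read_excel(path, sheet_name=None)
        return "\n".join(df.to_string(index=False) for df in sheets.values())

    raise ValueError(f"Unsupported file type: {suffix}")
```

How tables are flattened to text (here via `DataFrame.to_string`) changes the resulting token count, so the numbers from this sketch may differ from what the hosted app reports.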
| ## Installation for Local Development | |
| ```bash | |
| git clone https://huggingface.co/spaces/guifav/token_tortoise | |
| cd token_tortoise | |
| pip install -r requirements.txt | |
| python app.py | |
| ``` | |
| ## Requirements | |
| ``` | |
| gradio | |
| tiktoken | |
| pandas | |
| PyPDF2 | |
| python-docx | |
| python-pptx | |
| openpyxl | |
| ``` | |
| ## Limitations | |
| - Maximum file size: 100MB per file | |
| - Text extraction quality depends on document formatting | |
| - Some complex document formatting may affect token count accuracy | |
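When processing files locally, the size limit above can be checked before extraction. The following is a small sketch under stated assumptions: `split_by_size` and `MAX_FILE_BYTES` are hypothetical names, and "100MB" is interpreted as 100 MiB.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: "100MB" is read as 100 MiB (104,857,600 bytes).
MAX_FILE_BYTES = 100 * 1024 * 1024


def split_by_size(
    paths: Iterable[str], max_bytes: int = MAX_FILE_BYTES
) -> Tuple[List[str], List[str]]:
    """Split paths into (within the limit, over the limit), preserving order."""
    accepted: List[str] = []
    rejected: List[str] = []
    for path in paths:
        if os.path.getsize(path) <= max_bytes:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```

Reporting the rejected files separately, rather than failing the whole batch, keeps a single oversized upload from blocking the rest.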
| ## Contributing | |
| Contributions are welcome! Please feel free to submit a Pull Request. | |
| ## License | |
| MIT License - see LICENSE file for details | |
| ## About | |
| Created by [Guilherme Favaron](https://www.guilhermefavaron.com.br) | |
| Part of the [MindApps.ai](https://mindapps.ai) suite of AI tools | |
| ## Support | |
| For issues and feature requests, please visit: | |
| [GitHub Issues](https://github.com/GuilhermeFavaron/token-tortoise/issues) | |
| Contact the developer: falecom_guilhermefavaron@googlegroups.com | |
| More information about AI & Business: www.guilhermefavaron.com.br | |
| 🐢 Token Tortoise: Count with confidence, process with precision. | |
| --- |