
title: Token Tortoise
emoji: 🐢
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.10.0
app_file: app.py
pinned: false
license: mit
short_description: Bulk Document Token Counter

Token Tortoise - Bulk Document Token Counter

Token Tortoise counts tokens across many documents of different types in a single pass. It is aimed at content creators, developers, and AI practitioners who need token counts for large batches of text.

Features

  • Multi-Format Support: Process multiple files simultaneously in various formats:

    • PDF (.pdf)
    • Microsoft Word (.docx)
    • PowerPoint (.pptx)
    • Excel (.xlsx, .xls)
    • CSV (.csv)
    • Text files (.txt)
  • Bulk Processing: Upload multiple files at once for efficient token counting

  • Accurate Counting: Uses the tiktoken cl100k_base encoder, so counts are exact for models that tokenize with cl100k_base (such as GPT-4 and GPT-3.5 Turbo)

  • Clear Results: Get detailed token counts per file and total count

  • User-Friendly Interface: Clean, intuitive design with instant results
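
The counting step described in the feature list can be sketched as follows. This is a minimal illustration, not the code in app.py: the helper names `cl100k_encoder` and `count_tokens` are hypothetical, and passing `disallowed_special=()` (so that text such as `<|endoftext|>` inside a document is counted as ordinary text instead of raising an error) is an assumption about the desired behavior.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple


def cl100k_encoder() -> Callable[[str], Sequence]:
    """Return an encode function backed by tiktoken's cl100k_base encoding."""
    import tiktoken  # third-party, imported lazily; listed under Requirements

    enc = tiktoken.get_encoding("cl100k_base")
    # Assumption: special-token strings found in documents are plain text.
    return lambda text: enc.encode(text, disallowed_special=())


def count_tokens(
    texts: Dict[str, str],
    encode: Optional[Callable[[str], Sequence]] = None,
) -> Tuple[Dict[str, int], int]:
    """Return (tokens per file name, total tokens) for already-extracted text."""
    if encode is None:
        encode = cl100k_encoder()
    per_file = {name: len(encode(text)) for name, text in texts.items()}
    return per_file, sum(per_file.values())
```

Calling `count_tokens({"notes.txt": "some text"})` uses the real encoder; passing a different `encode` callable makes it easy to swap encodings or to test the aggregation without tiktoken installed.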

Usage

  1. Visit Token Tortoise on Hugging Face (https://huggingface.co/spaces/guifav/token_tortoise)
  2. Click the "Upload Files" button or drag and drop your files
  3. View the token count results for each file and the total count
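
For readers curious how the upload-and-report flow above might be wired in Gradio, here is a rough sketch. It is an assumption about the shape of the app, not a copy of app.py: `format_results`, `build_demo`, and the `count_file` callback (a function that takes a file path and returns its token count) are hypothetical names.

```python
import os
from typing import Callable, List, Optional


def format_results(paths: Optional[List[str]], count_file: Callable[[str], int]) -> str:
    """Build a report with one line per uploaded file plus a total line."""
    lines = []
    total = 0
    for path in paths or []:  # Gradio passes None when nothing is uploaded
        n = count_file(path)
        total += n
        lines.append(f"{os.path.basename(path)}: {n} tokens")
    lines.append(f"Total: {total} tokens")
    return "\n".join(lines)


def build_demo(count_file: Callable[[str], int]):
    """Create a Gradio interface that accepts several files at once."""
    import gradio as gr  # third-party, imported lazily; listed under Requirements

    return gr.Interface(
        fn=lambda paths: format_results(paths, count_file),
        inputs=gr.File(file_count="multiple", label="Upload Files"),
        outputs=gr.Textbox(label="Token counts"),
        title="Token Tortoise",
    )
```

With a real `count_file` in hand, `build_demo(count_file).launch()` would start the interface locally.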

Technical Details

  • Token Encoding: Uses OpenAI's tiktoken with cl100k_base encoding
  • Document Processing:
    • PDFs: PyPDF2 for text extraction
    • Word: python-docx for .docx parsing
    • PowerPoint: python-pptx for .pptx parsing
    • Excel/CSV: pandas for structured data handling (openpyxl reads .xlsx files)
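
One way the per-format extraction listed above could be dispatched by file extension is sketched below. The function name `extract_text` is hypothetical, and several choices are assumptions, not details confirmed by this README: a recent PyPDF2 that provides `PdfReader`, reading only paragraph text from Word files, and serializing spreadsheet rows with `DataFrame.to_string`. How tables are turned into text directly affects the resulting token count.

```python
import os


def extract_text(path: str) -> str:
    """Return the plain text of a supported document, chosen by extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    if ext == ".pdf":
        from PyPDF2 import PdfReader  # assumes a PyPDF2 release with PdfReader

        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if ext == ".docx":
        from docx import Document  # provided by the python-docx package

        # Paragraph text only; text inside tables is not included here.
        return "\n".join(p.text for p in Document(path).paragraphs)
    if ext == ".pptx":
        from pptx import Presentation  # provided by the python-pptx package

        parts = []
        for slide in Presentation(path).slides:
            for shape in slide.shapes:
                if shape.has_text_frame:
                    parts.append(shape.text_frame.text)
        return "\n".join(parts)
    if ext in (".xlsx", ".xls"):
        import pandas as pd

        # sheet_name=None loads every sheet into a dict of DataFrames.
        sheets = pd.read_excel(path, sheet_name=None)
        return "\n".join(df.to_string(index=False) for df in sheets.values())
    if ext == ".csv":
        import pandas as pd

        return pd.read_csv(path).to_string(index=False)
    raise ValueError(f"Unsupported file type: {ext or path}")
```

The third-party imports are deferred to the branch that needs them, so a missing optional parser only affects its own file type.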

Installation for Local Development

git clone https://huggingface.co/spaces/guifav/token_tortoise
cd token_tortoise
pip install -r requirements.txt
python app.py

Requirements

gradio
tiktoken
pandas
PyPDF2
python-docx
python-pptx
openpyxl
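
A quick way to confirm that the packages above are importable before launching is sketched below. Note that two of them install under a different module name (python-docx as `docx`, python-pptx as `pptx`); the helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec
from typing import Dict, List

# Maps each pip package name to the top-level module it installs.
REQUIRED_MODULES: Dict[str, str] = {
    "gradio": "gradio",
    "tiktoken": "tiktoken",
    "pandas": "pandas",
    "PyPDF2": "PyPDF2",
    "python-docx": "docx",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```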

Limitations

  • Maximum file size: 100MB per file
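  • Text extraction quality depends on document formatting; scanned PDFs without a text layer yield little or no text
  • Complex layouts (tables, multi-column pages, embedded objects) may extract incompletely, so token counts can differ from the visible content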
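
The size limit can be checked locally before uploading. The sketch below is illustrative only: `oversized_files` is a hypothetical helper, and reading "100MB" as 100 × 1024 × 1024 bytes is an assumption, since the README does not say whether the limit is decimal or binary.

```python
import os
from typing import Iterable, List

# Assumption: "100MB" is interpreted as 100 MiB.
MAX_FILE_BYTES = 100 * 1024 * 1024


def oversized_files(paths: Iterable[str], limit: int = MAX_FILE_BYTES) -> List[str]:
    """Return the paths whose size on disk exceeds the limit, in input order."""
    return [p for p in paths if os.path.getsize(p) > limit]
```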

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details

About

Created by Guilherme Favaron. Part of the MindApps.ai suite of AI tools.

Support

For issues and feature requests, please visit: GitHub Issues

Meet the developer: falecom_guilhermefavaron@googlegroups.com

More information about AI & Business: www.guilhermefavaron.com.br

🐢 Token Tortoise: Count with confidence, process with precision.