metadata
title: Transcriptinator
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false

πŸŽ™οΈ Transcriptinator

Simple, fast audio transcription powered by Google's Gemini AI.

Features

  • 🎯 Simple & Fast - Upload audio, get a transcript in ~20-50 seconds
  • 📝 Smart Summaries - Automatic summary and key ideas extraction
  • 🔒 Private - Your API key, your data - nothing stored
  • 💰 Free - Uses your own Gemini API key (free tier: 15 requests/min)
  • 📄 Markdown Output - Clean, formatted transcripts ready to download

How to Use

1. Get a Gemini API Key (Free)

  1. Go to Google AI Studio
  2. Click "Create API key"
  3. Copy the key

2. Transcribe Audio

  1. Upload your audio file (max 10 minutes)
    • Supported formats: MP3, WAV, M4A, OGG, FLAC, WEBM
  2. Paste your API key
  3. Click "🚀 Transcribe Audio"
  4. Wait ~20-50 seconds
  5. Download your transcript!
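
The upload limits above (supported formats, 10-minute maximum) can be checked before any API call is made. The following is a minimal sketch, not the Space's actual code: `check_upload`, `SUPPORTED_EXTENSIONS`, and `MAX_DURATION_SECONDS` are hypothetical names, and the duration is assumed to have been measured elsewhere.

```python
from pathlib import Path

# Assumed constants, copied from the limits stated in this README.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}
MAX_DURATION_SECONDS = 10 * 60


def check_upload(filename: str, duration_seconds: float) -> None:
    """Raise ValueError if the file breaks one of the documented limits."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported format: {extension or 'no extension'}")
    if duration_seconds > MAX_DURATION_SECONDS:
        raise ValueError("Audio file too long (maximum is 10 minutes)")


# Extension matching is case-insensitive, so this call passes silently.
check_upload("interview.MP3", duration_seconds=312.0)
```

Failing early like this avoids spending an API request on a file that would be rejected anyway.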

What You Get

Your transcript includes:

---
title: "Your Audio File"
date_processed: "2025-12-24"
summary: "Quick 2-3 sentence overview..."
key_ideas:
  - idea: "Main Point 1"
    description: "Explanation..."
  - idea: "Main Point 2"
    description: "Explanation..."
note_id: "unique-id"
---

## Key Ideas
- **Main Point 1:** Explanation...
- **Main Point 2:** Explanation...

## Full Transcription
[00:00] Speaker 1: Hello...
[00:15] Speaker 2: Welcome...
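
For readers who want to produce the same layout themselves, here is a rough sketch of how such a note could be assembled. It is an illustration only: `render_note` and its parameters are hypothetical, and using `json.dumps` to quote the front matter values is an assumption (JSON string quoting is accepted by YAML for double-quoted scalars).

```python
import json
import uuid
from datetime import date


def render_note(title, summary, key_ideas, segments, note_id=None, today=None):
    """Build a Markdown note: YAML front matter, key ideas, then the transcript.

    key_ideas: list of (idea, description) pairs.
    segments:  list of (start_seconds, speaker, text) tuples.
    """
    quote = json.dumps
    note_id = note_id or str(uuid.uuid4())
    today = today or date.today().isoformat()
    lines = [
        "---",
        f"title: {quote(title)}",
        f"date_processed: {quote(today)}",
        f"summary: {quote(summary)}",
        "key_ideas:",
    ]
    for idea, description in key_ideas:
        lines.append(f"  - idea: {quote(idea)}")
        lines.append(f"    description: {quote(description)}")
    lines += [f"note_id: {quote(note_id)}", "---", "", "## Key Ideas"]
    lines += [f"- **{idea}:** {description}" for idea, description in key_ideas]
    lines += ["", "## Full Transcription"]
    for seconds, speaker, text in segments:
        minutes, secs = divmod(int(seconds), 60)
        lines.append(f"[{minutes:02d}:{secs:02d}] {speaker}: {text}")
    return "\n".join(lines) + "\n"


print(render_note(
    "Your Audio File",
    "Quick 2-3 sentence overview...",
    [("Main Point 1", "Explanation...")],
    [(0, "Speaker 1", "Hello..."), (15, "Speaker 2", "Welcome...")],
))
```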

Limitations

  • Maximum audio length: 10 minutes (imposed by the request timeout on the free Hugging Face tier)
  • Processing time: ~20-50 seconds depending on audio length
  • API rate limits: 15 requests/minute (Gemini free tier)
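
Since each transcription makes 3 AI calls (see Technical Details), the 15 requests/minute limit works out to roughly 5 transcriptions per minute per key. A client-side sliding-window counter is one way to stay under that budget. The sketch below is an assumption about how you might do this in your own scripts, not something the Space is known to implement; `RequestBudget` is a hypothetical name.

```python
import time
from collections import deque


class RequestBudget:
    """Track request timestamps in a sliding window (client-side only)."""

    def __init__(self, max_requests=15, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()

    def seconds_until_free(self, needed=1):
        """Return how long to wait before `needed` more requests fit."""
        if needed > self.max_requests:
            raise ValueError("needed exceeds the whole budget")
        now = self.clock()
        # Forget requests that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        overflow = len(self._sent) + needed - self.max_requests
        if overflow <= 0:
            return 0.0
        # The overflow-th oldest request has to expire first.
        return self._sent[overflow - 1] + self.window_seconds - now

    def record(self, count=1):
        """Register `count` requests as sent right now."""
        now = self.clock()
        self._sent.extend([now] * count)


budget = RequestBudget()
wait = budget.seconds_until_free(needed=3)  # 3 calls per transcription
if wait == 0.0:
    budget.record(3)
```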

Privacy & Security

✅ Your API key is never stored - Used only for the current request
✅ Audio files are temporary - Deleted immediately after processing
✅ No data collection - Everything runs through your own API key
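
The "deleted immediately after processing" behaviour is commonly implemented with a `try`/`finally` around a temporary file. The following is a sketch of that pattern under stated assumptions: `process_with_temp_copy` and `handler` are hypothetical names, and the Space's real cleanup code may differ.

```python
import os
import tempfile


def process_with_temp_copy(audio_bytes, suffix, handler):
    """Write audio to a temp file, run handler(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(audio_bytes)
        return handler(path)
    finally:
        # Runs on success and on error, so the copy never outlives the request.
        if os.path.exists(path):
            os.remove(path)


# Usage: the handler sees a real file, which is gone once the call returns.
size = process_with_temp_copy(b"fake audio bytes", ".wav", os.path.getsize)
```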

Technical Details

AI Calls per transcription: 3

  1. Transcription (with timestamps and speakers)
  2. Summary generation
  3. Key ideas extraction
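
The three calls above can be sketched as a small pipeline. To keep the example independent of any particular Gemini client library, the model call is passed in as a `generate(prompt, audio_path)` callable; that callable, the prompts, and the choice to feed the transcript text (rather than the audio) into calls 2 and 3 are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical signature: generate(prompt, audio_path) -> model reply as text.
Generate = Callable[[str, Optional[str]], str]


def transcribe_pipeline(audio_path: str, generate: Generate) -> dict:
    """Run the three AI calls in order and collect their replies."""
    # Call 1: transcription with timestamps and speaker labels.
    transcript = generate(
        "Transcribe this audio with [MM:SS] timestamps and speaker labels.",
        audio_path,
    )
    # Call 2: summary, produced here from the transcript text only.
    summary = generate(f"Summarize in 2-3 sentences:\n\n{transcript}", None)
    # Call 3: key ideas, again from the transcript text.
    key_ideas = generate(f"List the key ideas, one per line:\n\n{transcript}", None)
    return {"transcript": transcript, "summary": summary, "key_ideas": key_ideas}


# Usage with a stand-in for the real model call:
def fake_generate(prompt: str, audio_path: Optional[str]) -> str:
    return f"(reply to: {prompt[:20]}...)"


result = transcribe_pipeline("meeting.mp3", fake_generate)
```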

Processing time estimate:

  • 2-minute audio: ~22 seconds
  • 5-minute audio: ~35 seconds
  • 10-minute audio: ~50 seconds
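
For lengths between the listed points, a rough estimate can be obtained by linear interpolation. This is only a sketch: the three data points come from the list above, while the interpolation itself and the clamping outside the 2-10 minute range are assumptions, and `estimate_processing_seconds` is a hypothetical helper.

```python
from bisect import bisect_right

# (audio minutes, processing seconds), copied from the estimates above.
ESTIMATES = [(2, 22), (5, 35), (10, 50)]


def estimate_processing_seconds(audio_minutes: float) -> float:
    """Linearly interpolate between the listed points; clamp outside them."""
    xs = [x for x, _ in ESTIMATES]
    if audio_minutes <= xs[0]:
        return float(ESTIMATES[0][1])
    if audio_minutes >= xs[-1]:
        return float(ESTIMATES[-1][1])
    i = bisect_right(xs, audio_minutes)
    (x0, y0), (x1, y1) = ESTIMATES[i - 1], ESTIMATES[i]
    return y0 + (y1 - y0) * (audio_minutes - x0) / (x1 - x0)


print(estimate_processing_seconds(7.5))  # halfway between the 5- and 10-minute points
```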

Troubleshooting

"Invalid API key"

"Audio file too long"

  • The maximum is 10 minutes on the free tier
  • Split longer files or use the CLI version

"Processing timeout"

  • Audio might be too long or corrupted
  • Try with a shorter, clearer audio file

Local Installation

Want to transcribe audio of any length? Clone the full version and run it locally:

```bash
git clone https://github.com/YOUR_USERNAME/transcriptinator
cd transcriptinator
pip install -r requirements.txt
python audio_process_and_transcribe.py your_audio_folder -o output_folder
```


## Credits

Built with:
- [Gradio](https://gradio.app/) - Web interface
- [Google Gemini](https://ai.google.dev/) - AI transcription
- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting

## License

MIT License - Feel free to use and modify!