	---
	title: Transcriptinator
	emoji: 🎙️
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 6.12.0
	app_file: app.py
	pinned: false
	---

	# 🎙️ Transcriptinator

	Simple, fast audio transcription powered by Google's Gemini AI.

	## Features

	- 🎯 Simple & Fast - Upload audio, get transcript in ~20-50 seconds
	- 📝 Smart Summaries - Automatic summary and key ideas extraction
	- 🔒 Private - Your API key, your data - nothing stored
	- 💰 Free - Uses your own Gemini API key (free tier: 15 requests/min)
	- 📄 Markdown Output - Clean, formatted transcripts ready to download

	## How to Use

	### 1. Get a Gemini API Key (Free)

	1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
	2. Click "Create API key"
	3. Copy the key

	### 2. Transcribe Audio

	1. Upload your audio file (max 10 minutes)
	   - Supported formats: MP3, WAV, M4A, OGG, FLAC, WEBM
	2. Paste your API key
	3. Click "🚀 Transcribe Audio"
	4. Wait ~20-50 seconds
	5. Download your transcript!
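The upload constraints in the steps above (supported formats, 10-minute maximum) can be checked before any API call is made. The following is a minimal sketch, not the app's actual code: the helper names are hypothetical, and the duration check only covers WAV files because it relies on Python's standard `wave` module.

```python
import wave
from pathlib import Path

# Formats and limit as listed in this README; all names here are hypothetical.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".webm"}
MAX_DURATION_SECONDS = 10 * 60


def is_supported_format(filename: str) -> bool:
    """Return True if the file extension is one of the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from its header (frames / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_within_limit(duration_seconds: float) -> bool:
    """Return True if the audio fits the 10-minute maximum."""
    return duration_seconds <= MAX_DURATION_SECONDS
```

Other formats would need a third-party decoder to read their duration, which is outside the scope of this sketch.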

	## What You Get

	Your transcript includes:

	```yaml
	---
	title: "Your Audio File"
	date_processed: "2025-12-24"
	summary: "Quick 2-3 sentence overview..."
	key_ideas:
	  - idea: "Main Point 1"
	    description: "Explanation..."
	  - idea: "Main Point 2"
	    description: "Explanation..."
	note_id: "unique-id"
	---

	## Key Ideas
	- Main Point 1: Explanation...
	- Main Point 2: Explanation...

	## Full Transcription
	[00:00] Speaker 1: Hello...
	[00:15] Speaker 2: Welcome...
	```
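As an illustration of how a file in this layout could be assembled, here is a minimal sketch using only the standard library. The function name and its parameters are hypothetical, not taken from the app. Values are quoted with `json.dumps`, on the assumption that a JSON string is also a valid YAML double-quoted scalar, and each key idea is nested under `key_ideas`.

```python
import json
from datetime import date


def build_transcript(title, summary, key_ideas, note_id, transcript_lines, processed=None):
    """Return YAML front matter followed by the Markdown body.

    `key_ideas` is a list of (idea, description) tuples and
    `transcript_lines` is a list of already formatted transcript lines.
    """
    processed = processed or date.today().isoformat()
    q = json.dumps  # quotes and escapes a string value
    lines = [
        "---",
        f"title: {q(title)}",
        f"date_processed: {q(processed)}",
        f"summary: {q(summary)}",
        "key_ideas:",
    ]
    for idea, description in key_ideas:
        lines.append(f"  - idea: {q(idea)}")
        lines.append(f"    description: {q(description)}")
    lines += [f"note_id: {q(note_id)}", "---", "", "## Key Ideas"]
    lines += [f"- {idea}: {description}" for idea, description in key_ideas]
    lines += ["", "## Full Transcription", *transcript_lines]
    return "\n".join(lines) + "\n"
```

Calling it with one key idea and one transcript line produces a document whose front matter starts with `title:` and whose body ends with the `## Full Transcription` section.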

	## Limitations

	- Maximum audio length: 10 minutes (timeout limit on the free Hugging Face tier)
	- Processing time: ~20-50 seconds depending on audio length
	- API rate limits: 15 requests/minute (Gemini free tier)

	## Privacy & Security

	- ✅ Your API key is never stored - Used only for the current request
	- ✅ Audio files are temporary - Deleted immediately after processing
	- ✅ No data collection - Everything runs through your own API key
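The "deleted immediately after processing" guarantee is commonly implemented with a temporary file and a `try`/`finally` block. The sketch below shows that pattern under stated assumptions: `process_upload` and the `transcribe` callback are hypothetical names, and this is not necessarily how the app itself handles uploads.

```python
import os
import tempfile


def process_upload(audio_bytes: bytes, suffix: str, transcribe) -> str:
    """Write the upload to a temp file, run `transcribe(path)`, then delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(audio_bytes)
        return transcribe(path)
    finally:
        # Runs on success and on error, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Because the cleanup sits in `finally`, the file is removed even when the transcription step raises an exception.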

	## Technical Details

	AI Calls per transcription: 3
	1. Transcription (with timestamps and speakers)
	2. Summary generation
	3. Key ideas extraction
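With 3 calls per transcription and the free-tier limit of 15 requests/minute listed under Limitations, one key supports at most 15 / 3 = 5 transcriptions per minute. A client could enforce that budget with a sliding-window limiter. The class below is a hypothetical sketch, not part of the app; the clock is injectable so the behavior can be checked without waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_seconds` window."""

    def __init__(self, max_calls=15, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def try_acquire(self, n=1) -> bool:
        """Reserve `n` calls if they fit in the current window, else return False."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) + n > self.max_calls:
            return False
        self.calls.extend([now] * n)
        return True
```

Calling `try_acquire(3)` before each transcription reserves all three requests at once, so a transcription is never started with only part of its budget available.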

	Processing time estimate:
	- 2-minute audio: ~22 seconds
	- 5-minute audio: ~35 seconds
	- 10-minute audio: ~50 seconds
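To estimate processing time for lengths between the listed points, one option is piecewise-linear interpolation. This is only an assumption about how the time scales, not a model provided by the app, and `estimate_seconds` is a hypothetical helper.

```python
# (audio minutes, processing seconds) pairs taken from the estimates above
ESTIMATES = [(2, 22), (5, 35), (10, 50)]


def estimate_seconds(audio_minutes: float) -> float:
    """Interpolate linearly between listed points; clamp outside the 2-10 minute range."""
    if audio_minutes <= ESTIMATES[0][0]:
        return float(ESTIMATES[0][1])
    for (x0, y0), (x1, y1) in zip(ESTIMATES, ESTIMATES[1:]):
        if audio_minutes <= x1:
            return y0 + (y1 - y0) * (audio_minutes - x0) / (x1 - x0)
    return float(ESTIMATES[-1][1])


print(estimate_seconds(7.5))  # 42.5
```

The 7.5-minute case falls halfway between the 5-minute and 10-minute estimates, hence 42.5 seconds.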

	## Troubleshooting

	"Invalid API key"
	- Make sure you copied the entire key
	- Generate a new key at [Google AI Studio](https://aistudio.google.com/app/apikey)

	"Audio file too long"
	- The maximum is 10 minutes on the free tier
	- Split longer files or use the [CLI version](https://github.com/YOUR_USERNAME/transcriptinator)

	"Processing timeout"
	- Audio might be too long or corrupted
	- Try with a shorter, clearer audio file
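For the "Audio file too long" case, a long recording can be split into chunks of at most 10 minutes before uploading. The sketch below handles uncompressed WAV files only, using Python's standard `wave` module; `split_wav` and the output naming scheme are hypothetical, and other formats would need an external tool.

```python
import wave


def split_wav(path: str, out_prefix: str, chunk_seconds: int = 600) -> list[str]:
    """Split a WAV file into consecutive chunks of at most `chunk_seconds` each."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = chunk_seconds * src.getframerate()
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # end of the source file
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the chunk is written.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Each chunk can then be transcribed on its own. Note that the timestamps in each transcript restart at 00:00 for every chunk.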

	## Local Installation

	Want to transcribe audio of any length? Clone the full version:

	```bash
	git clone https://github.com/YOUR_USERNAME/transcriptinator
	cd transcriptinator
	pip install -r requirements.txt
	python audio_process_and_transcribe.py your_audio_folder -o output_folder
	```

	## Credits

	Built with:
	- [Gradio](https://gradio.app/) - Web interface
	- [Google Gemini](https://ai.google.dev/) - AI transcription
	- [Hugging Face Spaces](https://huggingface.co/spaces) - Hosting

	## License

	MIT License - Feel free to use and modify!