---
title: Transcriptinator
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
app_file: app.py
pinned: false
---

# 🎙️ Transcriptinator

Simple, fast audio transcription powered by Google's Gemini AI.

## Features

- 🎯 **Simple & Fast** - Upload audio, get a transcript in ~20-50 seconds
- 📝 **Smart Summaries** - Automatic summary and key ideas extraction
- 🔒 **Private** - Your API key, your data - nothing stored
- 💰 **Free** - Uses your own Gemini API key (free tier: 15 requests/min)
- 📄 **Markdown Output** - Clean, formatted transcripts ready to download

## How to Use

### 1. Get a Gemini API Key (Free)

1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
2. Click "Create API key"
3. Copy the key

### 2. Transcribe Audio

1. Upload your audio file (max 10 minutes)
   - Supported formats: MP3, WAV, M4A, OGG, FLAC, WEBM
2. Paste your API key
3. Click "🚀 Transcribe Audio"
4. Wait ~20-50 seconds
5. Download your transcript!

## What You Get

Your transcript includes:

```markdown
---
title: "Your Audio File"
date_processed: "2025-12-24"
summary: "Quick 2-3 sentence overview..."
key_ideas:
  - idea: "Main Point 1"
    description: "Explanation..."
  - idea: "Main Point 2"
    description: "Explanation..."
note_id: "unique-id"
---

## Key Ideas
- **Main Point 1:** Explanation...
- **Main Point 2:** Explanation...

## Full Transcription
[00:00] Speaker 1: Hello...
[00:15] Speaker 2: Welcome...
```
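
If you want to post-process a downloaded transcript, the layout above can be read with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the file starts with a `---` front matter block and that transcript lines follow the `[MM:SS] Speaker: text` pattern shown in the sample.

```python
import re

# Matches lines such as "[00:15] Speaker 2: Welcome..."
SEGMENT = re.compile(r"^\[(\d{2}):(\d{2})\] ([^:]+): (.*)$")


def split_front_matter(text):
    """Return (front_matter, body); front_matter is "" if no '---' block is found."""
    if not text.startswith("---\n"):
        return "", text
    front, sep, body = text[4:].partition("\n---\n")
    if not sep:  # opening '---' without a closing one
        return "", text
    return front, body


def parse_segments(body):
    """Collect (start_seconds, speaker, text) tuples from the transcription lines."""
    segments = []
    for line in body.splitlines():
        match = SEGMENT.match(line)
        if match:
            minutes, seconds, speaker, content = match.groups()
            segments.append((int(minutes) * 60 + int(seconds), speaker, content))
    return segments
```

The front matter string can then be handed to any YAML parser you already use.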

## Limitations

- **Maximum audio length:** 10 minutes (free HuggingFace tier timeout limit)
- **Processing time:** ~20-50 seconds depending on audio length
- **API rate limits:** 15 requests/minute (Gemini free tier)
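
If you script many transcriptions against the same key, a small client-side throttle can help you stay under the 15 requests/minute figure (at 3 calls per transcription, that is roughly 5 transcriptions per minute). The class below is a hypothetical helper, not part of this app, and it assumes the limit behaves like a rolling 60-second window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` within any rolling window of `period` seconds."""

    def __init__(self, max_calls=15, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.calls = deque()    # timestamps of recent calls, oldest first

    def wait_time(self):
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        """Note that a call was just made."""
        self.calls.append(self.clock())
```

Before each API call, sleep for `limiter.wait_time()` seconds and then call `limiter.record()`.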

## Privacy & Security

✅ **Your API key is never stored** - Used only for the current request  
✅ **Audio files are temporary** - Deleted immediately after processing  
✅ **No data collection** - Everything runs through your own API key
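
For readers curious how "temporary" file handling can work in Python, here is one common pattern. It is an illustrative sketch, not the code this Space runs: `process_upload` and the `transcribe` callable are hypothetical names.

```python
import tempfile
from pathlib import Path


def process_upload(audio_bytes, suffix, transcribe):
    """Write the upload to a temporary directory, run `transcribe` on it, then clean up.

    `transcribe` is a hypothetical callable that takes a file path and returns text.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / f"upload{suffix}"
        path.write_bytes(audio_bytes)
        result = transcribe(path)
    # Leaving the `with` block removes the directory and the file inside it,
    # even if `transcribe` raised an exception.
    return result
```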

## Technical Details

**AI Calls per transcription:** 3
1. Transcription (with timestamps and speakers)
2. Summary generation
3. Key ideas extraction
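
The three calls can be pictured as a short sequential pipeline. The sketch below is an assumption-laden outline rather than the app's source: `call_model` is a hypothetical wrapper around whatever Gemini client is in use, the prompts are placeholders, and it assumes the summary and key ideas are derived from the transcript text rather than from the audio.

```python
def transcribe_pipeline(audio_path, call_model):
    """Run the three calls in order and collect their outputs.

    `call_model(prompt, audio_path=None)` is a hypothetical callable that sends
    one request and returns the response text.
    """
    # Call 1: transcription with timestamps and speakers.
    transcript = call_model(
        "Transcribe this audio with [MM:SS] timestamps and speaker labels.",
        audio_path,
    )
    # Calls 2 and 3 (assumption): work from the transcript text, not the audio.
    summary = call_model("Summarize this transcript in 2-3 sentences:\n\n" + transcript)
    key_ideas = call_model("List the key ideas in this transcript:\n\n" + transcript)
    return {"transcript": transcript, "summary": summary, "key_ideas": key_ideas}
```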

**Processing time estimate:**
- 2-minute audio: ~22 seconds
- 5-minute audio: ~35 seconds
- 10-minute audio: ~50 seconds
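
For lengths between the listed points, a rough figure can be obtained by linear interpolation. This is a back-of-the-envelope sketch: the function name is made up, and clamping outside the 2-10 minute range is an assumption (very short clips will likely finish faster than 22 seconds).

```python
# (audio minutes, processing seconds) pairs taken from the estimates listed above
ESTIMATES = [(2, 22), (5, 35), (10, 50)]


def estimate_seconds(audio_minutes):
    """Linearly interpolate between the listed points; clamp outside the 2-10 minute range."""
    if audio_minutes <= ESTIMATES[0][0]:
        return float(ESTIMATES[0][1])
    if audio_minutes >= ESTIMATES[-1][0]:
        return float(ESTIMATES[-1][1])
    for (x0, y0), (x1, y1) in zip(ESTIMATES, ESTIMATES[1:]):
        if x0 <= audio_minutes <= x1:
            return y0 + (y1 - y0) * (audio_minutes - x0) / (x1 - x0)


print(estimate_seconds(7.5))  # 42.5
```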

## Troubleshooting

**"Invalid API key"**
- Make sure you copied the entire key
- Generate a new key at [Google AI Studio](https://aistudio.google.com/app/apikey)

**"Audio file too long"**
- The maximum is 10 minutes on the free tier
- Split longer files or use the [CLI version](https://github.com/YOUR_USERNAME/transcriptinator)
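
If you split a long recording yourself, the cut points can be computed up front. The helper below is a hypothetical sketch that only does the arithmetic: it divides the recording into the fewest equal-length chunks that each fit under the limit.

```python
import math


def chunk_bounds(total_seconds, max_chunk_seconds=600):
    """Return (start, end) second offsets for equal-length chunks no longer than the limit."""
    if total_seconds <= 0:
        return []
    count = math.ceil(total_seconds / max_chunk_seconds)
    size = total_seconds / count
    return [
        (round(i * size, 3), round(min((i + 1) * size, total_seconds), 3))
        for i in range(count)
    ]


print(chunk_bounds(1500))  # [(0.0, 500.0), (500.0, 1000.0), (1000.0, 1500.0)]
```

The offsets can then be passed to an audio tool of your choice, for example ffmpeg's `-ss` and `-to` options.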

**"Processing timeout"**
- Audio might be too long or corrupted
- Try with a shorter, clearer audio file

## Local Installation

Want to transcribe audio of any length? Clone the full version:

```bash
git clone https://github.com/YOUR_USERNAME/transcriptinator
cd transcriptinator
pip install -r requirements.txt
python audio_process_and_transcribe.py your_audio_folder -o output_folder
```

## Credits

Built with:
- [Gradio](https://gradio.app/) - Web interface
- [Google Gemini](https://ai.google.dev/) - AI transcription
- [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting

## License

MIT License - Feel free to use and modify!