---
title: LocalDuo
emoji: 🔥
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 6.16.0
python_version: '3.12'
app_file: app.py
pinned: true
short_description: 🇰🇷✨ LocalDuo - Learn Korean from Documents
preload_from_hub:
  - Qwen/Qwen3.5-2B
models:
  - Qwen/Qwen3.5-2B
  - CohereLabs/cohere-transcribe-03-2026
  - Supertone/supertonic-3
thumbnail: >-
  https://raw.githubusercontent.com/ShayekhBinIslam/file-host/main/thumbnail.png
tags:
  - track:backyard
  - achievement:offgrid
  - achievement:fieldnotes
---

# LocalDuo: Build Small Hackathon Field Notes

**Author:** Shayekh Bin Islam, KAIST, South Korea

**Date:** June 2026

**Stack:** Gradio · Qwen 3.5-9B VLM · Cohere ASR · Supertonic TTS · Hugging Face Spaces (ZeroGPU)

**Live Demo:** https://huggingface.co/spaces/build-small-hackathon/LocalDuo/

**Recorded Demo:** https://youtu.be/PoZs9ltbdik

**Social:** https://www.linkedin.com/posts/shayekhbinislam_hi-everyone-i-have-built-this-app-localduo-share-7472275977369210880--Q6i/

**Field Note:** https://huggingface.co/blog/build-small-hackathon/localduo

---

## What I Built

**LocalDuo** is an end-to-end Korean language learning application. It takes *any* Korean-language content, such as a PDF textbook, a live website, an audio recording, or a YouTube video, and automatically turns it into interactive vocabulary flashcards with native-quality pronunciation audio.

The core idea: **instead of studying from generic word lists, learn vocabulary from content you actually care about.** You can upload a chapter from your Korean textbook, paste the URL of a BBC Korean news article, or drop in a K-drama YouTube clip. The app then extracts the most useful Korean vocabulary, transliterates it into your native script, explains the relevant grammar, and generates TTS pronunciation audio. Finally, it packages everything into swipeable flashcards with a built-in quiz mode.
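The last two steps above (attaching pronunciation audio and packaging the cards) can be illustrated with a minimal Python sketch. It assumes the vision-language model has already returned structured entries with `word`, `transliteration`, `meaning`, and optional `grammar_note` keys, and that a TTS call returns WAV bytes. `Flashcard`, `build_deck`, `korean_only`, and `audio_data_uri` are hypothetical names and are not LocalDuo's actual code. The Hangul-only filter and the base64 data-URI embedding mirror behaviors listed in the feature table. The specific filtering approach, which keeps runs of precomposed Hangul syllables in the range U+AC00 to U+D7A3, is an assumption.

```python
import base64
import json
import re
from dataclasses import asdict, dataclass
from typing import Callable, Iterable

# Runs of precomposed Hangul syllables (U+AC00 to U+D7A3); an assumed filter.
HANGUL_RUN = re.compile(r"[\uac00-\ud7a3]+")


@dataclass
class Flashcard:
    """One card; field names are hypothetical, not LocalDuo's real schema."""
    word: str
    transliteration: str
    meaning: str
    grammar_note: str
    audio_uri: str  # data: URI, so the card needs no separate audio file


def korean_only(text: str) -> str:
    """Keep only Hangul runs, dropping Latin text, digits and punctuation."""
    return " ".join(HANGUL_RUN.findall(text))


def audio_data_uri(wav_bytes: bytes) -> str:
    """Encode raw WAV bytes as a base64 data URI for inline playback."""
    return "data:audio/wav;base64," + base64.b64encode(wav_bytes).decode("ascii")


def build_deck(entries: Iterable[dict], synthesize: Callable[[str], bytes]) -> str:
    """Package extracted vocabulary entries into a JSON deck.

    `entries` stands in for the model's structured output and `synthesize`
    for a TTS call that returns WAV bytes; both are hypothetical stand-ins.
    """
    cards = [
        Flashcard(
            word=e["word"],
            transliteration=e["transliteration"],
            meaning=e["meaning"],
            grammar_note=e.get("grammar_note", ""),
            audio_uri=audio_data_uri(synthesize(e["word"])),
        )
        for e in entries
    ]
    # ensure_ascii=False keeps Hangul readable in the serialized deck.
    return json.dumps([asdict(c) for c in cards], ensure_ascii=False)


if __name__ == "__main__":
    print(korean_only("안녕하세요 (hello)! 학교 school"))  # 안녕하세요 학교
    fake_tts = lambda word: b"RIFF"  # placeholder bytes, not real audio
    print(build_deck([{"word": "학교", "transliteration": "hakgyo", "meaning": "school"}], fake_tts))
```

Embedding audio as a `data:` URI keeps each card self-contained, with no separate audio files to serve. The trade-off is that base64 encoding inflates the audio payload by roughly a third.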
### Feature Overview

| Feature | Description |
|---|---|
| **Multi-Source Input** | Website URLs, PDF uploads, audio file uploads, YouTube links, and pre-saved deck imports: five distinct input pipelines unified into one interface |
| **Vision-Language Extraction** | Qwen 3.5-9B processes both text *and* page images simultaneously, enabling vocabulary extraction from visual content (handwritten notes, textbook diagrams, infographics) |
| **Speech-to-Text Pipeline** | Cohere ASR (`cohere-transcribe-03-2026`) transcribes Korean audio from YouTube videos and uploaded audio files, with Korean-only filtering to strip English artifacts |
| **Text-to-Speech Pronunciation** | Supertonic-3 TTS generates natural Korean pronunciation for every extracted word, embedded as base64 audio data URIs directly in the flashcard HTML |
| **Interactive Flashcard SPA** | A full single-page application embedded via `