---
title: Infy
emoji: 🐒
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
python_version: 3.11
app_file: app.py
pinned: false
---

# 🤗 HuggingFace Enabling Sessions

**Interactive Demo Platform for Transformers, Hub APIs, and NLP Pipelines**

## 📋 Overview

This is an interactive Gradio application designed for the **HuggingFace Enabling Sessions** workshop. It provides hands-on demonstrations of:

- **Session 1 (45 min):** Introduction to the HuggingFace ecosystem, Transformers architecture, and best practices
- **Session 2 (90 min):** Hands-on developer workshop with tokenization deep dives and an inference playground across 5+ NLP tasks

## 🚀 Quick Start

The app is hosted on HuggingFace Spaces and requires **no local installation**. Simply:

1. Open the Spaces URL
2. Explore the 3 main tabs:
   - **Session 1: Introduction** - Embedded slides + live NLP demos
   - **Session 2: Hands-On Developer** - Tokenizer explorer + inference playground
   - **Resources & Next Steps** - Documentation links and learning resources

### 🎯 Pre-Session Setup (For Presenters)

**Want instant, offline demos with zero network dependencies?** If you're presenting and need models pre-cached (e.g., because of company network restrictions), follow these guides:

- **[QUICK_SETUP.md](QUICK_SETUP.md)** - 10-minute setup (recommended for demos)
  - Download models locally
  - Test that everything works
  - Push to Spaces for instant loading
- **[scripts/USING_LOCAL_MODELS.md](scripts/USING_LOCAL_MODELS.md)** - Deep-dive guide
  - How local model caching works
  - Git LFS for large files
  - Troubleshooting

**TL;DR:** `python3 scripts/download_lightweight_models.py && git add models/ && git commit -m "Add pre-cached models" && git push origin main` ✅

This ensures models are available **without any external downloads during your session**.
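To illustrate the caching idea, here is a minimal sketch of how an app could prefer a pre-downloaded copy under `models/` and fall back to the Hub ID otherwise. The folder layout (`models/<org>--<name>`), the `MODEL_IDS` subset, and the helper names `local_dir_for`, `resolve_model`, and `download_all` are hypothetical and are not taken from `scripts/download_lightweight_models.py`; only `huggingface_hub.snapshot_download` is a real library call here.

```python
from pathlib import Path

# Example Hub IDs from the "Models Used" table (a subset, for illustration only).
MODEL_IDS = {
    "sentiment": "distilbert-base-uncased-finetuned-sst-2-english",
    "ner": "dslim/bert-base-NER",
}


def local_dir_for(model_id: str, models_dir: str = "models") -> Path:
    """Hypothetical layout: one flat folder per model, e.g. models/dslim--bert-base-NER."""
    return Path(models_dir) / model_id.replace("/", "--")


def resolve_model(model_id: str, models_dir: str = "models") -> str:
    """Return the local folder if the model was pre-downloaded, otherwise the Hub ID."""
    local = local_dir_for(model_id, models_dir)
    return str(local) if local.is_dir() else model_id


def download_all(models_dir: str = "models") -> None:
    """Pre-session step: fetch each model once so the live demo never hits the network."""
    # Imported lazily so resolve_model() works even where huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    for model_id in MODEL_IDS.values():
        snapshot_download(repo_id=model_id, local_dir=local_dir_for(model_id, models_dir))
```

At load time the app would then call something like `pipeline("sentiment-analysis", model=resolve_model(MODEL_IDS["sentiment"]))`, since `transformers.pipeline` accepts either a Hub ID or a local directory. Falling back to the Hub ID keeps the app working when the cache is missing; setting `HF_HUB_OFFLINE=1` is the stricter option if you want any missing file to fail fast instead of triggering a download.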
## 📚 Session Contents

### Session 1: Introduction to HuggingFace (45 minutes)

**Topics Covered:**
- HuggingFace Platform overview (Hub, Transformers, Datasets, Spaces)
- Core abstractions: Pipelines, Models, Tokenizers
- Architecture patterns: Encoders (BERT), Decoders (GPT), Encoder-Decoders (T5/BART)
- Enterprise NLP landscape (licensing, open-source vs. commercial)

**Live Demos:**
- Sentiment Analysis using DistilBERT
- Named Entity Recognition (NER) with BERT

**Materials:** [SESSION1_SLIDES.md](slides/SESSION1_SLIDES.md)

---

### Session 2: Hands-On Developer Workshop (90 minutes)

**Topics Covered:**
- Tokenization mechanics and strategies
- Inference across 5+ NLP tasks
- Understanding model outputs and confidence scores
- Production considerations and optimization

**Interactive Tasks:**
- 🔤 **Tokenization Explorer** - Visualize how text becomes token IDs
- 📊 **Sentiment Analysis** - Classify text as positive or negative
- 🏷️ **Named Entity Recognition** - Extract persons, organizations, and locations
- ❓ **Question Answering** - Extract the answer to a question from a context passage
- 📝 **Text Summarization** - Generate concise summaries
- 🔗 **Semantic Similarity** - Compare the meaning of two texts

**Materials:** [SESSION2_SLIDES.md](slides/SESSION2_SLIDES.md)

---

## 🛠️ Project Structure

```
infy/
├── app.py                    # Main Gradio application
├── config.py                 # Configuration (model IDs, task definitions)
├── utils.py                  # Utility functions for inference
├── requirements.txt          # Python dependencies
├── README.md                 # This file
├── QUICK_SETUP.md            # 10-minute pre-session setup guide
├── SPEAKER_NOTES.md          # Presenter guide with timing
├── scripts/
│   ├── download_lightweight_models.py  # Downloads models into models/
│   └── USING_LOCAL_MODELS.md # Local model caching deep dive
├── models/                   # Pre-cached models (created by the download script)
├── slides/
│   ├── SESSION1_SLIDES.md    # Session 1 presentation content
│   └── SESSION2_SLIDES.md    # Session 2 presentation content
└── data/
    ├── sample_texts.csv      # Sample texts for demos
    └── demo_samples/
        ├── sentiment.txt
        ├── ner.txt
        ├── qa.txt
        ├── summarization.txt
        └── embeddings.txt
```

## 🤖 Models Used

| Task | Model | Type | License |
|------|-------|------|---------|
| Sentiment Analysis | distilbert-base-uncased-finetuned-sst-2-english | Encoder | Apache 2.0 |
| Named Entity Recognition | dslim/bert-base-NER | Encoder | MIT |
| Question Answering | deepset/roberta-base-squad2 | Encoder | CC BY 4.0 |
| Summarization | facebook/bart-large-cnn | Encoder-Decoder | MIT |
| Semantic Similarity | sentence-transformers/all-MiniLM-L6-v2 | Encoder | Apache 2.0 |

## 📖 How to Use

### During Sessions

1. **Access the Spaces URL** - Attendees join via a shared link
2. **Session 1 (45 min)**
   - Presenter screen-shares and narrates the slides
   - Live demos showcase "click-to-run" NLP tasks
   - Q&A after each major section
3. **Session 2 (90 min)**
   - Presenter guides attendees through tokenization and inference
   - Attendees observe the interactive widgets
   - Exercise checkpoints for hands-on exploration
   - Discussion of production considerations

### After Sessions

1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/[your-username]/infy
   cd infy
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run locally:**
   ```bash
   python app.py
   ```
4. **Explore further:**
   - Modify sample data in `data/sample_texts.csv`
   - Add more models to `config.py`
   - Create custom tasks in `app.py`

## 🎓 Learning Resources

### Official Documentation
- [Transformers Library Docs](https://huggingface.co/docs/transformers/)
- [Datasets Library Docs](https://huggingface.co/docs/datasets/)
- [HuggingFace Course (Free)](https://huggingface.co/course/)
- [Hub Documentation](https://huggingface.co/docs/hub/)

### Model Hub
- Browse 100K+ models: https://huggingface.co/models
- Search by task, language, or architecture

### Community
- [HuggingFace Forums](https://discuss.huggingface.co/)
- [GitHub Issues](https://github.com/huggingface/transformers/issues)
- Twitter: [@huggingface](https://twitter.com/huggingface)

### Next Steps
- **Fine-tune on your data** - Adapt pre-trained models for domain-specific tasks
- **Deploy to Spaces** - Create interactive demos like this one
- **Publish to the Hub** - Share models and datasets with the community
- **Explore advanced techniques** - Quantization, distillation, multi-model pipelines

## 🔧 Customization

### Add a New Task

1. **Add the model to `config.py`:**
   ```python
   # New entry in the task dictionary in config.py
   "new_task": {
       "name": "Task Name",
       "model": "model-id-from-hub",
       "example": "example text",
   },
   ```
2. **Add a function to `utils.py`:**
   ```python
   def run_new_task(text):
       # load_pipeline() is the helper in utils.py that builds a pipeline from config.py
       pipe = load_pipeline("new_task")
       return pipe(text)
   ```
3. **Add a widget to `app.py`** (inside the existing `gr.Blocks()` context):
   ```python
   with gr.Tab("New Task"):
       input_box = gr.Textbox(label="Input")
       run_btn = gr.Button("Run")
       # Pipelines return lists of dicts, so show them as JSON rather than Markdown
       output_box = gr.JSON(label="Output")
       run_btn.click(run_new_task, inputs=[input_box], outputs=[output_box])
   ```

### Modify Sample Data

Edit `data/sample_texts.csv` or add `.txt` files to `data/demo_samples/`.

## 📝 Environment

- **Python:** 3.10+ (the Space runs 3.11, per `python_version` in the front matter)
- **Framework:** Gradio 5.x (the Space pins `sdk_version: 5.23.1`)
- **ML:** Transformers, Torch
- **Hosting:** HuggingFace Spaces

## 📄 License

This project is open-source and available for educational and commercial use. Model licenses vary; see the individual model cards for details.
## πŸ‘¨β€πŸ« Presenter Notes See [SPEAKER_NOTES.md](SPEAKER_NOTES.md) for: - Session timing breakdowns - Demo sequences and talking points - Troubleshooting common issues - Tips for live presentations ## πŸ“§ Questions & Feedback - Ask during the sessions - Post on HuggingFace Forums - Follow up on company Slack/Teams --- **Ready to dive into NLP? Start with Session 1: Introduction! πŸš€**