---
title: Infy
emoji: 🐒
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
python_version: 3.11
app_file: app.py
pinned: false
---
# 🤗 HuggingFace Enabling Sessions
**Interactive Demo Platform for Transformers, Hub APIs, and NLP Pipelines**
## 📋 Overview
This is an interactive Gradio application designed for the **HuggingFace Enabling Sessions** workshop. It provides hands-on demonstrations of:
- **Session 1 (45 min):** Introduction to the HuggingFace ecosystem, Transformers architecture, and best practices
- **Session 2 (90 min):** Hands-on developer workshop with tokenization deep dives and an inference playground covering 5+ NLP tasks
## 🚀 Quick Start
The app is hosted on HuggingFace Spaces and requires **no local installation**. Simply:
1. Open the Spaces URL
2. Explore the 3 main tabs:
   - **Session 1: Introduction**: embedded slides and live NLP demos
   - **Session 2: Hands-On Developer**: tokenizer explorer and inference playground
   - **Resources & Next Steps**: documentation links and learning resources
### 🎯 Pre-Session Setup (For Presenters)
**Want instant, offline demos with zero network dependencies?**
If you're presenting and need the models pre-cached (e.g., because of company network restrictions), follow these guides:
- **[QUICK_SETUP.md](QUICK_SETUP.md)**: 10-minute setup (recommended for demos)
  - Download models locally
  - Test everything works
  - Push to Spaces for instant loading
- **[scripts/USING_LOCAL_MODELS.md](scripts/USING_LOCAL_MODELS.md)**: deep-dive guide
  - How local model caching works
  - Git LFS for large files
  - Troubleshooting
**TL;DR:** `python3 scripts/download_lightweight_models.py && git add models/ && git commit -m "Add pre-cached models" && git push origin main` ✅
This ensures models are available **without any external downloads during your session**.
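The contents of `scripts/download_lightweight_models.py` aren't reproduced here, so the following is only a minimal sketch of one way to pre-cache the models. It assumes the `huggingface_hub` package and its `snapshot_download` function; the `models/<org>--<name>` folder layout and the helper names `local_dir_for` and `download_all` are hypothetical, not necessarily what the real script uses.

```python
from pathlib import Path

# Model IDs taken from the "Models Used" table in this README.
MODEL_IDS = [
    "distilbert-base-uncased-finetuned-sst-2-english",
    "dslim/bert-base-NER",
    "deepset/roberta-base-squad2",
    "facebook/bart-large-cnn",
    "sentence-transformers/all-MiniLM-L6-v2",
]


def local_dir_for(model_id: str, root: str = "models") -> Path:
    """Map a Hub ID to a local folder (hypothetical layout: '/' becomes '--')."""
    return Path(root) / model_id.replace("/", "--")


def download_all(root: str = "models") -> None:
    """Download every model snapshot once so the app never hits the network mid-session."""
    # Imported here so the path helper above works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for model_id in MODEL_IDS:
        snapshot_download(repo_id=model_id, local_dir=local_dir_for(model_id, root))
```

After `download_all()` finishes, a pipeline can load straight from disk, e.g. `pipeline("sentiment-analysis", model=str(local_dir_for(MODEL_IDS[0])))`. `snapshot_download` also accepts `allow_patterns` if you want to skip framework weights you don't use (TensorFlow or Flax files, for example), which keeps the Git LFS upload smaller.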
## 📚 Session Contents
### Session 1: Introduction to HuggingFace (45 minutes)
**Topics Covered:**
- HuggingFace Platform overview (Hub, Transformers, Datasets, Spaces)
- Core abstractions: Pipelines, Models, Tokenizers
- Architecture patterns: Encoders (BERT), Decoders (GPT), Encoder-Decoders (T5/BART)
- Enterprise NLP landscape (licensing, open-source vs. commercial)
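Much of the encoder/decoder distinction comes down to which positions each token may attend to. As a toy illustration in plain Python (no model involved), an encoder such as BERT uses a full bidirectional mask, while a decoder such as GPT uses a causal mask in which token *i* only sees tokens 0..*i*. Encoder-decoders (T5/BART) combine both: a bidirectional encoder plus a causal decoder that also cross-attends to the encoder output. The function names are illustrative only.

```python
def encoder_mask(n: int) -> list[list[int]]:
    """Bidirectional mask: every token may attend to every other token (BERT-style)."""
    return [[1] * n for _ in range(n)]


def decoder_mask(n: int) -> list[list[int]]:
    """Causal mask: token i may attend only to positions 0..i (GPT-style)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Each row is one query token; 1 means "may attend to that position".
    for row in decoder_mask(4):
        print(row)
```

Running it prints the lower-triangular pattern, starting with `[1, 0, 0, 0]` and ending with `[1, 1, 1, 1]`, which is why a decoder can generate text left to right but cannot look ahead.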
**Live Demos:**
- Sentiment Analysis using DistilBERT
- Named Entity Recognition (NER) with BERT
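The NER model labels each token with a BIO tag from the CoNLL-2003 set (PER, ORG, LOC, MISC): `B-PER` begins a person, `I-PER` continues it, and `O` is outside any entity. The sketch below shows, in simplified form, how per-token tags can be merged into entity spans. It works on whole words with made-up tags; the Transformers pipeline (with `aggregation_strategy="simple"`) additionally has to stitch WordPiece sub-tokens back together, and `group_entities` is a hypothetical name.

```python
def group_entities(words: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Merge BIO-tagged words into (entity_type, text) spans."""
    entities: list[tuple[str, str]] = []
    current_type, current_words = None, []

    for word, tag in zip(words, tags):
        # I-X continues the open entity only if it has the same type X.
        if tag.startswith("I-") and tag[2:] == current_type:
            current_words.append(word)
            continue
        # Any other tag closes the entity that was open.
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        if tag == "O":
            current_type, current_words = None, []
        else:
            # B-X starts a new entity; a stray I-X is treated the same way.
            current_type, current_words = tag[2:], [word]

    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities
```

For example, the words `Ada Lovelace worked with Babbage in London` tagged `B-PER I-PER O O B-PER O B-LOC` become two PER spans and one LOC span.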
**Materials:** [SESSION1_SLIDES.md](slides/SESSION1_SLIDES.md)
---
### Session 2: Hands-On Developer Workshop (90 minutes)
**Topics Covered:**
- Tokenization mechanics and strategies
- Inference across 5+ NLP tasks
- Understanding model outputs and confidence scores
- Production considerations and optimization
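The confidence score shown for classification tasks is a softmax over the model's raw output scores (logits). A minimal sketch in plain Python; the logit values below are made up for illustration, not real DistilBERT outputs, and `top_label` is a hypothetical helper.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the highest-probability label and its score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits for a two-class sentiment model (NEGATIVE, POSITIVE).
label, score = top_label([-1.2, 2.3], ["NEGATIVE", "POSITIVE"])
print(label, round(score, 3))
```

A high score means the model is confident, not that it is correct; on text far from its training data a classifier can still be confidently wrong, which is worth keeping in mind for production use.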
**Interactive Tasks:**
- 🔀 **Tokenization Explorer**: Visualize how text becomes token IDs
- 📊 **Sentiment Analysis**: Classify text as positive or negative
- 🏷️ **Named Entity Recognition**: Extract persons, organizations, locations
- ❓ **Question Answering**: Answer questions from context
- 📝 **Text Summarization**: Generate concise summaries
- 🔗 **Semantic Similarity**: Compare text meaning
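The BERT-family models used here (DistilBERT, BERT, MiniLM) tokenize with WordPiece, while RoBERTa and BART use byte-level BPE. As a rough illustration of the WordPiece idea (not the Transformers implementation), here is greedy longest-match-first splitting with a tiny made-up vocabulary, in which continuation pieces carry the `##` prefix. The vocabulary, `wordpiece`, and `encode` are all hypothetical.

```python
# Tiny, made-up vocabulary; BERT's real uncased vocabulary has roughly 30k entries.
VOCAB = {
    "[UNK]": 0, "token": 1, "##ization": 2, "is": 3, "fun": 4,
    "un": 5, "##believ": 6, "##able": 7,
}


def wordpiece(word: str, vocab: dict[str, int]) -> list[str]:
    """Split one word greedily into the longest pieces found in the vocabulary."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end] if start == 0 else "##" + word[start:end]
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matches: the whole word becomes unknown
        pieces.append(piece)
        start = end
    return pieces


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Lowercase, split on whitespace only (real BERT also splits punctuation), map pieces to IDs."""
    return [vocab[p] for w in text.lower().split() for p in wordpiece(w, vocab)]
```

With this vocabulary, `"tokenization"` splits into `token` + `##ization` and `"unbelievable"` into `un` + `##believ` + `##able`. In practice the Transformers `AutoTokenizer` does this for you, along with punctuation splitting and special tokens such as `[CLS]` and `[SEP]`.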
**Materials:** [SESSION2_SLIDES.md](slides/SESSION2_SLIDES.md)
---
## 🛠️ Project Structure
```
infy/
├── app.py                  # Main Gradio application
├── config.py               # Configuration (model IDs, task definitions)
├── utils.py                # Utility functions for inference
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── SPEAKER_NOTES.md        # Presenter guide with timing
├── slides/
│   ├── SESSION1_SLIDES.md  # Session 1 presentation content
│   └── SESSION2_SLIDES.md  # Session 2 presentation content
└── data/
    ├── sample_texts.csv    # Sample texts for demos
    └── demo_samples/
        ├── sentiment.txt
        ├── ner.txt
        ├── qa.txt
        ├── summarization.txt
        └── embeddings.txt
```
## 🤖 Models Used
| Task | Model | Type | License |
|------|-------|------|---------|
| Sentiment Analysis | distilbert-base-uncased-finetuned-sst-2-english | Encoder | Apache 2.0 |
| Named Entity Recognition | dslim/bert-base-NER | Encoder | Apache 2.0 |
| Question Answering | deepset/roberta-base-squad2 | Encoder | Apache 2.0 |
| Summarization | facebook/bart-large-cnn | Encoder-Decoder | MIT |
| Semantic Similarity | sentence-transformers/all-MiniLM-L6-v2 | Encoder | Apache 2.0 |
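For semantic similarity, `all-MiniLM-L6-v2` produces one vector per token; its model card describes mean pooling over those vectors to get a 384-dimensional sentence embedding, and two sentences are then compared with cosine similarity. The sketch below shows only that arithmetic, using tiny made-up 3-dimensional vectors; `mean_pool` and `cosine_similarity` are illustrative names.

```python
import math


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors, ignoring padding positions (mask value 0)."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(token_vectors[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the vector lengths; ranges from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up token vectors; the last position of sentence B is padding and is ignored.
sent_a = mean_pool([[1.0, 0.0, 1.0], [1.0, 2.0, 1.0]], [1, 1])
sent_b = mean_pool([[2.0, 2.0, 2.0], [9.0, 9.0, 9.0]], [1, 0])
print(round(cosine_similarity(sent_a, sent_b), 3))
```

Here both pooled vectors point in the same direction, so the similarity is 1.0 even though their lengths differ: cosine similarity compares direction, not magnitude.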
## 📖 How to Use
### During Sessions
1. **Access the Spaces URL**: attendees join via the shared link
2. **Session 1 (45 min)**
   - Presenter shares their screen and narrates the slides
   - Live demos showcase "click-to-run" NLP tasks
   - Q&A after each major section
3. **Session 2 (90 min)**
   - Presenter guides attendees through tokenization and inference
   - Attendees observe the interactive widgets
   - Exercise checkpoints for hands-on exploration
   - Discussion on production considerations
### After Sessions
1. **Clone the repository:**
   ```bash
   git clone https://huggingface.co/spaces/[your-username]/infy
   cd infy
   ```
2. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
3. **Run locally:**
   ```bash
   python app.py
   ```
4. **Explore further:**
   - Modify sample data in `data/sample_texts.csv`
   - Add more models to `config.py`
   - Create custom tasks in `app.py`
## 🎓 Learning Resources
### Official Documentation
- [Transformers Library Docs](https://huggingface.co/docs/transformers/)
- [Datasets Library Docs](https://huggingface.co/docs/datasets/)
- [HuggingFace Course (Free)](https://huggingface.co/course/)
- [Hub Documentation](https://huggingface.co/docs/hub/)
### Model Hub
- Browse 100K+ models: https://huggingface.co/models
- Search by task, language, or architecture
### Community
- [HuggingFace Forums](https://discuss.huggingface.co/)
- [GitHub Issues](https://github.com/huggingface/transformers/issues)
- Twitter: [@huggingface](https://twitter.com/huggingface)
### Next Steps
- **Fine-tune on your data**: adapt pre-trained models for domain-specific tasks
- **Deploy to Spaces**: create interactive demos like this one
- **Publish to the Hub**: share models and datasets with the community
- **Explore advanced techniques**: quantization, distillation, multi-model pipelines
## 🔧 Customization
### Add a New Task
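1. **Add model to `config.py`:**
   ```python
   # New entry inside the task dictionary in config.py
   "new_task": {
       "name": "Task Name",
       "model": "model-id-from-hub",
       "example": "example text",
   },
   ```
2. **Add function to `utils.py`:**
   ```python
   def run_new_task(text):
       # load_pipeline is assumed to be the existing utils.py helper that builds a pipeline from config.py
       pipe = load_pipeline("new_task")
       return pipe(text)
   ```
3. **Add widget to `app.py`:**
   ```python
   from utils import run_new_task  # at the top of app.py
   # Inside the existing gr.Blocks() layout in app.py
   with gr.Tab("New Task"):
       input_box = gr.Textbox(label="Input text")
       output_box = gr.JSON(label="Result")  # pipelines return lists/dicts, which gr.JSON renders directly
       run_btn = gr.Button("Run")
       run_btn.click(run_new_task, inputs=[input_box], outputs=[output_box])
   ```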
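`utils.py` isn't reproduced in this README, so the following is only one plausible shape for the `load_pipeline` helper used in step 2. It assumes a `TASKS` dictionary in `config.py` and an extra `"task"` key naming the Transformers pipeline type; both are hypothetical, as are `build_pipeline` and the `builder` parameter. `functools.lru_cache` makes each model load once per process instead of on every button click.

```python
from functools import lru_cache

# Hypothetical registry: the "task" key is an assumption, not part of the config entry shown above.
TASKS = {
    "new_task": {
        "name": "Task Name",
        "task": "text-classification",
        "model": "model-id-from-hub",
        "example": "example text",
    },
}


def build_pipeline(task: str, model: str):
    """Construct a Transformers pipeline (imported lazily so importing this module stays cheap)."""
    from transformers import pipeline

    return pipeline(task, model=model)


@lru_cache(maxsize=None)
def load_pipeline(task_key: str, builder=build_pipeline):
    """Build the pipeline for `task_key` on first use, then return the cached object.

    `builder` is injectable so the caching can be exercised without downloading a model.
    """
    cfg = TASKS[task_key]
    return builder(cfg["task"], cfg["model"])
```

`lru_cache(maxsize=None)` keeps every loaded model in memory for the life of the process. That suits a small Space with five models, but a bounded `maxsize` may be worth considering if many large models are registered.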
### Modify Sample Data
Edit `data/sample_texts.csv` or add `.txt` files to `data/demo_samples/`.
## Environment
- **Python:** 3.10+ (the Space runs 3.11)
- **Framework:** Gradio 5.23.1
- **ML:** Transformers, PyTorch
- **Hosting:** HuggingFace Spaces
## 📄 License
This project is open-source and available for educational and commercial use. Model licenses vary; see individual model cards for details.
## 👨‍🏫 Presenter Notes
See [SPEAKER_NOTES.md](SPEAKER_NOTES.md) for:
- Session timing breakdowns
- Demo sequences and talking points
- Troubleshooting common issues
- Tips for live presentations
## 📧 Questions & Feedback
- Ask during the sessions
- Post on HuggingFace Forums
- Follow up on company Slack/Teams
---
**Ready to dive into NLP? Start with Session 1: Introduction! 🚀**