---
title: Infy
emoji: 🐒
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
python_version: 3.11
app_file: app.py
pinned: false
---

🤗 HuggingFace Enabling Sessions

Interactive Demo Platform for Transformers, Hub APIs, and NLP Pipelines

📋 Overview

This is an interactive Gradio application designed for the HuggingFace Enabling Sessions workshop. It provides hands-on demonstrations of:

  • Session 1 (45 min): Introduction to the HuggingFace ecosystem, Transformers architecture, and best practices
  • Session 2 (90 min): Hands-on developer workshop with tokenization deep dives and an inference playground across five NLP tasks

🚀 Quick Start

The app is hosted on HuggingFace Spaces and requires no local installation. Simply:

  1. Open the Spaces URL
  2. Explore the 3 main tabs:
    • Session 1: Introduction (embedded slides + live NLP demos)
    • Session 2: Hands-On Developer (tokenizer explorer + inference playground)
    • Resources & Next Steps (documentation links and learning resources)

🎯 Pre-Session Setup (For Presenters)

Want instant, offline demos with zero network dependencies?

If you're presenting and need models pre-cached (e.g., company network restrictions), follow these guides:

  • QUICK_SETUP.md: 10-minute setup (recommended for demos)

    • Download models locally
    • Test everything works
    • Push to Spaces for instant loading
  • scripts/USING_LOCAL_MODELS.md: deep-dive guide

    • How local model caching works
    • Git LFS for large files
    • Troubleshooting

TL;DR: python3 scripts/download_lightweight_models.py && git add models/ && git commit -m "Add pre-cached models" && git push origin main ✅

This ensures models are available without any external downloads during your session.
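The actual caching mechanism is documented in scripts/USING_LOCAL_MODELS.md. As a rough illustration only, the sketch below shows one way an app can prefer pre-downloaded weights. It assumes a hypothetical layout in which each model lives in models/<org>__<name>/ and falls back to the Hub ID when no local copy exists. This works because Transformers' `pipeline(..., model=...)` accepts either a Hub model ID or a path to a local directory.

```python
from pathlib import Path


def resolve_model_source(model_id: str, models_dir: str = "models") -> str:
    """Return a local model directory for model_id if one exists, else the Hub ID.

    Hypothetical layout (an assumption, not necessarily this repo's):
    "dslim/bert-base-NER" is stored under models/dslim__bert-base-NER/.
    """
    local_dir = Path(models_dir) / model_id.replace("/", "__")
    # A Transformers model saved with save_pretrained() contains config.json.
    if (local_dir / "config.json").is_file():
        return str(local_dir)
    # No local copy: return the Hub ID so Transformers downloads (or reuses its cache).
    return model_id
```

For example, `pipeline("ner", model=resolve_model_source("dslim/bert-base-NER"))` would load from models/dslim__bert-base-NER/ if that folder holds a saved model, and from the Hub otherwise.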

📚 Session Contents

Session 1: Introduction to HuggingFace (45 minutes)

Topics Covered:

  • HuggingFace Platform overview (Hub, Transformers, Datasets, Spaces)
  • Core abstractions: Pipelines, Models, Tokenizers
  • Architecture patterns: Encoders (BERT), Decoders (GPT), Encoder-Decoders (T5/BART)
  • Enterprise NLP landscape (licensing, open-source vs. commercial)
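One concrete difference between these architecture families is the self-attention mask. As a simplified illustration (plain Python, no Transformers; 1 means "may attend", 0 means "masked"), the sketch below builds the two masks: bidirectional for encoders like BERT, causal for decoders like GPT. Encoder-decoders such as T5 and BART combine a bidirectional encoder with a causal decoder that also cross-attends to the encoder output.

```python
def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder-style mask (e.g. BERT): every token may attend to every token."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """Decoder-style mask (e.g. GPT): token i may attend only to tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    # [1, 0, 0, 0]
    # [1, 1, 0, 0]
    # [1, 1, 1, 0]
    # [1, 1, 1, 1]
```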

Live Demos:

  • Sentiment Analysis using DistilBERT
  • Named Entity Recognition (NER) with BERT
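dslim/bert-base-NER labels tokens with BIO tags for four entity types (PER, ORG, LOC, MISC): `B-` begins an entity, `I-` continues it, and `O` marks tokens outside any entity. The Transformers token-classification pipeline can merge these into whole entities through its `aggregation_strategy` option; the plain-Python sketch below shows the basic grouping idea on word-level tags. The (words, tags) input format is an assumption for illustration, not the pipeline's raw output.

```python
def group_bio(words: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Merge BIO-tagged words into (entity_type, text) spans."""
    entities: list[tuple[str, str]] = []
    current_type = None
    current_words: list[str] = []

    def flush() -> None:
        # Close the open entity (if any) and reset the state.
        nonlocal current_type, current_words
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for word, tag in zip(words, tags):
        if tag == "O":
            flush()
        elif tag.startswith("B-") or tag[2:] != current_type:
            # A B- tag, or an I- tag of a different type, starts a new entity.
            flush()
            current_type, current_words = tag[2:], [word]
        else:
            # An I- tag continuing the open entity of the same type.
            current_words.append(word)
    flush()
    return entities


if __name__ == "__main__":
    words = ["Ada", "Lovelace", "lived", "in", "London"]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
    print(group_bio(words, tags))  # [('PER', 'Ada Lovelace'), ('LOC', 'London')]
```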

Materials: SESSION1_SLIDES.md


Session 2: Hands-On Developer Workshop (90 minutes)

Topics Covered:

  • Tokenization mechanics and strategies
  • Inference across five NLP tasks
  • Understanding model outputs and confidence scores
  • Production considerations and optimization
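For single-label classifiers such as the SST-2 DistilBERT sentiment model, the confidence `score` reported by a Transformers text-classification pipeline is the softmax of the model's raw output logits. The plain-Python sketch below reproduces that conversion on made-up logits (not real model output) for a NEGATIVE/POSITIVE label pair.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the highest-probability label and its confidence score."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Hypothetical logits, not taken from a real model run.
    label, score = top_label([-1.5, 2.5], ["NEGATIVE", "POSITIVE"])
    print(label, round(score, 4))  # POSITIVE 0.982
```

A high score means the model strongly prefers one label over the others; it is not necessarily a calibrated probability of being correct.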

Interactive Tasks:

  • 🔤 Tokenization Explorer: visualize how text becomes token IDs
  • 📊 Sentiment Analysis: classify text as positive or negative
  • 🏷️ Named Entity Recognition: extract persons, organizations, and locations
  • ❓ Question Answering: answer questions from a context passage
  • 📝 Text Summarization: generate concise summaries
  • 🔗 Semantic Similarity: compare the meaning of two texts
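How text becomes token IDs depends on each model's tokenizer. BERT-family models, including DistilBERT, use WordPiece: each word is split greedily into the longest pieces found in the vocabulary, and word-internal pieces carry a `##` prefix. The toy sketch below implements only that greedy longest-match-first step with a hand-made five-entry vocabulary; real BERT vocabularies have roughly 30k entries, and real tokenizers also lowercase, split punctuation, and add special tokens such as [CLS] and [SEP].

```python
def wordpiece(word: str, vocab: dict[str, int], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # word-internal continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces


def encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Whitespace-split the text, WordPiece each word, and map pieces to IDs."""
    return [vocab[p] for w in text.split() for p in wordpiece(w, vocab)]


if __name__ == "__main__":
    toy_vocab = {"[UNK]": 0, "token": 1, "##ization": 2, "is": 3, "fun": 4}
    print(wordpiece("tokenization", toy_vocab))  # ['token', '##ization']
    print(encode("tokenization is fun", toy_vocab))  # [1, 2, 3, 4]
```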

Materials: SESSION2_SLIDES.md


🛠️ Project Structure

infy/
├── app.py                          # Main Gradio application
├── config.py                       # Configuration (model IDs, task definitions)
├── utils.py                        # Utility functions for inference
├── requirements.txt                # Python dependencies
├── README.md                       # This file
├── SPEAKER_NOTES.md                # Presenter guide with timing
├── slides/
│   ├── SESSION1_SLIDES.md          # Session 1 presentation content
│   └── SESSION2_SLIDES.md          # Session 2 presentation content
└── data/
    ├── sample_texts.csv            # Sample texts for demos
    └── demo_samples/
        ├── sentiment.txt
        ├── ner.txt
        ├── qa.txt
        ├── summarization.txt
        └── embeddings.txt

🤖 Models Used

| Task | Model | Type | License |
|---|---|---|---|
| Sentiment Analysis | distilbert-base-uncased-finetuned-sst-2-english | Encoder | Apache 2.0 |
| Named Entity Recognition | dslim/bert-base-NER | Encoder | MIT |
| Question Answering | deepset/roberta-base-squad2 | Encoder | CC BY 4.0 |
| Summarization | facebook/bart-large-cnn | Encoder-Decoder | MIT |
| Semantic Similarity | sentence-transformers/all-MiniLM-L6-v2 | Encoder | Apache 2.0 |
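all-MiniLM-L6-v2 maps each sentence to a 384-dimensional vector by mean-pooling its token embeddings, and the similarity between two sentences is typically scored with cosine similarity. The sketch below shows both steps in plain Python on tiny made-up 3-dimensional vectors; in the app the token embeddings would come from the model itself. The attention mask keeps padding tokens out of the average.

```python
import math


def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose attention-mask value is 1 (skip padding)."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    dim = len(token_vectors[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|), a value in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Cosine similarity ignores vector length and compares only direction, so 1 means the two sentence vectors point the same way and 0 means they are orthogonal.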

📖 How to Use

During Sessions

  1. Access the Spaces URL: attendees join via a shared link

  2. Session 1 (45 min)

    • Presenter screen-shares and narrates the slides
    • Live demos showcase "click-to-run" NLP tasks
    • Q&A after each major section
  3. Session 2 (90 min)

    • Presenter guides attendees through tokenization and inference
    • Attendees observe interactive widgets
    • Exercise checkpoints for hands-on exploration
    • Discussion on production considerations

After Sessions

  1. Clone the repository:

    git clone https://huggingface.co/spaces/[your-username]/infy
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run locally:

    python app.py
    
  4. Explore further:

    • Modify sample data in data/sample_texts.csv
    • Add more models to config.py
    • Create custom tasks in app.py

🎓 Learning Resources

Official Documentation

Model Hub

Community

Next Steps

  • Fine-tune on your data: adapt pre-trained models for domain-specific tasks
  • Deploy to Spaces: create interactive demos like this one
  • Publish to the Hub: share models and datasets with the community
  • Explore advanced techniques: quantization, distillation, multi-model pipelines

🔧 Customization

Add a New Task

  1. Add a task entry to config.py, alongside the existing task definitions:

    "new_task": {
        "name": "Task Name",
        "model": "model-id-from-hub",
        "example": "example text",
    },
    
  2. Add a handler function to utils.py:

    def run_new_task(text):
        pipe = load_pipeline("new_task")
        return pipe(text)
    
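  3. Add a tab to app.py, inside the existing gr.Blocks() layout:

    from utils import run_new_task  # at the top of app.py

    with gr.Tab("New Task"):
        input_box = gr.Textbox(label="Input")
        btn = gr.Button("Run")  # the click handler needs a button to attach to
        output_box = gr.JSON(label="Output")  # pipelines return lists of dicts
        btn.click(run_new_task, inputs=[input_box], outputs=[output_box])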
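As written, `run_new_task` calls `load_pipeline` on every click, and building a pipeline means loading model weights. The repository's `load_pipeline` may already cache its results; if it does not, memoizing one pipeline per task is a common fix. The sketch below is a hypothetical stand-alone version: `make_loader`, `TASKS`, and the injected `factory` are illustrative names, and in the real app the factory would wrap `transformers.pipeline`.

```python
from functools import lru_cache
from typing import Any, Callable

# Hypothetical registry shaped like the config.py entry from step 1.
TASKS: dict[str, dict[str, str]] = {
    "new_task": {"name": "Task Name", "model": "model-id-from-hub", "example": "example text"},
}


def make_loader(
    factory: Callable[[dict[str, str]], Any],
    tasks: dict[str, dict[str, str]] = TASKS,
) -> Callable[[str], Any]:
    """Build a load_pipeline(task_key) that calls factory at most once per task."""

    @lru_cache(maxsize=None)
    def load_pipeline(task_key: str) -> Any:
        # Raises KeyError for a task that is not registered.
        return factory(tasks[task_key])

    return load_pipeline


if __name__ == "__main__":
    calls = []

    def fake_factory(entry):
        # Stand-in for building a real transformers pipeline.
        calls.append(entry["model"])
        return lambda text: [{"label": "DEMO", "input": text}]

    load_pipeline = make_loader(fake_factory)
    load_pipeline("new_task")("first click")
    load_pipeline("new_task")("second click")
    print(calls)  # ['model-id-from-hub']: the "model" was built only once
```

Injecting the factory, rather than importing Transformers inside the loader, keeps the caching logic testable without downloading a model.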
    

Modify Sample Data

Edit data/sample_texts.csv or add .txt files to data/demo_samples/
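After editing the sample files, a quick loader can confirm they still parse. The sketch below is hypothetical: it assumes one sample per non-empty line and keys samples by file stem (sentiment, ner, qa, summarization, embeddings). The app's real format may differ, particularly for qa.txt, where each example needs both a question and a context.

```python
from pathlib import Path


def load_demo_samples(folder: str = "data/demo_samples") -> dict[str, list[str]]:
    """Map each .txt file's stem (e.g. "ner") to its non-empty, stripped lines."""
    samples: dict[str, list[str]] = {}
    for path in sorted(Path(folder).glob("*.txt")):
        lines = path.read_text(encoding="utf-8").splitlines()
        samples[path.stem] = [line.strip() for line in lines if line.strip()]
    return samples
```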

📝 Environment

  • Python: 3.11 (the version set in the Spaces metadata; Gradio 5.x requires Python 3.10+)
  • Framework: Gradio 5.23.1
  • ML: Transformers, Torch
  • Hosting: HuggingFace Spaces

📄 License

This project is open-source and available for educational and commercial use. Model licenses vary; see individual model cards for details.

👨‍🏫 Presenter Notes

See SPEAKER_NOTES.md for:

  • Session timing breakdowns
  • Demo sequences and talking points
  • Troubleshooting common issues
  • Tips for live presentations

📧 Questions & Feedback

  • Ask during the sessions
  • Post on HuggingFace Forums
  • Follow up on company Slack/Teams

Ready to dive into NLP? Start with Session 1: Introduction! πŸš€