metadata
title: Resume Normalizer Trainer
emoji: π
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.6.0
app_file: app.py
pinned: false
license: apache-2.0
hardware: 4xL4
Resume Normalizer Trainer
Fine-tune a Flan-T5 model for resume entity normalization and deduplication.
Features
- Company Name Normalization: Handle mergers, acquisitions, and rebranding (e.g., "Facebook" → "Meta Platforms Inc.")
- Job Title Standardization: Recognize equivalent roles and seniority levels (e.g., "SWE" → "Software Engineer")
- Skills Normalization: Standardize technology names and abbreviations (e.g., "JS" → "JavaScript")
- Binary Equivalency Detection: Determine if two entities refer to the same thing
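Because Flan-T5 is a text-to-text model, each of these tasks can be posed as an instruction prompt with a short text target. The sketch below shows one way such prompts might be built. The template wording and the `build_prompt` helper are hypothetical and are not taken from this Space's training data.

```python
# Hypothetical prompt templates: the wording is an assumption,
# not the format actually used by this Space.
TEMPLATES = {
    "company": "Normalize this company name: {a}",
    "job_title": "Normalize this job title: {a}",
    "skill": "Normalize this skill: {a}",
    "equivalent": "Are these the same entity? A: {a} B: {b}",
}


def build_prompt(task: str, a: str, b: str = "") -> str:
    """Return the instruction text fed to the model for one example."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    if task == "equivalent" and not b:
        raise ValueError("the equivalence task needs two entities")
    # Templates without a {b} placeholder simply ignore the extra argument.
    return TEMPLATES[task].format(a=a.strip(), b=b.strip())


if __name__ == "__main__":
    print(build_prompt("company", "Facebook"))
    print(build_prompt("equivalent", "SWE", "Software Engineer"))
```

Keeping all four tasks in one prompt table means a single fine-tuned model can serve normalization and equivalency queries alike, distinguished only by the instruction text.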
Model Details
- Base Model: Google Flan-T5 (instruction-tuned for better zero-shot performance)
- Fine-tuning Method: LoRA (Low-Rank Adaptation) for efficient training
- Parameters: 250M (T5-Base) or 770M (T5-Large)
- Training Data: 9,302 high-quality examples (478 manual + 8,824 synthetic)
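To see why LoRA makes training efficient, it helps to count how few weights it actually trains. The sketch below assumes rank 8 and adapters on the query and value projections only. Neither setting is stated for this Space, so treat both as illustrative. It also assumes square `d_model` x `d_model` projections, which holds for the Base (768) and Large (1024) sizes.

```python
def lora_trainable_params(d_model: int, n_enc_layers: int, n_dec_layers: int,
                          rank: int, targets_per_attention: int = 2) -> int:
    """Count LoRA weights when adapting square attention projections.

    Each adapted d_model x d_model matrix gains two low-rank factors,
    A (rank x d_model) and B (d_model x rank): 2 * rank * d_model weights.
    """
    # Encoder layers have one attention block; decoder layers have two
    # (self-attention and cross-attention).
    attention_blocks = n_enc_layers + 2 * n_dec_layers
    adapted_matrices = attention_blocks * targets_per_attention
    return adapted_matrices * 2 * rank * d_model


# Assumed settings: rank 8, query and value projections only.
base = lora_trainable_params(d_model=768, n_enc_layers=12, n_dec_layers=12, rank=8)
large = lora_trainable_params(d_model=1024, n_enc_layers=24, n_dec_layers=24, rank=8)

print(f"Base:  {base:,} trainable ({base / 250e6:.2%} of ~250M)")
print(f"Large: {large:,} trainable ({large / 770e6:.2%} of ~770M)")
```

Under these assumptions well under 1% of the weights receive gradients, which is what keeps optimizer state and checkpoint size small.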
Usage
- Check that training data is available using the "Check Data" tab
- Enter your Hugging Face username and an access token with write permission (needed to push the model to the Hub)
- Select model size and training epochs
- Click "Start Training" and monitor progress in the "Training Status" tab
- Once training completes, your model will be available on the Hugging Face Hub
Expected Performance
- Inference Speed: <100 ms per query
- Accuracy: >90% on entity normalization tasks
- Memory Usage: ~1 GB (T5-Base) or ~3 GB (T5-Large)
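The memory figures are consistent with a back-of-the-envelope estimate of weight storage alone. The sketch below assumes 32-bit floats (4 bytes per parameter) and ignores activations and framework overhead, so actual usage will be somewhat higher. The `weight_memory_gb` helper is illustrative only.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 4) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Parameter counts from the Model Details section.
print(f"T5-Base  fp32: {weight_memory_gb(250e6):.2f} GB")
print(f"T5-Large fp32: {weight_memory_gb(770e6):.2f} GB")
# Half precision (2 bytes per parameter) would roughly halve these figures.
print(f"T5-Large fp16: {weight_memory_gb(770e6, bytes_per_param=2):.2f} GB")
```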
Hardware Requirements
This Space runs on four NVIDIA L4 GPUs (4 × 24 GB = 96 GB total VRAM) for efficient distributed training.