---
title: Job Description Parser
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.48.0
app_file: app.py
pinned: false
---

# Job Parser Model (Qwen Fine-Tuned)

This repository contains a fine-tuned version of the [Qwen](https://huggingface.co/Qwen) model, adapted to parse job descriptions into structured JSON.

---

## 💼 Use Case

The model takes a raw job description (JD) as input and outputs structured JSON containing:

- Job Titles
- Company Name & Website
- Skills
- Compensation
- Location
- Work Mode
- Experience
- Qualification
- Industry
- Posted Date
- Notice Period
- Job Type
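
The field list above can be turned into a quick completeness check on a parsed record. This is a minimal sketch: the README does not state the exact JSON keys the model emits, so the snake_case names in `EXPECTED_FIELDS`, the `missing_fields` helper, and the sample record are all hypothetical and should be adjusted to match real model output.

```python
import json

# Hypothetical key names: the README lists the fields but not the exact JSON keys.
EXPECTED_FIELDS = [
    "job_titles",
    "company_name",
    "company_website",
    "skills",
    "compensation",
    "location",
    "work_mode",
    "experience",
    "qualification",
    "industry",
    "posted_date",
    "notice_period",
    "job_type",
]


def missing_fields(parsed: dict) -> list:
    """Return the expected keys that are absent from a parsed JD record."""
    return [key for key in EXPECTED_FIELDS if key not in parsed]


# Illustrative record only, not real model output.
sample = json.loads('{"job_titles": ["Data Engineer"], "skills": ["Python", "SQL"]}')
print(missing_fields(sample))
```

A check like this is useful downstream (for example before indexing records on a job board), since a small model may omit fields that are not present in the job description.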

Perfect for building:

- Resume & JD analyzers
- Job boards with smart filtering
- HR automation tools
- Job matching engines

---

## 🧠 Model Details

- **Base Model**: `Qwen` (`Qwen/Qwen3-0.6B`)
- **Fine-tuned on**: 80+ custom-labeled job descriptions
- **Trained using**: Hugging Face Transformers and TRL's `SFTTrainer`
- **Dataset Format**: Few-shot prompting with Qwen’s `<|im_start|>` / `<|im_end|>` chat template
- **Output**: Structured JSON response
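
Because the model is prompted with `<|im_start|>` / `<|im_end|>` markers and replies with JSON, two small helpers are often convenient: one to wrap a job description in the chat markers, and one to pull the JSON object out of the generated text. The sketch below is an assumption, not part of this repository: `build_prompt` and `extract_json` are hypothetical names, the prompt layout mirrors the hand-written prompt in this README, and the parser simply takes the first decodable JSON object in the reply.

```python
import json

SYSTEM_PROMPT = (
    "You are a helpful assistant that extracts structured information "
    "from job descriptions."
)


def build_prompt(job_description: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Wrap a job description in Qwen-style <|im_start|>/<|im_end|> chat markers."""
    return (
        f"<|im_start|>system\n{system_prompt}\n<|im_end|>\n"
        f"<|im_start|>user\n{job_description.strip()}\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def extract_json(generated_text: str) -> dict:
    """Return the first JSON object that can be decoded from the model's reply."""
    decoder = json.JSONDecoder()
    start = generated_text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores trailing text.
            obj, _ = decoder.raw_decode(generated_text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = generated_text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

Scanning for the first decodable object tolerates extra text before or after the JSON; if the model reliably emits bare JSON, a plain `json.loads` on the reply would be enough.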

---

## 🚀 How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Rithankoushik/job-parser-model-qwen"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Prompt in Qwen's chat format; replace the placeholder with the raw job description.
prompt = """<|im_start|>system
You are a helpful assistant that extracts structured information from job descriptions.
<|im_end|>
<|im_start|>user
[Paste job description here]
<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the newly generated tokens so the prompt is not echoed back.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))