Vineethreddy commited on
Commit
cfb38a2
·
verified ·
1 Parent(s): ad07b9e

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +254 -0
README.md ADDED
@@ -0,0 +1,254 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ base_model:
6
+ - meta-llama/Llama-3.2-3B-Instruct
7
+ ---
8
+ ---
9
+ <div align="center">
10
+ <img src="https://github.com/distil-labs/badges/blob/main/distillabs-logo.svg?raw=true" width="40%" alt="distil labs" />
11
+ </div>
12
+
13
+ ---
14
+
15
+ <div align="center">
16
+ <table>
17
+ <tr>
18
+ <td align="center">
19
+ <a href="https://www.distillabs.ai/?utm_source=hugging-face&utm_medium=referral&utm_campaign=distil-resume-roast">
20
+ <img src="https://github.com/distil-labs/badges/blob/main/badge-distillabs-home.svg?raw=true" alt="Homepage"/>
21
+ </a>
22
+ </td>
23
+ <td align="center">
24
+ <a href="https://github.com/distil-labs">
25
+ <img src="https://github.com/distil-labs/badges/blob/main/badge-github.svg?raw=true" alt="GitHub"/>
26
+ </a>
27
+ </td>
28
+ <td align="center">
29
+ <a href="https://huggingface.co/distil-labs">
30
+ <img src="https://github.com/distil-labs/badges/blob/main/badge-huggingface.svg?raw=true" alt="Hugging Face"/>
31
+ </a>
32
+ </td>
33
+ </tr>
34
+ <tr>
35
+ <td align="center">
36
+ <a href="https://www.linkedin.com/company/distil-labs/">
37
+ <img src="https://github.com/distil-labs/badges/blob/main/badge-linkedin.svg?raw=true" alt="LinkedIn"/>
38
+ </a>
39
+ </td>
40
+ <td align="center">
41
+ <a href="https://distil-labs-community.slack.com/join/shared_invite/zt-36zqj87le-i3quWUn2bjErRq22xoE58g">
42
+ <img src="https://github.com/distil-labs/badges/blob/main/badge-slack.svg?raw=true" alt="Slack"/>
43
+ </a>
44
+ </td>
45
+ <td align="center">
46
+ <a href="https://x.com/distil_labs">
47
+ <img src="https://github.com/distil-labs/badges/blob/main/badge-twitter.svg?raw=true" alt="Twitter"/>
48
+ </a>
49
+ </td>
50
+ </tr>
51
+ </table>
52
+ </div>
53
+
54
+ # Resume Roaster AI
55
+
56
+ We trained an SLM (Small Language Model) assistant for automatic resume critique — a Llama-3.2-3B parameter model that generates "Roast Mode" feedback and professional improvement suggestions.
57
+ Run it locally to keep your personal data private, or deploy it for instant feedback!
58
+
59
+
60
+ ### **1. Install Dependencies**
61
+
62
+ First, install **[Ollama](http://ollama.com/)** from their official website.
63
+ Then set up your Python environment:
64
+
65
+ ```bash
66
+ # Create a virtual environment
67
+ python -m venv .venv
68
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
69
+
70
+ # Install required tools
71
+ pip install huggingface_hub ollama rich pymupdf
72
+ ```
73
+ Available models hosted on HuggingFace:
74
+
75
+ - **[distil-labs/Distil-Rost-Resume-Llama-3.2-3B-Instruct](https://huggingface.co/distil-labs/Distil-Rost-Resume-Llama-3.2-3B-Instruct)**
76
+
77
+ ### **2. Setup the Model**
78
+
79
+ Download your fine-tuned GGUF model and register it with Ollama.
80
+
81
+ ```bash
82
+ hf download distil-labs/Distil-Rost-Resume-Llama-3.2-3B-Instruct --local-dir distil-model
83
+
84
+ cd distil-model
85
+ # Create the Ollama model from the Modelfile
86
+ ollama create roast_master -f Modelfile
87
+ ```
88
+
89
+ ### **3. Usage**
90
+
91
+ Now you can roast any resume PDF instantly from your terminal.
92
+
93
+ ```bash
94
+ # Syntax: python roast.py <path_to_resume.pdf>
95
+ python roast.py my_resume.pdf
96
+ ```
97
+
98
+ ## ✨ Features
99
+
100
+ The assistant is trained to analyze resumes and output structured JSON containing:
101
+
102
+ - **💀 Roast Critique**
103
+ A sarcastic, humorous paragraph quoting specific problematic parts of the resume (typos, clichés, gaps).
104
+
105
+ - **✨ Professional Suggestions**
106
+ A list of **exactly 3** constructive, actionable tips to improve the resume.
107
+
108
+ - **📊 Rating**
109
+ An integer score **(1–10)** based on overall resume quality.
110
+
111
+ ## 📊 Model Evaluation & Fine-Tuning Results
112
+
113
+ To validate the necessity of fine-tuning, we performed a strict **A/B Test** comparing the **Base Model** (Llama-3.2-3B-Instruct) against our **Fine-Tuned Student** (Llama-3.2-3B-Instruct).
114
+
115
+ ### 1. The Engineering Challenge
116
+ We needed the model to satisfy three conflicting requirements simultaneously:
117
+ 1. **Strict JSON Schema:** Output *only* valid JSON (no Markdown wrappers like ` ```json `, no conversational filler).
118
+ 2. **Persona Shift:** Move from the base model's "Helpful Assistant" tone to a "Ruthless Roaster" persona.
119
+ 3. **Context Awareness:** Cite specific details from the resume rather than giving generic advice.
120
+
121
+ ### 2. Quantitative Results
122
+
123
+ | Metric | 🤖 Base Model (Llama-3.2-1B) | 👨‍🏫 Teacher Model (gpt-oss-120b) | 🔥 Fine-Tuned Student (Custom) |
124
+ | :--- | :--- | :--- | :--- |
125
+ | **JSON Valid Rate** | **70% (Failed)** | **100% (Passed)** | **100% (Passed)** <br>Matches Teacher performance. |
126
+ | **Persona Score** | **4/10 (Generic Snark)** <br>Often polite or apologetic. | **10/10 (Benchmark)** | **10/10 (Highly Contextual)** <br>Successfully mimicked the Teacher's tone. |
127
+ | **Hallucination** | **High** <br>(Invented formatting wrappers) | **None** | **None** <br>Strict adherence. |
128
+
129
+ > **Methodology:** A human evaluator reviewed outputs.
130
+
131
+ ### **Evaluation Metrics**
132
+
133
+ | **Metric** | **Teacher** | **Student-Base (Untrained)** | **Student-Tuned (Ours)** |
134
+ |--------------------|-------------------|-------------------------------|---------------------------|
135
+ | **ROUGE-L** | 0.44 | 0.10 | **0.43 ⬆** |
136
+ | **METEOR** | 0.36 | 0.17 | **0.35 ⬆** |
137
+
138
+
139
+ ### 3. Qualitative Comparison (Side-by-Side)
140
+
141
+ We fed the same resume ("Paritosh") to both models.
142
+
143
+ #### ❌ Base Model Output (Failure)
144
+ * **Status:** **CRASH** (JSON Parsing Error)
145
+ * **Why:** The model "hallucinated" a conversational intro and added a disclaimer at the end, breaking the JSON parser.
146
+ ```text
147
+ Here is the critique for the resume:
148
+ {
149
+ "roast_critique": "The resume is okay but lacks specific details..."
150
+ }
151
+ Note: I tried to be critical but maintained a professional tone.
152
+ ```
153
+ #### ✅ Fine-Tuned Model Output (Success)
154
+ **Status:** PASS (Valid JSON)
155
+
156
+ **Why:** The model output raw JSON immediately. The tone was specific to the user's *"Income Inequality"* project.
157
+ #### JSON
158
+ ```json
159
+ {
160
+ "roast_critique": "PARITOSH, the future of computer science is you... in a really long time. Your resume is a jumbled mess of buzzwords. 'Integrated Unstructured data' sounds like something a 5-year-old would say, and don't even get me started on the 'Income Inequality Analysis' - who doesn't love a good spreadsheet? Oh, and please stop using Comic Sans.",
161
+ "professional_suggestions": [
162
+ "Provide specific examples of real-world problems you solved instead of listing generic tech stacks.",
163
+ "Quantify accomplishments. Instead of 'reduced response time', say 'improved response time from 2s to 0.4s'.",
164
+ "Get rid of the 'Certifications' section and replace it with real project experience."
165
+ ],
166
+ "rating": 6
167
+ }
168
+ ```
169
+
170
+ #### Example 1
171
+ <p align="center">
172
+ <img src="https://github.com/distil-labs/distil-resume-roast/blob/main/examples/rr-1.png??raw=true" width="550" alt="Example 1" />
173
+ </p>
174
+
175
+ ---
176
+
177
+ #### Example 2
178
+ <p align="center">
179
+ <img src="https://github.com/distil-labs/distil-resume-roast/blob/main/examples/rr-2.png?raw=true" width="550" alt="Example 2" />
180
+ </p>
181
+
182
+ ---
183
+
184
+ #### Example 3
185
+ <p align="center">
186
+ <img src="https://github.com/distil-labs/distil-resume-roast/blob/main/examples/rr-3.png?raw=true" width="550" alt="Example 3" />
187
+ </p>
188
+
189
+
190
+ ### **Training Config**
191
+
192
+ - **Student:** Llama-3.2-3B-Instruct
193
+ - **Teacher:** openai.gpt-oss-120b
194
+ - **Dataset:** 10,000 synthetic examples
195
+
196
+ ### 4. Conclusion
197
+
198
+ The fine-tuning process **successfully eliminated the formatting hallucinations** present in the base model and **significantly enhanced the "Roaster" persona**, making the outputs more structured, consistent, and aligned with the intended tone.
199
+
200
+
201
+
202
+ ## ❓ FAQ
203
+
204
+ ---
205
+
206
+ <details>
207
+ <summary><strong>Q: Why not just use ChatGPT or Claude?</strong></summary>
208
+
209
+ **Privacy and cost.**
210
+ Resumes contain sensitive personal data (PII). Sending them to cloud APIs risks exposure.
211
+ Our model runs **fully locally**, ensuring zero data leaks and costs **nothing** to run.
212
+ </details>
213
+
214
+ ---
215
+
216
+ <details>
217
+ <summary><strong>Q: How accurate is a 3B model compared to GPT-4?</strong></summary>
218
+
219
+ Surprisingly good for this specific task!
220
+ Because it’s fine-tuned on **6,000+ high-quality roast-style examples**, it performs far better than a generic prompt to GPT-4.
221
+ It captures the **roast persona** more consistently and is extremely fast.
222
+ </details>
223
+
224
+ ---
225
+
226
+ <details>
227
+ <summary><strong>Q: Can I use this for serious resume reviews?</strong></summary>
228
+
229
+ Yes!
230
+ The **Professional Suggestions** section is trained on real career guidance data.
231
+ You can ignore the roast and only use the actionable tips.
232
+ </details>
233
+
234
+ ---
235
+
236
+ <details>
237
+ <summary><strong>Q: The model is too mean! Can I change it?</strong></summary>
238
+
239
+ The model is intentionally “brutally honest.”
240
+ But since it outputs **structured JSON**, you can simply hide the `roast` field and show only the suggestions.
241
+ </details>
242
+
243
+ ---
244
+
245
+ <details>
246
+ <summary><strong>Q: What hardware do I need?</strong></summary>
247
+
248
+ **Minimum:**
249
+ - 8GB RAM (CPU Mode)
250
+ - Works well on modern laptops (Mac M1/M2/M3 recommended)
251
+
252
+ **Recommended:**
253
+ - NVIDIA GPU with **4GB+ VRAM** for 2–5s inference
254
+ </details>