File size: 14,306 Bytes
from openai import OpenAI
from models import Evaluations,EvalResult
from typing import List, Dict
import json
tags = {'AI': "This one is the competence description"}  # Competences to save; better fetched from the database.
client = OpenAI()
def generate_model_parameters(skill: str, transcript: str, lang: str):
    eng = f"""
You are tasked with evaluating a transcript of an IT job interview. The interview in the transcript is technical.
You need sufficient IT knowledge, since you will evaluate the interviewee's answer to determine whether it is correct.
You will output "SUCCESS" if the interviewee's answer is deemed correct and "FAIL" if it is deemed incorrect.
Below are 5 examples of correct answers.
EXAMPLE 1:
SKILL TO BE EVALUATED: Python
INTERVIEWER:
What is the use of zip() in Python?
INTERVIEWEE:
zip() takes iterables as arguments and returns an iterator. These iterables can be lists, tuples, dictionaries, etc. It pairs the elements at the same index of each iterable into a single entity.
OUTPUT: SUCCESS
EXAMPLE 2:
SKILL TO BE EVALUATED: Python
INTERVIEWER:
What will be the output of the following?
name=["swati","shweta"]
age=[10,20]
new_entity=zip(name,age)
new_entity=set(new_entity)
print(new_entity)
INTERVIEWEE:
The output is {{('shweta', 20), ('swati', 10)}}
OUTPUT: SUCCESS
EXAMPLE 3:
SKILL TO BE EVALUATED: Python
INTERVIEWER:
What will be the output of the following?
a=["1","2","3"]
b=["a","b","c"]
c=[x+y for x, y in zip(a,b)]
print(c)
INTERVIEWEE:
The output is: ['1a', '2b', '3c']
OUTPUT: SUCCESS
EXAMPLE 4:
SKILL TO BE EVALUATED: Python
INTERVIEWER:
What will be the output of the following?
str="apple#banana#kiwi#orange"
print(str.split("#",2))
INTERVIEWEE:
['apple', 'banana', 'kiwi#orange']
OUTPUT: SUCCESS
EXAMPLE 5:
SKILL TO BE EVALUATED: Python
INTERVIEWER:
What are Python modules? Name some commonly used built-in modules in Python.
INTERVIEWEE:
Python modules are files containing Python code. This code can define functions, classes, or variables. A Python module is a .py file containing executable code. Some commonly used built-in modules are:
- os
- sys
- math
- random
- datetime
- json
OUTPUT: SUCCESS
Note that all of the examples above contain correct answers. Your job is to generate the output only (SUCCESS or FAIL). You do not need to explain your reasoning.
SKILL TO BE EVALUATED: {skill}
{transcript}
"""
    idn = f"""
Anda ditugaskan untuk mengevaluasi transkrip dari sebuah wawancara kerja di bidang IT. Wawancara dalam transkrip tersebut bersifat teknis.
Anda perlu memiliki pengetahuan yang cukup tentang IT karena Anda akan mengevaluasi jawaban dari peserta wawancara untuk menentukan apakah jawaban peserta tersebut benar atau tidak.
Anda akan mengeluarkan output "SUCCESS" jika jawaban peserta dianggap benar dan "FAIL" jika dianggap salah.
Berikut adalah 5 contoh jawaban yang benar.
CONTOH 1:
KEMAMPUAN YANG DIEVALUASI: Python
PEWAWANCARA:
Apa kegunaan dari fungsi zip() di Python?
PESERTA:
Fungsi zip mengembalikan sebuah iterator dan menerima iterable sebagai argumen. Iterable ini bisa berupa list, tuple, dictionary, dll. Fungsi ini mencocokkan indeks yang sama dari setiap iterable untuk membentuk satu entitas.
OUTPUT: SUCCESS
CONTOH 2:
KEMAMPUAN YANG DIEVALUASI: Python
PEWAWANCARA:
Apa output dari kode berikut?
name = ["swati", "shweta"]
age = [10, 20]
new_entity = zip(name, age)
new_entity = set(new_entity)
print(new_entity)
PESERTA:
Output-nya adalah: {{('shweta', 20), ('swati', 10)}}
OUTPUT: SUCCESS
CONTOH 3:
KEMAMPUAN YANG DIEVALUASI: Python
PEWAWANCARA:
Apa output dari kode berikut?
a = ["1", "2", "3"]
b = ["a", "b", "c"]
c = [x + y for x, y in zip(a, b)]
print(c)
PESERTA:
Output-nya adalah: ['1a', '2b', '3c']
OUTPUT: SUCCESS
CONTOH 4:
KEMAMPUAN YANG DIEVALUASI: Python
PEWAWANCARA:
Apa output dari kode berikut?
str = "apple#banana#kiwi#orange"
print(str.split("#", 2))
PESERTA:
['apple', 'banana', 'kiwi#orange']
OUTPUT: SUCCESS
CONTOH 5:
KEMAMPUAN YANG DIEVALUASI: Python
PEWAWANCARA:
Apa itu modul Python? Sebutkan beberapa modul built-in yang umum digunakan di Python?
PESERTA:
Modul Python adalah file yang berisi kode Python. Kode ini bisa berupa fungsi, kelas, atau variabel. Sebuah modul Python adalah file .py yang berisi kode yang bisa dijalankan. Beberapa modul built-in yang sering digunakan adalah:
os
sys
math
random
datetime
json
OUTPUT: SUCCESS
Catatan: Contoh-contoh di atas memberikan jawaban yang benar. Tugas Anda adalah menghasilkan output saja (SUCCESS atau FAIL). Anda tidak perlu menjelaskan alasan Anda.
KEMAMPUAN YANG DIEVALUASI: {skill}
{transcript}
"""
    model_parameters = {
        "model": "gpt-4-0125-preview",
        "messages": [
            {"role": "system", "content": eng if lang == 'en' else idn},
        ],
    }
    return model_parameters
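Both prompts in `generate_model_parameters` are f-strings, so any literal brace in an example answer has to be doubled, otherwise Python evaluates the contents as an expression. The short sketch below illustrates the difference; the `skill` value is a placeholder chosen for illustration, not taken from the application.

```python
# Sketch: literal braces inside an f-string must be written as {{ and }}.
skill = "Python"  # hypothetical value, for illustration only

# Doubled braces are emitted as literal braces in the final prompt.
escaped = f"SKILL: {skill}\nThe output is {{('shweta', 20), ('swati', 10)}}"

# Single braces are treated as an expression: here a tuple of two tuples.
unescaped = f"The output is {('shweta', 20), ('swati', 10)}"

print(escaped)
print(unescaped)
```

With single braces the prompt would show a tuple in parentheses rather than the set literal the example answer is meant to contain.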
def gpt_evaluator(payload, fewshot, response_format):
    # Debug output: the few-shot prompt and the inputs being evaluated.
    print("-----tes")
    print(fewshot)
    print(payload)
    res = []
    for i in payload:
        # Request a structured response parsed into `response_format`.
        response = client.beta.chat.completions.parse(
            model="gpt-4o-2024-08-06",
            messages=[
                {"role": "system", "content": fewshot},
                {"role": "user", "content": i},
            ],
            response_format=response_format,
        )
        parsed = response.choices[0].message.parsed
        res.append(parsed.value)
    return res
def extract_competences_and_responses(competences: list[str], transcripts: list[list[dict]]):
    # Build one newline-separated response string per competence.
    responses = []
    for i in range(len(competences)):
        transcript = transcripts[i]
        response = ""
        for idx, chat in enumerate(transcript):
            # logger.info(chat)
            response += chat["answer"]
            if idx < len(transcript) - 1:
                response += "\n"
        responses.append(response)
    return responses
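The inner loop of `extract_competences_and_responses` concatenates the answers of one transcript with newlines between them. A minimal sketch of the same step using `str.join` follows; `join_answers` and the sample `turns` are hypothetical names introduced only for this illustration.

```python
def join_answers(transcript: list[dict]) -> str:
    """Join the "answer" field of every turn with a newline separator."""
    return "\n".join(chat["answer"] for chat in transcript)


# Hypothetical sample data shaped like one behavioral transcript.
turns = [{"answer": "First."}, {"answer": "Second."}]
print(join_answers(turns))
```

`str.join` places the separator only between items, so the index check against the last element is not needed.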
def evaluate_interview(competences: list[str], transcript: dict, lang: str = 'en'):
    # global tags
    model_inputs = []
    responses = extract_competences_and_responses(transcript["comp_beha"], transcript["behavioral"])
    print(len(competences))
    print(len(responses))
    # pprint(transcript)
    for i in range(len(transcript["comp_beha"])):
        competence = transcript["comp_beha"][i]
        response = responses[i]
        text = "KNOWLEDGE:\n"
        knowledge_exist = False
        text += f"\nCOMPETENCE: {competence}\n\n"
        text += f"RESPONSE:\n{response}"
        model_inputs.append(text)
    print("------")
    ## TODO: change to gpt
    idn = """
CONTOH 1:
KETERAMPILAN YANG DINILAI: Kejujuran
PEWAWANCARA:
Apa mimpi burukmu?
PESERTA WAWANCARA:
Saya tidak punya mimpi buruk.
Penilaian: Tidak mungkin seseorang tidak pernah mengalami mimpi buruk. Rasa takut terhadap sesuatu adalah hal yang umum dirasakan manusia.
Skor: 0.1
CONTOH 2:
PEWAWANCARA:
Bisakah Anda menceritakan saat Anda harus men-debug masalah yang sangat sulit di lingkungan produksi?
PESERTA WAWANCARA:
Di pekerjaan saya sebelumnya, kami menggunakan arsitektur berbasis mikroservis yang dideploy di Kubernetes. Suatu pagi, kami mulai menerima peringatan bahwa layanan autentikasi pengguna kami gagal secara intermiten, dan pengguna tidak bisa masuk.
Sebagai engineer yang sedang bertugas, tanggung jawab saya adalah segera mengidentifikasi akar permasalahan dan mengembalikan layanan ke fungsionalitas penuh tanpa memengaruhi layanan lain yang bergantung padanya.
Saya mulai dengan memeriksa log di Kibana dan melihat bahwa beberapa pod untuk layanan autentikasi terus-menerus restart. Saya lalu memeriksa metrik penggunaan resource di Prometheus dan melihat lonjakan memori sebelum setiap crash. Saya curiga terjadi memory leak akibat perubahan terbaru, jadi saya rollback ke image container sebelumnya untuk menstabilkan layanan.
Setelah stabil, saya menelusuri commit terbaru dan menemukan penggunaan session store in-memory baru yang tidak melepaskan sesi lama dengan benar. Saya menulis skrip analisis heap dump cepat, mengonfirmasi kebocoran memori tersebut, dan memperbaiki session store dengan cache LRU yang terbatas.
Perbaikannya dideploy di hari yang sama, dan masalah tidak pernah terjadi lagi. Laporan postmortem yang saya tulis juga mendorong tim untuk mengadopsi profiling memori untuk semua komponen layanan baru. Waktu penyelesaian insiden kami meningkat sekitar 30% di kuartal berikutnya berkat perbaikan proses tersebut.
"""
    en = """
Here are 2 examples:
EXAMPLE 1:
SKILL TO BE EVALUATED: Honesty
INTERVIEWER:
What is your nightmare?
INTERVIEWEE:
I do not have nightmares.
Judgement: It is impossible for someone to never have nightmares. Being afraid of something is a common human experience.
Score: 0.1
EXAMPLE 2:
INTERVIEWER:
Can you tell me about a time you had to debug a particularly difficult issue in a production environment?
INTERVIEWEE:
At my previous job, we had a microservices-based architecture deployed on Kubernetes. One morning, we started getting alerts that our user authentication service was intermittently failing, and users couldn’t log in.
As the engineer on call, my responsibility was to quickly identify the root cause and restore the service to full functionality without affecting other dependent services.
I began by checking the logs in Kibana and noticed that some of the pods for the authentication service were repeatedly restarting. I then checked the resource usage metrics in Prometheus and saw a memory spike before each crash. I suspected a memory leak introduced by a recent change, so I rolled back to the previous container image to stabilize the service.
After stabilizing, I dug deeper into the recent commits and found a new in-memory session store that was not properly releasing old sessions. I wrote a quick heap dump analysis script, confirmed the leak, and patched the session store to use a bounded LRU cache instead.
The fix was deployed the same day, and the issue never recurred. The postmortem I wrote also led to the team adopting memory profiling for all new service components. Our incident resolution time improved by about 30% over the next quarter due to those process improvements.
RETURN IN FORMAT BELOW:
{
"value": [{
"Judgement": "It is impossible for someone to never have nightmares. Being afraid of something is a common human experience, which suggests the candidate was lying.",
"score": 0.1
},
{
"Judgement": "The candidate delivered a clear, concise STAR response that effectively demonstrated strong technical skills, composure under pressure, and a methodical approach to problem-solving in a production environment. The use of appropriate tools (Kibana, Prometheus), the decision to roll back, and the successful root cause analysis showed depth of experience. The result was measurable and impactful, indicating not just resolution but long-term improvement. Slightly more context on user or business impact would make it perfect, but overall, this is an excellent response that would strongly support a hiring decision.",
"score": 0.95
}
]
}
"""
    result = gpt_evaluator(model_inputs, en if lang == 'en' else idn, Evaluations)
    ## output:
    behavioral_scores = generate_behavioral_score(result)
    # generate_technical_score requires the language as its third argument.
    technical_scores = generate_technical_score(transcript["comp_tech"], transcript["technical"], lang)
    final_score = aggregate_scores(behavioral_scores, technical_scores)
    return EvalResult(final_score=final_score, details=result)
def aggregate_scores(b: list[float], t: list[float]):
    # Average the behavioral and technical scores and scale to a percentage.
    alls = b + t
    if not alls:
        return 0
    return (sum(alls) / len(alls)) * 100

def generate_behavioral_score(eval_array):
    print(eval_array)
    scores = []
    for evaluation in eval_array:
        scores.append(evaluation.score)
    return scores
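`generate_technical_score` appends -1 when a skill has no recorded answer, and that sentinel then enters the average like any other score. One possible way to handle it is to leave unanswered skills out of the mean. This is a sketch under that assumption, not the application's confirmed scoring rule; `mean_percentage` and its `skip` parameter are hypothetical.

```python
def mean_percentage(scores: list[float], skip: float = -1) -> float:
    """Average the scores as a percentage, ignoring the `skip` sentinel."""
    answered = [s for s in scores if s != skip]
    if not answered:
        # Nothing was answered, so there is nothing to average.
        return 0.0
    return sum(answered) / len(answered) * 100


# Hypothetical mix of behavioral scores (floats) and technical scores (0, 1, -1).
print(mean_percentage([1, 0, -1, 0.5]))
```

Whether an unanswered skill should be skipped or counted as a failure is a product decision; counting it as 0 instead only requires mapping the sentinel to 0 before averaging.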
def generate_technical_score(skills: list[str], transcript: list, lang: str):
    # total_score = 0
    scores = []
    for idx, skill in enumerate(skills):
        chat = transcript[idx]
        if len(chat) > 0:
            # print(chat)
            # Remove the literal "TECHNICAL: " prefix (str.lstrip would strip a character set instead).
            question = chat[0]['question'].removeprefix('TECHNICAL: ')
            # The question comes from the interviewer, the answer from the interviewee.
            transcript_text = f"INTERVIEWER:\n{question}\n\nINTERVIEWEE:\n{chat[0]['answer']}"
            # TODO: change to structured output
            model_parameters = generate_model_parameters(skill, transcript_text, lang)
            completion = client.chat.completions.create(
                **model_parameters
            )
            generated = completion.choices[0].message.content
            score = 1 if "SUCCESS" in generated else 0
            # total_score += score
            scores.append(score)
        else:
            # No answer recorded for this skill.
            scores.append(-1)
    return scores
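Technical questions carry a `TECHNICAL: ` prefix that is removed before the prompt is built. `str.lstrip` takes a set of characters rather than a prefix, so it can also eat the start of the question itself, whereas `str.removeprefix` (Python 3.9+) removes only the exact prefix. The sample question below is made up for illustration.

```python
# Hypothetical question text, shaped like chat[0]['question'].
question = "TECHNICAL: TCP vs UDP?"

# lstrip removes any leading characters found in the given set,
# so the "T" and "C" of "TCP" are stripped as well.
stripped = question.lstrip("TECHNICAL: ")

# removeprefix removes the exact prefix once and nothing more.
without_prefix = question.removeprefix("TECHNICAL: ")

print(repr(stripped))
print(repr(without_prefix))
```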