muhammad-mujtaba-ai committed (verified)
Commit f184350 · 1 Parent(s): 88ab8cd

Upload folder using huggingface_hub

Files changed (1): README.md (+8 -6)
README.md CHANGED
README.md CHANGED

@@ -276,7 +276,7 @@ contract DecentralizedLibrary is Ownable(msg.sender) {
 ```
 
 # Evaluation Matrics
-To evaluate the performance of our fine-tuned LLM specialized in Solidity smart contract generation, we used **[Slither](https://github.com/crytic/slither)**, a static analysis framework widely used for analyzing Solidity code.
+To evaluate the performance of our fine-tuned LLM specialized in Solidity smart contract generation, we used **[Slither](https://github.com/crytic/slither)**, a static analysis framework widely used for analyzing Solidity code. Additionally, we used both automated LLM-based assessments and expert human evaluations to ensure a comprehensive benchmarking approach.
 
 We focused on six key evaluation criteria:
 
@@ -292,15 +292,17 @@ Using Slither’s gas optimization analysis, we identified areas in the generate
 - **Security Vulnerabilities**
 We analyzed each contract for known security vulnerabilities using Slither’s built-in detectors. We recorded the number and severity of the vulnerabilities detected, providing a measure of the security quality of the model’s outputs.
 
-- **Average Lines of Code**
-This metric provides insight into the verbosity or conciseness of the model’s output. Higher LOC may suggest redundancy or complete code, while lower LOC could indicate either efficiency or missing implementation details, depending on context.
+- **Average Lines of Code (LOC)**
+Captures the average number of lines per generated contract, excluding blank lines but including comments. This metric reflects code verbosity or conciseness, and helps gauge implementation completeness versus potential redundancy.
 
-- **Correctness of Code**
-To assess how well the generated code aligns with the given prompt and category, We conducted both manual and OpenAI LLM evaluation of each generated contract. The prompt and the generated code were keenly observed for alignment analysis.
+- **Correctness (OpenAI Evaluation)**
+Evaluates how accurately the generated contract matches the intended prompt using GPT-4o Mini. Prompts and outputs are scored against a structured rubric, providing a scalable LLM-based perspective on prompt alignment.
 
-These evaluation metrics help quantify the practical usability and reliability of the generated smart contracts in real-world scenarios.
+- **Correctness (Human Evaluation)**
+Involves manual review by a blockchain expert to assess how well the output satisfies the original prompt and category. This provides human-validated insight into the practical applicability and quality of the generated code.
 
+These metrics collectively provide a multi-dimensional view of the model’s effectiveness, spanning correctness, efficiency, security, and usability. They are designed to reflect both automated benchmarks and real-world developer expectations.
 
 # Summary
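The Slither-based severity tally described in the evaluation section could be scripted as follows. This is a minimal sketch, assuming the `slither` CLI is installed (`pip install slither-analyzer`) and each generated contract is a standalone `.sol` file; the helper names `run_slither` and `severity_counts` are illustrative, not from the project.

```python
import json
import subprocess
from collections import Counter


def run_slither(sol_path: str) -> dict:
    """Run Slither on one contract and return its machine-readable report.

    `--json -` asks Slither to emit the report as JSON on stdout.
    """
    proc = subprocess.run(
        ["slither", sol_path, "--json", "-"],
        capture_output=True,
        text=True,
    )
    return json.loads(proc.stdout)


def severity_counts(report: dict) -> Counter:
    """Tally detector findings by impact level (High/Medium/Low/Informational)."""
    detectors = report.get("results", {}).get("detectors", [])
    return Counter(d["impact"] for d in detectors)
```

Summing these counters across all generated contracts gives the per-severity vulnerability figures the evaluation reports.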
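The LOC rule stated above (blank lines excluded, comment lines counted) can be sketched in a few lines of Python; the function names `loc` and `average_loc` are hypothetical, not taken from the project's tooling.

```python
def loc(source: str) -> int:
    """Count non-blank lines of a contract; comment lines are intentionally kept."""
    return sum(1 for line in source.splitlines() if line.strip())


def average_loc(sources: list[str]) -> float:
    """Average LOC over a batch of generated contracts."""
    return sum(loc(s) for s in sources) / len(sources) if sources else 0.0
```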
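The commit names GPT-4o Mini and a structured rubric but does not include the rubric itself. The sketch below assumes a simple 1-to-5 alignment score requested via the OpenAI Python SDK; the rubric wording and the helpers `judge_contract` and `parse_score` are assumptions for illustration. The SDK import is deferred into the function so the parsing logic can be used without an API key.

```python
# Hypothetical rubric: the project's actual scoring criteria are not published.
RUBRIC = (
    "Score the Solidity contract from 1 (ignores the prompt) to 5 "
    "(fully implements the requested behaviour). Reply with the number only."
)


def parse_score(reply: str) -> int:
    """Extract the leading digit of the judge model's reply; 0 if none found."""
    digits = "".join(ch for ch in reply.strip() if ch.isdigit())
    return int(digits[0]) if digits else 0


def judge_contract(prompt: str, code: str) -> int:
    """Ask GPT-4o Mini to rate prompt/contract alignment (needs OPENAI_API_KEY)."""
    from openai import OpenAI  # pip install openai

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Prompt:\n{prompt}\n\nContract:\n{code}"},
        ],
    )
    return parse_score(resp.choices[0].message.content)
```

Averaging `judge_contract` scores over the test set would yield the LLM-based correctness figure, complementing the human review.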