psychologyphd's picture
Update README.md
54d37c8 verified
---
license: llama2
datasets:
- b-mc2/sql-create-context
language:
- en
metrics:
- accuracy
base_model:
- codellama/CodeLlama-7b-hf
pipeline_tag: text-generation
tags:
- PEFT
- llama
- text-to-sql
- code
- lora
- nlp
- mps
- conversational
model_index:
- name: CodeLlama-7b-SQL-LoRA
results:
- task:
type: text-generation
name: Text-to-SQL
dataset:
name: sql-create-context
type: b-mc2/sql-create-context
metrics:
- type: accuracy
value: 71.0%
name: Exact Match
---
# CodeLlama-7b Text-to-SQL-MPS-FineTuned-v4 (Fine-Tuned on Mac MPS)
## Model Description
This is the 4th version of the fine-tuned **CodeLlama-7b-hf** specifically optimized for **Text-to-SQL** tasks.
It was trained on a **MacBook Pro M3** using **MPS (Metal Performance Shaders)** acceleration.
This version demostrate that MPS fine tuning can achieve **71.0%** accuracy based on **Exact Match**.
### Origin & Adaptation
This project is adapted from the **Microsoft "Generative AI for Beginners" Course (Chapter 18: Fine-tuning)**.
- Original Source: [Generative AI for Beginners](https://github.com/microsoft/generative-ai-for-beginners)
- Old Version: psychologyphd/CodeLlama-7b-Text-to-SQL-mps-finetuned
- **Modifications**: tons of modifications to achieve this accuracy.
## Evaluation Results
The model was evaluated on a held-out test set from the `b-mc2/sql-create-context` dataset.
| Metric | Value |
| --- | --- |
| **Accuracy** | **71.0%** |
| Evaluation Method | **Exact Match** |
| Framework | PEFT (LoRA) |
### Performance Notes:
* **Contextual Understanding:** The model shows strong performance in mapping natural language questions to complex SQL schemas provided in the context.
* **Limitations:** 71% accuracy indicates that while the model handles standard filter and aggregates well,
* eyeballing the results, the model struggles with table that needs to be created(index 10,39) possibly due to lack of such complex training examples,
* quotes (actually those cases should be considered correct because schema is not given),
* and applying correct function in selection (such as count).
### Example Output:
* **Correct Output:**
πŸ”Ή Index: 0
🎯 Truth: SELECT home FROM table_name_11 WHERE date = "16 april 2008"
πŸ€– Gen : select home from table_name_11 where date = '16 april 2008'
----------------------------------------
πŸ”Ή Index: 1
🎯 Truth: SELECT MAX(game) FROM table_name_34 WHERE team = "celtics" AND high_assists = "hedo türkoğlu (4)"
πŸ€– Gen : select max(game) from table_name_34 where team = 'celtics' and high_assists = 'hedo tΓΌrkoğlu (4)'
----------------------------------------
πŸ”Ή Index: 2
🎯 Truth: SELECT country FROM table_name_17 WHERE score = 72 - 66 - 72 = 210
πŸ€– Gen : select country from table_name_17 where score = 72 - 66 - 72 = 210
----------------------------------------
πŸ”Ή Index: 3
🎯 Truth: SELECT AVG(gold) FROM table_name_66 WHERE sport = "athletics" AND silver > 42
πŸ€– Gen : select avg(gold) from table_name_66 where sport = 'athletics' and silver > 42
----------------------------------------
πŸ”Ή Index: 4
🎯 Truth: SELECT club FROM table_name_36 WHERE head_coach = "casemiro mior"
πŸ€– Gen : select club from table_name_36 where head_coach = 'casemiro mior'
----------------------------------------
πŸ”Ή Index: 5
🎯 Truth: SELECT COUNT(high_points) FROM table_23186738_6 WHERE record = "5-17"
πŸ€– Gen : select count(high_points) from table_23186738_6 where record = '5-17'
----------------------------------------
πŸ”Ή Index: 6
🎯 Truth: SELECT date FROM table_name_10 WHERE away_team = "st kilda"
πŸ€– Gen : select date from table_name_10 where away_team = 'st kilda'
----------------------------------------
πŸ”Ή Index: 8
🎯 Truth: SELECT sail_number FROM table_25595209_1 WHERE skipper = "Matt Allen"
πŸ€– Gen : select sail_number from table_25595209_1 where skipper = 'matt allen'
----------------------------------------
πŸ”Ή Index: 9
🎯 Truth: SELECT venue FROM table_name_74 WHERE home_team = "melbourne"
πŸ€– Gen : select venue from table_name_74 where home_team = 'melbourne'
----------------------------------------
πŸ”Ή Index: 11
🎯 Truth: SELECT time FROM table_name_1 WHERE event = "k-1 the challenge 1999"
πŸ€– Gen : select time from table_name_1 where event = 'k-1 the challenge 1999'
----------------------------------------
πŸ”Ή Index: 12
🎯 Truth: SELECT COUNT(gold) FROM table_name_34 WHERE silver = 2 AND total < 7
πŸ€– Gen : select count(gold) from table_name_34 where silver = 2 and total < 7
----------------------------------------
πŸ”Ή Index: 13
🎯 Truth: SELECT player FROM table_name_92 WHERE team = "chicago bulls"
πŸ€– Gen : select player from table_name_92 where team = 'chicago bulls'
----------------------------------------
πŸ”Ή Index: 14
🎯 Truth: SELECT player FROM table_2679061_12 WHERE college_junior_club_team = "Litvinov (Czechoslovakia)"
πŸ€– Gen : select player from table_2679061_12 where college_junior_club_team = 'litvinov (czechoslovakia)'
----------------------------------------
πŸ”Ή Index: 15
🎯 Truth: SELECT DISTINCT name FROM instructor ORDER BY name
πŸ€– Gen : select distinct name from instructor order by name
----------------------------------------
πŸ”Ή Index: 17
🎯 Truth: SELECT school FROM table_11677691_2 WHERE college = "South Carolina"
πŸ€– Gen : select school from table_11677691_2 where college = 'south carolina'
----------------------------------------
πŸ”Ή Index: 19
🎯 Truth: SELECT venue FROM table_name_47 WHERE score = "0–0"
πŸ€– Gen : select venue from table_name_47 where score = '0–0'
----------------------------------------
πŸ”Ή Index: 20
🎯 Truth: SELECT name FROM table_name_88 WHERE nationality = "france" AND lane < 3
πŸ€– Gen : select name from table_name_88 where nationality = 'france' and lane < 3
----------------------------------------
πŸ”Ή Index: 21
🎯 Truth: SELECT nation FROM table_name_54 WHERE total < 19 AND bronze < 1
πŸ€– Gen : select nation from table_name_54 where total < 19 and bronze < 1
----------------------------------------
πŸ”Ή Index: 22
🎯 Truth: SELECT record FROM table_name_72 WHERE date = "october 27"
πŸ€– Gen : select record from table_name_72 where date = 'october 27'
----------------------------------------
πŸ”Ή Index: 23
🎯 Truth: SELECT score FROM table_name_33 WHERE date = "october 17, 2007"
πŸ€– Gen : select score from table_name_33 where date = 'october 17, 2007'
----------------------------------------
πŸ”Ή Index: 24
🎯 Truth: SELECT nationality FROM table_name_60 WHERE position = "forward" AND years_for_grizzlies = "2011"
πŸ€– Gen : select nationality from table_name_60 where position = 'forward' and years_for_grizzlies = '2011'
----------------------------------------
πŸ”Ή Index: 25
🎯 Truth: SELECT surface FROM table_name_45 WHERE partner = "galina voskoboeva"
πŸ€– Gen : select surface from table_name_45 where partner = 'galina voskoboeva'
----------------------------------------
πŸ”Ή Index: 27
🎯 Truth: SELECT rank FROM table_name_96 WHERE bronze < 7 AND nation = "norway"
πŸ€– Gen : select rank from table_name_96 where bronze < 7 and nation = 'norway'
----------------------------------------
πŸ”Ή Index: 28
🎯 Truth: SELECT sanskrt FROM table_name_38 WHERE japanese = "jayana"
πŸ€– Gen : select sanskrt from table_name_38 where japanese = 'jayana'
----------------------------------------
πŸ”Ή Index: 32
🎯 Truth: SELECT format FROM table_name_74 WHERE type = "primary" AND call_letters = "kbjs"
πŸ€– Gen : select format from table_name_74 where type = 'primary' and call_letters = 'kbjs'
----------------------------------------
πŸ”Ή Index: 34
🎯 Truth: SELECT MIN(byes) FROM table_name_3 WHERE against = 1946 AND wins > 2
πŸ€– Gen : select min(byes) from table_name_3 where against = 1946 and wins > 2
----------------------------------------
πŸ”Ή Index: 36
🎯 Truth: SELECT date FROM table_name_61 WHERE attendance = "79,431"
πŸ€– Gen : select date from table_name_61 where attendance = '79,431'
----------------------------------------
πŸ”Ή Index: 38
🎯 Truth: SELECT MIN(manhunt_international) FROM table_30018460_1
πŸ€– Gen : select min(manhunt_international) from table_30018460_1
----------------------------------------
πŸ”Ή Index: 40
🎯 Truth: SELECT COUNT(*) FROM device
πŸ€– Gen : select count(*) from device
----------------------------------------
πŸ”Ή Index: 42
🎯 Truth: SELECT COUNT(played) FROM table_name_42 WHERE position < 4 AND team = "witton albion"
πŸ€– Gen : select count(played) from table_name_42 where position < 4 and team = 'witton albion'
----------------------------------------
πŸ”Ή Index: 43
🎯 Truth: SELECT score FROM table_name_98 WHERE loss = "embree (1-2)"
πŸ€– Gen : select score from table_name_98 where loss = 'embree (1-2)'
----------------------------------------
πŸ”Ή Index: 44
🎯 Truth: SELECT date_of_appointment FROM table_name_91 WHERE manner_of_departure = "sacked"
πŸ€– Gen : select date_of_appointment from table_name_91 where manner_of_departure = 'sacked'
----------------------------------------
πŸ”Ή Index: 45
🎯 Truth: SELECT catalog FROM table_name_33 WHERE label = "grilled cheese"
πŸ€– Gen : select catalog from table_name_33 where label = 'grilled cheese'
----------------------------------------
πŸ”Ή Index: 46
🎯 Truth: SELECT medal FROM table_name_48 WHERE games = "2008 beijing" AND sport = "taekwondo"
πŸ€– Gen : select medal from table_name_48 where games = '2008 beijing' and sport = 'taekwondo'
----------------------------------------
πŸ”Ή Index: 47
🎯 Truth: SELECT engine FROM table_name_49 WHERE team = "arciero racing"
πŸ€– Gen : select engine from table_name_49 where team = 'arciero racing'
----------------------------------------
πŸ”Ή Index: 49
🎯 Truth: SELECT municipality FROM table_name_53 WHERE station = "benson street"
πŸ€– Gen : select municipality from table_name_53 where station = 'benson street'
----------------------------------------
* **Wrong Output:**
πŸ”Ή Index: 7
🎯 Truth: SELECT AVG(drawn) FROM table_name_45 WHERE points_1 = "33" AND goals_against = 45 AND played < 28
πŸ€– Gen : select avg(drawn) from table_name_45 where points_1 = 33 and goals_against = 45 and played < 28
----------------------------------------
πŸ”Ή Index: 10
🎯 Truth: SELECT major, age FROM student WHERE NOT stuid IN (SELECT T1.stuid FROM student AS T1 JOIN has_pet AS T2 ON T1.stuid = T2.stuid JOIN pets AS T3 ON T3.petid = T2.petid WHERE T3.pettype = 'cat')
πŸ€– Gen : select t1.major, t1.age from student as t1 join has_pet as t2 on t1.stuid = t2.stuid join pets as t3 on t2.petid = t3.petid where t3.pettype <> 'cat'
----------------------------------------
πŸ”Ή Index: 16
🎯 Truth: SELECT COUNT(goals) FROM table_name_49 WHERE goal_ratio < 0.8 AND games = 56
πŸ€– Gen : select goals from table_name_49 where goal_ratio < 0.8 and games = 56
----------------------------------------
πŸ”Ή Index: 18
🎯 Truth: SELECT SUM(lost) FROM table_name_86 WHERE points = 11 AND drawn > 1
πŸ€– Gen : select sum(lost) from table_name_86 where points > 11 and drawn > 1
----------------------------------------
πŸ”Ή Index: 26
🎯 Truth: SELECT reward FROM table_1893276_2 WHERE air_date = "October 6, 2005"
πŸ€– Gen : select count(reward) from table_1893276_2 where air_date = 'october 6, 2005'
----------------------------------------
πŸ”Ή Index: 29
🎯 Truth: SELECT name FROM table_name_19 WHERE dcsf_number = 3373
πŸ€– Gen : select name from table_name_19 where dcsf_number = '3373'
----------------------------------------
πŸ”Ή Index: 30
🎯 Truth: SELECT COUNT(average) FROM table_12094609_1 WHERE evening_gown = "8.988"
πŸ€– Gen : select count(average) from table_12094609_1 where evening_gown = 8.988
----------------------------------------
πŸ”Ή Index: 31
🎯 Truth: SELECT writer_s_ FROM table_17810099_3 WHERE episode__number = "14"
πŸ€– Gen : select writer_s_ from table_17810099_3 where episode__number = 14
----------------------------------------
πŸ”Ή Index: 33
🎯 Truth: SELECT location FROM table_name_65 WHERE round = 3 AND method = "decision"
πŸ€– Gen : select location from table_name_65 where round = '3' and method = 'won or loss by a decision'
----------------------------------------
πŸ”Ή Index: 35
🎯 Truth: SELECT written_by FROM table_27823359_1 WHERE season__number = "23"
πŸ€– Gen : select written_by from table_27823359_1 where season__number = 23
----------------------------------------
πŸ”Ή Index: 37
🎯 Truth: SELECT attendance FROM table_name_53 WHERE game_site = "bye"
πŸ€– Gen : select count(attendance) from table_name_53 where game_site = 'bye'
----------------------------------------
πŸ”Ή Index: 39
🎯 Truth: SELECT T1.fname, T1.lname FROM student AS T1 JOIN lives_in AS T2 ON T1.stuid = T2.stuid WHERE T2.dormid IN (SELECT T3.dormid FROM has_amenity AS T3 JOIN dorm_amenity AS T4 ON T3.amenid = T4.amenid WHERE T4.amenity_name = 'TV Lounge')
πŸ€– Gen : select t1.fname, t1.lname from student as t1 join lives_in as t2 on t1.stuid = t2.stuid join has_amenity as t3 on t2.dormid = t3.dormid join dorm_amenity as t4 on t3.amenid = t4.amenid where t4.amenity_name = 'tv lounge' group by t1.st
----------------------------------------
πŸ”Ή Index: 41
🎯 Truth: SELECT attendance FROM table_name_65 WHERE tie_no = "4"
πŸ€– Gen : select count(attendance) from table_name_65 where tie_no = '4'
----------------------------------------
πŸ”Ή Index: 48
🎯 Truth: SELECT home_team AS score FROM table_name_49 WHERE away_team = "hawthorn"
πŸ€– Gen : select home_team from table_name_49 where away_team = 'hawthorn'
----------------------------------------
## How to Use
See how_to_use_v4.ipynb.
- mps pipeline actually works in fine tuning. For this light weighted how to use, I still use model.generate. .
## Training Details
- **Hardware**: Mac M3 (MPS)
- **Base Model**: [codellama/CodeLlama-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf)
- **Dataset**: [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context)
- **Technique**: LoRA (PEFT)