---
license: other
license_name: qwen-research-license
license_link: LICENSE
---

# AbleCredit Reasoner R0 Qwen 2.5 3B Instruct

## Introduction

This model was trained with DeepSeek R1 style (GRPO) reinforcement learning, using Qwen 2.5 3B Instruct as the base model.
It is primarily intended for research into applying small LLMs trained with GRPO/RL in domains such as finance and credit underwriting.

### Model Description

- **Fine-tuned by:** AbleCredit (LightBees Technologies Private Limited, Bengaluru, India)
- **License:** We have retained the original Qwen research license. Note that this license does not allow commercial use.
- **Fine-tuned from model:** Qwen/Qwen2.5-3B-Instruct

## How to Get Started with the Model

Use the model with a standard Hugging Face Transformers setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbleCredit/AbleCredit-R0-Qwen-2.5-3B-Instruct"  # or local path to model

system_prompt = {
    "role": "system",
    "content": (
        "You are a helpful assistant. User asks a question the assistant answers it.\n"
        "The assistant first thinks about reasoning process in mind and then provides the user with the answer."
    ),
}

# Pre-fill the start of the assistant turn so generation continues inside <think>.
suffix_prompt = {
    "role": "assistant",
    "content": "Let me solve this step by step.\n<think>",
}

prompt_msgs = [
    system_prompt,
    {"role": "user", "content": "What is 15 times 3 ?"},
    suffix_prompt,
]

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Render the chat template as text, continuing the pre-filled assistant message
# instead of opening a fresh assistant turn.
prompt = tokenizer.apply_chat_template(
    prompt_msgs,
    tokenize=False,
    continue_final_message=True,
    add_generation_prompt=False,
)

# Tokenize the prompt and move it to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("\nGenerating response...\n")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    min_p=0.01,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nResponse:\n", response)
```

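Because the assistant turn is pre-filled with an opening `<think>` tag, the reasoning and the final answer arrive in one decoded string. Assuming the model closes its reasoning with a matching `</think>` tag (an assumption; this card does not specify the exact output format), a small helper can separate the two:

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes the model closes its reasoning with a </think> tag; if the
    tag is absent, the whole text is treated as the answer.
    """
    marker = "</think>"
    if marker in response:
        reasoning, _, answer = response.partition(marker)
        # Drop everything up to the opening <think> tag, if present.
        reasoning = reasoning.rpartition("<think>")[2]
        return reasoning.strip(), answer.strip()
    return "", response.strip()

example = "Let me solve this step by step.\n<think>15 * 3 = 45</think>\nThe answer is 45."
reasoning, answer = split_reasoning(example)
print(reasoning)  # 15 * 3 = 45
print(answer)     # The answer is 45.
```
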
## Training Details

### Training Data

Trained on open source logical reasoning datasets and a proprietary finance dataset created by AbleCredit.com.

### Training Procedure

Trained with DeepSeek style reinforcement learning using GRPO with rule based rewards.

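The exact reward functions used for this model are not published. As an illustration only, a rule-based reward in the DeepSeek R1 / GRPO style typically combines a format check with an exact-match accuracy check, along these hypothetical lines:

```python
import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    """Illustrative rule-based reward (not the actual training reward).

    Combines a small format reward (reasoning enclosed in a closed
    <think>...</think> block) with an exact-match answer reward on the
    last number appearing after the reasoning.
    """
    reward = 0.0
    # Format reward: completion contains a properly closed think block.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.1
    # Accuracy reward: last number after </think> matches the gold answer.
    answer_part = completion.split("</think>")[-1]
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer_part)
    if numbers and numbers[-1] == gold_answer:
        reward += 1.0
    return reward

print(rule_based_reward("<think>15*3=45</think> The answer is 45.", "45"))  # 1.1
```
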
## Evaluation

- The model achieves a score of ~67% on the GSM8K benchmark in a **zero shot** setting (see the benchmarking script for details).

## Model Card Contact

[Contact Harshad Saykhedkar via LinkedIn](https://www.linkedin.com/in/harshadss/)