---
license: llama3.2
---

# AbleCredit Reasoner R0 Llama 3.2 3B Instruct
## Introduction

This model was trained with DeepSeek-R1-style (GRPO) reinforcement learning, using Llama 3.2 3B Instruct as the base model.
It is primarily intended for research into applying small LLMs trained with GRPO/RL to domains such as finance and credit underwriting.
11
+
12
+ ### Model Description
13
+
14
+ - **Fine Tuned by:** AbleCredit (LightBees Technologies Private Limited, Bengaluru, India)
15
+ - **License:** We've retained the original Llama community license for this model
16
+ - **Finetuned from model [optional]:** meta-llama/Llama-3.2-3B-Instruct
17
+
## How to Get Started with the Model

Use the model with a standard Hugging Face Transformers setup:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbleCredit/AbleCredit-R0-Llama-3.2-3B-Instruct"  # or local path to model

system_prompt = {
    "role": "system",
    "content": (
        "You are a helpful assistant. User asks a question the assistant answers it.\n"
        "The assistant first thinks about reasoning process in mind and then provides the user with the answer."
    ),
}

# Pre-fill the assistant turn so generation continues inside the <think> block.
suffix_prompt = {
    "role": "assistant",
    "content": "Let me solve this step by step.\n<think>",
}

prompt_msgs = [
    system_prompt,
    {"role": "user", "content": "What is 15 times 3 ?"},
    suffix_prompt,
]

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Render the chat template as text, continuing the pre-filled assistant turn.
prompt = tokenizer.apply_chat_template(
    prompt_msgs,
    tokenize=False,
    continue_final_message=True,
    add_generation_prompt=False,
)

# Tokenize the prompt and move it to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("\nGenerating response...\n")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    min_p=0.01,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nResponse:\n", response)
```
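Because the assistant turn is pre-filled with an opening `<think>` tag, the generated text typically contains the reasoning first and the final answer after it. Below is a minimal sketch for separating the two; it assumes the model emits a closing `</think>` tag (the card only shows the opening tag, so treat this output format as an assumption):

```python
import re


def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into the reasoning inside <think>...</think>
    and the answer text that follows. If no well-formed think block is
    found, treat the whole text as the answer."""
    match = re.search(r"<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()


demo = "Let me solve this step by step.\n<think>15 * 3 = 45</think>\nThe answer is 45."
reasoning, answer = split_reasoning(demo)
```

This keeps downstream code independent of the raw tag format: only `split_reasoning` needs updating if the template changes.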
## Training Details

### Training Data

Trained on open-source logical-reasoning datasets and a proprietary finance dataset created by AbleCredit.com.

### Training Procedure

Trained with DeepSeek-style reinforcement learning, using GRPO with rule-based rewards.
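Rule-based rewards for GRPO typically combine a format check with an exact-match answer check. The sketch below illustrates what such reward functions can look like; the actual rewards used for this model are not published, so the `<think>` tag format and the last-number matching are assumptions:

```python
import re


def format_reward(completion: str) -> float:
    """1.0 if the completion contains a well-formed <think>...</think>
    block followed by non-empty answer text, else 0.0."""
    has_block = re.search(r"<think>.+?</think>\s*\S", completion, flags=re.DOTALL)
    return 1.0 if has_block else 0.0


def answer_reward(completion: str, gold: str) -> float:
    """1.0 if the last number after the think block matches the gold
    answer string, else 0.0."""
    answer_part = completion.split("</think>")[-1]
    nums = re.findall(r"-?\d+(?:\.\d+)?", answer_part)
    return 1.0 if nums and nums[-1] == gold else 0.0
```

During training, per-completion rewards like these are summed and used by GRPO to compute group-relative advantages, so no learned reward model is needed.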
80
+
81
+ ## Evaluation
82
+
83
+ - Model achieves ~64% score on GSM8K benchmark in a **zero shot** setting (check benchmarking script for more details).
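GSM8K accuracy is conventionally scored by comparing the last number in the model's output against the gold answer that follows the `####` marker in each reference solution. A minimal scoring sketch along those lines (the model's actual benchmarking script may differ; this extraction logic is an assumption):

```python
import re


def extract_gsm8k_gold(solution: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def is_correct(prediction: str, solution: str) -> bool:
    """Compare the last number in the prediction to the gold answer."""
    gold = extract_gsm8k_gold(solution)
    nums = re.findall(r"-?\d+(?:\.\d+)?", prediction.replace(",", ""))
    return bool(nums) and nums[-1] == gold
```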
84
+
85
+ ## Model Card Contact
86
+
87
+ [contact Harshad Saykhedkar via LinkedIn](https://www.linkedin.com/in/harshadss/)