---
base_model:
- Qwen/Qwen2.5-3B-Instruct
pipeline_tag: text-generation
library_name: transformers
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---

# AbleCredit Reasoner R0 Qwen 2.5 3B Instruct

## Introduction

This model is trained with DeepSeek R1 style reinforcement learning (GRPO) on Qwen 2.5 3B Instruct as the base model. It is primarily intended for research on applying small LLMs trained with GRPO/RL to domains such as finance and credit underwriting.
|
| | ### Model Description
|
| |
|
| | - **Fine Tuned by:** AbleCredit (LightBees Technologies Private Limited, Bengaluru, India)
|
| | - **License:** We've retained the original Qwen research license. Note that license does not allow commercial use.
|
| | - **Finetuned from model:** Qwen/Qwen2.5-3B-Instruct
|
| |
|
## How to Get Started with the Model

Use the model with a standard Hugging Face Transformers setup:
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "AbleCredit/AbleCredit-R0-Qwen-2.5-3B-Instruct"  # or local path to model

system_prompt = {
    "role": "system",
    "content": (
        "You are a helpful assistant. User asks a question the assistant answers it.\n"
        "The assistant first thinks about reasoning process in mind and then provides the user with the answer."
    ),
}

# Pre-open the <think> block so generation continues the reasoning trace.
suffix_prompt = {
    "role": "assistant",
    "content": "Let me solve this step by step.\n<think>",
}

prompt_msgs = [
    system_prompt,
    {"role": "user", "content": "What is 15 times 3 ?"},
    suffix_prompt,
]

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Continue the final (assistant) message rather than starting a new turn.
prompt = tokenizer.apply_chat_template(
    prompt_msgs,
    tokenize=False,
    continue_final_message=True,
    add_generation_prompt=False,
)

# Tokenize the prompt and move it to the model's device.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

print("\nGenerating response...\n")
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.5,
    min_p=0.01,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("\nResponse:\n", response)
```
|

## Training Details

### Training Data

Trained on open source logical reasoning datasets and a proprietary finance dataset created by AbleCredit.com.

### Training Procedure

Trained with DeepSeek-style reinforcement learning using GRPO with rule-based rewards.
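Rule-based rewards in this style of training are simple programmatic checks rather than learned reward models. As a purely hypothetical sketch (not the actual reward function used in training), a format-plus-correctness reward might look like:

```python
import re

def rule_based_reward(completion: str, expected_answer: str) -> float:
    """Toy rule-based reward: a small bonus for a well-formed <think> block,
    plus a larger bonus if the expected answer appears after it."""
    reward = 0.0
    match = re.search(r"<think>(.*?)</think>(.*)", completion, re.DOTALL)
    if match:
        reward += 0.1  # reward well-formed reasoning tags
        if expected_answer in match.group(2):
            reward += 1.0  # reward a correct final answer
    return reward

print(rule_based_reward("<think>15 * 3 = 45</think> The answer is 45.", "45"))
```

Programmatic rewards like this avoid reward-model training entirely, at the cost of only handling tasks with verifiable answers.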
|

## Evaluation

- The model achieves a ~67% score on the GSM8K benchmark in a **zero-shot** setting (see the benchmarking script for more details).
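Because the suffix prompt pre-opens a `<think>` block, benchmark scoring needs to separate the reasoning trace from the final answer. A minimal sketch, assuming the model emits a closing `</think>` tag (the actual benchmarking script may differ):

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a generated response into (reasoning, answer) on the closing
    </think> tag; if the tag is absent, treat the whole response as answer."""
    marker = "</think>"
    if marker in response:
        reasoning, answer = response.split(marker, 1)
        return reasoning.strip(), answer.strip()
    return "", response.strip()

reasoning, answer = split_reasoning("15 * 3 = 45</think>The answer is 45.")
print(answer)  # The answer is 45.
```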
|

## Model Card Contact

[Contact Harshad Saykhedkar via LinkedIn](https://www.linkedin.com/in/harshadss/)