ergotts committed (verified)
Commit 5835c49 · 1 Parent(s): 54a6861

Update README.md

Files changed (1): README.md (+58 -0)
README.md CHANGED
language:
- en
---

# Model Card: LoRA-Finetuned Qwen2.5-3B-Instruct

## Model Overview
This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned with **LoRA** (Low-Rank Adaptation) and **GRPO**-based (RLHF-style) reward optimization, leveraging **vLLM** for fast generation. It is trained to respond in a fixed structure (a `<reasoning> ... </reasoning>` section followed by `<final_argument> ... </final_argument>`) and to maximize the number of well-formed argument-objection pairs.

## Key Features
- **Base Model**: Qwen/Qwen2.5-3B-Instruct
- **Quantization & Optimization**:
  - 4-bit quantization (`load_in_4bit = True`) for a reduced memory footprint.
  - Configurable LoRA rank (`lora_rank = 16` in the example) for efficient finetuning.
  - Adjustable partial GPU memory utilization (`gpu_memory_utilization = 0.5`).
- **LoRA Finetuning**:
  - Applied to `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.
  - LoRA alpha set equal to `lora_rank`.
  - Gradient checkpointing (`use_gradient_checkpointing = "unsloth"`) to manage memory.
- **Reward Functions**:
  1. **Easy Format Reward**: Checks for `<reasoning>` and `<final_argument>` tags in the generated content.
  2. **Hard Format Reward**: Ensures correct alternation of `<argument>` and `<objection>` tags within `<reasoning>`.
  3. **Number of Objections Reward**: Uses a logarithmically scaled reward to encourage more argument-objection pairs.
- **Trainer Configuration (GRPO)**:
  - **Learning Rate**: `5e-6`
  - **Scheduler**: Cosine (`lr_scheduler_type = "cosine"`)
  - **Batch Size & Accumulation**: `per_device_train_batch_size = 1` and `gradient_accumulation_steps = 1`
  - **Precision**: `bf16` if available, otherwise `fp16`
  - **Train Steps**: `max_steps = 2500` (a single epoch in the example)
  - **Warmup Ratio**: `0.1`
  - **Optimizer**: `adamw_8bit`
  - **Maximum Generation Length**: up to `2000` completion tokens
- **vLLM** support via `use_vllm = True` for efficient inference

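In rough outline, the three reward functions described above could look like the following sketch. The function names, signatures, and score values here are illustrative assumptions, not the training script's actual code:

```python
import math
import re


def easy_format_reward(completion: str) -> float:
    """Reward the presence of both required top-level tag pairs."""
    has_reasoning = "<reasoning>" in completion and "</reasoning>" in completion
    has_final = "<final_argument>" in completion and "</final_argument>" in completion
    return 1.0 if has_reasoning and has_final else 0.0


def hard_format_reward(completion: str) -> float:
    """Reward strict <argument>/<objection> alternation inside <reasoning>."""
    body = re.search(r"<reasoning>(.*?)</reasoning>", completion, re.DOTALL)
    if body is None:
        return 0.0
    # Opening and closing tags both yield their name here, so one well-formed
    # argument-objection pair produces the 4-tag cycle below.
    tags = re.findall(r"</?(argument|objection)>", body.group(1))
    cycle = ["argument", "argument", "objection", "objection"]
    if not tags or len(tags) % 4 != 0:
        return 0.0
    return 1.0 if all(t == cycle[i % 4] for i, t in enumerate(tags)) else 0.0


def num_objections_reward(completion: str) -> float:
    """Logarithmically scaled reward in the number of objection blocks."""
    pairs = len(re.findall(r"<objection>.*?</objection>", completion, re.DOTALL))
    return math.log1p(pairs)
```

In a GRPO setup these would be passed to the trainer as a list of reward functions; the log scale gives diminishing returns, so each additional argument-objection pair is worth less than the previous one.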
## Dataset
- The training script loads a **custom JSON file** (`questions.json`) with a simple question-prompt structure.
- Each sample is mapped to a system prompt enforcing the `<reasoning>` / `<final_argument>` format and a user prompt containing the question.
- The model is trained to produce structured arguments and objections for these questions.

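The loading-and-mapping step described above might look like this minimal sketch. The JSON key `question`, the system-prompt wording, and the chat-message layout are guesses at the script's structure, not its actual code:

```python
import json

# Assumed system prompt enforcing the output structure; the real wording
# lives in the training script and may differ.
SYSTEM_PROMPT = (
    "Respond in the following format:\n"
    "<reasoning>\n...\n</reasoning>\n"
    "<final_argument>\n...\n</final_argument>"
)


def load_prompts(path: str) -> list:
    """Map each entry of questions.json to a chat-style training sample.

    Assumes each entry is an object with a "question" key.
    """
    with open(path) as f:
        questions = json.load(f)
    return [
        {
            "prompt": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": q["question"]},
            ]
        }
        for q in questions
    ]
```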
## Performance

Sample output for an example question, showing the enforced format:

```
<reasoning>
<argument>
[State your argument, supported by evidence.]
Emotional bonds formed with AI companions are a relatively new phenomenon, but they can provide significant comfort and support to individuals who may feel lonely or isolated. The recognition of these bonds could potentially help in addressing mental health issues related to social isolation and loneliness. For example, in cases where a person relies heavily on an AI companion for emotional support and suddenly loses it, legal recognition could ensure that they receive appropriate support and compensation, similar to how bereavement is handled.
</argument>
<objection>
However, there are valid concerns about the reliability and permanence of emotional connections with AI. Unlike human relationships, which have a foundation of shared experiences, memories, and understanding, AI companions primarily offer emotional support based on programmed responses and algorithms. This raises questions about the depth and authenticity of such bonds.
</objection>
<argument>
Furthermore, legal recognition could lead to unforeseen complications. For instance, if an individual's AI companion is no longer available due to technological issues or updates, would they still be entitled to legal protection or compensation? Additionally, how would we define "loss" in this context, especially when dealing with digital entities?
</argument>
<objection>
Another concern is the potential misuse of such legal recognition. If emotional bonds with AI are legally recognized, there might be incentives for creating more sophisticated AI companions, leading to a market driven by profit rather than genuine care and empathy.
</objection>
</reasoning>
<final_argument>
While the idea of legal recognition for emotional bonds with AI companions has its merits in terms of supporting mental health, it also presents significant challenges regarding the nature and permanence of these bonds, as well as potential legal and ethical issues. More research and discussion are necessary to address these concerns before any formal legal recognition is implemented.
</final_argument>
```

# Uploaded model

- **Developed by:** ergotts