aphoticshaman committed · Commit d6fbb14 (verified) · Parent(s): 2312047

Add comprehensive model card

Files changed (1): README.md ADDED (+85, -0)
---
license: apache-2.0
base_model: Qwen/Qwen2.5-72B-Instruct
tags:
- math
- reasoning
- qwen2
- merged
- aimo3
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: elle-72b-ultimate
  results: []
---

# Elle-72B-Ultimate

## Model Description

Elle-72B-Ultimate is a fine-tuned version of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) for mathematical reasoning and problem solving, built specifically for the AI Mathematical Olympiad Progress Prize 3 (AIMO3) competition.

This is a **merged full model**: the LoRA adapter has already been folded into the base weights, so no separate adapter files are needed at load time.

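For reference, a merge like this is typically produced with `peft`'s `merge_and_unload()`. The sketch below only illustrates that workflow; the adapter path and output directory are hypothetical placeholders, not artifacts published with this card:

```python
# Minimal sketch of folding a LoRA adapter into base weights with peft.
# NOTE: the adapter path and output directory are hypothetical placeholders.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-72B-Instruct", torch_dtype="auto", device_map="cpu"
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")  # hypothetical path
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("elle-72b-merged", safe_serialization=True)  # safetensors shards
```
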
## Model Details

- **Base Model**: Qwen/Qwen2.5-72B-Instruct
- **Parameters**: 72B
- **Precision**: BF16
- **Format**: Safetensors (31 shards)
- **Training Method**: LoRA (r=64, α=128); an illustrative adapter config is sketched below

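The card states only the rank and alpha; everything else in this config (target modules, dropout) is an assumption shown for illustration:

```python
# Illustrative peft LoraConfig matching the stated r=64, alpha=128.
# target_modules and lora_dropout are assumptions, not taken from this card.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # stated rank
    lora_alpha=128,  # stated alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    lora_dropout=0.05,  # assumed
    task_type="CAUSAL_LM",
)
```
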
## Training Data

Fine-tuned on mathematical reasoning datasets, including:

- NuminaMath-CoT
- Custom mathematical reasoning examples

## Intended Use

- Mathematical problem solving
- Olympiad-style competition problems
- Code generation for computational solutions
- Chain-of-thought reasoning

## Limitations

- **Size**: ~144 GB in BF16; loading the full model requires multi-GPU-scale VRAM
- **Quantization recommended**: for inference on consumer hardware, use an AWQ or GPTQ quantized version (loading sketch below)

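No quantized export is listed on this card, so the repository name below is hypothetical; it only illustrates how an AWQ checkpoint would be loaded (`transformers` loads AWQ weights directly when `autoawq` is installed):

```python
# Hypothetical: assumes an AWQ-quantized export of this model exists.
# Requires: pip install autoawq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aphoticshaman/elle-72b-ultimate-awq"  # hypothetical repo name
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```
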
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# torch_dtype="auto" picks up the checkpoint's BF16 precision;
# device_map="auto" shards the 72B weights across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "aphoticshaman/elle-72b-ultimate",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("aphoticshaman/elle-72b-ultimate")

messages = [
    {"role": "system", "content": "You are an expert mathematical problem solver."},
    {"role": "user", "content": "Find all positive integers n such that n^2 + 1 divides n^3 + 1."},
]

# Render the conversation with the model's chat template, then generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
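
The decode above echoes the prompt along with the answer; to print only the newly generated reply, slice off the prompt tokens first:

```python
# Keep only the tokens generated after the prompt.
reply = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(reply, skip_special_tokens=True))
```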

## Citation

```bibtex
@misc{elle-72b-ultimate,
  author    = {aphoticshaman},
  title     = {Elle-72B-Ultimate: Mathematical Reasoning Model},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/aphoticshaman/elle-72b-ultimate}
}
```