Add comprehensive model card

README.md +85 -0

	@@ -0,0 +1,85 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-72B-Instruct
+tags:
+  - math
+  - reasoning
+  - qwen2
+  - merged
+  - aimo3
+library_name: transformers
+pipeline_tag: text-generation
+model-index:
+  - name: elle-72b-ultimate
+    results: []
+---
+# Elle-72B-Ultimate
+## Model Description
+Elle-72B-Ultimate is a fine-tune of [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) for mathematical reasoning and problem solving, built for the AI Mathematical Olympiad Progress Prize 3 (AIMO3) competition.
+This is a **merged full model** (LoRA adapter merged into base weights).
+## Model Details
+- **Base Model**: Qwen/Qwen2.5-72B-Instruct
+- **Parameters**: 72B
+- **Precision**: BF16
+- **Format**: Safetensors (31 shards)
+- **Training Method**: LoRA (r=64, α=128)
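+
+Since this is a merged model, a brief reminder of what merging means may help. The block below is a sketch that assumes the standard LoRA scaling convention ($\alpha / r$, not the rank-stabilized variant); the symbols $W_0$, $A$, $B$, $d$, and $k$ are illustrative notation, not taken from the training code.
+
+```latex
+W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
+\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}
+
+\frac{\alpha}{r} = \frac{128}{64} = 2
+```
+
+Under that assumption, each adapted weight matrix absorbs the low-rank update scaled by 2, so no separate adapter needs to be loaded at inference time.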
+## Training Data
+Fine-tuned on mathematical reasoning datasets including:
+- NuminaMath-CoT
+- Custom mathematical reasoning examples
+## Intended Use
+- Mathematical problem solving
+- Olympiad-style competition problems
+- Code generation for computational solutions
+- Chain-of-thought reasoning
+## Limitations
+- **Size**: ~144 GB of weights in BF16, which exceeds a single 80 GB GPU and requires multi-GPU inference or CPU offloading
+- **Quantization Recommended**: For inference on consumer hardware, use an AWQ- or GPTQ-quantized version
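+
+The size figure follows from simple arithmetic. As a rough sketch, counting weights only, taking the nominal 72B parameter count, and assuming decimal gigabytes (the 4-bit line ignores quantization scales and zero-points, so real AWQ or GPTQ checkpoints are somewhat larger):
+
+```latex
+72 \times 10^{9}\ \text{params} \times 2\ \tfrac{\text{bytes}}{\text{param}} = 144 \times 10^{9}\ \text{bytes} \approx 144\ \text{GB} \quad (\text{BF16})
+
+72 \times 10^{9}\ \text{params} \times 0.5\ \tfrac{\text{bytes}}{\text{param}} = 36 \times 10^{9}\ \text{bytes} \approx 36\ \text{GB} \quad (\text{4-bit})
+```
+
+Activations and the KV cache add to these totals at inference time.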
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
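+model_id = "aphoticshaman/elle-72b-ultimate"
+# The Qwen2 architecture is supported natively by transformers,
+# so trust_remote_code is not needed.
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype="auto",  # use the dtype stored in the checkpoint (BF16)
+    device_map="auto",   # place layers across available devices (needs `accelerate`)
+)
+tokenizer = AutoTokenizer.from_pretrained(model_id)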
+messages = [
+    {"role": "system", "content": "You are an expert mathematical problem solver."},
+    {"role": "user", "content": "Find all positive integers n such that n^2 + 1 divides n^3 + 1."}
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=2048)
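+# Decode only the newly generated tokens, not the echoed prompt.
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))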
+```
+## Citation
+```bibtex
+@misc{elle-72b-ultimate,
+  author = {aphoticshaman},
+  title = {Elle-72B-Ultimate: Mathematical Reasoning Model},
+  year = {2024},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/aphoticshaman/elle-72b-ultimate}
+}
+```