---
base_model: meta-llama/Meta-Llama-3.1-8B
library_name: peft
license: llama3.1
datasets:
- openai/gsm8k
language:
- en
metrics:
- accuracy
- perplexity
pipeline_tag: text-generation
tags:
- llama.cpp
- unsloth
- transformers
- math
- custom-instruction
- LoRA
---

# 🧮 EfficientMath-AI (Llama 3.1 8B)
|
|
## 📌 Project Overview
EfficientMath-AI is a parameter-efficient fine-tuned (PEFT) version of Meta's **Llama-3.1-8B**, specifically optimized to solve multi-step, grade-school math word problems. It was trained using LoRA (Low-Rank Adaptation) and compressed into a 4-bit quantized GGUF format, allowing it to perform high-level mathematical reasoning efficiently on standard CPU hardware.
|
|
| **Creator:** Abhay Aditya |
| **Live Interactive Demo:** [EfficientMath-AI Web App](https://huggingface.co/spaces/iamabhayaditya/EfficientMath-AI) |
|
|
## 🧠 Model Details
| * **Base Model:** `meta-llama/Meta-Llama-3.1-8B` |
| * **Fine-Tuning Method:** LoRA (Rank = 16, Alpha = 16) via Unsloth |
| * **Dataset:** GSM8K (Grade School Math 8K) |
| * **Quantization:** `Q4_K_M` (4-bit GGUF) |
| * **Parameters:** 8 Billion |
| * **Deployment Context:** Designed for high-speed, CPU-only inference via `llama.cpp`. |
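The LoRA setup above (rank 16, alpha 16) freezes the base weights and learns only a low-rank update. A minimal pure-Python sketch of the idea, using toy rank-1 matrices rather than the model's actual dimensions (all values here are illustrative, not taken from the trained adapter):

```python
# Toy illustration of a LoRA update: W_eff = W + (alpha / r) * (B @ A).
# Shapes and values are made up for demonstration only.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_effective_weight(W, A, B, r, alpha):
    """Combine a frozen weight W (d x k) with low-rank adapters
    B (d x r) and A (r x k), scaled by alpha / r."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 frozen weight with rank-1 adapters (r = 1) for brevity
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d x r
A = [[0.5, 0.5]]     # r x k
print(lora_effective_weight(W, A, B, r=1, alpha=1))
# -> [[1.5, 0.5], [1.0, 2.0]]
```

Only `A` and `B` are trained, which is why an 8B-parameter model can be fine-tuned on a single free-tier GPU.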
|
|
## 📊 Evaluation & Performance
The model was evaluated on the GSM8K test split, with scoring based on strict extraction of the final numeric answer and on the coherence of the step-by-step reasoning.
| * **Overall Accuracy:** 66% |
| * **Training Hardware:** Single NVIDIA T4 GPU (Free Tier) |
| * **Inference Hardware Requirement:** ~8GB RAM (Basic CPU) |
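The exact scoring code is not published with this card, but "strict numeric extraction" for GSM8K-style answers can be sketched as follows (the function names and regex are assumptions, not the actual evaluation harness):

```python
import re

def extract_final_number(text):
    """Return the last number in a generated solution as a float.
    GSM8K answers are numeric, so the final number in the text is
    taken as the model's answer (thousands commas stripped)."""
    matches = re.findall(r"-?\d[\d,]*\.?\d*", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))

def is_correct(generated, gold):
    """Strict numeric match against the gold answer."""
    pred = extract_final_number(generated)
    return pred is not None and abs(pred - float(gold)) < 1e-6

print(is_correct("18 apples cost 90, so 1 apple costs 5. 24 * 5 = 120", "120"))
```

Under this kind of check, a solution only counts as correct when its final number matches the reference answer exactly, which is stricter than fuzzy string matching.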
|
|
|  |
|
|
### Diagnostic Insights
| 1. **Perplexity:** The model exhibits a tightly clustered, low perplexity distribution (between 2.5 and 4.0), demonstrating high confidence and fluency in generating mathematical syntax. |
2. **Complexity Ceiling:** The model reaches roughly 80% accuracy on short word problems, producing a concise, accurate chain of thought without hallucinating verbose responses. Like many 8B-class models, however, its accuracy declines as prompt length grows, particularly on complex, multi-paragraph logic puzzles.
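Perplexity, as reported above, is the exponential of the mean per-token negative log-likelihood. A minimal computation from per-token log-probabilities (the values below are made up to land inside the 2.5–4.0 band mentioned above):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(-mean(log p(token))) over a sequence."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Illustrative log-probs for a short, confidently predicted sequence
logps = [-0.9, -1.1, -1.0, -1.2]
print(round(perplexity(logps), 2))  # -> 2.86
```

Lower values mean the model assigns higher probability to the tokens it emits, which is why a tight 2.5–4.0 distribution indicates fluent, confident generation of mathematical syntax.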
|
|
## 💻 Usage Example (Python)
| If you wish to run this model locally, you can use `llama-cpp-python`: |
|
|
```python
from llama_cpp import Llama

# Load the 4-bit quantized GGUF model for CPU-only inference
llm = Llama(
    model_path="Meta-Llama-3.1-8B.Q4_K_M.gguf",
    n_ctx=2048,    # context window size
    n_threads=4    # CPU threads to use
)

output = llm(
    "Below is a math word problem. Solve it step by step and provide the final answer.\n\n### Problem:\nIf the cost of 18 apples is 90 rupees, what is the cost of 24 apples?\n\n### Solution:\n",
    max_tokens=256,
    temperature=0.2,       # low temperature for deterministic math reasoning
    stop=["<|eot_id|>"]    # stop at Llama 3.1's end-of-turn token
)
print(output["choices"][0]["text"])
```