Update README.md
---
library_name: transformers
tags:
- mamba
- deepseek
- reasoning
base_model:
- tiiuae/Falcon3-Mamba-7B-Instruct
pipeline_tag: text-generation
---

# Model Card: Falcon3-Mamba-R1-v0

## Model Details

**Model Description:**

This model is a fine-tuned version of Falcon3-Mamba-7B-Instruct, optimized to work through logical reasoning and structured problem-solving before generating a response.

It leverages the Mamba architecture, whose compute scales linearly with the number of tokens rather than quadratically, making it an efficient and fast reasoning model while maintaining high response quality.

This fine-tuned version comes from an early checkpoint of the fine-tuning pipeline.

* **Developed by:** Hanzla Javaid
* **Base Model:** tiiuae/Falcon3-Mamba-7B-Instruct
* **Model Type:** Mamba-based causal decoder
* **Model Release Date:** March 2025

## Intended Uses

**Direct Use:**

This model is designed for:

* Reasoning-heavy tasks (math, logic, and structured problem-solving)
* STEM-based question answering
* General-purpose text generation

**Downstream Use:**

* Fine-tuning for domain-specific applications such as finance, law, medicine, and research
* Integration into chatbots and virtual assistants that require advanced reasoning skills
* Enhancement of automated coding assistants with structured logic building

**Out-of-Scope Use:**

* Misinformation or deceptive applications
* Automated decision-making in high-risk fields (e.g., medical diagnosis without human oversight)
* Bias-sensitive applications where fairness is critical but not explicitly controlled

## Bias and Limitations

**Known Biases:**

* The model prioritizes English-language data, so performance on multilingual tasks may be weaker.
* Fine-tuning may introduce or amplify biases present in the training data, especially in areas like ethics, politics, and cultural perspectives.

**Technical Limitations:**

* Performance may degrade on long-form generation beyond 64K tokens.

**Recommendations:**

* Users should verify outputs for accuracy, especially in critical applications.
* Regular bias evaluation should be conducted when deploying in production environments.

## Getting Started

To use this model, you can load it with transformers:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo_name = "hanzla/Falcon3-Mamba-R1-v0"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

def generate_text(prompt, generation_model, generation_tokenizer, max_tokens=1024):
    # Format the prompt with the model's chat template.
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt},
    ]
    input_text = generation_tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Tokenize and move the inputs onto the same device as the model.
    input_ids = generation_tokenizer(input_text, return_tensors="pt").input_ids.to(generation_model.device)
    outputs = generation_model.generate(input_ids, max_new_tokens=max_tokens)
    # Decode only the newly generated tokens, not the prompt.
    generated_tokens = outputs[0][len(input_ids[0]):]
    return generation_tokenizer.decode(generated_tokens, skip_special_tokens=True)
```
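
For example, with the model and tokenizer loaded as above, a single prompt can be run like this (the question is purely illustrative):

```python
# Illustrative call to the helper defined above; the prompt is a placeholder.
response = generate_text(
    "If 3x + 5 = 20, what is x? Show your reasoning.",
    model,
    tokenizer,
    max_tokens=512,
)
print(response)
```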

## Training Details

**Training Procedure:**

* **Pretrained Base Model:** Falcon3-Mamba-7B-Instruct
* **Fine-tuning Data:** A subset of STEM problems from open-thoughts/OpenThoughts-114k
* **Training Strategy:** GRPO (Group Relative Policy Optimization); a sketch of such a run follows this list
* **Training Hyperparameters:**
  * **Batch Size:** 4
  * **Epochs:** 3
  * **Precision:** Mixed (fp16 / bf16)
  * **Hardware:** 2x H100 GPUs
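
The training script itself is not part of this card. As a rough orientation only, a minimal GRPO run with the settings above could look like the sketch below, assuming the `trl` library's `GRPOTrainer`, a preprocessed dataset exposing a `prompt` column, and a purely illustrative rule-based reward; none of these details are confirmed by the card.

```python
# Hedged sketch of a GRPO fine-tuning run; the dataset preprocessing and the
# reward function are illustrative assumptions, not the authors' actual setup.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder data: the card's STEM subset of open-thoughts/OpenThoughts-114k
# would need to be converted into a dataset with a "prompt" column.
train_dataset = Dataset.from_dict(
    {"prompt": ["If 3x + 5 = 20, what is x?", "What is the derivative of x^2?"]}
)

def format_reward(completions, **kwargs):
    # Toy reward: favor completions that wrap their reasoning in <think> tags.
    return [1.0 if "<think>" in c and "</think>" in c else 0.0 for c in completions]

args = GRPOConfig(
    output_dir="falcon3-mamba-grpo",
    per_device_train_batch_size=4,   # matches the batch size stated above
    num_train_epochs=3,              # matches the epochs stated above
    bf16=True,                       # mixed precision, as stated above
    num_generations=4,               # completions sampled per prompt (illustrative)
)

trainer = GRPOTrainer(
    model="tiiuae/Falcon3-Mamba-7B-Instruct",
    reward_funcs=format_reward,
    args=args,
    train_dataset=train_dataset,
)
trainer.train()
```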

## Evaluation

**Testing Data and Metrics:**

The fine-tuned model's performance was evaluated on a variety of benchmarks to assess its reasoning abilities and overall capabilities. The table below presents a comparison between the fine-tuned model and the base model:

| Category  | Benchmark               | Falcon3-Mamba-R1-v0 | Base Falcon3-Mamba-7B-Instruct |
|-----------|-------------------------|---------------------|--------------------------------|
| General   | MMLU (5-shot)           | 72.1                | 65.3                           |
| Math      | GSM8K (5-shot)          | 89.5                | 65.2                           |
| Reasoning | ARC Challenge (25-shot) | 75.8                | 53.7                           |
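
The card does not state which evaluation harness produced these scores. As one possible way to run comparable few-shot evaluations, a sketch using EleutherAI's lm-evaluation-harness (an assumption about tooling, not the authors' documented setup) might look like this:

```python
# Hedged sketch: running similar few-shot evaluations with lm-evaluation-harness.
# Task names and few-shot counts mirror the table above; ARC Challenge would need
# a separate run with num_fewshot=25.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=hanzla/Falcon3-Mamba-R1-v0,dtype=float16",
    tasks=["mmlu", "gsm8k"],
    num_fewshot=5,
)
print(results["results"])
```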

## Technical Specifications

**Model Architecture:**

* **Mamba Blocks:** 64
* **Hidden Size:** 4096
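
These numbers can be sanity-checked from the model configuration; a small sketch, assuming the config exposes the usual transformers attributes `num_hidden_layers` and `hidden_size`:

```python
from transformers import AutoConfig

# Load only the configuration; attribute names are the standard transformers
# fields and should be double-checked against the FalconMamba config class.
config = AutoConfig.from_pretrained("hanzla/Falcon3-Mamba-R1-v0")
print(config.num_hidden_layers)  # expected: 64 Mamba blocks
print(config.hidden_size)        # expected: 4096
```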

**Software Requirements:**

* `transformers >= 4.38`
* `torch >= 2.1`
* `accelerate >= 0.25`
* `mamba-ssm`
* `causal-conv1d >= 1.4.0`
|