Model Card for Fikra Nano 1B
Fikra AI is a lightweight, high-efficiency reasoning model based on the Falcon-E-1B architecture. It has been fine-tuned using LoRA (Low-Rank Adaptation) on a "mixed diet" strategy, combining general instruction following (Dolly 15k) with intensive mathematical reasoning (GSM8K).
It is designed to be a "Generalist-Reasoner" capable of running on edge devices while maintaining logic capabilities often found in larger models.
Model Details
Model Description
This is the model card of 🤗 Fikra 1B Nano v0.2
- Developed by: Lacesse Ventures (Kenya)
- Shared by: James Miano
- Model type: Causal Language Model (LoRA Adapter)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: tiiuae/Falcon-E-1B-Base
Model Sources
- Repository: https://huggingface.co/lacesseapp/Fikra-1B-Nano-v0.2
- Paper: N/A (base model card: https://huggingface.co/tiiuae/Falcon-E-1B-Base)
- Demo: To be added later
Uses
Direct Use
This model is intended for:
Edge-Device Inference: Due to its 1B size, it is highly suitable for local deployment on consumer hardware or mobile-class inference.
Mathematical Reasoning: Specialized in solving GSM8K-style math word problems.
Instruction Following: General chatbot capabilities derived from the Dolly dataset.
Out-of-Scope Use
Production-Critical Systems: As a 1B-parameter model, it can hallucinate. It should not be used for medical or legal advice.
Long Context: The model was trained with a sequence length of 512. Inputs longer than this may be truncated or result in degraded performance.
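Because the fine-tune used a 512-token window, a simple guard before inference can catch oversized prompts. The sketch below is illustrative only; the 50-token reserve mirrors the max_new_tokens value used in the getting-started snippet, so adjust it to your generation settings:

```python
MAX_SEQ_LEN = 512  # training sequence length from this card


def fits_context(num_prompt_tokens: int, reserve: int = 50) -> bool:
    """Return True if the prompt plus `reserve` generated tokens fits the window."""
    return num_prompt_tokens + reserve <= MAX_SEQ_LEN


# A 400-token prompt leaves room for 50 new tokens; a 500-token prompt does not.
print(fits_context(400))  # True
print(fits_context(500))  # False
```

Count prompt tokens with the base tokenizer (e.g. `len(tokenizer(prompt).input_ids)`) before calling generate.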
Bias, Risks, and Limitations
The "1B Ceiling": A 1-Billion parameter model is tiny. It will occasionally make grammar errors or lose the thread in long conversations. This is physics; we cannot fix it, only mitigate it.
Hallucination: If you ask it about something it doesn't know (e.g., "Who won the 1932 Kenyan election?"), it will likely make up a name rather than say "I don't know."
Bias: It is trained on Western data (Dolly/GSM8K). While we fixed the capital of Kenya, it may still default to US/Euro-centric examples in business or culture unless specifically fine-tuned on local data later.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Further recommendations will be added as evaluation continues.
How to Get Started with the Model
You can load this model using peft and transformers. The following code snippet is ready to run in Google Colab or a local Python environment.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
# 1. Set the model ID (the adapter repository on the Hugging Face Hub)
peft_model_id = "lacesseapp/Fikra-1B-Nano-v0.2"
# 2. Load Configuration & Base Model
# We load the config first to find the base model (Falcon-E-1B) automatically
config = PeftConfig.from_pretrained(peft_model_id)
print(f"Loading base model: {config.base_model_name_or_path}")
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    load_in_8bit=False,  # Set to True if using bitsandbytes to reduce memory
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# 3. Load the Fikra AI Adapter (The Fine-tune)
model = PeftModel.from_pretrained(model, peft_model_id)
# 4. Run Inference
prompt = "User: If I have 3 apples and eat one, how many do I have?\nAnswer:"
# Tokenize input
batch = tokenizer(prompt, return_tensors='pt').to(model.device)
# Generate response (inference_mode disables gradient tracking and works on CPU and GPU)
with torch.inference_mode():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,
    )
# Decode output
print("\nModel Response:")
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
Training Details
Training Data
The model was trained on a shuffled concatenation of two primary datasets:
- Databricks Dolly 15k (subset): 10,000 samples, used for conversational formatting and general instruction following.
- GSM8K (main): 5,000 samples, used to inject mathematical logic and chain-of-thought reasoning patterns.
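The 2:1 "mixed diet" described above amounts to concatenating the two subsets and shuffling with a fixed seed. The sketch below uses stand-in records rather than the real datasets, purely to illustrate the mixing ratio:

```python
import random

# Stand-ins for the real datasets (hypothetical records, not the actual data)
dolly_subset = [{"source": "dolly", "id": i} for i in range(10_000)]
gsm8k_subset = [{"source": "gsm8k", "id": i} for i in range(5_000)]

# "Mixed diet": concatenate, then shuffle with a fixed seed for reproducibility
mixed = dolly_subset + gsm8k_subset
random.Random(42).shuffle(mixed)

print(len(mixed))                                  # 15000 total samples
print(sum(r["source"] == "gsm8k" for r in mixed))  # 5000 math samples
```

In practice the same pattern applies to Hugging Face dataset objects (concatenate, then shuffle with a seed).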
Training Procedure
The model was trained using Safe Burn Mode to maximize stability on limited VRAM.
Training Hyperparameters
- Training regime: Mixed Precision (FP16)
- Optimizer: paged_adamw_32bit
- Learning Rate: 2e-4
- Batch Size: 1 (per device)
- Gradient Accumulation: 16 steps
- Effective Batch Size: 16
- Epochs: 2
- LoRA Config:
  - Rank (r): 32
  - Alpha: 64
  - Dropout: 0.05
  - Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
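Under these hyperparameters, the adapter configuration can be reconstructed roughly as follows. This is a sketch against the peft LoraConfig API; bias and task_type are assumed defaults not stated in this card, so verify names against your installed peft version:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,            # rank
    lora_alpha=64,   # scaling factor (alpha = 2 * r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",             # assumed default, not stated in the card
    task_type="CAUSAL_LM",   # assumed for a causal LM fine-tune
)
```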
Training Results
The training logs show:
- Initial Loss: 8.47
- Final Loss: ~1.74
- Convergence: the model reached its "sweet spot" at step 1450 (loss 1.69); training beyond this point yielded diminishing returns.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 2x NVIDIA T4 GPU
- Hours used: ~5 Hours
- Cloud Provider: Kaggle
- Compute Region: US
- Carbon Emitted: ~0.15 kg CO2eq
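As a back-of-the-envelope sanity check on the reported figure: two T4s at their 70 W TDP for 5 hours draw about 0.7 kWh, and an assumed grid intensity of roughly 0.21 kg CO2eq/kWh (chosen here for illustration; real intensities vary by region, and the calculator accounts for more factors) lands in the same range:

```python
GPU_TDP_KW = 0.070     # NVIDIA T4 TDP: 70 W
NUM_GPUS = 2
HOURS = 5
GRID_INTENSITY = 0.21  # kg CO2eq per kWh (assumed, region-dependent)

energy_kwh = GPU_TDP_KW * NUM_GPUS * HOURS  # ~0.7 kWh
co2_kg = energy_kwh * GRID_INTENSITY        # ~0.15 kg CO2eq
print(f"{co2_kg:.2f} kg CO2eq")
```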
Model Card Authors
James Miano (https://linkedin.com/in/jamesmiano)
Model Card Contact
Email: support@lacesse.co.ke