Model Card for Fikra Nano 1B
Fikra AI is a lightweight, high-efficiency reasoning model based on the Falcon-E-1B architecture. It has been fine-tuned using LoRA (Low-Rank Adaptation) on a "mixed diet" strategy, combining general instruction following (Dolly 15k) with intensive mathematical reasoning (GSM8K).
It is designed to be a "Generalist-Reasoner" capable of running on edge devices while maintaining logic capabilities often found in larger models.
Model Details
Model Description
This is the model card of 🤗 Fikra 1B Nano v0.2
- Developed by: Lacesse Ventures (Kenya)
- Shared by: James Miano
- Model type: Causal Language Model (LoRA Adapter)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: tiiuae/Falcon-E-1B-Base
Model Sources
- Repository: https://huggingface.co/lacesseapp/Fikra-1B-Nano-v0.2
- Paper: N/A (base model card: https://huggingface.co/tiiuae/Falcon-E-1B-Base)
- Demo: To be added later
Uses
Direct Use
This model is intended for:
Edge-Device Inference: Due to its 1B size, it is highly suitable for local deployment on consumer hardware or mobile-class inference.
Mathematical Reasoning: Specialized in solving GSM8K-style math word problems.
Instruction Following: General chatbot capabilities derived from the Dolly dataset.
Out-of-Scope Use
Production-Critical Systems: As a 1B-parameter model, it can hallucinate. It should not be used for medical or legal advice.
Long Context: The model was trained with a sequence length of 512. Inputs longer than this may be truncated or result in degraded performance.
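Because the fine-tune used a 512-token window, a simple guard before inference can catch oversized prompts. The sketch below is illustrative only; the 50-token reserve mirrors the max_new_tokens value used in the getting-started snippet, so adjust it to your generation settings:

```python
MAX_SEQ_LEN = 512  # training sequence length from this card


def fits_context(num_prompt_tokens: int, reserve: int = 50) -> bool:
    """Return True if the prompt plus `reserve` generated tokens fits the window."""
    return num_prompt_tokens + reserve <= MAX_SEQ_LEN


# A 400-token prompt leaves room for 50 new tokens; a 500-token prompt does not.
print(fits_context(400))  # True
print(fits_context(500))  # False
```

Count prompt tokens with the base tokenizer (e.g. `len(tokenizer(prompt).input_ids)`) before calling generate.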
Bias, Risks, and Limitations
The "1B Ceiling": A 1-Billion parameter model is tiny. It will occasionally make grammar errors or lose the thread in long conversations. This is physics; we cannot fix it, only mitigate it.
Hallucination: If you ask it about something it doesn't know (e.g., "Who won the 1932 Kenyan election?"), it will likely make up a name rather than say "I don't know."
Bias: It is trained on Western data (Dolly/GSM8K). While we fixed the capital of Kenya, it may still default to US/Euro-centric examples in business or culture unless specifically fine-tuned on local data later.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Further recommendations will be added as evaluation continues.
How to Get Started with the Model
You can load this model using peft and transformers. The following code snippet is ready to run in Google Colab or a local Python environment.
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
# 1. Set the model ID (the adapter repository on the Hugging Face Hub)
peft_model_id = "lacesseapp/Fikra-1B-Nano-v0.2"
# 2. Load Configuration & Base Model
# We load the config first to find the base model (Falcon-E-1B) automatically
config = PeftConfig.from_pretrained(peft_model_id)
print(f"Loading base model: {config.base_model_name_or_path}")
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    load_in_8bit=False,  # Set to True if using bitsandbytes to reduce memory
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# 3. Load the Fikra AI Adapter (The Fine-tune)
model = PeftModel.from_pretrained(model, peft_model_id)
# 4. Run Inference
prompt = "User: If I have 3 apples and eat one, how many do I have?\nAnswer:"
# Tokenize input
batch = tokenizer(prompt, return_tensors='pt').to(model.device)
# Generate response (inference_mode disables gradient tracking and works on CPU and GPU)
with torch.inference_mode():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=50,
        pad_token_id=tokenizer.eos_token_id,
    )
# Decode output
print("\nModel Response:")
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
Training Details
Training Data
The model was trained on a shuffled concatenation of two primary datasets:
- Databricks Dolly 15k (subset): 10,000 samples, used for conversational formatting and general instruction following.
- GSM8K (main): 5,000 samples, used to inject mathematical logic and chain-of-thought reasoning patterns.
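The 2:1 "mixed diet" described above amounts to concatenating the two subsets and shuffling with a fixed seed. The sketch below uses stand-in records rather than the real datasets, purely to illustrate the mixing ratio:

```python
import random

# Stand-ins for the real datasets (hypothetical records, not the actual data)
dolly_subset = [{"source": "dolly", "id": i} for i in range(10_000)]
gsm8k_subset = [{"source": "gsm8k", "id": i} for i in range(5_000)]

# "Mixed diet": concatenate, then shuffle with a fixed seed for reproducibility
mixed = dolly_subset + gsm8k_subset
random.Random(42).shuffle(mixed)

print(len(mixed))                                  # 15000 total samples
print(sum(r["source"] == "gsm8k" for r in mixed))  # 5000 math samples
```

In practice the same pattern applies to Hugging Face dataset objects (concatenate, then shuffle with a seed).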
Training Procedure
The model was trained using Safe Burn Mode to maximize stability on limited VRAM.
Training Hyperparameters
- Training regime: Mixed Precision (FP16)
- Optimizer: paged_adamw_32bit
- Learning Rate: 2e-4
- Batch Size: 1 (per device)
- Gradient Accumulation: 16 steps
- Effective Batch Size: 16
- Epochs: 2
- LoRA Config:
  - Rank (r): 32
  - Alpha: 64
  - Dropout: 0.05
  - Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
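Under these hyperparameters, the adapter configuration can be reconstructed roughly as follows. This is a sketch against the peft LoraConfig API; bias and task_type are assumed defaults not stated in this card, so verify names against your installed peft version:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,            # rank
    lora_alpha=64,   # scaling factor (alpha = 2 * r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",             # assumed default, not stated in the card
    task_type="CAUSAL_LM",   # assumed for a causal LM fine-tune
)
```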
Training Results
The training logs show:
- Initial Loss: 8.47
- Final Loss: ~1.74
- Convergence: the model reached its "sweet spot" at step 1450 (loss 1.69); training beyond this point yielded diminishing returns.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: 2x NVIDIA T4 GPU
- Hours used: ~5 Hours
- Cloud Provider: Kaggle
- Compute Region: US
- Carbon Emitted: ~0.15 kg CO2eq
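As a back-of-the-envelope sanity check on the reported figure: two T4s at their 70 W TDP for 5 hours draw about 0.7 kWh, and an assumed grid intensity of roughly 0.21 kg CO2eq/kWh (chosen here for illustration; real intensities vary by region, and the calculator accounts for more factors) lands in the same range:

```python
GPU_TDP_KW = 0.070     # NVIDIA T4 TDP: 70 W
NUM_GPUS = 2
HOURS = 5
GRID_INTENSITY = 0.21  # kg CO2eq per kWh (assumed, region-dependent)

energy_kwh = GPU_TDP_KW * NUM_GPUS * HOURS  # ~0.7 kWh
co2_kg = energy_kwh * GRID_INTENSITY        # ~0.15 kg CO2eq
print(f"{co2_kg:.2f} kg CO2eq")
```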
Model Card Authors
James Miano (https://linkedin.com/in/jamesmiano)
Model Card Contact
Email: support@lacesse.co.ke