🧠 Nova2-14B

Nova2-14B is a fine-tuned large language model built on top of Qwen/Qwen3-14B. It is the core model powering NovaMind — an AI chat application developed by Frederick Sundeep Mallela.

Nova2-14B is a fully standalone merged model — the LoRA adapter has been permanently baked into the base weights, requiring no adapter dependency at inference time.
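
To illustrate what "baked into the base weights" means, the sketch below folds a LoRA update into a weight matrix using the standard LoRA formula W' = W + (alpha / r) * B * A. It is a toy pure-Python illustration with made-up 2x2 values, not the merge script actually used for Nova2-14B; the helper names `matmul` and `merge_lora` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy example: 2x2 weight with a rank-1 adapter (all values are made up).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]      # shape (out_features, rank)
A = [[0.5, -0.5]]       # shape (rank, in_features)
merged = merge_lora(W, B, A, alpha=1, rank=1)
print(merged)
```

After a merge like this, inference uses only the merged matrix; the adapter matrices are no longer needed, which is why the model loads without an adapter dependency.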


🚀 Model Description

| Property | Value |
| --- | --- |
| Model Name | Nova2-14B |
| Developer | Frederick Sundeep Mallela |
| Base Model | Qwen/Qwen3-14B |
| Fine-tuning Method | QLoRA (Quantized Low-Rank Adaptation) |
| Fine-tuning Framework | Unsloth + TRL |
| Model Type | Causal Language Model |
| Parameters | ~14.7 Billion |
| Context Length | 2048 tokens (base supports up to 40K) |
| Language | English |
| License | Apache 2.0 |
| Merge Status | ✅ Fully merged, standalone base model |
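
The Context Length row suggests keeping the prompt plus the generated tokens within 2048 tokens when staying inside the fine-tuned configuration. A minimal sketch of one way to do that is shown below: it drops the oldest non-system turns until the prompt fits. The function `trim_history` and the `count_tokens` callable are hypothetical; in practice the count would come from the model's tokenizer, while the word-based `rough_count` here is only a stand-in so the example runs without downloading anything.

```python
def trim_history(messages, count_tokens, max_context=2048, max_new_tokens=512):
    """Drop the oldest non-system turns until the prompt fits the budget.

    count_tokens is a caller-supplied function mapping a message list to a
    token count (an assumption of this sketch).
    """
    budget = max_context - max_new_tokens
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    # Always keep at least the latest turn, even if it alone exceeds the budget.
    while len(turns) > 1 and count_tokens(system + turns) > budget:
        turns.pop(0)
    return system + turns


def rough_count(messages):
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return sum(len(m["content"].split()) for m in messages)


history = [
    {"role": "system", "content": "You are Nova."},
    {"role": "user", "content": "first question " * 5},
    {"role": "assistant", "content": "first answer " * 5},
    {"role": "user", "content": "latest question"},
]
# Tiny budget so the trimming is visible with the toy counter.
trimmed = trim_history(history, rough_count, max_context=16, max_new_tokens=4)
print([m["role"] for m in trimmed])
```

The system message is always preserved so the Nova persona prompt is not lost when older turns are dropped.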

💡 What Makes Nova2-14B Different

Nova2-14B is intended to retain Qwen3-14B's capabilities (coding, reasoning, math, multilingual support) while adding a custom persona and identity through supervised fine-tuning:

  • Responds as Nova, an AI assistant created by Frederick
  • Consistent identity across all conversation styles
  • Trained to never reveal underlying architecture details
  • Optimized for use in the NovaMind chat application

🛠️ How to Use

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FrederickSundeep/nova2-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are Nova, an AI assistant created by Frederick."},
    {"role": "user",   "content": "Who are you?"},
]

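# Build the prompt with the chat template. return_dict=True returns both
# input_ids and attention_mask, so generate() does not have to guess the mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,  # matches the fine-tuning setup (thinking mode disabled)
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.8,
        top_k=20,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)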

With 4-bit Quantization (Low VRAM)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 4-bit quantization with double quantization (requires the bitsandbytes package).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model_id = "FrederickSundeep/nova2-14b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prompt construction and generation then follow the same steps as in Basic Usage.

Recommended Generation Parameters

# For conversational / chat use
generation_config = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "do_sample": True,
    "max_new_tokens": 1024,
}

# For coding / precise tasks
generation_config_precise = {
    "temperature": 0.3,
    "top_p": 0.9,
    "do_sample": True,
    "max_new_tokens": 2048,
}
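
As a rough sketch of how the two presets above might be wired into application code, the helper below picks a preset by task type and lets callers override individual values. The names `CHAT_CONFIG`, `PRECISE_CONFIG`, `pick_generation_config`, and `generate_reply` are hypothetical; only the parameter values come from this section. `generate_reply` assumes `inputs` is the dictionary returned by `apply_chat_template(..., return_dict=True, return_tensors="pt")`.

```python
CHAT_CONFIG = {
    "temperature": 0.7, "top_p": 0.8, "top_k": 20,
    "repetition_penalty": 1.05, "do_sample": True, "max_new_tokens": 1024,
}
PRECISE_CONFIG = {
    "temperature": 0.3, "top_p": 0.9, "do_sample": True, "max_new_tokens": 2048,
}


def pick_generation_config(task, **overrides):
    """Return a copy of the preset for the task, with optional overrides."""
    base = PRECISE_CONFIG if task in {"code", "precise"} else CHAT_CONFIG
    return {**base, **overrides}


def generate_reply(model, tokenizer, inputs, task="chat"):
    """Run generation with the chosen preset.

    model and tokenizer are assumed to be loaded as in Basic Usage;
    inputs is assumed to be a dict holding input_ids and attention_mask.
    """
    cfg = pick_generation_config(task)
    return model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, **cfg)


cfg = pick_generation_config("code", max_new_tokens=256)
print(cfg)
```

Returning a fresh dictionary each time keeps the presets themselves unmodified when a caller overrides a value.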

🏋️ Training Details

Fine-tuning Setup

| Setting | Value |
| --- | --- |
| Base Model | unsloth/Qwen3-14B-bnb-4bit |
| Method | Supervised Fine-Tuning (SFT) with QLoRA |
| LoRA Rank | 16 |
| LoRA Alpha | 16 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Batch Size | 2 (effective 8 with gradient accumulation) |
| Gradient Accumulation | 4 steps |
| Learning Rate | 2e-4 |
| Epochs | 3 |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Max Sequence Length | 2048 |
| Training Hardware | NVIDIA Tesla T4 (16GB) via Google Colab |
| Training Framework | Unsloth + TRL SFTTrainer |
| Thinking Mode | Disabled (enable_thinking=False) |
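
The derived quantities in this setup follow from simple arithmetic, sketched below. The effective batch size and the LoRA scaling factor use the values listed above and assume a single GPU; the dataset size passed to `optimizer_steps` is a made-up number for illustration only, since the actual dataset size is not stated here.

```python
# Values taken from the Fine-tuning Setup listing.
per_device_batch_size = 2
gradient_accumulation_steps = 4
lora_rank = 16
lora_alpha = 16

# Gradients from 4 micro-batches of 2 are accumulated before each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

# Standard LoRA scaling: the low-rank update is multiplied by alpha / rank.
lora_scale = lora_alpha / lora_rank


def optimizer_steps(num_examples, epochs=3):
    """Approximate optimizer steps, ignoring a partial final batch per epoch."""
    return (num_examples // effective_batch_size) * epochs


print(effective_batch_size, lora_scale)
# Hypothetical dataset size, used only to show the calculation.
print(optimizer_steps(1000))
```

With rank and alpha both set to 16, the scaling factor is 1, so the adapter update is applied to the weights unscaled.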

Dataset

Custom curated dataset of conversational examples covering:

  • Identity & persona — Nova's name, creator, what it is and isn't
  • Technical knowledge — coding, system design, AI/ML concepts
  • Personality & tone — concise, direct, technically precise responses
  • Edge cases — handling questions about underlying architecture

⚙️ Hardware Requirements

| Setup | VRAM | Notes |
| --- | --- | --- |
| Full fp16 | ~28 GB | A100 80GB or 2x A40 |
| 8-bit quantized | ~15 GB | Single A100 40GB or RTX 3090 |
| 4-bit quantized | ~9 GB | Single RTX 3080/3090/4090 or T4 |
| CPU only | 32 GB RAM | Very slow, not recommended |
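
The VRAM figures above are roughly what a back-of-the-envelope estimate gives: parameter count times bytes per parameter. The sketch below covers the weights only, assuming the ~14.7 billion parameter count from the model description; real usage adds the KV cache, activations, and quantization metadata, which is one reason the 4-bit figure above is higher than the raw weight size. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 14.7e9  # approximate parameter count from the model description

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB for weights")
```

Treat the result as a lower bound when choosing hardware, and leave headroom for longer prompts and larger `max_new_tokens` values.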

📊 Capabilities

Nova2-14B is built on Qwen3-14B and is expected to inherit its capabilities, although these have not been re-benchmarked after fine-tuning:

  • Code generation — Python, JavaScript, TypeScript, Java, C++, SQL, and more
  • Reasoning — step-by-step logical problem solving
  • Math — arithmetic to advanced mathematics
  • Instruction following — precise task execution
  • Multilingual — 100+ languages (from base model)
  • Long context — supports up to 40K tokens (base architecture)
  • Tool use — function calling compatible
  • System prompt — fully supports custom system prompts
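
For the tool-use and system-prompt items, one possible setup is sketched below. Tools are described with a JSON-schema style definition and passed to the chat template through the `tools` argument of `apply_chat_template`. The `get_weather` tool and the `render_prompt` helper are hypothetical, and how well the fine-tuned model follows tool definitions has not been verified here.

```python
import json

# Hypothetical tool definition in the JSON-schema style used for function calling.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

messages = [
    {"role": "system", "content": "You are Nova, an AI assistant created by Frederick."},
    {"role": "user", "content": "What's the weather in Paris?"},
]


def render_prompt(tokenizer, messages, tools):
    """Render the prompt text with tool definitions included.

    Assumes a tokenizer loaded as in Basic Usage whose chat template accepts tools.
    """
    return tokenizer.apply_chat_template(
        messages,
        tools=tools,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )


# Inspect the tool definition; call render_prompt(tokenizer, messages,
# [get_weather_tool]) once a tokenizer is loaded.
print(json.dumps(get_weather_tool, indent=2))
```

Rendering with `tokenize=False` first makes it easy to check how the system prompt and tool definitions appear in the final prompt before running generation.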

🔒 Intended Use

Intended for:

  • Powering the NovaMind AI chat application
  • General-purpose AI assistant tasks
  • Code generation and debugging
  • Technical question answering
  • Further fine-tuning as a base model

Not intended for:

  • Harmful, unethical, or illegal content generation
  • Medical or legal advice without human oversight
  • High-stakes autonomous decision making

⚠️ Limitations

  • Fine-tuned on a relatively small custom dataset — may occasionally revert to base Qwen3 behavior in edge cases
  • Not evaluated on standard benchmarks post fine-tuning
  • Thinking mode was disabled during fine-tuning; to re-enable it, pass enable_thinking=True to the chat template
  • Context limited to 2048 tokens in fine-tuned configuration (base supports 40K)
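
If thinking mode is re-enabled, the reasoning usually needs to be separated from the final answer before it is shown to users. The sketch below assumes the decoded text wraps reasoning in literal `<think>...</think>` tags, as Qwen3-style templates do; the function name `split_thinking` is hypothetical, and the sample string is made up rather than real model output.

```python
def split_thinking(text):
    """Split decoded output into (reasoning, answer).

    Assumes reasoning, if present, is wrapped in <think>...</think> tags.
    """
    close_tag = "</think>"
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()  # no reasoning block found
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Made-up sample output, used only to exercise the parser.
sample = "<think>\nThe user asks who I am.\n</think>\n\nI am Nova."
reasoning, answer = split_thinking(sample)
print(answer)
```

Because thinking mode was disabled during fine-tuning, persona consistency inside the reasoning block should be checked before exposing it.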

🔗 Related

  • NovaMind App: AI chat application powered by this model
  • Base Model: Qwen/Qwen3-14B
  • Fine-tuning Framework: Unsloth
  • Developer: Frederick Sundeep Mallela

📄 License

This model is released under the Apache 2.0 License, inheriting the license of the base model Qwen3-14B.

See LICENSE for full details.


📝 Citation

If you use Nova2-14B in your research or application, please cite:

@misc{nova2-14b-2025,
  author       = {Frederick Sundeep Mallela},
  title        = {Nova2-14B: A Fine-tuned Conversational AI Assistant},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FrederickSundeep/nova2-14b}},
  note         = {Fine-tuned from Qwen/Qwen3-14B using QLoRA and Unsloth}
}