license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
  - text-generation
  - conversational
  - fine-tuned
  - qwen3
  - nova
  - novamind
  - lora
  - qlora
  - unsloth
language:
  - en
pipeline_tag: text-generation
library_name: transformers
model_type: qwen3
inference: true
datasets:
  - custom
metrics:
  - accuracy
widget:
  - text: Who are you?
    example_title: Identity
  - text: What is a REST API?
    example_title: Technical Question
  - text: Write a Python function to reverse a string
    example_title: Code Generation

🧠 Nova2-14B

Nova2-14B is a fine-tuned large language model built on top of Qwen/Qwen3-14B. It is the core model powering NovaMind, an AI chat application developed by Frederick Sundeep Mallela.

Nova2-14B is a fully standalone merged model: the LoRA adapter has been permanently baked into the base weights, so no adapter is required at inference time.
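
To make the merge concrete, the sketch below works through the standard LoRA merge arithmetic on a toy matrix, assuming the usual formulation W' = W + (alpha / r) · B·A. The shapes, values, and helper names are illustrative assumptions and are not taken from the Nova2-14B training code.

```python
# Toy illustration of folding a LoRA adapter into a base weight matrix.
# All numbers and helper names are made up for this example.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b, elementwise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Fold the low-rank update into the base weight: W' = W + (alpha / rank) * B @ A."""
    return add(w, matmul(lora_b, lora_a), scale=alpha / rank)

# Base weight: 3 outputs x 2 inputs. Adapter of rank 1: B is 3x1, A is 1x2.
W = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
A = [[0.5, -1.0]]
B = [[1.0], [2.0], [0.0]]
alpha, rank = 1, 1  # alpha == rank gives a scale of 1.0

x = [[3.0], [4.0]]  # one input column vector

# Adapter kept separate: y = W x + scale * B (A x)
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scale=alpha / rank)

# Adapter merged into the weights: y = W' x
W_merged = merge_lora(W, A, B, alpha, rank)
y_merged = matmul(W_merged, x)

print(y_adapter == y_merged)
```

Because the merged matrix produces the same output as the base weight plus adapter, a merged checkpoint can be loaded with plain `transformers` and no adapter library.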


🚀 Model Description

| Property | Value |
|---|---|
| Model Name | Nova2-14B |
| Developer | Frederick Sundeep Mallela |
| Base Model | Qwen/Qwen3-14B |
| Fine-tuning Method | QLoRA (Quantized Low-Rank Adaptation) |
| Fine-tuning Framework | Unsloth + TRL |
| Model Type | Causal Language Model |
| Parameters | ~14.7 Billion |
| Context Length | 2048 tokens (base supports up to 40K) |
| Language | English |
| License | Apache 2.0 |
| Merge Status | ✅ Fully merged; standalone base model |

💡 What Makes Nova2-14B Different

Nova2-14B is built to retain Qwen3-14B's capabilities (coding, reasoning, math, multilingual support) while adding a custom persona and identity through supervised fine-tuning:

  • Responds as Nova, an AI assistant created by Frederick
  • Consistent identity across all conversation styles
  • Trained to never reveal underlying architecture details
  • Optimized for use in the NovaMind chat application

🛠️ How to Use

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FrederickSundeep/nova2-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are Nova, an AI assistant created by Frederick."},
    {"role": "user",   "content": "Who are you?"},
]

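# Build the prompt with the chat template. return_dict=True makes the call
# return input_ids and attention_mask regardless of the library's default.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,  # matches the fine-tuning setup
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.8,
        top_k=20,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)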
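
The fine-tuned configuration targets 2048 tokens, so multi-turn chats need some way to keep the prompt within budget. The following is a minimal sketch of one option: drop the oldest turns first while keeping the system prompt. `trim_history` and the `count_tokens` callback are hypothetical helpers, and the whitespace counter is only a stand-in for a real tokenizer count.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def trim_history(
    messages: List[Message],
    max_prompt_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[Message]:
    """Drop the oldest non-system messages until the total fits the budget.

    A leading system message and the most recent message are always kept.
    """
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    rest = messages[len(system):]

    def total(msgs: List[Message]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    while len(rest) > 1 and total(system + rest) > max_prompt_tokens:
        rest = rest[1:]  # discard the oldest remaining turn
    return system + rest

# Stand-in counter: one "token" per whitespace-separated word (an assumption;
# with a real tokenizer, something like len(tokenizer.encode(text)) fits here).
def word_count(text: str) -> int:
    return len(text.split())

history = [
    {"role": "system", "content": "You are Nova, an AI assistant created by Frederick."},
    {"role": "user", "content": "What is a REST API?"},
    {"role": "assistant", "content": "A REST API exposes resources over HTTP."},
    {"role": "user", "content": "Show me an example in Python."},
]

trimmed = trim_history(history, max_prompt_tokens=16, count_tokens=word_count)
print([m["role"] for m in trimmed])
```

In practice the budget would be the context length minus `max_new_tokens`, so that the reply also fits.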

With 4-bit Quantization (Low VRAM)

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model_id = "FrederickSundeep/nova2-14b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

Recommended Generation Parameters

# For conversational / chat use
generation_config = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "do_sample": True,
    "max_new_tokens": 1024,
}

# For coding / precise tasks
generation_config_precise = {
    "temperature": 0.3,
    "top_p": 0.9,
    "do_sample": True,
    "max_new_tokens": 2048,
}
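
One way to apply these presets is to pick one per request and unpack it into `model.generate`. The sketch below assumes a hypothetical `pick_generation_config` helper and task labels that are not part of the original card; the preset values are copied from the dictionaries above.

```python
from typing import Any, Dict

# Presets copied from the recommended parameters in this README.
CHAT_PRESET: Dict[str, Any] = {
    "temperature": 0.7, "top_p": 0.8, "top_k": 20,
    "repetition_penalty": 1.05, "do_sample": True, "max_new_tokens": 1024,
}
PRECISE_PRESET: Dict[str, Any] = {
    "temperature": 0.3, "top_p": 0.9, "do_sample": True, "max_new_tokens": 2048,
}

def pick_generation_config(task: str, **overrides: Any) -> Dict[str, Any]:
    """Return a copy of the preset for `task`, with per-call overrides applied.

    The task labels ("chat", "code") are hypothetical names chosen for this example.
    """
    presets = {"chat": CHAT_PRESET, "code": PRECISE_PRESET}
    if task not in presets:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(presets)}")
    return {**presets[task], **overrides}

config = pick_generation_config("code", max_new_tokens=256)
print(config)
# With a loaded model and a tokenized prompt tensor `input_ids`:
# outputs = model.generate(input_ids=input_ids, **config)
```

Returning a copy keeps the shared presets unchanged when a caller overrides a single value such as `max_new_tokens`.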

πŸ‹οΈ Training Details

Fine-tuning Setup

Setting Value
Base Model unsloth/Qwen3-14B-bnb-4bit
Method Supervised Fine-Tuning (SFT) with QLoRA
LoRA Rank 16
LoRA Alpha 16
Target Modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Batch Size 2 (effective 8 with gradient accumulation)
Gradient Accumulation 4 steps
Learning Rate 2e-4
Epochs 3
Optimizer AdamW 8-bit
LR Scheduler Linear
Max Sequence Length 2048
Training Hardware NVIDIA Tesla T4 (16GB) via Google Colab
Training Framework Unsloth + TRL SFTTrainer
Thinking Mode Disabled (enable_thinking=False)
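
As a rough cross-check of the setup above, the adapter size can be estimated from the LoRA rank and the shapes of the target modules: a rank-r adapter on a linear layer with d_in inputs and d_out outputs adds r · (d_in + d_out) parameters. The layer dimensions below are assumptions based on the published Qwen3-14B configuration (they are not stated in this README), so verify them against the model's `config.json` before relying on the totals.

```python
# Back-of-the-envelope LoRA sizing for the fine-tuning settings listed above.
# ASSUMPTION: these dimensions are taken to match the published Qwen3-14B
# config.json; they are not stated in this README.
HIDDEN = 5120          # model hidden size
INTERMEDIATE = 17408   # MLP intermediate size
NUM_LAYERS = 40        # transformer blocks
KV_DIM = 1024          # key/value projection width (8 KV heads x 128 head dim)

# Values from the Fine-tuning Setup table.
RANK = 16
ALPHA = 16
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4

# (in_features, out_features) of each targeted projection in one block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank): rank * (in + out) parameters."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
lora_scale = ALPHA / RANK  # multiplier applied to the B @ A update
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scale (alpha / rank): {lora_scale}")
print(f"effective batch size:      {effective_batch}")
```

Under these assumptions the adapter comes to roughly 64 million trainable parameters, well under 1% of the ~14.7B total, which helps explain why training fits on a single 16 GB T4 when the base weights are held in 4-bit.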

Dataset

Custom curated dataset of conversational examples covering:

  • Identity & persona: Nova's name, creator, what it is and isn't
  • Technical knowledge: coding, system design, AI/ML concepts
  • Personality & tone: concise, direct, technically precise responses
  • Edge cases: handling questions about underlying architecture

⚙️ Hardware Requirements

| Setup | VRAM | Notes |
|---|---|---|
| Full fp16 | ~28 GB | A100 80GB or 2x A40 |
| 8-bit quantized | ~15 GB | Single A100 40GB or RTX 3090 |
| 4-bit quantized | ~9 GB | Single RTX 3080/3090/4090 or T4 |
| CPU only | 32 GB RAM | Very slow; not recommended |
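
These figures follow mostly from parameter count times bytes per parameter. The sketch below shows that arithmetic, assuming ~14.7B parameters (from the Model Description) and counting weights only. It ignores activations, the KV cache, and framework overhead, so treat the output as a rough weights-only estimate rather than a reproduction of the table.

```python
PARAMS = 14.7e9  # approximate parameter count from the Model Description

# Nominal storage cost per parameter for each loading mode.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "nf4 (4-bit)": 0.5}

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Weights-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

estimates = {name: weight_memory_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}
for name, gb in estimates.items():
    print(f"{name:>12}: ~{gb:.1f} GB for weights alone")
```

Quantized loads typically keep some tensors (such as embeddings and normalization layers) in higher precision, which is one reason measured usage for the 4-bit and 8-bit rows sits above the nominal estimate.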

📊 Capabilities

Nova2-14B inherits all Qwen3-14B capabilities:

  • ✅ Code generation: Python, JavaScript, TypeScript, Java, C++, SQL, and more
  • ✅ Reasoning: step-by-step logical problem solving
  • ✅ Math: arithmetic to advanced mathematics
  • ✅ Instruction following: precise task execution
  • ✅ Multilingual: 100+ languages (from base model)
  • ✅ Long context: supports up to 40K tokens (base architecture)
  • ✅ Tool use: function calling compatible
  • ✅ System prompt: fully supports custom system prompts

🔒 Intended Use

Intended for:

  • Powering the NovaMind AI chat application
  • General-purpose AI assistant tasks
  • Code generation and debugging
  • Technical question answering
  • Further fine-tuning as a base model

Not intended for:

  • Harmful, unethical, or illegal content generation
  • Medical or legal advice without human oversight
  • High-stakes autonomous decision making

⚠️ Limitations

  • Fine-tuned on a relatively small custom dataset, so it may occasionally revert to base Qwen3 behavior in edge cases
  • Not evaluated on standard benchmarks post fine-tuning
  • Thinking mode disabled during fine-tuning; re-enable via enable_thinking=True in chat template if needed
  • Context limited to 2048 tokens in fine-tuned configuration (base supports 40K)
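
If thinking mode is re-enabled, Qwen3-style output wraps the reasoning in `<think>...</think>` before the final answer, and a chat application will usually want to separate the two. A minimal sketch, assuming the decoded text still contains the literal tags; `split_thinking` is a hypothetical helper, not part of `transformers`.

```python
import re
from typing import Tuple

# Non-greedy match so only the first <think>...</think> block is captured.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(text: str) -> Tuple[str, str]:
    """Return (reasoning, answer) from a decoded response.

    If no <think>...</think> block is present, reasoning is an empty string.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

sample = "<think>\nThe user wants a definition.\n</think>\n\nA REST API exposes resources over HTTP."
reasoning, answer = split_thinking(sample)
print(repr(reasoning))
print(repr(answer))
```

Since the model was fine-tuned with thinking disabled, responses produced with thinking enabled may differ in style from the trained persona, so it is worth checking them before showing reasoning text to users.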

🔗 Related

  • NovaMind App: AI chat application powered by this model
  • Base Model: Qwen/Qwen3-14B
  • Fine-tuning Framework: Unsloth
  • Developer: Frederick Sundeep Mallela

📄 License

This model is released under the Apache 2.0 License, inheriting the license of the base model Qwen3-14B.

See LICENSE for full details.


📝 Citation

If you use Nova2-14B in your research or application, please cite:

@misc{nova2-14b-2025,
  author       = {Frederick Sundeep Mallela},
  title        = {Nova2-14B: A Fine-tuned Conversational AI Assistant},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FrederickSundeep/nova2-14b}},
  note         = {Fine-tuned from Qwen/Qwen3-14B using QLoRA and Unsloth}
}