---
license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
- text-generation
- conversational
- fine-tuned
- qwen3
- nova
- novamind
- lora
- qlora
- unsloth
language:
- en
pipeline_tag: text-generation
library_name: transformers
model_type: qwen3
inference: true
datasets:
- custom
metrics:
- accuracy
widget:
- text: "Who are you?"
  example_title: "Identity"
- text: "What is a REST API?"
  example_title: "Technical Question"
- text: "Write a Python function to reverse a string"
  example_title: "Code Generation"
---

# 🧠 Nova2-14B

**Nova2-14B** is a fine-tuned large language model built on top of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B). It is the core model powering **NovaMind**, an AI chat application developed by **Frederick Sundeep Mallela**.

Nova2-14B is a **fully standalone merged model**: the LoRA adapter has been permanently merged into the base weights, so no adapter is required at inference time.

---

## 🚀 Model Description

| Property | Value |
|---|---|
| **Model Name** | Nova2-14B |
| **Developer** | Frederick Sundeep Mallela |
| **Base Model** | Qwen/Qwen3-14B |
| **Fine-tuning Method** | QLoRA (Quantized Low-Rank Adaptation) |
| **Fine-tuning Framework** | Unsloth + TRL |
| **Model Type** | Causal Language Model |
| **Parameters** | ~14.7 Billion |
| **Context Length** | 2048 tokens (base supports up to 40K) |
| **Language** | English |
| **License** | Apache 2.0 |
| **Merge Status** | ✅ Fully merged, standalone base model |

---

## 💡 What Makes Nova2-14B Different

Nova2-14B is designed to retain **all of Qwen3-14B's capabilities** (coding, reasoning, math, multilingual support) while adding a custom persona and identity through supervised fine-tuning:

- Responds as **Nova**, an AI assistant created by Frederick
- Consistent identity across all conversation styles
- Trained to never reveal underlying architecture details
- Optimized for use in the **NovaMind** chat application

---

## 🛠️ How to Use

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FrederickSundeep/nova2-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are Nova, an AI assistant created by Frederick."},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt with the chat template; return_dict=True also returns the attention mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.8,
        top_k=20,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)
```

### With 4-bit Quantization (Low VRAM)

```python
# Requires the bitsandbytes package in addition to transformers.
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model_id = "FrederickSundeep/nova2-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

### Recommended Generation Parameters

```python
# Either dict can be unpacked into generate(), e.g. model.generate(**inputs, **generation_config)

# For conversational / chat use
generation_config = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "do_sample": True,
    "max_new_tokens": 1024,
}

# For coding / precise tasks
generation_config_precise = {
    "temperature": 0.3,
    "top_p": 0.9,
    "do_sample": True,
    "max_new_tokens": 2048,
}
```

---

## 🏋️ Training Details

### Fine-tuning Setup

| Setting | Value |
|---|---|
| **Base Model** | unsloth/Qwen3-14B-bnb-4bit |
| **Method** | Supervised Fine-Tuning (SFT) with QLoRA |
| **LoRA Rank** | 16 |
| **LoRA Alpha** | 16 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Batch Size** | 2 (effective 8 with gradient accumulation) |
| **Gradient Accumulation** | 4 steps |
| **Learning Rate** | 2e-4 |
| **Epochs** | 3 |
| **Optimizer** | AdamW 8-bit |
| **LR Scheduler** | Linear |
| **Max Sequence Length** | 2048 |
| **Training Hardware** | NVIDIA Tesla T4 (16 GB) via Google Colab |
| **Training Framework** | Unsloth + TRL SFTTrainer |
| **Thinking Mode** | Disabled (enable_thinking=False) |

### Dataset

Custom curated dataset of
conversational examples covering:

- **Identity & persona**: Nova's name, creator, what it is and isn't
- **Technical knowledge**: coding, system design, AI/ML concepts
- **Personality & tone**: concise, direct, technically precise responses
- **Edge cases**: handling questions about underlying architecture

---

## ⚙️ Hardware Requirements

| Setup | VRAM | Notes |
|---|---|---|
| Full fp16 | ~28 GB | A100 80GB or 2x A40 |
| 8-bit quantized | ~15 GB | Single A100 40GB or RTX 3090 |
| 4-bit quantized | ~9 GB | Single RTX 3080/3090/4090 or T4 |
| CPU only | 32 GB RAM | Very slow, not recommended |

---

## 📊 Capabilities

Nova2-14B inherits its capabilities from Qwen3-14B; they have not been re-benchmarked after fine-tuning:

- ✅ **Code generation**: Python, JavaScript, TypeScript, Java, C++, SQL, and more
- ✅ **Reasoning**: step-by-step logical problem solving
- ✅ **Math**: arithmetic to advanced mathematics
- ✅ **Instruction following**: precise task execution
- ✅ **Multilingual**: 100+ languages (from base model)
- ✅ **Long context**: supports up to 40K tokens (base architecture)
- ✅ **Tool use**: function calling compatible
- ✅ **System prompt**: fully supports custom system prompts

---

## 🔒 Intended Use

**Intended for:**

- Powering the NovaMind AI chat application
- General-purpose AI assistant tasks
- Code generation and debugging
- Technical question answering
- Further fine-tuning as a base model

**Not intended for:**

- Harmful, unethical, or illegal content generation
- Medical or legal advice without human oversight
- High-stakes autonomous decision making

---

## ⚠️ Limitations

- Fine-tuned on a relatively small custom dataset, so it may occasionally revert to base Qwen3 behavior in edge cases
- Not evaluated on standard benchmarks after fine-tuning
- Thinking mode was disabled during fine-tuning; re-enable it with `enable_thinking=True` in the chat template if needed
- Context limited to 2048 tokens in the fine-tuned configuration (base supports 40K)

---

## 🔗 Related

- **NovaMind App:** AI chat application powered by this model
- **Base Model:** [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B)
- **Fine-tuning Framework:** [Unsloth](https://github.com/unslothai/unsloth)
- **Developer:** Frederick Sundeep Mallela

---

## 📄 License

This model is released under the **Apache 2.0 License**, inheriting the license of the base model Qwen3-14B. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for full details.

---

## 📝 Citation

If you use Nova2-14B in your research or application, please cite:

```bibtex
@misc{nova2-14b-2025,
  author       = {Frederick Sundeep Mallela},
  title        = {Nova2-14B: A Fine-tuned Conversational AI Assistant},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FrederickSundeep/nova2-14b}},
  note         = {Fine-tuned from Qwen/Qwen3-14B using QLoRA and Unsloth}
}
```
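
---

## 🧮 Appendix: Weights-Only Memory Estimate

The VRAM figures in the Hardware Requirements table can be sanity-checked with simple arithmetic: parameter count times bytes per parameter. The sketch below is a rough illustration, not how the table was produced. It assumes the ~14.7 billion parameter count from the Model Description table, and the helper name `weight_memory_gb` is hypothetical. It counts weights only, so the KV cache, activations, quantization metadata, and framework overhead are all excluded.

```python
# Weights-only memory estimate: parameters x bits per parameter / 8 bytes.
# Assumption: ~14.7e9 parameters, taken from the Model Description table.
PARAMS = 14.7e9

# Nominal bits per parameter. Real 4-bit (NF4) loads store extra quantization
# metadata and may keep some layers in higher precision, so treat these as lower bounds.
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "nf4": 4}


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gb(PARAMS, bits):.1f} GB (weights only)")
```

For fp16 this gives roughly 29 GB (about 27 GiB), in line with the ~28 GB row. The quantized rows in the table are somewhat higher than the weights-only figures, presumably because they include runtime overhead on top of the weights.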