---
license: apache-2.0
base_model: Qwen/Qwen3-14B
tags:
  - text-generation
  - conversational
  - fine-tuned
  - qwen3
  - nova
  - novamind
  - lora
  - qlora
  - unsloth
language:
  - en
pipeline_tag: text-generation
library_name: transformers
model_type: qwen3
inference: true
datasets:
  - custom
metrics:
  - accuracy
widget:
  - text: "Who are you?"
    example_title: "Identity"
  - text: "What is a REST API?"
    example_title: "Technical Question"
  - text: "Write a Python function to reverse a string"
    example_title: "Code Generation"
---

# Nova2-14B

<p align="center">
  <img src="https://img.shields.io/badge/Base%20Model-Qwen3--14B-blue?style=flat-square" />
  <img src="https://img.shields.io/badge/Fine--tuned%20with-Unsloth%20%2B%20QLoRA-green?style=flat-square" />
  <img src="https://img.shields.io/badge/License-Apache%202.0-orange?style=flat-square" />
  <img src="https://img.shields.io/badge/Language-English-red?style=flat-square" />
  <img src="https://img.shields.io/badge/Parameters-14B-purple?style=flat-square" />
</p>

**Nova2-14B** is a fine-tuned large language model built on top of [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B).
It is the core model powering **NovaMind**, an AI chat application developed by **Frederick Sundeep Mallela**.

Nova2-14B is a **fully standalone merged model**: the LoRA adapter has been merged into the base weights,
so no adapter is required at inference time.
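
As a rough illustration of what "merged" means, the toy sketch below folds a low-rank update into a weight matrix using the usual LoRA rule, W' = W + (alpha / rank) * B @ A, and checks that the merged matrix produces the same output as the base weight plus adapter. The matrices, the helper names (`matmul`, `add_scaled`, `merge_lora`) and the toy `alpha`/`rank` values are made up for illustration; this is not the merge script used for Nova2-14B. With the card's settings (rank 16, alpha 16) the scale factor would be 1.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(base, delta, scale):
    """Return base + scale * delta, element by element."""
    return [[b + scale * d for b, d in zip(b_row, d_row)] for b_row, d_row in zip(base, delta)]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold the low-rank update into the weight: W' = W + (alpha / rank) * B @ A."""
    return add_scaled(weight, matmul(lora_b, lora_a), alpha / rank)


# Toy values (hypothetical), chosen only to make the arithmetic easy to follow.
W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight (out x in)
A = [[1.0, 0.0]]               # LoRA down-projection (rank x in)
B = [[0.5], [1.0]]             # LoRA up-projection (out x rank)
alpha, rank = 2, 1
x = [[1.0], [1.0]]             # one input column vector

# Adapter path: base output plus the scaled low-rank correction.
with_adapter = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), alpha / rank)

# Merged path: a single matrix, no adapter needed at inference time.
merged = merge_lora(W, A, B, alpha, rank)
without_adapter = matmul(merged, x)

print(merged)            # [[2.0, 2.0], [5.0, 4.0]]
print(without_adapter)   # [[4.0], [9.0]], identical to with_adapter
```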

---

## Model Description

| Property | Value |
|---|---|
| **Model Name** | Nova2-14B |
| **Developer** | Frederick Sundeep Mallela |
| **Base Model** | Qwen/Qwen3-14B |
| **Fine-tuning Method** | QLoRA (Quantized Low-Rank Adaptation) |
| **Fine-tuning Framework** | Unsloth + TRL |
| **Model Type** | Causal Language Model |
| **Parameters** | ~14.7 billion |
| **Context Length** | 2048 tokens in the fine-tuned configuration (base supports up to 40K) |
| **Language** | English |
| **License** | Apache 2.0 |
| **Merge Status** | Fully merged, standalone base model |

---
## What Makes Nova2-14B Different

Nova2-14B is intended to retain Qwen3-14B's capabilities (coding, reasoning, math, multilingual support)
while adding a custom persona and identity through supervised fine-tuning:

- Responds as **Nova**, an AI assistant created by Frederick
- Consistent identity across all conversation styles
- Trained never to reveal underlying architecture details
- Optimized for use in the **NovaMind** chat application

---
## How to Use

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "FrederickSundeep/nova2-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "system", "content": "You are Nova, an AI assistant created by Frederick."},
    {"role": "user", "content": "Who are you?"},
]

# return_dict=True returns input_ids together with the attention mask,
# so generate() does not have to infer the mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        top_p=0.8,
        top_k=20,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```

### With 4-bit Quantization (Low VRAM)

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model_id = "FrederickSundeep/nova2-14b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

### Recommended Generation Parameters

```python
# For conversational / chat use
generation_config = {
    "temperature": 0.7,
    "top_p": 0.8,
    "top_k": 20,
    "repetition_penalty": 1.05,
    "do_sample": True,
    "max_new_tokens": 1024,
}

# For coding / precise tasks
generation_config_precise = {
    "temperature": 0.3,
    "top_p": 0.9,
    "do_sample": True,
    "max_new_tokens": 2048,
}
```
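
To give a feel for how `temperature`, `top_k` and `top_p` interact, here is a minimal sketch of the textbook definitions applied to a toy four-token distribution. It is not the Transformers implementation; the function name `filter_distribution` and the toy logits are hypothetical. Temperature rescales the logits, top-k keeps the k most likely tokens, and top-p keeps the smallest prefix of those whose cumulative probability reaches the threshold.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return renormalized token probabilities after temperature, top-k and top-p filtering."""
    # Temperature: divide logits before the softmax; values below 1 sharpen the distribution.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    total = sum(weight for _, weight in ranked)
    kept, cumulative = [], 0.0
    for tok, weight in ranked:
        kept.append((tok, weight))
        cumulative += weight / total
        if cumulative >= top_p:
            break

    kept_total = sum(weight for _, weight in kept)
    return {tok: weight / kept_total for tok, weight in kept}


# Toy logits (hypothetical) whose softmax is 0.5 / 0.3 / 0.15 / 0.05.
toy_logits = {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.15), "d": math.log(0.05)}

chat = filter_distribution(toy_logits, temperature=0.7, top_k=20, top_p=0.8)
precise = filter_distribution(toy_logits, temperature=0.3, top_p=0.9)
print(sorted(chat))     # tokens that survive the chat settings
print(sorted(precise))  # tokens that survive the precise settings
```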

---

## Training Details

### Fine-tuning Setup

| Setting | Value |
|---|---|
| **Base Model** | unsloth/Qwen3-14B-bnb-4bit |
| **Method** | Supervised Fine-Tuning (SFT) with QLoRA |
| **LoRA Rank** | 16 |
| **LoRA Alpha** | 16 |
| **Target Modules** | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| **Batch Size** | 2 (effective 8 with gradient accumulation) |
| **Gradient Accumulation** | 4 steps |
| **Learning Rate** | 2e-4 |
| **Epochs** | 3 |
| **Optimizer** | AdamW 8-bit |
| **LR Scheduler** | Linear |
| **Max Sequence Length** | 2048 |
| **Training Hardware** | NVIDIA Tesla T4 (16GB) via Google Colab |
| **Training Framework** | Unsloth + TRL SFTTrainer |
| **Thinking Mode** | Disabled (enable_thinking=False) |
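
The figures in this table can be sanity-checked with a few lines of arithmetic. The sketch below computes the effective batch size, the LoRA scaling factor (alpha / rank), and a rough count of trainable adapter weights. The layer shapes are an assumption taken from the public Qwen3-14B configuration (hidden size 5120, MLP size 17408, 40 query heads and 8 key/value heads of dimension 128, 40 layers); they are not stated in this card, and the helper names are hypothetical.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


def lora_scaling(alpha, rank):
    """Factor applied to the low-rank update B @ A."""
    return alpha / rank


def lora_param_count(rank, layer_shapes, num_layers):
    """LoRA adds rank * (in_features + out_features) weights per adapted linear layer."""
    per_block = sum(rank * (fan_in + fan_out) for fan_in, fan_out in layer_shapes.values())
    return per_block * num_layers


# ASSUMPTION: (in_features, out_features) per target module, from the public Qwen3-14B config.
QWEN3_14B_SHAPES = {
    "q_proj": (5120, 5120),
    "k_proj": (5120, 1024),
    "v_proj": (5120, 1024),
    "o_proj": (5120, 5120),
    "gate_proj": (5120, 17408),
    "up_proj": (5120, 17408),
    "down_proj": (17408, 5120),
}

print(effective_batch_size(2, 4))                   # 8
print(lora_scaling(16, 16))                         # 1.0
print(lora_param_count(16, QWEN3_14B_SHAPES, 40))   # 64225280
```

Under these assumptions the adapter holds roughly 64 million trainable weights, well under 1% of the model's ~14.7 billion parameters.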

### Dataset

Custom curated dataset of conversational examples covering:

- **Identity & persona**: Nova's name, creator, what it is and isn't
- **Technical knowledge**: coding, system design, AI/ML concepts
- **Personality & tone**: concise, direct, technically precise responses
- **Edge cases**: handling questions about underlying architecture

---
## Hardware Requirements

| Setup | VRAM | Notes |
|---|---|---|
| Full fp16 | ~28 GB | A100 80GB or 2x A40 |
| 8-bit quantized | ~15 GB | Single A100 40GB or RTX 3090 |
| 4-bit quantized | ~9 GB | Single RTX 3080/3090/4090 or T4 |
| CPU only | 32 GB RAM | Very slow, not recommended |
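
As a back-of-the-envelope check on these figures, the weight footprint alone is the parameter count times the bytes per parameter. The sketch below assumes the ~14.7 billion parameters listed in this card and counts weights only; the table values are somewhat higher, presumably because activations, the KV cache and quantization metadata also need memory. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory needed for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


PARAMS = 14.7e9  # parameter count as listed in this model card

for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(PARAMS, bits):.1f} GiB")
```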

---

## Capabilities

Nova2-14B inherits the capabilities of Qwen3-14B:

- **Code generation**: Python, JavaScript, TypeScript, Java, C++, SQL, and more
- **Reasoning**: step-by-step logical problem solving
- **Math**: arithmetic to advanced mathematics
- **Instruction following**: precise task execution
- **Multilingual**: 100+ languages (from base model)
- **Long context**: supports up to 40K tokens (base architecture)
- **Tool use**: function calling compatible
- **System prompt**: fully supports custom system prompts

---
## Intended Use

**Intended for:**

- Powering the NovaMind AI chat application
- General-purpose AI assistant tasks
- Code generation and debugging
- Technical question answering
- Further fine-tuning as a base model

**Not intended for:**

- Harmful, unethical, or illegal content generation
- Medical or legal advice without human oversight
- High-stakes autonomous decision making

---
## Limitations

- Fine-tuned on a relatively small custom dataset, so it may occasionally revert to base Qwen3 behavior in edge cases
- Not evaluated on standard benchmarks after fine-tuning
- Thinking mode was disabled during fine-tuning; re-enable it via `enable_thinking=True` in the chat template if needed
- Context limited to 2048 tokens in the fine-tuned configuration (base supports 40K)
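
If thinking mode is re-enabled, the decoded output is expected to contain a reasoning section wrapped in `<think>` and `</think>` before the final answer, as in the base Qwen3 models. The helper below is a minimal string-based sketch for separating the two. It assumes those literal tags survive decoding, the function name `split_thinking` is hypothetical, and its behavior on this fine-tuned model has not been verified.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Split decoded output into (reasoning, answer).

    Returns an empty reasoning string when no closing tag is present.
    """
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        return "", text.strip()
    head = head.strip()
    # The opening tag may be missing if it was part of the prompt rather than the output.
    if head.startswith(open_tag):
        head = head[len(open_tag):]
    return head.strip(), tail.strip()


reasoning, answer = split_thinking("<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4.")
print(reasoning)  # 2 + 2 is 4.
print(answer)     # The answer is 4.
```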

---

## Related

- **NovaMind App:** AI chat application powered by this model
- **Base Model:** [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B)
- **Fine-tuning Framework:** [Unsloth](https://github.com/unslothai/unsloth)
- **Developer:** Frederick Sundeep Mallela

---

## License

This model is released under the **Apache 2.0 License**, inheriting the license of the base model Qwen3-14B.
See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for full details.

---
## Citation

If you use Nova2-14B in your research or application, please cite:

```bibtex
@misc{nova2-14b-2025,
  author = {Frederick Sundeep Mallela},
  title = {Nova2-14B: A Fine-tuned Conversational AI Assistant},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/FrederickSundeep/nova2-14b}},
  note = {Fine-tuned from Qwen/Qwen3-14B using QLoRA and Unsloth}
}
```