How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="NisargUpadhyay/Qwen3.5-2B-Filter-Extractor",
	filename="qwen3.5-2b.gguf",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Qwen3.5-2B Filter Extractor (merged)

Model: Qwen3.5-2B fine-tuned adapter merged into base weights.

Files included:

  • model.safetensors — merged weights
  • config.json — model config
  • tokenizer.json and tokenizer_config.json — tokenizer files
  • generation_config.json — generation defaults
  • chat_template.jinja — chat formatting template

Notes:

  • Architecture: Qwen3.5-2B (hybrid linear + full-attention)
  • dtype: bfloat16
  • This model may not be directly compatible with Ollama/llama.cpp due to hybrid attention layers.

Usage (Transformers):

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("NisargUpadhyay/Qwen3.5-2B-Filter-Extractor")
tokenizer = AutoTokenizer.from_pretrained("NisargUpadhyay/Qwen3.5-2B-Filter-Extractor")
print(tokenizer.encode("Hello\n"))

License: Add license details here.

Downloads last month
130
Safetensors
Model size
2B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support