Instructions for using p1atdev/nekoqarasu-14b-chat with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use p1atdev/nekoqarasu-14b-chat with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="p1atdev/nekoqarasu-14b-chat", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("p1atdev/nekoqarasu-14b-chat", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use p1atdev/nekoqarasu-14b-chat with vLLM:
Install from pip and serve the model:

# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "p1atdev/nekoqarasu-14b-chat"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "p1atdev/nekoqarasu-14b-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
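The completion endpoint can also be called from Python. A minimal sketch using only the standard library; `build_completion_request` and `complete` are helper names invented here, and the actual call requires a `vllm serve` instance running on port 8000:

```python
import json
import urllib.request

# Hypothetical helper: build the JSON body for the OpenAI-compatible
# /v1/completions endpoint exposed by the server.
def build_completion_request(prompt, model="p1atdev/nekoqarasu-14b-chat",
                             max_tokens=512, temperature=0.5):
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, base_url="http://localhost:8000"):
    # Requires a running server: vllm serve "p1atdev/nekoqarasu-14b-chat"
    body = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```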
- SGLang
How to use p1atdev/nekoqarasu-14b-chat with SGLang:
Install from pip and serve the model:

# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "p1atdev/nekoqarasu-14b-chat" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "p1atdev/nekoqarasu-14b-chat",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'

Use Docker images:

docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "p1atdev/nekoqarasu-14b-chat" \
  --host 0.0.0.0 \
  --port 30000

The server can then be called with the same curl command as above.
- Docker Model Runner
How to use p1atdev/nekoqarasu-14b-chat with Docker Model Runner:
docker model run hf.co/p1atdev/nekoqarasu-14b-chat
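Both vLLM and SGLang expose an OpenAI-compatible API, so the official `openai` Python client can be used against either server. A sketch, assuming `pip install openai` and a server running on one of the ports shown above; `completion_kwargs` is a helper invented here:

```python
# Build the keyword arguments for an OpenAI-style completion call,
# mirroring the curl payloads shown above.
def completion_kwargs(prompt, model="p1atdev/nekoqarasu-14b-chat"):
    return {"model": model, "prompt": prompt, "max_tokens": 512, "temperature": 0.5}

# With a server running (vLLM on :8000, SGLang on :30000):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
# resp = client.completions.create(**completion_kwargs("Once upon a time,"))
# print(resp.choices[0].text)
```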
Nekoqarasu
nekoqarasu-14b-chat = rinna/nekomata-14b + lightblue/qarasu-14B-chat-plus-unleashed - Qwen/Qwen-14B
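The formula above is task arithmetic: the chat-tuning delta of Qarasu relative to its shared base model, Qwen-14B, is added onto Nekomata's weights. A toy sketch of the idea, with short Python lists standing in for full state dicts (`merge_task_arithmetic` is a name invented here; the real merge in merge.ipynb operates on every tensor of the checkpoints):

```python
# Toy illustration of the merge formula:
#   merged = nekomata + (qarasu - qwen_base)
# i.e. add Qarasu's chat-tuning delta on top of Nekomata, parameter by parameter.
def merge_task_arithmetic(nekomata, qarasu, base):
    return {
        name: [n + (q - b) for n, q, b in zip(nekomata[name], qarasu[name], base[name])]
        for name in nekomata
    }

# Two-element "checkpoints" standing in for full state dicts.
base = {"w": [1.0, 1.0]}
nekomata = {"w": [2.0, 0.5]}  # base + Japanese continual pretraining
qarasu = {"w": [1.5, 3.0]}    # base + chat tuning

merged = merge_task_arithmetic(nekomata, qarasu, base)
# merged["w"] == [2.5, 2.5]
```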
Example
pip install accelerate transformers tiktoken einops scipy transformers_stream_generator bitsandbytes  # bitsandbytes is required for load_in_4bit
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
MODEL_NAME = "p1atdev/nekoqarasu-14b-chat"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME,
load_in_4bit=True,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True,
)
model.eval()
# model = torch.compile(model)  # recommended on Linux
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
prompt = """\
# system
誠実で紳士的で優秀なAIアシスタントとして、簡潔でわかりやすく役に立つ回答を自信をもって答えなさい。
# question
まどか☆マギカでは誰が一番かわいい?
# answer
"""
input_ids = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
_ = model.generate(
**input_ids,
max_new_tokens=256,
do_sample=True,
top_k=20,
top_p=0.95,
temperature=1.0,
repetition_penalty=1.1,
num_beams=1,
eos_token_id=151643,
pad_token_id=151643,
streamer=streamer
)
The output:
This question comes down to subjective judgment, so the answer differs with personal taste. Generally, though, the character of Madoka Magica most often considered the cutest is Madoka Kaname. She has a pure and gentle personality, with a charm that sets her apart from the other characters.
That said, standards of "cuteness" vary from person to person, as do impressions of the same work. So rather than settling on a single character, I think it is important to find things you like in several characters and to enjoy the work as a whole.
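The `# system` / `# question` / `# answer` layout used in the prompt above can be assembled with a small helper (hypothetical, not part of the model card):

```python
# Hypothetical helper that assembles a prompt in the model's
# "# system" / "# question" / "# answer" layout.
def build_prompt(system, question):
    return f"# system\n{system}\n# question\n{question}\n# answer\n"

prompt = build_prompt(
    "As a sincere and capable AI assistant, answer concisely and helpfully.",
    "Who is the cutest character in Madoka Magica?",
)
# `prompt` can then be passed to tokenizer(...) exactly as in the example above.
```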
Merge process
See merge.ipynb. (The merge itself ran on a GPU with 8 GB of VRAM.)
Features
- Fluent Japanese responses
- Good knowledge of common sense in Japanese culture
Limitations
This model often generates overly long responses that are unrelated to the user's instructions or questions. It also does not seem to know how to end a text.
- Due to these issues, prompts in ChatML format work worse than with other Qwen-based models such as Qwen-Chat or Qarasu.
Since I have not run any benchmarks, this model has not been quantitatively evaluated.
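One workaround for the termination issue (a post-processing sketch, not something from the model card) is to cut the generated text at the first reappearance of a section marker, e.g. when the model starts a new `# question` section instead of ending its answer:

```python
# Truncate a generation at the first occurrence of any stop marker.
# `truncate_at_markers` is a hypothetical helper, not part of the model.
def truncate_at_markers(text, markers=("# question", "# system")):
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()

sample = "Madoka is often named the cutest.\n# question\nAnd next?"
print(truncate_at_markers(sample))  # -> "Madoka is often named the cutest."
```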