Instructions for using NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2 with libraries and local apps.

Libraries

How to use NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2 with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```

```python
# Load the model and its tokenizer directly
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2", trust_remote_code=True, dtype="auto")
```
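The pipeline call relies on the tokenizer's chat template (if it has one) to turn `messages` into text. This card's own generation example further down writes the prompt by hand in ChatML format instead. A minimal sketch of that conversion, assuming the ChatML layout used in that example; `to_chatml` is a hypothetical helper, not part of Transformers:

```python
# Hypothetical helper: render chat messages in the ChatML layout that the
# hand-written prompt in this card uses.
def to_chatml(messages, add_generation_prompt=True):
    """Return a ChatML prompt string for a list of {"role", "content"} dicts."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    prompt = "".join(parts)
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        prompt += "<|im_start|>assistant\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
print(to_chatml(messages))
```

The resulting string can be passed to `tokenizer.encode(..., add_special_tokens=False)` exactly like the hand-written prompt.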

Local Apps

vLLM

How to use NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2 with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
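The same request can be sent from Python without extra dependencies. A minimal sketch using only the standard library, assuming a vLLM server started as above is listening on `localhost:8000`; `build_chat_request` and `post_chat` are hypothetical helpers, and the `max_tokens` field is an added assumption (a standard OpenAI-compatible parameter not used in the curl call):

```python
import json
import urllib.request

# Assumption: the vLLM server from the commands above runs on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2"


def build_chat_request(content, model=MODEL_ID, max_tokens=256):
    """Build the JSON body for an OpenAI-compatible /chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "max_tokens": max_tokens,
    }


def post_chat(body, base_url=BASE_URL, timeout=60):
    """POST the body to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires the server to be running):
#   response = post_chat(build_chat_request("What is the capital of France?"))
```

Because SGLang exposes the same OpenAI-compatible route, pointing `base_url` at `http://localhost:30000/v1` should work for the SGLang server below as well.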

Use Docker

```bash
docker model run hf.co/NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2
```

SGLang

How to use NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2 with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
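Both servers answer in the OpenAI chat-completion shape, where the generated text sits in `choices[0].message.content` and `finish_reason` is `"length"` when the token limit cut the answer off. A small sketch of reading that response; `extract_reply` and `was_truncated` are hypothetical helpers, and the example dict only illustrates the shape (it is not real server output):

```python
def extract_reply(response):
    """Return (content, finish_reason) for the first choice of a chat completion."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response has no choices")
    first = choices[0]
    return first["message"]["content"], first.get("finish_reason")


def was_truncated(response):
    """True if generation stopped because it hit the token limit."""
    _, finish_reason = extract_reply(response)
    return finish_reason == "length"


# Illustrative response shape only (not real server output).
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "example reply"},
            "finish_reason": "stop",
        }
    ]
}
print(extract_reply(example))
```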

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Docker Model Runner
How to use NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2 with Docker Model Runner:
```
docker model run hf.co/NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Training details:

  - Base model for fine-tuning: cognitivecomputations/dolphin-2_6-phi-2
  - Method: supervised fine-tuning (SFT)
  - Attention: FlashAttention-2
  - Loss: 0.85
  - Steps: 3000
  - max_length: 2028
  - neftune_noise_alpha: 5
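`neftune_noise_alpha: 5` refers to NEFTune, which adds uniform noise to the input embeddings during fine-tuning only (not at inference). A minimal pure-Python sketch of the noise rule as described in the NEFTune paper: each element gets noise from Uniform(-m, m) with m = alpha / sqrt(seq_len * dim). `neftune_noise` is a hypothetical helper working on nested lists; real training applies this to embedding tensors per batch inside the trainer:

```python
import math
import random


def neftune_noise(embeddings, alpha=5.0, rng=None):
    """Add NEFTune-style uniform noise to a [seq_len][dim] list of embeddings.

    Each element gets noise drawn from Uniform(-m, m) with
    m = alpha / sqrt(seq_len * dim), so longer sequences and wider
    embeddings receive proportionally smaller per-element noise.
    """
    rng = rng or random.Random()
    seq_len = len(embeddings)
    dim = len(embeddings[0]) if seq_len else 0
    if seq_len == 0 or dim == 0:
        return [list(row) for row in embeddings]
    magnitude = alpha / math.sqrt(seq_len * dim)
    return [
        [value + rng.uniform(-magnitude, magnitude) for value in row]
        for row in embeddings
    ]


# Toy example: 4 tokens with 8-dimensional embeddings, alpha = 5 as in this card.
toy = [[0.0] * 8 for _ in range(4)]
noisy = neftune_noise(toy, alpha=5.0, rng=random.Random(0))
bound = 5.0 / math.sqrt(4 * 8)
print(f"noise bound per element: {bound:.4f}")  # noise bound per element: 0.8839
```

With Phi-2's real embedding width and sequences up to 2028 tokens, the per-element bound is far smaller than in this toy case.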

Install packages

```
# Run in a Jupyter/Colab notebook: `!` runs a shell command, `%env` sets a
# variable for the notebook process (a plain `!export` would only affect a
# throwaway subshell and be lost before the next command).
!python -m pip install --upgrade pip
!pip install -q transformers datasets trl peft bitsandbytes sentencepiece wandb
!pip install -q accelerate safetensors deepspeed
!pip install -q scipy

%env CUDA_HOME=/usr/local/cuda-11.8
!pip install ninja packaging --upgrade -qqq
!MAX_JOBS=4 pip install flash-attn --no-build-isolation -qqq
!pip install git+"https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary" -qqq
!python -m pip install optimum -qqq
```
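Building flash-attn from source can fail quietly inside a long notebook cell, so a quick check before loading the model can save a confusing traceback. A minimal sketch using `importlib.util.find_spec`; the module list is an assumption about what the loading code below needs (note that the `flash-attn` pip package provides the `flash_attn` module), and `missing_modules` is a hypothetical helper:

```python
import importlib.util

# Import names (not pip names) of the packages the loading code relies on.
REQUIRED_MODULES = ["torch", "transformers", "accelerate", "bitsandbytes", "flash_attn"]


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("missing:", missing or "none")
```

`find_spec` only locates a module without importing it, so it will not catch a flash-attn build that is present but incompatible with the installed CUDA version.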

Load model and generate text


```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
)

model_id = "NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2"

# 4-bit quantization via bitsandbytes, computing in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    quantization_config=bnb_config,
    low_cpu_mem_usage=True,
    # Forwarded to the model's remote-code (phi-2) config; these options
    # need the flash-attn packages installed above.
    flash_attn=True,
    flash_rotary=True,
    fused_dense=True,
)

max_length = 2028
print("max_length", max_length)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    use_fast=True,
    model_max_length=max_length,
    trust_remote_code=True,
)

# ChatML prompt. The user message is Spanish for
# "I'm hungry, what do you recommend?"
prompt = """<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
tengo hambre que me recomiendas<|im_end|>
<|im_start|>assistant
"""

# The prompt already contains its special tokens, so don't add more.
inputs = tokenizer.encode(
    prompt,
    return_tensors="pt",
    add_special_tokens=False,
).to(model.device)

generation_config = GenerationConfig(
    max_new_tokens=700,
    temperature=0.5,
    top_p=0.9,
    top_k=45,
    repetition_penalty=1.15,  # 1.0 means no penalty, > 1.0 penalizes repeats (1.2 in the CTRL paper)
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(input_ids=inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```
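The `GenerationConfig` above combines four sampling controls. To make their roles concrete, here is a simplified pure-Python sketch of how they reshape one decoding step's logits, assuming they are applied in the order repetition penalty, temperature, top-k, top-p. This is an illustration, not Transformers' implementation; `sampling_distribution` and `sample_token` are hypothetical helpers, and the defaults copy this card's settings:

```python
import math
import random


def softmax(scores):
    """Softmax that treats -inf as probability 0."""
    m = max(s for s in scores if s != -math.inf)
    exps = [math.exp(s - m) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sampling_distribution(logits, previous_ids, temperature=0.5, top_k=45,
                          top_p=0.9, repetition_penalty=1.15):
    """Turn one step's raw logits into the probabilities a sampler draws from."""
    scores = list(logits)
    # Repetition penalty (CTRL-style): already-seen tokens become less likely.
    for token_id in set(previous_ids):
        s = scores[token_id]
        scores[token_id] = s / repetition_penalty if s > 0 else s * repetition_penalty
    # Temperature < 1 sharpens the distribution.
    scores = [s / temperature for s in scores]
    # Top-k: keep only the k highest-scoring tokens.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = ranked[:top_k]
    kept_set = set(kept)
    scores = [s if i in kept_set else -math.inf for i, s in enumerate(scores)]
    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    probs = softmax(scores)
    nucleus, cumulative = set(), 0.0
    for i in kept:
        nucleus.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    scores = [s if i in nucleus else -math.inf for i, s in enumerate(scores)]
    return softmax(scores)


def sample_token(probs, rng):
    """Draw one token id according to the final probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = sampling_distribution([2.0, 1.0, 0.5, -1.0], previous_ids=[0], top_k=2)
print(probs, sample_token(probs, random.Random(0)))
```

In the toy call, token 0 is penalized for having appeared already but stays the most likely; tokens 2 and 3 are removed by top-k, so they can never be sampled.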

Example output (the model answers in Spanish with a numbered list of food suggestions; the sample ends mid-sentence):

```
<|im_start|>system
You are a helpful AI assistant.<|im_end|>
<|im_start|>user
tengo hambre que me recomiendas<|im_end|>
<|im_start|>assistant
Hay muchos tipos de alimentos que puedes probar si tienes hambre, pero aquí te muestro una lista ordenada por calor:

1- Frutas y verduras: Estas son buena opción para quitar hambre sin agotar tu cuerpo con grandes cantidades de carbohidratos. Algunas frutas y verduras que podrían ser suficientemente altas en calor durante el día incluyen tomates, plátanos, espinacas, papas, nueces, manzanas, limones, guisantes, cucumbers, zanahorias, etc.
2- Proteínas: Estas son importantes para mantener tu masa muscular y fuerzosa durante el día. Algunas proteínas que podrían ser útiles para quitar hambre durante el día incluyen carne, aceite de oliva, miel, yogur, leche fresca o sopa de gorditas, etc.
3- Carbohidratos: Estas son importantes para energizarte durante el día y mantenerte físico. Algunas frutas y verduras que podrían ser útiles para quitar hambre durante el día incluyen pan, tortillas, roti, arroz, pasta, rice, polenta, cereales, granola, etc.
4- Grains: Estas son importantes para mantenerte satiente durante el día y reducir la frecuencia de comidas rápida. Algunas gromas que podrían ser útiles para quitar hambre durante el día incluyen lentejas, farinas, tortilla, ensalada, etc.
5- Nuts y semolina: Estas son buenas opciones para quitar hambre durante el día sin agotar tu cuerpo con grandes cantidades de azúcar. Algunas frutas y verduras que podrían ser útiles para quitar hambre durante el día incluyen anacardios, almendras, macetas, bocaditos, panquesado, etc.
6- Papel picado: Esta es una opción deliciosa y económica que puedes preparar en caso de quitar hambre durante el día. Para hacer papel picado, primero cortezamos las frutas y verduras que deseas usarlas, y luego cortezamos las frutas y verduras que no deseas usarlas. A continuación, cortezamos las frutas y verduras que deseas usarlas más grandes y que estén más frescas, y luego cortezamos las frutas y verduras
```
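The decoded text repeats the whole prompt before the answer. One way to keep only the reply is to slice off the prompt tokens before decoding (`outputs[0][inputs.shape[-1]:]`); another is to post-process the decoded string. A minimal sketch of the string approach, assuming the ChatML markers shown above; `assistant_reply` is a hypothetical helper:

```python
def assistant_reply(decoded):
    """Return the text of the last assistant turn in a decoded ChatML transcript."""
    marker = "<|im_start|>assistant"
    start = decoded.rfind(marker)
    if start == -1:
        # No assistant turn found: return the text unchanged (minus whitespace).
        return decoded.strip()
    reply = decoded[start + len(marker):]
    # Stop at the end-of-turn marker if the model emitted one; a truncated
    # answer like the example above simply runs to the end of the string.
    end = reply.find("<|im_end|>")
    if end != -1:
        reply = reply[:end]
    return reply.strip()


transcript = (
    "<|im_start|>system\nYou are a helpful AI assistant.<|im_end|>\n"
    "<|im_start|>user\nWho are you?<|im_end|>\n"
    "<|im_start|>assistant\nAn example reply.<|im_end|>"
)
print(assistant_reply(transcript))
```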


Weights format: Safetensors
Model size: 3B params
Tensor type: BF16

Model tree for NickyNicky/dolphin-2_6-phi-2_oasst2_chatML_V2
Quantizations: 1 model
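The loading example above quantizes the weights to 4 bits. A back-of-the-envelope estimate shows why: with the rounded 3B parameter count, BF16 weights alone need about 6 GB, while 4-bit weights need about 1.5 GB. This is a weights-only sketch; activations, the KV cache, quantization metadata, and any layers kept in higher precision add to the real footprint. `weight_memory_gb` is a hypothetical helper:

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in GB (10**9 bytes), ignoring activations,
    KV cache, and quantization metadata."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 3e9  # the rounded "3B params" figure above
for label, bits in [("BF16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
# BF16: ~6.0 GB
# 4-bit: ~1.5 GB
```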