Instructions for using Masterjp123/SnowyRP-V2-13B-L2_BetaTest with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Masterjp123/SnowyRP-V2-13B-L2_BetaTest with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Masterjp123/SnowyRP-V2-13B-L2_BetaTest")
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Masterjp123/SnowyRP-V2-13B-L2_BetaTest")
model = AutoModelForCausalLM.from_pretrained("Masterjp123/SnowyRP-V2-13B-L2_BetaTest")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Masterjp123/SnowyRP-V2-13B-L2_BetaTest with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Masterjp123/SnowyRP-V2-13B-L2_BetaTest"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Masterjp123/SnowyRP-V2-13B-L2_BetaTest",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/Masterjp123/SnowyRP-V2-13B-L2_BetaTest
```
- SGLang
How to use Masterjp123/SnowyRP-V2-13B-L2_BetaTest with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Masterjp123/SnowyRP-V2-13B-L2_BetaTest" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Masterjp123/SnowyRP-V2-13B-L2_BetaTest",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Masterjp123/SnowyRP-V2-13B-L2_BetaTest" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Masterjp123/SnowyRP-V2-13B-L2_BetaTest",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use Masterjp123/SnowyRP-V2-13B-L2_BetaTest with Docker Model Runner:
```shell
docker model run hf.co/Masterjp123/SnowyRP-V2-13B-L2_BetaTest
```
Model
This is the BF16, unquantized version of SnowyRP V2 Beta, and the first public beta model in the SnowyRP series!
NOTE: This model gave me issues when I tried to quantize it, so if you want quantized versions you may need to get someone like TheBloke to do it; they do this stuff better than me anyway.
Merge Details
I originally made V2 Beta just as a test, but it turned out to be good, so I am quantizing it.
These models CAN and WILL produce X-rated or harmful content; they are heavily uncensored in an attempt not to limit the model or make it worse.
This model has a very good knowledge base and understands anatomy decently. It is also VERY versatile: great for general assistant work, RP and ERP, RPG-style roleplays, and much more.
Model Use:
This model is very good... WITH THE RIGHT SETTINGS. I personally use Mirostat mixed with dynamic temperature, along with epsilon cutoff and eta cutoff.
Optimal Settings (so far)
- Mirostat Mode: 2
- Mirostat tau: 2.95
- Mirostat eta: 0.05
- Dynamic Temp min: 0.25
- Dynamic Temp max: 1.8
- Epsilon cutoff: 3
- Eta cutoff: 3
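If you run the model behind a local OpenAI-compatible server, the settings above can be bundled into the request body. This is a minimal sketch, assuming a text-generation-webui-style backend on `localhost:5000` whose extended sampler fields are named `mirostat_*`, `dynatemp_*`, and `*_cutoff`; check your backend's API reference, since these names are extensions, not part of the OpenAI spec.

```python
# Sketch: send the recommended sampler settings to a local
# OpenAI-compatible completions endpoint. The extended fields
# (mirostat_mode, dynatemp_*, *_cutoff) are assumed backend
# extensions, not standard OpenAI parameters.
import json
import urllib.request

def build_payload(prompt: str) -> dict:
    """Bundle the card's recommended sampler settings into one request body."""
    return {
        "prompt": prompt,
        "max_tokens": 512,
        "mirostat_mode": 2,          # Mirostat Mode: 2
        "mirostat_tau": 2.95,        # Mirostat tau
        "mirostat_eta": 0.05,        # Mirostat eta
        "dynamic_temperature": True,
        "dynatemp_low": 0.25,        # Dynamic Temp min
        "dynatemp_high": 1.8,        # Dynamic Temp max
        "epsilon_cutoff": 3,         # cutoffs as given above
        "eta_cutoff": 3,
    }

def complete(prompt: str, url: str = "http://localhost:5000/v1/completions") -> str:
    """POST the payload and return the first completion's text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```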
Merge Method
This model was merged using the TIES merge method, with TheBloke/Llama-2-13B-fp16 as the base.
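For intuition, TIES merging works on "task vectors" (each model's weights minus the base): trim each vector to its densest entries, elect a majority sign per parameter, then average only the values that agree with that sign. The following is a toy NumPy sketch of that idea on flat vectors, not mergekit's actual implementation; the function name and signature are illustrative only.

```python
import numpy as np

def ties_merge(base, deltas, densities, weights):
    """Toy TIES merge on flat parameter vectors.

    base:      base model weights (1-D array)
    deltas:    list of task vectors (model - base)
    densities: fraction of each delta to keep, by magnitude
    weights:   per-model scaling factors
    """
    trimmed = []
    for d, rho in zip(deltas, densities):
        k = int(round(rho * d.size))
        keep = np.zeros_like(d)
        if k > 0:
            idx = np.argsort(np.abs(d))[-k:]   # top-k entries by magnitude
            keep[idx] = d[idx]
        trimmed.append(keep)
    stacked = np.stack([w * t for w, t in zip(weights, trimmed)])
    sign = np.sign(stacked.sum(axis=0))        # elected sign per parameter
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    totals = np.where(agree, stacked, 0.0).sum(axis=0)
    counts = np.maximum(agree.sum(axis=0), 1)  # avoid divide-by-zero
    return base + totals / counts
```

Parameters that disagree in sign across models cancel out instead of being blurred together, which is the main difference from a plain weighted average.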
Models Merged
The following models were included in the merge:
- Masterjp123/SnowyRP-FinalV1-L2-13B
- posicube/Llama2-chat-AYB-13B
- Sao10K/Stheno-1.8-L2-13B
- ValiantLabs/ShiningValiantXS
- sauce1337/BerrySauce-L2-13b
Configuration
The following YAML configuration was used to produce this model:
```yaml
base_model:
  model:
    path: TheBloke/Llama-2-13B-fp16
dtype: bfloat16
merge_method: ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 40]
    model:
      model:
        path: Masterjp123/Snowyrp-V2B-P1
    parameters:
      density: [1.0, 0.7, 0.1]
      weight: 1.0
  - layer_range: [0, 40]
    model:
      model:
        path: Masterjp123/SnowyRP-FinalV1-L2-13B
    parameters:
      density: 0.5
      weight: [0.0, 0.3, 0.7, 1.0]
  - layer_range: [0, 40]
    model:
      model:
        path: sauce1337/BerrySauce-L2-13b
    parameters:
      density: 0.33
      weight:
      - filter: mlp
        value: 0.5
      - value: 0.0
  - layer_range: [0, 40]
    model:
      model:
        path: TheBloke/Llama-2-13B-fp16
```
And the following configuration was used for the intermediate merge, Masterjp123/Snowyrp-V2B-P1:
```yaml
base_model:
  model:
    path: TheBloke/Llama-2-13B-fp16
dtype: bfloat16
merge_method: ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 40]
    model:
      model:
        path: Sao10K/Stheno-1.8-L2-13B
    parameters:
      density: [1.0, 0.7, 0.1]
      weight: 1.0
  - layer_range: [0, 40]
    model:
      model:
        path: ValiantLabs/ShiningValiantXS
    parameters:
      density: 0.5
      weight: [0.0, 0.3, 0.7, 1.0]
  - layer_range: [0, 40]
    model:
      model:
        path: posicube/Llama2-chat-AYB-13B
    parameters:
      density: 0.33
      weight:
      - filter: mlp
        value: 0.5
      - value: 0.0
  - layer_range: [0, 40]
    model:
      model:
        path: TheBloke/Llama-2-13B-fp16
```
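To reproduce a merge from a config like the above, mergekit provides the `mergekit-yaml` command. This is a sketch assuming a recent mergekit install; the output directory name is illustrative.

```shell
# Install mergekit from pip:
pip install mergekit

# Save one of the YAML configs above as config.yml, then run the merge.
# --copy-tokenizer copies the base model's tokenizer into the output;
# --lazy-unpickle reduces peak memory while loading checkpoint shards.
mergekit-yaml config.yml ./SnowyRP-merged \
  --copy-tokenizer \
  --lazy-unpickle
```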