Instructions for using TomGrc/FusionNet_7Bx2_MoE_14B with libraries and local apps.

Libraries

How to use TomGrc/FusionNet_7Bx2_MoE_14B with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TomGrc/FusionNet_7Bx2_MoE_14B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TomGrc/FusionNet_7Bx2_MoE_14B")
model = AutoModelForCausalLM.from_pretrained("TomGrc/FusionNet_7Bx2_MoE_14B")
```

Local Apps

vLLM

How to use TomGrc/FusionNet_7Bx2_MoE_14B with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TomGrc/FusionNet_7Bx2_MoE_14B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TomGrc/FusionNet_7Bx2_MoE_14B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```
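For callers who would rather stay in Python than shell out to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, assuming the vLLM server started above is listening on localhost:8000 and returns OpenAI-style completions JSON; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "TomGrc/FusionNet_7Bx2_MoE_14B"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the response follows the OpenAI completions shape,
    # with the generated text under choices[0]["text"].
    return body["choices"][0]["text"]
```

Calling `complete("Once upon a time,")` sends the request and returns the generated text. Passing `base_url="http://localhost:30000"` should target a server on a different port, such as the one used in the SGLang examples.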

Use Docker

```
docker model run hf.co/TomGrc/FusionNet_7Bx2_MoE_14B
```

SGLang

How to use TomGrc/FusionNet_7Bx2_MoE_14B with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TomGrc/FusionNet_7Bx2_MoE_14B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TomGrc/FusionNet_7Bx2_MoE_14B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TomGrc/FusionNet_7Bx2_MoE_14B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TomGrc/FusionNet_7Bx2_MoE_14B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Docker Model Runner
How to use TomGrc/FusionNet_7Bx2_MoE_14B with Docker Model Runner:
```
docker model run hf.co/TomGrc/FusionNet_7Bx2_MoE_14B
```

GGUF Version

by SimSim93 - opened Jan 14, 2024

Discussion

SimSim93

Jan 14, 2024 (edited Jan 14, 2024)

First of all, thanks a lot for your mixture-of-experts model!
Just out of curiosity, which models did you merge?

@TheBloke could you please provide us with a GGUF quant version? =)

Thank you all for the awesome work you do for the community!

SimSim93 changed discussion title from GGML Version to GGUF Version Jan 14, 2024

DKRacingFan

Jan 14, 2024

@TheBloke this would be great!

testerav

Jan 15, 2024

yeah waiting

Pumba2

Jan 15, 2024 (edited Jan 15, 2024)

Leaderboard shows some impressive results...

pramjana

Jan 15, 2024

@TheBloke would really appreciate a GGUF version for this model, thank you!

johnnnna

Jan 15, 2024

@TheBloke please consider making a GGUF version of this. Just look at its scores.

Pumba2

Jan 16, 2024 (edited Jan 16, 2024)

Strange that @TheBloke is not doing a GGUF for this model.
EDIT: I meant it doesn't seem so, as it's the 16th and he must have received the notifications about this post.

testerav

Jan 16, 2024

> Strange that @TheBloke is not doing a GGUF for this model.

Did you ask him?

FlorianJc

Jan 18, 2024

TheBloke is not your slave. Is it too hard to call a Python script to do it yourself?

johnnnna

Jan 19, 2024

> TheBloke is not your slave. Is it too hard to call a Python script to do it yourself?

I said "please consider". Maybe you need to learn to read better.

testerav

Jan 19, 2024 (edited Jan 21, 2024)

> TheBloke is not your slave. Is it too hard to call a Python script to do it yourself?

Among the tons of models he compiles, what's wrong with doing one that is on the leaderboard?
Also, isn't that the reason people know him?

Nan-Do

Jan 19, 2024

I have generated a GGUF-quantized version of the model.
The files can be found at https://huggingface.co/Nan-Do/FusionNet_7Bx2_MoE_14B-GGUF

johnnnna

Jan 19, 2024 (edited Jan 19, 2024)

> I have generated a GGUF-quantized version of the model.
> The files can be found at https://huggingface.co/Nan-Do/FusionNet_7Bx2_MoE_14B-GGUF

Thx @Nan-Do, I will try it ASAP.

testerav

Jan 19, 2024

@johnnnna pls do give feedback, I wonder if it's actually good or a hoax.

Rybens

Jan 20, 2024

@Nan-Do's quants do not work for me (the model generates random tokens), so I made my own basic quants of this model.

https://huggingface.co/Rybens/FusionNet_7Bx2_MoE_14B_gguf

Nan-Do

Jan 20, 2024

@Rybens there was indeed a problem with the uploaded files, thanks for letting me know. I should have checked the uploaded files.

I have redone the repository (and re-checked the files) and it should be fine now. I have also added a model card.

Rybens

Jan 20, 2024 (edited Jan 20, 2024)

Thanks @Nan-Do.
But I'll leave my repository with quants up in case anyone needs it.

Nan-Do

Jan 20, 2024 (edited Jan 20, 2024)

@Rybens sure, that's good.

DKRacingFan

Jan 20, 2024 (edited Jan 20, 2024)

> Thanks @Nan-Do
> But I'll leave my repository with quants in case anyone needs it

Just tried out the model in LM Studio. Works very well! I'm impressed!

TomGrc changed discussion status to closed Mar 4, 2024
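
For readers wondering what "call a Python script to do it yourself" could look like, here is a minimal sketch that assembles the two usual llama.cpp steps: convert the Hugging Face checkpoint to an f16 GGUF, then quantize it. It assumes a local llama.cpp checkout that provides `convert_hf_to_gguf.py` and a built `llama-quantize` binary (older checkouts, including those from the time of this thread, named them `convert.py` and `quantize`), plus a local copy of the model weights. The function name, directory arguments, default binary location, and output file names are hypothetical, and nothing in the thread says this is how Nan-Do or Rybens produced their files.

```python
from pathlib import Path


def gguf_commands(model_dir, llama_cpp_dir, quant_type="Q4_K_M",
                  convert_script="convert_hf_to_gguf.py",
                  quantize_bin="build/bin/llama-quantize"):
    """Return [convert_cmd, quantize_cmd] as argument lists, without running them.

    convert_script and quantize_bin are paths relative to llama_cpp_dir;
    the defaults are assumptions about a recent CMake build of llama.cpp.
    """
    llama_cpp = Path(llama_cpp_dir)
    model = Path(model_dir)
    f16_path = model / "model-f16.gguf"              # hypothetical output name
    quant_path = model / f"model-{quant_type}.gguf"  # hypothetical output name

    # Step 1: convert the Hugging Face checkpoint to an unquantized f16 GGUF.
    convert = [
        "python3", str(llama_cpp / convert_script),
        str(model),
        "--outfile", str(f16_path),
        "--outtype", "f16",
    ]
    # Step 2: quantize the f16 GGUF down to the requested type.
    quantize = [
        str(llama_cpp / quantize_bin),
        str(f16_path),
        str(quant_path),
        quant_type,
    ]
    return [convert, quantize]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order. Loading the result and generating a few tokens before uploading is a cheap way to catch the kind of broken upload reported in the thread.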
