Yeap

by deleted - opened Mar 29, 2024

Discussion

deleted

Mar 29, 2024

I don't blame the leaderboard or HF because there's really nothing within reason that can be done about it.

But the flood of Mistrals scoring over 67 on the leaderboard (around the upper limit of Mistral), and up to 77, is absurd (that's far higher than Mixtrals, which are far more powerful). And reading their model cards bragging about their scores is annoying. Most aren't deliberately cheating. They're just so excessively merged and fine-tuned, sometimes on a dataset with over a million entries, that all of Mistral's fringe data has been scrambled.

For example, when I ask questions about 4 names from popular movies and TV shows that Mistral base and reasonable fine-tunes get right, or mostly right, the high-scoring Mistrals reliably get them wrong. This even includes OpenChat and Starling. Fine-tuning on tons of user feedback might help you climb on chat arenas, but it leaves Mistral an empty shell that can no longer solve simple logic problems or answer questions along Mistral's fringes, such as character names in shows and movies, identifying songs from lyrics and so on.
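A rough sketch of how a fringe-knowledge check like this could be scripted, in case anyone wants to repeat it across several models. Everything here is an assumption for illustration: the `fringe_accuracy` helper, the placeholder probes, and the `stub_model` stand-in are hypothetical, and substring matching is only a crude proxy for judging an answer by hand. Swap `stub_model` for a real call to whichever model you are testing, and fill in your own questions.

```python
from typing import Callable, Dict


def fringe_accuracy(ask: Callable[[str], str], probes: Dict[str, str]) -> float:
    """Return the fraction of probes whose expected answer appears in the reply.

    `ask` is any function that takes a question and returns the model's text.
    Matching is a case-insensitive substring check, which is deliberately simple.
    """
    if not probes:
        raise ValueError("probes must not be empty")
    hits = sum(
        expected.lower() in ask(question).lower()
        for question, expected in probes.items()
    )
    return hits / len(probes)


# Placeholder probes (hypothetical): replace with real fringe questions,
# e.g. character names from shows, and the answers you expect.
PROBES = {
    "Placeholder question 1": "answer one",
    "Placeholder question 2": "answer two",
}


def stub_model(question: str) -> str:
    """Stand-in for a real model call; knows only the first placeholder."""
    if question.endswith("1"):
        return "I think it is Answer One."
    return "I do not know."


if __name__ == "__main__":
    print(f"fringe accuracy: {fringe_accuracy(stub_model, PROBES):.2f}")
```

Running the same probe set against the base model and a heavily merged fine-tune would give two numbers to compare, which is the comparison described above.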

Fine-tuning is meant to guide a foundational model in the right direction, not take over. And the base Chinese models are all cheating (e.g. Yi-34b doesn't have anywhere near an MMLU of 77; based on my fringe knowledge questions, its true MMLU score is around 68-70).

mirek190

Apr 4, 2024

yep ... I AGREE

altomek

Apr 8, 2024

> Fine-tuning on tons of user feedback might help you climb on chat arenas, but it leaves Mistral an empty shell that can no longer solve simple logic problems or answer questions along Mistral's fringes, such as character names in shows and movies, identifying songs from lyrics and so on.

I always value your observations, and I share this one. They are made so smart that you don't know they are cheating you most of the time. But there is more damage. When used to summarize longer contexts, they often make the same error. They interpret something like 20% of the text properly, then insert something wrong, one sentence or a changed fact, and continue the summarization. This leads to two problems. The user will get wrong information, and/or, because the sentence logic has changed in 20% of the text, reading it distorts your interpretation of what you read and makes it difficult to remember anything, because the summarization has flawed logic. Using these chat-tuned things for a long time may not be good for your memory.

And this drug-dealer personality! Do you want more... Are you sure you want more...

This is ugly.
