Yep.
I don't blame the leaderboard or HF because there's really nothing within reason that can be done about it.
But the flood of Mistrals scoring over 67 on the leaderboard (roughly the upper limit for Mistral), and up to 77, is absurd; that's far higher than the Mixtrals, which are far more powerful. And reading their model cards bragging about their scores is annoying. Most aren't deliberately cheating. They're just so excessively merged and fine-tuned, sometimes on a dataset with over a million entries, that all of Mistral's fringe data has been scrambled.
For example, when I ask about four names from popular movies and TV shows that the Mistral base model and reasonable fine-tunes get right, or mostly right, the high-scoring Mistrals reliably get them wrong. This even includes OpenChat and Starling. Fine-tuning on tons of user feedback might help you climb on chat arenas, but it leaves Mistral an empty shell that can no longer solve simple logic problems or answer questions along Mistral's fringes, such as character names in shows and movies, identifying songs from lyrics and so on.
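The probe described above (asking the base model, reasonable fine-tunes, and the high-scoring merges the same fringe questions and checking which get them right or mostly right) can be sketched roughly as follows. This is a hypothetical harness, not the poster's actual setup: the helper names `score_answer` and `probe`, the substring-based partial credit, and the placeholder questions and stub answers are all assumptions for illustration.

```python
# Hypothetical fringe-knowledge probe. The `ask` callables are stand-ins for
# whatever inference call you actually use (pipeline, vLLM server, etc.).
from typing import Callable, Dict, List, Tuple

# (prompt, names a correct answer should contain); placeholders only.
Question = Tuple[str, List[str]]


def score_answer(answer: str, expected: List[str]) -> float:
    """Fraction of expected names found in the answer (case-insensitive).

    1.0 = right, between 0 and 1 = "mostly right". Plain substring matching
    is a crude assumption: it will also match names inside longer words.
    """
    if not expected:
        raise ValueError("each question needs at least one expected name")
    text = answer.lower()
    hits = sum(1 for name in expected if name.lower() in text)
    return hits / len(expected)


def probe(models: Dict[str, Callable[[str], str]],
          questions: List[Question]) -> Dict[str, float]:
    """Average score of each model over the same question set."""
    if not questions:
        raise ValueError("need at least one question")
    results: Dict[str, float] = {}
    for model_name, ask in models.items():
        scores = [score_answer(ask(prompt), expected)
                  for prompt, expected in questions]
        results[model_name] = sum(scores) / len(scores)
    return results


if __name__ == "__main__":
    # Toy stubs with made-up answers, just to show the mechanics.
    questions = [("Q1", ["Alice", "Bob"]), ("Q2", ["Carol"])]
    base = {"Q1": "Alice and Bob", "Q2": "It was Carol."}
    tuned = {"Q1": "Alice and Dave", "Q2": "Eve"}
    print(probe({"base": base.get, "tuned": tuned.get}, questions))
    # → {'base': 1.0, 'tuned': 0.25}
```

A consistent gap between the base model and a high-scoring merge on questions the base answers correctly is the kind of signal the post describes; with only a handful of questions, treat it as a hint rather than a measurement.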
Fine-tuning is meant to guide a foundational model in the right direction, not take over. And the base Chinese models are all cheating (e.g., Yi-34B doesn't have anywhere near an MMLU of 77; based on my fringe-knowledge questions, its true MMLU score is around 68-70).
yep ... I AGREE
> Fine-tuning on tons of user feedback might help you climb on chat arenas, but it leaves Mistral an empty shell that can no longer solve simple logic problems or answer questions along Mistral's fringes, such as character names in shows and movies, identifying songs from lyrics and so on.
I always value your observations, and I share this one! They are made so smart that most of the time you don't notice they are cheating you. But there is more damage. When used to summarize longer contexts, they often make the same kind of error: they interpret something like 20% of the text properly, then insert something wrong (an extra sentence or a changed fact) and carry on with the summary. This leads to two problems. The user gets wrong information, and/or, because the logic of about 20% of the text has been altered, reading that summary distorts your interpretation of what you read and makes it difficult to remember anything, since the summary's logic is flawed. Using these chat-tuned models for long may not be good for your memory.
And then there's this drug-dealer personality! "Do you want more...? Are you sure you want more...?"
This is ugly.