How to use from
vLLM
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lifeofcoding/mastermax-llama-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lifeofcoding/mastermax-llama-7b",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
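The same request can also be sent from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on localhost:8000, and the helper names `build_completion_request` and `complete` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "lifeofcoding/mastermax-llama-7b"
# Assumption: the server started by `vllm serve` is reachable at this address.
API_URL = "http://localhost:8000/v1/completions"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=API_URL):
    """Build the same POST request as the curl example above."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible completion responses list results under "choices".
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string.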
Use Docker
docker model run hf.co/lifeofcoding/mastermax-llama-7b

Mastermax Llama 7B

This is a Llama 2 7B base model that was fine-tuned on additional datasets in an attempt to improve performance.

How to use with the Hugging Face pipeline

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the model in 4-bit precision (requires the bitsandbytes package)
model = AutoModelForCausalLM.from_pretrained(
          "lifeofcoding/mastermax-llama-7b",
          load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("lifeofcoding/mastermax-llama-7b", trust_remote_code=True)

# Generate text using the pipeline
pipe = pipeline(task="text-generation",
                model=model,
                tokenizer=tokenizer,
                max_length=200)

# Example instruction; replace with your own
prompt = "Tell me a short story about a dragon."
result = pipe(f"<s>[INST] {prompt} [/INST]")
generated_text = result[0]['generated_text']
print(generated_text)
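By default, the text-generation pipeline returns the prompt followed by the continuation in `generated_text`. The sketch below shows one way to wrap an instruction in the `[INST]` template used above and to separate the model's reply from the echoed prompt. `build_prompt` and `extract_reply` are hypothetical helpers written for illustration; they are not part of transformers.

```python
def build_prompt(instruction):
    """Wrap an instruction in the [INST] template used in the example above."""
    return f"<s>[INST] {instruction.strip()} [/INST]"


def extract_reply(generated_text, prompt):
    """Return only the text generated after the prompt.

    Falls back to splitting on the closing [/INST] tag when the prompt is
    not echoed back verbatim.
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fallback: keep everything after the last closing instruction tag.
    return generated_text.rpartition("[/INST]")[2].strip()
```

For example, `extract_reply(result[0]['generated_text'], full_prompt)`, where `full_prompt` is the exact string passed to `pipe`. Passing `return_full_text=False` when calling the pipeline is an alternative that should omit the prompt from the output.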
