Text Generation
Transformers
PyTorch
longllama
code
text-generation-inference
custom_code
Eval Results (legacy)
Instructions to use syzymon/long_llama_code_7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use syzymon/long_llama_code_7b with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="syzymon/long_llama_code_7b", trust_remote_code=True)

    # Load model directly
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("syzymon/long_llama_code_7b", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use syzymon/long_llama_code_7b with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "syzymon/long_llama_code_7b"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "syzymon/long_llama_code_7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker
docker model run hf.co/syzymon/long_llama_code_7b
- SGLang
How to use syzymon/long_llama_code_7b with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "syzymon/long_llama_code_7b" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "syzymon/long_llama_code_7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "syzymon/long_llama_code_7b" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "syzymon/long_llama_code_7b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
      }'
- Docker Model Runner
How to use syzymon/long_llama_code_7b with Docker Model Runner:
docker model run hf.co/syzymon/long_llama_code_7b
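The curl calls in the vLLM and SGLang sections above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_completion_request` and `complete` are hypothetical and not part of any library. The ports (8000 for vLLM, 30000 for SGLang) and the request fields are taken from the commands above. The response shape (`choices[0].text`) is an assumption based on the OpenAI-style completions format, and the sketch assumes one of those servers is already running and actually serves this model.

```python
import json
import urllib.request

MODEL_ID = "syzymon/long_llama_code_7b"


def build_completion_request(prompt, port=8000, max_tokens=512,
                             temperature=0.5, host="localhost"):
    """Build (but do not send) a POST request for /v1/completions.

    Hypothetical helper: mirrors the JSON body of the curl examples.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response, {"choices": [{"text": ...}, ...]}
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,", port=8000)` would target the vLLM endpoint and `port=30000` the SGLang one. Building the request separately from sending it keeps the payload easy to inspect without a live server.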
Commit History
Update README.md 0f3a950
Update README.md 3dea16c
Update README.md 81abcb1
Update README.md 35678d3
Update README.md bbf5464
Update README.md ae971cd
Update README.md c682b6d
Update README.md a9f5b43
Update README.md 07822d3
Update README.md d7488ec
Update README.md c879d23
Update README.md 7e37e56
Update README.md 239069a
Update README.md bea3280
Update README.md 76ac8b7
Update README.md 049a26e
Update config.json 509943e
Update README.md 0552cd3
init release f36dfc1
Szymon Tworkowski committed on