Update README.md
README.md
@@ -106,7 +106,7 @@ This repository contains the research preview of **LongLLaMA, a large language m
 
 LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
 
-LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning
+LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning**.
 
 <p align="center" width="100%">
 <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
@@ -129,8 +129,8 @@ with three layers used for context extension. **Crucially, LongLLaMA is able to
 |----------------|----------|----------|-----------|
 | Source model | [OpenLLaMA-3B](https://huggingface.co/openlm-research/open_llama_3b_easylm) | [OpenLLaMA-3Bv2](https://huggingface.co/openlm-research/open_llama_3b_v2_easylm) | [CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) |
 | Source model tokens | 1T | 1 T | 2T + 0.5 T |
-| Fine-tuning context | 8K | 32K | 32K |
-| Fine-tuning tokens | 10B | 5B | 35B |
+| Fine-tuning context | 8K | **32K** | **32K** |
+| Fine-tuning tokens | 10B | 5B | **35B** |
 | Memory layers | 6, 12, 18 | 6, 12, 18 | 8, 16, 24 |
 
 </div>