---
language:
- en
tags:
- transformers
- trl
- grpo
- peft
- lora
- python
- code-generation
pipeline_tag: text-generation
base_model: summerMC/matutake
library_name: transformers
---

# ume

`ume` is a GRPO fine-tuned derivative of [`summerMC/matutake`](https://huggingface.co/summerMC/matutake), trained with LoRA on Python code-generation tasks and merged back into the base model for standalone inference.

## Model Summary

* **Model name:** `summerMC/ume`
* **Base model:** `summerMC/matutake`
* **Training method:** GRPO (Group Relative Policy Optimization)
* **Parameter-efficient tuning:** LoRA
* **Training dataset:** `Hoglet-33/python-coding-dataset`
* **Final artifact:** merged checkpoint for direct inference

The fine-tuning is intended to improve Python code generation by optimizing lightweight reward functions that favor syntactically valid, code-like outputs.

---

## Training Details

### Base model

* `summerMC/matutake`

### Dataset

* `Hoglet-33/python-coding-dataset`

### Fine-tuning method

* **Trainer:** TRL `GRPOTrainer`
* **Adapter method:** LoRA
* **Final export:** LoRA weights merged into the base model

### Reward functions

Training used simple heuristic reward functions:

#### 1) Syntax reward

Rewards outputs that can be parsed as valid Python:

* `1.0` if `ast.parse(output)` succeeds
* `0.0` otherwise
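
The rule above fits in a few lines of standard-library Python. This is a minimal sketch, not the training script itself: the function name `syntax_reward` and the decision to also catch `ValueError` are assumptions.

```python
import ast


def syntax_reward(output: str) -> float:
    """Return 1.0 if `output` parses as Python source, else 0.0."""
    try:
        ast.parse(output)
    except (SyntaxError, ValueError):
        # SyntaxError covers malformed code; ValueError is raised by some
        # Python versions for source that contains null bytes.
        return 0.0
    return 1.0
```

Note that an empty string, or a single bare word such as `hello`, also parses successfully, so this reward alone cannot distinguish real code from trivial output.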

#### 2) Code-shape reward

Rewards outputs that look like actual Python code. An output scores higher when it:

* contains no Markdown code fences
* contains Python-like tokens such as `def`, `import`, `return`, `class`
* is non-trivially long
* is not extremely long

These rewards are intentionally lightweight and should be treated as a baseline GRPO setup rather than a production-grade evaluation system.
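
One way to turn the four shape criteria into a score is sketched here. The card does not state the actual thresholds or weighting, so `MIN_CHARS`, `MAX_CHARS`, the equal quarter-point weighting, and the name `code_shape_reward` are all hypothetical.

```python
import re

CODE_TOKENS = ("def", "import", "return", "class")  # tokens named in the card
MIN_CHARS = 20    # hypothetical lower bound for "non-trivially long"
MAX_CHARS = 4000  # hypothetical upper bound for "extremely long"


def code_shape_reward(output: str) -> float:
    """Score in [0, 1]: one quarter per satisfied shape criterion."""
    text = output.strip()
    checks = [
        "```" not in text,  # no Markdown code fences
        any(re.search(rf"\b{tok}\b", text) for tok in CODE_TOKENS),
        len(text) >= MIN_CHARS,  # not trivially short
        len(text) <= MAX_CHARS,  # not extremely long
    ]
    return sum(checks) / len(checks)
```

Under these assumed thresholds, a short function definition scores `1.0`, while a fenced snippet without any of the listed tokens scores `0.5`.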

---

## Prompt Format

The training data was converted into a chat-style coding prompt like this:

```python
[
    {
        "role": "user",
        "content": (
            "Write correct Python code for the following task.\n"
            "Return only Python code. Do not use markdown.\n\n"
            "<task text>"
        ),
    }
]
```

For best results, prompt the model with a direct coding task and explicitly request **code only**.
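
To reuse the same wording at inference time, the template can be wrapped in a small helper. This is a sketch: `build_messages` is a hypothetical name, and stripping surrounding whitespace from the task text is an assumption rather than something the card specifies.

```python
def build_messages(task_text: str) -> list[dict[str, str]]:
    """Wrap a task description in the chat prompt used for training."""
    content = (
        "Write correct Python code for the following task.\n"
        "Return only Python code. Do not use markdown.\n\n"
        f"{task_text.strip()}"
    )
    return [{"role": "user", "content": content}]
```

The returned list can be passed directly to `tokenizer.apply_chat_template`.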

---

## Usage

### Transformers

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "summerMC/ume"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that computes fibonacci numbers with memoization.",
    }
]

# return_dict=True yields a mapping with "input_ids" and "attention_mask",
# which both the **inputs unpacking and the slicing further down rely on.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
```

---

## Example Prompt

### Input

```text
Write a Python function that returns the longest common prefix of a list of strings.
Return only Python code.
```

### Expected output style

```python
def longest_common_prefix(strs):
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix
```

---

## Training Configuration

The model was trained with a setup similar to the following:

* **LoRA rank (`r`)**: 16
* **LoRA alpha**: 32
* **LoRA dropout**: 0.05
* **Learning rate**: 5e-6
* **Batch size**: 1
* **Gradient accumulation**: 8
* **Generation batch size**: 2
* **Number of generations**: 2
* **Epochs**: 1

### LoRA target modules

```python
[
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]
```
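
The listed values might map onto `peft.LoraConfig` and `trl.GRPOConfig` roughly as follows. This is a sketch under assumptions: the card does not give the exact argument names, so reading "Batch size" as `per_device_train_batch_size`, the `task_type` value, and the `output_dir` default are guesses, and `generation_batch_size` is only available in recent TRL releases.

```python
# Hyperparameters copied from the card; the keyword names are assumptions.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

GRPO_KWARGS = {
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 1,  # assumed meaning of "Batch size"
    "gradient_accumulation_steps": 8,
    "generation_batch_size": 2,
    "num_generations": 2,
    "num_train_epochs": 1,
}


def build_configs(output_dir: str = "ume-grpo-lora"):
    """Build (LoraConfig, GRPOConfig) from the dictionaries above.

    Imports are deferred so the constants can be inspected without
    peft or trl installed. `output_dir` is a hypothetical path.
    """
    from peft import LoraConfig
    from trl import GRPOConfig

    lora_config = LoraConfig(**LORA_KWARGS)
    grpo_config = GRPOConfig(output_dir=output_dir, **GRPO_KWARGS)
    return lora_config, grpo_config
```

Depending on the TRL version and hardware, some arguments may need adjusting, for example the precision flags on machines without bf16 support.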

---

## Limitations

* Training rewards are heuristic and do **not** verify functional correctness with unit tests.
* The model may still produce syntactically valid but logically incorrect code.
* Outputs may include hallucinated APIs, inefficient solutions, or incomplete implementations.
* Performance depends heavily on the capabilities and constraints of the base model `summerMC/matutake`.
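
Because the rewards never execute the generated code, a functional check has to be added by the user. One possible shape for such a check is sketched here: `passes_tests` is a hypothetical helper, it is not part of the training setup, and it uses `exec` without a timeout, so it should only be run inside a sandbox when the source is untrusted.

```python
def passes_tests(source: str, func_name: str, cases) -> bool:
    """Execute `source`, then check func_name(*args) == expected per case.

    `cases` is an iterable of (args_tuple, expected_value) pairs.
    """
    namespace = {}
    try:
        # WARNING: this runs arbitrary code; isolate it in real use.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failure.
        return False


candidate = '''
def longest_common_prefix(strs):
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix
'''

cases = [
    ((["flower", "flow", "flight"],), "fl"),
    ((["dog", "car"],), ""),
    (([],), ""),
]
print(passes_tests(candidate, "longest_common_prefix", cases))
```

A candidate that simply returns `strs[0]` parses cleanly and would earn the full syntax reward, yet it fails the first case here.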

---

## Intended Use

`summerMC/ume` is intended for:

* Python code generation experiments
* GRPO / RLHF-style fine-tuning experiments
* LoRA + merge workflows
* lightweight coding assistant prototyping
* research and hobbyist use

It is **not** validated for:

* production-critical software generation
* security-sensitive code
* safety-critical systems
* correctness-sensitive automated coding pipelines without external verification

---

## Reproducibility

The training pipeline used:

* `transformers`
* `datasets`
* `trl`
* `peft`
* `torch`

A simplified training flow:

1. Load `summerMC/matutake`
2. Convert the dataset into chat prompts
3. Train with `GRPOTrainer` using LoRA adapters
4. Save the LoRA adapter
5. Merge adapter weights back into the base model
6. Save the merged model as `summerMC/ume`
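
The six steps could be wired together roughly as in the sketch here. It is not the original training script: the dataset column name `TASK_COLUMN`, the `train` split, the output directories, and the reduced config values are hypothetical, and only the syntax reward is included to keep the example short.

```python
import ast

BASE_MODEL = "summerMC/matutake"
DATASET = "Hoglet-33/python-coding-dataset"
TASK_COLUMN = "instruction"  # hypothetical column name; check the dataset schema
PROMPT_HEADER = (
    "Write correct Python code for the following task.\n"
    "Return only Python code. Do not use markdown.\n\n"
)


def syntax_reward(completions, **kwargs):
    """TRL-style reward: one float per conversational completion."""
    rewards = []
    for completion in completions:
        text = completion[0]["content"]
        try:
            ast.parse(text)
            rewards.append(1.0)
        except (SyntaxError, ValueError):
            rewards.append(0.0)
    return rewards


def train_and_merge(adapter_dir="ume-lora", merged_dir="ume-merged"):
    # Heavy imports are deferred so the reward function can be tested alone.
    from datasets import load_dataset
    from peft import AutoPeftModelForCausalLM, LoraConfig
    from transformers import AutoTokenizer
    from trl import GRPOConfig, GRPOTrainer

    # Steps 1-2: load the data and convert each task into a chat prompt.
    dataset = load_dataset(DATASET, split="train")  # split name is assumed
    dataset = dataset.map(
        lambda row: {
            "prompt": [
                {"role": "user", "content": PROMPT_HEADER + row[TASK_COLUMN]}
            ]
        }
    )

    # Step 3: GRPO training with LoRA adapters on top of the base model.
    trainer = GRPOTrainer(
        model=BASE_MODEL,
        reward_funcs=[syntax_reward],
        args=GRPOConfig(output_dir=adapter_dir, num_generations=2),
        train_dataset=dataset,
        peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    )
    trainer.train()

    # Step 4: save the LoRA adapter.
    trainer.save_model(adapter_dir)

    # Steps 5-6: reload the adapter, merge it into the base weights,
    # and save a standalone checkpoint together with the tokenizer.
    merged = AutoPeftModelForCausalLM.from_pretrained(adapter_dir).merge_and_unload()
    merged.save_pretrained(merged_dir)
    AutoTokenizer.from_pretrained(BASE_MODEL).save_pretrained(merged_dir)
```

This sketch saves the merged checkpoint to a local directory; publishing it under a Hub repository name would be a separate upload step.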

---

## Base Model and Dataset Attribution

### Base model

* [`summerMC/matutake`](https://huggingface.co/summerMC/matutake)

### Dataset

* [`Hoglet-33/python-coding-dataset`](https://huggingface.co/datasets/Hoglet-33/python-coding-dataset)

---

## License

Please follow the licenses and usage terms of:

1. the original base model `summerMC/matutake`
2. the training dataset `Hoglet-33/python-coding-dataset`

If you redistribute or publish derivative checkpoints, confirm that your use is compatible with both upstream licenses.

---

## Citation

If you use this model in a project or experiment, please cite the upstream base model and dataset.