	---
	language:
	- en
	tags:
	- transformers
	- trl
	- grpo
	- peft
	- lora
	- python
	- code-generation
	pipeline_tag: text-generation
	base_model: summerMC/matutake
	library_name: transformers
	---


	# ume

	`ume` is a GRPO fine-tuned derivative of [`summerMC/matutake`](https://huggingface.co/summerMC/matutake), trained with LoRA on Python code-generation tasks and merged back into the base model for standalone inference.

	## Model Summary

	* Model name: `summerMC/ume`
	* Base model: `summerMC/matutake`
	* Training method: GRPO (Group Relative Policy Optimization)
	* Parameter-efficient tuning: LoRA
	* Training dataset: `Hoglet-33/python-coding-dataset`
	* Final artifact: merged checkpoint for direct inference

	This model is intended to improve Python code generation behavior using lightweight reward functions that favor syntactically valid, code-like outputs.

	---

	## Training Details

	### Base model

	* `summerMC/matutake`

	### Dataset

	* `Hoglet-33/python-coding-dataset`

	### Fine-tuning method

	* Trainer: TRL `GRPOTrainer`
	* Adapter method: LoRA
	* Final export: merged LoRA weights into the base model

	### Reward functions

	Training used simple heuristic reward functions:

	#### 1) Syntax reward

	Rewards outputs that can be parsed as valid Python:

	* `1.0` if `ast.parse(output)` succeeds
	* `0.0` otherwise
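
A minimal sketch of such a syntax reward is shown here. It assumes the TRL convention that a reward function receives a batch of completions and returns one float per completion. The name `syntax_reward` and the plain-string completion format are assumptions, not the published training code.

```python
import ast


def syntax_reward(completions, **kwargs):
    """Return 1.0 for each completion that parses as Python, else 0.0.

    Assumes each completion is a plain string (hypothetical format).
    """
    rewards = []
    for text in completions:
        try:
            ast.parse(text)
            rewards.append(1.0)
        except (SyntaxError, ValueError):
            rewards.append(0.0)
    return rewards


print(syntax_reward(["def f(x):\n    return x + 1\n", "def f(x) return x"]))
# prints [1.0, 0.0]
```

Note that an empty string also parses successfully, which is one reason a syntax check alone is a weak signal.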

	#### 2) Code-shape reward

	Rewards outputs that look more like actual Python code:

	* contains no Markdown code fences
	* contains Python-like tokens such as `def`, `import`, `return`, or `class`
	* is not trivially short
	* is not extremely long
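
One way these four checks could be combined is sketched below. The equal weighting, the length thresholds `MIN_CHARS` and `MAX_CHARS`, and the function name are all hypothetical, since the README does not state the actual values.

```python
# Keywords the README lists as Python-like tokens.
PYTHON_TOKENS = ("def", "import", "return", "class")

# Hypothetical thresholds: the README does not state the actual values.
MIN_CHARS = 20
MAX_CHARS = 4000


def code_shape_reward(completions, **kwargs):
    """Score each completion in [0.0, 1.0] using four equally weighted checks."""
    rewards = []
    for text in completions:
        checks = [
            "```" not in text,                          # no Markdown fences
            any(tok in text for tok in PYTHON_TOKENS),  # Python-like tokens
            len(text.strip()) >= MIN_CHARS,             # not trivially short
            len(text) <= MAX_CHARS,                     # not extremely long
        ]
        rewards.append(sum(checks) / len(checks))
    return rewards


good = "def add(a, b):\n    return a + b\n"
bad = "```python\nx\n```"
print(code_shape_reward([good, bad]))
# prints [1.0, 0.25]
```

The token check here is a plain substring match, so it can fire on words that merely contain `def` or `class`; a stricter version might tokenize the text first.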

	These rewards are intentionally lightweight and should be treated as a baseline GRPO setup rather than a production-grade evaluation system.

	---

	## Prompt Format

	The training data was converted into a chat-style coding prompt like this:

	```python
	[
	    {
	        "role": "user",
	        "content": (
	            "Write correct Python code for the following task.\n"
	            "Return only Python code. Do not use markdown.\n\n"
	            "<task text>"
	        ),
	    }
	]
	```

	For best results, prompt the model with a direct coding task and explicitly request code only.
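
A small helper that wraps a task description in the same chat format might look like this. The function name `build_prompt` is hypothetical; the instruction text is copied from the template above.

```python
def build_prompt(task_text):
    """Wrap a task description in the chat-style coding prompt."""
    return [
        {
            "role": "user",
            "content": (
                "Write correct Python code for the following task.\n"
                "Return only Python code. Do not use markdown.\n\n"
                f"{task_text}"
            ),
        }
    ]


messages = build_prompt("Reverse a singly linked list.")
print(messages[0]["content"])
```

The returned list can be passed directly to `tokenizer.apply_chat_template`.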

	---

	## Usage

	### Transformers

	```python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM

	model_id = "summerMC/ume"

	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    # Use bfloat16 on GPU, fall back to float32 on CPU.
	    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
	    device_map="auto",
	    trust_remote_code=True,
	)

	messages = [
	    {
	        "role": "user",
	        "content": "Write a Python function that computes fibonacci numbers with memoization.",
	    }
	]

	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    tokenize=True,
	    return_dict=True,  # return a dict so **inputs and inputs["input_ids"] work
	    return_tensors="pt",
	).to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=256,
	    do_sample=True,
	    temperature=0.7,
	)

	# Decode only the newly generated tokens, skipping the prompt.
	response = tokenizer.decode(
	    outputs[0][inputs["input_ids"].shape[-1]:],
	    skip_special_tokens=True,
	)

	print(response)
	```

	---

	## Example Prompt

	### Input

	```text
	Write a Python function that returns the longest common prefix of a list of strings.
	Return only Python code.
	```

	### Expected output style

	```python
	def longest_common_prefix(strs):
	    if not strs:
	        return ""

	    prefix = strs[0]
	    for s in strs[1:]:
	        while not s.startswith(prefix):
	            prefix = prefix[:-1]
	            if not prefix:
	                return ""
	    return prefix
	```

	---

	## Training Configuration

	The model was trained with a setup similar to the following:

	* LoRA rank (`r`): 16
	* LoRA alpha: 32
	* LoRA dropout: 0.05
	* Learning rate: 5e-6
	* Batch size: 1
	* Gradient accumulation: 8
	* Generation batch size: 2
	* Number of generations: 2
	* Epochs: 1
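
The arithmetic implied by these settings can be worked through as follows. This sketch assumes single-device training and the standard LoRA scaling rule of alpha divided by rank; neither is confirmed by the README.

```python
# Values copied from the list above.
lora_r = 16
lora_alpha = 32
per_device_batch_size = 1
gradient_accumulation_steps = 8
generation_batch_size = 2
num_generations = 2

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

# Samples accumulated per optimizer step on a single device (assumed).
samples_per_optimizer_step = per_device_batch_size * gradient_accumulation_steps

# Each prompt is sampled num_generations times, so one generation batch
# covers this many distinct prompts.
prompts_per_generation_batch = generation_batch_size // num_generations

print(lora_scaling, samples_per_optimizer_step, prompts_per_generation_batch)
# prints 2.0 8 1
```

With only two generations per prompt, each GRPO advantage is computed from a group of two samples, which is about the smallest group the method can use.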

	### LoRA target modules

	```python
	[
	    "q_proj", "k_proj", "v_proj", "o_proj",
	    "gate_proj", "up_proj", "down_proj",
	]
	```
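
The LoRA values listed in this section map onto a `peft` `LoraConfig` roughly as follows. This is a sketch rather than the published training config: `task_type="CAUSAL_LM"` is an assumption based on the model's `text-generation` pipeline tag.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumed; not stated in the README
)

print(lora_config.r, lora_config.lora_alpha)
```

Targeting all attention and MLP projections adapts every linear layer in each transformer block, at the cost of more trainable parameters than attention-only LoRA.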

	---

	## Limitations

	* Training rewards are heuristic and do not verify functional correctness with unit tests.
	* The model may still produce syntactically valid but logically incorrect code.
	* Outputs may include hallucinated APIs, inefficient solutions, or incomplete implementations.
	* Performance depends heavily on the capabilities and constraints of the base model `summerMC/matutake`.
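
Because the rewards never execute the generated code, an external check is needed to catch logic errors. A minimal sketch of such a check is given below; the helper name `passes_tests` and the test-case format are hypothetical. It uses `exec`, which runs arbitrary code, so it should only be used on trusted input or inside a sandbox.

```python
def passes_tests(source, func_name, cases):
    """Return True if `func_name` defined in `source` matches every case.

    `cases` is a list of (args_tuple, expected_result) pairs.
    WARNING: exec() runs arbitrary code; use a sandbox for untrusted input.
    """
    namespace = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions, and runtime errors all count as failures.
        return False


generated = "def add(a, b):\n    return a - b\n"  # valid syntax, wrong logic
print(passes_tests(generated, "add", [((1, 2), 3)]))
# prints False
```

The example output parses as valid Python and would earn the full syntax reward, yet it fails the functional check.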

	---

	## Intended Use

	`summerMC/ume` is intended for:

	* Python code generation experiments
	* GRPO / RLHF-style fine-tuning experiments
	* LoRA + merge workflows
	* lightweight coding assistant prototyping
	* research and hobbyist use

	It is not validated for:

	* production-critical software generation
	* security-sensitive code
	* safety-critical systems
	* correctness-sensitive automated coding pipelines without external verification

	---

	## Reproducibility

	The training pipeline used:

	* `transformers`
	* `datasets`
	* `trl`
	* `peft`
	* `torch`

	A simplified training flow:

	1. Load `summerMC/matutake`
	2. Convert the dataset into chat prompts
	3. Train with `GRPOTrainer` using LoRA adapters
	4. Save the LoRA adapter
	5. Merge adapter weights back into the base model
	6. Save the merged model as `summerMC/ume`
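
Steps 5 and 6 can be sketched with `peft` as shown here. The function name, the `trust_remote_code` parameter, and the example paths are hypothetical; the README does not publish the actual merge script.

```python
def merge_lora_adapter(base_id, adapter_dir, output_dir, trust_remote_code=False):
    """Merge a saved LoRA adapter into its base model and save the result."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id, trust_remote_code=trust_remote_code
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights

    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(
        base_id, trust_remote_code=trust_remote_code
    ).save_pretrained(output_dir)
    return output_dir


# Hypothetical paths; the README does not say where the adapter was saved.
# merge_lora_adapter("summerMC/matutake", "./ume-lora-adapter", "./ume-merged")
```

After merging, the checkpoint loads with plain `transformers` and no longer requires `peft` at inference time.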

	---

	## Base Model and Dataset Attribution

	### Base model

	* [`summerMC/matutake`](https://huggingface.co/summerMC/matutake)

	### Dataset

	* [`Hoglet-33/python-coding-dataset`](https://huggingface.co/datasets/Hoglet-33/python-coding-dataset)

	---

	## License

	Please follow the licenses and usage terms of:

	1. the original base model `summerMC/matutake`
	2. the training dataset `Hoglet-33/python-coding-dataset`

	If you redistribute or publish derivative checkpoints, confirm that your use is compatible with both upstream licenses.

	---

	## Citation

	If you use this model in a project or experiment, please cite the upstream base model and dataset.