0399 Team Victory Full-Parameter SFT Checkpoint
This repository contains intermediate/full-parameter SFT checkpoints from LLM-jp experiment 0399, using Team Victory valid-clean math reasoning data.
Model Variants
Two model repositories were trained under the same data and training recipe, differing only in initialization:
| Repo | Initialization | W&B Run |
|---|---|---|
| `argo11/0399-tv-full-thinking-fp` | `llm-jp/llm-jp-4-8b-thinking` | `0399_tv-full-thinking_fp` |
| `argo11/0399-tv-full-base-fp` | `llm-jp/llm-jp-4-8b-base` | `0399_tv-full-base_fp` |
Current uploaded checkpoint:
- `checkpoint-500/`
- The upload excludes large optimizer/FSDP duplicate states from Hub.
- Local full training state, including optimizer state, is retained under the ABCI experiment directory.
Experiment Links
- GitHub issue: llm-jp/experiments#399
- W&B project: argo-lab/llmjp4-8b-teamvictory-sft-difficulty-20260629
- Thinking W&B run: 0399_tv-full-thinking_fp
- Base W&B run: 0399_tv-full-base_fp
- Tokenized dataset: argo11/0399-tv-valid-clean-sft-tokenized-llmjp4-8b
Intended Use
These checkpoints are intended for research on Japanese/English mathematical reasoning and reinforcement-learning initialization.
Primary intended uses:
- Compare `thinking` initialization vs `base` initialization after identical Team Victory SFT.
- Use as candidate initial checkpoints for later RL / GRPO-style math reasoning experiments.
- Evaluate whether Team Victory SFT improves AIME-style pass@1 while preserving broader ability.
Not intended uses:
- Production deployment without additional evaluation.
- Safety-critical mathematical, financial, legal, medical, or educational grading decisions.
- Claims of benchmark superiority before full evaluation is complete.
Training Data
Source dataset:
Filtered dataset:
- `TV_valid_clean`
  - Rows: `5,461,079`
  - Clean parquet SHA-256: `b1fbc4d5c05dbacbf9366055200a034551a46d70bfb2bff4c6432f2175a10d9b`
  - Filter rule:
    - `is_valid == 1`
    - contamination quarantine applied against target benchmark problems
- Tokenized reusable dataset: `argo11/0399-tv-valid-clean-sft-tokenized-llmjp4-8b`
  - Rows: `5,461,079`
  - Shards: `110`
  - Max length: `4096`
  - Columns: `input_ids`, `attention_mask`, `labels`, `prompt_len`, `seq_len`
The tokenized dataset was prepared on CPU and uploaded to Hugging Face to avoid repeated expensive tokenization on GPU nodes.
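The SHA-256 listed above can be used to confirm that a local copy of the clean parquet file is intact. The following is a minimal sketch using only the Python standard library; the file name in the usage note is hypothetical, since the card does not state the parquet file's name or location.

```python
import hashlib
from pathlib import Path

# Digest copied from the "Clean parquet SHA-256" entry in this card.
EXPECTED_SHA256 = "b1fbc4d5c05dbacbf9366055200a034551a46d70bfb2bff4c6432f2175a10d9b"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("TV_valid_clean.parquet")` (a hypothetical file name) should return `True` only for a byte-identical copy of the filtered dataset.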
Training Procedure
Training mode:
- Full-parameter SFT
- No LoRA / PEFT
- Hugging Face `Trainer`
- FSDP: `full_shard auto_wrap`
- Only assistant response tokens are trained.
- Prompt tokens are masked with `-100`.
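The response-only training described above can be illustrated with a small sketch of how a `labels` column might be derived from `input_ids` and `prompt_len`. This is an assumption about the preprocessing, not the experiment's actual tokenization script; the helper name `mask_prompt_tokens` is hypothetical.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def mask_prompt_tokens(input_ids, prompt_len):
    """Copy input_ids into labels, replacing the prompt prefix with IGNORE_INDEX."""
    if not 0 <= prompt_len <= len(input_ids):
        raise ValueError("prompt_len must lie between 0 and len(input_ids)")
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])


def build_row(input_ids, prompt_len):
    """Assemble one row with the column names listed for the tokenized dataset."""
    return {
        "input_ids": list(input_ids),
        "attention_mask": [1] * len(input_ids),
        "labels": mask_prompt_tokens(input_ids, prompt_len),
        "prompt_len": prompt_len,
        "seq_len": len(input_ids),
    }
```

With this layout, only the positions after `prompt_len` contribute to the loss, so the model is never trained to reproduce the prompt.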
Core hyperparameters:
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Max length | 4096 |
| Per-device train batch size | 1 |
| Gradient accumulation steps | 16 |
| Learning rate | 2.0e-5 |
| Warmup ratio | 0.03 |
| LR scheduler | cosine |
| Precision | bf16 |
| Optimizer | adamw_torch |
| Save steps | 500 |
| Save total limit | 3 |
| Seed | 20260629 |
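To make the schedule-related rows concrete, the sketch below computes a linear-warmup plus cosine-decay learning rate from the table values (peak `2.0e-5`, warmup ratio `0.03`). The total step count depends on the number of GPUs, which this card does not state, so the `world_size` argument is purely hypothetical, and the step estimate is an approximation rather than the value the run actually used.

```python
import math

PEAK_LR = 2.0e-5      # "Learning rate" row
WARMUP_RATIO = 0.03   # "Warmup ratio" row


def approx_total_steps(rows, per_device_batch, grad_accum, world_size):
    """Approximate optimizer steps for one epoch (world_size is an assumption)."""
    global_batch = per_device_batch * grad_accum * world_size
    return -(-rows // global_batch)  # integer ceiling division


def warmup_steps(total_steps, warmup_ratio=WARMUP_RATIO):
    """Number of warmup steps implied by the warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a step: linear warmup, then cosine decay to zero."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

For instance, `approx_total_steps(5_461_079, 1, 16, world_size=8)` estimates the step count for a hypothetical 8-GPU job; `checkpoint-500` would fall early in such a run, in line with the note in Limitations.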
Infrastructure:
- ABCI 3.0
- Group: `gcg51557`
- Reserved queue: `R9920261000`
- Experiment directory: `/groups/gcg51557/experiments/0399_tv_sft`
- SFT jobs:
  - Thinking: `2004401.pbs1`
  - Base: `2004402.pbs1`
- HF checkpoint sync job: `2004479.pbs1`
Monitoring Status
The initial production SFT run was monitored past checkpoint-500.
Observed stability:
| Run | Step observed | Memory plateau | Error status |
|---|---|---|---|
| Thinking | 600+ | ~`303GB/1.92TB` | No OOM/NCCL/Traceback observed |
| Base | 590+ | ~`294GB/1.92TB` | No OOM/NCCL/Traceback observed |
Notes:
- Both runs skipped GPU-side JSONL regeneration.
- Both runs used the uploaded tokenized dataset.
- W&B online logging was confirmed.
- `checkpoint-500` was written locally and uploaded to Hugging Face Hub.
- The base run showed `grad_norm=inf` in early logs while loss remained finite. This should be considered during downstream quality review.
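The non-finite gradient norms noted above can be located in a checkpoint's `trainer_state.json`, whose `log_history` field holds the per-step training logs written by the Hugging Face `Trainer`. The sketch below is one possible check, assuming each log entry carries `step`, `loss`, and `grad_norm` keys; entries without `grad_norm` (for example evaluation logs) are skipped.

```python
import json
import math


def find_nonfinite_grad_norms(log_history):
    """Return (step, loss) pairs for log entries whose grad_norm is inf or NaN."""
    flagged = []
    for entry in log_history:
        grad_norm = entry.get("grad_norm")
        if grad_norm is None:
            continue  # not a training log entry
        if not math.isfinite(float(grad_norm)):
            flagged.append((entry.get("step"), entry.get("loss")))
    return flagged


def scan_trainer_state(path):
    """Load a trainer_state.json file and report non-finite grad_norm entries."""
    with open(path, "r", encoding="utf-8") as handle:
        state = json.load(handle)
    return find_nonfinite_grad_norms(state.get("log_history", []))
```

Running `scan_trainer_state("checkpoint-500/trainer_state.json")` on a downloaded checkpoint would list the affected steps together with their loss values, which helps judge whether the spikes are confined to the earliest steps.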
Uploaded Checkpoint Contents
Each checkpoint-500/ directory on Hub contains:
- `model.safetensors`
- `config.json`
- `generation_config.json`
- tokenizer files
- `trainer_state.json`
- `training_args.bin`
- RNG states
- `scheduler.pt`
The following large local training-state files are intentionally not uploaded to Hub:
- `optimizer.bin`
- `pytorch_model_fsdp.bin`
They are retained in the ABCI experiment directory for local recovery/debugging.
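Because the weights and tokenizer files sit inside `checkpoint-500/`, loading with Transformers likely needs the `subfolder` argument of `from_pretrained`. The following is a sketch under that assumption; it has not been verified against the repository layout, and the helper names are hypothetical. The heavy imports are deferred into the functions so the module can be imported without a GPU stack installed.

```python
REPO_ID = "argo11/0399-tv-full-base-fp"
SUBFOLDER = "checkpoint-500"  # assumption: model files live in this Hub subfolder


def load_checkpoint(repo_id=REPO_ID, subfolder=SUBFOLDER):
    """Load tokenizer and model from a checkpoint subfolder of a Hub repo."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        subfolder=subfolder,
        torch_dtype=torch.bfloat16,  # matches the bf16 training precision
    )
    model.eval()
    return tokenizer, model


def generate_text(tokenizer, model, prompt, max_new_tokens=64):
    """Generate a continuation and return only the newly produced text."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A typical call would be `tokenizer, model = load_checkpoint()` followed by `generate_text(tokenizer, model, "What is 12 * 13?")`. Outputs should be treated as unvalidated research samples.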
Evaluation
Full benchmark evaluation is not yet included in this model card.
Planned gates for experiment 0399:
Primary math gates:
- AIME 2024
- AIME 2025
- AIME 2026
- MATH-500
Regression gates:
- LiveCodeBench
- IFEval
- MT-Bench
Final-candidate-only gates:
- GPQA Diamond
- BBH
- MMLU-Pro
Do not treat this checkpoint as validated until these evaluations are complete.
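Since the math gates are framed in terms of AIME-style pass@1, a short sketch of the standard unbiased pass@k estimator may help when interpreting future results. It assumes `n` sampled solutions per problem, of which `c` are correct; the card does not specify the sampling setup, so these parameters are illustrative only.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(per_problem_counts, k):
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    scores = [pass_at_k(n, c, k) for n, c in per_problem_counts]
    return sum(scores) / len(scores)
```

For `k = 1` the estimator reduces to the fraction of correct samples, `c / n`, averaged over problems.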
Limitations
- This is an intermediate/full-param SFT checkpoint from an active experiment.
- The checkpoint is optimized for math reasoning style data and may regress in non-math tasks.
- Training data may contain long chain-of-thought style solutions; generated outputs may be verbose.
- Benchmark contamination mitigation was applied, but no decontamination process is perfect.
- The uploaded `checkpoint-500` is early in a longer training run and should not be interpreted as final model quality.
- Safety alignment was not the primary target of this experiment.
Citation / Attribution
Base models:
@misc{llmjp4,
title = {LLM-jp-4 8B Models},
author = {LLM-jp},
year = {2026},
url = {https://huggingface.co/llm-jp}
}
Experiment tracking:
- GitHub issue: https://github.com/llm-jp/experiments/issues/399
- W&B project: https://wandb.ai/argo-lab/llmjp4-8b-teamvictory-sft-difficulty-20260629
Reproducibility Metadata
Experiment ID: 0399
Experiment slug: tv_sft
Canonical experiment directory:
/groups/gcg51557/experiments/0399_tv_sft
Key manifests:
/groups/gcg51557/experiments/0399_tv_sft/manifests/tokenized_sft_tv_valid_clean_llmjp4_8b.json
/groups/gcg51557/experiments/0399_tv_sft/manifests/sft_stability_monitor_20260701.json
/groups/gcg51557/experiments/0399_tv_sft/manifests/hf_checkpoint_sync_2004479.pbs1.json
Training configs:
/groups/gcg51557/experiments/0399_tv_sft/configs/sft_full_thinking.yaml
/groups/gcg51557/experiments/0399_tv_sft/configs/sft_full_base.yaml