LFM-Coder training bucket

This bucket contains artifacts from the training run that produced the fine-tuned model rparkr/LFM2.5-1.2B-Instruct-Coding.

For an interactive view of training metrics, see the Trackio space for this training run.

completions

This directory contains every group of model completions generated during training. The model was trained on 1,000 examples for 3 epochs, which yields 3,000 groups (one file per group). Each group holds 8 completions for the same prompt.

Each completions group is a Parquet file with these columns:

step: The training step number.
prompt: The prompt given to the model.
completion: The model's completion.
coding_accuracy_reward: The share of test cases the completion passed or, when a binary reward is used, 1 if all test cases passed and 0 otherwise.
advantage: The advantage value used for updating the LoRA weights through backpropagation, based on the relative coding_accuracy_reward compared to other completions in the group.
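
The advantage column can be read through the usual group-relative formulation. The sketch below assumes the standard GRPO normalization (the eval file names in this bucket mention grpo). The exact formula used by the training code, including whether it divides by the standard deviation and which estimator it uses, is not stated in this README. Here r_i is the coding_accuracy_reward of completion i in a group of G = 8 completions, and ε is a small hypothetical constant added for numerical stability.

```latex
\[
A_i = \frac{r_i - \mu}{\sigma + \epsilon},
\qquad
\mu = \frac{1}{G}\sum_{j=1}^{G} r_j,
\qquad
\sigma = \sqrt{\frac{1}{G}\sum_{j=1}^{G} \left(r_j - \mu\right)^2}
\]
```

Under these assumptions and a binary reward, a group in which 2 of the 8 completions pass has μ = 0.25 and σ ≈ 0.433, so each passing completion gets A ≈ 1.73 and each failing one gets A ≈ -0.58 (ignoring ε). A group in which every completion earns the same reward has zero advantage throughout.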

You can explore the data with a tool such as DuckDB:

# Select any file from completions_00001.parquet to completions_03000.parquet
COMPLETIONS_FILE="completions_00001.parquet"
duckdb -c "SELECT
  *
FROM
  read_parquet('https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/completions/$COMPLETIONS_FILE?download=true')
;"
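
To fetch a different group without editing the URL by hand, the file name can be built from its number. This is a minimal sketch, assuming the zero-padded five-digit pattern from the comment above holds for every file. The function completions_url is a hypothetical helper, not something shipped with the bucket.

```shell
# Hypothetical helper (not part of the bucket): print the download URL
# for one completions file, given its number as a plain decimal (1 to 3000).
completions_url() {
  index="$1"
  # File names run from completions_00001 to completions_03000; reject anything else.
  if [ "$index" -lt 1 ] || [ "$index" -gt 3000 ]; then
    echo "file number must be between 1 and 3000" >&2
    return 1
  fi
  # %05d zero-pads the number to five digits, matching the file names.
  printf 'https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/completions/completions_%05d.parquet?download=true\n' "$index"
}
```

For example, completions_url 42 prints the URL for completions_00042.parquet, which can be passed to read_parquet in place of the hard-coded URL.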

Alternatively, you can mount the bucket with hf-mount and read all the data at once. The "Mount this bucket" button on the directory page shows the instructions, which look like this:

# Install hf-mount
brew install hf-mount

# Mount this bucket as a local folder
hf-mount start bucket rparkr/lfm-coder-training-bucket ./local

# Query all files
duckdb -c "SELECT
  *
FROM
  read_parquet('./local/completions/*.parquet')
LIMIT 1000;"

# Unmount when done
hf-mount stop ./local
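
With the bucket mounted, an aggregate over all files is often more useful than SELECT *. The sketch below prints a DuckDB query that summarizes the reward per training step, using only the columns listed above. The function name reward_by_step_sql and the alias names are hypothetical, and the mount directory defaults to ./local as in the commands above.

```shell
# Hypothetical helper: print a DuckDB query that summarizes rewards per step.
# Argument 1 is the mount directory (defaults to ./local).
reward_by_step_sql() {
  mount_dir="${1:-./local}"
  cat <<SQL
SELECT
  step,
  count(*) AS completions,
  avg(coding_accuracy_reward) AS mean_reward,
  max(coding_accuracy_reward) AS best_reward
FROM
  read_parquet('${mount_dir}/completions/*.parquet')
GROUP BY step
ORDER BY step;
SQL
}
```

While the bucket is still mounted, run it with duckdb -c "$(reward_by_step_sql ./local)". Printing the SQL instead of executing it keeps the helper easy to inspect and reuse with a different mount point.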

eval_results

This directory contains JSON Lines files with the model's results on the evaluation benchmarks, recorded every 1,000 training steps (at steps 1,000, 2,000, and 3,000).

As with the completions directory, you can explore the data with DuckDB:

# The three files are named based on the timestamp of when evaluation began.
# Step 1,000: eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-28T02-52-22Z.jsonl
# Step 2,000: eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-30T01-00-08Z.jsonl
# Step 3,000: eval_results/LFM2.5-1.2B-Instruct-grpo_2026-05-01T05-54-59Z.jsonl
EVAL_RESULTS_FILE="eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-28T02-52-22Z.jsonl"

duckdb -c "SELECT
  *
FROM
  read_json('https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/$EVAL_RESULTS_FILE?download=true')
LIMIT 10
;"

You can also mount the bucket to read all the data at once. See the completions section above for instructions.

trackio

This is the trackio database that stores metrics from the training run. You can view the trackio space here, or explore the SQLite database using DuckDB:

# Download the SQLite database and journal file.
# -o sets the local file name explicitly; depending on the curl version,
# -O may keep "?download=true" in the saved file name.
curl -L -o huggingface.db "https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/trackio/huggingface.db?download=true"
curl -L -o huggingface.db-journal "https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/trackio/huggingface.db-journal?download=true"

# Run all statements in one DuckDB session: each duckdb -c call starts
# a fresh in-memory session, so an ATTACH does not carry over to the next call.
duckdb -c "
-- Connect to the trackio database
ATTACH './huggingface.db' AS trackio (TYPE sqlite);

-- List all tables in the database
USE trackio;
SHOW TABLES;

-- Query the metrics table (e.g., loss, coding_accuracy_reward)
SELECT * FROM trackio.metrics;

-- Query the system metrics table (e.g., GPU utilization)
SELECT * FROM trackio.system_metrics;
"

training_logs

A JSON lines file with logs from the training codebase during the training run.

You can similarly explore this dataset using duckdb:

duckdb -c "SELECT
  *
FROM
  read_json('https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/training_logs/training-log_2026-04-28.jsonl?download=true')
;"

Here's a screenshot of using DuckDB for log analysis (launched with duckdb -ui to use the notebook-based web UI):
