| # LFM-Coder training bucket | |
| This bucket contains training artifacts from the fine-tuned model [rparkr/LFM2.5-1.2B-Instruct-Coding](https://huggingface.co/rparkr/LFM2.5-1.2B-Instruct-Coding). | |
| For an interactive view of training metrics, see the [Trackio space for this training run](https://huggingface.co/spaces/rparkr/lfm-coder-training). | |
| # Contents | |
| ## [completions](https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/tree/completions) | |
| This directory contains every group of model completions generated during training. The model was trained on 1,000 examples for 3 epochs, so there are 3,000 groups (files) in total, each holding 8 completions of the same prompt. | |
| Each completions group is a Parquet file with these columns: | |
| - `step`: The training step number. | |
| - `prompt`: The prompt given to the model. | |
| - `completion`: The model's completion. | |
| - `coding_accuracy_reward`: The percentage of test cases answered correctly by the completion, or simply 0 or 1 for a binary reward (1 if all test cases passed, 0 otherwise). | |
| - `advantage`: The advantage value used for updating the LoRA weights through backpropagation, based on the relative `coding_accuracy_reward` compared to other completions in the group. | |
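To make the relationship between `coding_accuracy_reward` and `advantage` concrete, here is a minimal sketch of one common group-relative normalization. It assumes the advantage is the reward minus the group mean, divided by the group standard deviation plus a small epsilon. The exact formula, the choice of population standard deviation, and the epsilon value used in this training run are not stated here, and `group_advantages` is a hypothetical helper name.

```bash
# Hypothetical helper: group-relative advantages for one group of rewards.
# ASSUMPTION: advantage = (reward - group mean) / (group std + eps), using the
# population standard deviation and eps = 1e-4. The training code may differ.
group_advantages() {
  printf '%s\n' "$@" | awk '
    { r[NR] = $1; sum += $1 }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) ss += (r[i] - mean) ^ 2
      std = sqrt(ss / NR)
      # One advantage per input reward, in the same order
      for (i = 1; i <= NR; i++) printf "%.4f\n", (r[i] - mean) / (std + 0.0001)
    }'
}

# Example: a group of 8 binary rewards where only the first completion passed
group_advantages 1 0 0 0 0 0 0 0
```

Under this assumption, a group in which every completion earns the same reward yields an advantage of zero for all of them, so such a group would contribute no learning signal.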
| You can explore the data with a tool such as [DuckDB](https://duckdb.org/install/?environment=cli), like this: | |
| ```bash | |
| # Select any file from completions_00001.parquet to completions_03000.parquet | |
| COMPLETIONS_FILE="completions_00001.parquet" | |
| duckdb -c "SELECT | |
| * | |
| FROM | |
| read_parquet('https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/completions/$COMPLETIONS_FILE?download=true') | |
| ;" | |
| ``` | |
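Going by the file names in the comment above, the files appear to be numbered from 1 to 3,000 and zero-padded to five digits. The following sketch builds the download URL for a given file number; `completions_url` is a hypothetical helper name, and the padding rule is inferred from those two file names only.

```bash
# Hypothetical helper: print the download URL for one completions file.
# ASSUMPTION: files are named completions_NNNNN.parquet, zero-padded to 5 digits.
# Pass a plain decimal number without leading zeros (e.g., 42, not 00042).
completions_url() {
  if ! [ "$1" -ge 1 ] 2>/dev/null || ! [ "$1" -le 3000 ] 2>/dev/null; then
    echo "file number must be an integer from 1 to 3000" >&2
    return 1
  fi
  printf 'https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/completions/completions_%05d.parquet?download=true\n' "$1"
}

# Print the URL for file 42
completions_url 42

# The URL can then be passed to DuckDB, for example:
#   duckdb -c "SELECT * FROM read_parquet('$(completions_url 42)');"
```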
| Alternatively, you can mount the bucket with [`hf-mount`](https://github.com/huggingface/hf-mount) and read all the data at once. The "Mount this bucket" button on the [directory page](https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/tree/completions) shows the instructions, which are summarized here: | |
| ```bash | |
| # Install hf-mount | |
| brew install hf-mount | |
| # Mount this bucket as a local folder | |
| hf-mount start bucket rparkr/lfm-coder-training-bucket ./local | |
| # Query all files | |
| duckdb -c "SELECT | |
| * | |
| FROM | |
| read_parquet('./local/completions/*.parquet') | |
| LIMIT 1000;" | |
| ``` | |
| ```bash | |
| # Unmount when done | |
| hf-mount stop ./local | |
| ``` | |
| ## [eval_results](https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/tree/eval_results) | |
| These are JSON Lines files that contain the model's results on the evaluation benchmarks, recorded every 1,000 training steps (i.e., at steps 1,000, 2,000, and 3,000). | |
| As with the completions directory, you can explore the data with a tool such as DuckDB: | |
| ```bash | |
| # The three files are named based on the timestamp of when evaluation began. | |
| # Step 1,000: eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-28T02-52-22Z.jsonl | |
| # Step 2,000: eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-30T01-00-08Z.jsonl | |
| # Step 3,000: eval_results/LFM2.5-1.2B-Instruct-grpo_2026-05-01T05-54-59Z.jsonl | |
| EVAL_RESULTS_FILE="eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-28T02-52-22Z.jsonl" | |
| duckdb -c "SELECT | |
| * | |
| FROM | |
| read_json('https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/$EVAL_RESULTS_FILE?download=true') | |
| LIMIT 10 | |
| ;" | |
| ``` | |
| You can also mount the bucket to read all the data at once. See the [completions](#completions) section above for instructions. | |
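Because the evaluation files are named by timestamp rather than by step, a small lookup can make it easier to pick the right one. The sketch below maps a step to its file using only the three names listed in the comments above; `eval_results_file` is a hypothetical helper name.

```bash
# Hypothetical helper: map an evaluation step to its results file.
# The file names are copied from the listing above.
eval_results_file() {
  case "$1" in
    1000) echo "eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-28T02-52-22Z.jsonl" ;;
    2000) echo "eval_results/LFM2.5-1.2B-Instruct-grpo_2026-04-30T01-00-08Z.jsonl" ;;
    3000) echo "eval_results/LFM2.5-1.2B-Instruct-grpo_2026-05-01T05-54-59Z.jsonl" ;;
    *) echo "no evaluation recorded at step $1" >&2; return 1 ;;
  esac
}

# Build the download URL for the step 2,000 results
EVAL_RESULTS_FILE="$(eval_results_file 2000)"
echo "https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/$EVAL_RESULTS_FILE?download=true"
```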
| ## [trackio](https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/tree/trackio) | |
| This is the trackio database that stores metrics from the training run. You can view the trackio space [here](https://huggingface.co/spaces/rparkr/lfm-coder-training), or explore the SQLite database using DuckDB: | |
| ```bash | |
| # Download the SQLite database and journal file. | |
| # -o sets the local file name explicitly, so "?download=true" cannot end up in it. | |
| curl -L -o huggingface.db "https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/trackio/huggingface.db?download=true" | |
| curl -L -o huggingface.db-journal "https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/trackio/huggingface.db-journal?download=true" | |
| # Each "duckdb -c" call starts a fresh in-memory session, so ATTACH and the | |
| # queries that depend on it must run in the same invocation. | |
| duckdb -c " | |
| -- Connect to the trackio database | |
| ATTACH './huggingface.db' AS trackio (TYPE sqlite); | |
| -- List all tables in the database | |
| SHOW TABLES FROM trackio; | |
| -- Query the metrics table (e.g., loss, coding_accuracy_reward) | |
| SELECT * FROM trackio.metrics; | |
| -- Query the system metrics table (e.g., GPU utilization) | |
| SELECT * FROM trackio.system_metrics; | |
| " | |
| ``` | |
| ## [training_logs](https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/tree/training_logs) | |
| A JSON lines file with logs from the [training codebase](https://github.com/rparkr/lfm-coder) during the training run. | |
| You can explore these logs with DuckDB in the same way: | |
| ```bash | |
| duckdb -c "SELECT | |
| * | |
| FROM | |
| read_json('https://huggingface.co/buckets/rparkr/lfm-coder-training-bucket/resolve/training_logs/training-log_2026-04-28.jsonl?download=true') | |
| ;" | |
| ``` | |
| Here's a screenshot of using DuckDB for log analysis (launched with `duckdb -ui` to use the notebook-based web UI): | |
|  |