jnjj committed
Commit 06dd723 · verified · 1 Parent(s): 7e3686e

Update README.md via script

Files changed (1): README.md (+7 -7)
README.md CHANGED
@@ -23,14 +23,14 @@ The fully merged model weights and tokenizer are updated periodically at the roo
  - **Dynamic Dataset Source:** The script iterates through a wide array of Hugging Face Hub datasets.
  - **Rapid Iteration Strategy:** Training per dataset configuration is brief (`max_steps=1`), prioritizing breadth of exposure over depth on any single dataset.
  ## Training Progress
- - **Datasets Processed (Successfully trained on at least one config):** 4
- - **Text Examples Streamed (Total):** 24
- - **Tokens Processed (Total):** 12288
- - **Last Successful Model Update:** 2025-05-08 15:40:28 UTC
+ - **Datasets Processed (Successfully trained on at least one config):** 5
+ - **Text Examples Streamed (Total):** 30
+ - **Tokens Processed (Total):** 15360
+ - **Last Successful Model Update:** 2025-05-08 15:42:00 UTC
  ### Evaluation Snapshot (Approximate)

- - **Current Perplexity (wikitext Subset):** 286.11
- - **Perplexity Change:** `-0.92` ⬇️ (vs previous cycle's perplexity)
+ - **Current Perplexity (wikitext Subset):** 284.82
+ - **Perplexity Change:** `-1.29` ⬇️ (vs previous cycle's perplexity)

  #### Generated Examples (Qualitative Assessment)
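The update script itself is not part of this commit, so the following is only a minimal sketch of the loop the two bullets above describe, assuming `transformers` and `datasets`; the model id, dataset list, and learning rate are placeholders, not the script's actual values. The counters in this diff are consistent with 6 examples of 512 tokens per dataset config (4 × 6 = 24 → 5 × 6 = 30 examples; 24 × 512 = 12288 → 30 × 512 = 15360 tokens), which is what the constants below assume.

```python
import itertools

import torch
from datasets import load_dataset
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder; the repo's own checkpoint would be used instead
DATASET_CONFIGS = [  # placeholder (dataset, config) pairs
    ("wikitext", "wikitext-2-raw-v1"),
]
EXAMPLES_PER_CONFIG = 6  # consistent with the example counters above
MAX_LENGTH = 512         # consistent with the token counters above

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
optimizer = AdamW(model.parameters(), lr=5e-5)  # assumed learning rate

model.train()
for name, config in DATASET_CONFIGS:
    # Stream so only the handful of examples needed is ever downloaded.
    stream = load_dataset(name, config, split="train", streaming=True)
    texts = [row["text"] for row in itertools.islice(stream, EXAMPLES_PER_CONFIG)
             if row.get("text")]
    if not texts:
        continue
    enc = tokenizer(texts, truncation=True, padding="max_length",
                    max_length=MAX_LENGTH, return_tensors="pt")
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # don't score padding
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()  # the single update step per config (`max_steps=1`)
    optimizer.zero_grad()
```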
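For the perplexity snapshot above, the usual recipe is mean next-token cross-entropy over a fixed wikitext slice, exponentiated. Here is a sketch under that assumption; the slice size, context length, and model id are guesses, not the script's settings.

```python
import math

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder for the repo checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

# A small, fixed slice keeps the number comparable across cycles.
rows = load_dataset("wikitext", "wikitext-2-raw-v1", split="test[:64]")
text = "\n\n".join(r for r in rows["text"] if r.strip())
enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)

with torch.no_grad():
    # Passing input_ids as labels makes the model compute the shifted
    # next-token cross-entropy internally.
    loss = model(**enc, labels=enc["input_ids"]).loss

print(f"perplexity ~= {math.exp(loss.item()):.2f}")
```

Because the evaluation slice is fixed, the cycle-over-cycle delta (`-1.29` here) is more informative than the noisy absolute value.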
 
@@ -41,7 +41,7 @@ The fully merged model weights and tokenizer are updated periodically at the roo
  | Creative Prompt | `Describe a friendly robot that love...` | `We are pleased to announce the launch of... ` |
  | Question Answering (Basic) | `What is the main color of a ripe ba...` | `As an example we've been using the same ... ` |
  | Code Generation (Simple Python) | `Write a Python function that takes ...` | `We are looking forward to seeing us in t... ` |
- | Reasoning (Simple) | `If a train leaves station A at 10:0...` | `The time of day we were trying to get ou... ` |
+ | Reasoning (Simple) | `If a train leaves station A at 10:0...` | `This is a big task force to get ready fo... ` |

  #### Standard Benchmarks (via `lighteval`)
  _Note: Running standard benchmarks requires a dedicated setup using the `lighteval` harness. The table below shows scores if available in `evaluation_stats.json`, otherwise `N/A`._
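The qualitative table is presumably produced by decoding a fixed prompt list and truncating both prompt and completion to a display width, which matches the truncated cells above. A sketch of that step; the prompt text and decoding parameters here are illustrative stand-ins, not the script's own.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder for the repo checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

prompt = "What is the main color of a ripe banana?"  # illustrative stand-in
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=False,  # greedy decoding keeps snapshots reproducible
        pad_token_id=tokenizer.eos_token_id,
    )
# Keep only the newly generated tokens, then truncate for the table cell.
completion = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                              skip_special_tokens=True)
print(f"| Question Answering (Basic) | `{prompt[:35]}...` | `{completion[:40]}... ` |")
```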
 
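Finally, the "scores if available, otherwise `N/A`" behaviour for the benchmarks table reduces to a lookup with a fallback. A sketch, assuming `evaluation_stats.json` holds a mapping of task names to scores; the key layout and task names are guesses (only the filename comes from the note above), and actually producing scores would go through the `lighteval` harness.

```python
import json
from pathlib import Path

stats_path = Path("evaluation_stats.json")  # filename from the README note
stats = json.loads(stats_path.read_text()) if stats_path.exists() else {}
benchmarks = stats.get("benchmarks", {})  # assumed key

# Illustrative task names only; real rows would use lighteval task ids.
for task in ("hellaswag", "arc_easy", "winogrande"):
    print(f"| {task} | {benchmarks.get(task, 'N/A')} |")
```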