Muiru committed on
Commit 3504f2d · 1 Parent(s): c6cf259

docs: refine dataset sources with specific corpora and licensing

Files changed (2)
  1. README.hf.md +3 -3
  2. README.md +3 -3
README.hf.md CHANGED
@@ -48,9 +48,9 @@ Foundational fine‑tuned model developed by CogniX LTD.
 
  ### Dataset Sources:
 
- - Native datasets: open mental health dialogue corpora curated for supportive conversation and coaching contexts.
- - Synthetic datasets: additional coaching‑style dialogues generated using OpenAI models to augment coverage and style diversity.
- - Fine‑tuning combined both native and synthetic sources with safety‑oriented filtering and prompt design.
+ - **Native datasets**: Open mental health dialogue corpora curated for supportive conversation and coaching contexts. This includes publicly available datasets such as **Counsel Chat** and **Psych8k**.
+ - **Synthetic datasets**: Additional coaching‑style dialogues generated using OpenAI models (**GPT‑4o**) to augment coverage and style diversity.
+ - **Release & Licensing**: Fine‑tuning combined both native and synthetic sources with safety‑oriented filtering and prompt design. Full dataset provenance will be released with our open dataset under a **CC‑BY 4.0** license.
 
 
  ### Model Details:
README.md CHANGED
@@ -31,9 +31,9 @@ Foundational fine‑tuned model developed by CogniX LTD.
 
  ### Dataset Sources:
 
- - Native datasets: open mental health dialogue corpora curated for supportive conversation and coaching contexts.
- - Synthetic datasets: additional coaching‑style dialogues generated using OpenAI models to augment coverage and style diversity.
- - Fine‑tuning combined both native and synthetic sources with safety‑oriented filtering and prompt design.
+ - **Native datasets**: Open mental health dialogue corpora curated for supportive conversation and coaching contexts. This includes publicly available datasets such as **Counsel Chat** and **Psych8k**.
+ - **Synthetic datasets**: Additional coaching‑style dialogues generated using OpenAI models (**GPT‑4o**) to augment coverage and style diversity.
+ - **Release & Licensing**: Fine‑tuning combined both native and synthetic sources with safety‑oriented filtering and prompt design. Full dataset provenance will be released with our open dataset under a **CC‑BY 4.0** license.
 
 
  ### Model Details: