ereniko committed on
Commit
6859197
·
verified ·
1 Parent(s): e01e6a0

Update README.md

Files changed (1):
  1. README.md +13 -3
README.md CHANGED
@@ -72,11 +72,11 @@ But the model suffers from extreme hallucination, which we are trying to solve.
 
 And experimentation ate a lot of budget from us, so it caused slowdowns.
 
-For the current training hardware, originally we thought of using the Nvidia RTX PRO 6000, but because of software errors, we used an MI300X. Though we have plans of using TPUs also.
+For the current training hardware, we originally planned to use the Nvidia RTX PRO 6000, but due to software errors we used an MI300X instead. We also plan to use TPUs.
 
-Though our new training code shows good improvements, we hope for good results after the budget is solved.
+Although our new training code shows good improvements, we hope to see good results once the budget is resolved.
 
-But the current versions like we said are pretty rough for now.
+But the current versions, as we said, are pretty rough for now.
 
 Our latest tests resulted in this:
 
@@ -96,6 +96,16 @@ cat log_4 | head
 
 This is from LaaLM-v2.
 
+#### 20 February 2026 Update
+
+While we can't say for sure, we will try training LaaLM-v2 on TPU v6e chips for our next runs. TPUs offer the excellent price-to-performance we really need, along with fast interconnects and memory.
+
+But it will take a good amount of time to port LaaLM-v2's codebase to TPUs and work through the bug fixing that follows.
+
+The configuration is not final, but our current plan is to use a v6e-4 configuration for the training.
+
+We will try to post an update when we get our first results out of the TPUs.
+
 ---
 
 ## How to use it
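
The update above mentions planning a TPU v6e-4 slice. As a rough, hypothetical sketch (the commit contains no code, so every name here is illustrative), this is how one might verify the slice and set up simple data-parallel sharding in JAX before porting a training codebase; on a v6e-4 host `jax.devices()` should report 4 TPU chips, while other backends fall back to whatever is available.

```python
# Hypothetical pre-flight check for a TPU v6e-4 slice; not from the repository.
# Runs unchanged on CPU/GPU backends, just with a different chip count.
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec

devices = jax.devices()
print(f"backend={jax.default_backend()}, chips={len(devices)}")  # 4 on v6e-4

# One-dimensional data-parallel mesh across all available chips.
mesh = Mesh(np.array(devices), axis_names=("data",))
sharding = NamedSharding(mesh, PartitionSpec("data"))

# Shard a dummy batch along its leading axis; the batch size (8) must be
# divisible by the chip count, which holds for both 1 (CPU) and 4 (v6e-4).
batch = jax.device_put(jnp.zeros((8, 128)), sharding)
print(batch.shape, batch.sharding)
```

Running this before the real training loop catches slice-allocation problems early, which matters when porting a codebase to new hardware.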