Update README.md
README.md
@@ -72,11 +72,11 @@ But the model suffers from extreme hallucination, which we are trying to solve.
 
 And experimentation ate a lot of budget from us, so it caused slowdowns.
 
-For the current training hardware, originally
+For the current training hardware, we originally planned to use the Nvidia RTX PRO 6000, but due to software errors we switched to an MI300X instead. We also have plans to use TPUs.
 
-
+Our new training code shows good improvements, and we hope to see even better results once the budget is resolved.
 
-But the current versions
+But the current versions, as we said, are still pretty rough.
 
 Our latest tests resulted in this:
 
@@ -96,6 +96,16 @@ cat log_4 | head
 
 This is from LaaLM-v2.
 
+#### 20 February 2026 Update
+
+While nothing is certain yet, our next attempts will train LaaLM-v2 on TPU v6e chips. TPUs offer the excellent price-to-performance we need, along with fast interconnects and high-bandwidth memory.
+
+Porting LaaLM-v2's codebase to TPUs, and the bug fixing that follows, will take a good amount of time.
+
+The exact configuration is not settled, but we will probably use a v6e-4 slice for training.
+
+We will post an update once we have our first TPU results.
+
 ---
 
 ## How to use it
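The v6e-4 plan in the update can be sanity-checked with a back-of-envelope memory budget. A minimal sketch, assuming Google's published figure of 32 GB of HBM per v6e chip and bf16 (2-byte) weights; the chip count and numbers are illustrative, not taken from the LaaLM codebase:

```python
# Rough HBM budget for a v6e-4 slice.
# Assumptions (not from the LaaLM repo): 32 GiB HBM per v6e chip,
# bf16 weights at 2 bytes per parameter.
CHIPS = 4
HBM_PER_CHIP_BYTES = 32 * 1024**3
BYTES_PER_PARAM = 2  # bf16

total_hbm = CHIPS * HBM_PER_CHIP_BYTES
max_params = total_hbm // BYTES_PER_PARAM
print(f"Weights alone could span up to ~{max_params / 1e9:.1f}B parameters")
```

Optimizer state, gradients, and activations typically multiply the per-parameter footprint several times over, so the practical model size on a v6e-4 slice is well below this ceiling.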