echobt committed on
Commit · 79915e4
Parent(s): 4aca544
Rename to BASE-1, professional model card: remove diagram, structured roadmap
README.md
@@ -10,79 +10,99 @@ tags:
<div align="center">

](https://github.com/PlatformNetwork/prism)
[]()
[]()

</div>

---

##

## Overview

## Modalities

| Input | Output |
|-------|--------|
| Text
| Image

Base-1 will support **Text/Image → Text**: it will accept both text and images as input and generate text as output.

## Why is the model size

The parameter count of

In a conventional training pipeline,

1. **Architecture search comes first.** PRISM evaluates candidate architectures at compact proxy scales, measuring loss curves, gradient stability, activation behavior, and how performance
2. **Scaling laws are derived from the
3. **The final size is chosen from evidence, not convention.** Once the winning architecture's scaling characteristics are measured, the parameter budget will be set where the compute/performance trade-off is optimal for that specific design.

The final model size will be announced

##

- **Scaling-aware evaluation**: candidates are rewarded for smooth loss curves, stable gradients, and consistent improvements across scales, not just raw benchmark numbers.
- **Separate ownership**: architecture discovery and training-recipe improvements are attributed and rewarded independently, so both the design and the training procedure are optimized.

- [
- [ ] Neural architecture search via PRISM *(in progress)*
- [ ] Final architecture & model size announcement
- [ ] Training
- [ ] Weights release

##

## License

Apache 2.0

<div align="center">

# BASE-1

**A multimodal foundation model whose architecture is discovered through decentralized neural architecture search**

[](https://github.com/PlatformNetwork/prism)
[]()
[]()
[](https://www.apache.org/licenses/LICENSE-2.0)

</div>

---

## Status: In Development

BASE-1 is currently under active development. No weights are available yet. This repository will host the model checkpoints, configuration, and usage documentation once the architecture search and training phases are complete.

## Model Summary

| | |
|---|---|
| **Developer** | CortexLM |
| **Architecture** | Determined by neural architecture search (in progress) |
| **Parameters** | To be announced after architecture search |
| **Input modalities** | Text, Image |
| **Output modality** | Text |
| **Architecture search** | [PRISM](https://github.com/PlatformNetwork/prism): decentralized NAS on the Platform Network |
| **License** | Apache 2.0 |

## Overview

BASE-1 is a foundation model being developed through [PRISM](https://github.com/PlatformNetwork/prism), a decentralized neural architecture search (NAS) challenge running on the Platform Network. Rather than committing to a hand-designed architecture upfront, BASE-1's design is discovered competitively: miners across the network submit novel architecture families and training recipes, which are evaluated in isolated benchmark environments for learning quality, training stability, and scaling behavior.

The best-performing architecture that emerges from this search will be used to train BASE-1 at scale.

### How the architecture is discovered

PRISM fixes the dataset and evaluation protocol, not the search space. Candidate submissions are scored on:

- **Learning quality**: proxy loss performance under a shared, deterministic evaluation contract
- **Training stability**: smooth loss curves, stable gradients, and well-behaved activations
- **Scaling signals**: consistent improvements across model size, depth, sequence length, and batch scaling
- **Noise resistance**: dynamic thresholds prevent marginal random fluctuations from being rewarded as improvements
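
The training-stability and noise-resistance criteria can be illustrated with a small sketch. This is not PRISM's scoring code. The roughness metric, the seed-based threshold rule, and every function name and constant below (`loss_roughness`, `dynamic_threshold`, `is_real_improvement`, `k=2.0`) are hypothetical. The sketch shows one idea: a candidate's lower loss only counts as an improvement when the margin exceeds the run-to-run noise, measured by rerunning a baseline with different seeds.

```python
import statistics


def loss_roughness(losses: list[float]) -> float:
    """Hypothetical stability proxy: mean absolute second difference of a
    loss curve. A smooth, steadily decreasing curve scores near zero;
    spiky or oscillating curves score higher."""
    if len(losses) < 3:
        return 0.0
    second_diffs = [
        abs(losses[i + 1] - 2 * losses[i] + losses[i - 1])
        for i in range(1, len(losses) - 1)
    ]
    return sum(second_diffs) / len(second_diffs)


def dynamic_threshold(baseline_seed_losses: list[float], k: float = 2.0) -> float:
    """Hypothetical noise floor: k sample standard deviations of the final
    loss observed when the same baseline is rerun with different seeds."""
    return k * statistics.stdev(baseline_seed_losses)


def is_real_improvement(
    candidate_loss: float, baseline_seed_losses: list[float], k: float = 2.0
) -> bool:
    """Count a candidate as an improvement only if it beats the baseline's
    mean final loss by more than the seed-noise threshold."""
    baseline_mean = statistics.mean(baseline_seed_losses)
    margin = baseline_mean - candidate_loss  # positive means lower (better) loss
    return margin > dynamic_threshold(baseline_seed_losses, k)


if __name__ == "__main__":
    baseline_runs = [3.10, 3.14, 3.12, 3.08, 3.11]  # illustrative numbers only
    print(is_real_improvement(3.09, baseline_runs))  # inside the noise band
    print(is_real_improvement(2.95, baseline_runs))  # clearly better
    print(loss_roughness([4.0, 3.5, 3.2, 3.0, 2.9]))
```

With these illustrative baseline runs, the mean is 3.11 and the threshold is about 0.045. A final loss of 3.09 is therefore rejected as noise, while 2.95 is accepted. Deriving the threshold from measured variance, rather than from a fixed margin, is one way to make it "dynamic". PRISM's actual rule is described in its scoring documentation.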

Architecture discovery and training-recipe improvements (optimizer, loss computation, inference, train step) are attributed and rewarded independently, so both the model design and its training procedure are optimized by the network.

## Modalities

BASE-1 will support **Text/Image to Text**: it will accept text and images as input and generate text as output.

| Input | Output |
|-------|--------|
| Text | Text |
| Image | Text |

## Why is the model size not announced?

The parameter count of BASE-1 is genuinely not decided yet, and this is by design.

In a conventional training pipeline, the architecture and parameter budget are fixed first, then training begins. BASE-1 inverts this process:

1. **Architecture search comes first.** PRISM evaluates candidate architectures at compact proxy scales, measuring loss curves, gradient stability, activation behavior, and how performance evolves across model size, depth, sequence length, and batch size.
2. **Scaling laws are derived from the winning architecture.** Each architecture family exhibits its own scaling behavior. The optimal parameter count depends on the scaling-law signals of the architecture that wins the search, so it cannot be known before the search concludes.
3. **The final size is chosen from evidence, not convention.** Once the winning architecture's scaling characteristics are measured, the parameter budget will be set where the compute/performance trade-off is optimal for that specific design.
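
Steps 2 and 3 can be made concrete with a toy fit. A common assumption in the scaling-law literature is that loss follows a power law in parameter count, L(N) ≈ a · N^(-b). Taking logs turns this into a straight line, log L = log a - b · log N, which ordinary least squares can fit from a handful of proxy-scale runs.

The sketch below assumes that pure power-law form (no irreducible-loss term) and uses synthetic data. It also adds a hypothetical stopping rule for step 3: choose the smallest candidate size at which doubling the parameters is predicted to lower loss by less than `min_gain`. The function names and the rule itself are illustrative and are not PRISM's or BASE-1's actual procedure.

```python
import math


def fit_power_law(sizes: list[float], losses: list[float]) -> tuple[float, float]:
    """Fit L(N) = a * N**(-b) by least squares on (log N, log L).

    Returns (a, b). Assumes a pure power law with no irreducible-loss term.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    # Slope of the least-squares line in log-log space equals -b.
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


def predict_loss(a: float, b: float, size: float) -> float:
    """Evaluate the fitted power law at a given parameter count."""
    return a * size ** (-b)


def smallest_efficient_size(
    a: float, b: float, candidates: list[float], min_gain: float = 0.02
) -> float:
    """Hypothetical stopping rule: return the smallest candidate size at which
    doubling parameters is predicted to reduce loss by less than `min_gain`
    (absolute loss units). Falls back to the largest candidate."""
    for size in sorted(candidates):
        gain = predict_loss(a, b, size) - predict_loss(a, b, 2 * size)
        if gain < min_gain:
            return size
    return max(candidates)


if __name__ == "__main__":
    # Synthetic proxy-scale results generated from L = 10 * N**-0.1 (not real data).
    proxy_sizes = [1e6, 4e6, 1.6e7, 6.4e7]
    proxy_losses = [10 * n ** -0.1 for n in proxy_sizes]
    a, b = fit_power_law(proxy_sizes, proxy_losses)
    print(f"a={a:.3f}, b={b:.3f}")
    print(smallest_efficient_size(a, b, [1e6, 1e8, 1e10, 1e12], min_gain=0.1))
```

At a fixed token budget, doubling parameters roughly doubles training compute. A shrinking per-doubling gain is therefore a crude proxy for diminishing returns. A real analysis would typically also fit an irreducible-loss term and account for data and compute budgets; this sketch omits both.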

The final model size will be announced once the architecture search is complete.

## Roadmap

| Phase | Description | Status |
|-------|-------------|--------|
| 1. PRISM challenge launch | Open the decentralized architecture search to miners on the Platform Network | In progress |
| 2. Architecture selection | Identify the best-performing architecture family from competitive evaluation and scaling analysis | Pending |
| 3. Dataset curation | Assemble and validate the large-scale multimodal training corpus | Pending |
| 4. Large-scale training | Train BASE-1 at the parameter budget derived from the winning architecture's scaling laws | Pending |
| 5. Model release | Publish weights, configuration, evaluation results, and usage documentation in this repository | Pending |

## Intended Use

BASE-1 is intended as a general-purpose multimodal foundation model for text generation conditioned on text and image inputs. Detailed intended-use guidance, limitations, and evaluation results will be published with the model release.

## Evaluation

Benchmark results will be published alongside the weights once training is complete. Architecture-search-stage evaluations follow the PRISM scoring protocol, documented in [Scoring and rewards](https://github.com/PlatformNetwork/prism/blob/main/docs/scoring.md) and [Scaling evaluation](https://github.com/PlatformNetwork/prism/blob/main/docs/scaling.md).

## Resources

- PRISM (architecture search): [github.com/PlatformNetwork/prism](https://github.com/PlatformNetwork/prism)
- PRISM documentation: [Overview](https://github.com/PlatformNetwork/prism/blob/main/docs/overview.md) | [Architecture](https://github.com/PlatformNetwork/prism/blob/main/docs/architecture.md) | [Scoring](https://github.com/PlatformNetwork/prism/blob/main/docs/scoring.md) | [Scaling](https://github.com/PlatformNetwork/prism/blob/main/docs/scaling.md)
- Platform Network: [platform.network](https://platform.network)

## License

This repository is released under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).