---
license: mit
datasets:
  - HuggingFaceFW/fineweb-edu
  - open-web-math/open-web-math
language:
  - en
pipeline_tag: text-generation
---

# Model Card for RWKV-7-25M-Base

## Model Details

### Model Description

- **Developed by:** xTimeCrystal
- **Model type:** RWKV-7 (note: the decay is computed using `-F.softplus` instead of `-0.606 * torch.sigmoid`, all LoRAs use `Tanh`, and LoRA weights are stored like `nn.Linear`)
- **Language(s) (NLP):** English
- **License:** MIT
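For illustration, here is a minimal sketch of the two decay parameterizations side by side. This is not this repo's actual code; `z` is a stand-in for the decay pre-activation inside an RWKV-7 block:

```python
import torch
import torch.nn.functional as F

# z stands in for the decay pre-activation inside an RWKV-7 block
z = torch.randn(8)

# common RWKV-7 parameterization: log-decay bounded in (-0.606, 0)
logw_ref = -0.606 * torch.sigmoid(z)

# this model's variant: log-decay in (-inf, 0), so channels can decay much faster
logw_mod = -F.softplus(z)

w_ref = torch.exp(logw_ref)  # per-channel decay factors in (exp(-0.606), 1) ~ (0.545, 1)
w_mod = torch.exp(logw_mod)  # per-channel decay factors in (0, 1)
```

The softplus variant removes the lower bound on the log-decay, letting channels forget state arbitrarily quickly rather than being clamped to a minimum decay factor of about 0.545.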

## Uses

### Direct Use

A fast autocomplete model for English text.

### Out-of-Scope Use

Don't use it for anything serious; it lacks any form of intelligence.

## Bias, Risks, and Limitations

Trained with only ~a couple of exaFLOPs of compute, so don't expect anything coherent beyond a couple of sentences.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
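An official usage snippet is not documented yet, and loading depends on this repo's own RWKV-7 implementation. Once you have per-step logits from the model, though, a generic temperature/top-p sampler like the sketch below (all names hypothetical) can drive autocomplete-style generation:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_p: float = 0.9) -> int:
    """Sample one token id from a 1-D logits vector with temperature + nucleus (top-p) filtering."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # keep the smallest prefix of tokens whose cumulative probability exceeds top_p
    cutoff = int(torch.searchsorted(cumulative, top_p).item()) + 1
    kept_probs = sorted_probs[:cutoff]
    choice = torch.multinomial(kept_probs / kept_probs.sum(), num_samples=1)
    return int(sorted_ids[choice].item())
```

In a generation loop, feed each sampled token back into the model to get the next step's logits.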

## Training Details

### Training Data

50 billion bytes (~50 GB) of a custom FineWeb-Edu and OpenWebMath mixture.

### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision, trained with a custom version of Muon with the learning rate decayed from 5e-3 to 1e-3.
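The shape of the 5e-3 → 1e-3 decay is not documented; as a placeholder, a simple linear-anneal helper (the function name and linear shape are assumptions):

```python
def lr_at(step: int, total_steps: int, lr_max: float = 5e-3, lr_min: float = 1e-3) -> float:
    # linearly anneal from lr_max at step 0 to lr_min at total_steps
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lr_max + (lr_min - lr_max) * frac
```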

### Speeds, Sizes, Times

Throughput: ~350 characters/second using unoptimized inference code. Prompt processing is essentially instantaneous, so generation is likely bottlenecked by memory bandwidth and per-step overhead.

## Evaluation

### Results

- Bits-per-byte: ~1
- HellaSwag accuracy: 33.4% (WikiHow entries removed)
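For reference, bits-per-byte relates a model's summed cross-entropy loss (in nats) to the number of raw bytes scored; a minimal sketch of the conversion, not the exact evaluation harness used here:

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    # convert summed negative log-likelihood from nats to bits, normalize per byte
    return total_nll_nats / (n_bytes * math.log(2))

# e.g. 1000 bytes scored at a total loss of ~693.15 nats is ~1.0 bit/byte
```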

## Technical Specifications

### Model Architecture and Objective

Modified RWKV-7 (see the notes under Model Description above).

### Compute Infrastructure

1 × RTX 4080 for 1 week.