---
license: mit
datasets:
  - HuggingFaceFW/fineweb-edu
  - open-web-math/open-web-math
language:
  - en
pipeline_tag: text-generation
---

# Model Card for RWKV-7-25M-Base

## Model Details

### Model Description

- **Developed by:** xTimeCrystal
- **Model type:** RWKV-7 (note: the decay is computed using `-F.softplus` instead of `-0.606 * torch.sigmoid`, all LoRAs use `Tanh`, and LoRA weights are stored like `nn.Linear`)
- **Language(s) (NLP):** English
- **License:** MIT
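For illustration, here is a minimal sketch of the two decay parameterizations side by side. This is not this repo's actual code; `z` is a stand-in for the decay pre-activation inside an RWKV-7 block:

```python
import torch
import torch.nn.functional as F

# z stands in for the decay pre-activation inside an RWKV-7 block
z = torch.randn(8)

# common RWKV-7 parameterization: log-decay bounded in (-0.606, 0)
logw_ref = -0.606 * torch.sigmoid(z)

# this model's variant: log-decay in (-inf, 0), so channels can decay much faster
logw_mod = -F.softplus(z)

w_ref = torch.exp(logw_ref)  # per-channel decay factors in (exp(-0.606), 1) ~ (0.545, 1)
w_mod = torch.exp(logw_mod)  # per-channel decay factors in (0, 1)
```

The softplus variant removes the lower bound on the log-decay, letting channels forget state arbitrarily quickly rather than being clamped to a minimum decay factor of about 0.545.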

## Uses

### Direct Use

A fast autocomplete model for English text.

### Out-of-Scope Use

Don't use it for anything serious; it lacks any form of intelligence.

## Bias, Risks, and Limitations

Trained with only ~a couple of exaFLOPs of compute, so don't expect anything coherent beyond a couple of sentences.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]
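An official usage snippet is not documented yet, and loading depends on this repo's own RWKV-7 implementation. Once you have per-step logits from the model, though, a generic temperature/top-p sampler like the sketch below (all names hypothetical) can drive autocomplete-style generation:

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_p: float = 0.9) -> int:
    """Sample one token id from a 1-D logits vector with temperature + nucleus (top-p) filtering."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # keep the smallest prefix of tokens whose cumulative probability exceeds top_p
    cutoff = int(torch.searchsorted(cumulative, top_p).item()) + 1
    kept_probs = sorted_probs[:cutoff]
    choice = torch.multinomial(kept_probs / kept_probs.sum(), num_samples=1)
    return int(sorted_ids[choice].item())
```

In a generation loop, feed each sampled token back into the model to get the next step's logits.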

## Training Details

### Training Data

50 billion bytes (~50 GB) of a custom FineWeb-Edu and OpenWebMath mixture.

### Training Hyperparameters

- **Training regime:** bf16 non-mixed precision, trained with a custom version of Muon with the learning rate decayed from 5e-3 to 1e-3.
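The shape of the 5e-3 → 1e-3 decay is not documented; as a placeholder, a simple linear-anneal helper (the function name and linear shape are assumptions):

```python
def lr_at(step: int, total_steps: int, lr_max: float = 5e-3, lr_min: float = 1e-3) -> float:
    # linearly anneal from lr_max at step 0 to lr_min at total_steps
    frac = min(max(step / total_steps, 0.0), 1.0)
    return lr_max + (lr_min - lr_max) * frac
```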

### Speeds, Sizes, Times

Throughput: ~350 characters/second using unoptimized inference code. Prompt processing is essentially instantaneous, so generation is likely bottlenecked by memory bandwidth and per-step overhead.

## Evaluation

### Results

- Bits-per-byte: ~1
- HellaSwag accuracy: 33.4% (WikiHow entries removed)
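For reference, bits-per-byte relates a model's summed cross-entropy loss (in nats) to the number of raw bytes scored; a minimal sketch of the conversion, not the exact evaluation harness used here:

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    # convert summed negative log-likelihood from nats to bits, normalize per byte
    return total_nll_nats / (n_bytes * math.log(2))

# e.g. 1000 bytes scored at a total loss of ~693.15 nats is ~1.0 bit/byte
```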

## Technical Specifications

### Model Architecture and Objective

Modified RWKV-7 (see the notes under Model Description above).

### Compute Infrastructure

1 × RTX 4080 for 1 week.