---
license: apache-2.0
base_model: pszemraj/jamba-900M-v0.13-KIx2
tags:
- textbook
- '16384'
- long document
metrics:
- accuracy
language:
- en
inference: false
---
# BEE-spoke-data/Jamba-900M-doc-writer
> To test it out, try [this notebook](https://colab.research.google.com/gist/pszemraj/28985fdbbb2460f8375d2d84b8babe9a/jamba-test-sandbox.ipynb).
This model produces long, surprisingly coherent output that extends an input text; you can see an example [here](https://gist.github.com/pszemraj/b7c7ac65e56365cf5eab69622f16b356), a generated textbook about underwater city design.

Thanks to the Jamba architecture, it uses little VRAM while generating: roughly 2.5 GB of VRAM to generate 12,288 tokens.
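
Below is a minimal sketch of loading the model and generating a continuation with 🤗 Transformers. It assumes a recent transformers release with Jamba support; the sampling settings are illustrative, not values recommended by the author.

```python
# Minimal generation sketch; assumes a recent transformers release with Jamba
# support and (optionally) a GPU. Sampling settings below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Jamba-900M-doc-writer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps VRAM usage low
    device_map="auto",
)

prompt = "Introduction\n\nThe design of underwater cities poses unique challenges."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # the model can extend much further (e.g. 12k+ tokens)
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```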
## Model description
This model is a fine-tuned version of [pszemraj/jamba-900M-v0.13-KIx2](https://huggingface.co/pszemraj/jamba-900M-v0.13-KIx2) on some textbook data.

It achieves the following results on the evaluation set:
- Loss: 3.0200
- Accuracy: 0.4544
- Num Input Tokens Seen: 4940890112
## Intended Uses & Limitations
- Long-context generation.
- It requires a fairly long prompt (e.g., an "Introduction" section) to be coaxed into consistently producing long, textbook-like text; see the prompting sketch below.
- The model itself is small (hidden size 1024), so its reasoning, knowledge, etc. are limited, but still impressive for the size.
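
As a rough illustration of that kind of prompt, the sketch below reuses `model` and `tokenizer` from the loading snippet above and streams a long continuation from a multi-paragraph introduction. The prompt text, heading layout, and generation length are made up for illustration, not documented requirements of the model.

```python
# Illustrative textbook-style prompt; reuses `model` and `tokenizer` from the
# loading snippet above. The prompt content here is invented for illustration.
from transformers import TextStreamer

prompt = (
    "Underwater City Design\n\n"
    "Introduction\n\n"
    "Building habitable structures beneath the ocean surface requires balancing "
    "structural engineering, life-support systems, and long-term maintenance. "
    "This chapter surveys common pressure-hull geometries, material choices, and "
    "energy systems, and outlines the trade-offs that shape practical designs.\n\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(
    **inputs,
    max_new_tokens=4096,  # long continuations are the intended use case
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
    streamer=streamer,
)
```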
---