---
license: apache-2.0
base_model: pszemraj/jamba-900M-v0.13-KIx2
tags:
- textbook
- '16384'
- long document
metrics:
- accuracy
language:
- en
inference: false
---
# BEE-spoke-data/Jamba-900M-doc-writer
> To test it out, try [this notebook](https://colab.research.google.com/gist/pszemraj/28985fdbbb2460f8375d2d84b8babe9a/jamba-test-sandbox.ipynb).
This model produces long, surprisingly coherent output that extends an input text; you can see an example [here](https://gist.github.com/pszemraj/b7c7ac65e56365cf5eab69622f16b356), a generated textbook about underwater city design.

Thanks to the Jamba architecture, it uses little VRAM while generating: roughly 2.5 GB of VRAM to generate 12,288 tokens.
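
Below is a minimal sketch of loading the model and generating a continuation with 🤗 Transformers. It assumes a recent transformers release with Jamba support; the sampling settings are illustrative, not values recommended by the author.

```python
# Minimal generation sketch; assumes a recent transformers release with Jamba
# support and (optionally) a GPU. Sampling settings below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Jamba-900M-doc-writer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps VRAM usage low
    device_map="auto",
)

prompt = "Introduction\n\nThe design of underwater cities poses unique challenges."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,  # the model can extend much further (e.g. 12k+ tokens)
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```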
## Model description
This model is a fine-tuned version of [pszemraj/jamba-900M-v0.13-KIx2](https://huggingface.co/pszemraj/jamba-900M-v0.13-KIx2) on some textbook data.

It achieves the following results on the evaluation set:
- Loss: 3.0200
- Accuracy: 0.4544
- Num Input Tokens Seen: 4940890112
## Intended Uses & Limitations
- Long-context generation.
- It requires a fairly long prompt (e.g., an "Introduction" section) to be coaxed into consistently producing long, textbook-like text; see the prompting sketch below.
- The model itself is small (hidden size 1024), so its reasoning, knowledge, etc. are limited, but still impressive for the size.
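
As a rough illustration of that kind of prompt, the sketch below reuses `model` and `tokenizer` from the loading snippet above and streams a long continuation from a multi-paragraph introduction. The prompt text, heading layout, and generation length are made up for illustration, not documented requirements of the model.

```python
# Illustrative textbook-style prompt; reuses `model` and `tokenizer` from the
# loading snippet above. The prompt content here is invented for illustration.
from transformers import TextStreamer

prompt = (
    "Underwater City Design\n\n"
    "Introduction\n\n"
    "Building habitable structures beneath the ocean surface requires balancing "
    "structural engineering, life-support systems, and long-term maintenance. "
    "This chapter surveys common pressure-hull geometries, material choices, and "
    "energy systems, and outlines the trade-offs that shape practical designs.\n\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(
    **inputs,
    max_new_tokens=4096,  # long continuations are the intended use case
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
    streamer=streamer,
)
```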
---