llmware
/

bling-tiny-llama-npu-ov

Model card Files Files and versions

doberst commited on Mar 27, 2025

Commit

7817db2

·

verified ·

1 Parent(s): b85c3a8

Upload README.md

Files changed (1) hide show

README.md +31 -3

README.md CHANGED Viewed

@@ -1,3 +1,31 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+inference: false
+base_model: llmware/bling-tiny-llama-v0
+base_model_relation: quantized
+tags: [green, llmware-rag, p1, ov]
+---
+# bling-tiny-llama-npu-ov
+**bling-tiny-llama-npu-ov** is a very small, very fast fact-based question-answering model, designed for retrieval augmented generation (RAG) with complex business documents, quantized and packaged in OpenVino int4 for AI PCs using Intel NPU.
+This model is one of the smallest and fastest in the series.  For higher accuracy, look at larger models in the BLING/DRAGON series.
+### Model Description
+- **Developed by:** llmware
+- **Model type:** tinyllama
+- **Parameters:** 1.1 billion
+- **Quantization:** int4
+- **Model Parent:** [llmware/bling-tiny-llama-v0](https://www.huggingface.co/llmware/bling-tiny-llama-v0)
+- **Language(s) (NLP):** English
+- **License:** Apache 2.0
+- **Uses:** Fact-based question-answering, RAG
+- **RAG Benchmark Accuracy Score:** 86.5
+## Model Card Contact
+[llmware on github](https://www.github.com/llmware-ai/llmware)
+[llmware on hf](https://www.huggingface.co/llmware)
+[llmware website](https://www.llmware.ai)