Text Classification
Scikit-learn
Joblib
English
intent-classification
logistic-regression
conference-talk-demo
How to use thinktecture/intent-logreg-nextera with Scikit-learn:
```python
from huggingface_hub import hf_hub_download
import joblib

# Only load pickle files from sources you trust.
# Read more about it here: https://skops.readthedocs.io/en/stable/persistence.html
model = joblib.load(
    hf_hub_download("thinktecture/intent-logreg-nextera", "sklearn_model.joblib")
)
```
Add MODEL_LICENSES.md (so model card can link relatively)
New file `MODEL_LICENSES.md` (+101 -0):
# Model Licenses: Read Before Redistributing

The Apache-2.0 [`LICENSE`](../LICENSE) at the repo root covers **this repo's code**.

It does **not** cover the model weights you download, train against, or merge into
GGUFs. Those follow their respective base-model licenses. If you fine-tune one of
these models and publish the result (e.g. push it to HuggingFace or ship it in a
product), the base-model terms come with it.


---

## Base models used by this pipeline

| Model | HF ID | License | Click-through required? |
|---|---|---|---|
| Gemma 3 (1B, 4B, EmbeddingGemma 300M) | `google/gemma-3-*`, `google/embeddinggemma-300m` | [Gemma Terms of Use](https://ai.google.dev/gemma/terms) | **Yes, once per HF account** |
| Qwen 3.5-4B | `Qwen/Qwen3.5-4B` | [Tongyi Qianwen License Agreement](https://huggingface.co/Qwen/Qwen3.5-4B/blob/main/LICENSE) | No |
| GLM-OCR (optional, for the OCR upload path) | `zai-org/GLM-OCR` | [MIT](https://huggingface.co/zai-org/GLM-OCR) (check the current README) | No |

> The `ggml-org/*-GGUF` repos referenced in `setup.sh` are official llama.cpp
> quantizations of the upstream models above, so the same license terms apply.
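
If tooling needs to record which license governs each base model, the table can be mirrored as data. The following is a minimal sketch, not part of this repo. `BASE_MODEL_LICENSES` and `license_for()` are hypothetical names, and the patterns come from the HF ID column above. Unknown IDs return `None` rather than a guess, so you have to check any unlisted model by hand.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical lookup table mirroring the "Base models" table.
# Keys are shell-style patterns over HF repo IDs; values are license names.
BASE_MODEL_LICENSES = {
    "google/gemma-3-*": "Gemma Terms of Use",
    "google/embeddinggemma-300m": "Gemma Terms of Use",
    "Qwen/Qwen3.5-4B": "Tongyi Qianwen License Agreement",
    "zai-org/GLM-OCR": "MIT",  # re-check the current GLM-OCR README
}


def license_for(hf_id: str) -> Optional[str]:
    """Return the license name for an upstream HF repo ID, or None if unknown.

    ggml-org/*-GGUF quantizations inherit the upstream license, so map them
    to their upstream repo ID before calling this.
    """
    for pattern, license_name in BASE_MODEL_LICENSES.items():
        # fnmatchcase is case-sensitive on every OS, unlike fnmatch
        if fnmatchcase(hf_id, pattern):
            return license_name
    return None


if __name__ == "__main__":
    print(license_for("google/gemma-3-1b-it"))
```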

---

## What "derivative work" means here

When you run `python -m finetune.train_gemma3` and produce
`models/gemma3-1b-ft-merged/gemma3-ft-<scenario>.gguf`, that file is a **derivative
work of Gemma 3**. Anyone who downloads it must accept the Gemma Terms of Use,
just as you did when you first pulled the base weights from HuggingFace.

Practical implications:

- **Inside this repo only**: nothing to do. The base weights (`models/*.gguf`)
  and the fine-tuned outputs are gitignored, and the training data is
  scenario-specific synthetic content owned by you.
- **Publishing a fine-tuned GGUF**: include the Gemma / Qwen license file in
  the release, name the base model in the model card, and follow the upstream
  attribution requirements.
- **Building a product on top**: the model is "used", not redistributed, which
  most permissive license terms allow. Read the actual license; this is
  not legal advice.
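
One way to name the base model in a model card is through the card's YAML metadata. The following is a minimal sketch that assumes the Hugging Face Hub's `base_model`, `license`, `license_name`, and `license_link` metadata fields. `model_card_header()` is a hypothetical helper. The `license_name` value in the example is an illustrative slug, not an official identifier, so check the Hub's model card documentation before publishing.

```python
def model_card_header(base_model: str, license_name: str, license_link: str) -> str:
    """Build YAML front matter plus an attribution line for a README.md model card.

    Hypothetical helper: uses the Hub's generic `license: other` form, which
    pairs a short `license_name` with a `license_link` to the full terms.
    """
    lines = [
        "---",
        f"base_model: {base_model}",
        "license: other",
        f"license_name: {license_name}",
        f"license_link: {license_link}",
        "---",
        "",
        f"Fine-tuned from [{base_model}](https://huggingface.co/{base_model}).",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(model_card_header(
        base_model="google/gemma-3-1b-it",
        license_name="gemma-terms-of-use",  # illustrative slug, not an official ID
        license_link="https://ai.google.dev/gemma/terms",
    ))
```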

---

## Gemma 3: quick reference

The [Gemma Terms of Use](https://ai.google.dev/gemma/terms) (which you accept
when you click "Agree" on the HuggingFace page) permit:

- Commercial use, distribution, modification, and fine-tuning
- Creating derivative models (including merged and quantized GGUFs)

They also require:

- Including the prohibited-use policy in any redistribution
- Marking outputs from Gemma as AI-generated when relevant
- Including the Terms with any redistribution

There's **no patent grant** in the Gemma Terms (unlike Apache-2.0). For most
AI-application scenarios this is benign, but it is the main thing that
distinguishes a "Gemma derivative" from an "Apache-2.0 derivative."

---

## Qwen 3.5: quick reference

The [Tongyi Qianwen License](https://huggingface.co/Qwen/Qwen3.5-4B/blob/main/LICENSE)
permits commercial use up to a monthly active user (MAU) threshold. Re-check the
current text, because Alibaba has revised this threshold between Qwen versions.
Above the threshold, a separate commercial license is required.

For typical local-AI / on-prem deployments well under that MAU threshold, the license
behaves much like Apache-2.0. Attribution to Qwen / Alibaba Cloud is
required in any distribution.

---

## Your fine-tuned outputs

The training data, prompts, and scenario configurations in this repo are Apache-2.0
licensed (per the repo `LICENSE`). The **merged GGUF model files** that
the training pipeline produces:

- are derivatives of the base model, covered by the **Gemma Terms / Qwen License**
- **and** contain weights influenced by your training data, covered by **your own license**

In practical terms, when you publish a fine-tuned GGUF you should ship:

1. The base-model license (Gemma Terms or Qwen License) alongside the GGUF
2. A model card describing your training data and intended use (you can use
   the existing `data/training-data/*.jsonl` as the data card)
3. Your own license for the training-data-derived artifacts (the default is
   Apache-2.0, inherited from this repo's `LICENSE`)
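
You can check the three items above mechanically before a release. The following is a minimal sketch that assumes a release directory with hypothetical file names (`LICENSE-base-model`, `README.md`, `LICENSE`); adjust them to your own layout. `missing_release_items()` is an illustrative helper, not part of this repo. The check only confirms that the files exist and that the card mentions the base model. It does not verify the license texts themselves.

```python
import tempfile
from pathlib import Path
from typing import List

# Hypothetical file names for the three items to ship; adjust to your release.
REQUIRED_FILES = {
    "base-model license": "LICENSE-base-model",
    "model card": "README.md",
    "training-data license": "LICENSE",
}


def missing_release_items(release_dir: Path, base_model_id: str) -> List[str]:
    """List required release items that are absent from release_dir."""
    missing = [
        item
        for item, filename in REQUIRED_FILES.items()
        if not (release_dir / filename).is_file()
    ]
    # The model card should also name the base model it was fine-tuned from.
    card = release_dir / REQUIRED_FILES["model card"]
    if card.is_file() and base_model_id not in card.read_text(encoding="utf-8"):
        missing.append("base model named in model card")
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        release = Path(tmp)
        (release / "README.md").write_text(
            "Fine-tuned from google/gemma-3-1b-it\n", encoding="utf-8"
        )
        # Prints the two license files that are still missing.
        print(missing_release_items(release, "google/gemma-3-1b-it"))
```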

---

## Not legal advice

This is a one-page operator's summary. For anything material (commercial release,
public model card, enterprise deployment), consult an actual lawyer who reads
AI licenses for a living. The Gemma and Qwen license texts have evolved
multiple times, and the obligations may have changed since this document was
written.