Model Card for SHARE-4B

SHARE-4B (Social-Humanities AI for Research and Education) is a 3.9-billion-parameter decoder-only causal language model pretrained exclusively on content relevant to the social sciences and humanities (SSH). It is intended as a domain-specific base model for SSH research and education, and is designed to be used through the MIRROR interface, which surfaces token-level surprisal rather than generating new text.

This model was introduced in the paper SHARE: Social-Humanities AI for Research and Education.

Note: This is a fully trained base (pretrained-only) model with no SFT, DPO, or RLHF. Due to its smaller size, a quantized version of SHARE-4B can be deployed on local machines with only CPU compute (e.g., student laptops), making it significantly more accessible and carbon-efficient than larger comparable models. This base model is not suitable for chat applications.

Model Details

Model Description

SHARE-4B is part of the first family of causal language models fully pretrained by and for the SSH disciplines. It mirrors the Phi-4-mini architecture but uses a custom 50,000-token BPE tokenizer trained on the SHARE corpus, and is pretrained exclusively on a curated SSH dataset drawn from Wikipedia, Project Gutenberg, PeS2o, and (for the larger SHARE-14B) CORE. On a custom SSH Cloze benchmark, SHARE-4B achieves 69.8% raw accuracy and 66.2% prior-corrected accuracy, marginally outperforming the comparable Pythia-3B (63.6% prior-corrected) despite having seen far fewer training tokens.

  • Developed by: João Gonçalves, Sonia de Jager, Petr Knoth, David Pride, Nick Jelicic
  • Funded by: NVIDIA Academic Grant; Dutch Research Council (NWO) VENI grant VI.Veni.221S.154
  • Model type: Decoder-only transformer causal language model (Phi-4-mini architecture)
  • Language(s) (NLP): Primarily English, with a smaller proportion of Dutch
  • License: Custom Responsible AI License (RAIL-SHARE) — non-commercial, no model distillation, restricted text generation use

Uses

Direct Use

SHARE-4B is intended primarily as a base model deployed through the MIRROR interface for SSH researchers, educators, and students. Its smaller size makes it particularly suitable for local, low-resource deployments such as on student laptops. Through MIRROR, the model is used to compute token-level surprisal and entropy on user-written texts in order to:

  • Identify typos, stylistic anomalies, and possible factual mistakes in academic writing
  • Highlight innovative or unexpected contributions in scholarly texts
  • Surface disciplinary biases and norms encoded in SSH literature
  • Support reflective revision of student and scholarly writing in the SSH
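The surprisal values MIRROR displays can be computed from any causal LM's next-token logits. A minimal, model-agnostic sketch using NumPy on toy logits (the `toy_logits` values are illustrative, not SHARE outputs):

```python
import numpy as np

def surprisal(logits: np.ndarray, token_id: int) -> float:
    """Surprisal (in nats) of an observed token under next-token logits."""
    # Log-softmax computed with the max-shift trick for numerical stability
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[token_id])

# Toy vocabulary of 4 tokens; the model strongly expects token 0.
toy_logits = np.array([5.0, 1.0, 0.5, 0.1])
print(surprisal(toy_logits, 0))  # low surprisal: the expected token
print(surprisal(toy_logits, 2))  # high surprisal: an unexpected token
```

In MIRROR, high-surprisal tokens are surfaced to the writer as candidates for attention, whether that unexpectedness reflects an error or an innovation.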

Downstream Use

Potential downstream uses include perplexity-based analyses of SSH texts, domain-specific text classification, and research on the structure and biases of SSH scholarly discourse. Downstream use is governed by the RAIL-SHARE license (non-commercial; no distillation).

Out-of-Scope Use

  • Commercial applications of any kind (forbidden by license)
  • Model distillation into other models (forbidden by license)
  • Unconstrained text generation, especially in academic contexts where it could enable student or faculty fraud
  • STEM, biomedical, mathematical, or coding tasks — the model was deliberately not trained on these domains
  • Use as a chat assistant — the model is base-pretrained only, with no SFT or alignment
  • Multilingual applications outside of English and (to a lesser extent) Dutch
  • Any safety-critical decision-making

Bias, Risks, and Limitations

SHARE-4B inherits the systemic biases present in the open-access English-language SSH scholarship it was trained on. As illustrated in the paper, terms associated with non-Western scholarship (e.g. "African" in the context of locations of knowledge production) can register as unexpected, reflecting the field's existing imbalances rather than properties of the topics themselves.

Other limitations and risks:

  • Smaller parameter count: SHARE-4B flags fewer nuanced stylistic and factual deviations than SHARE-14B. For example, in qualitative testing the 4B model was less confident than the 14B model in identifying incorrect author attributions and more subtle stylistic issues
  • Smaller training corpus: SHARE-4B was trained on the Wikipedia, Project Gutenberg, and PeS2o subsets but excluded the CORE dataset, resulting in a ~14-billion-token corpus and a total of ~28 billion training tokens across 2 epochs
  • English-dominant data, which is a meaningful constraint for SSH fields where multilingual scholarship matters
  • Causal interpretation effect: because surprisal is computed on preceding tokens, an early mistake in a text propagates and can mask later anomalies
  • Use in text reading/reviewing could be misused to shortcut careful reading of academic work
  • No alignment or safety tuning has been applied — the model is released as a base model
  • Outperformed by smaller masked models on some tasks: on the SSH Cloze benchmark, SHARE-4B ranks below the much smaller SSciBERT model, showing that masked language models with tightly aligned training corpora can still outperform larger causal models in Cloze tasks

Recommendations

Users should treat MIRROR outputs as prompts for reflection rather than authoritative judgments. Surprisal does not equal correctness, and unexpectedness can signal innovation as readily as error. When using MIRROR for revision, work from the beginning of the text to mitigate the propagation of earlier surprisal into later tokens. For use cases requiring more nuanced detection of stylistic or factual deviations, users may prefer SHARE-14B; SHARE-4B is well-suited for local deployment and lower-resource contexts. Researchers should be aware of the model's biases toward dominant SSH discourses and read its outputs critically. Use of SHARE for direct text generation is discouraged.

Training Details

Training Data

The training corpus for SHARE-4B combines three SSH-focused subsets (the CORE dataset was added only for SHARE-14B):

  • Wikipedia (English and Dutch): articles selected by traversing the category tree from SSH-relevant main topic classifications using PetScan and extracted with WikiExtractor
  • Project Gutenberg: books filtered by SSH-relevant Library of Congress Classes (B, C, D, G, H, J, K, L, M, N)
  • Academic publications: drawn from PeS2o, filtered using AllenAI's Field of Science (FoS) classifier to retain SSH disciplines (Art, Business, Economics, Geography, Education, History, Law, Linguistics, Philosophy, Political Science, Psychology, Sociology), plus additional materials provided through agreements with publishers including Open Humanities Press

The SHARE-4B corpus totals approximately 14 billion tokens, each seen twice across the 2 training epochs (~28 billion training tokens in total). See the technical report for details on filtering and selection.

Training Procedure

Preprocessing

Raw data preprocessing was carried out exclusively on EU servers. A custom BPE tokenizer with a 50,000-token vocabulary was trained on the full SHARE corpus.

Training Hyperparameters

  • Training regime: Mixed precision with FlashAttention-2
  • Architecture: Phi-4-mini (decoder-only transformer)
  • Context length: 4096 tokens
  • Global batch size: 64
  • Warm-up steps: 3000
  • Learning rate: 2e-4 with cosine learning rate scheduler
  • Weight decay: 0.01
  • Epochs: 2
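The warm-up plus cosine schedule above can be sketched as a plain function (a sketch only: `total_steps` and the zero learning-rate floor are assumptions, as the card does not report the total step count or a minimum LR):

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4, warmup: int = 3000) -> float:
    """Linear warm-up to peak_lr over `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: a hypothetical 100k-step run
print(lr_at(1500, 100_000))    # mid warm-up: half the peak rate
print(lr_at(3000, 100_000))    # peak learning rate
print(lr_at(100_000, 100_000)) # end of schedule: ~0
```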

Speeds, Sizes, Times

Training was conducted on Saturn Cloud using 8× NVIDIA A100 GPUs with data parallelism, taking 656 hours to complete 2 epochs over approximately 28 billion tokens. Training loss, evaluation loss, and gradient norm values indicated a smooth training run, reaching a final evaluation perplexity of 11.94.

Evaluation

Testing Data, Factors & Metrics

Testing Data

  • Perplexity comparison: Erasmus University Rotterdam research output abstracts from Q3–Q4 2025, out of distribution with respect to the training data
  • SSH Cloze benchmark: 275 SSH abstracts published in Q1 2026 (25 per Web of Science field across 11 SSH disciplines), constructed by selecting sentences with equivalent-token decisions (e.g. positive/negative, higher/lower) where SSH knowledge is required to predict the correct token
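A Cloze item of this kind can be scored by comparing the model's log-probability for each candidate token and picking the more probable one. The sketch below uses hypothetical log-probabilities, and the scoring rule is an assumption about how such benchmarks are typically run, not the paper's exact evaluation harness:

```python
def score_cloze_item(candidate_logprobs: dict, gold: str) -> bool:
    """Pick the candidate the model finds most probable; correct if it matches gold."""
    prediction = max(candidate_logprobs, key=candidate_logprobs.get)
    return prediction == gold

# e.g. "Higher interest rates are associated with ___ inflation." (gold: "lower")
item = {"higher": -2.3, "lower": -0.4}
print(score_cloze_item(item, "lower"))  # True

items = [({"positive": -1.1, "negative": -0.9}, "positive"),
         ({"higher": -0.2, "lower": -2.0}, "higher")]
raw_accuracy = sum(score_cloze_item(lp, g) for lp, g in items) / len(items)
print(raw_accuracy)  # 0.5
```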

Factors

  • Scientific domain (FoS classifier categories)
  • Faculty affiliation of authors at Erasmus University Rotterdam (used as an ecological-validity check)

Metrics

  • Log-perplexity difference relative to Phi-4-mini (lower means better SHARE fit)
  • Raw and prior-corrected accuracy on the SSH Cloze benchmark (prior correction accounts for models guessing the more frequent token)
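The log-perplexity difference is straightforward to compute from mean per-token negative log-likelihoods (the NLL values below are illustrative, not measured):

```python
import math

def perplexity(mean_nll: float) -> float:
    """Perplexity is the exponential of the mean per-token negative log-likelihood."""
    return math.exp(mean_nll)

def log_ppl_diff(share_mean_nll: float, baseline_mean_nll: float) -> float:
    """log(ppl_SHARE) - log(ppl_baseline); negative means SHARE fits the text better.
    Since log(exp(nll)) == nll, this reduces to the difference of the mean NLLs."""
    return math.log(perplexity(share_mean_nll)) - math.log(perplexity(baseline_mean_nll))

# Hypothetical abstract where SHARE assigns a lower average NLL than the baseline
print(log_ppl_diff(2.2, 2.6))  # negative: better SHARE fit
```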

Results

On the SSH Cloze benchmark, SHARE-4B achieves 69.8% raw accuracy and 66.2% prior-corrected accuracy. This marginally outperforms the comparable Pythia-3B (65.8% / 63.6%) and Pythia-12B (67.3% / 61.5%) despite SHARE-4B being trained on substantially fewer tokens (~28B vs 300B). SHARE-4B underperforms Phi-4-mini (73.8% / 69.8%), which was trained on ~5 trillion tokens, but demonstrates meaningfully better compute efficiency. Notably, the much smaller SSciBERT-e2 (110M, masked LM) achieves a higher prior-corrected accuracy (67.6%), reflecting the strength of tightly domain-aligned masked models on Cloze tasks.

Perplexity analyses show that the gap between SHARE-4B and Phi-4-mini is consistently smaller for SSH fields (Art, Education, Sociology) than for STEM fields (Biology, Engineering, Medicine), indicating the intended SSH specialization. At the faculty level, the same pattern holds: Erasmus MC (medical) shows the largest gap, while SSH-focused faculties show the smallest.

Summary

SHARE-4B is a fully trained, compact SSH-specialized model that achieves an evaluation perplexity of 11.94 and competitive Cloze performance relative to similarly sized general-purpose causal models. Its smaller footprint makes it suitable for local, CPU-based deployment in educational contexts.

Model Examination

Early experiments with instruction-tuned variants suggest that, because the training data deliberately excludes domains such as cybersecurity, biological weapons, and CSAM, classical safety risks are limited; the model also tends to default to harm-reducing framings when prompted with SSH-relevant harmful queries. Memorization probes on the SHARE family indicate that the models do not reproduce copyrighted content, with the few instances of memorization corresponding only to disclaimers and standard headers.

Environmental Impact

  • Hardware Type: 8× NVIDIA A100 GPUs (Saturn Cloud)
  • Hours used: ~656 hours
  • Cloud Provider: Saturn Cloud
  • Compute Region: United States
  • Carbon Emitted: Estimated at approximately 1.2 metric tons of CO₂ equivalents, roughly the emissions of a one-way economy flight from Amsterdam to New York

The project applied Chinchilla scaling laws to budget compute and used efficiency techniques (mixed precision, FlashAttention-2, gradient checkpointing) to reduce energy use. A quantized version of SHARE-4B runs on CPU-only hardware such as student laptops, offering a significantly lower-carbon alternative to larger models for educational use.
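The ~1.2 t CO₂e figure is consistent with a back-of-the-envelope estimate in the style of common ML carbon calculators. The per-GPU draw, PUE, and grid intensity below are assumptions for illustration, not values reported by the authors:

```python
gpus = 8
hours = 656
gpu_power_kw = 0.4     # assumed average A100 draw (TDP is 400 W)
pue = 1.4              # assumed data-center power usage effectiveness
grid_kg_per_kwh = 0.4  # assumed US grid carbon intensity (kg CO2e per kWh)

energy_kwh = gpus * hours * gpu_power_kw * pue
co2_tonnes = energy_kwh * grid_kg_per_kwh / 1000
print(round(co2_tonnes, 2))  # ~1.18, close to the reported ~1.2 t
```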

Citation

BibTeX:

@misc{gonçalves2026sharesocialhumanitiesairesearch,
      title={SHARE: Social-Humanities AI for Research and Education}, 
      author={João Gonçalves and Sonia de Jager and Petr Knoth and David Pride and Nick Jelicic},
      year={2026},
      eprint={2604.11152},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.11152}, 
}

APA:

Gonçalves, J., de Jager, S., Knoth, P., Pride, D., & Jelicic, N. (2026). SHARE: Social-humanities AI for research and education. arXiv. https://arxiv.org/abs/2604.11152

Privacy statement

Personal data, such as author names, may be included in the training documents for SHARE; we use legitimate interest as the legal basis for processing this data under the EU's GDPR. The full privacy statement can be consulted here: https://surfdrive.surf.nl/s/gFnxgL6f5jer8yy

Glossary

  • SSH: Social Sciences and Humanities
  • MIRROR: Model Interface for Reflective Research Output Revisions — the user interface that displays per-token surprisal from SHARE rather than generating text
  • Surprisal: Negative log probability of an observed token under the model
  • Prior-corrected accuracy: Cloze accuracy adjusted to discount correct guesses arising from token frequency priors
  • FoS: Field of Science (AllenAI classifier used for disciplinary labelling)
  • RAIL: Responsible AI License

More Information

This model is released alongside SHARE-14B and the MIRROR interface as part of a technical report inviting feedback from the SSH and ML communities. SHARE-4B is particularly intended to enable local, low-resource deployment in educational settings.

Model Card Authors

João Gonçalves

Model Card Contact

ferreiragoncalves@eshcc.eur.nl
