🤏 smolified-ocr-data-extract-and-compare

Intelligence, Distilled.

This is a Domain-Specific Language Model (DSLM) generated by the Smolify Foundry.

It was synthetically distilled from state-of-the-art (SOTA) reasoning models into a compact, high-efficiency architecture optimized for deployment on edge hardware (CPU/NPU) and in low-VRAM environments.

📦 Asset Details

  • Origin: Smolify Foundry (Job ID: 790dd5fa)
  • Architecture: DSLM-Micro (270M Parameter Class)
  • Training Method: Proprietary Neural Distillation
  • Optimization: 4-bit Quantized / FP16 Mixed
  • Dataset: Link to Dataset
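The low-VRAM claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage for a model of roughly 270M parameters (the class stated above) at several precisions. It is an assumption-laden estimate: it counts weights only and ignores activations, the KV cache, and the per-block scale metadata that real 4-bit formats add, so actual memory use will be somewhat higher.

```python
# Rough weight-memory estimate for a ~270M-parameter model.
# Assumption: weights only; activations, KV cache, and quantization
# metadata (scales / zero-points) are not counted.

PARAMS = 270_000_000  # "270M Parameter Class" from the asset details

BITS_PER_PARAM = {
    "fp32": 32,
    "bf16/fp16": 16,
    "int8": 8,
    "4-bit": 4,
}


def weight_megabytes(num_params: int, bits: int) -> float:
    """Return weight storage in decimal megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits / 8 / 1e6


if __name__ == "__main__":
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name:>10}: {weight_megabytes(PARAMS, bits):8.1f} MB")
```

At 16-bit precision the weights come to roughly 540 MB; at 4 bits, roughly 135 MB before quantization overhead, which is why the model fits comfortably on CPU-only or small-GPU machines.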

🚀 Usage (Inference)

This model is compatible with standard inference backends such as vLLM. The example below uses Hugging Face Transformers.

# Example: Running your Sovereign Model
# Requires Python 3.9+ (for str.removeprefix) and the `transformers` library.
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "titou4ng/smolified-ocr-data-extract-and-compare"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {'role': 'system', 'content': '''You are an OCR data extraction and comparison engine. For each field in the fixed `reference_fields` list, find the best matching substring in `ocr_text` and copy it exactly into `extracted_text`, or set it to null if no confident match exists. Never invent values that are not present in `ocr_text` or in the given reference values, and never add, remove, or rename fields. Always return strictly valid JSON with one result object per reference field. Always return exactly 12 fields.'''},
    {'role': 'user', 'content': '''ocr_text: "Date: 14/11/2023\nRef. Ticket: 90021\nSite D'origine: CARRIERES DE L'OUEST SARL Siret 19876543210000\nAdresse: 25 RUE DE LA ROCHE, 49000 ANGERS\nSite De Destination: CENTRALE BETON DU VAL DE LOIRE SAS 10293847560000\nAdresse: 12 CHEMIN DU MOULIN, 37000 TOURS\nType De Matériau: GRAVIERS\nPoids NET: 45.0 T"'''}
]

# Render the chat template to a string. The template emits <bos> and the
# tokenizer adds another when encoding, so strip it here to avoid a duplicate.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
).removeprefix('<bos>')

# Send inputs to the device that device_map="auto" chose (GPU, or CPU if none).
inputs = tokenizer(text, return_tensors="pt").to(model.device)
_ = model.generate(
    **inputs,
    max_new_tokens=1000,
    do_sample=True,  # sampling must be on for temperature/top_p/top_k to apply
    temperature=1.0,
    top_p=0.95,
    top_k=64,
    streamer=TextStreamer(tokenizer, skip_prompt=True),  # print tokens as they arrive
)
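The system prompt makes three checkable promises: the reply is strictly valid JSON, it contains exactly 12 result objects, and every `extracted_text` is either null or copied verbatim from `ocr_text`. A minimal post-generation check might look like the sketch below. The output schema is an assumption: this card does not document it, so the code assumes a top-level JSON array of objects that each carry an `extracted_text` key. `validate_extraction` and `_strip_code_fence` are hypothetical helpers (not part of the model or of Transformers), and stripping a surrounding Markdown fence is a defensive guess about small-model output, not documented behavior.

```python
import json
from typing import Any

EXPECTED_FIELDS = 12  # "Always return exactly 12 fields" (system prompt)
FENCE = "`" * 3       # a Markdown code fence, built here to keep this block readable


def _strip_code_fence(reply: str) -> str:
    """Defensively remove a surrounding Markdown code fence, if present."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line (e.g. "json")
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]         # drop the closing fence line
        text = "\n".join(lines)
    return text


def validate_extraction(reply: str, ocr_text: str) -> list[str]:
    """Return a list of problems found in the model reply (empty if none).

    Assumes the reply is a JSON array of objects with an "extracted_text" key.
    """
    try:
        results: Any = json.loads(_strip_code_fence(reply))
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    if not isinstance(results, list):
        return ["top-level JSON value is not an array"]

    problems = []
    if len(results) != EXPECTED_FIELDS:
        problems.append(f"expected {EXPECTED_FIELDS} results, got {len(results)}")

    for i, item in enumerate(results):
        if not isinstance(item, dict) or "extracted_text" not in item:
            problems.append(f"result {i}: missing 'extracted_text'")
            continue
        value = item["extracted_text"]
        # Values must be null or copied verbatim from the OCR text.
        if value is not None and (not isinstance(value, str) or value not in ocr_text):
            problems.append(f"result {i}: {value!r} does not appear in ocr_text")
    return problems
```

To validate a reply you need it as a string rather than streamed to stdout: keep the tensor returned by `model.generate` instead of discarding it, slice off the prompt tokens (`output[0][inputs["input_ids"].shape[1]:]`), and decode the rest with `tokenizer.decode(..., skip_special_tokens=True)`. An empty list from `validate_extraction` means the reply passed every check.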

βš–οΈ License & Ownership

This model's weights are a sovereign asset owned by titou4ng. Generated via Smolify.ai.

Safetensors · Model size: 0.3B params · Tensor type: BF16