HYDARIM7 committed on
Commit c4636e3 · verified · 1 Parent(s): 28a7f7f

Update README.md

Files changed (1)
  1. README.md +23 -8
README.md CHANGED
@@ -67,15 +67,28 @@ model_name = "InfocubeSrl/LexCube"
67
  tokenizer = AutoTokenizer.from_pretrained(model_name)
68
  model = AutoModelForMaskedLM.from_pretrained(model_name)
69
 
70
- text = "La legge [MASK] approvata dal parlamento."
71
- inputs = tokenizer(text, return_tensors="pt")
72
- outputs = model(**inputs)
73
 
74
- mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
75
- predicted_id = outputs.logits[0, mask_index].argmax()
76
- predicted_token = tokenizer.decode(predicted_id)
77
-
78
- print("Prediction:", predicted_token)
79
  ```
80
 
81
 
@@ -90,3 +103,5 @@ print("Prediction:", predicted_token)
90
  - Structured format with numbered provisions and cross-citations
91
  - Avg. length: ~909 words (≈2,193 tokens per document); some documents exceed 11k tokens
92
  - **Confidentiality:** Raw dataset cannot be shared due to contractual agreements, but it has been statistically and linguistically analyzed for research
 
 
 
67
  tokenizer = AutoTokenizer.from_pretrained(model_name)
68
  model = AutoModelForMaskedLM.from_pretrained(model_name)
69
 
70
+ # Examples with [MASK]
71
+ examples = [
72
+     "[MASK] il Decreto Legislativo 18 agosto 2000, n. 267 (Testo Unico delle leggi sull'ordinamento degli Enti Locali)",
73
+     "ACQUISITI, ai sensi dell'art. [MASK] del D.Lgs. 267/2000, i pareri favorevoli di regolarità tecnica e di regolarità contabile",
74
+     "Visto gli art. [MASK] e 42 del D.Lgs n.267/2000, Testo unico degli enti locali.",
75
+     "DI DICHIARARE la presente deliberazione immediatamente [MASK] ai sensi dell'art. 134, comma 4, del D.Lgs. n. 267/2000."
76
+ ]
77
+
78
+ for text in examples:
79
+     inputs = tokenizer(text, return_tensors="pt")
80
+     outputs = model(**inputs)
81
+
82
+     # Find mask token position
83
+     mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
84
+
85
+     # Get top prediction
86
+     predicted_id = outputs.logits[0, mask_index].argmax(dim=-1)
87
+     predicted_token = tokenizer.decode(predicted_id)
88
+
89
+     print(f"Input: {text}")
90
+     print(f"Prediction: {predicted_token}\n")
91
 
92
  ```
93
 
94
 
 
103
  - Structured format with numbered provisions and cross-citations
104
  - Avg. length: ~909 words (≈2,193 tokens per document); some documents exceed 11k tokens
105
  - **Confidentiality:** Raw dataset cannot be shared due to contractual agreements, but it has been statistically and linguistically analyzed for research
106
+
107
+
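The length figures in the dataset notes (~909 words, ≈2,193 tokens, some documents above 11k tokens) imply that many documents will not fit in a single forward pass. The sketch below only does the arithmetic: it derives the tokens-per-word ratio from the quoted averages and estimates how many windows a document would need. The 512-token window is an assumption, not a value taken from the model card, and `tokens_per_word` and `windows_needed` are hypothetical helper names; special tokens are ignored for simplicity.

```python
import math

# Figures quoted in the README's dataset notes.
AVG_WORDS = 909
AVG_TOKENS = 2193
LONG_DOC_TOKENS = 11_000  # "some documents exceed 11k tokens"

# Assumption: a 512-token context window. Check the model config for the real limit.
ASSUMED_MAX_TOKENS = 512


def tokens_per_word(avg_tokens: int, avg_words: int) -> float:
    """Average number of tokens produced per whitespace-delimited word."""
    return avg_tokens / avg_words


def windows_needed(n_tokens: int, max_tokens: int, stride: int = 0) -> int:
    """Windows of `max_tokens` needed to cover `n_tokens`,
    with `stride` tokens of overlap between consecutive windows."""
    if not 0 <= stride < max_tokens:
        raise ValueError("stride must satisfy 0 <= stride < max_tokens")
    if n_tokens <= max_tokens:
        return 1
    step = max_tokens - stride  # new tokens covered by each extra window
    return 1 + math.ceil((n_tokens - max_tokens) / step)


print(f"tokens per word: {tokens_per_word(AVG_TOKENS, AVG_WORDS):.2f}")
print("windows, average document:", windows_needed(AVG_TOKENS, ASSUMED_MAX_TOKENS))
print("windows, 11k-token document:", windows_needed(LONG_DOC_TOKENS, ASSUMED_MAX_TOKENS))
print("windows, 11k tokens, 128 overlap:",
      windows_needed(LONG_DOC_TOKENS, ASSUMED_MAX_TOKENS, stride=128))
```

Overlapping windows cost extra passes but keep cross-citations that straddle a boundary visible in at least one window; whether that trade-off is worthwhile depends on the downstream task.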