lenamerkli committed (verified)
Commit 72f02e1 · 1 parent: c54e1e9

Upload 9 files
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ politscanner.pdf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-1.7B-PolitScanner-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
Qwen3-1.7B-PolitScanner-Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:45b90f2ed2ee072908a7d8a61f0c5a0a6ebc1e1793c72d70887c290cad396e15
+ size 1230579264
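The three added lines are a Git LFS pointer: the actual model weights live in LFS storage, and the repository tracks only this small stub. A sketch of reading such a pointer (the helper below is illustrative, not part of this upload):

```python
def parse_lfs_pointer(text: str) -> dict:
    # each pointer line has the form 'key value'; split on the first space only
    return dict(line.split(' ', 1) for line in text.strip().splitlines())

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:45b90f2ed2ee072908a7d8a61f0c5a0a6ebc1e1793c72d70887c290cad396e15
size 1230579264
"""
meta = parse_lfs_pointer(pointer)
print(meta['size'])  # 1230579264
```

The `size` field is in bytes, so the quantized model weighs in at roughly 1.23 GB.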
README.md CHANGED
@@ -1,3 +1,142 @@
- ---
- license: mit
- ---
+ # PolitScanner
+
+ ## Abstract
+
+ Swiss politicians lie.
+ And they mostly get away with it.
+ "One reason for this is that fact-checks, which can only be carried out retrospectively, are surprisingly ineffective. Listeners still remember the false information. The correction is forgotten." — Philipp Gerlach.
+ This technical report provides a comprehensive overview of the artificial intelligence components of the PolitScanner project.
+ It aims to automatically detect false narratives and fake news in the speeches of Swiss politicians while avoiding the inaccuracies inherent in Large Language Models.
+
+ The training code can be found on [GitHub](https://github.com/lenamerkli/PolitScanner).
+
+ This is an entry for the Swiss AI Challenge 2025.
+ More information can be found at [www.ki-challenge.ch](https://www.ki-challenge.ch/).
+
+ ## Table of Contents
+
+ 0. [Abstract](#abstract)
+ 1. [Paper](#paper)
+ 2. [Installation](#installation)
+ 3. [Usage](#usage)
+ 4. [License](#license)
+ 5. [Citation](#citation)
+
+ ## Paper
+
+ Read the full paper [here](https://huggingface.co/lenamerkli/PolitScanner/blob/main/PolitScanner.pdf).
+
+ ## Installation
+
+ **Note: inference only.**
+
+ This installation guide is written for Nobara Linux; other distributions should work as well.
+
+ For the full installation guide, see [the development readme](https://github.com/lenamerkli/PolitScanner/blob/main/README.md).
+
+ ### Increase memlock
+
+ Add (or update) the following lines in `/etc/security/limits.conf`:
+ ```text
+ * soft memlock 50331648
+ * hard memlock 50331648
+ ```
+
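As a sanity check on the value above: `pam_limits` interprets `memlock` values in KiB, so 50331648 corresponds to about 48 GiB of lockable memory, comfortably more than the ~1.2 GB model (a quick check, assuming the KiB unit):

```python
# limits.conf memlock values are interpreted in KiB by pam_limits
memlock_kib = 50331648
memlock_gib = memlock_kib / (1024 * 1024)  # KiB -> GiB
print(memlock_gib)  # 48.0
```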
+ ### Git
+
+ Install git:
+
+ ```shell
+ sudo dnf install git
+ ```
+
+ Clone the PolitScanner repository:
+
+ ```shell
+ git clone https://huggingface.co/lenamerkli/PolitScanner
+ cd PolitScanner
+ ```
+
+ ### Python
+
+ Install Python 3.12.10 with the following command:
+
+ ```shell
+ sudo dnf install python3.12-0:3.12.10-1.fc41.x86_64
+ ```
+
+ Install the Python virtual environment package:
+
+ ```shell
+ sudo dnf install python3-virtualenv
+ ```
+
+ Create the virtual environment:
+
+ ```shell
+ ./create_venv.sh
+ ```
+
+ Activate the virtual environment:
+
+ ```shell
+ source .venv/bin/activate
+ ```
+
+ ### llama.cpp
+
+ If llama.cpp is not installed, check the [development readme](https://github.com/lenamerkli/PolitScanner/blob/main/README.md) for instructions.
+
+ ### Download models
+
+ Run the downloader:
+
+ ```shell
+ python3 download_ggufs.py
+ ```
+
+ Move the PolitScanner model:
+
+ ```shell
+ mv ./Qwen3-1.7B-PolitScanner-Q5_K_S.gguf /opt/llms/Qwen3-1.7B-PolitScanner-Q5_K_S.gguf
+ ```
+
+ ## Usage
+
+ Copy the political speech (preferably in Swiss High German) into the `input.txt` file.
+
+ Run the program:
+
+ ```shell
+ python3 main.py
+ ```
+
+ The output will be written to the `output.txt` file.
+
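The file-based interface described above can be mimicked in a few lines; `analyze` below is a stand-in for the model call that `main.py` performs (hypothetical, for illustration only):

```python
from pathlib import Path

def analyze(speech: str) -> str:
    # stand-in for the PolitScanner model pipeline in main.py
    return f"analyzed {len(speech.split())} words"

# same convention as main.py: read input.txt, write output.txt
Path('input.txt').write_text('Beispielrede zum Klima', encoding='utf-8')
speech = Path('input.txt').read_text(encoding='utf-8')
Path('output.txt').write_text(analyze(speech), encoding='utf-8')
print(Path('output.txt').read_text(encoding='utf-8'))  # analyzed 3 words
```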
+ ## License
+
+ [MIT License](https://github.com/lenamerkli/PolitScanner/blob/main/LICENSE)
+
+ ## Citation
+
+ BibTeX:
+ ```bibtex
+ @misc{merkli2025politscanner,
+     title = {PolitScanner: Automatic Detection of Common Incorrect Statements in Speeches of Swiss Politicians},
+     author = {Lena Merkli},
+     year = {2025},
+     month = {07},
+     url = {https://huggingface.co/lenamerkli/PolitScanner}
+ }
+ ```
+ BibLaTeX:
+ ```biblatex
+ @online{merkli2025politscanner,
+     title = {PolitScanner: Automatic Detection of Common Incorrect Statements in Speeches of Swiss Politicians},
+     author = {Lena Merkli},
+     year = {2025},
+     month = {07},
+     url = {https://huggingface.co/lenamerkli/PolitScanner}
+ }
+ ```
create_venv.sh ADDED
@@ -0,0 +1,7 @@
+ if [ ! -d .venv ]; then
+     /usr/bin/python3.12 -m venv .venv
+ fi
+
+ # install packages
+ .venv/bin/pip3 install --trusted-host pypi.org --trusted-host files.pythonhosted.org --upgrade pip
+ .venv/bin/pip3 install wheel==0.46.1 setuptools==79.0.0 flask==3.1.0 requests==2.32.3 tqdm==4.67.1 chromadb==1.0.7 certifi==2025.6.15
download_ggufs.py ADDED
@@ -0,0 +1,53 @@
+ import sys
+ sys.path.append('/home/lena/Documents/python/PolitScanner/util')
+
+ from pathlib import Path
+ from os.path import join
+ from requests import get
+ from shutil import copyfile
+
+
+ MODELS = {
+     'DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf': 'https://huggingface.co/bartowski/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-GGUF/resolve/main/deepseek-ai_DeepSeek-R1-0528-Qwen3-8B-Q5_K_L.gguf',
+     'Ministral-8B-Instruct-2410-Q4_K_S.gguf': 'https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-GGUF/resolve/main/Ministral-8B-Instruct-2410-Q4_K_S.gguf',
+     'Qwen3-30B-A3B-Q5_K_M.gguf': 'https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF/resolve/main/Qwen_Qwen3-30B-A3B-Q5_K_M.gguf',
+     'Qwen3-8B-Q5_K_M.gguf': 'https://huggingface.co/bartowski/Qwen_Qwen3-8B-GGUF/resolve/main/Qwen_Qwen3-8B-Q5_K_M.gguf',
+     'Qwen3-32B-Q4_K_S.gguf': 'https://huggingface.co/unsloth/Qwen3-32B-GGUF/resolve/main/Qwen3-32B-Q4_K_S.gguf',
+ }
+
+
+ def download_file(url: str, directory: str, filename: str | None = None) -> str:
+     # stream the download in 4 MiB chunks so large GGUF files never sit in memory whole
+     Path(directory).mkdir(parents=True, exist_ok=True)
+     if filename is None:
+         filename = url.split('/')[-1].split('?')[0]
+     filepath = join(directory, filename)
+     with get(url, stream=True) as r:
+         r.raise_for_status()
+         with open(filepath, 'wb') as f:
+             for chunk in r.iter_content(chunk_size=4 * 1024 * 1024):
+                 f.write(chunk)
+     return filepath
+
+
+ def main() -> None:
+     Path('/opt/llms').mkdir(exist_ok=True, parents=True)
+     question = 'Which of the following models do you want to download?'
+     for i, model in enumerate(MODELS.keys()):
+         question += f"\n[{i + 1}] {model}"
+     response = input(question + '\n').replace(' ', '').replace(';', ',')
+     # drop empty entries so an empty reply or trailing comma does not crash int()
+     parsed = [int(x) for x in response.split(',') if x]
+     if len(parsed) > 0:
+         copyfile('./index.json', '/opt/llms/index.json')
+     for i in parsed:
+         model = list(MODELS.keys())[i - 1]
+         url = MODELS[model]
+         print(f"Downloading {model}")
+         download_file(url, '/opt/llms/', model)
+         print(f"Downloaded {model}")
+     print('Done')
+
+
+ if __name__ == '__main__':
+     main()
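The downloader's selection prompt accepts comma- or semicolon-separated indices with optional spaces. Its normalization step can be isolated as follows (a sketch; filtering out empty entries keeps a blank reply from crashing `int()`):

```python
def parse_selection(response: str) -> list[int]:
    # normalize separators, strip spaces, ignore empty entries
    cleaned = response.replace(' ', '').replace(';', ',')
    return [int(x) for x in cleaned.split(',') if x]

print(parse_selection('1, 3; 5'))  # [1, 3, 5]
print(parse_selection(''))         # []
```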
index.json ADDED
@@ -0,0 +1,116 @@
+ {
+ "Qwen3-30B-A3B": {
+ "parameters": 30532122624,
+ "context": 32768,
+ "layers": 48,
+ "thinking": true,
+ "optional_thinking": true,
+ "system_message": "",
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = 
content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+ "sampling": {
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "top_k": 20,
+ "min_p": 0
+ },
+ "sampling_thinking": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
+ "min_p": 0
+ }
+ },
+ "Qwen3-1.7B": {
+ "parameters": 2031739904,
+ "context": 32768,
+ "layers": 28,
+ "thinking": true,
+ "optional_thinking": true,
+ "system_message": "",
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = 
content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+ "sampling": {
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "top_k": 20,
+ "min_p": 0
+ },
+ "sampling_thinking": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
+ "min_p": 0
+ }
+ },
+ "Qwen3-8B": {
+ "parameters": 8190735360,
+ "context": 32768,
+ "layers": 36,
+ "thinking": true,
+ "optional_thinking": true,
+ "system_message": "",
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if message.content is string %}\n {%- set content = message.content %}\n {%- else %}\n {%- set content = '' %}\n {%- endif %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is string %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in content %}\n {%- set reasoning_content = 
content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- set content = content.split('</think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+ "sampling": {
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "top_k": 20,
+ "min_p": 0
+ },
+ "sampling_thinking": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
+ "min_p": 0
+ }
+ },
+ "Qwen3-32B": {
+ "parameters": 32762123264,
+ "context": 32768,
+ "layers": 64,
+ "thinking": true,
+ "optional_thinking": true,
+ "system_message": "",
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0].role == 'system' %}\n {{- messages[0].content + '\\n\\n' }}\n {%- endif %}\n {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0].role == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}\n{%- for message in messages[::-1] %}\n {%- set index = (messages|length - 1) - loop.index0 %}\n {%- if ns.multi_step_tool and message.role == \"user\" and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}\n {%- set ns.multi_step_tool = false %}\n {%- set ns.last_query_index = index %}\n {%- endif %}\n{%- endfor %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {%- set content = message.content %}\n {%- set reasoning_content = '' %}\n {%- if message.reasoning_content is defined and message.reasoning_content is not none %}\n {%- set reasoning_content = message.reasoning_content %}\n {%- else %}\n {%- if '</think>' in message.content %}\n {%- set content = message.content.split('</think>')[-1].lstrip('\\n') %}\n {%- set reasoning_content = 
message.content.split('</think>')[0].rstrip('\\n').split('<think>')[-1].lstrip('\\n') %}\n {%- endif %}\n {%- endif %}\n {%- if loop.index0 > ns.last_query_index %}\n {%- if loop.last or (not loop.last and reasoning_content) %}\n {{- '<|im_start|>' + message.role + '\\n<think>\\n' + reasoning_content.strip('\\n') + '\\n</think>\\n\\n' + content.lstrip('\\n') }}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- else %}\n {{- '<|im_start|>' + message.role + '\\n' + content }}\n {%- endif %}\n {%- if message.tool_calls %}\n {%- for tool_call in message.tool_calls %}\n {%- if (loop.first and content) or (not loop.first) %}\n {{- '\\n' }}\n {%- endif %}\n {%- if tool_call.function %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {%- if tool_call.arguments is string %}\n {{- tool_call.arguments }}\n {%- else %}\n {{- tool_call.arguments | tojson }}\n {%- endif %}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {%- endif %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n {%- if enable_thinking is defined and enable_thinking is false %}\n {{- '<think>\\n\\n</think>\\n\\n' }}\n {%- endif %}\n{%- endif %}",
+ "sampling": {
+ "temperature": 0.7,
+ "top_p": 0.8,
+ "top_k": 20,
+ "min_p": 0
+ },
+ "sampling_thinking": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
+ "min_p": 0
+ }
+ },
+ "Deepseek-R1-0528-Qwen3-8B": {
+ "parameters": 8190735360,
+ "context": 32768,
+ "layers": 36,
+ "thinking": true,
+ "optional_thinking": false,
+ "system_message": "该助手为DeepSeek-R1,由深度求索公司创造。",
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true, is_last_user=false) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + '\n\n' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{ bos_token }}{{ ns.system_prompt }}{%- for message in messages %}{% set content = message['content'] %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{%- set ns.is_first = false -%}{%- set ns.is_last_user = true -%}{{'<|User|>' + content + '<|Assistant|>'}}{%- endif %}{%- if message['role'] == 'assistant' %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{% endif %}{%- if message['role'] == 'assistant' and message['tool_calls'] is defined and message['tool_calls'] is not none %}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{%- endif %}{%- set ns.is_first = false %}{%- set ns.is_tool = false -%}{%- set ns.is_output_first = true %}{%- for tool in message['tool_calls'] %}{%- if not ns.is_first %}{%- if content is none %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- else %}{{content + '<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + '\n' + '```json' + '\n' + 
tool['function']['arguments'] + '\n' + '```' + '<|tool▁call▁end|>'}}{%- endif %}{%- endfor %}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{%- endif %}{%- if message['role'] == 'assistant' and (message['tool_calls'] is not defined or message['tool_calls'] is none)%}{%- set ns.is_last_user = false -%}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>' + content + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{%- else %}{{content + '<|end▁of▁sentence|>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_last_user = false -%}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<|tool▁output▁begin|>' + content + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_last_user and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}",
+ "sampling_thinking": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "top_k": 20,
+ "min_p": 0
+ }
+ },
+ "Ministral-8B-Instruct-2410": {
+ "parameters": 8019808256,
+ "context": 32768,
+ "layers": 26,
+ "thinking": false,
+ "optional_thinking": false,
+ "system_message": "",
+ "chat_template": "{%- if messages[0][\"role\"] == \"system\" %}\n {%- set system_message = messages[0][\"content\"] %}\n {%- set loop_messages = messages[1:] %}\n{%- else %}\n {%- set loop_messages = messages %}\n{%- endif %}\n{%- if not tools is defined %}\n {%- set tools = none %}\n{%- endif %}\n{%- set user_messages = loop_messages | selectattr(\"role\", \"equalto\", \"user\") | list %}\n\n{#- This block checks for alternating user/assistant messages, skipping tool calling messages #}\n{%- set ns = namespace() %}\n{%- set ns.index = 0 %}\n{%- for message in loop_messages %}\n {%- if not (message.role == \"tool\" or message.role == \"tool_results\" or (message.tool_calls is defined and message.tool_calls is not none)) %}\n {%- if (message[\"role\"] == \"user\") != (ns.index % 2 == 0) %}\n {{- raise_exception(\"After the optional system message, conversation roles must alternate user/assistant/user/assistant/...\") }}\n {%- endif %}\n {%- set ns.index = ns.index + 1 %}\n {%- endif %}\n{%- endfor %}\n\n{{- bos_token }}\n{%- for message in loop_messages %}\n {%- if message[\"role\"] == \"user\" %}\n {%- if tools is not none and (message == user_messages[-1]) %}\n {{- \"[AVAILABLE_TOOLS][\" }}\n {%- for tool in tools %}\n {%- set tool = tool.function %}\n {{- '{\"type\": \"function\", \"function\": {' }}\n {%- for key, val in tool.items() if key != \"return\" %}\n {%- if val is string %}\n {{- '\"' + key + '\": \"' + val + '\"' }}\n {%- else %}\n {{- '\"' + key + '\": ' + val|tojson }}\n {%- endif %}\n {%- if not loop.last %}\n {{- \", \" }}\n {%- endif %}\n {%- endfor %}\n {{- \"}}\" }}\n {%- if not loop.last %}\n {{- \", \" }}\n {%- else %}\n {{- \"]\" }}\n {%- endif %}\n {%- endfor %}\n {{- \"[/AVAILABLE_TOOLS]\" }}\n {%- endif %}\n {%- if loop.last and system_message is defined %}\n {{- \"[INST]\" + system_message + \"\\n\\n\" + message[\"content\"] + \"[/INST]\" }}\n {%- else %}\n {{- \"[INST]\" + message[\"content\"] + \"[/INST]\" }}\n {%- endif %}\n {%- elif 
(message.tool_calls is defined and message.tool_calls is not none) %}\n {{- \"[TOOL_CALLS][\" }}\n {%- for tool_call in message.tool_calls %}\n {%- set out = tool_call.function|tojson %}\n {{- out[:-1] }}\n {%- if not tool_call.id is defined or tool_call.id|length != 9 %}\n {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n {%- endif %}\n {{- ', \"id\": \"' + tool_call.id + '\"}' }}\n {%- if not loop.last %}\n {{- \", \" }}\n {%- else %}\n {{- \"]\" + eos_token }}\n {%- endif %}\n {%- endfor %}\n {%- elif message[\"role\"] == \"assistant\" %}\n {{- message[\"content\"] + eos_token}}\n {%- elif message[\"role\"] == \"tool_results\" or message[\"role\"] == \"tool\" %}\n {%- if message.content is defined and message.content.content is defined %}\n {%- set content = message.content.content %}\n {%- else %}\n {%- set content = message.content %}\n {%- endif %}\n {{- '[TOOL_RESULTS]{\"content\": ' + content|string + \", \" }}\n {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}\n {{- raise_exception(\"Tool call IDs should be alphanumeric strings with length 9!\") }}\n {%- endif %}\n {{- '\"call_id\": \"' + message.tool_call_id + '\"}[/TOOL_RESULTS]' }}\n {%- else %}\n {{- raise_exception(\"Only user and assistant roles are supported, with the exception of an initial optional system message!\") }}\n {%- endif %}\n{%- endfor %}\n",
+ "sampling": {
+ "temperature": 0.5,
+ "top_p": 0.9,
+ "top_k": 20,
+ "min_p": 0
+ }
+ }
+ }
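`index.json` gives each model a regular sampling profile and, where thinking mode exists, a separate `sampling_thinking` profile. A consumer might select between them like this (a sketch over a trimmed excerpt of the index; the full file also carries chat templates, parameter counts, and layer counts):

```python
import json

# minimal excerpt of index.json for illustration
index = json.loads('''{
  "Qwen3-1.7B": {
    "thinking": true,
    "optional_thinking": true,
    "sampling": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0},
    "sampling_thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0}
  }
}''')

def sampling_for(model: str, thinking: bool) -> dict:
    # fall back to the regular profile when no thinking profile exists
    entry = index[model]
    key = 'sampling_thinking' if thinking and 'sampling_thinking' in entry else 'sampling'
    return entry[key]

print(sampling_for('Qwen3-1.7B', thinking=True)['temperature'])   # 0.6
print(sampling_for('Qwen3-1.7B', thinking=False)['temperature'])  # 0.7
```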
input_example.txt ADDED
@@ -0,0 +1 @@
+ Jetzt ist ja Klima ein grosses Thema. Sie haben sich bisher nicht gross geäussert. Jetzt ist eine Strategie in Zürich rausgekommen, man will die rot-grüne Klimapanik bekämpfen. Wie stehen Sie jetzt zu diesem Thema? Wie stehen Sie zu diesen Demonstrationen von all diesen Kindern und Schülern, die wöchentlich auf die Strasse gehen? Die, die auf die Strasse gehen und sagen, wir sind für ein gesundes Klima, da kann ich niemandem dagegen sein. Das ist ja schön, wenn sie das machen. Aber was dahinter steht, politisch bei den Grünen, das sind Sachen, die ganz verwerflich sind. Die wollen jetzt das lösen, mehr Eingriffe vom Staat, mehr Steuern, Abgaben, Gebühren, 20 ApB fürs Benzin, Heizkosten bis zu 1400 Franken pro Haushalt. Dann, was wir sollen essen und nicht sollen essen und wie wir sollen essen und wie wir sollen leben und wohnen und wie gross die Wohnungen sind, das hört nicht mehr auf. Gegen das sind wir massiv. Das ist ein Eingriff in die Freiheit und bringt dem Klima schlussendlich gar nichts. Was würde Ihrer Meinung nach etwas bringen, um den Klimawandel zu stoppen? Weiterfahren mit dem Programm, das wir jetzt haben. Wir müssen mal schauen, was wir schon alles haben. Wir haben die ganze Gewässerverschmutzung unter Kontrolle. Wir haben saubere Gewässer, sagen mir die Fischer, oder? Weil wir sogar das Meteowasser reinigen. Wir haben Rauchgasreinigung. Wir haben beim Auto Abgasvorschriften eingeführt. Wir gehen noch weiter mit dem. Muss ja technologisch möglich sein. Und dann die Innovation nicht vom Staat fördern, sondern schauen, dass die Privaten etwas machen. Wer hat ein Elektroauto? Wer hat ein Hybrid gemacht? Nicht der Staat und auch nicht die Grünen, sondern die Autoindustrie, die Wirtschaft hat das entwickelt. Kommt mit dem. Ich höre, sie arbeiten schon an den Flugzeugen. Sie wollen abgasfreie Flugzeuge. Auf der Lärmseite höre ich, das sind doch Massnahmen, die wir treffen müssen. 
Wir treffen sie nicht, weil das Klima sonst kaputt geht, sondern weil wir saubere Luft, reines Wasser, gesunden Boden wollen. Das war immer das Programm und bei dem müssen wir bleiben. Und dann wird es gut kommen. Kann man denn mit einer Klima-Offensive bei der SVP rechnen? Weil bisher hat man eher nur gehört, auch gerade von Herr Köppel, die Politik der anderen, die Klimapolitik der anderen, lehnen wir ab. Aber einen Vorschlag von ihnen haben wir noch nicht. Wir brauchen keine. Wir machen schon alles. Das habe ich gerade eben gesagt. Das müssen wir weiterführen. Das ist aber theoretisch nichts machen. Diese Strategie hat ja nicht wirklich ... Nichts falsches machen. Sie möchten gerne, dass wir die falschen Massnahmen machen. Jetzt wollen wir mehr Lenkungsabgaben. Jetzt wollen wir mehr für das Benzin. Das haben die in den Städten, die das Tram vor der Tür haben, die können das gut sagen. Die Leute der Agglomeration, die in die Stadt möchten, wo Züge und alles verstopft ist. Bei der Zuwanderung, die der Hauptgrund ist, machen die nichts. Da sind wir in der Offensive. Jetzt kommt ja die Begrenzungsinitiative. von Nau.ch
main.py ADDED
@@ -0,0 +1,145 @@
+ import sys
+ from pathlib import Path
+ sys.path.append(str(Path(__file__).resolve().parent / 'util'))
+ sys.path.append(str(Path(__file__).resolve().parent / 'sentence_splitter'))
+
+ import chromadb
+ from util.llm import LLaMaCPP
+ from os.path import exists
+ from json import load as json_load
+ from time import sleep
+ from sentence_splitter import split  # noqa
+
+
+ MAX_DIFFERENCE = 1.3
+ MAX_DB_RESULTS = 10
+ with open('prompt.md', 'r', encoding='utf-8') as _f:
+     PROMPT = _f.read()
+ GBNF_TEMPLATE = """
+ root ::= "```python\\n[" list "]\\n```"
+ list ::= %%
+ """
+ GBNF_TEMPLATE_ITEM = '("\'%%\'")?'
+ GBNF_SEPARATOR = ' (", ")? '
+
+
+ def db_read(texts: list[str]):
+     """
+     Get results from ChromaDB based on vector similarity.
+     :param texts: a list of strings to search for
+     :return: query results directly from ChromaDB
+     """
+     client = chromadb.PersistentClient(path=str(Path(__file__).resolve().parent.parent / 'data/database.chroma'))
+     collection = client.get_collection(name='PolitScanner')
+     return collection.query(query_texts=texts, n_results=MAX_DB_RESULTS)
+
+
+ def process(sentences: list, llm: LLaMaCPP) -> list:
+     """
+     Check the given sentences for topics.
+     :param sentences: a list of sentences as strings
+     :param llm: LLaMaCPP instance with a loaded model (PolitScanner fine-tune preferred)
+     :return: a list of topics
+     """
+     db_results = db_read(sentences)
+     print(db_results)
+     if len(db_results['ids'][0]) == 0:
+         return []
+     topic_ids = []
+     # keep only results whose vector distance is below the threshold
+     for i, result in enumerate(db_results['ids'][0]):
+         if db_results['distances'][0][i] < MAX_DIFFERENCE:
+             id_ = result.split('-')[0]
+             if id_ not in topic_ids:
+                 topic_ids.append(id_)
+     if len(topic_ids) == 0:
+         return []
+     # if there is only one topic, add 'menschengemachter Klimawandel' so the prompt template still makes sense
+     if len(topic_ids) == 1 and topic_ids[0] != '0':
+         topic_ids.append('0')
+     topics = []
+     titles = {}
+     # load the information about the relevant topics
+     for topic_id in topic_ids:
+         with open(str(Path(__file__).resolve().parent.parent / f"data/parsed/{topic_id}.json"), 'r', encoding='utf-8') as f:
+             topics.append(json_load(f))
+         titles[topics[-1]['topic']] = len(topics) - 1
+     formatted_topics = ''
+     titles_list = sorted(titles.keys())
+     items = []
+     # create the GBNF grammar on the fly
+     for title in titles_list:
+         items.append(GBNF_TEMPLATE_ITEM.replace('%%', title))
+     grammar = GBNF_TEMPLATE.replace('%%', GBNF_SEPARATOR.join(items))
+     topics.sort(key=lambda x: x['topic'])
+     for topic in topics:
+         if len(formatted_topics) > 0:
+             formatted_topics += '\n'
+         formatted_topics += f"'{topic['topic']}'"
+     # create the prompt
+     prompt = PROMPT.replace('{TOPICS}', formatted_topics)
+     for i, sentence in enumerate(sentences):
+         prompt = prompt.replace('{' + f'SENTENCE_{i + 1}' + '}', sentence)
+     # conversation template for Qwen3
+     prompt = f"<|im_start|>user\n{prompt}\n/no_think\n<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n"
+     print(prompt)
+     output = llm.generate(prompt, enable_thinking=False, grammar=grammar, temperature=0.0)
+     print(output)
+     # extract the results
+     output = output.split('[')[-1].split(']')[0]
+     truths = []
+     for title in titles_list:
+         if title in output:
+             truths.append(topics[titles[title]]['fact'])  # noqa
+     return truths
+
+
+ def main() -> None:
+     """
+     Check the `input.txt` file for topics and write the results to `output.txt`.
+     :return: None
+     """
+     if not exists('input.txt'):
+         raise FileNotFoundError('input.txt not found')
+     with open('input.txt', 'r', encoding='utf-8') as f:
+         text = f.read()
+     # select the large language model
+     llm = LLaMaCPP()
+     if exists('/opt/llms/Qwen3-1.7B-PolitScanner-Q5_K_S.gguf'):
+         llm.set_model('Qwen3-1.7B-PolitScanner-Q5_K_S.gguf')
+     else:
+         llm.set_model('Qwen3-30B-A3B-Q5_K_M.gguf')
+     # split the file into sentences
+     sentences = split(text)
+     print(f"{len(sentences)=}")
+     chunked_sentences = []
+     # create overlapping chunks of 3 sentences (plus two sentences of context)
+     for i in range(0, len(sentences), 3):
+         if i == 0:
+             chunk2 = ['EMPTY'] + sentences[:4]
+         elif i + 3 >= len(sentences):
+             chunk2 = sentences[-5:-1] + ['EMPTY']
+         else:
+             chunk2 = sentences[i - 1:i + 4]
+         chunked_sentences.append(chunk2)
+     print(f"{len(chunked_sentences)=}")
+     llm.load_model(print_log=True, threads=16, kv_cache_type='q8_0', context=8192)
+     while llm.is_loading() or not llm.is_running():
+         sleep(1)
+     with open('output.txt', 'w', encoding='utf-8') as f:
+         # process the chunks
+         for chunked_sentences2 in chunked_sentences:
+             truths = process(chunked_sentences2, llm)
+             for truth in truths:
+                 f.write(f" # Hinweis: {truth}\n")
+             for i, sentence in enumerate(chunked_sentences2):
+                 if 1 <= i <= 3:
+                     f.write(f"{sentence}\n")
+             f.write('\n')
+     print('REACHED `llm.stop()`')
+     llm.stop()
+
+
+ if __name__ == '__main__':
+     main()
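For reference, the overlapping chunking scheme used in `main()` above can be sketched as a standalone helper. The `make_chunks` name is illustrative (the script inlines this loop); each chunk carries three core sentences plus one sentence of context on either side, padded with `'EMPTY'` at the boundaries so the five `{SENTENCE_n}` prompt slots are always filled.

```python
def make_chunks(sentences: list[str]) -> list[list[str]]:
    """Group sentences into overlapping 5-element windows: 3 core
    sentences plus one context sentence before and after, with
    'EMPTY' padding at the start and end of the document."""
    chunks = []
    for i in range(0, len(sentences), 3):
        if i == 0:
            chunk = ['EMPTY'] + sentences[:4]
        elif i + 3 >= len(sentences):
            chunk = sentences[-5:-1] + ['EMPTY']
        else:
            chunk = sentences[i - 1:i + 4]
        chunks.append(chunk)
    return chunks


print(make_chunks([f"s{n}" for n in range(1, 8)]))
```

Note that every chunk is exactly five elements long, which matches the fixed number of sentence placeholders in `prompt.md`.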
politscanner.pdf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2f65020224511c29aa93fae4e67d9975073af8e359c5d6a5db213ecc34005efc
+ size 239194
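The distance-threshold step inside `process()` in `main.py` above can be isolated as follows. The `filter_topic_ids` helper name is illustrative, and the sketch assumes, as the script does, that ChromaDB hit ids follow a `<topic>-<chunk>` pattern: only hits below `MAX_DIFFERENCE` are kept, and ids are collapsed to unique topic ids in first-seen order.

```python
MAX_DIFFERENCE = 1.3  # same vector-distance threshold as in main.py


def filter_topic_ids(ids: list[str], distances: list[float],
                     max_difference: float = MAX_DIFFERENCE) -> list[str]:
    """Collapse hit ids like '3-12' to unique topic ids like '3',
    keeping only hits whose distance is below the threshold."""
    topic_ids = []
    for id_, dist in zip(ids, distances):
        if dist < max_difference:
            topic = id_.split('-')[0]
            if topic not in topic_ids:
                topic_ids.append(topic)
    return topic_ids


print(filter_topic_ids(['0-1', '3-7', '0-2', '5-0'], [0.4, 0.9, 1.1, 2.0]))
```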
prompt.md ADDED
@@ -0,0 +1,13 @@
+ Hier ist eine Liste von Themen.
+ ```text
+ {TOPICS}
+ ```
+ Hier sind die 5 Sätze:
+ ```text
+ 1: {SENTENCE_1}
+ 2: {SENTENCE_2}
+ 3: {SENTENCE_3}
+ 4: {SENTENCE_4}
+ 5: {SENTENCE_5}
+ ```
+ Erstelle eine python-Liste mit den Themen, die in den 5 Sätzen vorkommen. Häufig ist die leere Liste `[]` die richtige Antwort.
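This prompt is paired with a GBNF grammar that `process()` in `main.py` builds on the fly, so the model can only emit a Python list containing a subset of the known topic titles. The construction can be sketched in isolation; the `build_grammar` helper name is illustrative (the script inlines this loop), and the templates are copied from `main.py`.

```python
# Templates from main.py: each topic title becomes an optional quoted
# item, joined by optional ", " separators.
GBNF_TEMPLATE = """
root ::= "```python\\n[" list "]\\n```"
list ::= %%
"""
GBNF_TEMPLATE_ITEM = '("\'%%\'")?'
GBNF_SEPARATOR = ' (", ")? '


def build_grammar(titles: list[str]) -> str:
    """Assemble a GBNF grammar that constrains decoding to a Python
    list whose elements are drawn from the given topic titles."""
    items = [GBNF_TEMPLATE_ITEM.replace('%%', title) for title in sorted(titles)]
    return GBNF_TEMPLATE.replace('%%', GBNF_SEPARATOR.join(items))


print(build_grammar(['Zuwanderung', 'menschengemachter Klimawandel']))
```

Because every item and every separator is optional, the empty list `[]` mentioned at the end of the prompt remains a valid output under the grammar.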