#!/usr/bin/env python3
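"""
Synthetic Data Generator for CPI Tender Matcher

Generates 40 tender documents across EN/FR as .txt files in data/tenders/,
plus data/tenders_meta.json and data/gold_matches.csv.
Requires data/profiles.json to exist (it is read to build gold_matches.csv).

Run: python generate_data.py
"""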

import json
import random
import os
from datetime import datetime, timedelta

random.seed(42)

SECTORS = ["agritech", "healthtech", "cleantech", "edtech", "fintech", "wastetech"]
BUDGETS = [
    ("5,000", 5000),
    ("50,000", 50000),
    ("200,000", 200000),
    ("1,000,000", 1000000),
]
REGIONS = ["East Africa", "West Africa", "Central Africa", "Southern Africa"]
COUNTRIES = ["Rwanda", "Kenya", "Uganda", "Senegal", "DRC", "Ethiopia", "Tanzania", "Ghana", "Nigeria", "Cameroon"]
ORGS = [
    "African Development Bank", "USAID", "EU Delegation", "World Bank",
    "GIZ", "UNDP", "African Union", "Bill & Melinda Gates Foundation",
    "Mastercard Foundation", "Omidyar Network"
]

EN_TEMPLATES = [
    """GRANT OPPORTUNITY: {title}

Issuing Organization: {org}
Tender Reference: TND-{ref}
Sector: {sector}
Region: {region}
Eligible Countries: {countries}

OVERVIEW
{org} invites applications from qualified organizations for the {title}. This grant supports innovative solutions in the {sector} space across {region}.

BUDGET
Total available funding: USD {budget_str}
Maximum grant per applicant: USD {max_grant}

ELIGIBILITY
- Registered organizations operating in {region}
- Minimum {min_employees} full-time employees
- At least 1 year of operational history
- Prior funding experience preferred: {prior_funding}

OBJECTIVES
This tender aims to:
1. Accelerate {sector} innovation in underserved communities
2. Support scalable and sustainable business models
3. Foster cross-border collaboration in {region}
4. Promote gender inclusion and youth employment

APPLICATION REQUIREMENTS
Applicants must submit:
- Technical proposal (max 15 pages)
- Budget breakdown
- Organizational profile
- Letters of support from local partners

DEADLINE
Application deadline: {deadline}
Results announcement: {result_date}

CONTACT
For inquiries, contact: grants@{org_email}.org
Reference: {ref}
""",

    """FUNDING CALL: {title}

Reference Number: FC-{ref}
Funding Body: {org}
Focus Area: {sector}
Target Geography: {region}

BACKGROUND
Access to {sector} solutions remains limited across {region}. {org} is committed to bridging this gap through targeted grant support.

GRANT DETAILS
- Total envelope: USD {budget_str}
- Individual awards: up to USD {max_grant}
- Duration: 12–24 months

WHO CAN APPLY
Eligible applicants include:
• Social enterprises and cooperatives in {countries}
• NGOs with a proven track record in {sector}
• University spin-offs and research centres
• Minimum team size: {min_employees} employees

EVALUATION CRITERIA
Applications will be scored on:
- Innovation and scalability (30%)
- Impact on underserved populations (25%)
- Financial sustainability (20%)
- Team capability (15%)
- Regional relevance (10%)

KEY DATES
Submission deadline: {deadline}
Interview round: {result_date}

SUBMIT AT: apply.{org_email}.org/FC-{ref}
"""
]

FR_TEMPLATES = [
    """APPEL À CANDIDATURES : {title}

Organisme émetteur : {org}
Référence : TND-{ref}
Secteur : {sector}
Région : {region}
Pays éligibles : {countries}

PRÉSENTATION
{org} lance un appel à candidatures pour le {title}. Ce financement soutient des solutions innovantes dans le domaine {sector} à travers {region}.

BUDGET
Enveloppe totale disponible : USD {budget_str}
Subvention maximale par candidat : USD {max_grant}

ÉLIGIBILITÉ
- Organisations enregistrées opérant en {region}
- Au moins {min_employees} employés à temps plein
- Au moins 1 an d'existence
- Expérience de financement antérieure souhaitée : {prior_funding}

OBJECTIFS
Cet appel vise à :
1. Accélérer l'innovation {sector} dans les communautés mal desservies
2. Soutenir des modèles économiques évolutifs et durables
3. Favoriser la coopération transfrontalière en {region}
4. Promouvoir l'inclusion des femmes et l'emploi des jeunes

DOSSIER DE CANDIDATURE
Les candidats doivent soumettre :
- Proposition technique (15 pages max)
- Détail budgétaire
- Profil organisationnel
- Lettres de soutien de partenaires locaux

DATE LIMITE
Date de soumission : {deadline}
Annonce des résultats : {result_date}

CONTACT
Pour toute question : subventions@{org_email}.org
Référence : {ref}
""",

    """APPEL À PROJETS : {title}

Numéro de référence : AP-{ref}
Bailleur de fonds : {org}
Domaine prioritaire : {sector}
Zone géographique : {region}

CONTEXTE
L'accès aux solutions {sector} reste limité dans {region}. {org} s'engage à combler ce fossé grâce à un soutien ciblé.

DÉTAILS DU FINANCEMENT
- Enveloppe totale : USD {budget_str}
- Subventions individuelles : jusqu'à USD {max_grant}
- Durée : 12 à 24 mois

QUI PEUT CANDIDATER
Les candidats éligibles comprennent :
• Entreprises sociales et coopératives en {countries}
• ONG avec un historique prouvé dans {sector}
• Start-ups universitaires et centres de recherche
• Taille minimale de l'équipe : {min_employees} employés

CRITÈRES D'ÉVALUATION
Les dossiers seront notés sur :
- Innovation et capacité à l'échelle (30%)
- Impact sur les populations mal desservies (25%)
- Viabilité financière (20%)
- Compétences de l'équipe (15%)
- Pertinence régionale (10%)

CALENDRIER
Date limite de soumission : {deadline}
Entretiens : {result_date}

SOUMISSION : candidatures.{org_email}.org/AP-{ref}
"""
]

SECTOR_TITLES_EN = {
    "agritech": ["Digital Agriculture Innovation Grant", "Precision Farming Support Fund", "Smallholder AgriTech Scale-Up Grant", "Agricultural Digitization Challenge"],
    "healthtech": ["Rural Health Technology Grant", "Community Health Innovation Fund", "Digital Health Access Programme", "Telemedicine Expansion Grant"],
    "cleantech": ["Clean Energy Access Fund", "Renewable Energy Scale-Up Grant", "Green Technology Innovation Award", "Solar Solutions Deployment Grant"],
    "edtech": ["Digital Learning Innovation Fund", "EdTech for Inclusion Grant", "Offline Education Technology Grant", "Rural Digital Literacy Programme"],
    "fintech": ["Financial Inclusion Innovation Grant", "Digital Finance Scale-Up Fund", "Cooperative Finance Technology Grant", "Mobile Money Expansion Award"],
    "wastetech": ["Circular Economy Innovation Grant", "Waste-to-Value Technology Fund", "Sustainable Waste Management Grant", "Biogas and Composting Scale-Up"]
}

SECTOR_TITLES_FR = {
    "agritech": ["Subvention pour l'Innovation Agricole Numérique", "Fonds de Soutien à l'Agriculture de Précision", "Programme AgriTech pour Petits Exploitants"],
    "healthtech": ["Subvention Technologie Santé Rurale", "Fonds Innovation Santé Communautaire", "Programme de Télémédecine Rurale"],
    "cleantech": ["Fonds d'Accès à l'Énergie Propre", "Subvention Énergie Renouvelable", "Prix Innovation Technologie Verte"],
    "edtech": ["Fonds Innovation Apprentissage Numérique", "Subvention EdTech pour l'Inclusion", "Programme Éducation Hors-Ligne"],
    "fintech": ["Subvention Inclusion Financière", "Fonds Finance Numérique", "Programme Finance Coopérative Mobile"],
    "wastetech": ["Subvention Économie Circulaire", "Fonds Valorisation des Déchets", "Programme Biogaz et Compostage"]
}


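def random_deadline(days_min=30, days_max=120):
    """Return a date days_min..days_max days from now, formatted as '%d %B %Y'.

    The result depends on datetime.now(), so dates differ between runs
    even though the random seed is fixed.
    """
    future = datetime.now() + timedelta(days=random.randint(days_min, days_max))
    return future.strftime("%d %B %Y")


def random_result_date(deadline_str):
    """Return a result date 30-60 days after the given '%d %B %Y' deadline string."""
    deadline = datetime.strptime(deadline_str, "%d %B %Y")
    result = deadline + timedelta(days=random.randint(30, 60))
    return result.strftime("%d %B %Y")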


def generate_tender(tender_id, lang, sector, budget_tuple):
    """Build one tender: metadata plus the rendered template text under "content"."""
    budget_str, budget_val = budget_tuple
    max_grant = budget_val // 2
    is_fr = lang == "fr"

    if is_fr:
        title = random.choice(SECTOR_TITLES_FR[sector])
        template = random.choice(FR_TEMPLATES)
    else:
        title = random.choice(SECTOR_TITLES_EN[sector])
        template = random.choice(EN_TEMPLATES)

    org = random.choice(ORGS)
    region = random.choice(REGIONS)
    countries = ", ".join(random.sample(COUNTRIES, 3))
    min_employees = random.choice([3, 5, 10, 15])
    prior_funding = random.choice(["Not required", "Preferred", "Required"])
    deadline = random_deadline()
    result_date = random_result_date(deadline)
    org_email = org.lower().replace(" ", "").replace("&", "and")[:15]
    ref = f"{tender_id:03d}{random.randint(100,999)}"

    content = template.format(
        title=title,
        org=org,
        ref=ref,
        sector=sector,
        region=region,
        countries=countries,
        budget_str=budget_str,
        max_grant=f"{max_grant:,}",
        min_employees=min_employees,
        prior_funding=prior_funding,
        deadline=deadline,
        result_date=result_date,
        org_email=org_email
    )

    return {
        "id": f"T{tender_id:03d}",
        "title": title,
        "sector": sector,
        "budget": budget_val,
        "deadline": deadline,
        "region": region,
        "language": lang,
        "content": content
    }


def main():
    os.makedirs("data/tenders", exist_ok=True)
    tenders = []
    tender_id = 1

    # Generate 40 tenders: 60% EN (24), 40% FR (16)
    # Ensure each sector has tenders in both languages
    plan = []
    for sector in SECTORS:
        for budget in BUDGETS[:2]:  # 2 budgets per sector = 12 EN
            plan.append(("en", sector, budget))
    for sector in SECTORS:
        for budget in BUDGETS[2:]:  # 2 budgets per sector = 12 FR
            plan.append(("fr", sector, budget))
    # Add 12 more EN and 4 more FR tenders to reach 24 EN / 16 FR (60/40 split).
    # (k=8 here would yield only 36 tenders in total.)
    extras_en = []
    for sector in random.choices(SECTORS, k=12):
        extras_en.append(("en", sector, random.choice(BUDGETS)))
    extras_fr = []
    for sector in random.choices(SECTORS, k=4):
        extras_fr.append(("fr", sector, random.choice(BUDGETS)))

    plan = plan + extras_en + extras_fr
    random.shuffle(plan)
    plan = plan[:40]  # guard: never emit more than 40 tenders

    for lang, sector, budget in plan:
        tender = generate_tender(tender_id, lang, sector, budget)
        tenders.append(tender)
        fname = f"data/tenders/{tender['id']}_{lang}_{sector}.txt"
        with open(fname, "w", encoding="utf-8") as f:
            f.write(tender["content"])
        print(f"  Generated: {fname}")
        tender_id += 1

    # Save metadata
    meta = [{k: v for k, v in t.items() if k != "content"} for t in tenders]
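    with open("data/tenders_meta.json", "w", encoding="utf-8") as f:
        json.dump(meta, f, indent=2)

    # Generate gold_matches.csv: 3 synthetic "gold" matches per profile,
    # sampled at random from tenders in the profile's sector
    with open("data/profiles.json", encoding="utf-8") as f:
        profiles = json.load(f)
    gold_rows = ["profile_id,tender_id,rank"]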

    sector_to_tenders = {}
    for t in tenders:
        sector_to_tenders.setdefault(t["sector"], []).append(t["id"])

    for p in profiles:
        sector = p["sector"]
        candidates = sector_to_tenders.get(sector, [])
        if len(candidates) < 3:
            # fallback: any tender
            candidates = [t["id"] for t in tenders]
        chosen = random.sample(candidates, min(3, len(candidates)))
        for rank, tid in enumerate(chosen, 1):
            gold_rows.append(f"{p['id']},{tid},{rank}")

    with open("data/gold_matches.csv", "w", encoding="utf-8") as f:
        # End the file with a newline so CSV tools see a complete last row
        f.write("\n".join(gold_rows) + "\n")

    print(f"\n✅ Generated {len(tenders)} tenders in data/tenders/")
    print("✅ Saved data/tenders_meta.json")
    print("✅ Saved data/gold_matches.csv")


if __name__ == "__main__":
    main()