GodsDevProject committed on
Commit
1a57863
·
verified ·
1 Parent(s): f768e52

Update README.md

Files changed (1)
  1. README.md +168 -93
README.md CHANGED
@@ -1,152 +1,227 @@
1
  ---
2
- license: mit
3
- title: 'FOIA DECLASSIFIED DOCUMENTS SEARCH '
 
 
4
  sdk: gradio
5
- colorFrom: purple
6
- colorTo: gray
7
  pinned: true
 
8
  short_description: 'FOIA DECLASSIFIED DOCUMENTS SEARCH '
9
- sdk_version: 6.3.0
10
  ---
11
 
12
  # 🏛️ Federal FOIA Intelligence Search
13
- ### Public Electronic Reading Rooms Only
14
 
15
- A **live federated search application** for discovering documents published in **public U.S. Government FOIA Electronic Reading Rooms**.
16
 
17
- This application **does not scrape**, **does not bypass access controls**, and **does not access classified, restricted, or non-public systems**.
 
 
 
18
 
19
  ---
20
 
21
- ## 🔍 What This App Does
 
 
 
22
 
23
- - Searches **publicly available FOIA libraries**
24
- - Aggregates results across multiple agencies
25
- - Clearly distinguishes:
26
- - 🟢 **Live public sources**
27
- - 🔒 **Stub / informational coverage**
28
- - Provides transparency tooling for journalists, researchers, and the public
 
 
 
 
29
 
30
  ---
31
 
32
- ## 🛡️ Compliance & Safety Guarantees
 
 
 
 
 
 
 
 
33
 
34
- - ✅ Public Electronic Reading Rooms only
35
- - ✅ Honors robots.txt per adapter
36
- - ✅ No authentication, credentials, or scraping
37
- - ✅ No training, inference, or ML on restricted data
38
- - ✅ Stub results **cannot be exported**
39
- - ✅ All exported documents link to public URLs
40
 
41
- > **Stub results are informational and cannot be exported.**
42
 
43
  ---
44
 
45
- ## 🧠 Search Modes
 
46
 
47
- ### Standard Mode (Default)
48
- - Live public FOIA reading rooms
49
- - Safe for export
50
- - No experimental features
 
 
51
 
52
- ### Extended Coverage Mode (Opt-In)
53
- Some agencies publish material inconsistently or restrict automation.
54
 
55
- Extended mode:
56
- - Requires user acknowledgment
57
- - Clearly labels blocked or stubbed agencies
58
- - Never exports restricted results
59
 
60
- Agencies currently marked as **blocked or partial**:
61
- - DIA
62
- - NGA
63
 
64
- ---
 
 
 
 
 
 
 
 
65
 
66
- ## 🧾 Export Rules
67
 
68
- - ZIP export is enabled **only when live results are present**
69
- - Stub results are excluded automatically
70
- - All exports are traceable to public URLs
 
 
 
71
 
72
  ---
73
 
74
- ## 📊 Built-In Transparency Tools
 
 
 
 
75
 
76
- - 📊 **Agency coverage heatmap**
77
- - ⏱️ **Per-agency latency & health badges**
78
- - 🕒 **Release timeline (by publication date)**
79
- - 🌐 **Agency discovery status**
80
- - 🧾 **Court-ready citation formatting (public documents only)**
81
 
82
  ---
83
 
84
- ## ⚖️ Phase-3 Expansion Pack (Planned)
 
 
 
 
85
 
86
- These features are **architectural extensions** and are **not active by default**.
 
 
 
 
87
 
88
- ### 🧾 Court Tools
89
- - Litigation appendix generator (PDF)
90
- - Exhibit numbering (A-1, A-2…)
91
- - Declaration-ready citation blocks
92
- - FOIA exemption (b-code) frequency charts
93
 
94
- ### 📰 Journalism Tools
95
- - Timeline narrative builder
96
- - Source confidence tags
97
- - “What’s missing” agency gap analysis
98
- - Redaction density metrics
99
 
100
- ### ⚖️ Compliance Controls
101
- - Export locked to live results only
102
- - Every document traceable to public URL
103
- - Stub data never enters PDFs
104
- - Optional disclosure watermarking
105
 
106
- ### 🧠 Advanced (Opt-In)
107
- - Semantic clustering by topic (post-retrieval only)
108
- - Cross-agency entity graphs
109
- - FOIA response-time benchmarking
110
 
111
- > No Phase-3 feature enables access to non-public data.
 
 
 
 
 
112
 
113
  ---
114
 
115
- ## 📜 Why This Does Not Violate Intelligence Restrictions
116
 
117
- - The app queries **only public-facing FOIA libraries**
118
- - No inference is performed on classified material
119
- - No automation targets restricted systems
120
- - All content is already published by the agencies themselves
121
- - The app functions as a **search index**, not a data broker
122
 
123
- ---
 
 
 
 
 
 
124
 
125
- ## 🧪 Testing & Reliability
 
 
 
126
 
127
- - Adapter compliance tests ensure:
128
- - Public access only
129
- - Robots.txt compliance
130
- - Required metadata fields
131
- - Health checks run without persistence or tracking
132
 
133
  ---
134
 
135
- ## 🚀 Intended Users
 
 
 
 
 
 
 
 
 
 
 
136
 
137
- - Journalists
138
- - Researchers
139
- - Attorneys
140
- - Transparency advocates
141
- - Members of the public
 
142
 
143
  ---
144
 
145
- ## 📌 Disclaimer
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
146
 
147
- This application is **not affiliated with any U.S. government agency**.
148
- All documents remain the property of their originating agencies.
 
 
 
149
 
150
  ---
151
 
152
- **Public transparency. Public sources. Clear boundaries.**
 
 
 
1
  ---
2
+ title: Federal FOIA Intelligence Search
3
+ emoji: 🏛️
4
+ colorFrom: blue
5
+ colorTo: indigo
6
  sdk: gradio
7
+ sdk_version: 4.44.1
8
+ app_file: app.py
9
  pinned: true
10
+ license: mit
11
  short_description: 'FOIA DECLASSIFIED DOCUMENTS SEARCH '
 
12
  ---
13
 
14
  # 🏛️ Federal FOIA Intelligence Search
 
15
 
16
+ **Public FOIA Electronic Reading Rooms • Link-Out Only • Court-Aware**
17
 
18
+ A Hugging Face–hosted research, journalism, and legal-support tool for discovering, organizing, and exporting **public U.S. government FOIA materials** from official electronic reading rooms.
19
+
20
+ This application **does not scrape, host, or redistribute documents**.
21
+ All results are **direct links to official government FOIA libraries**.
22
 
23
  ---
24
 
25
+ ## 🔍 Core Capabilities
26
+
27
+ ### ✔ Federated FOIA Search (LIVE)
28
+ Search across multiple U.S. government FOIA libraries simultaneously.
29
 
30
+ **LIVE Agencies (Public & Safe):**
31
+ - CIA - FOIA Electronic Reading Room
32
+ - FBI - The Vault
33
+ - DOJ - FOIA Library
34
+ - DHS - FOIA Reading Room
35
+ - State Department - FOIA Search
36
+ - GSA - FOIA Library
37
+ - NSA - FOIA Reading Room
38
+
39
+ All results link directly to official government sources.
40
 
41
  ---
42
 
43
+ ### 🧪 Extended Coverage (Clearly Labeled STUBS)
44
+ Optional **non-exportable indicators** for agencies where automated access may be restricted or ambiguous.
45
+
46
+ - DIA
47
+ - NGA
48
+ - NRO
49
+ - TEN-CAP
50
+ - AATIP
51
+ - SAP / Special Activities
52
 
53
+ **STUB results:**
54
+ - ❌ No URLs
55
+ - ❌ No exports
56
+ - ❌ No PDFs
57
+ - ❌ No citations
 
58
 
59
+ This separation is a **deliberate compliance safeguard**.
60
 
61
  ---
62
 
63
+ ### 📄 PDF Thumbnail Gallery
64
+ For results that link directly to `.pdf` files:
65
 
66
+ - Inline iframe preview (HF-safe)
67
+ - Action buttons:
68
+ - **View**
69
+ - **Download**
70
+ - **Share** (device-native share API if supported)
71
+ - **Ask AI** (safe placeholder - no ingestion)
72
 
73
+ > PDFs are never downloaded, cached, or stored by the app.
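The link-out preview described above could be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the helper names `is_pdf_link` and `pdf_preview_html` are hypothetical, and it assumes the preview is plain HTML handed to the UI. The app only emits an `<iframe>` pointing at the official URL, so the PDF is fetched by the user's browser and never by the app. Some agency sites may refuse to be framed, in which case only the View link would work.

```python
import html
from urllib.parse import urlparse


def is_pdf_link(url: str) -> bool:
    """True when the URL path ends in .pdf (query strings are ignored)."""
    return urlparse(url).path.lower().endswith(".pdf")


def pdf_preview_html(url: str, height: int = 480) -> str:
    """Return iframe markup for a direct PDF link, or "" for anything else.

    Hypothetical helper: the app itself never downloads the file; the
    browser loads it straight from the source URL.
    """
    if not is_pdf_link(url):
        return ""
    safe_url = html.escape(url, quote=True)  # avoid breaking out of the attribute
    return (
        f'<iframe src="{safe_url}" width="100%" '
        f'height="{int(height)}" loading="lazy"></iframe>'
    )
```

Checking the parsed path rather than the raw string keeps links such as `report.pdf?download=1` eligible for preview.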
 
74
 
75
+ ---
 
 
 
76
 
77
+ ### 🗂 Journalist ZIP Export (LIVE Only)
78
+ Generates a ZIP package for editorial or investigative workflows.
 
79
 
80
+ **Contents:**
81
+ - `README.txt` - scope & disclaimer
82
+ - `citations.txt` - Bluebook-ready citations
83
+ - `links.csv` - agency, title, URL, timestamp
84
+ - `pdf_links.txt` - direct PDF URLs (no files)
85
+
86
+ ✔ Public sources only
87
+ ✔ No redistribution
88
+ ✔ No STUB data
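A LIVE-only export like the one listed above might look like the sketch below. It is an assumption-laden illustration rather than the Space's real implementation: the result dictionaries, their keys (`status`, `agency`, `title`, `url`, `citation`), and the function name `build_journalist_zip` are all hypothetical. The archive holds only text files with links, so no documents are redistributed.

```python
import csv
import io
import zipfile
from datetime import datetime, timezone


def build_journalist_zip(results):
    """Return ZIP bytes built from LIVE results only; STUB entries are dropped.

    `results` is a hypothetical list of dicts with keys
    status, agency, title, url and (optionally) citation.
    """
    live = [r for r in results if r.get("status") == "LIVE" and r.get("url")]
    if not live:
        raise ValueError("export requires at least one LIVE result")

    stamp = datetime.now(timezone.utc).isoformat()

    # links.csv: agency, title, URL, timestamp
    links = io.StringIO()
    writer = csv.writer(links)
    writer.writerow(["agency", "title", "url", "timestamp"])
    for r in live:
        writer.writerow([r["agency"], r["title"], r["url"], stamp])

    # pdf_links.txt: URLs only, never the files themselves
    pdf_links = "\n".join(
        r["url"] for r in live if r["url"].lower().endswith(".pdf")
    )
    citations = "\n".join(r.get("citation", "") for r in live)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("README.txt", "Links to public FOIA reading rooms only.\n")
        zf.writestr("citations.txt", citations)
        zf.writestr("links.csv", links.getvalue())
        zf.writestr("pdf_links.txt", pdf_links)
    return buf.getvalue()
```

Filtering before any file is written means a STUB entry cannot reach the archive, even if it somehow carries a URL-like field.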
89
 
90
+ ---
91
 
92
+ ### 🌐 Public Shareable Result Pages
93
+ Generate static, shareable result summaries containing:
94
+ - Agency + title
95
+ - Direct source links
96
+ - Bluebook citations
97
+ - Citation hash for integrity verification
98
 
99
  ---
100
 
101
+ ### ⚖️ Court-Aware Features
102
+ Each LIVE result includes:
103
+ - SHA-256 citation hash
104
+ - Bluebook citation
105
+ - Retrieval timestamp
106
 
107
+ Supports:
108
+ - Litigation appendices
109
+ - Exhibit preparation
110
+ - Evidentiary traceability
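One way a SHA-256 citation hash with a retrieval timestamp could work is sketched below. The README does not say which fields are hashed, so this assumes the hash covers the citation metadata (agency, title, URL, timestamp) rather than the document bytes, which fits the claim that PDFs are never downloaded. The function names and the record layout are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def citation_hash(agency, title, url, retrieved_at):
    """SHA-256 over a canonical JSON encoding of the citation fields."""
    payload = json.dumps(
        {"agency": agency, "title": title, "url": url, "retrieved_at": retrieved_at},
        sort_keys=True,             # stable key order
        separators=(",", ":"),      # no whitespace differences
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_citation_record(agency, title, url):
    """Build a record with a UTC retrieval timestamp and its hash."""
    retrieved_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "agency": agency,
        "title": title,
        "url": url,
        "retrieved_at": retrieved_at,
        "sha256": citation_hash(agency, title, url, retrieved_at),
    }


def verify_citation_record(record):
    """Recompute the hash; False means a field was altered after export."""
    expected = citation_hash(
        record["agency"], record["title"], record["url"], record["retrieved_at"]
    )
    return expected == record["sha256"]
```

Under this assumption the hash shows that a citation entry was not edited after export. It says nothing about whether the document at the source URL has changed since retrieval.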
 
111
 
112
  ---
113
 
114
+ ### 🧠 Semantic Mode (Opt-In)
115
+ - FAISS + SentenceTransformers (optional)
116
+ - Metadata embeddings only
117
+ - Disabled by default
118
+ - Auto-disabled if dependencies unavailable
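The "disabled by default, auto-disabled if dependencies unavailable" behaviour could be gated along these lines. This is a sketch under assumptions: the function names are hypothetical, and it checks only whether the `faiss` and `sentence_transformers` packages are importable, without loading a model or building an index.

```python
import importlib.util


def semantic_mode_available() -> bool:
    """True only if both optional packages can be found on this system."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("faiss", "sentence_transformers")
    )


def resolve_search_mode(user_opted_in: bool) -> str:
    """Semantic search needs explicit opt-in AND installed dependencies.

    Falls back to plain keyword search in every other case.
    """
    if user_opted_in and semantic_mode_available():
        return "semantic"
    return "keyword"
```

Using `find_spec` avoids importing the heavy libraries just to test for them, so a Space without them still starts quickly.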
119
 
120
+ ---
121
+
122
+ ### 📊 Visual Analytics
123
+ - Entity / domain frequency graphs
124
+ - Retrieval timeline charts
125
 
126
+ ---
 
 
 
 
127
 
128
+ ### 📝 FOIA Request Generator
129
+ Generate a fillable FOIA request PDF using:
130
+ - Requester name
131
+ - Description of records
132
+ - Agencies surfaced in LIVE results
133
 
134
+ ---
 
 
 
 
135
 
136
+ ## 🧱 What This App Does NOT Do
 
 
 
137
 
138
+ ❌ No scraping
139
+ ❌ No crawling behind authentication
140
+ ❌ No hosting or redistributing documents
141
+ ❌ No classified access
142
+ ❌ No private datasets
143
+ ❌ No surveillance or tracking
144
 
145
  ---
146
 
147
+ # 🛡️ TRUST & SAFETY ADDENDUM (HF REVIEWERS)
148
 
149
+ ### Compliance Summary
150
+ This Space is intentionally designed to comply with:
151
+ - Hugging Face Spaces policies
152
+ - U.S. FOIA public-access norms
153
+ - Journalism ethics standards
154
 
155
+ ### Safety Controls
156
+ - **Link-out only** (no content ingestion)
157
+ - **Explicit STUB labeling**
158
+ - **Export gating (LIVE-only)**
159
+ - **No PDF storage**
160
+ - **No background crawling**
161
+ - **User-initiated queries only**
162
 
163
+ ### Risk Mitigation
164
+ - No robots.txt violations (manual user navigation)
165
+ - No automated retrieval of sensitive systems
166
+ - No claims of completeness or authority
167
 
168
+ This Space functions as a **research navigation and citation tool**, not a data collection system.
 
 
 
 
169
 
170
  ---
171
 
172
+ # ⚖️ LEGAL REVIEW MEMO (NON-BINDING)
173
+
174
+ ### Scope
175
+ This application operates entirely within:
176
+ - Public FOIA Electronic Reading Rooms
177
+ - User-directed navigation
178
+
179
+ ### Key Legal Characteristics
180
+ - No republication of copyrighted works
181
+ - No derivative document creation
182
+ - No alteration of government records
183
+ - No implied agency endorsement
184
 
185
+ ### Litigation Safety
186
+ - Citation hashes provide integrity verification
187
+ - Source URLs remain authoritative
188
+ - Export artifacts include disclaimers
189
+
190
+ This tool supports lawful research and reporting, not legal conclusions.
191
 
192
  ---
193
 
194
+ # 📰 JOURNALIST ONBOARDING GUIDE
195
+
196
+ ### Typical Workflow
197
+ 1. Enter investigative topic
198
+ 2. Review LIVE agency coverage
199
+ 3. Examine PDF previews
200
+ 4. Export journalist ZIP
201
+ 5. Generate FOIA follow-up request
202
+ 6. Share result summary with editors
203
+
204
+ ### Best Practices
205
+ - Always open documents at the source
206
+ - Use citation hashes for verification
207
+ - Treat STUB indicators as leads only
208
+ - File FOIA requests directly with agencies
209
+
210
+ ### Ethical Use
211
+ - Attribute sources accurately
212
+ - Avoid implying classified access
213
+ - Verify context before publication
214
+
215
+ ---
216
 
217
+ ## ⚠️ Disclaimer
218
+ This tool:
219
+ - Is not affiliated with any U.S. government agency
220
+ - Does not guarantee completeness
221
+ - Does not provide legal advice
222
 
223
  ---
224
 
225
+ **Built for transparency.
226
+ Designed for accountability.
227
+ Safe by construction.**