GodsDevProject committed
Commit 8abbf13 · verified · 1 Parent(s): a678b2e

Create README.md

Files changed (1)
  1. README.md +101 -110
README.md CHANGED
@@ -2,176 +2,167 @@
  title: Federal FOIA Intelligence Search
  emoji: 🏛️
  colorFrom: blue
- colorTo: indigo
  sdk: gradio
- sdk_version: 6.3.0
  app_file: app.py
  pinned: true
- tags:
- - foia
- - government-transparency
- - public-records
- - journalism
- - open-data
- - legal-tech
- - oversight
  license: mit
- short_description: 'FOIA INTELLIGENCE SEARCH '
  ---

  # Federal FOIA Intelligence Search

- **Public Electronic Reading Rooms Only**

- This Hugging Face Space provides a federated search and analysis interface for
- **publicly released U.S. Government records made available under the Freedom of Information Act (FOIA)**.
-
- It is designed for **journalists, researchers, legal professionals, oversight bodies, and the public**.

  ---

- ## 🔍 What This Space Does

- Searches **only publicly accessible FOIA Electronic Reading Rooms**
- Uses **official agency search endpoints or public landing pages**
- Enforces **robots.txt, rate limiting, and safe defaults**
- Provides **semantic search, clustering, and visualization**
- Generates **court-ready citations and FOIA request packets**

  ---

- ## 🏛️ Live Public Sources
-
- This Space currently supports live querying or indexed access to the following **public FOIA repositories**:

- **CIA FOIA Electronic Reading Room**
- **FBI Vault**
- **NSA FOIA Library**
- **Department of Defense FOIA Reading Room**
- **National Reconnaissance Office (NRO) Declassified Releases**
- **Department of Justice (DOJ) FOIA Library**
- **Department of Homeland Security (DHS) FOIA Library**
- **U.S. Department of State FOIA Reading Room**
-
- All access is:
- Public
- Unauthenticated
- Non-privileged
- Read-only

  ---

- ## 📂 Hosted Public Collections (Clearly Labeled)

- Some historically named programs or collections (e.g. **AATIP**, **Special Access Programs**, **Special Activities**) do **not** operate independent FOIA portals.

- When these appear in the interface, they are:
-
- Explicitly labeled as **Hosted Public Releases**
- Linked to the **original publishing agency**
- Limited to **already-declassified, publicly released documents**
- Included for **historical research and transparency**

- No restricted or classified systems are accessed.

  ---

- ## 📊 Analytics & Visualization Features

- **Real-time agency coverage heatmap** (documents per agency)
- **Latency & health indicators** per source
- **Interactive semantic cluster graph** (Plotly)
- **Timeline views** for document release dates
- **Result deduplication and clustering**

- ---

- ## 🧠 Semantic Search

- Uses **sentence-transformers + FAISS**
- Supports:
- Semantic clustering
- Search-within-results
- Topic grouping
- No model training on classified or private data

  ---

- ## 🧾 Legal & Journalistic Tools

- **Court-ready Bluebook citation PDF export**
- **Journalist ZIP export** (documents + index)
- **FOIA request packet generator** (PDF / text)
- **FOIA exemption (b-code) labeling** (heuristic, informational only)

- > ⚠️ FOIA requests are **generated only**.
- > This Space does **not** submit requests on behalf of users.

  ---

- ## ⚖️ What This Space Does NOT Do

- ❌ No classified access
- ❌ No scraping behind authentication
- ❌ No bypassing agency safeguards
- ❌ No intelligence analysis or inference
- ❌ No automated FOIA submissions
- ❌ No user tracking beyond standard HF analytics

- ---

- ## 📜 Legal Basis

- All content accessed through:

- **5 U.S.C. § 552 (Freedom of Information Act)**
- Agency-maintained **Electronic Reading Rooms**
- Public U.S. Government websites

- This Space operates entirely within:
- U.S. law
- Agency publication rules
- Hugging Face platform policies

  ---

- ## 🛡️ Safety & Governance

- Robots.txt enforcement per adapter
- Per-agency rate limits
- Per-agency kill switches
- Health monitoring & auto-disable
- Adapter compliance tests (CI)

  ---

- ## 🎯 Intended Use
-
- This project is intended to support:

- Investigative journalism
- Academic and historical research
- Legal review and litigation support
- Government oversight
- Public transparency initiatives

  ---

- ## 📬 Disclaimer
-
- This tool aggregates and analyzes **public information only**.
- It does not provide legal advice, intelligence assessments, or official interpretations.

- Users are responsible for verifying primary sources.

  ---

- ## 📦 Open Source & Transparency

- All adapters, logic, and safety mechanisms are visible in the source code.
- No hidden data sources or privileged access exist.

  ---

- **Federal FOIA Intelligence Search**
- *Making public records easier to find, understand, and cite.*

  title: Federal FOIA Intelligence Search
  emoji: 🏛️
  colorFrom: blue
+ colorTo: gray
  sdk: gradio
+ sdk_version: 4.36.0
  app_file: app.py
  pinned: true
  license: mit
+ short_description: 'FOIA DECLASSIFIED DOCUMENTS SEARCH '
  ---

  # Federal FOIA Intelligence Search
+ ### Public Electronic Reading Rooms Only

+ A **live, robots-compliant search application** for discovering publicly released U.S. government documents available in **FOIA Electronic Reading Rooms**.

+ This Space is designed for **journalists, researchers, historians, attorneys, and the general public** to explore already-public government records, **not to access classified, restricted, or non-public information**.

  ---

+ ## ✅ What This App Does

+ - 🔍 Searches **public FOIA Electronic Reading Rooms**
+ - 🌐 Uses **live queries** only where automated access is explicitly permitted
+ - 🤖 Enforces **robots.txt compliance per agency**
+ - 🧾 Clearly distinguishes **live results vs. stub (non-live) sources**
+ - 📦 Allows export of results for research and reporting
+ - ⚖️ Designed to comply with **FOIA, HF policies, and U.S. law**

  ---

+ ## 🚫 What This App Does NOT Do

+ - ❌ No scraping of restricted or protected systems
+ - ❌ No bypassing of authentication, paywalls, or CAPTCHAs
+ - ❌ No access to classified, controlled, or non-public data
+ - ❌ No querying of intelligence systems or operational databases
+ - ❌ No real-time surveillance or monitoring

  ---

+ ## 🟢 Live Sources (Queried in Real Time)

+ These agencies provide **public FOIA search endpoints** and explicitly permit automated access:

+ - **CIA** – FOIA Electronic Reading Room
+ - **FBI Vault**
+ - **Department of Justice (DOJ)** – FOIA Library
+ - **Department of Homeland Security (DHS)** – FOIA Library
+ - **U.S. Department of State** – FOIA Search
+ - **General Services Administration (GSA)** – FOIA Library

+ Live sources are:
+ - Rate-limited
+ - Queried read-only
+ - Checked against robots.txt before each use
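
The three properties listed for live sources could be enforced roughly as in the sketch below. This is not taken from the Space's `app.py`: the user-agent string, the `RateLimiter` class, the `may_query` helper, and the `agency.example` URLs are all hypothetical, and the robots.txt text is parsed from an inline sample rather than fetched so the example stays offline.

```python
import time
import urllib.robotparser

USER_AGENT = "foia-search-example"  # hypothetical user-agent string


def parse_robots(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a parser from robots.txt text (a real adapter would fetch it first)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_query(parser: urllib.robotparser.RobotFileParser, url: str) -> bool:
    """Return True only if robots.txt allows our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


class RateLimiter:
    """Allow at most one request every `min_interval` seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        """Block until the minimum interval since the last request has passed."""
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Sample robots.txt content, assumed for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = parse_robots(SAMPLE_ROBOTS)
limiter = RateLimiter(min_interval=0.05)

allowed = may_query(parser, "https://agency.example/readingroom/search?q=test")
blocked = may_query(parser, "https://agency.example/private/report.pdf")
```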

  ---

+ ## 🟡 Stub-Only Sources (Transparency Mode)

+ Some agencies **do not permit automated querying**, **block access via robots.txt**, or **do not provide a public FOIA search endpoint**.

+ These are represented as **clearly labeled stub adapters** for transparency only.

+ Stub sources **DO NOT perform live queries**.

+ ### Stub Sources Include:
+ - NSA
+ - NRO
+ - DIA
+ - NGA
+ - Special Access Programs (SAP)
+ - TEN-CAP
+ - AATIP
+ - Special Activities

  ---

+ ## ⚠️ Extended Features (User Warning)

+ An optional **Extended Features** mode allows users to include **stub-only results**.

+ Before enabling, users must explicitly acknowledge that:
+ - No live queries will be performed
+ - Results are informational only
+ - Some agencies are restricted by law or policy
+
+ This design prevents misuse while maintaining public-interest transparency.
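
One way to picture the acknowledgment gate is the sketch below. It is only an illustration under assumed names: `Source`, `SOURCES`, and `select_sources` are hypothetical and do not come from the Space's code, and the source list is a small subset of the agencies named in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    live: bool  # True for live sources, False for stub-only sources


# Small illustrative subset of the sources named in this README.
SOURCES = [
    Source("CIA", live=True),
    Source("FBI Vault", live=True),
    Source("NSA", live=False),
    Source("AATIP", live=False),
]


def select_sources(sources, extended=False, acknowledged=False):
    """Return the sources to show; stub sources need an explicit acknowledgment."""
    if extended and not acknowledged:
        raise PermissionError("Extended Features require explicit acknowledgment")
    return [s for s in sources if s.live or extended]


default_view = select_sources(SOURCES)
extended_view = select_sources(SOURCES, extended=True, acknowledged=True)
```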

  ---

+ ## 🛡️ Trust, Safety & Compliance

+ This Space was designed with the following safeguards:

+ - ✅ **robots.txt enforcement** per adapter
+ - ✅ **No authentication or credential use**
+ - ✅ **Public-domain / public-release content only**
+ - ✅ **Clear labeling of non-live sources**
+ - ✅ **User acknowledgment for extended features**
+ - ✅ **Read-only access**
+ - ✅ **No data retention or tracking**
+
+ The application complies with:
+ - U.S. Freedom of Information Act (FOIA)
+ - Hugging Face Spaces policies
+ - Standard web crawling norms
+ - Public-interest research standards

+ ---

+ ## 📦 Export & Research Use

+ Users may export search results as a ZIP archive for:
+ - Journalism
+ - Academic research
+ - Legal analysis
+ - Historical archiving

+ Exports include:
+ - Source agency
+ - Document title
+ - Public URL
+ - Snippet / context
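
A ZIP export carrying the four listed fields might look like the following sketch. The archive layout (a single `results.json` file), the field keys, and the `export_results` function name are assumptions made for illustration; the README does not specify the actual format.

```python
import io
import json
import zipfile

# Keys assumed to correspond to: source agency, document title, public URL, snippet.
EXPORT_FIELDS = ("agency", "title", "url", "snippet")


def export_results(results) -> bytes:
    """Pack search results into an in-memory ZIP holding one JSON index file."""
    # Keep only the documented fields so nothing else leaks into the export.
    rows = [{key: item.get(key, "") for key in EXPORT_FIELDS} for item in results]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("results.json", json.dumps(rows, indent=2))
    return buffer.getvalue()


sample = [
    {
        "agency": "CIA",
        "title": "Example document",
        "url": "https://agency.example/readingroom/doc/1",  # placeholder URL
        "snippet": "Example snippet text",
        "internal_score": 0.9,  # dropped on export: not a documented field
    }
]
zip_bytes = export_results(sample)
```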

  ---

+ ## 👩‍⚖️ Legal & Policy Notes

+ - This tool does **not replace formal FOIA requests**
+ - It only indexes **already-public disclosures**
+ - For non-public records, users should submit FOIA requests directly to agencies
+ - Inclusion of an agency name does **not imply live access**

  ---

+ ## 🧪 Technical Overview

+ - **Frontend:** Gradio (HF Spaces)
+ - **Architecture:** Async federated adapters
+ - **Safety:** Per-adapter robots enforcement
+ - **Design:** Explicit live vs stub separation
+ - **Deployment:** Single `app.py` entrypoint
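
As a rough picture of "async federated adapters" with a live/stub split, the sketch below fans a query out to several adapters with `asyncio.gather` and merges whatever comes back. The class names, the `search` method, and the result fields are hypothetical, and the live adapter only simulates a network call; the Space's real adapters may be structured quite differently.

```python
import asyncio


class LiveAdapter:
    """Hypothetical adapter for a source that permits automated queries."""

    is_live = True

    def __init__(self, name: str) -> None:
        self.name = name

    async def search(self, query: str) -> list:
        await asyncio.sleep(0)  # stands in for a real, rate-limited HTTP request
        return [{"agency": self.name, "title": f"Example result for {query!r}", "live": True}]


class StubAdapter:
    """Hypothetical adapter for a stub-only source: it never touches the network."""

    is_live = False

    def __init__(self, name: str) -> None:
        self.name = name

    async def search(self, query: str) -> list:
        return [{"agency": self.name, "title": "Stub source (no live query performed)", "live": False}]


async def federated_search(adapters, query: str) -> list:
    """Query all adapters concurrently and merge results in adapter order."""
    outcomes = await asyncio.gather(
        *(adapter.search(query) for adapter in adapters),
        return_exceptions=True,
    )
    merged = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # one failing source should not break the others
        merged.extend(outcome)
    return merged


results = asyncio.run(federated_search([LiveAdapter("CIA"), StubAdapter("NSA")], "test"))
```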

  ---

+ ## 🧭 Intended Audience

+ - Journalists
+ - Researchers
+ - Policy analysts
+ - Attorneys
+ - Historians
+ - Members of the public seeking transparency

  ---

+ ## 📬 Contact / Issues

+ This Space is an open, public-interest research tool.
+ For issues, suggestions, or compliance questions, please use the Hugging Face discussion tab.

  ---

+ ### 🏛️ Transparency First. Public Records Only.