ChakriYamasani committed f8bcebb (verified; 1 parent: 4c43683)

Create attached_assets/Pasted--Ancestral-Archive-Ancestral-Archive-is-a-multilingual-open-source-Streamlit-application-t-1752745833951_1752745833952.txt

# 📚 Ancestral Archive

**Ancestral Archive** is a multilingual, open-source Streamlit application to **collect, preserve, and share ancestral Indian wisdom**, including home remedies, sustainable farming practices, spiritual rituals, folk stories, proverbs, and oral histories. Built with an **offline-first, low-bandwidth design**, it enables contributions from under-connected regions and generates a culturally rich corpus suitable for open-source AI research.

---

## 🌟 Why This Project?
Many forms of traditional knowledge are undocumented and at risk of disappearing. By making it *easy and rewarding* for people to contribute in their own language, Ancestral Archive serves as both:
- A community heritage project
- A **corpus collection engine** (aligned with the viswam.ai challenge) that captures diverse, real-world Indian language data

---

## ✨ Core MVP Features (Week 1 Goal)
- Submit an entry (title, description/body text, language, category)
- Optional media upload (image/audio)
- Local JSON storage (offline-friendly; minimal dependencies)
- Browse previously submitted entries
- Export entries to JSONL/CSV for corpus use
- Basic sidebar navigation with placeholders for future features
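
A minimal sketch of the local JSON store and the JSONL/CSV export, using only the Python standard library. The `Entry` dataclass, the function names, and the single `data_entries/entries.json` file are assumptions for illustration, not the actual contents of `app/helpers.py`:

```python
import csv
import json
import uuid
from dataclasses import asdict, dataclass, field, fields
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

# Hypothetical location; the real app may lay out data_entries/ differently.
STORE = Path("data_entries") / "entries.json"


@dataclass
class Entry:
    """One submission; fields mirror the MVP form (title, body, language, category)."""
    title: str
    body: str
    language: str  # assumed convention: a short code such as "te" or "hi"
    category: str
    media_path: Optional[str] = None  # optional image/audio file, stored separately
    entry_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def load_entries(store: Path = STORE) -> List[dict]:
    """Return all saved entries, or an empty list before the first submission."""
    if not store.exists():
        return []
    return json.loads(store.read_text(encoding="utf-8"))


def save_entry(entry: Entry, store: Path = STORE) -> None:
    """Append one entry to the local JSON file; no network access is involved."""
    entries = load_entries(store)
    entries.append(asdict(entry))
    store.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it over the store, so an
    # interrupted write does not leave a half-written entries.json behind.
    tmp = store.with_suffix(".tmp")
    # ensure_ascii=False keeps Indic scripts human-readable in the file.
    tmp.write_text(json.dumps(entries, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(store)


def export_jsonl(entries: List[dict], out: Path) -> None:
    """Write one JSON object per line, a common input format for corpus tooling."""
    with out.open("w", encoding="utf-8") as f:
        for e in entries:
            f.write(json.dumps(e, ensure_ascii=False) + "\n")


def export_csv(entries: List[dict], out: Path) -> None:
    """Write entries as CSV with one column per Entry field."""
    columns = [fld.name for fld in fields(Entry)]
    # "utf-8-sig" prepends a BOM, which helps spreadsheet apps detect UTF-8.
    with out.open("w", encoding="utf-8-sig", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(entries)
```

Rewriting one small file per save keeps the store human-readable and easy to back up. The sketch does not guard against two users saving at the same moment; for a larger corpus, appending to a JSONL file would also avoid reloading every entry on each submission.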

---

## 🛀 Project Timeline Alignment
This README follows a structured 4-week sprint:

| Timeframe     | Phase        | Focus                                                       | Deliverable          |
|---------------|--------------|-------------------------------------------------------------|----------------------|
| **Week 1**    | Dev Sprint   | Build a functional MVP; deploy to Hugging Face              | Live app link        |
| **Week 2**    | Beta Testing | Recruit testers; run low-bandwidth checks; gather feedback  | Feedback log + fixes |
| **Weeks 3–4** | Growth       | Acquire users; measure entries and languages                | Metrics in REPORT.md |

See **REPORT.md** for detailed lifecycle documentation, metrics tables, feedback logs, and growth outcomes.

---

## 🧠 Planned AI Integrations
*(Post-MVP stretch goals; open-source models only)*
- Language detection
- Automatic translation (IndicNLP / Hugging Face models)
- Summarization of long entries
- Dialect clustering for linguistic research
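
As a cheap first step toward language detection, the *script* of an entry can be inferred from Unicode block ranges without any model. This is a swapped-in heuristic, not one of the planned open-source models, and it identifies scripts rather than languages (Hindi and Marathi both use Devanagari, for example). The function names and the 50% threshold are hypothetical:

```python
from collections import Counter
from typing import Optional

# Unicode block ranges (inclusive) for major Indic scripts.
INDIC_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
    "Gurmukhi": (0x0A00, 0x0A7F),
    "Gujarati": (0x0A80, 0x0AFF),
    "Odia": (0x0B00, 0x0B7F),  # Unicode block name: Oriya
    "Tamil": (0x0B80, 0x0BFF),
    "Telugu": (0x0C00, 0x0C7F),
    "Kannada": (0x0C80, 0x0CFF),
    "Malayalam": (0x0D00, 0x0D7F),
}


def script_of(ch: str) -> Optional[str]:
    """Classify one character; digits, punctuation, and emoji return None."""
    if ch.isascii():
        return "Latin" if ch.isalpha() else None
    cp = ord(ch)
    for name, (lo, hi) in INDIC_BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def dominant_script(text: str, min_share: float = 0.5) -> Optional[str]:
    """Return the script covering at least `min_share` of classified characters."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    if not counts:
        return None
    script, n = counts.most_common(1)[0]
    return script if n / sum(counts.values()) >= min_share else None
```

Because unclassified characters are ignored, a Telugu entry containing a few English words still resolves to Telugu as long as Telugu characters make up at least half of the classified characters.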

---

## 🛠 Tech Stack
- **Frontend:** Streamlit (Python)
- **Backend:** Local JSON storage (offline-first)
- **Media:** Images and audio stored in `data_entries/`
- **Deployment:** Hugging Face Spaces
- **AI Models (planned):** Open-source models from Hugging Face
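
Storing uploads under `data_entries/` needs some care, because user-supplied filenames can contain path separators or collide with existing files. The sketch below assumes a hypothetical `save_media` helper that receives the original filename and the raw bytes; the `media/` subfolder, the extension allow-list, and the 10 MB cap are also assumptions:

```python
import hashlib
from pathlib import Path

MEDIA_DIR = Path("data_entries") / "media"  # hypothetical subfolder
# Hypothetical allow-list; adjust to the formats the app actually accepts.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".mp3", ".wav", ".ogg", ".m4a"}
MAX_BYTES = 10 * 1024 * 1024  # hypothetical 10 MB cap


def save_media(original_name: str, data: bytes, media_dir: Path = MEDIA_DIR) -> Path:
    """Validate an upload and store it under a content-derived filename."""
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported media type: {ext or '(none)'}")
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")
    # Naming the file by a hash of its bytes keeps user-controlled path parts
    # out of the filename and stores identical uploads only once.
    digest = hashlib.sha256(data).hexdigest()[:16]
    media_dir.mkdir(parents=True, exist_ok=True)
    target = media_dir / f"{digest}{ext}"
    if not target.exists():
        target.write_bytes(data)
    return target
```

Inside the app, assuming `upload` is the object returned by Streamlit's `st.file_uploader`, the call could be `save_media(upload.name, upload.getvalue())`, with the returned path recorded on the entry.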

---

## 🚀 Getting Started

### ✅ Prerequisites
- Python 3.9+
- pip (Python package manager)

### ✅ Installation
Clone the repository:

```bash
git clone https://code.swecha.org/your-team/ancestral-archive.git
cd ancestral-archive
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

Run the app:

```bash
streamlit run app/main.py
```

---

## 📂 Project Structure
```bash
ancestral-archive/
├── app/
│   ├── main.py          # Streamlit app logic
│   └── helpers.py       # Utility functions
├── data_entries/        # JSON files & media uploads
├── .streamlit/          # Optional config
├── README.md            # Project overview (this file)
├── REPORT.md            # Detailed project report
├── requirements.txt     # Python dependencies
├── CONTRIBUTING.md      # Guidelines for contributors
├── CHANGELOG.md         # Version history
└── LICENSE              # MIT license
```

---

## 🧪 Testing & Feedback Plan
**Week 2 (Beta Testing):**
- Target users: students, elders, and rural communities
- Test offline and on low-bandwidth connections (mobile hotspot/2G)
- Collect feedback via Google Forms and short interviews
- Log bugs and fixes in `CHANGELOG.md`, with a summary in `REPORT.md`

---

## 📈 Growth Strategy
**Weeks 3–4:**
- Share via WhatsApp, local community groups, and NGO partners
- Use posters and a short video demo to drive participation
- Encourage submissions in regional languages and dialects
- Track metrics: unique users, entries, languages, and media attachments
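
The growth metrics can be computed directly from the stored entries. Counting unique users assumes each entry records a hypothetical `contributor` identifier (the MVP form does not list one); the `language` and `media_path` keys are likewise assumed:

```python
from collections import Counter
from typing import Dict, Iterable


def corpus_metrics(entries: Iterable[dict]) -> Dict[str, object]:
    """Summarize entries, unique users, languages, and media attachments."""
    entries = list(entries)
    # Entries with a missing or empty language are left out of the language counts.
    per_language = Counter(e["language"] for e in entries if e.get("language"))
    contributors = {e["contributor"] for e in entries if e.get("contributor")}
    return {
        "entries": len(entries),
        "unique_users": len(contributors),
        "languages": len(per_language),
        "entries_per_language": dict(per_language.most_common()),
        "media_attachments": sum(1 for e in entries if e.get("media_path")),
    }


if __name__ == "__main__":
    sample = [
        {"contributor": "u1", "language": "te", "media_path": "data_entries/media/a.jpg"},
        {"contributor": "u1", "language": "te", "media_path": None},
        {"contributor": "u2", "language": "hi", "media_path": None},
    ]
    print(corpus_metrics(sample))
    # prints: {'entries': 3, 'unique_users': 2, 'languages': 2,
    #          'entries_per_language': {'te': 2, 'hi': 1}, 'media_attachments': 1}
```

Running this weekly and pasting the result into `REPORT.md` would give the per-week metrics table the timeline calls for.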

---

## 📜 License
This project is licensed under the **MIT License**. See the `LICENSE` file for details.

---

## 🎥 Demo Video
The demo video covers:
- A brief intro: the problem and the solution
- A walkthrough of the core features
- Offline/low-bandwidth usage
- The roadmap for future AI integrations