Neural Arun committed
Commit · 95074cf
1 Parent(s): a94304d
changes made in rules of engagement.md
- core/agent.py +1 -1
- data/static/rules_of_engagement.md +2 -2
- db/5500b9f3-fb82-44ff-acc9-77407e3b580c/data_level0.bin +1 -1
- db/5500b9f3-fb82-44ff-acc9-77407e3b580c/length.bin +1 -1
- db/chroma.sqlite3 +2 -2
- db/ingestion_state.json +1 -1
- docs/operations/master_operations_guide.md +99 -0
- docs/scaling_guide.md +0 -27
- docs/updating_digital_twin.md +0 -78
- docs/workflows/adding_new_data.md +0 -123
core/agent.py
CHANGED
|
@@ -225,7 +225,7 @@ def init_agent():
|
|
| 225 |
tools = [notify_arun, search_arun_knowledge]
|
| 226 |
main_llm = ChatGroq(
|
| 227 |
temperature=0.2,
|
| 228 |
-
model_name="
|
| 229 |
api_key=os.getenv("GROQ_API_KEY")
|
| 230 |
).bind_tools(tools)
|
| 231 |
|
|
|
|
| 225 |
tools = [notify_arun, search_arun_knowledge]
|
| 226 |
main_llm = ChatGroq(
|
| 227 |
temperature=0.2,
|
| 228 |
+
model_name="llama-3.3-70b-versatile",
|
| 229 |
api_key=os.getenv("GROQ_API_KEY")
|
| 230 |
).bind_tools(tools)
|
| 231 |
|
data/static/rules_of_engagement.md
CHANGED
|
@@ -14,7 +14,7 @@ last_updated: 2026-04-09
|
|
| 14 |
* **The Handoff:** If a user wants to negotiate a contract, hire Arun, or asks a highly personal question, state that you will log their request and the "real Arun" will contact them shortly and use tool to send the message to Arun about it.
|
| 15 |
|
| 16 |
## 2. Zero Hallucination (Strict Grounding)
|
| 17 |
-
* **Truthfulness:** NEVER hallucinate, invent, or guess details about Arun's life, skills, or projects. You are constrained completely by the provided context.
|
| 18 |
* **The Veto Rule:** If a user requests a technology, service, or programming language (e.g., Next.js, React, Java) that is **not explicitly listed** in your **Tech Stack** section in `public_profile.md`, you must politely decline. Reply with: *"I currently specialize in backend AI systems and data pipelines; I do not offer [requested technology] services at this time. However, I can flag this interest for the real Arun to review."*
|
| 19 |
* **Firm Ambiguity:** If the retrieved knowledge context does not explicitly contain the answer, reply exactly with: *"I don't have that information in my knowledge base, but I can flag this for the real Arun to answer."*
|
| 20 |
|
|
@@ -31,7 +31,7 @@ last_updated: 2026-04-09
|
|
| 31 |
## 4. Professional Tone & Aesthetic
|
| 32 |
* **Professional & Concise:** Speak professionally, directly, and confidently. Eliminate AI robotic phrases like "As an AI..."
|
| 33 |
* **Attribution:** Always back up your technical claims by referencing specific projects with specific URL.
|
| 34 |
-
* **Aesthetics matter:** Every response must look premium and intentional with clear and simple language and
|
| 35 |
*Use proper markdown format to answer a question.
|
| 36 |
* **Structured Contact Info:** Present social links as a clean bulleted list with labels, like this:
|
| 37 |
- **LinkedIn**: [neuralarun](https://linkedin.com/in/neuralarun)
|
|
|
|
| 14 |
* **The Handoff:** If a user wants to negotiate a contract, hire Arun, or asks a highly personal question, state that you will log their request and the "real Arun" will contact them shortly and use tool to send the message to Arun about it.
|
| 15 |
|
| 16 |
## 2. Zero Hallucination (Strict Grounding)
|
| 17 |
+
* **Truthfulness:** NEVER hallucinate, invent, or guess details about Arun's life, skills, URLs, or projects. You are constrained completely by the provided context. Never make up a URL; only provide URLs that appear in the context.
|
| 18 |
* **The Veto Rule:** If a user requests a technology, service, or programming language (e.g., Next.js, React, Java) that is **not explicitly listed** in your **Tech Stack** section in `public_profile.md`, you must politely decline. Reply with: *"I currently specialize in backend AI systems and data pipelines; I do not offer [requested technology] services at this time. However, I can flag this interest for the real Arun to review."*
|
| 19 |
* **Firm Ambiguity:** If the retrieved knowledge context does not explicitly contain the answer, reply exactly with: *"I don't have that information in my knowledge base, but I can flag this for the real Arun to answer."*
|
| 20 |
|
|
|
|
| 31 |
## 4. Professional Tone & Aesthetic
|
| 32 |
* **Professional & Concise:** Speak professionally, directly, and confidently. Eliminate AI robotic phrases like "As an AI..."
|
| 33 |
* **Attribution:** Always back up your technical claims by referencing specific projects with specific URL.
|
| 34 |
+
* **Aesthetics matter:** Every response must look premium and intentional, with clear, simple language and polished formatting. Use plenty of emojis to keep it engaging, always use bullet points instead of paragraphs, and add humor while keeping the professional tone intact.
|
| 35 |
*Use proper markdown format to answer a question.
|
| 36 |
* **Structured Contact Info:** Present social links as a clean bulleted list with labels, like this:
|
| 37 |
- **LinkedIn**: [neuralarun](https://linkedin.com/in/neuralarun)
|
db/5500b9f3-fb82-44ff-acc9-77407e3b580c/data_level0.bin
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 628400
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c796bdb71dd8343ff21939068076238ac191283aac58a1b27c0a177d59684a92
|
| 3 |
size 628400
|
db/5500b9f3-fb82-44ff-acc9-77407e3b580c/length.bin
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
size 400
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:05788f243d3b01c8500cf3dfaff33d0b5d758363cf7b5a8593386eaf5848b848
|
| 3 |
size 400
|
db/chroma.sqlite3
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:4a321e53c9df41f65ca4df75e7804af994d270558cfde0aac35c656c99170c7f
|
| 3 |
+
size 5578752
|
db/ingestion_state.json
CHANGED
|
@@ -42,5 +42,5 @@
|
|
| 42 |
"data/raw/metadata.json": "f12b1f61791c478891a2d0aec726d097",
|
| 43 |
"data/static/public_profile.md": "0777d7e5e10fa28dab5ea79611026229",
|
| 44 |
"data/static/README.md": "e71befe905c6348f2d497aee35f37f81",
|
| 45 |
-
"data/static/rules_of_engagement.md": "
|
| 46 |
}
|
|
|
|
| 42 |
"data/raw/metadata.json": "f12b1f61791c478891a2d0aec726d097",
|
| 43 |
"data/static/public_profile.md": "0777d7e5e10fa28dab5ea79611026229",
|
| 44 |
"data/static/README.md": "e71befe905c6348f2d497aee35f37f81",
|
| 45 |
+
"data/static/rules_of_engagement.md": "400606c2e22098c229f26154b3c7c4b0"
|
| 46 |
}
|
docs/operations/master_operations_guide.md
ADDED
|
@@ -0,0 +1,99 @@
|
|
| 1 |
+
# Master Operations Guide: ArunCore Digital Twin
|
| 2 |
+
|
| 3 |
+
This document is the unified operations manual for maintaining, updating, and scaling your AI Digital Twin. It covers how to inject new knowledge, push database updates, deploy UI modifications, and migrate to enterprise-grade infrastructure.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Phase 1: Injecting New Knowledge (Data Layer)
|
| 8 |
+
|
| 9 |
+
Because the system uses **File Hashing** and **Deterministic Upserting**, you never have to worry about duplicating data or deleting old databases. The ingestion script will automatically detect what has changed.
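
As a rough illustration of the file-hashing idea, the sketch below compares each file's digest against a saved state file and reports only the files that changed. It is not the actual `core/ingest.py`; the function names are hypothetical, and MD5 is an assumption based on the 32-character hex digests visible in `db/ingestion_state.json`.

```python
import hashlib
import json
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the hex MD5 digest of a file's bytes."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def find_changed_files(root: Path, state_path: Path) -> list[str]:
    """List files under `root` whose hash differs from the saved state."""
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.as_posix()
        # A file is "changed" if it is new or its digest no longer matches.
        if state.get(key) != file_md5(path):
            changed.append(key)
    return changed


def save_state(root: Path, state_path: Path) -> None:
    """Record the current digest of every file under `root`."""
    state = {
        path.as_posix(): file_md5(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
    state_path.write_text(json.dumps(state, indent=2))
```

Only the paths returned by `find_changed_files` would need to be re-chunked and re-embedded; calling `save_state` after a successful run keeps the next run incremental.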
|
| 10 |
+
|
| 11 |
+
### Scenario A: Adding a New GitHub Project
|
| 12 |
+
1. **Create the Folder**: Navigate to `d:\ArunCore\data\github\` and create a folder named after the repository (e.g., `new_project_name`).
|
| 13 |
+
2. **`metadata.json`**: Create this file inside to configure rapid RAG routing filters:
|
| 14 |
+
```json
|
| 15 |
+
{
|
| 16 |
+
"project_name": "New Project",
|
| 17 |
+
"repo_url": "https://github.com/neural-arun/...",
|
| 18 |
+
"description": "Short description of the system.",
|
| 19 |
+
"tech_stack": ["python", "fastapi"],
|
| 20 |
+
"status": "completed"
|
| 21 |
+
}
|
| 22 |
+
```
|
| 23 |
+
3. **`overview.md`**: Focus entirely on the *why* (problem solved) and the *how* (architectural solution). Do not paste code.
|
| 24 |
+
4. **`decisions.md` (Crucial)**: Detail the exact engineering trade-offs. Why tool X over tool Y?
|
| 25 |
+
5. **Update Central Narratives**: Add the project briefly to your `data/static/public_profile.md` and `data/raw/all_projects_summary.md`.
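
Before running ingestion, it may help to sanity-check a new `metadata.json`. The sketch below checks for the five fields shown in the example above; treating them as required, and the `validate_metadata` helper itself, are assumptions for illustration rather than behavior confirmed for the ingestion script.

```python
import json

# Fields taken from the metadata.json example; treating them as required is an assumption.
REQUIRED_FIELDS = {
    "project_name": str,
    "repo_url": str,
    "description": str,
    "tech_stack": list,
    "status": str,
}


def validate_metadata(raw: str) -> list[str]:
    """Return a list of problems found in a metadata.json payload (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    return problems
```

An empty list means the file has the expected shape; any other result names the field to fix.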
|
| 26 |
+
|
| 27 |
+
### Scenario B: Updating LinkedIn Posts
|
| 28 |
+
1. Open `d:\ArunCore\data\linkedin\posts.md`.
|
| 29 |
+
2. Prepend the new post directly to the top of the file. The system detects that the file hash has changed and re-vectorizes only this document, so you are not charged again for OpenAI embeddings of unchanged files.
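
One way to picture deterministic upserting is to derive each chunk's ID from its source path and position, so re-ingesting a file overwrites its old chunks instead of duplicating them. This is a sketch under that assumption; `chunk_id` and `plan_upsert` are hypothetical helpers, not functions from the repository.

```python
import hashlib


def chunk_id(source_path: str, chunk_index: int) -> str:
    """Build a stable ID from the source path and the chunk's position."""
    raw = f"{source_path}::{chunk_index}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


def plan_upsert(source_path: str, chunks: list[str], old_chunk_count: int):
    """Return (ids_to_upsert, stale_ids_to_delete) for one re-ingested file."""
    # Same path and index always give the same ID, so an upsert replaces the old chunk.
    new_ids = [chunk_id(source_path, i) for i in range(len(chunks))]
    # If the file shrank, IDs beyond the new chunk count are leftovers to delete.
    stale_ids = [
        chunk_id(source_path, i) for i in range(len(chunks), old_chunk_count)
    ]
    return new_ids, stale_ids
```

The returned IDs could then be handed to the vector store's upsert and delete calls for that one file, leaving every other document untouched.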
|
| 30 |
+
|
| 31 |
+
---
|
| 32 |
+
|
| 33 |
+
## Phase 2: Compiling the Brain (Engine Layer)
|
| 34 |
+
|
| 35 |
+
Once your Markdown and JSON files are ready locally, you must compile them into the ChromaDB vector store.
|
| 36 |
+
|
| 37 |
+
1. Open your terminal in the root `d:\ArunCore` directory.
|
| 38 |
+
2. Run the compiler:
|
| 39 |
+
```bash
|
| 40 |
+
python core/ingest.py
|
| 41 |
+
```
|
| 42 |
+
*Ingestion usually finishes within about 30 seconds, after which the new data is persisted in the `db/` folder.*
|
| 43 |
+
|
| 44 |
+
---
|
| 45 |
+
|
| 46 |
+
## Phase 3: Going Live (Cloud Deployment)
|
| 47 |
+
|
| 48 |
+
Running `ingest.py` only updates your *local* machine. For your globally accessible Digital Twin to access the new knowledge, you MUST push the updated `db/` folder to the Hugging Face backend.
|
| 49 |
+
|
| 50 |
+
1. **Stage data & database chunks:**
|
| 51 |
+
```bash
|
| 52 |
+
git add data/ db/
|
| 53 |
+
```
|
| 54 |
+
2. **Commit the knowledge update:**
|
| 55 |
+
```bash
|
| 56 |
+
git commit -m "Knowledge Base Update: Ingested new projects into the core vector graph"
|
| 57 |
+
```
|
| 58 |
+
3. **Push to Hugging Face (The Live API Engine):**
|
| 59 |
+
```bash
|
| 60 |
+
git push hf main
|
| 61 |
+
```
|
| 62 |
+
*Status: Once Hugging Face finishes building (usually ~60s), your AI is live with the new data.*
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
+
## Phase 4: Updating the Visual UI (Vercel Layer)
|
| 67 |
+
|
| 68 |
+
If you have made styling changes to `frontend/app/page.tsx` (like adding new Markdown renderers or changing colors), that code lives independently from the Python engine.
|
| 69 |
+
|
| 70 |
+
To deploy visual UI updates:
|
| 71 |
+
1. Navigate into the frontend folder:
|
| 72 |
+
```bash
|
| 73 |
+
cd frontend
|
| 74 |
+
```
|
| 75 |
+
2. Push directly using the Vercel CLI:
|
| 76 |
+
```bash
|
| 77 |
+
npx vercel --prod --yes
|
| 78 |
+
```
|
| 79 |
+
*Status: Vercel will build the Next.js bundle and deploy the new UI globally in ~40 seconds.*
|
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
+
## Phase 5: Infinite Scaling (Future-Proofing Guide)
|
| 84 |
+
|
| 85 |
+
This outlines the upgrade path for transitioning ArunCore from a Hugging Face prototype environment to an enterprise-grade, cloud-native deployment capable of handling large traffic spikes.
|
| 86 |
+
|
| 87 |
+
### Level 1: Vector Database Migration
|
| 88 |
+
Currently, ArunCore relies on a local instance of **ChromaDB**. Serverless environments (like Vercel Lambdas or AWS Lambda) are ephemeral and cannot read/write persistent local files reliably.
|
| 89 |
+
To completely decouple the backend architecture for infinite scaling:
|
| 90 |
+
1. **Select a Cloud Vector Engine:** Sign up for Pinecone or Weaviate.
|
| 91 |
+
2. **Modify `ingest.py`:** Update the chunking pipeline to serialize data using standard LangChain Pinecone bindings instead of Chroma bindings.
|
| 92 |
+
3. **Update `agent.py`:** Change `_GLOBAL_VECTORSTORE` to connect to the Pinecone index instead of a local disk directory.
|
| 93 |
+
*(With this change, the entire `db/` folder can be deleted from the repository, drastically reducing its size.)*
|
| 94 |
+
|
| 95 |
+
### Level 2: Backend API Severance
|
| 96 |
+
Right now, the FastAPI app (`core/api.py`) runs as a persistent listener. For cost-efficiency, the Python engine could be moved to serverless functions, for example using Vercel's Python runtime for Serverless Functions.
|
| 97 |
+
|
| 98 |
+
### Level 3: Streaming Responses
|
| 99 |
+
For the best UX during long LangChain loops, switch from a blocking REST request (`POST /chat`) to HTTP streaming (`text/event-stream`). Update the `main_llm` invocation to yield tokens as they are generated, so the user sees output immediately while the backend continues processing.
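
To make the streaming idea concrete, here is a minimal sketch of how generated tokens could be framed as Server-Sent Events. The `sse_events` helper and the closing `done` event are illustrative assumptions; the real token source would be whatever the `main_llm` invocation yields.

```python
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token in a Server-Sent Events frame, then signal completion."""
    for token in tokens:
        # A payload containing newlines needs one "data:" field per line.
        lines = token.split("\n")
        yield "".join(f"data: {line}\n" for line in lines) + "\n"
    # Hypothetical end-of-stream marker so the client knows when to stop reading.
    yield "event: done\ndata: [DONE]\n\n"
```

In a FastAPI app, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`, though wiring it into `core/api.py` is left open here.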
|
docs/scaling_guide.md
DELETED
|
@@ -1,27 +0,0 @@
|
|
| 1 |
-
# Scaling & Deployment Guide
|
| 2 |
-
|
| 3 |
-
This document outlines the upgrade path for transitioning ArunCore from a local/HuggingFace prototype environment to an enterprise-grade cloud-native deployment capable of handling massive traffic spikes and highly decoupled CI/CD.
|
| 4 |
-
|
| 5 |
-
## Level 1: Vector Database Migration
|
| 6 |
-
Currently, ArunCore relies on a local instance of **ChromaDB**. The `db/` folder is actively persisted in the filesystem. While this works beautifully for server-side monolithic deployments (like HuggingFace Spaces), it breaks down in serverless environments (like Vercel Lambdas) because serverless functions are ephemeral and cannot read/write persistent local files reliably.
|
| 7 |
-
|
| 8 |
-
### Upgrading to Pinecone or Weaviate
|
| 9 |
-
To completely decouple the backend architecture for infinite scaling:
|
| 10 |
-
1. **Select a Cloud Vector Engine:** Sign up for an API-based hosted vector database like Pinecone.
|
| 11 |
-
2. **Modify `ingest.py`:** Update the chunking pipeline to serialize data using standard LangChain Pinecone bindings instead of Chroma bindings.
|
| 12 |
-
3. **Update `agent.py`:** Change the `init_agent()` vectorstore configuration.
|
| 13 |
-
```python
|
| 14 |
-
# Target Architecture
|
| 15 |
-
from langchain_pinecone import PineconeVectorStore
|
| 16 |
-
_GLOBAL_VECTORSTORE = PineconeVectorStore(index_name="aruncore", embedding=embeddings)
|
| 17 |
-
```
|
| 18 |
-
4. **Action:** With this change, the entire `db/` folder can be deleted, drastically lowering the repository size limit.
|
| 19 |
-
|
| 20 |
-
## Level 2: Backend API Severance (Vercel / AWS)
|
| 21 |
-
Right now, the FastAPI (`core/api.py`) runs as a persistent listener. But for true cost-efficiency and auto-scaling, the Python engine should be moved to Serverless Functions.
|
| 22 |
-
1. Since you deployed the Node.js frontend on Vercel, Vercel also supports Python Serverless Edge functions.
|
| 23 |
-
2. Move the `api.py` endpoint from FastAPI to the `api/` directory format expected by Vercel inside the Next.js `frontend` directory.
|
| 24 |
-
3. Update the timeout limits. LLM operations take 10-30 seconds, so ensure Vercel limits are set up to handle long-running transactions (or migrate to a WebSocket/streaming model).
|
| 25 |
-
|
| 26 |
-
## Level 3: Streaming Responses
|
| 27 |
-
For the best UI experience during long LangChain loops (when it's searching the DB multiple times), switch from a blocked REST request (`POST /chat`) to HTTP Streaming (`text/event-stream`). Update the `main_llm` invocation to yield tokens dynamically, passing the feeling of immediate speed to the user while backend processes crunch data.
|
docs/updating_digital_twin.md
DELETED
|
@@ -1,78 +0,0 @@
|
|
| 1 |
-
# Updating the Digital Twin's Knowledge Base
|
| 2 |
-
|
| 3 |
-
As Arun completes new projects or learns new architectures, ArunCore must evolve with him. Adding new knowledge to the digital twin requires ZERO changes to the core Python logic. You simply adjust data and instruct the ingestor.
|
| 4 |
-
|
| 5 |
-
## How to Inject New Knowledge
|
| 6 |
-
|
| 7 |
-
### 1. Structure the Project Directory
|
| 8 |
-
Navigate to the `data/github/` directory and create a completely new folder reflecting the project's name (e.g., `data/github/quantum_trading_bot/`).
|
| 9 |
-
|
| 10 |
-
### 2. Required Payload Files
|
| 11 |
-
Inside your new folder, create these necessary files so the AI can parse the context correctly:
|
| 12 |
-
|
| 13 |
-
#### A. `overview.md`
|
| 14 |
-
This is the narrative file. Write 2-3 paragraphs describing why you built it, the central bottleneck you solved, and the final impact. Example:
|
| 15 |
-
* "I engineered this quantum trading system because..."
|
| 16 |
-
|
| 17 |
-
#### B. `metadata.json`
|
| 18 |
-
A rigid structured data file providing rapid context for the agent's RAG routing constraints.
|
| 19 |
-
```json
|
| 20 |
-
{
|
| 21 |
-
"project_name": "Quantum Trading Bot",
|
| 22 |
-
"technologies": ["Python", "TensorFlow", "Pandas"],
|
| 23 |
-
"deployment": "AWS EC2",
|
| 24 |
-
"status": "Archived",
|
| 25 |
-
"repository": "https://github.com/neural-arun/temp"
|
| 26 |
-
}
|
| 27 |
-
```
|
| 28 |
-
|
| 29 |
-
#### C. `code_summaries.json` (Optional but highly recommended)
|
| 30 |
-
If you want the AI to confidently discuss the actual code structure of the project without feeding it thousands of raw `.py` lines:
|
| 31 |
-
```json
|
| 32 |
-
{
|
| 33 |
-
"files": [
|
| 34 |
-
{
|
| 35 |
-
"path": "src/algo.py",
|
| 36 |
-
"summary": "Implements the core matrix mutation algorithms used against real-time API feeds."
|
| 37 |
-
}
|
| 38 |
-
]
|
| 39 |
-
}
|
| 40 |
-
```
|
| 41 |
-
|
| 42 |
-
### 3. Updating the Central Narratives
|
| 43 |
-
Whenever a major milestone project is completed, you should also edit:
|
| 44 |
-
* `data/static/public_profile.md`: To add the skill to your top-level "Tech Skills" array.
|
| 45 |
-
* `data/github/Agentic_AI_Projects/master_portfolio_summary.md`: Add a high-level bullet pointing to the new project.
|
| 46 |
-
|
| 47 |
-
### 4. Rebuilding the Brain
|
| 48 |
-
Once the markdown files and JSONs look correct, simply fire up the compiler:
|
| 49 |
-
```bash
|
| 50 |
-
python core/ingest.py
|
| 51 |
-
```
|
| 52 |
-
This script acts as a compiler for the LLM. It will automatically crawl the newly added folders, perform LangChain structural chunking, calculate semantic embeddings by calling the OpenAI API, and seamlessly merge the physical vector graphs inside `db/`.
|
| 53 |
-
|
| 54 |
-
**Done.** Within 30 seconds, your local digital twin will permanently "remember" your new project.
|
| 55 |
-
|
| 56 |
-
### 5. Cloud Deployment (Going Live for the World)
|
| 57 |
-
Just running `ingest.py` only updates your *local* machine. For your globally accessible Digital Twin (`aruncore.vercel.app`) to access the new knowledge, you MUST push the updated `db/` folder to the Hugging Face backend.
|
| 58 |
-
|
| 59 |
-
Follow these exact steps in your terminal:
|
| 60 |
-
|
| 61 |
-
1. **Stage all new data and database chunks:**
|
| 62 |
-
```bash
|
| 63 |
-
git add data/ db/
|
| 64 |
-
```
|
| 65 |
-
2. **Commit the knowledge update:**
|
| 66 |
-
```bash
|
| 67 |
-
git commit -m "Knowledge Base Expand: Embedded new project data into the main vector graph"
|
| 68 |
-
```
|
| 69 |
-
3. **Push to GitHub (Your Backup):**
|
| 70 |
-
```bash
|
| 71 |
-
git push origin main
|
| 72 |
-
```
|
| 73 |
-
4. **Push to Hugging Face (The Live API Engine):**
|
| 74 |
-
```bash
|
| 75 |
-
git push hf main
|
| 76 |
-
```
|
| 77 |
-
|
| 78 |
-
**Production Status:** Once the Hugging Face Space finishes building (usually ~60 seconds), your online Vercel UI will instantly begin querying the newly injected knowledge. No frontend updates are required!
|
docs/workflows/adding_new_data.md
DELETED
|
@@ -1,123 +0,0 @@
|
|
| 1 |
-
# Workflow: Adding New Data to ArunCore
|
| 2 |
-
|
| 3 |
-
This document outlines the exact, step-by-step process for adding new projects, updating your profile, or adding new LinkedIn posts to the ArunCore knowledge base.
|
| 4 |
-
|
| 5 |
-
Because the system uses **File Hashing** and **Deterministic Upserting**, you never have to worry about duplicating data or deleting old databases. The ingestion script will automatically detect what has changed and only process the new/modified files.
|
| 6 |
-
|
| 7 |
-
---
|
| 8 |
-
|
| 9 |
-
## Scenario A: Adding a New GitHub Project (Tier 1 - Major Project)
|
| 10 |
-
|
| 11 |
-
When you finish a major new engineering project, follow this exact workflow to teach it to your AI:
|
| 12 |
-
|
| 13 |
-
### Step 1: Create the Project Folder
|
| 14 |
-
Navigate to your GitHub data folder:
|
| 15 |
-
`d:\ArunCore\data\github\`
|
| 16 |
-
Create a new folder using the exact repository name (e.g., `new_awesome_project`).
|
| 17 |
-
|
| 18 |
-
### Step 2: Create `metadata.json`
|
| 19 |
-
Inside the new folder, create `metadata.json` and fill it out:
|
| 20 |
-
```json
|
| 21 |
-
{
|
| 22 |
-
"project_name": "New Awesome Project",
|
| 23 |
-
"repo_url": "https://github.com/neural-arun/new_awesome_project",
|
| 24 |
-
"description": "A 1-2 sentence description of what the system does.",
|
| 25 |
-
"created_at": "YYYY-MM",
|
| 26 |
-
"updated_at": "YYYY-MM",
|
| 27 |
-
"tech_stack": ["python", "fastapi", "tool_name"],
|
| 28 |
-
"status": "completed",
|
| 29 |
-
"visibility": "PUBLIC"
|
| 30 |
-
}
|
| 31 |
-
```
|
| 32 |
-
|
| 33 |
-
### Step 3: Create `readme.md`
|
| 34 |
-
Create `readme.md`. Do NOT copy-paste installation instructions or generic GitHub badges. Focus strictly on:
|
| 35 |
-
- **Problem:** What real-world issue does this solve?
|
| 36 |
-
- **Solution:** How did you solve it?
|
| 37 |
-
- **Key Features:** Bullet points of the primary capabilities.
|
| 38 |
-
|
| 39 |
-
### Step 4: Create `architecture.md`
|
| 40 |
-
Create `architecture.md`. Explain the data flow and system design.
|
| 41 |
-
- Define the major components (e.g., UI, Backend, Database).
|
| 42 |
-
- Explain the pipeline (e.g., *Data flows from X to Y using asyncio queues...*).
|
| 43 |
-
|
| 44 |
-
### Step 5: Create `code_summaries.json`
|
| 45 |
-
Create `code_summaries.json`. List out the major scripts/modules and what they do:
|
| 46 |
-
```json
|
| 47 |
-
[
|
| 48 |
-
{
|
| 49 |
-
"module": "main.py",
|
| 50 |
-
"summary": "Entry point for the application, handles FastAPI routing.",
|
| 51 |
-
"file_url": "https://github.com/neural-arun/..."
|
| 52 |
-
}
|
| 53 |
-
]
|
| 54 |
-
```
|
| 55 |
-
|
| 56 |
-
### Step 6: Create `decisions.md`
|
| 57 |
-
Create `decisions.md`. This is the most important file for proving you are an engineer.
|
| 58 |
-
Format it as:
|
| 59 |
-
- **Decision 1: [Tool/Architecture Choice]**
|
| 60 |
-
- **What:** I chose X over Y.
|
| 61 |
-
- **Why:** (Explain the trade-off. Did Y have rate limits? Was X faster?)
|
| 62 |
-
|
| 63 |
-
### Step 7: Run Ingestion
|
| 64 |
-
Open your terminal in `d:\ArunCore` and run the ingestion script:
|
| 65 |
-
```powershell
|
| 66 |
-
python core/ingest.py
|
| 67 |
-
```
|
| 68 |
-
*The script will automatically detect the new folder, chunk the 5 files, and embed them into the database. Old projects will be skipped instantly.*
|
| 69 |
-
|
| 70 |
-
---
|
| 71 |
-
|
| 72 |
-
## Scenario B: Adding a Minor Project (Tier 2)
|
| 73 |
-
|
| 74 |
-
If the project is a small utility or learning lab, it doesn't need architecture docs.
|
| 75 |
-
|
| 76 |
-
1. Go to `d:\ArunCore\data\github\`.
|
| 77 |
-
2. Create `your_minor_project` folder.
|
| 78 |
-
3. Create ONLY `metadata.json` and a short `readme.md`.
|
| 79 |
-
4. Run `python core/ingest.py`.
|
| 80 |
-
|
| 81 |
-
---
|
| 82 |
-
|
| 83 |
-
## Scenario C: Updating LinkedIn Posts
|
| 84 |
-
|
| 85 |
-
When you publish new content on LinkedIn that you want ArunCore to know about:
|
| 86 |
-
|
| 87 |
-
### Step 1: Open `posts.md`
|
| 88 |
-
Open `d:\ArunCore\data\linkedin\posts.md`.
|
| 89 |
-
|
| 90 |
-
### Step 2: Prepend the New Post
|
| 91 |
-
Add the new post to the **top** of the document using this exact semantic format:
|
| 92 |
-
|
| 93 |
-
```markdown
|
| 94 |
-
## Post: [Punchy Title About The Post]
|
| 95 |
-
|
| 96 |
-
**Date:** [Month Year]
|
| 97 |
-
**Topic:** [Comma separated topics]
|
| 98 |
-
**Links:** [Optional URL]
|
| 99 |
-
|
| 100 |
-
**Core Message**
|
| 101 |
-
1-2 sentences summarizing the main thesis of your post.
|
| 102 |
-
|
| 103 |
-
**Technical Deep Dive**
|
| 104 |
-
- Detail 1: ...
|
| 105 |
-
- Detail 2: ...
|
| 106 |
-
```
|
| 107 |
-
|
| 108 |
-
### Step 3: Run Ingestion
|
| 109 |
-
```powershell
|
| 110 |
-
python core/ingest.py
|
| 111 |
-
```
|
| 112 |
-
*The script will see that `posts.md` has been modified (its file hash changed). It will automatically delete all the old chunks for `posts.md`, divide your newly updated file into fresh chunks, and embed them. You will not pay twice for other unchanged files.*
|
| 113 |
-
|
| 114 |
-
---
|
| 115 |
-
|
| 116 |
-
## Scenario D: Updating Your Identity
|
| 117 |
-
|
| 118 |
-
If you learn a massive new skill (e.g., you master a new framework like Next.js) or change your freelance goals:
|
| 119 |
-
|
| 120 |
-
1. Open `d:\ArunCore\data\static\public_profile.md`.
|
| 121 |
-
2. Add the new framework to the `Core Focus & Technology Stack` lists.
|
| 122 |
-
3. Save the file.
|
| 123 |
-
4. Run `python core/ingest.py`.
|