File size: 3,611 Bytes
e34edc7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
# CURL Test Commands for Ingestion Pipeline

## Backend Configuration
- **URL**: `https://binkhoale1812-studdybuddy-ingestion1.hf.space/`
- **User ID**: `44e65346-8eaa-4f95-b17a-f6219953e7a8`
- **Project ID**: `496e2fad-ec7e-4562-b06a-ea2491f2460`
- **Test Files**: `Lecture5_ML.pdf`, `Lecture6_ANN_DL.pdf`

## 1. Health Check

```bash
curl -X GET "https://binkhoale1812-studdybuddy-ingestion1.hf.space/health" \
  -H "Content-Type: application/json"
```

## 2. Upload Files

```bash
curl -X POST "https://binkhoale1812-studdybuddy-ingestion1.hf.space/upload" \
  -F "user_id=44e65346-8eaa-4f95-b17a-f6219953e7a8" \
  -F "project_id=496e2fad-ec7e-4562-b06a-ea2491f2460" \
  -F "files=@../exefiles/Lecture5_ML.pdf" \
  -F "files=@../exefiles/Lecture6_ANN_DL.pdf"
```

## 3. Check Upload Status

Replace `{JOB_ID}` with the job_id from the upload response:

```bash
curl -X GET "https://binkhoale1812-studdybuddy-ingestion1.hf.space/upload/status?job_id={JOB_ID}" \
  -H "Content-Type: application/json"
```

## 4. List Uploaded Files

```bash
curl -X GET "https://binkhoale1812-studdybuddy-ingestion1.hf.space/files?user_id=44e65346-8eaa-4f95-b17a-f6219953e7a8&project_id=496e2fad-ec7e-4562-b06a-ea2491f2460" \
  -H "Content-Type: application/json"
```

## 5. Get File Chunks (Lecture5_ML.pdf)

```bash
curl -X GET "https://binkhoale1812-studdybuddy-ingestion1.hf.space/files/chunks?user_id=44e65346-8eaa-4f95-b17a-f6219953e7a8&project_id=496e2fad-ec7e-4562-b06a-ea2491f2460&filename=Lecture5_ML.pdf&limit=5" \
  -H "Content-Type: application/json"
```

## 6. Get File Chunks (Lecture6_ANN_DL.pdf)

```bash
curl -X GET "https://binkhoale1812-studdybuddy-ingestion1.hf.space/files/chunks?user_id=44e65346-8eaa-4f95-b17a-f6219953e7a8&project_id=496e2fad-ec7e-4562-b06a-ea2491f2460&filename=Lecture6_ANN_DL.pdf&limit=5" \
  -H "Content-Type: application/json"
```

## Expected Responses

### Health Check Response
```json
{
  "ok": true,
  "mongodb_connected": true,
  "service": "ingestion_pipeline"
}
```

### Upload Response
```json
{
  "job_id": "uuid-string",
  "status": "processing",
  "total_files": 2
}
```

### Status Response
```json
{
  "job_id": "uuid-string",
  "status": "completed",
  "total": 2,
  "completed": 2,
  "progress": 100.0,
  "last_error": null,
  "created_at": 1234567890.123
}
```

### Files List Response
```json
{
  "files": [
    {
      "filename": "Lecture5_ML.pdf",
      "summary": "Document summary..."
    },
    {
      "filename": "Lecture6_ANN_DL.pdf", 
      "summary": "Document summary..."
    }
  ],
  "filenames": ["Lecture5_ML.pdf", "Lecture6_ANN_DL.pdf"]
}
```

### Chunks Response
```json
{
  "chunks": [
    {
      "user_id": "44e65346-8eaa-4f95-b17a-f6219953e7a8",
      "project_id": "496e2fad-ec7e-4562-b06a-ea2491f2460",
      "filename": "Lecture5_ML.pdf",
      "topic_name": "Machine Learning Introduction",
      "summary": "Chunk summary...",
      "content": "Chunk content...",
      "embedding": [0.1, 0.2, ...],
      "page_span": [1, 3],
      "card_id": "lecture5_ml-c0001"
    }
  ]
}
```

## Testing Steps

1. **Run Health Check**: Verify the service is running
2. **Upload Files**: Upload both PDF files
3. **Monitor Progress**: Check job status until completion
4. **Verify Files**: List uploaded files
5. **Inspect Chunks**: Get document chunks to verify processing

## Troubleshooting

- **Connection Issues**: Check if the backend URL is accessible
- **File Not Found**: Ensure PDF files exist in `../exefiles/` directory
- **Upload Fails**: Check file size limits and format support
- **Processing Stuck**: Monitor job status and check logs