davanstrien HF Staff committed on
Commit 4d1a611 · 1 Parent(s): c40ab4a

Update extraction prompt in app.py for improved clarity and detail on metadata fields

Files changed (1)
  1. app.py +28 -9
app.py CHANGED
@@ -17,15 +17,34 @@ model = AutoModelForImageTextToText.from_pretrained(
  processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct")
  print("Model loaded successfully!")
 
- EXTRACTION_PROMPT = """Extract all metadata from this library catalog card and return it as valid JSON with the following fields:
- - title: The main title or name on the card
- - author: Author, creator, or associated person/organization
- - date: Any dates mentioned (publication, creation, or coverage dates)
- - call_number: Library classification or call number
- - physical_description: Details about the physical item (size, extent, format)
- - notes: Any additional notes or information
-
- Return ONLY the JSON object, nothing else. If a field is not present on the card, use null for that field."""
+ EXTRACTION_PROMPT = """Extract metadata from this library catalog card as JSON.
+
+ Library catalog cards contain bibliographic information about materials and filing/access information. Extract whatever fields are present:
+
+ CORE BIBLIOGRAPHIC FIELDS:
+ - title: Full title of the work
+ - author: Main author/creator (person or organization)
+ - editor: Editor if different from author
+ - contributor: Other contributors (translators, illustrators, etc.)
+ - publication_date: Date(s) of publication
+ - publisher: Publisher name
+ - publication_place: Place of publication
+ - physical_description: Physical details (volumes, pages, size, illustrations)
+ - series: Series information if part of a series
+ - edition: Edition statement
+ - contents: Description of contents, volumes, or parts
+
+ CATALOGING/ACCESS FIELDS:
+ - call_number: Library classification number
+ - subject_headings: Subject terms (often numbered list)
+ - added_entries: Additional access points for co-authors, editors, etc. (often with Roman numerals)
+ - notes: Any additional notes
+
+ CARD-SPECIFIC:
+ - filing_heading: The heading under which this card is filed (often at top, may be in all caps)
+ - card_sequence: If this is a continuation card (e.g., "Card 2", "Card 3")
+
+ Return ONLY valid JSON. Use null for fields not present on the card. Use arrays [] for repeating fields like subject_headings and added_entries."""
 
 
  @spaces.GPU
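
The updated prompt implies a fixed output shape: null for absent fields and arrays for repeating ones. A minimal sketch of how a caller might normalize the model's reply to that shape is given below. It is not part of this commit; `parse_card_response` and the two field lists are hypothetical names, and treating a missing repeating field as an empty list (rather than null) is an assumption the prompt leaves open.

```python
import json

# Field names copied from the updated EXTRACTION_PROMPT.
SCALAR_FIELDS = [
    "title", "author", "editor", "contributor", "publication_date",
    "publisher", "publication_place", "physical_description", "series",
    "edition", "contents", "call_number", "notes", "filing_heading",
    "card_sequence",
]
# The prompt names these two as examples of repeating (array) fields.
ARRAY_FIELDS = ["subject_headings", "added_entries"]


def parse_card_response(text: str) -> dict:
    """Hypothetical helper: turn raw model output into a normalized record."""
    # Models sometimes add prose or a Markdown fence around the JSON,
    # so keep only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])

    # Absent scalar fields become None, matching the prompt's "use null".
    record = {name: data.get(name) for name in SCALAR_FIELDS}

    # Assumption: repeating fields are always lists; a bare value is wrapped
    # and a missing or null value becomes an empty list.
    for name in ARRAY_FIELDS:
        value = data.get(name)
        if value is None:
            record[name] = []
        elif isinstance(value, list):
            record[name] = value
        else:
            record[name] = [value]
    return record
```

For example, `parse_card_response('{"title": "Hamlet", "subject_headings": "Drama"}')` would yield a record with every prompt field present, `subject_headings` wrapped into a one-element list, and the remaining scalar fields set to `None`. Keys the model returns that are not in the prompt are dropped in this sketch; a caller who wants to keep them would need to merge them back in.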