title: FormatAnalyser
emoji: 🗂️
colorFrom: green
colorTo: indigo
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: false
license: mit

🗂️ FormatAnalyser

A web-based GUI for digital format identification.

Created by Giulia Osti in collaboration with GitHub Copilot (Raptor mini model).

Analyse files using three digital preservation tools:

| Tool | What it does |
|------|--------------|
| Siegfried | Fast format identification against PRONOM, LOC and MIME-info signatures |
| DROID | The National Archives' official format identification tool |
| JHOVE | Deep format validation and well-formedness checking |
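
To illustrate how identification output can become table rows, here is a minimal sketch that flattens the JSON report Siegfried prints with its `-json` flag. It is not taken from app.py: the helper names `parse_siegfried_json` and `run_siegfried` are hypothetical, the field names reflect Siegfried's JSON output as commonly documented, and `sf` is assumed to be on the PATH.

```python
import json
import subprocess


def parse_siegfried_json(raw: str) -> list:
    """Flatten a Siegfried JSON report into one row per (file, match)."""
    report = json.loads(raw)
    rows = []
    for entry in report.get("files", []):
        for match in entry.get("matches", []):
            rows.append({
                "filename": entry.get("filename", ""),
                "namespace": match.get("ns", ""),
                "id": match.get("id", ""),
                "format": match.get("format", ""),
                "mime": match.get("mime", ""),
            })
    return rows


def run_siegfried(path: str) -> list:
    """Run `sf -json <path>` (assumes sf is installed) and parse the report."""
    completed = subprocess.run(
        ["sf", "-json", path], capture_output=True, text=True, check=True
    )
    return parse_siegfried_json(completed.stdout)
```

Keeping the parser separate from the subprocess call means the row-building logic can be exercised on saved reports without Siegfried installed.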

Features

  • Upload one or more files simultaneously
  • Select which tools to run (independently)
  • Results displayed as a formatted table
  • Raw terminal output visible per tool
  • Download full results as CSV
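
The CSV download can be sketched with the standard library alone. The function below is an illustration, not the code in app.py; `rows_to_csv` and the column names in the usage note are hypothetical. It takes the union of keys across rows as the header, since different tools are likely to report different fields.

```python
import csv
import io


def rows_to_csv(rows: list) -> str:
    """Serialise result rows (dicts) to CSV text.

    The header is the union of all keys, in first-seen order; fields a row
    does not have are written as empty cells.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For example, passing one Siegfried row with an `id` field and one JHOVE row with a `status` field yields a four-column CSV in which each row leaves the other tool's column blank.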

Deploy on Hugging Face Spaces

  1. Fork or clone this repo to your GitHub account
  2. Go to huggingface.co/new-space
  3. Choose Gradio as the SDK
  4. Link your GitHub repo under "Import from GitHub"
  5. Spaces will auto-deploy; the tools install on first run (expect a cold start of roughly 3–5 minutes)

Note: Tools are installed once per Space instance. Subsequent runs within the same session skip installation and start immediately.
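
One way to get install-once behaviour is a marker file per tool. Whether app.py does it this way is not stated here, so treat the following as a sketch under that assumption; `ensure_installed` and the marker directory are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def ensure_installed(name: str, installer: Callable[[], None], marker_dir: Path) -> bool:
    """Run `installer` only if no marker file exists yet for `name`.

    Returns True if the installer ran, False if it was skipped.
    """
    marker = marker_dir / f"{name}.installed"
    if marker.exists():
        return False
    installer()  # may raise; in that case no marker is written
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text("ok")
    return True
```

Because the marker is written only after the installer returns, a failed installation is retried on the next run instead of being recorded as done.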


Run Locally

```bash
git clone https://github.com/YOUR_USERNAME/format-analyser
cd format-analyser
pip install -r requirements.txt
python app.py
```

You also need sf (Siegfried), DROID, and JHOVE installed on your system.
On Linux/macOS the app can install them automatically.
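
Before launching the app locally, a quick preflight check can report which tools are missing from the PATH. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the executable names other than `sf` are assumptions that depend on how DROID and JHOVE were installed.

```python
import shutil


def missing_tools(executables: dict, which=shutil.which) -> list:
    """Return the tool names whose executable cannot be found on PATH."""
    return [tool for tool, exe in executables.items() if which(exe) is None]


# Executable names other than `sf` are assumptions; adjust to your install.
TOOLS = {"Siegfried": "sf", "DROID": "droid", "JHOVE": "jhove"}
```

Calling `missing_tools(TOOLS)` returns an empty list when everything is found. The `which` parameter exists only so the lookup can be replaced in tests.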


File Structure

```
.
├── app.py            # Main Gradio application (contains installers + runners)
├── requirements.txt  # Python dependencies
├── packages.txt      # System-level packages (for HF Spaces)
└── README.md         # This file
```

Notes & Tips

  • The JHOVE installer URL may change with new releases; check openpreservation.org if installation fails.
  • The DROID signature file is downloaded automatically from The National Archives. The version hard-coded in app.py may need updating occasionally; the app also falls back to any existing .xml file in your home directory.
  • Java version: beginning with DROID 6.8, Java 21+ is required. The application detects the JVM version and chooses a compatible DROID binary (6.7.0 for Java ≤ 19). On Spaces, openjdk-21 is installed, so you get the latest release by default.
  • On Hugging Face Spaces the tool installation phase runs just once, when the container starts; subsequent runs are very fast.
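
The Java check described in the notes could look roughly like the sketch below. It is an illustration, not the logic in app.py: the function names are hypothetical, the `"6.8+"` label stands in for whichever recent DROID release the app targets, and treating Java 20 like older JVMs is an assumption, since the notes only specify Java ≤ 19 and Java 21+.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering: "1.8.0_292" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def detect_java_major() -> Optional[int]:
    """Run `java -version`; the JVM prints its version banner on stderr."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None  # no java executable on PATH
    return parse_java_major(result.stderr + result.stdout)


def choose_droid_release(java_major: Optional[int]) -> Optional[str]:
    """Pick a DROID release line for the detected Java version."""
    if java_major is None:
        return None
    # Assumption: Java 20 is handled like Java <= 19.
    return "6.8+" if java_major >= 21 else "6.7.0"
```

Parsing is kept separate from the subprocess call so that both the modern (`21.0.2`) and legacy (`1.8.0_292`) version strings can be checked without a JVM present.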