Add comprehensive AppImage metadata extraction design document

Covers all metadata sources from Type 1/2 ELF headers through desktop
entries to AppStream XML, with schema, parser, and UI display plans.
lashman
2026-02-27 18:08:24 +02:00
parent 6526f92a6f
commit fc3ee9ba8f

@@ -0,0 +1,183 @@
# AppImage Comprehensive Metadata Extraction and Display
## Goal
Extract ALL available metadata from AppImage files - from the oldest Type 1 format to the newest Type 2 with AppStream XML - and display it comprehensively in the overview tab of the detail view.
## Background: All AppImage Metadata Sources
### 1. ELF Binary Header (Type 1 and Type 2)
- Magic bytes at offset 8: `AI\x01` (Type 1) or `AI\x02` (Type 2)
- Architecture from `e_machine` at offset 18
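A minimal sketch of the header check described above, using only the standard library. The names `AppImageType`, `read_elf_header`, and `arch_name` are illustrative and not taken from the existing code; the sketch assumes the caller has already read at least the first 20 bytes of the file.

```rust
/// AppImage format generation, as encoded in the magic bytes at offset 8.
#[derive(Debug, PartialEq, Eq)]
pub enum AppImageType {
    Type1,
    Type2,
}

/// Reads the AppImage type and the raw ELF `e_machine` value from the
/// start of a file. Returns `None` for anything that is not an ELF file
/// carrying the AppImage magic.
pub fn read_elf_header(header: &[u8]) -> Option<(AppImageType, u16)> {
    if header.len() < 20 || &header[0..4] != b"\x7fELF" {
        return None;
    }
    let kind = match &header[8..11] {
        [b'A', b'I', 1] => AppImageType::Type1,
        [b'A', b'I', 2] => AppImageType::Type2,
        _ => return None,
    };
    // EI_DATA (byte 5) gives the byte order of multi-byte header fields:
    // 1 = little endian, 2 = big endian.
    let raw = [header[18], header[19]];
    let e_machine = match header[5] {
        1 => u16::from_le_bytes(raw),
        2 => u16::from_be_bytes(raw),
        _ => return None,
    };
    Some((kind, e_machine))
}

/// Maps a few common `e_machine` values to architecture names.
pub fn arch_name(e_machine: u16) -> &'static str {
    match e_machine {
        3 => "i386",
        40 => "arm",
        62 => "x86_64",
        183 => "aarch64",
        _ => "unknown",
    }
}
```

Honoring `EI_DATA` keeps the `e_machine` read correct for big-endian images, which a plain little-endian read would misreport.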
### 2. Type 1: ISO 9660 Volume Descriptor
- Update info at fixed offset 33651 (512 bytes)
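As a rough illustration of the fixed-offset read, the sketch below pulls the 512-byte field out of an in-memory image. The function name `read_type1_update_info` is hypothetical, and treating the field as NUL- or space-padded UTF-8 is an assumption; a real implementation would more likely seek and read from the file rather than hold the whole image in memory.

```rust
const TYPE1_UPDATE_INFO_OFFSET: usize = 33651;
const TYPE1_UPDATE_INFO_LEN: usize = 512;

/// Extracts the Type 1 update information string, if one is present.
/// Returns `None` when the image is too short or the field is blank.
pub fn read_type1_update_info(image: &[u8]) -> Option<String> {
    let end = TYPE1_UPDATE_INFO_OFFSET.checked_add(TYPE1_UPDATE_INFO_LEN)?;
    let field = image.get(TYPE1_UPDATE_INFO_OFFSET..end)?;
    // Assumption: the field is padded, so keep only the bytes before the first NUL.
    let text = field.split(|&b| b == 0).next().unwrap_or(&[]);
    let text = std::str::from_utf8(text).ok()?.trim();
    if text.is_empty() {
        None
    } else {
        Some(text.to_string())
    }
}
```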
### 3. Type 2: ELF Sections
- `.upd_info` (1024 bytes) - update transport (zsync, GitHub releases, etc.)
- `.sha256_sig` (1024 bytes) - GPG digital signature
- `.sig_key` (8192 bytes) - public key for signature verification
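Once the `.upd_info` bytes are in hand, the update string can be split into a transport and its parameters. The sketch below assumes the pipe-separated form used by AppImage update information (for example `zsync|<url>` or `gh-releases-zsync|<user>|<repo>|<tag>|<pattern>`) and a NUL-padded section; `UpdateInfo` and `parse_update_info` are illustrative names, and locating the section inside the ELF file is out of scope here.

```rust
/// Update transport plus its positional parameters.
#[derive(Debug, PartialEq, Eq)]
pub struct UpdateInfo {
    pub transport: String,
    pub params: Vec<String>,
}

/// Parses the contents of a NUL-padded update information section.
/// Returns `None` when the section holds no update string.
pub fn parse_update_info(section: &[u8]) -> Option<UpdateInfo> {
    // Keep only the bytes before the first NUL of padding.
    let end = section.iter().position(|&b| b == 0).unwrap_or(section.len());
    let text = std::str::from_utf8(&section[..end]).ok()?.trim();
    let mut parts = text.split('|').map(str::to_string);
    // The first field names the transport; an empty one means "no update info".
    let transport = parts.next().filter(|t| !t.is_empty())?;
    Some(UpdateInfo {
        transport,
        params: parts.collect(),
    })
}
```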
### 4. Desktop Entry File (.desktop)
Standard freedesktop fields:
- `Name`, `GenericName`, `Comment`, `Icon`, `Exec`, `Categories`
- `Keywords`, `MimeType`, `StartupWMClass`, `Terminal`
- `Actions` with `[Desktop Action <name>]` sections
- AppImage-specific: `X-AppImage-Version`, `X-AppImage-Name`, `X-AppImage-Arch`
### 5. AppStream / AppData XML (richest source)
Located at `usr/share/metainfo/*.xml` or `usr/share/appdata/*.xml`:
- `<id>` - reverse-DNS identifier
- `<name>` - localized app name
- `<summary>` - one-line description
- `<description>` - full rich-text description
- `<developer>` - developer/organization (older files use `<developer_name>` instead)
- `<project_license>` - SPDX license
- `<project_group>` - umbrella project (GNOME, KDE, etc.)
- `<url type="...">` - homepage, bugtracker, donation, help, vcs-browser, contribute
- `<keywords>` - search terms
- `<categories>` - menu categories
- `<screenshots>` - screenshot URLs with captions
- `<releases>` - version history with dates and changelogs
- `<content_rating type="oars-1.1">` - age rating
- `<provides>` - MIME types, binaries, D-Bus interfaces
- `<branding>` - theme colors
### 6. Icons (already handled)
- `.DirIcon`, root-level PNG/SVG, `usr/share/icons/` hierarchy
### 7. Digital Signatures (Type 2)
- GPG signature in `.sha256_sig` ELF section
- Verifiable with embedded public key
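For the `has_signature` flag, presence of the section is probably not enough on its own, because the section is reserved at a fixed size. A minimal sketch, assuming unsigned images leave the reserved bytes zero-filled and that the caller passes `None` when the section is missing (for example for Type 1 images); the function name is illustrative, and this only detects a signature, it does not verify it:

```rust
/// Returns `true` when the `.sha256_sig` section holds any non-zero byte.
/// `None` stands for "section not found".
pub fn has_signature(sha256_sig: Option<&[u8]>) -> bool {
    match sha256_sig {
        // Assumption: an unsigned image keeps the reserved space zero-filled.
        Some(bytes) => bytes.iter().any(|&b| b != 0),
        None => false,
    }
}
```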
## Approach
Parse everything at analysis time and store in the database (Approach A). This matches the existing architecture where `run_background_analysis()` populates the DB and the UI reads from `AppImageRecord`.
## Database Schema (Migration v9)
16 new columns on `appimages`:
| Column | Type | Default | Description |
|--------|------|---------|-------------|
| `appstream_id` | TEXT | NULL | Reverse-DNS ID (e.g. `org.kde.krita`) |
| `appstream_description` | TEXT | NULL | Rich description (paragraphs joined with newlines) |
| `generic_name` | TEXT | NULL | Generic descriptor ("Web Browser") |
| `license` | TEXT | NULL | SPDX license expression |
| `homepage_url` | TEXT | NULL | Project website |
| `bugtracker_url` | TEXT | NULL | Bug reporting URL |
| `donation_url` | TEXT | NULL | Donation page |
| `help_url` | TEXT | NULL | Documentation URL |
| `vcs_url` | TEXT | NULL | Source code URL |
| `keywords` | TEXT | NULL | Comma-separated keywords |
| `mime_types` | TEXT | NULL | Semicolon-separated MIME types |
| `content_rating` | TEXT | NULL | Summarized OARS rating |
| `project_group` | TEXT | NULL | Umbrella project |
| `release_history` | TEXT | NULL | JSON array of recent releases |
| `desktop_actions` | TEXT | NULL | JSON array of desktop actions |
| `has_signature` | INTEGER | 0 | Whether AppImage has GPG signature |
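One way to keep the migration in step with the table above is to generate the statements from a single column list. This is a sketch only: it assumes the database accepts one `ALTER TABLE ... ADD COLUMN` statement per column (as SQLite does), and `NEW_COLUMNS` and `migration_v9_statements` are hypothetical names rather than existing code. Columns without an explicit default are NULL by default.

```rust
/// Column name and SQL declaration for each new `appimages` column.
const NEW_COLUMNS: &[(&str, &str)] = &[
    ("appstream_id", "TEXT"),
    ("appstream_description", "TEXT"),
    ("generic_name", "TEXT"),
    ("license", "TEXT"),
    ("homepage_url", "TEXT"),
    ("bugtracker_url", "TEXT"),
    ("donation_url", "TEXT"),
    ("help_url", "TEXT"),
    ("vcs_url", "TEXT"),
    ("keywords", "TEXT"),
    ("mime_types", "TEXT"),
    ("content_rating", "TEXT"),
    ("project_group", "TEXT"),
    ("release_history", "TEXT"),
    ("desktop_actions", "TEXT"),
    ("has_signature", "INTEGER DEFAULT 0"),
];

/// Builds one ALTER TABLE statement per new column.
pub fn migration_v9_statements() -> Vec<String> {
    NEW_COLUMNS
        .iter()
        .map(|(name, decl)| format!("ALTER TABLE appimages ADD COLUMN {} {}", name, decl))
        .collect()
}
```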
## New Module: AppStream XML Parser
**File:** `src/core/appstream.rs`
Uses `quick-xml` crate (pure Rust, lightweight).
Key types:
```
AppStreamMetadata
id: Option<String>
name: Option<String>
summary: Option<String>
description: Option<String>
developer: Option<String>
project_license: Option<String>
project_group: Option<String>
urls: HashMap<String, String>
keywords: Vec<String>
categories: Vec<String>
content_rating_summary: Option<String>
releases: Vec<ReleaseInfo>
mime_types: Vec<String>
ReleaseInfo
version: String
date: Option<String>
description: Option<String>
```
Parser function: `parse_appstream_file(path: &Path) -> Option<AppStreamMetadata>`
- Walks XML events, extracts all fields
- Strips markup from `<description>` (joins `<p>` with newlines, `<li>` with bullets)
- Caps releases at 10 most recent
- Summarizes OARS content rating into a single label
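The description flattening can be illustrated without any XML crate. The sketch below is a standard-library stand-in for what the `quick-xml` event loop would do, not the planned implementation: it assumes well-formed markup, uses `- ` as the bullet prefix, does not decode entities such as `&amp;`, and drops text outside `<p>` and `<li>`. The name `flatten_description` is hypothetical.

```rust
/// Flattens AppStream description markup into plain text:
/// one line per `<p>`, one `- ` prefixed line per `<li>`.
pub fn flatten_description(markup: &str) -> String {
    let mut blocks: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut rest = markup;

    while let Some(open) = rest.find('<') {
        // Text before the tag belongs to the block being collected.
        current.push_str(&rest[..open]);
        let close = match rest[open..].find('>') {
            Some(offset) => open + offset,
            None => break, // unterminated tag: stop scanning
        };
        let tag = &rest[open + 1..close];
        let is_closing = tag.starts_with('/');
        let name = tag
            .trim_start_matches('/')
            .split_whitespace()
            .next()
            .unwrap_or("");
        if name == "p" || name == "li" {
            if is_closing {
                // Collapse runs of whitespace, including line breaks.
                let text = current.split_whitespace().collect::<Vec<_>>().join(" ");
                if !text.is_empty() {
                    blocks.push(if name == "li" { format!("- {}", text) } else { text });
                }
            }
            // An opening tag starts a fresh block; a closing tag ends one.
            current.clear();
        }
        // Other tags (`<ul>`, `<em>`, ...) are dropped but their text is kept.
        rest = &rest[close + 1..];
    }
    blocks.join("\n")
}
```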
## Extended Desktop Entry Parsing
Update `DesktopEntryFields` in `inspector.rs`:
- Add `generic_name`, `keywords`, `mime_types`, `terminal`, `actions`, `x_appimage_name`
- Parse `[Desktop Action <name>]` sections for action names and exec commands
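The action parsing could look roughly like the following. This is a sketch under stated assumptions: `DesktopAction` and `parse_desktop_actions` are illustrative names, not the existing `inspector.rs` API, and localized keys such as `Name[de]` are skipped.

```rust
/// One `[Desktop Action <name>]` section.
#[derive(Debug, PartialEq, Eq)]
pub struct DesktopAction {
    pub id: String,
    pub name: Option<String>,
    pub exec: Option<String>,
}

/// Collects the action sections of a .desktop file, in file order.
pub fn parse_desktop_actions(contents: &str) -> Vec<DesktopAction> {
    let mut actions: Vec<DesktopAction> = Vec::new();
    let mut in_action = false;

    for line in contents.lines().map(str::trim) {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // A group header switches between action and non-action sections.
        if line.starts_with('[') && line.ends_with(']') {
            let header = &line[1..line.len() - 1];
            in_action = match header.strip_prefix("Desktop Action ") {
                Some(id) => {
                    actions.push(DesktopAction {
                        id: id.trim().to_string(),
                        name: None,
                        exec: None,
                    });
                    true
                }
                None => false,
            };
            continue;
        }
        if !in_action {
            continue;
        }
        if let (Some((key, value)), Some(action)) = (line.split_once('='), actions.last_mut()) {
            match key.trim() {
                "Name" => action.name = Some(value.trim().to_string()),
                "Exec" => action.exec = Some(value.trim().to_string()),
                _ => {} // other keys, including localized ones, are ignored here
            }
        }
    }
    actions
}
```

Tracking the current group matters because `Name` and `Exec` also appear in `[Desktop Entry]`, and those values must not leak into an action.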
## Inspector + Analysis Pipeline
1. `AppImageMetadata` struct gains all new fields
2. `inspect_appimage()` looks for AppStream XML after extraction, parses it, merges into metadata (AppStream takes priority for overlapping fields)
3. `run_background_analysis()` stores new fields via `db.update_appstream_metadata()`
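4. Signature detection: read the ELF `.sha256_sig` section and treat the AppImage as signed only if the section contains non-zero bytes (the section is reserved at a fixed size, so its presence alone does not indicate a signature)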
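The merge rule in step 2 (AppStream wins, desktop entry fills the gaps) maps naturally onto `Option::or`. The struct below is a hypothetical three-field stand-in for `AppImageMetadata`, used only to show the precedence; the real struct has many more fields.

```rust
/// Hypothetical subset of the fields that both sources can provide.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct OverlapFields {
    pub name: Option<String>,
    pub summary: Option<String>,
    pub keywords: Option<String>,
}

/// Merges two sources field by field. A value from `appstream` is kept
/// whenever it exists; otherwise the desktop entry value is used.
pub fn merge_metadata(appstream: OverlapFields, desktop: OverlapFields) -> OverlapFields {
    OverlapFields {
        name: appstream.name.or(desktop.name),
        summary: appstream.summary.or(desktop.summary),
        keywords: appstream.keywords.or(desktop.keywords),
    }
}
```

Because the fallback is per field, an AppImage with a sparse AppStream file still ends up with every value the desktop entry can supply.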
## Overview Tab UI Layout
Groups in order (each only shown when data exists):
### About (new)
- App ID, Generic name, Developer, License, Project group
### Description (new)
- Full multi-paragraph AppStream description
### Links (new)
- Homepage, Bug tracker, Source code, Documentation, Donate
- Each row clickable via `gtk::UriLauncher`
### Updates (existing, unchanged)
### Release History (new)
- Recent releases with version, date, description
- Uses `adw::ExpanderRow` for entries with descriptions
### Usage (existing, unchanged)
### Capabilities (new)
- Keywords, MIME types, Content rating, Desktop actions
### File Information (existing, extended)
- Add "Signature: Signed / Not signed" row
## Dependencies
Add to `Cargo.toml`:
```toml
quick-xml = "0.37"
```
## Files Modified
| File | Changes |
|------|---------|
| `Cargo.toml` | Add `quick-xml` |
| `src/core/mod.rs` | Add `pub mod appstream;` |
| `src/core/appstream.rs` | **New** - AppStream XML parser |
| `src/core/database.rs` | Migration v9, new columns, `update_appstream_metadata()` |
| `src/core/inspector.rs` | Extended desktop parsing, AppStream integration, signature detection |
| `src/core/analysis.rs` | Store new metadata fields |
| `src/ui/detail_view.rs` | Redesigned overview tab with all new groups |
## Verification
1. `cargo build` compiles without errors
2. AppImages with AppStream XML show full metadata (developer, license, URLs, releases)
3. AppImages without AppStream XML still show desktop entry fields (graceful degradation)
4. URL links open in browser
5. Release history lists recent releases, and entries with descriptions expand to show them
6. Empty groups are hidden
7. Re-scanning an app picks up newly available metadata