VisionTagger

A desktop app that uses local AI to automatically tag and rename images based on their content.

Note: This project is not yet available on GitHub.

Overview

VisionTagger is a desktop application that uses Ollama's vision models to analyze images and generate descriptive tags and filenames. Point it at a folder of images and it will examine each one, suggest tags, and offer to rename files based on their content. Everything runs locally, with no cloud dependencies.

Key Features

Vision AI Analysis: Uses Ollama vision models to understand image content.
Automatic Tagging: Generates descriptive tags for each image.
Smart Renaming: Suggests filenames based on image content.
Batch Processing: Handles entire folders of images at once.
Local Processing: All AI runs locally via Ollama—no data leaves your machine.
Flet UI: Cross-platform desktop interface.
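
As an illustration of the batch step, the sketch below collects the image files in one folder. The function name `find_images` and the extension list are assumptions made for this example, not VisionTagger's actual code, and the scan is deliberately non-recursive.

```python
from pathlib import Path

# Hypothetical extension list; the formats VisionTagger accepts are not documented.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def find_images(folder: str) -> list[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p
        for p in Path(folder).iterdir()
        # Skip subdirectories and compare extensions case-insensitively.
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Each returned path could then be passed to the vision model one at a time.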

Technical Architecture

VisionTagger connects to a local Ollama instance running vision-capable models. It processes images through the vision API, extracts descriptions, and applies naming conventions based on the results.
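
A request to the vision API might look roughly like the sketch below. It uses Ollama's `/api/generate` endpoint on the default local port with a base64-encoded image; the model name `llava`, the prompt wording, and the function names are assumptions for illustration, not the project's actual client.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if your instance listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(image_bytes: bytes, model: str = "llava") -> dict:
    """Build the JSON payload for a single-image request (prompt is an assumption)."""
    return {
        "model": model,
        "prompt": "List 5 to 10 short descriptive tags for this image, separated by commas.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for one complete JSON response instead of a token stream.
        "stream": False,
    }


def describe_image(path: str, model: str = "llava") -> str:
    """Send one image to the local Ollama instance and return the model's text."""
    with open(path, "rb") as f:
        payload = build_request(f.read(), model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("photo.jpg")` requires a running Ollama instance with a vision-capable model already pulled.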

Core components:

Ollama Client: Connects to local Ollama for vision model access.
Image Processor: Handles image loading and format conversion.
Tag Generator: Parses vision model output into structured tags.
Rename Engine: Applies naming rules based on generated tags.
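
To make the last two components concrete, here is a minimal sketch of how free-form model output could become tags and then a filename. The function names, the comma/semicolon/newline splitting, and the hyphenated naming rule are all assumptions; the project's real parsing and naming conventions are not documented here.

```python
import re


def parse_tags(model_output: str, max_tags: int = 10) -> list[str]:
    """Split free-form model output into lowercase, de-duplicated tags."""
    tags: list[str] = []
    for raw in re.split(r"[,\n;]+", model_output):
        # Replace bullet characters and punctuation with spaces, then collapse whitespace.
        tag = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
        tag = " ".join(tag.split())
        if tag and tag not in tags:
            tags.append(tag)
    return tags[:max_tags]


def build_filename(tags: list[str], extension: str, max_words: int = 4) -> str:
    """Join the first few tag words with hyphens and keep the original extension."""
    words = [word for tag in tags for word in tag.split()][:max_words]
    stem = "-".join(words) or "untagged"
    return f"{stem}.{extension.lstrip('.').lower()}"
```

Restricting filenames to lowercase letters, digits, and hyphens keeps them portable across operating systems, at the cost of dropping non-ASCII characters from tags.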

Technology Stack

UI Framework: Flet for cross-platform desktop
AI Backend: Ollama with vision models (LLaVA, etc.)
Language: Python
Image Handling: Pillow for format support

Current Status

Prototype with core tagging working. Note: renaming is currently destructive and has no undo, so run it only on copies of your images. Rename preview and undo support are planned.
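
The planned rename preview and undo support could take a form like the sketch below: build a rename plan without touching the disk, apply it while recording each move in a JSON log, and reverse the log to undo. This is one possible design under assumed names (`plan_renames`, `apply_renames`, `undo_renames`), not the project's implementation.

```python
import json
from pathlib import Path


def plan_renames(suggestions: dict[str, str]) -> list[tuple[Path, Path]]:
    """Turn {current_path: new_filename} into (source, target) pairs; nothing is renamed."""
    plan: list[tuple[Path, Path]] = []
    taken: set[Path] = set()
    for source_str, new_name in suggestions.items():
        source = Path(source_str)
        target = source.with_name(new_name)
        counter = 2
        # Avoid overwriting an existing file or an earlier target in this plan.
        while (target.exists() and target != source) or target in taken:
            name = Path(new_name)
            target = source.with_name(f"{name.stem}-{counter}{name.suffix}")
            counter += 1
        taken.add(target)
        plan.append((source, target))
    return plan


def apply_renames(plan: list[tuple[Path, Path]], log_path: str) -> None:
    """Apply the plan and record every completed rename so it can be undone."""
    done: list[list[str]] = []
    try:
        for source, target in plan:
            if source != target:
                source.rename(target)
                done.append([str(source), str(target)])
    finally:
        # Write the log even if a rename fails partway through.
        Path(log_path).write_text(json.dumps(done, indent=2), encoding="utf-8")


def undo_renames(log_path: str) -> None:
    """Reverse the renames recorded in the log, newest first."""
    done = json.loads(Path(log_path).read_text(encoding="utf-8"))
    for source, target in reversed(done):
        Path(target).rename(source)
```

The list returned by `plan_renames` doubles as the preview: a UI can show each source and target pair and call `apply_renames` only after the user confirms.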
