PromptKart
A prompt engineering and evaluation suite with tracks, heats, and leaderboards for systematic LLM testing.
Note: This project is not yet available on GitHub.
Overview
PromptKart is a structured environment for prompt engineering that brings rigor to LLM experimentation. Instead of ad-hoc testing, it organizes evaluations into tracks and heats, running prompts across multiple providers and tracking performance over time.
Key Features
- Tracks & Heats: Organize experiments into themed tracks with multiple evaluation rounds.
- Multi-Provider Testing: Run the same prompts against OpenAI, Anthropic, Google, and local models simultaneously.
- Leaderboard System: Track which prompts and models perform best across different tasks.
- Chips System: Modular prompt components that can be mixed and matched.
- Karts: Configurable prompt templates with variable substitution.
- Docker Support: Full containerized deployment with docker-compose.
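The chips and karts described above can be pictured with a minimal sketch. The `Chip` and `Kart` classes, their fields, and the `$placeholder` syntax are all assumptions made for illustration, not PromptKart's actual data model; the sketch uses the standard library's `string.Template` for variable substitution.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class Chip:
    """A reusable prompt fragment (hypothetical shape, not PromptKart's model)."""
    name: str
    text: str


@dataclass
class Kart:
    """A prompt template whose $placeholders are filled in at render time."""
    template: str
    chips: list[Chip] = field(default_factory=list)

    def render(self, **variables: str) -> str:
        # Chip texts are placed before the template, in the order given.
        parts = [chip.text for chip in self.chips] + [self.template]
        # Template.substitute raises KeyError when a placeholder has no value.
        return Template("\n".join(parts)).substitute(**variables)


# Mix and match two chips with one kart.
persona = Chip("persona", "You are a careful technical reviewer.")
concise = Chip("concise", "Answer in at most $max_words words.")
kart = Kart("Summarize the following text:\n$text", chips=[persona, concise])

prompt = kart.render(max_words="50", text="FastAPI is a Python web framework.")
print(prompt)
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch: a missing variable fails loudly instead of sending a half-filled prompt to a provider.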
Technical Architecture
PromptKart uses a FastAPI backend for prompt execution and evaluation, with a React/Vite frontend for the interactive dashboard. SQLite stores evaluation history and leaderboard data.
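As a rough illustration of how evaluation history might be kept in SQLite, the sketch below stores one row per provider per heat and reads back one provider's scores in heat order. The table name, columns, and sample rows are invented for the example, and it uses the standard library's `sqlite3` module instead of the project's SQLAlchemy layer so that it stays self-contained.

```python
import sqlite3

# In-memory database for the example; a real deployment would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE results (
        id         INTEGER PRIMARY KEY,
        track      TEXT    NOT NULL,
        heat       INTEGER NOT NULL,
        provider   TEXT    NOT NULL,
        score      REAL    NOT NULL,
        latency_ms REAL    NOT NULL
    )
    """
)

# Made-up sample rows, purely illustrative (not real measurements).
rows = [
    ("summarization", 1, "provider-a", 0.82, 410.0),
    ("summarization", 1, "provider-b", 0.74, 220.0),
    ("summarization", 2, "provider-a", 0.88, 395.0),
    ("summarization", 2, "provider-b", 0.79, 240.0),
]
conn.executemany(
    "INSERT INTO results (track, heat, provider, score, latency_ms) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# Performance over time: one provider's results across heats of a track.
history = conn.execute(
    "SELECT heat, score, latency_ms FROM results "
    "WHERE track = ? AND provider = ? ORDER BY heat",
    ("summarization", "provider-a"),
).fetchall()
print(history)
```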
Core components:
- Evaluation Engine: Executes prompts across providers with consistent metrics.
- Track Manager: Organizes experiments and manages heat progression.
- Metrics Collector: Gathers response quality, latency, and cost data.
- Leaderboard Service: Aggregates results and ranks performance.
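One way these components could fit together is sketched below: a heat runs every prompt against every provider, records a score and latency for each call, and a leaderboard function ranks providers by mean score. All names (`Result`, `run_heat`, `leaderboard`), the stub providers, and the keyword scorer are hypothetical; cost tracking is omitted, and the calls run sequentially for simplicity.

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable

# A provider is modeled as any function from prompt text to response text.
Provider = Callable[[str], str]
Scorer = Callable[[str, str], float]


@dataclass
class Result:
    provider: str
    prompt: str
    score: float
    latency_s: float


def run_heat(prompts: list[str], providers: dict[str, Provider],
             scorer: Scorer) -> list[Result]:
    """Run every prompt against every provider, timing each call."""
    results = []
    for name, call in providers.items():
        for prompt in prompts:
            start = time.perf_counter()
            response = call(prompt)
            latency = time.perf_counter() - start
            results.append(Result(name, prompt, scorer(prompt, response), latency))
    return results


def leaderboard(results: list[Result]) -> list[tuple[str, float]]:
    """Rank providers by mean score, best first."""
    scores: dict[str, list[float]] = {}
    for r in results:
        scores.setdefault(r.provider, []).append(r.score)
    ranked = [(name, mean(vals)) for name, vals in scores.items()]
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Stub providers stand in for real API clients.
providers = {
    "stub-bad": lambda prompt: "I do not know.",
    "stub-good": lambda prompt: "The capital of France is Paris.",
}


def keyword_scorer(prompt: str, response: str) -> float:
    # Toy metric: 1.0 if the expected keyword appears in the response.
    return 1.0 if "paris" in response.lower() else 0.0


results = run_heat(["What is the capital of France?"], providers, keyword_scorer)
board = leaderboard(results)
print(board)
```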
Technology Stack
- Backend: Python, FastAPI, SQLAlchemy
- Frontend: React, TypeScript, Vite
- Database: SQLite for persistence
- Containerization: Docker and docker-compose
- LLM Integration: Multi-provider support via a unified interface
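A unified provider interface of this kind might look like the following sketch, which uses `typing.Protocol` so that any class with a `name` and a `complete` method qualifies, and a thread pool to send one prompt to all providers concurrently. The `LLMProvider` protocol, the `fan_out` helper, and the two toy providers are assumptions for illustration; no real vendor SDK is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical common interface that every provider adapter satisfies."""
    name: str

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Toy adapter: returns the prompt unchanged."""
    name = "echo"

    def complete(self, prompt: str) -> str:
        return prompt


class ReverseProvider:
    """Toy adapter: returns the prompt reversed."""
    name = "reverse"

    def complete(self, prompt: str) -> str:
        return prompt[::-1]


def fan_out(prompt: str, providers: list[LLMProvider]) -> dict[str, str]:
    """Send the same prompt to every provider concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        # map preserves input order, so responses line up with providers.
        responses = list(pool.map(lambda p: p.complete(prompt), providers))
    return {p.name: r for p, r in zip(providers, responses)}


answers = fan_out("abc", [EchoProvider(), ReverseProvider()])
print(answers)
```

Threads suit this sketch because provider calls are typically I/O-bound network requests; an async client would be an equally reasonable choice in a FastAPI backend.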
Current Status
PromptKart is in active development, and the core evaluation and leaderboard features are complete. The chips/karts system enables modular prompt construction. Current work focuses on expanding provider support and improving the analytics dashboard.
Related Projects
Medicine
A medication management system that helps users track doses, manage refills, and never miss critical medications.
ClawGuard
A security analysis platform for AI agent skills that scans for malware, prompt injection, and supply chain risks using multi-layer analysis.
SocialBacklog
A productivity app that transforms browser history into an actionable backlog of things to revisit.