
Diff Edit Evaluation Database Schema

This document gives an overview of the SQLite database schema used by the diff edit evaluation suite. The database records each evaluation run in a structured form, which supports analysis along several dimensions (such as model, prompt, and file) and makes our findings reproducible.

Data Model Overview

The database consists of several interconnected tables that together describe each evaluation. The core of the model is three tables: runs, cases, and results.

runs

A run represents a single, top-level execution of the evaluation script (e.g., one invocation of npm run diff-eval). It serves as the main container for a complete benchmark session.

cases

A case represents a single test scenario that is presented to a model. It corresponds to one of the JSON files in the cases/ directory and links that static definition to a specific benchmark run.

results

This is the most granular and important table in the database. A result represents the outcome of a single attempt by a specific model on a specific case.
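
The relationship between the three core tables can be sketched with a small in-memory database. This is an illustration only: every column name below (`run_id`, `case_file`, `model_id`, `succeeded`, and so on) is an assumption made for the example, not the actual definition in evals.db. The real columns are whatever the `.schema` command prints.

```python
import sqlite3

# Hypothetical, simplified schema. The column names are illustrative
# assumptions and do not reproduce the real definitions in evals.db.
SCHEMA = """
CREATE TABLE runs (
    id          INTEGER PRIMARY KEY,
    description TEXT
);
CREATE TABLE cases (
    id        INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES runs(id),
    case_file TEXT NOT NULL          -- e.g. a JSON file name under cases/
);
CREATE TABLE results (
    id        INTEGER PRIMARY KEY,
    case_id   INTEGER NOT NULL REFERENCES cases(id),
    model_id  TEXT NOT NULL,
    succeeded INTEGER NOT NULL       -- 1 = attempt passed, 0 = attempt failed
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# One run contains one case, and that case has three attempts (results).
run_id = conn.execute(
    "INSERT INTO runs (description) VALUES (?)", ("example run",)
).lastrowid
case_id = conn.execute(
    "INSERT INTO cases (run_id, case_file) VALUES (?, ?)",
    (run_id, "example-case.json"),
).lastrowid
conn.executemany(
    "INSERT INTO results (case_id, model_id, succeeded) VALUES (?, ?, ?)",
    [(case_id, "model-a", 1), (case_id, "model-a", 0), (case_id, "model-b", 1)],
)

# Walk from each result back up to its case and run.
rows = conn.execute(
    """
    SELECT r.id, c.case_file, res.model_id, res.succeeded
    FROM results AS res
    JOIN cases AS c ON c.id = res.case_id
    JOIN runs  AS r ON r.id = c.run_id
    ORDER BY res.id
    """
).fetchall()

for row in rows:
    print(row)
```

The point of the sketch is the direction of the links: each result points at a case, and each case points at a run, so a single join chain recovers the full context of any attempt.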


Supporting Tables

The following tables store versioned, deduplicated content to ensure data integrity and efficiency.

system_prompts

processing_functions

files
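
One common way to store deduplicated content is to key each row by a hash of its text, so identical content is written only once. The sketch below assumes that approach for a hypothetical `system_prompts` layout with `hash` and `content` columns. The source does not say how evals.db actually deduplicates, so treat both the column names and the use of SHA-256 as assumptions.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: the real system_prompts table may differ.
conn.execute(
    "CREATE TABLE system_prompts (hash TEXT PRIMARY KEY, content TEXT NOT NULL)"
)


def store_prompt(conn, content):
    """Store content at most once and return the hash that identifies it."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    # INSERT OR IGNORE skips the write when a row with this hash already exists.
    conn.execute(
        "INSERT OR IGNORE INTO system_prompts (hash, content) VALUES (?, ?)",
        (digest, content),
    )
    return digest


first = store_prompt(conn, "You are a careful code editor.")
second = store_prompt(conn, "You are a careful code editor.")
third = store_prompt(conn, "You are a careful code editor. Prefer small diffs.")

count = conn.execute("SELECT COUNT(*) FROM system_prompts").fetchone()[0]
print(first == second, count)
```

Under this scheme, other tables would reference the hash rather than repeating the full text, and any change to a prompt produces a new row instead of overwriting the old one.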

The Bigger Picture

This relational schema supports analysis that goes beyond simple pass/fail metrics. Because every result is linked to its model, prompt, and source files, we can examine how those factors interact instead of looking at a single aggregate score.

Ultimately, this data model enables us to move from simply measuring performance to truly understanding it, providing the insights needed to build more capable and reliable AI engineering systems.
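
As one illustration of this kind of analysis, the sketch below computes a per-model success rate over a handful of made-up rows. The `results` columns used here (`model_id`, `case_id`, `succeeded`) are assumptions for the example, and the data is invented purely to show the query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced results table with invented sample rows.
conn.execute(
    "CREATE TABLE results (model_id TEXT, case_id INTEGER, succeeded INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        ("model-a", 1, 1),
        ("model-a", 1, 0),
        ("model-a", 2, 1),
        ("model-a", 2, 1),
        ("model-b", 1, 0),
        ("model-b", 2, 1),
    ],
)

# AVG over a 0/1 column gives the fraction of successful attempts.
success_rates = conn.execute(
    """
    SELECT model_id, COUNT(*) AS attempts, AVG(succeeded) AS success_rate
    FROM results
    GROUP BY model_id
    ORDER BY model_id
    """
).fetchall()

for model_id, attempts, rate in success_rates:
    print(f"{model_id}: {attempts} attempts, success rate {rate:.2f}")
```

Adding further columns to the `GROUP BY` clause (for example a prompt or file identifier, if the real schema exposes one) would break the same metric down along additional dimensions.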


Viewing the Full Schema

To see the most up-to-date and detailed schema for the database, you can use the sqlite3 command-line tool. From the evals/diff-edits directory, run the following command:

sqlite3 evals.db .schema

This prints the complete CREATE statements for every table in the database, along with any indexes, views, or triggers, and is the definitive reference for the current structure.
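
The same information can be read programmatically, because SQLite stores the text of each CREATE statement in its built-in `sqlite_master` table. The following is a minimal sketch using Python's standard `sqlite3` module. The helper name `read_schema` is made up for this example, and the path `evals.db` is taken from the command above.

```python
import sqlite3


def read_schema(conn):
    """Return the stored CREATE statements, similar to the `.schema` command."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains evals.db.
    connection = sqlite3.connect("evals.db")
    try:
        for statement in read_schema(connection):
            print(statement + ";")
    finally:
        connection.close()
```

Rows with a NULL `sql` value are automatically created internal indexes, which is why the query filters them out.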