Open source · Runtime Intelligence Platform · v0.1 available on PyPI

Runtime Intelligence
for AI Agents

Critiqor evaluates observable runtime behaviour rather than relying on agent self-reporting. Capture runtime evidence. Generate explainable diagnoses. Improve agent reliability.

terminal

$ pip install critiqor

Evidence-backed · Runtime-observed · Developer-first · Apache 2.0 licensed

live observation pipeline · run_004

step 01 · Developer · CLI
step 02 · AI Agent · OpenClaw
step 03 · Runtime Events · observed
step 04 · Evidence Collection · tool calls · outputs
step 05 · Diagnosis Engine · explainable
step 06 · Interactive Dashboard · verdict · timeline

healthy · verdict Ready For Runtime · trust 100

Why Critiqor

Traditional evals judge the answer.
Critiqor watches the work.

Most evaluation frameworks score the final response. Critiqor instead observes the agent during execution — recording tool calls, tool outputs, runtime events, reasoning flow, execution efficiency and evidence utilisation.

Every diagnosis is backed by observable execution evidence — not a model's opinion about itself.

evidence types

100s · observed events / run

self-report

100% · explainability

Traditional Evaluation

legacy

Answer-only scoring

Final response only
Self-reported reasoning
Limited explainability
No runtime visibility

Critiqor

runtime

Evidence-backed diagnosis

Runtime evidence
Observable execution
Explainable diagnosis
Root cause analysis
Historical intelligence

Platform

Everything you need to trust your agents.

Six capabilities working together — from the moment your agent boots to the final diagnosis.

Runtime Observation

Observe agents during execution — not after. Critiqor attaches before launch and follows every event end-to-end.

Evidence Collection

Capture runtime events, tool calls, tool outputs, provider requests and rich execution metadata into structured artifacts.
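To make "structured artifacts" concrete, here is a minimal Python sketch of how observed events might be modelled and serialized. It is an illustration only: the `RuntimeEvent` class, its field names and the `to_session_artifact` helper are assumptions made for this example, not Critiqor's actual schema or API.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any


@dataclass
class RuntimeEvent:
    """One observed event. Every field name here is hypothetical."""
    event_id: str
    kind: str          # e.g. "tool_call", "tool_output", "provider_request"
    timestamp: float   # seconds since the run started
    payload: dict[str, Any] = field(default_factory=dict)


def to_session_artifact(run_id: str, events: list[RuntimeEvent]) -> str:
    """Serialize events into one JSON document, ordered by timestamp."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return json.dumps(
        {"run_id": run_id, "events": [asdict(e) for e in ordered]},
        indent=2,
    )


# Events may arrive out of order; the artifact stores them chronologically.
events = [
    RuntimeEvent("e2", "tool_output", 1.8, {"tool": "search", "ok": True}),
    RuntimeEvent("e1", "tool_call", 1.2, {"tool": "search", "query": "docs"}),
]
artifact = json.loads(to_session_artifact("run_004", events))
print([e["event_id"] for e in artifact["events"]])  # ['e1', 'e2']
```

Giving each event a stable identifier is the design choice that matters here: later findings can point back at specific events instead of restating them.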

Diagnosis Engine

Convert raw runtime evidence into explainable reports — verdicts, confidence, trust score and reasoning summaries.

Root Cause Analysis

Identify failures, ignored outputs, loops and retrieval gaps. Each issue links back to the underlying evidence.
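As an illustration of an issue that links back to its evidence, the sketch below flags one kind of loop: the same tool called with the same arguments several times in a row. This is a simple heuristic chosen for the example, not Critiqor's detection logic. `ToolCall`, `Issue`, `find_loops` and the `min_repeats` threshold are all hypothetical names.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ToolCall:
    event_id: str   # hypothetical identifier of the observed event
    tool: str
    arguments: str  # canonicalized arguments, e.g. a sorted JSON string


@dataclass(frozen=True)
class Issue:
    kind: str
    evidence: tuple[str, ...]  # event ids that support the finding


def find_loops(calls: list[ToolCall], min_repeats: int = 3) -> list[Issue]:
    """Flag runs of identical consecutive tool calls of length >= min_repeats."""
    issues = []
    # groupby only merges adjacent items, which is what "in a row" requires.
    for _, group in groupby(calls, key=lambda c: (c.tool, c.arguments)):
        run = list(group)
        if len(run) >= min_repeats:
            issues.append(Issue("loop", tuple(c.event_id for c in run)))
    return issues


calls = [
    ToolCall("e1", "search", '{"q": "docs"}'),
    ToolCall("e2", "search", '{"q": "docs"}'),
    ToolCall("e3", "search", '{"q": "docs"}'),
    ToolCall("e4", "read_file", '{"path": "README.md"}'),
]
for issue in find_loops(calls):
    print(issue.kind, issue.evidence)  # loop ('e1', 'e2', 'e3')
```

The returned issue carries the ids of the events that triggered it, so a reader can go from the finding straight to the underlying calls.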

Interactive Dashboard

Executive summaries, runtime timelines, causal graphs and historical evaluations — local-first, no data leaves your machine.

Coming Soon

Benchmarking & Leaderboards

Compare agent reliability with private, anonymous and public visibility. Opt-in benchmarks for teams and communities.

Architecture

How Critiqor works.

A nine-stage pipeline turning raw agent execution into evidence, diagnosis and recommendations — fully local by default.

User · developer
OpenClaw · AI agent
Critiqor Plugin · openclaw integration
Runtime Events · observed signal
Session File · session.json
Diagnosis Engine · causal analysis
Diagnosis File · diagnosis.json
Dashboard · local-first
Recommendations · actionable

Local-first

Runs entirely on your machine

Artifact-based

session.json + diagnosis.json

Explainable

Every claim references evidence
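The "every claim references evidence" property can be checked mechanically once both artifacts are loaded. The sketch below assumes a session document with an `events` list and a diagnosis document with a `claims` list. Those keys, and the `unsupported_claims` helper, are invented for illustration, since the real layout of session.json and diagnosis.json is not described on this page.

```python
import json


def unsupported_claims(session: dict, diagnosis: dict) -> list[str]:
    """Return ids of claims with no evidence or with unknown event ids."""
    known = {event["event_id"] for event in session["events"]}
    unsupported = []
    for claim in diagnosis["claims"]:
        refs = claim.get("evidence", [])
        # A claim is supported only if it cites at least one event
        # and every cited event exists in the session.
        if not refs or not set(refs) <= known:
            unsupported.append(claim["id"])
    return unsupported


# Inline stand-ins for the two artifacts (hypothetical structure).
session = json.loads('{"events": [{"event_id": "e1"}, {"event_id": "e2"}]}')
diagnosis = json.loads("""
{"claims": [
  {"id": "c1", "evidence": ["e1", "e2"]},
  {"id": "c2", "evidence": ["e9"]},
  {"id": "c3", "evidence": []}
]}
""")
print(unsupported_claims(session, diagnosis))  # ['c2', 'c3']
```

Because both artifacts are plain JSON files on disk, a check like this can run locally with nothing beyond the standard library.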

Workflow

Developer Workflow.

Four commands from install to insight. Local. Reproducible. Friction-free.

step 01

Install Critiqor

One pip install. Zero infrastructure. No accounts, keys or cloud services required.

terminal

$ pip install critiqor

step 02

Launch under observation

Critiqor launches OpenClaw and immediately begins observing runtime activity.

terminal

$ critiqor monitor openclaw

step 03

Use OpenClaw normally

Work as you always do. Critiqor observes silently in the background — zero changes to your agent code.

terminal

› openclaw run …  # business as usual

step 04

Finalize the session

Critiqor finalizes the observation session, generates the diagnosis and automatically opens the local dashboard.

terminal

$ critiqor finalize

Dashboard

Read the evidence. Trust the verdict.

A local-first interface designed for engineers — fast, dense, explainable.

localhost:5173 / dashboard

Dashboard

local diagnosis

Live reliability intelligence for OpenClaw agents.

healthy · run_004 · openclaw_agent

Runtime evidence captured

No OpenClaw failure mode was detected from runtime evidence.

trust 100 · confidence 95% (Critiqor certainty)

Executive Summary

trust 100/100
confidence 95%
verdict Ready For Runtime

Runtime Timeline

events 7
duration 70.4s
tools 0

Diagnosis Artifact

diagnosis.json · run_004
session.json · captured
events 7

Dashboard

active runs 4
diagnoses 0 critical
trust impact -0 pts

Recent runs

run_004 · openclaw · openclaw_agent runtime run · passed
run_003 · openclaw · openclaw_agent runtime run · passed
run_002 · openclaw · openclaw_agent runtime run · passed
run_001 · openclaw · openclaw_agent runtime run · passed

Primary diagnoses

No failure causes detected yet. Diagnoses appear after runtime evidence is finalized.

Get started in 30 seconds

Install Critiqor. Observe everything.

Three commands, starting with one pip install. No accounts. No cloud. Just runtime evidence.

install

$ pip install critiqor

observe

$ critiqor monitor openclaw

diagnose

$ critiqor finalize

Roadmap

Built in the open, with you in the loop.

Transparent roadmap, public issues, and fast iteration on real agent failures.

Shipped

Runtime Observation
Interactive Dashboard
OpenClaw Integration
Diagnosis Engine
Root Cause Analysis

In Progress

Improved benchmarking — compare agents across runs and environments.
Dashboard enhancements — denser evidence views for faster debugging.

Planned

Additional agent frameworks — extend beyond OpenClaw while keeping local-first.
Community leaderboards — opt-in reliability benchmarks for teams and OSS.
Enterprise dashboard — multi-tenant observability for AI ops teams.

Track progress on GitHub Issues, propose a feature on GitHub, or follow the full Roadmap.

Documentation

Read the docs.

Every surface of Critiqor documented for engineers — concise, complete, runnable.

Stay in the loop.

Docs, source, plugins and the conversation around runtime intelligence.

Questions, answered.

Everything developers ask before adopting Critiqor.

Read the docs.

Getting Started

Installation

CLI Commands

Architecture

Evidence Collection

Dashboard

Benchmarks (Coming Soon)

API Reference

FAQ

Stay in the loop.

Docs

ClawHub

GitHub

X

YouTube

PyPI

Questions, answered.

What is Critiqor?

How does Critiqor observe AI agents?

Does Critiqor access my prompts?

Which agent frameworks are supported?

How does the diagnosis work?

What evidence does Critiqor collect?

Is Critiqor open source?

How do I contribute?

Will Critiqor support Claude Code and Hermes?

How do leaderboards work?
