Pablo Zavala · AI Safety Evaluation · Research Engineering

Projects & Research

Public research, benchmarks, and applied systems across AI safety and evals, economics and policy, and systems building.

Project Index

Authority Calibration

Proof
Public pilot for agent authority use: disclosure observed in 14/14 and 12/12 suppression probes; self-demotion in 9/9 and 8/8 supplied-rule trials; no matched firebreak inversions observed; a separate Codex probe found assurance weakening in 6/19 cases
Evidence
Public reproducible repo
Limit
Pilot-scale evidence, not a rare-failure guarantee.

AI agents increasingly run pipelines, allocate resources, and coordinate other agents. This project tests whether they use delegated authority correctly: neither exceeding their mandate nor refusing authority they actually hold.

Safe MarketUniverses

Proof
120 episodes; routing review by model confidence performed about as well as chance
Evidence
Public reproducible benchmark
Limit
The benchmark tests allocation in a compact finance-style environment, not every oversight setting.

When human review is scarce, can a model tell us which decisions deserve a person's attention? In this 120-episode benchmark, the preregistered answer was no: model confidence routed review about as well as chance.
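One generic way to put a number on "about as well as chance" is to ask how often a decision that turned out wrong received lower confidence than one that turned out right. This is the ROC AUC of low confidence as an error detector, where 0.5 is chance. It is not necessarily the benchmark's preregistered metric. The `review_auc` function and the toy data below are hypothetical illustrations.

```python
from itertools import product


def review_auc(confidences: list[float], is_error: list[bool]) -> float:
    """Probability that a randomly chosen wrong decision has lower model
    confidence than a randomly chosen correct one (ties count as half).

    0.5 means ranking by confidence sends errors to review no better than
    chance; 1.0 means every error would be reviewed before any correct call.
    """
    err = [c for c, e in zip(confidences, is_error) if e]
    ok = [c for c, e in zip(confidences, is_error) if not e]
    if not err or not ok:
        raise ValueError("need at least one wrong and one correct decision")
    wins = 0.0
    for c_err, c_ok in product(err, ok):
        if c_err < c_ok:
            wins += 1.0
        elif c_err == c_ok:
            wins += 0.5
    return wins / (len(err) * len(ok))


# Hypothetical toy episode: confidence partly separates errors from correct calls.
print(review_auc([0.9, 0.8, 0.7, 0.6], [False, True, False, True]))  # 0.75
```

A flat confidence signal, such as identical scores for every decision, yields exactly 0.5 under this definition.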

RAG Evaluation Lab

Proof
86.6 percent context precision after rewriting and reranking; exact match fell on the full 918-query split
Evidence
Public evaluation harness
Limit
Grounding improved while deterministic exact match declined, so the result is a tradeoff.

An evaluation harness that compares baseline and reranked retrieval-augmented generation (RAG) pipelines on the Mini Wikipedia corpus using RAGAS and SQuAD metrics. The reranked pipeline reaches 86.6 percent context precision but scores lower on exact match.
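SQuAD-style exact match is strict, which helps explain why it can move independently of grounding metrics. The normalization below follows the official SQuAD v1.1 evaluation script: lowercase, strip punctuation, drop the articles "a", "an", and "the", then collapse whitespace. The function names are illustrative, and this is not the harness's own code.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """SQuAD v1.1 normalization: lowercase, drop punctuation, drop articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the normalized prediction equals any normalized gold answer, else 0.0."""
    pred = normalize_answer(prediction)
    return float(any(pred == normalize_answer(g) for g in gold_answers))


def mean_exact_match(predictions: list[str], golds: list[list[str]]) -> float:
    """Average exact match over a query split, as a fraction in [0, 1]."""
    if not predictions or len(predictions) != len(golds):
        raise ValueError("need equal, non-zero numbers of predictions and gold lists")
    return sum(exact_match(p, g) for p, g in zip(predictions, golds)) / len(predictions)


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))                # 1.0
print(exact_match("It is the Eiffel Tower in Paris", ["Eiffel Tower"]))  # 0.0
```

The second call shows how a verbose but correct answer still scores 0 under exact match.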

DonorsChoose Funding Risk

Proof
ROC AUC 0.757 on 185,000+ held-out classroom projects
Evidence
Public analysis repo
Limit
The model is a policy triage aid, not a deployment-ready funding decision system.

A model that flags DonorsChoose classroom requests most at risk of going unfunded, so limited reviewer attention can reach under-resourced schools first. The fairness audit reports unequal error rates across school poverty levels rather than presenting only an average score.
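A group-level error audit like this one splits a classifier's mistakes by group instead of reporting one average. The sketch below computes two rates per group for a binary "at risk of going unfunded" flag. The false negative rate is the share of truly at-risk projects the model missed. The false positive rate is the share of funded projects it flagged anyway. The page does not say which rates the actual audit used. The `error_rates_by_group` function and the "high poverty" / "low poverty" labels are hypothetical.

```python
from collections import defaultdict


def error_rates_by_group(groups, y_true, y_pred):
    """Return {group: {"fnr": ..., "fpr": ...}} for binary labels.

    y_true = 1 means the project actually went unfunded (at risk);
    y_pred = 1 means the model flagged it. A rate is NaN when its
    denominator is empty for that group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for g, t, p in zip(groups, y_true, y_pred):
        if t:
            counts[g]["tp" if p else "fn"] += 1
        else:
            counts[g]["fp" if p else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = {
            "fnr": c["fn"] / pos if pos else float("nan"),
            "fpr": c["fp"] / neg if neg else float("nan"),
        }
    return rates


# Hypothetical toy data: same overall accuracy, different error profiles per group.
groups = ["high poverty"] * 4 + ["low poverty"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(error_rates_by_group(groups, y_true, y_pred))
```

In the toy data, both groups have the same accuracy. The model misses at-risk projects in one group and over-flags funded projects in the other, and an average score would hide that difference.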

NUDG

Proof
Authorization, constraints, verification, and receipts for agent-run work
Evidence
Founder system, bounded public claims
Limit
Public visuals explain the system model; live product claims require separate proof packets.

NUDG is a CMU AI Venture Studio project for controlling how agents use real resources. It separates proposal, authorization, execution, verification, and audit records instead of giving agents broad access up front.
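The separation described here can be illustrated with a small sketch in which each stage is its own object and writes its own receipt. A proposal carries no permission. Execution is typed to take an authorization with an explicit cap. Verification checks the spend against that cap after the fact. This is an assumed design for illustration only, not NUDG's implementation or API. `Proposal`, `Authorization`, `AuditLog`, and the stage functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    """What an agent asks to do; carries no permission by itself."""
    agent: str
    action: str
    amount: float  # resource units requested


@dataclass(frozen=True)
class Authorization:
    """A grant tied to one proposal, with an explicit spending cap."""
    proposal: Proposal
    approver: str
    limit: float


@dataclass
class AuditLog:
    """Append-only list of (stage, detail) receipts."""
    records: list = field(default_factory=list)

    def append(self, stage: str, detail: str) -> None:
        self.records.append((stage, detail))


def authorize(p: Proposal, approver: str, limit: float, log: AuditLog) -> Optional[Authorization]:
    """Grant only proposals whose requested amount fits under the limit."""
    if p.amount > limit:
        log.append("authorization", f"denied {p.action}: {p.amount} > {limit}")
        return None
    log.append("authorization", f"{approver} approved {p.action} up to {limit}")
    return Authorization(p, approver, limit)


def execute(auth: Authorization, spent: float, log: AuditLog) -> float:
    """Typed to take an Authorization rather than a bare Proposal; records the actual spend."""
    log.append("execution", f"{auth.proposal.agent} spent {spent} on {auth.proposal.action}")
    return spent


def verify(auth: Authorization, spent: float, log: AuditLog) -> bool:
    """Check the executed spend against the granted cap, separately from execution."""
    ok = spent <= auth.limit
    log.append("verification", ("ok" if ok else "violation") + f": spent {spent}, limit {auth.limit}")
    return ok


log = AuditLog()
grant = authorize(Proposal("agent-1", "rent GPU hours", 40.0), "reviewer", 50.0, log)
if grant is not None:
    print(verify(grant, execute(grant, 45.0, log), log))  # True
```

Keeping verification outside `execute` means an overspend, such as 45 units spent against a 40-unit cap, is caught and logged as a separate receipt.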

AI Investment Mapping

Proof
$10B+ mapped across 11 metros; Pittsburgh: $6.3B across 133 firms
Evidence
Restricted data, public aggregate summary
Limit
Company-level records and maps stay private; the public page shows aggregates and methods evidence.

A Block Center project mapping more than ten billion dollars in public and private AI investment across eleven metropolitan economies for regional AI-readiness research. The Pittsburgh slice covers 6.3 billion dollars across 133 firms.

AI Workforce Simulation

Proof
14.3 percent vs 3.6 percent peak unemployment under paired policy regimes
Evidence
Public simulation repo
Limit
Mechanism demonstration in a small simulated labor market, not a macro forecast.

An agent-based NetLogo model of a small labor market adjusting to AI automation. With identical workers, geography, and random seed, peak unemployment reaches 14.3 percent under a tech-driven policy regime versus 3.6 percent under a human-centric one.

Heard.now

Proof
Prototype architecture for an auditable input store and privacy-preserving public extracts
Evidence
Private pilot, synthetic public artifact
Limit
Public visuals use synthetic text so community messages stay private.

An early civic-listening pilot with Professor Jordan Usdan of Heinz College. It separates raw community input from published output: input enters an auditable store, while public extracts pass privacy and integrity checks.
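A common way to make an input store auditable is a hash chain. Each entry's SHA-256 digest covers the previous digest, so silently editing a stored record makes verification fail unless every later digest is recomputed as well. This technique is assumed for illustration and is not a description of the pilot's actual store. `AuditableStore` and the record fields are hypothetical, and the chain addresses integrity only, not the privacy checks applied to public extracts.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Digest of the previous digest plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditableStore:
    """Append-only list where each entry commits to the entry before it."""

    def __init__(self) -> None:
        self.entries: list[tuple[dict, str]] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        h = entry_hash(prev, record)
        self.entries.append((record, h))
        return h

    def verify(self) -> bool:
        """Recompute the chain from the start; False if any stored digest mismatches."""
        prev = GENESIS
        for record, h in self.entries:
            if entry_hash(prev, record) != h:
                return False
            prev = h
        return True


store = AuditableStore()
store.append({"id": 1, "text": "synthetic message about bus routes"})
store.append({"id": 2, "text": "synthetic message about park lighting"})
print(store.verify())  # True
```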

CMU Event Compass

Proof
Claude vision extracts flyer cards; static export runs with a sample board and no backend
Evidence
Live static demo
Limit
The demo proves the sample-board extraction workflow, not live campus coverage.

A prototype that turns a photo of a campus poster wall into structured, personalized listings. Claude vision extracts one listing per flyer, a deterministic in-browser ranker orders the results by chosen interests, and the app ships as a static export with no backend.
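A deterministic interest ranker can be as simple as a sort on a fixed key. The sketch below scores each listing by how many chosen interests it matches and breaks ties by date and then title, so the same inputs always produce the same order. The real ranker runs in the browser; this Python version is a language-neutral sketch. The `Listing` fields and the scoring rule are hypothetical, not the app's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    title: str
    date: str  # ISO 8601 date such as "2025-03-14", so string order matches date order
    tags: frozenset[str]


def rank_listings(listings: list[Listing], interests: list[str]) -> list[Listing]:
    """Sort by number of matched interests (descending), then date, then title.

    No randomness and no model call: identical inputs give identical output.
    """
    wanted = {i.lower() for i in interests}

    def key(item: Listing):
        overlap = len({t.lower() for t in item.tags} & wanted)
        return (-overlap, item.date, item.title)

    return sorted(listings, key=key)


events = [
    Listing("Robotics Club Demo", "2025-03-14", frozenset({"robotics", "engineering"})),
    Listing("Jazz Night", "2025-03-12", frozenset({"music"})),
    Listing("AI Safety Reading Group", "2025-03-13", frozenset({"ai", "safety"})),
    Listing("ML Career Panel", "2025-03-12", frozenset({"ai", "careers"})),
]
print([e.title for e in rank_listings(events, ["AI", "safety"])])
# ['AI Safety Reading Group', 'ML Career Panel', 'Jazz Night', 'Robotics Club Demo']
```

Tie-breaking on fields that are always present keeps the order stable across reloads, which matters for a static export with no backend to cache results.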

DemFlex

Proof
Hour-by-hour cash-flow model with thermostat, solar, and battery portfolio search
Evidence
Request-only capstone artifact
Limit
The public artifact shows the function flow; full capstone materials are private.

A Streamlit planning tool for ERCOT demand response. It compares thermostat, solar, and battery portfolios hour by hour through benefit-cost cash-flow analysis.

Cybersecurity Anomaly Detection

Proof
Course report: Isolation Forests and UMAP over 8M+ kernel-level security events
Evidence
Coursework, request-only report
Limit
Detailed materials are available by request; the public page uses a compact evidence card.

A Carnegie Mellon coursework project using Isolation Forests and UMAP on the BETH kernel-level security-events dataset. The course report records 95 percent accuracy; detailed validation materials are available by request.