Pablo Zavala · AI Safety Evaluation · Research Engineering

Projects & Research

Public research, benchmarks, and applied systems across AI safety and evals, economics and policy, and systems building.

Project Index

Authority Calibration

Proof: Public pilot for agent authority use: disclosure observed in 14/14 and 12/12 suppression probes; self-demotion in 9/9 and 8/8 supplied-rule trials; 0 observed matched firebreak inversions; a separate Codex probe found assurance weakening in 6/19 cases
Evidence: Public reproducible repo
Limit: Pilot-scale evidence, not a rare-failure guarantee.

AI agents increasingly run pipelines, allocate resources, and coordinate other agents. This project tests whether they use delegated authority correctly: neither exceeding their mandate nor refusing authority they actually hold.

Safe MarketUniverses

Proof: 120 episodes; model confidence routed human review about as well as chance
Evidence: Public reproducible benchmark
Limit: The benchmark tests allocation in a compact finance-style environment, not every oversight setting.

When human review is scarce, can a model tell us which decisions deserve a person's attention? In this 120-episode benchmark, the preregistered answer was no: model confidence routed review about as well as chance.
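The comparison behind "about as well as chance" can be sketched as follows. This is a minimal illustration, not the benchmark's code: the toy data, the review budget, and the function names are all assumptions. It sends the least-confident decisions to review and compares the number of errors caught with the number expected from picking the same count of decisions uniformly at random.

```python
def confidence_order(confidence):
    """Return decision indices sorted from least to most confident."""
    return sorted(range(len(confidence)), key=lambda i: confidence[i])


def errors_caught(is_error, order, budget):
    """Count erroneous decisions among the first `budget` reviewed items."""
    return sum(is_error[i] for i in order[:budget])


def chance_baseline(is_error, budget):
    """Expected errors caught when reviewed items are chosen uniformly at random."""
    return budget * sum(is_error) / len(is_error)


# Hypothetical toy data: six decisions, two of which are wrong (1 = error).
confidence = [0.9, 0.4, 0.8, 0.6, 0.7, 0.5]
is_error = [1, 0, 0, 1, 0, 0]
budget = 3  # assumed number of decisions a human can review

caught = errors_caught(is_error, confidence_order(confidence), budget)
baseline = chance_baseline(is_error, budget)
print(caught, baseline)
```

In this toy case confidence-based routing catches one error, which equals the random-review expectation, so confidence adds nothing over chance. A useful routing signal would catch clearly more errors than the baseline at the same budget.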

RAG Evaluation Lab

Proof: 86.6 percent context precision after rewriting and reranking; exact match fell on the full 918-query split
Evidence: Public evaluation harness
Limit: Grounding improved while deterministic exact match declined, so the result is a tradeoff.

An evaluation harness comparing baseline and reranked retrieval-augmented generation pipelines with RAGAS and SQuAD metrics on the Mini Wikipedia corpus. The reranked pipeline reaches 86.6 percent context precision, but its exact-match score falls on the full 918-query split.
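For readers unfamiliar with the metric, SQuAD-style exact match can be sketched as below. This follows the commonly used normalization (lowercase, strip punctuation, drop English articles, collapse whitespace); whether the harness uses exactly these steps is an assumption, and the example strings are hypothetical.

```python
import re
import string


def normalize_answer(text):
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """Return 1 if the normalized prediction equals any normalized gold answer."""
    normalized = normalize_answer(prediction)
    return int(any(normalized == normalize_answer(gold) for gold in gold_answers))


# Hypothetical examples: the metric forgives case, articles, and punctuation,
# but not extra words around an otherwise correct answer.
short_score = exact_match("The Eiffel Tower.", ["Eiffel Tower"])
long_score = exact_match("It is the Eiffel Tower in Paris", ["Eiffel Tower"])
print(short_score, long_score)
```

The metric is all-or-nothing per query, so it can move independently of retrieval-quality measures such as context precision.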

DonorsChoose Funding Risk

Proof: ROC AUC 0.757 on 185,000+ held-out classroom projects
Evidence: Public analysis repo
Limit: The model is a policy triage aid, not a deployment-ready funding decision system.

A model that flags DonorsChoose classroom requests most at risk of going unfunded, so limited reviewer attention can reach under-resourced schools first. The fairness audit reports unequal error rates across school poverty levels rather than presenting only an average score.
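A per-group error rate of the kind such an audit reports can be sketched as follows. This is an illustration under stated assumptions, not the project's analysis code: the choice of false negative rate as the error metric, the group labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def false_negative_rate_by_group(y_true, y_pred, groups):
    """For each group, the share of truly at-risk projects the model failed to flag."""
    positives = defaultdict(int)
    missed = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            if pred == 0:
                missed[group] += 1
    return {group: missed[group] / positives[group] for group in positives}


# Hypothetical toy data: 1 = project went unfunded (truth) or was flagged (prediction).
y_true = [1, 1, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0]
groups = ["high", "high", "low", "low", "low", "high"]  # assumed poverty-level labels

rates = false_negative_rate_by_group(y_true, y_pred, groups)
print(rates)
```

Reporting the rates per group, rather than one pooled number, is what makes a gap between groups visible.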

NUDG

Proof: Authorization, constraints, verification, and receipts for agent-run work
Evidence: Founder system, bounded public claims
Limit: Public visuals explain the system model; live product claims require separate proof packets.

NUDG is a CMU AI Venture Studio project for controlling how agents use real resources. It separates proposal, authorization, execution, verification, and audit records instead of giving agents broad access up front.

AI Investment Mapping

Proof: $10B+ mapped across 11 metros; Pittsburgh: $6.3B across 133 firms
Evidence: Restricted data, public aggregate summary
Limit: Company-level records and maps stay private; the public page shows aggregates and methods evidence.

A Block Center project mapping more than ten billion dollars in public and private AI investment across eleven metropolitan economies for regional AI-readiness research. The Pittsburgh slice covers 6.3 billion dollars across 133 firms.

AI Workforce Simulation

Proof: 14.3 percent vs 3.6 percent peak unemployment under paired policy regimes
Evidence: Public simulation repo
Limit: Mechanism demonstration in a small simulated labor market, not a macro forecast.

An agent-based NetLogo model of a small labor market adjusting to AI automation. With identical workers, geography, and random seed, peak unemployment reaches 14.3 percent under a tech-driven policy regime versus 3.6 percent under a human-centric one.

Heard.now

Proof: Prototype architecture for an auditable input store and privacy-preserving public extracts
Evidence: Private pilot, synthetic public artifact
Limit: Public visuals use synthetic text so community messages stay private.

An early civic-listening pilot with Professor Jordan Usdan of Heinz College. It separates raw community input from published output: input enters an auditable store, while public extracts pass privacy and integrity checks.

CMU Event Compass

Proof: Claude vision extracts flyer cards; static export runs with a sample board and no backend
Evidence: Live static demo
Limit: The demo proves the sample-board extraction workflow, not live campus coverage.

A prototype that turns a photo of a campus poster wall into structured, personalized listings. Claude vision extracts one listing per flyer, a deterministic in-browser ranker orders the results by chosen interests, and the app ships as a static export with no backend.
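One way a deterministic interest-based ranker can work is sketched below. The real ranker runs in the browser, so this Python version is purely illustrative; the scoring rule (count of matching tags), the alphabetical tie-break, and the `title` and `tags` field names are assumptions.

```python
def rank_listings(listings, interests):
    """Order listings by number of tags matching the chosen interests.

    Ties are broken alphabetically by title, so the same input always
    produces the same order.
    """
    wanted = {interest.lower() for interest in interests}

    def score(listing):
        return len(wanted & {tag.lower() for tag in listing["tags"]})

    return sorted(listings, key=lambda listing: (-score(listing), listing["title"]))


# Hypothetical listings, shaped like the output of the extraction step.
listings = [
    {"title": "Robotics Demo Night", "tags": ["robotics", "ai"]},
    {"title": "Jazz Concert", "tags": ["music"]},
    {"title": "AI Policy Talk", "tags": ["ai", "policy"]},
]

ranked_titles = [item["title"] for item in rank_listings(listings, ["AI", "policy"])]
print(ranked_titles)
```

Because the ordering depends only on the listings and the chosen interests, it needs no server and no model call at ranking time.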

DemFlex

Proof: Hour-by-hour cashflow model with thermostat, solar, and battery portfolio search
Evidence: Request-only capstone artifact
Limit: The public artifact shows the function flow; full capstone materials are private.

A Streamlit planning tool for ERCOT demand response. It compares thermostat, solar, and battery portfolios hour by hour through benefit-cost cash-flow analysis.

Cybersecurity Anomaly Detection

Proof: Course report: Isolation Forests and UMAP over 8M+ kernel-level security events
Evidence: Coursework, request-only report
Limit: Detailed materials are available by request; the public page uses a compact evidence card.

A Carnegie Mellon coursework project using Isolation Forests and UMAP on the BETH kernel-level security-events dataset. The course report records 95 percent accuracy; detailed validation materials are available by request.