
Model Evaluation

/model-evaluation
Model Engineering & Validation

What it does

Compute and report task-correct held-out metrics for a trained model: Dice with HD95/NSD for segmentation, AUROC/AUPRC with bootstrap CIs for classification, FROC/mAP for detection, plus calibration and subgroup slices. The skill emits a per-case table for analyze-stats. Every reported number comes from executed code.
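As an illustration of one metric named above, here is a minimal sketch of AUROC with a percentile bootstrap CI. It is not the skill's own implementation: the function names `auroc` and `bootstrap_ci`, the parameters `n_boot`, `alpha`, and `seed`, and the toy data are all assumptions made for this example, and it uses only the Python standard library.

```python
import random


def auroc(labels, scores):
    """AUROC as the Mann-Whitney probability that a positive outranks a negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower, upper) using a percentile bootstrap.

    Cases are resampled with replacement; resamples that contain a single
    class are skipped because AUROC is undefined for them.
    """
    point = auroc(labels, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue
        stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


if __name__ == "__main__":
    # Hypothetical held-out labels and model scores, for illustration only.
    y_true = [0, 0, 1, 1, 0, 1, 0, 1]
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7]
    point, lower, upper = bootstrap_ci(y_true, y_score)
    print(f"AUROC {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Resampling whole cases keeps each label paired with its score. With a test set this small the interval will be wide, which is the point of reporting it alongside the estimate.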

Highlights

  • Task-correct metrics with bootstrap CIs
  • Calibration + subgroup performance
  • Metrics Reloaded / CLAIM 2024 gate
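The calibration and subgroup highlights can be sketched as follows, under stated assumptions. This computes a binned expected calibration error (ECE) for binary probabilities and repeats it per subgroup over a per-case table. The row layout (`case_id`, `site`, `label`, `prob`), the function names, and the choice of equal-width bins are hypothetical and are not taken from the skill's source.

```python
from collections import defaultdict


def expected_calibration_error(labels, probs, n_bins=10):
    """Binned ECE for binary predictions.

    Each case falls into an equal-width probability bin; ECE is the
    case-weighted mean of |observed positive rate - mean predicted prob|.
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and the same length")
    bins = defaultdict(list)
    for y, p in zip(labels, probs):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[b].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins.values():
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece


def ece_by_subgroup(rows, key, n_bins=10):
    """Compute ECE separately for each value of rows[i][key]."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        name: expected_calibration_error(
            [r["label"] for r in members], [r["prob"] for r in members], n_bins
        )
        for name, members in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical per-case table; column names are assumptions for this sketch.
    per_case = [
        {"case_id": "c1", "site": "A", "label": 1, "prob": 0.9},
        {"case_id": "c2", "site": "A", "label": 0, "prob": 0.2},
        {"case_id": "c3", "site": "B", "label": 1, "prob": 0.6},
        {"case_id": "c4", "site": "B", "label": 0, "prob": 0.7},
    ]
    for site, ece in sorted(ece_by_subgroup(per_case, "site").items()):
        print(f"site {site}: ECE {ece:.3f}")
```

Keeping one row per case lets the same table feed both the overall metric and any subgroup slice without recomputing predictions.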

Install this skill

git clone https://github.com/aperivue/medsci-skills.git
cp -r medsci-skills/skills/model-evaluation ~/.claude/skills/