Dataset Versioning
/version-dataset (NEW)
What it does
Dataset version control for reproducibility. Builds a deterministic content-hash manifest (file SHA-256, schema, and per-column value hashes), verifies later copies against that manifest to detect drift, and diffs two manifests. The result is proof that an analysis ran on the intended data.
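As a rough illustration of what a content-hash manifest can look like, here is a minimal Python sketch. It is not the skill's actual implementation: the function names (`build_manifest`, `manifest_json`), the manifest keys, and the choice of a rectangular, non-empty UTF-8 CSV as input are all assumptions made for the example.

```python
import csv
import hashlib
import io
import json


def build_manifest(csv_bytes: bytes) -> dict:
    """Build a content-hash manifest for a CSV held in memory.

    Hypothetical layout: file hash + schema + row count + per-column hashes.
    Assumes a non-empty, rectangular CSV with a header row.
    """
    text = csv_bytes.decode("utf-8")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    header, body = rows[0], rows[1:]

    column_hashes = {}
    for i, name in enumerate(header):
        h = hashlib.sha256()
        for row in body:
            value = row[i].encode("utf-8")
            # Length-prefix each value so ["ab", "c"] and ["a", "bc"] differ.
            h.update(len(value).to_bytes(8, "big"))
            h.update(value)
        column_hashes[name] = h.hexdigest()

    return {
        "file_sha256": hashlib.sha256(csv_bytes).hexdigest(),
        "schema": header,
        "n_rows": len(body),
        "column_hashes": column_hashes,
    }


def manifest_json(manifest: dict) -> str:
    # sort_keys makes the serialized manifest identical for identical content.
    return json.dumps(manifest, sort_keys=True, indent=2)
```

Hashing each column separately, in addition to the whole file, is what lets a later check say which column changed rather than only that the file changed.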
Highlights
- ✓ Content-hash manifest (SHA-256 + schema)
- ✓ Drift detection: schema / rows / values
- ✓ Reproducibility-lock datasets & demos
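The three drift categories listed above (schema, rows, values) can be sketched as a comparison of two manifests. The Python example below is illustrative only and does not reflect the skill's real code or output format: `diff_manifests` is a hypothetical helper, and it assumes each manifest is a dict with the keys `file_sha256`, `schema` (list of column names), `n_rows`, and `column_hashes` (column name to hex digest).

```python
def diff_manifests(old: dict, new: dict) -> dict:
    """Compare two manifests and report schema, row, and value drift.

    Assumes the hypothetical manifest layout described in the text above.
    """
    old_cols, new_cols = old["schema"], new["schema"]
    shared_old_order = [c for c in old_cols if c in new_cols]
    shared_new_order = [c for c in new_cols if c in old_cols]

    report = {
        "schema": {
            "added": [c for c in new_cols if c not in old_cols],
            "removed": [c for c in old_cols if c not in new_cols],
            "reordered": shared_old_order != shared_new_order,
        },
        "rows": {"old": old["n_rows"], "new": new["n_rows"]},
        # Columns present in both manifests whose value hash changed.
        "values": [
            c for c in shared_old_order
            if old["column_hashes"][c] != new["column_hashes"][c]
        ],
    }
    report["drifted"] = bool(
        report["schema"]["added"]
        or report["schema"]["removed"]
        or report["schema"]["reordered"]
        or old["n_rows"] != new["n_rows"]
        or report["values"]
        or old["file_sha256"] != new["file_sha256"]
    )
    return report
```

Including the whole-file hash in the final `drifted` flag catches byte-level changes (for example, line endings or quoting) that leave the parsed schema, row count, and column values untouched.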
Install this skill
git clone https://github.com/aperivue/medsci-skills.git
cp -r medsci-skills/skills/version-dataset ~/.claude/skills/
Related skills
Study Design (/design-study)
Identifies analysis unit, cohort logic, data leakage risks, and validation strategy.
Sample Size Calculator (/calc-sample-size)
Interactive sample size calculator with decision-tree guided test selection. Covers 11 designs including Cox regression EPV.
Data Cleaning (/clean-data)
Standardize, validate, and transform raw research datasets. Handles missing data, outlier detection, and variable recoding.
De-identification (/deidentify)
De-identify clinical research data before LLM-assisted analysis. Standalone Python CLI with 10 country locale packs. No LLM involved.