Chapter 5 Statistical Analysis Plan (SAP) Development
Chapter Objectives
The Statistical Analysis Plan (SAP) is the definitive document that governs how clinical trial data are analyzed and interpreted.
This chapter provides a comprehensive and practical framework for developing a robust, regulator-ready SAP.
After completing this chapter, the reader should be able to:
- Define primary analysis methods with sufficient operational detail
- Specify models, covariates, and stratification handling clearly
- Plan secondary, exploratory, sensitivity, and subgroup analyses
- Define coherent multiplicity and missing data strategies
- Support interim analyses and Data Monitoring Committee (DMC) activities, if applicable
- Ensure all analyses are reproducible and implementable without decisions made after database lock
5.1 Role of the SAP in Clinical Trials
5.1.1 Why the SAP Is a Critical Document
From a statistical perspective, the SAP serves as:
- The binding interpretation of the protocol
- The operational blueprint for statistical programming
- The primary reference for regulatory review and inspection
Any analysis not prospectively specified in the SAP is vulnerable to being considered post hoc, regardless of scientific plausibility.
5.2 Definition of the Primary Analysis Method
5.2.1 Purpose of the Primary Analysis
The primary analysis directly addresses the primary estimand and supports the main study conclusion.
It must be defined in sufficient detail so that:
- Independent statisticians would implement the same analysis
- Results are reproducible
- No analytical discretion remains after database lock
5.2.2 Model Type Specification
The SAP must explicitly specify the statistical model, including:
- Model family (e.g., linear model, generalized linear model, Cox model)
- Link function, if applicable
- Distributional assumptions
Typical examples include:
- ANCOVA for continuous endpoints
- Logistic regression for binary endpoints
- Cox proportional hazards models for time-to-event endpoints
Key model assumptions (for example, proportional hazards for a Cox model) should be stated, together with the methods that will be used to assess them where appropriate.
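One way to check that a model specification is complete is to write it down as a structured record before any programming starts. The sketch below is illustrative only: the class name `PrimaryModelSpec`, its field names, the endpoint name, and the table of allowed family/link combinations are all assumptions made for this example, not a standard or a regulatory requirement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lookup of model families and the links this sketch accepts.
# None means "link function not applicable" (e.g., a Cox model).
ALLOWED_LINKS = {
    "linear": {"identity"},
    "logistic": {"logit"},
    "cox": {None},
}


@dataclass(frozen=True)
class PrimaryModelSpec:
    """Illustrative record of what the SAP states about the primary model."""
    endpoint: str
    family: str
    link: Optional[str] = None
    covariates: Tuple[str, ...] = ()
    assumptions: Tuple[str, ...] = ()

    def __post_init__(self):
        # Reject specifications that leave the model family or link ambiguous.
        if self.family not in ALLOWED_LINKS:
            raise ValueError(f"unknown model family: {self.family!r}")
        if self.link not in ALLOWED_LINKS[self.family]:
            raise ValueError(
                f"link {self.link!r} is not allowed for family {self.family!r}"
            )


# Example: an ANCOVA-type model for a continuous endpoint (names are made up).
spec = PrimaryModelSpec(
    endpoint="change_from_baseline_week12",
    family="linear",
    link="identity",
    covariates=("treatment", "baseline_value"),
    assumptions=("normally distributed residuals", "common slope across arms"),
)
print(spec)
```

Because the record is frozen, it cannot be altered once created, which mirrors the intent that the model is fixed before database lock.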
5.2.3 Covariate Specification
The SAP should clearly define:
- Which covariates are included in the model
- Whether covariates are pre-specified or data-driven
- How covariates are coded (continuous or categorical)
Covariates typically include baseline measures of the endpoint or other strong prognostic factors identified during study design.
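To make the role of a baseline covariate concrete, the following sketch computes the treatment difference adjusted for a single continuous baseline covariate in a two-arm ANCOVA with a common slope. In that special case the least-squares estimate equals the difference in outcome means minus the pooled within-arm slope times the difference in baseline means. The function name and the small data set are hypothetical, and a real analysis would use validated statistical software and also report standard errors and confidence intervals.

```python
from statistics import mean


def ancova_adjusted_difference(baseline, outcome, treated):
    """Return (adjusted treatment difference, pooled within-arm slope).

    `treated` holds 1 for the test arm and 0 for control. The model assumed
    here is outcome = intercept + effect * treated + slope * baseline + error.
    """
    arms = {0: ([], []), 1: ([], [])}
    for x, y, t in zip(baseline, outcome, treated):
        arms[t][0].append(x)
        arms[t][1].append(y)

    sxx = 0.0  # pooled within-arm sum of squares of the baseline
    sxy = 0.0  # pooled within-arm cross-product of baseline and outcome
    arm_means = {}
    for t, (xs, ys) in arms.items():
        mx, my = mean(xs), mean(ys)
        arm_means[t] = (mx, my)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))

    slope = sxy / sxx
    raw_diff = arm_means[1][1] - arm_means[0][1]
    baseline_imbalance = arm_means[1][0] - arm_means[0][0]
    return raw_diff - slope * baseline_imbalance, slope


# Made-up data generated exactly as outcome = 2 + 3 * treated + 0.5 * baseline.
baseline = [10, 12, 14, 11, 13, 18]
treated = [0, 0, 0, 1, 1, 1]
outcome = [7.0, 8.0, 9.0, 10.5, 11.5, 14.0]

diff, slope = ancova_adjusted_difference(baseline, outcome, treated)
print(diff, slope)
```

In this constructed data set the unadjusted difference in means is 4, but part of it comes from the higher baseline values in the test arm; the adjustment removes that part.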
5.2.4 Handling of Stratification Factors
If stratified randomization was used, the SAP should specify:
- Whether stratification factors are included as covariates
- Whether stratified tests or stratified models are applied
- How sparse or empty strata are handled
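Consistency between randomization and analysis strategies is essential: factors used to stratify randomization should normally be accounted for in the analysis, and any departure should be justified in the SAP.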
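As an illustration of a stratified analysis with an explicit rule for empty strata, the sketch below computes a Mantel-Haenszel common odds ratio across randomization strata. The rule used here (skip strata with no subjects and report which were skipped) is only one possible convention, chosen for the example; the stratum names and counts are invented, and a SAP might instead pre-specify pooling of sparse strata.

```python
def mantel_haenszel_odds_ratio(strata):
    """Return (common odds ratio, list of skipped empty strata).

    `strata` maps a stratum label to a 2x2 table (a, b, c, d):
      a = test arm with event,    b = test arm without event,
      c = control arm with event, d = control arm without event.
    """
    numerator = 0.0
    denominator = 0.0
    skipped = []
    for label, (a, b, c, d) in strata.items():
        n = a + b + c + d
        if n == 0:
            # Assumed rule for this sketch: empty strata contribute nothing
            # and are listed so that the handling is transparent.
            skipped.append(label)
            continue
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is not estimable from these strata")
    return numerator / denominator, skipped


# Invented counts for three randomization strata; the third is empty.
tables = {
    "region_A": (10, 10, 5, 15),
    "region_B": (8, 12, 4, 16),
    "region_C": (0, 0, 0, 0),
}

odds_ratio, skipped = mantel_haenszel_odds_ratio(tables)
print(round(odds_ratio, 3), skipped)
```

Writing the empty-stratum rule into the code path, rather than leaving it to the programmer at analysis time, is the kind of operational detail this section asks the SAP to fix in advance.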
5.3 Secondary and Exploratory Analyses
5.4 Multiplicity Control Strategy
5.5 Missing Data Handling Strategies
5.5.1 Importance of Pre-Specifying Missing Data Methods
Assumptions about missing data directly affect interpretation of treatment effects.
The SAP must prospectively specify missing data handling methods for each key analysis.
5.5.2 Commonly Used Methods
The SAP should clearly state when and how the following methods are applied:
- MMRM (Mixed Model for Repeated Measures)
- Multiple Imputation (MI)
- Last Observation Carried Forward (LOCF)
- Non-Responder Imputation (NRI)
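Each method’s assumptions and limitations should be acknowledged. For example, MMRM and MI as commonly implemented assume data are missing at random, LOCF assumes the outcome does not change after the last observed value, and NRI treats every subject with a missing response as a treatment failure.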
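The two single-imputation rules in the list above are simple enough to state as code, which helps show exactly what each one assumes. The sketch below is a minimal illustration with hypothetical function names and made-up values, using `None` to mark a missing assessment. MMRM and MI are not shown because they require model fitting in validated statistical software.

```python
def locf(values):
    """Last Observation Carried Forward for one subject's ordered visits.

    Missing visits (None) take the most recent observed value. Visits that
    precede the first observed value stay missing in this sketch.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def non_responder_imputation(responses):
    """Non-Responder Imputation: a missing response is counted as failure."""
    return [False if r is None else bool(r) for r in responses]


# One subject's scores at four scheduled visits; the last two are missing.
scores = [5.0, 4.2, None, None]
print(locf(scores))

# Responder status for three subjects; the second has no assessment.
status = [True, None, False]
print(non_responder_imputation(status))
```

Seeing the rules written out makes their limitations visible: LOCF freezes a subject's trajectory at the last observed value, and NRI cannot distinguish a true non-responder from a subject who simply missed the visit.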
5.6 Definition of Repeat Assessments and Visit Window Rules
5.7 Sensitivity Analysis Plan
5.8 Subgroup Analysis Definition
5.9 Interim Analysis and DMC Support (If Applicable)
5.11 Chapter Summary
The SAP transforms study objectives and protocol concepts into executable statistical analyses.
A high-quality SAP eliminates analytical ambiguity, ensures reproducibility, and provides a defensible basis for interpretation.
Careful, detailed, and prospective SAP development is one of the most critical responsibilities of the statistician in clinical research.