Recommended Workflow

  1. Choose the data source. Upload a workbook or use the default actual workbook already in the project folder.
  2. Click Load Population so the app can read the data, detect fields, and populate country and age-group filters.
  3. Set the monitoring period, filters, energy basis, sampling methods, stratification variables, confidence level, precision target, sample fractions, and iteration count.
  4. Click Run Simulation to estimate how well each tested sample size reproduces the full-population census total.
  5. Use Decision Summary, charts, and exports to support the recommended monitoring sample size.

Key Inputs

Monitoring period: the period over which observed energy is estimated. Under the default prorated method, energy is counted only for days that fall inside the monitoring period, on or after the device's delivery date, and on or before the dataset refresh date.

Energy basis: defaults to prorated monitoring-period energy, calculated as average kWh per day multiplied by overlap days. The app does not extrapolate energy beyond the dataset refresh date.

Age group / vintage: calculated as the time elapsed from the delivery date to the "Age group as of" date. The app caps this date at the dataset refresh date so that age is not projected beyond the available data.

Stratify by: accepts country, region, age group, or any combination of these variables. Stratification is most useful when groups have materially different energy-use patterns.

Precision target: the acceptable relative error threshold, for example +/-5% of the true census total.

Iterations: the number of repeated samples drawn. More iterations produce more stable simulation results but take longer to run.

How to Read the Results

Bias: shows whether sampled estimates are systematically high or low. Values close to zero indicate unbiased estimation.

Relative RMSE: summarizes typical percentage estimation error across simulations. Lower values mean more precise estimates.

Coverage: shows how often confidence intervals contain the true census total. At 90% confidence, coverage should be close to 90%.

Success rate: shows how often sampled estimates fall within the selected precision target.

Average relative precision: the average confidence-interval half-width, expressed as a percentage of the estimate.

Cost savings: compares the sampled monitoring cost with 100% monitoring under the selected cost assumptions.

Sparse stratification warning: appears when the selected stratification creates strata with too few sampled units. In that case, consider larger sample fractions or fewer stratification variables before relying on stratified precision.

Decision Rule

The recommended sample size is the smallest tested design that meets the precision target, achieves the required success rate, and has empirical coverage close to the selected confidence level.
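
The rule above can be sketched as a simple filter over the tested designs. This is an illustrative sketch only, not the app's code: the `DesignResult` fields, the `recommend_design` function, and the `min_success_rate` and `coverage_tolerance` thresholds are hypothetical names, and "meets the precision target" is assumed here to mean that average relative precision is at or below the target.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DesignResult:
    """Summary metrics for one tested design (hypothetical structure)."""
    sample_size: int
    avg_relative_precision: float  # percent, CI half-width relative to the estimate
    success_rate: float            # percent of iterations within the precision target
    coverage: float                # percent of intervals containing the census total


def recommend_design(
    results: Iterable[DesignResult],
    precision_target: float,    # percent, e.g. 5.0
    min_success_rate: float,    # percent, assumed threshold
    confidence_level: float,    # percent, e.g. 90.0
    coverage_tolerance: float,  # percent points, assumed meaning of "close to"
) -> Optional[DesignResult]:
    """Return the smallest design that passes all three checks, or None."""
    passing = [
        r for r in results
        if r.avg_relative_precision <= precision_target
        and r.success_rate >= min_success_rate
        and abs(r.coverage - confidence_level) <= coverage_tolerance
    ]
    return min(passing, key=lambda r: r.sample_size, default=None)
```

If no tested design passes, the sketch returns `None`, which corresponds to testing larger sample fractions or relaxing the target.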

Energy Basis

Default prorated monitoring-period energy, capped at data refresh:

$$observed\_end_i = \min(refresh\_date_i, monitoring\_period\_end)$$

$$overlap\_start_i = \max(delivery\_date_i, monitoring\_period\_start)$$

$$overlap\_days_i = \max(0, observed\_end_i - overlap\_start_i + 1)$$

$$y_i = avg\_kwh\_per\_day_i \times overlap\_days_i$$
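
As a worked illustration of the energy-basis formulas, the calculation for a single device might look like the sketch below. The function name `prorated_energy` and its argument names are illustrative assumptions that mirror the symbols above; they are not taken from the app.

```python
from datetime import date


def prorated_energy(
    avg_kwh_per_day: float,
    delivery_date: date,
    refresh_date: date,
    period_start: date,
    period_end: date,
) -> float:
    """Prorated monitoring-period energy for one device, capped at data refresh."""
    # Energy is never counted past the refresh date or the period end.
    observed_end = min(refresh_date, period_end)
    # Energy is never counted before delivery or the period start.
    overlap_start = max(delivery_date, period_start)
    # Inclusive day count; zero when the two windows do not overlap.
    overlap_days = max(0, (observed_end - overlap_start).days + 1)
    return avg_kwh_per_day * overlap_days
```

For example, a device delivered on 2024-01-15 with data refreshed on 2024-03-10, monitored over 2024-01-01 to 2024-03-31 at 2.0 kWh per day, has 56 overlap days and therefore 112.0 kWh.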

Core Estimators

Simple random sampling total estimator:

$$\hat{T} = N\bar{y}_s$$

Finite-population standard error for SRS:

$$SE(\hat{T}) = N\sqrt{\left(1 - \frac{n}{N}\right)\frac{s_s^2}{n}}$$
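
A minimal sketch of the two SRS formulas above, assuming the sample is a plain list of per-device energy values and that the sample variance uses the usual n - 1 denominator. The function name `srs_total_and_se` is illustrative.

```python
import math
import statistics


def srs_total_and_se(sample: list[float], population_size: int) -> tuple[float, float]:
    """Estimated population total and its standard error under SRS."""
    n = len(sample)
    N = population_size
    y_bar = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # sample variance, n - 1 denominator
    total = N * y_bar
    # Finite-population correction (1 - n/N) shrinks the error as n approaches N.
    se = N * math.sqrt((1 - n / N) * s2 / n)
    return total, se
```

When the sample is the whole population (n = N), the finite-population correction is zero and so is the standard error.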

Stratified total estimator:

$$\hat{T}_{strat} = \sum_h N_h\bar{y}_h$$

Stratified variance estimator:

$$\widehat{Var}(\hat{T}_{strat}) = \sum_h N_h^2\left(1 - \frac{n_h}{N_h}\right)\frac{s_h^2}{n_h}$$
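
The stratified estimator and its variance can be sketched the same way. The input layout (a dictionary mapping each stratum label to its population count and sampled values) and the function name are assumptions made for illustration. The check for fewer than two sampled units reflects the fact that the within-stratum variance cannot be computed from a single unit, which is the situation the sparse stratification warning describes.

```python
import math
import statistics


def stratified_total_and_se(
    strata: dict[str, tuple[int, list[float]]],
) -> tuple[float, float]:
    """strata maps a stratum label to (N_h, sampled y values for that stratum)."""
    total = 0.0
    variance = 0.0
    for label, (N_h, sample) in strata.items():
        n_h = len(sample)
        if n_h < 2:
            # s_h^2 is undefined with fewer than two sampled units.
            raise ValueError(f"stratum {label!r} has fewer than 2 sampled units")
        total += N_h * statistics.fmean(sample)
        variance += N_h**2 * (1 - n_h / N_h) * statistics.variance(sample) / n_h
    return total, math.sqrt(variance)
```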

Neyman allocation:

$$n_h = n\frac{N_hS_h}{\sum_k N_kS_k}$$
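
A short sketch of Neyman allocation under the formula above. It returns fractional allocations; how the app rounds them, enforces a minimum per stratum, or caps them at the stratum size is not stated here, so those steps are left out. The function name and input layout are illustrative assumptions.

```python
def neyman_allocation(
    n: float,
    strata: dict[str, tuple[int, float]],
) -> dict[str, float]:
    """strata maps a stratum label to (N_h, S_h); returns fractional n_h values."""
    weights = {label: N_h * S_h for label, (N_h, S_h) in strata.items()}
    denominator = sum(weights.values())
    if denominator == 0:
        raise ValueError("all strata have zero size or zero standard deviation")
    # Strata that are larger or more variable receive a larger share of n.
    return {label: n * w / denominator for label, w in weights.items()}
```

For example, with n = 100 and two strata (N = 1000, S = 2.0) and (N = 500, S = 6.0), the weights are 2000 and 3000, so the allocations are 40 and 60.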

Monte Carlo Metrics

Relative bias:

$$100\times\frac{E(\hat{T}) - T}{T}$$

Relative RMSE:

$$100\times\frac{\sqrt{E[(\hat{T} - T)^2]}}{T}$$

Coverage probability:

$$100\times P(L \leq T \leq U)$$

Success rate within target precision p:

$$100\times P\left(\frac{|\hat{T} - T|}{T} \leq p\right)$$
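
The four metrics above can be estimated by repeated sampling. The sketch below does this for simple random sampling only and makes two simplifying assumptions that should be read as such: it uses a normal critical value from `statistics.NormalDist` in place of a t quantile, and it replaces the expectations and probabilities with averages over the iterations. The function name and its parameters are illustrative.

```python
import math
import random
import statistics


def simulate_srs_metrics(
    population: list[float],
    n: int,
    iterations: int,
    precision_target: float,   # as a fraction, e.g. 0.05 for +/-5%
    confidence: float = 0.90,
    seed: int = 0,
) -> dict[str, float]:
    rng = random.Random(seed)
    N = len(population)
    T = sum(population)  # true census total
    # Normal approximation to the critical value (assumption, not a t quantile).
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    estimates = []
    covered = 0
    within_target = 0
    for _ in range(iterations):
        sample = rng.sample(population, n)  # sampling without replacement
        t_hat = N * statistics.fmean(sample)
        se = N * math.sqrt((1 - n / N) * statistics.variance(sample) / n)
        covered += (t_hat - z * se) <= T <= (t_hat + z * se)
        within_target += abs(t_hat - T) / T <= precision_target
        estimates.append(t_hat)
    mse = statistics.fmean([(e - T) ** 2 for e in estimates])
    return {
        "relative_bias_pct": 100 * (statistics.fmean(estimates) - T) / T,
        "relative_rmse_pct": 100 * math.sqrt(mse) / T,
        "coverage_pct": 100 * covered / iterations,
        "success_rate_pct": 100 * within_target / iterations,
    }
```

Running this for several values of `n` gives one row of metrics per tested sample size, which is the kind of table the decision rule is applied to.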

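Conservative lower confidence bound (it uses the two-sided critical value, which is larger than the one-sided value and so gives a lower, more cautious bound):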

$$LCB = \hat{T} - t_{1-\alpha/2,df}\times SE(\hat{T})$$
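
A small sketch of the bound, with the critical value passed in rather than computed, since the Python standard library has no t quantile function. The example values are illustrative assumptions: 29 degrees of freedom, 90% confidence, and the standard table values 1.699 (two-sided) and 1.311 (one-sided). The document does not state how the degrees of freedom are chosen.

```python
def lower_confidence_bound(t_hat: float, se: float, t_crit: float) -> float:
    """Lower bound on the total: estimate minus critical value times SE."""
    return t_hat - t_crit * se


# Illustrative comparison at 90% confidence with df = 29 (table values).
two_sided_bound = lower_confidence_bound(1300.0, 100.0, 1.699)
one_sided_bound = lower_confidence_bound(1300.0, 100.0, 1.311)
```

The bound built from the two-sided critical value is the smaller of the two, which is why it is the more conservative choice.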