# Taking Salt with a Grain of Statistics

While investigating the differences between kosher salt brands, we observed that two boxes of the same brand had surprisingly different densities. But was that difference statistically significant?

Question:

Is the difference in salt densities between two boxes statistically significant?  Or more precisely: can we reject the null hypothesis that the salt densities in each box have the same population mean?

Equipment & Materials:

• kosher salt (two boxes of the same brand)
• high-precision (0.1 g) kitchen scale
• one-tablespoon measuring spoon

Procedure:

1. Measure 1 level tablespoon of salt by pouring salt into the spoon and shaking it level.
2. Record the mass of the salt.
3. Repeat steps 1 & 2 until eight* samples have been recorded for each box of salt.
4. Compute confidence intervals.

*Note: eight samples were used due to the amount of salt remaining in the old box.

Data:

The table below shows the mass of each tablespoon of salt as collected, alongside the same value normalized to zero mean and unit standard deviation for each box (a.k.a. z-score or standard score). Since we're working with samples, the normalization uses each box's sample mean and sample standard deviation (n − 1 denominator) in place of the unknown population values.

| Sample | Old Box Mass (g) | Old Box Normalized Mass | New Box Mass (g) | New Box Normalized Mass |
|---|---|---|---|---|
| 1 | 18.0 | -1.731 | 17.4 | -0.045 |
| 2 | 19.2 | 0.224 | 17.3 | -0.402 |
| 3 | 19.1 | 0.061 | 18.0 | 2.098 |
| 4 | 19.6 | 0.875 | 17.0 | -1.473 |
| 5 | 18.5 | -0.916 | 17.3 | -0.402 |
| 6 | 20.0 | 1.527 | 17.4 | -0.045 |
| 7 | 19.0 | -0.102 | 17.5 | 0.313 |
| 8 | 19.1 | 0.061 | 17.4 | -0.045 |
| Average | 19.063 | 0.000 | 17.413 | 0.000 |
| Standard Deviation | 0.614 | 1.000 | 0.280 | 1.000 |
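The averages, standard deviations, and normalized values can be reproduced with a few lines of code. The following is a minimal sketch using only the Python standard library; the variable and function names are ours, and it assumes the sample standard deviation (n − 1 denominator), which is what matches the figures in the table.

```python
from statistics import mean, stdev

# Masses in grams, copied from the data table.
old_box = [18.0, 19.2, 19.1, 19.6, 18.5, 20.0, 19.0, 19.1]
new_box = [17.4, 17.3, 18.0, 17.0, 17.3, 17.4, 17.5, 17.4]

def standardize(masses):
    """Return (mean, sample SD, normalized values) for a list of masses."""
    m = mean(masses)
    s = stdev(masses)  # sample standard deviation (n - 1 denominator)
    return m, s, [(x - m) / s for x in masses]

old_mean, old_sd, old_norm = standardize(old_box)
new_mean, new_sd, new_norm = standardize(new_box)

print(f"old box: mean = {old_mean:.3f} g, sd = {old_sd:.3f} g")
print(f"new box: mean = {new_mean:.3f} g, sd = {new_sd:.3f} g")
print("old normalized:", [round(v, 3) for v in old_norm])
print("new normalized:", [round(v, 3) for v in new_norm])
```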

Analysis:

Test for normality:

Before computing confidence intervals for each population, it is necessary to check that the error in the mass measurements is approximately normally distributed, since the t-based intervals used below assume it. This is done using a normal quantile-quantile plot (a.k.a. normal q-q plot).

In a quantile-quantile plot, one axis shows the normalized values and the other shows the rank-based z-score, i.e. the value a standard normal distribution would be expected to produce at that rank. Here the z-score was computed in a spreadsheet using the NORMSINV function.

| Old Box Normalized Value | Old Box Rank-Based z-score | New Box Normalized Value | New Box Rank-Based z-score |
|---|---|---|---|
| -1.731 | -1.534 | -1.473 | -1.534 |
| -0.916 | -0.887 | -0.402 | -0.887 |
| -0.102 | -0.489 | -0.402 | -0.887 |
| 0.061 | -0.157 | -0.045 | -0.157 |
| 0.061 | -0.157 | -0.045 | -0.157 |
| 0.224 | 0.489 | -0.045 | -0.157 |
| 0.875 | 0.887 | 0.313 | 0.887 |
| 1.527 | 1.534 | 2.098 | 1.534 |
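The exact spreadsheet formula is not given, but the tabulated z-scores are consistent with applying the inverse normal CDF to (rank − 0.5) / n, with tied values sharing the lowest rank of their group. The sketch below is written under that assumption; `rank_based_z` is a hypothetical helper name, and `NormalDist().inv_cdf` from the Python standard library plays the role of NORMSINV.

```python
from statistics import NormalDist

def rank_based_z(values):
    """Pair each value (in ascending order) with its rank-based z-score.

    Assumption: plotting position (rank - 0.5) / n, where tied values
    all receive the lowest rank in their group.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for x in ordered:
        rank = ordered.index(x) + 1  # first occurrence gives the lowest tied rank
        pairs.append((x, std_normal.inv_cdf((rank - 0.5) / n)))
    return pairs

# Normalized values for the old box, copied from the data table.
old_normalized = [-1.731, 0.224, 0.061, 0.875, -0.916, 1.527, -0.102, 0.061]

old_pairs = rank_based_z(old_normalized)
for value, z in old_pairs:
    print(f"{value:7.3f}  {z:7.3f}")
```

Plotting the first column against the second gives the q-q plot; points from normally distributed data should fall close to a straight line.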

In a normal quantile-quantile plot, data that is normally distributed will form a straight line.

The above quantile-quantile plots are not quite as linear as expected. This is most likely due to the small number of samples as well as duplicate values resulting from the lack of precision in the scale.

Confidence Intervals:

To determine whether there is a statistically significant difference in the density of the salt between the two boxes, confidence intervals based on Student's t-distribution are used.

A confidence interval for the mean mass per tablespoon is computed for each box using Student's t-distribution with 7 degrees of freedom. For the old box the 95% confidence interval is 18.549 g to 19.576 g; for the new box it is 17.178 g to 17.647 g. The graph below shows the average mass per tablespoon for each box, with error bars indicating the respective 95% confidence intervals.
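As a rough sketch of the calculation, each interval is the sample mean plus or minus a critical t value times the standard error of the mean. The Python standard library has no t-distribution, so the critical value is hard-coded here from a t table: 2.365 is the two-sided 95% value for 7 degrees of freedom. This is an assumption worth checking, because the result is sensitive to it; the one-sided 95% value for the same degrees of freedom, 1.895, yields a noticeably narrower interval that corresponds to two-sided 90% confidence.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical value of Student's t for n - 1 = 7 degrees of
# freedom, taken from a t table (hard-coded assumption, not computed here).
T_CRIT_95_DF7 = 2.365

def confidence_interval(masses, t_crit):
    """Return (lower, upper) bounds of the t-based interval for the mean."""
    m = mean(masses)
    std_err = stdev(masses) / sqrt(len(masses))  # standard error of the mean
    margin = t_crit * std_err
    return m - margin, m + margin

# Masses in grams, copied from the data table.
old_box = [18.0, 19.2, 19.1, 19.6, 18.5, 20.0, 19.0, 19.1]
new_box = [17.4, 17.3, 18.0, 17.0, 17.3, 17.4, 17.5, 17.4]

old_ci = confidence_interval(old_box, T_CRIT_95_DF7)
new_ci = confidence_interval(new_box, T_CRIT_95_DF7)

print(f"old box: {old_ci[0]:.3f} g to {old_ci[1]:.3f} g")
print(f"new box: {new_ci[0]:.3f} g to {new_ci[1]:.3f} g")
print("intervals overlap:", new_ci[1] >= old_ci[0])
```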

Since the confidence intervals do not overlap, we can reject the null hypothesis at the 5% significance level. Non-overlap is a conservative criterion, so the difference between the boxes is statistically significant.
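Comparing confidence intervals is a shortcut; a two-sample t statistic tests the difference in means directly. The sketch below computes Welch's t statistic and its approximate degrees of freedom. Welch's variant is our choice rather than something the analysis above specifies; it is used because the two boxes have visibly different standard deviations. Turning the statistic into a p-value requires a t table or a statistics library, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df

# Masses in grams, copied from the data table.
old_box = [18.0, 19.2, 19.1, 19.6, 18.5, 20.0, 19.0, 19.1]
new_box = [17.4, 17.3, 18.0, 17.0, 17.3, 17.4, 17.5, 17.4]

t_stat, dof = welch_t(old_box, new_box)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.1f}")
```

For this data the statistic comes out near 6.9 with just under 10 degrees of freedom. The two-sided 5% critical value for 9 to 10 degrees of freedom is roughly 2.2 to 2.3, so this check agrees with the conclusion drawn from the confidence intervals.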

Conclusion:

We have previously demonstrated that between brands of kosher salt there can be a huge difference in the density of the salt, and that one should be leery of any recipe that calls for a specific volume of kosher salt without giving a mass.  Here we further demonstrated that even within a brand, it is possible for the difference between two boxes to be statistically significant.  Therefore it is always a good idea to measure salt by mass, even when a recipe calls for a specific brand.