# How To Calculate P-Value in 3 Steps (With an Example)

Most statistical tests begin by identifying a null hypothesis. The null hypothesis for the pattern analysis tools (Analyzing Patterns toolset and Mapping Clusters toolset) is Complete Spatial Randomness (CSR), either of the features themselves or of the values associated with those features. The z-scores and p-values returned by the pattern analysis tools tell you whether you can reject that null hypothesis or not. Often, you will run one of the pattern analysis tools, hoping that the z-score and p-value will indicate that you can reject the null hypothesis, because it would indicate that rather than a random pattern, your features (or the values associated with your features) exhibit statistically significant clustering or dispersion. Whenever you see spatial structure such as clustering in the landscape (or in your spatial data), you are seeing evidence of some underlying spatial processes at work, and as a geographer or GIS analyst, this is often what you are most interested in.

The p-value is a probability. For the pattern analysis tools, it is the probability of seeing a spatial pattern at least as extreme as the one observed if the underlying process were random. When the p-value is very small, such a pattern would be very unlikely under random processes, so you can reject the null hypothesis. You might ask: How small is small enough? Good question. See the table and discussion below.

Very high or very low (negative) z-scores, associated with very small p-values, are found in the tails of the normal distribution. When you run a feature pattern analysis tool and it yields small p-values and either a very high or a very low z-score, this indicates it is unlikely that the observed spatial pattern reflects the theoretical random pattern represented by your null hypothesis (CSR).

To reject the null hypothesis, you must make a subjective judgment regarding the degree of risk you are willing to accept for being wrong (for falsely rejecting the null hypothesis). Consequently, before you run the spatial statistic, you select a confidence level. Typical confidence levels are 90, 95, or 99 percent. A confidence level of 99 percent would be the most conservative in this case, indicating that you are unwilling to reject the null hypothesis unless the probability that the pattern was created by random chance is really small (less than a 1 percent probability).
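The link between a confidence level and its critical z-scores can be sketched in a few lines of Python. This is only an illustration built on the standard library's `statistics.NormalDist`; the helper name `critical_z` is made up for this example and is not part of any pattern analysis tool.

```python
from statistics import NormalDist


def critical_z(confidence_level: float) -> float:
    """Return the two-tailed critical z-score for a level such as 0.95."""
    alpha = 1.0 - confidence_level  # total probability left in both tails
    # Half of alpha sits in each tail, so look up the upper-tail cutoff.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    z = critical_z(level)
    print(f"{level:.0%} confidence: reject if z < -{z:.2f} or z > +{z:.2f}")
```

For a 95 percent confidence level this gives a cutoff of roughly 1.96 standard deviations on either side of the mean.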

Consider an example. The critical z-score values when using a 95 percent confidence level are -1.96 and +1.96 standard deviations. The uncorrected p-value associated with a 95 percent confidence level is 0.05. If your z-score is between -1.96 and +1.96, your uncorrected p-value will be larger than 0.05, and you cannot reject your null hypothesis because the pattern exhibited could very likely be the result of random spatial processes. If the z-score falls outside that range (for example, -2.5 or +5.4 standard deviations), the observed spatial pattern is probably too unusual to be the result of random chance, and the p-value will be small to reflect this. In this case, it is possible to reject the null hypothesis and proceed with figuring out what might be causing the statistically significant spatial structure in your data.
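The decision rule in this example can be written out as a short sketch. It assumes the z-score follows a standard normal distribution under the null hypothesis and uses no correction for multiple testing; the function names are illustrative only.

```python
from statistics import NormalDist


def two_tailed_p(z: float) -> float:
    """Uncorrected two-tailed p-value for a z-score under a standard normal."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def can_reject(z: float, alpha: float = 0.05) -> bool:
    """True when the p-value is below the chosen significance level."""
    return two_tailed_p(z) < alpha


# z-scores mentioned in the surrounding text
for z in (0.19, -1.2, -2.5, 5.4):
    verdict = "reject" if can_reject(z) else "cannot reject"
    print(f"z = {z:+.2f}: p = {two_tailed_p(z):.4g} -> {verdict} the null hypothesis")
```

Z-scores inside the range -1.96 to +1.96 produce p-values above 0.05, while -2.5 and +5.4 fall in the tails and produce p-values below it.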

A key idea here is that the values in the middle of the normal distribution (z-scores like 0.19 or -1.2, for example) represent the expected outcome. When the absolute value of the z-score is large and the probabilities are small (in the tails of the normal distribution), however, you are seeing something unusual and generally very interesting. For the Hot Spot Analysis tool, for example, unusual means either a statistically significant hot spot or a statistically significant cold spot.

The local spatial pattern analysis tools, including Hot Spot Analysis and Cluster and Outlier Analysis (Anselin Local Moran's I), provide an optional Boolean parameter, Apply False Discovery Rate (FDR) Correction. When this parameter is checked, the False Discovery Rate (FDR) procedure will potentially reduce the critical p-value thresholds shown in the table above in order to account for multiple testing and spatial dependency. The reduction, if any, is a function of the number of input features and the neighborhood structure employed.

Local spatial pattern analysis tools work by considering each feature within the context of neighboring features and determining if the local pattern (a target feature and its neighbors) is statistically different from the global pattern (all features in the dataset). The z-score and p-value results associated with each feature determine if the difference is statistically significant or not. This analytical approach creates issues with both multiple testing and dependency.

Multiple Testing—With a confidence level of 95 percent, probability theory tells us that there are 5 out of 100 chances that a spatial pattern could appear structured (clustered or dispersed, for example) and could be associated with a statistically significant p-value, when in fact the underlying spatial processes promoting the pattern are truly random. We would falsely reject the CSR null hypothesis in these cases because of the statistically significant p-values. Five chances out of 100 seems quite conservative until you consider that local spatial statistics perform a test for every feature in the dataset. If there are 10,000 features, for example, we might expect as many as 500 false results.
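The arithmetic behind this concern is simple enough to sketch. The snippet below assumes every test is independent and that the null hypothesis is true for every feature, which is a simplification (spatial data are rarely independent); the function names are hypothetical.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that look significant purely by chance."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests: int, alpha: float) -> float:
    """Chance of one or more false positives across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# 10,000 features tested at a 95 percent confidence level (alpha = 0.05)
print(expected_false_positives(10_000, 0.05))
print(prob_at_least_one_false_positive(10, 0.05))
```

Even with only 10 tests, the chance of at least one false positive is already far higher than the 5 percent risk accepted for a single test.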

Spatial Dependency—Features near to each other tend to be similar; more often than not spatial data exhibits this type of dependency. Nonetheless, many statistical tests require features to be independent. For local pattern analysis tools this is because spatial dependency can artificially inflate statistical significance. Spatial dependency is exacerbated with local pattern analysis tools because each feature is evaluated within the context of its neighbors, and features that are near each other will likely share many of the same neighbors. This overlap accentuates spatial dependency.

There are at least three approaches for dealing with both the multiple test and spatial dependency issues. The first approach is to ignore the problem on the basis that the individual test performed for each feature in the dataset should be considered in isolation. With this approach, however, it is very likely that some statistically significant results will be incorrect (appear to be statistically significant when in fact the underlying spatial processes are random).

The second approach is to apply a classical multiple testing procedure such as the Bonferroni or Sidak corrections. These methods are typically too conservative, however. While they will greatly reduce the number of false positives, they will also miss finding statistically significant results when they do exist.

A third approach is to apply the FDR correction, which estimates the number of false positives for a given confidence level and adjusts the critical p-value accordingly. For this method, statistically significant p-values are ranked from smallest (strongest) to largest (weakest), and based on the false positive estimate, the weakest are removed from this list. The remaining features with statistically significant p-values are identified by the Gi_Bin or COType fields in the output feature class. While not perfect, empirical tests show this method performs much better than assuming that each local test is performed in isolation, or applying the traditional, overly conservative, multiple test methods. The additional resources section provides more information about the FDR correction.
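To make the three approaches concrete, here is a minimal sketch that compares them on a small set of made-up p-values. The FDR step uses the Benjamini-Hochberg procedure, a common FDR method; the text does not say exactly which variant the tools implement, so treat that choice as an assumption.

```python
def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test cutoff that keeps the family-wise error rate at alpha."""
    return alpha / m


def sidak_threshold(alpha: float, m: int) -> float:
    """Slightly less strict cutoff that assumes independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking p-values that survive the FDR step."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # strongest first
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its rank-scaled cutoff.
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values, one per feature
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("uncorrected:", sum(v < 0.05 for v in p))
print("Bonferroni: ", sum(v < bonferroni_threshold(0.05, len(p)) for v in p))
print("Sidak:      ", sum(v < sidak_threshold(0.05, len(p)) for v in p))
print("FDR (BH):   ", sum(benjamini_hochberg(p)))
```

On this toy input the uncorrected approach flags the most features, the Bonferroni and Sidak cutoffs flag the fewest, and the FDR step lands in between.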

## Uses for p-value

Statisticians, data analysts and businesses all use p-values to judge how unusual an observed result is compared with what the null hypothesis predicts. This can be helpful for determining whether a data point is an effective metric for increasing production and profits for businesses, whether data is significant for data analysts and whether a data point is reasonable for other statistical measures. There are two types of p-value you can use, one-tailed and two-tailed, depending on whether the alternative hypothesis specifies a direction.

## What is a p-value?

A p-value is a statistical metric that represents the probability, assuming the null hypothesis is correct, of obtaining a result at least as extreme as the one observed in a statistical hypothesis test. Hypothesis testing in statistics is a way to determine the significance of a particular data point or set. Below are definitions for different terms you can use to understand what p-value is:

The p-value is calculated on the assumption that the null hypothesis is correct, so if the value is small, you can reject the null hypothesis in favor of the alternative hypothesis. A large p-value means that the data point or set you measured is consistent with the null hypothesis, so you do not have enough evidence to reject it. Reporting the p-value in published research allows readers to interpret the data themselves.

## How to calculate p-value

Below are steps you can use to help calculate the p-value for a data sample:

### 1. State the null and alternative hypotheses

The first step to calculating the p-value of a sample is to look at your data and state a null hypothesis and an alternative hypothesis. For example, you could state that a hypothesized mean “μ” is equal to 10, in which case the alternative hypothesis is that the mean “μ” is not equal to 10. You can write these hypotheses as:

H0: μ = 10

H1: μ ≠ 10

In these hypotheses, H0 is the null hypothesis, H1 is the alternative hypothesis and “μ” is the mean you are testing.

### 2. Use a t-test and its formula

Once you have determined what both of your hypotheses are, you can calculate the value of your test statistic “t” based on your data set. The formula to calculate this statistic is:

t = (x̄ – μ) / (s / √n)

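In this formula, “x̄” is the sample mean, “μ” is the hypothesized mean, “s” is the sample standard deviation and “n” is the sample size.

Standard deviation in mathematics is a measure of the variation in a set of data. It can also help you understand how close a data point is to the mean in comparison to other data points.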
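The formula above translates directly into code. The sketch below uses only Python's standard library; the function names and the sample values are invented for illustration.

```python
import math
import statistics


def t_statistic(sample_mean, hypothesized_mean, sample_std, n):
    """One-sample t statistic: (x̄ - μ) / (s / √n)."""
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


def t_statistic_from_data(data, hypothesized_mean):
    """Compute x̄, s and n from raw observations, then apply the formula."""
    return t_statistic(
        statistics.mean(data),
        hypothesized_mean,
        statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
        len(data),
    )


# Hypothetical sample of 10 observations, tested against μ = 10
sample = [10.4, 11.2, 9.8, 12.1, 10.9, 11.5, 9.6, 10.7, 11.8, 10.0]
print(round(t_statistic_from_data(sample, 10), 3))
```

A positive result means the sample mean sits above the hypothesized mean, and a negative result means it sits below.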

### 3. Use a t-distribution table to find the associated p-value

Once you've calculated the value of the test statistic “t,” you can find the associated p-value by referring to a t-distribution table, which you can find on the internet. Three significance levels are commonly used with p-values: 0.01, 0.05 and 0.1. The smaller the level you choose, the stronger the evidence you require before rejecting the null hypothesis. To use the t-distribution table, first find the degrees of freedom by taking your sample size “n” and subtracting 1 from it. For example:

n = 10

10 − 1 = 9

Then, in the table row for your degrees of freedom, find where your “t” value falls. The column heading gives the corresponding probability. If the table lists one-tailed probabilities and you are running a one-tailed test, this number is the p-value of your data. If you are running a two-tailed test, which is more common, then you can multiply this number by two to get your p-value.
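A table lookup only brackets the p-value between two column headings, and that can be sketched as follows. The two table rows hold standard one-tailed critical values for 9 and 30 degrees of freedom; the function name and the t-value of 2.108 are hypothetical, and the one-tailed case assumes the statistic lies in the direction of the alternative hypothesis.

```python
# One-tailed (upper-tail) probabilities for the table columns.
ONE_TAILED_LEVELS = [0.10, 0.05, 0.025, 0.01, 0.005, 0.001]

# Critical t-values keyed by degrees of freedom (two rows only, for brevity).
T_TABLE = {
    9: [1.383, 1.833, 2.262, 2.821, 3.250, 4.297],
    30: [1.310, 1.697, 2.042, 2.457, 2.750, 3.385],
}


def bracket_p_value(t, n, two_tailed=True):
    """Return (lower, upper) bounds on the p-value from the table row."""
    df = n - 1  # degrees of freedom
    abs_t = abs(t)
    lower, upper = 0.0, 0.5  # one-tailed bounds before looking at the table
    for level, critical in zip(ONE_TAILED_LEVELS, T_TABLE[df]):
        if abs_t >= critical:
            upper = level  # |t| is beyond this column, so p is at most this
        else:
            lower = level  # |t| falls short of this column, so p is above this
            break
    factor = 2 if two_tailed else 1
    return lower * factor, upper * factor


# n = 10 gives 9 degrees of freedom; t = 2.108 is a made-up test statistic
print(bracket_p_value(2.108, 10))
```

Here the statistic falls between the 0.05 and 0.025 one-tailed columns, so the two-tailed p-value lies somewhere between 0.05 and 0.10.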

## Example of calculating p-value

Below is an example of calculating p-value based on a known set of data:

Owen wants to know if the mean amount of rainfall for the month of August is nine inches. He finds data for the month of August last year and determines that the sample mean is eight inches, with a standard deviation of two inches. He decides to conduct a two-tailed t-test at the 0.01 significance level to determine if nine is the true mean of the data. He forms the following hypotheses:

H0: μ = 9

H1: μ ≠ 9

After he creates his hypotheses, he calculates the test statistic. He uses a sample size of 31 since there are 31 days in August:

t = (8 − 9) / (2 / √31) = −2.78388

The absolute value, or “|t|,” is 2.78388. Using this t-value and his significance level of 0.01, he turns to a t-distribution table. He subtracts 1 from his sample size to get the degrees of freedom, like this:

31 − 1 = 30

Then he reviews his “t” value of 2.78388, which falls between the levels 0.005 and 0.001 in the row for 30 degrees of freedom on a t-distribution table. He averages 0.005 and 0.001 to get a rough estimate of 0.003. With a two-tailed test, he can multiply this value by 2 to get 0.006, which is his estimated p-value for this test. Averaging two table columns is only an approximation, and the exact two-tailed p-value is closer to 0.009, but the conclusion is the same either way. Since the p-value is less than the 0.01 level of significance, he rejects the null hypothesis he made and accepts his alternative hypothesis that the mean amount of rainfall for the month of August is not nine inches.
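Owen's table lookup can be cross-checked numerically. The sketch below integrates the t-distribution density with Simpson's rule, using only the standard library; it is one possible way to get a more precise p-value, not the method used in the example, and the function names are illustrative.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1.0 - 2.0 * area_zero_to_t  # the density is symmetric around 0


# Owen's numbers: sample mean 8, hypothesized mean 9, s = 2, n = 31
n = 31
t_value = (8 - 9) / (2 / math.sqrt(n))
p_value = two_tailed_p(t_value, n - 1)
print(f"t = {t_value:.5f}, p = {p_value:.4f}, reject H0 at 0.01: {p_value < 0.01}")
```

The computed p-value comes out a little under 0.01, so it agrees with Owen's decision to reject the null hypothesis even though it differs from his averaged estimate.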