# Stop and Frisk: Hypothesis Testing

## Introduction

From the New York Civil Liberties Union:

Every time a police officer stops a person in NYC, the officer is supposed to fill out a form recording the details of the stop. The forms were filled out by hand and manually entered into an NYPD database until 2017, when the forms became electronic. The annual database includes nearly all of the data recorded by the police officer after a stop such as the age of the person stopped, if a person was frisked, if there was a weapon or firearm recovered, if physical force was used, and the exact location of the stop within the precinct.

In this post, I will perform an exploratory data analysis of the stop-and-frisk dataset provided by the NYPD on the New York Civil Liberties Union website. The data contains 11,008 rows and 83 variables. Then, I will use hypothesis testing with the `infer` package to assess whether there is a statistically significant difference in police actions when stopping a black New Yorker versus a non-black New Yorker.

Code is available here

## Exploratory Data Analysis

• This seems like a clear example of officers rounding stop durations to the nearest 5 minutes, since it's unlikely that the actual stop durations are always multiples of 5.

## Understanding Police Stops with Maps

• In this map, I'm using a couple of tricks I learned from David Robinson's screencasts. First, to better show the higher values of `stop_duration_minutes`, I arrange the data in increasing order of `stop_duration_minutes`. This makes ggplot plot the lower values (lighter colors) first and the higher values (redder colors) last, so regions with long stop durations stand out in red. Second, the histogram shows that stop duration is skewed to the right. As David puts it, “you can go a little bit below the median, but way above”. So I transform the color scale using the `trans` argument and set the midpoint to `log10(median(value))`, which makes the color scale more informative.

• We can see that suspects are predominantly white or black, with apparently more black suspects than white suspects.
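The stop-duration check and the map data preparation described above can be sketched in base R. This is a minimal illustration on a small made-up vector (hypothetical values, not the NYPD data): it measures the share of durations that are exact multiples of 5, sorts in increasing order so long stops would be drawn last, and computes the `log10(median(value))` midpoint used for the color scale.

```r
# Hypothetical, right-skewed stop durations in minutes (not the real dataset)
stop_duration_minutes <- c(3, 5, 5, 10, 10, 10, 15, 20, 30, 45, 60, 120)

# Heaping check: what fraction of durations are exact multiples of 5?
share_multiple_of_5 <- mean(stop_duration_minutes %% 5 == 0)

# Increasing order, so that when plotted the long stops are drawn last (on top)
stops_sorted <- sort(stop_duration_minutes)

# Midpoint for a log10-transformed color scale, expressed on the log10 scale
color_midpoint <- log10(median(stop_duration_minutes))

share_multiple_of_5
color_midpoint
```

Whether `midpoint` should be supplied on the raw or the transformed scale is worth confirming in the ggplot2 documentation for the version you have installed.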

## Racial discrimination in frisking/searching suspects

I will perform hypothesis tests to determine whether police take an action, specified by each of these columns, on black suspects more often than on non-black suspects in a statistically significant manner. Specifically, I will use the `infer` package, following the steps laid out in Modern Dive Chapter 9.

• This faceted plot shows that the police frisk, handcuff, use restraint, and give verbal instruction to black suspects more often than to non-black suspects.

### Using `infer` for hypothesis testing

• Let’s first create a processed version of `sqf` by removing rows where `suspect_race_description` is `(null)` and lumping `suspect_race_description` into two groups: `BLACK` (which includes `BLACK HISPANIC`) and `NON-BLACK`.
```r
library(tidyverse)  # dplyr, purrr, ggplot2, ...
library(infer)
sqf_testing <- sqf %>%
  filter(suspect_race_description != "(null)") %>%
  # Count BLACK HISPANIC as BLACK
  mutate(suspect_race_description = if_else(suspect_race_description == "BLACK HISPANIC",
                                            "BLACK", suspect_race_description)) %>%
  # Lump every other category into NON-BLACK
  mutate(suspect_race_description = if_else(suspect_race_description != "BLACK",
                                            "NON-BLACK", suspect_race_description))
```
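As a quick sanity check of the recoding logic, here is a base R sketch on a hypothetical vector of race labels (illustrative values only). It applies the same rules: drop `(null)`, count `BLACK HISPANIC` as `BLACK`, and lump everything else into `NON-BLACK`.

```r
# Hypothetical race labels, including the "(null)" placeholder
race <- c("BLACK", "BLACK HISPANIC", "WHITE", "WHITE HISPANIC",
          "(null)", "ASIAN / PACIFIC ISLANDER", "BLACK")

# Drop "(null)", then collapse into two groups (BLACK HISPANIC counts as BLACK)
race <- race[race != "(null)"]
race_group <- ifelse(race %in% c("BLACK", "BLACK HISPANIC"), "BLACK", "NON-BLACK")

table(race_group)
```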

• Then, we use `specify` to define the response and explanatory variables. Let’s use `frisked_flag` as the response variable and `suspect_race_description` as the explanatory variable. The argument `success = "Y"` indicates that we are interested in the proportion of “Y” in the `frisked_flag` column.
```r
sqf_testing %>%
  specify(formula = frisked_flag ~ suspect_race_description, success = "Y")
```
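To make `success = "Y"` concrete, the quantity being compared is the proportion of “Y” within each race group. A minimal base R sketch on hypothetical toy data (eight made-up stops, not the NYPD data):

```r
# Hypothetical toy data: one element per stop
frisked_flag <- c("Y", "Y", "N", "Y", "N", "N", "Y", "N")
race_group   <- c("BLACK", "BLACK", "BLACK", "BLACK",
                  "NON-BLACK", "NON-BLACK", "NON-BLACK", "NON-BLACK")

# Proportion of "successes" (frisked_flag == "Y") within each race group
prop_frisked <- tapply(frisked_flag == "Y", race_group, mean)
prop_frisked
```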

• Next, we use `hypothesize` to declare the null hypothesis. We set `null = "independence"` because we are comparing two samples (`BLACK` and `NON-BLACK`): under the null, whether a suspect is frisked is independent of race.
```r
sqf_testing %>%
  specify(formula = frisked_flag ~ suspect_race_description, success = "Y") %>%
  hypothesize(null = "independence")
```
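Written in terms of the population proportions of frisked suspects, the independence null and the one-sided alternative that matches the later `direction = "right"` and `order = c("BLACK", "NON-BLACK")` are:

$$
H_0: p_{\text{BLACK}} - p_{\text{NON-BLACK}} = 0
\qquad \text{vs.} \qquad
H_A: p_{\text{BLACK}} - p_{\text{NON-BLACK}} > 0
$$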

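• Here, we `generate` 1000 replicates of “shuffled” datasets, assuming the null hypothesis is true. With `type = "permute"`, the data are shuffled so that any association between race and frisking is broken.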
```r
sqf_testing %>%
  specify(formula = frisked_flag ~ suspect_race_description, success = "Y") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute")
```
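What a single “shuffle” does can be sketched in base R on hypothetical toy data. This sketch permutes the response column (shuffling the race labels instead would break the association in the same way); it is an illustration of the idea, not `infer`'s internal code.

```r
set.seed(2023)  # arbitrary seed, for reproducibility only

# Hypothetical toy data: frisk outcome and race group for 8 stops
frisked_flag <- c("Y", "Y", "N", "Y", "N", "N", "Y", "N")
race_group   <- rep(c("BLACK", "NON-BLACK"), each = 4)

# One "shuffled" dataset: randomly reassign the frisk outcomes across stops
shuffled_frisked <- sample(frisked_flag)

# The number of "Y" values and the group sizes are unchanged by the shuffle...
table(shuffled_frisked)
table(race_group)
# ...but the proportion of "Y" within each group now varies from shuffle to shuffle
tapply(shuffled_frisked == "Y", race_group, mean)
```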

• Now, we `calculate` the appropriate summary statistic, called the test statistic, for each of our 1000 shuffles. According to Modern Dive, “…since the unknown population parameter of interest is the difference in population proportions, the test statistic of interest here is the difference in sample proportions.” This gives 1000 values of `stat`, and we assign the resulting data frame to `null_distribution`. The argument `order = c("BLACK", "NON-BLACK")` means each statistic is the `BLACK` proportion minus the `NON-BLACK` proportion.
```r
null_distribution <- sqf_testing %>%
  specify(formula = frisked_flag ~ suspect_race_description, success = "Y") %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "diff in props", order = c("BLACK", "NON-BLACK"))
```
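The `generate` + `calculate` steps together build a permutation null distribution. A base R sketch of the same idea on hypothetical toy data (the helper `diff_in_props` is illustrative, not part of `infer`): shuffle the response 1000 times and record the `BLACK` minus `NON-BLACK` difference in proportions each time.

```r
set.seed(2023)  # arbitrary seed, for reproducibility only

# Hypothetical toy data (not the NYPD data)
frisked_flag <- c("Y", "Y", "N", "Y", "Y", "N", "Y", "N", "N", "N")
race_group   <- rep(c("BLACK", "NON-BLACK"), each = 5)

# Test statistic: proportion of "Y" among BLACK minus proportion among NON-BLACK
diff_in_props <- function(response, group) {
  p <- tapply(response == "Y", group, mean)
  as.numeric(p["BLACK"] - p["NON-BLACK"])
}

# Null distribution: 1000 shuffles of the response, one statistic per shuffle
null_stats <- replicate(1000, diff_in_props(sample(frisked_flag), race_group))

summary(null_stats)
```

Because the shuffles destroy any link between race and frisking, these values should be centered near 0.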

• Next, we calculate the observed difference in proportions between black and non-black suspects, using the same code as above but without `hypothesize` and `generate`.
```r
obs_diff_prop <- sqf_testing %>%
  specify(formula = frisked_flag ~ suspect_race_description, success = "Y") %>%
  calculate(stat = "diff in props", order = c("BLACK", "NON-BLACK"))
```

• Now, we visualize `null_distribution` (the differences in proportions we would expect if there were no racial discrimination) with a histogram, and then “add what happened in real-life” with a red line and shading. The shaded region corresponds to the p-value:

A p-value is the probability of obtaining a test statistic as extreme as or more extreme than the observed test statistic, assuming the null hypothesis is true.

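```r
visualize(null_distribution, bins = 10) +
  shade_p_value(obs_stat = obs_diff_prop, direction = "right")
```

• In the graph above, the p-value is clearly extremely low. We calculate its value (“fraction that the null distribution is shaded”) with `get_p_value`. Since this p-value is estimated from 1000 shuffles, a reported value of 0 means that none of the shuffled statistics reached the observed one, i.e. the p-value is below roughly 1/1000, not exactly zero.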
```r
null_distribution %>%
  get_p_value(obs_stat = obs_diff_prop, direction = "right")
```
```
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1       0
```
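Putting the steps together, here is a self-contained base R sketch of a right-tailed permutation test for a difference in proportions. The helper `perm_test_diff_props` and the toy data are hypothetical; the p-value is taken as the fraction of shuffled statistics at least as large as the observed one, which is intended to mirror what `get_p_value(direction = "right")` reports rather than reproduce `infer`'s code.

```r
# Right-tailed permutation test for p(BLACK) - p(NON-BLACK)
# `perm_test_diff_props` is a hypothetical helper, not part of infer
perm_test_diff_props <- function(response, group, success = "Y", reps = 1000) {
  stat <- function(y) {
    p <- tapply(y == success, group, mean)
    as.numeric(p["BLACK"] - p["NON-BLACK"])
  }
  obs_stat   <- stat(response)
  null_stats <- replicate(reps, stat(sample(response)))
  # Fraction of shuffled statistics as large as or larger than the observed one
  p_value <- mean(null_stats >= obs_stat)
  list(obs_stat = obs_stat, p_value = p_value)
}

set.seed(2023)  # arbitrary seed, for reproducibility only

# Hypothetical toy data: 18/20 BLACK and 4/20 NON-BLACK stops frisked
frisked_flag <- c(rep("Y", 18), rep("N", 2), rep("Y", 4), rep("N", 16))
race_group   <- rep(c("BLACK", "NON-BLACK"), each = 20)

result <- perm_test_diff_props(frisked_flag, race_group)
result$obs_stat
result$p_value
```

With a difference this large, typically none of the 1000 shuffles reaches the observed statistic, which is how a p-value of 0 arises. Some analysts report `(count + 1) / (reps + 1)` instead, so the estimate is never exactly zero.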

• Below, we create a wrapper function that does all the steps we just covered so that we can iterate this hypothesis testing on all our desired columns.
```r
# Wrapper function: run the permutation test for one response column
run_hypothesis_test <- function(response_var) {

  # Build the formula, e.g. frisked_flag ~ suspect_race_description
  f <- as.formula(
    paste(response_var,
          "suspect_race_description",
          sep = " ~ "))

  null_distribution <- sqf_testing %>%
    specify(formula = f, success = "Y") %>%
    # "independence" for hypotheses involving two samples
    hypothesize(null = "independence") %>%
    generate(reps = 1000, type = "permute") %>%
    calculate(stat = "diff in props", order = c("BLACK", "NON-BLACK"))

  obs_diff_prop <- sqf_testing %>%
    specify(formula = f, success = "Y") %>%
    calculate(stat = "diff in props", order = c("BLACK", "NON-BLACK"))

  # Calculate the p-value (returns a one-row tibble with column p_value)
  null_distribution %>%
    get_p_value(obs_stat = obs_diff_prop, direction = "right") # H_A: P(BLACK) > P(NON-BLACK)
}

# Find desired variables
testing_names <- sqf %>%
  select(contains("physical"), contains("frisked"), contains("searched")) %>%
  names()

# set_names() names the vector by its own values, so .id records the column name
# (without it, .id would only contain the row index)
results <- testing_names %>%
  set_names() %>%
  map_df(run_hypothesis_test, .id = "response_variable")
```

All response variables have p-values greater than 0.05 except for `physical_force_restraint_used_flag` and `frisked_flag`. For these two variables, we reject the null hypothesis and conclude that there is evidence that police frisk and use physical force (restraint) on black suspects more often than on non-black suspects.

##### Howard Baek
###### Biostatistics Master’s student

My email is howardba@uw.edu