# Modeling Strategy

## Training Data

Our training set consisted of phone bank data collected by People’s Action between June 24 and October 18, 2020. The data came from ~14,000 conversations held during this period across 8 U.S. states. Because callers asked potential voters whom they planned to vote for before any further conversation, we used this initial rating as our ground truth. The ratings are binned into 5 categories: favors Trump, leaning Trump, undecided, leaning Biden, and favors Biden. Voters who are undecided or leaning are the conflicted voters that we would like to target for deep canvassing.

## Modeling Strategies

We used several modeling strategies to provide inputs into DARTS. Diversity in the ensemble lets us cast a wider net for voters who may be candidates for empathy-driven conversations. We devised four strategies for identifying our target voters; each takes a different view of the classification problem and uses a different feature set. Because of timing constraints, however, we were only able to use the first three strategies in production.

### Strategy 1

The first strategy used a binary classifier to predict the probability of one of two outcomes: whether someone would vote for Trump or for Biden. This strategy uses features on which each party's supporters tend to hold strong views while undecided voters may hold moderate ones, such as scores on immigration or healthcare. Using the model’s output probabilities, we targeted voters in the middle of the probability range, where the model is least certain. Within this range, we hypothesized that we would find a higher proportion of undecided voters (in gray) than of decided voters.
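The middle-of-the-range selection can be sketched in a few lines. This is only an illustration: the function name `select_uncertain`, the 0.4 to 0.6 band, and the example probabilities are assumptions made for the sketch, not values from the project.

```python
def select_uncertain(probabilities, low=0.4, high=0.6):
    """Return voter IDs whose predicted P(Biden) lies inside [low, high].

    The band limits are hypothetical; in practice they would be tuned
    against held-out phone bank ratings.
    """
    return [voter_id for voter_id, p in probabilities.items() if low <= p <= high]


# Hypothetical classifier outputs: voter ID -> predicted probability of supporting Biden.
p_biden = {"v1": 0.05, "v2": 0.45, "v3": 0.58, "v4": 0.93}

targets = select_uncertain(p_biden)
print(targets)
```

Widening the band reaches more voters at the cost of including more who have already decided, so the band width acts as a trade-off between coverage and precision.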

### Strategy 2

In the second strategy, we trained two separate binary classification models: one that predicted Biden supporters and another that predicted Trump supporters. From these two models, we calculated the difference in probabilities for each voter. We targeted voters with a small difference between the probabilities of supporting either candidate, as we believed this group would contain a higher proportion of undecided voters.
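A minimal sketch of the probability-difference selection follows. The name `select_low_gap`, the `max_gap` cutoff of 0.2, and the scores are hypothetical and chosen only to make the idea concrete.

```python
def select_low_gap(scores, max_gap=0.2):
    """Return voter IDs where |P(Biden) - P(Trump)| is at most max_gap.

    `scores` maps voter ID -> (p_biden, p_trump). The two probabilities come
    from separate models, so they are not required to sum to 1.
    The cutoff is an assumed value, not one taken from the project.
    """
    return [
        voter_id
        for voter_id, (p_biden, p_trump) in scores.items()
        if abs(p_biden - p_trump) <= max_gap
    ]


# Hypothetical outputs of the two binary models for four voters.
scores = {
    "v1": (0.90, 0.08),
    "v2": (0.50, 0.45),
    "v3": (0.30, 0.40),
    "v4": (0.15, 0.85),
}

targets = select_low_gap(scores)
print(targets)
```

Note that a small gap can arise when both probabilities are low as well as when both are moderate, which is one way this view differs from a single binary classifier.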

### Strategy 3

We also trained a model that classifies voters into three categories: Biden supporters, Trump supporters, and undecided. Because undecided voters make up a small proportion of the population, we downsampled the two decided classes to match the number of undecided voters.
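The downsampling step might look like the sketch below, which uses random sampling without replacement. The record layout, the label strings, and the fixed seed are assumptions for illustration; the source does not describe how the sampling was actually done.

```python
import random
from collections import Counter, defaultdict


def downsample_to_class(records, target_label="undecided", seed=0):
    """Randomly downsample every larger class to the size of `target_label`."""
    by_label = defaultdict(list)
    for record in records:
        by_label[record["label"]].append(record)

    target_size = len(by_label[target_label])
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    balanced = []
    for group in by_label.values():
        if len(group) > target_size:
            group = rng.sample(group, target_size)  # sample without replacement
        balanced.extend(group)
    rng.shuffle(balanced)  # avoid leaving the classes in contiguous blocks
    return balanced


# Hypothetical labeled records: 5 Biden, 4 Trump, 2 undecided.
labels = ["biden"] * 5 + ["trump"] * 4 + ["undecided"] * 2
records = [{"voter_id": i, "label": label} for i, label in enumerate(labels)]

balanced = downsample_to_class(records)
counts = Counter(record["label"] for record in balanced)
print(counts)
```

Downsampling discards labeled conversations from the decided classes, so the balanced set is limited to three times the number of undecided voters.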

### Strategy 4

We also tried training a binary classification model to distinguish undecided and leaning voters from strongly decided voters. Since leaning voters could still be candidates for deep canvassing, we included them in the positive class along with undecided voters. This strategy uses features that describe voters who may not be as politically involved, such as turnout in previous elections. For this model, we would target voters who have a high probability of being conflicted.
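As a rough sketch, the positive class can be derived from the five rating bins, and targets picked by thresholding the predicted probability. The names `conflicted_label` and `select_conflicted`, the 0.7 threshold, and the example probabilities are assumptions, not details from the project.

```python
# Ratings treated as "conflicted" (positive class): undecided plus both leaning bins.
CONFLICTED = {"leaning Trump", "undecided", "leaning Biden"}


def conflicted_label(rating):
    """Map a five-bin phone bank rating to a binary target (1 = conflicted)."""
    return 1 if rating in CONFLICTED else 0


def select_conflicted(probabilities, threshold=0.7):
    """Return voter IDs with P(conflicted) >= threshold, highest first.

    The threshold is a hypothetical value for illustration.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [voter_id for voter_id, p in ranked if p >= threshold]


ratings = ["favors Trump", "leaning Trump", "undecided", "leaning Biden", "favors Biden"]
labels = [conflicted_label(rating) for rating in ratings]

# Hypothetical model outputs: voter ID -> predicted probability of being conflicted.
p_conflicted = {"v1": 0.20, "v2": 0.90, "v3": 0.75, "v4": 0.50}
targets = select_conflicted(p_conflicted)
print(labels, targets)
```

Unlike the first two strategies, which look for uncertainty in a Trump-versus-Biden prediction, this formulation predicts conflictedness directly, so targets come from the high end of the probability range.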