Inferring Voting Demographics of the 2018 Texas Senate Elections

The 2018 Texas Senate race between Cruz and O'Rourke was, to me, interesting because of the sharp contrast between the two candidates and what they represented. In a state as populous as Texas, such an election also likely serves as a barometer for the general alignment of beliefs within the United States of America. According to election exit polls, the majority of white women voted for Cruz. Given the symbolic meaning of the votes and what they represent[1] (e.g. stance on human rights, equal treatment, etc.), many progressive groups took white women to task (in particular, for the seeming contradiction of aligning with a platform with a high concentration of misogyny).

Personally, I found such statements to be painting with too broad a brush, and I became interested in what happens when age information is also incorporated. The problem with polls, however, is that certain combinations of interest are often left out. The CNN polls give gender + race data and age data separately, but not the combination of all three. Is it possible, then, to guess at a likely breakdown given the provided information?


[1] While I am strongly against what the Republican platform represents, I do not think automatically attacking those who support that platform will do any good. The split is nearly half among the majority, and shaming really only works when the gradient is already geared towards a consensus. I know that many I agree with on most things will not agree that a less antagonistic approach is the only choice when the full reality of things is considered. But given the size of the current divide, and regardless of the ideal, vilifying will only succeed, as it has so far, in fortifying differences and widening margins. I am more despondent, confused and a bit frightened than I am angry. Another important aspect is that, given the high geographic clustering of beliefs and the rarity of exceptions, one would risk tackling the issues at the wrong level while replicating similarly problematic attitudes on a different axis. Unfortunately, I have not spelled out my beliefs in detail anywhere, but I have shared aspects of them on Twitter.

Age Distribution

The below is from the CNN exit polls, on age:

O'Rourke Voters Ages

Given that someone voted for O'Rourke, what age group do they fall under? Because age groups vote at different rates, the distribution of his voters across age ranges is nearly even, with the largest share in the 45-64 age range.
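The inversion described here is Bayes' rule. As a sketch, assuming the poll reports each age group's share of the electorate, P(a), and the vote split within each group, P(O'Rourke | a) (the raw figures are not reproduced in this post, so the symbols are placeholders):

```latex
P(a \mid \text{O'Rourke})
  = \frac{P(\text{O'Rourke} \mid a)\, P(a)}
         {\sum_{a'} P(\text{O'Rourke} \mid a')\, P(a')}
```

A group that favours O'Rourke strongly but makes up a small part of the electorate can therefore contribute fewer of his voters than a larger group that favours him less.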

Race and Gender

The conditional percentages are given on the CNN site, but what is the full breakdown of who voted for whom? Although most white women did not vote for O'Rourke, white women were still the largest single group among the people who voted for O'Rourke.
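To see how a group can mostly vote against a candidate and still be that candidate's largest block of voters, here is a minimal sketch in plain F#. The group names, shares and rates are made up for illustration; they are not the CNN figures.

```fsharp
// Hypothetical inputs for illustration only; these are NOT the CNN exit poll figures.
// Each entry: group label, share of all voters, P(vote = O'Rourke | group).
let groups =
    [ ("group A", 0.40, 0.40)
      ("group B", 0.30, 0.50)
      ("group C", 0.30, 0.50) ]

// Joint probability P(group, O'Rourke) = P(group) * P(O'Rourke | group).
let jointORourke =
    groups |> List.map (fun (g, share, pO) -> g, share * pO)

// Overall O'Rourke share, P(O'Rourke), obtained by summing over the groups.
let totalORourke = jointORourke |> List.sumBy snd

// Composition of the O'Rourke electorate: P(group | O'Rourke).
let shareOfORourkeVoters =
    jointORourke |> List.map (fun (g, p) -> g, p / totalORourke)

for (g, p) in shareOfORourkeVoters do
    printfn "%s makes up %.3f of O'Rourke voters" g p
```

With these made-up numbers, only 40% of group A votes for O'Rourke, yet group A is still the largest block among his voters (roughly 35%) simply because it is the largest group overall.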

Incorporating Age Information

What remains now is to combine all these attributes. As the voter poll data on race x gender x age is not given, how might we infer a likely breakdown? One way would be to place a prior on all valid combinations and then use Bayesian probability to constrain the possibilities to match what we know. There are 56 total combinations (7 race and gender groups x 4 age ranges x 2 candidates), which means a 56-dimensional vector for our Dirichlet prior. The problem now is that, with such a large search space, things are out of reach of our discrete probability tools. However, we have some tricks. First, we can try an alignment hack: draw a vote from the race and gender data and another from the age data, and constrain the two to agree. This yields sensible results compared to not constraining. It pushes down the (18-29, Cruz) and (65+, O'Rourke) votes and redistributes the percentages in an intuitively realistic direction. Focusing on just white women, it's clear that below 30, the majority of voters shift away from Cruz.

In [9]:
Model(cont {
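          // Draw a race/gender group, then a vote conditioned on that group.
          let! raceXgender = raceDistr()
          let! vote = voteGivenRace raceXgender
          // Independently draw an age group, then a vote conditioned on that age.
          let! age = ageDistr()
          let! votea = voteGivenAge age
          // Keep only the outcomes in which the two votes agree.
          do! constrain(vote = votea)
          return (raceXgender,age,vote)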
      }).Reify()
|> List.sortByDescending fst
|> normalize
|> List.map(keepRight(round 2))
|> List.filter(fst >> (<) 0.)
|> List.map toProbVal
|> Util.Table
Out[9]:
Item                             Probability
(white men, 45-64, Cruz)         0.08
(white women, 45-64, Cruz)       0.07
(white men, 65+, Cruz)           0.06
(white women, 65+, Cruz)         0.06
(white men, 30-44, Cruz)         0.04
(white women, 30-44, Cruz)       0.04
(white women, 45-64, O'Rourke)   0.03
(white women, 30-44, O'Rourke)   0.03
(white women, 18-29, O'Rourke)   0.03
(white women, 65+, O'Rourke)     0.03
(latino women, 45-64, O'Rourke)  0.02
(white men, 45-64, O'Rourke)     0.02
(latino men, 45-64, O'Rourke)    0.02
(latino women, 30-44, O'Rourke)  0.02
(latino women, 18-29, O'Rourke)  0.02
(latino women, 65+, O'Rourke)    0.02
(black women, 45-64, O'Rourke)   0.02
(white men, 18-29, Cruz)         0.02
(white men, 30-44, O'Rourke)     0.02
(latino men, 45-64, Cruz)        0.02
(latino men, 30-44, O'Rourke)    0.02
(latino women, 45-64, Cruz)      0.02
(white men, 18-29, O'Rourke)     0.02
(white men, 65+, O'Rourke)       0.02
(latino men, 18-29, O'Rourke)    0.02
(latino men, 65+, O'Rourke)      0.02
(white women, 18-29, Cruz)       0.02
(black women, 30-44, O'Rourke)   0.02
(black women, 18-29, O'Rourke)   0.02
(black women, 65+, O'Rourke)     0.02
(latino men, 65+, Cruz)          0.01
(black men, 45-64, O'Rourke)     0.01
(latino women, 65+, Cruz)        0.01
(others, 45-64, Cruz)            0.01
(black men, 30-44, O'Rourke)     0.01
(black men, 18-29, O'Rourke)     0.01
(black men, 65+, O'Rourke)       0.01
(others, 65+, Cruz)              0.01
(latino men, 30-44, Cruz)        0.01
(latino women, 30-44, Cruz)      0.01
(others, 45-64, O'Rourke)        0.01
(others, 30-44, Cruz)            0.01
(others, 30-44, O'Rourke)        0.01
(others, 18-29, O'Rourke)        0.01
(others, 65+, O'Rourke)          0.01
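For readers without the probabilistic programming library at hand, the agreement trick can be sketched by brute-force enumeration in plain F#. This is only a sketch under assumptions: the two groups, two age ranges and all the probabilities below are invented for illustration and are not the poll data, and the helper names (`voteProb`, `unnormalized`, `joint`) are mine.

```fsharp
// Hypothetical marginals, for illustration only (not the CNN data).
let raceGender = [ ("group A", 0.6); ("group B", 0.4) ]       // P(group)
let ages = [ ("under 45", 0.4); ("45 and over", 0.6) ]        // P(age)

// Hypothetical P(vote = O'Rourke | group) and P(vote = O'Rourke | age).
let pORourkeGivenGroup = Map [ ("group A", 0.35); ("group B", 0.70) ]
let pORourkeGivenAge = Map [ ("under 45", 0.60); ("45 and over", 0.42) ]

// Probability of a given vote when O'Rourke's probability is pORourke.
let voteProb pORourke vote =
    if vote = "O'Rourke" then pORourke else 1.0 - pORourke

// Both sub-models draw a vote. Keeping only the draws where they agree means
// each (group, age, vote) cell is weighted by the product of the two paths.
let unnormalized =
    [ for (g, pg) in raceGender do
        for (a, pa) in ages do
          for vote in [ "O'Rourke"; "Cruz" ] do
            let wGroup = pg * voteProb (Map.find g pORourkeGivenGroup) vote
            let wAge = pa * voteProb (Map.find a pORourkeGivenAge) vote
            yield (g, a, vote), wGroup * wAge ]

// Renormalize, since the disagreeing draws were thrown away.
let total = unnormalized |> List.sumBy snd
let joint = unnormalized |> List.map (fun (k, w) -> k, w / total)

joint
|> List.sortByDescending snd
|> List.iter (fun ((g, a, v), p) -> printfn "(%s, %s, %s) %.3f" g a v p)
```

Written out this way, the constraint weights each cell by P(group, vote) x P(age, vote), which amounts to treating group and age as independent once the vote is known. That is a modelling shortcut rather than something the polls tell us.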

Inferring the Distribution

Furthermore, we can now use this as the starting point for a prior in a more Bayesian approach. To match the data, we project our model onto the subspaces where we have data, (age, vote) and (race, gender, vote), and make sure that the model agrees with the data in each subspace. The original constraint already leads to good agreement and, in fact, the Bayesian approach is a little off in comparison. In return, however, we get a distribution over possible vote outcomes.
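As a rough illustration of what projecting onto a subspace and checking agreement means, the sketch below sums a joint table down to its (age, vote) marginal and measures the Manhattan (L1) distance to a target. The joint table and the target values are invented for the example, and the helper names are not from the library used in this post.

```fsharp
// A made-up joint distribution over (group, age, vote); not the post's data.
let joint =
    [ (("group A", "under 45", "O'Rourke"), 0.10)
      (("group A", "under 45", "Cruz"), 0.15)
      (("group A", "45 and over", "O'Rourke"), 0.10)
      (("group A", "45 and over", "Cruz"), 0.25)
      (("group B", "under 45", "O'Rourke"), 0.10)
      (("group B", "under 45", "Cruz"), 0.05)
      (("group B", "45 and over", "O'Rourke"), 0.15)
      (("group B", "45 and over", "Cruz"), 0.10) ]

// Project onto (age, vote) by summing out the group.
let ageVoteMarginal =
    joint
    |> List.groupBy (fun ((_, a, v), _) -> (a, v))
    |> List.map (fun (key, rows) -> key, List.sumBy snd rows)
    |> Map.ofList

// Hypothetical (age, vote) shares that the model is supposed to reproduce.
let observed =
    Map [ (("under 45", "O'Rourke"), 0.22)
          (("under 45", "Cruz"), 0.18)
          (("45 and over", "O'Rourke"), 0.25)
          (("45 and over", "Cruz"), 0.35) ]

// Manhattan distance: sum of absolute differences over matching keys.
let manhattan (m1: Map<'k, float>) (m2: Map<'k, float>) =
    m1 |> Map.toList |> List.sumBy (fun (k, p) -> abs (p - Map.find k m2))

let gap = manhattan ageVoteMarginal observed
printfn "L1 gap between projection and target: %.4f" gap
```

The same projection can be repeated for (group, vote); adding the two gaps gives a single score of the kind the posterior computation further down turns into a likelihood.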

In [11]:
let priorWeightsPair =
    Model(cont {
              let! raceXgender = raceDistr()
              let! vote = voteGivenRace raceXgender
              let! age = ageDistr()
              let! votea = voteGivenAge age
              do! constrain(vote = votea)
              return (raceXgender,age,vote)
          }).Reify()
    |> List.filter(snd
                   >> valueExtract
                   >> third
                   >> Strings.strcontains "No Answer"
                   >> not)
    |> normalize
    |> List.map(keepLeft valueExtract)

let priorWeights = List.map fst priorWeightsPair
let labels = List.map snd priorWeightsPair

let raceXgenderXageModel p =
    cont {
        let distr = List.zip p labels
        let! (rg,a,v) = categorical distr
        return (rg,a,v)
    }

let raceXgenderModelMap = mkProbabilityMap raceXgenderModel |> Map.toArray
let agepmap = mkProbabilityMap agemodel |> Map.toArray
let ageps = Array.map snd agepmap
let rgps = Array.map snd raceXgenderModelMap
In [13]:
//HIDEOUTPUT
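// Scale the constrained-model weights into whole-number Dirichlet pseudo-counts
// (roughly 2400 in total), so the prior is centred on the constrained model.
let priorWeights' = List.map (( * ) 2400. >> round 0) priorWeights
let prior = dist {let! p = dirichlet priorWeights'
                  return (Array.map (round 3) p)} 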

let post =
    computePosterior (fun p _ -> 
        let ragm =
            exact_reify(raceXgenderXageModel(Array.toList p)) 
            |> mkProbabilityMap
        
        let agepsE =
            [|for ((a,v),_) in agepmap -> 
                  probabilityOf (fun (_,a',v') -> v = v' && a = a') ragm|]
        
        let rgpsE =
            [|for ((r,v),_) in raceXgenderModelMap -> 
                  probabilityOf (fun (r',_,v') -> v = v' && r = r') ragm|]
        
        // L1 gaps between the model's projections and the poll marginals.
        let score1 = Array.manhattanDist agepsE ageps
        let score2 = Array.manhattanDist rgps rgpsE
        prob(logisticRange 1. 0.001 (score1 + score2))) prior [true]
 
let smcsamples = Sampling.computeSamplesSMC 300 60 30 post
let csamples = Grouping.compactMapSamples id smcsamples
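To get a feel for how tightly a Dirichlet prior with about 2400 pseudo-counts holds each cell near its starting weight, the standard Dirichlet mean and variance formulas can be evaluated directly. This is a sketch: the three weights below are illustrative values of the size seen in the constrained table (0.08 and 0.01, plus a remainder bucket), not the actual 56-entry prior.

```fsharp
// Standard Dirichlet moments. With alpha0 the sum of all alphas:
//   E[p_i]   = alpha_i / alpha0
//   Var[p_i] = alpha_i * (alpha0 - alpha_i) / (alpha0^2 * (alpha0 + 1))
let dirichletMeanAndSd (alphas: float list) =
    let a0 = List.sum alphas
    alphas
    |> List.map (fun a ->
        let mean = a / a0
        let variance = a * (a0 - a) / (a0 * a0 * (a0 + 1.0))
        mean, sqrt variance)

// Illustrative weights only: two cells plus everything else lumped together.
let weights = [ 0.08; 0.01; 0.91 ]
let scale = 2400.0 // the multiplier applied to the prior weights in the cell above

let moments =
    weights
    |> List.map (fun w -> w * scale)
    |> dirichletMeanAndSd

for (mean, sd) in moments do
    printfn "mean %.3f, standard deviation %.4f" mean sd
```

For a cell with weight 0.08 this gives a standard deviation of roughly 0.0055, so before any data is considered the prior already keeps such a cell within about a percentage point of its starting value.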

The display of the distributions below is not ideal. If this were more than idle curiosity, I might have spent more time on it, but it will suit our purposes. The numbers on the Y axis represent how likely a number on the X axis is, and the numbers on the X axis represent the % of the vote contributed. So you can read it as, e.g., 2%-3% of voters were white men aged 18-29 who voted for Cruz. This is a bit of a mouthful, but the essence of the charts is this: the more overlap there is (between a group's Cruz and O'Rourke curves), the more divided the group was; the less overlap, the more consensus in that group. So the numbers for Black women (of any age) who voted for Cruz are highly concentrated around 0, showing a high degree of consensus. On the other hand, according to the model, consensus among Latino women 45+ is less strong.

Summary of the Patterns

In general, the younger the person, the more likely it was that they voted for O'Rourke (at least according to the exit polls). According to our model, white women younger than 30 tended to vote for O'Rourke, while those older tended to vote for Cruz. Except among Black voters, the older the person, the more likely it was that they voted for Cruz. This includes those who identify as Latino or "Other" (Asian?). Interestingly, Black women were the most consistent across age groups in voting for O'Rourke (though there simply were not that many in absolute terms). White men aged 18-29, Latino men 45+ and "Others" 30-44 were the most divided in their votes.

Conclusion

In this analysis, I have attempted to approximate age x gender x race vote combinations by constraining the inferred model's projections to agree with the known data. From this, we could estimate that the majority of white women younger than 30 likely voted for O'Rourke, and that, among non-white voters, older Latino and "Other" voters voted for Cruz at surprisingly high rates. It seems that the real admonishment should be for young people to go out and vote more!

Reality of Things

However, I view the true meaning of these votes as more symbolic, an estimate of prevailing beliefs. As far as real change goes, I doubt that O'Rourke could have accomplished anything against entrenched (corporate) interests and existing government bureaucracies. Rather than looking at a single senator or two, it will be more interesting to see whether the Democratic House majority amounts to any meaningful change (indeed, Alexandria Ocasio-Cortez seems even more impressive from an idealistic perspective, and there are many other firsts from under-represented groups, but it remains to be seen whether anything will really come of this).

Appendix

I also optimized over possible settings for our categorical using simulated annealing. It eventually found an outcome that's a teeny bit better than the result arrived at by constraints but not visibly so.

In [ ]:
let annealer =
    SimulatedAnnealing.SimulatedAnnealer
        (0.005,score,priorWeights.Length,9000.,0.98,transform = (snd >> max 0.))
        
let top = annealer.OptimizeVector 290000 (List.toArray priorWeights)
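For reference, the general shape of such an annealing search can be sketched in a few lines of plain F#. This is not the `SimulatedAnnealer` used above, whose arguments I am not documenting here. It is a generic textbook version on a toy 2x2 problem, and the targets, step size, starting temperature and cooling rate are all assumptions chosen for the example.

```fsharp
let rng = System.Random(42) // fixed seed so that runs are repeatable

// Hypothetical targets: row and column marginals of a 2x2 joint
// stored as [| p00; p01; p10; p11 |].
let targetRows = [| 0.6; 0.4 |]
let targetCols = [| 0.45; 0.55 |]

let normalize (p: float[]) =
    let s = Array.sum p
    Array.map (fun x -> x / s) p

// Manhattan distance between the candidate's projections and the targets.
let score (p: float[]) =
    let rows = [| p.[0] + p.[1]; p.[2] + p.[3] |]
    let cols = [| p.[0] + p.[2]; p.[1] + p.[3] |]
    let dist (xs: float[]) (ys: float[]) =
        Array.map2 (fun x y -> abs (x - y)) xs ys |> Array.sum
    dist rows targetRows + dist cols targetCols

let anneal steps startTemp cooling stepSize (start: float[]) =
    let mutable current = normalize start
    let mutable currentScore = score current
    let mutable best = current
    let mutable bestScore = currentScore
    let mutable temp = startTemp
    for _step in 1 .. steps do
        // Propose: nudge one coordinate, clamp at zero, renormalize.
        let i = rng.Next(current.Length)
        let delta = (rng.NextDouble() - 0.5) * 2.0 * stepSize
        let candidate =
            current
            |> Array.mapi (fun j x -> if j = i then max 0.0 (x + delta) else x)
            |> normalize
        let candidateScore = score candidate
        // Metropolis rule: always accept improvements, and accept worse
        // moves with a probability that shrinks as the temperature drops.
        let accept =
            candidateScore <= currentScore
            || rng.NextDouble() < exp ((currentScore - candidateScore) / temp)
        if accept then
            current <- candidate
            currentScore <- candidateScore
        if currentScore < bestScore then
            best <- current
            bestScore <- currentScore
        temp <- temp * cooling
    best, bestScore

let start = [| 0.25; 0.25; 0.25; 0.25 |]
let best, bestScore = anneal 5000 1.0 0.995 0.05 start
printfn "start score %.4f, best score %.4f" (score start) bestScore
```

Clamping at zero mirrors the `max 0.` transform in the cell above; renormalizing after every move is my own choice to keep the candidate a valid probability vector.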

Below are the differences between the race x gender and age projections of our default (constrained) model and the actual data.

([|-0.0003004815146; 0.001014095264; -0.0001031064021; 0.0007; 0.001339909003;
   -0.001307978358; 0.0012; 0.001515033888; -0.001302086563; 0.001747176178;
   -0.0009544706934; 0.0005620287005; -0.005726824161; 0.001539469919;
   -0.005125861131; 0.0029; 0.002303095871|],
 [|0.0008782324009; -0.0003806571889; 0.002046054365; -0.0046; -0.0003930553544;
   0.003732487704; -0.0034; -0.0004671092618; 0.002964034353; -0.0003799870178|])

Below is the same comparison for the posterior: we project the average of our posterior distribution over possible parameters for the categorical and compare it to the actual outcomes. In both cases the result is a good match.

([|-0.0002229622609; 0.001378803687; -0.0001568708901; 0.0007; 0.001076856129;
   -0.001447335805; 0.0012; 0.001802678956; -0.001284198136; 0.001881710776;
   -0.00099486382; 0.000444004571; -0.005903942773; 0.001232734282;
   -0.005013426601; 0.0029; 0.002406811884|],
 [|0.000669899402; -0.0008090632978; 0.001637968289; -0.0046; -0.0003325680625;
   0.003960462789; -0.0034; 3.897983915e-05; 0.003555269805; -0.0007209487641|])
Out[35]:
(0.01924161765, 0.01972516025, 0.02964161765, 0.03004720057, 0.04888323529,
 0.04977236082)

2018-11-18