NYC Democratic Primary: Ranked Choice Voting

Today I created a Sankey diagram based on the last three rounds of the New York City Democratic Mayor Primary. This year’s election was unique because in addition to provoking heated debate, it was the first time NYC used Ranked Choice Voting (RCV).

The basic premise of RCV is that you get to rank up to 5 candidates in order. In each round, the candidate with the least amount of votes is eliminated. If your first choice is eliminated, then your vote will be reallocated to your next preferred candidate among the remaining candidates. This process continues until one candidate receives 5at least 50% of the remaining ballots. Note that if all your candidates have been eliminated, then your ballot is “exhausted” and is not considered in the remaining rounds. You can click the link in the previous paragraph for a more detailed explanation.

My goal was to recreate a chart from the New York Times. I used the networkD3 package and was able to reproduce the diagram to some disagree. However, whereas the original diagram shows the percentages at the end of each round, my chart uses vote count.

The Diagram

Below is my final diagram. We can see rounds 6, 7, and 8 of the election. If you hover over the nodes (rectangles), you can see the final tally after each round. If you hover over the links between the nodes, you can observe how votes were reallocated as candidates were eliminated. This is one feature that I added that was not in the original NY Times diagram

Andrew Yang was eliminated in round 7 and Maya Wiley was eliminated in round 8. In round 8, Eric Adams defeated Kathryn Garicia by 8,426 votes. Note that at the bottom in gray there are inactive ballots. These are ballots in which all the candidates listed were eliminated.

NYC 2021 Mayoral Democratic Primary

Ranked Choice Results: Rounds 6 to 8

Note that the reason I (and probably the NY Times as well) only included the last three rounds is that more than one candidate was eliminated in the sixth round. Because more than one candidate was eliminated, it’s currently impossible based on the data available to determine how each individual candidate’s ballots were re-allocated.

Take Aways from Diagram

Sometimes I feel as though Sankey Diagrams get a bad rap. It’s true that they can quickly become junk chart if there are two many nodes. However, I think in this case, the Sankey Diagram really helps to illustrate shifts in the race as candidates were eliminated in the final rounds.

Garcia did not gain that much on Adams through her alliance with Yang

Toward the end of the campaign, Garcia and Yang formed and alliance. Yang, despite being an early front runner, was eliminated in the sixth round. However, Garcia didn’t gain that much ground on Adams after Yang was eliminated. Garcia won 43,072 of Yang’s votes versus 37,430 that went to Adams. At the end of round 7, Garcia was still behind Adams by 88,203 votes.

Wiley Voters prefered Garcia to Adams

After Wiley was eliminated in the final round, most of her votes were reallocated to Garcia (n = 129,446, 51.1%). This result is unsurprising given that Garcia’s public safety/police reform position was between Wiley and Adams.

Many Wiley Voters did not rank either Garcia or Adams

Many Wiley did not rank Garcia or Adams; consequently, their ballots became inactive (n=73,979). While it may be true that Wiley voters had no preference between Garcia and Adams, it also may be true that voters are not yet accustomed to the RCV system. Voters may not know the advantage of ranking more than one candidate in case their favorite candidate gets eliminated.

Technical Note: Comparing both Diagrams

The main difference between our diagrams is that I used vote counts instead of percentages. The percentages are nice because the key to winning the RCV election is to pass the 50% threshold. On the other, using vote counts makes it easier to see how votes are reallocated after each round.

We could define each flow as the percentage of votes from each eliminated candidate, but I am not sure how to code this. The problem is that we still want the nodes to be defined in terms of the vote share at the end of each round. In general, using different denominators for difference aspects of a table or chart is a challenge for both the coder and the audience.