Somebody told me that it would be really nice to see a post with simple simulations for this weekend's runoff. Answering that call, this is the best I could come up with. The following is a highly simplified simulation that accounts for neither time trends nor house effects. But it is still theoretically sound, as it takes a vector of polls rather than single polls individually.

# collect some of the latest polls
dilma <- c(53, 54, 52, 46.7, 52, 50.5, 43.6)

# Set the estimated percent for Dilma
# based on the average of several national polls
propDilma <- mean(dilma)

# Set the standard deviation
# this measures the variability between the different polls.
sdDilma <- sd(dilma)

# Function to simulate a single election
simElection <- function(prop, sd) {
  return(rnorm(1, mean = prop, sd = sd))
}

# Simulate the percent Dilma in 1000 elections
simPropDilma <- replicate(1000, simElection(propDilma, sdDilma))

# Calculate the percent of times Dilma wins
perDilmaWin <- mean(simPropDilma > 50)
perDilmaWin
#> [1] 0.517

hist(simPropDilma, freq = FALSE, xlab = "Popular Vote",
     main = "Distribution of Dilma's Popular Vote",
     col = "red", xlim = c(30, 70), ylim = c(0, .15))

curve(dnorm(x, mean = mean(simPropDilma), sd = sd(simPropDilma)),
      add = TRUE, col = "blue", lwd = 2)
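As a quick, self-contained extension of the simulation above (a sketch, not part of the original analysis), the simulated draws can also be summarized with a 95% interval, which conveys the uncertainty better than the win probability alone. The `set.seed()` call is my addition, to make the draws reproducible.

```r
# Sketch: rerun the simulation vectorized and summarize it with a
# 95% interval. The poll vector is the one quoted above.
dilma <- c(53, 54, 52, 46.7, 52, 50.5, 43.6)
set.seed(42)  # reproducibility (my addition)
sims <- rnorm(10000, mean = mean(dilma), sd = sd(dilma))

quantile(sims, c(0.025, 0.975))  # 95% interval of the simulated vote
mean(sims > 50)                  # probability the share exceeds 50%
```

Note that `rnorm(10000, ...)` replaces the `replicate()` loop: since each simulated election is a single normal draw, one vectorized call does the same job faster.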


Probability sampling with quotas is nice. It offers a favorable cost-benefit ratio and has been found to be quicker than full probability designs (Thompson 2012), but it is definitely not the best way to understand what is going on within the electorate.

As a simple example, everybody knows that the share of voters who cast null ballots in the country is around 9% and growing slowly. However, pollsters never get this category right; they typically report a mean value of 5-6 percent. This category of voters does not matter much in itself, as it is excluded from the official results, but it makes a strong case for thinking about polling bias, since this quantity is highly predictable. The following graphs may help to show how sampling with quotas may be affecting polling results, as it seems difficult to reach people who would rather waste their ballots. The dot is the expected value, the pluses "+" are polling measures, and the shaded area is the 95% credible region. The first figure differs from the second in that the second shows simulated plausible values a pollster would be expected to find.

Figure 1: prediction with actual likelihood for wasted-ballot voters

Figure 2: prediction with simulated plausible likelihood for wasted-ballot voters
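To make the intuition concrete, here is a toy simulation of the mechanism I have in mind. All the numbers are illustrative assumptions of mine (in particular the 0.6 relative reachability of null voters), not field data: if null voters are harder to reach in a quota sample, a poll will systematically report a null rate below the true 9%.

```r
# Toy sketch with assumed numbers: suppose a null voter is only 60%
# as likely as other voters to end up in the sample.
true_null <- 0.09  # true share of null voters (roughly as cited above)
reach     <- 0.6   # hypothetical relative chance of reaching a null voter

# effective probability that a sampled respondent is a null voter
p_sampled <- (true_null * reach) / (true_null * reach + (1 - true_null))

set.seed(1)
# 1000 hypothetical polls of n = 2000 respondents each
polled <- rbinom(1000, size = 2000, prob = p_sampled) / 2000
round(c(true = true_null, mean_polled = mean(polled)), 3)
```

Under these made-up assumptions the polls cluster around 5.6% instead of 9%, which is about the gap actually observed between official results and published polls.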

A typical Brazilian polling firm applies various modifications of the theoretically convenient simple random sample, such as quota and cluster designs, in which one or more stages of the fieldwork leave some discretion to the interviewer. These sampling strategies offer a favorable cost-benefit ratio, being quicker and cheaper than full probability designs. However, this comes with theoretical weaknesses: possible bias at the last stage of respondent selection, and constraints on deriving a proper margin of error.

Quota sampling is theoretically appealing: pollsters try to mimic the universe of voters, as if each quota were homogeneous enough to produce valid estimates. This assumption rarely holds, however. When a parameter has correlates different from those defining the quotas, considerable error can spill over and affect the overall estimates (Groves 2013). In addition, people who are difficult to locate or interview will be more seriously underrepresented than in a fully random design. Correcting these issues therefore requires much more knowledge about the joint distributions of variables in the population than a simple random sample would.
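One standard correction of this kind is post-stratification: reweight the sample so each group counts according to its known population share. The sketch below uses made-up numbers (an urban/rural split and support rates I invented for illustration) just to show the mechanics.

```r
# Minimal post-stratification sketch with invented numbers: reweight a
# skewed sample using known population margins.
pop_share    <- c(urban = 0.85, rural = 0.15)  # known population margins
sample_share <- c(urban = 0.95, rural = 0.05)  # what the quota sample got
w <- pop_share / sample_share                  # cell weights

# hypothetical candidate support by group in the sample
support <- c(urban = 0.48, rural = 0.62)

raw      <- sum(sample_share * support)      # unweighted estimate
adjusted <- sum(sample_share * w * support)  # post-stratified estimate
round(c(raw = raw, adjusted = adjusted), 3)
```

With these toy figures the unweighted estimate is 48.7% while the reweighted one is 50.1%: over-sampling the urban group pulls the raw estimate toward its preferences. This only works, of course, when the population margins are known, which is exactly the extra knowledge requirement noted above.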

With 10 days left until the runoff election, pollsters are saying the Social Democratic candidate, Aécio Neves (PSDB), is leading by a tiny margin, though his advantage is within the typical 2% sampling error. That is, pollsters are wary of calling the race.
If this scenario holds over the next week, it will be the first time in modern Brazilian democracy that a runner-up candidate in the first round gets more votes than the first-round winner, the Workers' Party's incumbent Dilma Rousseff.

The following charts combine the latest polls, using the vote intentions declared for the eventual runoff between these two candidates, collected during the first-round campaign, as priors; the dots at the end are where I believe the candidates will fall. The computation uses total votes, so it includes wasted votes as a third category and a residual category of swing voters. Because I'm using total votes, the winner may have less than 50% of the votes.

This year's election produced an even more fragmented federal legislature. Political scientists measure this by computing the Effective Number of Parties, a measure that goes beyond simply counting the parties represented in Parliament. A widely accepted formula was proposed by M. Laakso and R. Taagepera (1979): $N = \frac{1}{\sum_{i=1}^{n} p_{i}^{2}}$, where $N$ denotes the effective number of parties and $p_i$ denotes the $i$th party's fraction of the seats. A few years ago, Grigorii Golosov proposed a new method for computing the effective number of parties, in which neither larger nor smaller parties receive the unrealistic scores that the Laakso-Taagepera index can produce: $N = \sum_{i=1}^{n}\frac{p_{i}}{p_{i}+p_{1}^{2}-p_{i}^{2}}$, where $p_1$ is the largest party's seat share. I checked this by comparing changes in the Brazilian lower chamber between the 2002 and 2014 elections using the Golosov formula. The results indicate that the legislature will be even more fragmented next year, jumping from 10.5 to 14.5 on this scale.

This is what true vote distributions look like.

Maybe the biggest losers in this election are the polling firms; they once again did a very poor job of capturing the mood of the electorate. Moving from quota to random sampling designs is just the right thing to do.