Argentine general election, 2015

Argentina's 2015 presidential election, to be held on October 25th, is approaching, and the race has become clearer since the major parties announced their potential candidates last June.

This Sunday, the political parties hold their primaries for the upcoming presidential election. As in the US, primaries in Argentina matter because they let parties settle internal disputes, so the winning candidate can run more comfortably with a united party.

As usual, I set up a forecasting model to track vote intentions in my neighbouring country, the land of tango. I'm still consolidating the opinion poll data while trying to get some clues about pollsters' past performance, so that I can account for the likely house effects.

The following graph was fitted using simple loess techniques. As I already have a reasonable number of polls, I could also fit a Dirichlet regression, which produces a more robust picture of the race (the second graph below). At this stage, though, the model is an oversimplification, because it treats all pollsters alike when some are more reliable than others. I hope to post a sounder forecast next time.
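To give an idea of what the loess step involves, here is a minimal sketch in R. The data are simulated, and the names `polls`, `day` and `share` are made up for the example; this is not the script behind the graphs.

```r
# Simulated poll series (hypothetical data, for illustration only)
set.seed(2015)
polls <- data.frame(day = sort(sample(1:120, 60, replace = TRUE)))
polls$share <- 35 + 0.03 * polls$day + rnorm(nrow(polls), sd = 2)

# Local regression of vote share on time; span controls the smoothness
fit <- loess(share ~ day, data = polls, span = 0.75, degree = 2)

# Trend line evaluated on a daily grid inside the observed range
grid <- data.frame(day = seq(min(polls$day), max(polls$day)))
grid$trend <- predict(fit, newdata = grid)

head(grid)
```

A smaller `span` follows individual polls more closely, which is exactly where a single deviating poll can pull the line.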

[Figure: loess trend lines of vote intention]

[Figure: Dirichlet regression estimates of vote intention]

From the figures above, we can see that some polls show quite weird sinusoidal artifacts. If those polls are off compared to the others, they can distort the trend line estimates whenever the deviating poll is the only measurement we have for a particular day.

House Effects

I want to improve the estimates over the next weeks by using better priors for the house effect of each polling firm. For example, in the picture below, pollster "OPSM" polled favourably for Mauricio Macri (blue line/dots) and unfavourably for Cristina Kirchner's candidate, Daniel Scioli. On the other hand, pollster "Hugo Haime & Asc." underestimated the main opposition candidate (M. Macri) while overestimating both Daniel Scioli and the PJ dissident, Sergio Massa.

[Figure: house effects by pollster]

Let's think about the implications of this for a moment. Some institutes published polls in which one candidate or another is placed, over a period of several months, on average two percentage points below or above the median of all the polls published in that period. This is really hard to believe, given that the polling organizations all claim to interview a representative sample of the population. Even if we acknowledge that polls are noisy measurements, noise alone would not produce this kind of persistent skew, unless there is a systematic error in the polls that occurs over and over.
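The comparison described above (each poll against the median of all polls in the same period) can be sketched in a few lines of R. The pollster names and numbers below are invented for the example, and the average deviation is only a crude stand-in for a proper house-effect prior.

```r
# Hypothetical polls: firm names and shares are invented for the example
polls <- data.frame(
  pollster = c("Firm A", "Firm B", "Firm C", "Firm A", "Firm B", "Firm C"),
  month    = c("2015-06", "2015-06", "2015-06", "2015-07", "2015-07", "2015-07"),
  share    = c(38, 34, 36, 39, 35, 36)
)

# Median of all polls published in the same period
polls$period_median <- ave(polls$share, polls$month, FUN = median)

# Deviation of each poll from that median, in percentage points
polls$deviation <- polls$share - polls$period_median

# Average deviation per pollster: a crude house-effect estimate
house_effect <- tapply(polls$deviation, polls$pollster, mean)
house_effect
```

A firm whose average deviation stays well away from zero month after month is the kind of systematic error I have in mind.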

Ternary plots in politics

This week, I read a post by Nicholas Hamilton about ternary plots that made me think about how many different applications this geometric diagram has across scientific fields. A couple of weeks ago, I was reading a book by Donald Saari, who uses ternary charts extensively to project election outcomes in different electoral settings.

Ternary diagrams are perhaps most commonly used for projecting three-party elections; the same idea can be generalised to more parties, though it gets more complex. When we study elections with this geometric figure, each point of the triangle represents three coordinates (say A, B, C, because I'm creative today) that correspond to the percentage of votes each political organisation obtained. From this, we may speculate about the composition of the legislative body. Here I use colors to make it fancier.
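For the curious, placing a result inside the triangle is just a change of coordinates. Below is a minimal sketch in R; the function name `ternary_xy` and the placement of the vertices are my own choices for the example, not something taken from a plotting package.

```r
# Map vote shares (a, b, c) onto a 2-D equilateral triangle.
# Vertex placement is a convention chosen for this example:
# A at (0, 0), B at (1, 0), C at (0.5, sqrt(3) / 2).
ternary_xy <- function(a, b, c) {
  total <- a + b + c          # normalise, so percentages or raw votes both work
  b <- b / total
  c <- c / total
  data.frame(x = b + c / 2, y = c * sqrt(3) / 2)
}

# A party winning everything lands on its own vertex
ternary_xy(a = 100, b = 0, c = 0)

# A 60/20/20 split lands inside the triangle, close to vertex A
ternary_xy(a = 60, b = 20, c = 20)
```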

The point of the book is not election outcome visualisation, however, but that the election outcome may depend on the formula used to aggregate votes and on the number of seats available. For instance, if we have a constituency of M=5 seats, the possible outcomes are:

ternary

If party "A" has 60% of the popular preference, party "B" 20%, and "C" the other 20%, then a system using d'Hondt to distribute seats proportionally among the parties will give: A(3), B(1), C(1).
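The d'Hondt arithmetic is easy to check by hand: divide each party's votes by 1, 2, 3, ... and hand the seats to the largest quotients. Here is a minimal sketch in R; the function `dhondt` is written for this post and has no tie-breaking rule beyond the order of the quotients.

```r
# D'Hondt allocation: divide each party's votes by 1, 2, ..., seats
# and award the seats to the largest quotients.
dhondt <- function(votes, seats) {
  divisors  <- seq_len(seats)
  quotients <- outer(votes, divisors, "/")        # parties x divisors
  party     <- rep(names(votes), times = seats)   # owner of each quotient
  winners   <- party[order(quotients, decreasing = TRUE)[seq_len(seats)]]
  table(factor(winners, levels = names(votes)))
}

votes <- c(A = 60, B = 20, C = 20)
dhondt(votes, seats = 5)
```

The five largest quotients are 60 and 30 (both A), followed by three quotients of 20 (A/3, B/1 and C/1), which gives A three seats and B and C one each.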

Rather than thinking in terms of edges, we can draw regions to make it clearer how electoral formulas produce slightly different shapes, which in the long run may affect the number of parties and the coordination among party supporters, to mention just a few side effects.
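One way to get a feel for those regions is to sweep a grid of vote shares and label each point with the seats it produces. The sketch below does that in R for M=5 and compares d'Hondt with Sainte-Laguë; picking Sainte-Laguë as the second formula is my own choice for the illustration, and ties are resolved simply by position in the quotient matrix.

```r
# Generic divisor method; returns the number of seats per party
divisor_seats <- function(votes, seats, divisors) {
  quotients <- outer(votes, divisors, "/")
  party     <- rep(seq_along(votes), times = length(divisors))
  winners   <- party[order(quotients, decreasing = TRUE)[seq_len(seats)]]
  tabulate(winners, nbins = length(votes))
}

M <- 5

# Grid of vote shares for three parties, in steps of 1 percentage point
grid <- expand.grid(a = 0:100, b = 0:100)
grid <- grid[grid$a + grid$b <= 100, ]
grid$c <- 100 - grid$a - grid$b

# Label every grid point with its seat allocation, e.g. "3-1-1"
label <- function(divisors) {
  apply(grid[, c("a", "b", "c")], 1, function(v) {
    paste(divisor_seats(v, M, divisors), collapse = "-")
  })
}

grid$dhondt       <- label(1:M)                              # 1, 2, 3, ...
grid$sainte_lague <- label(seq(1, by = 2, length.out = M))   # 1, 3, 5, ...

# Share of grid points where the two formulas disagree
mean(grid$dhondt != grid$sainte_lague)
```

Colouring the grid points by `dhondt` or `sainte_lague` in a ternary plot would draw the regions for each formula.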

ternary2

ternary3

I loved this %>% crosstable

This is a public thank you for @heatherturner's contribution. Now the SciencesPo crosstable can work in a chain (%>%) fashion, which is useful alongside other packages that have integrated the magrittr operator.

    > candidatos %>%
    +   filter(desc_cargo == 'DEPUTADO ESTADUAL' |
    +          desc_cargo == 'DEPUTADO DISTRITAL' |
    +          desc_cargo == 'DEPUTADO FEDERAL' |
    +          desc_cargo == 'VEREADOR' |
    +          desc_cargo == 'SENADOR') %>%
    +   tab(desc_cargo, desc_sexo)

====================================================
                           desc_sexo                
                   -------------------------        
desc_cargo             NA   FEMININO MASCULINO  Total 
----------------------------------------------------
DEPUTADO DISTRITAL      1     826      2457     3284
                    0.03%     25%       75%     100%
DEPUTADO ESTADUAL     122   12595     48325    61042
                    0.20%     21%       79%     100%
DEPUTADO FEDERAL       40    5006     20176    25222
                    0.16%     20%       80%     100%
SENADOR                 4     161      1002     1167
                    0.34%     14%       86%     100%
VEREADOR             9682  376576   1162973  1549231
                    0.62%     24%       75%     100%
----------------------------------------------------
Total                9849  395164   1234933  1639946
                    0.60%     24%       75%     100%
====================================================

Chi-Square Test for Independence

Number of cases in table: 1639946 
Number of factors: 2 
Test for independence of all factors:
    Chisq = 1077.4, df = 8, p-value = 2.956e-227
                    X^2 df P(> X^2)
Likelihood Ratio 1216.0  8        0
Pearson          1077.4  8        0

Phi-Coefficient   : 0.026 
Contingency Coeff.: 0.026 
Cramer's V        : 0.018 
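As a sanity check, the three association measures can be recomputed from the chi-square statistic and the table dimensions reported above. This sketch uses the textbook formulas in plain R; it is not the package's internal code.

```r
chisq <- 1077.4     # Pearson chi-square reported above
n     <- 1639946    # number of cases in the table
r     <- 5          # rows (desc_cargo)
k     <- 3          # columns (desc_sexo, including NA)

phi         <- sqrt(chisq / n)
contingency <- sqrt(chisq / (chisq + n))
cramers_v   <- sqrt(chisq / (n * min(r - 1, k - 1)))

round(c(phi = phi, contingency = contingency, cramers_v = cramers_v), 3)
```

With more than 1.6 million cases, even a tiny association yields a vanishing p-value, so the coefficients are the numbers worth reading here.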

# Reproducible example:

    library(SciencesPo)

    # Applicants by gender and admission outcome
    gender     <- rep(c("female", "male"), c(1835, 2691))
    admitted   <- rep(c("yes", "no", "yes", "no"), c(557, 1278, 1198, 1493))
    # Departments of female applicants (admitted first, then rejected)
    dept       <- rep(c("A","B","C","D","E","F","A","B","C","D","E","F"),
                      c(89,17,202,131,94,24,19,8,391,244,299,317))
    # Departments of male applicants (admitted first, then rejected)
    dept2      <- rep(c("A","B","C","D","E","F","A","B","C","D","E","F"),
                      c(512,353,120,138,53,22,313,207,205,279,138,351))
    department <- c(dept, dept2)
    ucb        <- data.frame(gender, admitted, department)


> ucb %>% tab(admitted, gender, department)
================================================================
                                 department                       
                  -----------------------------------------       
admitted gender   A      B      C      D      E      F    Total 
----------------------------------------------------------------
no       female     19      8    391    244    299    317   1278
                  1.5%  0.63%    31%    19%  23.4%    25%   100%
         male      313    207    205    279    138    351   1493
                 21.0% 13.86%    14%    19%   9.2%    24%   100%
         -------------------------------------------------------
         Total     332    215    596    523    437    668   2771
                 12.0%  7.76%    22%    19%  15.8%    24%   100%
----------------------------------------------------------------
yes      female     89     17    202    131     94     24    557
                   16%   3.1%    36%    24%  16.9%   4.3%   100%
         male      512    353    120    138     53     22   1198
                   43%  29.5%    10%    12%   4.4%   1.8%   100%
         -------------------------------------------------------
         Total     601    370    322    269    147     46   1755
                   34%  21.1%    18%    15%   8.4%   2.6%   100%
----------------------------------------------------------------
Total    female    108     25    593    375    393    341   1835
                  5.9%   1.4%    32%    20%  21.4%    19%   100%
         male      825    560    325    417    191    373   2691
                 30.7%  20.8%    12%    15%   7.1%    14%   100%
         -------------------------------------------------------
         Total     933    585    918    792    584    714   4526
                 20.6%  12.9%    20%    17%  12.9%    16%   100%
================================================================

The Greek thing II

Just a few hours before Greeks head to the polls to decide on the bailout agreement and, ultimately, on whether the country will stay in the euro, neither side has an overwhelming advantage. Actually, the margin has blurred over the last three days, with the "Yes" side staging a last-minute recovery. Despite this trend, the aggregate preference for "No" is not far behind. To frame this in terms of probabilities, that is, the probability that \theta_{YES} exceeds \theta_{NO}, I adapted a short function I wrote a while ago to simulate from a Dirichlet distribution and then compute the posterior probabilities shown in the chart below. It's not much of a lead, but YES outperformed NO in 57% of the simulations.
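For readers who want to try this at home, here is a minimal sketch in R of the idea: draw from a Dirichlet posterior and count how often YES comes out ahead. The counts are hypothetical, the flat prior is an assumption, and `rdirichlet` below is a small helper written for this example rather than my original function.

```r
# Draw n samples from a Dirichlet(alpha) by normalising independent gammas
rdirichlet <- function(n, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(n * k, shape = alpha), nrow = n, ncol = k, byrow = TRUE)
  g / rowSums(g)
}

set.seed(42)

# Hypothetical pooled counts of respondents (not the actual poll data)
counts <- c(yes = 1010, no = 990)

# Flat Dirichlet(1, 1) prior plus the observed counts gives the posterior
theta <- rdirichlet(10000, alpha = counts + 1)
colnames(theta) <- names(counts)

# Posterior probability that theta_YES exceeds theta_NO
mean(theta[, "yes"] > theta[, "no"])
```

With only two categories the Dirichlet reduces to a Beta, but writing it this way extends directly to three or more options.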

Probs

The polls were aggregated, and the "Don't Know" respondents were distributed in proportion to the Yes/No shares reported by the polls.
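The redistribution of the undecided is simple arithmetic. A minimal sketch in R, with percentages invented for the example:

```r
# Hypothetical poll (percentages), invented for the example
poll <- c(yes = 44, no = 43, dont_know = 13)

decided <- poll[c("yes", "no")]

# Each side receives undecideds in proportion to its share of decided voters
adjusted <- decided + poll[["dont_know"]] * decided / sum(decided)
adjusted
```

This is the same as rescaling the decided shares so that they sum to 100, which assumes the undecided break exactly like everyone else.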

UPDATE:
With yesterday's polls showing both sides in a dead heat, today's overwhelming majority of voters saying NO is a big surprise, isn't it? Plenty of theories will appear to explain why Greeks chose to reject the terms of the deal proposed by EU officials. Meanwhile, it's time for the parties to set up plan B.

The Greek thing

Greeks have been quite volatile in their opinion on whether to accept a proposal by the country's creditors for more austerity to keep aid flowing. The polls conducted this week look crazy, though that "belly" was likely provoked by anxiety about what comes next after Greece did not pay the IMF back.

loess

The data were collected on the internet, most of them assembled by http://metapolls.net