Last week, I attended a "Flower Fest" where I had the opportunity to admire some of the most beautiful, award-winning flowers, orchids, and decorative plants. Surprisingly, though, I had never thought of flowers as fractals the way I did this time.

Fractals attract a lot of interest, especially from mathematicians, who actually spend time studying their structure. But what makes something a fractal? A fractal is defined as an object that displays self-similarity on all scales. The object doesn't need to have exactly the same structure at every scale, though; the same sort of structure only needs to be visible or recognizable at each of them.

The structure, or the equation, that defines a fractal is usually very simple. For instance, the formula for the famous Mandelbrot set is $z_{n+1} = z_n^2 + c$.

We start by plugging in a constant value for $c$; $z$ can start at zero. After one iteration, the equation gives us a new value for $z$; we then plug this back into the equation in place of the old $z$ and iterate again. The process can continue indefinitely.

As a very simple example, let's start with $c = 1$.

$$z_{1} = z_{0}^2 + c = 0 + 1 = 1$$

$$z_{2} = z_{1}^2 + c = 1 + 1 = 2$$

$$z_{3} = z_{2}^2 + c = 4 + 1 = 5$$

Graphing these results against $n$ would create a steeply rising curve, because the numbers grow without bound (to infinity). But if we start with $c = -1$, for instance, $z$ behaves completely differently: it oscillates between two values, $-1$ and $0$, as follows:

$$z_{1} = z_{0}^2 + c = 0 + (-1) = -1$$

$$z_{2} = z_{1}^2 + c = 1 + (-1) = 0$$

$$z_{3} = z_{2}^2 + c = 0 + (-1) = -1$$

$$z_{4} = z_{3}^2 + c = 1 + (-1) = 0$$

And this back-and-forth movement will continue forever, as we can imagine. I figured out that the Mandelbrot set is made up of all the values of $c$ for which $z$ stays finite. Thus a value such as $c = 1$ from the first example is not in the set and is thrown out, because $z$ in that case goes to infinity and the Mandelbrot set lives in a finite world, whereas $c = -1$ belongs to it. The following is an example of such a set.

Mandelbrot

The script for this set is here.
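The iteration described in this post can be sketched in a few lines of R. This is only a minimal illustration, not the script linked above: the function names `orbit` and `stays_bounded`, the iteration cap `max_iter`, and the escape radius of 2 are my own assumptions for the example.

```r
# Return the first n iterates of z <- z^2 + c, starting from z = 0.
orbit <- function(c, n) {
  z <- 0
  out <- numeric(n)
  for (i in seq_len(n)) {
    z <- z^2 + c
    out[i] <- z
  }
  out
}

# Check whether the orbit stays bounded for a given c.
# max_iter and radius are illustrative choices; a finite number of
# iterations can only approximate "stays finite forever".
stays_bounded <- function(c, max_iter = 100, radius = 2) {
  z <- 0 + 0i                      # complex, so c may be complex too
  for (i in seq_len(max_iter)) {
    z <- z^2 + c
    if (Mod(z) > radius) return(FALSE)
  }
  TRUE
}

orbit(1, 3)        # the c = 1 example: 1 2 5
orbit(-1, 4)       # the c = -1 example: -1 0 -1 0
stays_bounded(1)   # FALSE, the orbit escapes
stays_bounded(-1)  # TRUE, the orbit keeps oscillating
```

Evaluating `stays_bounded` over a grid of complex values of `c` and coloring the points that return `TRUE` is one way a picture like the one above could be drawn.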

Every campaign cycle I usually do the same things: go to a repository, download a bunch of data, then merge the files and store them in an existing RData file for later analysis. I've already written about this topic some time ago, but this time I think my script has become simpler.

Set the Directory

Let's assume you're not in the same directory as your files, so you'll need to point R to where the files reside.

setwd("~/Downloads/consulta_cand_2014")

Getting a List of files

Next, it's just a matter of getting to know your files. For this, the list.files() function is very handy, and you can see the file names right away on your screen. Here I'm looking for the "txt" files, so I want my list of files to exclude everything else, like pdf, jpg, etc.

files <- list.files(pattern= '\\.txt$')

Sometimes you may find empty files that prevent the script from running successfully. Thus, you may want to inspect the files beforehand.

info = file.info(files)
empty = rownames(info[info$size == 0, ])
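Once the empty files are identified, they can be dropped from the list before reading. The sketch below is self-contained and uses a hypothetical temporary directory with made-up file names (`data.txt`, `blank.txt`), so it does not touch the real data; `setdiff()` is my choice for the removal step.

```r
# Hypothetical demo directory with one non-empty and one zero-byte file
demo_dir <- tempfile("empty_demo_")
dir.create(demo_dir)
writeLines("1;a", file.path(demo_dir, "data.txt"))
file.create(file.path(demo_dir, "blank.txt"))

# Same inspection as above, applied to the demo directory
demo_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
demo_info  <- file.info(demo_files)
demo_empty <- rownames(demo_info[demo_info$size == 0, ])

# Keep only the files that actually contain something
demo_keep <- setdiff(demo_files, demo_empty)
basename(demo_keep)
```

With the real data, the same idea would amount to `files <- setdiff(files, empty)`.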

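Moreover, in case you have the same files in more than one format, you may want to filter them, as in the following, which keeps only the CSV files that have no TXT counterpart:

# Compare base names (extension stripped) so that "a.csv" matches "a.txt"
CSVs <- list.files(pattern = '\\.csv$')
TXTs <- list.files(pattern = '\\.txt$')
mylist <- CSVs[!tools::file_path_sans_ext(CSVs) %in% tools::file_path_sans_ext(TXTs)]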

Stacking files into a dataframe

The last step is to apply "rbind" across the list of files in the working directory, putting them all together.
Notice that in the script below I've included some extra arguments to avoid problems reading the files I have. Also, this assumes all the files have the same number of columns; otherwise "rbind" won't work. In that case you may need to replace "rbind" with "smartbind" from the gtools package.

# Read every file with the same settings, then stack the results row-wise
cand_br <- do.call("rbind", lapply(files, FUN = function(f) {
  read.table(f, header = FALSE, sep = ";", stringsAsFactors = FALSE,
             fileEncoding = "cp1252", fill = TRUE, blank.lines.skip = TRUE)
}))
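To see the stacking step work end to end without the election files, here is a small self-contained sketch. The temporary directory, the file names, and their contents are all made up for illustration; only the `do.call("rbind", lapply(...))` pattern comes from the script above.

```r
# Hypothetical demo: two small ';'-separated files with the same columns
stack_dir <- tempfile("stack_demo_")
dir.create(stack_dir)
writeLines(c("1;a", "2;b"), file.path(stack_dir, "part1.txt"))
writeLines("3;c", file.path(stack_dir, "part2.txt"))

stack_files <- list.files(stack_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file into a data frame, then bind all of them row-wise
stacked <- do.call("rbind", lapply(stack_files, function(f) {
  read.table(f, header = FALSE, sep = ";", stringsAsFactors = FALSE)
}))

dim(stacked)  # three rows (2 + 1) and two columns
```

Because `header = FALSE`, `read.table()` names the columns `V1`, `V2`, and so on, which is what lets `rbind` line the files up by column.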

Uruguayan voters are about to give the Frente Amplio a third mandate this November 30th. The following graph shows what the outcome would look like if the election were held this week. The undecided voters were allocated to each party accordingly, as they will have to be by election day. The picture plots the probability density function (pdf) of the vote support for the FA and the PN as published by the major polling houses. The script can be found here.

As the picture suggests, the posterior densities are far apart from each other and their confidence regions are narrow, meaning that there is little uncertainty about the result.

Rplot

In two weeks, voters in Uruguay will go to the polls for the runoff election between the FA and the PN. According to the polling data being published, it's very likely that Uruguayans will give the FA a third mandate. I ran the following forecast model, which suggests that the difference between the two parties is huge, even greater than the number of undecided voters.

Tabaré Vázquez - Frente Amplio

FA

Luis Lacalle Pou - Partido Nacional

PN

The latest polls, released tonight, suggest a numerical tie between Dilma Rousseff (PT) and Aecio Neves (PSDB) once the margin of error is taken into account. In fact, these polls raised the possibility of a game change for the opposition against the government, as some of them did capture whatever impact the televised debate on Friday night had.

There is still a lot of uncertainty around; roughly 5% of the electorate were still reported to be undecided. Nonetheless, when I ran my model today, it put Aecio ahead of Dilma by a small margin (< 1%). These numbers also account for wasted votes, so they will typically diverge from the official results.

DILMA

PT

AECIO

PSDB

WASTED VOTES

WASTE

House Effects

Although the machinery behind the model I'm running allows for drawing several elections from the data, it's too risky to call one side or the other given the pollsters' credibility problem, which was certainly aggravated by their poor performance three weeks ago. Meanwhile, I've been trying to learn the pollsters' random walk in the Brazilian campaigns, but given the small number of observations it will take a while to produce robust measures.

The following chart shows the house effects for the polls released during the runoff campaign. Ideally, a pollster would have its effect equally distributed between the positive and negative bands. As in a drunkard's walk, a pollster could stagger left and right around each party or candidate. Not surprisingly, however, the picture shows two blocks of bias. While the first 4 pollsters typically fielded more positive numbers for the government, the last 3 did so for the opposition. In addition, the house effects found for Datafolha, Veritá, and Sensus are statistically different from zero.

houseeffects

Taking a less systematic approach to the house effects, adjusting only for the sample size of the polls regardless of the methodology employed (probability with quotas or simply quotas), Dilma appears ahead with an interval of [2.5% to 5.7%], as represented in the following distribution. This happens because polling firms with a house effect toward the government happen to sample many more people than the others, though the methodology they use to sample a large number of voters is poorer. This election is so mercurial that a wrong decision on the precision parameter can sway the outcome from one side to the other.

Dilma
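To make the role of the precision parameter concrete, here is a minimal sketch of pooling polls weighted only by sample size. It is not the forecast model used in this post: the three polls and their numbers are hypothetical, and the precision formula assumes simple random sampling, which is exactly the kind of assumption that quota samples put in doubt.

```r
# Hypothetical polls (made-up numbers): share for one candidate, sample size
polls <- data.frame(
  share = c(0.52, 0.48, 0.50),
  n     = c(4000, 1000, 2000)
)

# Under simple random sampling the variance of a share is p(1 - p) / n,
# so the precision (1 / variance) grows with the sample size
polls$precision <- polls$n / (polls$share * (1 - polls$share))

# Precision-weighted average and its standard error
pooled    <- sum(polls$precision * polls$share) / sum(polls$precision)
pooled_se <- sqrt(1 / sum(polls$precision))

# Unweighted average, for comparison
simple_mean <- mean(polls$share)
```

In this made-up example the largest poll is also the most favorable one, so `pooled` ends up above `simple_mean`. That is the mechanism described above: if the firms that lean one way also field the biggest samples, weighting by sample size alone pulls the estimate toward them.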