Majority calculations for real or hypothetical elections
majority varname [weight] [if exp] [in range] , positive(valuelist) negative(valuelist) [ by(byvarlist) percent format(format) list_options generate(newvarlist) ]
majority varlist [weight] [if exp] [in range] , positive(varlist) negative(varlist) [ by(byvarlist) percent format(format) list_options generate(newvarlist) ]
Description
majority reports on voting in real or hypothetical elections. The votes cast are summarized as being positive or negative and the difference between these is reported. This difference is here called the majority, such terminology being a mild generalisation of usage in British English.
There are two syntaxes. The first is for a long data structure in which votes are recorded in a single variable, which may be numeric or string. The second is for a wide data structure in which votes are recorded in two or more variables, which must be numeric. Note that with both syntaxes the percents calculated with the percent option depend on the totals as determined by all the variables supplied, after any if and in restrictions have been applied and missing values excluded.
fweights and aweights may be specified.
Remarks
In 2000 the Electoral College of the United States voted as follows: George W. Bush 271; Al Gore 266; abstentions 1. Hence George W. Bush was elected President of the United States by a majority of 5, or as a percent 5 / 538 = 0.93%. The majority for Bush is here # votes for - # votes against. More generally we can calculate a majority given a decision on which votes are counted as positive and which as negative.
Polls are often summarized in terms of the difference between those who approve and those who disapprove, usually of some politician or policy. In a poll reported in the Independent [London] on 28 April 2003, 14% of (presumably British) adults were in favour of genetically modified food, and 56% against. From this it would seem that a referendum on the issue would yield a majority of 42% against.
In a survey of grocery shoppers in Oxford, one question asked for agreement or disagreement with the statement "I find getting to grocery shops very tiring" on a 5 point scale running from agree to disagree. Two possibilities here are to calculate (1) (# agree) - (# disagree) and (2) (# agree + # tend to agree) - (# disagree + # tend to disagree). Here as elsewhere it usually seems best to ignore the neutral category in the middle of the scale, # in between. In electoral terms, we might guess, in the absence of any other information, that those who are undecided either might not vote at all or might be split equally into positive and negative: whatever we guess, the majority is unchanged. However, how to assign categories is a substantive decision, rather than a statistical decision.
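The arithmetic in the Electoral College and poll examples can be checked with a few lines of code. The sketch below is in Python rather than Stata and is not part of majority; the function name majority_of is hypothetical and introduced only for illustration.

```python
def majority_of(positive, negative, total=None):
    """Return positive minus negative, as a percent of total if total is given."""
    diff = positive - negative
    if total is None:
        return diff
    return 100 * diff / total

# 2000 Electoral College figures quoted in the text
bush, gore, abstentions = 271, 266, 1
electors = bush + gore + abstentions          # 538

print(majority_of(bush, gore))                        # majority in votes
print(round(majority_of(bush, gore, electors), 2))    # majority as a percent of 538

# Poll figures quoted in the text are already percents: 14 in favour, 56 against
print(majority_of(14, 56))                            # negative means a majority against
```

A negative result simply means that the categories counted as negative outnumber those counted as positive.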
More generally, the majority has various attractive properties. It is simple, and familiar to many people, largely because of its widespread use in reporting elections. It is a direct summary of the data which does not go beyond the categories given, as compared with say assigning some scores to the categories and averaging those. It is sensitive to variations, without being unstable: note that by comparison a ratio such as # positive / # negative is not only unstable but also indeterminate for zero denominators. As mentioned, the majority is unchanged if neutral or don't know categories are split equally, unlike any ratio. Perhaps the main disadvantage of the majority is that it does not have an obvious link to any specific family of models.
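The contrast between the majority and a ratio can be seen in a small numerical sketch. It is written in Python, not Stata, and the counts are made up for illustration rather than taken from any survey; the helper names are hypothetical.

```python
def majority(pos, neg):
    """Difference between positive and negative counts."""
    return pos - neg

def ratio(pos, neg):
    """Ratio of positive to negative counts; None when the denominator is zero."""
    return pos / neg if neg != 0 else None

pos, neg, neutral = 40, 25, 10    # hypothetical counts

# Split the neutral category equally between the two sides
pos_split = pos + neutral / 2
neg_split = neg + neutral / 2

print(majority(pos, neg), majority(pos_split, neg_split))   # difference is unchanged
print(ratio(pos, neg), ratio(pos_split, neg_split))         # ratio changes
print(ratio(pos, 0))                                        # no ratio without negative votes
```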
See also Wilkinson, L. 1999. The grammar of graphics. New York: Springer-Verlag; and Zeisel, H. 1985. Say it with figures. 6th ed. New York: Harper & Row.
We deal here only with the simplest kind of voting. For more information on voting systems, see for example http://www.barnsdle.demon.co.uk/vote/vote.html.
Options
positive() and negative() are required options. With the first syntax, the argument of each is a valuelist, i.e. a numeric list or a list of string values. With the second syntax, it is a varlist of numeric variables.
In the first syntax, suppose we have data in a numeric variable opinion on a 5 point opinion scale which we wish to summarize in terms of the number who say 1 or 2 minus the number who say 4 or 5, ignoring those who say 3. The syntax is, minimally, majority opinion, pos(1 2) neg(4 5). Or suppose that we have data in a string variable candidate which we wish to summarize in terms of votes for "Bush" minus votes for "Gore". The syntax is, minimally, majority candidate, pos("Bush") neg("Gore").
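What the first syntax counts can be sketched outside Stata as follows. This is a Python illustration under stated assumptions, not the code of majority: the function majority_long and the data are hypothetical, and values listed in neither positive() nor negative() are assumed simply to be left out of the difference.

```python
from collections import Counter

def majority_long(values, positive, negative):
    """Return (majority, positive total, negative total) for one variable."""
    counts = Counter(values)
    pos = sum(counts[v] for v in positive)
    neg = sum(counts[v] for v in negative)
    return pos - neg, pos, neg

# Hypothetical responses on a 5 point scale; 3 is the neutral category
opinion = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5]
print(majority_long(opinion, positive=[1, 2], negative=[4, 5]))

# Hypothetical string-valued votes
candidate = ["Bush", "Gore", "Bush", "Nader", "Gore", "Bush"]
print(majority_long(candidate, positive=["Bush"], negative=["Gore"]))
```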
In the second syntax, suppose we have data in four numeric variables, Bush, Gore, Nader and others, which we wish to summarize in terms of the number who voted for Bush minus the number who voted for Gore. The syntax is, minimally, majority Bush Gore Nader others, pos(Bush) neg(Gore). If we wish to summarize in terms of the number who voted for Bush minus the number who voted for all other candidates, the syntax is, minimally, majority Bush Gore Nader others, pos(Bush) neg(Gore Nader others).
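The corresponding calculation for the wide structure can be sketched in the same way. Again this is Python, not Stata; the function majority_wide and the vote counts are hypothetical, and the sketch assumes that totals are taken over all observations when by() is not specified.

```python
# Hypothetical vote counts, one row per district and one column per candidate
rows = [
    {"Bush": 120, "Gore": 100, "Nader": 10, "others": 5},
    {"Bush": 80, "Gore": 95, "Nader": 7, "others": 3},
]

def majority_wide(rows, positive, negative):
    """Return (majority, positive total, negative total) summed over all rows."""
    pos = sum(row[name] for row in rows for name in positive)
    neg = sum(row[name] for row in rows for name in negative)
    return pos - neg, pos, neg

print(majority_wide(rows, ["Bush"], ["Gore"]))                     # Bush minus Gore
print(majority_wide(rows, ["Bush"], ["Gore", "Nader", "others"]))  # Bush minus all others
```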
by() specifies that reports are to be subdivided according to the distinct combinations of byvarlist. This is not a required option, but in practice it is perhaps the most useful handle provided by majority.
percent stipulates that positive and negative votes and the majority be reported as percents of the total. Thus majority Bush Gore Nader others, pos(Bush) neg(Gore Nader others) percent will report Bush minus all others as a percent of the total for all candidates, while majority Bush Gore, pos(Bush) neg(Gore) percent will report Bush minus Gore as a percent of their total.
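The effect of the variables supplied on the percent base can be sketched numerically. This is a Python illustration with made-up totals, not the code of majority; the function percent_majority is hypothetical, and the sketch assumes, as stated in the Description, that the base is the total over all the variables supplied.

```python
votes = {"Bush": 200, "Gore": 195, "Nader": 17, "others": 8}   # hypothetical totals

def percent_majority(votes, supplied, positive, negative):
    """Majority as a percent of the total over the variables supplied."""
    total = sum(votes[name] for name in supplied)
    pos = sum(votes[name] for name in positive)
    neg = sum(votes[name] for name in negative)
    return 100 * (pos - neg) / total

# Same positive and negative lists, different variables supplied
all_four = percent_majority(votes, ["Bush", "Gore", "Nader", "others"], ["Bush"], ["Gore"])
two_only = percent_majority(votes, ["Bush", "Gore"], ["Bush"], ["Gore"])

print(round(all_four, 2))   # base is the total for all four candidates
print(round(two_only, 2))   # base is the total for Bush and Gore only
```

The difference of 5 votes is the same in both calls; only the base changes, so the percent is larger when fewer variables are supplied.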
format() supplies a format to be used in displaying the results of positive, negative and majority. The default is %3.2f if percent is specified and %8.0g otherwise.
list_options refers to options of list, which is used to display results.
generate() generates up to three new variables, as follows. With one new variable name, the new variable contains the majority calculated. With two new variable names, the second contains the total(s) of positive votes. With three new variable names, the third contains the total(s) of negative votes. If percent is also specified, new variables will be in percent form. Note that each new variable will contain constants, unless by() is also specified, in which case each new variable will contain the same constant for each block defined by by().
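The remark that a generated variable is constant within each block defined by by() can be pictured with a small sketch. It is written in Python, not Stata, and is only an illustration of the idea: the function generate_majority and the data are hypothetical.

```python
from collections import defaultdict

# Hypothetical long-format data: (group, vote) for each observation
data = [("low", 2), ("low", 3), ("low", 2), ("high", 3), ("high", 3), ("high", 2)]

def generate_majority(data, positive, negative):
    """Return one majority value per observation, constant within each group."""
    pos = defaultdict(int)
    neg = defaultdict(int)
    for group, vote in data:
        if vote in positive:
            pos[group] += 1
        elif vote in negative:
            neg[group] += 1
    return [pos[group] - neg[group] for group, _ in data]

new_var = generate_majority(data, positive={2}, negative={3})
print(new_var)   # the same value is repeated for every observation in a group
```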
Examples
. use http://www.stata-press.com/data/r8/voter.dta
. majority candidat [w=frac], pos(2) neg(3)
. majority candidat [w=frac], pos(2) neg(3) by(inc)
. majority candidat [aw=pfrac], pos(2) neg(3) by(inc) format(%3.2f)
Author
Nicholas J. Cox, University of Durham
n.j.cox@durham.ac.uk
Also see
On-line: help for slideplot (if installed)