```-------------------------------------------------------------------------------
help for majority
-------------------------------------------------------------------------------

Majority calculations for real or hypothetical elections

majority varname [weight] [if exp] [in range] , positive(valuelist)
negative(valuelist) [ by(byvarlist) percent format(format)
list_options generate(newvarlist) ]

majority varlist [weight] [if exp] [in range] , positive(varlist)
negative(varlist) [ by(byvarlist) percent format(format)
list_options generate(newvarlist) ]

Description

majority reports on voting in real or hypothetical elections.  The votes
cast are summarized as being positive or negative and the difference
between these is reported.  This difference is here called the majority,
such terminology being a mild generalisation of usage in British English.

There are two syntaxes. The first is for a long data structure in which
votes are recorded in a single variable, which may be numeric or string.
The second is for a wide data structure in which votes are recorded in
two or more variables, which must be numeric. Note that with both
syntaxes the percents calculated with the percent option depend on the
totals determined by all the variables supplied, after observations
excluded by if or in restrictions or by missing values are omitted.

fweights and aweights may be specified.

Remarks

In 2000 the Electoral College of the United States voted as follows:
George W. Bush 271; Al Gore 266; abstentions 1. Hence George W. Bush was
elected President of the United States by a majority of 5, or as a
percent 5 / 538 = 0.93%. The majority for Bush is here # votes for - #
votes against.  More generally we can calculate a majority given a
decision on which votes are counted as positive and which as negative.
Polls are often summarized in terms of the difference between those who
approve and those who disapprove, usually of some politician or policy.
In a poll reported in the Independent [London] on 28 April 2003, 14% of
(presumably British) adults were in favour of genetically modified food,
and 56% against. From this it would seem that a referendum on the issue
would yield a majority of 42% against.  In a survey of grocery shoppers
in Oxford, one question asked for agreement or disagreement with the
statement "I find getting to grocery shops very tiring" on a 5 point
scale running from agree to disagree.  Two possibilities here are to
calculate (1) (# agree) - (# disagree) and (2) (# agree + # tend to
agree) - (# disagree + # tend to disagree). Here as elsewhere it usually
seems best to ignore the neutral category in the middle of the scale, #
in between. In electoral terms, we might guess, in the absence of any
other information, that those who are undecided either might not vote at
all or might be split equally into positive and negative:  whatever we
guess, the majority is unchanged.  However, how to assign categories is a
substantive decision, rather than a statistical decision.

More generally, the majority has various attractive properties. It is
simple, and familiar to many people, largely because of its widespread
use in reporting elections. It is a direct summary of the data which does
not go beyond the categories given, as compared with say assigning some
scores to the categories and averaging those. It is sensitive to
variations, without being unstable: note that by comparison a ratio such
as # positive / # negative is not only unstable but also indeterminate
for zero denominators.  As mentioned, the majority is unchanged if
neutral or don't know categories are split equally, unlike any ratio.
Perhaps the main disadvantage of the majority is that it does not have an
obvious link to any specific family of models.

Zeisel, H. 1985.  Say it with figures. 6th ed. New York: Harper & Row.

We deal here only with the simplest kind of voting.  For more information
on voting systems, see for example
http://www.barnsdle.demon.co.uk/vote/vote.html.

Options

positive() and negative() are required options.  With the first syntax,
the argument of each is a valuelist, i.e. a numeric list or a list of
string values. With the second syntax, it is a varlist of numeric
variables.

In the first syntax, suppose we have data in a numeric variable
opinion on a 5 point opinion scale which we wish to summarize in
terms of the number who say 1 or 2 minus the number who say 4 or 5,
ignoring those who say 3. The syntax is, minimally, majority opinion,
pos(1 2) neg(4 5).  Or suppose that we have data in a string variable
candidate which we wish to summarize in terms of votes for "Bush"
minus votes for "Gore". The syntax is, minimally, majority candidate,
pos("Bush") neg("Gore").

In the second syntax, suppose we have data in four numeric variables,
Bush, Gore, Nader and others, which we wish to summarize in terms of
the number who voted for Bush minus the number who voted for Gore.
The syntax is minimally majority Bush Gore Nader others, pos(Bush)
neg(Gore).  If we wish to summarize in terms of the number who voted
for Bush minus the number who voted for all other candidates, the
syntax is minimally majority Bush Gore Nader others, pos(Bush)
neg(Gore Nader others).

by() specifies that reports are to be subdivided according to the
distinct combinations of byvarlist.  This is not a required option,
but in practice it is perhaps the most useful handle provided by majority.

percent stipulates that positive and negative votes and the majority be
reported as percents of the total. Thus majority Bush Gore Nader
others, pos(Bush) neg(Gore Nader others) percent will report Bush
minus all others as a percent of the total for all candidates, while
majority Bush Gore, pos(Bush) neg(Gore) percent will report Bush
minus Gore as a percent of their total.

format() supplies a format to be used in displaying the results of
positive, negative and majority. The default is %3.2f if percent is
specified and %8.0g otherwise.

list_options refers to options of list, which is used to display results.

generate() generates up to three new variables, as follows. With one new
variable name, the new variable contains the majority calculated.
With two new variable names, the second contains the total(s) of
positive votes. With three new variable names, the third contains the
total(s) of negative votes.  If percent is also specified, new
variables will be in percent form. Note that each new variable will
contain constants, unless by() is also specified, in which case each
new variable will contain the same constant for each block defined by
by().

Examples

. use http://www.stata-press.com/data/r8/voter.dta
. majority candidat [w=frac], pos(2) neg(3)
. majority candidat [w=frac], pos(2) neg(3) by(inc)
. majority candidat [aw=pfrac], pos(2) neg(3) by(inc) format(%3.2f)

Author

Nicholas J. Cox, University of Durham
n.j.cox@durham.ac.uk

Also see

On-line:  help for slideplot (if installed)
```
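As a worked illustration of the arithmetic in the Remarks, the sketch below enters the 2000 Electoral College tallies quoted there as a small dataset and computes the majority two ways: by hand with `display`, and with `majority` using the first (long) syntax with frequency weights. The variable names `outcome` and `votes` are hypothetical and not part of the help file, as are the agree/neutral/disagree counts (40, 20, 30) in the final part, which checks the claim that splitting a neutral category equally leaves the majority unchanged while altering a ratio. The `majority` calls assume the command is installed and behaves as documented above.

```stata
* Hypothetical dataset: one observation per outcome, with the vote count
* held in votes so that it can be used as a frequency weight.
clear
input str10 outcome votes
"Bush"       271
"Gore"       266
"abstention"   1
end

* By hand: the majority as a count, then as a percent of all votes cast
display 271 - 266
display %4.2f 100 * (271 - 266) / (271 + 266 + 1)

* The same summary via majority (first syntax, string valuelist)
majority outcome [fweight=votes], positive("Bush") negative("Gore")
majority outcome [fweight=votes], positive("Bush") negative("Gore") percent

* Hypothetical counts: 40 agree, 20 neutral, 30 disagree.
* Splitting the neutrals equally leaves the difference unchanged ...
display 40 - 30
display (40 + 20/2) - (30 + 20/2)

* ... but changes the ratio of positive to negative
display 40 / 30
display (40 + 20/2) / (30 + 20/2)
```

If the percent denominator works as the Description indicates, the second `majority` call should use all 538 votes, including the abstention, and so should agree with the hand calculation.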