{smcl} {* 30april2003}{...} {hline} help for {hi:majority} {hline} {title:Majority calculations for real or hypothetical elections} {p 8 17 2} {cmd:majority} {it:varname} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] {cmd:,} {cmdab:pos:itive(}{it:valuelist}{cmd:)} {cmdab:neg:ative(}{it:valuelist}{cmd:)} [ {cmd:by(}{it:byvarlist}{cmd:)} {cmdab:perc:ent} {cmd:format(}{it:format}{cmd:)} {it:list_options} {cmdab:g:enerate(}{it:newvarlist}{cmd:)} ] {p 8 17 2} {cmd:majority} {it:varlist} [{it:weight}] [{cmd:if} {it:exp}] [{cmd:in} {it:range}] {cmd:,} {cmdab:pos:itive(}{it:varlist}{cmd:)} {cmdab:neg:ative(}{it:varlist}{cmd:)} [ {cmd:by(}{it:byvarlist}{cmd:)} {cmdab:perc:ent} {cmd:format(}{it:format}{cmd:)} {it:list_options} {cmdab:g:enerate(}{it:newvarlist}{cmd:)} ] {title:Description} {p 4 4 2} {cmd:majority} reports on voting in real or hypothetical elections. The {it:votes} cast are summarized as being {it:positive} or {it:negative} and the difference between these is reported. This difference is here called the {it:majority}, such terminology being a mild generalisation of usage in British English. {p 4 4 2}There are two syntaxes. The first is for a long data structure in which votes are recorded in a single variable, which may be numeric or string. The second is for a wide data structure in which votes are recorded in two or more variables, which must be numeric. Note that with both syntaxes the percents calculated with the {cmd:percent} option will depend on the totals as determined by all the variables supplied, except for any {cmd:if} and {cmd:in} restrictions and missing values. {p 4 4 2}{cmd:fweight}s and {cmd:aweight}s may be specified. {title:Remarks} {p 4 4 2}In 2000 the Electoral College of the United States voted as follows: George W. Bush 271; Al Gore 266; absentions 1. Hence George W. Bush was elected President of the United States by a majority of 5, or as a percent 5 / 538 = 0.93%. The majority for Bush is here # votes for - # votes against. More generally we can calculate a {it:majority} given a decision on which votes are counted as {it:positive} and which as {it:negative}. Polls are often summarized in terms of the difference between those who approve and those who disapprove, usually of some politician or policy. In a poll reported in the {it:Independent} [London] on 28 April 2003, 14% of (presumably British) adults were in favour of genetically modified food, and 56% against. From this it would seem that a referendum on the issue would yield a majority of 42% against. In a survey of grocery shoppers in Oxford, one question asked for agreement or disagreement with the statement "I find getting to grocery shops very tiring" on a 5 point scale running from {it:agree} to {it:disagree}. Two possibilities here are to calculate (1) (# {it:agree}) - (# {it:disagree}) and (2) (# {it:agree} + # {it:tend to agree}) - (# {it:disagree} + # {it:tend to disagree}). Here as elsewhere it usually seems best to ignore the neutral category in the middle of the scale, # {it:in between}. In electoral terms, we might guess, in the absence of any other information, that those who are undecided either might not vote at all or might be split equally into {it:positive} and {it:negative}: whatever we guess, the majority is unchanged. However, how to assign categories is a substantive decision, rather than a statistical decision. {p 4 4 2}More generally, the majority has various attractive properties. It is simple, and familiar to many people, largely because of its widespread use in reporting elections. It is a direct summary of the data which does not go beyond the categories given, as compared with say assigning some scores to the categories and averaging those. It is sensitive to variations, without being unstable: note that by comparison a ratio such as # {it:positive} / # {it:negative} is not only unstable but also indeterminate for zero denominators. As mentioned, the majority is unchanged if neutral or don't know categories are split equally, unlike any ratio. Perhaps the main disadvantage of the majority is that it does not have an obvious link to any specific family of models. {p 4 4 2}See also Wilkinson, L. 1999. {it:The grammar of graphics.} New York: Springer-Verlag and Zeisel, H. 1985. {it:Say it with figures.} 6th ed. New York: Harper & Row. {p 4 4 2}We deal here only with the simplest kind of voting. For more information on voting systems, see for example {browse "http://www.barnsdle.demon.co.uk/vote/vote.html":http://www.barnsdle.demon.co.uk/vote/vote.html}. {title:Options} {p 4 8 2}{cmd:positive()} and {cmd:negative()} are required options. With the first syntax, the argument of each is a {it:valuelist}, i.e. a numeric list or a list of string values. With the second syntax, it is a {it:varlist} of numeric variables. {p 8 8 2}In the first syntax, suppose we have data in a numeric variable {cmd:opinion} on a 5 point opinion scale which we wish to summarize in terms of the number who say 1 or 2 minus the number who say 4 or 5, ignoring those who say 3. The syntax is, minimally, {cmd:majority opinion, pos(1 2) neg(4 5)}. Or suppose that we have data in a string variable {cmd:candidate} which we wish to summarize in terms of votes for "Bush" minus votes for "Gore". The syntax is, minimally, {cmd:majority candidate, pos("Bush") neg("Gore")}. {p 8 8 2}In the second syntax, suppose we have data in four numeric variables, {cmd:Bush}, {cmd:Gore}, {cmd:Nader} and {cmd:others}, which we wish to summarize in terms of the number who voted for Bush minus the number who voted for Gore. The syntax is minimally {cmd:majority Bush Gore Nader others, pos(Bush) neg(Gore)}. If we wish to summarize in terms of the number who voted for Bush minus the number who voted for all other candidates, the syntax is minimally {cmd:majority Bush Gore Nader others, pos(Bush) neg(Gore Nader others)}. {p 4 8 2}{cmd:by()} specifies that reports are to be subdivided according to the distinct combinations of {it:byvarlist}. This is not a required option but in practice perhaps the most useful handle provided by {cmd:majority}. {p 4 8 2}{cmd:percent} stipulates that positive and negative votes and the majority be reported as percents of the total. Thus {cmd:majority Bush Gore Nader others, pos(Bush) neg(Gore Nader others) percent} will report Bush minus all others as a percent of the total for all candidates, while {cmd:majority Bush Gore, pos(Bush) neg(Gore) percent} will report Bush minus Gore as a percent of their total. {p 4 8 2}{cmd:format()} supplies a format to be used in displaying the results of {it:positive}, {it:negative} and {it:majority}. The default is %3.2f if {cmd:percent} is specified and %8.0g otherwise. {p 4 8 2}{it:list_options} refers to options of {help list}, which is used to display results. {p 4 8 2}{cmdab:g:enerate()} generates up to three new variables, as follows. With one new variable name, the new variable contains the {it:majority} calculated. With two new variable names, the second contains the total(s) of {it:positive} votes. With three new variable names, the third contains the total(s) of {it:negative} votes. If {cmd:percent} is also specified, new variables will be in percent form. Note that each new variable will contain constants, unless {cmd:by()} is also specified, in which case each new variable will contain the same constant for each block defined by {cmd:by()}. {title:Examples} {p 4 8 2}{inp:. use http://www.stata-press.com/data/r8/voter.dta}{p_end} {p 4 8 2}{inp:. majority candidat [w=frac], pos(2) neg(3)}{p_end} {p 4 8 2}{inp:. majority candidat [w=frac], pos(2) neg(3) by(inc)}{p_end} {p 4 8 2}{inp:. majority candidat [aw=pfrac], pos(2) neg(3) by(inc) format(%3.2f)} {title:Author} Nicholas J. Cox, University of Durham n.j.cox@durham.ac.uk {title:Also see} {p 0 19}On-line: help for {help slideplot} (if installed){p_end}