-------------------------------------------------------------------------------
help for countmatch
-------------------------------------------------------------------------------

Count matching values for one variable in another

countmatch var1 var2 [if] [in] [, generate(newvar) by(byvarlist) missing list_options]

Description

countmatch counts observations for which each distinct value of var1 is matched by (is equal to) var2, whether for the same observation or for some different observation(s). var1 and var2 should be both numeric or both string.

Options

generate() specifies the name of a new variable to hold information on match counts. If generate() is not specified, data and counts will be listed.

by() specifies that matching is to be carried out only within distinct groups defined by byvarlist. Observations with equal values must belong to the same group to count as matching.

missing indicates that missing values of var1 should be included in the comparison. By default, they are excluded.

list_options are options of list, which may be used to tune the output of any listing.

Remarks

---------------------------------------------------------------------------

Examples

For concreteness, consider data on friendships. Two variables are name and bestfriendname. Then countmatch name bestfriendname counts how many people name each person in name as their best friend in bestfriendname. This will include all those who name themselves as their own best friends.

Alternatively, two variables are name and friendname and each observation specifies a person and one of their friends, so that the data occur in blocks, one block for each person. Then countmatch name friendname counts how many people name each person in name as their friend in friendname. This will, again, include all those who name themselves as their own friends. The count will necessarily be the same for each observation on a particular person. Downstream of this you may wish to list each person and the corresponding count just once, and egen's tag() function offers a way to do this.

Doing this with by() adds a restriction: count only within distinct groups of byvarlist. You might be counting only friends of the same race or gender, for example. Getting all friends and all friends in the same group will allow you to determine all friends outside the same group by subtraction.

---------------------------------------------------------------------------
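As a rough sketch of the by() and subtraction idea above, the following do-file fragment builds a small made-up dataset and calls countmatch twice. It assumes countmatch is installed. The data and the variable names gender, nall, nsame, noutside and pickone are hypothetical and are not part of countmatch itself.

```stata
* Hypothetical toy data: one observation per (person, friend) pair.
* gender is taken to be the gender of the person in name (an assumption).
clear
input str8 name str8 friendname str1 gender
"ann" "bob" "f"
"ann" "cat" "f"
"bob" "ann" "m"
"cat" "ann" "f"
"cat" "bob" "f"
end

* All matches, then matches restricted to the same gender group
countmatch name friendname, generate(nall)
countmatch name friendname, generate(nsame) by(gender)

* Matches from outside the group follow by subtraction
gen long noutside = nall - nsame

* Show each person and the counts just once, using egen's tag()
egen byte pickone = tag(name)
list name nall nsame noutside if pickone
```

Because the counts are constant within each person's block, tagging one observation per person loses no information.

---------------------------------------------------------------------------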

Do-it-yourself

Although countmatch automates a solution, the following notes on how to do this for yourself may be interesting or useful.

We focus on a simple version of the problem. For different values of var1, how many values of var2 are the same?

We will need to loop over the distinct values of var1. Each time round the loop there will be a count, and then the result will be put into a variable in the right place(s). To do that we need to have a variable to put it in.

. gen long count = 0

initialises a counter variable. The long is cautious, just in case the counts get really big. Another variable type may well be fine for your problem. Initialising to missing (not 0) is another good way.

For toy examples, we can use levelsof confidently. (In an updated Stata 8, use levels instead.) Frequently, var1 and var2 are both string, so let us focus on that situation.

. levelsof var1, local(levels)

puts the distinct values into a local macro.

. quietly foreach l of local levels {
.     count if `"`l'"' == var2
.     replace count = r(N) if var1 == `"`l'"'
. }

gives a first solution. Compound double quotes `" "' are used just in case there are double quotes lurking in the strings. That may be unlikely, but it does no harm.

Now this code pivots on both variables being string. Also, in an industrial-strength solution, you would not want to rely on all the distinct values fitting into a macro, so levelsof may be set on one side. One thing we can always do is map the distinct values to successive integers:

. egen group = group(var1)
. su group, meanonly
. local ngroup = r(max)

egen, group() maps the distinct values of var1 to the integers 1,...,#groups; and we can retrieve #groups by a summarize and then peeking at the saved results. Initialise as before:

. gen long count = 0

Another variable will come in useful, holding the observation numbers. Then once again the counting is done in a loop.

. gen long obs = _n

. qui forval i = 1/`ngroup' {
.     su obs if group == `i', meanonly
.     local first = r(min)
.     count if var1[`first'] == var2
.     replace count = r(N) if group == `i'
. }

The loop uses a look-up technique. When we are focusing on group == 1, for example, how do we know what value of var1 we are dealing with? (By construction, var1 is constant for each distinct value of group - we set up a one-to-one mapping - but what is that constant?) Notice that it is not general enough to go

. su var1 if group == `i'

and look at the saved results, because in general var1 could be a string. We have to be one step more devious. We just need to find the observation number for any observation in a particular group, and then we can get at the corresponding value of var1. That is where the obs variable comes in useful. There are two saved results after a summarize that will work here, the minimum or the maximum, and you can choose. (The mean will not work in general: consider, for example, a group with just two representatives, in observation 8 and observation 10: the mean at 9 does not identify a representative.)

---------------------------------------------------------------------------
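To try the group-based loop above end to end, here is a minimal sketch on made-up data. The dataset and the names name and bestfriend are hypothetical and stand in for var1 and var2. Only official Stata commands are used, so countmatch need not be installed.

```stata
* Hypothetical toy data: name plays the role of var1, bestfriend of var2
clear
input str8 name str8 bestfriend
"ann" "bob"
"bob" "ann"
"cat" "ann"
"dan" "dan"
end

* Map distinct values of name to 1,...,#groups and retrieve #groups
egen group = group(name)
su group, meanonly
local ngroup = r(max)

gen long count = 0
gen long obs = _n

quietly forval i = 1/`ngroup' {
    * Look up one observation number that belongs to group `i'
    su obs if group == `i', meanonly
    local first = r(min)
    * Count observations whose bestfriend equals that group's name
    count if name[`first'] == bestfriend
    replace count = r(N) if group == `i'
}

list name bestfriend count
```

In this toy dataset ann is named twice, bob and dan once each (dan by himself), and cat not at all.

---------------------------------------------------------------------------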

Existence of match deducible from count of matches

Whether or not a match exists is determined by inrange(count,1,.).

---------------------------------------------------------------------------
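A small sketch of turning the count into a 0/1 indicator, assuming a counter variable named count as in the do-it-yourself notes. The data and the name anymatch are hypothetical. In Stata's inrange(), a missing upper limit is treated as no upper limit, and a missing first argument yields 0, so a counter that was initialised to missing and never replaced is reported as having no match.

```stata
* Hypothetical counts, including a zero and a missing value
clear
input str8 name long count
"ann" 2
"bob" 1
"cat" 0
"dan" .
end

* 1 if at least one match was counted, 0 otherwise (including missing)
gen byte anymatch = inrange(count, 1, .)

list name count anymatch
```

---------------------------------------------------------------------------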

Multiple variables

Given var1 and some varlist over which we wish to count matches, loop over varlist. This will fail if variables are not either all numeric or all string. One way of checking first is to use ds.

. qui foreach v of var varlist {
.     countmatch var1 `v', gen(`v'_m)
. }

---------------------------------------------------------------------------
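One possible way of doing the ds check mentioned above is sketched here. It relies on the has(type ...) option of ds and the r(varlist) result that ds leaves behind. The data and the local macro names candidates, ntotal, nstring and nnumeric are hypothetical.

```stata
* Hypothetical toy data with three string variables
clear
input str8 name str8 friend1 str8 friend2
"ann" "bob" "cat"
"bob" "ann" "ann"
"cat" "ann" "dan"
end

* Variables we intend to compare: var1 followed by varlist
local candidates name friend1 friend2
local ntotal : word count `candidates'

* How many of them are string, and how many numeric?
quietly ds `candidates', has(type string)
local nstring : word count `r(varlist)'

quietly ds `candidates', has(type numeric)
local nnumeric : word count `r(varlist)'

* Stop with a type mismatch error unless all are of one kind
if `nstring' != `ntotal' & `nnumeric' != `ntotal' {
    display as error "variables are a mix of numeric and string"
    error 109
}
```

If the check passes, the loop over varlist can go ahead as shown above.

---------------------------------------------------------------------------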

Matches in the same observation

Given var1 and some varlist over which we wish to count matches in the same observation, initialise a count variable and then loop over varlist. This will fail if variables are not either all numeric or all string. One way of checking first is to use ds.

. gen count = 0
. qui foreach v of var varlist {
.     replace count = count + (`v' == var1)
. }
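As an illustration only, the same-observation loop might look like this on made-up data, where name stands in for var1 and friend1 and friend2 (hypothetical names) stand in for varlist.

```stata
* Hypothetical toy data: each person lists two friends
clear
input str8 name str8 friend1 str8 friend2
"ann" "ann" "ann"
"bob" "cat" "bob"
"cat" "ann" "dan"
end

* Count, within each observation, how many friend variables equal name
gen byte count = 0
quietly foreach v of varlist friend1 friend2 {
    replace count = count + (`v' == name)
}

list name friend1 friend2 count
```

Here ann matches herself twice, bob once and cat not at all.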

Examples

. countmatch name bestfriend
. countmatch name bestfriend, gen(nfriends)

Author

Nicholas J. Cox, Durham University, U.K.
n.j.cox@durham.ac.uk

Acknowledgments

This is a rewriting of fndmtch2. The original problem was suggested by Brian Uzzi. A bug was reported by Socrates Mokkas, which prompted this rewriting. Marcello Pagano pointed out some unclear wording in this help.

See also

Online: help for duplicates; fndmtch (if installed)