{smcl} {* *! version 1.0 05apr2011} {cmd:help dyads} {hline} {title:Title} {phang}{bf:dyads} {hline 2} Transform observations into dyads {title:Syntax} {phang}{bf:dyads} {it:idvar} [{cmd:,} {it:DYadvars(varlist)}] {title:Description} {pstd} {cmd:dyads} takes a set of {it:N} observations and returns a set of {it:(N(N-1))/2} dyads. Observations are identified by {it:idvar}, which must be specified. The user can also specify via {it:DYadvars()} any variables whose values should be added in for the second half of the created dyads; the default is to copy only the value of {it:idvar} for the other half of the dyad. {title:Remarks} {pstd}{cmd:dyads} is designed to take a file that has {it:N} observations, indexed by some {it:idvar}, and create a file that has {it:(N(N-1))/2} dyads representing all the possible pairs of observations in the original dataset. The command is meant to be used as a preface to calculating dyad-based statistics, as one might do for many types of network analyses. In addition to generating pairs of observation identifierss, {cmd:dyads} can include a user-specified collection of variables. {pstd}You could instead create dyads using Stata's matrix-programming capabilities. The advantages of {cmd:dyads} are that it doesn't require learning the matrix syntax for a discrete data-management job and it isn't limited by Stata's matrix-size boundaries. Thus it runs quickly on large datasets. It is worth noting that {cmd:dyads} produces {it:N*N} observations before finishing with {it:(N(N-1))/2} observations. This may be a memory issue with large datasets. {title:Examples} {pstd}The simplest example is to take a list of IDs and generate all pairs of IDs. Imagine the following dataset: {center:{cmd:N id}} {center:{hline 6}} {center:1 1} {center:2 2} {center:3 3} {center:4 4} {center:5 5} {pstd}Typing {phang}{cmd:. dyads id} {pstd}will produce the following dataset: {center:{cmd:N id id_d}} {center:{hline 11}} {center: 1 1 2} {center: 2 1 3} {center: 3 1 4} {center: 4 1 5} {center: 5 2 3} {center: 6 2 4} {center: 7 2 5} {center: 8 3 4} {center: 9 3 5} {center:10 4 5} {pstd}...that is, all the pairwise combinations of {it:id}. {pstd}A more useful example might be calculating the distance between pairs of objects. Imagine a dataset with firms and their {it:(x,y)} locations: {center:{cmd:firm x y}} {center:{hline 13}} {center:firma 12 13} {center:firmb 4 6} {center:firmc 10 2} {center:firmd 5 17} {phang}{cmd:. dyads firm, dy(x y)} {center:{cmd:firm x y firm_d x_d y_d}} {center:{hline 32}} {center:firma 12 13 firmb 4 6} {center:firma 12 13 firmc 10 2} {center:firma 12 13 firmd 5 17} {center:firmb 4 6 firmc 10 2} {center:firmb 4 6 firmd 5 17} {center:firmc 10 2 firmd 5 17} {pstd}You could now calculate distance for all the dyads using {phang}{cmd:. generate dist = sqrt((x-x_d)^2+(y-y_d)^2)} {pstd}A final example shows the utility of choosing {it:idvar} carefully. Imagine a dataset with basketball players, the times they entered the game (expressed in minutes since the start) and the times they exited: {center:{cmd:id timein timeout}} {center:{hline 19}} {center: 1 5 27} {center: 2 15 41} {center: 3 8 33} {center: 4 2 36} {center: 5 39 51} {center: 6 33 36} {pstd}Assume that you want to know the amount of time that each player-dyad spent on the court together. In order to generate that particular statistic, you might want to ensure that the player who came onto the court earlier is always the {it:first} member of the dyad. Creating a unique {it:idvar} on which to run {cmd:dyads} can help with this. {phang}{cmd:. sort timein id} {phang}{cmd:. generate sortid = _n} {phang}{cmd:. dyads sortid, dy(id timein timeout)} {center:{cmd:id timein timeout sortid sortid_d id_d timein_d timeout_d}} {center:{hline 64}} {center: 4 2 36 1 2 1 5 27} {center: 4 2 36 1 3 3 8 33} {center: 4 2 36 1 4 2 15 41} {center:...} {pstd}Notice that, in order to keep the original observation's identifier when generating the dyads, in this case the variable {it:id} was included inside {id:DYadvars()}. Calculation of overlaps could now proceed under the assumption that the first member of the dyad always had an earlier or equal starting time than the second member. {title:Author} {phang}John-Paul Ferguson {phang}ferguson_john-paul@gsb.stanford.edu