help dyads -------------------------------------------------------------------------------

Title

dyads -- Transform observations into dyads

Syntax

dyads idvar [, DYadvars(varlist)]

Description

dyads takes a set of N observations and returns a set of (N(N-1))/2 dyads. Observations are identified by idvar, which must be specified. The user can also specify via DYadvars() any variables whose values should be added in for the second half of the created dyads; the default is to copy only the value of idvar for the other half of the dyad.

Remarks

dyads is designed to take a file that has N observations, indexed by some idvar, and create a file that has (N(N-1))/2 dyads representing all the possible pairs of observations in the original dataset. The command is meant to be used as a preface to calculating dyad-based statistics, as one might do for many types of network analyses. In addition to generating pairs of observation identifiers, dyads can include a user-specified collection of variables.

You could instead create dyads using Stata's matrix-programming capabilities. The advantages of dyads are that it does not require learning the matrix syntax for a discrete data-management job and that it is not limited by Stata's matrix-size boundaries, so it remains usable, and runs quickly, on large datasets. Note, however, that dyads creates N*N observations before reducing them to the final (N(N-1))/2 observations. With large datasets, this intermediate step may cause memory problems.
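To make the N*N intermediate step concrete, here is a minimal sketch of one way to build the same pairs with Stata's built-in cross command. This is an assumption for illustration, not a description of how dyads is implemented; the five-observation dataset and the tempfile name partner are hypothetical.

```stata
* Sketch only: an assumed cross-based route to the same pairs,
* not the code that dyads itself runs.
clear
set obs 5
generate id = _n

* Save a renamed copy to serve as the second member of each pair
preserve
rename id id_d
tempfile partner
save `partner'
restore

* cross forms every pairwise combination: N*N = 25 observations
cross using `partner'
count

* Keep each unordered pair once: (N(N-1))/2 = 10 observations
keep if id < id_d
sort id id_d
count
```

The first count reports 25 observations and the second reports 10, which shows why peak memory use grows with the square of N even though the final dataset is roughly half that size.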

Examples

The simplest example is to take a list of IDs and generate all pairs of IDs. Imagine the following dataset:

      N   id
     -------
      1    1
      2    2
      3    3
      4    4
      5    5

Typing

. dyads id

will produce the following dataset:

      N   id   id_d
     --------------
      1    1      2
      2    1      3
      3    1      4
      4    1      5
      5    2      3
      6    2      4
      7    2      5
      8    3      4
      9    3      5
     10    4      5

...that is, all the pairwise combinations of id.

A more useful example might be calculating the distance between pairs of objects. Imagine a dataset with firms and their (x,y) locations:

     firm     x    y
     ---------------
     firma   12   13
     firmb    4    6
     firmc   10    2
     firmd    5   17

. dyads firm, dy(x y)

     firm     x    y   firm_d   x_d   y_d
     ------------------------------------
     firma   12   13   firmb      4     6
     firma   12   13   firmc     10     2
     firma   12   13   firmd      5    17
     firmb    4    6   firmc     10     2
     firmb    4    6   firmd      5    17
     firmc   10    2   firmd      5    17

You could now calculate distance for all the dyads using

. generate dist = sqrt((x-x_d)^2+(y-y_d)^2)
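As a worked check of the distance formula, the sketch below enters the dyad dataset from the table above by hand, so that it runs without calling dyads, and then applies the same generate command. The use of input and list here is only for illustration.

```stata
* Dyad dataset copied from the example output of: dyads firm, dy(x y)
clear
input str5 firm x y str5 firm_d x_d y_d
"firma" 12 13 "firmb"  4  6
"firma" 12 13 "firmc" 10  2
"firma" 12 13 "firmd"  5 17
"firmb"  4  6 "firmc" 10  2
"firmb"  4  6 "firmd"  5 17
"firmc" 10  2 "firmd"  5 17
end

* Euclidean distance between the two members of each dyad
generate dist = sqrt((x - x_d)^2 + (y - y_d)^2)
list firm firm_d dist, noobs
```

For the first dyad, firma and firmb differ by 8 in x and by 7 in y, so dist is sqrt(64 + 49) = sqrt(113), or about 10.63.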

A final example shows the utility of choosing idvar carefully. Imagine a dataset with basketball players, the times they entered the game (expressed in minutes since the start) and the times they exited:

     id   timein   timeout
     ---------------------
      1        5        27
      2       15        41
      3        8        33
      4        2        36
      5       39        51
      6       33        36

Assume that you want to know the amount of time that each player-dyad spent on the court together. In order to generate that particular statistic, you might want to ensure that the player who came onto the court earlier is always the first member of the dyad. Creating a unique idvar on which to run dyads can help with this.

. sort timein id

. generate sortid = _n

. dyads sortid, dy(id timein timeout)

     id   timein   timeout   sortid   sortid_d   id_d   timein_d   timeout_d
     -----------------------------------------------------------------------
      4        2        36        1          2      1          5          27
      4        2        36        1          3      3          8          33
      4        2        36        1          4      2         15          41
     ...

Notice that, in order to keep the original observation's identifier when generating the dyads, the variable id was included inside DYadvars(). Calculation of overlaps could now proceed under the assumption that the first member of the dyad always has a starting time earlier than or equal to that of the second member.
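One possible way to compute the overlap is sketched below. It relies on the ordering guarantee just described: because the first member entered no later than the second, the shared time starts at timein_d and ends at the earlier of the two exit times. The variable name overlap is hypothetical, and only the three dyads shown in the table above are entered.

```stata
* First three dyads from the example output of:
*   dyads sortid, dy(id timein timeout)
clear
input id timein timeout sortid sortid_d id_d timein_d timeout_d
4 2 36 1 2 1  5 27
4 2 36 1 3 3  8 33
4 2 36 1 4 2 15 41
end

* Shared court time: starts when the later entrant arrives (timein_d),
* ends at the earlier exit; max(0, .) handles dyads that never overlap
generate overlap = max(0, min(timeout, timeout_d) - timein_d)
list id id_d overlap, noobs
```

Player 4 and player 1, for instance, share the court from minute 5 to minute 27, giving an overlap of 22 minutes. Without the sort on timein, the formula would also need max(timein, timein_d) in place of timein_d.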

Author

John-Paul Ferguson