{smcl}
{* 23aug2021}{...}
{cmd:help mata mm_crosswalk()}
{hline}
{title:Title}
{p 4 10 2}
{bf:mm_crosswalk() -- Translate values between classifications (bulk recoding)}
{title:Syntax}
{p 8 23 2}
{it:transmorphic vector}
{cmd:mm_crosswalk(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to} [{cmd:,} {it:d}{cmd:,}
{it:n}]{cmd:)}
{p 8 23 2}
{it:transmorphic vector}
{cmd:mm_crosswalk_hash(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to} [{cmd:,} {it:d}]{cmd:)}
{p 4 8 2}
where
{p 12 16 2}
{it:x}: {it:transmorphic vector} containing values to be translated
{p 9 16 2}
{it:from}: {it:transmorphic vector} containing the origin values of the
crosswalk dictionary; {it:from} must be of the same type as {it:x}; the values in
{it:from} are assumed to be unique (this is not checked)
{p 11 16 2}
{it:to}: {it:transmorphic vector} containing the destination values of the
crosswalk dictionary; {it:to} must have the same length as {it:from}
{p 12 16 2}
{it:d}: {it:transmorphic scalar} specifying a default destination value for
elements of {it:x} that do not have a match in {it:from}; alternatively, {it:d}
may be a {it:transmorphic vector} providing individual default values (must
have same length as {it:x}); in any case, {it:d} must be of the
same type as {it:to}; the default for {it:d} is {cmd:missingof(}{it:to}{cmd:)}
{p 12 16 2}
{it:n}: {it:real scalar} specifying the maximum length of the
index-based crosswalk vector; this is only relevant if {it:x} and {it:from}
are integer and nonmissing such that fast index-based translation is
possible; if the index-based crosswalk vector would be longer than
{it:n}, {cmd:mm_crosswalk()} automatically switches to the (slower but more
memory-efficient) hash-based translation algorithm; the default for
{it:n} is {cmd:1e6}
{p 16 16 2}
specify {it:n}<1 to enforce
the hash-based algorithm; specify {it:n}={cmd:.} to enforce the index-based
algorithm, provided {it:x} and {it:from} are integer and nonmissing; specify
{it:n}={cmd:.z} to enforce the index-based algorithm and skip all checks for
noninteger or missing values in {it:x} and {it:from}; use {it:n}={cmd:.z}
to save computing time if you know that {it:x} and {it:from} are
integer and nonmissing (the function may break or return invalid results
if these assumptions are not met)
{p 16 16 2}
in any case, usage of the index-based algorithm is only considered if {it:x}
and {it:from} have storage type {cmd:real}
{title:Description}
{pstd}
{cmd:mm_crosswalk()} translates {it:x} based on the dictionary provided
by {it:from} and {it:to}. That is, for each element in {it:x}, {cmd:mm_crosswalk()}
looks for a match in {it:from} and then returns the element from {it:to} that
has the same index as the matched element in {it:from}. Think of
{cmd:mm_crosswalk()} as a way to bulk-recode {it:x} where the element-by-element
pairs of {it:from} and {it:to} provide the recoding rules. The default value
{it:d} is returned for elements in {it:x} that have no match in {it:from}.
{pstd}
If feasible, {cmd:mm_crosswalk()} uses a fast translation technique based on
indexing. This requires all elements in {it:x} and {it:from} to be integer and
nonmissing (also see the description of argument {it:n} above). In all other
cases a hash-based algorithm is employed (implemented in terms of
{helpb mf_asarray:asarray()}). The hash-based algorithm is slower than the
index-based algorithm, but it works with any type of input.
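{pstd}
To illustrate the idea behind index-based translation, here is a minimal
sketch in plain Mata. It is not the actual implementation of
{cmd:mm_crosswalk()}; the names {cmd:lo}, {cmd:hi}, {cmd:cw}, and {cmd:res}
are hypothetical, and the sketch assumes that {it:x} and {it:from} are
nonmissing integer row vectors. A lookup vector spanning the range of the
values is prefilled with the default and then overwritten at the positions
given by {it:from}:

        . {stata "mata:"}
        : {stata x = (3,1,7,2)}
        : {stata from = (1,2,3)}
        : {stata to = (10,20,30)}
        : {stata lo = min((x,from))}
        : {stata hi = max((x,from))}
        : {stata cw = J(1, hi-lo+1, -1)}
        : {stata cw[from :- lo :+ 1] = to}
        : {stata res = cw[x :- lo :+ 1]}
        : {stata res}
        : {stata end}

{pstd}
Here -1 plays the role of {it:d}, so {cmd:res} is (30,10,-1,20). In this
sketch the length of the lookup vector grows with the range of the values
rather than with their number, which is presumably why argument {it:n}
places a cap on it.
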
{pstd}
Function {cmd:mm_crosswalk_hash()} directly calls the hash-based algorithm.
{title:Examples}
{pstd}
Input and output may be of different types:

        . {stata "mata:"}
        : {stata x = (1,2,3,4,5)'}
        : {stata from = (2,3)}
        : {stata to = ("two","three")}
        : {stata mm_crosswalk(x, from, to, "--")}
        : {stata end}

        . {stata "mata:"}
        : {stata x = ("one","two","three","four","five")}
        : {stata from = ("two","three")}
        : {stata to = (2,3)}
        : {stata mm_crosswalk(x, from, to, .a)}
        : {stata end}

{pstd}
Partial recoding (elements without a match keep their original values
because {it:x} itself is passed as {it:d}):

        . {stata "mata:"}
        : {stata x = (1,2,3,4,5)}
        : {stata mm_crosswalk(x, (2,3), (3,2), x)}
        : {stata end}
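
{pstd}
Choosing the translation algorithm: the following sketch uses the settings
of {it:n} described under Syntax, with {it:n}=0 to enforce the hash-based
algorithm and {it:n}={cmd:.} to enforce the index-based algorithm. The data
are made up for illustration, and {it:d} is passed explicitly as {cmd:.} so
that {it:n} can be specified. Both calls should return the same result, since
the algorithms are described as differing only in speed and memory use:

        . {stata "mata:"}
        : {stata x = (1,2,3,4,5)}
        : {stata from = (2,3)}
        : {stata to = (20,30)}
        : {stata mm_crosswalk(x, from, to, ., 0)}
        : {stata mm_crosswalk(x, from, to, ., .)}
        : {stata end}
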
{title:Conformability}
{cmd:mm_crosswalk(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to}{cmd:,} {it:d}{cmd:,} {it:n}{cmd:)}
{it:x}: {it:r x} 1 or 1 {it:x r}
{it:from}: {it:l x} 1 or 1 {it:x l}
{it:to}: {it:l x} 1 or 1 {it:x l}
{it:d}: 1 {it:x} 1 or {it:r x} 1 or 1 {it:x r}
{it:n}: 1 {it:x} 1
{it:result}: {it:r x} 1 or 1 {it:x r} (same orientation as {it:x})
{cmd:mm_crosswalk_hash(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to}{cmd:,} {it:d}{cmd:)}
{it:x}: {it:r x} 1 or 1 {it:x r}
{it:from}: {it:l x} 1 or 1 {it:x l}
{it:to}: {it:l x} 1 or 1 {it:x l}
{it:d}: 1 {it:x} 1 or {it:r x} 1 or 1 {it:x r}
{it:result}: {it:r x} 1 or 1 {it:x r} (same orientation as {it:x})
{pstd}
The orientation of the vectors does not matter for conformability; only their
length is relevant.
{title:Diagnostics}
{pstd}
The values in {it:from} are assumed to be unique so that the dictionary
defined by {it:from} and {it:to} is unambiguous (although not necessarily
bijective). Returned results will be arbitrary if this assumption is
not met.
{pstd}
The functions return void if {it:x} is void.
{pstd}
The functions return defaults as specified by {it:d} if {it:from} and
{it:to} are void.
{pstd}
Missing values are treated like any other values.
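{pstd}
For instance, a missing value can serve as a dictionary key. In the following
sketch (the data are made up for illustration), elements of {it:x} equal to
{cmd:.} are translated to "sysmiss", and elements without a match in
{it:from} receive the default "other":

        . {stata "mata:"}
        : {stata x = (1,.,2)}
        : {stata from = (.,1)}
        : {stata to = ("sysmiss","one")}
        : {stata mm_crosswalk(x, from, to, "other")}
        : {stata end}
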
{title:Source code}
{pstd}
{help moremata11_source##mm_crosswalk:mm_crosswalk.mata}
{title:Author}
{pstd} Ben Jann, University of Bern, ben.jann@unibe.ch
{title:Also see}
{psee}
Online: help for
{helpb moremata}, {helpb mf_editvalue:editvalue()}, {helpb mf_asarray:asarray()}
{p_end}