{smcl}
{* 23aug2021}{...}
{cmd:help mata mm_crosswalk()}
{hline}

{title:Title}

{p 4 10 2}
{bf:mm_crosswalk() -- Translate values between classifications (bulk recoding)}


{title:Syntax}

{p 8 23 2}
{it:transmorphic vector}
{cmd:mm_crosswalk(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to} [{cmd:,} {it:d}{cmd:,} {it:n}]{cmd:)}

{p 8 23 2}
{it:transmorphic vector}
{cmd:mm_crosswalk_hash(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to} [{cmd:,} {it:d}]{cmd:)}

{p 4 8 2}
where

{p 12 16 2}
{it:x}:  {it:transmorphic vector} containing values to be translated

{p 9 16 2}
{it:from}:  {it:transmorphic vector} containing the origin values of the
crosswalk dictionary; {it:from} must be of the same type as {it:x}; the values
in {it:from} are assumed to be unique (this is not checked)

{p 11 16 2}
{it:to}:  {it:transmorphic vector} containing the destination values of the
crosswalk dictionary; {it:to} must have the same length as {it:from}

{p 12 16 2}
{it:d}:  {it:transmorphic scalar} specifying a default destination value for
elements of {it:x} that do not have a match in {it:from}; alternatively,
{it:d} may be a {it:transmorphic vector} providing individual default values
(must have the same length as {it:x}); in any case, {it:d} must be of the same
type as {it:to}; the default for {it:d} is {cmd:missingof(}{it:to}{cmd:)}

{p 12 16 2}
{it:n}:  {it:real scalar} specifying the maximum length of the index-based
crosswalk vector; this is only relevant if {it:x} and {it:from} are integer and
nonmissing such that fast index-based translation is possible; if the
index-based crosswalk vector would be longer than {it:n}, {cmd:mm_crosswalk()}
automatically switches to the (slower but more memory-efficient) hash-based
translation algorithm; the default for {it:n} is {cmd:1e6}

{p 16 16 2}
specify {it:n}<1 to enforce the hash-based algorithm; specify {it:n}=.
to enforce the index-based algorithm, provided {it:x} and {it:from} are integer
and nonmissing; specify {it:n}={cmd:.z} to enforce the index-based algorithm
and skip any checks for noninteger or missing values in {it:x} and {it:from};
use {it:n}={cmd:.z} to save computing time if you know that {it:x} and
{it:from} are integer and nonmissing (the function may break or return invalid
results if these assumptions are not met)

{p 16 16 2}
in any case, the index-based algorithm is considered only if {it:x} and
{it:from} have storage type {cmd:real}


{title:Description}

{pstd}
{cmd:mm_crosswalk()} translates {it:x} based on the dictionary provided by
{it:from} and {it:to}. That is, for each element in {it:x},
{cmd:mm_crosswalk()} looks for a match in {it:from} and then returns the
element from {it:to} that has the same index as the matched element in
{it:from}. Think of {cmd:mm_crosswalk()} as a way to bulk-recode {it:x} where
the element-by-element pairs of {it:from} and {it:to} provide the recoding
rules. Value {it:d} is returned for elements in {it:x} that have no match in
{it:from}.

{pstd}
If feasible, {cmd:mm_crosswalk()} uses a fast translation technique based on
indexing. This requires all elements in {it:x} and {it:from} to be integer and
nonmissing (also see the description of argument {it:n} above). In all other
cases a hash-based algorithm is employed (implemented using
{helpb mf_asarray:asarray()}). The hash-based algorithm is slower than the
index-based algorithm, but it works with any type of input.

{pstd}
Function {cmd:mm_crosswalk_hash()} directly calls the hash-based algorithm.


{title:Examples}

{pstd}
Input and output may be of different types:

        . {stata "mata:"}
        : {stata x = (1,2,3,4,5)'}
        : {stata from = (2,3)}
        : {stata to = ("two","three")}
        : {stata mm_crosswalk(x, from, to, "--")}
        : {stata end}

        . {stata "mata:"}
        : {stata x = ("one","two","three","four","five")}
        : {stata from = ("two","three")}
        : {stata to = (2,3)}
        : {stata mm_crosswalk(x, from, to, .a)}
        : {stata end}

{pstd}
Partial recoding:

        . {stata "mata:"}
        : {stata x = (1,2,3,4,5)}
        : {stata mm_crosswalk(x, (2,3), (3,2), x)}
        : {stata end}


{title:Conformability}

    {cmd:mm_crosswalk(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to}{cmd:,} {it:d}{cmd:,} {it:n}{cmd:)}
             {it:x}:  {it:n x} 1 or 1 {it:x n}
          {it:from}:  {it:l x} 1 or 1 {it:x l}
            {it:to}:  {it:l x} 1 or 1 {it:x l}
             {it:d}:  1 {it:x} 1 or {it:n x} 1 or 1 {it:x n}
             {it:n}:  1 {it:x} 1
        {it:result}:  {it:n x} 1 or 1 {it:x n} (same orientation as {it:x})

    {cmd:mm_crosswalk_hash(}{it:x}{cmd:,} {it:from}{cmd:,} {it:to}{cmd:,} {it:d}{cmd:)}
             {it:x}:  {it:n x} 1 or 1 {it:x n}
          {it:from}:  {it:l x} 1 or 1 {it:x l}
            {it:to}:  {it:l x} 1 or 1 {it:x l}
             {it:d}:  1 {it:x} 1 or {it:n x} 1 or 1 {it:x n}
        {it:result}:  {it:n x} 1 or 1 {it:x n} (same orientation as {it:x})

{pstd}
The orientation of the vectors does not matter for conformability; only their
length is relevant.


{title:Diagnostics}

{pstd}
The values in {it:from} are assumed to be unique such that the dictionary
defined by {it:from} and {it:to} is unambiguous (although not necessarily
bijective). Returned results will be arbitrary if this assumption is not met.

{pstd}
The functions return void if {it:x} is void.

{pstd}
The functions return defaults as specified by {it:d} if {it:from} and {it:to}
are void.

{pstd}
Missing values are treated like any other values.


{title:Source code}

{pstd}
{help moremata11_source##mm_crosswalk:mm_crosswalk.mata}


{title:Author}

{pstd}
Ben Jann, University of Bern, ben.jann@unibe.ch


{title:Also see}

{psee}
Online:  help for
{helpb moremata}, {helpb mf_editvalue:editvalue()}, {helpb mf_asarray:asarray()}
{p_end}