{smcl}
{* *! version 1.0.13  10oct2019}{...}
{cmd:help dta2md}
{hline}

{title:Title}

{phang}
{bf:dta2md} {hline 2} Convert Stata system file to metadata{p_end}


{title:Table of contents}

    {help dta2md##syn:Syntax}
    {help dta2md##des:Description}
    {help dta2md##kno:Known issues}
    {help dta2md##exa:Examples}
    {help dta2md##ack:Acknowledgments}
    {help dta2md##aut:Author}

{marker syn}{...}
{title:Syntax}

{p 8 15 2}
{cmd:dta2md} {cmdab:in:put(}{it:filename}{cmd:)} {cmdab:freqvar:list(}{varlist}{cmd:)} {cmdab:out:put(}{it:filename}{cmd:)} [{cmdab:gr:oup(}{var}{cmd:)} {cmdab:re:place} {cmdab:miss:ingdef(}{it:exp}{cmd:)} {cmdab:smiss:ingdef(}{it:exp}{cmd:)} {cmdab:relat:ion(}{it:re}{cmd:)}]

{marker args}{...}

{synoptset 28}{...}
{synopthdr:Arguments}
{synoptline}
{synopt:{cmdab:in:put}}specifies the Stata system file to be converted.{p_end}

{synopt:{cmdab:freqvar:list}}specifies the variables for which value-level frequencies are computed.{p_end}

{synopt:{cmdab:out:put}}specifies where the metadata file is saved.{p_end}

{synopt:{cmdab:gr:oup}}specifies the group variable by which the metadata are stratified.{p_end}

{synopt:{cmdab:re:place}}specifies that an existing output file be overwritten.{p_end}

{synopt:{cmdab:miss:ingdef}}specifies an expression defining which values of numeric variables, other than system missing values, are treated as invalid.{p_end}

{synopt:{cmdab:smiss:ingdef}}specifies an expression defining which values of string (alphanumeric) variables, other than system missing values, are treated as invalid.{p_end}

{synopt:{cmdab:relat:ion}}specifies a regular expression that identifies variables related to a main variable. For example, if the suffix _flag marks the flag variable of the variable with the same name without the suffix, this relation is specified with {cmd:relation("(_flag)$")}.{p_end}

{marker des}{...}
{title:Description}

{p 4 4 2}
{cmd:dta2md} converts a Stata system file to aggregated metadata. Descriptive statistics are computed for all variables.
In addition, value-level frequencies can be generated for a specified subset of variables.
The ado can generate all metadata for the whole sample and, separately, for each level of a categorical variable.
A typical use case is international comparative data.

{marker kno}{...}
{title:Known issues}

{p 4 4 2}
Continuous variables: The metadata file contains one observation for each level of each variable listed in {cmd:freqvarlist}.
If continuous variables are accidentally included in {cmd:freqvarlist}, the metadata file becomes very large. Therefore, do not include continuous variables.
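
{p 4 4 2}
One way to check a candidate variable before adding it to {cmd:freqvarlist} is to count its distinct values. The following is only a sketch that uses official Stata commands on the auto data; it is not part of {cmd:dta2md}. After {cmd:tabulate}, the number of distinct nonmissing values is stored in {cmd:r(r)}.

{inp:. sysuse auto.dta, clear}
{txt}(1978 Automobile Data)

{inp:. quietly tabulate rep78}

{inp:. display r(r)}
{txt}5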

{p 4 4 2}
Encoding problems: The ado does not guarantee proper translation between encodings. If you run into problems, use {manhelp unicode_translate D:unicode translate} to convert your system file before you use the ado.
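
{p 4 4 2}
A possible conversion workflow is sketched below. The file name {cmd:mydata.dta} and the source encoding {cmd:ISO-8859-1} are placeholders chosen for illustration only; substitute your own file and its original encoding. The sketch first clears the data in memory and analyzes the file before translating it.

{inp:. clear}

{inp:. unicode analyze mydata.dta}

{inp:. unicode encoding set ISO-8859-1}

{inp:. unicode translate mydata.dta}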

{p 4 4 2}
Missing values: The ado uses system missings to compute the number of valid cases and the respective percentages of all cases. The options {cmd:missingdef} and {cmd:smissingdef} can be used to
specify additional missing values. This functionality is designed for a missing structure that is identical across all variables, e.g. all negative values should be treated as system missings.
It is possible to specify missing definitions that differ between variables, but the ado has not been fully tested for such applications. We advise using consistent missing definitions.
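
{p 4 4 2}
The effect of a missing definition can be checked by hand. The sketch below assumes that {cmd:X} in the expression stands for the variable being processed, as the usage in the examples of this help file suggests; it applies the definition {cmd:(X < 0) | missing(X)} to {cmd:rep78} in the auto data with official Stata commands only.

{inp:. sysuse auto.dta, clear}
{txt}(1978 Automobile Data)

{inp:. count if (rep78 < 0) | missing(rep78)}
{txt}  5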

{marker exa}{...}
{title:Examples}

{p 4 4 2}
This is a simple application of the ado that converts the auto data to metadata. Here, we compute frequencies only for the variable {cmd:rep78}. We use {cmd:foreign} as the group variable, i.e. we also get all statistics
for domestic and foreign cars separately. In the missing option, we define that all negative values in numeric variables should be treated as system missing.

{inp:. dta2md, input(`c(sysdir_stata)'auto.dta) output("auto_md.dta") freqvarlist(rep78) group(foreign) missingdef(`"(X < 0) | missing(X)"') smissingdef(missing(X)) replace}
{txt}Variables processed (12)
{txt}----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
{txt}............

{inp:. use auto_md.dta}

{inp:. list in 1/20, sepby(varName)}

{txt}     +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{txt}     | group   computed   varName        variableLabel   total_n   total_~g    min     max       Mean   Standa~n   value   valueL~l    n    percent   validP~t   isValid   first |
{txt}     |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
{txt}  1. |   all          0      make       Make and Model        74          0      .       .          .          .                       .          .          .         .       1 |
{txt}  2. |   DOM          0      make       Make and Model        52          0      .       .          .          .                       .          .          .         .       1 |
{txt}  3. |   FOR          0      make       Make and Model        22          0      .       .          .          .                       .          .          .         .       1 |
{txt}     |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
{txt}  4. |   all          0     price                Price        74          0   3291   15906   6165.257   2949.496                       .          .          .         .       1 |
{txt}  5. |   DOM          0     price                Price        52          0   3291   15906   6072.423   3097.104                       .          .          .         .       1 |
{txt}  6. |   FOR          0     price                Price        22          0   3748   12990   6384.682   2621.915                       .          .          .         .       1 |
{txt}     |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
{txt}  7. |   all          0       mpg        Mileage (mpg)        74          0     12      41    21.2973   5.785503                       .          .          .         .       1 |
{txt}  8. |   DOM          0       mpg        Mileage (mpg)        52          0     12      34   19.82692   4.743297                       .          .          .         .       1 |
{txt}  9. |   FOR          0       mpg        Mileage (mpg)        22          0     14      41   24.77273   6.611187                       .          .          .         .       1 |
{txt}     |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
{txt} 10. |   all          1     rep78   Repair Record 1978        74          5      1       5   3.405797   .9899323       1               2   2.702703   2.898551         1       1 |
{txt} 11. |   all          1     rep78   Repair Record 1978        74          5      1       5   3.405797   .9899323       2               8   10.81081    11.5942         1       0 |
{txt} 12. |   all          1     rep78   Repair Record 1978        74          5      1       5   3.405797   .9899323       3              30   40.54054   43.47826         1       0 |
{txt} 13. |   all          1     rep78   Repair Record 1978        74          5      1       5   3.405797   .9899323       4              18   24.32432   26.08696         1       0 |
{txt} 14. |   all          1     rep78   Repair Record 1978        74          5      1       5   3.405797   .9899323       5              11   14.86487   15.94203         1       0 |
{txt} 15. |   DOM          1     rep78   Repair Record 1978        52          4      1       5   3.020833    .837666       1               2   3.846154   4.166667         1       1 |
{txt} 16. |   DOM          1     rep78   Repair Record 1978        52          4      1       5   3.020833    .837666       2               8   15.38461   16.66667         1       0 |
{txt} 17. |   DOM          1     rep78   Repair Record 1978        52          4      1       5   3.020833    .837666       3              27   51.92308      56.25         1       0 |
{txt} 18. |   DOM          1     rep78   Repair Record 1978        52          4      1       5   3.020833    .837666       4               9   17.30769      18.75         1       0 |
{txt} 19. |   DOM          1     rep78   Repair Record 1978        52          4      1       5   3.020833    .837666       5               2   3.846154   4.166667         1       0 |
{txt} 20. |   FOR          1     rep78   Repair Record 1978        22          1      3       5   4.285714   .7171372       3               3   13.63636   14.28571         1       1 |
{txt}     +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

{p 4 4 2}{txt}
In the second example, we also use the option {cmd:relation} to define dependencies among variables, in order to distinguish attached variables from main variables.
We again use auto.dta, but assume that {cmd:length} is a flag variable related to {cmd:weight}. To indicate this, the variable must be named {cmd:weight_flag}.
In such a case, we add the option {cmd:relation("(_flag)$")}. For illustration, we change the auto file accordingly.

{inp:. sysuse auto.dta, clear}
{txt}(1978 Automobile Data)

{inp:. rename length weight_flag}

{inp:. save auto.dta, replace}
{txt}file auto.dta saved

{inp:. dta2md, input(`c(sysdir_stata)'auto.dta) output("auto_md.dta") freqvarlist(rep78) group(foreign) missingdef(`"(X < 0) | missing(X)"') smissingdef(missing(X)) relation("(_flag)$") replace}
{txt}Variables processed (12)
{txt}----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
{txt}............

{inp:. list in 29/34, sepby(varName)}

{txt}     +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{txt}     | group   computed       varName   mother   variableLabel   total_n   total_~g    min    max       Mean   Standa~n   value   valueL~l   n   percent   validP~t   isValid   first |
{txt}     |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
{txt} 29. |   all          0        weight            Weight (lbs.)        74          0   1760   4840   3019.459   777.1935                      .         .          .         .       1 |
{txt} 30. |   DOM          0        weight            Weight (lbs.)        52          0   1800   4840   3317.115   695.3638                      .         .          .         .       1 |
{txt} 31. |   FOR          0        weight            Weight (lbs.)        22          0   1760   3420   2315.909   433.0034                      .         .          .         .       1 |
{txt}     |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
{txt} 32. |   all          0   weight_flag   weight    Length (in.)        74          0    142    233   187.9324   22.26634                      .         .          .         .       1 |
{txt} 33. |   DOM          0   weight_flag   weight    Length (in.)        52          0    147    233   196.1346   20.04605                      .         .          .         .       1 |
{txt} 34. |   FOR          0   weight_flag   weight    Length (in.)        22          0    142    193   168.5455   13.68255                      .         .          .         .       1 |
{txt}     +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

{p 4 4 2}{txt}
The variable {cmd:mother} indicates that {cmd:weight} is the superordinate variable to which {cmd:weight_flag} is related.
Typical use cases would be additional variables that indicate the origin of missing values, and flag variables that mark imputed values.
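
{p 4 4 2}
To preview which variable names a pattern matches, Stata's regular expression functions can be called directly. This only illustrates the pattern {cmd:"(_flag)$"}; whether {cmd:dta2md} uses these functions internally is an assumption that this help file does not confirm.

{inp:. display regexm("weight_flag", "(_flag)$")}
{txt}1

{inp:. display regexr("weight_flag", "(_flag)$", "")}
{txt}weight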

{marker ack}{...}
{title:Acknowledgments}

{p 4 4 2}
Florian Thirolf, Anne Balz

{marker aut}{...}
{title:Author}

{pstd}Klaus Pforr, GESIS, klaus.pforr@gesis.org