{smcl}
{hline}
help for {cmd:cprdenttype} and {cmd:cprdentcode}{right:(Roger Newson)}
{hline}

{title:Converting {browse "http://www.cprd.com":CPRD} entity string data of any {cmd:enttype} to numeric}

{p 8 21 2}
{cmd:cprdenttype} {it:#} {cmd:using} {it:filename} [ , {opt lloo:kuplist(name)} {opt ldes:clist(name)} ]{p_end}

{p 8 21 2}
{cmd:cprdentcode} {ifin} , {opt g:enerate(stub)} {opt loo:kuplist(string_list)} [ {opt des:clist(string_list)} {opt do:file(filename)} {opt float} ]{p_end}


{title:Description}

{pstd}
The {cmd:cprdenttype} package is intended for use with datasets produced by the {helpb cprdutil} package,
to convert CPRD entity string data variables to numeric variables.
{cmd:cprdenttype} inputs an integer number, assumed to be a CPRD {cmd:enttype} (or entity type) value,
and a filename, assumed to be a Stata dataset produced by the {helpb cprd_entity} module of {helpb cprdutil},
with one observation for each {cmd:enttype} value known to CPRD.
It outputs descriptive information about the entity type with the specified {cmd:enttype} value,
and, optionally, two local macros, storing lists of CPRD lookups and descriptive labels
for decoding the string data variables used by that {cmd:enttype}.

{pstd}
{cmd:cprdentcode} is used with test and additional clinical datasets,
produced by the {helpb cprd_test} and {helpb cprd_additional} modules of {helpb cprdutil}, respectively.
It inputs the lists stored in the local macros output by {cmd:cprdenttype},
and uses them to convert the string entity data variables in the dataset
into a list of numeric variables with the correct formats, variable labels and/or value labels,
assuming that the test or additional clinical observations have the {cmd:enttype} value
originally specified by the user to {cmd:cprdenttype}.
{cmd:cprdentcode} requires the {help ssc:SSC} package {helpb chardef} in order to work.

{title:Options for {cmd:cprdenttype}}

{phang}
{opt llookuplist(name)} specifies the name of a local macro,
to be set by {cmd:cprdenttype} to contain a list of lookups for the specified {cmd:enttype}.
The lookup list for an entity type specifies how each string variable {cmd:data}{it:i} should be converted,
for values of {it:i} from 1 to the number of string data fields used by that {cmd:enttype} value.
The value {cmd:"dd/mm/yyyy"} specifies that the field will be input as a date in {cmd:"DMY"} format.
A 3-character name equal to the name of a CPRD {help cprd_xyzlookup:{it:XYZ} lookup},
of the form {cmd:"{it:XYZ}"}, specifies that the field will be input as a number,
and then assigned the {help label:value label} specified by the {it:XYZ} lookup
(provided that an appropriate {cmd:dofile()} option is supplied to {cmd:cprdentcode}).
Other values, including the empty string {cmd:""}, specify that the field will be input simply as a number,
with a default format and no value label.

{phang}
{opt ldesclist(name)} specifies the name of a local macro,
to be set by {cmd:cprdenttype} to contain a list of descriptive labels for the specified {cmd:enttype}.
The descriptive label list for an entity type specifies the recommended {help label:variable labels}
for the new numeric variables generated from the string variables {cmd:data}{it:i},
for values of {it:i} from 1 to the number of string data fields used by that {cmd:enttype} value.


{title:Options for {cmd:cprdentcode}}

{phang}
{opt generate(stub)} specifies a prefix (or stub) to be used to name the generated numeric variables.
These generated variables will be named {it:stub}{cmd:1} to {it:stub}{it:N},
where {it:N} is the number of string elements supplied in the {cmd:lookuplist()} option (see below).
If the {cmd:lookuplist()} option is set to the contents of the local macro
specified by the {cmd:llookuplist()} option of {cmd:cprdenttype},
then the number of list elements will be equal to the number of string data fields
used by the {cmd:enttype} value supplied to {cmd:cprdenttype} by the user.

{phang}
{opt lookuplist(string_list)} specifies a list of quoted string values,
one for each of the data fields used by the entity type,
usually supplied as the contents of the local macro generated by the {cmd:llookuplist()} option of {cmd:cprdenttype}.
The list is used to specify how the string variables are coded to numeric variables.

{phang}
{opt desclist(string_list)} specifies a list of quoted string values,
one for each of the data fields used by the entity type,
usually supplied as the contents of the local macro generated by the {cmd:ldesclist()} option of {cmd:cprdenttype}.
The list is used to specify {help label:variable labels} for the newly generated numeric variables.

{phang}
{opt dofile(filename)} specifies the name of an existing {help do:Stata do-file},
usually created using the {cmd:dofile()} option of the {helpb cprd_xyzlookup} module of the {helpb cprdutil} package.
This do-file is executed by {cmd:cprdentcode}, and usually creates a long list of {help label:value labels},
some of which are then assigned to the numeric variables generated by {cmd:cprdentcode}.

{phang}
{opt float} specifies that the generated non-date variables must not have a precision higher than {cmd:float},
so that no {cmd:double} variables are created.


{title:Remarks}

{pstd}
The {browse "http://www.cprd.com":Clinical Practice Research Datalink (CPRD)} is a data warehouse
of information from primary-care practices in the British National Health Service (NHS).
The {helpb cprd_additional} and {helpb cprd_test} modules of the {helpb cprdutil} package
create datasets containing a numeric variable {cmd:enttype}, which specifies a CPRD entity type,
and also string variables with names beginning with {cmd:data}, such as {cmd:data1} and {cmd:data2}.
These string variables contain information on CPRD entities of the type indicated by {cmd:enttype}.
The entity type determines how the {cmd:data}{it:i} variables should be read,
in order to create new numeric variables containing information on those entities.
For instance, entity type 1 is a blood pressure measurement,
and the {cmd:data}{it:i} variables in an observation whose {cmd:enttype} value is 1
contain information on the blood pressure measurement,
such as the systolic and diastolic values, the time of day when the measurement was made,
and the posture of the patient when the measurement was made.
The use of multiple entity types enables the same string variables
to store different numeric data in observations with different entity types.

{pstd}
A list of all the {cmd:enttype} values is usually stored in a Stata dataset on disk,
created by the {helpb cprd_entity} module of the {helpb cprdutil} package.
This dataset has 1 observation per entity type, with data on descriptive attributes of that entity type,
on the number of string data fields used by that entity type,
on how these string fields should be converted to numeric variables,
and on sensible variable labels for these numeric variables.
This information is extracted by {cmd:cprdenttype} for the entity type value specified by the user,
and stored in saved results and, optionally, in local macros.
The basic descriptive information is also output to the Stata log to inform the user.
The user can then input the data to be converted, and use {cmd:cprdentcode} to do the conversion,
using the local macros set by {cmd:cprdenttype}.
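
{pstd}
For instance, the saved results for entity type 1 (blood pressure) might be inspected as follows.
This is a minimal sketch, assuming that the entity dataset {cmd:./dta/entity.dta} has already been created,
as in the set-up commands of the Examples section below.
{cmd:return list} is the standard Stata command for listing {cmd:r()} results,
and should be typed before any other r-class command overwrites them.
Compound double quotes are used for {cmd:r(lookuplist)}, because its elements are themselves quoted strings.

{p 8 12 2}{cmd:. cprdenttype 1 using ./dta/entity}{p_end}
{p 8 12 2}{cmd:. return list}{p_end}
{p 8 12 2}{cmd:. display r(data_fields)}{p_end}
{p 8 12 2}{cmd:. display `"`r(lookuplist)'"'}{p_end}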

{pstd}
Note that it is the user's responsibility to ensure that {cmd:cprdentcode} is used sensibly,
on observations whose {cmd:enttype} value is the one specified by the user to {cmd:cprdenttype}.
This is easy to do (and saves memory) if the user inputs into memory
only the observations with the correct {cmd:enttype} value, before any conversion is done.


{title:Examples}

{pstd}
The following examples assume that the current folder has a sister folder {cmd:../cprddata}
containing a CPRD retrieval of text files,
and also a daughter folder {cmd:./dta} where Stata datasets can be created.
The set-up section first creates a do-file {cmd:xyzlookuplabs.do} in the current folder,
to be run to define the CPRD value labels,
and then creates 4 Stata datasets {cmd:entity}, {cmd:clinical}, {cmd:additional} and {cmd:test},
containing entity data, clinical-event data, additional clinical-event data, and test data, respectively.

{pstd}
Set-up:

{p 8 12 2}{cmd:. cprd_xyzlookup, txtdir("../cprddata/Lookups/TXTFILES") do(xyzlookuplabs.do, replace)}{p_end}
{p 8 12 2}{cmd:. cprd_entity using ../cprddata/Lookups/entity.txt, clear}{p_end}
{p 8 12 2}{cmd:. save ./dta/entity.dta, replace}{p_end}
{p 8 12 2}{cmd:. cprd_clinical using ../cprddata/Data/clinical.txt, clear dofile(xyzlookuplabs.do)}{p_end}
{p 8 12 2}{cmd:. save ./dta/clinical.dta, replace}{p_end}
{p 8 12 2}{cmd:. cprd_additional using ../cprddata/Data/additional.txt, clear dofile(xyzlookuplabs.do)}{p_end}
{p 8 12 2}{cmd:. save ./dta/additional.dta, replace}{p_end}
{p 8 12 2}{cmd:. cprd_test using ../cprddata/Data/test.txt, clear dofile(xyzlookuplabs.do)}{p_end}
{p 8 12 2}{cmd:. save ./dta/test.dta, replace}{p_end}

{pstd}
The following example starts by using {cmd:cprdenttype} to create 2 local macros {cmd:LL} and {cmd:LD},
containing the lookup list and the descriptive label list, respectively,
for the {cmd:enttype} value 4 ({cmd:"Smoking"}).
It then inputs the additional clinical data records with {cmd:enttype} equal to 4,
and uses {cmd:cprdentcode} to generate a list of numeric variables, named {cmd:ad_1} to {cmd:ad_6},
with variable labels, value labels and formats defined by the do-file created in the set-up
and by the lists in the 2 local macros.
The user may then tabulate, summarize and graph these variables
to find out about the distributions of smoking-status data.

{p 8 12 2}{cmd:. clear}{p_end}
{p 8 12 2}{cmd:. cprdenttype 4 using ./dta/entity, llookuplist(LL) ldesclist(LD)}{p_end}
{p 8 12 2}{cmd:. use ./dta/additional if enttype==4, clear}{p_end}
{p 8 12 2}{cmd:. describe, full}{p_end}
{p 8 12 2}{cmd:. cprdentcode, generate(ad_) lookuplist(`LL') desclist(`LD') dofile(xyzlookuplabs.do)}{p_end}

{pstd}
The following example also starts by using {cmd:cprdenttype} to create 2 local macros {cmd:LL} and {cmd:LD},
this time containing the lookup list and the descriptive label list, respectively,
for the {cmd:enttype} value 163 ({cmd:"Serum cholesterol"}).
It then inputs the test data records with {cmd:enttype} equal to 163,
and uses {cmd:cprdentcode} to generate a list of numeric variables, named {cmd:td_1} to {cmd:td_7},
with variable labels, value labels and formats defined by the do-file created in the set-up
and by the lists in the 2 local macros.
This time, we save memory by using the {cmd:float} option to prevent the creation of double-precision variables.
The user may then tabulate, summarize and graph these variables
to find out about the distributions of serum cholesterol data.

{p 8 12 2}{cmd:. clear}{p_end}
{p 8 12 2}{cmd:. cprdenttype 163 using ./dta/entity, llookuplist(LL) ldesclist(LD)}{p_end}
{p 8 12 2}{cmd:. use ./dta/test if enttype==163, clear}{p_end}
{p 8 12 2}{cmd:. describe, full}{p_end}
{p 8 12 2}{cmd:. cprdentcode, generate(td_) lookuplist(`LL') desclist(`LD') dofile(xyzlookuplabs.do) float}{p_end}

{pstd}
Note that {cmd:cprdutil} modules require the {help ssc:SSC} packages
{helpb keyby}, {helpb addinby}, {helpb chardef}, {helpb lablist}, and {helpb intext} to be installed in order to work.
It is also a good idea to call {cmd:cprdenttype} before inputting any large dataset,
because otherwise the large dataset in memory will be temporarily {helpb preserve:preserve}d
while the smaller entity dataset is input, which can be slow.


{title:Saved results}

{pstd}
{cmd:cprdenttype} saves the following results in {cmd:r()}:

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Scalars}{p_end}
{synopt:{cmd:r(enttype)}}the user-specified entity type{p_end}
{synopt:{cmd:r(data_fields)}}the number of data fields used by the entity type{p_end}

{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:r(description)}}description of entity type{p_end}
{synopt:{cmd:r(filetype)}}type of data file ("Test" or "Clinical") using the entity type{p_end}
{synopt:{cmd:r(category)}}broad category for entity type{p_end}
{synopt:{cmd:r(lookuplist)}}lookup list for data fields{p_end}
{synopt:{cmd:r(desclist)}}descriptive label list for data fields{p_end}
{p2colreset}{...}

{pstd}
{cmd:cprdentcode} saves the following results in {cmd:r()}:

{synoptset 20 tabbed}{...}
{p2col 5 20 24 2: Macros}{p_end}
{synopt:{cmd:r(newvars)}}list of generated numeric output variables{p_end}
{p2colreset}{...}

{pstd}
{cmd:cprdentcode} also saves, for each generated variable,
the {help char:variable characteristic} {it:varname}{cmd:[lookup]},
containing the CPRD lookup rule used for that variable.
This may be a 3-character {help label:value label name},
a CPRD date format specification such as {cmd:"dd/mm/yyyy"},
or a specification such as {cmd:"medical dictionary"},
indicating that the generated variable refers to a medical code in the CPRD medical dictionary.
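
{pstd}
As a minimal sketch, assuming that the first example above ({cmd:enttype} value 4) has just been run,
so that {cmd:cprdentcode} was the last r-class command executed,
the list of generated variables and the characteristics of the first generated variable
might be inspected as follows.
The variable name {cmd:ad_1} follows from the {cmd:generate(ad_)} option used in that example,
and {cmd:char list} is the standard Stata command for listing characteristics.

{p 8 12 2}{cmd:. display "`r(newvars)'"}{p_end}
{p 8 12 2}{cmd:. describe `r(newvars)'}{p_end}
{p 8 12 2}{cmd:. char list ad_1[]}{p_end}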


{title:Author}

{pstd}
Roger Newson, Imperial College London, UK.{break}
Email: {browse "mailto:r.newson@imperial.ac.uk":r.newson@imperial.ac.uk}


{title:Also see}

{psee}
{space 2}Help: {browse "http://www.cprd.com":Clinical Practice Research Datalink (CPRD)}{break}
{helpb cprdutil}, {helpb keyby}, {helpb addinby}, {helpb chardef}, {helpb lablist}, {helpb intext} if installed
{p_end}