{smcl} {* 11May2005} {hline} help for {hi:convert_top_lines} {hline} {title:Convert the first one or two observations to variable names and lables.} {p 8 17 2} {cmd:convert_top_lines} [, {cmd:line2labels list drop}] {title:Description} {p 4 4 2} {cmd:convert_top_lines} will take values in the first observation, and use them as variable names. Optionally, it will take values in the second observation, and use them as variable labels. This works only when all datatypes are string, and the existing names are v1, v2, etc. {title:Options} {p 4 8 2} {cmd:line2labels} specifies that values in the second observation are to be taken as variable labels. {p 4 8 2} {cmd:drop} specifies that the first, and possibly the second observations are to be dropped {c -} after the names and labels are extracted from them. With this option, the first observation is always dropped; when combined with {cmd:line2labels}, the second observation is also dropped. Typically you would want to specify this option, since, if you need to use this program, then these particular observations do not contain "regular" data, and don't belong with the others. {p 4 8 2} {cmd:list} specifies that the first three observations will be listed (after the renaming, but prior to the optional {cmd:drop} operation), so you can see the information that has been converted to names and labels. Typically, in the first observation, you will see values equaling the names, and the third observation would typically contain regular data values. {title:Remarks} {p 4 4 2} This is intended to aid in clearing up some problems that may occur with {cmd:insheet}. {p 4 4 2} Often, comma-separated-value (csv) files have the variable names in the first line. If the data follow, starting on the second line, then {cmd:insheet} knows what to do; it uses the values in the first line as the variable names, and collects the regular data beginning with the second. {p 4 4 2} But sometimes, csv raw data files come with descriptive information in the second line, in addition to variable names in the first. This descriptive information is often suitable as variable labels. But {cmd:insheet} is not able to handle that situation, and will... {p 8 10 2} a, use default names, v1, v2, v3, etc. {p 8 10 2} b, use long string datatypes, such as str68. {p 4 4 2} {cmd:insheet}, as it stands at the time of this writing, is not set up to recognize this situation, and it invokes its "take everything as string" mode. Thus, it selects datatypes that can accomodate the values in all the lines, including the second, where those long descriptions dwell. Often, in this situation, the datatype is tailored to that one longest value, and all other "actual" data values are much shorter, and possibly numeric. {p 4 4 2} This program is meant to partly remedy that situation. It will first check that all the variables are named v1, v2, etc. Then it renames them to the values contained in the first line, with these names converted to lower case. With the {cmd:line2labels} option, the values in the second line become variable labels. {p 4 4 2} Typically, you would want the {cmd:line2labels} optioon, because, if you don't have descriptive information in the second line, then {cmd:insheet} probably would have succeeded at taking variable names from the first line (if they exist therein), and you wouldn't be needing this program. But this feature was made optional for the sake of generality and to give you more control. {p 4 4 2} Some truncation may occur when the values in the second line are read into the variables during insheet, and possibly when these values are converted to variable labels. (The latter would occur with Stata SE only.) Whenever this possibility is detected (when the length of the value is >=80), a note is added to the variable, indicating the possibility of truncation. {title:Examples} {p 6 6 2}{cmd:. convert_top_lines }{p_end} {p 6 6 2}{cmd:. convert_top_lines, line2labels list drop}{p_end} {title:Additional Remarks} {p 4 4 2} In the situation where you would want to use this, all datatypes are initially string, which may not be appropriate after the operation is completed. But it is beyond the scope of {cmd:convert_top_lines} to try to remedy that situation. Thus, you may want to follow this with some changes to datatypes, such as with {cmd:compress} and {cmd:destring}. {col 12}{hline} {p 12 12 12} {hi:Technical note:} While the presence of string types may be ultimately undesirable for most variables, it makes the operations within {cmd:convert_top_lines} possible, as variable names and labels are string values. {p_end} {col 12}{hline} {p 4 4 2} This always converts the names to lower case, which would be a problem if some names are distinguished only by case. If users find they need to be able to control that, they should contact the author. {p 4 4 2} If you are using {cmd:insheet} on a large csv file with descriptive information in the second line, then you may need a large amount of memory just for the {cmd:insheet} operation. Once {cmd:convert_top_lines} is done, followed by appropriate changes to datatypes, the dataset will be much smaller, and you can return to using less memory. See {help memory}. {title:Author} {p 4 4 2} David Kantor, Institute for Policy Studies, Johns Hopkins University. Email {browse "mailto:dkantor@jhu.edu":dkantor@jhu.edu} if you observe any problems. {title:Also see} {p 4 4 2} {help insheet}, {help datatypes}, {help destring}, {help compress}