{smcl}
{* 31 Jan 2019}{...}
{hline}
help for {hi:sctostreamsum}
{hline}

{title:Title}

{phang2}{cmdab:sctostreamsum} {hline 2} This command calculates statistics from sensor stream files outputted during data collection using {browse "https://www.surveycto.com/":SurveyCTO} and summarizes them at the submission level in a .dta file so that they can be merged with the main data.{p_end}

{title:Syntax}

{phang2} {cmdab:sctostreamsum} ,
		{cmdab:media:folder(}{it:folder_path}{cmd:)} {cmdab:output:folder(}{it:folder_path}{cmd:)}
	[
		{cmdab:sen:sors(}{it:string}{cmd:)} {cmdab:replace} {cmdab:quiet} {cmdab:still} {cmdab:moving}
		{cmdab:llbet:ween(}{it:string}{cmd:)} {cmdab:slbet:ween(}{it:string}{cmd:)}
		{cmdab:spbet:ween(}{it:string}{cmd:)} {cmdab:mvbet:ween(}{it:string}{cmd:)}
	]{p_end}

{marker opts}{...}
{synoptset 24}{...}
{synopthdr:options}
{synoptline}
{pstd}{it:Required options:}{p_end}
{synopt :{cmdab:media:folder(}{it:folder_path}{cmd:)}} Folder where the .csv stream files used as an input to this command are saved.{p_end}
{synopt :{cmdab:output:folder(}{it:folder_path}{cmd:)}} Folder where the .dta file generated by this command will be saved.{p_end}

{pstd}{it:Output options:}{p_end}
{synopt :{cmdab:sen:sors(}{it:string}{cmd:)}} Indicates the sensor streams (LL, SL, SP or MV) for which basic statistics should be calculated.{p_end}
{synopt :{cmdab:replace}} Replace the .dta file in the output folder even if it already exists. Default is that only .csv files with key-IDs not already in the .dta file are processed and appended to the already existing file.{p_end}

{pstd}{it:Standardized Statistics options:}{p_end}
{synopt :{cmdab:quiet}} Add a statistic indicating percentage of time periods when the sound level around device was quiet. Requires sound level sensor files.{p_end}
{synopt :{cmdab:still}} Add a statistic indicating percentage of time periods when the device was completely still. Requires movement sensor files.{p_end}
{synopt :{cmdab:moving}} Add a statistic indicating percentage of time periods when the device was being moved around. Requires movement sensor files.{p_end}

{pstd}{it:Customizable Statistics options:}{p_end}
{synopt :{cmdab:llbet:ween(}{it:{help sctostreamsum##customstats:range_string}}{cmd:)}} Manually specify statistics for the light level stream files.{p_end}
{synopt :{cmdab:slbet:ween(}{it:{help sctostreamsum##customstats:range_string}}{cmd:)}} Manually specify statistics for the sound level stream files.{p_end}
{synopt :{cmdab:spbet:ween(}{it:{help sctostreamsum##customstats:range_string}}{cmd:)}} Manually specify statistics for the sound pitch stream files.{p_end}
{synopt :{cmdab:mvbet:ween(}{it:{help sctostreamsum##customstats:range_string}}{cmd:)}} Manually specify statistics for the movement stream files.{p_end}

{synoptline}

{marker desc}
{title:Description}

{pstd}{cmd:sctostreamsum} calculates statistics from the sensor stream .csv files produced by the {it:sensor_stream} field in {browse "https://www.surveycto.com/":SurveyCTO's} data collection tool. The command reads .csv files with sensor data (light level, sound level, sound pitch and movement) recorded during the interview by the device used for data collection. In the input .csv files, the unit of observation (what each row represents) is the time period over which the sensor data was reported (the default is one second). The command outputs a .dta file in which the statistics have been collapsed to one row per submission, so that the .dta file can be merged with the main data.{p_end}

{pstd}Before reading a .csv file in the media folder, the command first checks whether the .dta file already exists. If it does, the command checks whether that file already has an observation with the same key (the ID generated for each submission on SurveyCTO's server). If an observation with the same key already exists, the default behavior is to skip that .csv file. If the option {cmd:replace} is used, no .csv files are skipped and the existing .dta file is overwritten with a new .dta file generated from all .csv files in the media folder.{p_end}

{pstd}This command cannot update observations already in the .dta file, so the only way to change the statistics for an existing observation is to use the {cmd:replace} option and overwrite the .dta file with a new file where all observations have been re-calculated. Note that any manual edits made to observations in the .dta file after it was generated will be lost when the {cmd:replace} option is used.{p_end}

{pstd}The command calculates two types of statistics: {it:basic statistics} (the mean of the sensor, the standard deviation of the sensor, etc., see full list below), which are calculated the same way for all sensors, and {it:sensor specific statistics}, which are all reported as the percentage of time periods in which the sensor's mean was within a specific range. If a sensor specific statistic has been specified for a sensor, the basic statistics are also calculated for that sensor. If no sensor specific statistic has been specified but basic statistics are still wanted, the sensor should be listed in the {cmd:sensors()} option. If a sensor is not listed in {cmd:sensors()} and has no sensor specific statistics, all the .csv files for that sensor are skipped.{p_end}

{pstd}{ul:{it:Basic statistics}}{p_end}
{pstd}The {it:basic statistics} are calculated the same way for all sensor streams and are described in the table below. They are calculated for all sensors listed in {cmd:sensors()} and for all sensors for which a {it:sensor specific statistic} was specified. In the .dta file, the variables with these statistics have the same names as in the table below, but with the sensor stream abbreviation (LL, SL, SP or MV) as a prefix. For example, LL_mean, SL_mean etc.{p_end}

{p2colset 8 21 26 4}{...}
{p2col:Name}Description{p_end}
{p2line}
{p2col:mean}The mean of all time period means. Each row in the .csv file is based on more than one raw sensor recording. The value in the variable {it:mean} in the .csv file is the mean of those raw recordings. The basic statistic {it:mean} calculated by this command is the mean of all those means.{p_end}
{p2col:period}The period length (in seconds) used for each row in the .csv file. The default if not explicitly specified in the questionnaire form definition is period=1.{p_end}
{p2col:period_obs}Number of time periods (i.e. rows) in the .csv file. If the period is 1, then this is the duration of the interview in seconds.{p_end}
{p2col:raw_obs}Number of raw recordings of the stream. Unless the time period for the sensor stream is set to 0 in the questionnaire form definition, each time period is made up of many raw sensor recordings. This is the total number of raw sensor recordings.{p_end}
{p2col:min}The minimum of all time period means. Note that this is not the lowest raw recording, it is the lowest time period mean.{p_end}
{p2col:max}The maximum of all time period means. Note that this is not the highest raw recording, it is the highest time period mean.{p_end}
{p2col:sd}The standard deviation of all time period means.{p_end}
{p2col:median}The median of all time period means.{p_end}
{p2line}


{pstd}{ul:{it:Sensor specific statistics}}{p_end}
{pstd}{it:Sensor specific statistics} consist of {it:Standardized Statistics}, where the range is predefined, and {it:Customizable Statistics}, where the range is specified by the user. Each of these statistics is first calculated as a boolean, i.e. either true or false, for each time period for that sensor. For example, was it quiet or not, or was the sensor within a certain range or not. The boolean is represented as 1 or 0, where 1 is true. This command then calculates the percentage of time periods for which this variable was 1 and reports that value in the .dta file. The variable for each statistic in the .dta file therefore represents, for each submission, the percentage of the interview that was quiet, during which the sensor was within a certain range, etc.{p_end}
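
{pstd}As a rough illustration of this two-step calculation (and not the command's actual implementation), the {it:quiet} statistic for a single sound level stream file could be reproduced by hand roughly as sketched below. The file name {it:SL-example.csv} is hypothetical, and the sketch assumes the per-period mean is imported as a variable named {it:mean}. Periods with a missing mean are left missing here, which is also an assumption.{p_end}

{phang2}{inp:import delimited "$media/SL-example.csv", clear}{p_end}
{phang2}{inp:generate byte quiet = (mean < 25) if !missing(mean)}{p_end}
{phang2}{inp:collapse (mean) quiet}{p_end}

{pstd}After {inp:collapse}, the single remaining row holds the share of time periods for which {it:quiet} was 1, expressed in decimals.{p_end}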

{marker optslong}
{title:Options}

{pstd}{it:{ul:{hi:Required options:}}}{p_end}
{phang}{cmdab:media:folder(}{it:string}{cmd:)} indicates where the .csv files exported from the SurveyCTO server are saved. This is called the media folder because that is the name of the folder where SurveyCTO Sync saves these files. Other files not relevant to this command may also be stored in this folder as this command can tell which files are sensor stream files based on the file name.{p_end}

{phang}{cmdab:output:folder(}{it:string}{cmd:)} indicates where the .dta file will be saved. This folder may not be the same folder as the folder in {cmdab:media:folder()}. If the .dta file already exists there, then only files in the media folder with a key not already in the .dta file will be processed and appended to the already existing file (unless the {cmd:replace} option is used).{p_end}

{pstd}{it:{ul:{hi:Output options:}}}{p_end}
{phang}{cmdab:sen:sors(}{it:string}{cmd:)} lists all the sensor streams for which to calculate basic statistics. If a sensor specific statistic is already specified then there is no need to also specify it here as basic statistics for that sensor will also be calculated, but doing so will not cause an error. For each sensor listed here there must be at least one sensor stream .csv file in the media folder. Valid values in this option are LL for light level, SL for sound level, SP for sound pitch and MV for movement.{p_end}

{phang}{cmdab:replace} makes the command overwrite any file generated by this command already in the {cmdab:output:folder()}. If this option is used, then all .csv files in the {cmdab:media:folder()} will be used, whether or not their keys are already in the existing file. Since this will overwrite the file already in the output folder, all manual edits made after the file was originally generated will be lost. Using this option is the only way to update existing observations in a .dta file already generated using this command.{p_end}

{pstd}{it:{ul:{hi:Standardized statistics options:}}}{p_end}
{phang}{cmdab:quiet} adds a variable called {it:quiet} to the .dta file. This variable will be the percentage of time periods (expressed in decimals) for which the sound level around the device used during the interview was less than 25 dB. An error will be generated if this option is used and no sound level sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{phang}{cmdab:still} adds a variable called {it:still} to the .dta file. This variable will be the percentage of time periods (expressed in decimals) for which the movement of the device used during the interview is less than 0.25 m/s^2. An error will be generated if this option is used and no movement sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{phang}{cmdab:moving} adds a variable called {it:moving} to the .dta file. This variable will be the percentage of time periods (expressed in decimals) for which the movement of the device used during the interview is greater than 2 m/s^2. An error will be generated if this option is used and no movement sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{marker customstats}
{pstd}{it:{ul:{hi:Customizable Statistics options:}}}{p_end}

{pstd}All of the following options take a {inp:{it:range_string}} as the value. The {it:range_string} specifies the name of each new variable this command should create and the range used to calculate the percentage (expressed in decimals) of time periods during which the sensor was within that range. Each new variable in the {it:range_string} must be specified as {inp:{it:varname}({it:min max})}, where {it:varname} is the name of the new variable to be created, and {it:min} and {it:max} are the lower and upper boundaries of the range. A round bracket indicates that the boundary is exclusive, and a square bracket indicates that it is inclusive. Either {it:min} or {it:max} can be replaced with a question mark to get a less-than or greater-than expression instead of a range. Multiple new variables can be specified in the same {it:range_string}. See examples below.{p_end}

{phang}{cmdab:llbet:ween(}{it:range_string}{cmd:)} allows the user to manually specify statistics for the light level stream. See documentation on {it:range_string} above and examples below. An error will be generated if this option is used and no light level sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{phang}{cmdab:slbet:ween(}{it:range_string}{cmd:)} allows the user to manually specify statistics for the sound level stream. See documentation on {it:range_string} above and examples below. An error will be generated if this option is used and no sound level sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{phang}{cmdab:spbet:ween(}{it:range_string}{cmd:)} allows the user to manually specify statistics for the sound pitch stream. See documentation on {it:range_string} above and examples below. An error will be generated if this option is used and no sound pitch sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{phang}{cmdab:mvbet:ween(}{it:range_string}{cmd:)} allows the user to manually specify statistics for the movement stream. See documentation on {it:range_string} above and examples below. An error will be generated if this option is used and no movement sensor stream files exist in the {cmdab:media:folder()} folder.{p_end}

{pstd}{ul:range_string examples:}{p_end}

{phang}{inp:llbetween(}{it:indoors_lit(100 750)}{inp:)} will create a variable from the light level stream named {it:indoors_lit} where the values are the percentage (expressed in decimals) of time periods where the mean light level was between 100 lux (exclusive) and 750 lux (exclusive).{p_end}

{phang}{inp:slbetween(}{it:quiet(? 25)}{inp:)} will create a variable from the sound level stream named {it:quiet} where the values are the percentage (expressed in decimals) of time periods where the mean sound level was below 25 dB (exclusive). This is identical to the variable created when using the option {inp:quiet}.{p_end}

{phang}{inp:mvbetween(}{it:mv1[.2 .25) mv2[1 ?]}{inp:)} will create two variables from the movement stream named {it:mv1} and {it:mv2}. {it:mv1} is the percentage (expressed in decimals) of time periods where the mean movement was between .2 m/s^2 (inclusive) and .25 m/s^2 (exclusive). {it:mv2} is the percentage (expressed in decimals) of time periods where the mean movement was greater than or equal to 1 m/s^2.{p_end}
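
{pstd}To make the bracket notation concrete, the per-period conditions implied by {it:mv1[.2 .25) mv2[1 ?]} can be read as ordinary Stata inequalities, as in the sketch below. This only illustrates the notation and is not the command's internal code. It assumes a movement stream file is already loaded with the per-period mean in a variable named {it:mean}. The {inp:if !missing(mean)} qualifier is an added safeguard, since Stata treats missing values as larger than any number.{p_end}

{phang2}{inp:generate byte mv1 = (mean >= .2 & mean < .25) if !missing(mean)}{p_end}
{phang2}{inp:generate byte mv2 = (mean >= 1) if !missing(mean)}{p_end}

{pstd}A square bracket maps to {inp:>=} or {inp:<=}, a round bracket maps to {inp:>} or {inp:<}, and a question mark drops that side of the condition.{p_end}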

{marker examples}
{title:Examples}

{pstd}All examples will use the following globals as folder paths: {p_end}

{pstd}{inp:global project "}C:\Users\username\Documents\ProjectA{inp:"}{p_end}
{pstd}{inp:global media "}$project\raw_data\media{inp:"}{p_end}
{pstd}{inp:global output "}$project\outputs{inp:"}{p_end}

{pstd}{hi:Example 1.}{p_end}

{pstd}{inp:sctostreamsum, mediafolder(}{it:"$media"}{inp:) outputfolder(}{it:"$output"}{inp:) quiet still}{p_end}

{pstd}This is a very simple way to run the command. The command will read sensor stream .csv files in the media folder with the prefix SL (because {inp:quiet} was used) and the prefix MV (because {inp:still} was used). It will then create a .dta file and save it in the output folder. The file contains the key in the variable {it:key}, the basic statistics for the sound level sensor together with the {it:quiet} variable, and the basic statistics for the movement sensor together with the {it:still} variable. If the command has already been run and the .dta file already exists in the output folder, then only submissions with a key not already in the .dta file will be processed and appended to the existing file. Any sensor stream files with the prefix SP or LL in the media folder will be ignored, as no statistic applicable to either of those streams was specified.{p_end}

{pstd}{hi:Example 2.}{p_end}

{pstd}{inp:sctostreamsum, mediafolder(}{it:"$media"}{inp:) outputfolder(}{it:"$output"}{inp:) quiet slbetween(}{it:loud(60 ?)}{inp:) replace}{p_end}

{pstd}In this example, the command will only read SL sensor stream .csv files from the media folder as sound level is the only sensor required by the statistics that are specified. Basic statistics will be calculated for the sound level stream, in addition to the {it:quiet} and {it:loud} variables. All SL files in the media folder will be included and the command will start over with a new .dta file, replacing the .dta file in the output folder if it already exists.{p_end}
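
{pstd}{hi:Merging the output with the main data (sketch).}{p_end}

{pstd}Once the .dta file has been generated, it can be merged with the main data on the submission key. The lines below are only a sketch: the file names {it:main_data.dta} and {it:stream_summary.dta} are hypothetical placeholders (use the name of the file this command actually saved in the output folder), and the sketch assumes the main data identifies each submission in a variable named {it:key}.{p_end}

{phang2}{inp:use "$project/main_data.dta", clear}{p_end}
{phang2}{inp:merge 1:1 key using "$output/stream_summary.dta"}{p_end}
{phang2}{inp:tabulate _merge}{p_end}

{pstd}Tabulating {it:_merge} shows how many submissions have no matching sensor stream statistics, for example because no stream file was recorded for them.{p_end}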

{title:Author}

{phang}This command was developed by {browse "https://www.surveycto.com/about/contact/":SurveyCTO}.{p_end}

{phang}See this command's {browse "https://github.com/surveycto/scto":repository} for more information. Feedback and feature requests can also be submitted there.{p_end}