{smcl}
{* 13 July 2022}{...}
{hline}
help for {hi:onetext}
{hline}


{title:Title}
{p 4 4 2}
{bf:onetext} {hline 2} Simple Chinese text analysis: keyword counts, keyword existence, and vector similarity.{p_end}


{title:Syntax}
{p 4 4 2}
{cmd:onetext} {varlist}{cmd:,} [{cmdab:k:eyword}{cmd:(}{it:string}{cmd:)}]
{cmdab:m:ethod}{cmd:(}{it:count}|{it:exist}|{it:cosine}|{it:jaccard}{cmd:)}
{cmdab:g:enerate}{cmd:(}{it:varname}{cmd:)}
{p_end}


{title:Description}
{p 4 4 2}
{cmd:onetext} performs simple Chinese text analysis on the variable or
variables in {varlist}. With {cmd:method(count)} it counts how often a
specified Chinese keyword occurs in a text variable; with {cmd:method(exist)}
it reports whether the keyword occurs at all. When the text is already
represented as numeric vectors, {cmd:method(cosine)} and {cmd:method(jaccard)}
calculate the cosine similarity and the Jaccard similarity, respectively.
{p_end}


{title:Requirements}
{p 4 4 2}
{varlist} specifies the variable or variables to analyze. To count keyword
occurrences or to check whether a keyword appears, specify exactly one
variable, and it must be a string variable. To calculate cosine similarity or
Jaccard similarity, specify two variables, and both must be numeric.
{p_end}{break}
{p 4 4 2}
{cmd:keyword(}{it:string}{cmd:)} specifies the Chinese keyword to search for,
such as "大数据". This option is required when you count words; the examples
below also supply it with {cmd:method(exist)}.
{p_end}{break}
{p 4 4 2}
{cmd:method(}{it:count/exist/cosine/jaccard}{cmd:)} specifies the analysis to
perform: {cmd:count}, {cmd:exist}, {cmd:cosine}, or {cmd:jaccard}. No other
value is allowed.
{p_end}{break}
{p 4 4 2}
{cmd:generate(}{it:varname}{cmd:)} specifies a variable to save the result.
{p_end}


{title:Example 1 - Count occurrences of a Chinese keyword}
{p 4 4 2}{inp:.} 
{stata `"clear"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"set obs 4"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"gen text = "大数据" in 1"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace text = "大数据大数据" in 2"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace text = "数据小数据" in 3"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace text = "小数据" in 4"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"onetext text, k("大数据") m(count) g(count_text) "'}
{p_end}
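{p 4 4 2}
As a cross-check on the example above, the count can be sketched with built-in
Unicode string functions. This is a minimal sketch, not the internal
implementation of {cmd:onetext}: it assumes non-overlapping matches, and
{cmd:manual_count} is a hypothetical variable name chosen for illustration.
{p_end}

```stata
* Rebuild the four-observation example data
clear
set obs 4
gen text = "大数据" in 1
replace text = "大数据大数据" in 2
replace text = "数据小数据" in 3
replace text = "小数据" in 4

* Occurrences = (characters removed when the keyword is deleted) / (keyword length)
* Assumption: matches do not overlap
gen manual_count = (ustrlen(text) - ustrlen(usubinstr(text, "大数据", "", .))) / ustrlen("大数据")
list text manual_count
```

{p 4 4 2}
Under these assumptions, observations 1 and 2 contain the keyword once and
twice, and observations 3 and 4 contain it zero times.
{p_end}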

{title:Example 2 - Existence of a Chinese keyword}
{p 4 4 2}{inp:.} 
{stata `"clear"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"set obs 4"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"gen text = "大数据" in 1"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace text = "大数据大数据" in 2"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace text = "数据小数据" in 3"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace text = "小数据" in 4"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"onetext text, k("大数据") m(exist) g(isExist) "'}
{p_end}
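{p 4 4 2}
The existence check above can be approximated with {cmd:ustrpos()}, which
returns the character position of a substring, or 0 when it is absent. This is
only a sketch for comparison: the 0/1 coding and the variable name
{cmd:manual_exist} are assumptions, and {cmd:onetext} may encode its result
differently.
{p_end}

```stata
* Rebuild the four-observation example data
clear
set obs 4
gen text = "大数据" in 1
replace text = "大数据大数据" in 2
replace text = "数据小数据" in 3
replace text = "小数据" in 4

* 1 if the keyword appears anywhere in the text, 0 otherwise (assumed coding)
gen byte manual_exist = ustrpos(text, "大数据") > 0
list text manual_exist
```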

{title:Example 3 - Similarity calculation}
{p 4 4 2}{inp:.} 
{stata `"clear"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"set obs 3"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"gen var1 = 1 in 1"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace var1 = 2 in 2"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace var1 = 3 in 3"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"gen var2 = 4 in 1"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace var2 = 2 in 2"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"replace var2 = 5 in 3"'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"onetext var1 var2, m(cosine) g(cs) "'}
{p_end}
{p 4 4 2}{inp:.} 
{stata `"onetext var1 var2, m(jaccard) g(js) "'}
{p_end}
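{p 4 4 2}
For reference, both similarity measures can be computed by hand on the same
data. This sketch rests on two assumptions that this help file does not
confirm: that each variable is treated as one vector across observations, and
that Jaccard similarity for numeric vectors means the weighted form, the sum of
element-wise minima divided by the sum of element-wise maxima. {cmd:onetext}
may use a different definition, so its {cmd:js} result need not match. All new
variable names are hypothetical.
{p_end}

```stata
* Same data as the example above
clear
input var1 var2
1 4
2 2
3 5
end

* Cosine similarity: dot product divided by the product of the vector norms
egen double dot = total(var1 * var2)
egen double ss1 = total(var1^2)
egen double ss2 = total(var2^2)
gen double cs_manual = dot / sqrt(ss1 * ss2)

* Weighted Jaccard similarity (assumed definition): sum of minima / sum of maxima
egen double sum_min = total(min(var1, var2))
egen double sum_max = total(max(var1, var2))
gen double js_manual = sum_min / sum_max

list var1 var2 cs_manual js_manual
```

{p 4 4 2}
With these inputs the cosine similarity is 23/sqrt(630), about 0.916, and the
weighted Jaccard similarity is 6/11, about 0.545.
{p_end}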

{title:Author}
{p 4 4 2}
{cmd:Shutter Zor(左祥太)}{break}
School of Accountancy, Wuhan Textile University.{break}
E-mail: {browse "mailto:Shutter_Z@outlook.com":Shutter_Z@outlook.com}.{p_end}