{smcl}
{* 13 July 2022}{...}
{hline}
help for {hi:onetext}
{hline}

{title:Title}

{p 4 4 2}
{bf:onetext} {hline 2} Simple Chinese text analysis.{p_end}

{title:Syntax}

{p 4 4 2}
{cmd:onetext} {varlist}, [{cmdab:k:eyword}{cmd:(}{it:string}{cmd:)}] {cmdab:m:ethod}{cmd:(}{it:count/exist/cosine/jaccard}{cmd:)} {cmdab:g:enerate}{cmd:(}{it:newvar}{cmd:)} {p_end}

{title:Description}

{p 4 4 2}
{cmd:onetext} performs simple Chinese text analysis on the variables you supply.
With {it:method(count)} it counts how often a specified Chinese keyword occurs in a text variable;
with {it:method(exist)} it reports whether the keyword appears at all.
Given two numeric variables (for example, vector representations of text), {it:method(cosine)} and {it:method(jaccard)}
calculate the cosine similarity and the Jaccard similarity, respectively. {p_end}

{title:Requirements}

{p 4 4 2}
{it:varlist} specifies the variables.
To count keyword occurrences or check whether a keyword appears, specify exactly one variable, and it must be a string variable holding the text.
To calculate cosine or Jaccard similarity, specify two variables, and both must be numeric. {p_end}{break}

{p 4 4 2}
{cmd:keyword(}{it:string}{cmd:)} specifies the Chinese characters to look for, such as "大数据".
This option is required when you count words. {p_end}{break}

{p 4 4 2}
{cmd:method(}{it:count/exist/cosine/jaccard}{cmd:)} specifies the analysis to run.
Only the four values listed are accepted. {p_end}{break}

{p 4 4 2}
{cmd:generate(}{it:newvar}{cmd:)} names the new variable in which the result is saved.
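{p_end}

{title:Remarks - Cosine similarity by hand}

{p 4 4 2}
The commands below are a minimal sketch for cross-checking {it:method(cosine)}, not the command's own code.
They assume the usual definition of cosine similarity, the dot product of the two variables divided by the product of their Euclidean norms, taken over observations;
this help file does not state the formula {cmd:onetext} uses internally.
The helper variables {cmd:prod}, {cmd:sq1}, {cmd:sq2} and the scalars {cmd:dot}, {cmd:n1}, {cmd:n2} are illustrative names only.
The data are the same values as in the similarity example. {p_end}

{p 4 4 2}{inp:.} {stata `"clear"'} {p_end}
{p 4 4 2}{inp:.} {stata `"set obs 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen var1 = _n"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen var2 = 4 in 1"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace var2 = 2 in 2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace var2 = 5 in 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen double prod = var1*var2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen double sq1 = var1^2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen double sq2 = var2^2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"quietly summarize prod"'} {p_end}
{p 4 4 2}{inp:.} {stata `"scalar dot = r(sum)"'} {p_end}
{p 4 4 2}{inp:.} {stata `"quietly summarize sq1"'} {p_end}
{p 4 4 2}{inp:.} {stata `"scalar n1 = sqrt(r(sum))"'} {p_end}
{p 4 4 2}{inp:.} {stata `"quietly summarize sq2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"scalar n2 = sqrt(r(sum))"'} {p_end}
{p 4 4 2}{inp:.} {stata `"display scalar(dot)/(scalar(n1)*scalar(n2))"'} {p_end}

{p 4 4 2}
With these values the dot product is 23 and the squared norms are 14 and 45, so the displayed value is 23/sqrt(630), approximately 0.916.
It can be compared with the result that {cmd:onetext var1 var2, m(cosine) g(cs)} stores in {cmd:cs}.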
{p_end}

{title:Example 1 - Count Chinese words}

{p 4 4 2}{inp:.} {stata `"clear"'} {p_end}
{p 4 4 2}{inp:.} {stata `"set obs 4"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen text = "大数据" in 1"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace text = "大数据大数据" in 2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace text = "数据小数据" in 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace text = "小数据" in 4"'} {p_end}
{p 4 4 2}{inp:.} {stata `"onetext text, k("大数据") m(count) g(count_text)"'} {p_end}

{title:Example 2 - Existence of Chinese words}

{p 4 4 2}{inp:.} {stata `"clear"'} {p_end}
{p 4 4 2}{inp:.} {stata `"set obs 4"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen text = "大数据" in 1"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace text = "大数据大数据" in 2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace text = "数据小数据" in 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace text = "小数据" in 4"'} {p_end}
{p 4 4 2}{inp:.} {stata `"onetext text, k("大数据") m(exist) g(isExist)"'} {p_end}

{title:Example 3 - Similarity calculation}

{p 4 4 2}{inp:.} {stata `"clear"'} {p_end}
{p 4 4 2}{inp:.} {stata `"set obs 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen var1 = 1 in 1"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace var1 = 2 in 2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace var1 = 3 in 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"gen var2 = 4 in 1"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace var2 = 2 in 2"'} {p_end}
{p 4 4 2}{inp:.} {stata `"replace var2 = 5 in 3"'} {p_end}
{p 4 4 2}{inp:.} {stata `"onetext var1 var2, m(cosine) g(cs)"'} {p_end}
{p 4 4 2}{inp:.} {stata `"onetext var1 var2, m(jaccard) g(js)"'} {p_end}

{title:Author}

{p 4 4 2}
{cmd:Shutter Zor(左祥太)}{break}
School of Accountancy, Wuhan Textile University.{break}
E-mail: {browse "mailto:Shutter_Z@outlook.com":Shutter_Z@outlook.com}.{p_end}