*! version 2.1 2024/7/12
* Papers from arXiv.org can now be sent to CLIP.
* version 1.10.1 2024/5/18
* update American Journal of Political Science (open access)
* https://onlinelibrary.wiley.com/doi/epdf/10.1111/ajps.12808
* version 1.10 2024/5/17
* update PDF link for SJ papers
* version 1.9 2024/1/15
* fix bugs for 'latex' option
* version 1.8 2024/1/11
* more robust to mirror errors and bugs
* version 1.7 2024/1/7
* support DOIs registered with datacite, e.g., arXiv
* version 1.6 2024/1/2
* update Markdown links
* version 1.5 2023/12/31
* add option 'nogoogle', deal with WP (SSRN, arXiv, NBER)
* version 1.4 2023/12/18
* add option -pdf-, -bib-, -all-
* version 1.2 2023/10/4, add option -clipoff-, -nodiocheck-
*! version 1.1 2023/9/26
*! Yujun Lian, arlionn@163.com
* input: getref {DOI}
* output: Citation information.
* Author, Year, Title, Journal, Vol(Issue): Pages. Link, PDF
/*
:: Revisions ::  2023/10/13 8:42
[?] Also use the bibtex format to extract author information, etc., and provide three basic citation formats
[?] Output the citation format Author (Year, PDF),
    where Year links to the article page and PDF links to the PDF document
[?] Add the pdf option, which calls the get_pdf.ado subprogram
[?] Add the cite option so that the Author (Year, PDF) format can be output  2023/10/20 17:29
cite Author(Year)
citepdf Author(Year, PDF)
[Year](link), [PDF](link)
* 2023/12/17 11:56 Fixed the case where the {DOI} of a PDF document on SCI-HUB contains special characters
* 2023/12/19 0:44 ------------------- issues in getref.ado to be fixed [OK-done]
1. Special characters in the Title
a. remove HTML tags matching (<[/\w]+>);
b. change (&amp;) to (&)
e.g., Amosh, H., & Khatib, S. F. A. (2023).
COVID‐19 impact, financial and ESG performance:
Evidence from G20 countries.
Business Strategy & Development, 6(3), 310–321. Portico.
2. Shorten the journal name
The Stata Journal: Promoting Communications on Statistics and Stata
to:
The Stata Journal
* 2023/12/19 21:22 --- titles containing math symbols or markup; RegEX: '<\/?.+?>'
getref 10.1016/j.spl.2016.02.012
Romano, J. P., & Wolf, M. (2016). Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Statistics & Probability Letters, 113, 38–40.
* 2023/12/25 8:22 -- if the {DOI} contains keywords of working paper sites such as SSRN or NBER, then:
1. do not display the PDF link
2. hint: PDF documents on SSRN must be downloaded by hand
* 2023/12/26 0:00 -- assign different error codes to different kinds of failures
basic meta fail == "doi"
PDF fail == "PDF"
bibtex, ris fail == "bib"
then getref_OpenCitation can display error messages by category when called; the display part is handled accordingly
*-information that should be documented in the help file
getref "Gallen, Trevor, Broken Instruments (June 16th, 2020). Available at SSRN: https://ssrn.com/abstract=3671850 or http://dx.doi.org/10.2139/ssrn.3671850"
If the input expression contains special characters such as ',', the input string must be wrapped in ""
* 2023/12/21 9:55 when the {DOI} contains a working paper site identifier such as ssrn, [-PDF-]() is no longer displayed, or the link is displayed directly
SSRN: 10.2139/ssrn.3671850
NBER: 10.3386/w31184
10.3386/w29723
PDF: https://www.nber.org/system/files/working_papers/w31184/w31184.pdf
https://www.nber.org/system/files/working_papers/w29723/w29723.pdf
arXiv: 10.48550/arXiv.{ID}
Link: https://doi.org/10.48550/arXiv.2312.05400
https://doi.org/10.48550/arXiv.{ID}
PDF: https://arxiv.org/pdf/2312.05400.pdf
https://arxiv.org/pdf/{ID}.pdf
* 2023/12/26 8:52 DOI prefixes of major journals
JASA 10.2307 10.2307/2283916
AER
* DOI 基础知识:https://www.medra.org/en/DOI.htm
https://www.doi.org/the-identifier/resources/factsheets/key-facts-on-digital-object-identifier-system
https://www.scribbr.com/citing-sources/what-is-a-doi/
* Google Scholar URL
https://scholar.google.com/scholar?q={Title_encode}
------------
URL-encoded paper title
mata: st_local("Title_encode", urlencode(`"`Title'"'))
local title "Accommodating Time-Varying Heterogeneity in Risk Estimation under the Cox Model"
local title "Economies of scale, technical change and persistent and time-varying cost"
local title "Dynamic firm performance and estimator choice'"
mata: st_local("title_encode", urlencode(`"`title'"'))
local google_head "https://scholar.google.com/scholar?q="
local google_url "`google_head'`title_encode'"
local google_br `"{browse `"`google_url'"':Google}"'
dis "`google_url'"
dis `"`google_br'"'
* OK - 2023/12/29 17:31 some papers list author names in ALL CAPS; convert them to proper case
10.1111/1475-679X.12496
GOLDSTEIN, I., YANG, S., & ZUO, L. (2023). The Real Effects of ...
Solution: help ustrtitle() , string function
* 2023/12/30 10:06
* TBD: Open Access Journals, Free to browse PDF documents
View our list of Wiley and Hindawi fully open access journals.
https://authorservices.wiley.com/open-research/open-access/browse-journals.html
e.g., Quantitative Economics, Theoretical Economics
*------------------------------------ test ---------------------------
:: test ::
global DOI ""10.1257/aer.109.4.1197"" // double """"
global DOI "10.1016/j.eneco.2022.106017" // many authors, special characters
global DOI "10.1162/rest.90.3.592" // RES
global DOI "10.1177/1536867X231175332" // SJ 23-2
global DOI "10.3969/j.issn.1000-6249.2007.01.003" // Chinese article
global DOI "10.2139/ssrn.3765862" // SSRN
global DOI "10.48550/arXiv.1301.3781" // arXiv
global DOI "10.1111/j.1467-629X.2010.00375.x"
global DOI "10.1016/j.jeconom.2020.06.003"
global DOI "10.1007/978-3-030-21432-6" // Open Access books
global DOI "10.1007/978-3-030-86186-5"
cls
// set trace on
getiref "$DOI", d
*/
*-get individual reference for given {DOI}
* renamed from 'getref.ado', 2023/12/20 23:45
cap program drop getiref
program define getiref, rclass
version 14
syntax anything(everything) ///
[, PAth(string) ///
Md /// // [Link], [PDF], [Google], [Appendix]
md1 /// // clean format. Only [Link], [PDF]
        md2            ///  // similar to 'md', with blanks in links
        md3            ///  // similar to 'md2', with blanks in links
Latex /// // Same as {cmd:md}, but with links in TeX format.
Wechat /// // Plain text format: Author, Year, title, URL
Text /// // Equivalent to {bf:wechat}
Cite /// // 'Author (Year)' with Markdown links
cite1 /// // 'Author, Year' with Markdown links, in text
c1 /// // short version of CIte1
cite2 /// // 'Author (Year)' plain text
c2 /// // short version of CIte2
DISfn /// // display filename for saving by hand
All /// // both -pdf- and -bib- options are used
        Pdf            ///  // auto save PDF, default name: [Author_Year_Title]
BLank(string) /// // Title format: See == Note 1 ==
NOTItle /// // Hansen-2023
        ALTname(string) /// // alternative name: user-specified PDF file name, seldom used
Bib /// // download and list .ris and bibtex files used for reference managers
        FASTscihub     ///  // re-search for the fastest URL of SCI-Hub. Seldom used; takes 1-2 seconds
CLEAN /// // clean pattern: Display only [link] and [PDF]. Default: [link] (rep), [PDF], [Appendix]
CLIPoff /// // do not send message to clipboard
Notip /// // do not display 'Tips: Text is on clipboard.'
NOGoogle /// // don't display google links
Doicheck /// // get {DOI} using regular expression from text given by user.
/// // e.g, from 'xxx https://doi.org/{DOI}' to '{DOI}'
]
/*
==Note 1==: blank(string), where string can be 'bar' or 'keep'
default: Hansen_2023_The_Crisis
bar: Hansen-2023-The-Crisis
keep: Hansen 2023 The Crisis
*/
preserve //>>>>>>>>>>>>>>>>>>>>>>>>>>>>> preserve begin
clear
*-record working directory
local pwd: pwd
*-to be done 2024/1/11 9:03
* auto logfile: record the {DOI}s in a text file
* Save in folder:
* '../_log_DOIs/_log_doi_Date.txt'
* Usage:
* users can 'infile' this file and loop {DOI}s to download PDF and .ris
*---------------
*-options check
*---------------
if "`all'" != ""{
local pdf "pdf"
local bib "bib"
}
* Note: getiref DOI, all
* is same as
* getiref DOI, pdf bib
if "`altname'" != ""{
        local pdf_save = "1"    // download PDF document
local filename "`altname'"
if strpos(`"`altname'"', "/") | strpos(`"`altname'"', "\"){
dis as error "Invalid filename. You should specify directory in {cmd:path()} option."
exit
}
if "`notitle'" != ""{
dis as text "Note: option {cmd:notitle} can only take effect with {cmd:pdf} option"
}
}
if "`blank'" != ""{
if wordcount("`blank'")>1{
dis as error "Only one arguments allowed in option {cmd:blank(string)}"
exit
}
if "`blank'"!="bar" & "`blank'"!="keep"{
dis as error "Only {cmd:blank(bar)} or {cmd:blank(bar)} is allowed"
exit
}
}
if ("`pdf'" != ""){
        local pdf_save = "1"    // download PDF document from SCI-HUB
}
*-check option conflicts
local dis_opt "`md'`md2'`cite'`cite1'`c1'`cite2'`c2'`latex'`wechat'`text'"
if wordcount("`dis_opt'")>1{
dis as error "Options conflict: only one of {cmd:md} / {cmd:latex} / {cmd:cite / {cmd:c2} / {cmd:wechat} / {cmd:text} options is allowed"
exit
}
*-delete "
local anything = subinstr(`"`anything'"', `"""', "", .)
*---------------
*- path, SCI-Hub --> .dta of reference
*---------------
*-Path
if "`path'" == ""{
local path "_temp_getref_"
}
qui get_checkpath "`path'"
local path "`r(path)'"
/*
// to be done 2024/1/11 9:09
Sub Folders
[PDF_getref_]
[ris_getref_]
[log_getref_]
readme_getref_.txt
with some usage notes for these files
*/
*-host of SCI-Hub
if ("${sci__hub_}" == "") | ("`fastscihub'" != ""){
cap get_scihub // get the fast url of SCI-Hub
if _rc==0{
global sci__hub_ "`r(best)'"
}
else{
global sci__hub_ "http://sci-hub.ren"
}
}
*-get DOI
if "`doicheck'" != ""{ // get {DOI} and check validity
// cap noi get_doi `"`anything'"', nodisplay // old
        cap noi get_doi `anything'   // display is off by default
if `r(valid)' == 0{
exit
}
else{
local DOI "`r(doi)'"
}
}
else{
local DOI "`anything'"
}
*---------------
*- download meta data and get meta information
*---------------
// local DOI "10.1111/j.1467-629X.2010.00375.x"
// local DOI 10.1111/j.1467-629X.2010.00375.x
if "`latex'" == "" local tex_opt ""
else local tex_opt ", `latex'"
get_doidata `DOI' `tex_opt' // .... download meta data , new 2024/1/8 13:27
// suit for both 'crossref' and 'datacite'
local DOI = "`r(DOI)'"
*-text of reference
local ref_body = r(ref_body)
local ref_full = r(ref_full)
qui set obs 1
tempvar v_body
qui gen strL `v_body' = `"`ref_body'"'
*-filename of PDF document
*-Get file name of PDF article:
* Author-Year-Title
get_au_yr_ti "`ref_body'", doi("`DOI'")
*-filename of PDF document
if "`notitle'" == ""{
local fn_au_year `"`r(au_yr_ti)'"'
}
else{
local fn_au_year `"`r(au_yr)'"'
}
local ar_title = "`r(title)'" // Title of the article
*-transfer to valid filename (delete invalid characters: '* \ / : * ? " < > | ')
if "`blank'" == ""{
get_filename "`fn_au_year'"
}
else{
get_filename "`fn_au_year'", blank(`blank')
}
if "`filename'" == ""{ // Gomez_2023_The_Effect_of_Mandatory_Disclosure……
local filename `"`r(fn)'"'
}
*------------------
*- article page (link) and PDF url
*------------------
*-Common setting: General Journal articles
* article page
local link "https://doi.org/`DOI'"
local link_br `"{browse "`link'":Link}"'
* PDF url given by SCI-HUB
local pdf_web "${sci__hub_}/`DOI'" // http://sci-hub.ren/ or return by 'get_scihub.ado'
local pdf_br `"{browse "`pdf_web'":PDF}"'
*-deal with ASCII characters in {DOI}
get_doi_scihub_special `DOI' // deal with special characters
local DOI_scihub "`r(doi_scihub)'"
local scihub "https://sci.bban.top/pdf"
local pdf_web_full "`scihub'/`DOI_scihub'.pdf" // full: full screen
local pdf_br_full `"{browse "`pdf_web_full'":PDF}"'
*-default:
    * with PDF document
local pdf_Yes = 1
* Source of PDF document
local pdf_source = 0 // "SCI-HUB"
* without replication
local rep_Yes = 0
* without appendix
local app_Yes = 0
**** Note:
* [xxx_web] means [xxx_url]
* [link] means the url of article page
* [xxx_br] means the text for displaying as browse pattern in Results Window
*------------------
*- working papers
*------------------
*-SSRN
    * 10.2139/ssrn.3671850 ; the PDF cannot be fetched directly
local key "10.2139/ssrn"
if strpos(`"`DOI'"', "`key'"){
// local pdf_web "none"
local pdf_br ""
local pdf_Yes = 0
}
*-EconPapers
    * DOI: 10.32468/Espe.5704 ; the PDF cannot be fetched directly
local key "10.32468/Espe"
if strpos(`"`DOI'"', "`key'"){
// local pdf_web "none"
local pdf_br ""
local pdf_Yes = 0
}
*-arXiv with PDF
* DOI: 10.48550/arXiv.2312.05400 --> 10.48550/arXiv.{ID}
local key "10.48550/arXiv"
if strpos(`"`DOI'"', "`key'"){
local ar_ID = subinstr("`DOI'", "`key'.", "", 1) // get: 2312.05400 ({article ID})
local pdf_web "https://arxiv.org/pdf/`ar_ID'.pdf"
local pdf_source = 1 // non SCI-HUB
local rep_web "https://arxiv.org/e-print/`ar_ID'"
        local rep_br `"{browse "`rep_web'":Sources}"'   // .bibtex references, original .tex source, etc.
        local rep_Yes = 1   // replication data & code (may have)
}
*-NBER with PDF
* DOI: 10.3386/w31184 --> 10.3386/{ar_ID}
* PDF: https://www.nber.org/system/files/working_papers/{ID}/{ID}.pdf
* - e.g. https://www.nber.org/system/files/working_papers/w31184/w31184.pdf
local key "10.3386/"
if strpos(`"`DOI'"', "`key'"){
local ar_ID = subinstr("`DOI'", "`key'", "", 1) // get: w31184 ({article ID})
local pdf_root "https://www.nber.org/system/files/working_papers"
local pdf_web "`pdf_root'/`ar_ID'/`ar_ID'.pdf"
local pdf_source = 1 // non SCI-HUB
}
*---------------------
*- Open Access Journal
*---------------------
*-QE
* DOI: 10.3982/QE1288
* PDF: https://onlinelibrary.wiley.com/doi/epdf/10.3982/QE1288
local key "10.3982/QE"
if strpos(`"`DOI'"', "`key'"){
local link "https://onlinelibrary.wiley.com/doi/`DOI'"
local pdf_root "https://onlinelibrary.wiley.com/doi/epdf"
local pdf_web "`pdf_root'/`DOI'"
local pdf_source = 1 // non SCI-HUB
}
*-Stata Journal
* DOI: 10.1177/1536867
* PDF: https://journals.sagepub.com/doi/pdf/10.1177/1536867X20909689
* https://journals.sagepub.com/doi/pdf/10.1177/1536867X1801800409
local key "10.1177/1536867"
if strpos(`"`DOI'"', "`key'"){
local link "https://journals.sagepub.com/doi/`DOI'"
local pdf_root "https://journals.sagepub.com/doi/pdf"
local pdf_web "`pdf_root'/`DOI'"
local pdf_source = 1 // non SCI-HUB
}
    /* Papers published before 2022 can all be fetched via SCI-HUB; only those from the last two years need special handling
*-American Journal of Political Science (open access)
* DOI: 10.1111/ajps
* PDF: https://onlinelibrary.wiley.com/doi/pdf/10.1111/ajps.12808
local key "10.1111/ajps"
if strpos(`"`DOI'"', "`key'"){
local link "https://onlinelibrary.wiley.com/doi/`DOI'"
local pdf_root "https://onlinelibrary.wiley.com/doi/pdf/"
local pdf_web "`pdf_root'/`DOI'"
local pdf_source = 1 // non SCI-HUB
} */
*-TBD: add more Journals with Open Access
/* Open Access Journal list
view browse "https://openaccesspub.org/about"
* wiley.com: Browse Fully Open Access Journals
https://authorservices.wiley.com/open-research/open-access/browse-journals.html
* sagepub.com
https://journals.sagepub.com/doi/pdf/10.1177/1536867X231212425
* https://wires.onlinelibrary.wiley.com/
https://wires.onlinelibrary.wiley.com/doi/full/10.1002/wrna.1824
* https://www.tandfonline.com/doi/full/10.1080/10705511.2022.2131555
*/
*-----------------
*-Online Appendix and/or Replication (Codes & Data)
*-----------------
* general case
local app_url = "" // 'app' means 'Appendix'
local app_br = ""
local app_Yes = 0
local rep_Yes = 0 // 'rep' means 'Replication'
* AEA journals
// Most journals in 'American Economic Association'
// provide data & codes for replication, and online appendix.
// - DOI start with '10.1257' is a journal under AEA.
// - The URL of online appendix for AEA journals:
// https://www.aeaweb.org/doi/{DOI}.appx
*---------------
// e.g. [*American Economic Review*](https://www.aeaweb.org/journals/aer)
// - DOI: 10.1257/aer.20210710
// - Appendix: `https://www.aeaweb.org/doi/{DOI}.appx`
// - e.g.
local key "10.1257/" // AEA Journals, e.g. AER
if strpos("`DOI'", "`key'"){
local app_url "https://www.aeaweb.org/doi/`DOI'.appx"
local app_Yes = 1 // with appendix
        local rep_Yes = 1   // replication data & code (may have)
}
*-arXiv
local key "10.48550/arXiv"
if strpos(`"`DOI'"', "`key'"){
local rep_Yes = 1
}
*-JPE, JLE
// **JPE** Journal of Political Economy: [Supplemental Material](https://www.journals.uchicago.edu/toc/jpe/current)
// - DOI: 10.1086/xxx
// - PDF: https://www.journals.uchicago.edu/doi/epdf/{DOI}
// - Supp: https://www.journals.uchicago.edu/doi/suppl/10.1086/725171
local key "10.1086/" // JPE, JLE
if strpos("`DOI'", "`key'"){
local app_url "https://www.journals.uchicago.edu/doi/suppl/`DOI'"
local app_Yes = 1 // with appendix
        local rep_Yes = 1   // may have replication data & code
}
*---------------------------
* Journals with replications
#delimit ;
local jlist
"
10.48550/arXiv
10.1257/
10.1086/
10.3982/ECTA
10.1016/j.eneco
10.1111/jofi
10.1016/j.jfineco
10.1093/rfs
10.15456/jbnst
10.18637/jss
10.1371/journal.pone
10.1093/qje
10.1002/jae
10.1016/j.red
10.1093/ej
10.1093/restud
10.1016/j.euroecorev
10.3982/QE
" ;
#d cr
foreach jj of local jlist{
if strpos("`DOI'", "`jj'"){
local rep_Yes = 1 // may has replication Data & Codes
}
}
* Important!!!!!!
*-------------------
*- export reference: Markdown, LaTeX or plain text
*-------------------
* text to be displayed as links
* md2: add a blank to url in Markdown text
if "`md2'" != "" local a_blank " "
else local a_blank ""
* Link: article page
// local link "https://doi.org/`DOI'"
// local link_br `"{browse "`link'":Link}"'
local link_md `" [Link](`a_blank'`link'`a_blank')"'
local link_md_dis `" [`link_br'](`a_blank'`link'`a_blank')"'
local link_tex `" \href{`link'}{Link}"'
local link_tex_dis `" \href{`link'}{`link_br'}"'
local link_plain `" Link: `link'"'
* PDF
if `pdf_Yes' == 1{
local pdf_br `"{browse "`pdf_web'":PDF}"'
local pdf_md `", [PDF](`a_blank'`pdf_web'`a_blank')"'
local pdf_md_dis `", [`pdf_br'](`a_blank'`pdf_web'`a_blank')"'
local pdf_md_full `", [PDF](`a_blank'`pdf_web_full'`a_blank')"' // only for SCI-HUB
local pdf_tex `", \href{`pdf_web'}{PDF}"'
local pdf_tex_dis `", \href{`pdf_web'}{`pdf_br'}"'
local pdf_plain `", PDF: `pdf_web'"'
}
else{
local pdf_br ""
local pdf_md ""
local pdf_md_dis ""
local pdf_md_full ""
local pdf_tex ""
local pdf_tex_dis ""
local pdf_plain ""
}
* replication
if `rep_Yes' == 1{
local _rep " (rep)"
}
else{
local _rep ""
}
* Appendix
if `app_Yes' == 1{
local app_br `"{browse "`app_url'":Appendix}"'
local app_md `", [Appendix](`a_blank'`app_url'`a_blank')"'
local app_md_dis `", [`app_br'](`a_blank'`app_url'`a_blank')"'
local app_tex `", \href{`app_url'}{Appendix}"'
local app_tex_dis `", \href{`app_url'}{`app_br'}"'
local app_plain `", Appendix: `app_url'"'
}
else{
local app_br ""
local app_md ""
local app_md_dis ""
local app_tex ""
local app_tex_dis ""
local app_plain ""
}
*-Google scholar link
if "`nogoogle'" == ""{
local google_head "https://scholar.google.com/scholar?q="
local google_url `"`google_head'`ar_title'"'
        // The Google URL does not need to be encoded.
        // The drawback of encoding is that if the length of "`at_title_encode'"
        // exceeds 254 characters, which is the max limit of 'dis {browse ....}',
        // an error message will be reported.
        // The URL in the 'browse' cmd is limited to 253 characters.
local google_url_trim = substr(`"`google_url'"', 1, 253)
// Encoding version: used for displaying as plain text
mata: st_local("google_url_encode", urlencode(`"`google_url'"'))
local google_url_encode_trim = substr(`"`google_url_encode'"', 1, 253)
local google_br `"{browse `"`google_url_trim'"':Google}"'
local google_md `", [Google](`a_blank'<`google_url'>`a_blank')"'
local google_md_dis `", [`google_br'](`a_blank'<`google_url'>`a_blank')"'
local google_tex `", \href{`google_url'}{Google}"'
local google_tex_dis `", \href{`google_url'}{`google_br'}"'
local google_plain `", Google: `google_url_encode_trim'"'
}
else{
local google_br ""
local google_md ""
local google_md_dis ""
local google_tex ""
local google_tex_dis ""
local google_plain ""
}
*---------------------------
*-Display in Results Window
*---------------------------
*-clean option
if "`clean'" != ""{
local _rep ""
local google_br ""
local app_br ""
local keylist "md tex plain"
foreach key of local keylist{
local app_`key' ""
local app_`key'_dis ""
local google_`key' ""
local google_`key'_dis ""
}
}
*-Default format:
* == Author, Year, Title, Journal, Vol(Issue): pages. 'link', 'PDF-url'.
noi dis " " // add a blank line
noi dis as text `"`ref_body'"'
noi dis as text _col(5) `"`link_br'`_rep'"' ///
_skip(4) `"`pdf_br'"' ///
_skip(4) `"`google_br'"' ///
_skip(4) `"`app_br'"' ///
_n
local refout = `"`ref_body'"'
* ??????????????????????????????????????????????
    * These two lines no longer seem to be used  2024/1/1 0:15  ???????
* ??????????????????????????????????????????????
local ref_link_pdf `"`ref_body' `link_br', `pdf_br'"'
local ref_link_pdf_full `"`ref_body' `link_br', `pdf_br_full'"'
*-Options for display in Results Window
*-Markdown
* Link: article page
if "`md'`md2'" != ""{ // [text](URL)
local refout `"`ref_body'`link_md'`_rep'`pdf_md'`app_md'`google_md'."'
local refdis `"`ref_body'`link_md_dis'`_rep'`pdf_md_dis'`app_md_dis'`google_md_dis'."'
dis as text `"`refdis'"'
}
*-LaTeX
if "`latex'" != ""{ // \href{text}{URL}
local refout `"`ref_body'`link_tex'`_rep'`pdf_tex'`app_tex'`google_tex'."'
local refdis `"`ref_body'`link_tex_dis'`_rep'`pdf_tex_dis'`app_tex_dis'`google_tex_dis'."'
dis as text `"`refdis'"'
}
    *-Plain text
if ("`wechat'" != "") | ("`text'" != ""){
local refout `"`ref_body'`link_plain'`_rep'`pdf_plain'`app_plain'`google_plain'"'
dis as text `"`refout'"'
*dis as text _col(5) `"`link_br'"' _col(15) `"`pdf_br'"'
}
*---------------
*- export: cite Author (Year)
*---------------
// set trace on
if "`cite'" != ""{
get_cite `v_body', doi("`DOI'") link `latex' // author (Year), with link
}
if "`c1'" != "" | "`cite1'" != ""{ // intext, with link
get_cite `v_body', doi("`DOI'") link intext `latex' // (author, Year)
}
if "`c2'" != "" | "`cite2'" != ""{ // plain text, no link
get_cite `v_body', doi("`DOI'")
}
if "`cite'`c1'`cite1'`c2'`cite2'" != ""{
local refout "`r(cite)'"
noi dis as text `"`refout'"'
noi dis as text _col(5) `"`link_br'"' _skip(6) `"`pdf_br'"'
return add
}
*------------------ ------------
*-download and display links of PDF document
*------------------ ------------
if "`pdf'" != "" & ("`dis_opt'") == ""{
local refout "`ref_body' `link'"
}
if ("`pdf_save'"=="1"){ // want to save and have PDF
if `pdf_Yes'==0{
            dis as error `"{cmd:Warning}: Failed to download the PDF document for {browse "`link'":`DOI'}"'
dis as text `"{cmd:Maybe}, you can save it by hand at the {browse "`link'":{ul:article page}} using filename:"'
dis as text _skip(2) `"{cmd:`filename'}"'
}
else{
if `pdf_source' == 0{ // from SCI-HUB
cap noi get_pdf_scihub "`DOI'", saving("`filename'") path("`path'")
            * Stata Journal: some papers are open access
if `r(pdf_got)'==0 & strpos("`DOI'", "10.1177/1536867"){
local pdf_source = 1
}
}
else{
get_pdf_nonSCIHUB "`DOI'", saving("`filename'") path("`path'")
}
// return local pdfurl `"`r(pdfurl)'"'
}
}
*---------------------
*- .bibtex, .ris files
*---------------------
if "`bib'" != ""{
get_bib `DOI', path("`path'") `notip'
local got_bib = `r(got_bib)'
local bibtex "`r(bibtex)'"
local ris "`r(ris)'"
}
*-send to CLIP
if "`clipoff'" == ""{
dis " "
get_clipout "`refout'", `clipoff' `notip'
}
*-----------------
*-display filename for saving by hand
*-----------------
if "`disfn'" != ""{
dis " "
dis `"`filename'"'
*-send to CLIP
if "`clipoff'" == ""{
get_clipout "`filename'", `clipoff' notip
}
}
*--------------
*-return values
*--------------
get_au_yr_ti "`ref_body'", doi("`DOI'")
return local au_yr "`r(au_yr)'"
return local au_yr_ti "`r(au_yr_ti)'"
return local au_yr_doi "`r(au_yr_doi)'"
return local link_br = `"`link_br'"'
return local pdf_br = `"`pdf_br'"'
return local pdf_br_full = `"`pdf_br_full'"'
return local ref_link_pdf = `"`ref_link_pdf'"'
return local ref_link_pdf_full = `"`ref_link_pdf_full'"'
return local refdis "`refdis'"
return local AD "-------- Below: advanced values --------"
return scalar with_app = `app_Yes'
return scalar with_rep = `rep_Yes'
if "`bib'"!=""{
return scalar got_bib = `got_bib'
return local bibtex "`bibtex'"
return local ris "`ris'"
}
if "`pdf'"!="" return scalar got_pdf = `pdf_Yes'
return local pdf_web `"`pdf_web'"'
return local ref "`refout'"
return local refbody `"`ref_body'"'
return local link "`link'"
return local filename "`filename'"
return local title "`r(title)'"
return local year "`r(year)'"
return local author1 "`r(author)'"
return local doi "`DOI'"
restore //>>>>>>>>>>>>>>>>>>>>>>>>>>>>> preserve over
end
*>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
* over
*>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
*------------------ subprogram ------------- get_doi.ado
cap program drop get_doi
program define get_doi, rclass
version 14
* input: reference text including {DOI} information
* output: {DOI} saved in r(doi)
syntax anything [, Display aregex]
if "`aregex'" == ""{
local regex `"10\.\d{4,9}[^\s]+[^",,。()()<>`'\s]"'
}
else{
local regex "10\.[\d]{4,}/[^\s]+[\d]" // regular expression of {DOI}
}
local m = ustrregexm(`"`anything'"', `"`regex'"')
if `m'==0{
dis as error "Can not find valid {DOI}, please check."
return scalar valid = `m'
exit
}
else{
local doi = ustrregexs(0)
if "`display'" != ""{
noi dis "`doi'"
}
*-Return values
return local doi = "`doi'"
return scalar valid = `m' // 1 = get valid DOI, 0 = otherwise
}
end
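/* === test ===
* Minimal usage sketch (illustrative only): extract a {DOI} from free text.
* The DOI below is reused from the test list above; the results noted in the
* comments are what the regular expressions are expected to return.
get_doi "See https://doi.org/10.1257/aer.109.4.1197 for details", display
ret list                        // r(doi) = "10.1257/aer.109.4.1197", r(valid) = 1
get_doi "10.1257/aer.109.4.1197", aregex display
get_doi "there is no doi here"
dis r(valid)                    // 0: no valid {DOI} found
*/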
*------------------ subprogram ------------- get_doiserver.ado
* version 1.1 08jan2024
* Yujun Lian, arlionn@163.com
*
*== Goal:
* return the 'server' of {DOI}. e.g, Crossref, Datacite
* input: {DOI}
* output: server name
*== Usage:
* doiserver "10.5281/zenodo.1308060"
* ret list
* average time: about 0.15s
*--------------------------- get_doiserver.ado -------------- 0 ------------
cap program drop get_doiserver
program define get_doiserver, rclass
syntax anything(name=doi) [, Display]
// local doi 10.48550/arXiv.2312.05400
gettoken doi_head: doi , parse(/) // '10.48550/arXiv.2312.05400'
// to '10.48550'
local url "https://doi.org/`doi_head'"
cap mata: mata drop urlText
mata: urlText = cat("`url'")
*-to be done: get the name of SERVER
mata: is_server_name = ustrregexm(urlText, `".*10.SERV/([\w]+)"')
mata: sub_sample = select(urlText, is_server_name:==1)
mata: server = ustrregexra(sub_sample, `".*10.SERV/([\w]+)"', `"$1"')
mata: st_local("server", server)
local is_crossref = ("`server'" == "CROSSREF")
local is_datacite = ("`server'" == "DATACITE")
if "`display'" != ""{
dis as text "SERVER: `server'"
}
return local url "`url'"
return local server "`server'"
return scalar is_crossref = `is_crossref'
return scalar is_datacite = `is_datacite'
end
*--------------------------- get_doi_server.ado -------------- 1 ------------
/*
* === test
global doi "10.5281/zenodo.1308060" // datacite
global doi "10.48550/arXiv.2312.05400" // datacite
global doi "10.13140/rg.2.2.18135.01449" // datacite
global doi "10.1016/j.eneco.2023.107287" // crossref
global doi "10.1126/science.169.3946.635" // crossref
global doi "10.1016/j.jhealeco.2015.10.004" // crossref
get_doiserver $doi, dis
ret list
. get_doiserver $doi, dis
SERVER: DATACITE
. ret list
scalars:
r(is_datacite) = 1
r(is_crossref) = 0
macros:
r(server) : "DATACITE"
r(url) : "https://doi.org/10.48550"
*/
*------------------ subprogram ------------- get_doidata.ado
* version 1.1 2024/1/7 23:17
* suit for both 'crossref' and 'datacite'
* version 1.0 2023/12/23 18:06
cap program drop get_doidata
program define get_doidata, rclass
* input: {DOI}
* output: local `ref_body' and `ref_full' with meta data for given {DOI}
syntax anything(name=DOI) [, Display Latex]
preserve
// ========= datacite =====
* API:
local head "https://data.crosscite.org"
local mine "text/x-bibliography"
local style "apa" // abb of style
local styleexp "?style=`style'" // expression of style
local url_datacite "`head'/`mine'/`DOI'`styleexp'"
// ========= crossref =====
*-API and MIME from
* url: https://github.com/CrossRef/rest-api-doc#resource-components
local API "http://api.crossref.org/works"
local trans "transform/text/x-bibliography"
local url_crossref "`API'/`DOI'/`trans'"
// ========= copy webdata to .txt
* S1: copy data using Crossref's API
* S2: if failed, copy data using Datacite's API
tempfile doi_ref
cap qui copy `"`url_crossref'"' "`doi_ref'.txt", replace // crossref
if _rc==0{
local server "CROSSREF"
}
else{ // can not get meta data: first time ('crossref')
*-check/get the DOI
        cap noi get_doi `DOI'   // display is off by default
if `r(valid)' == 0{
dis as error `"Invalid DOI. See {browse "https://www.doi.org/the-identifier/what-is-a-doi/":DOI-1}, {browse "https://academicguides.waldenu.edu/library/doi":DOI-2} or {browse "https://www.doi.org/the-identifier/resources/handbook":DOI-Handbook} for details."'
exit 198
}
else{
local DOI "`r(doi)'" // valid DOI
local url_crossref "`API'/`DOI'/`trans'"
cap qui copy `"`url_crossref'"' "`doi_ref'.txt", replace // crossref
if _rc==0{
local server "CROSSREF"
}
else{
qui get_doiserver `DOI' // get agency server of {DOI}
local server "`r(server)'"
if `r(is_crossref)' == 1{
get_error_doi_data `DOI' // show error message
}
else if `r(is_datacite)' == 1{
cap qui copy `"`url_datacite'"' "`doi_ref'.txt", replace // datacite
if _rc{ // can not get meta data: second time ('datacite')
get_error_doi_data `DOI'
}
}
else{
get_error_doi_data `DOI', only simple
}
}
}
}
*-save as .dta
tempvar v_ref
qui infix strL `v_ref' 1-1000 using "`doi_ref'.txt", clear
*-deal with special characters
    * e.g. for crossref: HTML entities such as '&amp;' may appear, as in
    *      "growth and energy, R&amp;D Exp, Business &amp; Economics"
    * e.g. for datacite: parts of the title may be wrapped in HTML tags, as in
    *      "<i>Generalized difference-in-differences</i>"
    *      i.e., "<i>Xxxx</i>" should become "Xxxx"
    qui replace `v_ref' = ustrregexra(`v_ref', `"<i>(.*?)</i>"', `"$1"')
    local regex "(<\/?.+?>)"
    qui replace `v_ref' = ustrregexra(`v_ref', `"`regex'"', "", 1)
    local regex "&amp;"
    qui replace `v_ref' = subinstr(`v_ref', `"`regex'"', "&", .)
if "`latex'"!=""{ // update: 2024/1/13 18:14
local regex "&"
qui replace `v_ref' = subinstr(`v_ref', "`regex'", "\&", .)
}
*-Author names: from 'ALL Upper' to 'Proper'
* e.g. GOLDSTEIN, I., YANG, S., & ZUO, L. (2023)
* to Goldstein, I., Yang, S., & Zuo, L. (2023).
* All authors
tempvar au
qui split `v_ref', parse(`". ("') gen(`au')
qui replace `au'1 = ustrtitle(`au'1) // From: HÉMET, C., & MALGOUYRES, C. (2017)
// To: Hémet, C., & Malgouyres, C. (2017)
qui replace `v_ref' = `au'1 + ". (" + `au'2
qui cap drop `au'*
*-full meta data (citation)
local ref0 = `v_ref'[1] // the meta data
*-delete trailing blanks
local ref0 = strrtrim(`"`ref0'"')
*-delete special characters (e.g., '\t', '\n')
    local ref0 = subinstr(`"`ref0'"', char(9), "", .)   // remove tab characters
    local ref0 = subinstr(`"`ref0'"', char(10), "", .)  // remove line feeds
    local ref0 = subinstr(`"`ref0'"', char(13), "", .)  // remove carriage returns
    local ref0 = subinstr(`"`ref0'"', char(12), "", .)  // remove form feeds
    local ref0 = subinstr(`"`ref0'"', char(11), "", .)  // remove vertical tabs
    local ref0 = subinstr(`"`ref0'"', char(0), "", .)   // remove null characters
    local ref0 = subinstr(`"`ref0'"', char(8), "", .)   // remove backspace characters
    *-shorten the journal name
    * From: "The Stata Journal: Promoting Communications on Statistics and Stata"
    * To:   "The Stata Journal"
* local regex ": Promoting Communications on Statistics and Stata"
local regex ":[\s]?Promoting.*on .*and Stata"
local ref0 = ustrregexra(`"`ref0'"', `"`regex'"', "", 1)
*-ref_body, ref_full
local regex " http[s]?://.+"
local ref_body = ustrregexra(`"`ref0'"', `"`regex'"', "", 1)
*-display
if "`display'" != ""{
dis `"`ref0'"'
}
*-return value
return local ref_full `"`ref0'"'
return local ref_body `"`ref_body'"'
return local url `"`url'"'
return local DOI `"`DOI'"'
return local server "`server'"
restore
end
/* === test ===
global DOI "10.1111/j.1467-629X.2010.00375.x"
global DOI "10.1016/j.jeconom.2020.06.003"
global DOI "10.1111/1475-679X.12496" // GOLDSTEIN, I., YANG, S., & ZUO, L. (2023).
global doi "10.48550/arXiv.2312.05400" // datacite
global doi "10.14454/fxws-0523" // datacite
// set trace on
get_doidata $doi
ret list
get_doidata $doi, dis
ret list
dis "|`r(ref_body)'|"
dis "|`r(ref_full)'|"
*/
*------------------ subprogram ------------- get_error_doi_data .ado
cap program drop get_error_doi_data
program define get_error_doi_data, rclass
syntax anything(name=DOI) [, Simple Only]
dis as error `"Failed to get data for DOI: {cmd:`DOI'}. Visit {browse "https://doi.org/`DOI'":{ul:article page}}"'
if "`only'" !=""{
dis as error `"Only DOI with agency {browse "https://project-thor.readme.io/docs/who-are-datacite-and-crossref":{ul:crossref}} or {browse "https://project-thor.readme.io/docs/who-is-crossref":{ul:datacite}} is supported. See {browse "https://www.doi.org/the-community/existing-registration-agencies/":DOI Servers} for details"'
}
if "`simple'" == ""{
dis as error `"Check the validity of DOI at {browse "https://doi.org/`DOI'":doi.org}, or try it later."'
}
exit 601
end
*------------------ subprogram ------------- get_au_yr_ti.ado
* version 1.2 change the input from 'varname' to 'string' (ref_full)
* 2023/12/23 16:41
cap program drop get_au_yr_ti
program define get_au_yr_ti, rclass
* input: {DOI}
* output: filename --> Author-Year-Title
syntax anything [, DOI(string) doifn ]
preserve
*-delete "
local anything = subinstr(`"`anything'"', `"""', "", .)
*-begin
local ref_body = `"`anything'"' // reference
* First Author
// tempvar au
// qui split `varlist', parse(,) gen(`au')
local regex `"(^.+?),"'
if ustrregexm(`"`ref_body'"', "`regex'"){
local author = ustrregexs(1)
}
else{
local author = ""
}
* Year
local regex = `"(?<= \()(\d\d\d\d)(?=\))"'
if ustrregexm(`"`ref_body'"', "`regex'"){
local year = ustrregexs(0)
}
else{
local year = ""
}
* Title
local regex `"(?<=\).\s)(.+)(?=[\.\?]\s)"'
if ustrregexm(`"`ref_body'"', "`regex'"){
local title = ustrregexs(1)
}
else{
local title = ""
}
* Title: delete special characters
* doifn: transfer {DOI} to valid 'filename'
// "10.1111/j.1467-629X.2010.00375.x"
// to
// "10.1111_j.1467-629X.2010.00375.x" // "`doifn'" == ""
// "10_1111_j_1467-629X_2010_00375_x" // "`doifn'" != ""
if "`doi'" != ""{
if "`doifn'" != ""{
local doi = ustrregexra(`"`doi'"', "[^0-9a-zA-Z-]", "_")
}
else{
local doi = ustrregexra(`"`doi'"', "/", "_")
}
local au_yr_doi = `"`author'-`year'-`doi'"'
}
else{
local au_yr_doi = `"`author'-`year'"'
}
* return values
return local au_yr_doi = `"`au_yr_doi'"'
return local au_yr_ti = `"`author' `year' `title'"'
return local au_yr = "`author' `year'"
return local title = "`title'"
return local year = "`year'"
return local author = "`author'"
restore
end
/*
// new version: 2023/12/23 17:33
global doi "10.1111/j.1467-629X.2010.00375.x"
global doi 10.1111/j.1467-629X.2010.00375.x
get_doidata $doi
ret list
global ref_body "`r(ref_body)'"
get_au_yr_ti "$ref_body", doi($doi)
ret list
get_au_yr_ti "$ref_body", doi($doi) doifn
ret list
*/
*------------------ subprogram -------------get_filename.ado
* version 1.0 29dec2023
* this is a simplified version of 'getfn.ado'
// Goal: delete/replace invalid characters in 'filename' (fn)
// The default invalid characters in filenames
* \ / : * ? " < > |
// Source: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
cap program drop get_filename
program define get_filename, rclass
syntax anything(name=filename) [, Delete(string) Blank(string)]
*-user specified characters to be deleted
if "`delete'" != ""{
local delete = subinstr("`delete'", " ", "",.)
}
*-replace invalid characters with 'blanks'
local filename = ustrregexra(`"`filename'"', `"[",,\\/:\*\?<>|`delete']"', " ")
local filename = ustrregexra(`"`filename'"', `" "', " ")
local filename = ustrregexra(`"`filename'"', `" "', " ")
*-delete leading and trailing blanks
local filename = strtrim(`"`filename'"')
*-option -blank(string)-: replace blank with '-'; default: '_'
* option blank()
* "_" default, if user do not specify option -blank-, i.e., "`blank'"==""
* " " blank(keep)
* "-" blank(bar)
if "`blank'" == ""{
local reblank "_"
}
if "`blank'" == "bar"{
local reblank "-"
}
if "`blank'" == "keep"{
local reblank " "
}
*-replace blank
local fn = ustrregexra("`filename'", "\s", "`reblank'")
*-return value
return local fn "`fn'" // return value, nb: No Blank
end
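/* === test ===
* Minimal usage sketch (illustrative only): the input string below is made up,
* following the 'blank()' examples in == Note 1 == above.
get_filename "Hansen 2023 The Crisis"
dis "`r(fn)'"                   // Hansen_2023_The_Crisis
get_filename "Hansen 2023 The Crisis", blank(bar)
dis "`r(fn)'"                   // Hansen-2023-The-Crisis
get_filename "Hansen 2023 The Crisis", blank(keep)
dis "`r(fn)'"                   // Hansen 2023 The Crisis
*/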
*------------------ subprogram -------------get_cite.ado
/*
test:
global DOI "10.1093/rfs/hhs072"
global DOI "10.1257/aer.109.4.1197"
global DOI "10.1016/j.jeconom.2020.10.012" // 2 authors
get_cite body, doi("$DOI")
ret list
get_cite body, doi("$DOI") link
ret list
*/
cap program drop get_cite
program define get_cite, rclass
* input: {DOI}
* output: Author (Year)
syntax varname , DOI(string) [Link Intext Latex]
preserve
local ref_body = `varlist'[1] // reference
local DOI "`doi'"
* Year
local regex = `"(?<= \()(\d\d\d\d)(?=\))"'
if ustrregexm(`"`ref_body'"', "`regex'"){
local year = ustrregexs(0)
        local year_link = "[`year']()"
}
else{
local year = ""
}
* First Author
tempvar au
qui split `varlist', parse(,) gen(`au')
local au_1 = `au'1[1]
cap drop `au'*
* All authors
tempvar au aulist
qui split `varlist', parse(`". ("') gen(`au')
qui gen `aulist' = `au'1
local authors = `au'1[1]
cap drop `au'*
*-Number of authors
local n_authors = length("`authors'") - length(subinstr("`authors'", ".,", "", .))
local n_authors = `n_authors'/2 + 1
*-get Author 2
tempvar au au_u
if `n_authors' == 2{
replace `aulist' = subinstr(`aulist', "\", "", .)
qui split `aulist', parse(`" & "') gen(`au')
qui split `au'2, parse(,) gen(`au_u')
local au_2 = `au_u'1[1]
}
*-Link
if "`link'" != ""{
local art_link "https://doi.org/`DOI'"
local pdf_web "${sci__hub_}/`DOI'" // http://sci-hub.ren/ or return by get_scihub.ado
if "`latex'"==""{ // link with Markdown format
local au_1 "[`au_1'](`art_link')"
local year "[`year'](`pdf_web')"
}
else{
local au_1 "\href{`art_link'}{`au_1'}"
local year "\href{`pdf_web'}{`year'}"
}
}
*-cite format
if `n_authors' == 1{ // Hansen (2023)
if "`intext'" == "" local cite "`au_1' (`year')"
else local cite "(`au_1', `year')"
}
else if `n_authors' == 2{ // Hansen and Levin (2023)
if "`intext'" == "" local cite "`au_1' and `au_2' (`year')"
else local cite "(`au_1' and `au_2', `year')"
}
else{ // #(authors)>=3 Hansen et al. (2023)
if "`intext'" == "" local cite "`au_1' et al. (`year')"
else local cite "(`au_1' et al., `year')"
}
if `n_authors' == 2{
return local au2 "`au_2'"
}
else{
return local au2 " "
}
*--------------
* return values
return scalar n_authors = `n_authors'
return local au1 = "`au_1'"
return local year = "`year'"
return local authors = "`authors'"
return local cite = "`cite'"
restore
end
*------------------ subprogram ------------- get_doi_scihub_special.ado
* version 1.0, 17Dec2023
* Yujun Lian, arlionn@163.com
*-Goal: transfer special characters in {DOI} to ASCII
* get {url} of PDF documents using SCI-HUB
*- input: {DOI}
*-output:
* {DOI_scihub}
* "`scihub'/`DOI_scihub'`suffix'" (url of PDF)
cap program drop get_doi_scihub_special
program define get_doi_scihub_special, rclass
syntax anything(name=DOI) [, View]
    local DOI = subinstr(`"`DOI'"', `"""', "", .)   // remove extra quotation marks
// local DOI "10.1177/1536867X1101100308"
// local DOI "10.1111/j.1467-629X.2010.00375.x"
*-deal with ASCII characters in {DOI}
* '10.1016/0304-4076(74)90034-7' -->
* '10.1016/0304-4076%252874%252990034-7'
*
* '10.1177/1536867x19830921' -->
* '10.1177%2F1536867x19830921'
* ---
// URL --> percent-encoded ASCII format
// mata: st_local("DOI_ascii", urlencode(`"`DOI'"'))
mata: st_local("DOI_test", urlencode("`DOI'"))
*-number of '%': special characters
local n_per = length("`DOI_test'") - length(subinstr("`DOI_test'", "%", "", .))
*-lowering
if `n_per' > 1{
local DOI_lower = lower("`DOI'")
}
else{
local DOI_lower = "`DOI'"
}
*-URL --> percent-encoded ASCII format
if `n_per' == 1 & strpos("`DOI_test'", "%2F")>0{
local DOI_scihub = "`DOI'"
}
else{
mata: st_local("DOI_ascii", urlencode("`DOI_lower'"))
local DOI_ascii = subinstr("`DOI_ascii'", "%2F", "/" , 1) // '%2F' --> '/'
local DOI_scihub = subinstr("`DOI_ascii'", "%" , "%25", .) // '%28' --> '%2528'
}
*-PDF link of SCI-HUB
local scihub "https://sci.bban.top/pdf"
local pdf_scihub "`scihub'/`DOI_scihub'.pdf"
local pdf_scihub_br = `"{browse "`pdf_scihub'" : PDF}"'
*-view
if "`view'" != ""{
dis `"`pdf_scihub_br'"'
}
*-return values
return local scihub = "`scihub'"
return local pdf_scihub_br = `"`pdf_scihub_br'"'
return local pdf_scihub = "`pdf_scihub'"
    return local doi_ascii = "`DOI_ascii'"   // percent-encoded (urlencode) format
return local doi_scihub = "`DOI_scihub'"
end
/*
*--- Test ----
set trace on
local DOI "10.1002/1521-3951(200101)223:1<293::AID-PSSB293>3.0.CO;2-N"
get_doi_scihub_special "`DOI'"
ret list
get_doi_scihub_special "`DOI'", view
ret list
cls
set trace on
local DOI "10.1177/1536867X1101100308"
get_doi_scihub_special "`DOI'", view
ret list
*/
*------------------ subprogram -------------get_pdf_scihub.ado
* version 1.2 2023/12/24 11:10
* chg: sometimes SCI-HUB changes upper-case letters in the {DOI} to lower case,
*      so we also try other letter cases to download the PDF document properly
* download PDF article using sci-hub
* input: {DOI}
* output: PDF document of the article given {DOI}
* output: ../pwd/filename.pdf
* Example:
* get_pdf `DOI', saving(`filename')
*-- basic idea: --
* copy "https://sci.bban.top/pdf/{DOI}.pdf" abc.pdf
cap program drop get_pdf_scihub
program define get_pdf_scihub, rclass
syntax anything [ , Saving(string) Path(string) ]
// local DOI = subinstr(`"`DOI'"', `"""', "", .)
if strpos(`"`anything'"', `"""')>0{
local DOI `anything'
}
else{
local DOI "`anything'"
}
*-path
if "`path'" == ""{
local path: pwd
}
*-filename
if "`saving'" != ""{
local fn "`saving'"
}
else{ // use DOI as file
local fn = "_" + ustrregexra("`DOI'", "[^0-9a-zA-Z]", "_")
}
*-download
* copy "https://sci.bban.top/pdf/{DOI}.pdf" abc.pdf
* ------------------------ -----
* scihub suffix
* e.g. https://sci.bban.top/pdf/10.1111/j.1467-629x.2010.00375.x.pdf
get_doi_scihub_special `DOI' // deal with special characters
local DOI_scihub "`r(doi_scihub)'"
// dis in red "`DOI_scihub'" // ++++++++++++++++++++++++++++++++++++
local scihub "https://sci.bban.top/pdf"
// local suffix ".pdf"
local pdf_url "`scihub'/`DOI_scihub'.pdf"
local link "https://doi.org/`DOI'"
local pdf_web "${sci__hub_}/`DOI'"
// dis "`pdf_url'"
    *-change '//' to '/'
local pdf_url = subinstr("`pdf_url'", `"//10"', "/10",.)
*-download PDF file
cap qui copy `"`pdf_url'"' `"`path'/`fn'.pdf"', replace // download PDF document
if _rc{ // >>> try 1: lower case of the {DOI}
local DOI_scihub = lower("`DOI'") // https://sci.bban.top/pdf/10.1111/j.1467-629X.2010.00375.x.pdf
// change to ('X' --> 'x')
// https://sci.bban.top/pdf/10.1111/j.1467-629x.2010.00375.x.pdf
local pdf_url "`scihub'/`DOI_scihub'.pdf"
cap qui copy `"`pdf_url'"' `"`path'/`fn'.pdf"', replace // download PDF document
if _rc==0{
local pdf_ok = 1
}
else{ // >>> try 2: upper case of the {DOI}
local DOI_scihub = upper("`DOI'") // https://sci.bban.top/pdf/10.3982/ecta13117.pdf
// change to
// https://sci.bban.top/pdf/10.3982/ECTA13117.pdf
local pdf_url "`scihub'/`DOI_scihub'.pdf"
cap qui copy `"`pdf_url'"' `"`path'/`fn'.pdf"', replace // download PDF
if _rc==0{
local pdf_ok = 1
}
else{
local pdf_ok = 0
}
}
}
else{
local pdf_ok = 1
}
*-display error message, hints and show results
if `pdf_ok' == 0{
dis as error "Failed to Download/Save PDF document. Possible reasons:"
dis as error " 1. The PDF document with same filename has been opened, and is read-only."
dis as error `" 2. The paper is too new, or is a working paper. You can visit {browse "`link'":{ul:aricle page}} to download by hand using filename:"'
dis as text _skip(2) `"{cmd:`fn'}."'
dis as error `" 3. There are some wrong with the URL. You can try {browse "`pdf_web'":PDF_online}."'
dis as text "If necessary, report bugs to ."
// exit
}
    else{   // TBD: turn this part into a subprogram, get_pdf_dis.ado
*local path `"`c(pwd)'"'
local path = subinstr(`"`path'"', "\", "/", .)
if "`c(os)'" == "Windows" {
noi dis _col(9) "{cmd:PDF:}" ///
_skip(2) `"{browse `"`path'"': dir}"' ///
_skip(3) `"{browse `"`pdf_url'"': view_online}"' ///
_skip(4) `"{stata `" winexec cmd /c start "" "`path'/`fn'.pdf" "' : Open}"'
}
if "`c(os)'" == "MacOSX" {
noi dis _col(4) "{cmd:PDF:}" ///
_skip(2) `"{browse `"`path'"': dir}"' ///
_skip(3) `"{browse `"`pdf_url'"': view_online}"' ///
_skip(4) `"{stata `" !open "`path'/`fn'.pdf" "' : Open}"'
}
}
*-return value
return local pdfurl = `"`pdf_url'"'
return local pdfweb = `"`pdf_web'"'
return scalar pdf_got = `pdf_ok' // Fail to download PDF document?
end
/*
*- test
global DOI "10.1111/j.1467-629X.2010.00375.x" // need lower
global DOI "10.3982/ecta13117" // need upper
global DOI "10.1016/j.jbankfin.2019.07.014"
cls
set trace on
get_pdf_scihub $DOI
get_pdf_scihub $DOI, saving(Hansen-2021)
get_pdf_scihub $DOI, saving(Hansen-2021) hline
*/
/*
to be done: special case
* the letter case of some DOIs on SCI-HUB seems to change at random
* other REST articles do not have this problem
DOI: 10.1162/rest_a_00775
https://sci.bban.top/pdf/10.1162/REST_a_00775.pdf
*/
*------------------ subprogram -------------get_pdf_nonSCIHUB.ado
* version 1.1 2024/1/10 10:03
* download PDF documents for articles in
* NBER, arXiv, and Open-access Journals
* input: {DOI}
* output: PDF document of the article given {DOI}
* output: ../pwd/filename.pdf
* Example:
* get_pdf `DOI', saving(`filename')
*-- basic idea: --
* copy "https:xxxx/pdf/{DOI}.pdf" path/abc.pdf
cap program drop get_pdf_nonSCIHUB
program define get_pdf_nonSCIHUB, rclass
syntax anything(name=DOI) [ , Saving(string) Path(string) ]
local DOI = subinstr(`"`DOI'"', `"""', "", .) // delete '"'
* article page
local link "https://doi.org/`DOI'"
local link_br `"{browse "`link'":Link}"'
*-path
if "`path'" == ""{
local path: pwd
}
*-filename
if "`saving'" != ""{
local fn "`saving'"
}
else{ // use DOI as file
local fn = "_" + ustrregexra("`DOI'", "[^0-9a-zA-Z]", "_")
}
*-download
*-arXiv
* DOI: 10.48550/arXiv.2312.05400 --> 10.48550/arXiv.{ID}
local key "10.48550/arXiv"
if strpos(`"`DOI'"', "`key'"){
local ar_ID = subinstr("`DOI'", "`key'.", "", 1) // get: 2312.05400 ({article ID})
local pdf_url "https://arxiv.org/pdf/`ar_ID'.pdf"
}
*-NBER with PDF
* DOI: 10.3386/w31184 --> 10.3386/{ar_ID}
* PDF: https://www.nber.org/system/files/working_papers/{ID}/{ID}.pdf
* - e.g. https://www.nber.org/system/files/working_papers/w31184/w31184.pdf
local key "10.3386/"
if strpos(`"`DOI'"', "`key'"){
local ar_ID = subinstr("`DOI'", "`key'", "", 1) // get: w31184 ({article ID})
local pdf_root "https://www.nber.org/system/files/working_papers"
local pdf_url "`pdf_root'/`ar_ID'/`ar_ID'.pdf"
}
*---------------------
*- Open Access Journal
*---------------------
*-QE
* DOI: 10.3982/QE1288
* PDF: https://onlinelibrary.wiley.com/doi/epdf/10.3982/QE1288
local key "10.3982/QE"
if strpos(`"`DOI'"', "`key'"){
local link "https://onlinelibrary.wiley.com/doi/`DOI'"
local pdf_root "https://onlinelibrary.wiley.com/doi/epdf"
local pdf_url "`pdf_root'/`DOI'"
}
*-Stata Journal
* DOI: 10.1177/1536867X
* PDF: https://journals.sagepub.com/doi/epdf/10.1177/1536867X1801800306
local key "10.1177/1536867"
if strpos(`"`DOI'"', "`key'"){
local pdf_root "https://journals.sagepub.com/doi/epdf"
local pdf_url "`pdf_root'/`DOI'"
}
*-Save the PDF document
    *-change '//' to '/'
local pdf_url = subinstr("`pdf_url'", `"//10"', "/10",.)
*-download PDF file
cap qui copy `"`pdf_url'"' `"`path'/`fn'.pdf"', replace // download PDF document
if _rc{
local pdf_ok = 0
}
else{
local pdf_ok = 1
}
*-display error message, hints and show results
if `pdf_ok' == 0{
dis as error "Failed to Download/Save PDF document. Possible reasons:"
dis as error " 1. The PDF document with same filename has been opened, and is read-only. You can close this file and rename it"
dis as error `" 2. There are some wrong or changes with the URL. You can visit {browse "`link'":aricle page} and download by hand."'
dis as error "If necessary, report bugs to ."
}
    else{   // TBD: turn this part into a subprogram, get_pdf_dis.ado
local path = subinstr(`"`path'"', "\", "/", .)
if "`c(os)'" == "Windows" {
noi dis _col(9) "{cmd:PDF:}" ///
_skip(2) `"{browse `"`path'"': dir}"' ///
_skip(3) `"{browse `"`pdf_url'"': view_online}"' ///
_skip(4) `"{stata `" winexec cmd /c start "" "`path'/`fn'.pdf" "' : Open}"'
}
if "`c(os)'" == "MacOSX" {
noi dis _col(4) "{cmd:PDF:}" ///
_skip(2) `"{browse `"`path'"': dir}"' ///
_skip(3) `"{browse `"`pdf_url'"': view_online}"' ///
_skip(4) `"{stata `" !open "`path'/`fn'.pdf" "' : Open}"'
}
}
*-return value
return local pdfurl = `"`pdf_url'"'
return scalar pdf_got = `pdf_ok' // Fail to download PDF document?
end
/*
test
*-arXiv
global DOI "10.48550/arXiv.1301.3781"
*-NBER
global DOI 10.3386/w31184
global DOI 10.3386/w3110
*-QE Open Access Journal
global DOI 10.3982/QE1288
get_pdf_nonSCIHUB $DOI
*/
*------------------ subprogram ------------- get_bib.ado
* version 1.0 13Dec2023
* Yujun Lian
cap program drop get_bib
program define get_bib, rclass
version 14
*:Goal: download and list .ris and .bibtex files for given {DOI}
*-- input: {DOI} e.g., 10.1016/j.jbankfin.2019.07.014
*-- output: doi.ris, doi.bibtex
*::ideas: "`API'/`DOI'/`trans'"
* e.g, "http://api.crossref.org/works/{DOI}/transform/application/x-bibtex"
syntax anything(name=DOI) [, Path(string) Notip ]
    *-generate the filename according to the {DOI} (replace invalid characters with '_')
local fn = "_" + ustrregexra("`DOI'", "[^0-9a-zA-Z]", "_")
* input: "10.1257/aer.109.4.1197"
*output: "_10_1257_aer_109_4_1197"
*-common url
local API "http://api.crossref.org/works"
*-.bibtex (.tex file or Zotero)
local trans "transform/application/x-bibtex"
local url_bib "`API'/`DOI'/`trans'"
*-.ris (endnote, refworks, ProCite, Reference Manager)
local trans "transform/application/x-research-info-systems"
local url_ris "`API'/`DOI'/`trans'"
* help document
* .RIS file can be used to import meta data to software like: ProCite, Reference Manager, EndNote
* .bibtex file can be used to import meta data to .tex file or Zotero software
cap copy `"`url_bib'"' "`path'/`fn'.bibtex", replace
if _rc==0{
local got_bib = 1
qui copy `"`url_ris'"' "`path'/`fn'.ris", replace
}
else{
local got_bib = 0
dis as error "Warning: Can not download '.bibtex' and '.ris' files. This may occur for newly published papers or working papers. Please check your {DOI}."
dis as error "You can report bugs to ."
// exit
}
if `got_bib' == 1 {
// local path : pwd // Current working directory
local path = subinstr(`"`path'"', "\", "/", .)
if "`pdf'" == ""{
local dis_dir " . "
}
else{
local dis_dir "dir"
}
if "`c(os)'" == "Windows" {
noi dis _col(4) "{cmd:Citation:}" ///
_skip(2) `"{browse `"`path'"': `dis_dir'}"' ///
_skip(3) `"{stata `" winexec cmd /c start "" "`path'/`fn'.bibtex" "' : Bibtex}"' ///
_skip(4) `"{stata `" winexec cmd /c start "" "`path'/`fn'.ris" "' : RIS}"'
}
if "`c(os)'" == "MacOSX" {
noi dis _col(4) "{cmd:Citation:}" ///
_skip(2) `"{browse `"`path'"': `dis_dir'}"' ///
_skip(3) `"{stata `" !open "`path'/`fn'.bibtex" "' : Bibtex}"' ///
_skip(4) `"{stata `" !open "`path'/`fn'.ris" "' : RIS}"'
}
if "`notip'" == ""{
noi dis _col(4) as text "Notes: {cmd:RIS} - EndNote, ProCite, Mendeley"
noi dis _col(11) as text "{cmd:Bibtex} - LaTeX, Zotero, Mendeley"
}
}
*-return value
return scalar got_bib = `got_bib'
if `got_bib' == 1 {
return local bibtex = `"`url_bib'"'
return local ris = `"`url_ris'"'
}
else{
return local bibtex = ""
return local ris = ""
}
end
/* test and Examples
local DOI "10.1016/j.jbankfin.2019.07.014"
get_bib `DOI'
ret list
==Citation Export== RIS: EndNote, ProCite, Mendeley
dir Bibtex: LaTeX, Zotero, Mendeley
Windows: Bibtex RIS MacOS: Bibtex RIS
macros:
r(url_ris) : "http://api.crossref.org/works/10.1016/j.jbankfin.2019.07.014/transform/application/x-research-info-systems"
r(url_bib) : "http://api.crossref.org/works/10.1016/j.jbankfin.2019.07.014/transform/application/x-bibtex"
r(doi) : "10.1016/j.jbankfin.2019.07.014"
*/
*------------------ subprogram ------------- get_checkpath.ado
cap program drop get_checkpath
program define get_checkpath, rclass
version 14
syntax anything(name=path)
local pwd : pwd
    local path = subinstr(`"`path'"', `"""', "", .)   // remove double quotes
if strpos(`"`path'"', "/") | strpos(`"`path'"', "\"){ // full path
cap cd `"`path'"'
if _rc{
dis as text `"'`path'' does not exist."'
local path = subinstr(`"`path'"', "/", "\", .)
dis "`path'"
!md "`path'"
cap noi cd `"`path'"'
if _rc==0{
dis as text `"new path '`path'' is created"'
}
else{
dis as error "invalid path(), please check it"
exit
}
}
}
else{
local pwd : pwd
cap cd "`path'"
if _rc{
mkdir "`path'"
cd "`path'"
}
local path `"`pwd'/`path'"'
}
local path = subinstr(`"`path'"', "\", "/", .)
return local path "`path'"
cd "`pwd'"
end
/*
=== test
get_checkpath aaa
ret list
get_checkpath "D:/___temp/delete_later"
ret list
*/
*------------------ subprogram ------------- get_clipout.ado --v2--
* version 1.1 2023/2/23 16:15
* version 1.2 2023/11/14 23:07
* echo text to clipboard. Support: Windows, MacOSX
* Tips
* 1. The 'notice' appears no more than 3 times
* 2. Once -NOTIP- specified, the 'notice' will not appear before you restart Stata
*   3. You can execute "global clipout__times_ = 10" to hide the 'notice'
* notice := "Text is on clipboard. Press '`shortcut'' to paste"
* =refs:
* https://www.alphr.com/echo-without-newline/
* https://linuxhandbook.com/echo-without-newline/
cap program drop get_clipout
program define get_clipout
syntax anything [, Clipoff NOTIP]
if "`clipoff'" ~= ""{
exit
}
if "`c(os)'" == "Windows" {
local shellcmd `"shell echo | set /p=`anything'| clip"'
local shortcut "Ctrl+V"
}
if "`c(os)'" == "MacOSX" {
local shellcmd `"shell echo -n `anything'| pbcopy"'
local shortcut "Command+V"
}
`shellcmd' // auto copy to clipboard
*-Warning for non ASCII characters
*local au "M. Dąbrowski, Papież, M."
// mata: st_local("Yes_ascii", strofreal(isascii("`anything'")))
// mata: st_local("Yes_ascii", strofreal(isascii(`anything')))
// if `Yes_ascii'!=1{
// dis as text _n "Warning: Non ASCII characters found and may not display properly."
// dis as text "You'd better manually copy the results"
// }
*-notip
if "`notip'" == ""{
dis as text "{cmd:Tips}: Text is on clipboard. Press '{cmd:`shortcut'}' to paste, ^-^"
}
end
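/* === test ===
* Minimal usage sketch (illustrative only): send a short text to the clipboard.
* Works on Windows and MacOSX; the text below is a made-up example string.
get_clipout "Hansen 2023 The Crisis"
get_clipout "Hansen 2023 The Crisis", notip     // suppress the tip line
get_clipout "Hansen 2023 The Crisis", clipoff   // do nothing
*/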
*-----------------------------------------
* version 1.1 2023/9/26 23:10
* Yujun Lian, arlionn@163.com
/*
# Description Goal:
Package 'get_scihub' displays and checks the valid URLs of SCI-Hub, a
special website to search or browse academic papers.
For various reasons, the URL of SCI-Hub changes frequently.
Some commonly used URLs are listed in
"https://lovescihub.wordpress.com/"
Their URLs share the format as: http(s)://sci-hub.xx.
For example, http://sci-hub.ren/.
'get_scihub' can also be used to get the PDF link of an article even though
the URL of SCI-Hub is changing. The dynamic PDF link will be:
http://{best URL returned by get_scihub}/{DOI}
# Methods:
1. copy webpage of "https://lovescihub.wordpress.com/"
2. get the URLs using regular expression
3. if -check- option is specified, check the validity of URLs, and keep the
valid ones.
4. if -list- option is specified, list the URLs and Speed (in seconds) in
Stata's Results Window. The URLs listed are clickable.
5. The return values include: the fast URL, r(best); all URLs listed in
"https://lovescihub.wordpress.com/"
# Options:
List: list clickable URLs and their speed (in seconds)
Check: check the validity of URLs listed in https://lovescihub.wordpress.com/,
and keep only the valid ones.
# Usage and Examples
(1) display a list of valid URLs of SCI-Hub
. get_scihub, list
(2) check validity and list valid ones
. get_scihub, list check
(3) programming use
. qui get_scihub
. local scihub "`r(best)'"
. local DOI "10.1257/aer.109.4.1197"
. view browse "`scihub'/`DOI'" // open the PDF document
* URL format: ren, wf, pk, ee, click
sci-hub.ee | sci-hub.ren | sci-hub.ru
sci-hub.se | sci-hub.st | sci-hub.tf | sci-hub.wf
*/
cap program drop get_scihub
program define get_scihub, rclass
version 14
syntax [, Check List]
// detail: save the full list of sci-hub host as return macros
// list: list the full list of sci-hub host as results
preserve
qui {
*-download webpage and save as .dta
local lovescihub "https://lovescihub.wordpress.com/"
tempfile html_text
cap copy "`lovescihub'" "`html_text'.txt", replace
if _rc{
dis as error "Fail to connect to `lovescihub'"
dis as error "Try http://sci-hub.ren/ or http://sci-hub.ee/"
exit
}
tempvar v
qui infix strL `v' 1-1000 using "`html_text'.txt", clear
*-extract
keep if strpos(`v', "://sci-hub")
local regex "https?://sci-hub\.[a-z]{2,}"
gen url = ustrregexs(0) if ustrregexm(`v', "`regex'") // URL of sci-hub
duplicates drop url, force
*-check the validity of URLs, and keep valid URLs
if "`check'" != ""{
local N = _N
tempfile testfn
tempvar IsValid
            gen `IsValid' = 1
forvalues i = 1/`N'{
local urli = url[`i']
cap copy "`urli'" "`testfn'.txt", replace // Check if the urli is valid
if _rc{ // not valid
replace `IsValid' = 0 in `i'
}
}
keep if `IsValid' == 1
local N = _N
if `N' == 0{
local solution "https://lovescihub.wordpress.com/solutions/"
noi: dis as error "Can not find valid URL. " _c
noi: dis as text `"See:{browse "`solution'": Possible Solutions}"'
exit
}
}
*============ return and display =======
/* Options:
Default: return r(best), all URLs listed in https://lovescihub.wordpress.com/
    List: list clickable URLs and their speed (in seconds)
Check: check the validity of URLs listed in https://lovescihub.wordpress.com/,
and keep only the valid ones.
*/
*-return full list of URLs
local N = _N
return scalar N = `N'
forvalues i = `N'(-1)1{
return local s`i' = url[`i']
}
*-the best (fast)
local best = url[1]
return local best = "`best'" // the fast/best URL
*-display the full list of URLs
if "`list'" != ""{
// connect speed (seconds)
local regex "[\d]\.[\d]{1,}(?=s)"
gen seconds = ustrregexs(0) if ustrregexm(`v', "`regex'")
local love "https://lovescihub.wordpress.com/"
noi: dis _col(10) "URL" _col(34) "Seconds " _c
noi: dis `"{browse "`love'": [Source]}"'
forvalues i = 1/`N'{
local url = url[`i']
local sec = seconds[`i']
noi: dis _col(3) "`i'." _col(7) `"{browse "`url'": `url'}"' _col(34) "`sec'"
}
}
} // quietly over
restore
end
/*
*===== test ======
get_scihub
ret list
get_scihub, check
ret list
clear
set trace on
get_scihub, list
get_scihub, l
ret list
*/