*!Second Generation P-Values Calculations
*!Based on the R-code for sgpvalue.R from the sgpv-package from https://github.com/weltybiostat/sgpv
*!Version 1.07a 13.07.2022: Fixed the example "Simulated two-group dichotomous data for different parameters" in the help file. No actual code changes.
*!Version 1.07	15.05.2022: Handling of larger inputs and nomata-option should work now as intended. Previously, not all possible combinations of large inputs and nomata-option have been supported correctly. 
*Version 1.06  13.02.2022: Fixed a bug when variables as input with only one null-hypothesis is used, but only a few observations exist. ///
							Fixed a bug which prevented one of the examples from the help file to run.
*Version 1.05  01.11.2020: Fixed a bug in an input check which made it impossible to use missing values as input for one-sided intervals. ///
							Fixed a bug which set delta incorrectly when calculating the deltagap for one-sided intervals. 
*Version 1.04  05.07.2020: Added/improved support matrices as inputs for options "esthi" and "estlo". Noshow-option now works expected. 
*Version 1.03a 23.06.2020: Removed unnecessary input checks
*Version 1.03 24.05.2020 : Added further input checks	
*Version 1.02 06.04.2020 : Added another check to prevent using more than one null interval with variables or large matrices as input estlo and esthi, added two more input error checks -> some non-sensical input is probably still possible. 
*Version 1.01 28.03.2020 : Fixed the nodeltagap-optin -> now it works in all scenarios, previously it was missing in the Mata version and ignored in the variable version of the computing algorithm.	
*Version 1.00 : Initial SSC release, no changes compared to the last Github version.
*Version 0.98a: Fixed an incorrect comparison -> now the correct version of the SGPV algorithm should be chosen if c(matsize) is smaller than the input matrix; added more examples from the original R-code to the help-file.
*				Fixed a bug with the nodeltagap-option.
*				Added more examples from the R-code (as a do-file).
*Version 0.98 : Implement initial handling of infinite values (one sided intervals) -> not 100% correct yet -> treatment of missing values in variables is questionable and might need further thought and changes
*Version 0.95a: Fixed some issues in the documentation.
*Version 0.95 : Added support for using variables as inputs for options esthi() and estlo(); Added Mata function for SGPV calculations in case c(matsize) is smaller than the input vectors; 
*				Added alternative approach to use variables for the calculations instead if variables are the input -> Mata is relatively slow compared to using only variables for calculations.
*Version 0.90  : Initial release to Github 
*Handling of infinite values depends on whether variables or "vectors" are used as input. But it should not matter for calculations.  
*To-do: 	At some point rewrite the code to use only Mata for a more compact code -> currently three different versions of the same algorithm are used.
*			Add an option to format the output -> exists already for the sgpv-command, could added if needed	
*			Unify options nulllo and nullhi into one option named "null" and options estlo and esthi into "est" or "ALTernative" to make it easier for users to enter null-hypothesis and alternative hypothesis or estimated intervals
*			Allow matrices as input for the (null-)intervals?
*			Rename variables created by sgpv_var from "pdelta" and "dg" to "_pdelta" and "_dg" to decrease the risk having user created variables with the same names.



capture program drop sgpvalue
program define sgpvalue, rclass
version 12.0 
syntax, estlo(string) esthi(string)  nulllo(string) nullhi(string) [NOWARNings INFcorrection(real 1e-5) nodeltagap nomata noshow replace] 

*Parse the input : 
*Check that the inputs are variables -> For the moment only allowed if both esthi and estlo are variables		
	*Check if input is matrix or variable and convert matrix into local macro
	foreach name in esthi estlo{
		capture confirm numeric variable ``name''
		if !_rc{
			local variable_`name' 1
		}
		else{
			capture confirm matrix ``name''
			if !_rc{
				local matrix_`name' 1
			}
		}
	}
	
	local variablefound 0 
	local matrixfound 0
	foreach type in variable matrix{
		if ("``type'_esthi'"=="1" & "``type'_estlo'"=="1"){
			local `type'found 1

		} 
		else if "``type'_esthi'"!="``type'_estlo'"{
			disp as error "Either `esthi' or `estlo' is not a `type'. Both options need to be a `type' if you want to use `type' as inputs."
			exit 198
		}
	}
	*Set macro determine which approach to use later 
	if "`variablefound'"=="1"{
		local estint =_N
	}
	else if "`matrixfound'"=="1"{
		local estint = rowsof(`esthi')
	}
	else{
		local estint : word count `esthi'
	}	
	
	*Convert matrices if they are smaller than c(matsize)
	if `matrixfound'==1{
		if rowsof(`esthi') <=c(matsize) | rowsof(`estlo')<=c(matsize){
			foreach mat in esthi estlo{
				convertMatrix ``mat'' `mat'
				local `mat' `r(`mat')'
			}
		}
	}

	
**Potential Errors
* Not all non-sensical inputs covered yet
	if `:word count `nullhi'' != `: word count `nulllo''{
		disp as error `"Options 'nullhi' and 'nulllo' do not contain the same number of arguments."'
		exit 198
	}

	if `:word count `esthi'' != `: word count `estlo''{
		disp as error `"Options 'esthi' and 'estlo' do not contain the same number of arguments."'
		exit 198
	}

	if wordcount("`nulllo'") != wordcount("`estlo'") & wordcount("`nulllo'")>1{
		disp as error `"Options 'nulllo' and 'nullhi' must only have one argument or exactly as many arguments as options 'esthi' and 'estlo'."'
		exit 198
	}

*Expand null-interval to match the number of estimated intervals -> Currently leads to an error if variables used as input and number of observations < c(matsize)
	if `:word count `nulllo''==1 & `variablefound'==0  & `=wordcount("`esthi'")'<=c(matsize){
		local nulllo = "`nulllo' " * `: word count `estlo''  
		local nullhi = "`nullhi' " * `: word count `esthi'' 
	}
	else if `:word count `nulllo''==1 & `variablefound'==1 & _N < c(matsize){ // Added new condition to account for situations if the number  number of observations < c(matsize)
		local nulllo = "`nulllo' " * _N  
		local nullhi = "`nullhi' " * _N
	}	



*Check if estint is larger than the current matsize
if `estint'>c(matsize){ //Assuming here that this condition is only true if variables used as inputs -> The maximum length of the esthi() and estlo() should not be as large as c(matsize).
	*An alternative based on variables if inputs are variables.
	local nodeltagap `deltagap'
	*if "`mata'"=="nomata" & "`variablefound'"=="1"{
	if "`mata'"=="nomata"{
		local nulllo = real(trim("`nulllo'"))
		local nullhi = real(trim("`nullhi'"))
		if wordcount("`nulllo'") >1 | wordcount("`nullhi'") > 1{
			stop "Only one null interval can be used when using variables as input. Provide only one value in options nulllo() and nullhi()."
		}
		
		sgpv_var ,esthi(`esthi') estlo(`estlo') nulllo(`nulllo') nullhi(`nullhi') `replace' `nodeltagap' infcorrection(`infcorrection')
	}
	else if "`mata'"==""{
		if wordcount("`nulllo'") >1 | wordcount("`nullhi'") > 1{
			stop "Only one null interval can be used when using variables or matrices larger than c(matsize) as input. Provide only one value in options nulllo() and nullhi()."
		}
		if "`variablefound'"=="1" local type "variable"
		if "`matrixfound'"=="1" local type "matrix"
		if "`variablefound'"=="0" & "`matrixfound'"=="0"{
			disp as error "Local macros used in the options esthi and estlo contain more than `=c(matsize)' entries."
			disp as error "Such large macros are not yet supported."
			disp as error "Please convert the macros to matrices or variables."
			exit 198
		} 
		*mata: sgpv("`estlo' `esthi'", "results", `nulllo', `nullhi', `infcorrection' , "`nodeltagap'") // Only one null interval allowed.
		mata: sgpv("`estlo' `esthi'", "results", `nulllo', `nullhi', "`type'" , `infcorrection' , "`nodeltagap'")
		*The same return as before but this time for the Mata function -> not the best solution yet.

		if `=colsof(results)'==2 mat colnames results = "SGPV" "Delta-Gap"
		if `=colsof(results)'==1 mat colnames results = "SGPV"
		
		if "`show'"!="noshow" matlist results ,names(columns) title(Second Generation P-Values)
		return matrix results = results
		*exit
	}
}
else{	// Run if rows less than matsize -> the "original" approach
	tempname results 
	mat `results' = J(`estint',2,0)	
	mat colnames `results' = "SGPV" "Delta-Gap"
	***Iterate over all intervalls to implement a parallel max and min function as in the R-code
	forvalues i=1/`estint'{
		*Parse interval -> Not the best names yet
		local null_lo  `: word `i' of `nulllo''
			isValid `null_lo' nulllo
			isInfinite `null_lo'
			if (`s(infinite)' == `=c(maxdouble)'){
			 local null_lo = `=c(mindouble)'
			} 
		local null_hi  `: word `i' of `nullhi''
			isValid `null_hi' nullhi
			isInfinite `null_hi'
			local null_hi = `s(infinite)'
		*Only required if no variables as input -> Is that check needed? If I found variables earlier than I should be already in a different algorithm?
		if `variablefound'==0{
		local est_lo  `: word `i' of `estlo''	
			isValid `est_lo' estlo
			isInfinite `est_lo'
			if (`s(infinite)' == `=c(maxdouble)'){
			local est_lo = `=c(mindouble)'
			}
			
		local est_hi  `: word `i' of `esthi''
			isValid `est_hi' esthi
			isInfinite `est_hi'
			local est_hi =`s(infinite)'
		}
		else{ // For which scenario did I include variables in the macro algorithm? In case someone does not want to create variables when using variables as input?
			local est_lo = `estlo'[`i']
			local est_hi = `esthi'[`i']
		}
		*Compute Interval Lengths 
		local est_len = `est_hi' - `est_lo'
		if "`est_len'"==".z_" local est_len = c(maxdouble) // the difference between two infinite is the largest missing value of Stata .z_
		
		local null_len = `null_hi' -`null_lo'
		if "`null_len'" ==".z_" local null_len = c(maxdouble)
		*Warnings -> Make warning messages more descriptive
			if "`warnings'"!="nowarnings"{
				if (`est_len'<0  ) & (`null_len'<0){
					disp "The `i'. interval length is negative. Upper and lower bound of the interval might be switched." 
				}
				if reldif(`est_len',`=c(maxdouble)')<1e-5 | reldif(`null_len',`=c(maxdouble)')<1e-5{ // Needs further corrections for everything close to but not exactly c(maxdouble)
					disp "The `i'. interval has infinite length."
				}				
				if (`est_len'==0 | `null_len'==0 ) {
					disp "The `i'. interval has zero length. Consider using an interval hypothesis instead of a point hypothesis."
				}
		}
		
		*SGPV computation--------------------------------------------------
		*Esthi and estlo are vectors		
		local overlap = min(`est_hi',`null_hi') - max(`est_lo',`null_lo')
		local overlap = max(`overlap',0)
		local bottom =	min(2*`null_len',`est_len')
		local pdelta = `overlap'/`bottom'
		
		***** Zero-length & Infinite-length intervals **** 
		** Overwrite missing values due to bottom = Inf
		if `overlap'==0 {
			local pdelta 0
		}
		
		*Overlap finite & non-zero but bottom = Inf
		if `overlap'!=0 & `overlap'!=c(maxdouble) & `bottom'>=(c(maxdouble)/2){
		local pdelta `infcorrection'
		}
	
		** Interval estimate is a point (overlap=zero) but can be in null or equal null pt
		if `est_len'==0 & `null_len'>=0 & `est_lo'>=`null_lo' & `est_hi'<=`null_hi' {
			local pdelta 1
		}
	
		** One-sided intervals with overlap; overlap == Inf & bottom==Inf -> Not correct yet -> works only if overlap and bottom are exact c(maxdouble) -> need a way for slightly smaller values -> used half of c(maxdouble) as the limit
		if `overlap'>=(c(maxdouble)/2) & `bottom'>=(c(maxdouble)/2) &(`est_lo'>=`null_lo' | `est_hi'<=`null_hi'){
			local pdelta 1
		 }
		 
		if `overlap'>=(c(maxdouble)/2) & `bottom'>=(c(maxdouble)/2) &  (`est_lo'<`null_lo' | `est_hi'>`null_hi'){
			local pdelta = 1-`infcorrection'
		 }
		** Interval estimate is entire real line and null interval is NOT entire real line
		if `est_lo'<=-c(maxdouble)/2 & `est_hi'>=c(maxdouble)/2{
			local pdelta 0.5
			}

		** Null interval is entire real line
		if `null_lo'>=-c(maxdouble)/2 & `null_hi'==c(maxdouble)/2{
			local pdelta .
		}
	
		** Null interval is a point (overlap=zero) but is in interval estimate
		if `est_len'>0 & `null_len'==0 & `est_lo'<=`null_lo' & `est_hi'>=`null_hi' {
			local pdelta 0.5
		}
		
		** Return Missing value for nonsense intervals
		if `est_lo'>`est_hi' | `null_lo'>`null_hi'{
			local pdelta .
			if "`warnings'"!="nowarnings" disp "The `i'th interval limits are likely reversed."
		}
				
		** Calculate delta-gap
		if "`deltagap'"==""{
			local gap = max(`est_lo', `null_lo') - min(`null_hi', `est_hi')
			local delta = `null_len'/2
			* Report unscaled delta-gap if null has infinite length, use reldif to check if the distance between infinite and null_len is sufficiently small
			if reldif(`null_len',c(maxdouble)) <1e-5{
				local delta 1
			}
			
			if `pdelta'==0 {
				local dg = `gap'/`delta' 
			}
			else{
				local dg .
			}
		}
		*Write results
		mat `results'[`i',1] = `pdelta'
		if "`deltagap'"=="" mat `results'[`i',2] = `dg'
		
	}
		*Process nodelta-option
	if "`deltagap'"=="nodeltagap"{
	mat `results'=`results'[1...,1]
	} 
	if "`show'"!="noshow" matlist `results' ,names(columns) title(Second Generation P-Values)
	return matrix results = `results'
}	 
end 


*Additional commands ----------------------------------------------------------------------------------------------
program define convertMatrix, rclass
*Convert a matrix into a local macro
args matname macroname 
*Check if matrix contains only one row or one column
	if rowsof(`matname')>1 & colsof(`matname')>1{
		disp as error "Matrix `matname' can be only either a column or a row vector."
		exit 198
		}

	if rowsof(`matname')>1{
		local cnt = rowsof(`matname')
		local colvec 1
	} 

	if colsof(`matname')>1{
		local cnt = colsof(`matname')
		local rowvec 1
	} 

	forvalues i=1/`cnt'{
			if "`colvec'"=="1"{
				local matmacro `matmacro' `=el(`matname',`i',1)'
			}
			else if "`rowvec'"=="1"{
				local matmacro `matmacro' `=el(`matname',1,`i')'
			}
	}	

return local `macroname' `matmacro'
end

*Convert a local macro into a matrix -> needed if I want to remove the approach based on macros to calculate sgpvs and only use Mata-approach
*Not used yet 
program define convertMacro, sclass
	args macroname matname
	tempname macromat
	forvalues i=1/`=wordcount("`macroname'")' {
	mat `macromat' = (nullmat(`macromat') \ `=`=word("`macroname'",`i')'')
	}
	
	sreturn matrix `matname' = `macromat'
end

*Check if the input is valid -> Missing value or number
program define isValid 
args valid optname
if `valid'!=.{
	if real("`=`valid''")==.{
		disp as error "`valid' in option {cmd:`optname'}  is not a number nor . (missing value) nor empty."
		exit 198
		}
}		
end

*Check if input is infinite
program define isInfinite, sclass
args infinite
	if `infinite'==.{
		sreturn local infinite = c(maxdouble) // Using c(maxdouble) as a proxy for a true infinite value
	}
	else{
		sreturn local infinite = `infinite'
	}
end

*Simulate the behaviour of the R-function with the same name 
program define stop
 args text 
 disp as error `"`text'"'
 exit 198
end

*Use new variables to store and calculate the SGPVs -> Do I properly deal with one-sided intervals yet?
program define sgpv_var, rclass
 syntax ,esthi(name) estlo(name) nullhi(string) nulllo(string) [replace nodeltagap infcorrection(real 1e-5)]
 
 * Allow handling of matrices
 * Not sure yet how to remove the new variables at the end of the program, since using tempvar does not work well in connection with svmat.
 * For now will not document this creation of new variables in the help file until a sufficient solution has been found.
 capture confirm matrix `esthi' `estlo'
 if !_rc{
	capture confirm variable `esthi' `estlo'
	if !_rc{
		disp as error "Variables with the same name as the matrices `esthi', `estlo' in options esthi and estlo already exist."
		disp as error "Please rename either the variables or matrices, so that there is no naming conflict."
		exit 198
	}
	local matrixfound 1
	svmat `esthi', name(`esthi')
	rename `esthi'1 `esthi'
	svmat `estlo', name(`estlo')
	rename `estlo'1 `estlo'
 } 
 

 if "`replace'"=="replace"{
	capture drop pdelta
	capture drop dg
 }
 else{
	capture confirm variable pdelta dg
		if !_rc{
			disp as error "Specifiy the replace option: variables pdelta and dg already exist."
			exit 198
		}
 }
 
 tempvar estlen nulllen overlap bottom null_hi null_lo gap delta gapmax gapmin overlapmin overlapmax
 local type : type `esthi' // Assuming that both input variables have the same type -> will add an additional check if this assumption turns out to be wrong in too many cases and if the results change to a degree that it matters.
 quietly{ //Set variable type to the same precision as input variables -> cannot deal yet with missing values correctly 
		gen `type'	`null_hi' 	 = real("`nullhi'")
		gen `type' `null_lo' 	 = real("`nulllo'")			
 		gen `type' `estlen' 	 = `esthi' - `estlo'
		gen `type' `nulllen' 	 = `null_hi' -`null_lo'
		egen `type' `overlapmin' = rowmin(`esthi' `null_hi') 
		egen `type' `overlapmax' = rowmax(`estlo' `null_lo')
		gen `type' `overlap' 	 = `overlapmin' - `overlapmax'
		replace `overlap' 		 = max(`overlap',0)
		gen `type' `bottom' 	 =	min(2*`nulllen',`estlen')
		gen `type' pdelta		 = `overlap'/`bottom'
		
		replace pdelta = 0 if `overlap'==0 
		replace pdelta = `infcorrection' if `overlap'!=0 & `overlap'!=. & `bottom'==.	
		replace pdelta = 1 if (`estlen'==0 & `nulllen'>=0 & `estlo'>=`null_lo' & `esthi'<=`null_hi')
		replace pdelta = 1 if `overlap'==. & `bottom'==. & (`estlo'>=`null_lo' | `esthi'<=`null_hi')
		replace pdelta = 1-`infcorrection' if `overlap'==. & `bottom'==. &  (`estlo'<`null_lo' | `esthi'>`null_hi')
		replace pdelta = 0.5 if `estlo'==. & `esthi'==.
		replace pdelta = . if `null_lo'==. & `null_hi'==.
		replace pdelta = 0.5 if (`estlen'>0 & `nulllen'==0 & `estlo'<=`null_lo' & `esthi'>=`null_hi')
		replace pdelta =. if (`estlo'>`esthi' | `null_lo'>`null_hi')
		/* Calculate delta-gap*/
		egen `type'	`gapmax' = rowmax(`estlo' `null_lo') 
		egen `type' `gapmin' = rowmin(`null_hi' `esthi')
		gen `type'	`gap' = `gapmax' - `gapmin'
		 
		 if "`deltagap'"==""{
			 gen `type' `delta' = `nulllen'/2
			 replace `delta' = 1 if `nulllen'==0
			 gen `type' dg = .
			 replace dg = `gap'/`delta'  if (pdelta==0 & pdelta!=.)
			 label variable dg "Delta-Gap"
		}
		*Label variables
		label variable pdelta "SGPV"
		
		}
	if "`matrixfound'"=="1"{	// Remove variables created when using matrices as input
	capture drop `esthi'
	capture drop `estlo'
	}
 exit 
end

***Mata function(s)
mata:
version 12.0
//void function sgpv(string varlist, string scalar sgpvmat, real scalar nulllo, real scalar nullhi, real scalar infcorrection ,| string scalar nodeltagap){ 
 void function sgpv(string list, string scalar sgpvmat, real scalar nulllo, real scalar nullhi, string scalar type, real scalar infcorrection ,| string scalar nodeltagap){ 
/*How to handle multiple null-hypotheses*/
/*Allow only one null interval for now*/
/*Calculate the SGPVs and Delta-Gaps if the desired matrix size is too large for Stata
Might have to change the way missing values are handled -> For now they are treated as meaning infinite.
*/
V = tokens(list)
if (type=="matrix"){
	Data=(st_matrix(V[1]), st_matrix(V[2]))
}
if (type=="variable"){
	V = st_varindex(V)
    Data = J(1,1,0)
    st_view(Data,.,V)
}

	Sgpv = J(rows(Data),2,.)	
	null_len = nullhi - nulllo
	n = rows(Data) 
	for(i=1; i<=n;i++) {
		 est_lo = Data[i,1]
		 est_hi = Data[i,2]
		 est_len = est_hi - est_lo
		 overlap = min_s(est_hi,nullhi) - max_s(est_lo,nulllo) 
		 overlap = max_s(overlap,0) 
		 bottom = min_s(2*null_len, est_len) 
		 pdelta = overlap/bottom
		 if (overlap==0) {
			pdelta = 0
			}
			
		if (overlap!=0 & overlap!=. & bottom==.){
		 pdelta = infcorrection
		}	
		
		if (est_len==0 & null_len>=0 & est_lo>=nulllo & est_hi<=nullhi) {
			 pdelta = 1
		}
		if (overlap==. & bottom==. & (est_lo>=nulllo | est_hi<=nullhi)){
			 pdelta = 1
		 }
		 if (overlap==. & bottom==. &  (est_lo<nulllo | est_hi>nullhi)){
			 pdelta = 1-infcorrection
		 }
		if (est_lo==. & est_hi==.){
			 pdelta = 0.5
			}

		if (nulllo==. & nullhi==.){
			 pdelta = .
		}
		 
		if (est_len>0 & null_len==0 & est_lo<=nulllo & est_hi>=nullhi) {
			 pdelta = 0.5
		}
		if (est_lo>est_hi | nulllo>nullhi){
			pdelta = .			
		}
		/*Delta-Gap*/
		if (nodeltagap==""){
			 gap = max_s(est_lo,nulllo) - min_s(nullhi,est_hi) 
			 delta = null_len/2
			if (null_len ==0){
				 delta = 1
			}
			if (pdelta==0 & pdelta!=.){
				 dg = gap/delta 
			}
			else{
				 dg = .
			}
			Sgpv[i,2] = dg
		}		
		 Sgpv[i,1] = pdelta		 
	}	
	st_matrix(sgpvmat,Sgpv)
}

//Two convenience functions to make reading the code easier: Function names chosen to not conflict with internal Mata functions
real scalar function min_s(real scalar x, real scalar y){
	return(x > y ? y :x)
}

real scalar function max_s(real scalar x, real scalar y){
	return(x > y ? x : y)
}
end