Conversion of Roman numerals to decimal numbers
fromroman romanvar [if exp] [in range] , generate(numvar) [re(regex)]
Description
fromroman creates a numeric variable numvar from a string variable romanvar following these rules:
1. Any spaces are ignored.
2. Lower case letters are treated as if upper case.
3. Numerals must match the Stata regular expression "^M*(CM|DCCC|DCC|DC|D|CD|CCC|CC|C)?(XC|LXXX|LXX|LX|L|XL|XXX|XX|X)?(IX|VIII|VII|VI|V|IV|III|II|I)?$". Note that this forbids e.g. CCCC, XXXX or IIII. But see documentation of the re() option below.
4. Single occurrences of CM, CD, XC, XL, IX, IV are treated as 900, 400, 90, 40, 9, 4 respectively.
5. M, D, C, L, X, V, I are treated as 1000, 500, 100, 50, 10, 5, 1 respectively, each as many times as it occurs outside the pairs already counted under 4.
6. The results of 4 and 5 are added.
7. Any other input, such as minus signs or decimal points, is trapped as an error and results in missing.
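Rules 1 to 7 can be traced in a short sketch. It is written in Python rather than Stata and is not fromroman's own code. None stands in for Stata's missing value. Treating an empty string as missing is an assumption, because rule 3's pattern also matches an empty string.

```python
import re

# Default check from rule 3, written for Python's re module.
DEFAULT_RE = (
    r"^M*(CM|DCCC|DCC|DC|D|CD|CCC|CC|C)?"
    r"(XC|LXXX|LXX|LX|L|XL|XXX|XX|X)?"
    r"(IX|VIII|VII|VI|V|IV|III|II|I)?$"
)

# Rule 4: subtractive pairs, each counted once.
PAIRS = [("CM", 900), ("CD", 400), ("XC", 90), ("XL", 40), ("IX", 9), ("IV", 4)]
# Rule 5: single letters, counted as often as they occur.
SINGLES = {"M": 1000, "D": 500, "C": 100, "L": 50, "X": 10, "V": 5, "I": 1}


def from_roman(text):
    """Return the value of a Roman numeral, or None (standing in for missing)."""
    s = text.replace(" ", "").upper()   # rules 1 and 2
    if not s:
        return None                     # assumption: empty input -> missing
    if re.fullmatch(DEFAULT_RE, s) is None:
        return None                     # rules 3 and 7: reject anything else
    total = 0
    for pair, value in PAIRS:           # rule 4
        if pair in s:
            total += value
            s = s.replace(pair, "", 1)  # drop the pair so rule 5 does not recount it
    for letter in s:                    # rule 5
        total += SINGLES[letter]
    return total                        # rule 6: the sum of both parts
```

For example, from_roman("mcm xc iv") gives 1994 (900 + 90 + 4 from the pairs, plus 1000 for the remaining M). from_roman("IIII") and from_roman("-5") both give None.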
Note that there is no explicit upper limit on the integer values created. In practice, the limit is implied by the maximum length of a string variable (244 characters). Under these rules, therefore, any number greater than 244000 (and some numbers less than that) could not be stored as a Roman numeral in a Stata string variable. The smallest problematic number is 233888, whose numeral consists of 233 Ms followed by DCCCLXXXVIII, i.e. 245 characters in all. See help on data types.
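The arithmetic behind this limit can be checked by brute force. The sketch below is in Python. It uses a hypothetical to_roman() helper, which is not Stata's toroman command, to write numbers in the standard form that the default pattern accepts. It assumes the 244-character limit stated above.

```python
# Hypothetical helper tables: standard forms for each decimal digit.
ONES = ["", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"]
TENS = ["", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"]
HUNDREDS = ["", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"]

MAX_LEN = 244  # assumed maximum length of a Stata string variable


def to_roman(n):
    """Standard numeral for a positive integer: n // 1000 Ms, then hundreds, tens, ones."""
    return ("M" * (n // 1000) + HUNDREDS[n // 100 % 10]
            + TENS[n // 10 % 10] + ONES[n % 10])


def smallest_too_long(limit=MAX_LEN):
    """Smallest positive integer whose numeral is longer than limit characters."""
    n = 1
    while len(to_roman(n)) <= limit:
        n += 1
    return n
```

Below 1000 the longest numeral is DCCCLXXXVIII (12 characters, for 888), so the first overflow occurs at 233 Ms plus those 12 characters. smallest_too_long() returns 233888. to_roman(244000) is exactly 244 characters long, and every larger number needs at least 245.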
Options
generate() specifies the name of the new numeric variable to be created. It is required.
re() specifies a regular expression to be used instead of the default in rule 3 for checking input, for example to accept forms such as IIII.
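The sketch below shows how a different pattern changes which inputs pass the check. It is in Python, and Python's re syntax may differ in detail from Stata's regular-expression functions. PERMISSIVE_RE is a hypothetical example, not a pattern supplied with fromroman. Under rule 5 an accepted IIII would presumably be counted as four Is, giving 4.

```python
import re

# Default pattern from rule 3: forbids CCCC, XXXX and IIII.
DEFAULT_RE = (
    r"^M*(CM|DCCC|DCC|DC|D|CD|CCC|CC|C)?"
    r"(XC|LXXX|LXX|LX|L|XL|XXX|XX|X)?"
    r"(IX|VIII|VII|VI|V|IV|III|II|I)?$"
)

# Hypothetical alternative that also accepts the additive forms
# CCCC, XXXX, IIII (and DCCCC, LXXXX, VIIII).
PERMISSIVE_RE = (
    r"^M*(CM|DCCCC|DCCC|DCC|DC|D|CD|CCCC|CCC|CC|C)?"
    r"(XC|LXXXX|LXXX|LXX|LX|L|XL|XXXX|XXX|XX|X)?"
    r"(IX|VIIII|VIII|VII|VI|V|IV|IIII|III|II|I)?$"
)


def accepts(numeral, pattern):
    """Apply rules 1 and 2, then test the whole string against pattern."""
    s = numeral.replace(" ", "").upper()
    return re.fullmatch(pattern, s) is not None
```

With these definitions, accepts("IIII", DEFAULT_RE) is False and accepts("IIII", PERMISSIVE_RE) is True. Five Is in a row are still rejected by both patterns.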
Examples
. fromroman roman, gen(numeq)
Acknowledgments
Peter A. [Tony] Lachenbruch suggested this problem on Statalist. Sergiy Radyakin's comments on that list provoked more error checking.
Author
Nicholas J. Cox, Durham University, U.K. n.j.cox@durham.ac.uk
Also see
Online: help for toroman