Stata vs SPSS

The following comments on Stata vs SPSS were posted to Statalist and appeared in the digest issues of 2-4 November 2000.

After I had studied Stata for about half a year, my department asked me to tell them more about Stata. One thing my colleagues are interested in is what they can do with Stata that they can't do with SPSS. Since I am not very familiar with SPSS, I hope to find an answer on the list. Of course, I already know about Stata's great programming possibilities, but I hope to find some answers about not-too-exotic statistical methods.

Marion de Leeuw
Dept. of Methodology and Statistics
Maastricht University

I have both Stata and SPSS on my computer. In my opinion, SPSS has only two slight advantages and many, many disadvantages. First, it is slightly more user-friendly for making complex tables and graphs, but thanks to people like Nick Cox, that difference is decreasing daily. Second, SPSS has a nice routine in its logistic regression procedure for testing interactions. That is a trivial advantage, however. I have heard that the ANOVA commands in SPSS are also user-friendly, but I don't use them.

The only reason I keep SPSS on my machine is that I am not pressed for disk space. I rarely use it, whereas I use Stata almost every day. Ever tried to run a probit in SPSS? It is nearly impossible, and the documentation stinks. In Stata, on the other hand, it is a breeze.

Todd Wagner
Stanford University

I don't know whether it is a big difference, since I don't use SPSS all that much, but Stata has the best support system I have ever seen in any software product. Not only the Stata staff but also many Stata users respond to questions, from the most basic to the most complex. This is a fantastic advantage for anyone who uses the product.

Donald Spady
Department of Pediatrics
University of Alberta

The bottom line is that SPSS doesn't do much, although it is (perhaps too) easy to use. For example, its useful multivariate analysis procedures are pretty much limited to OLS, probit, and logit, with a few less useful additional procedures available. SPSS does not have the range of pooled cross-sectional time-series routines that Stata has. There are no count-data procedures (Poisson, negative binomial, and the zero routines), and other maximum likelihood estimators such as Tobit, multinomial logit, ordinal logit or probit, and complementary log-log models are not readily available.

Additional problems with SPSS include no Huber-White correction for heteroskedasticity and none of Stata's extensive post-estimation tests. The ANOVA routines in SPSS are not nearly as comprehensive as those in Stata. The last time I looked at SPSS, there weren't any provisions for Cox regression and the other extensive duration analysis procedures that Stata offers. In short, anyone who limits themselves to SPSS would be quite handicapped.
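
To make the Huber-White point concrete, here is a minimal Python sketch (not Stata or SPSS code) for a one-predictor regression y = a + b*x. It returns the slope with two standard errors: the classical one, which assumes a single error variance for every observation, and the HC0 Huber-White one, which replaces that common variance with each observation's own squared residual. The function name is hypothetical, and small-sample refinements such as the n/(n-k) scaling (HC1) that Stata's robust option applies are deliberately left out.

```python
import math


def ols_slope_with_ses(x, y):
    """Simple regression y = a + b*x (hypothetical helper).

    Returns (b, classical SE of b, HC0 Huber-White SE of b).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

    # Classical SE: one common error variance s^2 for every observation.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC0 SE: each observation contributes its own squared residual
    # (the "meat" of the sandwich), so unequal variances are not averaged away.
    meat = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))
    se_hc0 = math.sqrt(meat) / sxx
    return b, se_classical, se_hc0


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [1.2, 1.9, 3.4, 3.1, 6.5, 4.0, 8.9, 5.2]  # made-up data; scatter grows with x
    b, se_classical, se_robust = ols_slope_with_ses(x, y)
    print(f"slope={b:.3f}  classical SE={se_classical:.3f}  HC0 SE={se_robust:.3f}")
```

When the scatter around the line grows with x, the two standard errors diverge, and only the HC0 version remains valid under that kind of heteroskedasticity.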

Dave Jacobs
Ohio State University

One of the things you can do with Stata that you can't do with SPSS is estimate models for complex surveys. Most SPSS procedures will allow weights, but although these will produce correct point estimates, the standard errors will be too small (aweights or iweights versus pweights). SPSS cannot take clustering into account at all. This is an important issue: most surveys use a weight variable to take stratification and/or sampling bias (random or due to non-response) into account, and standard programs can lead to incorrect inferences about statistical significance.
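
The weighting point can be illustrated with a weighted mean, the simplest survey estimate. In this Python sketch (the function name is hypothetical), the same weights are read two ways: as frequency weights, where the sum of the weights plays the role of the sample size, and as probability weights, where the standard error is the usual linearized (sandwich) estimate computed over the sampled units or, if a cluster identifier is supplied, over clusters. It assumes single-stage sampling with no strata and no finite-population correction, so it illustrates the idea rather than serving as a full survey variance estimator.

```python
import math
from collections import defaultdict


def weighted_mean_ses(y, w, cluster=None):
    """Weighted mean with two standard errors (hypothetical helper).

    Returns (mean, SE reading w as frequency weights,
             linearized SE reading w as probability weights).
    Assumes one-stage sampling, no strata, no finite-population correction.
    """
    n = len(y)
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Frequency-weight reading: the weights are treated as counts of real
    # respondents, so the effective sample size becomes sum(w), not n.
    s2 = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / (total_w - 1)
    se_freq = math.sqrt(s2 / total_w)

    # Probability-weight reading: linearize over sampled units, or over
    # clusters when a cluster id is given (otherwise each unit is its own cluster).
    if cluster is None:
        cluster = range(n)
    totals = defaultdict(float)
    for yi, wi, g in zip(y, w, cluster):
        totals[g] += wi * (yi - mean)
    g_count = len(totals)
    if g_count < 2:
        raise ValueError("need at least two sampled units or clusters")
    var_lin = g_count / (g_count - 1) * sum(t ** 2 for t in totals.values()) / total_w ** 2
    return mean, se_freq, math.sqrt(var_lin)


if __name__ == "__main__":
    y = [1, 2, 3, 4]
    w = [1000, 1000, 1000, 1000]  # each respondent "stands for" 1,000 people
    print(weighted_mean_ses(y, w))
    print(weighted_mean_ses(y, w, cluster=[0, 0, 1, 1]))
```

With four respondents each weighted 1,000, the frequency-weight reading behaves as if there were 4,000 respondents, so its standard error is far smaller than the linearized one; grouping the respondents into two clusters makes the linearized standard error larger still.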

There are a lot of user-written programs out there, and -webseek- makes it much easier to find solutions to non-standard problems. These problems need not be exotic. One problem that fired up a lot of discussion among a group of us was the comparison of coefficients across nested logistic models. With a downloadable ado-file, standardized coefficients and marginal effects can be calculated easily. The only way to do that in SPSS is with a macro that estimates a logistic model using the matrix facilities (if you happen to have such a macro; it wouldn't be easy to write one). Alternative fit measures such as BIC, AIC, and pseudo-R² can easily be added to Stata; in SPSS you'd have to write a Visual Basic script (assuming that would even work).
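
For reference, the fit measures named above are simple functions of the fitted and null log-likelihoods. The Python sketch below uses the common definitions AIC = -2 lnL + 2k, BIC = -2 lnL + k ln N, and McFadden's pseudo-R² = 1 - lnL/lnL0, where k counts estimated parameters and N observations; other pseudo-R² variants exist, and packages can differ in what they use for N in BIC. The function names are hypothetical, the null log-likelihood helper applies only to a 0/1 outcome, and the fitted log-likelihood in the usage lines is a made-up number.

```python
import math


def null_loglik_binary(y):
    """Log-likelihood of an intercept-only logit model for a 0/1 outcome."""
    n = len(y)
    n1 = sum(y)
    if n1 == 0 or n1 == n:
        raise ValueError("outcome does not vary; the null model fits perfectly")
    p = n1 / n
    return n1 * math.log(p) + (n - n1) * math.log(1 - p)


def fit_measures(loglik, loglik_null, k, n):
    """AIC, BIC and McFadden's pseudo-R^2 (hypothetical helper).

    loglik: log-likelihood of the fitted model; loglik_null: intercept-only model;
    k: number of estimated parameters including the intercept; n: observations.
    """
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * math.log(n)
    pseudo_r2 = 1 - loglik / loglik_null
    return aic, bic, pseudo_r2


if __name__ == "__main__":
    y = [0, 1] * 50                  # 100 observations, half of them 1s
    ll0 = null_loglik_binary(y)      # equals 100 * ln(0.5)
    # -50.0 is a made-up stand-in for a fitted 3-parameter logit's log-likelihood.
    print(fit_measures(-50.0, ll0, k=3, n=len(y)))
```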

Stata also has excellent programs for event history analysis and panel data analysis, but perhaps these are "exotic" methods according to you or your colleagues. Well, SPSS is good enough for most purposes, most of the time. What annoys me about SPSS is that its pace of development is so slow. Only a handful of statistical procedures have been added in the last five years: GLM, NOMREG, PLUM, and one or two others. Just glance through a few STBs for comparison. Since 1995 SPSS has concentrated on graphical output, to the annoyance of many users. Their implementation is an interface nightmare: you have to navigate two scrollbars just to view your *text* output! To hide elements of their pivot tables you choose "hide" from a right-click menu, except in some cases where you choose "ungroup". Add the bugs in the latest release and the high purchase or lease price, and you've got plenty of arguments in favour of Stata.

John Hendrickx
University of Nijmegen

SPSS has been around for a very long time; it started off on mainframes and moved to DOS, OS/2, and finally Windows. Because of its mainframe origins, SPSS started life as a 'data filter'. The data records were passed through a procedure or set of procedures, and the results were written to an output stream. In this way, the data were read from the disk file for each set of procedures carried out but not retained in memory. As a result, very large quantities of data could be handled on computers with limited memory. Now that RAM costs about $1 per megabyte, this approach only serves to slow SPSS down.
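
The 'data filter' style can be sketched in a few lines of Python. This illustrates the idea only and is not how SPSS is actually implemented: records are read one at a time from a file-like source, and only three running quantities are kept (a count, a mean, and a sum of squared deviations, updated with Welford's one-pass method), so memory use does not depend on the number of cases. The helper names are hypothetical, and io.StringIO stands in for a data file on disk.

```python
import io


def read_column(handle):
    """Yield one number per non-blank line, i.e. one case at a time (hypothetical helper)."""
    for line in handle:
        line = line.strip()
        if line:
            yield float(line)


def stream_stats(values):
    """Count, mean and sample variance in a single pass (Welford's method).

    Only three running quantities are kept, so memory use does not grow
    with the number of cases.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")
    return n, mean, variance


if __name__ == "__main__":
    # io.StringIO stands in for a data file; a real file handle works the same way.
    data = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
    print(stream_stats(read_column(data)))
```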

Many statistical procedures can be thought of as filters, although this does not apply to techniques such as cluster analysis. The modern PC has a moderately complex memory model in which disk caching plays a major role, and in a number of areas SPSS has moved beyond the 'data filter', but much of its operation is still shaped by this way of working. The interface is very interactive in a computing sense, with mouse, menus, dialog boxes, and a help system, but in a statistical sense its operations are generally rather less interactive.

SPSS has its roots in the social sciences, and the analysis of questionnaires and surveys is where many of its core strengths lie. Unfortunately, one can nowadays detect a distinctly market-research edge to SPSS, but it has also been strengthening its offering in the medical area with the addition of a fairly sophisticated set of survival analysis procedures and the incorporation of routines for exact testing (of course, you have to pay extra for that). SPSS Inc. has frequently incorporated modules developed by other companies or groups into the package, and exact testing is a case in point.

SPSS 10 features a new Data Editor that lets you organize, view, and edit data more efficiently. You can now enter data and value labels directly in the grid rather than through nested dialogs, and multiple variables can be defined simultaneously.

If what you want to do is available from a menu (or via command syntax), then you can do it. On the other hand, the macro language really only allows automation of repetitive tasks and leaves little scope for adding new features (unlike the much superior Stata).

For myself, I tend to use a combination of Stata (for statistics and modeling) and SigmaPlot (for graphics). I seem to remember a promise some years ago that Stata was going to come out with a more modern graph editor (hint, hint). Maybe I just missed the announcement. Nevertheless, as has been said, Nick Cox has been extending Stata's graphics capabilities for years, so who am I to complain? I guess I'm just an aesthete and prefer the beautiful anti-aliased graphics of SigmaPlot to the edgy, utilitarian graphics in Stata. Vive la différence?

The list of procedures that Stata can perform is absolutely stunning. Not only that, but it is fully extensible with its own matrix language, accurate, very well-documented, compact (two 3.5" floppy disks!), and faster than any other stats package on the market.

Your question about which is better is easy to answer: Stata is a far more powerful stats package than SPSS. Let's face it, SPSS is a bloated, overpriced, over-gimmicked pig of a program. But, as Nick Cox hinted, a comparison of what each can do would be an enormous task, running into thousands of procedures. I would guess (and it's just a guess) that Stata can perform roughly an order of magnitude more procedures than SPSS. It sounds like your department (methodology and statistics) is made up of power users. Given this, and assuming that they WANT a change, Stata is the way to go. BTW, it seems that some of the larger universities are adopting Stata; the Harvard and UCLA Schools of Public Health are cases in point.

Disclaimer: having said all this, two of the best statisticians I know still use SPSS version 6.0 in batch mode, and they do not appear to face limitations. I guess it all depends on which package you get used to and, thus, how you conceptualize data.

Lee Sieswerda
University of Alberta

SPSS cannot handle longitudinal panel data. There is supposedly an SPSS macro to do GEE but I've not been able to locate it. Use Stata or SAS.

Peter Lewycky

I'm pretty sure there are no dedicated panel routines in SPSS, yet the Stata inventory is extensive.

Dave Jacobs
Ohio State University

Following a sub-thread of the Stata vs SPSS comparison about SPSS's facilities for analysing panel data: I have an old reference to a GEE macro for SPSS. I have no personal experience using it, and there may well be other resources.

Stoolmiller, M., and Duncan, T. (1997). SPSS GEE macro. Eugene, OR: Oregon Research Institute.

Presumably this would be available via an SPSS web site. The plucky and optimistic SPSS user would then drill down through a labyrinth of menus, find the -webseek- equivalent in SPSS, make the connection, and be informed that a customer service representative would be in touch... :-)

Dr Philip Ryan
University of Adelaide