What does this mean? Recall that we coded rep_inc such that a non-incumbent is zero and an incumbent candidate is one. Thus, when we are talking about a non-incumbent, rep_inc is zero. In other words, the predicted value of votes for a non-incumbent is equal to the \(y\)-intercept \(\alpha\). When we are considering an incumbent candidate, rep_inc is one.

I'm currently attending an introductory course, which uses SPSS. I've been trying to learn R at the same time, and so far I've consistently been getting the same results for calculations with both tools, as expected. However, we're currently doing correlations (Pearson's rho) and fitting linear models, and I'm consistently getting different results between R and SPSS.

The dataset is GSS2012.zip in this zip-file. Full, minimal working examples are found below.

    d = GSS2012$tolerance
    cor(d, e, use = "")

I've tried different use="stuff" for cor; it didn't make a difference.

Full, minimal working example for lm:

    > library(haven)
    lm(formula = GSS2012$tolerance ~ GSS2012$age + GSS2012$polviews +
        GSS2012$educ, na.action = "na.exclude", singular.ok = F)

Nothing has so far given me the same values as SPSS.

Articles like these are probably related: link1 link2 link3, but I haven't been able to use the information therein to replicate the SPSS results. Not that I know the latter are necessarily correct, I'd just like to replicate them. (Again, R might have more accurate results, I don't know. But I'm in "an SPSS environment", and thus it would be good if I were able to get the same results for now :)

This is only a partial answer, as I can see what the problem is, although I'm not sure what is causing it.

The issue is to do with missing values and how they're handled in the SPSS file. Let's just take the educ variable as an example. In the SPSS file you can see that the values 97, 98, and 99 are defined as missing values. If you sort the SPSS file by the educ column, you can see there are 2 data rows with these missing values. In R, you can confirm that those rows do in fact contain missing values (NA):

    > which(is.na(GSS2012$educ))

The problem is in SPSS: when you actually tell it to count how many rows are missing, it says there's only 1 missing data row:

    FREQUENCIES VARIABLES=educ

SPSS is not considering the 99 value for row 1214 to be missing. For example, try changing educ for row 837 to any other (non-missing) number, and you'll see that SPSS says there are 0 missing rows for educ, when in fact row 1214 should still be missing (99). I haven't checked, but I'm guessing a similar thing is happening to a number of rows for the polviews variable.

The consequence of this is that SPSS isn't treating those rows as missing data when you run the analysis, but in R those values are correctly set as missing and omitted. In other words, SPSS is using more data for the model than it should be using. You can confirm this by looking at the SPSS and R output: the degrees of freedom are different across the 2 programs, which then leads to a (slight) difference in results.

I'm not sure why SPSS isn't treating those rows as missing. It could either be a bug (wouldn't be the first for SPSS) or something to do with the way the file has been set up.
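As a rough illustration of how missing-value codes such as 97, 98, and 99 for educ change the number of rows used, and with it the correlation and the degrees of freedom, here is a minimal sketch in base R. The data are made up for the example (this is not the GSS2012 file), and the names `educ_clean`, `missing_codes`, and the `r_*` / `df_*` variables are hypothetical.

```r
# Toy data, invented for illustration only (not taken from GSS2012).
educ      <- c(12, 16, 14, 99, 10, 18, 12, 98, 15, 13)
tolerance <- c( 5,  9,  7,  4,  3, 10,  6,  8,  8,  6)

# Assumption: 97, 98 and 99 are the codes declared as missing for educ.
missing_codes <- c(97, 98, 99)
educ_clean <- ifelse(educ %in% missing_codes, NA, educ)

# Rows that count as missing once the codes are recoded to NA.
missing_rows <- which(is.na(educ_clean))   # rows 4 and 8

# Pearson correlation with the codes left in as if they were real values...
r_codes_as_values <- cor(educ, tolerance)

# ...and with the rows containing NA dropped.
r_codes_as_missing <- cor(educ_clean, tolerance, use = "complete.obs")

# lm() drops rows with NA by default, so the residual degrees of
# freedom reveal how many rows each fit actually used.
df_codes_as_values  <- df.residual(lm(tolerance ~ educ))        # 10 rows - 2 = 8
df_codes_as_missing <- df.residual(lm(tolerance ~ educ_clean))  #  8 rows - 2 = 6

print(c(r_codes_as_values, r_codes_as_missing))
print(c(df_codes_as_values, df_codes_as_missing))
```

Comparing the residual degrees of freedom between the two programs is a quick way to see whether they used the same rows; if they differ, the coefficients and correlations will usually differ as well.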