Many statisticians move between SAS and R, and they expect the same numbers from both. For unbalanced or complex designs, however, popular R functions often produce sums of squares, standard errors, or least squares means that differ from those of SAS PROC GLM. The differences are not necessarily errors - they come from different conventions (types of sums of squares, coding of singular designs, denominators, quantile definitions, and so on).
sasLM implements the conventions of SAS so that the
results match SAS PROC GLM, REG, ANOVA, TTEST, FREQ (2x2 tables), and
UNIVARIATE. The package is written in base R with only
mvtnorm as an additional dependency, and its results have
been validated against SAS outputs and the textbooks listed in
?sasLM.
| SAS | sasLM |
|---|---|
| PROC GLM | GLM, aov1, aov2,
aov3, EMS |
| PROC GLM SOLUTION | GLM(..., BETA=TRUE), REG |
| PROC GLM LSMEANS | LSM, GLM(..., EMEAN=TRUE) |
| PROC GLM LSMEANS / PDIFF | PDIFF, Diffogram |
| PROC GLM ESTIMATE/CONTRAST | est, ESTM, CIest,
CONTR |
| PROC GLM RANDOM / TEST | RanTest, EMS, T3test |
| PROC GLM SLICE | SLICE |
| PROC REG | REG, lr, lr0,
regD, Coll |
| PROC ANOVA | aov1 |
| PROC TTEST | TTEST, tmtest, mtest,
vtest |
| PROC UNIVARIATE | UNIV and the descriptive functions |
| PROC FREQ (2x2, stratified) | RD, RR, OR,
RDmn, RRmn, ORmn,
ORcmh |
The SAS code
PROC GLM DATA=np;
CLASS block N P K;
MODEL yield = block N*P*K / SOLUTION;
LSMEANS N*P / CL;
RUN;
corresponds to:
$ANOVA
Response : yield
Df Sum Sq Mean Sq F value Pr(>F)
MODEL 11 691.08 62.825 4.0688 0.01156 *
RESIDUALS 12 185.29 15.441
CORRECTED TOTAL 23 876.37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$Fitness
Root MSE yield Mean Coef Var R-square Adj R-sq
3.929447 54.875 7.160724 0.7885736 0.5947661
$`Type I`
Df Sum Sq Mean Sq F value Pr(>F)
block 5 343.29 68.659 4.4467 0.015939 *
N 1 189.28 189.282 12.2587 0.004372 **
P 1 8.40 8.402 0.5441 0.474904
N:P 1 21.28 21.282 1.3783 0.263165
K 1 95.20 95.202 6.1657 0.028795 *
N:K 1 33.14 33.135 2.1460 0.168648
P:K 1 0.48 0.482 0.0312 0.862752
N:P:K 0
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$`Type II`
Df Sum Sq Mean Sq F value Pr(>F)
block 4 306.293 76.573 4.9592 0.013587 *
N 1 189.282 189.282 12.2587 0.004372 **
P 1 8.402 8.402 0.5441 0.474904
N:P 1 21.282 21.282 1.3783 0.263165
K 1 95.202 95.202 6.1657 0.028795 *
N:K 1 33.135 33.135 2.1460 0.168648
P:K 1 0.482 0.482 0.0312 0.862752
N:P:K 0
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$`Type III`
CAUTION: Singularity Exists !
Df Sum Sq Mean Sq F value Pr(>F)
block 4 306.293 76.573 4.9592 0.013587 *
N 1 189.282 189.282 12.2587 0.004372 **
P 1 8.402 8.402 0.5441 0.474904
N:P 1 21.282 21.282 1.3783 0.263165
K 1 95.202 95.202 6.1657 0.028795 *
N:K 1 33.135 33.135 2.1460 0.168648
P:K 1 0.482 0.482 0.0312 0.862752
N:P:K 0
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
GLM returns the overall ANOVA, fit statistics, and the
Type I, II, and III tables in one call. Note the agreement with SAS for
this unbalanced design - this is the core purpose of the package.
Coefficients as with the SOLUTION option:
Estimate Estimable Std. Error Df t value Pr(>|t|)
(Intercept) 54.600 0 2.7785 12 19.6506 1.713e-10 ***
block1 -2.325 0 2.7785 12 -0.8368 0.41907
block2 1.100 0 2.7785 12 0.3959 0.69913
block3 4.425 0 2.7785 12 1.5926 0.13724
block4 -6.225 0 2.7785 12 -2.2404 0.04477 *
block5 -5.825 0 2.7785 12 -2.0964 0.05791 .
block6 0.000 0 0.0000 12
N0 -1.383 0 2.7785 12 -0.4979 0.62758
N1 0.000 0 0.0000 12
P0 2.783 0 2.7785 12 1.0017 0.33625
P1 0.000 0 0.0000 12
N0:P0 -3.767 0 3.2084 12 -1.1740 0.26317
N0:P1 0.000 0 0.0000 12
N1:P0 0.000 0 0.0000 12
N1:P1 0.000 0 0.0000 12
K0 6.050 0 2.7785 12 2.1774 0.05013 .
K1 0.000 0 0.0000 12
N0:K0 -4.700 0 3.2084 12 -1.4649 0.16865
N0:K1 0.000 0 0.0000 12
N1:K0 0.000 0 0.0000 12
N1:K1 0.000 0 0.0000 12
P0:K0 0.567 0 3.2084 12 0.1766 0.86275
P0:K1 0.000 0 0.0000 12
P1:K0 0.000 0 0.0000 12
P1:K1 0.000 0 0.0000 12
N0:P0:K0 0.000 0 0.0000 12
N0:P0:K1 0.000 0 0.0000 12
N0:P1:K0 0.000 0 0.0000 12
N0:P1:K1 0.000 0 0.0000 12
N1:P0:K0 0.000 0 0.0000 12
N1:P0:K1 0.000 0 0.0000 12
N1:P1:K0 0.000 0 0.0000 12
N1:P1:K1 0.000 0 0.0000 12
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Least squares means with confidence limits:
Group LSmean LowerCL UpperCL SE Df
sunflower A 328.9167 297.2942 360.5392 15.83391 65
casein A 323.5833 291.9608 355.2058 15.83391 65
meatmeal B 276.9091 243.8805 309.9377 16.53798 65
soybean BC 246.4286 217.1518 275.7053 14.65936 65
linseed C 218.7500 187.1275 250.3725 15.83391 65
horsebean D 160.2000 125.5593 194.8407 17.34518 65
Pairwise differences as with LSMEANS / PDIFF, including the Tukey adjustment:
Estimate Lower CL Upper CL Std. Error t value Df Pr(>|t|)
casein - horsebean 163.383 94.420 232.347 23.485 6.9568 65 3.070e-08 ***
casein - linseed 104.833 39.079 170.587 22.392 4.6816 65 0.0002100 ***
casein - meatmeal 46.674 -20.558 113.906 22.896 2.0386 65 0.3324584
casein - soybean 77.155 13.792 140.517 21.578 3.5756 65 0.0083653 **
casein - sunflower -5.333 -71.087 60.421 22.392 -0.2382 65 0.9998902
horsebean - linseed -58.550 -127.514 10.414 23.485 -2.4930 65 0.1413329
horsebean - meatmeal -116.709 -187.083 -46.335 23.966 -4.8698 65 0.0001062 ***
horsebean - soybean -86.229 -152.915 -19.542 22.710 -3.7969 65 0.0042167 **
horsebean - sunflower -168.717 -237.680 -99.753 23.485 -7.1839 65 1.220e-08 ***
linseed - meatmeal -58.159 -125.391 9.073 22.896 -2.5402 65 0.1276965
linseed - soybean -27.679 -91.041 35.684 21.578 -1.2827 65 0.7932853
linseed - sunflower -110.167 -175.921 -44.413 22.392 -4.9198 65 8.843e-05 ***
meatmeal - soybean 30.481 -34.414 95.375 22.100 1.3792 65 0.7391356
meatmeal - sunflower -52.008 -119.240 15.224 22.896 -2.2715 65 0.2206962
soybean - sunflower -82.488 -145.850 -19.126 21.578 -3.8228 65 0.0038845 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The Dunnett test uses the first level as the control when
ref is not given, as SAS does:
Estimate Lower CL Upper CL Std. Error t value Df Pr(>|t|)
horsebean - casein -163.383 -223.959 -102.808 23.485 -6.9568 65 4.559e-09 ***
linseed - casein -104.833 -162.590 -47.077 22.392 -4.6816 65 6.376e-05 ***
meatmeal - casein -46.674 -105.728 12.380 22.896 -2.0386 65 0.16711
soybean - casein -77.155 -132.810 -21.500 21.578 -3.5756 65 0.00304 **
sunflower - casein 5.333 -52.423 63.090 22.392 0.2382 65 0.99945
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
EMS produces the expected mean squares table, and
RanTest performs the tests with a random factor as the
RANDOM / TEST statement of SAS PROC GLM:
block N P K N:P N:K P:K N:P:K
block 4 0 0 0 0 0 0 0
N 0 12 0 0 6 6 0 3
P 0 0 12 0 6 0 6 3
K 0 0 0 12 0 6 6 3
N:P 0 0 0 0 6 0 0 3
N:K 0 0 0 0 0 6 0 3
P:K 0 0 0 0 0 0 6 3
N:P:K 0 0 0 0 0 0 0 0
$ANOVA
Df Sum Sq Mean Sq F value Pr(>F)
block 4 306.293 76.573 4.9592 0.013587 *
N 1 189.282 189.282 12.2587 0.004372 **
P 1 8.402 8.402 0.5441 0.474904
K 1 95.202 95.202 6.1657 0.028795 *
N:P 1 21.282 21.282 1.3783 0.263165
N:K 1 33.135 33.135 2.1460 0.168648
P:K 1 0.482 0.482 0.0312 0.862752
N:P:K 0
Error 12 185.287 15.441
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$EMS
block N P K N:P N:K P:K N:P:K eps
block 4 1
N Q Q Q Q 1
P Q Q Q Q 1
K Q Q Q Q 1
N:P Q Q 1
N:K Q Q 1
P:K Q Q 1
N:P:K 1
attr(,"Caution")
[1] "Test assumption: Q's in the corresponding EMS row are 0."
$ANOVA
Response : mpg
Df Sum Sq Mean Sq F value Pr(>F)
MODEL 2 931.00 465.50 69.211 9.109e-12 ***
RESIDUALS 29 195.05 6.73
CORRECTED TOTAL 31 1126.05
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$Fitness
Root MSE mpg Mean Coef Var R-square Adj R-sq PRESS R2pred
2.593412 20.09062 12.90857 0.8267855 0.8148396 246.5063 0.7810871
$Coefficients
Estimate Std. Error Df Lower CL Upper CL t value Pr(>|t|)
(Intercept) 37.227 1.59879 29 33.957 40.497 23.2847 < 2.2e-16 ***
wt -3.878 0.63273 29 -5.172 -2.584 -6.1287 1.12e-06 ***
hp -0.032 0.00903 29 -0.050 -0.013 -3.5187 0.001451 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Heteroscedasticity-consistent standard errors (HC0, HC3) and White’s test:
$ANOVA
Response : mpg
Df Sum Sq Mean Sq F value Pr(>F)
MODEL 1 847.73 847.73 91.375 1.294e-10 ***
RESIDUALS 30 278.32 9.28
CORRECTED TOTAL 31 1126.05
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$Fitness
Root MSE mpg Mean Coef Var R-square Adj R-sq PRESS R2pred
3.045882 20.09062 15.16071 0.7528328 0.7445939 328.0228 0.7086954
$Coefficients
Estimate Std. Error Df Lower CL Upper CL t value Pr(>|t|)
(Intercept) 37.285 1.8776 30 33.450 41.120 19.858 < 2.2e-16 ***
wt -5.344 0.5591 30 -6.486 -4.203 -9.559 1.294e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$HC0
Estimate Std. Error Df Lower CL Upper CL t value Pr(>|t|)
(Intercept) 37.285 2.1253 30 32.945 41.626 17.5436 < 2.2e-16 ***
wt -5.344 0.6337 30 -6.639 -4.050 -8.4337 2.062e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$HC3
Estimate Std. Error Df Lower CL Upper CL t value Pr(>|t|)
(Intercept) 37.285 2.42678 30 32.329 42.241 15.3640 8.882e-16 ***
wt -5.344 0.73811 30 -6.852 -3.837 -7.2408 4.636e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$`White Test`
Chisq Df p
1.0546002 2.0000000 0.5901963
Weighted least squares follows the SAS WEIGHT statement: observations
with nonpositive weights are excluded, and Resid=TRUE
returns fitted values and residuals in the original scale as OUTPUT P=
R= does.
[1] 23.16925 22.25023 24.76409 21.02487 18.25006 20.29296
$`Statistics by group`
Mean SD N SEM LCL UCL
Test 17.14737 3.833966 19 0.8795722 15.29946 18.99528
Control 24.39231 6.166504 13 1.7102804 20.66593 28.11869
$`Two groups t-test for the difference of means`
Choose one row according to the variance equality assumption.
PE SE Df LCL UCL t value Pr(>|t|)
If Equal Var. -7.2449 1.7644 30.000 -10.848 -3.6415 -4.1061 0.000285 ***
If Unequal Var. -7.2449 1.9232 18.332 -11.280 -3.2097 -3.7671 0.001374 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
$`Variance confidence intervals by group`
Variance VarLCL VarUCL
Test 14.69930 8.392571 32.14622
Control 38.02577 19.553319 103.61743
$`F-test for the ratio of variances`
PE LCL UCL Num Df Denom Df F value Pr(>F)
Var. Ratio 0.38656 0.12437 1.0703 18 12 0.3866 0.06691 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With summarized input (mean, SD, n) only:
Welch Two Sample t-test
data: mean of the test (m1) and mean of the control (m0)
t = 1.0879, df = 55.965, p-value = 0.2813
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.504872 1.704872
sample estimates:
Test Control
5.4 4.8
nAll nNA nFinite Mean Variance SD CV
32.00000000 0.00000000 32.00000000 20.09062500 36.32410282 6.02694805 29.99880816
SEM LowerCL UpperCL TrimmedMean Min Q1 Median
1.06542396 17.91767851 22.26357149 19.95333333 10.40000000 15.35000000 19.20000000
Q3 Max Range IQR MAD VarLL VarUL
22.80000000 33.90000000 23.50000000 7.45000000 5.41149000 23.34652855 64.20343072
Skewness SkewnessSE Kurtosis KurtosisSE GeometricMean GeometricCV
0.67237714 0.41445735 -0.02200629 0.80937129 19.25006404 30.44976349
The quartiles and the interquartile range use quantile type 2, the SAS default definition.
Risk difference, relative risk, and odds ratio with score confidence intervals:
p1 p2 RD SE lower upper
1 0.7 0.3 0.4 0.204939 -0.001673089 0.8016731
p1 p2 RR SElog lower upper
1 0.7 0.3 2.333333 0.5255383 0.8329862 6.536056
odd1 odd2 OR SElog lower upper
1 2.333333 0.4285714 5.444444 0.9759001 0.8040183 36.86729
See the vignette Stratified 2x2 Tables for the stratified Miettinen-Nurminen methods and meta-analysis.
data.frame; factors
follow the order of levels, which corresponds to the sorted CLASS levels
of SAS.GLM is fast even for large data sets, since version
1.0.0.?sasLM.