Course Description:
Introduction (through both lecture and supervised work, integrated in a practicum format) to elementary use and overview of SAS Version 8.2 for Windows, including data file organization, data management, data import and export (from/to other formats and operating systems), and basic analysis. Use of SAS on other platforms supported by ITC (Mac, Unix) will be addressed but not explicitly instructed.
This document is the first part of the Introduction to SAS workshop; the second part is also available online.
Prerequisites
Familiarity with DOS (file paths and directory structures) and Microsoft Windows (booting, menus, mouse, scrolling, saving, etc.).
Workshop Basics
Workshop Structure
The goals of this document (and workshop) are to provide a brief introduction to SAS, explain a few fundamental commands, and practice some of the many features of SAS for Windows.
In this first session, we will focus on basic data management procedures, including how to:
- open and manage a data file,
- define data (resolve missing values, and create variable labels and value labels),
- run basic descriptive statistics (frequencies and descriptives),
- transform data (recoding, and computing new variables), and
- perform basic analytical procedures (cross-tabulations and regression analysis).
In the next session, we will look at additional procedures you may need in order to be productive in a SAS environment, including how to:
- read other data formats (e.g. raw ASCII data, Excel files, SPSS files),
- save the current dataset as a SAS permanent dataset, and
- use several advanced procedures, including IF-THEN statements, merging, and macros.
Both sessions will concentrate on introductory manipulation of SAS for Windows; they will touch on interpretation of output and invocation of particular procedures. But they are not a complete beginner's guide. The SAS for Windows Tutorial is highly recommended for such issues, and covers topics beyond the scope of these documents.
License Warning
SAS for Windows is a product of SAS Institute Inc. Information Technology and Communication (ITC) has a site license that permits ITC to distribute SAS to faculty, staff, and students for use in Charlottesville only. In addition, ITC agrees to provide user support: if you have a question regarding the SAS software, call the ITC Research Computing Support Center at 243-8800. Unauthorized copying and use of SAS software violates the copyright and site license and will result in ITC losing its site license. Please use SAS legally.
ITC provides access to a number of other general purpose statistical packages for faculty, staff, and graduate students. For a listing of available software and associated information, please see our Researchers website.
Program Overview
Variation: The use of SAS can vary from simple to complex, depending on the condition of the data and sophistication of the analysis. SAS programs may include a variety of activities: data input, data transformation, elementary or advanced statistical analysis, creation of data sets, or custom output.
Routine: Any data analysis (whether with SAS or another program) has five steps:
- prepare data,
- prepare commands,
- invoke commands,
- examine output, and
- save your work.
In practice, you will typically repeat several of these steps, particularly as you correct errors and re-invoke commands.
Statements: SAS programs consist of SAS statements, largely contained in DATA and PROC steps. DATA steps introduce and prepare data for use in SAS. PROC steps issue procedures, to actually perform analysis and generate desired output. Each PROC step calls a program to process (e.g. list, sort, compute, plot, and/or print) the information stored in a data set using keywords such as PROC FREQ for frequencies, PROC MEANS for descriptive statistics, and PROC REG for regression.
Order: A typical SAS run will encompass both DATA steps and PROC steps. You can use several DATA steps to create multiple data sets, and these in turn can be processed by several PROC steps. These steps may occur in any order, though a DATA step must precede a PROC step using that data.
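As a minimal sketch of this ordering, the program below uses one DATA step to create a small temporary data set and then two PROC steps to process it. The data set name demo, its variables, and its values are hypothetical and are not part of the workshop files.

```sas
* Hypothetical example: data set DEMO and its values are made up ;
DATA demo ;
   INPUT id salary ;
   CARDS ;
1 21000
2 35500
3 28750
;
RUN ;

* PROC steps may follow in any order once DEMO exists ;
PROC PRINT DATA = demo ;
RUN ;

PROC MEANS DATA = demo ;
   VAR salary ;
RUN ;
```

Swapping the two PROC steps would change only the order of the output; moving either one above the DATA step would produce an error, because demo would not exist yet.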
Errors: By default, SAS does not write error messages and warnings to your terminal as it executes. When SAS completes execution it writes a summary of errors encountered to the LOG file. You should always begin the examination of the results of your SAS program by looking at the LOG file for ERRORS and NOTES.
Etc: SAS has many capabilities that this course does not have time to cover. More than just a statistical analysis tool, it is capable of complex report generation, database management, and graphics. For further details, see the SAS Language Usage and SAS Language Reference manuals (available at the Research Computing Support Center in 244 Wilson).
Syntax Conventions
Keywords begin most statements and are recognized as commands (e.g. DATA or PROC). They are reserved as commands and should not be part of file or variable names. (More on names later.)
Case sensitivity: SAS does not care if commands are typed in UPPERCASE or lowercase. (Unix users should note, however, that the RS/6000 AIX operating system is case-sensitive.)
Spaces: Commas, spaces, and the "equals" sign are NOT used interchangeably in SAS. When items are separated by spaces, the number of spaces is not important, since "extra" spaces are ignored.
Lines: Every statement must end with a semi-colon. A statement may be spread over several lines and can begin anywhere on a line. All statements must end by column 80, although you do not have to use all 80 columns. It is also possible (but not recommended for debugging purposes) to have multiple statements on a single line.
Errors: Common errors include omitting a semi-colon at the end of a statement, and omitting a period in a format modifier (which we'll discuss in the second session).
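The conventions above can be illustrated with a short sketch. It assumes the sasclass library and the bankdata data set that are introduced later in this tutorial, with the files saved in C:\Temp. Both PROC steps below should be read by SAS as the same request.

```sas
* Assumes the tutorial files are saved in C:\Temp ;
LIBNAME sasclass 'C:\Temp' ;

* Uppercase keywords, one statement per line ;
PROC MEANS DATA = sasclass.bankdata ;
   VAR salary ;
RUN ;

* Same step in lowercase, with extra spaces and a statement split over two lines ;
proc   means
   data = sasclass.bankdata ;
   var salary ;
run ;
```

Only the semi-colons mark where each statement ends; the line breaks and indentation are there for the reader.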
Obtaining the Files Used in this Tutorial
There are several files that have been created for use in this tutorial. The tutorial assumes that the files are saved on the hard drive of your PC in an area named C:\Temp (of course you may choose to save these files elsewhere). If you have problems downloading or opening the files with Netscape, switching to Internet Explorer may resolve the problem.
To save the following files to the C:\Temp area of your hard drive, right-click on the link corresponding to each file and choose the Save As... option from the pop-up menu. When the Save As... dialog box opens, use the navigation tools to designate the C:\Temp area as the Save In area. Then click the Save button to save the file in the C:\Temp area.
Using the above method, save the following files in your C:\Temp directory.
- bank.dat - The ASCII file containing the raw bank data
- bankdata.sas7bdat - The SAS dataset (for Part 1)
- bankend1.sas7bdat - The SAS dataset (for Part 2)
- course0.sas - A file containing the SAS commands to read bank.dat
- course1.sas - A file containing the SAS commands for Part 1 of the tutorial
- course2.sas - A file containing the SAS commands for Part 2 of the tutorial
- bank.xls - An Excel file containing the bank data
- sasdata.dat - A raw data file
Getting Started
Starting SAS for Windows
Click on each of these five selections, in this order:
- the START button on the screen.
- the PROGRAMS listing.
- the STATISTICAL listing.
- the SAS folder listing.
- the SAS 8 listing (or listing for other version, as instructed).
SAS for Windows consists of five sub-windows. By default, the EXPLORER, ENHANCED EDITOR, and LOG windows are initially open, while the OUTPUT and RESULTS windows are hidden under these three.

You can position and resize these windows any way you wish. To bring a partially hidden window to the front, click once anywhere on it. To find a window that is not showing at all, select its item on the SAS window bar. You can also right-click anywhere in the LOG or OUTPUT windows, choose VIEW from the pop-up context menu, and select any window from the list at the top.
The Enhanced Editor window provides a number of useful editing features, including color coding and syntax checking of SAS language.
The Results window helps you navigate and manage output from SAS programs that you submit. You can view, save, and print individual items of output. By default, the Results window is positioned behind the Explorer window and it is empty until you submit a SAS program that creates output. Then it moves to the front of your display.
In the Explorer window, you can view and manage your SAS files, and create shortcuts to non-SAS files. Use this window to create new libraries and SAS files, to open any SAS file, and to perform most file management tasks such as moving, copying, and deleting files.
To create a new library, sasclass, with directory C:\Temp, click the Explorer window, then select FILE, NEW in the pull-down menu. In the New Library dialog box that appears, type "sasclass" next to Name and "C:\Temp" next to Path, as shown here:

Now click OK. A new library called sasclass will show up in the Explorer window.
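The same library can also be assigned with a LIBNAME statement submitted from the editor instead of the dialog box. This is a sketch that assumes the files were saved in C:\Temp as described above; adjust the path if you saved them elsewhere.

```sas
* Assign the libref SASCLASS to the folder that holds the tutorial files ;
LIBNAME sasclass 'C:\Temp' ;
```

After submitting it, the LOG window should contain a NOTE saying that the libref was successfully assigned.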
Note that SAS uses a single icon on the start bar, whereas SPSS, for example, has a separate icon for each syntax, output, and data window.
In this session, we will enter commands by typing directly into the ENHANCED EDITOR window and run commands directly from this window. (Alternately, you could create such a file in any text editor or word processor. Note, however, that you must save the data or command file as an ASCII text or DOS file, not in the format of the word processor you are using.) You may submit a single command, an entire command file, or merely part of the file; we will try each in this session.
First, let's look at some data.
Managing a Database
In many statistical programs, data is typically created and displayed in a conventional spreadsheet format (a grid of rows and columns, where each row is a case and each column is a variable). SAS does not typically display data in this format, but there is a VIEWTABLE option (effective with version 6.12) that will let you look at data in this way.
Opening a SAS data file using the VIEWTABLE option
Step 1: From the drop-down menus, choose TOOLS, TABLE EDITOR, as shown here:

Step 2: To view a data set, you will need to get to the Open File dialog box. Select FILE and OPEN as shown here:

Step 3: You will next need to select the location of the dataset which you would like to manage. Select the library where the file is. Next, select the bankdata SAS dataset. (Note that only SAS datasets are listed.) In the image, the file bankdata.sas7bdat is listed as BANKDATA. Once you've navigated to the correct path and see the file you want listed, click once on that filename to highlight it.

Step 4: Click OPEN, and the VIEWTABLE window should open as follows:

Step 5: There are two modes in SAS for looking at the table. The "browse" mode only allows you to look at the data. So that we can also manipulate the data set in VIEWTABLE, choose EDIT, Edit Mode, as shown here:

If you wanted or needed to, you could edit the actual data in this window -- changing a particular value or values, for instance. You can also investigate and manipulate labels given to particular variables.
Labeling a Variable
To associate a label with a variable (within VIEWTABLE) follow these steps:
Step 1: Click at the top of the third column to highlight that column's variable name.
Step 2: Double-click that variable name (bdate) to see the SAS variable dialog box.

Step 3: Type "Birth date of the respondent" in the white box marked "Label"
Step 4: Click on APPLY then CLOSE.
Then close VIEWTABLE by selecting FILE > CLOSE.
Using VIEWTABLE is fine to look at or edit data, but won't make the data file active (i.e. available for you to perform statistical analysis on it).
Making a Dataset Active
To create a SAS dataset, data may be included directly in the SAS command file (using the cards syntax) but is more often read into SAS from a separate data file. Frequently, this data will be in some non-SAS format, such as a columnar ASCII (text or DOS) file or an Excel file. In the second session, we will discuss how to create and import these other formats. We'll begin today with the bankdata dataset, which is already in SAS format.
Below we demonstrate the use of a series of SAS commands to operate on the bankdata data set.
Step 1: Clear the data from the ENHANCED EDITOR window by clicking in the ENHANCED EDITOR window to make it the active window (or by selecting it from the Windows menu), then select Edit > Clear All from the pop-up menu, as shown here:

Step 2: Type the following statements in the ENHANCED EDITOR window. (Remember that you are only setting up the commands here, which is not the same as invoking those commands.)
PROC CONTENTS DATA = sasclass.bankdata ;
PROC PRINT DATA = sasclass.bankdata ;
RUN ;
What these commands do:
1) The first line runs the procedure "contents" to view the contents of the
bankdata SAS data set (which resides in the library area called sasclass).
2) The second line prints the data contained in the bankdata data set.
3) The last line submits the previous two lines to the SAS system for processing.
Saving Commands
You have just written your first SAS program! Now, save it with the following steps:

Step 1: Choose FILE, SAVE AS from the menu bar
Step 2: Click the down arrow, scroll up, and choose C:
Step 3: Double-click on the Temp directory icon
Step 4: Type practice.sas in the filename box, and
Step 5: Click SAVE or press Enter.
Examining Log and Output
Running a set of commands at a time (rather than issuing commands individually, or running an entire analysis file at once) is referred to as non-interactive use of SAS. Running jobs non-interactively allows you to easily find and correct mistakes. So let's check and see if you made any.
Choose VIEW then OUTPUT from the menu bar (to select the OUTPUT window), and you'll see.... an empty window, because you haven't yet submitted the commands. As mentioned earlier, typing commands into the Enhanced Editor only creates a command file. Invoking SAS to actually "run" commands is a separate procedure:
Step 1: Select the ENHANCED EDITOR window. When the title bar of the ENHANCED EDITOR window is dark blue rather than gray, the window is selected.
Step 2: To submit the commands, choose SUBMIT under RUN in the menu bar. Or you could right-click your mouse and choose SUBMIT ALL.


By default, SAS writes output into the OUTPUT window, but writes error messages and warnings to the LOG window. Whether there even is any output depends on whether there are errors in the LOG window. You should always examine your LOG window before the OUTPUT window.
Under VIEW in the menu bar, choose LOG to view the LOG window. Look closely for ERROR messages and at the descriptive NOTES.

In this case, there are no errors listed: Each of the procedures is followed by a NOTE indicating that the procedure was completed within a fraction of a second -- or a number of seconds, with a larger dataset. (If there were errors, and you needed help debugging your program or interpreting the output, you could save and print the LOG file and bring it to a statistical consultant for assistance.)
Since the LOG window shows no errors, next choose VIEW then OUTPUT from the menu bar. The output window contains the results of the two procedures. First, here is the output from PROC CONTENTS:

Here's part of the PROC PRINT output. (The full output is 19 screens full, showing all 474 cases.)

Note that observation number 6 has no value for GENDER, and observations 9 and 28 have a value of X for JOBCAT. You will almost always have "missing values", whether you start with an already collected dataset or collect the data yourself, and you will need to resolve them before using the data.
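When you only want to scan the first screen or two rather than all 474 cases, the listing can be shortened. The sketch below assumes the sasclass library points to C:\Temp; it uses the OBS= data set option to stop after 30 observations and a VAR statement to list only three of the variables mentioned in this tutorial.

```sas
* Assumes the tutorial files are saved in C:\Temp ;
LIBNAME sasclass 'C:\Temp' ;

* Print only the first 30 observations and three variables ;
PROC PRINT DATA = sasclass.bankdata (OBS = 30) ;
   VAR gender jobcat salary ;
RUN ;
```

The data set itself is not changed; OBS= only limits how many observations this one step reads.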
Data Preparation
Regardless of where your data came from, or who collected it, you will almost always need to make some modifications before you begin your analysis. In particular, you may need to resolve "missing values" as well as provide meaningful labels for each variable and for each value of each variable. Before we do that, we'll extend our simple SAS command file to look deeper at what's there already.
Running Frequencies
Prior to modifying any dataset, you should consider what it looks like already. One way of doing that would be to scan the PROC PRINT output and visually assess the data, as we did a bit of above. That might be fine if we only had those 30 cases, but not for larger datasets. Much easier, even with only 474 cases, is to let SAS do the work for you. The commands below produce several useful tables.
PROC FREQ DATA = sasclass.bankdata ;
TABLES gender ;
PROC MEANS DATA = sasclass.bankdata ;
VAR salary ;
Step 1: Type these commands at the bottom of what already appears in the Editor.
Step 2: Highlight these statements.
Step 3: To execute them in SAS, choose RUN and SUBMIT, as you did above.
Always remember to check the LOG window before the OUTPUT window. Do that now and you'll see that the job is not finished: SAS runs a step only when it reaches a RUN statement or the start of the next step, so the final PROC MEANS step is still waiting for the requisite RUN statement.
Step 4: Add:
RUN ;
to the command file, then highlight and submit this command segment. The LOG should now show:

Note that you will typically use PROC FREQ for categorical variables (nominal or ordinal) and PROC MEANS for continuous (interval or ratio) data -- in this case, gender and salary, respectively.
If you look at the OUTPUT window and scroll up a bit, you should see this PROC FREQ output, which shows that our dataset includes 215 women, 255 men, and 4 respondents of unknown gender:

On the next page, the PROC MEANS output shows that salaries for this sample range from $15,750 to $135,000, with a mean under $35K and a standard deviation above $17K.

From this we can estimate that, if salaries were roughly normally distributed, about two-thirds of the individuals sampled would earn between $18K and $52K (within one standard deviation of the mean) and less than 5% would earn over $68K (more than two standard deviations above the mean). We can also see that, unlike the data for gender, there is valid data on the salary of all 474 respondents.
Missing Values
System missing values are literally missing -- there is simply no data for some cases. Any variable for which a valid value cannot be computed or read from raw data is assigned the SAS system-missing value. In the dataset we're working with, there are four cases (including observation 6) for which no value is recorded for GENDER.
There is usually no procedure necessary to address these values (such as "setting blanks to zero"), because SAS automatically accepts the absence of values (or a single period surrounded by spaces) as system-missing data. However, as we will discuss in the second session, with list-style input, you must use periods (not blanks) to indicate missing values.
User missing values are non-blank values for numeric variables which you elect to ignore for purposes of analysis, and specifically designate as "missing". They typically indicate non-acceptable responses or otherwise differentiate among cases.
In some programs (e.g. SPSS), perhaps 0 indicates respondents who refused to answer a survey question, 97 indicates those for whom the question was inappropriate or inapplicable, and 98 indicates a response of "don't know". In SAS, these "user-defined missings" may be any of the 26 CAPITAL or lowercase letters of the alphabet (numeric values are not allowed) or the underscore. In our dataset, there are seven cases (including observations 9 and 28) which have a value of X for JOBCAT.
You may sometimes wish to analyze the distribution of missing cases, or to explain missing data with other data. For example, perhaps retirees are less likely to answer questions about sexual habits. But you will usually exclude both missing values, and the cases associated with them, from analysis.
The MISSING statement is used within a DATA step to declare user-missing values for numeric variables. Missing values for character variables must be represented by a period in list-style input. It is not possible to use user-missing values for character variables.
DATA sasclass.missingdata ;
MISSING X ;
INPUT var1 charvar $ ;
CARDS ;
4 aaa
5 bbb
X ccc
3 .
. eee
;
RUN ;
These statements tell SAS to treat X as a missing value for var1, and thus to ignore those cases with X reported as var1 in any analysis. Here is how you would test for missing values in SAS statements (each DO would be followed by the statements to run and a closing END):
IF var1 = .X THEN DO ;
IF var1 = . THEN DO ;
IF charvar = ' ' THEN DO ;
Note that the first if statement only picks up the third observation, whereas the second one picks up the fifth observation, although both are treated as missing in statistical analyses. A frequency table for the variable JOBCAT would show:

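The missing-value comparisons described above can be sketched as a complete program. To avoid writing to the sasclass library, this version uses a hypothetical temporary data set named missingdemo, and the variable status is added purely for illustration.

```sas
* MISSINGDEMO and STATUS are hypothetical names used for illustration ;
DATA missingdemo ;
   MISSING X ;
   INPUT var1 charvar $ ;
   LENGTH status $ 14 ;
   IF var1 = .X THEN status = 'user missing' ;
   ELSE IF var1 = . THEN status = 'system missing' ;
   ELSE status = 'valid' ;
   CARDS ;
4 aaa
5 bbb
X ccc
3 .
. eee
;
RUN ;

PROC PRINT DATA = missingdemo ;
RUN ;

* The MISSING option shows missing values as rows in the table ;
PROC FREQ DATA = missingdemo ;
   TABLES var1 / MISSING ;
RUN ;
```

In the listing, the third observation should be flagged as user missing and the fifth as system missing, while the fourth (with a period only for charvar) keeps a valid var1. Without the MISSING option on the TABLES statement, PROC FREQ reports only a count of missing values beneath the table.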
Variable Labels
So far, we only have variable names (short names for what the variable is) and numeric values for the variables. The output of your procedures will be easier (for you and others) to understand when you instead use variable labels and value labels. Value labels are more difficult in SAS than in some other programs, such as SPSS, and will be addressed in the second session. But we're ready now to apply variable labels to this data set.
We already implemented one variable label while in VIEWTABLE, when we labeled BDATE as "Birth date of the respondent". A command approach can do the same thing for many variables in one step. The LABEL statement, used in a DATA step, provides labels for variables. Although variable names are limited to 32 characters in SAS Version 8 (eight in Version 6), the label may be up to 256 characters long, including blanks. For example:
LABEL ses = 'Computed Socio-Economic Status' ;
We'll be using five variables in our data set, so should label at least those five.
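As a sketch of the DATA step form just described, the program below attaches two labels while copying the data. The data set name bankcopy is hypothetical; it is a temporary copy in the WORK library, so the permanent sasclass.bankdata file is left untouched. The steps that follow use PROC DATASETS instead, which changes the labels without rewriting the data.

```sas
* Assumes the tutorial files are saved in C:\Temp ;
LIBNAME sasclass 'C:\Temp' ;

* BANKCOPY is a hypothetical temporary data set in the WORK library ;
DATA bankcopy ;
   SET sasclass.bankdata ;
   LABEL gender = "Respondent's Sex"
         salary = "Respondent's salary" ;
RUN ;

* The Label column of the variable list should now show both labels ;
PROC CONTENTS DATA = bankcopy ;
RUN ;
```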
Step 1: Add these lines to the bottom of your command file in the EDITOR:
PROC DATASETS library = sasclass ;
Modify bankdata ;
LABEL gender = "Respondent's Sex"
educ = 'Number of years of education'
preexp = 'Previous experience (months)'
bdate = 'Birth year of the Respondent'
jobtime = 'Months since hired'
salary = "Respondent's salary" ;
RUN ;
quit;
Note that an apostrophe can be used in a label if the label is surrounded by double quotation marks, as was done above for gender and salary; otherwise, single quotes are fine.
The "quit" statement above causes SAS to immediately stop the procedure that is running. DATASETS is an interactive procedure: you can submit more optional commands and SAS will process these without running the whole procedure all over. Since we are done with this procedure, we quit it immediately after running our commands.
Step 2: Next, add these lines so that you can see the effect of having specified variable labels:
PROC FREQ DATA = sasclass.bankdata ;
TABLES gender ;
PROC MEANS DATA = sasclass.bankdata ;
VAR educ preexp bdate jobtime salary ;
RUN ;
Step 3: Then choose RUN from the main menu and select the SUBMIT option.
If you look at the OUTPUT file, you'll see that the frequency tables now have the variable label as a heading rather than the variable name; and the means tables have a new column for the labels. For example, here is part of the means table:

Data Analysis
We have an active data set with labels, and we've taken care of missing values. Now our data is ready for further analysis. We've already looked at descriptive statistics for salary, but we're interested in explaining its distribution: What accounts for differences in salary among the bank employees? We'll start by looking at differences between men and women, and then use other variables and procedures to further explore the relationship.
Examining Differences
We've already gotten a general picture of the SALARY variable for the entire sample. What we're interested in now is the difference in salaries between men and women. There are actually several ways to compare male and female salaries with SAS. Let me show you a few possible procedures, and then have you do only one of them.
Separate lists: In the following command segment, the first PROC step sorts the data by gender, a necessary prerequisite for the second PROC, which lists all of the data for female cases and then all of the data for male cases.
PROC SORT DATA= sasclass.bankdata ;
BY gender ;
PROC PRINT;
BY gender ;
There is no output from the first PROC, and the lists are too long to reproduce -- and probably too long to be of much use. Note that we did not specify the data= option in the print procedure. In the cases where we do not specify the data set explicitly, SAS uses the default data set, which is the most recently created or modified data set.
Cross-tabulation: The next PROC is a bit more useful, by allowing quicker comparisons between men and women at each level of salary.
PROC FREQ ;
TABLES salary * gender ;
However, the table is no smaller (there's actually more information) and no easier to summarize, because there are so many levels of salary.
Summary comparisons: Both of those procedures might be useful (particularly with very small data sets), but we need summary comparisons to make our case. In addition to showing the mean salary for the entire sample, SAS can also provide means for specific groups, allowing us to see how the mean salary differs between men and women.
Step 1: Type the following lines at the bottom of your command file:
PROC SORT DATA= sasclass.bankdata ;
BY gender ;
PROC MEANS DATA = sasclass.bankdata ;
BY gender ;
VAR salary ;
RUN ;
Step 2: To submit these commands, highlight this section and, this time, try clicking on the "running man" icon on the toolbar:

Remember to check the LOG file first, but the output from this procedure should include this:

As you can see, there is indeed a difference in salaries between men and women within the bank: On average, men make almost $15,000 more than women (with means of $41K vs. $26K), although men's salaries are almost three times as dispersed as women's (with standard deviations of $19K vs. $7K).
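The same group comparison can also be requested with a CLASS statement, which does not require the data to be sorted first. This is offered as an alternative sketch, not as the method used in the workshop files, and it assumes the sasclass library points to C:\Temp.

```sas
* Assumes the tutorial files are saved in C:\Temp ;
LIBNAME sasclass 'C:\Temp' ;

* CLASS groups the statistics by gender without a prior PROC SORT ;
PROC MEANS DATA = sasclass.bankdata ;
   CLASS gender ;
   VAR salary ;
RUN ;
```

One difference to keep in mind: by default, CLASS leaves out observations whose class variable is missing, whereas BY processing reports them as a separate group.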
Possible explanations for this difference might include differences in education, previous experience, age, length of tenure with the bank, and sex discrimination. Fortunately, our data set includes information about the first four of those. We'll start by looking at differences in education: perhaps men are paid more than women because they are better educated (a difference which itself would have to be explained at some point). First, we will need to recode the education variable.
Recoding Variables
Our data on education is currently interval, indicating an actual number of years of education completed by the respondent, from 8 to 21 in this data set. For now, our comparison will be easier if we recode this variable (i.e. regroup the values, and thus the cases) so that only three categories are present:
- 16 or more years of education (presumably those with college degrees)
- 12 to 15 years of education (presumably those with only a high school diploma), and
- less than 12 years of education (presumably those without a high school diploma).
Recoding conventions: There are three major conventions for recoding, and you should take care to choose one appropriate to your data and research question. The convention we're employing is logical divisions, using thresholds associated with expected differences (12 years for high school and 16 for college). A second convention is to use equal divisions, such as age groupings by decade (teens, 20s, 30s, 40s, etc.). A third convention, and the most robust, considers the shape of the distribution being recoded to find empirical concentrations, independent of expectations and equality.
Recode from a safe copy: Any variable can be recoded simply and quickly, and the data set is altered without any option to leave it unsaved. Consequently, there's a risk that you may recode something and then not be able to ascertain the original differences. (For example, right now we want an ordinal measure of education, with three levels, but later we may want to consider each year of education separately.) To eliminate this risk, you should recode a "safe copy" of the original variable -- an exact copy of the variable, with the same values for each case -- rather than the original variable itself. Creating a "safe copy" is easy. A stand-alone command could do it:
educ2 = educ ;
However, that adds a pass through the data without actually doing any recoding. That's not much of an issue with our dataset of only 474 cases, but with a large data set (e.g. 10,000 cases) it would add a lot of time. Instead, you can also create the safe copy at the same time that you recode.
Re-assign values: Recoding is most easily done using IF/THEN statements, as in the following:
DATA sasclass.bankdata ;
SET sasclass.bankdata ;
IF educ GT 0 AND educ LT 12 THEN educ2 = 1 ;
ELSE IF educ GE 12 AND educ LE 15 THEN educ2 = 2 ;
ELSE IF educ GE 16 THEN educ2 = 3 ;
In each line, the new variable EDUC2 (a safe copy of EDUC) is created and set based on the value of the original variable EDUC: The first line identifies those cases with greater than (GT) 0 years of education but less than (LT) 12, and puts them in the first group, those who did not complete high school. The second line identifies those with greater than or equal to (GE) 12 and less than or equal to (LE) 15 years and puts them in the second category, those who completed high school and may have attended some college. The third line identifies those with 16 or more (GE 16) years of education as belonging in the third group, whom we presume have finished college.
Validating recodes: The first thing you should do after recoding any variable (including creating a new variable and regrouping the categories) is to look at a frequency distribution of it. Add these lines:
PROC FREQ ;
TABLES educ ;
TABLES educ2 ;
RUN ;
then submit these nine lines together, and you should see the following output:

The 53 cases with 8 years of education are in the first category; the 312 cases (190 + 6 + 116) with 12 to 15 years of education are grouped in the second; and the 109 cases (59 + 11 + 9 + 27 + 2 + 1) with 16 or more years of education are in the third.
Note that when you re-assign values, you should also re-assign value labels, so that someone coming later (including yourself) will know what categories 1, 2, and 3 of EDUC2 mean. Again, we'll talk about value labels in the second session -- for now, just remember what 1, 2, and 3 stand for.
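A compact way to confirm that every original value landed in the intended category is to cross the original variable with the recoded one. The sketch below assumes the recode above has been run and that the sasclass library points to C:\Temp. The LIST option prints one row per combination instead of a grid, and MISSING keeps any cases that were left uncoded visible.

```sas
* Assumes the tutorial files are saved in C:\Temp and EDUC2 already exists ;
LIBNAME sasclass 'C:\Temp' ;

* One row per combination of original and recoded value ;
PROC FREQ DATA = sasclass.bankdata ;
   TABLES educ * educ2 / LIST MISSING ;
RUN ;
```

Each value of educ should appear next to exactly one value of educ2; a row with a missing educ2 would point to a value that none of the IF conditions covered.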
Utilizing new variable: Now that we've recoded education, we can consider whether educational differences explain differences in salary. One way would be to look at (disaggregated) differences in means, as we did above disaggregating salary by gender. The following will do the same by educ2:
PROC SORT DATA= sasclass.bankdata ;
BY educ2 ;
PROC MEANS ;
VAR salary ;
BY educ2 ;
RUN ;
And the output from this procedure should include the following:

This output suggests that salaries do vary by education level: On average, those with less than 12 years of education earn $24K, those with 12-15 years earn $28K, and those with 16 or more years earn over $57K. College graduates earn on average twice what those without a high school diploma earn, and their mean is more than the maximum earned by employees without a high school diploma.
Perhaps, then, education explains salary differences. But does it explain salary differences between men and women? First, we would need to know whether there are educational differences between men and women. To do that, we'll turn to cross-tabulations (a.k.a. cross-classification tables). But before that, we should save the data changes we've just made.
Saving Your Data
When SAS reads data with a DATA statement, it creates a binary file in a format which only SAS can understand, called a SAS dataset. All SAS statistical procedures operate on what is called the current (or active) data set. A file created by a DATA statement automatically becomes the current dataset, and any new variables added by data transformations or IF statements are added to this SAS dataset as they are created. Other procedures, such as PROC STANDARD, have options that add new variables, such as Z-scores or predicted values to the active system file for analysis by other procedures.
SAS datasets may be either temporary or permanent. A dataset normally disappears at the end of a SAS job. However, you may save SAS the work (and yourself the time) of recreating the dataset next time (and on each successive run on that data) by creating a SAS permanent dataset for later use.
SAS determines whether a dataset is to be temporary or permanent from the name you give it. All SAS dataset names are really two-level names, in the form libref.membername. If you want the dataset to be temporary (not saved to your directory), use only the membername: SAS then supplies the default libref WORK and deletes the dataset when the SAS session ends. If you need to save the dataset, use both a libref and a membername: the permanent dataset is stored in a SAS Library and may be used in other SAS programs without re-creating it. (Note that on UVA systems, a SAS Library is a logical concept, not a physical entity.)
Notice in the command file above that the name of the SAS dataset, sasclass.bankdata, has two parts. With such a two-part name, the dataset is permanently stored for future use. We did that so we can use the data in the second workshop session. But it is always good to save to a new dataset, just in case.
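The naming rule can be sketched as follows. The folder in the LIBNAME statement and the dataset names tempbank and bankdata2 are hypothetical, chosen only for illustration; substitute the folder where your own workshop files are stored.
* Assumed folder for illustration only. Point the libref at your own directory ;
LIBNAME sasclass 'C:\sasclass' ;
* One-level name: stored as WORK.tempbank and deleted when the session ends ;
DATA tempbank ;
SET sasclass.bankdata ;
RUN ;
* Two-level name: saved permanently in the sasclass library as a new copy ;
DATA sasclass.bankdata2 ;
SET sasclass.bankdata ;
RUN ;
Writing to a new member name such as bankdata2 leaves the original bankdata untouched, which is the "just in case" precaution mentioned above.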
Generating Crosstabulation
Now that we have recoded education, we can assess educational differences between genders. Again, making this comparison will help us assess the plausibility that differences in educational levels actually explain the difference in salaries between men and women.
Since both the new education variable and the gender variable are categorical variables (ordinal and nominal, respectively), the appropriate procedure to assess difference in educational level across the two genders is to generate a "cross-classification" table, or "cross-tab". We do this with the following commands, similar to what you saw previously:
PROC FREQ ;
TABLES educ2 * gender ;
You can type those at the bottom of your command file, followed by a RUN statement, and then highlight and submit them. You should see this output:

While 32.16% of male employees have at least a college degree, only 11.63% of female employees do -- the male bank employees are almost three times as likely to have a college degree as the females. And we've already seen that salaries vary across values of education. Thus, it is at least plausible that education accounts for salary differences between male and female bank employees.
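The percentages quoted above are column percentages (the share of each gender that falls in each education category). By default, PROC FREQ also prints overall and row percentages in every cell. As a sketch, assuming the same sasclass.bankdata dataset, the TABLES options below suppress those so that only counts and column percentages remain. The CHISQ option, which this workshop does not otherwise cover, additionally requests a chi-square test of association:
* NOROW and NOPERCENT drop the row and overall percentages from each cell ;
PROC FREQ DATA= sasclass.bankdata ;
TABLES educ2 * gender / NOROW NOPERCENT CHISQ ;
RUN ;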
In order to fully assess whether education accounts for the salary differences, we will utilize a procedure called Linear Regression. This procedure also allows us to examine the influence of other factors that we hypothesized might account for salary differences -- age, previous job experience, and length of tenure at the bank. We'll pick up with that at the beginning of the next session.
Documentation and Help
SAS Online Documentation offers easy access to the most frequently used SAS documentation (previously available only in print), including news about SAS components that are shipped as experimental or beta. SAS for Windows includes pull-down help as well as ASSIST menus and dialogue boxes. You can also use the HELP command from the SAS for Windows command line.
SAS Tutorial
In SAS version 8.2, the online tutorial may be accessed from the Help drop-down menu by selecting SAS Online Tutor under Books and Training. SAS Institute provides an online computer-based training (CBT) tutorial. The SAS/TUTOR module is licensed and available for SAS for Windows and SAS on the RS/6000s. To use this program you need the SAS/TUTOR training notes, which are available for purchase at the University Bookstore's PROFS Publishing. The price is based on the cost to PROFS Publishing of photocopying the original notes. (If you have questions about getting a copy of these notes, please e-mail res-consult@virginia.edu.) Once you have these notes, you can invoke the SAS/TUTOR module.
In SAS for Windows or OS/2, version 8, SAS/TUTOR is invoked by starting up SAS and then selecting online training from the Help menu.
In SAS on Macintosh, double-click on the SAS/Tutor icon in the SAS folder.
On the RS/6000s, SAS/TUTOR for SAS version 6.09 is started by typing /sas/sastutor at the Unix prompt. You can choose an item by "tabbing" to it and pressing Enter to select it. Please note that SAS/TUTOR for the RS/6000s is best used in the X-Windows interface. See ITC document U-025 for details on using the X-Windows interface to SAS on the RS/6000s.
You may also want to look at the use of SAS/ASSIST for creating a command file by using the pull-down menus and selecting the commands needed in their appropriate order.
SAS Manuals
SAS Institute, Inc. publishes a large library of manuals and statistical procedure guides. Some of these are available in the trade books section of the University of Virginia bookstore. All of the manuals listed may be purchased directly from SAS Institute, Inc., or may be ordered through any bookstore. They are also available, for reference use only, at the ITC Research Computing Support Center, Wilson Hall Room 244. Some manuals in the Research Center can be checked out for up to 24 hours. Speak with the computing consultant in Wilson Hall Room 244 to check out a manual.
Sample Syntax
Another aid to understanding SAS may be obtained by looking at sample programs provided by SAS. The programs come complete with data, and may be examined for ideas on how to set up a procedure, or may be run so that the output of the program may be studied.
The location of the SAS Institute, Inc. example files for PC SAS and SAS for Windows depends on the choices made when these products were installed on your machine. In general, they are in the SAS subdirectory along with the module to which they pertain. For example, SAS Institute, Inc. sample files for the STAT module are generally located in: /SAS/STAT/SAMPLE, whereas the sample files for the ETS module would be in: /SAS/ETS/SAMPLE.
On the RS/6000s, sample files from SAS Institute, Inc. are in the directory /sas8/samples, in subdirectories labeled base, stat, graph, af, ets, insight, and or. These files must be copied to your own account before you can run them. Locally written example files may be browsed or copied from the /help/unix/statistics/sas/examples directory.
Web Documentation
The Statistical Computing Support web site includes answers to frequently asked questions, as well as information about licensing SAS and renewing your license, other products that might be useful, and links to dozens of other sites that may be of use to you: http://www.itc.virginia.edu/research/statistical.html
The Research Computing Group supports technologically advanced statistical work. http://www.itc.virginia.edu/researchers/
SAS provides assistance via its own Technical Support website. In addition, in version 8, the Help files included with the program are in HTML (Web) format.
The SAS Technical Support site: http://www.sas.com/ts
SAS 8 online documentation (help): http://www.itc.virginia.edu/manuals/sas8/onldoc.htm
Consulting Services
Additional assistance with SAS command file construction and statistical routines is available from the Statistical Computing Consultants located in the ITC Research Computing Support Center in Wilson Hall Room 244 (243-8800). The consultants can also be contacted by e-mail at res-consult@virginia.edu. Please note that consulting hours vary by semester and around holidays.
For statistical consulting (as opposed to statistical computing consulting), you may wish to contact the Statistics Division of the Math Department (http://www.stat.virginia.edu/uvastat.html). There are no charges for the advice of the faculty consultant, but there is a fee for graduate student consultants ($45 per hour) as well as for the expertise of statistics faculty other than the dedicated consultant ($95 per hour). To find the current faculty consultant, contact the division's secretary, Ms. Kathi Marshall, Halsey Hall room 103, 924-3222.
