Course Description
Introduction (through both lecture and supervised work, integrated in a practicum format) to elementary use and overview of SAS for Windows, including data file organization, data management, data import and export (from/to other formats and operating systems), and basic analysis. Use of SAS on other platforms supported by ITC (Mac, Unix) will be addressed but not explicitly instructed.
This document is the second part of the Introduction to SAS workshop; the first part is also available online.
Prerequisites
Familiarity with DOS (file paths and directory structures) and Microsoft Windows (booting, menus, mouse, scrolling, saving, etc.).
Table of Contents
- Welcome to Part II
- Obtaining the Files Used in this Tutorial
- Advanced Customization
- Advanced Data Analysis
- Advanced Data Entry
- Advanced Data Management
- Advanced Appendices
- Documentation and Help
Welcome to Part II
When we left off last time, we had determined that men and women employed at the bank receive different average salaries, that salaries vary by educational level, and that there are educational differences between men and women in the sample. This suggests that education may account for the gender differences in salary, but we hypothesized that several other factors may also play a role in explaining this co-variation.
In order to fully assess whether education accounts for the salary differences, we will utilize a procedure called Linear Regression. This procedure also allows us to examine the simultaneous influence of other factors that we hypothesized might account for salary differences: age, previous job experience, and tenure at the bank.
We'll also read in raw data and then process commands, whereas before we used a SAS permanent dataset which had already been created for you. And we'll look at several options for customizing output, as well as options for importing and managing your data.
But first we'll look into some ways to customize your SAS data sets. We'll start by looking at an existing command file called course1.sas which includes most of the commands you entered in the first session, plus several new ones we'll cover in this session.
Obtaining the Files Used in this Tutorial
There are several files that have been created for use in this tutorial. The tutorial assumes that the files are saved on the hard drive of your PC in an area named C:\Temp (of course you may choose to save these files elsewhere). If you have problems downloading or using the files with Netscape, try Internet Explorer instead.
To save the following files to the C:\Temp area of your hard drive -
Step 1. RIGHT click on the link corresponding to the filename below you want to get.
Step 2. Choose the Save As... option from the pull down menu. When the Save As... dialog box opens, use the navigation tools to designate the C:\Temp area as the save in area.
Step 3. Click the save button to save the file in the C:\Temp area.
Using the above method, save the following files in your C:\Temp directory.
Files needed for Part 2: (assumes you have previously downloaded Part 1 files)
- bankend1.sas7bdat - The SAS dataset (for Part 2)
- readascii.sas - A file containing the SAS commands to read bank.dat
- course2.sas - A file containing the SAS commands for Part 2 of the tutorial
- exportascii.sas - A file containing the SAS commands to write an ASCII text file
- bank.xls - An Excel file containing the bank data
- course1.sas - A file containing the SAS commands from Part 1
- sasdata.dat - A raw data file
Advanced Customization
Annotating Syntax
You've probably already realized that SAS commands are very specific, and that small changes can make large differences -- or prevent you from getting any output at all. So that you'll later understand what you did, and so that you can more easily replicate procedures with other data sets and for other projects, it is advisable to comment your SAS command file -- that is, to add annotations which detail what the written syntax does (or, at least, what you had intended it to do).
Comments are useful for documenting what you are doing and are highly recommended. Comments may be put anywhere in a SAS file as long as they are bracketed by a slash and asterisk at the beginning and an asterisk and slash (reverse order) at the end.
/* one way of adding comments */
/* Another way to add comments: when you have many lines to type,
   or multiple SAS statements you want to comment out, you only need
   a slash asterisk at the start and an asterisk slash at the end.
   You don't need them on each line. */
Another comment style is to put an asterisk at the beginning of a line, which is useful in commenting out a command. Note that comments that begin with an asterisk are ended with a semi-colon.
* another way of adding comments ;
* With this style your comments can go across lines, but not across
  SAS statements: because the semi-colon ends this style of comment,
  you can only comment out one SAS statement ;
Computing a New Variable
While there are already variables in the data set for previous job experience (PREEXP) and tenure (JOBTIME), there is not yet a variable for age. But there is a measure of date of birth (BDATE), which we can use to create a measure of age. The "command" is similar to what we did to create a "safe copy" of EDUC, but extends the logic: Rather than creating a mere copy, it calculates an age for each respondent based on a simple mathematical expression. This can often be done in one SAS statement, but here we do it in two. First we set y equal to the year of birth, which the YEAR function extracts from the bdate variable. Next we compute the variable age with the expression age = 1999 - y.
Note that these steps will also create a new SAS dataset (bankdata) from the data set bankend1. Bankend1 contains the modifications that we made in the previous session. In order to use the library name "sasdir" (or any libname) in a DATA step, you have to let SAS know about it. You can do this using the mouse as we did in Part 1, or you can use the LIBNAME command shown on the first line below.
LIBNAME sasdir "C:\temp" ;
DATA sasdir.bankdata ;
SET sasdir.bankend1 ;
y = year (bdate) ;
age = 1999 - y ;
PROC FREQ ;
TABLES age;
RUN ;
Above, line 2 creates a new SAS permanent dataset, and line 3 reads in an existing SAS permanent dataset. Lines 4 and 5 create the new variable AGE, the next two lines request a table for validation, and RUN makes the PROC FREQ execute immediately, rather than "wait" until a second SAS procedure is submitted.
Step 1. Type the lines of SAS commands above into the Enhanced Program Editor.
Step 2. Submit these commands to SAS by pressing the F8 function key. This does the same thing as pressing the SUBMIT button (the running dude) on the toolbar.
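As noted above, the two assignment statements can be collapsed into one. The following is a minimal sketch of that variant, assuming the sasdir library has already been defined with LIBNAME; the dataset name bankage is hypothetical and chosen only so that this sketch does not overwrite bankdata.

```sas
/* Sketch only: assumes LIBNAME sasdir "C:\temp" ; has already been run */
DATA work.bankage ;              /* bankage is a hypothetical dataset name */
  SET sasdir.bankend1 ;
  age = 1999 - YEAR(bdate) ;     /* extract the birth year and subtract in one statement */
RUN ;

PROC FREQ DATA = work.bankage ;  /* same validation table as before */
  TABLES age ;
RUN ;
```

This version avoids leaving the helper variable y in the data set, at the cost of a slightly denser expression.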
After any adjustment of data, it is always a good idea to look at a frequency distribution to ensure that changes occurred in the manner expected. This output validates that what you tried to do (create a new variable indicating age) worked:

The same logic could be extended further, using more complex expressions.
For example:
| Combining measures of satisfaction: | sat_all = sat_1 + sat_2 ; |
| Computing a measure of socio-economic status: | ses = (educ * income) / age ; |
| Computing overall income: | income = SUM (salary, overtime, bonus, stocks) ; |
Note: SUM is a SAS function that adds all the listed variables together. If the value of one of the variables is missing for a given case, SUM omits that variable for that person but still returns the total of the remaining values. By contrast, an expression that uses the + operator returns a missing value if any of its variables is missing. When calculating new variables from existing ones, be aware of the effect missing values will have on your new data values.
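To see the effect of missing values, here is a small self-contained sketch. The two data lines are made-up values used only for illustration, and the variable names total_plus, total_sum and n_present are hypothetical.

```sas
DATA work.sumdemo ;
  INPUT salary overtime bonus ;
  total_plus = salary + overtime + bonus ;      /* missing if ANY term is missing */
  total_sum  = SUM(salary, overtime, bonus) ;   /* adds only the nonmissing terms */
  n_present  = N(salary, overtime, bonus) ;     /* how many terms were nonmissing */
  DATALINES ;
30000 2000 500
28000 . 500
;
RUN ;

PROC PRINT DATA = work.sumdemo ;
RUN ;
```

For the second case, where overtime is missing, total_plus is missing while total_sum is 28500 and n_present is 2. Keeping a count such as n_present alongside the total is one way to tell complete sums from partial ones.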
Value Labels
We previously created a variable educ2 from educ, but haven't yet made clear (in our dataset or to other users) what the values of educ2 mean. Two statements are needed to label them. The first (PROC FORMAT with a VALUE statement) creates a new format that we name degree, and would typically come prior to the DATA statement; the second (FORMAT) ascribes the newly created degree format to the variable educ2, and would typically come within the DATA step, following the INPUT statement. In our case, we can enter them as follows:
PROC FORMAT ;
VALUE degree 1="no HS" 2="no BA" 3="BA or more" ;
PROC FREQ ;
TABLES educ2 ; FORMAT educ2 degree. ;
RUN ;
Note that you MUST end the format name with a period when you use it in a FORMAT statement, as in degree. above.
Add, highlight, and SUBMIT these commands to get the following output:

Naming conventions: Many special characters -- such as # @ $ {} -- cannot be used in variable names, value names, or data set names. Names may be up to 32 characters long, beginning with a letter (A-Z and a-z) or an underscore (_). Subsequent characters may be numbers (0-9), letters, or underscores. (However, use caution if starting a name with an underscore, because SAS sometimes creates internal variables that start and end with an underscore.) As of Version 8.1, you can have variable names up to 32 characters.
Note that SAS value labels (the words associated with categorical numbers of a variable) are not stored with the SAS dataset, but in a separate SAS binary file called a catalog.
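Because the labels live in a catalog rather than in the dataset, a format created in one session is not automatically available in the next. One common approach, sketched here under the assumption that you want the catalog stored alongside the data in the sasdir library, is to write the format to a permanent catalog and tell SAS to search it:

```sas
LIBNAME sasdir "C:\temp" ;

PROC FORMAT LIBRARY = sasdir ;    /* stores the format in the catalog sasdir.formats */
  VALUE degree 1 = "no HS" 2 = "no BA" 3 = "BA or more" ;
RUN ;

OPTIONS FMTSEARCH = (sasdir) ;    /* search sasdir.formats when a format is requested */

PROC FREQ DATA = sasdir.bankdata ;
  TABLES educ2 ;
  FORMAT educ2 degree. ;
RUN ;
```

In a later session, only the LIBNAME and OPTIONS FMTSEARCH statements should need to be repeated before the degree format can be used again.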
Controlling Output
You can change a variety of system options from their default values, including the page width, page length and line spacing, centering output, and printing the current date on the output. For example:
OPTIONS NOCENTER FORMDLIM = "~" ;
tells SAS not to center the output, and FORMDLIM tells SAS to use a line of tildes, rather than the default formfeed character, to delimit output sections. This option saves paper, but may make output difficult to read when printed, because tables may get broken across pages. There are myriad options available; consult the documentation for additional information.
You can also specify a title to be printed at the top of each page of output, using the TITLE command. You may specify a title anywhere in your SAS job, and SAS will use that title starting at that point in the job. You can even specify multiple title lines by appending a number to TITLE. You are limited to ten TITLE lines. An example of two title lines is:
TITLE 'My dissertation analysis, phase 1' ;
TITLE2 'Print out of those cases which appear miscoded' ;
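The following sketch combines several output options with titles in one short job. The particular option values (80 columns, 60 lines) and the second title text are illustrative assumptions, not recommendations from this workshop.

```sas
/* Page layout: left-justified, no date or page number, 80 columns by 60 lines */
OPTIONS NOCENTER NODATE NONUMBER LINESIZE = 80 PAGESIZE = 60 ;

TITLE  'My dissertation analysis, phase 1' ;
TITLE2 'Frequency of education' ;

PROC FREQ DATA = sasdir.bankdata ;
  TABLES educ ;
RUN ;

TITLE ;   /* a TITLE statement with no text clears all title lines */
```

Titles stay in effect until they are changed or cleared, so clearing them at the end keeps a stale title from appearing on later, unrelated output.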
Advanced Data Analysis
The first two procedures we'll do don't make use of the recoding and labeling we've just done, but help focus on the relationship(s) that we're interested in explaining.
Analysis of Variance
We've already compared the salaries for each gender, but only with univariate point estimates (of mean and standard deviation). We could test our confidence in a relationship more fully with a formal statistical comparison. The following command performs a t-test of the difference in mean salary between the two genders; with only two groups, the pooled-variance version of this test is equivalent to a one-way analysis of variance (ANOVA).
PROC TTEST ;
CLASS gender ;
VAR salary ;
TITLE 'Descriptive statistics listed separately for each gender' ;
RUN ;
Type the commands above, highlight them, and SUBMIT them. Your output (in the OUTPUT window) should include the following:

How to create an HTML version of this output
To get an HTML version of the output, choose Tools -> Options -> Preferences -> Results, check Create HTML, then submit the program. Or you can run the ODS HTML FILE= command before the SAS commands for which you want HTML-style output, and then close the HTML output file with ODS HTML CLOSE after those commands. For example:
ODS HTML FILE = "c:\temp\ttest.html" ;
PROC TTEST ;
CLASS gender ;
VAR salary ;
TITLE2 'Descriptive statistics listed separately for each gender' ;
RUN ;
ODS HTML CLOSE ;

This output shows some numbers we have seen before -- such as that the mean salary for women is about $26K while the mean salary for men is about $41K, but the distribution of salaries for men is wider (with a standard deviation of about $19.5K rather than $7.5K).
The p-value (<.0001) suggests there is a low risk of being wrong in rejecting the null hypothesis of equal mean salaries. This confirms our impressions from our look at the data in Part 1, and supports our claim that there is a statistically significant difference between male and female salaries. But why is there a difference? What about those other factors?
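As a sketch of the ANOVA framing used in this section, the same two-group comparison can also be requested from PROC GLM. This assumes SAS/STAT is available (it also supplies PROC TTEST) and that the data are in sasdir.bankdata.

```sas
PROC GLM DATA = sasdir.bankdata ;
  CLASS gender ;              /* grouping variable */
  MODEL salary = gender ;     /* one-way ANOVA of salary by gender */
RUN ;
QUIT ;                        /* PROC GLM is interactive, so QUIT ends it */
```

With two groups, the F statistic from this model equals the square of the pooled-variance t statistic from PROC TTEST, so the two procedures lead to the same conclusion. The ANOVA form becomes useful when a grouping variable has more than two categories.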
Scatterplot
We've already considered the role of education by grouping cases in cross tabulations. We could also produce a plot to compare individual cases. Let's compare two such plots -- one looking at the role of education (EDUC, not EDUC2), and a second looking at the role of tenure at the bank (JOBTIME) -- in order to compare the role of several factors, controlling for gender.
Type the following commands, and highlight and submit them:
PROC PLOT ;
TITLE2 'Plot of Salary by Education' ;
PLOT salary * educ = gender ;
RUN ;
TITLE2 'Plot of Salary by JobTime' ;
PLOT salary * jobtime = gender ;
RUN ;
QUIT ;
The output should include the following two plots:


The difference is apparent: In the first plot (looking at the relationship between education and salary), there is an apparent relationship across the entire sample, although we can see that few women are above $40K or 16 years of educ. The second plot shows a less clear relationship between job tenure and salary -- and the few women that earn above $40K are not the ones with the longest job tenure. So, education and gender seem to be related to salary, but length of job tenure does not.
Let's try one more factor before proceeding. Add these lines, and highlight and SUBMIT them:
PROC PLOT ;
PLOT salary * age = gender ;
RUN ;
This third plot shows something very different: Older employees tend to earn less, with the highest salaries going to men in their 40s and the lowest going to women older than 50:

We could investigate a cross tab and plot of each pair of variables. But since we've already shown the possible relevance of at least three explanatory variables, let's turn to a method which allows us to consider all of these variables simultaneously.
Simple Linear Regression
To generate a Linear Regression that addresses our research question, the SAS commands are:
PROC SORT DATA= sasdir.bankdata ;
BY gender ;
PROC REG ;
MODEL salary = educ jobtime preexp age ;
BY gender ;
RUN ;
The output should include the following:


Several numbers in this output are of particular importance; you can largely focus on those and ignore many of the others.
- The "Parameter Estimate" column gives the unstandardized coefficients for each variable and the intercept, and suggests how the model may be written as an equation. For instance, for the first model (for females), the predicted regression model is:
  salary = 8999.35 + 1573.56*educ + 27.99*jobtime + 1.96*preexp - 111.14*age
  This suggests that, once we have controlled for all of these variables simultaneously, each additional year of education increases women's salaries by about $1573.56, each additional month of job time increases their salaries by $27.99, and each additional unit of previous experience adds $1.96, but each additional year of age reduces their salary by an average of $111.14. Men, by contrast, get almost three times the benefit for each year of education ($4250.41 vs. $1573.56), more than four times the benefit for job time ($119.62 vs. $27.99), and benefit rather than detriment for age (+$757.82 vs. -$111.14), but detriment rather than benefit for previous experience (-$51.63 vs. +$1.96).
- The "Prob > |T|" column gives the p-value for the test of whether each parameter estimate differs from zero by more than would be expected by chance -- i.e. this column tells you if the above coefficients are statistically significant. Using a critical alpha of 0.05 in the above example, the effects of education and age are significant for both men and women, but the effects of jobtime and previous experience are not significant for either men or women.
- The "R-square" statistic summarizes the fit of the model, in several ways. Here, the value of 0.3179 for the model for women suggests that 31.79% of the variance in salary is explained by these four explanatory variables, and that we reduce our errors in predicting salaries of women by 31.79% by knowing values of these other variables. It also tells us that the strength of the model is moderate. (A typical convention here is that an r-square under .10 is weak, one between .10 and .5 is moderate, and one above .5 is strong.)
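To make the regression equation for women concrete, the sketch below plugs in one set of values. The employee described here (16 years of education, 80 months of job time, 50 units of previous experience, age 40) is hypothetical; only the coefficients come from the output discussed above.

```sas
DATA _NULL_ ;
  /* hypothetical employee, chosen only to illustrate the arithmetic */
  educ = 16 ; jobtime = 80 ; preexp = 50 ; age = 40 ;

  /* coefficients from the model for women */
  pred = 8999.35 + 1573.56*educ + 27.99*jobtime + 1.96*preexp - 111.14*age ;

  PUT 'Predicted salary: ' pred DOLLAR12.2 ;   /* written to the SAS log */
RUN ;
```

Term by term this is 8999.35 + 25176.96 + 2239.20 + 98.00 - 4445.60, or about $32,068. Substituting the coefficients from the model for men gives the corresponding prediction for a man with the same characteristics.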
These regression results indicate that education does have a significant effect on salary. The coefficient for education is significant for both men and women. However, even after controlling for the effects of educational differences between sexes, gender continues to have a significant effect: Each additional year of education is worth an additional $1573 for women and $4250 for men, on average. While some other coefficients are not statistically significant, the coefficients for age are, and they also indicate gender disparity: Each additional year of age increases men's salaries by about $758 but lowers women's salaries by $111. This suggests the possibility of either some other unmodeled difference between the sexes or sexual discrimination on the part of the bank.
Having completed our analysis of the bankdata data set, let's look at other topics, including choosing variables and importing data into a SAS dataset from ASCII and Excel files.
Choosing Variable Lists
Variables are defined in the order they appear in the INPUT statement. After a complete list has been defined in a SAS program, you can use abbreviated variable lists in later SAS statements. If your variable names end in consecutive numbers (e.g., test1, test2, test3, test4) then you may refer to them as a group using a single dash. To refer to the four variables test1 to test4, type: test1-test4. You may even do so in the INPUT statement, provided all of the variables in the abbreviated list have the same format.
If your variable names do not end in a number (e.g., educ, jobcat, salary, salbegin), then you may refer to them in abbreviated form using two dashes. For example, educ--salbegin refers to all four of the variables listed here, because they are adjacent in the data set. These shortcuts can be employed in commands, as in the following (which you need not type and submit):
PROC PRINT ;
VAR id salary--minority ;
RUN ;
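Both kinds of abbreviated list can be tried on a small self-contained data set. The dataset name scores and the four data values per case are made up for illustration.

```sas
DATA work.scores ;
  INPUT id test1-test4 ;      /* numbered range list in the INPUT statement */
  DATALINES ;
1 90 85 77 92
2 70 88 95 60
;
RUN ;

PROC MEANS DATA = work.scores ;
  VAR test1-test4 ;           /* single dash: test1, test2, test3, test4 */
RUN ;

PROC PRINT DATA = work.scores ;
  VAR id--test2 ;             /* double dash: id through test2, in data set order */
RUN ;
```

The double-dash list depends on the order in which variables were created, so it is worth checking that order with PROC CONTENTS before relying on it.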
All of the commands up to this point are included in course2.sas, and annotated -- if you want to check what you've done, or save an annotated copy for future reference.
Advanced Data Entry
Importing Excel data
Follow these steps to import the Excel data file bank.xls into a SAS dataset named fromexcl.
Step 1: From the drop down menus, choose FILE, and IMPORT DATA as shown here:

Step 2: Select the Standard file format and then select the Excel type (.xls) from the pull-down menu. Then select Next:

Step 3: Now find the Excel file you wish to import using BROWSE or by typing in the full path. Then click Next:

Step 4: Choose the Library and Member name you wish for the Excel file. Then click Finish:

Step 5: The VIEWTABLE window should open as follows:

If you select the wrong format for the Excel file, you'll probably get an error message that's similar to the one below:

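The menu steps above can also be written as commands, which is convenient when an import has to be repeated. This is a sketch only: it assumes the SAS/ACCESS Interface to PC File Formats is installed, that the file is in C:\Temp, and that the result should go to the WORK library. The DBMS identifier accepted for Excel files varies by SAS release (for example EXCEL, EXCEL2000 or XLS), so check the documentation for your version.

```sas
PROC IMPORT DATAFILE = "c:\temp\bank.xls"
            OUT      = work.fromexcl
            DBMS     = EXCEL        /* assumption: may need a release-specific value */
            REPLACE ;               /* overwrite work.fromexcl if it already exists */
  GETNAMES = YES ;                  /* take variable names from the first row */
RUN ;

PROC CONTENTS DATA = work.fromexcl ;   /* confirm variable names and types */
RUN ;
```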
Elements of a Data Step
The commands you've issued in Part 1 and most of this Part have used a data set which is already in SAS format -- a SAS dataset, ending in the extension sas7bdat. Frequently you will instead need to include statements to get a raw data set (e.g., bank.dat) and identify its columns and rows as variables and cases, respectively. These statements can also create a permanent data set, such as the one we've been working with, so that you could later open the file without having to reference and detail the original bank.dat file.
Although data may be included directly in a SAS command file, it is more often read into SAS from a separate data file. In some lucky cases (such as our first workshop session), the data are already in SAS format. More typically, you will have data in some other format, such as ASCII (raw DOS or Unix text) or an Excel spreadsheet. You might also use an editor (such as vi on Unix or the Notepad on Windows) to create such a file.
You use a DATA statement to tell SAS that you will be manipulating data. You use an INPUT statement to tell SAS what names to give each of the variables, as well as the column locations where each value can be found. The following syntax (available in the file readascii.sas) creates the dataset we've been using, from a raw ASCII file called bank.dat:
DATA sasdir.banknew ;
INFILE 'c:\temp\bank.dat' MISSOVER ;
MISSING X ;
INPUT
@1 ID 4.
@5 GENDER $char1.
@6 BDATE MMDDYY8.
@14 EDUC 2.
@16 JOBCAT 1.
@17 SALARY Dollar7.0
@24 SALBEGIN Dollar7.0
@31 JOBTIME 2.
@33 PREEXP 6.
@39 MINORITY 1. ;
RUN ;
The DATA step is where data management activities occur. DATA steps are used to:
- Input data, either from raw data files or previously saved SAS data sets. Variable names and attributes are assigned and read in from user-specified locations.
- Transform data, through calculation, selection, or merging of data from several sources.
- Output data sets or print files. The data sets may be permanent or temporary. Print files may be created to produce reports or tables.
IMPORTANT NOTE: The first part is the DATA statement and a dataset name. This name is not the name of your raw data file. It is a name you are giving to the SAS dataset that SAS will create from your raw data. By including a library reference (here sasdir, defined with a LIBNAME statement) you make this SAS dataset a permanent one. If instead you had typed DATA bankdata ; then the "bankdata" dataset would be temporary and would be deleted when you exit SAS.
The second part is the INPUT statement where you name your data values and describe their location. The INPUT statement describes the arrangement of the data values for each case or observation in your data file and assigns variable names and formats to those data values. The INPUT statement is used in conjunction with the CARDS command when the data reside within your SAS program file, and with the INFILE command when data are read from an external (non-SAS) file, on disk or tape. (The INFILE statement must occur before the INPUT statement, because it tells SAS what data file to read.) In either case, you may use any or all of the three input styles shown in the next section.
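When the data reside within the program file, the pieces fit together as in the sketch below. The dataset name inline and the three data lines are made up for illustration; DATALINES can be used in place of CARDS.

```sas
DATA work.inline ;             /* no libref, so this is a temporary dataset */
  INPUT name $ gender $ age ;  /* list-style input: values separated by blanks */
  CARDS ;
Alice F 34
Bob M .
Carol F 41
;
RUN ;

PROC PRINT DATA = work.inline ;
RUN ;
```

No INFILE statement is needed here because the data lines follow the CARDS statement directly. The period on the second line marks age as missing for that case.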
Formats for ASCII Input
Telling SAS how to read a data file is fairly simple, and SAS is flexible about how data can be described. SAS provides a vast array of INPUT options and specifications, only a few of which will be covered in this document.
Note that these SAS commands assume you have defined the SAS library reference mylib and that you are creating a SAS dataset called mydata. Here the SAS commands are reading input from the file sasdata.dat.
- Column style input, which specifies the columns for each variable using this syntax model:
DATA mylib.mydata ;
INFILE 'c:\Temp\sasdata.dat' ;
INPUT name $ 1-8 gender $ 9 age 13-14 ;
The $ symbol is used to designate an alphabetic variable, also known as a character variable. The name mydata is a name you make up for the SAS dataset that is being created from your raw data. In this case, NAME is an eight-character alphabetic variable that occupies columns one through eight.
You may also wish to use the MISSOVER option, which prevents SAS from going to a new input line if it does not find values in the current line for all the given variables; remaining variables are set to missing.
DATA mylib.mydata ;
INFILE 'c:\Temp\sasdata.dat' MISSOVER ;
INPUT name $ 1-8 gender $ 9 age 13-14 ;
- Fixed format input, which uses a column pointer (@column number) to point to the column where the variable starts, and a format modifier to indicate the width of the data values and/or the number of decimal places.
DATA mylib.mydata ;
INFILE 'c:\Temp\sasdata.dat' ;
INPUT @1 name $char8. @9 gender $char1. @13 age 2. ;
This tells SAS that NAME is a character variable that begins at column 1 and is eight columns long, while age is a numeric variable that is two columns wide and begins in column 13.
- List format input. You do not need to specify the location of the variables at all. List format, though the easiest style for which to write an input statement, is the most prone to data errors: There must be at least one blank space between each variable, and missing values must be represented by periods, NOT blanks.
DATA mylib.newfile ;
INFILE 'c:\Temp\sasdata.dat' ;
INPUT name $ gender $ age ;
NOTE: This statement will produce errors with our data, because it accounts neither for missing data nor for the fact that there are more than three variables on each data line in our data file.
Exporting to ASCII
SAS can write an ASCII text file of data via the FILE and PUT statements shown below. The variables in the PUT statement are written to the file c:\Temp\bank.out and come from the SAS dataset banknew. These commands are available in the file exportascii.sas, which you can download.
DATA _NULL_; /* Write this as is - "_NULL_" is a SAS keyword */
/* for an unnamed dataset */
FILE 'c:\Temp\bank.out' NOPRINT NOTITLES;
/* You supply the filename */
SET banknew ;
/* Notice no SAS LIBNAME is used, so SAS looks for a temporary dataset
(WORK.banknew), which is deleted when you exit SAS. To read the permanent
dataset created earlier, use SET sasdir.banknew ; instead */
PUT gender salary jobcat;
/* List variables to export in the PUT statement */
/* Can use any formats, as in an INPUT statement */
RUN ;
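If the receiving program expects values in fixed columns, column pointers and formats can be added to the PUT statement. This sketch assumes the permanent dataset sasdir.banknew created earlier; the output filename bank_fixed.out and the column positions are hypothetical.

```sas
DATA _NULL_ ;
  SET sasdir.banknew ;
  FILE 'c:\Temp\bank_fixed.out' ;    /* hypothetical output filename */
  /* @n moves to column n; the format after each variable sets its width */
  PUT @1 id 4. @6 gender $CHAR1. @8 salary 7. @16 jobcat 1. ;
RUN ;
```

Writing fixed columns makes the file readable with the column-style or fixed-format INPUT statements described earlier, and it avoids ambiguity when a value is missing.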
Moving Between OSes
The following command file can be used to move SAS data from one operating system to another. Explanations and comments for the procedure are included in comment lines at the bottom of the file.
/*********************************************************************/
/* FILE is called /help/unix/statistics/sas/examples/export.sas */
/* By: Tim FJ Tolson, User Support, ITC, University of Va. */
/* ***** See documentation/explanations below ******* (6/92) */
/*********************************************************************/
OPTIONS LINESIZE=80;
TITLE 'Export SAS dataset to transport DATA set' ;
LIBNAME saslib '.';
LIBNAME transp XPORT 'transp.sasport';
PROC CONTENTS data=saslib._all_ ;
/* Use one PROC MEANS DATA = sasdataset ; statement for EACH dataset */
PROC MEANS DATA = saslib.tests MAXDEC=4 ;
PROC MEANS DATA = saslib.sexmean MAXDEC=4 ;
PROC COPY IN = saslib OUT=transp ;
SELECT tests ;
RUN ;
/*----------*****EXPLANATION/DETAILS ******----------------------------*/
/* Use SELECT if you have several SAS membernames in */
/* the SAS library and only want to move one. */
/* The first LIBNAME statement reads all of the SAS datasets from the */
/* default (current) directory. The second LIBNAME statement points */
/* to the directory and FILE that the transport dataset(s) will be */
/* written into. */
/* If there is more than one dataset in the default directory, then all */
/* the datasets will be written into the ONE transport file. If you */
/* want to write separate datasets, use the SELECT option on the PROC */
/* COPY command. */
/* If there are datasets in other directories, use additional LIBNAME */
/* statements to point to those directories and use additional LIBNAME */
/* statements to point to additional transport files and then issue */
/* additional PROC COPY commands. */
/* */
/* *** VERIFYING and CHECKING your data for transfer: */
/* The PROC CONTENTS lists all the information about the dataset. */
/* It gives the number of cases and variables, the variable names, type, */
/* any formats, informats, and labels. */
/* PROC MEANS gives the Mean, Standard Deviation, Minimum & Maximum */
/* value for each variable in the dataset. */
/* Use these two pieces of output to compare the original dataset to the */
/* dataset AFTER you imported on your new computer system to make sure */
/* the data transferred correctly. It's one in 10,000 that it does not, */
/* but you don't want to be the one! */
/* */
/* **** IMPORTANT TRANSFER INFORMATION **** */
/* After you have created the SAS transport dataset you are ready to */
/* move it to a new computer system. The transfer MUST BE MADE in */
/* BINARY MODE! */
/* Use the FTP command: */
/* BINARY. */
/* When you IMPORT the dataset, use the PROC CONTENTS and PROC MEANS */
/* commands to generate the same information generated by this file and */
/* compare the results to make sure the data transferred correctly. */
/* ***********************************************************************/
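The import side on the new system mirrors the export file. The sketch below assumes the transport file transp.sasport has been transferred in binary mode into the current directory; the libref newlib is a hypothetical name.

```sas
LIBNAME transp XPORT 'transp.sasport' ;   /* the transport file that was transferred */
LIBNAME newlib '.' ;                      /* hypothetical libref for the new library */

PROC COPY IN = transp OUT = newlib ;      /* unpack every dataset in the transport file */
RUN ;

/* Repeat the checks from the export job and compare the two sets of output */
PROC CONTENTS DATA = newlib._all_ ;
RUN ;
PROC MEANS DATA = newlib.tests MAXDEC=4 ;
RUN ;
```

If the case counts, variable lists and summary statistics match those produced before the transfer, the data set can be treated as having arrived intact.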
Advanced Data Management
Merging Data Sets
SAS allows you to merge sorted data sets in two ways: by order and according to the value of a specified variable common to both data sets. This is done using a MERGE statement in a DATA step. The MERGE feature is especially useful for dealing with hierarchical data. Merging according to the value of a specified variable creates a data set where the data for the upper level is duplicated for each member on the lower level of the hierarchy. For example, if you had two data sets containing economic data, one for STATES and another for CITIES, and if both data sets contain a variable STATE, merging the data sets by STATE would create a new data set with one observation for each city containing the data for that city as well as the data for the state in which the city is located.
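The STATES and CITIES example can be sketched as follows. All values here are made up for illustration, and the variable names stpop and citypop are hypothetical.

```sas
DATA work.states ;
  INPUT state $ stpop ;
  DATALINES ;
NC 8000
VA 7000
;
RUN ;

DATA work.cities ;
  INPUT state $ city $ citypop ;
  DATALINES ;
VA Richmond 200
NC Raleigh 280
VA Norfolk 230
;
RUN ;

/* Both data sets must be sorted by the common variable before merging */
PROC SORT DATA = work.states ; BY state ; RUN ;
PROC SORT DATA = work.cities ; BY state ; RUN ;

DATA work.citystate ;
  MERGE work.cities work.states ;
  BY state ;                 /* state-level values repeat for each city */
RUN ;

PROC PRINT DATA = work.citystate ;
RUN ;
```

The result has three observations, one per city, and the two Virginia cities both carry the same stpop value. Omitting the BY statement would instead pair observations by position, which is rarely what hierarchical data need.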
The SET statement in a DATA step will allow you to copy one data set into another with modifications. Variables may be discarded, recoded, renamed or left the same. For details see the SET, DROP, KEEP and RENAME statements in the SAS Language Usage manual, and the Language Reference manual.
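As a brief sketch of copying with modifications, the step below keeps four variables from bankend1 and renames one of them. The dataset name banksmall and the new variable name yrs_educ are hypothetical.

```sas
DATA work.banksmall ;
  /* KEEP= is applied before RENAME=, so it refers to the original name educ */
  SET sasdir.bankend1 (KEEP   = id gender salary educ
                       RENAME = (educ = yrs_educ)) ;
RUN ;

PROC CONTENTS DATA = work.banksmall ;
RUN ;
```

Applying KEEP= on the SET statement means the unwanted variables are never read, which can save time with large files.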
If you will be merging data sets, either to add cases or to add variables, you are encouraged to read our on-the-web merging tutorial:
http://www.itc.virginia.edu/research/indepth/merging.html
Using Arrays
SAS gives you the choice of combining data of the same type (character or numeric) into an array. This is advantageous when similar transformations must be performed on many variables. For example, the SAS command file below reads in grades, then recodes ones to zeros using an array:
DATA gradeval ;
INFILE grades85 MISSOVER ;
/* DEFINE THE ARRAY*/
ARRAY question{10} q1-q10;
/* Question is the arrayname and 10 is the number of elements */
/* q1-q10 are the array elements */
INPUT @40 q1 BZ1. q2 BZ1. q3 BZ1. q4 BZ1. q5 BZ1. q6 BZ1. q7 BZ1.;
/* The BZ1. is a SAS informat for a value 1 column wide */
/* and to transform blanks to zeros */
/* Using a DO WHILE loop to recode ones to zeros*/
i=1 ;
DO WHILE (i LE 10);
IF (Question[i]=1) THEN Question[i]=0;
i=i+1 ;
END ; /* end the do while loop */
DROP I ; /* get rid of the i counter variable from the data set */
RUN ;
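The same recode can be written with an iterative DO loop, which handles the counter for you. This is a self-contained sketch with made-up data; the dataset name recoded and the five-item array are assumptions for illustration.

```sas
DATA work.recoded ;
  INPUT q1-q5 ;
  ARRAY question{5} q1-q5 ;
  DO i = 1 TO DIM(question) ;    /* DIM returns the number of array elements */
    IF question{i} = 1 THEN question{i} = 0 ;
  END ;
  DROP i ;                       /* remove the loop counter from the data set */
  DATALINES ;
1 2 1 3 1
2 2 1 1 3
;
RUN ;

PROC PRINT DATA = work.recoded ;
RUN ;
```

Using DIM rather than a literal upper bound means the loop does not need to change if more variables are added to the array.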
Advanced Appendices
Options for Program Editors
SAS for Windows has many ways to select/specify commands. You can use:
- Command line, which is intended for experienced SAS users. Use of the command line requires knowledge of the SAS command set.
- Command Bar, the recommended choice, intended as a navigator and for novice users.
- Command Box, a moveable command dialog box.
- Pop-up menus are always available. In any window, clicking the right mouse button will open "pop-up" menus matching the appropriate menus available for that window.
For more information about the PROGRAM EDITOR window Preferences, consult SAS Companion for Microsoft Windows Environment.
Methods for Running SAS
SAS can be run in three different ways. There are two modes: interactive mode and production mode. In production (or batch) mode, you prepare a file of SAS commands and execute this file after it is written; nothing happens until you run the prepared commands. Interactive mode means using the SAS Display Manager to type, edit, and run SAS commands. Within interactive (window) mode, you can choose either to use the pull-down menus to select your commands, or to type them into a program window. We will demonstrate all of these methods of entering commands in SAS for Windows or Macintosh.
The preferred method of running SAS on the Unix computers is the non-interactive mode, in which you prepare a file of SAS commands and then invoke SAS to execute this file of commands. Please consult the Unix document, U-025 SAS on the RS/6000 for further details.
Numerical Accuracy and Representation
Computers do not store or work with numbers in the same manner as you and I. All numbers are represented in binary form in a computer, and SAS stores numeric values as floating-point (real) numbers. One practical result of this system of representing numbers is that a value that looks like an integer to us, e.g. 1 or 80, may be held by the computer as a nearby real number, e.g. 1.0000000000232123 or 79.9999999999999987. This is most likely when the value is the result of a calculation or was built up from decimal fractions.
Because of this numerical representation, if you typed: IF (anyvar=1) THEN newvar=20 ; the value of newvar may or may not be 20 for those cases where anyvar appears to equal 1. If anyvar was produced by a calculation, the computer may not hold exactly 1. It may have stored 1.00000000000132 or maybe 0.9999999999999778, neither of which equals 1.0000000000000000, so the exact match fails.
Thus when you are making numerical comparisons or recodes, make sure that your categories include all possible real numbers. For example:
IF educ2 GT 0 AND educ2 LT 12 THEN educ2 = 1 ;
ELSE IF educ2 GE 12 AND educ2 LT 16 THEN educ2 = 2 ;
ELSE IF educ2 GE 16 THEN educ2 = 3 ;
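To see the effect for yourself, the following minimal sketch compares a computed value with an integer, first exactly and then after rounding. The data set name fuzzdemo and the variables x, exact, and rounded are made up for this example. It assumes the usual 8-byte floating-point storage of SAS numeric variables, and the outcome of the exact comparison may differ by platform.

```sas
DATA fuzzdemo ;                       /* hypothetical data set name */
x = (0.1 + 0.2) * 10 ;                /* looks like 3, but is built from decimal fractions */
exact = (x = 3) ;                     /* 1 if x matches 3 exactly, 0 otherwise */
rounded = (ROUND(x) = 3) ;            /* ROUND with one argument rounds to the nearest integer */
PUT x= exact= rounded= ;              /* write the three values to the SAS log */
RUN ;
```

If exact and rounded differ in the log, the stored value of x was not exactly 3 even though it prints as 3. Rounding before the comparison, or comparing against a range as in the recode above, avoids depending on an exact match.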
Documentation and Help
SAS Online Documentation offers easy access to the most frequently used SAS documentation (previously available only in print), including news about SAS components that are shipped as experimental or beta. SAS for Windows includes pull-down help as well as ASSIST menus and dialogue boxes. You can also use the HELP command from the SAS for Windows command line.
SAS Tutorial
In SAS version 8.2, the online tutorial may be accessed from the Help drop-down menu by selecting the SAS Online Tutor under Books and Training. SAS Institute provides an online computer-based training (CBT) tutorial. The SAS/TUTOR module is licensed and available for SAS for Windows and SAS on the RS/6000s. In order to use this program you need to obtain the SAS/TUTOR training notes, which are available for purchase at the University Bookstore's PROFS Publishing. The cost is based on the cost of PROFS Publishing photocopying the original notes. (If you have questions about getting a copy of these notes, please e-mail res-consult@virginia.edu.) Once you have these notes, you can invoke the SAS/TUTOR module.
In SAS for Windows, version 8, SAS/TUTOR is invoked by starting SAS and then selecting Online Training from the Help menu.
In SAS on Macintosh, double-click on the SAS/TUTOR icon in the SAS folder.
You may also want to look at the use of SAS/ASSIST for creating a command file by using the pull-down menus and selecting the commands needed in their appropriate order.
SAS Manuals
SAS Institute, Inc. publishes a large library of manuals and statistical procedure guides. Some of these are available in the trade books section of the University of Virginia bookstore. All of the manuals listed may be purchased directly from the SAS Institute, Inc., or may be ordered through any bookstore. They are also available, for reference use only, at the ITC Research Computing Support Center (RCSC), Wilson Hall Room 244. We have a comfortable "reading area" at the RCSC where you can browse a manual as well as get assistance. There are manuals in the RCSC that can be checked out for up to 24 hours. Speak with the computing consultant in Room 244 Wilson in order to check out a manual.
All of the SAS manuals are also available on one CD-ROM that you can get from the RCSC for your use, or you can access all of this SAS "OnLineDoc" at our web site:
http://central.itc.virginia.edu/manuals/sas8/onldoc.html
Sample Syntax
Another aid to understanding SAS may be obtained by looking at sample programs provided by SAS. The programs come complete with data, and may be examined for ideas on how to set up a procedure, or may be run so that the output of the program may be studied.
The location of the SAS Institute, Inc. example files for PC SAS and SAS for Windows depends on the choices made during installation of these products on your machine. In general, they are in the SAS subdirectory along with the module to which they pertain. For example, SAS Institute, Inc. sample files for the STAT module are generally located in: /SAS/STAT/SAMPLE, whereas the sample files for the ETS module would be in: /SAS/ETS/SAMPLE.
On the RS/6000s, sample files from SAS Institute, Inc. are in the directory /common/sas82/rs6000/samples, in subdirectories labeled base, stat, graph, af, ets, insight, and or. These files must be copied to your own account before you can run them. Locally written example files may be browsed or copied from our SAS web page at:
http://www.itc.virginia.edu/research/sas/home.html#examples
Web Documentation
The Research Computing Group supports technologically advanced statistical work via our Researchers website.
SAS provides assistance via its own Technical Support website. In addition, in version 8, the Help files included with the program are in HTML (Web) format.
SAS 8 online documents (help): http://central.itc.virginia.edu/manuals/sas8/onldoc.htm
Consulting Services
Additional assistance with SAS command file construction and statistical routines is available from the Statistical Computing Consultant located in the ITC Research Computing Support Center in Wilson Hall Room 244 (243-8800). The consultants can be contacted by electronic mail at res-consult@virginia.edu. Please note that consulting hours vary by semester and around holidays.
For statistical consulting (as opposed to statistical computing consulting), you may wish to contact the Statistics Department (http://www.stat.virginia.edu/consulting.html).
There are no charges for the advice of the faculty consultant, but there is a fee for graduate student consultants as well as for the expertise of statistics faculty other than the dedicated consultant. To find the current faculty consultant, contact the division's secretary, Ms. Brenda Crider (103 Halsey Hall, phone: 924-3222).
