Abstract
Any programmer dealing with frequent changes to program specifications has to cope with frustrating, time-consuming, and error-prone challenges. Changes to report headers and footers, dataset contents, and other aspects of client deliverables have to be communicated effectively and implemented correctly. Metadata and metadata-driven utilities are effective tools for reducing or entirely eliminating many of the programming and management problems inherent in traditional project work flows. This paper discusses the nature of metadata and some of its design criteria, and notes the need for the all-important applications that make it usable throughout the project life cycle. It also presents a simple case history: traditional, "before" code and work flow, followed by revised, metadata-driven code. The reader should come away with an appreciation of the power of metadata-driven applications and, hopefully, ideas for implementing these techniques in his/her workplace.
Abstract
Traditionally, specifications for datasets and data displays were stored in Word or other document formats. This was great for communicating instructions to programmers, but not so great for developing robust and dynamic applications. By moving these specifications from documents into programmatically accessible metadata, applications can be made more efficient and dynamic. Further, the metadata can be used for many other purposes. This paper discusses the rationale for moving specifications from documents to metadata. It outlines the structure and content of the metadata used for a study. The paper then presents a case study that demonstrates how metadata is used to manage a clinical trials project from end to end, from study setup all the way through producing submission databases and associated define files. Throughout the paper, we show how well-designed and easily accessible metadata improves work flow and project management throughout the life cycle of a project.
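To make the idea of "specifications as data" concrete, here is a minimal Python sketch. It is not the paper's SAS implementation. The metadata rows, the field names (`display_id`, `title`, `footnote`), and the helpers `lookup_display` and `build_header` are all hypothetical. The point is only that a title change becomes a one-row edit to the metadata rather than an edit to every program that hard-codes the text.

```python
# A minimal sketch of metadata-driven display headers.
# The metadata layout and all names below are hypothetical illustrations.

# Display specifications stored as data (one row per output),
# rather than as prose in a Word document.
DISPLAY_METADATA = [
    {"display_id": "T1", "title": "Demographics",
     "footnote": "Source: demographics data set"},
    {"display_id": "T2", "title": "Adverse Events by Body System",
     "footnote": "Source: adverse events data set"},
]


def lookup_display(display_id, metadata=DISPLAY_METADATA):
    """Return the metadata row for one display; fail loudly if it is missing."""
    for row in metadata:
        if row["display_id"] == display_id:
            return row
    raise KeyError(f"no metadata for display {display_id!r}")


def build_header(display_id, metadata=DISPLAY_METADATA):
    """Build title and footnote lines from metadata instead of hard-coding them."""
    row = lookup_display(display_id, metadata)
    return [f"Table {row['display_id']}: {row['title']}", row["footnote"]]


if __name__ == "__main__":
    for line in build_header("T1"):
        print(line)
```

Because the same rows could also feed status reports or define files, one maintained source serves several consumers. That reuse is the project-management benefit the abstract describes.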
Abstract
The complexity of even small pharmaceutical projects can be daunting. Consider the deliverables: patient profiles, listings, domain and analysis data sets, Define files, tables, and figures. Even in a single study, these routinely total hundreds of files. For NDA submissions, these are but a single piece of a larger “puzzle.” Consider as well the documentation and human resources pushing the study through its life cycle. Project managers need to monitor the completion status of the files. Statisticians and analysts have to identify data requirements and lay out “dummy” displays. Programmers have to write the programs to create the data and reports using specifications that are often, to be kind, “fluid.” Creation of high-quality output requires coordination of effort and clear and immediate communication of results. Rho has migrated much of the requisite project management and data and display specifications to carefully designed and utilized metadata. By moving items that describe data sets and displays from documents and low-level programs into data sets, we have realized significant gains in productivity and quality of output.
This paper describes the current use of metadata at Rho. It:
The paper is largely conceptual and nearly code-free. While we emphasize application development in the pharmaceutical industry, we feel the underlying concepts regarding metadata design and implementation are valid across industries.
Abstract
The SAS® macro language has power and flexibility. When badly implemented, however, it demonstrates a chaos-inducing capacity unrivalled by other components of the SAS System. It can generate or supplement code for practically any type of SAS application, and is an essential part of the serious programmer's toolbox. Collections of macro applications and utilities can prove invaluable to an organization wanting to routinize work flow and react quickly to new programming challenges. But the language's flexibility is also one of its implementation hazards. The syntax, while sometimes rather baroque, is reasonably straightforward and imposes relatively few spacing, documentation, and similar requirements on the programmer. Without such language-imposed rules, the result is often awkward and ineffective code. Some amount of self-imposed structure must be used during the program design process, particularly when writing systems of interconnected applications. This paper presents a collection of macro design guidelines and coding best practices. It is written primarily for programmers who create systems of macro-based applications and utilities, but will also be useful to programmers just starting to become familiar with the language.
Abstract
Meet an accomplished SAS programmer and you meet someone who's probably learned by making (and fixing) lots of mistakes along the way. The breadth of the SAS System's target applications, the variety of its "dialects" (Base SAS, macro, SCL, IML, SQL), and its quirky mix of procedural and non-procedural environments conspire to make mastery of the SAS System a slippery slope to ascend. Debugging is the art of gracefully recovering and learning from falls during the ascent. This paper discusses techniques for debugging SAS programs. Its purpose is two-fold. First, it provides behavioral and technical tips for fixing code: how to read error messages in the SAS Log, knowing when there is a problem with the program even if SAS says there isn't, using the DATA step debugger, identifying system options, using PROCs for data validation, using macro variables to control debugging output, and so on. Second, it presents design and coding methods that make the programming process more reliable, thus reducing the need for debugging in the first place. The paper's target audience is relative newcomers to the SAS System. More seasoned users may find or rediscover some of the techniques and features being discussed. Emphasis is placed on Base SAS and the macro language, although the techniques themselves are applicable to SCL and other products.
In recent years, I've discovered that my greatest strength in SAS programming lies in tool-building. It's fun to identify recurring needs (or even a one-time event that seems like it'll be recurring), tease out the commonality and patterns, and then design, build, and document a product that will save programming effort for a client.
The Utility Primer is a best practices paper. It is also one that you can only write after years
of continually identifying "non-best" practices, knowing why they are not ideal, and gradually
refining these thoughts into a strategy that results in solid, reliable tools. The paper
is a condensed version of a section of a one-day course I offer in utility design.
Abstract
Let's start with the premise that good programmers are lazy by nature. They want to use tools such as formats and ODS for execution-time efficiency or to pretty up their output, functions to perform calculations, and so on. Another hallmark of a good programmer is a keen eye for pattern recognition. Rather than rewrite basically the same program over and over, good programmers identify similarities and parameterize the program, making it into a general-purpose program, a "utility." This paper steps through the life cycle of a simple utility. It starts with "naïve" code that doesn't exploit program similarities, then illustrates how a general-purpose utility may be developed. It ends with the initial program becoming a call to a simple, powerful routine in a macro library. The transition from simple, brute-force programming into a compact, general-purpose utility isn't a random event. The last sections of the paper present a set of design principles for utilities. Although we focus on Base SAS in Version 9.0, the principles and techniques are readily extended across SAS versions and products. The reader will come away from this paper with an appreciation of both the process and the tool set required to build generalized programs.
Abstract
It's relatively easy to write programs that optimize the use of CPU and other machine resources; there is a large and continually growing body of literature on the subject. What isn't as straightforward is knowing when to employ the techniques: blind implementation of tuning techniques is often not required by the task at hand and can sometimes even be counterproductive. This paper addresses both the "how to" and "when to" aspects of writing efficient programs. It describes design and coding techniques that conserve hardware resources. It also identifies other, non-machine implications of these techniques that could dissuade the programmer from using them. For example, using temporary array elements is more efficient than using named elements, but temporary elements have the documented-but-obscure behavior of retaining values across observations. Maintenance of such code by anyone other than "seasoned," up-to-date programmers can be unexpectedly problematic. The concept of efficiency used in the paper includes all aspects of the program life cycle. We apply the "how and when" question to system design issues, system startup, DATA steps, procedures, and macros. Emphasis is on Base SAS software. The reader should finish the paper comfortable with the idea that the "best" program is not always the one that minimizes hardware resource usage.
This article originally appeared in the Spring 2004 SESUG Informant. It describes two utility
macros that address some simple needs. The first, QuoteList, takes a macro variable with one or more
unquoted tokens and returns a variable with the tokens quoted and optionally upper-cased. The second
macro, AllMacVars, prints the beginning of each global macro variable's value, listing the variables in alphabetical order.
The intent of the article was twofold. First, the code is intended to be useful in and of itself. Second,
as the code is explained, we demonstrate some underlying good programming practices.
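The behavior described for the two macros can be sketched in a few lines. This is a Python approximation of the logic only, not the SAS macro code from the article. Several details are assumptions: the parameter names (`upcase`, `quote`, `width`), the dictionary standing in for the global macro symbol table, and the reading of "the beginning" of each variable as the first `width` characters of its value.

```python
def quote_list(tokens, upcase=False, quote='"'):
    """Mimic QuoteList: quote each blank-delimited token, optionally upper-casing it."""
    words = tokens.split()
    if upcase:
        words = [word.upper() for word in words]
    return " ".join(f"{quote}{word}{quote}" for word in words)


def all_mac_vars(symbols, width=40):
    """Mimic AllMacVars: one 'NAME: value-prefix' line per variable, sorted by name.

    `symbols` is a plain dict standing in for the global macro symbol table.
    """
    return [f"{name}: {symbols[name][:width]}" for name in sorted(symbols)]


if __name__ == "__main__":
    print(quote_list("sex age race", upcase=True))
    for line in all_mac_vars({"STUDY": "ABC-123", "CUTOFF": "2004-03-31"}, width=4):
        print(line)
```

A blank-separated list of quoted values, like the `quote_list` output, is the form the IN operator accepts in DATA step code, for example `if upcase(name) in ("SEX" "AGE");`. Feeding IN lists is one common reason to want such a utility.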
This paper, first presented in 1994, uses a small but realistic case study to illustrate the transition
from a single program to a robust, and more complicated, system. It describes tools and coding conventions
that make the transition painless and outlines ways to package the system so that users can't inadvertently
alter the programs. Although the paper is a decade old, it holds up well; I drew from it and other papers
mentioned on this page for my SAS Utilities course (to be offered at SESUG 2004).
Here is a not uncommon scenario in many workplaces. A neophyte SAS programmer is assigned to maintain, debug, or enhance
an application. The atmosphere is sink or swim, the system is complex, the code is sophisticated, the documentation is
scant, and the programmer is bewildered. Questions slowly take shape. "What, exactly, am I supposed to do?" "What part(s)
of the application need my attention?" "Will a change to program X affect program Y?" And, most critically, "where do I start?"
What the poor programmer needs is a strategy for comprehending the program, then finding the "sweet spots" in the
code as efficiently as possible. This paper presents a generalized approach for programmers, particularly SAS "newbies", to
develop an understanding of how applications work. It also shows how to translate this comprehension into effective coding.
The paper identifies the questions the programmer should ask, and the rationale for each, in five areas: task definition, program-level
code, supporting code, system design and specification documents, and required domain knowledge.
Beginning SAS programmers should come away with a better understanding of how to correctly frame the programming problem
and effectively gather the resources needed to obtain a solution. They will also come to believe that the coding of,
say, a DATA step is usually simple, but the real art of programming is learning what to code, and why.
The generalized nature of SAS software almost guarantees that "n"
users will develop "n" unique solutions to even basic tasks. The
differences among programs that all perform a task correctly are,
for the most part, due to programming style.
This presentation discusses a set of generalized programming style
guidelines useful to both experienced and novice SAS
programmers.
It first investigates general principles of program design,
those aspects of the analysis and coding process common to all
types of SAS programming. The next sections focus on coding
guidelines for the DATA step and procedures. Finally, debugging
techniques are addressed.
The presentation contends that "good" programming style
usually results in programs that are more effective in terms of both
human and machine resources. The intent is not to pronounce one
style good and another lacking, but to simply outline an
experienced user's guidelines and gently prod other users to
examine their programming habits. These habits will become
critical to the success of organizations as SAS software becomes
embedded in more environments and organizations.
Dictionary tables were introduced to the SAS System midway through the life of Version 6. Laden with information that is
often difficult, and sometimes impossible, to get through other means, they still seem to be missing from many
programmers' bags of tricks. This is both perplexing and unfortunate because, as we will see in this paper, once their content
and organization are understood, they are readily adapted for a range of applications that "are only limited by your imagination."
Indeed, it is difficult to think of a robust, generalized system utility that would not benefit from use of this metadata.
This paper describes dictionary tables and their associated SASHELP library views. It:
The reader should come away from the discussion with an understanding of the tables as well as with a checklist of SQL and macro
skills that are required to use the tables most effectively.
Understanding and Using Functions
This was presented as part of the Southern SAS Users 2001 conference, in the “Intro to SAS” section. Since it
was one of nearly a dozen papers with a common theme, the paper is devoid of context. Still, it stands on its own
pretty well, and can be used as a broad-ranging, basic introduction to using functions.
The Standalone Program Grows Up: Strategies for System Design
Most of us, if we're lucky, get our SAS "feet" wet by writing programs that are self-contained. That is, the
program doesn't have to run before or after other programs, and it references the outside world only to
access data and use autocall macros. Eventually, however, the demands of an application require making the
transition to a system of programs. This is where things get interesting.
Using the Process Flow Diagram Object to Communicate Information
This is one of my infrequent forays into the Applications sections of the conference circuit. The client needed a visual
and interactive interface to performance and quality control data on machines in its manufacturing facility. The paper gives
some background to the problem and discusses features that made it simple (the number of machines being monitored was
constant) or difficult (the data had to be presented in varying time intervals, and color-coded).
The “non-obvious” solution, but one that was perfectly legitimate, was using the Process Flow Diagram in Version 6.10
Screen Control Language. What the program amounted to was, in effect, carefully defining a series of rectangles in the
object. Each rectangle represented a time interval, and was hyperlinked to a display of data from several production tables.
Program Comprehension: A Strategy for the Bewildered
I’ll bet we’ve all been in this situation at one time or another in our professional lives: you start a new job or project
and are dumped into the middle of a mass (swamp?) of data and programs, then told to create new report X “that looks almost
like report Y.” There are no formal specifications, the flow of program execution for report Y is not really obvious, and
pretty soon you feel overwhelmed.
What to do? How do you begin to understand the program, the data, and the system in which they are embedded? This paper
leads the reader down at least part of the Road to Comprehension. It identifies resources to look for, encourages examination
of learning behavior, and lists aspects of the work and programming environments that affect the way information is transmitted
to the programmer. The last section of the paper describes different types of programming activity – debugging, maintenance,
and enhancements – and shows how what you’re doing will affect how you acquire the information you need to be effective.
Removing Macro Variables from the SAS Environment
“They” said it couldn’t be done and it seemed like “they” were right. If you didn’t really nose around “under the covers”
of SAS’s catalog and file structures, you could not delete a macro variable in Version 7 or earlier versions of SAS software. You
could set it to null, but you could not remove it from the macro variable table.
This paper describes a reasonably simple way to actually delete macro variables. It adjusts the memory allocated to macro
variables, then operates on the alternate locations used by SAS when the enforced memory shortage is in effect.
The exercise is now purely an academic one, of course, since the %SYMDEL statement in Version 8 and later will actually
remove a variable. The paper remains interesting, though, because it shows how a little digging into SAS internals can
produce positive results. It’s also one of those “impress your co-workers” kinds of things …
The Elements of SAS Programming Style
Many years ago (1988?), someone posted a seemingly innocent question to the SAS list server (SAS-L@uga.edu). It went something
like “I am a 3GL programmer and new to SAS. Is there a SAS programming style reference?” The result, I believe, was one
of the best-quality sustained exchanges ever seen on the list. There were differences of opinion, to be sure, but there
was a general convergence on what constituted “good” programming style.
This paper is a synthesis of the original discussion, with my “humble but correct” opinions added. I first presented the
paper in 1990 and have given it nearly 30 times since then. It’s constantly changing, in part due to input from readers and in part due
to my experience and occasional change of heart about a topic.
Aside from being a user group presentation cottage industry, it is also the basis for my next book. It will flesh out some
of the examples that were necessarily glossed over in the paper. My intent is to present it as a “best practices” book for
people new to SAS. The realistic completion date is sometime in 2004.
Dictionary Tables and Views: Essential Tools for Serious Applications
Co-authored with Jeff Abolafia
It’s hard to think of a Base SAS feature that has a wider range of potential uses than dictionary tables and views. They are
automatically created when a SAS session begins and are continually updated as options are set, macro variables are defined,
datasets are created, and so on. This wealth of information is not that hard to access, but it is also not thoroughly
covered in SAS-supplied documentation. This paper, a near-total rewrite of an earlier paper I wrote with Nancy Michal,
discusses the tables, describes their structures, outlines some of the “gotchas” and subtleties of their usage, and presents
numerous practical examples based on “real world” applications. We also emphasize the importance of understanding SQL, the
most effective tool to handle the tables. In particular, we present many examples that employ SQL’s macro language interface.
The “Essential Tools” subtitle of the paper is not hyperbole. Some of the information in the tables is simply not accessible
by any other means. Anyone who wants to write robust, serious utilities or generalized programs needs to have a firm grasp
of the tables’ contents. The next paper fills a small, well-defined need using the tables.
Variable Cross-Referencing Macros – Tools for When Base SAS Isn’t Enough
This neat little utility makes extensive use of the dictionary tables discussed just above. The need arose during a project
where data for multiple studies was coming from different sources, and thus had different names and/or attributes for similar
variables (e.g., SEX versus GENDER, 1 / 2 coding versus ‘M’ / ‘F’). There are lots of tools in SAS to describe individual data
sets (the CONTENTS procedure, the COLUMNS and TABLES dictionary tables, to name just two). Out-of-the-box solutions dwindle
rapidly, however, when you want to easily compare the attributes of study x’s PATIENT data set with those of study y’s PATIENT
data set. At that point, it is time to write your own utility.
The macro processes data from the COLUMNS dictionary table and produces a clean, readable display of the data set comparisons.
The user can control the output, limiting it to only those variables with different attributes in every data set, only similar
attributes, only those that are in each of the data sets specified, and so on. Annotated source code is provided.
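The core of such a cross-reference can be sketched independently of SAS. The following Python function is a hypothetical approximation, not the macro's annotated source. Its input tuples are loosely modeled on rows of the COLUMNS dictionary table (data set name, variable name, type, length); in SAS those rows would typically come from a PROC SQL query. The four `mode` values are assumptions meant to mirror the filtering options described above.

```python
from collections import defaultdict

VALID_MODES = {"all", "differ", "same", "common"}


def cross_reference(columns, datasets, mode="all"):
    """Compare variable attributes across data sets (hypothetical sketch).

    columns  -- iterable of (memname, name, type, length) tuples, loosely
                modeled on rows of the COLUMNS dictionary table
    datasets -- names of the data sets to compare
    mode     -- 'all': every variable found
                'differ': attributes differ among the data sets containing it
                'same': attributes identical wherever the variable appears
                'common': variable present in every requested data set
    Returns {VARIABLE: {DATASET: (type, length)}} with variables in sorted order.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}")
    wanted = {name.upper() for name in datasets}
    xref = defaultdict(dict)
    for memname, name, vtype, length in columns:
        if memname.upper() in wanted:
            # SAS names are case-insensitive, so compare them upper-cased.
            xref[name.upper()][memname.upper()] = (vtype, length)

    result = {}
    for var in sorted(xref):
        attrs = xref[var]
        distinct = len(set(attrs.values()))
        keep = (
            mode == "all"
            or (mode == "differ" and distinct > 1)
            or (mode == "same" and distinct == 1)
            or (mode == "common" and set(attrs) == wanted)
        )
        if keep:
            result[var] = attrs
    return result
```

In this sketch, "same" trivially includes a variable found in only one data set. A real utility might treat that case separately. Note also that matching is by name only, so SEX and GENDER would still appear as unrelated variables.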
Simplicity Through Obscurity: Some Tips To Simplify Your Programming Life
I can't say this article was planned for years and crafted over time. During a Q & A session at the DC SAS Users
Group meeting in September 2004, someone asked a question about being able to programmatically identify the name of
the currently executing program. In a class I taught the previous week, I had madly scribbled a debugging technique on
a white board and thought "not bad. I should write that down some day." The two events were loosely coupled, to say
the least, but they had enough in common to commit to this article. The first paragraph follows:
We are always looking for ways to simplify programs. Macro utility libraries, formats, templates, and the like
are all good ways to accomplish this. Another approach to code reduction is harvesting the syntax minutiae that most of us
randomly acquire and randomly file away over the years. With this in mind, this article
addresses two typical and recurring needs: being able to identify the name of the currently executing program,
and having a way to easily turn groups of statements on and off. The solutions use arcane items such as the
EXTFILES dictionary table, the reserved FILEREF named #LN00006, and the RUN statement's CANCEL option. The article also
reminds us that the macro language can pass even the smallest piece of code to SAS for execution, and can do
so within a program statement.