« Recommendations for Learning SAS | Main | Identify commonly named variables - Multiple solutions »

Reusable Data Merge Macro

I had the need (but mostly curiosity to see if I could do it) to have a reusable macro to allow me to pass in 2 SAS datasets, merge them, and return the resulting dataset.

Here are my 2 tables.

Table 1

reusable_data_merge_table1.jpg 

Table 2

reusable_data_merge_table2.jpg

This is a 1-to-many relationship. The goal is to get ‘name’ from table2 into table1 where the id’s equal so that when we do reporting, we can use the name field instead of id. The name field is more readable by humans.

There are a number of options we could try.

  1. We could create a new variable in table1 called name and manually create some if-statements to set the value of name where id = id.
  2. We could create a format with proc format to apply it to the table2 id column when reporting.The problem with these first 2 is that we aren’t even using the data from table2 anymore. That is not very efficient. If table2 is updated, then your hardcoded data needs updating as well. We can avoid this with the next two options:
  3. We could use proc sql and do a join - this would work quite nicely.
  4. We could do a data merge


For this exercise, I decided to use data merge because I read that a SAS data merge is more efficient than a proc sql.

One thing to note is the number of observations (rows) in table2. If we only ever have a limited handful of observations, then a proc format is probably best. In my example I created, I only have 3 entries. But in my dataset at the office, I have about 500. In this case, a programmatic option is best.

So the first step is to create the table1 and table2 datasets:

data table1;
INPUT id sex $ age;
DATALINES;
1 F 35
2 M 50
3 F 45
3 M 24
3 F 52
1 M 44
2 F 34
1 M 40
3 F 47
2 M 35
;
run;

data table2;
INPUT id name $;
DATALINES;
1 A
2 B
3 C
;
run;

We then create a macro and pass in 3 parameters:

%MACRO domerge(ds1=, ds2=, resultdataset=);

proc sort data=&ds1;
by id;
run;

proc sort data=&ds2;
by id;
run;

data &resultdataset;
merge &ds1 &ds2;
by id;
run;

%MEND domerge;

For work, I put my libnames, macros, and references to data in a separate file. Then in my SAS programs I use a macro include statement to suck in that file.

My file to be included looks like this:

 libname a “e:\test”;

%let t1 = table1;
%let t2 = table2;

%MACRO domerge(ds1=, ds2=, resultdataset=);

proc sort data=&ds1;
by id;
run;

proc sort data=&ds2;
by id;
run;

data &resultdataset;
merge &ds1 &ds2;
by id;
run;

%MEND domerge;

My SAS program I run will look like this:

%INCLUDE “e:\test\include.sas”;

%let d1 = a.&t1;
%let d2 = a.&t2;

%domerge(ds1=&d1, ds2=&d2, resultdataset=d);

This isn’t the most readable way of programming SAS. But I find that using the include helps me easily keep track of all of my libnames and datasets I have available.

To reiterate, my reasons for doing this are that I don’t want to hardcode data. I also don’t want to replicate data across a number of different datasets, so I do the merge but then work with temporary data.

Table2 is more ‘Metadata’. It is data about Table1. For that reason, the metadata should not be hardcoded or permanently added to table1.

That said, with millions of records, it actually is probably best to write the ‘name’ column back to the permanent dataset. Even though data is repeated/replicated, a single table for doing analysis is more efficient. At least in the database world it is.

After all this code is entered, I get a resulting temporary dataset to use for the rest of my session:

reusable_data_merge_.jpg 

My raw data files remain untouched and no data is replicated permanently.

 

Posted on Monday, March 31, 2008 by Registered CommenterJared in | CommentsPost a Comment

PrintView Printer Friendly Version

EmailEmail Article to Friend

Reader Comments

There are no comments for this journal entry. To create a new comment, use the form below.

PostPost a New Comment

Enter your information below to add a new comment.

My response is on my own website »
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>