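A fellow on the SAS Google Group posted a great chunk of code for me that counts word frequencies in a dataset whose observations contain free text. It reads each comment one character at a time up to a # terminator, splits the comments into one word per observation, and tallies the words with PROC FREQ.
Here is the code: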
/* test data set */
data comments;
length obs 8 comment $1000.;
length p 8 c $1.;
drop p c;
input obs @@;
/* skip the blanks */
do while (c=' ');
  input c $char1. @@; /* guess what is wrong with $1.? */
end;
/* read one char at a time */
p = 1;
substr(comment,p,1) = c;
do until (c='#');
  p + 1;
  input c $char1. @@;
  substr(comment, p, 1) = c;
end;
substr(comment, p, 1) = " "; /* get the # out */
/* release the input line */
input;
cards;
1 SAS defines analytics as data-driven insight for
better decisions. With SAS Analytics you get an integrated environment
for predictive analytics and descriptive modeling, data mining, text
mining, forecasting, optimization, simulation, experimental design and
more.#
2 Our analytic solutions provide a range of
techniques and processes for the collection, classification, analysis
and interpretation of data to reveal patterns, anomalies, key
variables and relationships, leading ultimately to new insights for
guided decision making.#
3 We offer a comprehensive suite of analytics
software#
4 SAS offers an integrated suite of analytics
software unmatched in the industry, and delivered to you in a single
environment.#
;
run;
/* parse each word into an obs */
data words;
length obs no 8 word $16.; /* will be truncated if longer */
keep obs no word;
set comments;
no = 0;
do while(1);
  no + 1;
  word = upcase(scan(comment, no, " .,!?"));
  if word="" then leave;
  output;
end;
run;
proc freq data=words;
tables word/ out=counts;
run;
data test;
set counts;
file print;
if word>='A';
n + 1;
drop n;
if mod(N,3)=1 then put; /* changed to 3 to narrow */
put word $10. count 5. +3 @;
run;
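To see the most frequent words first, the counts dataset that PROC FREQ writes can be sorted on its COUNT variable. This is a minimal sketch and not part of the original post: the dataset name top_words and the cutoff of 10 rows are assumptions chosen for illustration.

```sas
/* sort the PROC FREQ output so the most frequent words come first */
proc sort data=counts out=top_words; /* top_words is a hypothetical name */
  where word >= 'A';                 /* same filter as the report step above */
  by descending count;
run;

/* print the ten most frequent words (10 is an arbitrary cutoff) */
proc print data=top_words(obs=10) noobs;
  var word count percent;
run;
```

COUNT and PERCENT are the variables PROC FREQ adds to the OUT= dataset, so no extra computation is needed before sorting.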