At work we run surveys where respondents provide comments and feedback. We transfer these handwritten comments to electronic format, word for word, spelling error for spelling error. We usually just code these comments and then analyze how we coded them.
I was given a request to pull out all comments with specific words. My solution was as follows:
data commentsdata;
  set surveydata;
  if _n_=1 then do;
    retain re;
    re = prxparse('/words|to|match/i'); /* the /i here means case-insensitive match */
    if missing(re) then do;
      putlog 'ERROR: regex is malformed';
      stop;
    end;
  end;
  if prxmatch(re,Comments); /* Comments is the name of the variable with the comments text */
run;
(Note that in the data I was using, every observation has a Comments value, although some of them are blank.)
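One caveat: a plain alternation such as /words|to|match/i also matches inside longer words, so "to" would hit "tomorrow". Here is a minimal sketch, assuming whole-word matches are what is wanted, that wraps the alternation in \b word boundaries. The dataset and variable names are the same placeholders as above, and the parentheses are my addition so that \b applies to every alternative.

```sas
data commentsdata;
  set surveydata;
  retain re;
  if _n_ = 1 then do;
    /* \b marks a word boundary; the group keeps both \b anchors
       attached to every alternative, and /i ignores case */
    re = prxparse('/\b(words|to|match)\b/i');
    if missing(re) then do;
      putlog 'ERROR: regex is malformed';
      stop;
    end;
  end;
  /* subsetting IF: keep only comments containing one of the whole words */
  if prxmatch(re, Comments);
run;
```

Since the comments are keyed in spelling error for spelling error, a misspelled word will still be missed. Likely misspellings would have to be added to the alternation by hand.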
Yesterday I came across some similar SAS code that lays out the regular expression nicely, piece by piece. I’ve replicated the post here:
It looks as if you want the following form:
^      start of field
\s*    (maybe with whitespace at the front)
[A-Z]  a letter from A to Z
\d     a digit
[A-Z]  A to Z again
" "    a space
\d     a digit
[A-Z]  A to Z again
\d     a digit
\s*    possible whitespace
$      end of field
So try the following code:
data new;
  set YourData;
  retain re; /* keep the compiled pattern id across observations */
  if _n_=1 then do;
    re = prxparse('/^\s*[A-Z]\d[A-Z] \d[A-Z]\d\s*$/');
    if missing(re) then stop;
  end;
  if prxmatch(re,YourNameField);
run;
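To see what the pattern accepts and rejects, here is a small sketch that runs it against a handful of made-up values. The dataset names sample and check, the flag is_match, and the test values are all hypothetical and only there for illustration. It flags each row rather than subsetting, so the rejected rows stay visible.

```sas
/* Hypothetical test values, invented for this sketch */
data sample;
  input YourNameField $char20.;
  datalines;
A1B 2C3
a1b 2c3
A1B2C3
A1B 2C34
;
run;

data check;
  set sample;
  retain re;
  if _n_ = 1 then do;
    re = prxparse('/^\s*[A-Z]\d[A-Z] \d[A-Z]\d\s*$/');
    if missing(re) then stop;
  end;
  /* prxmatch returns the match position, or 0 when there is no match */
  is_match = (prxmatch(re, YourNameField) > 0);
  drop re;
run;

proc print data=check;
run;
```

Only the first value should be flagged. The second fails because the pattern has no /i option, the third lacks the space, and the fourth has an extra trailing digit. The trailing \s* matters in SAS because character variables are padded with blanks to their full length.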