String Matching

At work we run surveys in which respondents provide comments and feedback. We transcribe these handwritten comments into electronic format word for word, spelling error for spelling error. Usually we just code the comments and analyze the codes.

I was given a request to pull out all comments with specific words. My solution was as follows:

    data commentsdata;
        set surveydata;
        retain re;  /* keep the compiled pattern ID across observations */
        if _n_ = 1 then do;
            re = prxparse('/words|to|match/i');  /* the /i here means case-insensitive match */
            if missing(re) then do;
                putlog 'ERROR: regex is malformed';
                stop;
            end;
        end;
        if prxmatch(re, Comments);  /* Comments is the name of the variable with the comments text */
    run;

(Note that in the data I was using, every observation has a Comments value, but some of those values are blank. A blank comment never matches the pattern, so the subsetting IF drops it.)
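One thing to watch for: an unanchored alternation such as words|to|match also matches inside longer words ("to" is found in "tomorrow", for example). Here is a minimal sketch of a whole-word variant, using \b word boundaries. The sample rows, the search words wait and helpful, and the drop statement are made up for illustration and are not from my survey data.

```sas
/* Hypothetical sample data, for illustration only */
data surveydata;
    length Comments $80;
    Comments = 'The staff were helpful'; output;
    Comments = 'I had to WAIT too long'; output;
    Comments = ' ';                      output;  /* a blank comment */
    Comments = 'Waiter was rude';        output;
run;

data commentsdata;
    set surveydata;
    retain re;
    if _n_ = 1 then do;
        /* \b requires a word boundary on each side, so "wait" */
        /* is not found inside "Waiter"                        */
        re = prxparse('/\b(wait|helpful)\b/i');
        if missing(re) then do;
            putlog 'ERROR: regex is malformed';
            stop;
        end;
    end;
    if prxmatch(re, Comments);  /* keep only the matching comments */
    drop re;                    /* leave the pattern ID out of the output */
run;
```

With these four rows, the first two comments should be kept, and the blank comment and "Waiter was rude" should be dropped.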

I came across some similar SAS code yesterday that breaks the regular expression down nicely. I’ve reproduced the post here:

    It looks as if you want the following form:

    ^      start of field
    \s*    (maybe with whitespace at the front)
    [A-Z]  a letter from A to Z
    \d     a digit
    [A-Z]  A to Z again
           a space
    \d     a digit
    [A-Z]  A to Z again
    \d     a digit
    \s*    possible whitespace
    $      end of field

    So try the following code:

    data new;
        set YourData;
        retain re;  /* without RETAIN, re is reset to missing after the first observation */
        if _n_ = 1 then do;
            re = prxparse('/^\s*[A-Z]\d[A-Z] \d[A-Z]\d\s*$/');
            if missing(re) then stop;
        end;

        if prxmatch(re, YourNameField);
    run;
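To see what a pattern of this shape accepts, here is a small sketch that flags each value instead of filtering. It writes the whitespace and digit classes as \s and \d. The test values, the dataset name checked, and the is_match flag are my own assumptions for illustration.

```sas
/* Hypothetical test values, for illustration only */
data YourData;
    length YourNameField $20;
    do YourNameField = 'K1A 0B1', '  M5V 3L9', 'k1a 0b1', 'K1A0B1', 'K1A 0B1 x';
        output;
    end;
run;

data checked;
    set YourData;
    retain re;
    if _n_ = 1 then re = prxparse('/^\s*[A-Z]\d[A-Z] \d[A-Z]\d\s*$/');
    /* prxmatch returns the position of the match, or 0 if there is none */
    is_match = (prxmatch(re, YourNameField) > 0);
    drop re;
run;

proc print data=checked;
run;
```

Only the first two values should be flagged: the pattern has no /i, so lowercase letters fail, and so do the missing space and the trailing "x". SAS pads character variables with blanks up to their length, so the trailing \s* is what lets a short value in a $20 field still reach the $ anchor.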

Posted on Monday, March 31, 2008 by Jared
