Entries in Text Mining (7)
Review of SAS Course: Mining Textual Data Using SAS Text Miner for SAS9
You may recall a while back I wrote about my disappointment in the availability of SAS Text Miner course offerings and options in Canada. My prayer was answered when SAS was able to fly in Terry Woodfield, a Text Mining expert, to train me and a few others in SAS Enterprise Miner and Text Miner. This 3-day course was invaluable.
Good courses leave one feeling familiar with the material. Great courses leave one feeling comfortable with the material. The best courses will leave the student feeling comfortable and empowered with the material. This SAS course did just that.
I remember the first time I opened Enterprise Miner. It was overwhelming to see the number of options available. My thoughts were "What did I get myself into??". But after taking the course I'm suddenly like an 8-year-old boy on Christmas Eve who can't wait to open his presents - I can't wait for Monday to get into the office and fire up Enterprise Miner!
I'd attribute the empowerment to the meaningful class discussions, the Teacher sharing real-life examples, and the in-class demonstrations which:
- Helped provide a deeper understanding of Text-Mining concepts, how to explore data, and what to expect when exploring my own data
- Fired my imagination for new ways of using my organization's data
- Made me realize we have sources of information that were not previously considered analyzable.
Terry is an excellent teacher and if you ever have the chance to take a course he is teaching - do it. I don't think you'll regret it because the course notes, although wonderfully detailed and easy to understand, do not replace a good teacher like Terry to expedite your learning.
One other item I'd like to address is the tech side of the course. Thank you SAS tech gurus! Being able to remotely log into a workspace with all of the course notes, data, and software is a tremendous help. It was very professionally done and worked wonderfully.
SAS courses are worth every penny.
SAS Text Mining Training in Canada
I have a beef with SAS. You all know I love SAS, but I've been looking into Text Mining training and it irks me a little. Let me explain...
My work recently invested in SAS' Text Mining product. It is an additional component to Enterprise Miner (i.e. you need Enterprise Miner to run Text Miner). Let me tell you, it ain't cheap - but for good reason - it is very powerful and sophisticated software. Since we've been doing Text-Mining-ish analysis for many years now, we can really use this software.
So we bought it, installed it, and now I need to learn how to use it! Here are my options for learning SAS Text-Miner:
- Buy some books about Enterprise Miner and Text Mining. The ones I picked up include:
a) Introduction to Data Mining Using SAS Enterprise Miner
b) Mining Textual Data Using SAS Text Miner for SAS 9 Course Notes
c) Getting Started with SAS 9.1 Text Miner
While I have the aptitude to sit down with a book and learn the software, it is always valuable to see it hands on from the experts.
- E-Training. The options are:
a) Enterprise Miner e-training
b) Enterprise Miner live-training
There is no Text Mining online training. If there were, I'd be signing up for it.
- Take a course. There is a 2-day classroom training for Mining Textual Data Using SAS Text Miner for SAS 9. However, we can't justify flying out-of-country. The course is offered in Canada, but only in Toronto, ON and Vancouver, BC. Sadly, flying out-of-province is somewhat restricted, so I probably can't go to these. It would be much nicer if the course were also offered in Edmonton or Calgary.
- We did talk to the SAS rep in Edmonton and they did look at different options for training. I think they even looked into flying someone out from SAS, but because there wasn't enough interest from other Text-Miner clients, I guess it's not a worthwhile option. That's understandable.
So, I feel a little ripped off because we paid for this expensive software and then the company sort of leaves you hanging to figure things out on your own.
And don't get me started about how painfully annoying the Text Mining installation was. There need to be clearer instructions for how to install it. The instructions we did get seemed like an afterthought. But we did eventually figure it out.
While I'd like to see more options in Calgary/Edmonton for classroom training, I understand that it depends on the number of clients that could/would attend. But there are 2 ways I think SAS can improve their Text-Mining offering. First, have clear and extremely detailed instructions on installing Enterprise/Text Miner. Second, offer Live Web Training and/or E-training.
*Update: Although I posted this a couple of hours ago, I'd like to say one more thing. SAS tends to be a very progressive company. So I am sure that someday soon they will have videoconferencing capabilities in their classrooms. I see no reason why a public course being taught in California should be limited to residents of California or nearby. Anyone should be able to pay a fee and tap in to the class via the internet.
New Text-Mining Blog at SAS
There is a new Text-Mining Blog at SAS that I am looking forward to watching. Although I have yet to install it, my work has just purchased the SAS Miner product with Text-miner add-on. It will be fun, challenging and rewarding to learn and take advantage of this software.
Knowing the high quality of other SAS blogs, I am certain this new one will come in handy!
Text-mining and Context
Related to my last post about Text-mining and shorthand words, it will be fun to see how text-mining software handles context. Imagine a restaurant customer feedback form where the customer wrote:
"The service and food were horrible. JK. You guys rocked, we had a great time."
So here we have some sarcasm. The JK, which is short for "just kidding", basically nullifies the first sentence. So there are some sentences that change the meaning of preceding or following sentences.
From what I know about text-mining software, the user has to train it according to the types of comments that will be fed into it. Basically the user creates an algorithm (a black box so to speak) that will process the comments correctly. I don't have this issue with my comment analysis at work because the black box is my interpretation of the comment.
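To make the "JK" idea concrete, here is a toy sketch in Python. It is not how SAS Text Miner or any real product works; it is just a hand-written rule, and the marker list and function names are my own assumptions. It treats a sentence consisting only of a retraction marker as cancelling the sentence right before it.

```python
import re

# Hypothetical list of markers that retract the sentence before them.
RETRACTION_MARKERS = {"jk", "j/k", "just kidding"}

def split_sentences(comment):
    # Naive split after ., ! or ? followed by whitespace; fine for a sketch.
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]

def drop_retracted(comment):
    """Return the sentences left after removing each retraction marker
    and the sentence immediately before it."""
    kept = []
    for sentence in split_sentences(comment):
        normalized = sentence.strip(" .!?").lower()
        if normalized in RETRACTION_MARKERS:
            if kept:
                kept.pop()  # the previous sentence was a joke; discard it
            continue
        kept.append(sentence)
    return kept

feedback = "The service and food were horrible. JK. You guys rocked, we had a great time."
print(drop_retracted(feedback))
```

A rule like this only catches the obvious case. Sarcasm without a marker, or slang like "sic", would still need a human or a trained model to interpret.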
Here is another one to ponder. What about slang? For example:
"That waitress is sic, yo".
Was the waitress actually sick or is sic slang for good?
Do shorthand and symbols cause problems with text-mining?
I recently had to convert some customer open-ended feedback from Excel to SAS. The comments were transcribed from handwritten comment cards to an Excel spreadsheet. After converting to the SAS dataset, about 10% of them displayed a character error of sorts. The comment looked like this:
"Your product is the greatest. We will tell all our friends. ?□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□□ etc....."
Because there were so few of these, I didn't notice it right away until I applied some analysis to the dataset and my Log was throwing error messages.
I tracked the problem down to Excel's good ol' AutoCorrect. When the transcriber was typing up customer comments, they included the smiley face :) which was automatically converted to the smiley face symbol. SAS can't read the symbol so it replaces it with a question mark and a heck of a lot of squares.
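One way to catch this before the data ever reaches SAS is to clean the comments first. The Python sketch below is only an illustration under my own assumptions: I don't know exactly which character Excel inserted, so the mapping uses the Unicode smiley and frowning faces as stand-ins, and the function name and placeholder are made up.

```python
# Assumed mapping from symbol characters back to plain-text emoticons.
SYMBOL_TO_TEXT = {
    "\u263a": ":)",  # WHITE SMILING FACE
    "\u2639": ":(",  # WHITE FROWNING FACE
}

def to_plain_ascii(comment, placeholder="[?]"):
    """Replace known symbols with text emoticons and flag any other
    non-ASCII character with a visible placeholder."""
    out = []
    for ch in comment:
        if ch in SYMBOL_TO_TEXT:
            out.append(SYMBOL_TO_TEXT[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append(placeholder)
    return "".join(out)

print(to_plain_ascii("Your product is the greatest. \u263a"))
```

Note that the placeholder is lossy: an accented letter in a customer's name would be flagged too, so a real cleanup would probably want a longer mapping.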
It was an easy enough fix but it got me wondering how automated Text-mining software handles shorthand and symbols. In a world of cell phone texting and instant messaging, people have developed all sorts of shorthand such as LOL, ROTFLOL and OMG, "ur" for "your" and "2moro" for "tomorrow". And what about when people use the smiley face or the tongue smiley :P ? There are many more shorthand words.
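I don't yet know how the commercial tools deal with this, but a simple lookup table is one obvious starting point. The Python sketch below is my own guess at a pre-processing step, with a tiny hypothetical dictionary; the expansions and the EMO_ token names are assumptions for illustration only.

```python
# Hypothetical lookup tables; a real list would be much longer.
SHORTHAND = {
    "ur": "your",
    "2moro": "tomorrow",
    "lol": "laughing out loud",
    "omg": "oh my god",
}
EMOTICONS = {":)": "EMO_SMILE", ":P": "EMO_TONGUE"}

def normalize(comment):
    """Split on whitespace, map emoticons to tokens, and expand shorthand."""
    tokens = []
    for raw in comment.split():
        if raw in EMOTICONS:
            tokens.append(EMOTICONS[raw])
            continue
        # Strip surrounding punctuation and lowercase before the lookup.
        word = raw.strip(".,!?\"'").lower()
        if word:
            tokens.append(SHORTHAND.get(word, word))
    return tokens

print(normalize("OMG ur food rocked, see you 2moro :)"))
```

Keeping the emoticons as their own tokens, rather than throwing them away, means a smiley can still count as a positive signal later in the analysis.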
Once I get my hands on true text-mining software I'll have to test how to analyze shorthand.

