I recently competed in a competition on http://hackerrank.com. The task was multi-label text classification, so I started with a basic bag-of-words approach, which performed quite well. After analyzing the data a bit, I realized that some keywords appeared in slightly different forms, which is unfortunate for bag of words. For example, the keyword "years of experience" consists of three words, but a bag-of-words model treats each word independently and does not know that these three words belong together. You can use n-grams to compensate for this somewhat, but in my experience this fails most of the time. Moreover, the phrase can appear without the word "of", or with other words in between. The words also often contain small typos or other characters such as - or 's, which would, for example, break the matching of regular expressions.
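To make the problem concrete, here is a minimal Python sketch. It assumes naive lowercase whitespace tokenization and uses a hypothetical set of example phrases (not taken from the competition data). For each variant it checks whether the word trigram ("years", "of", "experience") or a literal regular expression finds the keyword. Only the exact phrasing matches; dropping "of", inserting a word, hyphenating, a typo or an apostrophe each defeat both approaches.

```python
import re

def tokenize(text):
    """Naive tokenizer (an assumption): lowercase, split on whitespace only."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return all contiguous word n-grams of length n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical snippets that all mean "years of experience";
# they are illustrative, not taken from the competition data.
VARIANTS = [
    "5 years of experience in Python",  # exact phrase
    "5 years experience in Python",     # "of" dropped
    "5 years of relevant experience",   # extra word in between
    "5 years-of-experience required",   # hyphens instead of spaces
    "5 yaers of experience",            # typo
    "5 years' experience",              # apostrophe, no "of"
]

TARGET = ("years", "of", "experience")
PATTERN = re.compile(r"\byears of experience\b")

results = {}
for text in VARIANTS:
    # Does the word trigram feature fire for this variant?
    trigram_hit = TARGET in ngrams(tokenize(text), 3)
    # Does a literal regular expression find the phrase?
    regex_hit = PATTERN.search(text.lower()) is not None
    results[text] = (trigram_hit, regex_hit)
    print(f"{text!r:36} trigram={trigram_hit} regex={regex_hit}")
```

With a smarter tokenizer (e.g. splitting on non-word characters) the hyphenated variant would match, but the missing-word, inserted-word and typo cases would still fail.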