How to do text classification with label probabilities?
I'm trying to solve a text classification problem for academic purposes. I need to classify tweets into the labels "cloud", "cold", "dry", "hot", "humid", "hurricane", "ice", "rain", "snow", "storms", "wind", and "other". Each tweet in the training data has a probability for each label. For example, the message "can tell it's going tough scoring day. it's windy right yesterday afternoon." has a 21% chance of being "hot" and a 79% chance of being "wind". I have worked on classification problems before that predict a single label, such as "wind" or "hot", but in this problem each training instance carries a probability distribution over the labels. I have used Mahout's naive Bayes classifier, which takes one specific label per text to build its model. How can I feed these per-label input probabilities into the classifier?
In a probabilistic setting, these probabilities reflect uncertainty about the class label of each training instance. This uncertainty should affect parameter learning in the classifier.
There's a natural way to incorporate this. In naive Bayes, when estimating the model's parameters, instead of each word contributing a count of 1 to the class its document belongs to, it contributes a count equal to the document's probability of belonging to that class. Documents with a high probability of belonging to a class thus contribute more to that class's parameters. The situation is equivalent to learning a mixture-of-multinomials model with EM, where the supplied probabilities play the role of the membership/indicator variables of the instances.
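A minimal sketch of that idea, assuming a multinomial naive Bayes trained from scratch (all function and variable names here are illustrative, not Mahout's API): each document's word counts are added to every class, weighted by the document's label probability.

```python
# Multinomial naive Bayes with soft (probabilistic) labels.
# Instead of incrementing a word's count for exactly one class by 1,
# each class receives the count weighted by the document's label probability.
from collections import defaultdict
import math

def train_soft_nb(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: list of dicts mapping
    class name -> probability for that document. alpha: Laplace smoothing."""
    word_counts = defaultdict(lambda: defaultdict(float))  # class -> word -> weighted count
    class_totals = defaultdict(float)                      # class -> total weighted word count
    class_prior = defaultdict(float)                       # class -> weighted document count
    vocab = set()
    for tokens, probs in zip(docs, labels):
        for c, p in probs.items():
            class_prior[c] += p          # fractional document count
            for w in tokens:
                word_counts[c][w] += p   # fractional word count
                class_totals[c] += p
                vocab.add(w)
    n_docs = len(docs)
    V = len(vocab)
    model = {}
    for c in class_prior:
        log_prior = math.log(class_prior[c] / n_docs)
        log_lik = {w: math.log((word_counts[c][w] + alpha) /
                               (class_totals[c] + alpha * V))
                   for w in vocab}
        model[c] = (log_prior, log_lik)
    return model, vocab

def predict(model, vocab, tokens):
    """Return the most probable class for a new token list."""
    scores = {c: log_prior + sum(log_lik[w] for w in tokens if w in vocab)
              for c, (log_prior, log_lik) in model.items()}
    return max(scores, key=scores.get)
```

With hard (one-hot) label dictionaries this reduces to ordinary maximum-likelihood naive Bayes, so it is a strict generalization of the usual estimator.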
Alternatively, if the classifier is a neural net with a softmax output, then instead of the target output being a vector with a single 1 and lots of zeros, the target output becomes the probability vector you're supplied with.
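A small sketch of that loss, assuming plain cross-entropy against a soft target distribution (function names are illustrative): with a one-hot target only one log-probability survives the sum, while with a probability vector every class contributes, weighted by its supplied probability. The gradient with respect to the logits keeps the familiar form softmax(logits) − target.

```python
# Cross-entropy against a soft target distribution instead of a one-hot vector.
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def soft_cross_entropy(logits, target):
    """target is a probability vector (e.g. [0.21, 0.79]), not one-hot."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def gradient(logits, target):
    """Gradient w.r.t. the logits is softmax(logits) - target,
    exactly the same expression as in the one-hot case."""
    probs = softmax(logits)
    return [p - t for p, t in zip(probs, target)]
```

The loss is minimized (and the gradient vanishes) when the softmax output equals the supplied probability vector, which is exactly the behavior you want for probabilistically labeled training data.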
Unfortunately, I don't know of standard implementations that let you incorporate these ideas directly.