This post is not an introduction to Naive Bayes; it walks through an implementation for spam message classification. I have also made a YouTube video on this topic, which is available at the link below.
The resources are also available at the link below.
The steps to perform text classification are:
- Lowercase the text
- Remove punctuation
- Perform bag of words
- Compute the frequency of words
- Find the bag of words
- Probability of a word given a class, p(w/c)
- Probability of a class given a word, p(c/w)
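The first few steps above (lowercasing, removing punctuation, and building a bag of words from word frequencies) could be sketched roughly like this. It is a minimal illustration using only the Python standard library, not the code from the video; the function names `preprocess` and `bag_of_words` and the two sample messages are made up for this example.

```python
import string
from collections import Counter


def preprocess(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return text.split()


def bag_of_words(messages):
    """Return (vocabulary, rows); each row holds the word counts of one message."""
    tokenized = [preprocess(m) for m in messages]
    # Columns: every unique word seen in the messages, in sorted order.
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)  # frequency of each word in this message
        rows.append([counts[w] for w in vocabulary])
    return vocabulary, rows


# Hypothetical sample messages, only for illustration.
messages = ["Win a FREE prize now!", "Are we meeting now?"]
vocabulary, rows = bag_of_words(messages)
print(vocabulary)
print(rows)
```

Each row corresponds to one example and each column to one unique word, which matches the data frame layout used in this post.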
Before we can build our classifier, we have to preprocess the input data. Our data is in text format, so we convert it into vector form. In general, we end up with a data frame whose index is the examples and whose columns are all the unique words from our training set. Each cell then holds the probability of the word given the class, p(w/c). For prediction, we find p(c/w) using the simple Bayes formula:
p(c/w) = p(w/c) * p(c) / p(w)
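To make the formula concrete, here is a rough sketch of training and prediction in plain Python. It is not the code from the video, and it makes a few assumptions that this post does not state: add-one (Laplace) smoothing so that no word has zero probability, log probabilities to avoid underflow, and the usual naive independence assumption that multiplies p(w/c) over all words in a message. The division by p(w) is handled by normalizing the scores across classes. The names `tokenize`, `train`, `predict` and the tiny data set are hypothetical.

```python
import math
import string
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, strip ASCII punctuation, split on whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def train(messages, labels):
    """Estimate p(c) and p(w/c) from labeled messages."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word frequencies
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        # Assumption: add-one smoothing over the whole vocabulary.
        denom = sum(word_counts[c].values()) + len(vocabulary)
        likelihoods[c] = {w: (word_counts[c][w] + 1) / denom for w in vocabulary}
    return priors, likelihoods


def predict(text, priors, likelihoods):
    """Return p(c/w) for every class c, given the words of one message."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for w in tokenize(text):
            if w in likelihoods[c]:  # words never seen in training are skipped
                score += math.log(likelihoods[c][w])
        scores[c] = score
    # Normalizing across classes plays the role of dividing by p(w).
    top = max(scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(unnormalized.values())
    return {c: v / total for c, v in unnormalized.items()}


# Hypothetical training data, only for illustration.
messages = ["Win a free prize now", "Free cash offer",
            "Are we meeting today", "See you at lunch today"]
labels = ["spam", "spam", "ham", "ham"]
priors, likelihoods = train(messages, labels)
posterior = predict("free prize", priors, likelihoods)
print(max(posterior, key=posterior.get), posterior)
```

If only the predicted label is needed, the normalization step can be dropped, since p(w) is the same for every class and does not change which class scores highest.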
Please watch the video for more information on the topic.
Thank you for reading the post and feel free to share it. :)