Is there any way to have more than 2 indexes?

Apr 26, 2010 at 4:31 PM

I am currently in need of a working naive Bayes classifier, but since I'll be using it to categorize text into multiple categories, I'd like to know if there's a way to call the analyzer with multiple indexes, and whether it's possible to receive the probabilities for each index instead of just knowing the best match.

 

Many thanks,

Coordinator
Apr 28, 2010 at 8:54 PM

Hi @thoqbk, there is currently no built-in mechanism for more than one index. I actually have a pending TODO ("Support for NonBinary Classification") to add this very feature. It's not in the latest official release yet, but the latest version in source (45380) has an added public method that will give you the raw probability value.

In the meantime, you can read a great white paper on how to support more than one index, which is what I plan to base my implementation on:
http://www.fogcreek.com/FogBugz/Downloads/KamensPaper.pdf

I'd love to hear some of your feedback on how this feature could work best :-)

Apr 29, 2010 at 9:34 PM
I actually had the idea of using your binary classifier in a manner similar to the tournament suggested in that paper; the problem is that I don't want just the best match but also the second best. What I'm looking for is something that, given, say, five categories and a decent amount of text to train them on, is able to return the probability of an input text belonging to each of them, so that I can select the best match, second-best match, third-best match, and so on.
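
Roughly, I'm picturing something like the sketch below. To be clear, this is only an illustration and has nothing to do with the actual nBayes API; the MultiCategoryBayes class, its method names, and the add-one smoothing are all made up for the example.

```python
import math
from collections import Counter, defaultdict

class MultiCategoryBayes:
    """Toy multi-category naive Bayes that ranks every category by probability."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of training texts

    def train(self, category, text):
        self.word_counts[category].update(text.lower().split())
        self.doc_counts[category] += 1

    def rank(self, text):
        """Return [(category, probability), ...] sorted with the best match first."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for cat, counts in self.word_counts.items():
            total_words = sum(counts.values())
            vocab = len(counts)
            # log prior + log likelihood with add-one smoothing
            score = math.log(self.doc_counts[cat] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / (total_words + vocab + 1))
            log_scores[cat] = score
        # normalize the log scores into probabilities that sum to 1
        peak = max(log_scores.values())
        exp_scores = {c: math.exp(s - peak) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return sorted(((c, v / z) for c, v in exp_scores.items()),
                      key=lambda pair: pair[1], reverse=True)

# Usage: train each of the categories, then take the top entries of rank().
nb = MultiCategoryBayes()
nb.train("sports", "goal match team score")
nb.train("politics", "vote election party debate")
print(nb.rank("the team scored a late goal"))  # best, second best, ... with probabilities
```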
Aug 28, 2012 at 3:53 AM

The ladder-style tournament approach in Kamens's paper is interesting. Has anyone implemented it with nBayes?

Basically, you'd compare Cat1 and Cat2, the winner of that would face Cat3, and the winner of that would face Cat4, correct?
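
Something like this is what I have in mind for the ladder (a rough Python sketch; `compare` is just a placeholder for whatever binary call the library would expose, not an existing nBayes method):

```python
def ladder_tournament(categories, text, compare):
    """Ladder: the current champion faces each remaining category in turn.

    `compare(a, b, text)` is a hypothetical binary classifier that returns
    the winning category for the given text.
    """
    champion = categories[0]
    for challenger in categories[1:]:
        champion = compare(champion, challenger, text)
    return champion
```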

I'm curious what happens if one of the matchups comes back "Undetermined". Would you have to pick a winner no matter how close?

Also, would another approach be to have every category face every other category (a Cartesian product sort of thing) and just pick whichever one "wins" the most? (Perhaps @Thoqbk could pick the top two winners.)
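
As a sketch, the round-robin version would look something like this (again, `compare` is a stand-in for a binary classification call, not anything that exists in nBayes today):

```python
from collections import Counter
from itertools import combinations

def round_robin(categories, text, compare):
    """Every category faces every other category once; most wins ranks first."""
    wins = Counter({c: 0 for c in categories})
    for a, b in combinations(categories, 2):
        winner = compare(a, b, text)   # hypothetical binary classifier call
        wins[winner] += 1
    return wins.most_common()          # e.g. the first two entries are the top two matches
```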

I'd be happy to help out with the project and clean up some of the code. It looks like this one hasn't been touched in a few years.

Mike

Coordinator
Aug 28, 2012 at 2:13 PM
Edited Aug 28, 2012 at 2:13 PM

Hi, thanks for your interest ... no, I don't know if anyone has implemented that with this library. If you get an undetermined result, I would consider that category a "loser" and move on to the next one. If all of them are undetermined, then I would consider the whole thing undetermined, but if only one of N comes back with a result, then that one would win by default.
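
In pseudocode terms, one reading of that rule might look like this (purely illustrative Python, not the library's API; `compare` is a stand-in for a binary classification call that returns the winner, or None when it can't decide):

```python
def ladder_with_undetermined(categories, text, compare):
    """Ladder where an undetermined matchup drops the challenger.

    Returns the final champion, or None if every matchup was undetermined.
    """
    champion = categories[0]
    any_decided = False
    for challenger in categories[1:]:
        winner = compare(champion, challenger, text)
        if winner is None:
            continue              # undetermined: treat the challenger as the loser
        any_decided = True
        champion = winner
    return champion if any_decided else None  # None means the whole result is undetermined
```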

Or, I don't know; I would love to hear whether that makes sense, or if you have other ideas :-) I have moved the project hosting to GitHub: https://github.com/joelmartinez/nBayes

It would be great if you fork it, experiment with some algorithms, and send me a pull request there ... I would love to test it, and merge it in if it works! :-)

Thanks,
-Joel