tf idf - Can tfidf be weighted to improve classification of sparse data in a corpus?

- June 15, 2013

I am using TF-IDF prior to performing classification on a number of websites based on their content. Unfortunately, the training data is not uniform: 70% of the pre-labeled websites are news sites, while each of the remaining classes (tech, arts, entertainment, etc.) is a small minority.

My questions are the following:

Is it possible to adjust TF-IDF so that it weighs the different labels differently, and make it behave as if the data were uniform? Should I perhaps be using a different approach in this case? I am using a Gaussian naive Bayes classifier after the TF-IDF analysis; is something else better suited to this specific case?
Is it possible to have TF-IDF give me a list of possible labels when the probability of a given label is below a threshold? For example, if the vector entries are close enough (< 1-2%) that one class is only marginally more probable than another, can it print both?
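
To make the two questions above concrete, here is a minimal sketch in plain Python. It rests on assumptions that are not in the original post. TF-IDF itself is computed without looking at labels, so the sketch places the re-weighting and the near-tie reporting at the classifier stage rather than inside TF-IDF. The function names `balanced_class_weights` and `candidate_labels`, the toy label counts, and the 0.02 margin are all hypothetical, chosen only for illustration. The weight formula `n / (k * count)` is the common "balanced" heuristic, and the probabilities are assumed to come from whatever classifier is used (for example, per-class posterior estimates).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * count).

    With these weights every class contributes the same total weight,
    which mimics a uniform label distribution.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def candidate_labels(probabilities, margin=0.02):
    """Return every label whose probability is within `margin` of the best.

    `probabilities` maps label -> estimated probability (assumed input).
    Labels are returned from most to least probable.
    """
    best = max(probabilities.values())
    close = [(p, label) for label, p in probabilities.items()
             if best - p <= margin]
    close.sort(key=lambda pair: -pair[0])
    return [label for _, label in close]


# Toy data (hypothetical): 70% news, the rest split across minority classes.
labels = ["news"] * 7 + ["tech"] * 2 + ["arts"]
weights = balanced_class_weights(labels)

# Hypothetical classifier output for one website.
near_tie = candidate_labels({"news": 0.41, "tech": 0.40, "arts": 0.19})
clear_win = candidate_labels({"news": 0.80, "tech": 0.15, "arts": 0.05})
```

Here `weights` gives the rare classes larger per-sample weights, so each class carries the same total weight. `near_tie` contains both `news` and `tech`, while `clear_win` contains only `news`. Whether a 1-2% margin is meaningful depends on how well calibrated the classifier's probabilities are, which this sketch does not address.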
