public class VSMCosineDocumentSimilarity extends Object
Modifier and Type | Field and Description |
---|---|
static String | CONTENT |
static org.apache.lucene.document.FieldType | TYPE_STORED |
Constructor and Description |
---|
VSMCosineDocumentSimilarity(String s1, String s2) |
VSMCosineDocumentSimilarity(String s1, String s2, org.dllearner.algorithms.isle.VSMCosineDocumentSimilarity.TermWeighting termWeighting) |
Modifier and Type | Method and Description |
---|---|
static double | getCosineSimilarity(String doc1, String doc2) Returns the cosine similarity between documents doc1 and doc2, using TF-IDF as the weighting for each term. |
static double | getCosineSimilarity(String doc1, String doc2, org.dllearner.algorithms.isle.VSMCosineDocumentSimilarity.TermWeighting termWeighting) Returns the cosine similarity between documents doc1 and doc2, using termWeighting to compute the weight of each term in the documents. |
static void | main(String[] args) |
public static final String CONTENT
public static final org.apache.lucene.document.FieldType TYPE_STORED
public VSMCosineDocumentSimilarity(String s1, String s2, org.dllearner.algorithms.isle.VSMCosineDocumentSimilarity.TermWeighting termWeighting) throws IOException
Throws: IOException
public VSMCosineDocumentSimilarity(String s1, String s2) throws IOException
Throws: IOException
public static double getCosineSimilarity(String doc1, String doc2) throws IOException
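Returns the cosine similarity between documents doc1 and doc2, using TF-IDF as the weighting for each term.
Cosine similarity in general ranges from -1 (exactly opposite) to 1 (exactly the same), with 0 usually indicating independence and in-between values indicating intermediate similarity or dissimilarity. Because term weights such as TF-IDF are normally non-negative, scores for text documents typically fall between 0 and 1.
Parameters:
doc1 - the first document
doc2 - the second document
Throws:
IOException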
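The cosine measure described above can be illustrated with a minimal, self-contained sketch. It is not the DL-Learner or Lucene implementation: the class name CosineSketch, the whitespace tokenization, and the use of raw term counts instead of TF-IDF are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; NOT the DL-Learner or Lucene implementation.
// Assumptions: whitespace tokenization, lower-casing, raw term counts as weights.
final class CosineSketch {

    // Counts how often each lower-cased, whitespace-separated token occurs.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either document has no terms.
    static double cosine(String doc1, String doc2) {
        Map<String, Integer> a = termFrequencies(doc1);
        Map<String, Integer> b = termFrequencies(doc2);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        System.out.println(cosine("the cat sat", "the cat sat")); // identical documents
        System.out.println(cosine("the cat sat", "the dog sat")); // two of three terms shared
        System.out.println(cosine("the cat sat", "a dog ran"));   // no shared terms
    }
}
```

With non-negative counts, identical documents score approximately 1, documents with no shared terms score 0, and partial overlap falls in between.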
public static double getCosineSimilarity(String doc1, String doc2, org.dllearner.algorithms.isle.VSMCosineDocumentSimilarity.TermWeighting termWeighting) throws IOException
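Returns the cosine similarity between documents doc1 and doc2, using termWeighting to compute the weight of each term in the documents.
Cosine similarity in general ranges from -1 (exactly opposite) to 1 (exactly the same), with 0 usually indicating independence and in-between values indicating intermediate similarity or dissimilarity. Because term weights are normally non-negative, scores for text documents typically fall between 0 and 1.
Parameters:
doc1 - the first document
doc2 - the second document
termWeighting - the weighting scheme used to compute the weight of each term
Throws:
IOException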
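To show how the choice of term weighting can change the score, here is a hedged sketch that compares raw term counts with a simple TF-IDF variant. The enum Weighting is a hypothetical stand-in (the constants of the real TermWeighting type are not listed on this page), the two input documents are treated as the whole corpus, and the smoothed IDF formula ln((1 + N) / (1 + df)) + 1 is an assumption rather than the formula used by DL-Learner or Lucene.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; NOT the DL-Learner or Lucene implementation.
final class WeightedCosineSketch {

    // Hypothetical stand-in for a term-weighting option.
    enum Weighting { TF, TF_IDF }

    // Raw term counts for lower-cased, whitespace-separated tokens.
    static Map<String, Double> counts(String text) {
        Map<String, Double> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1.0, Double::sum);
            }
        }
        return tf;
    }

    static double cosine(String doc1, String doc2, Weighting weighting) {
        Map<String, Double> a = counts(doc1);
        Map<String, Double> b = counts(doc2);
        Set<String> vocabulary = new HashSet<>(a.keySet());
        vocabulary.addAll(b.keySet());

        double dot = 0.0;
        double squaredA = 0.0;
        double squaredB = 0.0;
        for (String term : vocabulary) {
            double idf = 1.0; // plain TF leaves every term equally important
            if (weighting == Weighting.TF_IDF) {
                // Document frequency within the two-document corpus (1 or 2).
                int df = (a.containsKey(term) ? 1 : 0) + (b.containsKey(term) ? 1 : 0);
                // Assumed smoothed IDF with N = 2: ln((1 + N) / (1 + df)) + 1.
                idf = Math.log(3.0 / (1.0 + df)) + 1.0;
            }
            double weightA = a.getOrDefault(term, 0.0) * idf;
            double weightB = b.getOrDefault(term, 0.0) * idf;
            dot += weightA * weightB;
            squaredA += weightA * weightA;
            squaredB += weightB * weightB;
        }
        return (squaredA == 0.0 || squaredB == 0.0) ? 0.0 : dot / Math.sqrt(squaredA * squaredB);
    }

    public static void main(String[] args) {
        String d1 = "the cat sat";
        String d2 = "the dog sat";
        System.out.println("TF:     " + cosine(d1, d2, Weighting.TF));
        System.out.println("TF-IDF: " + cosine(d1, d2, Weighting.TF_IDF));
    }
}
```

Under these assumptions, terms that occur in only one document receive a larger weight than shared terms, so partially overlapping documents score lower with TF_IDF than with TF.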
DL-Learner is licensed under the terms of the GNU General Public License.
Copyright © 2007-2019 Jens Lehmann