## Class Heuristics

• ```public class Heuristics
extends Object```
Implementation of various heuristics. The methods can be used in learning problems and various evaluation scripts. They are verified in unit tests and, thus, should be fairly stable.
Author:
Jens Lehmann
• ### Nested Class Summary

Nested Classes
Modifier and Type Class and Description
`static class ` `Heuristics.HeuristicType`
• ### Constructor Summary

Constructors
Constructor and Description
`Heuristics()`
• ### Method Summary

All Methods
Modifier and Type Method and Description
`static double` ```divideOrZero(int numerator, int denominator)```
`static double` ```getAScore(double recall, double precision)```
Computes arithmetic mean of precision and recall, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.
`static double` ```getAScore(double recall, double precision, double beta)```
Computes the arithmetic mean of precision and recall weighted by beta, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.
`static double[]` ```getAScoreApproximationStep1(double beta, int nrOfPosExamples, int nrOfInstanceChecks, int nrOfSuccessfulInstanceChecks)```
In the first step of the A-Score approximation, we estimate recall (taking the factor beta into account).
`static double[]` ```getAScoreApproximationStep2(int nrOfPosClassifiedPositives, double[] recallInterval, double beta, int nrOfRelevantInstances, int nrOfInstanceChecks, int nrOfSuccessfulInstanceChecks)```
In step 2 of the A-Score approximation, the precision and overall A-Score are estimated based on the estimated recall.
`static double[]` ```getConfidenceInterval95Wald(int total, int success)```
Computes the 95% confidence interval of an experiment with boolean outcomes, e.g. heads or tails coin throws.
`static double` ```getConfidenceInterval95WaldAverage(int total, int success)```
Computes the 95% confidence interval average of an experiment with boolean outcomes, e.g. heads or tails coin throws.
`static double` ```getFScore(double recall, double precision)```
Computes F1-Score.
`static double` ```getFScore(double recall, double precision, double beta)```
Computes F-beta-Score.
`static double[]` ```getFScoreApproximation(int nrOfPosClassifiedPositives, double recall, double beta, int nrOfRelevantInstances, int nrOfInstanceChecks, int nrOfSuccessfulInstanceChecks)```
This method can be used to approximate the F-Measure and thereby save a lot of instance checks.
`static double` ```getFScoreBalanced(double recall, double precision, double beta)```
`static double` ```getJaccardCoefficient(int elementsIntersection, int elementsUnion)```
Computes the Jaccard coefficient of two sets.
`static double` ```getMatthewsCorrelationCoefficient(int tp, int fp, int tn, int fn)```
`static double[]` ```getPredAccApproximation(int nrOfPositiveExamples, int nrOfNegativeExamples, double beta, int nrOfPosExampleInstanceChecks, int nrOfSuccessfulPosExampleChecks, int nrOfNegExampleInstanceChecks, int nrOfNegativeNegExampleChecks)```
`static double` ```getPredictiveAccuracy(int nrOfExamples, int nrOfPosClassifiedPositives, int nrOfNegClassifiedNegatives)```
`static double` ```getPredictiveAccuracy(int nrOfPosExamples, int nrOfNegExamples, int nrOfPosClassifiedPositives, int nrOfNegClassifiedNegatives, double beta)```
`static double` ```getPredictiveAccuracy2(int nrOfExamples, int nrOfPosClassifiedPositives, int nrOfPosClassifiedNegatives)```
`static double` ```getPredictiveAccuracy2(int nrOfPosExamples, int nrOfNegExamples, int nrOfPosClassifiedPositives, int nrOfNegClassifiedNegatives, double beta)```
`boolean` ```isTooWeak(int nrOfPositiveExamples, int nrOfPosClassifiedPositives, double noise)```
Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.
`boolean` ```isTooWeak2(int nrOfPositiveExamples, int nrOfNegClassifiedPositives, double noise)```
Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.
`static double` ```p1(int success, int total)```
`static double` ```p3(double p1, int total)```
• ### Constructor Detail

• #### Heuristics

`public Heuristics()`
• ### Method Detail

• #### getFScore

```public static double getFScore(double recall,
double precision)```
Computes F1-Score.
Parameters:
`recall` - Recall.
`precision` - Precision.
Returns:
Harmonic mean of precision and recall.
• #### getFScore

```public static double getFScore(double recall,
double precision,
double beta)```
Computes F-beta-Score.
Parameters:
`recall` - Recall.
`precision` - Precision.
`beta` - Weights precision and recall. If beta is >1, then recall is more important than precision.
Returns:
Harmonic mean of precision and recall weighted by beta.
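As an illustration of the weighting, here is a minimal sketch of the textbook F-beta formula, (1 + beta²) · precision · recall / (beta² · precision + recall). The class name `FBetaSketch` and the choice to return 0 when the denominator is 0 are assumptions made for this example; the actual implementation may handle edge cases differently.

```java
// Hypothetical sketch, not the library's code: textbook F-beta score.
final class FBetaSketch {

    // beta > 1 favours recall, beta < 1 favours precision, beta == 1 gives F1.
    static double fBeta(double recall, double precision, double beta) {
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        if (denominator == 0.0) {
            return 0.0; // assumption: define the score as 0 when it is undefined
        }
        return (1 + betaSquared) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        System.out.println(fBeta(0.8, 0.6, 1.0)); // F1
        System.out.println(fBeta(0.8, 0.6, 2.0)); // F2, recall counts more
    }
}
```

With beta = 2, a hypothesis with high recall and low precision scores higher than one with the two values swapped, which matches the parameter description above.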
• #### getFScoreBalanced

```public static double getFScoreBalanced(double recall,
double precision,
double beta)```
• #### getAScore

```public static double getAScore(double recall,
double precision)```
Computes arithmetic mean of precision and recall, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.
Parameters:
`recall` - Recall.
`precision` - Precision.
Returns:
Arithmetic mean of precision and recall.
• #### getAScore

```public static double getAScore(double recall,
double precision,
double beta)```
Computes the arithmetic mean of precision and recall weighted by beta, which is called "A-Score" here (A=arithmetic), but is not an established notion in machine learning.
Parameters:
`recall` - Recall.
`precision` - Precision.
`beta` - Weights precision and recall. If beta is >1, then recall is more important than precision.
Returns:
Arithmetic mean of precision and recall weighted by beta.
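This page does not spell out how beta enters the mean. The sketch below assumes one plausible weighting, (beta · recall + precision) / (beta + 1), which reduces to the plain arithmetic mean for beta = 1 and gives recall more weight for beta > 1. The class name `AScoreSketch` and the weighting formula are assumptions for illustration only.

```java
// Hypothetical sketch, not the library's code: (weighted) arithmetic mean
// of precision and recall.
final class AScoreSketch {

    // Unweighted variant: plain arithmetic mean.
    static double aScore(double recall, double precision) {
        return (recall + precision) / 2;
    }

    // Weighted variant. Assumption: recall is weighted by beta, precision by 1.
    static double aScore(double recall, double precision, double beta) {
        return (beta * recall + precision) / (beta + 1);
    }

    public static void main(String[] args) {
        System.out.println(aScore(0.8, 0.6));
        System.out.println(aScore(0.8, 0.6, 3.0));
    }
}
```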
• #### getJaccardCoefficient

```public static double getJaccardCoefficient(int elementsIntersection,
int elementsUnion)```
Computes the Jaccard coefficient of two sets.
Parameters:
`elementsIntersection` - Number of elements in the intersection of the two sets.
`elementsUnion` - Number of elements in the union of the two sets.
Returns:
#intersection divided by #union.
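A minimal sketch of this ratio follows. The class name `JaccardSketch` and the decision to return 0 for an empty union are assumptions for this example. The cast to `double` matters because both arguments are integers.

```java
// Hypothetical sketch, not the library's code: Jaccard coefficient from set sizes.
final class JaccardSketch {

    static double jaccard(int elementsIntersection, int elementsUnion) {
        if (elementsUnion == 0) {
            return 0.0; // assumption: two empty sets yield 0 rather than NaN
        }
        // Cast first, otherwise integer division would truncate the result.
        return (double) elementsIntersection / elementsUnion;
    }

    public static void main(String[] args) {
        // {a, b, c} and {b, c, d}: intersection has 2 elements, union has 4.
        System.out.println(jaccard(2, 4));
    }
}
```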
• #### getPredictiveAccuracy

```public static double getPredictiveAccuracy(int nrOfExamples,
int nrOfPosClassifiedPositives,
int nrOfNegClassifiedNegatives)```
• #### getPredictiveAccuracy

```public static double getPredictiveAccuracy(int nrOfPosExamples,
int nrOfNegExamples,
int nrOfPosClassifiedPositives,
int nrOfNegClassifiedNegatives,
double beta)```
• #### getPredictiveAccuracy2

```public static double getPredictiveAccuracy2(int nrOfExamples,
int nrOfPosClassifiedPositives,
int nrOfPosClassifiedNegatives)```
• #### getPredictiveAccuracy2

```public static double getPredictiveAccuracy2(int nrOfPosExamples,
int nrOfNegExamples,
int nrOfPosClassifiedPositives,
int nrOfNegClassifiedNegatives,
double beta)```
• #### getMatthewsCorrelationCoefficient

```public static double getMatthewsCorrelationCoefficient(int tp,
int fp,
int tn,
int fn)```
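The method carries no description here, so as orientation the following sketch shows the standard definition of the Matthews correlation coefficient, (tp · tn − fp · fn) / sqrt((tp + fp)(tp + fn)(tn + fp)(tn + fn)). The class name `MccSketch` and the convention of returning 0 when the denominator is 0 are assumptions; the library's behaviour in that case is not documented on this page.

```java
// Hypothetical sketch, not the library's code: standard Matthews correlation coefficient.
final class MccSketch {

    static double mcc(int tp, int fp, int tn, int fn) {
        // Work in double to avoid int overflow in the products.
        double numerator = (double) tp * tn - (double) fp * fn;
        double denominator = Math.sqrt(
                ((double) tp + fp) * ((double) tp + fn)
              * ((double) tn + fp) * ((double) tn + fn));
        if (denominator == 0.0) {
            return 0.0; // assumption: common convention for degenerate confusion matrices
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(mcc(5, 0, 5, 0)); // perfect classification
        System.out.println(mcc(0, 5, 0, 5)); // every prediction wrong
    }
}
```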
• #### getConfidenceInterval95Wald

```public static double[] getConfidenceInterval95Wald(int total,
int success)```
Computes the 95% confidence interval of an experiment with boolean outcomes, e.g. heads or tails coin throws. It uses the very efficient, but still accurate Wald method.
Parameters:
`success` - Number of successes, e.g. number of times the coin shows heads.
`total` - Total number of tries, e.g. total number of times the coin was thrown.
Returns:
A two element double array, where element 0 is the lower border and element 1 the upper border of the 95% confidence interval.
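For orientation, here is a sketch of the textbook Wald interval, p ± 1.96 · sqrt(p(1 − p) / n) with p = success / total. Other parts of this page refer to a modified Wald method, so the library may adjust the counts before applying the formula; the unmodified form, the clamping to [0, 1], the argument validation, and the class name `WaldIntervalSketch` are all assumptions of this example.

```java
// Hypothetical sketch, not the library's code: textbook 95% Wald interval
// for a binomial proportion.
final class WaldIntervalSketch {

    private static final double Z_95 = 1.96; // normal quantile for a 95% interval

    // Returns {lowerBorder, upperBorder}.
    static double[] waldInterval95(int total, int success) {
        if (total <= 0 || success < 0 || success > total) {
            throw new IllegalArgumentException("need total > 0 and 0 <= success <= total");
        }
        double p = (double) success / total;
        double halfWidth = Z_95 * Math.sqrt(p * (1 - p) / total);
        // Clamp so the borders stay valid probabilities.
        return new double[] { Math.max(0.0, p - halfWidth), Math.min(1.0, p + halfWidth) };
    }

    public static void main(String[] args) {
        double[] interval = waldInterval95(100, 50);
        System.out.println(interval[0] + " .. " + interval[1]);
    }
}
```

For 50 heads in 100 throws the interval is roughly 0.402 to 0.598.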
• #### getConfidenceInterval95WaldAverage

```public static double getConfidenceInterval95WaldAverage(int total,
int success)```
Computes the 95% confidence interval average of an experiment with boolean outcomes, e.g. heads or tails coin throws. It uses the very efficient, but still accurate Wald method.
Parameters:
`success` - Number of successes, e.g. number of times the coin shows heads.
`total` - Total number of tries, e.g. total number of times the coin was thrown.
Returns:
The average of the lower border and upper border of the 95% confidence interval.
• #### isTooWeak

```public boolean isTooWeak(int nrOfPositiveExamples,
int nrOfPosClassifiedPositives,
double noise)```
Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.
Parameters:
`nrOfPositiveExamples` - The number of positive examples in the learning problem.
`nrOfPosClassifiedPositives` - The number of positive examples that were classified as positive by the hypothesis.
`noise` - The noise parameter is a value between 0 and 1, which indicates how noisy the example data is (0 = no noise, 1 = completely random). If a hypothesis contains more errors on the positive examples than the noise value multiplied by the number of all examples, then the hypothesis is too weak.
Returns:
True if the hypothesis is too weak and false otherwise.
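The check described above can be sketched as follows. Because the method only receives the number of positive examples, the sketch assumes that the noise threshold is `noise` multiplied by `nrOfPositiveExamples` and that the comparison is strict. Both points, and the class name `TooWeakSketch`, are assumptions rather than documented behaviour.

```java
// Hypothetical sketch, not the library's code: noise-based "too weak" test.
final class TooWeakSketch {

    static boolean isTooWeak(int nrOfPositiveExamples, int nrOfPosClassifiedPositives, double noise) {
        if (noise < 0 || noise > 1 || nrOfPosClassifiedPositives > nrOfPositiveExamples) {
            throw new IllegalArgumentException("inconsistent arguments");
        }
        // Errors on positives: positive examples the hypothesis failed to cover.
        int errorsOnPositives = nrOfPositiveExamples - nrOfPosClassifiedPositives;
        // Assumption: the allowed number of errors is noise * number of positive examples.
        return errorsOnPositives > noise * nrOfPositiveExamples;
    }

    public static void main(String[] args) {
        System.out.println(isTooWeak(10, 7, 0.2)); // 3 errors, 2 allowed
        System.out.println(isTooWeak(10, 9, 0.2)); // 1 error, 2 allowed
    }
}
```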
• #### isTooWeak2

```public boolean isTooWeak2(int nrOfPositiveExamples,
int nrOfNegClassifiedPositives,
double noise)```
Computes whether a hypothesis is too weak, i.e. it has more errors on the positive examples than allowed by the noise parameter.
Parameters:
`nrOfPositiveExamples` - The number of positive examples in the learning problem.
`nrOfNegClassifiedPositives` - The number of positive examples that were classified as negative by the hypothesis.
`noise` - The noise parameter is a value between 0 and 1, which indicates how noisy the example data is (0 = no noise, 1 = completely random). If a hypothesis contains more errors on the positive examples than the noise value multiplied by the number of all examples, then the hypothesis is too weak.
Returns:
True if the hypothesis is too weak and false otherwise.
• #### p1

```public static double p1(int success,
int total)```
• #### p3

```public static double p3(double p1,
int total)```
• #### getFScoreApproximation

```public static double[] getFScoreApproximation(int nrOfPosClassifiedPositives,
double recall,
double beta,
int nrOfRelevantInstances,
int nrOfInstanceChecks,
int nrOfSuccessfulInstanceChecks)```
This method can be used to approximate the F-Measure and thereby save a lot of instance checks. It assumes that all positive examples (or instances of a class) have already been tested via instance checks, i.e. recall is already known and only precision is approximated.
Parameters:
`nrOfPosClassifiedPositives` - Number of positive examples (instances of a class) that are classified as positive.
`recall` - The already known recall.
`beta` - Weights precision and recall. If beta is >1, then recall is more important than precision.
`nrOfRelevantInstances` - Number of relevant instances, i.e. number of instances, which would have been tested without approximations. TODO: relevant = pos + neg examples?
`nrOfInstanceChecks` - Performed instance checks for the approximation.
`nrOfSuccessfulInstanceChecks` - Number of performed instance checks that returned true.
Returns:
A two element array, where the first element is the computed F-beta score and the second element is the length of the 95% confidence interval around it.
• #### getAScoreApproximationStep1

```public static double[] getAScoreApproximationStep1(double beta,
int nrOfPosExamples,
int nrOfInstanceChecks,
int nrOfSuccessfulInstanceChecks)```
In the first step of the A-Score approximation, we estimate recall (taking the factor beta into account). This is not much more than a wrapper around the modified Wald method.
Parameters:
`beta` - Weights precision and recall. If beta is >1, then recall is more important than precision.
`nrOfPosExamples` - Number of positive examples (or instances of the considered class).
`nrOfInstanceChecks` - Number of positive examples (or instances of the considered class) which have been checked.
`nrOfSuccessfulInstanceChecks` - Number of positive examples (or instances of the considered class), where the instance check returned true.
Returns:
A two element array, where the first element is the recall multiplied by beta and the second element is the length of the 95% confidence interval around it.
• #### getAScoreApproximationStep2

```public static double[] getAScoreApproximationStep2(int nrOfPosClassifiedPositives,
double[] recallInterval,
double beta,
int nrOfRelevantInstances,
int nrOfInstanceChecks,
int nrOfSuccessfulInstanceChecks)```
In step 2 of the A-Score approximation, the precision and overall A-Score are estimated based on the estimated recall.
Parameters:
`nrOfPosClassifiedPositives` - Number of positive examples (instances of a class) that are classified as positive.
`recallInterval` - The estimated recall, which needs to be given as a two element array with the first element being the mean value and the second element being the length of the interval (to be compatible with the step1 method).
`beta` - Weights precision and recall. If beta is >1, then recall is more important than precision.
`nrOfRelevantInstances` - Number of relevant instances, i.e. number of instances, which would have been tested without approximations.
`nrOfInstanceChecks` - Performed instance checks for the approximation.
`nrOfSuccessfulInstanceChecks` - Number of performed instance checks, which returned true.
Returns:
A two element array, where the first element is the estimated A-Score and the second element is the length of the 95% confidence interval around it.
• #### getPredAccApproximation

```public static double[] getPredAccApproximation(int nrOfPositiveExamples,
int nrOfNegativeExamples,
double beta,
int nrOfPosExampleInstanceChecks,
int nrOfSuccessfulPosExampleChecks,
int nrOfNegExampleInstanceChecks,
int nrOfNegativeNegExampleChecks)```
• #### divideOrZero

```public static double divideOrZero(int numerator,
int denominator)```