Class DefaultStatistics
- All Implemented Interfaces:
ComputationDirectory, Statistics
public class DefaultStatistics extends Object implements Statistics
-
Method Summary
Modifier and Type Method Description
DRes<SFixed> chiSquare(List<DRes<SInt>> observed, double[] expected)
Compute the test statistic for a χ²-test.
DRes<SFixed> chiSquare(List<DRes<SInt>> observed, List<DRes<SFixed>> expected)
Compute the test statistic for a χ²-test.
DRes<SFixed> correlation(List<DRes<SFixed>> data1, DRes<SFixed> mean1, List<DRes<SFixed>> data2, DRes<SFixed> mean2)
Compute Pearson's correlation coefficient on the two samples.
DRes<SFixed> correlation(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
Compute Pearson's correlation coefficient on the two samples.
DRes<List<DRes<SFixed>>> coxRegressionContinuous(List<SurvivalInfoContinuous> data, int iterations, double alpha, double[] beta)
Estimate the parameters of a Cox model on the given data.
DRes<List<DRes<SFixed>>> coxRegressionDiscrete(List<SurvivalInfoDiscrete> data, int iterations, double alpha, double[] beta)
Estimate the parameters of a Cox model on the given data.
DRes<SFixed> ffest(List<List<DRes<SFixed>>> observed)
Compute the F-test statistic for the null hypothesis that the given datasets have the same mean.
DRes<List<Pair<BigInteger,Integer>>> frequencyTable(List<DRes<SInt>> data)
Compute a frequency table for the data.
DRes<List<DRes<SInt>>> histogramContinuous(double[] buckets, List<DRes<SFixed>> data)
Compute the histogram for the given sample.
DRes<List<DRes<SInt>>> histogramContinuous(List<DRes<SFixed>> buckets, List<DRes<SFixed>> data)
Compute the histogram for the given sample.
DRes<List<DRes<SInt>>> histogramDiscrete(int[] buckets, List<DRes<SInt>> data)
Compute the histogram for the given sample.
DRes<List<DRes<SInt>>> histogramDiscrete(List<DRes<SInt>> buckets, List<DRes<SInt>> data)
Compute the histogram for the given sample.
DRes<MultiDimensionalArray<List<DRes<SInt>>>> kAnonymize(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
Compute a k-anonymized version of the given dataset.
DRes<MultiDimensionalArray<List<BigInteger>>> kAnonymizeAndOpen(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
Compute a k-anonymized version of the given dataset and open it to all parties.
DRes<SFixed> kruskallWallisTest(List<List<DRes<SFixed>>> observed)
Compute the Kruskal-Wallis test statistic for the null hypothesis that the given samples are drawn from the same distribution.
DRes<List<Pair<DRes<SInt>,Integer>>> leakyFrequencyTable(List<DRes<SInt>> data)
Compute a frequency table for the data.
DRes<LinearRegression.LinearRegressionResult> linearRegression(List<ArrayList<DRes<SFixed>>> x, ArrayList<DRes<SFixed>> y)
Compute estimates for the parameters b of a linear model such that b0 x0 + ... + bk xk = y.
DRes<MultiDimensionalArray<DRes<SInt>>> multiDimensionalHistogramDiscrete(List<List<DRes<SInt>>> buckets, Matrix<DRes<SInt>> data)
Compute the histogram for the given multi-dimensional sample.
DRes<SFixed> sampleMean(List<DRes<SFixed>> data)
Compute the sample mean of the given data.
DRes<SFixed> sampleMedian(List<DRes<SFixed>> data)
Compute the sample median of the sample set.
DRes<List<DRes<SFixed>>> samplePercentiles(List<DRes<SFixed>> data, double[] percentiles)
Compute the sample percentiles of a sample set.
DRes<SFixed> sampleStandardDeviation(List<DRes<SFixed>> data)
Compute the standard deviation of the data.
DRes<SFixed> sampleStandardDeviation(List<DRes<SFixed>> data, DRes<SFixed> mean)
Compute the sample standard deviation of the data given that the sample mean has already been calculated.
DRes<SFixed> sampleVariance(List<DRes<SFixed>> data)
Compute the sample variance of the given data.
DRes<SFixed> sampleVariance(List<DRes<SFixed>> data, DRes<SFixed> mean)
Compute the sample variance of the given data, assuming the sample mean has already been calculated.
DRes<SimpleLinearRegression.SimpleLinearRegressionResult> simpleLinearRegression(List<DRes<SFixed>> x, List<DRes<SFixed>> y)
Compute simple linear regression on two samples.
DRes<SFixed> ttest(List<DRes<SFixed>> data, DRes<SFixed> mu)
Compute the test statistic for a Student's t-test for the hypothesis that the mean of the sample is equal to mu.
DRes<SFixed> ttest(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
Compute the test statistic for a two-sample Student's t-test for the hypothesis that the means of the two samples are equal.
DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramContinuous(Pair<List<DRes<SFixed>>,List<DRes<SFixed>>> buckets, List<Pair<DRes<SFixed>,DRes<SFixed>>> data)
Compute the histogram for the given two-dimensional sample.
DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramDiscrete(Pair<List<DRes<SInt>>,List<DRes<SInt>>> buckets, List<Pair<DRes<SInt>,DRes<SInt>>> data)
Compute the histogram for the given two-dimensional sample.
-
Method Details
-
sampleMean
Description copied from interface: Statistics
Compute the sample mean of the given data.
- Specified by:
sampleMean in interface Statistics
- Parameters:
data - A dataset.
- Returns:
The sample mean.
-
sampleMedian
Description copied from interface: Statistics
Compute the sample median of the sample set.
- Specified by:
sampleMedian in interface Statistics
- Parameters:
data - Samples.
- Returns:
The median.
-
samplePercentiles
Description copied from interface: Statistics
Compute the sample percentiles of a sample set.
- Specified by:
samplePercentiles in interface Statistics
- Parameters:
data - Samples.
percentiles - The percentiles to compute.
- Returns:
The requested sample percentiles.
-
sampleVariance
Description copied from interface: Statistics
Compute the sample variance of the given data, assuming the sample mean has already been calculated.
- Specified by:
sampleVariance in interface Statistics
- Parameters:
data - A dataset.
mean - The sample mean for the given dataset.
- Returns:
The sample variance.
-
sampleVariance
Description copied from interface: Statistics
Compute the sample variance of the given data.
- Specified by:
sampleVariance in interface Statistics
- Parameters:
data - A dataset.
- Returns:
The sample variance.
-
sampleStandardDeviation
Description copied from interface: Statistics
Compute the sample standard deviation of the data.
- Specified by:
sampleStandardDeviation in interface Statistics
- Parameters:
data - A dataset.
- Returns:
The sample standard deviation.
-
sampleStandardDeviation
Description copied from interface: Statistics
Compute the sample standard deviation of the data given that the sample mean has already been calculated.
- Specified by:
sampleStandardDeviation in interface Statistics
- Parameters:
data - A dataset.
mean - The sample mean for the given dataset.
- Returns:
The sample standard deviation.
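For orientation, the quantities computed by the sampleVariance and sampleStandardDeviation overloads can be sketched on plain double values. This is only a cleartext illustration, not the secure computation: the class and method names are hypothetical, and the n - 1 denominator is an assumption, since the description above does not state which denominator is used.

```java
// Cleartext sketch only; PlainSampleVariance is a hypothetical name, not part of the library.
final class PlainSampleVariance {

  /** Arithmetic mean of the data. */
  static double sampleMean(double[] data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.length;
  }

  /** Sample variance around a precomputed mean, using the n - 1 denominator (assumed). */
  static double sampleVariance(double[] data, double mean) {
    if (data.length < 2) {
      throw new IllegalArgumentException("At least two observations are required");
    }
    double sumOfSquares = 0.0;
    for (double x : data) {
      double deviation = x - mean;
      sumOfSquares += deviation * deviation;
    }
    return sumOfSquares / (data.length - 1);
  }

  /** Sample standard deviation: the square root of the sample variance. */
  static double sampleStandardDeviation(double[] data, double mean) {
    return Math.sqrt(sampleVariance(data, mean));
  }

  public static void main(String[] args) {
    double[] data = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
    double mean = sampleMean(data);
    System.out.println("variance = " + sampleVariance(data, mean));
    System.out.println("standard deviation = " + sampleStandardDeviation(data, mean));
  }
}
```

Passing the mean in, as the two-argument overloads do, avoids recomputing it when it is already available from an earlier step.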
-
ttest
Description copied from interface: Statistics
Compute the test statistic for a Student's t-test for the hypothesis that the mean of the sample is equal to mu.
- Specified by:
ttest in interface Statistics
- Parameters:
data - A dataset.
mu - The parameter for the t-test.
- Returns:
The test statistic.
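As a cleartext point of reference, the textbook one-sample t statistic is t = (mean - mu) / (s / sqrt(n)), where s is the sample standard deviation. The sketch below computes it on plain doubles; the class name is hypothetical and the n - 1 denominator for s is an assumption, so it should be read as an illustration of the statistic rather than of the library's implementation.

```java
// Cleartext sketch only; PlainOneSampleT is a hypothetical name, not part of the library.
final class PlainOneSampleT {

  /** One-sample Student's t statistic for the hypothesis that the mean equals mu. */
  static double tStatistic(double[] data, double mu) {
    int n = data.length;
    if (n < 2) {
      throw new IllegalArgumentException("At least two observations are required");
    }
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    double mean = sum / n;

    double sumOfSquares = 0.0;
    for (double x : data) {
      sumOfSquares += (x - mean) * (x - mean);
    }
    // Sample standard deviation with the n - 1 denominator (assumed).
    double s = Math.sqrt(sumOfSquares / (n - 1));

    // Distance between sample mean and mu, measured in standard errors.
    return (mean - mu) / (s / Math.sqrt(n));
  }

  public static void main(String[] args) {
    double[] data = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
    System.out.println("t = " + tStatistic(data, 4.0));
  }
}
```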
-
ttest
Description copied from interface: Statistics
Compute the test statistic for a two-sample Student's t-test for the hypothesis that the means of the two samples are equal. It is assumed that the two samples have the same variance.
- Specified by:
ttest in interface Statistics
- Parameters:
data1 - A dataset.
data2 - A dataset.
- Returns:
The test statistic for the hypothesis that the two datasets have the same mean.
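Because the two samples are assumed to share a variance, the usual textbook form is the pooled-variance statistic t = (m1 - m2) / (sp * sqrt(1/n1 + 1/n2)), with sp² = (SS1 + SS2) / (n1 + n2 - 2). A cleartext sketch follows; the class name is hypothetical, and whether the library uses exactly this form is an assumption.

```java
// Cleartext sketch only; PlainTwoSampleT is a hypothetical name, not part of the library.
final class PlainTwoSampleT {

  private static double mean(double[] data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.length;
  }

  private static double sumOfSquaredDeviations(double[] data, double mean) {
    double total = 0.0;
    for (double x : data) {
      total += (x - mean) * (x - mean);
    }
    return total;
  }

  /** Two-sample Student's t statistic with pooled variance (equal variances assumed). */
  static double tStatistic(double[] data1, double[] data2) {
    int n1 = data1.length;
    int n2 = data2.length;
    if (n1 < 1 || n2 < 1 || n1 + n2 < 3) {
      throw new IllegalArgumentException("Too few observations");
    }
    double m1 = mean(data1);
    double m2 = mean(data2);
    // Pooled variance: combined squared deviations over the combined degrees of freedom.
    double pooledVariance =
        (sumOfSquaredDeviations(data1, m1) + sumOfSquaredDeviations(data2, m2)) / (n1 + n2 - 2);
    return (m1 - m2) / Math.sqrt(pooledVariance * (1.0 / n1 + 1.0 / n2));
  }

  public static void main(String[] args) {
    System.out.println("t = " + tStatistic(new double[] {1, 2, 3}, new double[] {2, 3, 4}));
  }
}
```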
-
chiSquare
Description copied from interface: Statistics
Compute the test statistic for a χ²-test.
- Specified by:
chiSquare in interface Statistics
- Parameters:
observed - The observed data.
expected - The expected number of observations in each bucket.
- Returns:
The test statistic for the hypothesis that the observed data fits the expected distribution.
-
chiSquare
Description copied from interface: Statistics
Compute the test statistic for a χ²-test.
- Specified by:
chiSquare in interface Statistics
- Parameters:
observed - The observed data.
expected - The expected number of observations in each bucket.
- Returns:
The test statistic for the hypothesis that the observed data fits the expected distribution.
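In the clear, Pearson's χ² statistic is the sum over all buckets of (observed - expected)² / expected. The sketch below mirrors the shape of the overload that takes integer observations and double expectations, but works on plain values; the class name is hypothetical and nothing here reflects how the secure computation is carried out.

```java
// Cleartext sketch only; PlainChiSquare is a hypothetical name, not part of the library.
final class PlainChiSquare {

  /** Pearson's chi-square statistic: sum of (O - E)^2 / E over all buckets. */
  static double chiSquare(long[] observed, double[] expected) {
    if (observed.length != expected.length) {
      throw new IllegalArgumentException("observed and expected must have the same length");
    }
    double statistic = 0.0;
    for (int i = 0; i < observed.length; i++) {
      if (expected[i] <= 0.0) {
        throw new IllegalArgumentException("expected counts must be positive");
      }
      double difference = observed[i] - expected[i];
      statistic += difference * difference / expected[i];
    }
    return statistic;
  }

  public static void main(String[] args) {
    long[] observed = {10, 20, 30};
    double[] expected = {20.0, 20.0, 20.0};
    System.out.println("chi-square = " + chiSquare(observed, expected));
  }
}
```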
-
linearRegression
public DRes<LinearRegression.LinearRegressionResult> linearRegression(List<ArrayList<DRes<SFixed>>> x, ArrayList<DRes<SFixed>> y)
Description copied from interface: Statistics
Compute estimates for the parameters b of a linear model such that b0 x0 + ... + bk xk = y.
- Specified by:
linearRegression in interface Statistics
- Parameters:
x - The dataset.
y - The dependent values.
- Returns:
An estimate of the parameters of a linear model for the given data.
-
simpleLinearRegression
public DRes<SimpleLinearRegression.SimpleLinearRegressionResult> simpleLinearRegression(List<DRes<SFixed>> x, List<DRes<SFixed>> y)
Description copied from interface: Statistics
Compute simple linear regression on two samples.
- Specified by:
simpleLinearRegression in interface Statistics
- Parameters:
x - The dataset.
y - The dependent values.
- Returns:
An estimate of the parameters of a linear model.
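For reference, ordinary least squares with a single predictor gives slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below computes both on plain doubles. The class name and the two-element array it returns are hypothetical; the fields of SimpleLinearRegressionResult are not described on this page, so no claim is made about what the library's result object contains.

```java
// Cleartext sketch only; PlainSimpleRegression is a hypothetical name, not part of the library.
final class PlainSimpleRegression {

  /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
  static double[] fit(double[] x, double[] y) {
    if (x.length != y.length || x.length < 2) {
      throw new IllegalArgumentException("x and y must have the same length of at least two");
    }
    int n = x.length;
    double sumX = 0.0;
    double sumY = 0.0;
    for (int i = 0; i < n; i++) {
      sumX += x[i];
      sumY += y[i];
    }
    double meanX = sumX / n;
    double meanY = sumY / n;

    double sxy = 0.0; // sum of cross-deviations
    double sxx = 0.0; // sum of squared deviations of x
    for (int i = 0; i < n; i++) {
      sxy += (x[i] - meanX) * (y[i] - meanY);
      sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    if (sxx == 0.0) {
      throw new IllegalArgumentException("x must not be constant");
    }
    double slope = sxy / sxx;
    double intercept = meanY - slope * meanX;
    return new double[] {intercept, slope};
  }

  public static void main(String[] args) {
    double[] line = fit(new double[] {1, 2, 3, 4}, new double[] {3, 5, 7, 9});
    System.out.println("intercept = " + line[0] + ", slope = " + line[1]);
  }
}
```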
-
correlation
public DRes<SFixed> correlation(List<DRes<SFixed>> data1, DRes<SFixed> mean1, List<DRes<SFixed>> data2, DRes<SFixed> mean2)
Description copied from interface: Statistics
Compute Pearson's correlation coefficient on the two samples. Here it is assumed that the sample means have already been calculated.
- Specified by:
correlation in interface Statistics
- Returns:
-
correlation
Description copied from interface: Statistics
Compute Pearson's correlation coefficient on the two samples.
- Specified by:
correlation in interface Statistics
- Returns:
-
ffest
Description copied from interface: Statistics
Compute the F-test statistic for the null hypothesis that the given datasets have the same mean.
- Specified by:
ffest in interface Statistics
- Parameters:
observed - A list of datasets.
- Returns:
The test statistic.
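A test of equal means across several datasets is conventionally the one-way ANOVA F-test, where F is the between-group mean square divided by the within-group mean square. The cleartext sketch below assumes that this is the statistic meant here; the class name is hypothetical and the code works on plain doubles only.

```java
// Cleartext sketch only; PlainOneWayAnova is a hypothetical name, not part of the library.
final class PlainOneWayAnova {

  /** One-way ANOVA F statistic: (SSB / (k - 1)) / (SSW / (N - k)). */
  static double fStatistic(double[][] groups) {
    int k = groups.length;
    int total = 0;
    double grandSum = 0.0;
    for (double[] group : groups) {
      if (group.length == 0) {
        throw new IllegalArgumentException("Groups must be non-empty");
      }
      for (double x : group) {
        grandSum += x;
      }
      total += group.length;
    }
    if (k < 2 || total <= k) {
      throw new IllegalArgumentException("Need at least two groups and more observations than groups");
    }
    double grandMean = grandSum / total;

    double between = 0.0; // variation of group means around the grand mean
    double within = 0.0;  // variation of observations around their own group mean
    for (double[] group : groups) {
      double sum = 0.0;
      for (double x : group) {
        sum += x;
      }
      double groupMean = sum / group.length;
      between += group.length * (groupMean - grandMean) * (groupMean - grandMean);
      for (double x : group) {
        within += (x - groupMean) * (x - groupMean);
      }
    }
    return (between / (k - 1)) / (within / (total - k));
  }

  public static void main(String[] args) {
    double[][] groups = {{1, 2, 3}, {2, 3, 4}, {3, 4, 5}};
    System.out.println("F = " + fStatistic(groups));
  }
}
```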
-
kruskallWallisTest
Description copied from interface: Statistics
Compute the Kruskal-Wallis test statistic for the null hypothesis that the given samples are drawn from the same distribution.
- Specified by:
kruskallWallisTest in interface Statistics
- Returns:
-
leakyFrequencyTable
Description copied from interface: Statistics
Compute a frequency table for the data. Note that the frequencies will be leaked but the corresponding values will not.
- Specified by:
leakyFrequencyTable in interface Statistics
- Parameters:
data - A dataset.
- Returns:
A frequency table.
-
frequencyTable
Description copied from interface: Statistics
Compute a frequency table for the data.
- Specified by:
frequencyTable in interface Statistics
- Parameters:
data - A dataset.
- Returns:
A frequency table.
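Conceptually, a frequency table pairs each distinct value in the data with the number of times it occurs. The cleartext sketch below builds such a table with a sorted map. The class name, the use of a Map instead of a list of pairs, and the ascending order of values are all assumptions made for illustration; this page does not say how the library orders its entries.

```java
import java.util.Map;
import java.util.TreeMap;

// Cleartext sketch only; PlainFrequencyTable is a hypothetical name, not part of the library.
final class PlainFrequencyTable {

  /** Maps each distinct value to its number of occurrences, ordered by value. */
  static Map<Long, Integer> frequencyTable(long[] data) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long value : data) {
      // Start at 1 for a new value, otherwise add 1 to the existing count.
      counts.merge(value, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println(frequencyTable(new long[] {3, 1, 3, 2, 3, 1}));
  }
}
```

In these terms, leakyFrequencyTable reveals only the counts while keeping the values secret, whereas the return type of frequencyTable (pairs of BigInteger and Integer) indicates that both are opened.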
-
coxRegressionDiscrete
public DRes<List<DRes<SFixed>>> coxRegressionDiscrete(List<SurvivalInfoDiscrete> data, int iterations, double alpha, double[] beta)
Description copied from interface: Statistics
Estimate the parameters of a Cox model on the given data. Here it is assumed that each covariate only takes values in a (small) finite set, e.g. when they indicate group membership. If many different values are possible, use Statistics.coxRegressionContinuous(java.util.List<dk.alexandra.fresco.stat.survival.SurvivalInfoContinuous>, int, double, double[]) instead.
- Specified by:
coxRegressionDiscrete in interface Statistics
- Parameters:
data - The data set.
iterations - The number of iterations.
alpha - The learning rate.
beta - The initial coefficient guess.
- Returns:
-
coxRegressionContinuous
public DRes<List<DRes<SFixed>>> coxRegressionContinuous(List<SurvivalInfoContinuous> data, int iterations, double alpha, double[] beta)
Description copied from interface: Statistics
Estimate the parameters of a Cox model on the given data.
- Specified by:
coxRegressionContinuous in interface Statistics
- Parameters:
data - The data set.
iterations - The number of iterations.
alpha - The learning rate.
beta - The initial coefficient guess.
- Returns:
-
histogramDiscrete
Description copied from interface: Statistics
Compute the histogram for the given sample.
- Specified by:
histogramDiscrete in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
histogramDiscrete
Description copied from interface: Statistics
Compute the histogram for the given sample.
- Specified by:
histogramDiscrete in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
histogramContinuous
public DRes<List<DRes<SInt>>> histogramContinuous(List<DRes<SFixed>> buckets, List<DRes<SFixed>> data)
Description copied from interface: Statistics
Compute the histogram for the given sample.
- Specified by:
histogramContinuous in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
histogramContinuous
Description copied from interface: Statistics
Compute the histogram for the given sample.
- Specified by:
histogramContinuous in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
twoDimensionalHistogramDiscrete
public DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramDiscrete(Pair<List<DRes<SInt>>,List<DRes<SInt>>> buckets, List<Pair<DRes<SInt>,DRes<SInt>>> data)
Description copied from interface: Statistics
Compute the histogram for the given two-dimensional sample.
- Specified by:
twoDimensionalHistogramDiscrete in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
twoDimensionalHistogramContinuous
public DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramContinuous(Pair<List<DRes<SFixed>>,List<DRes<SFixed>>> buckets, List<Pair<DRes<SFixed>,DRes<SFixed>>> data)
Description copied from interface: Statistics
Compute the histogram for the given two-dimensional sample.
- Specified by:
twoDimensionalHistogramContinuous in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
multiDimensionalHistogramDiscrete
public DRes<MultiDimensionalArray<DRes<SInt>>> multiDimensionalHistogramDiscrete(List<List<DRes<SInt>>> buckets, Matrix<DRes<SInt>> data)
Description copied from interface: Statistics
Compute the histogram for the given multi-dimensional sample.
- Specified by:
multiDimensionalHistogramDiscrete in interface Statistics
- Parameters:
buckets - Upper bounds for the buckets to use in the histogram.
data - The sample data.
- Returns:
-
kAnonymize
public DRes<MultiDimensionalArray<List<DRes<SInt>>>> kAnonymize(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
Description copied from interface: Statistics
Compute a k-anonymized version of the given dataset. Each row in the data set contains the quasi-identifiers of an individual, with a corresponding entry in the list of values of the sensitive attribute. The buckets indicate the desired generalization of the quasi-identifiers, as in a histogram. k is the smallest allowed number of individuals in each bucket.
The output is a histogram on the given buckets, where the value for each bucket is a list of size data.getHeight() with a non-zero entry x at index i indicating that the data point at row i is in this bucket and that the corresponding sensitive attribute was x.
- Specified by:
kAnonymize in interface Statistics
- Parameters:
data - The quasi-identifiers for each individual.
sensitiveAttributes - The corresponding sensitive attributes. Must be non-zero.
buckets - The buckets defining the desired generalization.
k - The smallest allowed number of individuals in each bucket.
- Returns:
A k-anonymous data set in which all buckets with fewer than k elements are suppressed.
-
kAnonymizeAndOpen
public DRes<MultiDimensionalArray<List<BigInteger>>> kAnonymizeAndOpen(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
Description copied from interface: Statistics
Compute a k-anonymized version of the given dataset and open it to all parties. Each row in the data set contains the quasi-identifiers of an individual, with a corresponding entry in the list of values of the sensitive attribute. The buckets indicate the desired generalization of the quasi-identifiers, as in a histogram. k is the smallest allowed number of individuals in each bucket.
The output is a histogram on the given buckets, where the value corresponding to a bucket is a list of the sensitive attributes from the original dataset which ended up in this bucket.
- Specified by:
kAnonymizeAndOpen in interface Statistics
- Parameters:
data - The quasi-identifiers for each individual.
sensitiveAttributes - The corresponding sensitive attributes. Must be non-zero.
buckets - The buckets defining the desired generalization.
k - The smallest allowed number of individuals in each bucket.
- Returns:
A k-anonymous data set in which all buckets with fewer than k elements are suppressed.
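The suppression rule described above can be illustrated in the clear. The sketch below is a simplification under stated assumptions: it takes a precomputed bucket index per row instead of deriving it from quasi-identifiers and bucket bounds, it uses a flat list of buckets instead of a multi-dimensional array, and it represents a suppressed bucket as an empty list. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Cleartext sketch only; PlainKAnonymity is a hypothetical name, not part of the library.
final class PlainKAnonymity {

  /**
   * Groups sensitive attributes by bucket and empties every bucket with fewer than k entries.
   * bucketOfRow[i] is the (assumed precomputed) bucket index of row i.
   */
  static List<List<Long>> suppressSmallBuckets(
      int[] bucketOfRow, long[] sensitiveAttributes, int bucketCount, int k) {
    if (bucketOfRow.length != sensitiveAttributes.length) {
      throw new IllegalArgumentException("One sensitive attribute is required per row");
    }
    List<List<Long>> buckets = new ArrayList<>();
    for (int b = 0; b < bucketCount; b++) {
      buckets.add(new ArrayList<>());
    }
    for (int row = 0; row < bucketOfRow.length; row++) {
      buckets.get(bucketOfRow[row]).add(sensitiveAttributes[row]);
    }
    // Suppress buckets that would identify fewer than k individuals.
    for (List<Long> bucket : buckets) {
      if (bucket.size() < k) {
        bucket.clear();
      }
    }
    return buckets;
  }

  public static void main(String[] args) {
    int[] bucketOfRow = {0, 0, 1, 0, 1, 2};
    long[] sensitive = {5, 6, 7, 8, 9, 10};
    System.out.println(suppressSmallBuckets(bucketOfRow, sensitive, 3, 2));
  }
}
```

With k = 2 in the usage above, the third bucket holds a single individual and is therefore emptied, while the other two buckets keep their sensitive attributes.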
-