Interface Statistics

All Superinterfaces:
ComputationDirectory
All Known Implementing Classes:
DefaultStatistics

public interface Statistics
extends ComputationDirectory
This computation library contains various statistical functions.
  • Method Summary

    Modifier and Type Method Description
    DRes<SFixed> chiSquare​(List<DRes<SInt>> observed, double[] expected)
    Compute the test statistic for a χ²-test.
    DRes<SFixed> chiSquare​(List<DRes<SInt>> observed, List<DRes<SFixed>> expected)
    Compute the test statistic for a χ²-test.
    DRes<SFixed> correlation​(List<DRes<SFixed>> data1, DRes<SFixed> mean1, List<DRes<SFixed>> data2, DRes<SFixed> mean2)
    Compute Pearson's correlation coefficient on the two samples.
    DRes<SFixed> correlation​(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
    Compute Pearson's correlation coefficient on the two samples.
    DRes<List<DRes<SFixed>>> coxRegressionContinuous​(List<SurvivalInfoContinuous> data, int iterations, double alpha, double[] beta)
    Estimate the parameters of a Cox model on the given data.
    DRes<List<DRes<SFixed>>> coxRegressionDiscrete​(List<SurvivalInfoDiscrete> data, int iterations, double alpha, double[] beta)
    Estimate the parameters of a Cox model on the given data.
    DRes<SFixed> ffest​(List<List<DRes<SFixed>>> observed)
    Compute the F-test statistic for the null hypothesis that the given datasets have the same mean.
    DRes<List<Pair<BigInteger,​Integer>>> frequencyTable​(List<DRes<SInt>> data)
    Compute a frequency table for the data.
    DRes<List<DRes<SInt>>> histogramContinuous​(double[] buckets, List<DRes<SFixed>> data)
    Compute the histogram for the given sample.
    DRes<List<DRes<SInt>>> histogramContinuous​(List<DRes<SFixed>> buckets, List<DRes<SFixed>> data)
    Compute the histogram for the given sample.
    DRes<List<DRes<SInt>>> histogramDiscrete​(int[] buckets, List<DRes<SInt>> data)
    Compute the histogram for the given sample.
    DRes<List<DRes<SInt>>> histogramDiscrete​(List<DRes<SInt>> buckets, List<DRes<SInt>> data)
    Compute the histogram for the given sample.
    DRes<MultiDimensionalArray<List<DRes<SInt>>>> kAnonymize​(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
    Compute a k-anonymized version of the given dataset.
    DRes<MultiDimensionalArray<List<BigInteger>>> kAnonymizeAndOpen​(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
    Compute a k-anonymized version of the given dataset and open it to all parties.
    DRes<SFixed> kruskallWallisTest​(List<List<DRes<SFixed>>> observed)
    Compute the Kruskal-Wallis test statistic for the null hypothesis that the given samples are drawn from the same distribution.
    DRes<List<Pair<DRes<SInt>,​Integer>>> leakyFrequencyTable​(List<DRes<SInt>> data)
    Compute a frequency table for the data.
    DRes<LinearRegression.LinearRegressionResult> linearRegression​(List<ArrayList<DRes<SFixed>>> x, ArrayList<DRes<SFixed>> y)
    Compute estimates for the parameters b of a linear model such that b0 x0 + ... + bk xk = y.
    DRes<MultiDimensionalArray<DRes<SInt>>> multiDimensionalHistogramDiscrete​(List<List<DRes<SInt>>> buckets, Matrix<DRes<SInt>> data)
    Compute the histogram for the given multi-dimensional sample.
    DRes<SFixed> sampleMean​(List<DRes<SFixed>> data)
    Compute the sample mean of the given data.
    DRes<SFixed> sampleMedian​(List<DRes<SFixed>> data)
    Compute the sample median of the sample set.
    DRes<List<DRes<SFixed>>> samplePercentiles​(List<DRes<SFixed>> data, double[] quantiles)
    Compute the sample percentiles of a sample set.
    DRes<SFixed> sampleStandardDeviation​(List<DRes<SFixed>> data)
    Compute the sample standard deviation of the data.
    DRes<SFixed> sampleStandardDeviation​(List<DRes<SFixed>> data, DRes<SFixed> mean)
    Compute the sample standard deviation of the data given that the sample mean has already been calculated.
    DRes<SFixed> sampleVariance​(List<DRes<SFixed>> data)
    Compute the sample variance of the given data.
    DRes<SFixed> sampleVariance​(List<DRes<SFixed>> data, DRes<SFixed> mean)
    Compute the sample variance of the given data, assuming the sample mean has already been calculated.
    DRes<SimpleLinearRegression.SimpleLinearRegressionResult> simpleLinearRegression​(List<DRes<SFixed>> x, List<DRes<SFixed>> y)
    Compute simple linear regression on two samples.
    DRes<SFixed> ttest​(List<DRes<SFixed>> data, DRes<SFixed> mu)
    Compute the test statistic for a one-sample Student's t-test of the hypothesis that the mean of the sample is equal to mu.
    DRes<SFixed> ttest​(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
    Compute the test statistic for a two-sample Student's t-test of the hypothesis that the means of the two samples are equal.
    DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramContinuous​(Pair<List<DRes<SFixed>>,​List<DRes<SFixed>>> buckets, List<Pair<DRes<SFixed>,​DRes<SFixed>>> data)
    Compute the histogram for the given two-dimensional sample.
    DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramDiscrete​(Pair<List<DRes<SInt>>,​List<DRes<SInt>>> buckets, List<Pair<DRes<SInt>,​DRes<SInt>>> data)
    Compute the histogram for the given two-dimensional sample.
    static Statistics using​(ProtocolBuilderNumeric builder)  
  • Method Details

    • using

      static Statistics using​(ProtocolBuilderNumeric builder)
    • sampleMean

      DRes<SFixed> sampleMean​(List<DRes<SFixed>> data)
      Compute the sample mean of the given data.
      Parameters:
      data - A dataset.
      Returns:
      The sample mean.
    • sampleMedian

      DRes<SFixed> sampleMedian​(List<DRes<SFixed>> data)
      Compute the sample median of the sample set.
      Parameters:
      data - Samples.
      Returns:
      The median.
    • samplePercentiles

      DRes<List<DRes<SFixed>>> samplePercentiles​(List<DRes<SFixed>> data, double[] quantiles)
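      Compute the sample percentiles of a sample set.
      Parameters:
      data - Samples.
      quantiles - The quantiles for which to compute percentiles.
      Returns:
      The sample percentiles corresponding to the given quantiles.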
    • sampleVariance

      DRes<SFixed> sampleVariance​(List<DRes<SFixed>> data, DRes<SFixed> mean)
      Compute the sample variance of the given data, assuming the sample mean has already been calculated.
      Parameters:
      data - A dataset.
      mean - The sample mean for the given dataset.
      Returns:
      The sample variance.
    • sampleVariance

      DRes<SFixed> sampleVariance​(List<DRes<SFixed>> data)
      Compute the sample variance of the given data.
      Parameters:
      data - A dataset.
      Returns:
      The sample variance.
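
      For orientation, the quantity behind the two sampleVariance overloads (and the sampleStandardDeviation overloads that follow) can be sketched on plain doubles. This is not library code: the class PlainVariance and its methods are hypothetical, nothing here is secret-shared, and the n - 1 denominator is an assumption based on the usual meaning of "sample variance".

```java
import java.util.List;

/** Plain (non-secret) illustration of the sample variance; hypothetical, not part of the library. */
final class PlainVariance {

  /** Arithmetic mean of the data. */
  static double mean(List<Double> data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.size();
  }

  /** Sample variance with a precomputed mean, using the n - 1 denominator (assumed). */
  static double sampleVariance(List<Double> data, double mean) {
    if (data.size() < 2) {
      throw new IllegalArgumentException("need at least two samples");
    }
    double sumOfSquares = 0.0;
    for (double x : data) {
      sumOfSquares += (x - mean) * (x - mean);
    }
    return sumOfSquares / (data.size() - 1);
  }

  /** Convenience overload that computes the mean first. */
  static double sampleVariance(List<Double> data) {
    return sampleVariance(data, mean(data));
  }

  /** Sample standard deviation as the square root of the sample variance. */
  static double sampleStandardDeviation(List<Double> data) {
    return Math.sqrt(sampleVariance(data));
  }
}
```

      Passing an already computed mean, as in the two-argument overload, avoids a second pass over the data when the mean is needed elsewhere anyway.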
    • sampleStandardDeviation

      DRes<SFixed> sampleStandardDeviation​(List<DRes<SFixed>> data, DRes<SFixed> mean)
      Compute the sample standard deviation of the data given that the sample mean has already been calculated.
      Parameters:
      data - A dataset.
      mean - The sample mean for the given dataset.
      Returns:
      The sample standard deviation.
    • sampleStandardDeviation

      DRes<SFixed> sampleStandardDeviation​(List<DRes<SFixed>> data)
      Compute the sample standard deviation of the data.
      Parameters:
      data - A dataset.
      Returns:
      The sample standard deviation.
    • ttest

      DRes<SFixed> ttest​(List<DRes<SFixed>> data, DRes<SFixed> mu)
      Compute the test statistic for a one-sample Student's t-test of the hypothesis that the mean of the sample is equal to mu.
      Parameters:
      data - A dataset.
      mu - The hypothesized mean to test against.
      Returns:
      The test statistic.
    • ttest

      DRes<SFixed> ttest​(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
      Compute the test statistic for a two-sample Student's t-test of the hypothesis that the means of the two samples are equal. It is assumed that the two samples have the same variance.
      Parameters:
      data1 - The first dataset.
      data2 - The second dataset.
      Returns:
      The test statistic for the hypothesis that the two datasets have the same mean.
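
      The two ttest overloads can be illustrated with the textbook formulas on plain doubles. The sketch below is an assumption about what is computed, not the library's implementation: PlainTTest is a hypothetical class, the sign convention (sample mean minus mu, first mean minus second mean) is a guess, and the two-sample variant uses the pooled variance that matches the equal-variance assumption stated above.

```java
/** Plain (non-secret) illustration of Student's t statistics; hypothetical, not part of the library. */
final class PlainTTest {

  static double mean(double[] data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.length;
  }

  /** Sample variance with the n - 1 denominator. */
  static double sampleVariance(double[] data, double mean) {
    double sumOfSquares = 0.0;
    for (double x : data) {
      sumOfSquares += (x - mean) * (x - mean);
    }
    return sumOfSquares / (data.length - 1);
  }

  /** One-sample statistic: (mean - mu) / sqrt(s^2 / n). */
  static double oneSample(double[] data, double mu) {
    double m = mean(data);
    double standardError = Math.sqrt(sampleVariance(data, m) / data.length);
    return (m - mu) / standardError;
  }

  /** Two-sample statistic with pooled variance, i.e. equal variances are assumed. */
  static double twoSample(double[] data1, double[] data2) {
    int n1 = data1.length;
    int n2 = data2.length;
    double m1 = mean(data1);
    double m2 = mean(data2);
    double pooledVariance =
        ((n1 - 1) * sampleVariance(data1, m1) + (n2 - 1) * sampleVariance(data2, m2))
            / (n1 + n2 - 2);
    return (m1 - m2) / Math.sqrt(pooledVariance * (1.0 / n1 + 1.0 / n2));
  }
}
```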
    • chiSquare

      DRes<SFixed> chiSquare​(List<DRes<SInt>> observed, List<DRes<SFixed>> expected)
      Compute the test statistic for a χ²-test.
      Parameters:
      observed - The observed number of observations in each bucket.
      expected - The expected number of observations in each bucket.
      Returns:
      The test statistic for the hypothesis that the observed data fits the expected distribution.
    • chiSquare

      DRes<SFixed> chiSquare​(List<DRes<SInt>> observed, double[] expected)
      Compute the test statistic for a χ²-test.
      Parameters:
      observed - The observed number of observations in each bucket.
      expected - The expected number of observations in each bucket.
      Returns:
      The test statistic for the hypothesis that the observed data fits the expected distribution.
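
      As a plain-Java illustration of the statistic both chiSquare overloads describe, the sketch below sums (observed - expected)² / expected over all buckets, which is the standard Pearson χ² statistic. The class PlainChiSquare is hypothetical and works on public values only; the library methods operate on secret inputs instead.

```java
/** Plain (non-secret) illustration of Pearson's chi-square statistic; hypothetical helper. */
final class PlainChiSquare {

  /** Sum over all buckets of (observed - expected)^2 / expected. */
  static double chiSquare(int[] observed, double[] expected) {
    if (observed.length != expected.length) {
      throw new IllegalArgumentException("observed and expected must have the same length");
    }
    double statistic = 0.0;
    for (int i = 0; i < observed.length; i++) {
      double difference = observed[i] - expected[i];
      statistic += difference * difference / expected[i];
    }
    return statistic;
  }
}
```

      For example, observed counts {10, 20, 30} against expected counts {20, 20, 20} give 100/20 + 0 + 100/20 = 10.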
    • linearRegression

      DRes<LinearRegression.LinearRegressionResult> linearRegression​(List<ArrayList<DRes<SFixed>>> x, ArrayList<DRes<SFixed>> y)
      Compute estimates for the parameters b of a linear model such that b0 x0 + ... + bk xk = y.
      Parameters:
      x - The observations of the independent variables.
      y - The dependent values.
      Returns:
      An estimate of the parameters of a linear model for the given data.
    • simpleLinearRegression

      DRes<SimpleLinearRegression.SimpleLinearRegressionResult> simpleLinearRegression​(List<DRes<SFixed>> x, List<DRes<SFixed>> y)
      Compute simple linear regression on two samples.
      Parameters:
      x - The observations of the independent variable.
      y - The dependent values.
      Returns:
      An estimate of the parameters of a linear model.
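
      To make "simple linear regression" concrete, here is a sketch of the ordinary least-squares fit y = intercept + slope * x on plain doubles. It assumes that simpleLinearRegression estimates these two parameters; the class PlainSimpleLinearRegression and the {intercept, slope} return layout are hypothetical and do not mirror SimpleLinearRegressionResult.

```java
/** Plain (non-secret) illustration of ordinary least squares with one predictor; hypothetical helper. */
final class PlainSimpleLinearRegression {

  /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
  static double[] fit(double[] x, double[] y) {
    if (x.length != y.length || x.length < 2) {
      throw new IllegalArgumentException("need two samples of equal length, at least two points");
    }
    int n = x.length;
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i] / n;
      meanY += y[i] / n;
    }
    double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
    double sxx = 0.0; // sum of (x - meanX)^2
    for (int i = 0; i < n; i++) {
      sxy += (x[i] - meanX) * (y[i] - meanY);
      sxx += (x[i] - meanX) * (x[i] - meanX);
    }
    double slope = sxy / sxx;
    double intercept = meanY - slope * meanX;
    return new double[] {intercept, slope};
  }
}
```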
    • correlation

      DRes<SFixed> correlation​(List<DRes<SFixed>> data1, DRes<SFixed> mean1, List<DRes<SFixed>> data2, DRes<SFixed> mean2)
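      Compute Pearson's correlation coefficient on the two samples. Here it is assumed that the sample means have already been calculated.
      Parameters:
      data1 - The first sample.
      mean1 - The sample mean of data1.
      data2 - The second sample.
      mean2 - The sample mean of data2.
      Returns:
      Pearson's correlation coefficient for the two samples.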
    • correlation

      DRes<SFixed> correlation​(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
      Compute Pearson's correlation coefficient on the two samples.
      Parameters:
      data1 - The first sample.
      data2 - The second sample.
      Returns:
      Pearson's correlation coefficient for the two samples.
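
      As an illustration of the two correlation overloads, the following sketch computes Pearson's correlation coefficient on plain doubles, once with precomputed means and once without. PlainCorrelation is a hypothetical class and the code is the standard textbook formula, not the library's secret-shared implementation.

```java
/** Plain (non-secret) illustration of Pearson's correlation coefficient; hypothetical helper. */
final class PlainCorrelation {

  static double mean(double[] data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.length;
  }

  /** Correlation when both sample means are already known. */
  static double correlation(double[] data1, double mean1, double[] data2, double mean2) {
    if (data1.length != data2.length) {
      throw new IllegalArgumentException("samples must have the same length");
    }
    double covariance = 0.0; // sum of (x - mean1) * (y - mean2)
    double squares1 = 0.0;   // sum of (x - mean1)^2
    double squares2 = 0.0;   // sum of (y - mean2)^2
    for (int i = 0; i < data1.length; i++) {
      double d1 = data1[i] - mean1;
      double d2 = data2[i] - mean2;
      covariance += d1 * d2;
      squares1 += d1 * d1;
      squares2 += d2 * d2;
    }
    return covariance / Math.sqrt(squares1 * squares2);
  }

  /** Convenience overload that computes the means first. */
  static double correlation(double[] data1, double[] data2) {
    return correlation(data1, mean(data1), data2, mean(data2));
  }
}
```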
    • ffest

      DRes<SFixed> ffest​(List<List<DRes<SFixed>>> observed)
      Compute the F-test statistic for the null hypothesis that the given datasets have the same mean.
      Parameters:
      observed - A list of datasets.
      Returns:
      The test statistic.
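
      An F-test for equal means across several datasets is commonly the one-way ANOVA F statistic, so the sketch below assumes that this is what ffest computes. PlainFTest is a hypothetical class on plain doubles; it divides the between-group mean square by the within-group mean square.

```java
import java.util.List;

/** Plain (non-secret) illustration of the one-way ANOVA F statistic; hypothetical helper. */
final class PlainFTest {

  /** F = (SSB / (k - 1)) / (SSW / (N - k)) for k groups with N observations in total. */
  static double fStatistic(List<double[]> groups) {
    int k = groups.size();
    int total = 0;
    double grandSum = 0.0;
    for (double[] group : groups) {
      for (double x : group) {
        grandSum += x;
      }
      total += group.length;
    }
    double grandMean = grandSum / total;

    double between = 0.0; // sum of squares between groups (SSB)
    double within = 0.0;  // sum of squares within groups (SSW)
    for (double[] group : groups) {
      double sum = 0.0;
      for (double x : group) {
        sum += x;
      }
      double groupMean = sum / group.length;
      between += group.length * (groupMean - grandMean) * (groupMean - grandMean);
      for (double x : group) {
        within += (x - groupMean) * (x - groupMean);
      }
    }
    return (between / (k - 1)) / (within / (total - k));
  }
}
```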
    • kruskallWallisTest

      DRes<SFixed> kruskallWallisTest​(List<List<DRes<SFixed>>> observed)
      Compute the Kruskal-Wallis test statistic for the null hypothesis that the given samples are drawn from the same distribution.
      Parameters:
      observed - A list of samples.
      Returns:
      The test statistic.
    • leakyFrequencyTable

      DRes<List<Pair<DRes<SInt>,​Integer>>> leakyFrequencyTable​(List<DRes<SInt>> data)
      Compute a frequency table for the data. Note that the frequencies will be leaked but the corresponding values will not.
      Parameters:
      data - A dataset
      Returns:
      A frequency table.
    • frequencyTable

      DRes<List<Pair<BigInteger,​Integer>>> frequencyTable​(List<DRes<SInt>> data)
      Compute a frequency table for the data.
      Parameters:
      data - A dataset
      Returns:
      A frequency table.
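
      Judging by the return types, leakyFrequencyTable keeps each value as a secret DRes<SInt> next to a public Integer count, while frequencyTable returns both the value (BigInteger) and its count in the clear. The counting itself can be sketched on public values as follows; PlainFrequencyTable is a hypothetical class, and the sorted order of the result is a choice made here, not something the documentation states.

```java
import java.math.BigInteger;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain (non-secret) illustration of a frequency table; hypothetical helper. */
final class PlainFrequencyTable {

  /** Maps each distinct value to the number of times it occurs, sorted by value. */
  static Map<BigInteger, Integer> frequencyTable(List<BigInteger> data) {
    Map<BigInteger, Integer> table = new TreeMap<>();
    for (BigInteger value : data) {
      table.merge(value, 1, Integer::sum);
    }
    return table;
  }
}
```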
    • coxRegressionDiscrete

      DRes<List<DRes<SFixed>>> coxRegressionDiscrete​(List<SurvivalInfoDiscrete> data, int iterations, double alpha, double[] beta)
      Estimate the parameters of a Cox model on the given data. Here it is assumed that each covariate only takes values in a (small) finite set, e.g. when they indicate group membership. If many different values are possible, use coxRegressionContinuous instead.
      Parameters:
      data - The data set.
      iterations - The number of iterations.
      alpha - The learning rate.
      beta - The initial coefficient guess.
      Returns:
      The estimated coefficients of the Cox model.
    • coxRegressionContinuous

      DRes<List<DRes<SFixed>>> coxRegressionContinuous​(List<SurvivalInfoContinuous> data, int iterations, double alpha, double[] beta)
      Estimate the parameters of a Cox model on the given data.
      Parameters:
      data - The data set.
      iterations - The number of iterations.
      alpha - The learning rate.
      beta - The initial coefficient guess.
      Returns:
      The estimated coefficients of the Cox model.
    • histogramDiscrete

      DRes<List<DRes<SInt>>> histogramDiscrete​(int[] buckets, List<DRes<SInt>> data)
      Compute the histogram for the given sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
    • histogramContinuous

      DRes<List<DRes<SInt>>> histogramContinuous​(double[] buckets, List<DRes<SFixed>> data)
      Compute the histogram for the given sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
    • histogramDiscrete

      DRes<List<DRes<SInt>>> histogramDiscrete​(List<DRes<SInt>> buckets, List<DRes<SInt>> data)
      Compute the histogram for the given sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
    • histogramContinuous

      DRes<List<DRes<SInt>>> histogramContinuous​(List<DRes<SFixed>> buckets, List<DRes<SFixed>> data)
      Compute the histogram for the given sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
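
      How buckets given as upper bounds translate into counts can be sketched on plain doubles. The details below are assumptions, not documented behaviour: the bounds are taken to be sorted in ascending order, a sample x falls into the first bucket whose upper bound is at least x, and one extra bucket at the end collects samples larger than every bound. PlainHistogram is a hypothetical class.

```java
/** Plain (non-secret) illustration of a histogram over upper bounds; hypothetical helper. */
final class PlainHistogram {

  /**
   * Counts samples per bucket. Bucket i holds samples x with x <= upperBounds[i] that did not
   * fit an earlier bucket; the final bucket holds samples above all bounds (assumed convention).
   */
  static int[] histogram(double[] upperBounds, double[] data) {
    int[] counts = new int[upperBounds.length + 1];
    for (double x : data) {
      int bucket = 0;
      while (bucket < upperBounds.length && x > upperBounds[bucket]) {
        bucket++;
      }
      counts[bucket]++;
    }
    return counts;
  }
}
```

      With upper bounds {1, 2, 3} and samples {0.5, 1.0, 1.5, 2.5, 3.5, 4.0}, this convention yields the counts {2, 1, 1, 2}.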
    • twoDimensionalHistogramDiscrete

      DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramDiscrete​(Pair<List<DRes<SInt>>,​List<DRes<SInt>>> buckets, List<Pair<DRes<SInt>,​DRes<SInt>>> data)
      Compute the histogram for the given two-dimensional sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each two-dimensional bucket.
    • twoDimensionalHistogramContinuous

      DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramContinuous​(Pair<List<DRes<SFixed>>,​List<DRes<SFixed>>> buckets, List<Pair<DRes<SFixed>,​DRes<SFixed>>> data)
      Compute the histogram for the given two-dimensional sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each two-dimensional bucket.
    • multiDimensionalHistogramDiscrete

      DRes<MultiDimensionalArray<DRes<SInt>>> multiDimensionalHistogramDiscrete​(List<List<DRes<SInt>>> buckets, Matrix<DRes<SInt>> data)
      Compute the histogram for the given multi-dimensional sample.
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each multi-dimensional bucket.
    • kAnonymize

      DRes<MultiDimensionalArray<List<DRes<SInt>>>> kAnonymize​(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
      Compute a k-anonymized version of the given dataset.

      Each row in the data set holds the quasi-identifiers of an individual, and the entry at the same index in the list of sensitive attributes holds that individual's sensitive value. The buckets indicate the desired generalization of the quasi-identifiers, as in a histogram. k is the smallest allowed number of individuals in each bucket.

      The output is a histogram on the given buckets. The value for each bucket is a list of size data.getHeight(), where a non-zero entry x at index i indicates that the data point at row i is in this bucket and that the corresponding sensitive attribute is x.

      Parameters:
      data - The quasi identifiers for each individual.
      sensitiveAttributes - The corresponding sensitive attributes. Must be non-zero.
      buckets - The buckets defining the desired generalization.
      k - The smallest allowed number of individuals in each bucket.
      Returns:
      A k-anonymous data set in which all buckets with fewer than k elements are suppressed.
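
      The output layout described for kAnonymize can be illustrated on public integers. The sketch below makes several assumptions that the documentation does not confirm: buckets are given as ascending upper bounds per quasi-identifier column with an extra overflow bucket, a suppressed bucket is represented by an all-zero list, and only buckets that contain at least one row appear in the result. PlainKAnonymity is a hypothetical class, and a map keyed by bucket indices stands in for MultiDimensionalArray.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain (non-secret) illustration of the k-anonymized output layout; hypothetical helper. */
final class PlainKAnonymity {

  /** Index of the first upper bound that is at least x, or bounds.length if x exceeds all bounds. */
  static int bucketIndex(int[] bounds, int x) {
    for (int i = 0; i < bounds.length; i++) {
      if (x <= bounds[i]) {
        return i;
      }
    }
    return bounds.length;
  }

  /**
   * Maps each occupied bucket (one index per column) to an array of length data.length.
   * Entry i is the sensitive attribute of row i if that row is in the bucket, else 0.
   * Buckets with fewer than k rows are zeroed out (assumed meaning of "suppressed").
   */
  static Map<List<Integer>, int[]> kAnonymize(
      int[][] data, int[] sensitiveAttributes, int[][] buckets, int k) {
    int n = data.length;
    Map<List<Integer>, int[]> cells = new LinkedHashMap<>();
    Map<List<Integer>, Integer> counts = new LinkedHashMap<>();
    for (int row = 0; row < n; row++) {
      List<Integer> cell = new ArrayList<>();
      for (int col = 0; col < buckets.length; col++) {
        cell.add(bucketIndex(buckets[col], data[row][col]));
      }
      cells.computeIfAbsent(cell, c -> new int[n])[row] = sensitiveAttributes[row];
      counts.merge(cell, 1, Integer::sum);
    }
    for (Map.Entry<List<Integer>, int[]> entry : cells.entrySet()) {
      if (counts.get(entry.getKey()) < k) {
        Arrays.fill(entry.getValue(), 0);
      }
    }
    return cells;
  }
}
```

      Because 0 marks "row not in this bucket", a sensitive attribute equal to 0 could not be told apart from an absent row, which is consistent with the requirement that sensitive attributes be non-zero.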
    • kAnonymizeAndOpen

      DRes<MultiDimensionalArray<List<BigInteger>>> kAnonymizeAndOpen​(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
      Compute a k-anonymized version of the given dataset and open it to all parties.

      Each row in the data set holds the quasi-identifiers of an individual, and the entry at the same index in the list of sensitive attributes holds that individual's sensitive value. The buckets indicate the desired generalization of the quasi-identifiers, as in a histogram. k is the smallest allowed number of individuals in each bucket.

      The output is a histogram on the given buckets, where the value for each bucket is a list of the sensitive attributes from the original dataset that ended up in that bucket.

      Parameters:
      data - The quasi identifiers for each individual.
      sensitiveAttributes - The corresponding sensitive attributes. Must be non-zero.
      buckets - The buckets defining the desired generalization.
      k - The smallest allowed number of individuals in each bucket.
      Returns:
      A k-anonymous data set in which all buckets with fewer than k elements are suppressed.