Class DefaultStatistics

Object
DefaultStatistics
All Implemented Interfaces:
ComputationDirectory, Statistics

public class DefaultStatistics
extends Object
implements Statistics
  • Method Details

    • sampleMean

      public DRes<SFixed> sampleMean(List<DRes<SFixed>> data)
      Description copied from interface: Statistics
      Compute the sample mean of the given data.
      Specified by:
      sampleMean in interface Statistics
      Parameters:
      data - A dataset.
      Returns:
      The sample mean.
    • sampleMedian

      public DRes<SFixed> sampleMedian(List<DRes<SFixed>> data)
      Description copied from interface: Statistics
      Compute the sample median of the given data.
      Specified by:
      sampleMedian in interface Statistics
      Parameters:
      data - Samples.
      Returns:
      The median.
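      For a plaintext picture of what this method computes, the sketch below finds the median of ordinary doubles in plain Java. PlainMedian is a hypothetical helper, not part of FRESCO, and it performs no secure computation. Averaging the two middle values when the number of samples is even is an assumed convention that may differ from the library's choice.

```java
import java.util.Arrays;

/** Hypothetical plaintext (non-secure) reference for the sample median; not FRESCO code. */
final class PlainMedian {

  private PlainMedian() {}

  /** Returns the median; for an even sample size the two middle values are averaged (assumed convention). */
  static double median(double[] data) {
    if (data.length == 0) {
      throw new IllegalArgumentException("median of an empty dataset");
    }
    // Sort a copy so the caller's array is left untouched.
    double[] sorted = Arrays.copyOf(data, data.length);
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    if (sorted.length % 2 == 1) {
      return sorted[mid];
    }
    return (sorted[mid - 1] + sorted[mid]) / 2.0;
  }
}
```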
    • samplePercentiles

      public DRes<List<DRes<SFixed>>> samplePercentiles(List<DRes<SFixed>> data, double[] percentiles)
      Description copied from interface: Statistics
      Compute the sample percentiles of a sample set.
      Specified by:
      samplePercentiles in interface Statistics
      Parameters:
      data - Samples.
      percentiles - The percentiles to compute.
      Returns:
      The sample percentiles, one for each entry in percentiles.
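      Sample percentiles have several competing definitions. One common choice is the nearest-rank method, sketched below on plain doubles. It is not necessarily the definition DefaultStatistics uses: PlainPercentiles is hypothetical, each percentile is assumed to be a fraction in [0, 1], and no secret sharing is involved.

```java
import java.util.Arrays;

/** Hypothetical plaintext helper computing nearest-rank percentiles; not FRESCO's protocol. */
final class PlainPercentiles {

  private PlainPercentiles() {}

  /** Each percentile is assumed to be a fraction in [0, 1]; returns one value per requested percentile. */
  static double[] percentiles(double[] data, double[] percentiles) {
    if (data.length == 0) {
      throw new IllegalArgumentException("percentiles of an empty dataset");
    }
    double[] sorted = Arrays.copyOf(data, data.length);
    Arrays.sort(sorted);
    double[] result = new double[percentiles.length];
    for (int j = 0; j < percentiles.length; j++) {
      double p = percentiles[j];
      if (p < 0.0 || p > 1.0) {
        throw new IllegalArgumentException("percentile out of range: " + p);
      }
      // Nearest rank: the 1-based rank ceil(p * n), clamped to at least 1.
      int rank = Math.max(1, (int) Math.ceil(p * sorted.length));
      result[j] = sorted[rank - 1];
    }
    return result;
  }
}
```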
    • sampleVariance

      public DRes<SFixed> sampleVariance(List<DRes<SFixed>> data, DRes<SFixed> mean)
      Description copied from interface: Statistics
      Compute the sample variance of the given data, assuming the sample mean has already been calculated.
      Specified by:
      sampleVariance in interface Statistics
      Parameters:
      data - A dataset.
      mean - The sample mean for the given dataset.
      Returns:
      The sample variance.
    • sampleVariance

      public DRes<SFixed> sampleVariance(List<DRes<SFixed>> data)
      Description copied from interface: Statistics
      Compute the sample variance of the given data.
      Specified by:
      sampleVariance in interface Statistics
      Parameters:
      data - A dataset.
      Returns:
      The sample variance.
    • sampleStandardDeviation

      public DRes<SFixed> sampleStandardDeviation(List<DRes<SFixed>> data)
      Description copied from interface: Statistics
      Compute the sample standard deviation of the given data.
      Specified by:
      sampleStandardDeviation in interface Statistics
      Parameters:
      data - A dataset.
      Returns:
      The sample standard deviation.
    • sampleStandardDeviation

      public DRes<SFixed> sampleStandardDeviation(List<DRes<SFixed>> data, DRes<SFixed> mean)
      Description copied from interface: Statistics
      Compute the sample standard deviation of the data given that the sample mean has already been calculated.
      Specified by:
      sampleStandardDeviation in interface Statistics
      Parameters:
      data - A dataset.
      mean - The sample mean for the given dataset.
      Returns:
      The sample standard deviation.
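      The variance and standard deviation overloads above can be mirrored in plaintext, which is useful for checking results in tests. The sketch assumes the usual unbiased estimator with denominator n - 1; the denominator used by DefaultStatistics is an assumption here. PlainVariance is a hypothetical helper that works on doubles instead of SFixed values.

```java
/** Hypothetical plaintext reference for the sample mean, variance and standard deviation. */
final class PlainVariance {

  private PlainVariance() {}

  static double mean(double[] data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.length;
  }

  /** Sample variance with denominator n - 1, given a precomputed mean (mirrors the two-argument overload). */
  static double variance(double[] data, double mean) {
    if (data.length < 2) {
      throw new IllegalArgumentException("need at least two samples");
    }
    double sumSq = 0.0;
    for (double x : data) {
      double d = x - mean;
      sumSq += d * d;
    }
    return sumSq / (data.length - 1);
  }

  static double variance(double[] data) {
    return variance(data, mean(data));
  }

  static double standardDeviation(double[] data, double mean) {
    return Math.sqrt(variance(data, mean));
  }

  static double standardDeviation(double[] data) {
    return standardDeviation(data, mean(data));
  }
}
```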
    • ttest

      public DRes<SFixed> ttest(List<DRes<SFixed>> data, DRes<SFixed> mu)
      Description copied from interface: Statistics
      Compute the test statistic for a one-sample Student's t-test of the hypothesis that the mean of the sample is equal to mu.
      Specified by:
      ttest in interface Statistics
      Parameters:
      data - A dataset.
      mu - The hypothesized mean to test against.
      Returns:
      The t-test statistic.
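      For reference, the textbook one-sample t statistic is t = (mean - mu) / (s / sqrt(n)), where s is the sample standard deviation. The plaintext sketch below computes it on ordinary doubles. PlainOneSampleT is a hypothetical name, and the n - 1 denominator in s is the textbook convention, assumed rather than confirmed for the library.

```java
/** Hypothetical plaintext reference for the one-sample Student's t statistic; not FRESCO code. */
final class PlainOneSampleT {

  private PlainOneSampleT() {}

  /** t = (mean - mu) / (s / sqrt(n)), with s computed using denominator n - 1. */
  static double tStatistic(double[] data, double mu) {
    int n = data.length;
    if (n < 2) {
      throw new IllegalArgumentException("need at least two samples");
    }
    double mean = 0.0;
    for (double x : data) {
      mean += x;
    }
    mean /= n;
    double sumSq = 0.0;
    for (double x : data) {
      sumSq += (x - mean) * (x - mean);
    }
    double variance = sumSq / (n - 1);
    return (mean - mu) / Math.sqrt(variance / n);
  }
}
```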
    • ttest

      public DRes<SFixed> ttest(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
      Description copied from interface: Statistics
      Compute the test statistic for a two-sample Student's t-test of the hypothesis that the means of the two samples are equal. The two samples are assumed to have the same variance.
      Specified by:
      ttest in interface Statistics
      Parameters:
      data1 - A dataset.
      data2 - A dataset.
      Returns:
      The test statistic for the hypothesis that the two datasets have the same mean.
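      With equal variances assumed, the textbook two-sample statistic uses the pooled variance sp^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2) and t = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2)). A plaintext sketch of that formula follows; PlainPooledT is hypothetical and says nothing about how the secure protocol is implemented.

```java
/** Hypothetical plaintext reference for the pooled two-sample t statistic; not FRESCO code. */
final class PlainPooledT {

  private PlainPooledT() {}

  private static double mean(double[] data) {
    double sum = 0.0;
    for (double x : data) {
      sum += x;
    }
    return sum / data.length;
  }

  private static double sumSquaredDeviations(double[] data, double mean) {
    double sumSq = 0.0;
    for (double x : data) {
      sumSq += (x - mean) * (x - mean);
    }
    return sumSq;
  }

  /** Two-sample t statistic under the equal-variance assumption. */
  static double tStatistic(double[] data1, double[] data2) {
    int n1 = data1.length;
    int n2 = data2.length;
    if (n1 < 2 || n2 < 2) {
      throw new IllegalArgumentException("each sample needs at least two values");
    }
    double mean1 = mean(data1);
    double mean2 = mean(data2);
    // Pooled estimate of the common variance.
    double pooled =
        (sumSquaredDeviations(data1, mean1) + sumSquaredDeviations(data2, mean2)) / (n1 + n2 - 2);
    return (mean1 - mean2) / Math.sqrt(pooled * (1.0 / n1 + 1.0 / n2));
  }
}
```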
    • chiSquare

      public DRes<SFixed> chiSquare(List<DRes<SInt>> observed, List<DRes<SFixed>> expected)
      Description copied from interface: Statistics
      Compute the test statistic for a χ²-test.
      Specified by:
      chiSquare in interface Statistics
      Parameters:
      observed - The observed number of observations in each bucket.
      expected - The expected number of observations in each bucket.
      Returns:
      The test statistic for the hypothesis that the observed data fits the expected distribution.
    • chiSquare

      public DRes<SFixed> chiSquare(List<DRes<SInt>> observed, double[] expected)
      Description copied from interface: Statistics
      Compute the test statistic for a χ²-test.
      Specified by:
      chiSquare in interface Statistics
      Parameters:
      observed - The observed number of observations in each bucket.
      expected - The expected number of observations in each bucket.
      Returns:
      The test statistic for the hypothesis that the observed data fits the expected distribution.
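      The standard Pearson χ² statistic is the sum over buckets of (O_i - E_i)^2 / E_i. The hypothetical PlainChiSquare below evaluates that textbook formula on plain numbers, with observed counts as long values and expected counts as doubles; that both chiSquare overloads compute exactly this form is an assumption.

```java
/** Hypothetical plaintext reference for Pearson's chi-squared statistic; not FRESCO code. */
final class PlainChiSquare {

  private PlainChiSquare() {}

  /** Sum over buckets of (observed - expected)^2 / expected. */
  static double chiSquare(long[] observed, double[] expected) {
    if (observed.length != expected.length) {
      throw new IllegalArgumentException("observed and expected must have the same length");
    }
    double statistic = 0.0;
    for (int i = 0; i < observed.length; i++) {
      if (expected[i] <= 0.0) {
        throw new IllegalArgumentException("expected counts must be positive");
      }
      double diff = observed[i] - expected[i];
      statistic += diff * diff / expected[i];
    }
    return statistic;
  }
}
```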
    • linearRegression

      public DRes<LinearRegression.LinearRegressionResult> linearRegression(List<ArrayList<DRes<SFixed>>> x, ArrayList<DRes<SFixed>> y)
      Description copied from interface: Statistics
      Compute estimates of the parameters b = (b0, ..., bk) of a linear model b0 x0 + ... + bk xk = y.
      Specified by:
      linearRegression in interface Statistics
      Parameters:
      x - The values of the independent variables.
      y - The dependent values.
      Returns:
      An estimate of the parameters of a linear model for the given data.
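      One plaintext way to obtain such estimates is ordinary least squares via the normal equations (X^T X) b = X^T y, solved below by Gaussian elimination. This is only an illustration: PlainLinearRegression is hypothetical, each row of x is assumed to be one observation, an intercept has to be modelled by adding a column of ones, and the secure implementation may use a different method.

```java
/** Hypothetical plaintext least-squares fit via the normal equations; not FRESCO code. */
final class PlainLinearRegression {

  private PlainLinearRegression() {}

  /** Solves (X^T X) b = X^T y, where each row of x is one observation (assumed layout). */
  static double[] fit(double[][] x, double[] y) {
    int n = x.length;
    if (n == 0 || n != y.length) {
      throw new IllegalArgumentException("x and y must have the same non-zero length");
    }
    int k = x[0].length;
    // Build the augmented system [X^T X | X^T y].
    double[][] a = new double[k][k + 1];
    for (int r = 0; r < n; r++) {
      if (x[r].length != k) {
        throw new IllegalArgumentException("all rows of x must have the same length");
      }
      for (int i = 0; i < k; i++) {
        for (int j = 0; j < k; j++) {
          a[i][j] += x[r][i] * x[r][j];
        }
        a[i][k] += x[r][i] * y[r];
      }
    }
    // Forward elimination with partial pivoting.
    for (int col = 0; col < k; col++) {
      int pivot = col;
      for (int row = col + 1; row < k; row++) {
        if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
          pivot = row;
        }
      }
      if (Math.abs(a[pivot][col]) < 1e-12) {
        throw new ArithmeticException("X^T X is singular");
      }
      double[] tmp = a[col];
      a[col] = a[pivot];
      a[pivot] = tmp;
      for (int row = col + 1; row < k; row++) {
        double factor = a[row][col] / a[col][col];
        for (int c = col; c <= k; c++) {
          a[row][c] -= factor * a[col][c];
        }
      }
    }
    // Back substitution.
    double[] b = new double[k];
    for (int i = k - 1; i >= 0; i--) {
      double s = a[i][k];
      for (int j = i + 1; j < k; j++) {
        s -= a[i][j] * b[j];
      }
      b[i] = s / a[i][i];
    }
    return b;
  }
}
```

      In this sketch the first column of x is set to 1 to obtain an intercept, so fitting y = 1 + 2t gives b close to {1, 2}.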
    • simpleLinearRegression

      public DRes<SimpleLinearRegression.SimpleLinearRegressionResult> simpleLinearRegression(List<DRes<SFixed>> x, List<DRes<SFixed>> y)
      Description copied from interface: Statistics
      Compute a simple linear regression of y on x.
      Specified by:
      simpleLinearRegression in interface Statistics
      Parameters:
      x - The values of the independent variable.
      y - The dependent values.
      Returns:
      An estimate of the parameters of a linear model.
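      In plaintext, simple linear regression has the closed form slope = sum((x_i - mean(x)) (y_i - mean(y))) / sum((x_i - mean(x))^2) and intercept = mean(y) - slope * mean(x). The hypothetical sketch below returns {intercept, slope}; the fields of SimpleLinearRegressionResult are not documented here, so this pairing is an assumption.

```java
/** Hypothetical plaintext reference for simple linear regression; not FRESCO code. */
final class PlainSimpleLinearRegression {

  private PlainSimpleLinearRegression() {}

  /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
  static double[] fit(double[] x, double[] y) {
    int n = x.length;
    if (n != y.length || n < 2) {
      throw new IllegalArgumentException("x and y must have the same length, at least 2");
    }
    double meanX = 0.0;
    double meanY = 0.0;
    for (int i = 0; i < n; i++) {
      meanX += x[i];
      meanY += y[i];
    }
    meanX /= n;
    meanY /= n;
    double sxy = 0.0;
    double sxx = 0.0;
    for (int i = 0; i < n; i++) {
      double dx = x[i] - meanX;
      sxy += dx * (y[i] - meanY);
      sxx += dx * dx;
    }
    double slope = sxy / sxx;
    return new double[] {meanY - slope * meanX, slope};
  }
}
```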
    • correlation

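      public DRes<SFixed> correlation(List<DRes<SFixed>> data1, DRes<SFixed> mean1, List<DRes<SFixed>> data2, DRes<SFixed> mean2)
      Description copied from interface: Statistics
      Compute Pearson's correlation coefficient of the two samples, assuming that the sample means have already been calculated.
      Specified by:
      correlation in interface Statistics
      Parameters:
      data1 - A dataset.
      mean1 - The sample mean of data1.
      data2 - A dataset of the same size as data1.
      mean2 - The sample mean of data2.
      Returns:
      Pearson's correlation coefficient of the two samples.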
    • correlation

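      public DRes<SFixed> correlation(List<DRes<SFixed>> data1, List<DRes<SFixed>> data2)
      Description copied from interface: Statistics
      Compute Pearson's correlation coefficient of the two samples.
      Specified by:
      correlation in interface Statistics
      Parameters:
      data1 - A dataset.
      data2 - A dataset of the same size as data1.
      Returns:
      Pearson's correlation coefficient of the two samples.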
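      Pearson's coefficient is r = sum((a_i - mean(a)) (b_i - mean(b))) / sqrt(sum((a_i - mean(a))^2) * sum((b_i - mean(b))^2)). The sketch below mirrors the two correlation overloads on plain doubles; PlainCorrelation is a hypothetical helper, not FRESCO code.

```java
/** Hypothetical plaintext reference for Pearson's correlation coefficient. */
final class PlainCorrelation {

  private PlainCorrelation() {}

  static double mean(double[] data) {
    double sum = 0.0;
    for (double v : data) {
      sum += v;
    }
    return sum / data.length;
  }

  /** Mirrors the overload that takes precomputed sample means. */
  static double correlation(double[] data1, double mean1, double[] data2, double mean2) {
    if (data1.length != data2.length || data1.length < 2) {
      throw new IllegalArgumentException("samples must have the same length, at least 2");
    }
    double sumXY = 0.0;
    double sumXX = 0.0;
    double sumYY = 0.0;
    for (int i = 0; i < data1.length; i++) {
      double dx = data1[i] - mean1;
      double dy = data2[i] - mean2;
      sumXY += dx * dy;
      sumXX += dx * dx;
      sumYY += dy * dy;
    }
    return sumXY / Math.sqrt(sumXX * sumYY);
  }

  static double correlation(double[] data1, double[] data2) {
    return correlation(data1, mean(data1), data2, mean(data2));
  }
}
```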
    • ffest

      public DRes<SFixed> ffest(List<List<DRes<SFixed>>> observed)
      Description copied from interface: Statistics
      Compute the F-test statistic for the null hypothesis that the given datasets have the same mean.
      Specified by:
      ffest in interface Statistics
      Parameters:
      observed - A list of datasets.
      Returns:
      The F-test statistic.
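      Assuming the usual one-way ANOVA definition, the F statistic is the between-group mean square divided by the within-group mean square, F = (SSB / (k - 1)) / (SSW / (N - k)) for k groups and N observations in total. A plaintext sketch follows; PlainFTest is hypothetical and works on ordinary doubles.

```java
/** Hypothetical plaintext reference for the one-way ANOVA F statistic; not FRESCO code. */
final class PlainFTest {

  private PlainFTest() {}

  static double fStatistic(double[][] groups) {
    int k = groups.length;
    int n = 0;
    double grandSum = 0.0;
    for (double[] group : groups) {
      if (group.length == 0) {
        throw new IllegalArgumentException("empty group");
      }
      n += group.length;
      for (double v : group) {
        grandSum += v;
      }
    }
    if (k < 2 || n <= k) {
      throw new IllegalArgumentException("need at least two groups and more observations than groups");
    }
    double grandMean = grandSum / n;
    double ssBetween = 0.0;
    double ssWithin = 0.0;
    for (double[] group : groups) {
      double groupMean = 0.0;
      for (double v : group) {
        groupMean += v;
      }
      groupMean /= group.length;
      // Between-group and within-group sums of squares.
      ssBetween += group.length * (groupMean - grandMean) * (groupMean - grandMean);
      for (double v : group) {
        ssWithin += (v - groupMean) * (v - groupMean);
      }
    }
    return (ssBetween / (k - 1)) / (ssWithin / (n - k));
  }
}
```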
    • kruskallWallisTest

      public DRes<SFixed> kruskallWallisTest(List<List<DRes<SFixed>>> observed)
      Description copied from interface: Statistics
      Compute the Kruskal-Wallis test statistic for the null hypothesis that the given samples are drawn from the same distribution.
      Specified by:
      kruskallWallisTest in interface Statistics
      Parameters:
      observed - A list of samples.
      Returns:
      The Kruskal-Wallis test statistic.
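      For reference, the Kruskal-Wallis statistic is H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), where R_i is the rank sum of group i over the pooled data. The plaintext sketch below gives tied values their average rank and omits the tie-correction factor. PlainKruskalWallis is hypothetical, and how the library handles ties is not documented here.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical plaintext reference for the Kruskal-Wallis H statistic (no tie correction). */
final class PlainKruskalWallis {

  private PlainKruskalWallis() {}

  static double hStatistic(double[][] groups) {
    if (groups.length < 2) {
      throw new IllegalArgumentException("need at least two groups");
    }
    int total = 0;
    for (double[] group : groups) {
      if (group.length == 0) {
        throw new IllegalArgumentException("empty group");
      }
      total += group.length;
    }
    // Pool all observations and remember which group each one came from.
    double[] values = new double[total];
    int[] groupOf = new int[total];
    int next = 0;
    for (int g = 0; g < groups.length; g++) {
      for (double v : groups[g]) {
        values[next] = v;
        groupOf[next] = g;
        next++;
      }
    }
    // Order the pooled observations by value.
    Integer[] order = new Integer[total];
    for (int pos = 0; pos < total; pos++) {
      order[pos] = pos;
    }
    Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
    // Sum ranks per group, giving tied values the average of their 1-based ranks.
    double[] rankSums = new double[groups.length];
    int start = 0;
    while (start < total) {
      int end = start;
      while (end + 1 < total && values[order[end + 1]] == values[order[start]]) {
        end++;
      }
      double averageRank = (start + end + 2) / 2.0;
      for (int t = start; t <= end; t++) {
        rankSums[groupOf[order[t]]] += averageRank;
      }
      start = end + 1;
    }
    double sum = 0.0;
    for (int g = 0; g < groups.length; g++) {
      sum += rankSums[g] * rankSums[g] / groups[g].length;
    }
    return 12.0 / (total * (total + 1.0)) * sum - 3.0 * (total + 1);
  }
}
```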
    • leakyFrequencyTable

      public DRes<List<Pair<DRes<SInt>,Integer>>> leakyFrequencyTable(List<DRes<SInt>> data)
      Description copied from interface: Statistics
      Compute a frequency table for the data. Note that the frequencies are leaked (revealed to all parties), but the corresponding values are not.
      Specified by:
      leakyFrequencyTable in interface Statistics
      Parameters:
      data - A dataset.
      Returns:
      A frequency table as a list of pairs, each holding a secret value and its public frequency.
    • frequencyTable

      public DRes<List<Pair<BigInteger,Integer>>> frequencyTable(List<DRes<SInt>> data)
      Description copied from interface: Statistics
      Compute a frequency table for the data.
      Specified by:
      frequencyTable in interface Statistics
      Parameters:
      data - A dataset.
      Returns:
      A frequency table as a list of pairs, each holding an opened value and its frequency.
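      In plaintext, a frequency table is simply a count per distinct value, as in the hypothetical sketch below. It corresponds to the fully opened frequencyTable variant; leakyFrequencyTable instead keeps the values secret and reveals only the counts, which a plaintext sketch cannot show. The ordering of the library's output list is not documented here, so sorting by value is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical plaintext frequency table; not FRESCO code. */
final class PlainFrequencyTable {

  private PlainFrequencyTable() {}

  /** Maps each distinct value to its number of occurrences, ordered by value. */
  static Map<Long, Integer> frequencyTable(long[] data) {
    Map<Long, Integer> table = new TreeMap<>();
    for (long value : data) {
      table.merge(value, 1, Integer::sum);
    }
    return table;
  }
}
```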
    • coxRegressionDiscrete

      public DRes<List<DRes<SFixed>>> coxRegressionDiscrete(List<SurvivalInfoDiscrete> data, int iterations, double alpha, double[] beta)
      Description copied from interface: Statistics
      Estimate the parameters of a Cox model on the given data. Here it's assumed that each covariate only takes values in a (small) finite set, e.g. when they indicate group membership. If many different values are possible, use Statistics.coxRegressionContinuous(java.util.List<dk.alexandra.fresco.stat.survival.SurvivalInfoContinuous>, int, double, double[]) instead.
      Specified by:
      coxRegressionDiscrete in interface Statistics
      Parameters:
      data - The dataset.
      iterations - The number of iterations.
      alpha - The learning rate.
      beta - The initial coefficient guess.
      Returns:
      The estimated coefficients of the Cox model.
    • coxRegressionContinuous

      public DRes<List<DRes<SFixed>>> coxRegressionContinuous(List<SurvivalInfoContinuous> data, int iterations, double alpha, double[] beta)
      Description copied from interface: Statistics
      Estimate the parameters of a Cox model on the given data.
      Specified by:
      coxRegressionContinuous in interface Statistics
      Parameters:
      data - The dataset.
      iterations - The number of iterations.
      alpha - The learning rate.
      beta - The initial coefficient guess.
      Returns:
      The estimated coefficients of the Cox model.
    • histogramDiscrete

      public DRes<List<DRes<SInt>>> histogramDiscrete(int[] buckets, List<DRes<SInt>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given sample.
      Specified by:
      histogramDiscrete in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
    • histogramDiscrete

      public DRes<List<DRes<SInt>>> histogramDiscrete(List<DRes<SInt>> buckets, List<DRes<SInt>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given sample.
      Specified by:
      histogramDiscrete in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
    • histogramContinuous

      public DRes<List<DRes<SInt>>> histogramContinuous(List<DRes<SFixed>> buckets, List<DRes<SFixed>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given sample.
      Specified by:
      histogramContinuous in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
    • histogramContinuous

      public DRes<List<DRes<SInt>>> histogramContinuous(double[] buckets, List<DRes<SFixed>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given sample.
      Specified by:
      histogramContinuous in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram.
      data - The sample data.
      Returns:
      The number of samples in each bucket.
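      The sketch below shows one plausible reading of "upper bounds" for the one-dimensional histograms: bucket i counts the values v with buckets[i-1] < v <= buckets[i]. Whether the bounds are inclusive and whether values above the last bound get an overflow bucket are assumptions here (the sketch adds one). PlainHistogram is hypothetical and works on plain ints rather than secret-shared values.

```java
/** Hypothetical plaintext histogram with inclusive upper bounds; not FRESCO code. */
final class PlainHistogram {

  private PlainHistogram() {}

  /**
   * Bucket 0 counts v <= buckets[0], bucket i counts buckets[i-1] < v <= buckets[i],
   * and the extra last bucket counts values above every bound (assumed convention).
   */
  static int[] histogram(int[] buckets, int[] data) {
    for (int i = 1; i < buckets.length; i++) {
      if (buckets[i] <= buckets[i - 1]) {
        throw new IllegalArgumentException("bucket bounds must be strictly increasing");
      }
    }
    int[] counts = new int[buckets.length + 1];
    for (int v : data) {
      // Find the first bound that is >= v.
      int b = 0;
      while (b < buckets.length && v > buckets[b]) {
        b++;
      }
      counts[b]++;
    }
    return counts;
  }
}
```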
    • twoDimensionalHistogramDiscrete

      public DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramDiscrete(Pair<List<DRes<SInt>>,List<DRes<SInt>>> buckets, List<Pair<DRes<SInt>,DRes<SInt>>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given two-dimensional sample.
      Specified by:
      twoDimensionalHistogramDiscrete in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram, one list for each dimension.
      data - The sample data.
      Returns:
      A matrix with the number of samples in each pair of buckets.
    • twoDimensionalHistogramContinuous

      public DRes<Matrix<DRes<SInt>>> twoDimensionalHistogramContinuous(Pair<List<DRes<SFixed>>,List<DRes<SFixed>>> buckets, List<Pair<DRes<SFixed>,DRes<SFixed>>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given two-dimensional sample.
      Specified by:
      twoDimensionalHistogramContinuous in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram, one list for each dimension.
      data - The sample data.
      Returns:
      A matrix with the number of samples in each pair of buckets.
    • multiDimensionalHistogramDiscrete

      public DRes<MultiDimensionalArray<DRes<SInt>>> multiDimensionalHistogramDiscrete(List<List<DRes<SInt>>> buckets, Matrix<DRes<SInt>> data)
      Description copied from interface: Statistics
      Compute the histogram for the given multi-dimensional sample.
      Specified by:
      multiDimensionalHistogramDiscrete in interface Statistics
      Parameters:
      buckets - Upper bounds for the buckets to use in the histogram, one list for each dimension.
      data - The sample data.
      Returns:
      A multi-dimensional array with the number of samples in each combination of buckets.
    • kAnonymize

      public DRes<MultiDimensionalArray<List<DRes<SInt>>>> kAnonymize(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
      Description copied from interface: Statistics
      Compute a k-anonymized version of the given dataset.

      Each row in the dataset contains the quasi-identifiers of one individual, with a corresponding entry in the list of sensitive attribute values. The buckets indicate the desired generalization of the quasi-identifiers, as in a histogram, and k is the smallest allowed number of individuals in each bucket.

      The output is a histogram over the given buckets in which each value is a list of size data.getHeight(). A non-zero entry x at index i indicates that the data point in row i is in this bucket and that its sensitive attribute is x.

      Specified by:
      kAnonymize in interface Statistics
      Parameters:
      data - The quasi-identifiers for each individual.
      sensitiveAttributes - The corresponding sensitive attributes. Must be non-zero.
      buckets - The buckets defining the desired generalization.
      k - The smallest allowed number of individuals in each bucket.
      Returns:
      A k-anonymous dataset in which all buckets with fewer than k elements are suppressed.
    • kAnonymizeAndOpen

      public DRes<MultiDimensionalArray<List<BigInteger>>> kAnonymizeAndOpen(Matrix<DRes<SInt>> data, List<DRes<SInt>> sensitiveAttributes, List<List<DRes<SInt>>> buckets, int k)
      Description copied from interface: Statistics
      Compute a k-anonymized version of the given dataset and open it to all parties.

      Each row in the dataset contains the quasi-identifiers of one individual, with a corresponding entry in the list of sensitive attribute values. The buckets indicate the desired generalization of the quasi-identifiers, as in a histogram, and k is the smallest allowed number of individuals in each bucket.

      The output is a histogram over the given buckets in which the value for each bucket is the list of sensitive attributes from the original dataset that ended up in this bucket.

      Specified by:
      kAnonymizeAndOpen in interface Statistics
      Parameters:
      data - The quasi-identifiers for each individual.
      sensitiveAttributes - The corresponding sensitive attributes. Must be non-zero.
      buckets - The buckets defining the desired generalization.
      k - The smallest allowed number of individuals in each bucket.
      Returns:
      A k-anonymous dataset in which all buckets with fewer than k elements are suppressed.
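      A plaintext sketch of the same idea: each quasi-identifier is mapped to a bucket index, rows are grouped by the resulting combination of buckets, and groups with fewer than k rows are suppressed. PlainKAnonymize is hypothetical. It assumes bucket membership uses inclusive upper bounds with an overflow bucket, keeps only bucket combinations that actually occur (the library returns a full multi-dimensional array), and represents a suppressed bucket as an empty list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical plaintext k-anonymization by bucketing and suppression; not FRESCO code. */
final class PlainKAnonymize {

  private PlainKAnonymize() {}

  /** Index of the first upper bound >= v, or bounds.length for the overflow bucket (assumed). */
  private static int bucketIndex(int[] bounds, int v) {
    int b = 0;
    while (b < bounds.length && v > bounds[b]) {
      b++;
    }
    return b;
  }

  /** Maps each occurring bucket combination to its sensitive attributes; buckets with < k rows are emptied. */
  static Map<List<Integer>, List<Integer>> kAnonymize(
      int[][] data, int[] sensitive, int[][] buckets, int k) {
    if (data.length != sensitive.length) {
      throw new IllegalArgumentException("one sensitive attribute per row is required");
    }
    Map<List<Integer>, List<Integer>> groups = new HashMap<>();
    for (int row = 0; row < data.length; row++) {
      if (data[row].length != buckets.length) {
        throw new IllegalArgumentException("one list of bounds per quasi-identifier is required");
      }
      // Generalize each quasi-identifier to its bucket index.
      List<Integer> key = new ArrayList<>();
      for (int col = 0; col < buckets.length; col++) {
        key.add(bucketIndex(buckets[col], data[row][col]));
      }
      groups.computeIfAbsent(key, unused -> new ArrayList<>()).add(sensitive[row]);
    }
    // Suppress every bucket that holds fewer than k individuals.
    for (List<Integer> values : groups.values()) {
      if (values.size() < k) {
        values.clear();
      }
    }
    return groups;
  }
}
```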