Provided by: libgenome-model-tools-music-perl_0.04-5_all

NAME

       CombinePvals - combining probabilities from independent tests of significance into a single aggregate
       figure

SYNOPSIS

               use CombinePvals;

               my $obj = CombinePvals->new ($reference_to_list_of_pvals);

               my $pval = $obj->method_name;

               my $pval = $obj->method_name (@arguments);

DESCRIPTION

       There are a variety of circumstances under which one might have a number of different kinds of tests
       and/or separate instances of the same kind of test for one particular null hypothesis, where each of
       these tests returns a p-value.  The problem is how to properly condense this list of probabilities into a
       single value so as to be able to make a statistical inference, e.g. whether to reject the null
       hypothesis.  This problem was examined heavily starting about the 1930s, during which time numerous
       mathematical contingencies were treated, e.g. dependence vs. independence of tests, optimality,
       inter-test weighting, computational efficiency, continuous vs. discrete tests and combinations thereof, etc.
       There is quite a large mathematical literature on this topic (see "REFERENCES" below) and any one
       particular situation might incur some of the above subtleties.  This package concentrates on some of the
       more straightforward scenarios, furnishing various methods for combining p-vals.  The main consideration
       will usually be the trade-off between the exactness of the p-value (according to strict frequentist
       modeling) and the computational efficiency, or even its actual feasibility.  Tests should be chosen with
       this factor in mind.

       Note also that this scenario of combining p-values (many tests of a single hypothesis) is fundamentally
       different from that where a given hypothesis is tested multiple times.  The latter instance usually calls
       for some method of multiple testing correction.

REFERENCES

       Here is an abbreviated list of the substantive works on the topic of combining probabilities.

       •   Birnbaum,  A. (1954) Combining Independent Tests of Significance, Journal of the American Statistical
           Association 49(267), 559-574.

       •   David, F. N. and Johnson, N. L. (1950) The Probability Integral Transformation When the  Variable  is
           Discontinuous, Biometrika 37(1/2), 42-49.

       •   Fisher, R. A. (1958) Statistical Methods for Research Workers, 13th Ed. Revised, Hafner Publishing
           Co., New York.

       •   Lancaster, H. O. (1949) The Combination of Probabilities Arising from Data in Discrete Distributions,
           Biometrika 36(3/4), 370-382.

       •   Littell, R. C. and Folks, J.  L.  (1971)  Asymptotic  Optimality  of  Fisher's  Method  of  Combining
           Independent Tests, Journal of the American Statistical Association 66(336), 802-806.

       •   Pearson, E. S. (1938) The Probability Integral Transformation for Testing Goodness of Fit and
           Combining Independent Tests of Significance, Biometrika 30(1/2), 134-148.

       •   Pearson, E. S. (1950) On Questions Raised by the Combination of Tests Based on Discontinuous
           Distributions, Biometrika 37(3/4), 383-398.

       •   Pearson, K. (1933) On a Method of Determining Whether a Sample of Size N Supposed to Have Been Drawn
           From a Parent Population Having a Known Probability Integral Has Probably Been Drawn at Random,
           Biometrika 25(3/4), 379-410.

       •   Van Valen, L. (1964) Combining the Probabilities from Significance Tests, Nature 201(4919), 642.

       •   Wallis,  W.  A.  (1942)  Compounding  Probabilities from Independent Significance Tests, Econometrica
           10(3/4), 229-248.

       •   Zelen, M. and Joel, L. S. (1959) The Weighted Compounding  of  Two  Independent  Significance  Tests,
           Annals of Mathematical Statistics 30(4), 885-895.

AUTHOR

       Michael C. Wendl

       mwendl@wustl.edu

       Copyright (C) 2009 Washington University

       This  program  is  free  software;  you  can  redistribute it and/or modify it under the terms of the GNU
       General Public License as published by the Free Software Foundation; either version 2 of the License,  or
       (at your option) any later version.

       This  program  is  distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even
       the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General  Public
       License for more details.

       You  should have received a copy of the GNU General Public License along with this program; if not, write
       to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

GENERAL REMARKS ON METHODS

       The available methods are listed below.  Each of the computational techniques assumes that the tests,
       as well as their associated p-values, are independent of one another, and none considers any form of
       differential weighting.

CONSTRUCTOR METHODS

       These methods return an object in the CombinePvals class.

   new
       This is the usual object constructor, which takes a mandatory reference to a list of the p-values
       obtained by a set of independent tests.  The order of the list is not significant.

               my $obj = CombinePvals->new ([0.103, 0.078, 0.03, 0.2,...]);

       The method checks to make sure that all elements are actual p-values, i.e. they are real numbers and they
       have values bounded by 0 and 1.
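
       The check described above can be approximated by the following stand-alone sketch.  It is not the
       module's own code: the helper names is_valid_pval and check_pvals are hypothetical, and the use of
       Scalar::Util is an assumption made for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of CombinePvals): true if the value is a
# real number in the closed interval [0, 1].
sub is_valid_pval {
    my ($p) = @_;
    return 0 unless defined $p && looks_like_number($p);
    return ($p >= 0 && $p <= 1) ? 1 : 0;
}

# Hypothetical helper: die on the first element that is not a p-value.
sub check_pvals {
    my ($pvals) = @_;
    die "expected an array reference\n" unless ref $pvals eq 'ARRAY';
    for my $i (0 .. $#{$pvals}) {
        die "element $i is not a valid p-value\n"
            unless is_valid_pval($pvals->[$i]);
    }
    return 1;
}
```

       Running such a check before calling the constructor gives an early, explicit error for inputs such as
       [0.1, 2].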

EXACT ENUMERATIVE PROCEDURES FOR STRICTLY DISCRETE DISTRIBUTIONS

       When  all  the  individual  p-vals are derived from tests based on discrete distributions, the "standard"
       continuum methods cannot be used in the strictest sense.  Both Wallis (1942) and Lancaster (1949) discuss
       the option of full enumeration, which will only be feasible when there are a limited number  of  p-values
       and  their  range  is  not  too large.  Feasibility experiments are suggested, depending upon the type of
       hardware and size of calculation.

   exact_enum_arbitrary
       This routine is designed for combining p-values from completely arbitrary discrete probability
       distributions.  It takes a list-of-lists data structure, each list being the probability tails ordered
       from most extreme to least extreme (i.e. as a cumulative distribution function) associated with
       each individual test.  However, the ordering of the lists themselves is not important.  For instance,
       Wallis (1942) gives the example of two binomials, a one-tailed test having tail values of 0.0625, 0.3125,
       0.6875, 0.9375, and 1, and a two-tailed test having tail values 0.125, 0.625, and 1.  We would then  call
       this method using

               my $pval = $obj->exact_enum_arbitrary (
                  [0.0625, 0.3125, 0.6875, 0.9375, 1],
                  [0.125, 0.625, 1]
               );

       The internal computational method is relatively straightforward and is described in detail by Wallis
       (1942).  Note that this method does "all-by-all" multiplication, so it is the least efficient, although
       entirely exact.
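
       The "all-by-all" enumeration can be illustrated with the stand-alone sketch below.  It is not the
       module's code.  It assumes that the combined statistic is the product of the observed p-values and that
       the combined p-value is the total probability of all outcomes whose product is no larger than the
       observed one.  The name exact_enum_sketch and its argument layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-alone sketch, not the module's code.
#   $observed : array ref of observed p-values, one per test
#   $tails    : array ref of array refs of tail values, each ordered from
#               most extreme to least extreme, in the same test order
sub exact_enum_sketch {
    my ($observed, $tails) = @_;
    my $eps = 1e-12;    # tolerance for floating-point ties

    my $target = 1;
    $target *= $_ for @{$observed};

    # Each state is [product of tail values, joint probability].
    my @states = ([1, 1]);
    for my $dist (@{$tails}) {
        my @next;
        my $prev = 0;
        for my $tail (@{$dist}) {
            my $point = $tail - $prev;    # probability of this outcome
            $prev = $tail;
            push @next, map { [$_->[0] * $tail, $_->[1] * $point] } @states;
        }
        @states = @next;
    }

    # Sum the probability of every outcome at least as extreme.
    my $pval = 0;
    for my $s (@states) {
        $pval += $s->[1] if $s->[0] <= $target * (1 + $eps);
    }
    return $pval;
}
```

       For the two binomials above, with illustrative observed values of 0.3125 and 0.625 (chosen here, not
       taken from Wallis), the sketch enumerates 5 x 3 = 15 outcomes.  The number of outcomes is the product of
       the list lengths, which is why the approach becomes infeasible quickly.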

   exact_enum_identical
       This  routine  is  designed  for  combining  a  set  of  p-values that all come from a single probability
       distribution.

               NOT IMPLEMENTED YET

TRANSFORMS FOR CONTINUOUS DISTRIBUTIONS

       The mathematical literature furnishes several straightforward options for combining p-vals if all of  the
       distributions underlying all of the individual tests are continuous.

   fisher_chisq_transform
       This routine implements R.A. Fisher's (1958, originally 1932) chi-square transform method for combining
       p-vals from continuous distributions, which is essentially a CPU-efficient approximation of K. Pearson's
       log-based result (see e.g. Wallis (1942), p. 232).  Note that the underlying distributions are not
       actually relevant, so no arguments are passed.

               my $pval = $obj->fisher_chisq_transform;

       This is certainly the fastest and easiest method for combining p-vals, but its accuracy for discrete
       distributions will not usually be very good.  For such cases, an exact or a corrected method is a
       better choice.
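
       As an illustration of the transform itself, the sketch below computes Fisher's statistic, -2 times
       the sum of ln(p) over the k p-values, and refers it to a chi-square distribution with 2k degrees of
       freedom.  Because the degrees of freedom are even, the upper tail has a closed form and no statistics
       library is needed.  The name fisher_sketch is hypothetical, and this is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-alone sketch, not the module's code.
sub fisher_sketch {
    my @pvals = @_;
    return 0 if grep { $_ == 0 } @pvals;    # log(0) is undefined

    # Half of Fisher's statistic: X/2 = -sum(ln p_i)
    my $half = 0;
    $half -= log($_) for @pvals;

    # Upper tail of a chi-square with 2k degrees of freedom:
    #   exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    my ($term, $sum) = (1, 1);
    for my $j (1 .. $#pvals) {
        $term *= $half / $j;
        $sum  += $term;
    }
    return exp(-$half) * $sum;
}
```

       With a single p-value the sketch returns that p-value unchanged, which is a convenient sanity check.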

CORRECTION PROCEDURES FOR DISCRETE DISTRIBUTIONS: LANCASTER'S MODELS

       Enumerative procedures quickly become infeasible if the number of tests and/or the support of each test
       grow large.  A number of procedures have been described for correcting the methodologies designed for
       continuum testing, mostly in the context of applying so-called continuity corrections.  Essentially,
       these seek to "spread" discrete data out into a pseudo-continuous configuration as appropriately as
       possible, and then apply standard transforms.  Accuracy varies and should be suitably established in each
       case.

       The methods in this section are due to H.O. Lancaster (1949), who discussed two  corrections  based  upon
       the  idea  of  describing  how a chi-square transformed statistic varies between the points of a discrete
       distribution.  Unfortunately, these methods require one to pass some extra information to  the  routines,
       i.e.  not  only the CDF (the p-val of each test), but the CDF value associated with the next-most-extreme
       statistic.  These two pieces of  information  are  the  basis  of  interpolating.   For  example,  if  an
       underlying  distribution  has  the possible tail values of 0.0625, 0.3125, 0.6875, 0.9375, 1 and the test
       itself has a value of 0.6875, then you would pass both 0.3125 and 0.6875 to the routine.  In all cases,
       the lower value, i.e. the more extreme one, precedes the higher value in the argument list.  While there
       generally will be some extra inconvenience in obtaining this information, the accuracy is  much  improved
       over Fisher's method.
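
       Looking up the two values for one test can be sketched as follows.  The helper cdf_pair is
       hypothetical and not part of this package.  Using 0 as the lower value when the observed statistic is
       the most extreme one is an assumption of the sketch.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CombinePvals.
#   $tails    : array ref of tail values, most extreme first
#   $observed : the observed p-value, which must appear in @$tails
# Returns (next-most-extreme value, observed value).
sub cdf_pair {
    my ($tails, $observed) = @_;
    my $eps = 1e-12;    # tolerance for floating-point comparison
    for my $i (0 .. $#{$tails}) {
        next unless abs($tails->[$i] - $observed) < $eps;
        my $lower = $i > 0 ? $tails->[$i - 1] : 0;    # assumed 0 at the extreme
        return ($lower, $tails->[$i]);
    }
    die "observed value $observed is not among the tail values\n";
}
```

       For the distribution in the example above, cdf_pair applied to an observed value of 0.6875 returns
       0.3125 and 0.6875, in that order.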

   lancaster_mean_corrected_transform
       This method is based on the mean value of the chi-squared transformed statistic.

               my $pval = $obj->lancaster_mean_corrected_transform (@cdf_pairs);

       Its  accuracy  is  good,  but  the method is not strictly defined if one of the tests has either the most
       extreme or second-to-most-extreme statistic.
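
       One way to read the mean correction, stated here as an assumption rather than as the module's code:
       if the p-value is treated as uniformly distributed between the next-most-extreme tail value a and the
       observed tail value b, the mean of -2 ln(u) over that interval is 2 - 2 (b ln b - a ln a) / (b - a).
       The hypothetical helper below evaluates that expression and rejects a = 0, where ln(a) cannot be
       evaluated.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code. Mean of -2*ln(u) for u
# uniform on ($lo, $hi):
#   2 - 2 * ($hi*ln($hi) - $lo*ln($lo)) / ($hi - $lo)
sub mean_chisq_sketch {
    my ($lo, $hi) = @_;
    die "need 0 < lo < hi <= 1\n"
        unless $lo > 0 && $lo < $hi && $hi <= 1;
    return 2 - 2 * ($hi * log($hi) - $lo * log($lo)) / ($hi - $lo);
}
```

       The result always lies between -2 ln(b) and -2 ln(a), i.e. between the chi-square transforms of the
       two tail values.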

   lancaster_median_corrected_transform
       This method is based on the median value of the chi-squared transformed statistic.

               my $pval = $obj->lancaster_median_corrected_transform (@cdf_pairs);

       Its accuracy may sometimes be not quite as good as that of the mean correction, but the method is
       strictly defined for all values of the statistic.
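
       Under the same reading, and again as an assumption rather than the module's code, the median of
       -2 ln(u) for u uniform between the two tail values a and b is -2 ln((a + b) / 2).  The hypothetical
       helper below evaluates it.  It remains finite when a = 0.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code. Median of -2*ln(u) for u
# uniform on ($lo, $hi) is the transform of the interval midpoint.
sub median_chisq_sketch {
    my ($lo, $hi) = @_;
    die "need 0 <= lo < hi <= 1\n"
        unless $lo >= 0 && $lo < $hi && $hi <= 1;
    return -2 * log(($lo + $hi) / 2);
}
```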

   lancaster_mixed_corrected_transform
       This  method  is  a  mixture  of both the mean and median methods.  Specifically, mean correction is used
       wherever it is well-defined, otherwise median correction is used.

               my $pval = $obj->lancaster_mixed_corrected_transform (@cdf_pairs);

       This is a good choice when mean correction is undefined for some, but not all, of the tests.

   additional methods
       The basic functionality of this package is encompassed in the methods  described  above.   However,  some
       lower-level functions can also sometimes be useful.

       exact_enum_arbitrary_2

       Hard-wired  precursor  of  exact_enum_arbitrary  for  2  distributions.  Does no pre-checking, but may be
       useful for comparing to the output of the general program.

       exact_enum_arbitrary_3

       Hard-wired precursor of exact_enum_arbitrary for 3 distributions.   Does  no  pre-checking,  but  may  be
       useful for comparing to the output of the general program.

       binom_coeffs

       Calculates the binomial coefficients needed in the binomial (convolution) approximate solution.

               $obj->binom_coeffs;

       The internal data structure is essentially the symmetric half of the appropriately-sized Pascal triangle.
       Considerable memory is saved by not storing the full triangle.
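
       The half-triangle idea can be sketched as follows.  This is an illustration only, with hypothetical
       names half_pascal and coeff.  The module's actual internal layout is not documented here.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the module's code. Row $n keeps only
# C(n, 0) .. C(n, floor(n/2)); the rest follows from C(n, k) = C(n, n-k).
sub half_pascal {
    my ($max_n) = @_;
    my @rows = ([1]);    # row 0
    for my $n (1 .. $max_n) {
        my $prev = $rows[$n - 1];
        my @row  = (1);
        for my $k (1 .. int($n / 2)) {
            # C(n-1, k) may lie in the unstored half of the previous row
            my $right = $k <= $#{$prev} ? $prev->[$k] : $prev->[$n - 1 - $k];
            push @row, $prev->[$k - 1] + $right;
        }
        push @rows, \@row;
    }
    return \@rows;
}

# Look up C(n, k), mirroring k into the stored half when necessary.
sub coeff {
    my ($rows, $n, $k) = @_;
    $k = $n - $k if $k > $n / 2;
    return $rows->[$n][$k];
}
```

       Each row stores roughly half of its entries, so the whole structure needs about half the memory of
       the full triangle.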

perl v5.30.3                                       2020-11-06              Genome::Model:...n::CombinePvals(3pm)