Previous requirements:
=====================

0) The package does not work with ROOT versions below v4. This is simply
   because the SVD decomposition subroutines were included only recently.

1) For the time being, it is assumed that the DATA distribution contains just
   the number of events in each bin. This is important because uncertainties
   are propagated under the assumption that sqrt(N) is the uncertainty on the
   number of entries N of each data bin. Even if one could force the user to
   plug the correct uncertainties into the input histogram, some of the
   methods (see below) assume that a Poisson distribution governs the number
   of entries N.

2) At least for SVD methods, the number of bins of the resulting unfolded
   histogram (NGEN, see below) must be less than or equal to the number of
   bins in data (NDATA, see also below):

      NGEN <= NDATA

Examples in this directory:
==========================

1) test_unfold.C
   What it does: Unfolds a given distribution.
   Modes: Interpreted
   Library used: Shared library
   To run it: "root test_unfold.C"
   Comments:
   a) The unfolding library requires at least two histograms:
      1) A one-dimensional one, with the data entries, distributed in NDATA
         bins.
      2) A two-dimensional one, with NDATA bins on the X axis and NGEN bins
         on the Y axis. This histogram describes the migration matrix,
         namely the bin (i,j) should contain the number of MC entries
         generated in bin "j" that get reconstructed in bin "i". The entries
         generated in bin "j" that are not reconstructed or selected
         (contributing to the inefficiency) must be recorded in bins (0,j)
         or (NDATA+1,j), i.e. they MUST BE RECORDED either as underflows or
         overflows.
   b) Since all entries are normalized in the procedure, and we are assuming
      zero uncertainty on MC predictions, you can choose the global
      normalization of the two-dimensional histogram as you wish (total
      entries in MC, "1" in order to describe efficiencies, "100" to describe
      percentages,...).
   c) HOWEVER, the relative normalization of the different bins in the
      two-dimensional histogram matters (both for columns and for rows). In
      particular, it is important to guarantee that the shape of the
      histogram projection onto the Y axis is approximately the shape that
      you expect at the generator level, and not something totally random.
      This shape is used to define the "test" histogram, on which some of
      the methods are slightly dependent (particularly methods 2 and 4, as
      well as the tuning of the necessary number of iterations for
      method 1). See below for more details on the different methods.
   d) Optionally, one can add an additional histogram of "background"
      entries, to be subtracted from data. This histogram must also have
      NDATA bins. No uncertainty is assumed on this histogram.
   e) Usage:
         TH1F* hdata = ...;
         TH2F* hmigr = ...;
         TH1F* hbkg = ...; // not mandatory
         AMSUnfold* unfold = new AMSUnfold();
         int method = ...;
         TH1F* hgen = unfold->Unfold(method);
      where method can be:
         0: exact inversion
         1: Bayesian method
         2: Tikhonov regularization
         3: C. Delgado regularization
         4: J. Alcaraz unfolding
      DETAILS OF THE METHODS ARE GIVEN BELOW.
   f) See files "test_unfold.C" and "test2_unfold.C" for the typical calling
      sequences. See file "../lib/unfold/unfold.h" for further details if
      some specific parameters have to be changed in the calling sequences.

Comments about the exact inversion method (id 0)
====================================================

Being the exact solution of the minimization problem, it usually provides a
solution with large correlations between bins, which is not aesthetically
acceptable. It is anyway interesting to test its answer before any other
method is applied, particularly when the collected statistics are large.

Comments about the Bayesian method (id 1)
=============================================

Very popular due to its apparent simplicity, it is just a method to converge
to the exact solution in several steps.
The fact that the initial distribution in the iterative procedure is flat
guarantees a reasonable smoothness for the first steps. The optimal step at
which to stop the procedure is something to be "tuned" with the MC
distribution (the only dependence on the MC is here). This is provided by the
method "int AMSUnfold::Tune_Bayesian()". If the user prefers to fix the
number of iterations by hand, the method
"TH1F* AMSUnfold::Unfold_Bayesian(int iterations)" must be used.

Comments about the Tikhonov regularization (id 2)
=====================================================

In this method, the large correlations between bins are suppressed by
imposing minimum local curvature (==>smoothness). The degree of smoothness is
fixed by analyzing (via SVD decomposition) the degrees of freedom for which
the significance is below 1 standard deviation. This decision is slightly
dependent on the expected MC shape. If the user prefers to fix the smoothness
parameter by hand, the method
"TH1F* AMSUnfold::Unfold_Tikhonov(double smooth_param)" must be used.

Comments about C. Delgado regularization (id 3)
===================================================

In this method the correlations between bins are suppressed by minimizing the
algorithmic complexity on the set of all the locally continuous histograms.
The net effect is to reduce the number of bins in the unfolding. The method
has some characteristics which should be known to obtain correct results:

a) It assumes Poisson statistics, so the one-dimensional histograms with the
   data entries and the optional background should be normalized to their
   corresponding number of entries.
b) It does not require approximate knowledge of the expected shape at
   generator level.
c) After unfolding the data distribution, two histograms are available:
   1) the returned histogram, which can be smoothed to obtain a more visually
      appealing histogram.
   2) another one, which can be accessed with the method true_binning(),
      and which has been explicitly rebinned to reflect the estimated number
      of degrees of freedom.

As an option, it is also possible to minimize the algorithmic complexity on
the set of all the histograms with locally continuous derivatives, thus
obtaining smooth histograms. This approach has several drawbacks:

a) The minimization method does not guarantee a global minimum.
b) The change in the number of degrees of freedom is not explicitly
   reflected.
c) The resulting unfolded histogram is usually less appealing from a
   theoretical point of view in the framework of the regularization
   procedure.

To obtain the smoothed unfolded histogram, use the method
"TH1F* AMSUnfold::Unfold_CDelgado(1)".

Comments about the J. Alcaraz method (id 4)
===============================================

In this method, the large correlations between bins are suppressed by
imposing the MC expectations for the degrees of freedom for which the
significance is below 1 standard deviation. This decision is obviously
dependent on the expected MC shape (but not much: a MC at several standard
deviations from the data still gives a sensible answer, since we are dealing
with not very significant degrees of freedom). The method is optimal to test
the validity of, or compare to, a given MC prediction directly at the
generator level. If the user prefers to fix the significance cut by hand, the
method "TH1F* AMSUnfold::Unfold_JAlcaraz(double sigma_cut)" must be used.

Comments about the C. Delgado true Bayesian unfolding (id 5)
============================================================

This method is conceptually different from all the others.
Instead of searching for the distribution that maximizes the regularized
likelihood (methods 2, 3 and 4) or using a truncated succession of the exact
maximum likelihood solution for the statistics at hand (method 1, wrongly
called Bayesian method), the true Bayesian unfolding method samples from the
Bayesian posterior distribution given the observed data. The unfolded
histogram is obtained as the mean of the sample, whereas the error is
computed as the square root of the diagonal of the covariance matrix. As no
reduction of the number of degrees of freedom is performed, the errors of
different bins are highly correlated, and the histogram plot could show large
error bars. In order to correctly compute the errors for any application, the
full covariance matrix is provided by calling
TH2F *Unfold::GetCovariance(). This function returns a 2d histogram where the
bin i,j is the covariance matrix element i,j. The computed covariance takes
into account the fluctuations on data and on the migration matrix.

As the method makes no assumption about the ordering of the bins, it can be
used to unfold histograms of any dimension, provided they are converted to
1-dimensional ones and the migration matrix is correctly provided.

Some technical information about the method follows:

* The method is fast enough to be usable, but the sampling time scales
  linearly with the number of bins and the number of entries in the data
  histogram.
* Poisson statistics on the data and the migration matrix are assumed.
* The implementation ignores the underflows and overflows in data as well as
  in the Y direction of the migration matrix. On the contrary, the ones in
  the X axis of the migration matrix are taken into account in the method.
* It is possible to subtract a fixed background, but no error on it is
  assumed.
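
As an illustration of how the unfolded histogram and its errors follow from
the posterior sample, here is a minimal standalone sketch in plain C++. It is
NOT the code of the package: the names PosteriorSummary, summarize and
correlation are hypothetical, the sample is assumed to be available as a list
of sampled histograms (one value per bin), and the unbiased (n-1) covariance
estimator is an assumption about the normalization.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for this sketch; not part of the unfolding library.
struct PosteriorSummary {
    std::vector<double> mean;               // per-bin mean of the sample
    std::vector<std::vector<double>> cov;   // sample covariance matrix
    std::vector<double> error;              // sqrt of the covariance diagonal
};

// Each element of `draws` is one sampled histogram (one value per bin).
// Assumption: the unbiased (n-1) estimator is used for the covariance.
PosteriorSummary summarize(const std::vector<std::vector<double>>& draws) {
    assert(draws.size() >= 2);
    const std::size_t nbins = draws.front().size();
    const double n = static_cast<double>(draws.size());

    PosteriorSummary s;
    s.mean.assign(nbins, 0.0);
    s.cov.assign(nbins, std::vector<double>(nbins, 0.0));
    s.error.assign(nbins, 0.0);

    // Unfolded histogram: mean of the sample in each bin.
    for (const auto& d : draws) {
        assert(d.size() == nbins);
        for (std::size_t i = 0; i < nbins; ++i) s.mean[i] += d[i] / n;
    }

    // Full covariance matrix, including the bin-to-bin terms.
    for (const auto& d : draws)
        for (std::size_t i = 0; i < nbins; ++i)
            for (std::size_t j = 0; j < nbins; ++j)
                s.cov[i][j] += (d[i] - s.mean[i]) * (d[j] - s.mean[j]) / (n - 1.0);

    // Error bars: square root of the diagonal only.
    for (std::size_t i = 0; i < nbins; ++i) s.error[i] = std::sqrt(s.cov[i][i]);
    return s;
}

// Correlation coefficient between bins i and j (both errors must be non-zero).
double correlation(const PosteriorSummary& s, std::size_t i, std::size_t j) {
    return s.cov[i][j] / (s.error[i] * s.error[j]);
}
```

The error bars only use the diagonal of the matrix, so they say nothing about
how strongly two bins move together. Any quantity that combines several bins
(an integral, a ratio, a fit) needs the off-diagonal terms as well, which is
why the full covariance matrix should be used in such cases.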