Logo ROOT   6.30.04
Reference Guide
 All Namespaces Files Pages
TSPlot.cxx
Go to the documentation of this file.
1 // @(#)root/splot:$Id$
2 // Author: Muriel Pivk, Anna Kreshuk 10/2005
3 
4 /**********************************************************************
5  * *
6  * Copyright (c) 2005 ROOT Foundation, CERN/PH-SFT *
7  * *
8  **********************************************************************/
9 
10 #include "TSPlot.h"
11 #include "TVirtualFitter.h"
12 #include "TH1.h"
13 #include "TTreePlayer.h"
14 #include "TTreeFormula.h"
15 #include "TTreeFormulaManager.h"
16 #include "TSelectorDraw.h"
17 #include "TBrowser.h"
18 #include "TClass.h"
19 #include "TMath.h"
20 
21 extern void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t iflag);
22 
23 ClassImp(TSPlot);
24 
25 /** \class TSPlot
26 
27 A common method used in High Energy Physics to perform measurements is
28 the maximum Likelihood method, exploiting discriminating variables to
29 disentangle signal from background. The crucial point for such an
30 analysis to be reliable is to use an exhaustive list of sources of
31 events combined with an accurate description of all the Probability
32 Density Functions (PDF).
33 
34 To assess the validity of the fit, a convincing quality check
35 is to explore further the data sample by examining the distributions of
36 control variables. A control variable can be obtained for instance by
37 removing one of the discriminating variables before performing again
38 the maximum Likelihood fit: this removed variable is a control
39 variable. The expected distribution of this control variable, for
40 signal, is to be compared to the one extracted, for signal, from the
41 data sample. In order to be able to do so, one must be able to unfold
42 from the distribution of the whole data sample.
43 
44 The TSPlot method allows to reconstruct the distributions for
45 the control variable, independently for each of the various sources of
46 events, without making use of any <em>a priori</em> knowledge on <u>this</u>
47 variable. The aim is thus to use the knowledge available for the
48 discriminating variables to infer the behaviour of the individual
49 sources of events with respect to the control variable.
50 
51 TSPlot is optimal if the control variable is uncorrelated with the discriminating variables.
52 
53 
54 A detail description of the formalism itself, called \f$\hbox{$_s$}{\cal P}lot\f$, is given
55 in [<a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/node1.html#bib:sNIM">1</a>].
56 
57 ### The method
58 
59 
60 The \f$\hbox{$_s$}{\cal P}lot\f$ technique is developed in the above context of a
61 maximum Likelihood method making use of discriminating variables.
62 
63 One considers a data sample in which are merged several species
64 of events. These species represent various signal components and
65 background components which all together account for the data sample.
66 The different terms of the log-Likelihood are:
67 
68  - \f$N\f$ : the total number of events in the data sample,
69  - \f${\rm N}_{\rm s}\f$ : the number of species of events populating the data sample,
70  - \f$N_i\f$ : the number of events expected on the average for the \f$i^{\rm th}\f$ species,
71  - \f${\rm f}_i(y_e)\f$" : the value of the PDFs of the discriminating variables
72  \f$y\f$" for the\f$i^{th}\f$ species and for event\f$e\f$",
73  - \f$x\f$" : the set of control variables which, by definition, do not appear in
74  the expression of the Likelihood function \f${\cal L}\f$.
75 
76 The extended log-Likelihood reads:
77 
78  \f[
79 {\cal L}=\sum_{e=1}^{N}\ln \Big\{ \sum_{i=1}^{{\rm N}_{\rm s}}N_i{\rm f}_i(y_e) \Big\} -\sum_{i=1}^{{\rm N}_{\rm s}}N_i \tag{1}
80 \f]
81 
82 From this expression, after maximization of \f${\cal L}\f$ with respect to the \f$N_i\f$ parameters,
83 a weight can be computed for every event and each species, in order to obtain later the true distribution
84 \f$\hbox{M}_i(x)\f$ of variable \f$x\f$. If \f${\rm n}\f$ is one of the
85  \f${\rm N}_{\rm s}\f$ species present in the data sample, the weight for this species is defined by:
86 
87 
88 \f[
89 \fbox{$
90 {_s{\cal P}}_{\rm n}(y_e)={\sum_{j=1}^{{\rm N}_{\rm s}} \hbox{V}_{{\rm n}j}{\rm f}_j(y_e)\over\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e) } $} , \tag{2}
91 \f]
92 
93 
94 where \f$\hbox{V}_{{\rm n}j}\f$
95 
96 is the covariance matrix resulting from the Likelihood maximization.
97 This matrix can be used directly from the fit, but this is numerically
98 less accurate than the direct computation:
99 
100 
101 \f[
102 \hbox{ V}^{-1}_{{\rm n}j}~=~
103 {\partial^2(-{\cal L})\over\partial N_{\rm n}\partial N_j}~=~
104 \sum_{e=1}^N {{\rm f}_{\rm n}(y_e){\rm f}_j(y_e)\over(\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e))^2} . \tag{3}
105 \f]
106 
107 
108 The distribution of the control variable \f$x\f$ obtained by histogramming the weighted
109 events reproduces, on average, the true distribution
110 \f${\hbox{ {M}}}_{\rm n}(x)\f$
111 
112 The class TSPlot allows to reconstruct the true distribution
113 \f${\hbox{ {M}}}_{\rm n}(x)\f$
114 
115 of a control variable \f$x\f$ for each of the \f${\rm N}_{\rm s}\f$ species from
116 the sole knowledge of the PDFs of the discriminating variables \f${\rm f}_i(y)\f$.
117 The plots obtained thanks to the TSPlot class are called \f$\hbox {$_s$}{\cal P}lots\f$.
118 
119 
120 ### Some properties and checks
121 
122 
123 Beside reproducing the true distribution,\f$\hbox {$_s$}{\cal P}lots\f$ bear remarkable properties:
124 
125 
126  - Each \f$x\f$ - distribution is properly normalized:
127 
128 \f[
129 \sum_{e=1}^{N} {_s{\cal P}}_{\rm n}(y_e)~=~N_{\rm n} ~. \tag{4}
130 \f]
131 
132 
133  - For any event:
134 
135 \f[
136 \sum_{l=1}^{{\rm N}_{\rm s}} {_s{\cal P}}_l(y_e) ~=~1 ~. \tag{5}
137 \f]
138 
139  That is to say that, summing up the \f${\rm N}_{\rm s}\f$ \f$\hbox {$_s$}{\cal P}lots\f$,
140  one recovers the data sample distribution in \f$x\f$, and summing up the number of events
141  entering in a \f$\hbox{$_s$}{\cal P}lot\f$ for a given species, one recovers the yield of the
142  species, as provided by the fit.
143  The property <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#eq:NormalizationOK">4</a> is implemented in the TSPlot class as a check.
144 
145  - the sum of the statistical uncertainties per bin
146 
147 \f[
148 \sigma[N_{\rm n}\ _s\tilde{\rm M}_{\rm n}(x) {\delta x}]~=~\sqrt{\sum_{e \subset {\delta x}} ({_s{\cal P}}_{\rm n})^2} ~. \tag{6}
149 \f]
150 
151  reproduces the statistical uncertainty on the yield \f$N_{\rm n}\f$, as provided by the fit:
152  \f$\sigma[N_{\rm n}]\equiv\sqrt{\hbox{ V}_{{\rm n}{\rm n}}}\f$ .
153  Because of that and since the determination of the yields is optimal
154  when obtained using a Likelihood fit, one can conclude that the \f$\hbox{$_s$}{\cal P}lot\f$
155  technique is itself an optimal method to reconstruct distributions of control variables.
156 
157 
158 ### Different steps followed by TSPlot
159 
160 
161  1. A maximum Likelihood fit is performed to obtain the yields \f$N_i\f$
162  of the various species.The fit relies on discriminating variables \f$y\f$
163  uncorrelated with a control variable \f$x\f$:
164  the later is therefore totally absent from the fit.
165 
166  2. The weights \f${_s{\cal P}}\f$ are calculated using Eq.
167  (<a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#eq:weightxnotiny">2</a>)
168  where the covariance matrix is taken from Minuit.
169 
170  3. Histograms of \f$x\f$ are filled by weighting the events with \f${_s{\cal P}}\f$ .
171 
172  4. Error bars per bin are given by Eq. (<a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#eq:ErrorPerBin">6</a>).
173 
174 
175 The \f$\hbox {$_s$}{\cal P}lots\f$ reproduce the true distributions of the species
176 in the control variable \f$x\f$, within the above defined statistical uncertainties.
177 
178 ### Illustrations
179 
180 
181 To illustrate the technique, one considers an example derived from the analysis where
182 \f$\hbox {$_s$}{\cal P}lots\f$
183 have been first used (charmless B decays). One is dealing with a data
184 sample in which two species are present: the first is termed signal and
185 the second background. A maximum Likelihood fit is performed to obtain
186 the two yields \f$N_1\f$ and \f$N_2\f$ . The fit relies on two discriminating
187 variables collectively denoted \f$y\f$ which are chosen within three possible
188 variables denoted \f${m_{\rm ES}}\f$ , \f$\Delta E\f$ and \f${\cal F}\f$.
189 The variable which is not incorporated in \f$y\f$ is used as the control variable
190 \f$x\f$ . The six distributions of the three variables are assumed to be the ones
191 depicted in Fig. <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#fig:pdfs">1</a>.
192 
193 
194 \image html splot_pdfmesNIM.png width=800
195 
196 #### Figure 1:
197 
198 Distributions of the three discriminating variables available to perform the Likelihood fit:
199 \f${m_{\rm ES}}\f$ , \f$\Delta E\f$ , \f${\cal F}\f$ .
200 Among the three variables, two are used to perform the fit while one is
201 kept out of the fit to serve the purpose of a control variable. The
202 three distributions on the top (resp. bottom) of the figure correspond
203 to the signal (resp. background). The unit of the vertical axis is
204 chosen such that it indicates the number of entries per bin, if one
205 slices the histograms in 25 bins.
206 
207 A data sample being built through a Monte Carlo simulation based on the
208 distributions shown in Fig.
209 <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#fig:pdfs">1</a>,
210 one obtains the three distributions of Fig.
211 <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#fig:pdfstot">2</a>.
212 Whereas the distribution of \f$\Delta E\f$ clearly indicates the presence of the signal,
213 the distribution of \f${m_{\rm ES}}\f$ and \f${\cal F}\f$ are less obviously populated by signal.
214 
215 
216 \image html splot_genfiTOTNIM.png width=800
217 
218 #### Figure 2:
219 
220 Distributions of the three discriminating variables for signal plus
221 background. The three distributions are the ones obtained from a data
222 sample obtained through a Monte Carlo simulation based on the
223 distributions shown in Fig.
224 <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#fig:pdfs">1</a>.
225 The data sample consists of 500 signal events and 5000 background events.
226 
227 
228 Choosing \f$\Delta E\f$ and \f${\cal F}\f$ as discriminating variables to determine
229 \f$N_1\f$ and \f$N_2\f$ through a maximum Likelihood fit, one builds, for the control
230 variable \f${m_{\rm ES}}\f$ which is unknown to the fit, the two \f$\hbox {$_s$}{\cal P}lots\f$
231 for signal and background shown in
232 Fig. <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#fig:messPlots">3</a>.
233 One observes that the \f$\hbox{$_s$}{\cal P}lot\f$
234 for signal reproduces correctly the PDF even where the latter vanishes,
235 although the error bars remain sizeable. This results from the almost
236 complete cancellation between positive and negative weights: the sum of
237 weights is close to zero while the sum of weights squared is not. The
238 occurence of negative weights occurs through the appearance of the
239 covariance matrix, and its negative components, in the definition of
240 Eq. (<a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#eq:weightxnotiny">2</a>).
241 
242 
243 A word of caution is in order with respect to the error bars. Whereas
244 their sum in quadrature is identical to the statistical uncertainties
245 of the yields determined by the fit, and if, in addition, they are
246 asymptotically correct, the error bars should be handled with care for
247 low statistics and/or for too fine binning. This is because the error
248 bars do not incorporate two known properties of the PDFs: PDFs are
249 positive definite and can be non-zero in a given x-bin, even if in the
250 particular data sample at hand, no event is observed in this bin. The
251 latter limitation is not specific to \f$\hbox {$_s$}{\cal P}lots\f$ ,
252 rather it is always present when one is willing to infer the PDF at the
253 origin of an histogram, when, for some bins, the number of entries does
254 not guaranty the applicability of the Gaussian regime. In such
255 situations, a satisfactory practice is to attach allowed ranges to the
256 histogram to indicate the upper and lower limits of the PDF value which
257 are consistent with the actual observation, at a given confidence
258 level.
259 
260 
261 \image html splot_mass-bkg-sPlot.png width=600
262 
263 #### Figure 3:
264 
265 The \f$\hbox {$_s$}{\cal P}lots\f$ (signal on top, background on bottom)
266 obtained for \f${m_{\rm ES}}\f$ are represented as dots with error bars.
267 They are obtained from a fit using only information from \f$\Delta E\f$ and
268 \f${\cal F}\f$
269 
270 <p>
271 Choosing \f${m_{\rm ES}}\f$ and \f$\Delta E\f$ as discriminating variables to
272 determine \f$N_1\f$ and \f$N_2\f$ through a maximum Likelihood fit, one builds,
273 for the control variable \f${\cal F}\f$ which is unknown to the fit, the two
274 \f$\hbox {$_s$}{\cal P}lots\f$ for signal and background shown in
275 Fig. <a href="http://www.slac.stanford.edu/%7Epivk/sPlot/sPlot_ROOT/sPlot_ROOT.html#fig:FisPlots">4</a>.
276 In the \f$\hbox{$_s$}{\cal P}lot\f$ for signal one observes that error bars are
277 the largest in the \f$x\f$ regions where the background is the largest.
278 
279 
280 \image html splot_fisher-bkg-sPlot.png width=600
281 
282 #### Figure 4:
283 
284 The \f$\hbox {$_s$}{\cal P}lots\f$ (signal on top, background on bottom) obtained
285 for \f${\cal F}\f$ are represented as dots with error bars. They are obtained
286 from a fit using only information from \f${m_{\rm ES}}\f$ and \f$\Delta E\f$
287 
288 The results above can be obtained by running the tutorial TestSPlot.C
289 */
290 
291 
292 ////////////////////////////////////////////////////////////////////////////////
293 /// default constructor (used by I/O only)
294 
295 TSPlot::TSPlot() :
296  fTree(0),
297  fTreename(0),
298  fVarexp(0),
299  fSelection(0)
300 {
301  fNx = 0;
302  fNy=0;
303  fNevents = 0;
304  fNSpecies=0;
305  fNumbersOfEvents=0;
306 }
307 
308 ////////////////////////////////////////////////////////////////////////////////
309 /// Normal TSPlot constructor
310 /// - nx : number of control variables
311 /// - ny : number of discriminating variables
312 /// - ne : total number of events
313 /// - ns : number of species
314 /// - tree: input data
315 
316 TSPlot::TSPlot(Int_t nx, Int_t ny, Int_t ne, Int_t ns, TTree *tree) :
317  fTreename(0),
318  fVarexp(0),
319  fSelection(0)
320 
321 {
322  fNx = nx;
323  fNy=ny;
324  fNevents = ne;
325  fNSpecies=ns;
326 
327  fXvar.ResizeTo(fNevents, fNx);
328  fYvar.ResizeTo(fNevents, fNy);
329  fYpdf.ResizeTo(fNevents, fNSpecies*fNy);
330  fSWeights.ResizeTo(fNevents, fNSpecies*(fNy+1));
331  fTree = tree;
332  fNumbersOfEvents = 0;
333 }
334 
335 ////////////////////////////////////////////////////////////////////////////////
336 /// Destructor
337 
338 TSPlot::~TSPlot()
339 {
340  if (fNumbersOfEvents)
341  delete [] fNumbersOfEvents;
342  if (!fXvarHists.IsEmpty())
343  fXvarHists.Delete();
344  if (!fYvarHists.IsEmpty())
345  fYvarHists.Delete();
346  if (!fYpdfHists.IsEmpty())
347  fYpdfHists.Delete();
348 }
349 
350 ////////////////////////////////////////////////////////////////////////////////
351 /// To browse the histograms
352 
353 void TSPlot::Browse(TBrowser *b)
354 {
355  if (!fSWeightsHists.IsEmpty()) {
356  TIter next(&fSWeightsHists);
357  TH1D* h = 0;
358  while ((h = (TH1D*)next()))
359  b->Add(h,h->GetName());
360  }
361 
362  if (!fYpdfHists.IsEmpty()) {
363  TIter next(&fYpdfHists);
364  TH1D* h = 0;
365  while ((h = (TH1D*)next()))
366  b->Add(h,h->GetName());
367  }
368  if (!fYvarHists.IsEmpty()) {
369  TIter next(&fYvarHists);
370  TH1D* h = 0;
371  while ((h = (TH1D*)next()))
372  b->Add(h,h->GetName());
373  }
374  if (!fXvarHists.IsEmpty()) {
375  TIter next(&fXvarHists);
376  TH1D* h = 0;
377  while ((h = (TH1D*)next()))
378  b->Add(h,h->GetName());
379  }
380  b->Add(&fSWeights, "sWeights");
381 }
382 
383 ////////////////////////////////////////////////////////////////////////////////
384 /// Set the initial number of events of each species - used
385 /// as initial estimates in minuit
386 
387 void TSPlot::SetInitialNumbersOfSpecies(Int_t *numbers)
388 {
389  if (!fNumbersOfEvents)
390  fNumbersOfEvents = new Double_t[fNSpecies];
391  for (Int_t i=0; i<fNSpecies; i++)
392  fNumbersOfEvents[i]=numbers[i];
393 }
394 
395 ////////////////////////////////////////////////////////////////////////////////
396 /// Calculates the sWeights
397 ///
398 /// The option controls the print level
399 /// - "Q" - no print out
400 /// - "V" - prints the estimated #of events in species - default
401 /// - "VV" - as "V" + the minuit printing + sums of weights for control
402 
403 void TSPlot::MakeSPlot(Option_t *option)
404 {
405 
406  if (!fNumbersOfEvents){
407  Error("MakeSPlot","Initial numbers of events in species have not been set");
408  return;
409  }
410  Int_t i, j, ispecies;
411 
412  TString opt = option;
413  opt.ToUpper();
414  opt.ReplaceAll("VV", "W");
415 
416  //make sure that global fitter is minuit
417  char s[]="TFitter";
418  if (TVirtualFitter::GetFitter()){
419  Int_t strdiff=strcmp(TVirtualFitter::GetFitter()->IsA()->GetName(), s);
420  if (strdiff!=0)
421  delete TVirtualFitter::GetFitter();
422  }
423 
424 
425  TVirtualFitter *minuit = TVirtualFitter::Fitter(0, 2);
426  fPdfTot.ResizeTo(fNevents, fNSpecies);
427 
428  //now let's do it, excluding different yvars
429  //for iplot = -1 none is excluded
430  for (Int_t iplot=-1; iplot<fNy; iplot++){
431  for (i=0; i<fNevents; i++){
432  for (ispecies=0; ispecies<fNSpecies; ispecies++){
433  fPdfTot(i, ispecies)=1;
434  for (j=0; j<fNy; j++){
435  if (j!=iplot)
436  fPdfTot(i, ispecies)*=fYpdf(i, ispecies*fNy+j);
437  }
438  }
439  }
440  minuit->Clear();
441  minuit->SetFCN(Yields);
442  Double_t arglist[10];
443  //set the print level
444  if (opt.Contains("Q")||opt.Contains("V")){
445  arglist[0]=-1;
446  }
447  if (opt.Contains("W"))
448  arglist[0]=0;
449  minuit->ExecuteCommand("SET PRINT", arglist, 1);
450 
451  minuit->SetObjectFit(&fPdfTot); //a tricky way to get fPdfTot matrix to fcn
452  for (ispecies=0; ispecies<fNSpecies; ispecies++)
453  minuit->SetParameter(ispecies, "", fNumbersOfEvents[ispecies], 1, 0, 0);
454 
455  minuit->ExecuteCommand("MIGRAD", arglist, 0);
456  for (ispecies=0; ispecies<fNSpecies; ispecies++){
457  fNumbersOfEvents[ispecies]=minuit->GetParameter(ispecies);
458  if (!opt.Contains("Q"))
459  printf("estimated #of events in species %d = %f\n", ispecies, fNumbersOfEvents[ispecies]);
460  }
461  if (!opt.Contains("Q"))
462  printf("\n");
463  Double_t *covmat = minuit->GetCovarianceMatrix();
464  SPlots(covmat, iplot);
465 
466  if (opt.Contains("W")){
467  Double_t *sumweight = new Double_t[fNSpecies];
468  for (i=0; i<fNSpecies; i++){
469  sumweight[i]=0;
470  for (j=0; j<fNevents; j++)
471  sumweight[i]+=fSWeights(j, (iplot+1)*fNSpecies + i);
472  printf("checking sum of weights[%d]=%f\n", i, sumweight[i]);
473  }
474  printf("\n");
475  delete [] sumweight;
476  }
477  }
478 }
479 
480 ////////////////////////////////////////////////////////////////////////////////
481 /// Computes the sWeights from the covariance matrix
482 
483 void TSPlot::SPlots(Double_t *covmat, Int_t i_excl)
484 {
485  Int_t i, ispecies, k;
486  Double_t numerator, denominator;
487  for (i=0; i<fNevents; i++){
488  denominator=0;
489  for (ispecies=0; ispecies<fNSpecies; ispecies++)
490  denominator+=fNumbersOfEvents[ispecies]*fPdfTot(i, ispecies);
491  for (ispecies=0; ispecies<fNSpecies; ispecies++){
492  numerator=0;
493  for (k=0; k<fNSpecies; k++)
494  numerator+=covmat[ispecies*fNSpecies+k]*fPdfTot(i, k);
495  fSWeights(i, (i_excl+1)*fNSpecies + ispecies)=numerator/denominator;
496  }
497  }
498 
499 }
500 
501 ////////////////////////////////////////////////////////////////////////////////
502 /// Returns the matrix of sweights
503 
504 void TSPlot::GetSWeights(TMatrixD &weights)
505 {
506  if (weights.GetNcols()!=fNSpecies*(fNy+1) || weights.GetNrows()!=fNevents)
507  weights.ResizeTo(fNevents, fNSpecies*(fNy+1));
508  weights = fSWeights;
509 }
510 
511 ////////////////////////////////////////////////////////////////////////////////
512 /// Returns the matrix of sweights. It is assumed that the array passed in the
513 /// argurment is big enough
514 
515 void TSPlot::GetSWeights(Double_t *weights)
516 {
517  for (Int_t i=0; i<fNevents; i++){
518  for (Int_t j=0; j<fNSpecies; j++){
519  weights[i*fNSpecies+j]=fSWeights(i, j);
520  }
521  }
522 }
523 
524 ////////////////////////////////////////////////////////////////////////////////
525 /// Fills the histograms of x variables (not weighted) with nbins
526 
527 void TSPlot::FillXvarHists(Int_t nbins)
528 {
529  Int_t i, j;
530 
531  if (!fXvarHists.IsEmpty()){
532  if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
533  fXvarHists.Delete();
534  else
535  return;
536  }
537 
538  //make the histograms
539  char name[12];
540  for (i=0; i<fNx; i++){
541  snprintf(name,sizeof(name), "x%d", i);
542  TH1D *h = new TH1D(name, name, nbins, fMinmax(0, i), fMinmax(1, i));
543  for (j=0; j<fNevents; j++)
544  h->Fill(fXvar(j, i));
545  fXvarHists.Add(h);
546  }
547 
548 }
549 
550 ////////////////////////////////////////////////////////////////////////////////
551 /// Returns the array of histograms of x variables (not weighted).
552 /// If histograms have not already
553 /// been filled, they are filled with default binning 100.
554 
555 TObjArray* TSPlot::GetXvarHists()
556 {
557  Int_t nbins = 100;
558  if (fXvarHists.IsEmpty())
559  FillXvarHists(nbins);
560  else if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
561  FillXvarHists(nbins);
562  return &fXvarHists;
563 }
564 
565 ////////////////////////////////////////////////////////////////////////////////
566 ///Returns the histogram of variable ixvar.
567 /// If histograms have not already
568 /// been filled, they are filled with default binning 100.
569 
570 TH1D *TSPlot::GetXvarHist(Int_t ixvar)
571 {
572  Int_t nbins = 100;
573  if (fXvarHists.IsEmpty())
574  FillXvarHists(nbins);
575  else if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
576  FillXvarHists(nbins);
577 
578  return (TH1D*)fXvarHists.UncheckedAt(ixvar);
579 }
580 
581 ////////////////////////////////////////////////////////////////////////////////
582 /// Fill the histograms of y variables
583 
584 void TSPlot::FillYvarHists(Int_t nbins)
585 {
586  Int_t i, j;
587 
588  if (!fYvarHists.IsEmpty()){
589  if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
590  fYvarHists.Delete();
591  else
592  return;
593  }
594 
595  //make the histograms
596  char name[12];
597  for (i=0; i<fNy; i++){
598  snprintf(name,sizeof(name), "y%d", i);
599  TH1D *h=new TH1D(name, name, nbins, fMinmax(0, fNx+i), fMinmax(1, fNx+i));
600  for (j=0; j<fNevents; j++)
601  h->Fill(fYvar(j, i));
602  fYvarHists.Add(h);
603  }
604 }
605 
606 ////////////////////////////////////////////////////////////////////////////////
607 /// Returns the array of histograms of y variables. If histograms have not already
608 /// been filled, they are filled with default binning 100.
609 
610 TObjArray* TSPlot::GetYvarHists()
611 {
612  Int_t nbins = 100;
613  if (fYvarHists.IsEmpty())
614  FillYvarHists(nbins);
615  else if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
616  FillYvarHists(nbins);
617  return &fYvarHists;
618 }
619 
620 ////////////////////////////////////////////////////////////////////////////////
621 /// Returns the histogram of variable iyvar.If histograms have not already
622 /// been filled, they are filled with default binning 100.
623 
624 TH1D *TSPlot::GetYvarHist(Int_t iyvar)
625 {
626  Int_t nbins = 100;
627  if (fYvarHists.IsEmpty())
628  FillYvarHists(nbins);
629  else if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
630  FillYvarHists(nbins);
631  return (TH1D*)fYvarHists.UncheckedAt(iyvar);
632 }
633 
634 ////////////////////////////////////////////////////////////////////////////////
635 /// Fills the histograms of pdf-s of y variables with binning nbins
636 
637 void TSPlot::FillYpdfHists(Int_t nbins)
638 {
639  Int_t i, j, ispecies;
640 
641  if (!fYpdfHists.IsEmpty()){
642  if (((TH1D*)fYpdfHists.First())->GetNbinsX()!=nbins)
643  fYpdfHists.Delete();
644  else
645  return;
646  }
647 
648  char name[34];
649  for (ispecies=0; ispecies<fNSpecies; ispecies++){
650  for (i=0; i<fNy; i++){
651  snprintf(name,sizeof(name), "pdf_species%d_y%d", ispecies, i);
652  //TH1D *h = new TH1D(name, name, nbins, ypdfmin[ispecies*fNy+i], ypdfmax[ispecies*fNy+i]);
653  TH1D *h = new TH1D(name, name, nbins, fMinmax(0, fNx+fNy+ispecies*fNy+i), fMinmax(1, fNx+fNy+ispecies*fNy+i));
654  for (j=0; j<fNevents; j++)
655  h->Fill(fYpdf(j, ispecies*fNy+i));
656  fYpdfHists.Add(h);
657  }
658  }
659 }
660 
661 ////////////////////////////////////////////////////////////////////////////////
662 /// Returns the array of histograms of pdf's of y variables with binning nbins.
663 /// If histograms have not already
664 /// been filled, they are filled with default binning 100.
665 
666 TObjArray* TSPlot::GetYpdfHists()
667 {
668  Int_t nbins = 100;
669  if (fYpdfHists.IsEmpty())
670  FillYpdfHists(nbins);
671 
672  return &fYpdfHists;
673 }
674 
675 ////////////////////////////////////////////////////////////////////////////////
676 /// Returns the histogram of the pdf of variable iyvar for species #ispecies, binning nbins.
677 /// If histograms have not already
678 /// been filled, they are filled with default binning 100.
679 
680 TH1D *TSPlot::GetYpdfHist(Int_t iyvar, Int_t ispecies)
681 {
682  Int_t nbins = 100;
683  if (fYpdfHists.IsEmpty())
684  FillYpdfHists(nbins);
685 
686  return (TH1D*)fYpdfHists.UncheckedAt(fNy*ispecies+iyvar);
687 }
688 
689 ////////////////////////////////////////////////////////////////////////////////
690 /// The order of histograms in the array:
691 ///
692 /// x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
693 ///
694 /// If the histograms have already been filled with a different binning, they are refilled
695 /// and all histograms are deleted
696 
697 void TSPlot::FillSWeightsHists(Int_t nbins)
698 {
699  if (fSWeights.GetNoElements()==0){
700  Error("GetSWeightsHists", "SWeights were not computed");
701  return;
702  }
703 
704  if (!fSWeightsHists.IsEmpty()){
705  if (((TH1D*)fSWeightsHists.First())->GetNbinsX()!=nbins)
706  fSWeightsHists.Delete();
707  else
708  return;
709  }
710 
711  char name[30];
712 
713  //Fill histograms of x-variables weighted with sWeights
714  for (Int_t ivar=0; ivar<fNx; ivar++){
715  for (Int_t ispecies=0; ispecies<fNSpecies; ispecies++){
716  snprintf(name,30, "x%d_species%d", ivar, ispecies);
717  TH1D *h = new TH1D(name, name, nbins, fMinmax(0, ivar), fMinmax(1, ivar));
718  h->Sumw2();
719  for (Int_t ievent=0; ievent<fNevents; ievent++)
720  h->Fill(fXvar(ievent, ivar), fSWeights(ievent, ispecies));
721  fSWeightsHists.AddLast(h);
722  }
723  }
724 
725  //Fill histograms of y-variables (excluded from the fit), weighted with sWeights
726  for (Int_t iexcl=0; iexcl<fNy; iexcl++){
727  for(Int_t ispecies=0; ispecies<fNSpecies; ispecies++){
728  snprintf(name,30, "y%d_species%d", iexcl, ispecies);
729  TH1D *h = new TH1D(name, name, nbins, fMinmax(0, fNx+iexcl), fMinmax(1, fNx+iexcl));
730  h->Sumw2();
731  for (Int_t ievent=0; ievent<fNevents; ievent++)
732  h->Fill(fYvar(ievent, iexcl), fSWeights(ievent, fNSpecies*(iexcl+1)+ispecies));
733  fSWeightsHists.AddLast(h);
734  }
735  }
736 }
737 
738 ////////////////////////////////////////////////////////////////////////////////
739 /// Returns an array of all histograms of variables, weighted with sWeights.
740 /// If histograms have not been already filled, they are filled with default binning 50
741 /// The order of histograms in the array:
742 ///
743 /// x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
744 
745 TObjArray *TSPlot::GetSWeightsHists()
746 {
747  Int_t nbins = 50; //default binning
748  if (fSWeightsHists.IsEmpty())
749  FillSWeightsHists(nbins);
750 
751  return &fSWeightsHists;
752 }
753 
754 ////////////////////////////////////////////////////////////////////////////////
755 /// The Fill...Hist() methods fill the histograms with the real limits on the variables
756 /// This method allows to refill the specified histogram with user-set boundaries min and max
757 ///
758 ///Parameters:
759 ///
760 /// - type = 1 - histogram of x variable #nvar
761 /// - type = 2 - histogram of y variable #nvar
762 /// - type = 3 - histogram of y_pdf for y #nvar and species #nspecies
763 /// - type = 4 - histogram of x variable #nvar, species #nspecies, WITH sWeights
764 /// - type = 5 - histogram of y variable #nvar, species #nspecies, WITH sWeights
765 
766 void TSPlot::RefillHist(Int_t type, Int_t nvar, Int_t nbins, Double_t min, Double_t max, Int_t nspecies)
767 {
768  if (type<1 || type>5){
769  Error("RefillHist", "type must lie between 1 and 5");
770  return;
771  }
772  char name[20];
773  Int_t j;
774  TH1D *hremove;
775  if (type==1){
776  hremove = (TH1D*)fXvarHists.RemoveAt(nvar);
777  delete hremove;
778  snprintf(name,20,"x%d",nvar);
779  TH1D *h = new TH1D(name, name, nbins, min, max);
780  for (j=0; j<fNevents;j++)
781  h->Fill(fXvar(j, nvar));
782  fXvarHists.AddAt(h, nvar);
783  }
784  if (type==2){
785  hremove = (TH1D*)fYvarHists.RemoveAt(nvar);
786  delete hremove;
787  snprintf(name,20, "y%d", nvar);
788  TH1D *h = new TH1D(name, name, nbins, min, max);
789  for (j=0; j<fNevents;j++)
790  h->Fill(fYvar(j, nvar));
791  fXvarHists.AddAt(h, nvar);
792  }
793  if (type==3){
794  hremove = (TH1D*)fYpdfHists.RemoveAt(nspecies*fNy+nvar);
795  delete hremove;
796  snprintf(name,20, "pdf_species%d_y%d", nspecies, nvar);
797  TH1D *h=new TH1D(name, name, nbins, min, max);
798  for (j=0; j<fNevents; j++)
799  h->Fill(fYpdf(j, nspecies*fNy+nvar));
800  fYpdfHists.AddAt(h, nspecies*fNy+nvar);
801  }
802  if (type==4){
803  hremove = (TH1D*)fSWeightsHists.RemoveAt(fNSpecies*nvar+nspecies);
804  delete hremove;
805  snprintf(name,20, "x%d_species%d", nvar, nspecies);
806  TH1D *h = new TH1D(name, name, nbins, min, max);
807  h->Sumw2();
808  for (Int_t ievent=0; ievent<fNevents; ievent++)
809  h->Fill(fXvar(ievent, nvar), fSWeights(ievent, nspecies));
810  fSWeightsHists.AddAt(h, fNSpecies*nvar+nspecies);
811  }
812  if (type==5){
813  hremove = (TH1D*)fSWeightsHists.RemoveAt(fNx*fNSpecies + fNSpecies*nvar+nspecies);
814  delete hremove;
815  snprintf(name,20, "y%d_species%d", nvar, nspecies);
816  TH1D *h = new TH1D(name, name, nbins, min, max);
817  h->Sumw2();
818  for (Int_t ievent=0; ievent<fNevents; ievent++)
819  h->Fill(fYvar(ievent, nvar), fSWeights(ievent, nspecies));
820  fSWeightsHists.AddAt(h, fNx*fNSpecies + fNSpecies*nvar+nspecies);
821  }
822 }
823 ////////////////////////////////////////////////////////////////////////////////
824 /// Returns the histogram of a variable, weighted with sWeights.
825 /// - If histograms have not been already filled, they are filled with default binning 50
826 /// - If parameter ixvar!=-1, the histogram of x-variable ixvar is returned for species ispecies
827 /// - If parameter ixvar==-1, the histogram of y-variable iyexcl is returned for species ispecies
828 /// - If the histogram has already been filled and the binning is different from the parameter nbins
829 /// all histograms with old binning will be deleted and refilled.
830 
831 TH1D *TSPlot::GetSWeightsHist(Int_t ixvar, Int_t ispecies,Int_t iyexcl)
832 {
833 
834  Int_t nbins = 50; //default binning
835  if (fSWeightsHists.IsEmpty())
836  FillSWeightsHists(nbins);
837 
838  if (ixvar==-1)
839  return (TH1D*)fSWeightsHists.UncheckedAt(fNx*fNSpecies + fNSpecies*iyexcl+ispecies);
840  else
841  return (TH1D*)fSWeightsHists.UncheckedAt(fNSpecies*ixvar + ispecies);
842 
843 }
844 
845 
846 ////////////////////////////////////////////////////////////////////////////////
847 /// Set the input Tree
848 
849 void TSPlot::SetTree(TTree *tree)
850 {
851  fTree = tree;
852 }
853 
854 ////////////////////////////////////////////////////////////////////////////////
855 ///Specifies the variables from the tree to be used for splot
856 ///
857 ///Variables fNx, fNy, fNSpecies and fNEvents should already be set!
858 ///
859 ///In the 1st parameter it is assumed that first fNx variables are x(control variables),
860 ///then fNy y variables (discriminating variables),
861 ///then fNy*fNSpecies ypdf variables (probability distribution functions of discriminating
862 ///variables for different species). The order of pdfs should be: species0_y0, species0_y1,...
863 ///species1_y0, species1_y1,...species[fNSpecies-1]_y0...
864 ///The 2nd parameter allows to make a cut
865 ///TTree::Draw method description contains more details on specifying expression and selection
866 
867 void TSPlot::SetTreeSelection(const char* varexp, const char *selection, Long64_t firstentry)
868 {
869  TTreeFormula **var;
870  std::vector<TString> cnames;
871  TList *formulaList = new TList();
872  TSelectorDraw *selector = (TSelectorDraw*)(((TTreePlayer*)fTree->GetPlayer())->GetSelector());
873 
874  Long64_t entry, entryNumber;
875  Int_t i,nch;
876  Int_t ncols;
877  TObjArray *leaves = fTree->GetListOfLeaves();
878 
879  fTreename= new TString(fTree->GetName());
880  if (varexp)
881  fVarexp = new TString(varexp);
882  if (selection)
883  fSelection= new TString(selection);
884 
885  nch = varexp ? strlen(varexp) : 0;
886 
887 
888 //*-*- Compile selection expression if there is one
889  TTreeFormula *select = 0;
890  if (selection && strlen(selection)) {
891  select = new TTreeFormula("Selection",selection,fTree);
892  if (!select) return;
893  if (!select->GetNdim()) { delete select; return; }
894  formulaList->Add(select);
895  }
896 //*-*- if varexp is empty, take first nx + ny + ny*nspecies columns by default
897 
898  if (nch == 0) {
899  ncols = fNx + fNy + fNy*fNSpecies;
900  for (i=0;i<ncols;i++) {
901  cnames.push_back( leaves->At(i)->GetName() );
902  }
903 //*-*- otherwise select only the specified columns
904  } else {
905  ncols = selector->SplitNames(varexp,cnames);
906  }
907  var = new TTreeFormula* [ncols];
908  Double_t *xvars = new Double_t[ncols];
909 
910  fMinmax.ResizeTo(2, ncols);
911  for (i=0; i<ncols; i++){
912  fMinmax(0, i)=1e30;
913  fMinmax(1, i)=-1e30;
914  }
915 
916 //*-*- Create the TreeFormula objects corresponding to each column
917  for (i=0;i<ncols;i++) {
918  var[i] = new TTreeFormula("Var1",cnames[i].Data(),fTree);
919  formulaList->Add(var[i]);
920  }
921 
922 //*-*- Create a TreeFormulaManager to coordinate the formulas
923  TTreeFormulaManager *manager=0;
924  if (formulaList->LastIndex()>=0) {
925  manager = new TTreeFormulaManager;
926  for(i=0;i<=formulaList->LastIndex();i++) {
927  manager->Add((TTreeFormula*)formulaList->At(i));
928  }
929  manager->Sync();
930  }
931 //*-*- loop on all selected entries
932  // fSelectedRows = 0;
933  Int_t tnumber = -1;
934  Long64_t selectedrows=0;
935  for (entry=firstentry;entry<firstentry+fNevents;entry++) {
936  entryNumber = fTree->GetEntryNumber(entry);
937  if (entryNumber < 0) break;
938  Long64_t localEntry = fTree->LoadTree(entryNumber);
939  if (localEntry < 0) break;
940  if (tnumber != fTree->GetTreeNumber()) {
941  tnumber = fTree->GetTreeNumber();
942  if (manager) manager->UpdateFormulaLeaves();
943  }
944  Int_t ndata = 1;
945  if (manager && manager->GetMultiplicity()) {
946  ndata = manager->GetNdata();
947  }
948 
949  for(Int_t inst=0;inst<ndata;inst++) {
950  Bool_t loaded = kFALSE;
951  if (select) {
952  if (select->EvalInstance(inst) == 0) {
953  continue;
954  }
955  }
956 
957  if (inst==0) loaded = kTRUE;
958  else if (!loaded) {
959  // EvalInstance(0) always needs to be called so that
960  // the proper branches are loaded.
961  for (i=0;i<ncols;i++) {
962  var[i]->EvalInstance(0);
963  }
964  loaded = kTRUE;
965  }
966 
967  for (i=0;i<ncols;i++) {
968  xvars[i] = var[i]->EvalInstance(inst);
969  }
970 
971  // curentry = entry-firstentry;
972  //printf("event#%d\n", curentry);
973  //for (i=0; i<ncols; i++)
974  // printf("xvars[%d]=%f\n", i, xvars[i]);
975  //selectedrows++;
976  for (i=0; i<fNx; i++){
977  fXvar(selectedrows, i) = xvars[i];
978  if (fXvar(selectedrows, i) < fMinmax(0, i))
979  fMinmax(0, i)=fXvar(selectedrows, i);
980  if (fXvar(selectedrows, i) > fMinmax(1, i))
981  fMinmax(1, i)=fXvar(selectedrows, i);
982  }
983  for (i=0; i<fNy; i++){
984  fYvar(selectedrows, i) = xvars[i+fNx];
985  //printf("y_in_loop(%d, %d)=%f, xvars[%d]=%f\n", selectedrows, i, fYvar(selectedrows, i), i+fNx, xvars[i+fNx]);
986  if (fYvar(selectedrows, i) < fMinmax(0, i+fNx))
987  fMinmax(0, i+fNx) = fYvar(selectedrows, i);
988  if (fYvar(selectedrows, i) > fMinmax(1, i+fNx))
989  fMinmax(1, i+fNx) = fYvar(selectedrows, i);
990  for (Int_t j=0; j<fNSpecies; j++){
991  fYpdf(selectedrows, j*fNy + i)=xvars[j*fNy + i+fNx+fNy];
992  if (fYpdf(selectedrows, j*fNy+i) < fMinmax(0, j*fNy+i+fNx+fNy))
993  fMinmax(0, j*fNy+i+fNx+fNy) = fYpdf(selectedrows, j*fNy+i);
994  if (fYpdf(selectedrows, j*fNy+i) > fMinmax(1, j*fNy+i+fNx+fNy))
995  fMinmax(1, j*fNy+i+fNx+fNy) = fYpdf(selectedrows, j*fNy+i);
996  }
997  }
998  selectedrows++;
999  }
1000  }
1001  fNevents=selectedrows;
1002  // for (i=0; i<fNevents; i++){
1003  // printf("event#%d\n", i);
1004  //for (Int_t iy=0; iy<fNy; iy++)
1005  // printf("y[%d]=%f\n", iy, fYvar(i, iy));
1006  //for (Int_t ispecies=0; ispecies<fNSpecies; ispecies++){
1007  // for (Int_t iy=0; iy<fNy; iy++)
1008  // printf("ypdf[sp. %d, y %d]=%f\n", ispecies, iy, fYpdf(i, ispecies*fNy+iy));
1009  // }
1010  //}
1011  delete [] xvars;
1012  delete [] var;
1013 }
1014 
1015 ////////////////////////////////////////////////////////////////////////////////
1016 /// FCN-function for Minuit
1017 
1018 void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t /*iflag*/)
1019 {
1020  Double_t lik;
1021  Int_t i, ispecies;
1022 
1023  TVirtualFitter *fitter = TVirtualFitter::GetFitter();
1024  TMatrixD *pdftot = (TMatrixD*)fitter->GetObjectFit();
1025  Int_t nev = pdftot->GetNrows();
1026  Int_t nes = pdftot->GetNcols();
1027  f=0;
1028  for (i=0; i<nev; i++){
1029  lik=0;
1030  for (ispecies=0; ispecies<nes; ispecies++)
1031  lik+=x[ispecies]*(*pdftot)(i, ispecies);
1032  if (lik<0) lik=1;
1033  f+=TMath::Log(lik);
1034  }
1035  //extended likelihood, equivalent to chi2
1036  Double_t ntot=0;
1037  for (i=0; i<nes; i++)
1038  ntot += x[i];
1039  f = -2*(f-ntot);
1040 }
1041