TTreeCache.cxx
1 // @(#)root/tree:$Id$
2 // Author: Rene Brun 04/06/2006
3 
4 /*************************************************************************
5  * Copyright (C) 1995-2018, Rene Brun and Fons Rademakers. *
6  * All rights reserved. *
7  * *
8  * For the licensing terms see $ROOTSYS/LICENSE. *
9  * For the list of contributors see $ROOTSYS/README/CREDITS. *
10  *************************************************************************/
11 
12 /** \class TTreeCache
13 \ingroup tree
14 \brief A cache to speed-up the reading of ROOT datasets
15 
16 # A cache to speed-up the reading of ROOT datasets
17 
18 ## Table of Contents
19 - [Motivation](#motivation)
20 - [General Description](#description)
21 - [Changes in behaviour](#changesbehaviour)
22 - [Self-optimization](#cachemisses)
23 - [Examples of usage](#examples)
24 - [Check performance and stats](#checkPerf)
25 
26 ## <a name="motivation"></a>Motivation: why is a cache needed?
27 
28 When writing a TTree, the branch buffers are kept in memory.
29 A typical branch buffer size (before compression) is 32 KBytes.
30 After compression, the zipped buffer may be just a few KBytes.
31 The branch buffers cannot be much larger for TTrees with several
32 hundred or thousand branches.
33 
34 When writing, this does not generate a performance problem because branch
35 buffers are always written sequentially and, thanks to OS optimisations,
36 content is flushed to the output file when a few MBytes of data are available.
37 On the other hand, when reading, one may hit performance problems because of
38 latencies, e.g. those imposed by the network.
39 For example, in a WAN with 10 ms latency, reading 1000 buffers of 10 KBytes each
40 with no cache implies a 10 s penalty, whereas a local read of the same 10 MBytes
41 would take about 1 second.
42 
43 The TreeCache tries to prefetch all the buffers for the selected branches
44 in order to transfer a few large multi-megabyte buffers instead of many
45 small multi-kilobyte buffers. In addition, TTreeCache can sort the blocks to
46 be read in increasing order such that the file is read sequentially.
47 
48 Systems like xrootd, dCache or httpd take advantage of the TTreeCache by
49 reading ahead as much data as they can: they return to the application
50 the maximum amount of data specified in the cache and have the next chunk of
51 data ready when the next request comes.
52 
53 ### Are there cases for which the usage of TTreeCache is detrimental to performance?
54 Yes, in some corner cases. For example, when reading only a small fraction of all
55 entries such that not all branch buffers are read.
56 
57 ## <a name="description"></a>General Description
58 This class acts as a file cache, automatically registering the baskets from
59 the branches being processed via direct manipulation of TTrees or with tools
60 such as TTree::Draw, TTree::Process, TSelector, TTreeReader and RDataFrame
61 when in the learning phase. The learning phase lasts 100 entries by default.
62 It can be changed via TTreeCache::SetLearnEntries.
63 
64 The usage of a TTreeCache can considerably improve the runtime performance at
65 the price of a modest investment in memory, in particular when the TTree is
66 accessed remotely, e.g. via a high latency network.
67 
68 For each TTree being processed a TTreeCache object is created.
69 This object is automatically deleted when the Tree is deleted or
70 when the file is deleted.
71 The user can change the size of the cache with the TTree::SetCacheSize method
72 (by default the size is 30 MBytes). This default can be controlled with the
73 environment variable `ROOT_TTREECACHE_SIZE` or the TTreeCache.Size option.
74 The entry range for which the cache is active can also be set with the
75 SetEntryRange method.
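
For instance, the cache size, entry range and learning length can all be set
explicitly through the TTree forwarding methods used in the examples below
(a minimal sketch; `f` and `mytree` are placeholders):
~~~ {.cpp}
 TTree *T = nullptr;
 f->GetObject(T, "mytree");
 T->SetCacheSize(30000000);      //<<< use a 30 MBytes cache
 T->SetCacheEntryRange(0, 1000); //<<< only cache baskets for this entry range
 T->SetCacheLearnEntries(10);    //<<< learn which branches to cache on 10 entries
~~~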
76 
77 ## <a name="changesbehaviour"></a>Changes of behavior when using TChain and TEventList
78 
79 The usage of a TChain or a TEventList influences the behaviour of the cache:
80 
81 - Special case of a TChain
82  Once the training is done on the first Tree, the list of branches
83  in the cache is kept for the following files.
84 
85 - Special case of a TEventList
86  If the Tree or TChain has a TEventList, only the buffers
87  referenced by the list are put in the cache (see the sketch below).
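
A minimal sketch of the TEventList case for a tree `T` (`elist` stands for an
existing TEventList; only baskets containing listed entries are prefetched):
~~~ {.cpp}
 T->SetEventList(elist);
 T->SetCacheSize(30000000);
 T->Process("myselector.C+");
~~~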
88 
89 The learning phase is started or restarted when:
90  - TTree automatically creates a cache.
91  - TTree::SetCacheSize is called with a non-zero size and a cache
92  did not previously exist.
93  - TTreeCache::StartLearningPhase is called.
94  - TTreeCache::SetEntryRange is called
95  * and the learning is not yet finished
96  * and has not been set to manual
97  * and the new minimum entry is different.
98 
99 The learning period is stopped (and prefetching is started) when:
100  - TTreeCache::StopLearningPhase is called.
101  - An entry outside the 'learning' range is requested.
102  The 'learning' range is from fEntryMin (default 0) to
103  fEntryMin + fgLearnEntries.
104  - A 'cached' TChain switches over to a new file.
105 
106 
107 ## <a name="cachemisses"></a>Self-optimization in presence of cache misses
108 
109 The TTreeCache can optimize its behavior on a cache miss. When
110 miss optimization is enabled (see the SetOptimizeMisses method),
111 it tracks all branches used after the learning phase that caused a cache
112 miss.
113 When a cache miss occurs, all of these branches are prefetched
114 for that event. This optimization exploits the observation that infrequently
115 accessed branches are often accessed together.
116 An example scenario where such behavior is desirable is an analysis where
117 a set of collections is read only for the few events in which a certain
118 condition is met, e.g. a trigger fired.
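
A sketch of how the miss cache might be enabled for a tree `T` (this assumes the
cache object is retrieved via TTree::GetReadCache; adapt to your setup):
~~~ {.cpp}
 T->SetCacheSize(30000000); // make sure a TTreeCache exists
 auto cache = dynamic_cast<TTreeCache *>(T->GetReadCache(T->GetCurrentFile()));
 if (cache)
 cache->SetOptimizeMisses(true); //<<< enable the miss cache
~~~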
119 
120 ### Additional memory and CPU usage when optimizing for cache misses
121 When this mode is enabled, the memory dedicated to the cache can increase
122 by at most a factor of two in the case of cache misses.
123 Additionally, on the first miss of an event, we must iterate through all the
124 "active branches" for the miss cache and find the correct basket.
125 This can potentially be a CPU-expensive operation compared to, e.g., the
126 latency of an SSD. This is why the miss cache is currently disabled by default.
127 
128 ## <a name="examples"></a>Example usages of TTreeCache
129 
130 A few use cases are discussed below. A cache may be created with automatic
131 sizing when a TTree is read.
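
A minimal sketch of this automatic behaviour (assuming a file `f` containing a
TTree named `mytree`; a default-sized cache is created behind the scenes):
~~~ {.cpp}
 TTree *T = nullptr;
 f->GetObject(T, "mytree");
 for (Long64_t i = 0; i < T->GetEntries(); ++i) {
 // A default-sized TTreeCache is created automatically and trained
 // on the branches read during the first (learning) entries.
 T->GetEntry(i);
 }
~~~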
132 
133 In some applications, e.g. central processing workflows of experiments, the list
134 of branches to read is known a priori. For these cases, the TTreeCache can be
135 instructed about the branches which will be read via explicit calls to the TTree
136 or TTreeCache interfaces.
137 In less streamlined applications such as analysis, predicting the branches which
138 will be read can be difficult. In such cases, ROOT I/O flags used branches
139 automatically when a branch buffer is read during the learning phase.
140 
141 In the examples below, portions of analysis code are shown.
142 The few statements involving the TreeCache are marked with `//<<<`.
143 
144 ### ROOT::RDataFrame and TTreeReader Examples
145 
146 If you use RDataFrame or TTreeReader, the system will automatically cache the
147 best set of branches: no action is required by the user.
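
For example, with TTreeReader the branches accessed through reader values are
cached automatically (a minimal sketch; `f` is the TFile and the branch name
`px` is a placeholder):
~~~ {.cpp}
 TTreeReader reader("mytree", f);
 TTreeReaderValue<float> px(reader, "px");
 while (reader.Next()) {
 // Only the branches accessed via reader values end up in the cache.
 auto x = *px;
 }
~~~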
148 
149 ### TTree::Draw Example
150 
151 The TreeCache is automatically used by TTree::Draw. The method knows
152 which branches are used in the query and automatically puts these branches
153 in the cache. The entry range is also inferred automatically.
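
For example (branch names are placeholders):
~~~ {.cpp}
 T->Draw("px", "pz > 10"); //<<< both px and pz are put in the cache automatically
~~~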
154 
155 ### TTree::Process and TSelectors Examples
156 
157 The user must enable the cache and tell the system which branches to cache
158 and also specify the entry range. It is important to specify the entry range
159 when only a subset of the events is processed, to avoid wasteful caching.
160 
161 #### Reading all branches
162 
163 ~~~ {.cpp}
164  TTree *T;
165  f->GetObject(T, "mytree");
166  auto nentries = T->GetEntries();
167  auto cachesize = 10000000U; // 10 MBytes
168  T->SetCacheSize(cachesize); //<<<
169  T->AddBranchToCache("*", true); //<<< add all branches to the cache
170  T->Process("myselector.C+");
171  // In the TSelector::Process function we read all branches
172  T->GetEntry(i);
173  // ... Here the entry is processed
174 ~~~
175 
176 #### Reading a subset of all branches
177 
178 In the Process function we read a subset of the branches.
179 Only the branches used in the first entry will be put in the cache.
180 ~~~ {.cpp}
181  TTree *T;
182  f->GetObject(T, "mytree");
183  // We want to process only the 200 first entries
184  auto nentries=200UL;
185  auto efirst = 0;
186  auto elast = efirst+nentries;
187  auto cachesize = 10000000U; // 10 MBytes
188  TTreeCache::SetLearnEntries(1); //<<< we can take the decision after 1 entry
189  T->SetCacheSize(cachesize); //<<<
190  T->SetCacheEntryRange(efirst,elast); //<<<
191  T->Process("myselector.C+","",nentries,efirst);
192  // In the TSelector::Process we read only 2 branches
193  auto b1 = T->GetBranch("branch1");
194  b1->GetEntry(i);
195  if (somecondition) return;
196  auto b2 = T->GetBranch("branch2");
197  b2->GetEntry(i);
198  ... Here the entry is processed
199 ~~~
200 ### Custom event loop
201 
202 #### Always using the same two branches
203 
204 In this example, exactly two branches are always used: those need to be
205 prefetched.
206 ~~~ {.cpp}
207  TTree *T;
208  f->GetObject(T, "mytree");
209  auto b1 = T->GetBranch("branch1");
210  auto b2 = T->GetBranch("branch2");
211  auto nentries = T->GetEntries();
212  auto cachesize = 10000000U; //10 MBytes
213  T->SetCacheSize(cachesize); //<<<
214  T->AddBranchToCache(b1, true); //<<< add branch1 and branch2 to the cache
215  T->AddBranchToCache(b2, true); //<<<
216  T->StopCacheLearningPhase(); //<<< we do not need the system to guess anything
217  for (auto i : ROOT::TSeqL(nentries)) {
218  T->LoadTree(i); //<<< important call when calling TBranch::GetEntry after
219  b1->GetEntry(i);
220  if (some condition not met) continue;
221  b2->GetEntry(i);
222  if (some condition not met) continue;
223  // Here we read the full event only in some rare cases.
224  // There is no point in caching the other branches as it might be
225  // more economical to read only the branch buffers really used.
226  T->GetEntry(i);
227  ... Here the entry is processed
228  }
229 ~~~
230 #### Always using at least the same two branches
231 
232 In this example, two branches are always used: in addition, some analysis
233 functions are invoked and those may trigger the reading of other branches which
234 are a priori not known.
235 There is no point in prefetching branches that will be used very rarely: we can
236 rely on the system to cache the right branches.
237 ~~~ {.cpp}
238  TTree *T;
239  f->GetObject(T, "mytree");
240  auto nentries = T->GetEntries();
241  auto cachesize = 10000000; //10 MBytes
242  T->SetCacheSize(cachesize); //<<<
243  T->SetCacheLearnEntries(5); //<<< we can take the decision after 5 entries
244  auto b1 = T->GetBranch("branch1");
245  auto b2 = T->GetBranch("branch2");
246  for (auto i : ROOT::TSeqL(nentries)) {
247  T->LoadTree(i);
248  b1->GetEntry(i);
249  if (some condition not met) continue;
250  b2->GetEntry(i);
251  // At this point we may call a user function where a few more branches
252  // will be read conditionally. These branches will be put in the cache
253  // if they have been used in the first 5 entries
254  if (some condition not met) continue;
255  // Here we read the full event only in some rare cases.
256  // There is no point in caching the other branches as it might be
257  // more economical to read only the branch buffers really used.
258  T->GetEntry(i);
259  .. process the rare but interesting cases.
260  ... Here the entry is processed
261  }
262 ~~~
263 
264 ## <a name="checkPerf"></a>How can the usage and performance of TTreeCache be verified?
265 
266 Once the event loop has terminated, the number of effective system reads for a
267 given file can be checked with code like the following:
268 ~~~ {.cpp}
269  printf("Reading %lld bytes in %d transactions\n", f->GetBytesRead(), f->GetReadCalls());
270 ~~~
271 
272 Another handy command is:
273 ~~~ {.cpp}
274 myTreeOrChain.GetTree()->PrintCacheStats();
275 ~~~
276 
277 */
278 
279 #include "TSystem.h"
280 #include "TEnv.h"
281 #include "TTreeCache.h"
282 #include "TChain.h"
283 #include "TList.h"
284 #include "TBranch.h"
285 #include "TBranchElement.h"
286 #include "TEventList.h"
287 #include "TObjArray.h"
288 #include "TObjString.h"
289 #include "TRegexp.h"
290 #include "TLeaf.h"
291 #include "TFriendElement.h"
292 #include "TFile.h"
293 #include "TMath.h"
294 #include "TBranchCacheInfo.h"
295 #include "TVirtualPerfStats.h"
296 #include <limits.h>
297 
298 Int_t TTreeCache::fgLearnEntries = 100;
299 
300 ClassImp(TTreeCache);
301 
302 ////////////////////////////////////////////////////////////////////////////////
303 /// Default Constructor.
304 
305 TTreeCache::TTreeCache() : TFileCacheRead(), fPrefillType(GetConfiguredPrefillType())
306 {
307 }
308 
309 ////////////////////////////////////////////////////////////////////////////////
310 /// Constructor.
311 
312 TTreeCache::TTreeCache(TTree *tree, Int_t buffersize)
313  : TFileCacheRead(tree->GetCurrentFile(), buffersize, tree), fEntryMax(tree->GetEntriesFast()), fEntryNext(0),
314  fBrNames(new TList), fTree(tree), fPrefillType(GetConfiguredPrefillType())
315 {
316  fEntryNext = fEntryMin + fgLearnEntries;
317  Int_t nleaves = tree->GetListOfLeaves()->GetEntries();
318  fBranches = new TObjArray(nleaves);
319 }
320 
321 ////////////////////////////////////////////////////////////////////////////////
322 /// Destructor. (in general called by the TFile destructor)
323 
324 TTreeCache::~TTreeCache()
325 {
326  // Inform the TFile that we have been deleted (in case
327  // we are deleted explicitly by legacy user code).
328  if (fFile) fFile->SetCacheRead(0, fTree);
329 
330  delete fBranches;
331  if (fBrNames) {fBrNames->Delete(); delete fBrNames; fBrNames=0;}
332 }
333 
334 ////////////////////////////////////////////////////////////////////////////////
335 /// Add a branch discovered by actual usage to the list of branches to be stored
336 /// in the cache. This function is called by TBranch::GetBasket.
337 /// If we are no longer in the training phase this is an error.
338 /// Returns:
339 /// - 0 branch added or already included
340 /// - -1 on error
341 
342 Int_t TTreeCache::LearnBranch(TBranch *b, Bool_t subbranches /*= kFALSE*/)
343 {
344  if (!fIsLearning) {
345  return -1;
346  }
347 
348  // Reject branches that are not from the cached tree.
349  if (!b || fTree->GetTree() != b->GetTree()) return -1;
350 
351  // If this is the first addition of a branch (and we are learning and we are in
352  // the expected TTree), then prefill the cache. (We expect that in a future
353  // release the prefilling will be the default, so we test for that inside the
354  // LearnPrefill call.)
355  if (!fLearnPrefilling && fNbranches == 0) LearnPrefill();
356 
357  return AddBranch(b, subbranches);
358 }
359 
360 ////////////////////////////////////////////////////////////////////////////////
361 /// Add a branch to the list of branches to be stored in the cache.
362 /// This function is called by the user via TTree::AddBranchToCache.
363 /// The branch is added even if we are outside of the training phase.
364 /// Returns:
365 /// - 0 branch added or already included
366 /// - -1 on error
367 
368 Int_t TTreeCache::AddBranch(TBranch *b, Bool_t subbranches /*= kFALSE*/)
369 {
370  // Reject branches that are not from the cached tree.
371  if (!b || fTree->GetTree() != b->GetTree()) return -1;
372 
373  //Is branch already in the cache?
374  Bool_t isNew = kTRUE;
375  for (int i=0;i<fNbranches;i++) {
376  if (fBranches->UncheckedAt(i) == b) {isNew = kFALSE; break;}
377  }
378  if (isNew) {
379  fTree = b->GetTree();
380  fBranches->AddAtAndExpand(b, fNbranches);
381  const char *bname = b->GetName();
382  if (fTree->IsA() == TChain::Class()) {
383  // If we have a TChain, we will need to use the branch name
384  // and we better disambiguate them (see atlasFlushed.root for example)
385  // in order to cache all the requested branches.
386  // We do not do this all the time as GetMother is slow (it performs
387  // a linear search through the list of top-level branches).
388  TString build;
389  const char *mothername = b->GetMother()->GetName();
390  if (b != b->GetMother() && mothername[strlen(mothername)-1] != '.') {
391  // Maybe we ought to prefix the name to avoid ambiguity.
392  auto bem = dynamic_cast<TBranchElement*>(b->GetMother());
393  if (bem->GetType() < 3) {
394  // Not a collection.
395  build = mothername;
396  build.Append(".");
397  if (strncmp(bname,build.Data(),build.Length()) != 0) {
398  build.Append(bname);
399  bname = build.Data();
400  }
401  }
402  }
403  }
404  fBrNames->Add(new TObjString(bname));
405  fNbranches++;
406  if (gDebug > 0) printf("Entry: %lld, registering branch: %s\n",b->GetTree()->GetReadEntry(),b->GetName());
407  }
408 
409  // process subbranches
410  Int_t res = 0;
411  if (subbranches) {
412  TObjArray *lb = b->GetListOfBranches();
413  Int_t nb = lb->GetEntriesFast();
414  for (Int_t j = 0; j < nb; j++) {
415  TBranch* branch = (TBranch*) lb->UncheckedAt(j);
416  if (!branch) continue;
417  if (AddBranch(branch, subbranches)<0) {
418  res = -1;
419  }
420  }
421  }
422  return res;
423 }
424 
425 ////////////////////////////////////////////////////////////////////////////////
426 /// Add a branch to the list of branches to be stored in the cache.
427 /// This is to be used by the user (that's why we pass the name of the branch).
428 /// It works in exactly the same way as TTree::SetBranchStatus so you
429 /// probably want to look over there for details about the use of bname
430 /// with regular expressions.
431 /// The branches are taken with respect to the Owner of this TTreeCache
432 /// (i.e. the original Tree)
433 /// NB: if bname="*" all branches are put in the cache and the learning phase stopped
434 /// Returns:
435 /// - 0 branch added or already included
436 /// - -1 on error
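///
/// A minimal usage sketch (user code typically goes through TTree::AddBranchToCache,
/// which forwards the request to the tree's cache; the wildcard pattern is a placeholder):
/// ~~~ {.cpp}
/// tree->SetCacheSize(30000000);
/// tree->AddBranchToCache("muon_*", true); //<<< cache matching branches and their sub-branches
/// ~~~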
437 
438 Int_t TTreeCache::AddBranch(const char *bname, Bool_t subbranches /*= kFALSE*/)
439 {
440  TBranch *branch, *bcount;
441  TLeaf *leaf, *leafcount;
442 
443  Int_t i;
444  Int_t nleaves = (fTree->GetListOfLeaves())->GetEntriesFast();
445  TRegexp re(bname,kTRUE);
446  Int_t nb = 0;
447  Int_t res = 0;
448 
449  // first pass, loop on all branches
450  // for leafcount branches activate/deactivate in function of status
451  Bool_t all = kFALSE;
452  if (!strcmp(bname,"*")) all = kTRUE;
453  for (i=0;i<nleaves;i++) {
454  leaf = (TLeaf*)(fTree->GetListOfLeaves())->UncheckedAt(i);
455  branch = (TBranch*)leaf->GetBranch();
456  TString s = branch->GetName();
457  if (!all) { //Regexp gives wrong result for [] in name
458  TString longname;
459  longname.Form("%s.%s",fTree->GetName(),branch->GetName());
460  if (strcmp(bname,branch->GetName())
461  && longname != bname
462  && s.Index(re) == kNPOS) continue;
463  }
464  nb++;
465  if (AddBranch(branch, subbranches)<0) {
466  res = -1;
467  }
468  leafcount = leaf->GetLeafCount();
469  if (leafcount && !all) {
470  bcount = leafcount->GetBranch();
471  if (AddBranch(bcount, subbranches)<0) {
472  res = -1;
473  }
474  }
475  }
476  if (nb==0 && strchr(bname,'*')==0) {
477  branch = fTree->GetBranch(bname);
478  if (branch) {
479  if (AddBranch(branch, subbranches)<0) {
480  res = -1;
481  }
482  ++nb;
483  }
484  }
485 
486  //search in list of friends
487  UInt_t foundInFriend = 0;
488  if (fTree->GetListOfFriends()) {
489  TIter nextf(fTree->GetListOfFriends());
490  TFriendElement *fe;
491  TString name;
492  while ((fe = (TFriendElement*)nextf())) {
493  TTree *t = fe->GetTree();
494  if (t==0) continue;
495 
496  // If the alias is present replace it with the real name.
497  char *subbranch = (char*)strstr(bname,fe->GetName());
498  if (subbranch!=bname) subbranch = 0;
499  if (subbranch) {
500  subbranch += strlen(fe->GetName());
501  if ( *subbranch != '.' ) subbranch = 0;
502  else subbranch ++;
503  }
504  if (subbranch) {
505  name.Form("%s.%s",t->GetName(),subbranch);
506  if (name != bname && AddBranch(name, subbranches)<0) {
507  res = -1;
508  }
509  ++foundInFriend;
510  }
511  }
512  }
513  if (!nb && !foundInFriend) {
514  if (gDebug > 0) printf("AddBranch: unknown branch -> %s \n", bname);
515  Error("AddBranch", "unknown branch -> %s", bname);
516  return -1;
517  }
518  //if all branches are selected stop the learning phase
519  if (*bname == '*') {
520  fEntryNext = -1; // We are likely to have changed the set of branches, so force the [re-]reading of the cluster.
521  StopLearningPhase();
522  }
523  return res;
524 }
525 
526 ////////////////////////////////////////////////////////////////////////////////
527 /// Remove a branch from the list of branches to be stored in the cache.
528 /// This function is called by TBranch::GetBasket.
529 /// Returns:
530 /// - 0 branch dropped or not in cache
531 /// - -1 on error
532 
533 Int_t TTreeCache::DropBranch(TBranch *b, Bool_t subbranches /*= kFALSE*/)
534 {
535  if (!fIsLearning) {
536  return -1;
537  }
538 
539  // Reject branches that are not from the cached tree.
540  if (!b || fTree->GetTree() != b->GetTree()) return -1;
541 
542  //Is branch already in the cache?
543  if (fBranches->Remove(b)) {
544  --fNbranches;
545  if (gDebug > 0) printf("Entry: %lld, un-registering branch: %s\n",b->GetTree()->GetReadEntry(),b->GetName());
546  }
547  delete fBrNames->Remove(fBrNames->FindObject(b->GetName()));
548 
549  // process subbranches
550  Int_t res = 0;
551  if (subbranches) {
552  TObjArray *lb = b->GetListOfBranches();
553  Int_t nb = lb->GetEntriesFast();
554  for (Int_t j = 0; j < nb; j++) {
555  TBranch* branch = (TBranch*) lb->UncheckedAt(j);
556  if (!branch) continue;
557  if (DropBranch(branch, subbranches)<0) {
558  res = -1;
559  }
560  }
561  }
562  return res;
563 }
564 
565 ////////////////////////////////////////////////////////////////////////////////
566 /// Remove a branch from the list of branches to be stored in the cache.
567 /// This is to be used by the user (that's why we pass the name of the branch).
568 /// It works in exactly the same way as TTree::SetBranchStatus so you
569 /// probably want to look over there for details about the use of bname
570 /// with regular expressions.
571 /// The branches are taken with respect to the Owner of this TTreeCache
572 /// (i.e. the original Tree)
573 /// NB: if bname="*" all branches are removed from the cache.
574 /// Returns:
575 /// - 0 branch dropped or not in cache
576 /// - -1 on error
577 
578 Int_t TTreeCache::DropBranch(const char *bname, Bool_t subbranches /*= kFALSE*/)
579 {
580  TBranch *branch, *bcount;
581  TLeaf *leaf, *leafcount;
582 
583  Int_t i;
584  Int_t nleaves = (fTree->GetListOfLeaves())->GetEntriesFast();
585  TRegexp re(bname,kTRUE);
586  Int_t nb = 0;
587  Int_t res = 0;
588 
589  // first pass, loop on all branches
590  // for leafcount branches activate/deactivate in function of status
591  Bool_t all = kFALSE;
592  if (!strcmp(bname,"*")) all = kTRUE;
593  for (i=0;i<nleaves;i++) {
594  leaf = (TLeaf*)(fTree->GetListOfLeaves())->UncheckedAt(i);
595  branch = (TBranch*)leaf->GetBranch();
596  TString s = branch->GetName();
597  if (!all) { //Regexp gives wrong result for [] in name
598  TString longname;
599  longname.Form("%s.%s",fTree->GetName(),branch->GetName());
600  if (strcmp(bname,branch->GetName())
601  && longname != bname
602  && s.Index(re) == kNPOS) continue;
603  }
604  nb++;
605  if (DropBranch(branch, subbranches)<0) {
606  res = -1;
607  }
608  leafcount = leaf->GetLeafCount();
609  if (leafcount && !all) {
610  bcount = leafcount->GetBranch();
611  if (DropBranch(bcount, subbranches)<0) {
612  res = -1;
613  }
614  }
615  }
616  if (nb==0 && strchr(bname,'*')==0) {
617  branch = fTree->GetBranch(bname);
618  if (branch) {
619  if (DropBranch(branch, subbranches)<0) {
620  res = -1;
621  }
622  ++nb;
623  }
624  }
625 
626  //search in list of friends
627  UInt_t foundInFriend = 0;
628  if (fTree->GetListOfFriends()) {
629  TIter nextf(fTree->GetListOfFriends());
630  TFriendElement *fe;
631  TString name;
632  while ((fe = (TFriendElement*)nextf())) {
633  TTree *t = fe->GetTree();
634  if (t==0) continue;
635 
636  // If the alias is present replace it with the real name.
637  char *subbranch = (char*)strstr(bname,fe->GetName());
638  if (subbranch!=bname) subbranch = 0;
639  if (subbranch) {
640  subbranch += strlen(fe->GetName());
641  if ( *subbranch != '.' ) subbranch = 0;
642  else subbranch ++;
643  }
644  if (subbranch) {
645  name.Form("%s.%s",t->GetName(),subbranch);
646  if (DropBranch(name, subbranches)<0) {
647  res = -1;
648  }
649  ++foundInFriend;
650  }
651  }
652  }
653  if (!nb && !foundInFriend) {
654  if (gDebug > 0) printf("DropBranch: unknown branch -> %s \n", bname);
655  Error("DropBranch", "unknown branch -> %s", bname);
656  return -1;
657  }
658  //if all branches are deselected, force the re-reading of the cluster
659  if (*bname == '*') {
660  fEntryNext = -1; // We are likely to have changed the set of branches, so force the [re-]reading of the cluster.
661  }
662  return res;
663 }
664 
665 ////////////////////////////////////////////////////////////////////////////////
666 /// Start of methods for the miss cache.
667 ////////////////////////////////////////////////////////////////////////////////
668 
669 ////////////////////////////////////////////////////////////////////////////////
670 /// Enable / disable the miss cache.
671 ///
672 /// The first time this is called on a TTreeCache object, the corresponding
673 /// data structures will be allocated. Subsequent enable / disables will
674 /// simply turn the functionality on/off.
675 void TTreeCache::SetOptimizeMisses(Bool_t opt)
676 {
677 
678  if (opt && !fMissCache) {
679  ResetMissCache();
680  }
681  fOptimizeMisses = opt;
682 }
683 
684 ////////////////////////////////////////////////////////////////////////////////
685 /// Reset all the miss cache training.
686 ///
687 /// The contents of the miss cache will be emptied as well as the list of
688 /// branches used.
689 void TTreeCache::ResetMissCache()
690 {
691 
692  fLastMiss = -1;
693  fFirstMiss = -1;
694 
695  if (!fMissCache) {
696  fMissCache.reset(new MissCache());
697  }
698  fMissCache->clear();
699 }
700 
701 ////////////////////////////////////////////////////////////////////////////////
702 /// For the event currently being fetched into the miss cache, find the IO
703 /// (offset / length tuple) to pull in the current basket for a given branch.
704 ///
705 /// Returns:
706 /// - IOPos describing the IO operation necessary for the basket on this branch
707 /// - On failure, IOPos.length will be set to 0.
708 TTreeCache::IOPos TTreeCache::FindBranchBasketPos(TBranch &b, Long64_t entry)
709 {
710  if (R__unlikely(b.GetDirectory() == 0)) {
711  // printf("Branch at %p has no valid directory.\n", &b);
712  return IOPos{0, 0};
713  }
714  if (R__unlikely(b.GetDirectory()->GetFile() != fFile)) {
715  // printf("Branch at %p is in wrong file (branch file %p, my file %p).\n", &b, b.GetDirectory()->GetFile(),
716  // fFile);
717  return IOPos{0, 0};
718  }
719 
720  // printf("Trying to find a basket for branch %p\n", &b);
721  // Pull in metadata about branch; make sure it is valid
722  Int_t *lbaskets = b.GetBasketBytes();
723  Long64_t *entries = b.GetBasketEntry();
724  if (R__unlikely(!lbaskets || !entries)) {
725  // printf("No baskets or entries.\n");
726  return IOPos{0, 0};
727  }
728  // Int_t blistsize = b.GetListOfBaskets()->GetSize();
729  Int_t blistsize = b.GetWriteBasket();
730  if (R__unlikely(blistsize <= 0)) {
731  // printf("Basket list is size 0.\n");
732  return IOPos{0, 0};
733  }
734 
735  // Search for the basket that contains the event of interest. Unlike the primary cache, we
736  // are only interested in a single basket per branch - we don't try to fill the cache.
737  Long64_t basketOffset = TMath::BinarySearch(blistsize, entries, entry);
738  if (basketOffset < 0) { // No entry found.
739  // printf("No entry offset found for entry %ld\n", fTree->GetReadEntry());
740  return IOPos{0, 0};
741  }
742 
743  // Check to see if there's already a copy of this basket in memory. If so, don't fetch it
744  if ((basketOffset < blistsize) && b.GetListOfBaskets()->UncheckedAt(basketOffset)) {
745 
746  // printf("Basket is already in memory.\n");
747  return IOPos{0, 0};
748  }
749 
750  Long64_t pos = b.GetBasketSeek(basketOffset);
751  Int_t len = lbaskets[basketOffset];
752  if (R__unlikely(pos <= 0 || len <= 0)) {
753  /*printf("Basket returned was invalid (basketOffset=%ld, pos=%ld, len=%d).\n", basketOffset, pos, len);
754  for (int idx=0; idx<blistsize; idx++) {
755  printf("Basket entry %d, first event %d, pos %ld\n", idx, entries[idx], b.GetBasketSeek(idx));
756  }*/
757  return IOPos{0, 0};
758  } // Sanity check
759  // Do not cache a basket if it is bigger than the cache size!
760  if (R__unlikely(len > fBufferSizeMin)) {
761  // printf("Basket size is greater than the cache size.\n");
762  return IOPos{0, 0};
763  }
764 
765  return {pos, len};
766 }
767 
768 ////////////////////////////////////////////////////////////////////////////////
769 /// Given a particular IO description (offset / length) representing a 'miss' of
770 /// the TTreeCache's primary cache, calculate all the corresponding IO that
771 /// should be performed.
772 ///
773 /// `all` indicates that this function should search the set of _all_ branches
774 /// in this TTree. When set to false, we only search through branches that
775 /// have previously incurred a miss.
776 ///
777 /// Returns:
778 /// - TBranch pointer corresponding to the basket that will be retrieved by
779 /// this IO operation.
780 /// - If no corresponding branch could be found (or an error occurs), this
781 /// returns nullptr.
782 TBranch *TTreeCache::CalculateMissEntries(Long64_t pos, Int_t len, Bool_t all)
783 {
784  if (R__unlikely((pos < 0) || (len < 0))) {
785  return nullptr;
786  }
787 
788  int count = all ? (fTree->GetListOfLeaves())->GetEntriesFast() : fMissCache->fBranches.size();
789  fMissCache->fEntries.reserve(count);
790  fMissCache->fEntries.clear();
791  Bool_t found_request = kFALSE;
792  TBranch *resultBranch = nullptr;
793  Long64_t entry = fTree->GetReadEntry();
794 
795  std::vector<std::pair<size_t, Int_t>> basketsInfo;
796  auto perfStats = GetTree()->GetPerfStats();
797 
798  // printf("Will search %d branches for basket at %ld.\n", count, pos);
799  for (int i = 0; i < count; i++) {
800  TBranch *b =
801  all ? static_cast<TBranch *>(static_cast<TLeaf *>((fTree->GetListOfLeaves())->UncheckedAt(i))->GetBranch())
802  : fMissCache->fBranches[i];
803  IOPos iopos = FindBranchBasketPos(*b, entry);
804  if (iopos.fLen == 0) { // Error indicator
805  continue;
806  }
807  if (iopos.fPos == pos && iopos.fLen == len) {
808  found_request = kTRUE;
809  resultBranch = b;
810  // Note that we continue to iterate; fills up the rest of the entries in the cache.
811  }
812  // At this point, we are ready to push back a new offset
813  fMissCache->fEntries.emplace_back(std::move(iopos));
814 
815  if (R__unlikely(perfStats)) {
816  Int_t blistsize = b->GetWriteBasket();
817  Int_t basketNumber = -1;
818  for (Int_t bn = 0; bn < blistsize; ++bn) {
819  if (iopos.fPos == b->GetBasketSeek(bn)) {
820  basketNumber = bn;
821  break;
822  }
823  }
824  if (basketNumber >= 0)
825  basketsInfo.emplace_back((size_t)i, basketNumber);
826  }
827  }
828  if (R__unlikely(!found_request)) {
829  // We have gone through all the branches in this file and the requested basket
830  // doesn't appear to be in any of them. Likely a logic error / bug.
831  fMissCache->fEntries.clear();
832  }
833  if (R__unlikely(perfStats)) {
834  for (auto &info : basketsInfo) {
835  perfStats->SetLoadedMiss(info.first, info.second);
836  }
837  }
838  return resultBranch;
839 }
840 
841 ////////////////////////////////////////////////////////////////////////////////
842 ///
843 /// Process a cache miss; (pos, len) isn't in the buffer.
844 ///
845 /// The first time we have a miss, we buffer as many baskets as we can (up to the
846 /// maximum size of the TTreeCache) in memory from all branches that are not in
847 /// the prefetch list.
848 ///
849 /// Subsequent times, we fetch all the buffers corresponding to branches that
850 /// had previously seen misses. If it turns out the (pos, len) isn't in the
851 /// list of branches, we treat this as if it was the first miss.
852 ///
853 /// Returns true if we were able to pull the data into the miss cache.
854 ///
855 Bool_t TTreeCache::ProcessMiss(Long64_t pos, int len)
856 {
857 
858  Bool_t firstMiss = kFALSE;
859  if (fFirstMiss == -1) {
860  fFirstMiss = fEntryCurrent;
861  firstMiss = kTRUE;
862  }
863  fLastMiss = fEntryCurrent;
864  // The first time this is executed, we try to pull in as much data as we can.
865  TBranch *b = CalculateMissEntries(pos, len, firstMiss);
866  if (!b) {
867  if (!firstMiss) {
868  // TODO: this recalculates for *all* branches, throwing away the above work.
869  b = CalculateMissEntries(pos, len, kTRUE);
870  }
871  if (!b) {
872  // printf("ProcessMiss: pos %ld does not appear to correspond to a buffer in this file.\n", pos);
873  // We have gone through all the branches in this file and the requested basket
874  // doesn't appear to be in any of them. Likely a logic error / bug.
875  fMissCache->fEntries.clear();
876  return kFALSE;
877  }
878  }
879  // TODO: this should be a set.
880  fMissCache->fBranches.push_back(b);
881 
882  // OK, sort the entries
883  std::sort(fMissCache->fEntries.begin(), fMissCache->fEntries.end());
884 
885  // Now, fetch the buffer.
886  std::vector<Long64_t> positions;
887  positions.reserve(fMissCache->fEntries.size());
888  std::vector<Int_t> lengths;
889  lengths.reserve(fMissCache->fEntries.size());
890  ULong64_t cumulative = 0;
891  for (auto &mcentry : fMissCache->fEntries) {
892  positions.push_back(mcentry.fIO.fPos);
893  lengths.push_back(mcentry.fIO.fLen);
894  mcentry.fIndex = cumulative;
895  cumulative += mcentry.fIO.fLen;
896  }
897  fMissCache->fData.reserve(cumulative);
898  // printf("Reading %lu bytes into miss cache for %lu entries.\n", cumulative, fEntries->size());
899  fNMissReadPref += fMissCache->fEntries.size();
900  fFile->ReadBuffers(&(fMissCache->fData[0]), &(positions[0]), &(lengths[0]), fMissCache->fEntries.size());
901  fFirstMiss = fLastMiss = fEntryCurrent;
902 
903  return kTRUE;
904 }
905 
906 ////////////////////////////////////////////////////////////////////////////////
907 /// Given an IO operation (pos, len) that was a cache miss in the primary TTC,
908 /// try the operation again with the miss cache.
909 ///
910 /// Returns true if the IO operation was successful and the contents of buf
911 /// were populated with the requested data.
912 ///
913 Bool_t TTreeCache::CheckMissCache(char *buf, Long64_t pos, int len)
914 {
915 
916  if (!fOptimizeMisses) {
917  return kFALSE;
918  }
919  if (R__unlikely((pos < 0) || (len < 0))) {
920  return kFALSE;
921  }
922 
923  // printf("Checking the miss cache for offset=%ld, length=%d\n", pos, len);
924 
925  // First, binary search to see if the desired basket is already cached.
926  MissCache::Entry mcentry{IOPos{pos, len}};
927  auto iter = std::lower_bound(fMissCache->fEntries.begin(), fMissCache->fEntries.end(), mcentry);
928 
929  if (iter != fMissCache->fEntries.end()) {
930  if (len > iter->fIO.fLen) {
931  ++fNMissReadMiss;
932  return kFALSE;
933  }
934  auto offset = iter->fIndex;
935  memcpy(buf, &(fMissCache->fData[offset]), len);
936  // printf("Returning data from pos=%ld in miss cache.\n", offset);
937  ++fNMissReadOk;
938  return kTRUE;
939  }
940 
941  // printf("Data not in miss cache.\n");
942 
943  // Update the cache, looking for this (pos, len).
944  if (!ProcessMiss(pos, len)) {
945  // printf("Unable to pull data into miss cache.\n");
946  ++fNMissReadMiss;
947  return kFALSE;
948  }
949 
950  // OK, we updated the cache with as much information as possible. Search again for
951  // the entry we want.
952  iter = std::lower_bound(fMissCache->fEntries.begin(), fMissCache->fEntries.end(), mcentry);
953 
954  if (iter != fMissCache->fEntries.end()) {
955  auto offset = iter->fIndex;
956  // printf("Expecting data at offset %ld in miss cache.\n", offset);
957  memcpy(buf, &(fMissCache->fData[offset]), len);
958  ++fNMissReadOk;
959  return kTRUE;
960  }
961 
962  // This must be a logic bug. ProcessMiss should return false if (pos, len)
963  // wasn't put into fEntries.
964  ++fNMissReadMiss;
965  return kFALSE;
966 }
967 
968 ////////////////////////////////////////////////////////////////////////////////
969 /// End of methods for miss cache.
970 ////////////////////////////////////////////////////////////////////////////////
971 
972 namespace {
973 struct BasketRanges {
974  struct Range {
975  Long64_t fMin; ///< Inclusive minimum
976  Long64_t fMax; ///< Inclusive maximum
977 
978  Range() : fMin(-1), fMax(-1) {}
979 
980  void UpdateMin(Long64_t min)
981  {
982  if (fMin == -1 || min < fMin)
983  fMin = min;
984  }
985 
986  void UpdateMax(Long64_t max)
987  {
988  if (fMax == -1 || fMax < max)
989  fMax = max;
990  }
991 
992  Bool_t Contains(Long64_t entry) { return (fMin <= entry && entry <= fMax); }
993  };
994 
995  std::vector<Range> fRanges;
996  std::map<Long64_t,size_t> fMinimums;
997  std::map<Long64_t,size_t> fMaximums;
998 
999  BasketRanges(size_t nBranches) { fRanges.resize(nBranches); }
1000 
1001  void Update(size_t branchNumber, Long64_t min, Long64_t max)
1002  {
1003  Range &range = fRanges.at(branchNumber);
1004  auto old(range);
1005 
1006  range.UpdateMin(min);
1007  range.UpdateMax(max);
1008 
1009  if (old.fMax != range.fMax) {
1010  if (old.fMax != -1) {
1011  auto maxIter = fMaximums.find(old.fMax);
1012  if (maxIter != fMaximums.end()) {
1013  if (maxIter->second == 1) {
1014  fMaximums.erase(maxIter);
1015  } else {
1016  --(maxIter->second);
1017  }
1018  }
1019  }
1020  ++(fMaximums[max]);
1021  }
1022  }
1023 
1024  void Update(size_t branchNumber, size_t basketNumber, Long64_t *entries, size_t nb, size_t max)
1025  {
1026  Update(branchNumber, entries[basketNumber],
1027  (basketNumber < (nb - 1)) ? (entries[basketNumber + 1] - 1) : max - 1);
1028  }
1029 
1030  // Check that fMaximums and fMinimums are properly set
1031  bool CheckAllIncludeRange()
1032  {
1033  Range result;
1034  for (const auto &r : fRanges) {
1035  if (result.fMin == -1 || result.fMin < r.fMin) {
1036  if (r.fMin != -1)
1037  result.fMin = r.fMin;
1038  }
1039  if (result.fMax == -1 || r.fMax < result.fMax) {
1040  if (r.fMax != -1)
1041  result.fMax = r.fMax;
1042  }
1043  }
1044  // if (result.fMax < result.fMin) {
1045  // // No overlapping range.
1046  // }
1047 
1048  Range allIncludedRange(AllIncludedRange());
1049 
1050  return (result.fMin == allIncludedRange.fMin && result.fMax == allIncludedRange.fMax);
1051  }
1052 
1053  // This returns a Range object where fMin is the maximum of all the minimum entry
1054  // numbers loaded for each branch and fMax is the minimum of all the maximum entry
1055  // numbers loaded for each branch.
1056  // As such it is valid to have fMin > fMax; this is the case where there
1057  // is no overlap between the branches' ranges. For example, for 2 branches
1058  // where one covers entries [50,99] and the other [0,49],
1059  // we will have fMin = max(50,0) = 50 and fMax = min(99,49) = 49.
1060  Range AllIncludedRange()
1061  {
1062  Range result;
1063  if (!fMinimums.empty())
1064  result.fMin = fMinimums.rbegin()->first;
1065  if (!fMaximums.empty())
1066  result.fMax = fMaximums.begin()->first;
1067  return result;
1068  }
1069 
1070  // Returns the number of branches with at least one basket registered.
1071  UInt_t BranchesRegistered()
1072  {
1073  UInt_t result = 0;
1074  for (const auto &r : fRanges) {
1075  if (r.fMin != -1 && r.fMax != -1)
1076  ++result;
1077  }
1078  return result;
1079  }
1080 
1081  // Returns true if at least one of the branches' ranges contains
1082  // the entry.
1083  Bool_t Contains(Long64_t entry)
1084  {
1085  for (const auto &r : fRanges) {
1086  if (r.fMin != -1 && r.fMax != -1)
1087  if (r.fMin <= entry && entry <= r.fMax)
1088  return kTRUE;
1089  }
1090  return kFALSE;
1091  }
1092 
1093  void Print()
1094  {
1095  for (size_t i = 0; i < fRanges.size(); ++i) {
1096  if (fRanges[i].fMin != -1 || fRanges[i].fMax != -1)
1097  Printf("Range #%zu : %lld to %lld", i, fRanges[i].fMin, fRanges[i].fMax);
1098  }
1099  }
1100 };
1101 } // Anonymous namespace.
1102 
1103 ////////////////////////////////////////////////////////////////////////////////
1104 /// Fill the cache buffer with the branches in the cache.
1105 
1106 Bool_t TTreeCache::FillBuffer()
1107 {
1108 
1109  if (fNbranches <= 0) return kFALSE;
1110  TTree *tree = ((TBranch*)fBranches->UncheckedAt(0))->GetTree();
1111  Long64_t entry = tree->GetReadEntry();
1112  Long64_t fEntryCurrentMax = 0;
1113 
1114  if (entry != -1 && (entry < fEntryMin || fEntryMax < entry))
1115  return kFALSE;
1116 
1117  if (fEnablePrefetching) { // Prefetching mode
1118  if (fIsLearning) { // Learning mode
1119  if (fEntryNext >= 0 && entry >= fEntryNext) {
1120  // entry is outside the learn range, need to stop the learning
1121  // phase. Doing so may trigger a recursive call to FillBuffer in
1122  // the process of filling both prefetching buffers
1123  StopLearningPhase();
1124  fIsManual = kFALSE;
1125  }
1126  }
1127  if (fIsLearning) { // Learning mode
1128  entry = 0;
1129  }
1130  if (fFirstTime) {
1131  //try to detect if it is normal or reverse read
1132  fFirstEntry = entry;
1133  }
1134  else {
1135  if (fFirstEntry == entry) return kFALSE;
1136  // Set the read direction
1137  if (!fReadDirectionSet) {
1138  if (entry < fFirstEntry) {
1139  fReverseRead = kTRUE;
1140  fReadDirectionSet = kTRUE;
1141  }
1142  else if (entry > fFirstEntry) {
1143  fReverseRead =kFALSE;
1144  fReadDirectionSet = kTRUE;
1145  }
1146  }
1147 
1148  if (fReverseRead) {
1149  // Reverse reading with prefetching
1150  if (fEntryCurrent >0 && entry < fEntryNext) {
1151  // We can prefetch the next buffer
1152  if (entry >= fEntryCurrent) {
1153  entry = fEntryCurrent - tree->GetAutoFlush() * fFillTimes;
1154  }
1155  if (entry < 0) entry = 0;
1156  }
1157  else if (fEntryCurrent >= 0) {
1158  // We are still reading from the oldest buffer, no need to prefetch a new one
1159  return kFALSE;
1160  }
1161  if (entry < 0) return kFALSE;
1162  fFirstBuffer = !fFirstBuffer;
1163  }
1164  else {
1165  // Normal reading with prefetching
1166  if (fEnablePrefetching) {
1167  if (entry < 0 && fEntryNext > 0) {
1168  entry = fEntryCurrent;
1169  } else if (entry >= fEntryCurrent) {
1170  if (entry < fEntryNext) {
1171  entry = fEntryNext;
1172  }
1173  }
1174  else {
1175  // We are still reading from the oldest buffer,
1176  // no need to prefetch a new one
1177  return kFALSE;
1178  }
1179  fFirstBuffer = !fFirstBuffer;
1180  }
1181  }
1182  }
1183  }
1184 
1185  // Set to true to enable all debug output without having to set gDebug
1186  // Replace this once we have a per module and/or per class debugging level/setting.
1187  static constexpr bool showMore = kFALSE;
1188 
1189  static const auto PrintAllCacheInfo = [](TObjArray *branches) {
1190  for (Int_t i = 0; i < branches->GetEntries(); i++) {
1191  TBranch *b = (TBranch *)branches->UncheckedAt(i);
1192  b->PrintCacheInfo();
1193  }
1194  };
1195 
1196  if (showMore || gDebug > 6)
1197  Info("FillBuffer", "***** Called for entry %lld", entry);
1198 
1199  if (!fIsLearning && fEntryCurrent <= entry && entry < fEntryNext) {
1200  // Check if all the baskets in the cache have already been used and
1201  // thus we can reuse the cache.
1202  Bool_t allUsed = kTRUE;
1203  for (Int_t i = 0; i < fNbranches; ++i) {
1204  TBranch *b = (TBranch *)fBranches->UncheckedAt(i);
1205  if (!b->fCacheInfo.AllUsed()) {
1206  allUsed = kFALSE;
1207  break;
1208  }
1209  }
1210  if (allUsed) {
1211  fEntryNext = entry;
1212  if (showMore || gDebug > 5)
1213  Info("FillBuffer", "All baskets used already, so refresh the cache early at entry %lld", entry);
1214  }
1215  if (gDebug > 8)
1216  PrintAllCacheInfo(fBranches);
1217  }
1218 
1219  // If the entry is in the range we previously prefetched, there is
1220  // no point in retrying. Note that this will also return false
1221  // during the training phase (fEntryNext is then intentionally set to
1222  // the end of the training phase).
1223  if (fEntryCurrent <= entry && entry < fEntryNext) return kFALSE;
1224 
1225  // Triggered by the user, not the learning phase
1226  if (entry == -1)
1227  entry = 0;
1228 
1229  Bool_t resetBranchInfo = kFALSE;
1230  if (entry < fCurrentClusterStart || fNextClusterStart <= entry) {
1231  // We are moving on to another set of clusters.
1232  resetBranchInfo = kTRUE;
1233  if (showMore || gDebug > 6)
1234  Info("FillBuffer", "*** Will reset the branch information about baskets");
1235  } else if (showMore || gDebug > 6) {
1236  Info("FillBuffer", "*** Info we have on the set of baskets");
1237  PrintAllCacheInfo(fBranches);
1238  }
1239 
1240  fEntryCurrentMax = fEntryCurrent;
1241  TTree::TClusterIterator clusterIter = tree->GetClusterIterator(entry);
1242 
1243  auto entryCurrent = clusterIter();
1244  auto entryNext = clusterIter.GetNextEntry();
1245 
1246  if (entryNext < fEntryMin || fEntryMax < entryCurrent) {
1247  // There is no overlap between the cluster we found [entryCurrent, entryNext[
1248  // and the authorized range [fEntryMin, fEntryMax]
1249  // so we have nothing to do
1250  return kFALSE;
1251  }
1252 
1253  fEntryCurrent = entryCurrent;
1254  fEntryNext = entryNext;
1255 
1256 
1257  auto firstClusterEnd = fEntryNext;
1258  if (showMore || gDebug > 6)
1259  Info("FillBuffer", "Looking at cluster spanning from %lld to %lld", fEntryCurrent, fEntryNext);
1260 
1261  if (fEntryCurrent < fEntryMin) fEntryCurrent = fEntryMin;
1262  if (fEntryMax <= 0) fEntryMax = tree->GetEntries();
1263  if (fEntryNext > fEntryMax) fEntryNext = fEntryMax;
1264 
1265  if ( fEnablePrefetching ) {
1266  if ( entry == fEntryMax ) {
1267  // We are at the end, no need to do anything else
1268  return kFALSE;
1269  }
1270  }
1271 
1272  if (resetBranchInfo) {
1273  // We earlier thought we were onto the next set of clusters.
1274  if (fCurrentClusterStart != -1 || fNextClusterStart != -1) {
1275  if (!(fEntryCurrent < fCurrentClusterStart || fEntryCurrent >= fNextClusterStart)) {
1276  Error("FillBuffer", "Inconsistency: fCurrentClusterStart=%lld fEntryCurrent=%lld fNextClusterStart=%lld "
1277  "but fEntryCurrent should not be in between the two",
1278  fCurrentClusterStart, fEntryCurrent, fNextClusterStart);
1279  }
1280  }
1281 
1282  // Start the next cluster set.
1283  fCurrentClusterStart = fEntryCurrent;
1284  fNextClusterStart = firstClusterEnd;
1285  }
1286 
1287  // Check if the owner has a TEventList set. If yes we optimize for this
1288  // special case, reading only the baskets containing entries in the
1289  // list.
1290  TEventList *elist = fTree->GetEventList();
1291  Long64_t chainOffset = 0;
1292  if (elist) {
1293  if (fTree->IsA() ==TChain::Class()) {
1294  TChain *chain = (TChain*)fTree;
1295  Int_t t = chain->GetTreeNumber();
1296  chainOffset = chain->GetTreeOffset()[t];
1297  }
1298  }
1299 
1300  //clear cache buffer
1301  Int_t ntotCurrentBuf = 0;
1302  if (fEnablePrefetching){ //prefetching mode
1303  if (fFirstBuffer) {
1304  TFileCacheRead::Prefetch(0,0);
1305  ntotCurrentBuf = fNtot;
1306  }
1307  else {
1308  TFileCacheRead::SecondPrefetch(0,0);
1309  ntotCurrentBuf = fBNtot;
1310  }
1311  }
1312  else {
1313  TFileCacheRead::Prefetch(0,0);
1314  ntotCurrentBuf = fNtot;
1315  }
1316 
1317  //store baskets
1318  BasketRanges ranges((showMore || gDebug > 6) ? fNbranches : 0);
1319  BasketRanges reqRanges(fNbranches);
1320  BasketRanges memRanges((showMore || gDebug > 6) ? fNbranches : 0);
1321  Int_t clusterIterations = 0;
1322  Long64_t minEntry = fEntryCurrent;
1323  Int_t prevNtot;
1324  Long64_t maxReadEntry = minEntry; // If we are stopped before the end of the 2nd pass, this marker records where we need to start next time.
1325  Int_t nReadPrefRequest = 0;
1326  auto perfStats = GetTree()->GetPerfStats();
1327  do {
1328  prevNtot = ntotCurrentBuf;
1329  Long64_t lowestMaxEntry = fEntryMax; // The lowest maximum entry in the TTreeCache for each branch for each pass.
1330 
1331  struct collectionInfo {
1332  Int_t fClusterStart{-1}; // First basket belonging to the current cluster
1333  Int_t fCurrent{0}; // Currently visited basket
1334  Bool_t fLoadedOnce{kFALSE};
1335 
1336  void Rewind() { fCurrent = (fClusterStart >= 0) ? fClusterStart : 0; }
1337  };
1338  std::vector<collectionInfo> cursor(fNbranches);
1339  Bool_t reachedEnd = kFALSE;
1340  Bool_t skippedFirst = kFALSE;
1341  Bool_t oncePerBranch = kFALSE;
1342  Int_t nDistinctLoad = 0;
1343  Bool_t progress = kTRUE;
1344  enum ENarrow {
1345  kFull = 0,
1346  kNarrow = 1
1347  };
1348  enum EPass {
1349  kStart = 1,
1350  kRegular = 2,
1351  kRewind = 3
1352  };
1353 
1354  auto CollectBaskets = [this, elist, chainOffset, entry, clusterIterations, resetBranchInfo, perfStats,
1355  &cursor, &lowestMaxEntry, &maxReadEntry, &minEntry,
1356  &reachedEnd, &skippedFirst, &oncePerBranch, &nDistinctLoad, &progress,
1357  &ranges, &memRanges, &reqRanges,
1358  &ntotCurrentBuf, &nReadPrefRequest](EPass pass, ENarrow narrow, Long64_t maxCollectEntry) {
1359  // In the first pass we add one basket per branch around the requested entry,
1360  // then in the second pass we add the other baskets of the cluster.
1361  // This is to support the case where the cache is too small to hold a full cluster.
1362  Int_t nReachedEnd = 0;
1363  Int_t nSkipped = 0;
1364  auto oldnReadPrefRequest = nReadPrefRequest;
1365  std::vector<Int_t> potentialVetoes;
1366 
1367  if (showMore || gDebug > 7)
1368  Info("CollectBaskets", "Called with pass=%d narrow=%d maxCollectEntry=%lld", pass, narrow, maxCollectEntry);
1369 
1370  Bool_t filled = kFALSE;
1371  for (Int_t i = 0; i < fNbranches; ++i) {
1372  TBranch *b = (TBranch*)fBranches->UncheckedAt(i);
1373  if (b->GetDirectory()==0 || b->TestBit(TBranch::kDoNotProcess))
1374  continue;
1375  if (b->GetDirectory()->GetFile() != fFile)
1376  continue;
1377  potentialVetoes.clear();
1378  if (pass == kStart && !cursor[i].fLoadedOnce && resetBranchInfo) {
1379  // First check if we have any cluster that is currently in the
1380  // cache but was not used and would be reloaded in the next
1381  // cluster.
1382  b->fCacheInfo.GetUnused(potentialVetoes);
1383  if (showMore || gDebug > 7) {
1384  TString vetolist;
1385  for(auto v : potentialVetoes) {
1386  vetolist += v;
1387  vetolist.Append(' ');
1388  }
1389  if (!potentialVetoes.empty())
1390  Info("FillBuffer", "*** Potential Vetos for branch #%d: %s", i, vetolist.Data());
1391  }
1392  b->fCacheInfo.Reset();
1393  }
1394  Int_t nb = b->GetMaxBaskets();
1395  Int_t *lbaskets = b->GetBasketBytes();
1396  Long64_t *entries = b->GetBasketEntry();
1397  if (!lbaskets || !entries)
1398  continue;
1399  //we have found the branch. We now register all its baskets
1400  // from the requested offset to the basket below fEntryMax
1401  Int_t blistsize = b->GetListOfBaskets()->GetSize();
1402 
1403  auto maxOfBasket = [this, nb, entries](int j) {
1404  return ((j < (nb - 1)) ? (entries[j + 1] - 1) : fEntryMax - 1);
1405  };
1406 
1407  if (pass == kRewind)
1408  cursor[i].Rewind();
1409  for (auto &j = cursor[i].fCurrent; j < nb; j++) {
1410  // This basket has already been read, skip it
1411 
1412  if (j < blistsize && b->GetListOfBaskets()->UncheckedAt(j)) {
1413 
1414  if (showMore || gDebug > 6) {
1415  ranges.Update(i, entries[j], maxOfBasket(j));
1416  memRanges.Update(i, entries[j], maxOfBasket(j));
1417  }
1418  if (entries[j] <= entry && entry <= maxOfBasket(j)) {
1419  b->fCacheInfo.SetIsInCache(j);
1420  b->fCacheInfo.SetUsed(j);
1421  if (narrow) {
1422  // In narrow mode, we would select 'only' this basket,
1423  // so we are done for this round, let's 'consume' this
1424  // basket and go.
1425  ++nReachedEnd;
1426  ++j;
1427  break;
1428  }
1429  }
1430  continue;
1431  }
1432 
1433  // Important: do not try to read maxCollectEntry, otherwise we might jump to the next autoflush
1434  if (entries[j] >= maxCollectEntry) {
1435  ++nReachedEnd;
1436  break; // break out of the for each branch loop.
1437  }
1438 
1439  Long64_t pos = b->GetBasketSeek(j);
1440  Int_t len = lbaskets[j];
1441  if (pos <= 0 || len <= 0)
1442  continue;
1443  if (len > fBufferSizeMin) {
1444  // Do not cache a basket if it is bigger than the cache size!
1445  if ((showMore || gDebug > 7) &&
1446  (!(entries[j] < minEntry && (j < nb - 1 && entries[j + 1] <= minEntry))))
1447  Info("FillBuffer", "Skipping branch %s basket %d is too large for the cache: %d > %d",
1448  b->GetName(), j, len, fBufferSizeMin);
1449  continue;
1450  }
1451 
1452  if (nReadPrefRequest && entries[j] > (reqRanges.AllIncludedRange().fMax + 1)) {
1453  // There is a gap between this basket and the max of the 'lowest' already loaded basket.
1454  // If we are tight in memory, reading this basket may prevent reading the basket (for the other branches)
1455  // that covers this gap, forcing those baskets to be read uncached (because the cache won't be reloaded
1456  // until we use this basket).
1457  // e.g. we could end up with the cache containing
1458  // b1: [428, 514[ // 'this' basket and we can assume [321 to 428[ is already in memory
1459  // b2: [400, 424[
1460  // and when reading entry 425 we will read b2's basket uncached.
1461 
1462  if (showMore || gDebug > 8)
1463  Info("FillBuffer", "Skipping for now due to gap %d/%d with %lld > %lld", i, j, entries[j],
1464  (reqRanges.AllIncludedRange().fMax + 1));
1465  break; // Without consuming the basket.
1466  }
1467 
1468  if (entries[j] < minEntry && (j<nb-1 && entries[j+1] <= minEntry))
1469  continue;
1470 
1471  // We are within the range
1472  if (cursor[i].fClusterStart == -1)
1473  cursor[i].fClusterStart = j;
1474 
1475  if (elist) {
1476  Long64_t emax = fEntryMax;
1477  if (j<nb-1)
1478  emax = entries[j + 1] - 1;
1479  if (!elist->ContainsRange(entries[j]+chainOffset,emax+chainOffset))
1480  continue;
1481  }
1482 
1483  if (b->fCacheInfo.HasBeenUsed(j) || b->fCacheInfo.IsInCache(j) || b->fCacheInfo.IsVetoed(j)) {
1484  // We already cached and used this basket during this cluster range,
1485  // let's not redo it
1486  if (showMore || gDebug > 7)
1487  Info("FillBuffer", "Skipping basket to avoid redo: %d/%d veto: %d", i, j, b->fCacheInfo.IsVetoed(j));
1488  continue;
1489  }
1490 
1491  if (std::find(std::begin(potentialVetoes), std::end(potentialVetoes), j) != std::end(potentialVetoes)) {
1492  // This basket was in the previous cache/cluster and was not used,
1493  // let's not read it again. I.e. we bet that it will continue to not
1494  // be used. At worst it will be used and thus read by itself.
1495  // Usually in this situation the basket is large so the penalty for
1496  // (re)reading it uselessly is high and the penalty to read it by
1497  // itself is 'small' (i.e. size bigger than latency).
1498  b->fCacheInfo.Veto(j);
1499  if (showMore || gDebug > 7)
1500  Info("FillBuffer", "Veto-ing cluster %d [%lld,%lld[ in branch %s #%d", j, entries[j],
1501  maxOfBasket(j) + 1, b->GetName(), i);
1502  continue;
1503  }
1504 
1505  if (narrow) {
1506  if ((((entries[j] > entry)) || (j < nb - 1 && entries[j + 1] <= entry))) {
1507  // Keep only the basket that contains the entry
1508  if (j == cursor[i].fClusterStart && entry > entries[j])
1509  ++nSkipped;
1510  if (entries[j] > entry)
1511  break;
1512  else
1513  continue;
1514  }
1515  }
1516 
1517  if ((ntotCurrentBuf + len) > fBufferSizeMin) {
1518  // Humm ... we are going to go over the requested size.
1519  if (clusterIterations > 0 && cursor[i].fLoadedOnce) {
1520  // We already have a full cluster and now we would go over the requested
1521  // size, let's stop caching (and make sure we start next time from the
1522  // end of the previous cluster).
1523  if (showMore || gDebug > 5) {
1524  Info(
1525  "FillBuffer",
1526  "Breaking early because %d is greater than %d at cluster iteration %d will restart at %lld",
1527  (ntotCurrentBuf + len), fBufferSizeMin, clusterIterations, minEntry);
1528  }
1529  fEntryNext = minEntry;
1530  filled = kTRUE;
1531  break;
1532  } else {
1533  if (pass == kStart || !cursor[i].fLoadedOnce) {
1534  if ((ntotCurrentBuf + len) > 4 * fBufferSizeMin) {
1535  // Okay, so we have not even made one pass and we have already
1536  // accumulated requests for more than four times the requested cache size ...
1537  // So stop for now, and restart at the same point, hoping
1538  // that the basket will still be in memory and not be asked for again ...
1539  fEntryNext = maxReadEntry;
1540 
1541  if (showMore || gDebug > 5) {
1542  Info("FillBuffer", "Breaking early because %d is greater than 4*%d at cluster iteration "
1543  "%d pass %d will restart at %lld",
1544  (ntotCurrentBuf + len), fBufferSizeMin, clusterIterations, pass, fEntryNext);
1545  }
1546  filled = kTRUE;
1547  break;
1548  }
1549  } else {
1550  // We have made one pass through the branches and thus already
1551  // requested one basket per branch, let's stop prefetching
1552  // now.
1553  if ((ntotCurrentBuf + len) > 2 * fBufferSizeMin) {
1554  fEntryNext = maxReadEntry;
1555  if (showMore || gDebug > 5) {
1556  Info("FillBuffer", "Breaking early because %d is greater than 2*%d at cluster iteration "
1557  "%d pass %d will restart at %lld",
1558  (ntotCurrentBuf + len), fBufferSizeMin, clusterIterations, pass, fEntryNext);
1559  }
1560  filled = kTRUE;
1561  break;
1562  }
1563  }
1564  }
1565  }
1566 
1567  ++nReadPrefRequest;
1568 
1569  reqRanges.Update(i, j, entries, nb, fEntryMax);
1570  if (showMore || gDebug > 6)
1571  ranges.Update(i, j, entries, nb, fEntryMax);
1572 
1573  b->fCacheInfo.SetIsInCache(j);
1574 
1575  if (showMore || gDebug > 6)
1576  Info("FillBuffer", "*** Registering branch %d basket %d %s", i, j, b->GetName());
1577 
1578  if (!cursor[i].fLoadedOnce) {
1579  cursor[i].fLoadedOnce = kTRUE;
1580  ++nDistinctLoad;
1581  }
1582  if (R__unlikely(perfStats)) {
1583  perfStats->SetLoaded(i, j);
1584  }
1585 
1586  // Actually register the basket for loading from the file.
1587  if (fEnablePrefetching){
1588  if (fFirstBuffer) {
1589  TFileCacheRead::Prefetch(pos,len);
1590  ntotCurrentBuf = fNtot;
1591  }
1592  else {
1593  TFileCacheRead::SecondPrefetch(pos,len);
1594  ntotCurrentBuf = fBNtot;
1595  }
1596  }
1597  else {
1598  TFileCacheRead::Prefetch(pos,len);
1599  ntotCurrentBuf = fNtot;
1600  }
1601 
1602  if ( ( j < (nb-1) ) && entries[j+1] > maxReadEntry ) {
1603  // Info("FillBuffer","maxCollectEntry incremented from %lld to %lld", maxReadEntry, entries[j+1]);
1604  maxReadEntry = entries[j+1];
1605  }
1606  if (ntotCurrentBuf > 4 * fBufferSizeMin) {
1607  // Humm something wrong happened.
1608  Warning("FillBuffer", "There is more data in this cluster (starting at entry %lld to %lld, "
1609  "current=%lld) than usual ... with %d %.3f%% of the branches we already have "
1610  "%d bytes (instead of %d)",
1611  fEntryCurrent, fEntryNext, entries[j], i, (100.0 * i) / ((float)fNbranches), ntotCurrentBuf,
1612  fBufferSizeMin);
1613  }
1614  if (pass == kStart) {
1615  // In the first pass, we record one basket per branch and move on to the next branch.
1616  auto high = maxOfBasket(j);
1617  if (high < lowestMaxEntry)
1618  lowestMaxEntry = high;
1619  // 'Consume' the basket (i.e. avoid looking at it during a subsequent pass)
1620  ++j;
1621  break;
1622  } else if ((j + 1) == nb || entries[j + 1] >= maxReadEntry || entries[j + 1] >= lowestMaxEntry) {
1623  // In the other passes, load the baskets until we get to the maximum loaded so far.
1624  auto high = maxOfBasket(j);
1625  if (high < lowestMaxEntry)
1626  lowestMaxEntry = high;
1627  // 'Consume' the basket (i.e. avoid looking at it during a subsequent pass)
1628  ++j;
1629  break;
1630  }
1631  }
1632 
1633  if (cursor[i].fCurrent == nb) {
1634  ++nReachedEnd;
1635  }
1636 
1637  if (gDebug > 0)
1638  Info("CollectBaskets",
1639  "Entry: %lld, registering baskets branch %s, fEntryNext=%lld, fNseek=%d, ntotCurrentBuf=%d",
1640  minEntry, ((TBranch *)fBranches->UncheckedAt(i))->GetName(), fEntryNext, fNseek, ntotCurrentBuf);
1641  }
1642  reachedEnd = (nReachedEnd == fNbranches);
1643  skippedFirst = (nSkipped > 0);
1644  oncePerBranch = (nDistinctLoad == fNbranches);
1645  progress = nReadPrefRequest - oldnReadPrefRequest;
1646  return filled;
1647  };
1648 
1649  // First collect all the baskets containing the requested entry.
1650  bool full = kFALSE;
1651 
1652  full = CollectBaskets(kStart, kNarrow, fEntryNext);
1653 
1654  // Then fill out from all but the 'largest' branch to even out
1655  // the range across branches.
1656  while (!full && !reachedEnd && progress) { // used to be restricted to !oncePerBranch
1657  full = CollectBaskets(kStart, kFull, std::min(maxReadEntry, fEntryNext));
1658  }
1659 
1660  resetBranchInfo = kFALSE; // Make sure the 2nd cluster iteration does not erase the info.
1661 
1662  // Then fill out to the end of the cluster.
1663  if (!full && !fReverseRead) {
1664  do {
1665  full = CollectBaskets(kRegular, kFull, fEntryNext);
1666  } while (!full && !reachedEnd && progress);
1667  }
1668 
1669  // Then restart from the start of the cluster.
1670  if (!full && skippedFirst) {
1671  full = CollectBaskets(kRewind, kFull, fEntryNext);
1672  while (!full && !reachedEnd && progress) {
1673  full = CollectBaskets(kRegular, kFull, fEntryNext);
1674  }
1675  }
1676 
1677  clusterIterations++;
1678 
1679  minEntry = clusterIter.Next();
1680  if (fIsLearning) {
1681  fFillTimes++;
1682  }
1683 
1684  // Continue as long as we are still making progress (prevNtot < ntotCurrentBuf), the next entry range to
1685  // be looked at, which starts at 'minEntry', is not past the end of the requested range
1686  // (minEntry < fEntryMax),
1687  // and we guess that we are not going to go over the requested amount of memory by asking for another set
1688  // of entries (fBufferSizeMin > ((Long64_t)ntotCurrentBuf*(clusterIterations+1))/clusterIterations).
1689  // ntotCurrentBuf / clusterIterations is the average size accumulated so far at each loop,
1690  // and thus (ntotCurrentBuf / clusterIterations) * (clusterIterations+1) is a good guess at what the next
1691  // total size would be if we ran the loop one more time.
1692  // ntotCurrentBuf and clusterIterations are Int_t but can sometimes
1693  // be 'large' (i.e. 30 MB * 300 intervals) and can overflow the numerical limit of Int_t (i.e. become
1694  // artificially negative). To avoid this issue we promote ntotCurrentBuf to a long long
1695  // (64 bits rather than 32 bits).
1696  if (!((fBufferSizeMin > ((Long64_t)ntotCurrentBuf * (clusterIterations + 1)) / clusterIterations) &&
1697  (prevNtot < ntotCurrentBuf) && (minEntry < fEntryMax))) {
1698  if (showMore || gDebug > 6)
1699  Info("FillBuffer", "Breaking because %d <= %lld || (%d >= %d) || %lld >= %lld", fBufferSizeMin,
1700  ((Long64_t)ntotCurrentBuf * (clusterIterations + 1)) / clusterIterations, prevNtot, ntotCurrentBuf,
1701  minEntry, fEntryMax);
1702  break;
1703  }
1704 
1705  // For the reverse reading case
1706  if (!fIsLearning && fReverseRead) {
1707  if (clusterIterations >= fFillTimes)
1708  break;
1709  if (minEntry >= fEntryCurrentMax && fEntryCurrentMax > 0)
1710  break;
1711  }
1712  fEntryNext = clusterIter.GetNextEntry();
1713  if (fEntryNext > fEntryMax) fEntryNext = fEntryMax;
1714  fNextClusterStart = fEntryNext;
1715  } while (kTRUE);
1716 
1717  if (showMore || gDebug > 6) {
1718  Info("FillBuffer", "Mem ranges");
1719  memRanges.Print();
1720  Info("FillBuffer", "Combined ranges");
1721  ranges.Print();
1722  Info("FillBuffer", "Requested ranges");
1723  reqRanges.Print();
1724  PrintAllCacheInfo(fBranches);
1725  }
1726 
1727  if (nReadPrefRequest == 0) {
1728  // Nothing was added to the cache. This usually indicates that the baskets
1729  // containing the requested entry are either already in memory or are too large
1730  // on their own to fit in the cache.
1731  if (showMore || gDebug > 5) {
1732  Info("FillBuffer", "For entry %lld, nothing was added to the cache.", entry);
1733  }
1734  } else if (fEntryNext < firstClusterEnd && !reqRanges.Contains(entry)) {
1735  // Something went very wrong: even though we searched for the baskets
1736  // holding 'entry', we somehow ended up with a range of entries that does
1737  // not contain it. So we must have been unable to find or fit the needed basket.
1738  // Thus, even though we know the corresponding baskets won't be in the cache,
1739  // let's make it official that 'entry' is within the range of this TTreeCache's search.
1740 
1741  // Without this, the next read will be flagged as 'out-of-range' and then we start at
1742  // the exact same point as this FillBuffer execution resulting in both the requested
1743  // entry still not being part of the cache **and** the beginning of the cluster being
1744  // read **again**.
1745 
1746  if (showMore || gDebug > 5) {
1747  Error("FillBuffer", "Reset the next entry because the currently loaded range does not contain the requested "
1748  "entry: %lld. fEntryNext updated from %lld to %lld. %d",
1749  entry, fEntryNext, firstClusterEnd, nReadPrefRequest);
1750  reqRanges.Print();
1751  }
1752 
1753  fEntryNext = firstClusterEnd;
1754  } else {
1755  if (showMore || gDebug > 5) {
1756  Info("FillBuffer", "Completed adding %d baskets from %d branches, taking in memory %d out of %d",
1757  nReadPrefRequest, reqRanges.BranchesRegistered(), ntotCurrentBuf, fBufferSizeMin);
1758  }
1759  }
1760 
1761  fNReadPref += nReadPrefRequest;
1762  if (fEnablePrefetching) {
1763  if (fIsLearning) {
1764  fFirstBuffer = !fFirstBuffer;
1765  }
1766  if (!fIsLearning && fFirstTime){
1767  // First time we add autoFlush entries, after fFillTimes * autoFlush
1768  // only in reverse prefetching mode
1769  fFirstTime = kFALSE;
1770  }
1771  }
1772  fIsLearning = kFALSE;
1773  return kTRUE;
1774 }
1775 
1776 ////////////////////////////////////////////////////////////////////////////////
1777 /// Return the desired prefill type from the environment or resource variable
1778 /// - 0 - No prefill
1779 /// - 1 - All branches
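///
/// As a rough illustration of the two configuration channels read below (the
/// value 0 is only an example):
/// ~~~ {.cpp}
/// // Resource variable, e.g. set programmatically through gEnv:
/// gEnv->SetValue("TTreeCache.Prefill", 0); // 0 = no prefill
/// ~~~
/// or set the environment variable before starting the process, e.g.
/// `export ROOT_TTREECACHE_PREFILL=1` in the shell.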
1780 
1781 TTreeCache::EPrefillType TTreeCache::GetConfiguredPrefillType() const
1782 {
1783  const char *stcp;
1784  Int_t s = 0;
1785 
1786  if (!(stcp = gSystem->Getenv("ROOT_TTREECACHE_PREFILL")) || !*stcp) {
1787  s = gEnv->GetValue("TTreeCache.Prefill", 1);
1788  } else {
1789  s = TString(stcp).Atoi();
1790  }
1791 
1792  return static_cast<TTreeCache::EPrefillType>(s);
1793 }
1794 
1795 ////////////////////////////////////////////////////////////////////////////////
1796 /// Give the total efficiency of the primary cache... defined as the ratio
1797 /// of blocks found in the cache vs. the number of blocks prefetched
1798 /// (it could be more than 1 if we read the same block from the cache more
1799 /// than once).
1800 ///
1801 /// Note: This should be used at the end of the processing or we will
1802 /// get incomplete stats.
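///
/// A minimal usage sketch; the file name "data.root" and tree name "T" are
/// placeholders:
/// ~~~ {.cpp}
/// TFile f("data.root");
/// auto t = f.Get<TTree>("T");
/// t->SetCacheSize(30000000);
/// // ... event loop reading entries ...
/// auto cache = dynamic_cast<TTreeCache*>(f.GetCacheRead(t));
/// if (cache)
///    printf("cache efficiency: %f\n", cache->GetEfficiency());
/// ~~~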
1803 
1804 Double_t TTreeCache::GetEfficiency() const
1805 {
1806  if ( !fNReadPref )
1807  return 0;
1808 
1809  return ((Double_t)fNReadOk / (Double_t)fNReadPref);
1810 }
1811 
1812 ////////////////////////////////////////////////////////////////////////////////
1813 /// The total efficiency of the 'miss cache' - defined as the ratio
1814 /// of blocks found in the cache versus the number of blocks prefetched
1815 
1816 Double_t TTreeCache::GetMissEfficiency() const
1817 {
1818  if (!fNMissReadPref) {
1819  return 0;
1820  }
1821  return static_cast<double>(fNMissReadOk) / static_cast<double>(fNMissReadPref);
1822 }
1823 
1824 ////////////////////////////////////////////////////////////////////////////////
1825 /// This will indicate a sort of relative efficiency... a ratio of the
1826 /// reads found in the cache to the number of reads so far
1827 
1828 Double_t TTreeCache::GetEfficiencyRel() const
1829 {
1830  if ( !fNReadOk && !fNReadMiss )
1831  return 0;
1832 
1833  return ((Double_t)fNReadOk / (Double_t)(fNReadOk + fNReadMiss));
1834 }
1835 
1836 ////////////////////////////////////////////////////////////////////////////////
1837 /// Relative efficiency of the 'miss cache' - ratio of the reads found in cache
1838 /// to the number of reads so far.
1839 
1840 Double_t TTreeCache::GetMissEfficiencyRel() const
1841 {
1842  if (!fNMissReadOk && !fNMissReadMiss) {
1843  return 0;
1844  }
1845 
1846  return static_cast<double>(fNMissReadOk) / static_cast<double>(fNMissReadOk + fNMissReadMiss);
1847 }
1848 
1849 ////////////////////////////////////////////////////////////////////////////////
1850 /// Static function returning the number of entries used to train the cache;
1851 /// see SetLearnEntries.
1852 
1853 Int_t TTreeCache::GetLearnEntries()
1854 {
1855  return fgLearnEntries;
1856 }
1857 
1858 ////////////////////////////////////////////////////////////////////////////////
1859 /// Print cache statistics. Like:
1860 ///
1861 /// ~~~ {.cpp}
1862 /// ******TreeCache statistics for file: cms2.root ******
1863 /// Number of branches in the cache ...: 1093
1864 /// Cache Efficiency ..................: 0.997372
1865 /// Cache Efficiency Rel...............: 1.000000
1866 /// Learn entries......................: 100
1867 /// Reading............................: 72761843 bytes in 7 transactions
1868 /// Readahead..........................: 256000 bytes with overhead = 0 bytes
1869 /// Average transaction................: 10394.549000 Kbytes
1870 /// Number of blocks in current cache..: 210, total size: 6280352
1871 /// ~~~
1872 ///
1873 /// - if option = "a" the list of blocks in the cache is printed
1874 /// see also class TTreePerfStats.
1875 /// - if option contains 'cachedbranches', the list of branches being
1876 /// cached is printed.
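///
/// The statistics are typically printed through the owning TTree (a sketch;
/// `T` stands for a pointer to the TTree being read):
/// ~~~ {.cpp}
/// T->PrintCacheStats();                 // forwards to the cache's Print()
/// T->PrintCacheStats("cachedbranches"); // additionally list the cached branches
/// ~~~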
1877 
1878 void TTreeCache::Print(Option_t *option) const
1879 {
1880  TString opt = option;
1881  opt.ToLower();
1882  printf("******TreeCache statistics for tree: %s in file: %s ******\n",fTree ? fTree->GetName() : "no tree set",fFile ? fFile->GetName() : "no file set");
1883  if (fNbranches <= 0) return;
1884  printf("Number of branches in the cache ...: %d\n",fNbranches);
1885  printf("Cache Efficiency ..................: %f\n",GetEfficiency());
1886  printf("Cache Efficiency Rel...............: %f\n",GetEfficiencyRel());
1887  printf("Secondary Efficiency ..............: %f\n", GetMissEfficiency());
1888  printf("Secondary Efficiency Rel ..........: %f\n", GetMissEfficiencyRel());
1889  printf("Learn entries......................: %d\n",TTreeCache::GetLearnEntries());
1890  if ( opt.Contains("cachedbranches") ) {
1891  opt.ReplaceAll("cachedbranches","");
1892  printf("Cached branches....................:\n");
1893  const TObjArray *cachedBranches = this->GetCachedBranches();
1894  Int_t nbranches = cachedBranches->GetEntriesFast();
1895  for (Int_t i = 0; i < nbranches; ++i) {
1896  TBranch* branch = (TBranch*) cachedBranches->UncheckedAt(i);
1897  printf("Branch name........................: %s\n",branch->GetName());
1898  }
1899  }
1900  TFileCacheRead::Print(opt);
1901 }
1902 
1903 ////////////////////////////////////////////////////////////////////////////////
1904 /// Old method ReadBuffer before the addition of the prefetch mechanism.
1905 
1906 Int_t TTreeCache::ReadBufferNormal(char *buf, Long64_t pos, Int_t len){
1907  //Is request already in the cache?
1908  if (TFileCacheRead::ReadBuffer(buf,pos,len) == 1){
1909  fNReadOk++;
1910  return 1;
1911  }
1912 
1913  static const auto recordMiss = [](TVirtualPerfStats *perfStats, TObjArray *branches, Bool_t bufferFilled,
1914  Long64_t basketpos) {
1915  if (gDebug > 6)
1916  ::Info("TTreeCache::ReadBufferNormal", "Cache miss after an %s FillBuffer: pos=%lld",
1917  bufferFilled ? "active" : "inactive", basketpos);
1918  for (Int_t i = 0; i < branches->GetEntries(); ++i) {
1919  TBranch *b = (TBranch *)branches->UncheckedAt(i);
1920  Int_t blistsize = b->GetListOfBaskets()->GetSize();
1921  for (Int_t j = 0; j < blistsize; ++j) {
1922  if (basketpos == b->GetBasketSeek(j)) {
1923  if (gDebug > 6)
1924  ::Info("TTreeCache::ReadBufferNormal", " Missing basket: %d for %s", j, b->GetName());
1925  perfStats->SetMissed(i, j);
1926  }
1927  }
1928  }
1929  };
1930 
1931  //not found in cache. Do we need to fill the cache?
1932  Bool_t bufferFilled = FillBuffer();
1933  if (bufferFilled) {
1934  Int_t res = TFileCacheRead::ReadBuffer(buf,pos,len);
1935 
1936  if (res == 1)
1937  fNReadOk++;
1938  else if (res == 0) {
1939  fNReadMiss++;
1940  auto perfStats = GetTree()->GetPerfStats();
1941  if (perfStats)
1942  recordMiss(perfStats, fBranches, bufferFilled, pos);
1943  }
1944 
1945  return res;
1946  }
1947 
1948  if (CheckMissCache(buf, pos, len)) {
1949  return 1;
1950  }
1951 
1952  fNReadMiss++;
1953  auto perfStats = GetTree()->GetPerfStats();
1954  if (perfStats)
1955  recordMiss(perfStats, fBranches, bufferFilled, pos);
1956 
1957  return 0;
1958 }
1959 
1960 ////////////////////////////////////////////////////////////////////////////////
1961 /// Used to read a chunk from a block previously fetched. It will call FillBuffer
1962 /// even if the cache lookup succeeds, because it will try to prefetch the next block
1963 /// as soon as we start reading from the current block.
1964 
1965 Int_t TTreeCache::ReadBufferPrefetch(char *buf, Long64_t pos, Int_t len)
1966 {
1967  if (TFileCacheRead::ReadBuffer(buf, pos, len) == 1){
1968  //call FillBuffer to prefetch next block if necessary
1969  //(if we are currently reading from the last block available)
1970  FillBuffer();
1971  fNReadOk++;
1972  return 1;
1973  }
1974 
1975  //keep on prefetching until request is satisfied
1976  // try to prefetch a couple of times and if request is still not satisfied then
1977  // fall back to normal reading without prefetching for the current request
1978  Int_t counter = 0;
1979  while (1) {
1980  if(TFileCacheRead::ReadBuffer(buf, pos, len)) {
1981  break;
1982  }
1983  FillBuffer();
1984  fNReadMiss++;
1985  counter++;
1986  if (counter>1) {
1987  return 0;
1988  }
1989  }
1990 
1991  fNReadOk++;
1992  return 1;
1993 }
1994 
1995 ////////////////////////////////////////////////////////////////////////////////
1996 /// Read buffer at position pos if the request is in the list of
1997 /// prefetched blocks read from fBuffer.
1998 /// Otherwise try to fill the cache from the list of selected branches,
1999 /// and recheck if pos is now in the list.
2000 /// Returns:
2001 /// - -1 in case of read failure,
2002 /// - 0 in case not in cache,
2003 /// - 1 in case read from cache.
2004 /// This function overloads TFileCacheRead::ReadBuffer.
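///
/// Callers are expected to handle the three return values roughly as in this
/// illustrative sketch:
/// ~~~ {.cpp}
/// Int_t st = cache->ReadBuffer(buf, pos, len);
/// if (st == 1)      { /* buffer served from the cache */ }
/// else if (st == 0) { /* not cached: read directly from the file */ }
/// else              { /* st == -1: read failure */ }
/// ~~~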
2005 
2006 Int_t TTreeCache::ReadBuffer(char *buf, Long64_t pos, Int_t len)
2007 {
2008  if (!fEnabled) return 0;
2009 
2010  if (fEnablePrefetching)
2011  return TTreeCache::ReadBufferPrefetch(buf, pos, len);
2012  else
2013  return TTreeCache::ReadBufferNormal(buf, pos, len);
2014 }
2015 
2016 ////////////////////////////////////////////////////////////////////////////////
2017 /// This will simply clear the cache
2018 
2019 void TTreeCache::ResetCache()
2020 {
2021  for (Int_t i = 0; i < fNbranches; ++i) {
2022  TBranch *b = (TBranch*)fBranches->UncheckedAt(i);
2023  if (b->GetDirectory()==0 || b->TestBit(TBranch::kDoNotProcess))
2024  continue;
2025  if (b->GetDirectory()->GetFile() != fFile)
2026  continue;
2027  b->fCacheInfo.Reset();
2028  }
2029  fEntryCurrent = -1;
2030  fEntryNext = -1;
2031  fCurrentClusterStart = -1;
2032  fNextClusterStart = -1;
2033 
2034  TFileCacheRead::Prefetch(0,0);
2035 
2036  if (fEnablePrefetching) {
2037  fFirstTime = kTRUE;
2038  TFileCacheRead::SecondPrefetch(0, 0);
2039  }
2040 }
2041 
2042 ////////////////////////////////////////////////////////////////////////////////
2043 /// Change the underlying buffer size of the cache.
2044 /// If the change of size means some cache content is lost, or if the buffer
2045 /// is now larger, set up for a cache refill the next time there is a read.
2046 /// Returns:
2047 /// - 0 if the buffer content is still available
2048 /// - 1 if some or all of the buffer content has been made unavailable
2049 /// - -1 on error
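///
/// In user code the resize usually goes through the owning TTree; the 100 MB
/// value below is only an example:
/// ~~~ {.cpp}
/// tree->SetCacheSize(100 * 1024 * 1024); // eventually resizes the TTreeCache buffer
/// ~~~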
2050 
2051 Int_t TTreeCache::SetBufferSize(Int_t buffersize)
2052 {
2053  Int_t prevsize = GetBufferSize();
2054  Int_t res = TFileCacheRead::SetBufferSize(buffersize);
2055  if (res < 0) {
2056  return res;
2057  }
2058 
2059  if (res == 0 && buffersize <= prevsize) {
2060  return res;
2061  }
2062 
2063  // if content was removed from the buffer, or the buffer was enlarged then
2064  // empty the prefetch lists and prime to fill the cache again
2065 
2066  TFileCacheRead::Prefetch(0,0);
2067  if (fEnablePrefetching) {
2068  TFileCacheRead::SecondPrefetch(0, 0);
2069  }
2070 
2071  fEntryCurrent = -1;
2072  if (!fIsLearning) {
2073  fEntryNext = -1;
2074  }
2075 
2076  return 1;
2077 }
2078 
2079 ////////////////////////////////////////////////////////////////////////////////
2080 /// Set the minimum and maximum entry number to be processed.
2081 /// This information helps to optimize the number of baskets to read
2082 /// when prefetching the branch buffers.
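///
/// For example, to limit prefetching to the first 500000 entries (the numbers
/// are illustrative):
/// ~~~ {.cpp}
/// tree->SetCacheEntryRange(0, 500000); // forwarded to TTreeCache::SetEntryRange
/// ~~~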
2083 
2084 void TTreeCache::SetEntryRange(Long64_t emin, Long64_t emax)
2085 {
2086  // This is called by TTreePlayer::Process in an automatic way...
2087  // don't restart it if the user has specified the branches.
2088  Bool_t needLearningStart = (fEntryMin != emin) && fIsLearning && !fIsManual;
2089 
2090  fEntryMin = emin;
2091  fEntryMax = emax;
2092  fEntryNext = fEntryMin + fgLearnEntries * (fIsLearning && !fIsManual);
2093  if (gDebug > 0)
2094  Info("SetEntryRange", "fEntryMin=%lld, fEntryMax=%lld, fEntryNext=%lld",
2095  fEntryMin, fEntryMax, fEntryNext);
2096 
2097  if (needLearningStart) {
2098  // Restart learning
2099  StartLearningPhase();
2100  }
2101 }
2102 
2103 ////////////////////////////////////////////////////////////////////////////////
2104 /// Overload to make sure that the object-specific link between this cache and
2105 /// the previous file is removed before the cache is attached to the new file.
2105 
2106 void TTreeCache::SetFile(TFile *file, TFile::ECacheAction action)
2107 {
2108  // The infinite recursion is 'broken' by the fact that
2109  // TFile::SetCacheRead removes the entry from fCacheReadMap _before_
2110  // calling SetFile (and also by setting fFile to zero before the call).
2111  if (fFile) {
2112  TFile *prevFile = fFile;
2113  fFile = 0;
2114  prevFile->SetCacheRead(0, fTree, action);
2115  }
2116  TFileCacheRead::SetFile(file, action);
2117 }
2118 
2119 ////////////////////////////////////////////////////////////////////////////////
2120 /// Static function to set the number of entries to be used in learning mode
2121 /// The default value for n is 100. n must be >= 1.
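///
/// Example (the value is illustrative):
/// ~~~ {.cpp}
/// TTreeCache::SetLearnEntries(500); // train the cache on the first 500 entries read
/// ~~~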
2122 
2123 void TTreeCache::SetLearnEntries(Int_t n)
2124 {
2125  if (n < 1) n = 1;
2126  fgLearnEntries = n;
2127 }
2128 
2129 ////////////////////////////////////////////////////////////////////////////////
2130 /// Set whether the learning period is started with a prefilling of the
2131 /// cache and which type of prefilling is used.
2132 /// The two values currently supported are:
2133 /// - TTreeCache::kNoPrefill disables the prefilling
2134 /// - TTreeCache::kAllBranches fills the cache with baskets from all branches.
2135 /// The default prefilling behavior can be controlled by setting
2136 /// TTreeCache.Prefill or the environment variable ROOT_TTREECACHE_PREFILL.
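///
/// For example, assuming `cache` points to this TTreeCache:
/// ~~~ {.cpp}
/// cache->SetLearnPrefill(TTreeCache::kAllBranches);
/// ~~~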
2137 
2138 void TTreeCache::SetLearnPrefill(TTreeCache::EPrefillType type /* = kNoPrefill */)
2139 {
2140  fPrefillType = type;
2141 }
2142 
2143 ////////////////////////////////////////////////////////////////////////////////
2144 /// The name should be enough to explain the method.
2145 /// The only additional comment is that the cache is cleaned before
2146 /// the new learning phase.
2147 
2148 void TTreeCache::StartLearningPhase()
2149 {
2150  fIsLearning = kTRUE;
2151  fIsManual = kFALSE;
2152  fNbranches = 0;
2153  if (fBrNames) fBrNames->Delete();
2154  fIsTransferred = kFALSE;
2155  fEntryCurrent = -1;
2156 }
2157 
2158 ////////////////////////////////////////////////////////////////////////////////
2159 /// This is the counterpart of StartLearningPhase() and can be used to stop
2160 /// the learning phase. It's useful when the user knows exactly what branches
2161 /// they are going to use.
2162 /// For the moment it's just a call to FillBuffer() since that method
2163 /// will create the buffer lists from the specified branches.
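///
/// A common pattern when the branches are known in advance, using the TTree
/// convenience wrappers (the branch name "px" is a placeholder):
/// ~~~ {.cpp}
/// tree->SetCacheSize(30000000);
/// tree->AddBranchToCache("px", kTRUE);
/// tree->StopCacheLearningPhase(); // ends up calling this method
/// ~~~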
2164 
2165 void TTreeCache::StopLearningPhase()
2166 {
2167  if (fIsLearning) {
2168  // This will force FillBuffer to read the buffers.
2169  fEntryNext = -1;
2170  fIsLearning = kFALSE;
2171  }
2172  fIsManual = kTRUE;
2173 
2174  auto perfStats = GetTree()->GetPerfStats();
2175  if (perfStats)
2176  perfStats->UpdateBranchIndices(fBranches);
2177 
2178  //fill the buffers only once during learning
2179  if (fEnablePrefetching && !fOneTime) {
2180  fIsLearning = kTRUE;
2181  FillBuffer();
2182  fOneTime = kTRUE;
2183  }
2184 }
2185 
2186 ////////////////////////////////////////////////////////////////////////////////
2187 /// Update pointer to current Tree and recompute pointers to the branches in the cache.
2188 
2189 void TTreeCache::UpdateBranches(TTree *tree)
2190 {
2191 
2192  fTree = tree;
2193 
2194  fEntryMin = 0;
2195  fEntryMax = fTree->GetEntries();
2196 
2197  fEntryCurrent = -1;
2198 
2199  if (fBrNames->GetEntries() == 0 && fIsLearning) {
2200  // We still need to learn.
2201  fEntryNext = fEntryMin + fgLearnEntries;
2202  } else {
2203  // We learnt from a previous file.
2204  fIsLearning = kFALSE;
2205  fEntryNext = -1;
2206  }
2207  fNbranches = 0;
2208 
2209  TIter next(fBrNames);
2210  TObjString *os;
2211  while ((os = (TObjString*)next())) {
2212  TBranch *b = fTree->GetBranch(os->GetName());
2213  if (!b) {
2214  continue;
2215  }
2216  fBranches->AddAt(b, fNbranches);
2217  fNbranches++;
2218  }
2219 
2220  auto perfStats = GetTree()->GetPerfStats();
2221  if (perfStats)
2222  perfStats->UpdateBranchIndices(fBranches);
2223 }
2224 
2225 ////////////////////////////////////////////////////////////////////////////////
2226 /// Perform an initial prefetch, attempting to read as many of the learning
2227 /// phase baskets for all branches at once as possible.
2228 
2229 void TTreeCache::LearnPrefill()
2230 {
2231  // This is meant for the learning phase
2232  if (!fIsLearning) return;
2233 
2234  // This should be called before reading entries, otherwise we'll
2235  // always exit here, since TBranch adds itself before reading
2236  if (fNbranches > 0) return;
2237 
2238  // Is the LearnPrefill enabled (using an Int_t here to allow for future
2239  // extension to alternative Prefilling).
2240  if (fPrefillType == kNoPrefill) return;
2241 
2242  Long64_t entry = fTree ? fTree->GetReadEntry() : 0;
2243 
2244  // Return early if we are out of the requested range.
2245  if (entry < fEntryMin || entry > fEntryMax) return;
2246 
2247  fLearnPrefilling = kTRUE;
2248 
2249 
2250  // Force only the learn entries to be cached by temporarily setting min/max
2251  // to the learning phase entry range
2252  // But save all the old values, so we can restore everything to how it was
2253  Long64_t eminOld = fEntryMin;
2254  Long64_t emaxOld = fEntryMax;
2255  Long64_t ecurrentOld = fEntryCurrent;
2256  Long64_t enextOld = fEntryNext;
2257  auto currentClusterStartOld = fCurrentClusterStart;
2258  auto nextClusterStartOld = fNextClusterStart;
2259 
2260  fEntryMin = std::max(fEntryMin, fEntryCurrent);
2261  fEntryMax = std::min(fEntryMax, fEntryNext);
2262 
2263  // We check earlier that we are within the authorized range, but
2264  // we might still be out of the (default) learning range and since
2265  // this is called before any branch is added to the cache, this means
2266  // that the user's first GetEntry is this one which is outside of the
2267  // learning range ... so let's do something sensible-ish.
2268  // Note: we probably should also fix the learning range but we may
2269  // or may not have enough information to know if we can move it
2270  // (for example fEntryMin (eminOld right now) might be the default or user provided)
2271  if (entry < fEntryMin) fEntryMin = entry;
2272  if (entry > fEntryMax) fEntryMax = entry;
2273 
2274  // Add all branches to be cached. This also sets fIsManual, stops learning,
2275  // and makes fEntryNext = -1 (which forces a cache fill, which is good)
2276  AddBranch("*");
2277  fIsManual = kFALSE; // AddBranch sets fIsManual, so we reset it
2278 
2279  // Now, fill the buffer with the learning phase entry range
2280  FillBuffer();
2281 
2282  // Leave everything the way we found it
2283  fIsLearning = kTRUE;
2284  DropBranch("*"); // This doesn't work unless we're already learning
2285 
2286  // Restore entry values
2287  fEntryMin = eminOld;
2288  fEntryMax = emaxOld;
2289  fEntryCurrent = ecurrentOld;
2290  fEntryNext = enextOld;
2291  fCurrentClusterStart = currentClusterStartOld;
2292  fNextClusterStart = nextClusterStartOld;
2293 
2294  fLearnPrefilling = kFALSE;
2295 }