ROOT 6.30.04 Reference Guide
df019_Cache.C
/// \file
/// \ingroup tutorial_dataframe
/// \notebook -draw
/// This tutorial shows how the content of a data frame can be cached in memory
/// in the form of a data frame. The content of the columns is stored in
/// contiguous slabs of memory and is "ready to use", i.e. no ROOT IO operation
/// is performed when reading it.
///
/// Creating a cached data frame storing all of its content deserialised and uncompressed
/// in memory is particularly useful when dealing with datasets of a moderate size
/// (small enough to fit in RAM) over which several explorative loops need to be
/// performed as fast as possible. In addition, caching can be useful when no file
/// on disk needs to be created as a side effect of checkpointing part of the analysis.
///
/// All steps in the caching are lazy, i.e. the cached data frame is actually filled
/// only when the event loop is triggered on it.
///
/// \macro_code
/// \macro_image
///
/// \date June 2018
/// \author Danilo Piparo

void df019_Cache()
{
   // We create a data frame on top of the hsimple example
   auto hsimplePath = gROOT->GetTutorialDir();
   hsimplePath += "/hsimple.root";
   ROOT::RDataFrame df("ntuple", hsimplePath.Data());

   // We apply a simple cut and define a new column
   auto df_cut = df.Filter([](float py) { return py > 0.f; }, {"py"})
                    .Define("px_plus_py", [](float px, float py) { return px + py; }, {"px", "py"});

   // We cache the content of the dataset. Nothing has happened yet: only the work to
   // accomplish has been described. As for `Snapshot`, the types and columns can be written
   // out explicitly or left for the jitting to handle (`df_cached` is intentionally unused:
   // it shows how to create a *cached* data frame specifying column types explicitly):
   auto df_cached = df_cut.Cache<float, float>({"px_plus_py", "py"});
   auto df_cached_implicit = df_cut.Cache();
   auto h = df_cached_implicit.Histo1D<float>("px_plus_py");

   // Now the event loop on the cached dataset is triggered. This in turn triggers
   // the loop on the original `df` data frame, which fills the cache.
   h->DrawCopy();
}
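
The lazy behaviour described above can be illustrated with a toy model in standard C++ that does not use ROOT at all. This is only a sketch of the idea and not how `RDataFrame::Cache` is implemented: the class `LazyColumnCache` and all of its members are hypothetical names introduced for illustration. It applies the same cut (`py > 0`) and the same derived quantity (`px + py`) as the macro, stores the result in one contiguous `std::vector<float>`, and traverses the source rows only when the values are first requested.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy type, not part of ROOT: caches a filtered, derived column
// in one contiguous std::vector<float>, filling it lazily on first access.
class LazyColumnCache {
public:
   // `source` stands in for the upstream dataset: each row is a (px, py) pair.
   explicit LazyColumnCache(std::vector<std::pair<float, float>> source)
      : fSource(std::move(source))
   {
   }

   // Returns the cached px + py values for the rows with py > 0.
   // The source rows are looped over only on the first call.
   const std::vector<float> &Values()
   {
      if (!fFilled) {
         ++fSourceLoops;
         for (const auto &row : fSource) {
            if (row.second > 0.f)
               fCache.push_back(row.first + row.second);
         }
         fFilled = true;
      }
      return fCache;
   }

   // Number of times the source was traversed (0 until Values() is called).
   std::size_t SourceLoops() const { return fSourceLoops; }

private:
   std::vector<std::pair<float, float>> fSource; // upstream rows
   std::vector<float> fCache;                    // contiguous cached column
   bool fFilled = false;
   std::size_t fSourceLoops = 0;
};
```

Constructing a `LazyColumnCache` does no work, which mirrors the call to `Cache()` in the macro. The first call to `Values()` plays the role of the event loop, and later calls read the cached values without touching the source again.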