ROOT 6.30.04 Reference Guide
df019_Cache.py
## \file
## \ingroup tutorial_dataframe
## \notebook -draw
## This tutorial shows how the content of a data frame can be cached in memory
## in the form of a data frame. The content of the columns is stored in
## contiguous slabs of memory and is "ready to use", i.e. no ROOT I/O operation
## is performed.
##
## Creating a cached data frame that stores all of its content deserialised and uncompressed
## in memory is particularly useful when dealing with datasets of a moderate size
## (small enough to fit in RAM) over which several explorative loops need to be
## performed as fast as possible. In addition, caching can be useful when no file
## on disk needs to be created as a side effect of checkpointing part of the analysis.
##
## All steps in the caching are lazy, i.e. the cached data frame is actually filled
## only when the event loop is triggered on it.
##
## \macro_code
## \macro_image
##
## \date June 2018
## \author Danilo Piparo

import os

import ROOT

RDataFrame = ROOT.ROOT.RDataFrame

# We create a data frame on top of the hsimple example
hsimplePath = os.path.join(str(ROOT.gROOT.GetTutorialDir().Data()), "hsimple.root")
df = RDataFrame("ntuple", hsimplePath)

# We apply a simple cut and define a new column
df_cut = df.Filter("py > 0.f")\
           .Define("px_plus_py", "px + py")

# We cache the content of the dataset. Nothing has happened yet: the work to be done
# has only been described.
df_cached = df_cut.Cache()

# We book a histogram on the cached data frame. This is still lazy.
h = df_cached.Histo1D("px_plus_py")

# Now the event loop on the cached data frame is triggered. This in turn triggers
# the event loop on the `df` data frame, which fills the cache.
h.Draw()
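
# Illustrative sketch, not part of the original tutorial: the laziness described in
# the header comment can be mimicked in plain Python. The LazyCache class, toy_rows
# and the sample values below are hypothetical. This is NOT how RDataFrame implements
# Cache(); it only shows the idea that booking a cache does no work, and that the
# first request for a result traverses the source once and fills in-memory columns
# that later requests reuse.


class LazyCache:
    """Toy stand-in for a cached data frame: columns are filled on first use."""

    def __init__(self, make_rows):
        self._make_rows = make_rows  # callable returning an iterable of dict rows
        self._columns = None         # nothing is read until a result is requested
        self.loops_run = 0           # how many times the source has been traversed

    def _fill(self):
        # Run the "event loop" over the source only if the cache is still empty.
        if self._columns is None:
            self.loops_run += 1
            columns = {}
            for row in self._make_rows():
                for name, value in row.items():
                    columns.setdefault(name, []).append(value)
            self._columns = columns
        return self._columns

    def column(self, name):
        return self._fill()[name]


def toy_rows():
    # Made-up (px, py) pairs standing in for the "ntuple" dataset.
    data = [(0.5, 1.0), (1.5, -2.0), (-0.25, 0.75)]
    for px, py in data:
        if py > 0:  # same cut as the Filter above
            yield {"px": px, "py": py, "px_plus_py": px + py}


toy_cache = LazyCache(toy_rows)                 # booking the cache: no loop has run
loops_before = toy_cache.loops_run
toy_sum = sum(toy_cache.column("px_plus_py"))   # first use triggers the single loop
toy_max = max(toy_cache.column("px_plus_py"))   # served from memory, no second loop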