Example 1: Streamer method is overridden#
Goal: handle a class whose Streamer overrides the default layout by inserting
an extra mask. We will inspect the bytes, write a Python reader, wrap it with a
factory, register it, then read with Uproot. Finally, we show how to port the
reader to C++ for production speed.
See also
A full example can be found in the example repository.
We define a demo class TOverrideStreamer whose Streamer method is overridden to show how to read such classes using uproot-custom.
There are 2 member variables in TOverrideStreamer: m_int and m_double:
TOverrideStreamer.hh##pragma once
#include <TObject.h>
class TOverrideStreamer : public TObject {
public:
TOverrideStreamer( int val = 0 )
: TObject(), m_int( val ), m_double( (double)val * 3.14 ) {}
private:
int m_int{ 0 };
double m_double{ 0.0 };
ClassDef( TOverrideStreamer, 1 );
};
We add a mask in the Streamer method to demonstrate how to handle special logic in overridden Streamer methods:
TOverrideStreamer.cc##include <TBuffer.h>
#include <TObject.h>
#include <iostream>
#include "TOverrideStreamer.hh"
ClassImp( TOverrideStreamer );
void TOverrideStreamer::Streamer( TBuffer& b ) {
if ( b.IsReading() )
{
TObject::Streamer( b ); // Call base class Streamer
b >> m_int;
unsigned int mask;
b >> mask; // We additionally read a mask
if ( mask != 0x12345678 )
{
std::cerr << "Error: Unexpected mask value: " << std::hex << mask << std::dec
<< std::endl;
return;
}
b >> m_double;
}
else
{
TObject::Streamer( b ); // Call base class Streamer
b << m_int;
unsigned int mask = 0x12345678; // Example mask
b << mask; // Write the mask
b << m_double;
}
}
Step 1: Check the binary data#
Before implementing the Reader and Factory, we should check the binary data of TOverrideStreamer to understand how the data is stored in the ROOT file:
>>> import uproot
>>> import uproot_custom as uc
>>>
>>> br = uproot.open("demo_data.root")["my_tree:override_streamer"]
>>> bin_arr = br.array(interpretation=uc.AsBinary())
>>> evt0 = bin_arr[0].to_numpy()
>>> evt0
array([ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 18, 52, 86, 120, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=uint8)
Referring to the Streamer method above, we can see that the binary data contains:
TObjectcontent (10 bytes,0, 1, 0, 0, 0, 0, 0, 0, 0, 0)m_int(4 bytes,0, 0, 0, 0, which is 0)mask (4 bytes,
18, 52, 86, 120, which is0x12345678)m_double(8 bytes,0, 0, 0, 0, 0, 0, 0, 0, which is 0.0)
These bytes are the data your Reader needs to read.
Step 2: Implement Python Reader to read binary data#
We implement a Python Reader named OverrideStreamerReader:
1from array import array
2import numpy as np
3
4from uproot_custom.readers.python import IReader
5
6
7class OverrideStreamerReader(IReader):
8 def __init__(self, name):
9 super().__init__(name)
10 self.m_ints = array("i") # int32
11 self.m_doubles = array("d") # float64
12
13 def read(self, buffer):
14 # Skip TObject header
15 buffer.skip_TObject()
16
17 # Read integer value
18 self.m_ints.append(buffer.read_int32())
19
20 # Read a custom added mask value
21 mask = buffer.read_uint32()
22 if mask != 0x12345678:
23 raise RuntimeError(f"Error: Unexpected mask value: {mask:#x}")
24
25 # Read double value
26 self.m_doubles.append(buffer.read_double())
27
28 def data(self):
29 int_array = np.asarray(self.m_ints)
30 double_array = np.asarray(self.m_doubles)
31 return int_array, double_array
In the
readmethod, we skip theTObjectheader, then read the member variables and the mask according to the logic in theStreamermethod.
In the
datamethod, we return a tuple containing 2numpyarrays: one form_intand the other form_double.
Step 3: Implement Python Factory#
To use our OverrideStreamerReader and reconstruct the final awkward array, we need to implement a corresponding Factory. We can implement a Factory named OverrideStreamerFactory to do this.
A Factory requires at least 3 methods: build_factory, build_python_reader and make_awkward_content. An optional method make_awkward_form can be implemented to enable dask functionality.
First, import necessary modules:
import awkward.contents
import awkward.forms
from uproot_custom import Factory
Implement build_factory#
We can make an assumpion that the fName of the TStreamerInfo is TOverrideStreamer for our class. If the fName matches, we return a tree config dictionary containing the Factory and the name of the corresponding Reader. Otherwise, we return None to let other factories have a chance to handle current class.
class OverrideStreamerFactory(Factory):
@classmethod
def build_factory(
cls,
top_type_name: str,
cur_streamer_info: dict,
all_streamer_info: dict,
item_path: str,
**kwargs,
):
fName = cur_streamer_info["fName"]
if fName != "TOverrideStreamer":
return None
return cls(fName) # Factory takes `name: str` as constructor argument
Tip
In production, you may want use item_path to make a more accurate identification of whether the current class is the one you want to handle:
class OverrideStreamerFactory(Factory):
@classmethod
def build_factory(
cls,
top_type_name: str,
cur_streamer_info: dict,
all_streamer_info: dict,
item_path: str,
**kwargs,
):
if item_path != "/my_tree:override_streamer":
return None
return cls(fName)
Implement build_python_reader#
Implement build_python_reader to create an instance of OverrideStreamerReader:
def build_python_reader(self):
return OverrideStreamerReader(self.name)
Implement make_awkward_content#
Implement make_awkward_content to construct awkward contents from the raw data returned by the Reader:
def make_awkward_content(self, raw_data):
int_array, double_array = raw_data
return awkward.contents.RecordArray(
[
awkward.contents.NumpyArray(int_array),
awkward.contents.NumpyArray(double_array),
],
["m_int", "m_double"],
)
The raw_data is the object returned by the data method of the Reader. In our example, it is a tuple containing 2 numpy arrays, as illustrated above.
See also
Refer to awkward direct constructors for more details about awkward contents.
(Optional) Implement make_awkward_form#
The make_awkward_form method is optional, but it is easy to implement, since the awkward.forms is similar to awkward.contents:
def make_awkward_form(self):
return awkward.forms.RecordForm(
[
awkward.forms.NumpyForm("int32"),
awkward.forms.NumpyForm("float64"),
],
["m_int", "m_double"],
)
See also
Refer to awkward forms for more details about awkward forms.
Step 4: Register target branch and the Factory#
Finally, we need to register the branch we want to read with uproot-custom, and also register the OverrideStreamerFactory so that it can be used by uproot-custom.
We can do this by adding the following code in the __init__.py of your package:
from uproot_custom import registered_factories, AsCustom
AsCustom.target_branches |= {
"/my_tree:override_streamer",
}
registered_factories.add(OverrideStreamerFactory)
Don’t forget to switch to the Python backend during development, since the default backend is C++:
import uproot_custom.factories as fac
fac.reader_backend = "python" # default is "cpp"
Step 5: Read data with Uproot#
Now we can read the data using Uproot as usual:
>>> b = uproot.open("demo_data.root")["my_tree:override_streamer"]
>>> arr = b.array()
Step 6: Port the reader to C++ for production speed#
Once the Python reader is working, you can port the same logic to C++ for
significantly better performance. The C++ reader only needs to keep the same
reading logic as the Python version — the same fields read in the same
order, and data() returning the same structure.
See also
See ../../tutorial/customize-factory-reader/port-to-cpp.md for the full
C++ reader API reference (IReader, BinaryBuffer, pybind11 bindings).
OverrideStreamerReader (C++) — same logic as the Python version# 1#include <cstdint>
2#include <memory>
3#include <vector>
4
5#include "uproot-custom/uproot-custom.hh"
6
7using namespace uproot;
8
9class OverrideStreamerReader : public IReader {
10 public:
11 OverrideStreamerReader( std::string name )
12 : IReader( name )
13 , m_data_ints( std::make_shared<std::vector<int>>() )
14 , m_data_doubles( std::make_shared<std::vector<double>>() ) {}
15
16 void read( BinaryBuffer& buffer ) {
17 // Skip TObject header
18 buffer.skip_TObject();
19
20 // Read integer value
21 m_data_ints->push_back( buffer.read<int>() );
22
23 // Read a custom added mask value
24 auto mask = buffer.read<uint32_t>();
25 if ( mask != 0x12345678 )
26 {
27 throw std::runtime_error( "Error: Unexpected mask value: " +
28 std::to_string( mask ) );
29 }
30
31 // Read double value
32 m_data_doubles->push_back( buffer.read<double>() );
33 }
34
35 py::object data() const {
36 auto int_array = make_array( m_data_ints );
37 auto double_array = make_array( m_data_doubles );
38 return py::make_tuple( int_array, double_array );
39 }
40
41 private:
42 const std::string m_name;
43 std::shared_ptr<std::vector<int>> m_data_ints;
44 std::shared_ptr<std::vector<double>> m_data_doubles;
45};
46
47// Declare the reader
48PYBIND11_MODULE( my_reader_cpp, m ) {
49 declare_reader<OverrideStreamerReader, std::string>( m, "OverrideStreamerReader" );
50}
Then add build_cpp_reader to the factory:
from .my_reader_cpp import OverrideStreamerReader as OverrideStreamerCppReader
def build_cpp_reader(self):
return OverrideStreamerCppReader(self.name)
After adding build_cpp_reader, simply remove the
fac.reader_backend = "python" line (or set it back to "cpp") to use the
default C++ backend for production:
import uproot_custom.factories as fac
fac.reader_backend = "cpp" # this is the default, so you can also just remove the line
The factory's `make_awkward_content` and `make_awkward_form` remain exactly
the same — they work identically regardless of which backend is used,
because both readers return data in the same structure.