Example 1: Streamer method is overridden
See also
A full example can be found in the example directory of this repo.
We define a demo class TOverrideStreamer whose Streamer method is overridden to show how to read such classes using uproot-custom.
There are 2 member variables in TOverrideStreamer: m_int and m_double:
TOverrideStreamer.hh
#pragma once
#include <TObject.h>

class TOverrideStreamer : public TObject {
public:
    TOverrideStreamer( int val = 0 )
        : TObject(), m_int( val ), m_double( (double)val * 3.14 ) {}

private:
    int m_int{ 0 };
    double m_double{ 0.0 };

    ClassDef( TOverrideStreamer, 1 );
};
We add a mask in the Streamer method to demonstrate how to handle special logic in overridden Streamer methods:
TOverrideStreamer.cc
#include <TBuffer.h>
#include <TObject.h>
#include <iostream>

#include "TOverrideStreamer.hh"

ClassImp( TOverrideStreamer );

void TOverrideStreamer::Streamer( TBuffer& b ) {
    if ( b.IsReading() )
    {
        TObject::Streamer( b ); // Call base class Streamer
        b >> m_int;

        unsigned int mask;
        b >> mask; // We additionally read a mask
        if ( mask != 0x12345678 )
        {
            std::cerr << "Error: Unexpected mask value: " << std::hex << mask << std::dec
                      << std::endl;
            return;
        }

        b >> m_double;
    }
    else
    {
        TObject::Streamer( b ); // Call base class Streamer
        b << m_int;

        unsigned int mask = 0x12345678; // Example mask
        b << mask;                      // Write the mask

        b << m_double;
    }
}
Step 1: Check the binary data
Before implementing the reader and factory, we should check the binary data of TOverrideStreamer to understand how the data is stored in the ROOT file:
>>> import uproot
>>> import uproot_custom as uc
>>>
>>> br = uproot.open("demo_data.root")["my_tree:override_streamer"]
>>> bin_arr = br.array(interpretation=uc.AsBinary())
>>> evt0 = bin_arr[0].to_numpy()
>>> evt0
array([ 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 18, 52, 86, 120, 0, 0, 0, 0, 0, 0, 0, 0],
dtype=uint8)
Referring to the Streamer method above, we can see that the binary data contains:
- TObject content (10 bytes: 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
- m_int (4 bytes: 0, 0, 0, 0, which is 0)
- mask (4 bytes: 18, 52, 86, 120, which is 0x12345678)
- m_double (8 bytes: 0, 0, 0, 0, 0, 0, 0, 0, which is 0.0)
These bytes are the data your reader needs to read.
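As a quick sanity check on this layout, the 26 bytes can be decoded with Python's standard struct module. This is only a sketch: it assumes big-endian encoding (consistent with the mask bytes 18, 52, 86, 120 reading as 0x12345678) and treats the 10 TObject bytes as an opaque blob. The name evt0_bytes is introduced here for illustration and is not part of uproot-custom.

```python
import struct

# The 26 bytes printed for evt0 above, rebuilt by hand for illustration.
evt0_bytes = bytes(
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # TObject content (10 bytes)
    + [0, 0, 0, 0]                   # m_int (4 bytes)
    + [18, 52, 86, 120]              # mask (4 bytes)
    + [0, 0, 0, 0, 0, 0, 0, 0]       # m_double (8 bytes)
)

# ">" = big-endian, no padding; 10s = opaque TObject bytes,
# i = int32, I = uint32, d = float64.
tobject, m_int, mask, m_double = struct.unpack(">10siId", evt0_bytes)

print(len(evt0_bytes), m_int, hex(mask), m_double)
```

If the format string and the byte count disagree, struct.unpack raises struct.error, which makes this a cheap way to confirm a guessed layout before writing any C++.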
Step 2: Implement C++ reader to read binary data
We can implement a reader named OverrideStreamerReader to do this:
#include <cstdint>
#include <memory>
#include <stdexcept> // std::runtime_error
#include <string>    // std::string, std::to_string
#include <vector>

#include "uproot-custom/uproot-custom.hh"

using namespace uproot;

class OverrideStreamerReader : public IElementReader {
public:
    OverrideStreamerReader( std::string name )
        : IElementReader( name )
        , m_data_ints( std::make_shared<std::vector<int>>() )
        , m_data_doubles( std::make_shared<std::vector<double>>() ) {}

    void read( BinaryBuffer& buffer ) {
        // Skip TObject header
        buffer.skip_TObject();

        // Read integer value
        m_data_ints->push_back( buffer.read<int>() );

        // Read a custom added mask value
        auto mask = buffer.read<uint32_t>();
        if ( mask != 0x12345678 )
        {
            throw std::runtime_error( "Error: Unexpected mask value: " +
                                      std::to_string( mask ) );
        }

        // Read double value
        m_data_doubles->push_back( buffer.read<double>() );
    }

    py::object data() const {
        auto int_array    = make_array( m_data_ints );
        auto double_array = make_array( m_data_doubles );
        return py::make_tuple( int_array, double_array );
    }

private:
    const std::string m_name;
    std::shared_ptr<std::vector<int>> m_data_ints;
    std::shared_ptr<std::vector<double>> m_data_doubles;
};

// Declare the reader
PYBIND11_MODULE( my_reader_cpp, m ) {
    declare_reader<OverrideStreamerReader, std::string>( m, "OverrideStreamerReader" );
}
- In the read method, we skip the TObject header, then read the member variables and the mask according to the logic in the Streamer method.
- In the data method, we return a py::tuple containing 2 numpy arrays: one for m_int and the other for m_double.
- Finally, we declare the reader in the PYBIND11_MODULE so that it can be used in Python.
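Before committing to C++, the same read logic can be prototyped in pure Python. The sketch below is not part of uproot-custom: PyOverrideStreamerReader, make_event, and the constants are hypothetical stand-ins that mirror read and data using only the standard struct module. It assumes big-endian data and a fixed 10-byte TObject content, as seen in Step 1.

```python
import struct

TOBJECT_SIZE = 10           # assumption: fixed-size TObject content, as in Step 1
EXPECTED_MASK = 0x12345678  # the mask written by TOverrideStreamer::Streamer


class PyOverrideStreamerReader:
    """Hypothetical pure-Python mirror of OverrideStreamerReader."""

    def __init__(self, name):
        self.name = name
        self.ints = []
        self.doubles = []

    def read(self, buf, offset=0):
        """Decode one object starting at `offset` and return the new offset."""
        offset += TOBJECT_SIZE  # skip the TObject content
        m_int, mask = struct.unpack_from(">iI", buf, offset)
        offset += 8
        if mask != EXPECTED_MASK:
            raise RuntimeError(f"Unexpected mask value: {mask:#x}")
        (m_double,) = struct.unpack_from(">d", buf, offset)
        offset += 8
        self.ints.append(m_int)
        self.doubles.append(m_double)
        return offset

    def data(self):
        """Return the accumulated columns as a tuple, like the C++ reader."""
        return self.ints, self.doubles


def make_event(val):
    """Build bytes in the order the Streamer method writes them (sketch)."""
    tobject = bytes([0, 1, 0, 0, 0, 0, 0, 0, 0, 0])
    return tobject + struct.pack(">iId", val, EXPECTED_MASK, val * 3.14)


reader = PyOverrideStreamerReader("TOverrideStreamer")
for val in range(3):
    reader.read(make_event(val))

ints, doubles = reader.data()
print(ints, doubles)
```

A prototype like this is slow, but it lets you validate the byte layout against real data (for example, the arrays obtained with uc.AsBinary()) before compiling anything.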
Step 3: Implement Python factory
To use our OverrideStreamerReader and reconstruct the final awkward array, we need to implement a corresponding factory. We can implement a factory named OverrideStreamerFactory to do this.
A factory requires at least 3 methods: build_factory, build_cpp_reader and make_awkward_content. An optional method make_awkward_form can be implemented to enable dask functionality.
First, import necessary modules:
import awkward.contents
import awkward.forms
from uproot_custom import Factory
from .my_reader_cpp import OverrideStreamerReader
Here, my_reader_cpp is the compiled C++ module containing our OverrideStreamerReader.
Implement build_factory
We can assume that the fName of the TStreamerInfo for our class is TOverrideStreamer. If the fName matches, we return an instance of the factory constructed with that name. Otherwise, we return None so that other factories get a chance to handle the current class.
class OverrideStreamerFactory(Factory):
    @classmethod
    def build_factory(
        cls,
        top_type_name: str,
        cur_streamer_info: dict,
        all_streamer_info: dict,
        item_path: str,
        **kwargs,
    ):
        fName = cur_streamer_info["fName"]
        if fName != "TOverrideStreamer":
            return None

        return cls(fName)  # Factory takes `name: str` as constructor argument
Tip
In production, you may want to use item_path to identify more precisely whether the current class is the one you want to handle:
class OverrideStreamerFactory(Factory):
    @classmethod
    def build_factory(
        cls,
        top_type_name: str,
        cur_streamer_info: dict,
        all_streamer_info: dict,
        item_path: str,
        **kwargs,
    ):
        if item_path != "/my_tree:override_streamer":
            return None

        # fName is not a local variable here, so read it from the streamer info
        return cls(cur_streamer_info["fName"])
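The convention of returning None so that other factories get a chance can be pictured as a first-match dispatch loop. The sketch below is a hypothetical illustration, not uproot-custom's actual implementation: DemoFactory, FallbackFactory, and dispatch are made-up names, and the real library may order or invoke factories differently.

```python
class DemoFactory:
    """Hypothetical factory that only accepts TOverrideStreamer."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def build_factory(cls, top_type_name, cur_streamer_info,
                      all_streamer_info, item_path, **kwargs):
        if cur_streamer_info["fName"] != "TOverrideStreamer":
            return None  # decline, so that another factory can try
        return cls(cur_streamer_info["fName"])


class FallbackFactory(DemoFactory):
    """Hypothetical catch-all factory that accepts any class."""

    @classmethod
    def build_factory(cls, top_type_name, cur_streamer_info,
                      all_streamer_info, item_path, **kwargs):
        return cls(cur_streamer_info["fName"])


def dispatch(factories, top_type_name, cur_streamer_info,
             all_streamer_info, item_path):
    """Return the first factory instance that does not decline."""
    for factory_cls in factories:
        factory = factory_cls.build_factory(
            top_type_name, cur_streamer_info, all_streamer_info, item_path
        )
        if factory is not None:
            return factory
    raise ValueError(f"No factory accepted {cur_streamer_info['fName']!r}")


factories = [DemoFactory, FallbackFactory]
hit = dispatch(factories, "TOverrideStreamer",
               {"fName": "TOverrideStreamer"}, {}, "/my_tree:override_streamer")
miss = dispatch(factories, "TOther",
                {"fName": "TOther"}, {}, "/my_tree:other")
print(type(hit).__name__, type(miss).__name__)
```

Under this picture, a factory that matches too broadly can shadow more specific ones, which is one reason to prefer the stricter item_path check in production.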
Implement build_cpp_reader
Implement build_cpp_reader to create an instance of OverrideStreamerReader:
def build_cpp_reader(self):
    return OverrideStreamerReader(self.name)
Implement make_awkward_content
Implement make_awkward_content to construct awkward contents from the raw data returned by the reader:
def make_awkward_content(self, raw_data):
    int_array, double_array = raw_data

    return awkward.contents.RecordArray(
        [
            awkward.contents.NumpyArray(int_array),
            awkward.contents.NumpyArray(double_array),
        ],
        ["m_int", "m_double"],
    )
The raw_data is the object returned by the data method of the reader. In our example, it is a py::tuple containing 2 numpy arrays, as illustrated above.
See also
Refer to awkward direct constructors for more details about awkward contents.
(Optional) Implement make_awkward_form
The make_awkward_form method is optional, but it is easy to implement because awkward.forms closely mirrors awkward.contents:
def make_awkward_form(self):
    return awkward.forms.RecordForm(
        [
            awkward.forms.NumpyForm("int32"),
            awkward.forms.NumpyForm("float64"),
        ],
        ["m_int", "m_double"],
    )
See also
Refer to awkward forms for more details about awkward forms.
Step 4: Register target branch and the factory
Finally, we need to register the branch we want to read with uproot-custom, and also register the OverrideStreamerFactory so that it can be used by uproot-custom.
We can do this by adding the following code in the __init__.py of your package:
from uproot_custom import registered_factories, AsCustom
AsCustom.target_branches |= {
    "/my_tree:override_streamer",
}
registered_factories.add(OverrideStreamerFactory)
Step 5: Read data with uproot
Now we can read the data using uproot as usual:
>>> b = uproot.open("demo_data.root")["my_tree:override_streamer"]
>>> arr = b.array()