Example 1: Streamer method is overridden#

See also

A full example can be found in the example directory of this repo.

We define a demo class TOverrideStreamer whose Streamer method is overridden to show how to read such classes using uproot-custom.

There are 2 member variables in TOverrideStreamer: m_int and m_double:

TOverrideStreamer.hh#
#pragma once

#include <TObject.h>

class TOverrideStreamer : public TObject {
  public:
    TOverrideStreamer( int val = 0 )
        : TObject(), m_int( val ), m_double( (double)val * 3.14 ) {}

  private:
    int m_int{ 0 };
    double m_double{ 0.0 };

    ClassDef( TOverrideStreamer, 1 );
};

We add a mask in the Streamer method to demonstrate how to handle special logic in overridden Streamer methods:

TOverrideStreamer.cc#
#include <TBuffer.h>
#include <TObject.h>
#include <iostream>

#include "TOverrideStreamer.hh"

ClassImp( TOverrideStreamer );

void TOverrideStreamer::Streamer( TBuffer& b ) {
    if ( b.IsReading() )
    {
        TObject::Streamer( b ); // Call base class Streamer

        b >> m_int;

        unsigned int mask;
        b >> mask; // We additionally read a mask
        if ( mask != 0x12345678 )
        {
            std::cerr << "Error: Unexpected mask value: " << std::hex << mask << std::dec
                      << std::endl;
            return;
        }

        b >> m_double;
    }
    else
    {
        TObject::Streamer( b ); // Call base class Streamer
        b << m_int;
        unsigned int mask = 0x12345678; // Example mask
        b << mask;                      // Write the mask
        b << m_double;
    }
}

Step 1: Check the binary data#

Before implementing the reader and factory, we should check the binary data of TOverrideStreamer to understand how the data is stored in the ROOT file:

>>> import uproot
>>> import uproot_custom as uc
>>>
>>> br = uproot.open("demo_data.root")["my_tree:override_streamer"]
>>> bin_arr = br.array(interpretation=uc.AsBinary())
>>> evt0 = bin_arr[0].to_numpy()
>>> evt0
array([  0,   1,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,  18,  52,  86, 120,   0,   0,   0,   0,   0,   0,   0,   0],
      dtype=uint8)

Referring to the Streamer method above, we can see that the binary data contains:

  • TObject content (10 bytes, 0,   1,   0,   0,   0,   0,   0,   0,   0,   0)

  • m_int (4 bytes, 0,   0,   0,   0, which is 0)

  • mask (4 bytes, 18,  52,  86, 120, which is 0x12345678)

  • m_double (8 bytes, 0,   0,   0,   0,   0,   0,   0,   0, which is 0.0)

These bytes are the data your reader needs to read.

Step 2: Implement C++ reader to read binary data#

We can implement a reader named OverrideStreamerReader to do this:

 1#include <cstdint>
 2#include <memory>
 3#include <vector>
 4
 5#include "uproot-custom/uproot-custom.hh"
 6
 7using namespace uproot;
 8
 9class OverrideStreamerReader : public IElementReader {
10  public:
11    OverrideStreamerReader( std::string name )
12        : IElementReader( name )
13        , m_data_ints( std::make_shared<std::vector<int>>() )
14        , m_data_doubles( std::make_shared<std::vector<double>>() ) {}
15
16    void read( BinaryBuffer& buffer ) {
17        // Skip TObject header
18        buffer.skip_TObject();
19
20        // Read integer value
21        m_data_ints->push_back( buffer.read<int>() );
22
23        // Read a custom added mask value
24        auto mask = buffer.read<uint32_t>();
25        if ( mask != 0x12345678 )
26        {
27            throw std::runtime_error( "Error: Unexpected mask value: " +
28                                      std::to_string( mask ) );
29        }
30
31        // Read double value
32        m_data_doubles->push_back( buffer.read<double>() );
33    }
34
35    py::object data() const {
36        auto int_array    = make_array( m_data_ints );
37        auto double_array = make_array( m_data_doubles );
38        return py::make_tuple( int_array, double_array );
39    }
40
41  private:
42    const std::string m_name;
43    std::shared_ptr<std::vector<int>> m_data_ints;
44    std::shared_ptr<std::vector<double>> m_data_doubles;
45};
46
47// Declare the reader
48PYBIND11_MODULE( my_reader_cpp, m ) {
49    declare_reader<OverrideStreamerReader, std::string>( m, "OverrideStreamerReader" );
50}
  • In the read method, we skip the TObject header, then read the member variables and the mask according to the logic in the Streamer method.

  • In the data method, we return a py::tuple containing 2 numpy arrays: one for m_int and the other for m_double.

  • Finally we declare the reader in the PYBIND11_MODULE, so that it can be used in Python.

Step 3: Implement Python factory#

To use our OverrideStreamerReader and reconstruct the final awkward array, we need to implement a corresponding factory. We can implement a factory named OverrideStreamerFactory to do this.

A factory requires at least 3 methods: build_factory, build_cpp_reader and make_awkward_content. An optional method make_awkward_form can be implemented to enable dask functionality.

First, import necessary modules:

import awkward.contents
import awkward.forms
from uproot_custom import Factory
from .my_reader_cpp import OverrideStreamerReader

The my_reader_cpp is the compiled C++ module containing our OverrideStreamerReader.

Implement build_factory#

We can make an assumpion that the fName of the TStreamerInfo is TOverrideStreamer for our class. If the fName matches, we return a tree config dictionary containing the factory and the name of the corresponding reader. Otherwise, we return None to let other factories have a chance to handle current class.

class OverrideStreamerFactory(Factory):
    @classmethod
    def build_factory(
        cls,
        top_type_name: str,
        cur_streamer_info: dict,
        all_streamer_info: dict,
        item_path: str,
        **kwargs,
    ):
        fName = cur_streamer_info["fName"]
        if fName != "TOverrideStreamer":
            return None

        return cls(fName) # Factory takes `name: str` as constructor argument

Tip

In production, you may want use item_path to make a more accurate identification of whether the current class is the one you want to handle:

class OverrideStreamerFactory(Factory):
    @classmethod
    def build_factory(
        cls,
        top_type_name: str,
        cur_streamer_info: dict,
        all_streamer_info: dict,
        item_path: str,
        **kwargs,
    ):
        if item_path != "/my_tree:override_streamer":
            return None

        return cls(fName)

Implement build_cpp_reader#

Implement build_cpp_reader to create an instance of OverrideStreamerReader:

def build_cpp_reader(self):
    return OverrideStreamerReader(self.name)

Implement make_awkward_content#

Implement make_awkward_content to construct awkward contents from the raw data returned by the reader:

def make_awkward_content(self, raw_data):
    int_array, double_array = raw_data

    return awkward.contents.RecordArray(
        [
            awkward.contents.NumpyArray(int_array),
            awkward.contents.NumpyArray(double_array),
        ],
        ["m_int", "m_double"],
    )

The raw_data is the object returned by the data method of the reader. In our example, it is a py::tuple containing 2 numpy arrays, as illustrated above.

See also

Refer to awkward direct constructors for more details about awkward contents.

(Optional) Implement make_awkward_form#

The make_awkward_form method is optional, but it is easy to implement, since the awkward.forms is similar to awkward.contents:

def make_awkward_form(self):
    return awkward.forms.RecordForm(
        [
            awkward.forms.NumpyForm("int32"),
            awkward.forms.NumpyForm("float64"),
        ],
        ["m_int", "m_double"],
    )

See also

Refer to awkward forms for more details about awkward forms.

Step 4: Register target branch and the factory#

Finally, we need to register the branch we want to read with uproot-custom, and also register the OverrideStreamerFactory so that it can be used by uproot-custom.

We can do this by adding the following code in the __init__.py of your package:

from uproot_custom import registered_factories, AsCustom

AsCustom.target_branches |= {
    "/my_tree:override_streamer",
}

registered_factories.add(OverrideStreamerFactory)

Step 5: Read data with uproot#

Now we can read the data using uproot as usual:

>>> b = uproot.open("demo_data.root")["my_tree:override_streamer"]
>>> arr = b.array()