Example 2: Read TObjArray with unique known type
See also
A full example can be found in the example directory of this repo.
This example shows how to use user-known rules to read data.
In this example, we define a TObjWithObjArray class, which contains a TObjArray of TObjInObjArray objects. We know that the objects in the TObjArray are always of type TObjInObjArray, so we can use user-known rules to read the data.
The definitions of TObjInObjArray and TObjWithObjArray are as follows:
TObjInObjArray
class TObjInObjArray : public TObject {
  private:
    // STL
    std::string m_str;
    std::vector<int> m_vec_int;
    std::map<int, float> m_map_if;
    std::map<std::string, double> m_map_sd;
    std::map<int, std::string> m_map_is;
    std::map<int, std::vector<int>> m_map_vec_int;
    std::map<int, std::map<int, float>> m_map_map_if;
    // TArray
    TArrayI m_tarr_i{ 0 };
    TArrayC m_tarr_c{ 0 };
    TArrayS m_tarr_s{ 0 };
    TArrayL m_tarr_l{ 0 };
    TArrayF m_tarr_f{ 0 };
    TArrayD m_tarr_d{ 0 };
    // TString
    TString m_tstr;
    // CStyle array
    int m_carr_int[3]{ 0, 0, 0 };
    std::vector<int> m_carr_vec_int[2];
    // basic types
    bool m_bool{ false };
    int8_t m_int8{ 0 };
    int16_t m_int16{ 0 };
    int32_t m_int32{ 0 };
    int64_t m_int64{ 0 };
    uint8_t m_uint8{ 0 };
    uint16_t m_uint16{ 0 };
    uint32_t m_uint32{ 0 };
    uint64_t m_uint64{ 0 };
    float m_float{ 0.0 };
    double m_double{ 0.0 };
    ClassDef( TObjInObjArray, 1 );
  public:
    // ... (skipped for brevity)
};
TObjWithObjArray
class TObjWithObjArray : public TObject {
  private:
    TObjArray m_obj_array;
    ClassDef( TObjWithObjArray, 1 );
  public:
    TObjWithObjArray( int val = 0 ) : TObject(), m_obj_array()
    {
        // add val % 5 objects (between 0 and 4)
        for ( int i = 0; i < val % 5; i++ )
        {
            // This leaks memory (m_obj_array does not own these objects), but it is fine for an example.
            m_obj_array.Add( new TObjInObjArray( val + i ) );
        }
    }
};
Step 1: Check binary data
As in Example 1, we first check the binary data. Since ROOT automatically splits TObjWithObjArray into several sub-branches, we only need to look at the m_obj_array branch.
To print the binary data, run:
>>> import uproot
>>> import uproot_custom as uc
>>>
>>> br = uproot.open("demo_data.root")["my_tree:obj_with_obj_array/m_obj_array"]
>>> bin_arr = br.array(interpretation=uc.AsBinary())
>>> evt1 = bin_arr[1].to_numpy()
>>> evt1
array([ 64,   0,   1, 155,   0,   3,   0,   1,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   1,   0,   0,   0,   0,  64,
         0,   1, 130, 255, 255, 255, 255,  84,  79,  98, 106,  73, 110,
        79,  98, 106,  65, 114, 114,  97, 121,   0,  64,   0,   1, 107,
         0,   1,   0,   1,   0,   0,   0,   0,   0,   0,   0,   0,  64,
         0,   0,   8,   0,   9,   5, 115, 116, 114,  95,  49,  64,   0,
         0,  10,   0,   9,   0,   0,   0,   1,   0,   0,   0,   1,  64,
         0,   0,  20,  64,   9,   0,   0, 105,   3,  41, 120,   0,   0,
         0,   1,   0,   0,   0,   0,  63, 128,   0,   0,  64,   0,   0,
        32,  64,   9,   0,   0, 250, 148,  40, 178,   0,   0,   0,   1,
        64,   0,   0,   8,   0,   9,   5, 107, 101, 121,  95,  48,  63,
       240,   0,   0,   0,   0,   0,   0,  64,   0,   0,  28,  64,   9,
         0,   0,  11,  95, 183,  82,   0,   0,   0,   1,   0,   0,   0,
         0,  64,   0,   0,   8,   0,   9,   5, 118,  97, 108,  95,  49,
        64,   0,   0,  34,  64,   9,   0,   0,  11,  95, 183,  82,   0,
         0,   0,   1,   0,   0,   0,   0,  64,   0,   0,  14,   0,   9,
         0,   0,   0,   2,   0,   0,   0,   1,   0,   0,   0,  11,  64,
         0,   0,  48,  64,   9,   0,   0,  11,  95, 183,  82,   0,   0,
         0,   1,   0,   0,   0,   0,  64,   0,   0,  28,  64,   9,   0,
         0, 105,   3,  41, 120,   0,   0,   0,   2,   0,   0,   0,   0,
         0,   0,   0,  10,  63, 128,   0,   0,  64,   0,   0,   0,   0,
         0,   0,   1,   0,   0,   0,   1,   0,   0,   0,   1,   1,   0,
         0,   0,   1,   0,   1,   0,   0,   0,   1,   0,   0,   0,   0,
         0,   0,   0,   1,   0,   0,   0,   1,  63, 128,   0,   0,   0,
         0,   0,   1,  63, 240,   0,   0,   0,   0,   0,   0,   6, 116,
       115, 116, 114,  95,  49,   0,   0,   0,   1,   0,   0,   0,   2,
         0,   0,   0,   3,  64,   0,   0,  26,   0,   9,   0,   0,   0,
         2,   0,   0,   0,   1,   0,   0,   0,  11,   0,   0,   0,   2,
         0,   0,   0,   2,   0,   0,   0,  12,   0,   1,   0,   2,   0,
         0,   0,   3,   0,   0,   0,   0,   0,   0,   0,   4,  11,   0,
        12,   0,   0,   0,  13,   0,   0,   0,   0,   0,   0,   0,  14,
        63, 140, 204, 205,  64,   1, 153, 153, 153, 153, 153, 154],
      dtype=uint8)
To interpret these bytes, we refer to the TObjArray::Streamer method:
TObjArray::Streamer method
 1void TObjArray::Streamer(TBuffer &b)
 2{
 3   UInt_t R__s, R__c;
 4   Int_t nobjects;
 5   if (b.IsReading()) {
 6      Version_t v = b.ReadVersion(&R__s, &R__c);
 7      if (v > 2)
 8         TObject::Streamer(b);
 9      if (v > 1)
10         fName.Streamer(b);
11
12      if (GetEntriesFast() > 0) Clear();
13
14      b >> nobjects;
15      b >> fLowerBound;
16      if (nobjects >= fSize) Expand(nobjects);
17      fLast = -1;
18      TObject *obj;
19      for (Int_t i = 0; i < nobjects; i++) {
20         obj = (TObject*) b.ReadObjectAny(TObject::Class());
21         if (obj) {
22            fCont[i] = obj;
23            fLast = i;
24         }
25      }
26      Changed();
27      b.CheckByteCount(R__s, R__c,TObjArray::IsA());
28   } else {
29      // skipped the writing part for brevity
30   }
31}
According to the Streamer method, the binary data contains:
- Line 6: The first 6 bytes are the fNBytes (uint32) + fVersion (int16) header (refer to here). Here they are [64, 0, 1, 155] and [0, 3]: a byte count of 411 once the 0x40000000 flag is masked off, and version 3.
- Lines 7-8: Since fVersion is 3 > 2, the base class TObject is streamed next, which takes 10 bytes (a 2-byte version, a 4-byte fUniqueID and a 4-byte fBits).
- Lines 9-10: Since fVersion is 3 > 1, the next part is fName (TString). Here it is the single byte [0], i.e. an empty name.
- Line 14: The next 4 bytes are nobjects (int32): [0, 0, 0, 1], i.e. 1. This means there is one object in this TObjArray.
- Line 15: The next 4 bytes are fLowerBound (int32): [0, 0, 0, 0], i.e. 0.
- Lines 19-25: Loop over nobjects to read each object. Each object is preceded by an object header: a 4-byte byte count ([64, 0, 1, 130]), then [255, 255, 255, 255] (the new-class tag), then the null-terminated class name "TObjInObjArray". In uproot-custom, this header can be handled by ObjectHeaderFactory.
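To double-check the walkthrough above, the header can be decoded by hand. The sketch below uses only Python's struct module on the first 48 bytes of evt1 copied from the dump (ROOT stores numbers big-endian). The helper decode_tobjarray_header is hypothetical and not part of uproot-custom. It assumes TObject takes exactly 10 bytes, which holds here.

```python
import struct

# First 48 bytes of evt1, copied from the dump above (ROOT data is big-endian).
HEAD = bytes([
    64, 0, 1, 155, 0, 3, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 64,
    0, 1, 130, 255, 255, 255, 255, 84, 79, 98, 106, 73, 110,
    79, 98, 106, 65, 114, 114, 97, 121, 0,
])

K_BYTE_COUNT_MASK = 0x40000000


def decode_tobjarray_header(buf):
    """Hypothetical helper: decode a TObjArray header following TObjArray::Streamer."""
    pos = 0
    (nbytes,) = struct.unpack_from(">I", buf, pos)   # fNBytes (flagged byte count)
    pos += 4
    (version,) = struct.unpack_from(">h", buf, pos)  # fVersion
    pos += 2
    if version > 2:
        pos += 10  # TObject: version (2) + fUniqueID (4) + fBits (4), assuming no extra reference data
    name = ""
    if version > 1:
        n = buf[pos]  # TString length byte; 255 means a 4-byte length follows
        pos += 1
        if n == 255:
            (n,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        name = buf[pos:pos + n].decode()
        pos += n
    nobjects, lower_bound = struct.unpack_from(">ii", buf, pos)  # nobjects, fLowerBound
    pos += 8
    header = {
        "byte_count": nbytes & ~K_BYTE_COUNT_MASK,
        "version": version,
        "name": name,
        "nobjects": nobjects,
        "lower_bound": lower_bound,
    }
    return header, pos


header, pos = decode_tobjarray_header(HEAD)
print(header)  # {'byte_count': 411, 'version': 3, 'name': '', 'nobjects': 1, 'lower_bound': 0}
print(pos)     # 25: the first object header starts at this offset
```

The byte count 411 plus the 4 bytes of the count itself gives 415, the full length of evt1.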
Tip
For other ROOT built-in classes, we suggest checking both the streamer information and the source code. If the class does not override its Streamer method, the streamer information is usually enough.
In summary, the binary data contains:
- The TObjArray header (fNBytes + fVersion + TObject + fName + nobjects + fLowerBound).
- A loop over nobjects to read each object:
  - the ObjectHeader before each TObjInObjArray object;
  - the data members of the TObjInObjArray object.
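The ObjectHeader in this summary can also be decoded by hand with Python's struct module. In ROOT's buffer format it is a 4-byte byte count, again flagged with 0x40000000. That is followed either by the tag 0xFFFFFFFF plus the null-terminated class name, when the class appears for the first time, or by a reference to a class seen earlier (flagged with 0x80000000). The hypothetical helper below handles only the new-class case that occurs in evt1. It runs on bytes 25 to 53 of the dump.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000
K_NEW_CLASS_TAG = 0xFFFFFFFF

# Bytes 25-53 of evt1 from the dump above: object header + start of TObjInObjArray.
OBJ = bytes([
    64, 0, 1, 130, 255, 255, 255, 255,
    84, 79, 98, 106, 73, 110, 79, 98, 106, 65, 114, 114, 97, 121, 0,
    64, 0, 1, 107, 0, 1,
])


def decode_object_header(buf, pos=0):
    """Hypothetical helper: decode a new-class object header.

    Returns (byte_count, class_name, next_pos).
    """
    (count,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    (tag,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    if tag != K_NEW_CLASS_TAG:
        raise NotImplementedError("only the new-class tag is handled in this sketch")
    end = buf.index(0, pos)  # the class name is null-terminated
    class_name = buf[pos:end].decode()
    return count & ~K_BYTE_COUNT_MASK, class_name, end + 1


count, class_name, pos = decode_object_header(OBJ)
print(count, class_name, pos)  # 386 TObjInObjArray 23

# The TObjInObjArray data then starts with its own fNBytes + fVersion.
nbytes, version = struct.unpack_from(">Ih", OBJ, pos)
print(nbytes & ~K_BYTE_COUNT_MASK, version)  # 363 1
```

Both byte counts end exactly at the last byte of evt1 (29 + 386 = 52 + 363 = 415), which is consistent with a single object in this entry.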
 
So we need the following factories/readers to read the data:
- TObjArrayFactory/TObjArrayReader to read the TObjArray header and loop over nobjects.
- ObjectHeaderFactory/ObjectHeaderReader to read the ObjectHeader; these are already implemented in uproot-custom.
- AnyClassFactory/AnyClassReader to read each TObjInObjArray object; these are also already implemented in uproot-custom.
We have to implement TObjArrayFactory/TObjArrayReader ourselves. Since we know that the objects in the TObjArray are always of type TObjInObjArray, a single AnyClassFactory/AnyClassReader can serve as the sub-factory/sub-reader for all objects. This is how the user-known rule is embedded in the reader.
Step 2: Implement C++ reader to read binary data
Our TObjArrayReader can be implemented as follows:
class TObjArrayReader : public IElementReader {
  private:
    SharedReader m_element_reader;
    std::shared_ptr<std::vector<int64_t>> m_offsets;
  public:
    TObjArrayReader( std::string name, SharedReader element_reader )
        : IElementReader( name )
        , m_element_reader( element_reader )
        , m_offsets( std::make_shared<std::vector<int64_t>>( 1, 0 ) ) {}
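    void read( BinaryBuffer& buffer ) override final {
        buffer.skip_fNBytes();  // byte count
        buffer.skip_fVersion(); // TObjArray version
        buffer.skip_TObject();  // TObject base class
        buffer.read_TString();  // fName (value discarded)
        auto nobjects = buffer.read<int32_t>(); // number of objects in this event
        buffer.skip( 4 );                       // fLowerBound
        // record where this event's objects end in the flattened element list
        m_offsets->push_back( m_offsets->back() + nobjects );
        // each object is ObjectHeader + TObjInObjArray, all read by one element reader
        m_element_reader->read_many( buffer, nobjects );
    }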
    py::object data() const override final {
        auto offsets_array      = make_array( m_offsets );
        py::object element_data = m_element_reader->data();
        return py::make_tuple( offsets_array, element_data );
    }
};
PYBIND11_MODULE( my_reader_cpp, m ) {
    declare_reader<TObjArrayReader, std::string, SharedReader>( m, "TObjArrayReader" );
}
- In the constructor, we take one SharedReader as m_element_reader, which is expected to read TObjInObjArray objects.
- In the read method, we read the TObjArray header, then call m_element_reader->read_many to read all TObjInObjArray objects of the event in one go. We also record the offsets of each event in m_offsets.
- In the data method, we return a tuple (offsets, element_data), where offsets is a 1D int64 array and element_data is the data returned by m_element_reader.
- Finally, we declare TObjArrayReader in the my_reader_cpp module.
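For readers more at home in Python, here is a pure-Python analogue of TObjArrayReader that mirrors its read and data methods on synthetic input. It is only an illustration. PyBuffer, SkipObjectReader, PyTObjArrayReader and make_tobjarray are hypothetical names, not uproot-custom classes. The element reader simply skips each object using its byte count instead of decoding TObjInObjArray the way AnyClassReader would.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000


class PyBuffer:
    """Hypothetical stand-in for a big-endian binary buffer with a cursor."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, fmt):
        # Unpack one value at the cursor and advance past it.
        (value,) = struct.unpack_from(fmt, self.data, self.pos)
        self.pos += struct.calcsize(fmt)
        return value

    def skip(self, n):
        self.pos += n

    def read_tstring(self):
        # TString: 1-byte length, or 255 followed by a 4-byte length.
        n = self.read(">B")
        if n == 255:
            n = self.read(">i")
        s = self.data[self.pos:self.pos + n].decode()
        self.pos += n
        return s


class SkipObjectReader:
    """Hypothetical element reader: skips each object via its byte count."""

    def __init__(self):
        self.sizes = []

    def read_many(self, buf, count):
        for _ in range(count):
            nbytes = buf.read(">I") & ~K_BYTE_COUNT_MASK
            buf.skip(nbytes)  # the count covers everything after itself
            self.sizes.append(nbytes)

    def data(self):
        return self.sizes


class PyTObjArrayReader:
    """Pure-Python analogue of the C++ TObjArrayReader (illustration only)."""

    def __init__(self, element_reader):
        self.element_reader = element_reader
        self.offsets = [0]  # like m_offsets, starts with a single 0

    def read(self, buf):
        buf.skip(4)         # fNBytes
        buf.skip(2)         # fVersion
        buf.skip(10)        # TObject, assuming no extra reference data
        buf.read_tstring()  # fName, discarded
        nobjects = buf.read(">i")
        buf.skip(4)         # fLowerBound
        self.offsets.append(self.offsets[-1] + nobjects)
        self.element_reader.read_many(buf, nobjects)

    def data(self):
        return self.offsets, self.element_reader.data()


def make_tobjarray(bodies):
    """Build one synthetic TObjArray entry whose objects have opaque bodies."""
    payload = struct.pack(">h", 3) + bytes(10) + b"\x00"  # fVersion, TObject, empty fName
    payload += struct.pack(">ii", len(bodies), 0)         # nobjects, fLowerBound
    for body in bodies:
        payload += struct.pack(">I", len(body) | K_BYTE_COUNT_MASK) + body
    return struct.pack(">I", len(payload) | K_BYTE_COUNT_MASK) + payload


reader = PyTObjArrayReader(SkipObjectReader())
for entry in (make_tobjarray([b"ab", b"cde"]), make_tobjarray([b"f"])):
    reader.read(PyBuffer(entry))
print(reader.data())  # ([0, 2, 3], [2, 3, 1])
```

The offsets grow by nobjects per event, exactly as m_offsets does in the C++ reader. All elements of all events land in one flat list owned by the element reader.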
Important
Always use the IElementReader::read_many method to read multiple objects in one go, since some classes (e.g. std::vector) may have a "1 header + multiple objects" structure.
Step 3: Implement Python factory
Similar to Example 1, we need to identify the TObjArray branch and implement a corresponding factory to use our TObjArrayReader.
First, import the necessary modules. Since we use ObjectHeaderFactory and AnyClassFactory, some extra imports are needed:
import awkward.contents
import awkward.forms
import awkward.index
from uproot_custom import (
    Factory,
    build_factory,
)
from uproot_custom.factories import AnyClassFactory, ObjectHeaderFactory
from .my_reader_cpp import TObjArrayReader
The my_reader_cpp is the compiled C++ module containing our TObjArrayReader.
Implement build_factory
In this example, we simply treat every TObjArray branch as a target branch. You can implement more specific rules based on item_path to identify the target branch.
 1class TObjArrayFactory(Factory):
 2    @classmethod
 3    def priority(cls):
 4        return 50
 5
 6    @classmethod
 7    def build_factory(
 8        cls,
 9        top_type_name: str,
10        cur_streamer_info: dict,
11        all_streamer_info: dict,
12        item_path: str,
13        **kwargs,
14    ):
15        if top_type_name != "TObjArray":
16            return None
17
18        item_path = item_path.replace(".TObjArray*", "")
19        obj_typename = "TObjInObjArray"
20
21        sub_factories = []
22        for s in all_streamer_info[obj_typename]:
23            sub_factories.append(
24                build_factory(
25                    cur_streamer_info=s,
26                    all_streamer_info=all_streamer_info,
27                    item_path=f"{item_path}.{obj_typename}",
28                )
29            )
30
31        return cls(
32            name=cur_streamer_info["fName"],
33            element_factory=ObjectHeaderFactory(
34                name=obj_typename,
35                element_factory=AnyClassFactory(
36                    name=obj_typename,
37                    sub_factories=sub_factories,
38                ),
39            ),
40        )
- Lines 3-4: Override the priority method to return a priority higher than the default priority 10, so that our factory is chosen first.
- Lines 18-19: Strip the .TObjArray* suffix from item_path (otherwise it is kept in the final awkward arrays), and hard-code the element type name TObjInObjArray; this is where the user-known rule enters.
- Lines 21-29: Prepare sub_factories for AnyClassFactory to read TObjInObjArray objects.
- Lines 31-40: Combine ObjectHeaderFactory and AnyClassFactory as the element_factory to read each object in the TObjArray.
Implement constructor
The TObjArrayFactory requires an element_factory to read each object in the TObjArray. So we need to implement the constructor:
def __init__(self, name: str, element_factory: Factory):
    super().__init__(name)
    self.element_factory = element_factory
Implement build_cpp_reader
The build_cpp_reader method is straightforward:
1def build_cpp_reader(self):
2    element_reader = self.element_factory.build_cpp_reader()
3    return TObjArrayReader(self.name, element_reader)
- In Line 2, we call self.element_factory.build_cpp_reader to create element_reader. Here, ObjectHeaderFactory and then AnyClassFactory build the corresponding sub-readers.
- In Line 3, we create an instance of TObjArrayReader, passing element_reader to it.
Implement make_awkward_content
The make_awkward_content method is also straightforward:
def make_awkward_content(self, raw_data):
    offsets, element_raw_data = raw_data
    element_content = self.element_factory.make_awkward_content(element_raw_data)
    return awkward.contents.ListOffsetArray(
        awkward.index.Index64(offsets),
        element_content,
    )
We use self.element_factory.make_awkward_content to construct the element_content, then combine it with offsets to create a ListOffsetArray.
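To see what the offsets mean, here is a minimal pure-Python sketch of how a ListOffsetArray interprets them: event i holds content[offsets[i]:offsets[i + 1]]. The function split_by_offsets and the sample values are hypothetical. With the real reader, content would be the record data built for TObjInObjArray. An event with zero objects (the constructor adds val % 5 objects, which can be 0) simply repeats the previous offset.

```python
def split_by_offsets(offsets, content):
    """Group a flat element list into per-event lists, as ListOffsetArray does.

    Event i spans content[offsets[i]:offsets[i + 1]]; offsets must be non-decreasing.
    """
    return [content[start:stop] for start, stop in zip(offsets[:-1], offsets[1:])]


# Offsets as accumulated by TObjArrayReader for events with 2, 0 and 1 objects.
offsets = [0, 2, 2, 3]
content = ["obj0", "obj1", "obj2"]
print(split_by_offsets(offsets, content))  # [['obj0', 'obj1'], [], ['obj2']]
```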
(Optional) Implement make_awkward_form
You can implement make_awkward_form to provide the awkward form of the final array without reading the binary data:
def make_awkward_form(self):
    element_form = self.element_factory.make_awkward_form()
    return awkward.forms.ListOffsetForm(
        "i64",
        element_form,
    )
Step 4: Register the target branch and the factory
Finally, register the branch we want to read with uproot-custom, and register TObjArrayFactory so that uproot-custom can use it.
We can do this by adding the following code in the __init__.py of your package:
from uproot_custom import registered_factories, AsCustom
AsCustom.target_branches |= {
    "/my_tree:obj_with_obj_array/m_obj_array",
}
registered_factories.add(TObjArrayFactory)
Step 5: Read data with uproot
Now we can read the data using uproot as usual:
>>> b = uproot.open("demo_data.root")["my_tree:obj_with_obj_array/m_obj_array"]
>>> arr = b.array()