Example 2: Read TObjArray with unique known type#
See also
A full example can be found in the example directory of this repo.
This example shows how to use user-known rules to read data.
In this example, we define a TObjWithObjArray class, which contains a TObjArray of TObjWithInt objects. We know the type of objects in the TObjArray is always TObjWithInt, so we can use user-known rules to read the data.
The definition of TObjInObjArray and TObjWithObjArray is as follows:
TObjInObjArray#class TObjInObjArray : public TObject {
private:
// STL
std::string m_str;
std::vector<int> m_vec_int;
std::map<int, float> m_map_if;
std::map<std::string, double> m_map_sd;
std::map<int, std::string> m_map_is;
std::map<int, std::vector<int>> m_map_vec_int;
std::map<int, std::map<int, float>> m_map_map_if;
// TArray
TArrayI m_tarr_i{ 0 };
TArrayC m_tarr_c{ 0 };
TArrayS m_tarr_s{ 0 };
TArrayL m_tarr_l{ 0 };
TArrayF m_tarr_f{ 0 };
TArrayD m_tarr_d{ 0 };
// TString
TString m_tstr;
// CStyle array
int m_carr_int[3]{ 0, 0, 0 };
std::vector<int> m_carr_vec_int[2];
// basic types
bool m_bool{ false };
int8_t m_int8{ 0 };
int16_t m_int16{ 0 };
int32_t m_int32{ 0 };
int64_t m_int64{ 0 };
uint8_t m_uint8{ 0 };
uint16_t m_uint16{ 0 };
uint32_t m_uint32{ 0 };
uint64_t m_uint64{ 0 };
float m_float{ 0.0 };
double m_double{ 0.0 };
ClassDef( TObjInObjArray, 1 );
public:
// ... (skipped for brevity)
};
TObjWithObjArray#class TObjWithObjArray : public TObject {
private:
TObjArray m_obj_array;
ClassDef( TObjWithObjArray, 1 );
public:
TObjWithObjArray( int val = 0 ) : TObject(), m_obj_array()
{
// preallocate space for 5 elements
for ( int i = 0; i < val % 5; i++ )
{
// This will lead to memory leak, but it's just an example.
m_obj_array.Add( new TObjInObjArray( val + i ) );
}
}
};
Step 1: Check binary data#
Similar to Example 1, we should check the binary data first. Since ROOT automatically splits TObjWithObjArray into several sub-branches, we can just focus on the branch m_obj_array.
To print the binary data, run:
>>> import uproot
>>> import uproot_custom as uc
>>>
>>> br = uproot.open("demo_data.root")["my_tree:obj_with_obj_array/m_obj_array"]
>>> bin_arr = br.array(interpretation=uc.AsBinary())
>>> evt1 = bin_arr[1].to_numpy()
>>> evt1
array([ 64, 0, 1, 155, 0, 3, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 64,
0, 1, 130, 255, 255, 255, 255, 84, 79, 98, 106, 73, 110,
79, 98, 106, 65, 114, 114, 97, 121, 0, 64, 0, 1, 107,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 64,
0, 0, 8, 0, 9, 5, 115, 116, 114, 95, 49, 64, 0,
0, 10, 0, 9, 0, 0, 0, 1, 0, 0, 0, 1, 64,
0, 0, 20, 64, 9, 0, 0, 105, 3, 41, 120, 0, 0,
0, 1, 0, 0, 0, 0, 63, 128, 0, 0, 64, 0, 0,
32, 64, 9, 0, 0, 250, 148, 40, 178, 0, 0, 0, 1,
64, 0, 0, 8, 0, 9, 5, 107, 101, 121, 95, 48, 63,
240, 0, 0, 0, 0, 0, 0, 64, 0, 0, 28, 64, 9,
0, 0, 11, 95, 183, 82, 0, 0, 0, 1, 0, 0, 0,
0, 64, 0, 0, 8, 0, 9, 5, 118, 97, 108, 95, 49,
64, 0, 0, 34, 64, 9, 0, 0, 11, 95, 183, 82, 0,
0, 0, 1, 0, 0, 0, 0, 64, 0, 0, 14, 0, 9,
0, 0, 0, 2, 0, 0, 0, 1, 0, 0, 0, 11, 64,
0, 0, 48, 64, 9, 0, 0, 11, 95, 183, 82, 0, 0,
0, 1, 0, 0, 0, 0, 64, 0, 0, 28, 64, 9, 0,
0, 105, 3, 41, 120, 0, 0, 0, 2, 0, 0, 0, 0,
0, 0, 0, 10, 63, 128, 0, 0, 64, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0,
0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 1, 63, 128, 0, 0, 0,
0, 0, 1, 63, 240, 0, 0, 0, 0, 0, 0, 6, 116,
115, 116, 114, 95, 49, 0, 0, 0, 1, 0, 0, 0, 2,
0, 0, 0, 3, 64, 0, 0, 26, 0, 9, 0, 0, 0,
2, 0, 0, 0, 1, 0, 0, 0, 11, 0, 0, 0, 2,
0, 0, 0, 2, 0, 0, 0, 12, 0, 1, 0, 2, 0,
0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 11, 0,
12, 0, 0, 0, 13, 0, 0, 0, 0, 0, 0, 0, 14,
63, 140, 204, 205, 64, 1, 153, 153, 153, 153, 153, 154],
dtype=uint8)
We can refer to the TObjArray::Streamer method:
TObjArray::Streamer method# 1void TObjArray::Streamer(TBuffer &b)
2{
3 UInt_t R__s, R__c;
4 Int_t nobjects;
5 if (b.IsReading()) {
6 Version_t v = b.ReadVersion(&R__s, &R__c);
7 if (v > 2)
8 TObject::Streamer(b);
9 if (v > 1)
10 fName.Streamer(b);
11
12 if (GetEntriesFast() > 0) Clear();
13
14 b >> nobjects;
15 b >> fLowerBound;
16 if (nobjects >= fSize) Expand(nobjects);
17 fLast = -1;
18 TObject *obj;
19 for (Int_t i = 0; i < nobjects; i++) {
20 obj = (TObject*) b.ReadObjectAny(TObject::Class());
21 if (obj) {
22 fCont[i] = obj;
23 fLast = i;
24 }
25 }
26 Changed();
27 b.CheckByteCount(R__s, R__c,TObjArray::IsA());
28 } else {
29 // skipped the writing part for brevity
30 }
31}
According to the Streamer method, the binary data contains:
Line 6: The first 6 bytes are thefNBytes(uint32) +fVersion(int16)header (refer to here).Line 7-8: Since thefVersionis3>2, the base classTObjectis then streamed, which takes 10 bytes.Line 9-10: Since thefVersionis3>1, the next part isfName(TString).Line 14: The next 4 bytes isnobjects(int32), which is[0, 0, 0, 1], i.e.1. This means there is one object in thisTObjArray.Line 15: The next 4 bytes isfLowerBound(int32), which is[0, 0, 0, 0], i.e.0.Line 19-25: Loop overnobjectsto read each object. Note that the[255, 255, 255, 255]indicates that the object’s binary layout follows this rule. Inuproot-custom, it can be handled byObjectHeaderFactory.
Tip
For other ROOT built-in classes, it is suggested to check both the streamer information and the source code. If the Streamer method is not overridden, the streamer information is usually enough.
In summary, the binary data contains:
TObjArrayheader (fNBytes+fVersion+TObject+fName+nobjects+fLowerBound).Loop over
nobjectsto read each object:ObjectHeaderbefore eachTObjInObjArrayobject.Data members of
TObjInObjArrayobject.
So we need such factories/readers to read the data:
TObjArrayFactory/TObjArrayReaderto readTObjArrayheader and loop overnobjects.ObjectHeaderFactory/ObjectHeaderReaderto readObjectHeader, which are already implemented inuproot-custom.AnyClassFactory/AnyClassReaderto readTObjInObjArrayobject, which are already implemented inuproot-custom.
The TObjArrayFactory/TObjArrayReader should be implemented by ourselves. Note that since we know the type of objects in the TObjArray is always TObjInObjArray, we can take just 1 AnyClassFactory/AnyClassReader as sub-factory/sub-reader to read all objects. This is also a process that embedding user-known rules.
Step 2: Implement C++ reader to read binary data#
Our TObjArrayReader can be implemented as follows:
class TObjArrayReader : public IElementReader {
private:
SharedReader m_element_reader;
std::shared_ptr<std::vector<int64_t>> m_offsets;
public:
TObjArrayReader( std::string name, SharedReader element_reader )
: IElementReader( name )
, m_element_reader( element_reader )
, m_offsets( std::make_shared<std::vector<int64_t>>( 1, 0 ) ) {}
void read( BinaryBuffer& buffer ) override final {
buffer.skip_fNBytes();
buffer.skip_fVersion();
buffer.skip_TObject();
buffer.read_TString(); // fName
auto fSize = buffer.read<uint32_t>();
buffer.skip( 4 ); // fLowerBound
m_offsets->push_back( m_offsets->back() + fSize );
m_element_reader->read_many( buffer, fSize );
}
py::object data() const override final {
auto offsets_array = make_array( m_offsets );
py::object element_data = m_element_reader->data();
return py::make_tuple( offsets_array, element_data );
}
};
PYBIND11_MODULE( my_reader_cpp, m ) {
declare_reader<TObjArrayReader, std::string, SharedReader>( m, "TObjArrayReader" );
}
In the constructor, we take one
SharedReaderas them_element_reader, which is expected to readTObjInObjArrayobjects.In
readmethod, we read theTObjArrayheader, then callm_element_reader->read_manyto read multipleTObjInObjArrayobjects in one go. Also, we record the offsets of each event inm_offsets.In
datamethod, we return a tuple of(offsets, element_data), whereoffsetsis a 1D array of int64,element_datais the data returned bym_element_reader.Finally, we declare the
TObjArrayReaderin themy_reader_cppmodule.
Important
You should always use IElementReader::read_many method to read multiple objects in one go, since some classes (e.g. std::vector) may have “1 header + multiple objects” structure.
Step 3: Implement Python factory#
Similar to Example 1, we need to identify the TObjArray branch and implement a corresponding factory to use our TObjArrayReader.
First, import necessary modules. Since we need to use ObjectHeaderFactory and AnyClassFactory, some extra imports are needed:
import awkward.contents
import awkward.forms
import awkward.index
from uproot_custom import (
Factory,
build_factory,
)
from uproot_custom.factories import AnyClassFactory, ObjectHeaderFactory
from .my_reader_cpp import TObjArrayReader
The my_reader_cpp is the compiled C++ module containing our TObjArrayReader.
Implement build_factory#
In this example, we simply regard any TObjArray branch as our target branch. You can implement more specific rules to identify the target branch with item_path.
1class TObjArrayFactory(Factory):
2 @classmethod
3 def priority(cls):
4 return 50
5
6 @classmethod
7 def build_factory(
8 cls,
9 top_type_name: str,
10 cur_streamer_info: dict,
11 all_streamer_info: dict,
12 item_path: str,
13 **kwargs,
14 ):
15 if top_type_name != "TObjArray":
16 return None
17
18 item_path = item_path.replace(".TObjArray*", "")
19 obj_typename = "TObjInObjArray"
20
21 sub_factories = []
22 for s in all_streamer_info[obj_typename]:
23 sub_factories.append(
24 build_factory(
25 cur_streamer_info=s,
26 all_streamer_info=all_streamer_info,
27 item_path=f"{item_path}.{obj_typename}",
28 )
29 )
30
31 return cls(
32 name=cur_streamer_info["fName"],
33 element_factory=ObjectHeaderFactory(
34 name=obj_typename,
35 element_factory=AnyClassFactory(
36 name=obj_typename,
37 sub_factories=sub_factories,
38 ),
39 ),
40 )
Line 3-4: Overrideprioritymethod to give a higher priority than the factories with default priority10, so that our factory can be chosen first.Line 18-19: Fix theitem_path, otherwise the.TObjArray*suffix will be kept to the final awkward arrays.Line 21-29: Prepare thesub_configsforAnyClassFactoryto readTObjInObjArrayobjects.Line 31-40: CombineObjectHeaderFactoryandAnyClassFactoryas theelement_configto read each object in theTObjArray.
Implement constructor#
The TObjArrayFactory requires an element_factory to read each object in the TObjArray. So we need to implement the constructor:
def __init__(self, name: str, element_factory: Factory):
super().__init__(name)
self.element_factory = element_factory
Implement build_cpp_reader#
The build_cpp_reader method is straightforward:
1def build_cpp_reader(self):
2 element_reader = self.element_factory.build_cpp_reader()
3 return TObjArrayReader(self.name, element_reader)
In
Line 2, we useself.element_factory.build_cpp_readerto create theelement_reader. Here,ObjectHeaderFactory, thenAnyClassFactoryare called to create corresponding sub-factories.In
Line 3, we just create an instance ofTObjArrayReader, passing theelement_readerto it.
Implement make_awkward_content#
The make_awkward_content method is also straightforward:
def make_awkward_content(self, raw_data):
offsets, element_raw_data = raw_data
element_content = self.element_factory.make_awkward_content(element_raw_data)
return awkward.contents.ListOffsetArray(
awkward.index.Index64(offsets),
element_content,
)
We use self.element_factory.make_awkward_content to construct the element_content, then combine it with offsets to create a ListOffsetArray.
(Optional) Implement make_awkward_form#
You can implement make_awkward_form to provide the awkward form of the final array without reading the binary data:
def make_awkward_form(self):
element_form = self.element_factory.make_awkward_form()
return awkward.forms.ListOffsetForm(
"i64",
element_form,
)
Step 4: Register target branch and the factory#
Finally, register the branch we want to read with uproot-custom, and also register the TObjArrayFactory so that it can be used by uproot-custom.
We can do this by adding the following code in the __init__.py of your package:
from uproot_custom import registered_factories, AsCustom
AsCustom.target_branches |= {
"/my_tree:obj_with_obj_array/m_obj_array",
}
registered_factories.add(TObjArrayFactory)
Step 5: Read data with uproot#
Now we can read the data using uproot as usual:
>>> b = uproot.open("demo_data.root")["my_tree:obj_with_obj_array/m_obj_array"]
>>> arr = b.array()