Reader & factory interface#
Uproot-custom splits the work between two layers:
Reader: reads binary data from the ROOT byte buffer. It can be written in Python (for rapid development and debugging) or C++ (for production performance).
Factory (Python): orchestrates readers and converts their output into
awkward arrays.
This page describes the Python reader and factory interfaces, then presents a worked example using the Python backend.
Reader interface (Python)#
Base class: IReader#
Every Python reader must inherit from uproot_custom.readers.python.IReader
and implement two methods:
| Method | Purpose |
|---|---|
| read(buffer) | Consume bytes from the buffer for one entry. |
| data() | Return all accumulated data. |
Three additional methods handle special reading patterns (override only when needed):
| Method | When it is called |
|---|---|
| read_many(buffer, count) | Reading a fixed number of elements (e.g. C-style arrays). Override when multiple elements share a single header. |
| read_until(buffer, end_pos) | Reading elements until a byte position is reached. |
| read_many_memberwise(buffer, count) | Member-wise reading: all first members, then all second members, etc. Used by some STL containers. |
IReader base class#
class IReader:
    def __init__(self, name: str):
        self.name = name

    def read(self, buffer: BinaryBuffer) -> None:
        raise NotImplementedError

    def data(self):
        raise NotImplementedError

    def read_many(self, buffer: BinaryBuffer, count: int) -> int:
        for _ in range(count):
            self.read(buffer)
        return count

    def read_until(self, buffer: BinaryBuffer, end_pos: int) -> int:
        count = 0
        while buffer.cursor < end_pos:
            self.read(buffer)
            count += 1
        return count

    def read_many_memberwise(self, buffer: BinaryBuffer, count: int) -> int:
        raise NotImplementedError
Note
Every reader requires a name: str for debug output.
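As a rough illustration of the two required methods, the sketch below defines a reader that consumes one big-endian int32 per entry. `FakeBuffer` and `Int32Reader` are hypothetical names invented for this example, and `IReader` is a trimmed copy of the base class above so that the snippet runs without uproot-custom installed. A real reader would receive a `BinaryBuffer` and would normally return NumPy arrays from `data()`.

```python
import struct


class FakeBuffer:
    """Hypothetical stand-in for BinaryBuffer: big-endian reads over bytes."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self.cursor = 0

    def read_int32(self) -> int:
        (value,) = struct.unpack_from(">i", self.raw, self.cursor)
        self.cursor += 4
        return value


class IReader:
    """Trimmed copy of the base class shown above (read_many only)."""

    def __init__(self, name: str):
        self.name = name

    def read(self, buffer) -> None:
        raise NotImplementedError

    def data(self):
        raise NotImplementedError

    def read_many(self, buffer, count: int) -> int:
        for _ in range(count):
            self.read(buffer)
        return count


class Int32Reader(IReader):
    """Hypothetical reader: one int32 per entry."""

    def __init__(self, name: str):
        super().__init__(name)
        self.values = []

    def read(self, buffer) -> None:
        # Consume exactly the bytes that belong to one entry.
        self.values.append(buffer.read_int32())

    def data(self):
        # Return everything accumulated so far (a plain list in this sketch).
        return self.values


buffer = FakeBuffer(struct.pack(">3i", 7, -1, 42))
reader = Int32Reader("m_int")
n_read = reader.read_many(buffer, 3)  # inherited default: calls read() 3 times
```

Because `Int32Reader` does not override `read_many`, the inherited loop is used; overriding it would only be needed if several elements shared one header.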
BinaryBuffer#
BinaryBuffer wraps a byte buffer and provides convenience methods for
reading ROOT binary data.
Reading
- read_uint8() → int, read_uint16(), read_uint32(), read_uint64(): Read unsigned integers.
- read_int8() → int, read_int16(), read_int32(), read_int64(): Read signed integers.
- read_float() → float, read_double() → float: Read floating-point values.
- read_bool() → bool: Read a boolean value.
- read_fVersion() → int: Read fVersion (signed 16-bit).
- read_fNBytes() → int: Read fNBytes from the buffer, check the mask, and return the actual number of bytes.
- read_null_terminated_string() → str: Read a null-terminated string from the buffer.
- read_obj_header() → str: Read the object header and return the object's name if present.
- read_TString() → str: Read a TString from the buffer.
Skipping
- skip(n): Skip n bytes.
- skip_fVersion(): Skip the fVersion (2 bytes).
- skip_fNBytes(): Equivalent to read_fNBytes(), including the mask check.
- skip_null_terminated_string(): Skip a null-terminated string.
- skip_obj_header(): Skip the object header.
- skip_TObject(): Skip a TObject.
Miscellaneous
- cursor: Current cursor position (byte index into the buffer).
- entries: Number of entries in the data buffer.
- offsets: Entry offsets of the data buffer.
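To make the "check the mask" step and the TString layout more concrete, here is a standalone sketch of ROOT's wire conventions using only the standard library. It is not uproot-custom's implementation. The function names are hypothetical, and the sketch assumes the usual ROOT encoding: big-endian integers, a byte-count word flagged with `kByteCountMask = 0x40000000`, and a TString stored as a one-byte length, with a 4-byte length following when that byte is 255.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # ROOT's kByteCountMask


def read_fnbytes(raw: bytes, cursor: int):
    """Return (byte_count, new_cursor); raise if the mask bit is missing."""
    (word,) = struct.unpack_from(">I", raw, cursor)
    if not word & K_BYTE_COUNT_MASK:
        raise ValueError(f"invalid fNBytes word: {word:#010x}")
    return word & ~K_BYTE_COUNT_MASK, cursor + 4


def read_tstring(raw: bytes, cursor: int):
    """Return (text, new_cursor) for a ROOT TString."""
    length = raw[cursor]
    cursor += 1
    if length == 255:  # long string: the real length follows as uint32
        (length,) = struct.unpack_from(">I", raw, cursor)
        cursor += 4
    text = raw[cursor:cursor + length].decode("utf-8")
    return text, cursor + length


# A byte-count word announcing 10 bytes, followed by the TString "hello".
payload = struct.pack(">I", K_BYTE_COUNT_MASK | 10) + b"\x05hello"
nbytes, pos = read_fnbytes(payload, 0)
name, pos = read_tstring(payload, pos)
```

In a real reader the same steps would be single calls such as `buffer.read_fNBytes()` and `buffer.read_TString()`, with the cursor tracked by the buffer itself.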
Accepting sub-readers#
Composite readers accept sub-readers for nested types.
Pass sub-readers directly as IReader instances; Python's garbage collector
manages their lifetime.
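A composite reader can be sketched as follows: it stores the sub-reader as an ordinary attribute and delegates element reading to it. All class names here (`FakeBuffer`, `DoubleReader`, `CountedListReader`) are hypothetical, and the wire layout (a uint32 count followed by that many doubles) is chosen purely for illustration. In real code both readers would inherit from `IReader` and receive a `BinaryBuffer`.

```python
import struct


class FakeBuffer:
    """Hypothetical stand-in for BinaryBuffer (big-endian, two read methods)."""

    def __init__(self, raw: bytes):
        self.raw = raw
        self.cursor = 0

    def _unpack(self, fmt: str, size: int):
        (value,) = struct.unpack_from(fmt, self.raw, self.cursor)
        self.cursor += size
        return value

    def read_uint32(self) -> int:
        return self._unpack(">I", 4)

    def read_double(self) -> float:
        return self._unpack(">d", 8)


class DoubleReader:
    """Hypothetical leaf reader; a real one would inherit from IReader."""

    def __init__(self, name: str):
        self.name = name
        self.values = []

    def read(self, buffer) -> None:
        self.values.append(buffer.read_double())

    def read_many(self, buffer, count: int) -> int:
        for _ in range(count):
            self.read(buffer)
        return count

    def data(self):
        return self.values


class CountedListReader:
    """Hypothetical composite: a uint32 count, then that many elements."""

    def __init__(self, name: str, element_reader):
        self.name = name
        self.element_reader = element_reader  # plain Python reference
        self.offsets = [0]

    def read(self, buffer) -> None:
        count = buffer.read_uint32()
        self.element_reader.read_many(buffer, count)  # delegate to sub-reader
        self.offsets.append(self.offsets[-1] + count)

    def data(self):
        # Own bookkeeping plus whatever the sub-reader accumulated.
        return self.offsets, self.element_reader.data()


raw = struct.pack(">I2d", 2, 1.5, 2.5) + struct.pack(">I", 0)
buffer = FakeBuffer(raw)
reader = CountedListReader("m_list", DoubleReader("m_list_element"))
reader.read(buffer)  # entry 0: two elements
reader.read(buffer)  # entry 1: empty list
offsets, values = reader.data()
```

The composite's `data()` nests the sub-reader's output inside its own, which is the shape the matching factory later has to unpack.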
Debug output#
Set the UPROOT_DEBUG environment variable to enable debug printing. When
enabled, the module-level debug_print function becomes an alias for
print:
import os
os.environ["UPROOT_DEBUG"] = "1"
# Now debug_print(...) calls will produce output
You can also print the current buffer state:
def read(self, buffer):
    print(buffer)  # shows the next N bytes from the current cursor
    ...
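One way such an environment-controlled alias could be built is sketched below. This is an assumption about the mechanism, not the library's actual code: `make_debug_print` is a hypothetical helper, and how uproot-custom interprets the variable's value or when it reads it is not specified here.

```python
import os


def make_debug_print(environ=os.environ):
    """Return print when UPROOT_DEBUG is set, otherwise a silent no-op."""
    if environ.get("UPROOT_DEBUG"):
        return print
    return lambda *args, **kwargs: None


# Plain dicts stand in for os.environ so both cases can be shown.
debug_print = make_debug_print({"UPROOT_DEBUG": "1"})
silent_print = make_debug_print({})
```

If the real alias is resolved at import time, as in this sketch, the variable would have to be set before the module is imported.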
Factory interface (Python)#
Every Python factory must inherit from Factory and implement four
methods:
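| Method | Purpose |
|---|---|
| priority() (classmethod, optional) | Return an integer priority that controls the order in which factories are tried. |
| build_factory(...) (classmethod) | Match a streamer node; return an instance if matched, None otherwise. |
| build_python_reader() | Create and return the Python reader for this node. |
| make_awkward_content(raw_data) | Convert raw data into an awkward.contents.Content. |
| make_awkward_form() | Return an awkward.forms.Form. |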
Uproot-custom iterates over all registered factory classes sorted by
priority() in descending order and uses the first one whose
build_factory returns a non-None result.
priority (classmethod)#
Controls the order in which factories are tried. Factories with higher
priority are called first. The default value is 10.
Override this method when your factory needs to run before or after the built-in ones:
class MySpecialFactory(Factory):
    @classmethod
    def priority(cls):
        return 20  # run before most built-in factories (default 10)
For reference, the built-in factories use the following priorities:
| Priority | Factory | Why |
|---|---|---|
| 20 | | Must match C-style arrays before other factories see the same type name. |
| 10 | All other built-in factories | Default. |
| 0 | | Catch-all fallback; should run last. |
Tip
When registering a custom factory that targets a specific class name, the
default priority (10) is usually sufficient because build_factory already
returns None for non-matching classes. Increase the priority only when your
factory competes with a built-in factory for the same type name.
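The selection rule described in this section (sort by `priority()` in descending order, take the first non-None `build_factory` result) can be sketched with toy classes. Everything below is hypothetical: `Factory` is a stand-in for the real base class, the three subclasses are invented, and `select_factory` only imitates the dispatch that uproot-custom performs internally.

```python
class Factory:
    """Hypothetical stand-in for the real Factory base class."""

    def __init__(self, name):
        self.name = name

    @classmethod
    def priority(cls):
        return 10  # default priority stated in the text

    @classmethod
    def build_factory(cls, top_type_name, cur_streamer_info,
                      all_streamer_info, item_path, **kwargs):
        return None  # "no match": let the next factory try


class FallbackFactory(Factory):
    """Hypothetical catch-all: matches anything, so it must run last."""

    @classmethod
    def priority(cls):
        return 0

    @classmethod
    def build_factory(cls, top_type_name, cur_streamer_info,
                      all_streamer_info, item_path, **kwargs):
        return cls(cur_streamer_info["fName"])


class VectorFactory(Factory):
    """Hypothetical factory with the default priority."""

    @classmethod
    def build_factory(cls, top_type_name, cur_streamer_info,
                      all_streamer_info, item_path, **kwargs):
        if top_type_name != "vector":
            return None
        return cls(cur_streamer_info["fName"])


class MySpecialFactory(VectorFactory):
    """Competes with VectorFactory for "vector", so it raises its priority."""

    @classmethod
    def priority(cls):
        return 20


def select_factory(registered, top_type_name, cur_streamer_info,
                   all_streamer_info, item_path):
    """Try factories from highest to lowest priority; first match wins."""
    ordered = sorted(registered, key=lambda cls: cls.priority(), reverse=True)
    for factory_cls in ordered:
        factory = factory_cls.build_factory(
            top_type_name, cur_streamer_info, all_streamer_info, item_path
        )
        if factory is not None:
            return factory
    raise ValueError(f"no factory matches {top_type_name!r} at {item_path}")


registered = [FallbackFactory, VectorFactory, MySpecialFactory]
member = {"fName": "m_vec", "fTypeName": "vector<int>"}
chosen = select_factory(registered, "vector", member, {}, "tree.m_vec")
other = select_factory(registered, "TSomething", member, {}, "tree.m_obj")
```

Here `chosen` is a `MySpecialFactory` because it outranks `VectorFactory` for the same type name, while `other` falls through to the catch-all.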
Constructor#
The constructor receives all parameters needed for the three runtime methods.
At minimum it must accept a name (usually fName from the streamer info).
build_factory (classmethod)#
Called during factory-tree construction. Parameters:
- top_type_name (str): the top-level type name, with std:: prefixes stripped (e.g. vector for std::vector<std::map<int, float>>).
- cur_streamer_info (dict): the streamer dictionary for the current data member. Use it to decide whether this factory is applicable. Example:

    {'@fUniqueID': 0, '@fBits': 16777216, 'fName': 'm_int', 'fTitle': '',
     'fType': 3, 'fSize': 4, 'fArrayLength': 0, 'fArrayDim': 0,
     'fMaxIndex': array([0, 0, 0, 0, 0], dtype='>i4'), 'fTypeName': 'int'}

- all_streamer_info (dict): maps every class name to the list of its members' streamer dictionaries. Use it to look up nested classes:

    >>> all_streamer_info["TSimpleObject"]
    [{'@fUniqueID': 0, '@fBits': 16777216, 'fName': 'TObject',
      'fTitle': 'Basic ROOT object', 'fType': 66, 'fSize': 0,
      'fArrayLength': 0, 'fArrayDim': 0,
      'fMaxIndex': array([ 0, -1877229523, 0, 0, 0], dtype='>i4'),
      'fTypeName': 'BASE', 'fBaseVersion': 1},
     {'@fUniqueID': 0, '@fBits': 16777216, 'fName': 'm_int', 'fTitle': '',
      'fType': 3, 'fSize': 4, 'fArrayLength': 0, 'fArrayDim': 0,
      'fMaxIndex': array([0, 0, 0, 0, 0], dtype='>i4'), 'fTypeName': 'int'},
     ... ]

  Build sub-factories for nested members:

    sub_factories = []
    for member in all_streamer_info["TSimpleObject"]:
        sub_fac = build_factory(member)
        sub_factories.append(sub_fac)

- item_path (str): the dot-separated path from the root to this node. Useful for path-specific logic.
- **kwargs: reserved for future use.
Return an instance of the factory if the node matches, or None to pass
to the next registered factory.
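To illustrate what arrives in `top_type_name`, the helper below reproduces the example given for that parameter. It is a hypothetical sketch, not the library's code: uproot-custom computes this value itself before calling `build_factory`, and its exact rules may differ.

```python
def top_type_name(type_name: str) -> str:
    """Hypothetical helper: outermost type name without the std:: prefix."""
    outer = type_name.split("<", 1)[0].strip()  # drop template arguments
    if outer.startswith("std::"):
        outer = outer[len("std::"):]
    return outer


examples = {
    name: top_type_name(name)
    for name in ("std::vector<std::map<int, float>>", "map<int, float>", "TArrayI")
}
```

A factory therefore only needs to compare against short names such as `vector` or `TArrayI`; the template arguments stay available through `cur_streamer_info["fTypeName"]`.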
build_python_reader#
Instantiate and return the Python reader. For composite factories, also build sub-readers and wire them together.
make_awkward_content(raw_data)#
Convert raw_data (the value returned by the reader’s data() method) into
an awkward.contents.Content.
make_awkward_form#
Return an awkward.forms.Form that describes the data layout. This is used
by dask for lazy evaluation.
Worked example: TArray#
Take the TArrayReader / TArrayFactory pair as a concrete example.
TArray nodes store a uint32_t size followed by that many typed elements.
Python reader#
TArrayReader reads fSize, then reads fSize elements. It accumulates
offsets and data using Python’s array module:
TArrayReader (Python)#
from array import array

import numpy as np

from uproot_custom.readers.python import IReader, BinaryBuffer, DTYPE_TO_READER, DTYPE_TO_TYPECODE


class TArrayReader(IReader):
    def __init__(self, name, dtype):
        super().__init__(name)
        self.dtype = dtype
        self.typecode = DTYPE_TO_TYPECODE[dtype]
        self._data = array(self.typecode)  # flat element storage
        self.offsets = array("q", [0])  # int64 entry boundaries
        self.buffer_reader = DTYPE_TO_READER[dtype]

    def read(self, buffer):
        # One entry: the element count, then the elements themselves.
        fSize = buffer.read_uint32()
        self.offsets.append(self.offsets[-1] + fSize)
        for _ in range(fSize):
            self._data.append(self.buffer_reader(buffer))

    def data(self):
        # Expose the accumulated arrays as NumPy arrays.
        offsets_array = np.frombuffer(self.offsets.tobytes(), dtype="int64")
        data_array = np.frombuffer(self._data.tobytes(), dtype=self.dtype)
        return offsets_array, data_array
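The shape of the reader's output is easier to see on concrete bytes. The sketch below replays the same logic with only the standard library: `read_tarray_entries` is a hypothetical free function that mirrors `TArrayReader.read`, using `struct` in place of `BinaryBuffer` and assuming big-endian int32 elements (the `TArrayI` case).

```python
import struct
from array import array


def read_tarray_entries(raw: bytes, n_entries: int, fmt: str = ">i",
                        typecode: str = "i"):
    """Hypothetical replay of TArrayReader.read over n_entries entries."""
    size = struct.calcsize(fmt)
    offsets = array("q", [0])
    data = array(typecode)
    cursor = 0
    for _ in range(n_entries):
        (f_size,) = struct.unpack_from(">I", raw, cursor)  # element count
        cursor += 4
        offsets.append(offsets[-1] + f_size)
        for _ in range(f_size):
            (value,) = struct.unpack_from(fmt, raw, cursor)
            cursor += size
            data.append(value)
    return offsets, data


# Two entries: [10, 20, 30] and [40, 50].
raw = struct.pack(">I3i", 3, 10, 20, 30) + struct.pack(">I2i", 2, 40, 50)
offsets, data = read_tarray_entries(raw, n_entries=2)

# Each entry i spans data[offsets[i]:offsets[i + 1]].
nested = [list(data[offsets[i]:offsets[i + 1]]) for i in range(len(offsets) - 1)]
```

The flat `data` plus `offsets` pair is the same layout a ListOffsetArray expects, which is why the factory can wrap the reader's output without copying element by element.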
Python factory#
TArrayFactory matches any TArray* type name, creates the corresponding
typed Python reader, and converts the offsets + data arrays into an awkward
ListOffsetArray:
TArrayFactory#
import awkward

import uproot_custom.readers.cpp
import uproot_custom.readers.python

# Factory: the base class described above (import not shown).


class TArrayFactory(Factory):
    """
    This class reads TArray from a binary parser.
    TArray includes TArrayC, TArrayS, TArrayI, TArrayL, TArrayL64, TArrayF, and TArrayD.
    Corresponding dtype is int8, int16, int32, int64, int64, float32, and float64 respectively.
    """

    typename2dtype = {
        "TArrayC": "int8",
        "TArrayS": "int16",
        "TArrayI": "int32",
        "TArrayL": "int64",
        "TArrayL64": "int64",
        "TArrayF": "float32",
        "TArrayD": "float64",
    }

    @classmethod
    def build_factory(
        cls,
        top_type_name,
        cur_streamer_info,
        all_streamer_info,
        item_path,
        **kwargs,
    ):
        """
        Return an instance when `top_type_name` is in `cls.typename2dtype`,
        otherwise None.
        """
        if top_type_name not in cls.typename2dtype:
            return None
        dtype = cls.typename2dtype[top_type_name]
        return cls(name=cur_streamer_info["fName"], dtype=dtype)

    def __init__(self, name: str, dtype: str):
        super().__init__(name)
        self.dtype = dtype

    def build_cpp_reader(self):
        return {
            "int8": uproot_custom.readers.cpp.TArrayCReader,
            "int16": uproot_custom.readers.cpp.TArraySReader,
            "int32": uproot_custom.readers.cpp.TArrayIReader,
            "int64": uproot_custom.readers.cpp.TArrayLReader,
            "float32": uproot_custom.readers.cpp.TArrayFReader,
            "float64": uproot_custom.readers.cpp.TArrayDReader,
        }[self.dtype](self.name)

    def build_python_reader(self):
        return uproot_custom.readers.python.TArrayReader(self.name, self.dtype)

    def make_awkward_content(self, raw_data):
        # raw_data is the (offsets, data) tuple returned by TArrayReader.data().
        offsets, data = raw_data
        return awkward.contents.ListOffsetArray(
            awkward.index.Index64(offsets),
            awkward.contents.NumpyArray(data),
        )

    def make_awkward_form(self):
        return awkward.forms.ListOffsetForm("i64", awkward.forms.NumpyForm(self.dtype))
See also
Once your Python reader is working correctly, you can port the logic to C++
for production performance. See Port readers to C++ for the C++ IReader API,
BinaryBuffer reference, pybind11 bindings, and a worked C++ version of
TArrayReader.
Next step
You now understand the Reader and Factory interfaces. Move on to Project setup to create a proper Python package that wires your factory and reader together.