Port readers to C++
Once your Python reader is working correctly, you can port its logic to C++
for significantly better performance. The C++ reader only needs to keep the
same reading logic as the Python version — the same fields read in the same
order, and data() returning the same structure — so porting is a
straightforward, mechanical process.
Tip
Start with a Python reader for rapid prototyping and debugging. Once the logic is validated, transferring it to C++ is quick because the two readers share the same structure and only differ in language syntax.
C++ IReader base class
Every C++ reader must inherit from IReader and implement two pure-virtual
methods:
| Method | Purpose |
|---|---|
| read() | Consume bytes from the buffer for one entry. |
| data() | Return all accumulated data as a py::object. |
The same optional methods are available as in Python:
| Method | When it is called |
|---|---|
| read_many() | Reading a fixed number of elements. |
| read_until() | Reading elements until a byte position. |
| read_many_memberwise() | Member-wise reading for STL containers. |
IReader: C++ base class
class IReader {
  protected:
    const std::string m_name;

  public:
    IReader( std::string name ) : m_name( name ) {}
    virtual ~IReader() = default;

    virtual const std::string name() const { return m_name; }

    // Pure-virtual methods: every reader must implement these two.
    virtual void read( BinaryBuffer& buffer ) = 0;
    virtual py::object data() const = 0;

    // Optional: read a fixed number of elements.
    virtual uint32_t read_many( BinaryBuffer& buffer, const int64_t count ) {
        for ( int32_t i = 0; i < count; i++ ) { read( buffer ); }
        return count;
    }

    // Optional: read elements until the cursor reaches end_pos.
    virtual uint32_t read_until( BinaryBuffer& buffer, const uint8_t* end_pos ) {
        uint32_t cur_count = 0;
        while ( buffer.get_cursor() < end_pos )
        {
            read( buffer );
            cur_count++;
        }
        return cur_count;
    }

    // Optional: member-wise reading for STL containers.
    virtual uint32_t read_many_memberwise( BinaryBuffer& buffer, const int64_t count ) {
        if ( count < 0 )
        {
            std::stringstream msg;
            msg << name() << "::read_many_memberwise with negative count: " << count;
            throw std::runtime_error( msg.str() );
        }
        return read_many( buffer, count );
    }
};
C++ BinaryBuffer
The C++ BinaryBuffer wraps a uint8_t* buffer with the same convenience
methods as the Python version:
Reading
- const T read<T>(): Read a value of type T from the buffer and advance the cursor.
- const int16_t read_fVersion(): Equivalent to read<int16_t>().
- const uint32_t read_fNBytes(): Read fNBytes from the buffer, check the mask, and return the actual number of bytes.
- const std::string read_null_terminated_string(): Read a null-terminated string from the buffer.
- const std::string read_obj_header(): Read the object header from the buffer and return the object's name if present.
- const std::string read_TString(): Read a TString from the buffer.
Skipping
- void skip(const size_t nbytes): Skip nbytes bytes.
- void skip_fVersion(): Skip the fVersion (2 bytes).
- void skip_fNBytes(): Equivalent to read_fNBytes(); the mask is still checked.
- void skip_null_terminated_string(): Skip a null-terminated string.
- void skip_obj_header(): Skip the object header.
- void skip_TObject(): Skip a TObject.
Miscellaneous
- const uint8_t* get_data() const: Get the start of the data buffer.
- const uint8_t* get_cursor() const: Get the current cursor position.
- const uint32_t* get_offsets() const: Get the entry offsets of the data buffer.
- const uint64_t* entries() const: Get the number of entries of the data buffer.
- void debug_print( const size_t n = 100 ) const: Print the next n bytes from the current cursor for debugging.
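To make the byte-level behaviour of helpers such as read_fNBytes and read_TString more concrete, here is a small pure-Python sketch. It is not uproot-custom's implementation: SketchBuffer is a hypothetical stand-in, and the sketch assumes ROOT's usual on-disk conventions (big-endian values, a byte count flagged with kByteCountMask = 0x40000000, and a TString stored as a one-byte length that switches to a four-byte length when the first byte is 255). It also assumes that "the actual number of bytes" means the value with the mask bit cleared.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # ROOT's kByteCountMask


class SketchBuffer:
    """Hypothetical stand-in for BinaryBuffer; not part of uproot-custom."""

    def __init__(self, data: bytes):
        self.data = data
        self.cursor = 0

    def read(self, fmt: str):
        """Read one big-endian value (struct format code) and advance the cursor."""
        full_fmt = ">" + fmt
        (value,) = struct.unpack_from(full_fmt, self.data, self.cursor)
        self.cursor += struct.calcsize(full_fmt)
        return value

    def read_fNBytes(self) -> int:
        """Read the 4-byte count, require the mask bit, return the count without it."""
        raw = self.read("I")
        if not raw & K_BYTE_COUNT_MASK:
            raise RuntimeError(f"byte-count mask missing: {raw:#x}")
        return raw & ~K_BYTE_COUNT_MASK

    def read_TString(self) -> str:
        """Read a length-prefixed string (1-byte length, or 255 followed by 4 bytes)."""
        length = self.read("B")
        if length == 255:
            length = self.read("I")
        start = self.cursor
        self.cursor += length
        return self.data[start:self.cursor].decode()


# Usage: a masked byte count of 10 followed by the TString "abc".
buf = SketchBuffer(struct.pack(">I", K_BYTE_COUNT_MASK | 10) + b"\x03abc")
nbytes = buf.read_fNBytes()
text = buf.read_TString()
```

Every call advances the cursor, which is why the C++ and Python readers must issue the same calls in the same order.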
Accepting sub-readers
Composite readers (e.g. STLSeqReader) accept sub-readers for nested types.
Pass sub-readers as std::shared_ptr<IReader> (aliased as SharedReader)
because ownership is shared between C++ and Python.
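The composition pattern itself is language-independent, so it can be prototyped in Python before porting. The sketch below is a simplified illustration, not STLSeqReader: SketchBuffer, Int32Reader and SequenceReader are hypothetical names, and the assumed layout (a big-endian uint32 element count followed by the elements, with no fNBytes or fVersion header) is chosen only to keep the example short.

```python
import struct
from array import array


class SketchBuffer:
    """Hypothetical stand-in for the Python BinaryBuffer (big-endian reads only)."""

    def __init__(self, data: bytes):
        self.data = data
        self.cursor = 0

    def _unpack(self, fmt: str):
        (value,) = struct.unpack_from(fmt, self.data, self.cursor)
        self.cursor += struct.calcsize(fmt)
        return value

    def read_uint32(self) -> int:
        return self._unpack(">I")

    def read_int32(self) -> int:
        return self._unpack(">i")


class Int32Reader:
    """Leaf reader: one int32 per read() call."""

    def __init__(self, name: str):
        self.name = name
        self.values = array("i")

    def read(self, buffer: SketchBuffer) -> None:
        self.values.append(buffer.read_int32())

    def read_many(self, buffer: SketchBuffer, count: int) -> int:
        for _ in range(count):
            self.read(buffer)
        return count

    def data(self):
        return self.values


class SequenceReader:
    """Composite reader: reads a count, then delegates the elements to a sub-reader."""

    def __init__(self, name: str, element_reader):
        self.name = name
        self.element_reader = element_reader  # sub-reader passed in by the caller
        self.offsets = array("q", [0])

    def read(self, buffer: SketchBuffer) -> None:
        count = buffer.read_uint32()
        self.offsets.append(self.offsets[-1] + count)
        self.element_reader.read_many(buffer, count)

    def data(self):
        return self.offsets, self.element_reader.data()


# Usage: two entries, holding [10, 20] and [30].
payload = struct.pack(">IiiIi", 2, 10, 20, 1, 30)
buf = SketchBuffer(payload)
reader = SequenceReader("seq", Int32Reader("elem"))
reader.read(buf)
reader.read(buf)
offsets, values = reader.data()
```

In the C++ port the element_reader argument becomes a SharedReader, since the sub-reader object is created from Python and held by the composite reader at the same time.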
Zero-copy numpy conversion
Use the make_array helper to convert a std::shared_ptr<std::vector<T>>
into a numpy array without copying:
std::shared_ptr<std::vector<int>> data = std::make_shared<std::vector<int>>();
data->push_back(1);
data->push_back(2);
data->push_back(3);
py::array_t<int> np_array = make_array(data);
Exposing a reader to Python
Uproot-custom uses pybind11 for bindings. The declare_reader helper
simplifies declaration:
PYBIND11_MODULE( my_cpp_reader, m ) {
    declare_reader<MyReaderClass, constructor_arg1_type, constructor_arg2_type, ...>( m, "MyReaderClass" );
}
- constructor_argN_type: the types of the constructor arguments (SharedReader, std::string, etc.). Omit if the constructor takes only name.
- The second argument to declare_reader is the Python-visible class name.
Import in Python:
from my_cpp_reader import MyReaderClass
C++ debug output
Use the debug_print helper for conditional logging. Messages are only
emitted when the UPROOT_DEBUG macro is defined at compile time or the
UPROOT_DEBUG environment variable is set at runtime:
// Will print "The reader name is Bob"
debug_print("The reader name is %s", "Bob");
// Calls buffer.debug_print(50): prints the next 50 bytes from the current cursor
debug_print( buffer, 50 );
Adding build_cpp_reader to the factory
Once the C++ reader is ready, add a build_cpp_reader method to the factory
alongside the existing build_python_reader. This is the only Python-side
change needed — make_awkward_content and make_awkward_form remain exactly
the same, because both readers return data in the same structure.
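Since everything downstream depends on the two readers returning the same structure, it can be worth checking that directly before switching backends. The following is a minimal sketch of such a check; same_structure is a hypothetical helper, not part of uproot-custom, and the two results are stand-ins for what data() would return from the Python and C++ readers after reading the same buffer.

```python
from array import array


def same_structure(left, right) -> bool:
    """Compare two data() results: tuples position by position, arrays element by element."""
    if isinstance(left, tuple) or isinstance(right, tuple):
        if not (isinstance(left, tuple) and isinstance(right, tuple)):
            return False
        if len(left) != len(right):
            return False
        return all(same_structure(a, b) for a, b in zip(left, right))
    # Leaves are treated as 1-D array-likes (numpy arrays, array.array, lists).
    return list(left) == list(right)


# Stand-in results; in practice these come from reader.data() on each backend.
python_result = (array("i", [1, 2]), array("d", [0.5, 1.5]))
cpp_result = ([1, 2], [0.5, 1.5])
matches = same_structure(python_result, cpp_result)
```

The helper compares nesting and values only. It does not compare dtypes, so a check that C++ int matches the Python int32 column would need to be added separately.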
Side-by-side comparison: OverrideStreamerReader
The following listings show the same OverrideStreamerReader (from the
pipeline example) in Python and C++. The reading logic is identical; only
the language syntax differs.
OverrideStreamerReader (Python)
from array import array
import numpy as np
from uproot_custom.readers.python import IReader


class OverrideStreamerReader(IReader):
    def __init__(self, name):
        super().__init__(name)
        self.m_ints = array("i")  # int32
        self.m_doubles = array("d")  # float64

    def read(self, buffer):
        buffer.skip_TObject()  # skip base class
        self.m_ints.append(buffer.read_int32())  # m_int

        mask = buffer.read_uint32()  # custom mask
        if mask != 0x12345678:
            raise RuntimeError(f"Unexpected mask: {mask:#x}")

        self.m_doubles.append(buffer.read_double())  # m_double

    def data(self):
        return np.asarray(self.m_ints), np.asarray(self.m_doubles)
OverrideStreamerReader (C++)
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

#include "uproot-custom/uproot-custom.hh"

using namespace uproot;

class OverrideStreamerReader : public IReader {
  public:
    OverrideStreamerReader( std::string name )
        : IReader( name )
        , m_data_ints( std::make_shared<std::vector<int>>() )
        , m_data_doubles( std::make_shared<std::vector<double>>() ) {}

    void read( BinaryBuffer& buffer ) override {
        buffer.skip_TObject();                        // skip base class
        m_data_ints->push_back( buffer.read<int>() ); // m_int

        auto mask = buffer.read<uint32_t>(); // custom mask
        if ( mask != 0x12345678 )
            throw std::runtime_error( "Unexpected mask: " +
                                      std::to_string( mask ) );

        m_data_doubles->push_back( buffer.read<double>() ); // m_double
    }

    py::object data() const override {
        return py::make_tuple( make_array( m_data_ints ),
                               make_array( m_data_doubles ) );
    }

  private:
    std::shared_ptr<std::vector<int>> m_data_ints;
    std::shared_ptr<std::vector<double>> m_data_doubles;
};

PYBIND11_MODULE( my_reader_cpp, m ) {
    declare_reader<OverrideStreamerReader, std::string>( m, "OverrideStreamerReader" );
}
Then add build_cpp_reader to the factory:
OverrideStreamerFactory
from .my_reader_cpp import OverrideStreamerReader as OverrideStreamerCppReader


class OverrideStreamerFactory(Factory):
    def build_cpp_reader(self):
        return OverrideStreamerCppReader(self.name)

    # build_python_reader, make_awkward_content, make_awkward_form
    # remain exactly the same as before
Worked example: TArray in C++
Compare with the Python version — the reading logic is identical; only the language syntax differs:
TArrayReader (C++): same logic as the Python version
template <typename T>
class TArrayReader : public IReader {
  private:
    SharedVector<int64_t> m_offsets;
    SharedVector<T> m_data;

  public:
    TArrayReader( std::string name )
        : IReader( name )
        , m_offsets( std::make_shared<std::vector<int64_t>>( 1, 0 ) ) // offsets start at 0
        , m_data( std::make_shared<std::vector<T>>() ) {}

    void read( BinaryBuffer& buffer ) override {
        auto fSize = buffer.read<uint32_t>(); // number of elements in this entry
        m_offsets->push_back( m_offsets->back() + fSize );
        // Use an unsigned counter to match the type of fSize.
        for ( uint32_t i = 0; i < fSize; i++ ) { m_data->push_back( buffer.read<T>() ); }
    }

    py::object data() const override {
        auto offsets_array = make_array( m_offsets );
        auto data_array = make_array( m_data );
        return py::make_tuple( offsets_array, data_array );
    }
};
Then add build_cpp_reader to TArrayFactory:
TArrayFactory
class TArrayFactory(Factory):
    def build_cpp_reader(self):
        return {
            "int8": uproot_custom.readers.cpp.TArrayCReader,
            "int16": uproot_custom.readers.cpp.TArraySReader,
            "int32": uproot_custom.readers.cpp.TArrayIReader,
            "int64": uproot_custom.readers.cpp.TArrayLReader,
            "float32": uproot_custom.readers.cpp.TArrayFReader,
            "float64": uproot_custom.readers.cpp.TArrayDReader,
        }[self.dtype](self.name)

    def build_python_reader(self):
        return uproot_custom.readers.python.TArrayReader(self.name, self.dtype)

    # ... make_awkward_content and make_awkward_form remain unchanged
Switching to the C++ backend
After implementing build_cpp_reader, switch the backend to use
C++ readers:
import uproot_custom.factories as fac
fac.reader_backend = "cpp"
Since "cpp" is the default value, you can simply remove any explicit
fac.reader_backend = "python" that was set during development.
See also
See Reader backends for a full discussion of backend selection and troubleshooting.