Port readers to C++

Once your Python reader is working correctly, you can port its logic to C++ for significantly better performance. The C++ reader only needs to keep the same reading logic as the Python version — the same fields read in the same order, and data() returning the same structure — so porting is a straightforward, mechanical process.

Tip

Start with a Python reader for rapid prototyping and debugging. Once the logic is validated, transferring it to C++ is quick because the two readers share the same structure and only differ in language syntax.


C++ IReader base class

Every C++ reader must inherit from IReader and implement two pure-virtual methods:

| Method | Purpose |
| --- | --- |
| read(BinaryBuffer&) | Consume bytes from the buffer for one entry. |
| data() -> py::object | Return all accumulated data as numpy arrays (or nested containers of them). |

The same optional methods are available as in Python:

| Method | When it is called |
| --- | --- |
| read_many(buffer, count) | Reading a fixed number of elements. |
| read_until(buffer, end_pos) | Reading elements until a byte position. |
| read_many_memberwise(buffer, count) | Member-wise reading for STL containers. |

IReader: C++ base class
class IReader {
  protected:
    const std::string m_name;

  public:
    IReader( std::string name ) : m_name( name ) {}
    virtual ~IReader() = default;

    virtual const std::string name() const { return m_name; }

    virtual void read( BinaryBuffer& buffer ) = 0;
    virtual py::object data() const           = 0;

    virtual uint32_t read_many( BinaryBuffer& buffer, const int64_t count ) {
        for ( int32_t i = 0; i < count; i++ ) { read( buffer ); }
        return count;
    }

    virtual uint32_t read_until( BinaryBuffer& buffer, const uint8_t* end_pos ) {
        uint32_t cur_count = 0;
        while ( buffer.get_cursor() < end_pos )
        {
            read( buffer );
            cur_count++;
        }
        return cur_count;
    }

    virtual uint32_t read_many_memberwise( BinaryBuffer& buffer, const int64_t count ) {
        if ( count < 0 )
        {
            std::stringstream msg;
            msg << name() << "::read_many_memberwise with negative count: " << count;
            throw std::runtime_error( msg.str() );
        }
        return read_many( buffer, count );
    }
};
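
To make the default behaviour of the three optional methods concrete, here is a minimal Python sketch that mirrors the C++ defaults in the listing above. SketchReader, ByteCursor and OneByteReader are hypothetical stand-ins written for this illustration; they are not the library's Python IReader or BinaryBuffer.

```python
class ByteCursor:
    """Hypothetical stand-in for a buffer: raw bytes plus a cursor index."""

    def __init__(self, data):
        self.data = data
        self.cursor = 0


class SketchReader:
    """Hypothetical base class mirroring the C++ default implementations."""

    def __init__(self, name):
        self.name = name

    def read(self, buffer):
        raise NotImplementedError

    def read_many(self, buffer, count):
        # Default: call read() a fixed number of times.
        for _ in range(count):
            self.read(buffer)
        return count

    def read_until(self, buffer, end_pos):
        # Default: call read() until the cursor reaches end_pos.
        n_read = 0
        while buffer.cursor < end_pos:
            self.read(buffer)
            n_read += 1
        return n_read

    def read_many_memberwise(self, buffer, count):
        # Default: reject negative counts, then fall back to read_many().
        if count < 0:
            raise RuntimeError(
                f"{self.name}::read_many_memberwise with negative count: {count}"
            )
        return self.read_many(buffer, count)


class OneByteReader(SketchReader):
    """Toy reader that consumes one byte per element."""

    def __init__(self, name):
        super().__init__(name)
        self.values = []

    def read(self, buffer):
        self.values.append(buffer.data[buffer.cursor])
        buffer.cursor += 1


buf = ByteCursor(bytes([1, 2, 3, 4, 5]))
reader = OneByteReader("demo")
n_fixed = reader.read_many(buf, 2)   # consumes the first two bytes
n_until = reader.read_until(buf, 5)  # consumes the rest, up to position 5
```

A reader only needs to override these methods when it can do better than the element-by-element loop, for example by reading a whole block at once.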

C++ BinaryBuffer

The C++ BinaryBuffer wraps a uint8_t* buffer with the same convenience methods as the Python version:

Reading

  • const T read<T>(): Read a value of type T from the buffer, and advance the cursor.

  • const int16_t read_fVersion(): Equivalent to read<int16_t>().

  • const uint32_t read_fNBytes(): Read fNBytes from the buffer, check the mask, and return the actual number of bytes.

  • const std::string read_null_terminated_string(): Read a null-terminated string from the buffer.

  • const std::string read_obj_header(): Read the object header from the buffer, return the object’s name if present.

  • const std::string read_TString(): Read a TString from the buffer.

Skipping

  • void skip(const size_t nbytes): Skip nbytes bytes.

  • void skip_fVersion(): Skip the fVersion (2 bytes).

  • void skip_fNBytes(): Skip fNBytes (4 bytes). Like read_fNBytes(), it still checks the mask.

  • void skip_null_terminated_string(): Skip a null-terminated string.

  • void skip_obj_header(): Skip the object header.

  • void skip_TObject(): Skip a TObject.

Miscellaneous

  • const uint8_t* get_data() const: Get the start of the data buffer.

  • const uint8_t* get_cursor() const: Get the current cursor position.

  • const uint32_t* get_offsets() const: Get the entry offsets of the data buffer.

  • const uint64_t entries() const: Get the number of entries of the data buffer.

  • void debug_print( const size_t n = 100 ) const: Print the next n bytes from the current cursor for debugging.
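
The following Python sketch illustrates what a few of the reading methods listed above do at the byte level. It assumes ROOT's usual serialization conventions: big-endian values and a 4-byte byte count whose 0x40000000 bit marks it as valid. SketchBuffer is a hypothetical stand-in, not the library's BinaryBuffer, and its read() takes a struct format code rather than a C++ template type.

```python
import struct

K_BYTE_COUNT_MASK = 0x40000000  # assumed: bit ROOT sets on a valid byte count


class SketchBuffer:
    """Hypothetical stand-in showing cursor-based reads over raw bytes."""

    def __init__(self, data):
        self.data = data
        self.cursor = 0

    def read(self, fmt):
        # Read one big-endian value and advance the cursor past it.
        (value,) = struct.unpack_from(">" + fmt, self.data, self.cursor)
        self.cursor += struct.calcsize(">" + fmt)
        return value

    def skip(self, nbytes):
        self.cursor += nbytes

    def read_fVersion(self):
        return self.read("h")  # int16

    def read_fNBytes(self):
        raw = self.read("I")   # uint32
        if not raw & K_BYTE_COUNT_MASK:
            raise RuntimeError(f"Invalid fNBytes: {raw:#x}")
        return raw & ~K_BYTE_COUNT_MASK

    def read_null_terminated_string(self):
        end = self.data.index(b"\x00", self.cursor)
        text = self.data[self.cursor:end].decode("ascii")
        self.cursor = end + 1  # step over the terminator
        return text


payload = (
    struct.pack(">Ih", K_BYTE_COUNT_MASK | 10, 2)
    + b"abc\x00"
    + struct.pack(">d", 1.5)
)
buf = SketchBuffer(payload)
nbytes = buf.read_fNBytes()
version = buf.read_fVersion()
label = buf.read_null_terminated_string()
value = buf.read("d")
```

Every call advances the cursor by exactly the bytes it consumed, which is why the C++ and Python readers must issue the same calls in the same order.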


Accepting sub-readers

Composite readers (e.g. STLSeqReader) accept sub-readers for nested types. Pass sub-readers as std::shared_ptr<IReader> (aliased as SharedReader) because ownership is shared between C++ and Python.
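
As a rough illustration of the composite pattern, the Python sketch below shows a reader that delegates element reading to a sub-reader it was given at construction. The names (Int32Reader, CountedSeqReader), the `read(data, pos)` interface, and the on-disk layout (a uint32 count followed by the elements) are assumptions made for this example; they do not describe STLSeqReader's actual format.

```python
import struct


class Int32Reader:
    """Hypothetical leaf reader: one big-endian int32 per element."""

    def __init__(self, name):
        self.name = name
        self.values = []

    def read(self, data, pos):
        (value,) = struct.unpack_from(">i", data, pos)
        self.values.append(value)
        return pos + 4  # new cursor position

    def data(self):
        return self.values


class CountedSeqReader:
    """Hypothetical composite: a uint32 count, then that many elements."""

    def __init__(self, name, element_reader):
        self.name = name
        self.element_reader = element_reader  # shared with the caller
        self.offsets = [0]

    def read(self, data, pos):
        (count,) = struct.unpack_from(">I", data, pos)
        pos += 4
        for _ in range(count):
            pos = self.element_reader.read(data, pos)
        self.offsets.append(self.offsets[-1] + count)
        return pos

    def data(self):
        # The composite returns its own offsets plus the sub-reader's data.
        return self.offsets, self.element_reader.data()


element = Int32Reader("element")
seq = CountedSeqReader("seq", element)

# Two entries: [7, 8] and [9].
payload = struct.pack(">Iii", 2, 7, 8) + struct.pack(">Ii", 1, 9)
pos = 0
for _ in range(2):
    pos = seq.read(payload, pos)
offsets, values = seq.data()
```

Both the caller and the composite hold a reference to the same sub-reader object here, which is the situation std::shared_ptr models on the C++ side.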


Zero-copy numpy conversion

Use the make_array helper to convert a std::shared_ptr<std::vector<T>> into a numpy array without copying:

std::shared_ptr<std::vector<int>> data = std::make_shared<std::vector<int>>();
data->push_back(1);
data->push_back(2);
data->push_back(3);

py::array_t<int> np_array = make_array(data);
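
make_array itself is a C++ helper, so the Python sketch below only illustrates what sharing storage without copying implies in general. It uses standard-library stand-ins (array.array and memoryview) rather than numpy or the library helper, and it makes no claim about how make_array manages lifetimes internally.

```python
from array import array

values = array("i", [1, 2, 3])  # owns the storage
view = memoryview(values)       # shares the same storage, no copy

view[0] = 10                    # a write through the view...
shared = values[0] == 10        # ...is visible through the owner

try:
    values.append(4)            # resizing would invalidate the view
    resize_blocked = False
except BufferError:
    resize_blocked = True

view.release()                  # once the view is gone, resizing works again
values.append(4)
```

The general lesson is that a zero-copy view ties the lifetime and size of the underlying storage to the view, which is one reason to keep the vector behind a shared pointer.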

Exposing a reader to Python

Uproot-custom uses pybind11 for bindings. The declare_reader helper simplifies declaration:

PYBIND11_MODULE( my_cpp_reader, m) {
    declare_reader<MyReaderClass, constructor_arg1_type, constructor_arg2_type, ...>(m, "MyReaderClass");
}
  • constructor_argN_type — the types of the constructor arguments (SharedReader, std::string, etc.). Omit if the constructor takes only name.

  • The second argument to declare_reader is the Python-visible class name.

Import in Python:

from my_cpp_reader import MyReaderClass

C++ debug output

Use the debug_print helper for conditional logging. Messages are only emitted when the UPROOT_DEBUG macro is defined at compile time or the UPROOT_DEBUG environment variable is set at runtime:

// Will print "The reader name is Bob"
debug_print("The reader name is %s", "Bob");

// Calls buffer.debug_print(50): prints the next 50 bytes from the current cursor
debug_print( buffer, 50 );
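
The runtime half of that condition can be pictured with a small Python sketch. debug_print_sketch is a hypothetical function written for this illustration; it only shows the idea of gating output on whether UPROOT_DEBUG is set, and the C++ helper may differ in when and how it checks the variable.

```python
import io
import os
import sys


def debug_print_sketch(fmt, *args, stream=None):
    """Print a printf-style message only when UPROOT_DEBUG is set."""
    if "UPROOT_DEBUG" not in os.environ:
        return False
    if stream is None:
        stream = sys.stderr
    print(fmt % args, file=stream)
    return True


sink = io.StringIO()

os.environ.pop("UPROOT_DEBUG", None)  # variable unset: nothing is printed
silent = debug_print_sketch("The reader name is %s", "Bob", stream=sink)

os.environ["UPROOT_DEBUG"] = "1"      # variable set: the message is emitted
loud = debug_print_sketch("The reader name is %s", "Bob", stream=sink)
```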

Adding build_cpp_reader to the factory

Once the C++ reader is ready, add a build_cpp_reader method to the factory alongside the existing build_python_reader. This is the only Python-side change needed — make_awkward_content and make_awkward_form remain exactly the same, because both readers return data in the same structure.
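
Because everything downstream relies on both readers returning the same structure, it can be worth checking that directly while porting. The sketch below is one possible way to do so; to_plain and same_output are hypothetical helpers, and the two outputs are stand-ins for what a Python reader and a C++ reader might return from data().

```python
from array import array


def to_plain(obj):
    """Convert nested tuples/lists of array-likes into plain nested lists."""
    if hasattr(obj, "tolist"):         # numpy arrays and array.array both qualify
        return obj.tolist()
    if isinstance(obj, (tuple, list)):
        return [to_plain(item) for item in obj]
    return obj


def same_output(py_out, cpp_out):
    """True when both outputs have the same nesting and the same values."""
    return to_plain(py_out) == to_plain(cpp_out)


# Stand-ins for the data() results of the two backends.
py_out = (array("i", [1, 2]), array("d", [0.5, 1.5]))
cpp_out = ([1, 2], [0.5, 1.5])
mismatched = ([1, 2], [0.5, 2.5])

matches = same_output(py_out, cpp_out)
differs = same_output(py_out, mismatched)
```

Note that this comparison treats tuples and lists alike and ignores dtypes, so a stricter check may be needed if those matter for your Awkward form.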


Side-by-side comparison: OverrideStreamerReader

The following two listings show the same OverrideStreamerReader (from the pipeline example) in Python and in C++. The reading logic is identical in both: the same fields are read in the same order, and only the language syntax differs.

OverrideStreamerReader (Python)
from array import array
import numpy as np
from uproot_custom.readers.python import IReader


class OverrideStreamerReader(IReader):
    def __init__(self, name):
        super().__init__(name)
        self.m_ints = array("i")       # int32
        self.m_doubles = array("d")    # float64

    def read(self, buffer):
        buffer.skip_TObject()                        # skip base class
        self.m_ints.append(buffer.read_int32())      # m_int

        mask = buffer.read_uint32()                  # custom mask
        if mask != 0x12345678:
            raise RuntimeError(f"Unexpected mask: {mask:#x}")

        self.m_doubles.append(buffer.read_double())  # m_double

    def data(self):
        return np.asarray(self.m_ints), np.asarray(self.m_doubles)

OverrideStreamerReader (C++)
#include <cstdint>
#include <memory>
#include <stdexcept> // std::runtime_error
#include <string>    // std::string, std::to_string
#include <vector>

#include "uproot-custom/uproot-custom.hh"

using namespace uproot;

class OverrideStreamerReader : public IReader {
  public:
    OverrideStreamerReader( std::string name )
        : IReader( name )
        , m_data_ints( std::make_shared<std::vector<int>>() )
        , m_data_doubles( std::make_shared<std::vector<double>>() ) {}

    void read( BinaryBuffer& buffer ) override {
        buffer.skip_TObject();                          // skip base class
        m_data_ints->push_back( buffer.read<int>() );   // m_int

        auto mask = buffer.read<uint32_t>();             // custom mask
        if ( mask != 0x12345678 )
            throw std::runtime_error( "Unexpected mask: " +
                                      std::to_string( mask ) );

        m_data_doubles->push_back( buffer.read<double>() ); // m_double
    }

    py::object data() const override {
        return py::make_tuple( make_array( m_data_ints ),
                               make_array( m_data_doubles ) );
    }

  private:
    std::shared_ptr<std::vector<int>> m_data_ints;
    std::shared_ptr<std::vector<double>> m_data_doubles;
};

PYBIND11_MODULE( my_reader_cpp, m ) {
    declare_reader<OverrideStreamerReader, std::string>( m, "OverrideStreamerReader" );
}

Then add build_cpp_reader to the factory:

Adding C++ reader to OverrideStreamerFactory
from .my_reader_cpp import OverrideStreamerReader as OverrideStreamerCppReader

class OverrideStreamerFactory(Factory):
    def build_cpp_reader(self):
        return OverrideStreamerCppReader(self.name)

    # build_python_reader, make_awkward_content, make_awkward_form
    # remain exactly the same as before

Worked example: TArray in C++

Compare with the Python version — the reading logic is identical; only the language syntax differs:

TArrayReader (C++), same logic as the Python version
template <typename T>
class TArrayReader : public IReader {
    private:
    SharedVector<int64_t> m_offsets;
    SharedVector<T> m_data;

    public:
    TArrayReader( std::string name )
        : IReader( name )
        , m_offsets( std::make_shared<std::vector<int64_t>>( 1, 0 ) )
        , m_data( std::make_shared<std::vector<T>>() ) {}

    void read( BinaryBuffer& buffer ) override {
        auto fSize = buffer.read<uint32_t>();
        m_offsets->push_back( m_offsets->back() + fSize );
        // Use an unsigned counter so the comparison with fSize is not signed/unsigned.
        for ( uint32_t i = 0; i < fSize; i++ ) { m_data->push_back( buffer.read<T>() ); }
    }

    py::object data() const override {
        auto offsets_array = make_array( m_offsets );
        auto data_array    = make_array( m_data );
        return py::make_tuple( offsets_array, data_array );
    }
};
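
To show the structure that data() returns for this reader, here is a pure-Python sketch of the same reading logic: a uint32 size followed by that many elements, accumulated into an offsets list and a flat data list. It assumes big-endian input and uses the standard struct module; read_tarray_entries is a hypothetical function for illustration, not the library's Python TArrayReader.

```python
import struct


def read_tarray_entries(payload, n_entries, fmt="i"):
    """Read n_entries arrays laid out as: uint32 fSize, then fSize elements."""
    offsets = [0]  # starts at 0, like m_offsets in the C++ reader
    data = []
    pos = 0
    item_size = struct.calcsize(">" + fmt)
    for _ in range(n_entries):
        (f_size,) = struct.unpack_from(">I", payload, pos)
        pos += 4
        data.extend(struct.unpack_from(f">{f_size}{fmt}", payload, pos))
        pos += f_size * item_size
        offsets.append(offsets[-1] + f_size)
    return offsets, data


# Three entries: [1, 2, 3], [] and [4].
payload = (
    struct.pack(">Iiii", 3, 1, 2, 3)
    + struct.pack(">I", 0)
    + struct.pack(">Ii", 1, 4)
)
offsets, data = read_tarray_entries(payload, 3)

# Entry i is the slice data[offsets[i]:offsets[i + 1]].
entries = [data[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

The offsets list always has one more element than the number of entries, which is what lets an empty array be represented without any special marker.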

Then add build_cpp_reader to TArrayFactory:

Adding C++ reader support to TArrayFactory
class TArrayFactory(Factory):
    def build_cpp_reader(self):
        return {
            "int8": uproot_custom.readers.cpp.TArrayCReader,
            "int16": uproot_custom.readers.cpp.TArraySReader,
            "int32": uproot_custom.readers.cpp.TArrayIReader,
            "int64": uproot_custom.readers.cpp.TArrayLReader,
            "float32": uproot_custom.readers.cpp.TArrayFReader,
            "float64": uproot_custom.readers.cpp.TArrayDReader,
        }[self.dtype](self.name)

    def build_python_reader(self):
        return uproot_custom.readers.python.TArrayReader(self.name, self.dtype)

    # ... make_awkward_content and make_awkward_form remain unchanged

Switching to the C++ backend

After implementing build_cpp_reader, switch the backend to use C++ readers:

import uproot_custom.factories as fac
fac.reader_backend = "cpp"

Since "cpp" is the default value, you can simply remove any explicit fac.reader_backend = "python" that was set during development.

See also

See Reader backends for a full discussion of backend selection and troubleshooting.