Implementing Smart References in C++

The C++ community is all excited about smart pointers, but the related idea of smart references doesn't seem to get much press.

Smart references are an old idea. Kennedy's Object-Oriented Abstract Type Hierarchy (OATH) advocated smart references in C++ in 1990. Unfortunately, Kennedy's ideas were not developed further. We aim to revive them.

A smart reference, like a smart pointer, is a class containing a pointer as its only data member. For both, implicit object copying such as argument passing is implemented using a pointer copy. Aliasing semantics are used — a copy of a smart reference refers to the same object on the heap.

So what's the difference between a smart pointer and a smart reference?

Smart pointers are manipulated using pointer syntax, operator* and operator->, while smart reference objects are manipulated directly. This matches our goal of pointer-less programming.
Smart reference classes are a little more work to implement, since (trivial) forwarding functions must be written to replace the operator->. This matches our goal of moving programming effort from library class users to library class implementers.
Smart references do a much better job of implementation hiding. The internals of the smart reference are protected from user access, while a smart pointer needs to be initialized using a raw pointer and the raw pointer can always be accessed using operator->.
For (immutable) value objects, choosing to use smart reference-ness is an implementation decision that should not be visible to the user. The class implementer should decide whether to use a pointer representation at all. For example, "small" objects could be stack-allocated, while "large" objects could be heap-allocated and accessed via a smart reference, and there should be no difference in API. This matches our goal of separating interface from implementation.

STL entities like std::string are not smart pointers — the STL defines copying semantics. A copy of an STL aggregate object does not alias the original object. We believe that the copying semantics of the STL were a design mistake, at least for mutable objects with identity.

For programmers of a modern language like Java, there is nothing new here. Smart references have exactly the semantics of a Java object reference.

The implementation choices

We will use as our running example this hierarchy

Object
    String
    Number
        Int
        Double
...

which we want to implement using smart references.

There are two obvious smart reference implementation approaches:

Use C++ inheritance to directly model the "logical inheritance" relation.

    class Object { ... };
    class String : public Object { ... };
    class Number : public Object { ... };
    class Int    : public Number { ... };

Define constructors or conversion operators for logical upcasts. For example,

    class Object { ... };
    class String { operator Object (); ... };
    class Number { operator Object (); ... };
    class Int    { operator Object (); operator Number (); ... };

The problems

Approach 1 above is the solution chosen by Kennedy, but (as he points out) it is not typesafe. Unsafe code as follows can be written:

    void f (Object & o) { o = Int (1); }
    void g () { String s ("?"); f (s); /* s is now no longer a String */ }

We choose Approach 2.

This approach suffers from the static overloading problem.

If we have a statically overloaded function f,

    void f (Object);
    void f (Number);
    ...
        f (Int (3));

we would like the call to f to resolve statically to f (Number), because Number is Int's logically most derived base class.

Kennedy proved that the desired semantics cannot be implemented with the desired syntax, because there is no way in C++ for one user-defined conversion to be preferred over another. But it is possible to implement the desired semantics with a somewhat uglier syntax, which is developed below:

A solution

One of the "fundamental theorems" of strongly typed languages is to encode as many concepts as possible using the type system.

We have excluded the possibility of smart references classes directly inheriting from each other, but not the possibility of having base classes of the smart reference classes forming an inheritance hierarchy. We can encode type hierarchy information using empty classes that exist purely to carry type information.

Let's try:

    template <class T> class IsA;
    template <> class IsA<Object> {};
    template <> class IsA<String> : public IsA<Object> {};
    template <> class IsA<Number> : public IsA<Object> {};
    template <> class IsA<Int>    : public IsA<Number> {};

Now our smart reference classes can inherit from the corresponding IsA objects:

    class Object : public IsA<Object> { ... };
    class String : public IsA<String> { ... };
    class Number : public IsA<Number> { ... };
    class Int    : public IsA<Int>    { ... };

This uses the Curiously Recurring Template Pattern (CRTP) popularized by Coplien.

The IsA hierarchy is a "spine" of empty classes that can be manipulated using C++ templates.

With the above definitions we can write

    void f (const IsA<Object>& x) { ... }
    void f (const IsA<Number>& x) { ... }
    ...
        f (Int (3));

Now the desired function

    f (const IsA<Number>& x);

is called, since we have the inheritance relations

    IsA<Object>  ⇒  IsA<Number>  ⇒  IsA<Int>  ⇒  Int

even though there is no direct inheritance relationship between Object, Number, and Int.

This looks promising, but this is not quite what we want, since the body of the function

    f (const IsA<Number>& x);

cannot access the "real" object, of type Int, that has been passed in. If all object types are represented the same way and if free-standing IsA<TYPE> objects are outlawed, then the body of f can be written as follows:

    void f (const IsA<Number>& x)
    {
      Number n (static_cast<const Number&> (x));
      /* Use n */
    }

But we would like to allow different representations for different classes inheriting from IsA<Number> and we would like to have the original type of the argument available within the body of f. Pondering on this for a while leads naturally to our primary novel implementation idea:

Doubly Curiously Recurring Template Pattern (DCRTP)

    template <class T, class U = T> class IsA;
    template <class T> class IsA<T,Object> {};
    template <class T> class IsA<T,String> : public IsA<T,Object> {};
    template <class T> class IsA<T,Number> : public IsA<T,Object> {};
    template <class T> class IsA<T,Int>    : public IsA<T,Number> {};

    class Object : public IsA<Object,Object> { ... };
    class String : public IsA<String,String> { ... };
    class Number : public IsA<Number,Number> { ... };
    class Int    : public IsA<Int,Int>       { ... };

    template <class T> void f (const IsA<T,Object>& x) { ... }
    template <class T> void f (const IsA<T,Number>& x) { ... }
    ...
        f (Int (3));

As before, the desired f overload is called. But now the body of f can access the true original object, using the invariant first template parameter, as follows:

    template <class T>
    void f (const IsA<T,Number>& x)
    {
      T t (static_cast<const T&> (x));
      /* Use t */
    };

(Our inheritance hierarchy tries to ensure that a const IsA<T,Number>& can only occur via a reference to an actual T object.)

Of course, we hide the ugly static_cast by providing constructors

    T::T (const IsA<T,U>& x)

for all types T,U.

    template <class T>
    void f (const IsA<T,Number>& x)
    {
      T t (x);  // or: Number t (x);
      /* Use t */
    }

If the body of function f is large, we can avoid template expansion code bloat (although an excellent C++ implementation would make this unnecessary) by using forwarding functions as follows:

    // Function bodies in .cc file
    void f_Object (Object x) { /* lots of code here ... */ }
    void f_Number (Number x) { /* lots of code here ... */ }

    // Forwarding templates in .hpp file
    template <class T> inline void f (const IsA<T,Object>& x) { f_Object (x); }
    template <class T> inline void f (const IsA<T,Number>& x) { f_Number (x); }

MObS library code is written in this style. Many functions are templates taking arguments of the form const IsA<T,Number>&, returning "real" objects of a type like Number. As library implementers, we are willing to put up with a certain amount of syntactic bitterness to get pleasing semantics.

The functionality of our technique is actually a superset of Java's static overloading, because the templated functions retain access to the original type of the passed argument, allowing calls to other templated functions that may statically distinguish amongst those types. This allows mixtures of static and dynamic polymorphism.

Since we want to encourage programming without user-visible pointers or references, it would be nice if we could sweeten

    template <class T> void f (const IsA<T,Number>& n) { ... }

but the only way that comes to mind is a macro

#define ISA(T,U) const IsA<T,U>&

which is distasteful.

Users of the framework who don't create statically overloaded functions can use the desired syntax

    void f (Number n) { ... }

Lazy users can simply program in a Lisp-like style by declaring variables and function parameters of type Object and relying on run-time type checking. This is very much a matter of programming language religion. We like static type checking, even though we admit to being lazy. We like having the compiler find type bugs for us.

Boxing of Value Types

We have succeeded in making life much easier for library clients, but library class implementers have to do a bit of work to integrate a class into the framework. For simple value types, it would be nice if we could get automatic incorporation into the framework using "boxing". In fact, this is possible.

To be eligible for Object-hood, a type only needs to have an operator<<. Note that this can always be done without needing to modify the type's definition. For example, if we have

    struct TrivialString
    {
      std::string s_;
      TrivialString (std::string s) : s_ (s) {}
    };

    // Perhaps defined in a separate module
    inline std::ostream&
    operator<< (std::ostream& o, TrivialString myString)
    {
      return o << myString.s_ << "!";
    }

we can use TrivialString's as Objects and subsequently recover them using Unbox<TrivialString>.

    Object pair = cons (3, TrivialString ("Hello"));
    assert (toString (pair) == "(3 . Hello!)");
    assert (Unbox<TrivialString>(cdr (pair)).s_ == "Hello");

More control over the integration into the framework is possible, if the type integrator is willing to do some work. We use the C++ template traits technique to define properties of the type.

    // ----------------------------------------------------------------
    // A minimal class that wants to be a ISequence
    // TrivialSequence needs to do a little more work to be integrated.
    struct TrivialSequence
    {
      int length () { return 3; }
    };
    inline std::ostream&
    operator<< (std::ostream& o, TrivialSequence)
    {
      return o << "(TrivialSequence: 3)";
    }


    // TrivialSequence registers its "boxing traits"
    namespace MObS
    {
      // Give TrivialSequence a class name (optional)
      template <>
      struct ClassName<TrivialSequence>
      {
	static std::string Name () { return "TrivialSequence"; }
      };

      // Declare that a Boxed<TrivialSequence> is an ISequence, not just an Object
      template <> struct BoxingTraits<TrivialSequence>
	: public DefaultBoxingTraits<TrivialSequence>
      {
	typedef ISequence ObBase; // default is `Object'
      };

      // Need to define the virtual function `length' required by ISequence
      namespace ObI  // "Object Implementation"
      {
	template <>
	class Impl<Boxed<TrivialSequence> >
	  : public DefaultImpl<Boxed<TrivialSequence> >
	{
	public:
	  // Constructors are never inherited
	  inline Impl (TrivialSequence v)
	    : DefaultImpl<Boxed<TrivialSequence> > (v) {}
	  // Typically implement virtual functions by forwarding
	  virtual size_t length () { return value().length(); }
	};
      }
    }

Boxing allows us to use dynamic polymorphism with arbitrary pre-existing value types.

The key to implementing boxed types is to define a template Boxed<T> in the "obvious" way, imitating the implementation of our reference types above:

    template <class T, class U> class IsA<T,Boxed<U> >
        : public IsA<T,Object> {};
    template <class T> class Boxed
	: public IsA<Boxed<T>,Boxed<T> > { ... };

and then start adding static configurability using template traits classes like

    template <typename T> struct DefaultBoxingTraits { ... };

and introducing inheritance from those intermediate base classes.

Some template hackery (template parameter constraints) is required to prevent ambiguity between object boxing constructors taking "foreign" value types and object constructors taking framework types.

This finally allows user code like:

    Object x = list (Number (3), true, 'x');

even though nothing special is known about the types bool or char.