CSD lists implementation notes

List design

The singly-linked list design is straight-forward: a head entry points to the first item, and all the intrusive entries point to the next item. The last item has a next value of nullptr (encoded as zero). In the case of stailq, an entry_ref to the last item also stored.

The tailq’s doubly-linked list design is insipred by the std::list implementation in libc++. In this design, the list is actually circular: the tailq_entry in the list head links the two ends of the list, and corresponds to the end() sentinel. 2 Its “previous” link points to the last item in the list, and its “next” item points back around to the first item in the list. This is shown in the figure below, which also illustrates the tagged pointers in the invocable_tagged_ref encoding:

tailq data structure diagram

When the list is empty, both links in the head entry point to itself:

empty tailq data structure diagram

This is the ideal design given the STL container requirements, e.g., that *--end() is a valid expression that yields the same value as back(). If you work out how to edit the links to insert and remove items “in the middle” of the list, this same code will work even at the boundary conditions (a list of size 0, size 1, etc.) without any special conditionals that check for those cases.

This interesting property makes the circular doubly-linked list one of the “cleanest” and most “algorithmically satisfying” data structures you’re ever likely to encounter – especially when compared to “white board job interview” linked lists which explicitly use nullptr at the head/tail of the list, and need if statements in almost every function to treat the boundary cases differently.

Miscellaneous Notes

iterator_t and const_iterator_t

Several classes define seemingly superfluous identity type aliases for their iterators, e.g.

class iterator;
class const_iterator;

using iterator_t = std::type_identity_t<iterator>;
using const_iterator_t = std::type_identity_t<const_iterator>;

These are needed to work around a strict policy in clang regarding the use of incomplete types in template class member functions, see this cfe-dev mailing list post for details.

compatible_slist and friends

Although copying list heads is not possible, it is possible to move list heads. The source and destination of the move do not need to have the exact same type. For example, the source list head might not store the list size inline (i.e., its SizeMember is csg::no_size) whereas the destination list head does store the size.

In this sense, certain list head types are compatible with others for the purpose of move construction and assignment. The definition of compatibility is formalized as a concept, e.g., for slists:

template <typename T, typename O>
concept compatible_slist = slist<T> &&
    std::same_as<typename T::value_type, typename O::value_type> &&
    std::same_as<typename T::entry_extractor_type, typename O::entry_extractor_type>;

Originally the code used other_list_t for compatible list move constructor and move assignment overloads. other_list_t is an alias template which binds the template parameters that have to be same while leaving the others free. If template argument deduction could be carried out against these overloads, then a compatible list would find an available move constructor or move assignment operator.

When support for C++20 ranges was added, the other_list_t-based compatible list constructor overloads started “losing out” to the range constructors during overload resolution: after all, the compatible lists are ranges and thus technically are more constrained. This required “compatible lists” to be formalized as their own concept which is a refinement of (more constrained than) std::ranges::input_range<T>. Note that compatible_slist has slist<T> as its first atomic constraint, which in turn has std::ranges::input_range<T> as its first constraint.

Unusual code formatting convention

The naming of variables, functions, and classes may seem inconsistent: sometimes lower_case_snake_case identifiers are used, other times camelCase (replete with m_ member prefixes) is used. There is a pattern though: implementation details (such as a private member variables) use camelCase, whereas names in public interfaces use lower_case_snake_case. Why is this?

It is because CSD copies LLVM’s style: its normal style is camelCase, but for consistency, it uses the STL formatting rules when implementing STL-compatible containers or “STL-like” helper classes. This is a useful visual indication that the class in question supports the same interface (namely, an iterator-centric interface) as a “normal” STL class. Since almost everything in CSD is designed to be STL compatible, most public interfaces use lower_case_snake_case. Meanwhile, all non-public names use the CSG “house style”, which is a camelCase style similar to LLVM.

Footnotes

1

Several popular C codebases contain a special macro that does this “inverse offsetof” operation to obtain the original structure address from one of the members. It is usually spelled container_of.

2

This kind of circular linked list is also described in the classic algorithms textbook Introduction to Algorithms (commonly refered to as “CLRS”) in section 10.2, “Sentinels”, when discussing linked lists.