CppCon 2018: Titus Winters “Modern C++ Design (part 1 of 2)”


– Good afternoon everyone. I’m Titus Winters, I do not
love doing my own introductions, but there’s a couple things
that I do want to say. I’ve been leading most of the
C++ Common Library efforts at Google for eight years now, dear lord. I’m a maintainer of the Google C++ style guide. I founded Abseil. I’m now the chair for Library Evolution, that is the group at the standards committee level that does API design for the Standard Library, and I’m involved in the new study group on tooling. I’ve done tons of guidance work: I write a lot of the Tip of the Week series, I work professionally on providing good guidance on API design, and because of all of the codebase-wide refactoring work in Abseil and on all of the other teams we’ve been on, we are subjected to the pain of fixing it when we get these things wrong. Which is to say, hopefully you can trust that I am not making all of this up. So, when we talk about design,
why do we talk about design? I think we do this
because we want to ensure that the things that
we produce are usable, that people understand what you mean when you write out an interface, when you write out a function,
when you write out a class, that they know how to use that. By looking at the things that work and the things that don’t we
find ideas and design patterns that are easy to follow and that make the resulting APIs easier to work with. In the end, though, design serves us. This is largely not about math or fundamental principles of the universe. These are not rules written
out on stone tablets and brought down from on high. I am not bringing you commandments here. I am bringing you stories, best practices, things that I have found seem to be the way that we do things,
things that I have found seem to be the right
way to express things, but this is all going to be
evolving over time, right? And I do want you to think about all of these things carefully. Don’t just take my word for it. There are a few places where there is underlying math or symbolic logic, and those things will
help inform good design, and that’ll be good. We’ll sort of call that out. There is also this question, are we prescriptivist or descriptivist? We can approach design just like grammar from either of these views. Do we see the rules as
they were written down on those tablets and value
the rules over all else? Or do we see that, oh
hey, we’ve made a mess, and some things work
and some things don’t, and try to produce rules
that describe what things did and what things didn’t to
like encourage the good ones and nudge us away from the bad. And you can sort of take
either viewpoint here, but I do sort of prefer
when we approach things in a descriptivist fashion
of this seems to work, this seems to not work,
and here’s why, right? It is really important
to me that you all think about these things and understand why. This talk will be in roughly three parts, starting small with the
basic units of design and working our way up
to big questions like is this an acceptable
design pattern for types? There’s a spectrum here. As we go forward in the talk we’ll go from talking about syntax
to semantics to theory. This is scheduled in a
two-hour block here at CppCon. And I’m gonna cover the
first of these in this talk and the higher level pieces,
types and type design, in the next slot. So we’ll wade in starting with the smaller and hopefully more understood
part of the design spectrum. First, a question. What is the atom of C++ API design? That is, what is the fundamental
small chunk of API design? It might not be the smallest chunk, but it should be the small
thing that we reach for or that we think about most often. And if you asked me this a
year ago I would have said, well, it’s the function, right? After all, that’s the
piece that we use the most. Free functions, member functions, special member functions, et cetera. But recently I’ve started to think for maybe the last year or so, maybe functions are actually our protons. The better unit of design
is slightly larger. The better unit of design
is an overload set. When you have a well-designed,
when you have a reasonable, when you have a good overload set, and it turns out there’s
actually very solid agreement amongst all of the experts on what is good and reasonable here, overload sets are a much
better unit of design, especially as we move
to a richer type system, richer set of vocabulary types, concepts, and even deeper understanding
of move semantics and move semantic designs. Pop quiz, what does this mean? What does this simple function
signature like this mean? By the end of the talk
I want you to understand that this question is bogus. This question is ill formed. You really need to know a
bit more about Foo, the type, and quite a bit more about f and everything else
that is named the same, everything else that is named f. I will say, if f is
appearing all by itself and isn’t part of an overload set, what we’ve got here is
the function signature for maybe move. This nugget didn’t actually
fit anywhere else in the talk, and I really find it very important so I’m just gonna say that right now. You can repeat it three times
to yourself under your breath. It is maybe move when you see a function signature like this.
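The slide itself isn’t reproduced in this transcript, but the shape he seems to mean is an unoverloaded function taking an rvalue reference, something along these lines:

    struct Foo;

    // Standing alone, not part of an overload set, this is "maybe move": the
    // callee may or may not actually move from foo, and the caller can't tell
    // from the signature alone.
    void f(Foo&& foo);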
Okay, overload sets. Somewhat formally, an overload set is a collection of functions in the same scope, that’s namespace, class,
et cetera, et cetera, of the same name such that
if any one of them is found by name resolution they all will be. That captures the syntax, that captures what the
compiler cares about, but not the semantics. That is what a user will care about, of a good overload set. The core guideline says
very good things about this. The Core Guidelines have two rules. One is overload for operations that are roughly equivalent, that is, if you have two things that are doing roughly the same thing, name them the same. And also the flip side, overload only for operations that are roughly equivalent, all right? That is, if you have two things that are doing something
very, very different, please name them differently, right? This should not be shocking. The Google C++ style guide says use overloaded functions,
including constructors, only if a reader looking at a call site can get a good idea of what is happening without having to first figure out exactly which overload is called, right? You shouldn’t need to do
overload resolution in your head and know all of the
symbols that might show up through transitive inclusion. Like, what’s everything in your program that might have the same name, right? You should actually only
name things the same if it doesn’t matter to
the reader which of those is actually gonna get picked, right? If it’s gonna do the same thing. We’re definitely lacking
a solid theoretical way to describe that same thing, ’cause it’s sort of squishy, right? Like you can’t really say, like, give me the semantic definition of I have a function of two
arguments and a function of three arguments, and
they do the same thing. Like, that’s gonna be just
weird to try to come up with any sort of formal
definition of that, right? But we sort of can see what we
mean with some examples here. So for instance, we can overload on arity. How many parameters the function takes. And a great example of
this is StrCat from Abseil. We’ve had a variation on StrCat in our code base at Google for many years. Pre C++ 11 StrCat was an overload set of something like 25 or
26 separate functions to go from arity 1 all
the way up to arity 25. And it didn’t matter, right? You don’t need to know which
of those you’re calling because what StrCat does
is take all of its things, convert them to string, and return you the concatenation there. And even after we switch to C++ 11 and moved this to being
a variadic template, still doesn’t matter, right? It’s one thing, because
even that statement of it being a variadic template is slightly a lie because the
first I think five arities are hand rolled free functions
for optimization purposes to make it easier on the compiler, and none of that matters, right? Because you see a call to StrCat, you don’t have to count them, you don’t have to know
which one is called. It just does one thing, right? So you can clearly overload on arity in some cases like this.
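A rough sketch of the idea; this is not the actual Abseil implementation, just the shape of an arity-agnostic concatenation function:

    #include <sstream>
    #include <string>

    // Illustrative only; absl::StrCat is the real thing and is implemented very
    // differently (hand-rolled low-arity overloads, AlphaNum, etc.).
    template <typename... Args>
    std::string StrCat(const Args&... args) {
      std::ostringstream out;
      (out << ... << args);  // C++17 fold expression
      return out.str();
    }

    // A call site like StrCat("x = ", 5, " y = ", 2.5) doesn't care how many
    // members the overload set has or which one is selected.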
You can also overload on types, usually for types that are similar, and the most common example that you’re gonna find is legacy stringish overloads. There was some old
function in your code base that accepted const char*
and someone got tired of that and they added an overload
for const string ref. And this is a great example of a well-designed overload set, right? You’ve got some sort of stringish data. The user that is calling this function or the reader of some code
that is calling this function, this overload set, sorry, excuse me, doesn’t need to know which
type is being passed exactly or which function is being called exactly, because we can see at a glance that the semantics are the same, right? In this case, one is implemented inline in terms of the other.
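A minimal sketch of that kind of overload set (the function name is made up):

    #include <string>

    // The original legacy entry point.
    void ProcessName(const std::string& name) {
      // ... do the actual work ...
    }

    // The later overload, implemented inline in terms of the other, so a reader
    // can see at a glance that both members of the set do the same thing.
    void ProcessName(const char* name) { ProcessName(std::string(name)); }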
We see overloads throughout the standard library for optimization as a
result of move semantics. For instance, there’s vector push_back. This is an overload set. This fits and slightly
expands our definition of these things have the same semantics and I don’t need to know
which of these is called. At the call site, the
user doesn’t have to care whether it’s the Lvalue or
Rvalue version of push_back. At most, they need to watch
out for use after move, but that’s true irrespective
of what API you’re calling. You always need to watch
out for use after move. This also helps flesh out what we mean by the same semantics. It is the same post
condition on the vector, not necessarily for the
T that was passed in. However, we don’t actually really care about the post condition on the T, because the T is either
passed by const ref and not being changed, is a temporary, in which case we don’t care, or it was std::moved and it’s
definitely not our problem, see previous result, right? Does that all jibe? And it’s also worth noting
that the calling code, so long as it obeys this restriction, would be the same behavior,
not the same optimization, if we removed the Rvalue
push_back overload. The semantics for all of our callers are totally the same, right? Nothing’s actually gonna change.
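A quick illustration of that call-site view of push_back as an overload set:

    #include <string>
    #include <utility>
    #include <vector>

    int main() {
      std::vector<std::string> v;
      std::string s = "some long string";
      v.push_back(s);              // lvalue: the const T& overload, s is copied
      v.push_back(std::move(s));   // rvalue: the T&& overload, s is moved from
      v.push_back("a temporary");  // temporary: also the T&& overload
      // In every case the postcondition on v is the same: one more element at
      // the back. Which member of the set got picked doesn't matter to the reader.
    }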
So with those sorts of examples, if we can overload on arity, we can overload for optimization, we can overload for same types-ish, the same platonic notion of
types like string-ish data. Let’s look at the overload set guidance. We can say properties
of a good overload set, you can judge the correctness without having to do overload resolution, and I really like the second option here. A single good comment can
describe the full set. For StrCat that comment
would be something like take all the provided arguments, convert them to string
in the default fashion, and return a string formed
by the concatenation of the stringified arguments. For the string thing for Foo it’d be do x on the given
string, whatever that is. For vector push_back it
would be something like add this T to the back of the vector. Right, we don’t need to have a comment on every individual
element of the set, right? It’s probably the case that one comment describing the overload
set as a whole is actually more explanatory of what that
overload set does, right? And probably much clearer for a user. So this pushes some of this squishiness of what is a good overload
set back a little bit onto what is a good comment, which is still squishy, I can’t define it, but I’ll know it when
I see it sort of thing. But practically speaking, nine times out of 10 you
can spot the bad comments when the comment is encouraging you to do overload resolution, right? Is that like, you’ve all
seen comments like that? That is definitely a code smell, right? Does this all make some sense? Good overload set. Any questions? There are mics. Please feel free, I will not be able to
like see you probably, but please feel free to chime in. I would love to hear from you. It’s kind of awkward. When we start consciously
treating overload sets as the basic unit of design, then we start seeing them
in other places, right? The most important overload set of all is one that we’ve discussed a
lot over the last few years, but usually not specifically in terms of it being an overload set. Any guesses? Copy versus move. I really, really like the
formulation of copy and move as an overload set. This actually has huge
conceptual ramifications when we reconceptualize along those lines. The type trait for movable
isn’t stupid anymore. It’s always really bothered
me that is_move_constructible didn’t really tell you whether
it actually moved. It only told you if syntactically
you could construct it from an Rvalue, right? That was like, ehhhhh. Whereas now when you recognize that move and copy are an overload set, all that actually matters
is that you can construct from a temporary, you can construct from an Rvalue, right? ‘Cause they’re an overload set. It doesn’t matter which one you pick. It is up to the type author in that model to ensure that move is efficient whenever it’s plausible, right? It is up to the user to
ensure that move is used wherever it’s relevant or important. And the user doesn’t
need to know if a type has a move constructor because you don’t need to know which member of
the overload set is chosen. This also requires that the
semantics of copy versus move must be the same, at least with
respect to the destination. The type, the object being constructed. This matches the way
that the standard library is behaving more and more. This matches the way that
concepts for the standard library is being defined. This matches the behavior for papers that I’ve been writing about what the standard library promises. More on that later in the week. Move is an optimization of copy is what I’ve been saying for a few years, but I think the better way
to phrase it is: move and copy must be a well-designed overload set. Does that make sense?
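One small illustration of why the trait stops being bothersome in this model (a sketch, not from the talk):

    #include <type_traits>

    // Declaring a copy constructor suppresses the implicit move constructor,
    // so this type has no move constructor at all.
    struct CopyOnly {
      CopyOnly() = default;
      CopyOnly(const CopyOnly&) = default;
    };

    // Still true: an rvalue happily binds to the const& "copy" member of the
    // set. In the "copy and move are one overload set" view, that's exactly
    // what matters: the set can be invoked with a temporary; which member gets
    // picked is the type author's optimization problem, not the caller's.
    static_assert(std::is_move_constructible<CopyOnly>::value, "");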
Interestingly, explicitly conceptualizing everything, even constructors like move and copy, as an overload set gives us some guidance on things like explicit. When you view your constructors
as an overload set, then you start having a better idea of when explicit applies. Does a user need to know
which constructor was picked? If so, make that constructor explicit. Viewing it another way, copy and move are the canonical constructors that at least take parameters. We know their semantics. They take a T and they make a
new T that’s like the given T. That’s the canonical constructor behavior. But if your constructor doesn’t take T but takes some other type or
maybe some other types, right? If you’d usually be
comfortable passing T and U, Foo and Bar, const string
ref and const char* as an overload set, then
you’re probably fine having constructors for
both of those, right? You could have a constructor
that accepts Bar in your Foo. If it would be an acceptable overload set for both Foo and Bar, right? And that’s most commonly
the case when T and U represent the same idea, right? These are two different types of the same sort of canonical data. If it is, on the other hand, merely the case that we can construct a T from some bag of parameters,
but those aren’t basically a T, right, this is vectors constructor that takes a T and a size, right? Okay that is not the same as a vector. That is a recipe for creating a vector. Then your constructor
should be explicit, right? Does that make sense? Any questions? I’ll wait just a second. I find that we wildly, wildly underuse explicit on constructors. And I think the standard
library is as guilty of that as anybody. Like almost all constructors should probably have been tagged explicit, and we kind of screwed that up, but we’re good, okay.
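A sketch of the rule of thumb he’s describing, alongside the vector-flavored example from a moment ago (the class is invented for illustration):

    #include <cstddef>
    #include <string>

    class DNA {
     public:
      // The parameter is platonically "the same thing" as a DNA, so an
      // implicit constructor reads like a reasonable overload set member.
      DNA(const std::string& bases) : bases_(bases) {}

      // A length is a recipe for making a DNA, not a DNA itself, so force the
      // caller to say so at the call site.
      explicit DNA(std::size_t length) : bases_(length, 'A') {}

     private:
      std::string bases_;
    };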
All of that said, with overloads there’s another really common pattern that I see, which is people attempting
to use overload sets to enforce certain types of behavior. And my high-level guidance
is don’t use equals delete on a member of an overload set. Is that a question? Is that a question? Nope, all right. So I somewhat regularly see
people try to delete a member of an overload set to enforce
lifetime requirements. Show of hands, anyone seen
someone do this in their code? Yep, a few, yep. Looks like maybe 5% of you.
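For reference, a minimal sketch of the pattern in question (the class and its lifetime requirement are invented):

    #include <string>

    class Watcher {
     public:
      // Stores a pointer to the argument; the real requirement is "name must
      // outlive this Watcher".
      void SetName(const std::string& name) { name_ = &name; }

      // The attempted enforcement: forbid temporaries.
      void SetName(std::string&&) = delete;

     private:
      const std::string* name_ = nullptr;
    };

    // A caller can still pass a local whose lifetime is too short; the =delete
    // only rules out temporaries, not every lifetime bug.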
The problem here is that no temporaries versus things that have the correct lifetime, things that have the lifetime that I actually require, are not actually synonyms, right? That Venn diagram
overlaps a little bit, but it’s more disjoint than not. Like generally speaking,
if the lifetime requirement for a parameter to your function is not as long as this function
call, like the default, simple, obvious thing, then it’s gonna be a little
hard to pin down, right? It could be this variable must live as long as this function call, or this variable must live until the next time you call this function. This variable must live
as long as this object. This variable must live
until this thread completes. This variable must live until
this callback fires, right? All of those things are complicated. Certainly much more complicated than just no temporaries, right? Those are different levels of complexity. Starting asynchronous work
is particularly challenging. And when you do this, the
most common workaround for most people when
they say, like oh, geez, you’ve equals deleted this,
I can’t pass a temporary, I guess I’ll make this not a temporary. Nine times out of 10 they do this by pulling out an automatic variable. And technically now that’s gonna build, but the odds that that,
the lifetime requirement of that automatic variable
actually matched the requirement of your kooky API are pretty slim, right? In practice, your API that
is kicking off async work or storing a reference, right, is going to require you to
have a pretty detailed comment in its API saying exactly
what the lifetime requirement on that reference is. The freeform nature, right,
the arbitrary boundless possible complexity of that requirement, is a whole lot more complicated
than even C++’s type system. Right? Equals-deleting a thing here
doesn’t solve that problem. All of which to say, the solution to documenting lifetime
requirements on borrowed references is either a, don’t make
it a borrowed reference, or b, document the actual requirement. The type system like this
cannot do it for you. If you want to equals delete
it on top of that documentation I guess that’s fine. But it’s a half measure at best. It’s a quarter measure at best, and it gets really messy
and it’s misleading, and it’s a false sense of security. And I would not accept it in code review, but I guess your mileage may vary. Good? (coughing)
Excuse me. There’s also using equals delete or just (mumbles) a function from an overload set in some cases, in order to force the user
to use the move version of a function instead of the
copy version of a function. And in simple cases,
maybe even in most cases, that looks fine. That could be fine. But in general you don’t
know all of the ways that your API is going to be used. That is fundamental to the whole business of providing an API in the first place. While it might be the case that you know that many invocations of your function should be done via move, not copy, you can’t know that for everything, right? If I wanna do two separate scans on a slightly modified chunk of DNA, it’s less efficient to
call this on temporaries ’cause I have to do
the modification twice. And if you make me contort my code so that I only do the modification once and can still call your move-only interface, I can do that, of course, but it is a little bit more awkward. My point being, for functions
you can’t really know that nobody ever is going
to need the copy API. And when you provide it as an option, the calling code is certainly simpler. Sort of at a very high
level, don’t be judgy, right? You don’t know all of the ways that your code is going to be used, be accepting. If you do somehow know that
copies must never ever happen, that is almost certainly
a property of the type, not of the function that you’re
passing that data to, right? Make a DNA class in this example. So if you’re that worried about
accidental copies adding up, make it a separate class,
don’t use string, right? And then probably still make it copyable, just with some special name, all right? Maybe make it more explicit and hard to trigger accidentally. I’ve sort of snuck in here
a pass by value design sink. Here DNAScan is accepting
a string, the DNA, which is presumably a very large string, by value, ooohh. Other things like vector
push_back from earlier do this as an overload set. Which one is right? That is to say, is vector’s push_back a well-designed overload set? Should everyone always be doing that when you’re accepting a value to sink? Or is just accepting a value fine? And there’s been a lot
of discussion on this. In fact, I think one of Herb’s keynotes the first year or two of this
conference had a long section that was touching on a
lot of the same things. There’s been a lot of discussion
and partial guidance on this, and a lot of that guidance
does not agree, right? And to some extent that is because there are a lot of possible questions, a lot of different scenarios that you might be optimizing for. So really we should be
asking some questions before we try to come up with perfect, all-encompassing guidance. And the questions that you might need to be asking: is this generic or am I
sinking a particular type, right? In the DNAScan example,
I’m sinking exactly string. That gives me some knowledge. Or I might be sinking
exactly DNA strand, right? And that gives me knowledge about probably the relative costs of copying and moving versus what the function
I’m about to do is. Is it a question? Although it might be good to, I can try to repeat, but either way. – [Man] Can you clarify the
sink, what you mean by sink? – Can I clarify what I mean by sink? So there are a lot of functions where you pass a value in and
it’s read and returned to you and nothing else, right? That’s sort of normal. Then there’s things like vector push_back, which is a sink. It’s passed in and
copied and stored, right? So for any function that
you are accepting the input to then copy either for storage
in the very common case, usually like vector push_back, or even in some cases I’m accepting it in order to make a copy
that I’m gonna mutate. So, you could imagine a silly function that is print everything
capitalized, right? Which might accept a string
and need to make a copy of it so that it can capitalize it
before it prints it, right? So sometimes you might have either storage or I just need a copy. Does that capture it? Yeah. Yeah, and there’s also the question of relating to whether it’s generic or not. How expensive is the function compared to a copy or a move of that
type or those types, right? And if it’s generic, like
vector push_back, right, all you’re actually doing is making a copy or doing a move, right? There’s basically no
overhead above and beyond the cost of copy or move. For something like DNAScan, right, if I need to sink it I’m
probably also going to do a whole bunch of work on it. And that whole bunch
of work, it’s probably much more expensive
than a move on a string or a move on a DNA snippet, right? So you need to maybe weigh
those things a little bit. But there are more questions. There’s the question of are
there multiple parameters that are being sunk, right? If I need to sink two or
three or four parameters then the cross product
of const ref and ref ref for all of those parameters means I have a combinatorial explosion of
elements in my overload set. And that just might
not turn out to be fun. Certainly not fun to maintain. There’s a question of over time
as I maintain this library, as I maintain this code base, do I know that this is always going to be a sink of exactly T or do I
just want T-ish things, right? In the case of accepting strings, you might if you have a
lot of not actually strings in your code base but things
that convert to string_view, you might make your sink in
terms of string_view instead so that there’s one
clear conversion point. And then there’s the question
which I think Herb raised in his keynote a couple years ago of can allocation reuse dominate? And this is a case where if I have a type who has a member variable
like a log or something, then it could be the case that
as I append data to that log, maybe it’s a string, it may
have to resize and reallocate as I append more data to it. And if I sink a new log,
sink a new string into place, the allocation of the old one is lost when I move the new one in, right? And if I continue growing again then I’m gonna have to do all of that reallocation over and over again. That seems like a fairly rare case, but is not by any means unheard of. So it is actually a thing that you might actually have to consider, like when you’re deciding how to accept your sink parameters. There may even be other questions above and beyond this five, but I think that’s a
reasonably complete set and already very complicated. But I will throw out the following sort of very general guidance. I would probably personally
provide this as the guidelines. You probably want the overload
set of const ref and ref ref if the implementation of
your function is small compared to move constructing a T. It is a little bit more complex. It is worse error messages, it is worse compilation performance, and it is probably too much of a pain if you have multiple parameters. Right, there’s that
combinatorial explosion. You could sink by value
if the implementation is a much larger cost than move constructing a T. Like in the DNAScan example, I’m about to walk through everything in that DNA snippet, right? That is bonkers more expensive
than moving a string. But it also does constrain
you a little bit. You want that to continue to be a T and exactly a T for all time. You don’t want conversions in there. And then there’s const T ref is actually never a terrible choice if
you don’t know the answers to these questions, because it’s simple and it gives you flexibility. It’s well understood, right? It’s hard to get wrong. That’s how I would simplify that.
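A sketch of the first option, the const ref plus ref ref overload set, for a case where the body is cheap compared to a move of T (this is essentially what push_back does; the class is invented):

    #include <string>
    #include <utility>
    #include <vector>

    class Log {
     public:
      // The lvalue member copies; the rvalue member moves. Callers don't need
      // to know which one is chosen, and the postcondition on the Log is the
      // same either way.
      void Add(const std::string& line) { lines_.push_back(line); }
      void Add(std::string&& line) { lines_.push_back(std::move(line)); }

     private:
      std::vector<std::string> lines_;
    };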
It’s also worth noting that this gets a little more complicated if you’re dealing with strong exception safety and sinks that may throw. If DNAScan may throw and DNA needs to be strongly exception safe, then you have an additional
set of constraints. Practically speaking sinks don’t usually throw except for allocation. If exception safety is
your primary concern you may have to reevaluate
this a little bit. Mostly don’t pass by value for types that are strong exception safe. When I’m talking about non-sink overloads historically I find that we’re
talking about const char* and const string&, I
mentioned that a little bit, these tend to have a similar look. In modern code we tend to
replace that overload set with string_view. And once we start
talking about string_view as the string like parameter type, then we start looking at other common non-owning parameter types, like span, these have unusual designs,
there are sharp edges. There was a whole talk on
that already this morning. Span even leads us to
a bigger can of worms, because unlike string view, like string view does one
thing: it is character data. Span tries to be general, any contiguous range of type T, but there are lots of
contiguous ranges of almost T that you’re reasonably
likely to work with. For instance, there’s pointers
versus smart pointers. We can easily publish guidance to say, don’t pass smart pointers
by const ref in general. If you wanna pass a pointer, pass a pointer, not the ownership-wrapping information. So we get it ingrained: don’t do this, identify this in code review, suggest const T* or even const T ref. But types don’t actually decompose, right? A vector of unique pointer
T is not convertible to a vector of T*. And if you’ve got a
vector of owned pointers and need to invoke a
function with a vector of T*, there just isn’t a good way to do that. So modeling based on span, it is not hard to imagine producing a more
generic span of T-ish things. I’ve seen this in my code base as AnySpan, which I don’t love the name, but I do increasingly like the type. It effectively type erases
a contiguous container of things that can be
converted to T* or T ref in a fairly clear fashion. And we can go further and
further down that rabbit hole. Maybe it doesn’t need to be contiguous. Maybe it’s just some form of range. Maybe we can do this for
associative containers and we have a map view or a set view. Stepping back a little
bit, C++ is a language that is all about types, more so than basically anything else. Overloads for non-owning
reference parameters like string_view and span and AnySpan, are about getting closer to duck typing, in terms of what types are accepted, which, give me anything
that looks like a duck and quacks like a duck, and
I will use it as a duck. Bjarne was talking about this in the keynote this morning with concepts. It’s a language approach
to a very similar problem. And those are the two main conventions that are emerging in this space, right? We can, in the library, build non-owning reference
parameter types like these or we move to more generic
code and use concepts. And when it comes to that question of which of these will emerge, I don’t think the community
has enough experience to provide particularly deep guidance yet. My suspicion is that this will come down to whether the library of
types like string_view and span are found to be sufficiently expressive. If the library providers of the world build a rich set of such types, we’ll probably go that way. This approach has a head start, after all. If we invest a similar effort in concepts and, importantly, we find
only a comparable set of sharp edges for concept
usage versus view usage, then it may be that
concepts comes to dominate. That is a pretty significant shift and with a lot of unknowns. And it’s unclear yet
whether everyday programmers can write in a generic
and duck typed fashion efficiently and safely. We will see how that turns out, but interestingly we already have a trial of that happening right
now without concepts in the form of callables, std function. Even without concepts we
can write, fairly reasonably, something that takes in a callable in either a library or a language fashion.
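The two forms side by side, as a sketch:

    #include <functional>
    #include <utility>

    // Library form: a type-erased, owning parameter type.
    void RunLater(std::function<void()> callback);

    // Language form: generic, duck-typed; anything callable with no arguments.
    template <typename F>
    void RunNow(F&& callback) {
      std::forward<F>(callback)();
    }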
Both of these have their uses, but I think when we’re writing everyday code, most of us are going to
reach for the library form. And that seems telling. Until we have erasure
and storage for concepts, I think we’re probably going to reach for the library solution. If I had to guess about the future, I’m gonna guess that we’ll
devote a fair amount of energy to both approaches and we’ll
wind up with a powerful, very useful set of concepts
and then those will be type erased and provided as library, like with library types that wrap them. And most user code will
deal in those library types. It’s just a guess. Even still, std function is a
little unusual in this class of type erased parameter type. ‘Cause when we compare to
string_view or other view types, std function is simpler in a
couple very important ways. First, it’s only erasing one thing, right? If I accept the std function I’m accepting one callable thing. Nearly every other commonly
discussed type erased type is erasing a collection of things. String_view and span erase
contiguous sequences. AnySpan it erases a contiguous
sequence of not quite T. Map view or set view
erased the ordering details of some associative container,
et cetera, et cetera. When you’re type erasing a single thing it is much easier as std function does to make that an owning type. A type where you can copy it
and not have any requirement that the original outlive the copy. When we do type erasure over a container, on the other hand, over a collection, then we generally don’t
want to actually copy all of the things in that container. And we rapidly wind up with types that are very, very easy
to make dangle. And then we get two
big schools of thought. We can have non-owning
reference parameter types only as parameters, right? Have string view only as a parameter type, never use it anywhere else. And this school of thought will say, non-owning reference
parameter types are okay as long as they’re only
function parameters. And then there’s the
use with caution school of use non-owning reference
parameter types just carefully. Like, yep, there’s sharp edges there. Just stay away from the pointy bits. Always question storage of any such type. There is also a third school of thought that these types are
all completely garbage and too hard to use and we
should throw it away entirely. I don’t see that happening, but I have been surprised before. Even with just these two options we have, as a community, a
difficult choice to make, especially in a language
with such lofty goals as do not pay for what you do not use. Because there are absolutely use cases for non-owning types like string_view above and beyond just as a parameter. Consider your file name processing. You could imagine a
path processing function that takes a string_view for the path and returns a view into it for the suffix or the file name or the directory.
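A sketch of that kind of function and of the sharp edge that comes with it (reconstructed, not the actual slide):

    #include <string>
    #include <string_view>

    // Returns a view into the input: the part after the last '.', or empty.
    std::string_view Suffix(std::string_view path) {
      auto pos = path.rfind('.');
      return pos == std::string_view::npos ? std::string_view() : path.substr(pos + 1);
    }

    // Fine: the argument outlives the returned view.
    //   std::string path = "photo.jpeg";
    //   std::string_view ext = Suffix(path);
    //
    // Dangling: the temporary string dies at the semicolon, and ext now views
    // destroyed storage.
    //   std::string_view ext = Suffix(std::string("photo.jpeg"));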
But note what we’re looking at here: using string_view on input means that we don’t have to overload on char* and string ref and whatever user-provided types might be contiguous and useful. String_view does all of
that overload work for us. That’s the point of vocabulary types. That non-owning parameter
type as a replacement for an overload set is very powerful and it is why we are talking
about this right now. We could make this design a little bit more palatable to some people by changing that return value to string instead of string_view, but forcing a copy there and
changing that return type feels a little awkward, especially if this wasn’t
suffix but was directory, right? If you deal in very long file names, those might start to
actually be large copies. That might start to add up. Don’t pay for what you don’t
use, that’s C++, all right? If you can use this
style of design safely, that sounds like a very C++ thing. But it is awfully easy to misuse. Take a glance at this slide. Half of these are bugs, and they are awfully close neighbors to code that works just fine. All of which is to say if
we continue to build views and other non-owning reference parameters, there’s going to be a tension here. I think that the basic language, like design and evolution principles, are gonna say yes, it’s
fine to use these carefully. If a user hurts themselves
on that sharp edge, that’s on them. But we’re definitely going to see a lot of very caring
people offering guidance like never use these except as a parameter, or even never use these at all. And that is a hard tension to deal with. These are going to be the most efficient ways to express that overload set, for instance. Personally I’ve been using string_view for quite a while and I
find it pretty easy to spot questionable use in code review. Anytime that it is used as
anything other than a parameter I ask why do we know
that the underlying data will live longer than this view. That does not work so great if you are an almost always auto person, sorry. But this is all sort of a long tangent on doing type erasure for parameter types and duck typing and a library form. There’s open questions here. We’ll see how this all plays out. But we need to pop back
the stack a little bit. We’re done looking at
overloading on parameters or producing parameter types that hide that overload set for you, and instead we’re going to look at the other important
dimension for overload sets, which is method qualifiers. This is a really important
variation on overloads. You can overload member functions based on method qualifiers, either ref qualified or const qualified. Overloads that vary in const qualification tend to be of the form
access this underlying data in a const-appropriate fashion, right? You see this in vector’s operator square brackets, right? If you have a const vector
you get a const T ref. If you have non-const vector
you get a non-const T ref. All right? Simple, easy, obvious overload set. Overloads that vary on ref qualification tend to be about optimization. You can do one thing safely,
the Lvalue qualified version, and if we know that we are
operating on a temporary, or operating on an Rvalue, we can more aggressively optimize by leaving the object as a whole in that dreaded, valid,
but unspecified state. So for example, in C++
20 the string buf type will gain an overload for str, a ref qualified overload for str. Here a ref qualified overload means steal. So you can change your code
that is returning buf.str(), which has to copy out of the buffer, to return std::move(buf).str(), to say: I’m done with this buffer
and because I’m done with this buffer you don’t
have to copy that string out, you can move that string out. When we use a pattern like this, we don’t need to worry about scary naming for destructive member operations. With just consistency
with higher level rules don’t operate on moved from objects does all of the warning
that we need to do, right? That’s very handy, like you rely on existing user experience, and understanding and forming these performance when
available overload sets is also a nice way to
be future compatible. We can all start writing this
return statement right now. It doesn’t hurt anything and it expresses a reasonable intent, I’m done with this buffer. When the underlying
standard library catches up, it’ll just optimize a
little better, right? So a future compatible design,
that’s always nice to see.
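A small sketch of the future-compatible version, assuming buf is a std::ostringstream (the same idea applies to std::stringbuf):

    #include <sstream>
    #include <string>
    #include <utility>

    std::string Render() {
      std::ostringstream buf;
      buf << "lots of accumulated output";
      // Compiles today and copies out of the buffer; once the library's
      // ref-qualified str() overload is available (C++20), this exact line
      // moves the string out instead.
      return std::move(buf).str();
    }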
When we combine const and reference qualifier overloads, we can keep const correctness
and provide good optimization like in the case of optionals value. These types of overload sets still meet our general definition
for good overload set. A user does not need to
know which one is called. A single comment can describe
probably more clearly, the behavior of the whole overload set without having comments for
each member individually. While we’re here we
should talk a little bit about method qualifiers on
their own without the aid of an overload set. So what do ref qualified
methods mean when not part of an overload set and
what do const methods mean? If you’ve got nothing but an
Rvalue-ref qualified function that means to do once. This is a great design
for destructive operations and things like call this
function at most once. It should only be used, however,
when the Lvalue equivalent semantic would be buggy or
break the design of your type, not just because of inefficiency, right? This goes back to the
don’t equals delete things just because you’re being judgy. It’s perfectly reasonable
for me to provide only the Rvalue version here because the whole type is
this is a one-time callable.
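A tiny sketch of such a one-shot type (invented for illustration):

    // The only way to invoke Fire is on an rvalue, which reads as "call this
    // at most once" and pairs naturally with std::move at the call site.
    class OneShot {
     public:
      void Fire() && { /* ... do the one-time work ... */ }
    };

    // OneShot shot;
    // std::move(shot).Fire();  // fine
    // shot.Fire();             // error: needs an rvalue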
On the flip side, Lvalue qualifying a function says don’t do this on temporaries. This comes up almost never,
outside of overload sets, but it does have one case
that I have been seeing, which is we should maybe
be Lvalue qualifying our assignment operators in general. Like you can currently
assign to a temporary of most user-defined types. You currently cannot assign
to a temporary of an int. Like we are not doing as ints do. But if you ref qualify it like this, then the compiler will
catch that that assignment is probably nonsense
and not what you meant. And practice, I don’t think I’ve ever actually encountered
that bug in real code, I don’t actually care that much, but from a design consistency perspective that’s maybe an actual use
case and it sort of expresses what the intent is. Moving away from references, what do we really mean when
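A sketch of ref-qualifying an assignment operator so that assigning to a temporary no longer compiles (illustrative):

    struct Meters {
      double value = 0;

      // The trailing & means "only assign to lvalues", matching how built-in
      // types behave: 1 = 2 doesn't compile, and now Meters{} = m won't either.
      Meters& operator=(const Meters& other) & {
        value = other.value;
        return *this;
      }
    };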
Moving away from references, what do we really mean when we const qualify a thing? Hypothetically if we marked
every method as const and every member as mutable, this class builds just fine. But this is going to be an absolutely rotten type to work with. Const should mean const. But there are types that
have mutable members, and those aren’t actually a problem. But there’s some question,
there’s some connection there. How do we use const and
mutable well in design? And I suspect that there are
a couple ways to view this, but the one that has
given me the most mileage is the tie between const
methods, mutable members, and thread safety. The standard has some
things to say about this. It says it in a very obtuse fashion. I’m 95% sure that’s the right citation. If you squint it talks about read access, write access, modification,
and const arguments. According to the person
that claims responsibility for that wording, it’s horrible wording, but the intent is roughly this. Const accesses to standard
types do not cause data races. Standard types are thread compatible unless otherwise specified. Here we have to define thread compatible as a very hand wavy definition,
concurrent invocation of const methods on this
type do not cause data races. Any mutation of an instance of this type means that all accesses
require external synchronization, as opposed to thread safe
where concurrent invocations of const or non-const methods
do not cause data races. That’s mostly things like mutex. There is, of course, also a
thread unsafe classification, but you should just not do that. More on that in the next talk. It’s interesting to note
that if you build your types out of thread compatible
or thread safe types, and you don’t use the mutable keyword for your member variables, then you’re probably thread
compatible right outta the box. There are some scenarios where
pointers are shared around and that isn’t quite true. But more on that in the next talk. In this model of things
const is less about I am not changing the internal values and more about it is safe to
call this method concurrently. And with that model of
things we can see at a glance that this class is thread
unsafe unless response is inherently thread safe. And usually what such a
design requires is a mutex.
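A sketch of the usual shape, where mutable plus a mutex keeps const meaning safe to call concurrently (the class is invented):

    #include <mutex>
    #include <string>
    #include <utility>

    class ResponseCache {
     public:
      // Const method, safe to call concurrently: the mutable mutex guards the data.
      std::string Get() const {
        std::lock_guard<std::mutex> lock(mu_);
        return response_;
      }

      // Mutating method, also guarded, so the type as a whole is thread-safe.
      void Set(std::string r) {
        std::lock_guard<std::mutex> lock(mu_);
        response_ = std::move(r);
      }

     private:
      mutable std::mutex mu_;
      std::string response_;
    };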
But what just happened? We started talking about properties of types, which means that we’re
finally ready to move on from low-level API design and
talk about higher level stuff. But it is also important to
note that this is a bridge. There is a bridge between these domains. Const is both a promise about your values
ways that it is safe for your type to interact
with the rest of the program. And that makes that a
topic for the next talk. And we have lots of time for questions. I will leave this up to jog your memory. There are microphones in both places. – [Man] So you were talking
about the qualifiers on methods. I’m not sure I understand
the meaning of a const Rvalue reference type method. – Yes, the optional value overload set has a const ref
ref in its overload set. And I am 95% sure that that is only there so that it works nicely
in generic contexts, but like semantically it
doesn’t mean anything. – [Man] Okay, so I’m not crazy that it sounds meaningless.
– Yeah, you’re not crazy. Yeah, it is, the first time that I took a good hard look at it
I’m like, wait, wha? Huh? Yeah, you know you’re well spotted. Yeah. – [Man] Kind of in the same
vein with ref qualified members, you talked about the str member of string buf. And you talked about how it interacts with the guidance that we
not use moved from objects. Now imagine that we had a
type that was like string buf, but it had separate buffers
for input and output, and had ref qualified members that allowed us to
retrieve either of those. If we follow the guidance
to not use moved-from objects and ref qualified them
both we could extract one or the other but not both
in a destructive manner. What are you’re thoughts
on that kind of API design there a type is safe to be
used after it’s moved from so that you can extract other
members from it destructively. – I think it would be
really hard to express the, I think it would be really difficult for that to actually play out in practice because the move constructor, no that’s not quite right. I would be deeply skeptical to start with because the very, very
high level principle is don’t touch it after you’ve
called std move on it. Right? Except in very, very unusual circumstances that you do not want to get into. And so I think you would
probably be better off with some other naming
for those types of things, and I haven’t actually seen
a whole lot of value types where there’s multiple logical parts to it that you would want to be consuming. I think, perhaps, a more
accurate thing would be that you wanted an accessor for the input and the output individually that you could steal from. And then it would be a
std move on that member, but you’d have to sort of
make that member public, and I don’t know, it’s gonna
be kind of a weird type. – [Man] Thank you.
– Yeah. – [Audience Member]
Actually, in the same vein of move from types, I
guess you’re saying that advice is to never touch
a move from object. There are cases definitely with
the standard library objects where you potentially could reuse them with certain constraints. Like if you build up a
vector that’s a member, and you, once it’s built
up to a certain point, you can move out those values
but then start building up your vector fresh again, as opposed to having a
unique pointer to it. I mean, do you see the
standard keeping that kind of generic advice or do you see certain standard types providing slightly stronger guarantees about what you can do with move from objects? – I mean, you will always be able to, in the next talk we will talk
a lot about the precondition, like preconditions expressed
on the APIs of a type. And you will always be
able to call any function that has no precondition
after it has been moved from. Whether you should is an
entirely different story, right? It is definitely well
understood that when you move from a unique pointer now
it is definitely null, and so you could call reset on it and when you move from a vector you don’t know what’s in
it anymore and we all- – [Audience Member] And were
assigned to it or something. – Right, you could assign to
it, you could call clear on it. You could ask it its size, right? But you should not make any assumptions about data being there or not. But practically speaking, the
likelihood of encountering a scenario where the clearest
way to write your code actually has you reusing that
zombie husk seems rare, right? And you’re probably better
off not causing the wait, wha? of your reader seeing you reuse it after a move. Like, just don’t poke that bear. Like, yeah–
– [Man] I have a personal example which
I’ll talk about later. – Yeah, I mean, yes, technically
speaking, it will work. But there is a higher level
requirement on everyone of don’t produce code that
makes your reader go wha? ‘Cause confusion costs
more than CPU cycles. All right? Over here. – [Man] Would you mind
to show again the slide with the results of function
return in string_view? Taking the string_view
and returning part of it. Yeah, the one with like red and, yeah, it’s beautiful. I’m afraid it’s not very safe. I mean you marked option four as good, and I believe it’s undefined behavior. – No.
– [Man] Your argument of destroyed– – Not, no. – [Man] Your string actually–
– Temporaries are destroyed at semicolons. By the time the string’s
copy constructor runs, actually by the time the string’s, yes, copy constructor, move constructor? Copy constructor, by the time
the copy constructor runs the temporary is still there because the temporary doesn’t
go away until the semicolon. Like, I guarantee this is fine. John.
– [John] Hey Titus. – What’s up? – [John] So you were
talking in the beginning about how overload sets should
define a group of functions that are all semantically
basically the same. – Yep.
– [John] And you were also talking about five minutes ago about how it’s really important
for const to be meaningful and especially in thread’s
safety situations, and there are pretty
commonly used overload sets like operator brackets is like this, which often will overload on const-ness even though giving a read only reference and giving a mutable reference
are really, really different, especially in a thread safety context, but I don’t think anyone in the room would argue that that’s a, that like operator brackets
is somehow completely a broken design like on
a vector or something. So how would that fit into the
advice that you’re giving us? – Well so the advice is
like, at a very high level the advice is it is
probably a good overload set if you can have a single
comment for it, right? And a comment for that
const non-const overload set on vector is give me
the specified T, right? Like give me that object,
maintaining as much const-ness as you can if you wanna be
really wordy about it, right? But like, that is a reasonable definition. – [John] Awesome.
– Yeah. – [John] Thanks.
– Yeah. (audience member yelling) And yes, and I will
pitch Jeff Gromer’s talk on thread compatibility and
thread safety on Thursday. That’s actually in my script in the next part of the talk as well, so everyone that sees both of these will get that pitch twice, but yes, go to Gromer’s talk, it’ll be great. – [Man] I’m not gonna try to
start a debate, I suppose, on edge cases, but I do
have some curiosities regarding perhaps some
guidance you might offer on how access modifiers when
used with different types of constructors and more
importantly non-const references when past functions,
how would you recommend this mechanism as a tool to
prevent implicit conversion from types, particular in my example, I suppose const char* to standard strings, but plenty of times where that has come up with other situations. – I think that actually is
the third bullet point here, make explicit any of your constructors that aren’t an obvious easy overload. Like, I think knowing what we know today, we probably would have made
the const char* constructor for string explicit so that you can spot the fact that oh that
is an expensive copy. – [Man] Sure, you would
encounter the same scenario with assignment operations as well when you’re not dealing with a constructor at that particular point as well, but you’d still end up having to encounter implicit conversion
for the type provided. – I think I lost you, sorry. – [Man] I may just be
blowing hot air I suppose. – No, like, it is, code is very hard, and it is much easier with examples instead of verbally, so
come find me afterwards if you wanna talk. Yeah, I just can’t quite do that one live. Yeah. – [Man] You had a slide about taking sink, from data as sinks and
about taking it as value versus ref versus Rvalue ref, and the guidance was based on sort of relatively complicated
evaluation of whether one operation’s gonna be
more expensive than another. Is there a fundamental
reason why that’s a decision that I have to be making
as an API designer and the compiler can’t decide for me. – The, in this language the compiler
can’t design for you. We have too much legacy stuff, like we can’t change these behaviors. I think in theory it is the sort of thing that might be amenable to optimization, or to automation, but that
would be a mad science project first off in order to figure
that out a little bit, because among other things, like, it’s going to change wildly
if you take a new text. It’s going to change wildly
if you call an RPC, right? Like when you’re sinking a thing, like you need to know
the cost of those things, and not every line of code
is equivalently costly, and trying to teach the compiler, like, which of these things is expensive? That would be a neat trick. So like in the presence of magic, yes in theory that would be cool. And until then it’s gonna
be a little complicated, and I don’t know. Use your best judgment. It’s hard. – [Man] Hello, in a very early slide you had really the general idea
of whether an overload set, you know, if you’re taking in a std string and then it’s a lightweight wrapper around something that
goes to a const char, pointer to a const char, that was a good thing. And then I think at a later point, if I understood you correctly,
you started saying that when you identified people
using const references to std unique pointers, in
coder views you see that as like an issue.
– Oh yeah. – [Man] Is it kind of
like an issue in it’s just a nice wrapper around passing a pointer down to some? – It’s not a wrapper
around a pointer, right? – [Man] No, no, no, I
mean like when you make a, add something to your overload set just to make it easier
for people already using unique pointers to pass
down the raw pointer? – I, no because the operation to
actually extract a raw pointer from a unique pointer is a one, whereas if you only had
a const char* overload, well, no. Yeah, now I see your point. There is a logical inconsistency there. I think it is that it is very common for us in legacy code bases to have char*’s floating around
and strings floating around, and like it’s nice if
you don’t have to know which one it is and
which one to care about. Whereas, passing a unique
pointer by reference, especially by const reference, is just fundamentally a little silly because you’re saying
I can only invoke this if I already have ownership of the thing, but I’m not transferring ownership, right? It’d be like, that would be
an okay function by itself if you had to prove
ownership of an object, which is a weird semantic. I guess strictly speaking,
if it is an overload of T* and const unique pointer ref, I guess strictly speaking that might be okay, but I don’t know. That’s a weird, like, I feel like that’s the wrong result, but I think you might be right. (audience laughing) So, yeah, I don’t know,
I’ll have to think about it. But interesting, yeah. We are strictly out of
time, but I will take Eric. – [Eric] Hi.
– What’s up? – [Eric] Well Titus, I
think I heard you suggest that you recommend
explicit on constructors of more than one argument. – If those constructors
aren’t logically the thing. Like, for any constructor
that is accepting a bag of parameters from
which you can construct, as opposed to the parameters it has are platonically like the same notion as what you are constructing. So maybe in a ranges form two
iterators is a range, right? But in a vector, a T and a
size is not actually a vector. Did I head you off? – [Eric] No I mean it’s
a question I’ve had, because I mean, C++ has
this language feature and I’ve never known what to do with it. – Yeah, I think by default, by default we should be tagging all the constructors explicit
until you think about it and we have the default
wrong as is often the case. But yeah, like I really think that explicit should be way more common. There’s a tip of the week on that. – [Eric] Okay, thanks.
– So yep and we’re outta time. Thank you all very much and–
(audience clapping)
