Introduction

If your program needs exactly one instance of some class, a singleton often seems like a reasonable solution. However, singletons are an anti-pattern and should generally be avoided except in trivial programs or in specific cases (more on this later). But if you are going to use them, at least implement them correctly, especially since there are, unfortunately, several bad ways to implement them.

A singleton has exactly two requirements:

It gets constructed before its first use.
It doesn’t get destructed until after its last use.

Unfortunately, both are harder to guarantee than they might seem.

Basic Interface

At its simplest, a singleton’s interface looks like this:

class singleton {
    singleton();
public:
    static singleton* instance();
};

That is, its constructor is private so the only way to obtain an instance is via the public instance() function. That seems pretty simple, but the devil is in the details of the implementation.

Bad Implementation

The first seemingly good (yet bad) implementation is:

singleton* singleton::instance() {
    static singleton obj;
    return &obj;
}

It’s seemingly good because:

It uses a function-local static object that avoids the static initialization order fiasco.
It’s thread-safe. That is, if two threads call instance() at approximately the same time, it is guaranteed (in C++11 and later) that:
- The constructor will run exactly once; and
- The function will not return (in any thread) until the constructor completes.
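To make the thread-safety guarantee concrete, here is a minimal sketch that calls instance() from several threads and counts constructor runs. The counted_singleton class, its ctor_runs counter, and the all_threads_see_same_instance() helper are hypothetical names invented for this illustration; they are not part of the article’s singleton class.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for the article's singleton; it counts constructor runs.
class counted_singleton {
    counted_singleton() { ++ctor_runs; }
public:
    static inline std::atomic<int> ctor_runs{ 0 };

    static counted_singleton* instance() {
        static counted_singleton obj;   // initialized once, even under contention
        return &obj;
    }
};

// Calls instance() from n_threads threads; returns true only if every thread
// received the same address.
bool all_threads_see_same_instance( unsigned n_threads ) {
    std::vector<counted_singleton*> seen( n_threads, nullptr );
    std::vector<std::thread> threads;
    for ( unsigned i = 0; i < n_threads; ++i )
        threads.emplace_back( [&seen, i] {
            seen[i] = counted_singleton::instance();    // each thread writes its own slot
        } );
    for ( auto &t : threads )
        t.join();
    for ( auto *p : seen )
        if ( p != seen[0] )
            return false;
    return true;
}
```

However many threads race into instance(), ctor_runs should end up as 1 and every thread should get the same pointer.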

Unfortunately, this implementation doesn’t avoid the static deinitialization order fiasco.

A non-trivial C++ program comprises multiple translation units (TUs). Just as the initialization of global objects across those TUs happens in an unspecified order, deinitialization (the running of destructors) happens in the reverse of that unspecified order. If ~singleton() in TU 1 runs before a destructor in TU 2 that’s still using the singleton, boom: you’re dead.
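The same problem can be reproduced inside a single TU, which makes it easier to see. In the sketch below, a global object is constructed before main() and the function-local static is constructed later, on first use, so the static is destroyed first. All of the names (registry, registry_alive, late_user) are hypothetical, and the destructor only checks a flag rather than touching the dead object, so the demonstration itself stays well-defined.

```cpp
#include <cstdio>

// Hypothetical flag with no destructor of its own, so it stays readable for
// the whole of program shutdown.
static bool registry_alive = false;

class registry {
    registry()  { registry_alive = true; }
    ~registry() { registry_alive = false; }
public:
    static registry* instance() {
        static registry obj;    // constructed on first call, destroyed at shutdown
        return &obj;
    }
};

// A global whose destructor would like to use the registry.
struct late_user {
    ~late_user() {
        if ( !registry_alive )
            std::puts( "late_user: registry already destroyed" );
    }
};

static late_user user;  // initialized before main(), so destroyed after the registry
```

If main() calls registry::instance() at least once, the registry is destroyed before user at shutdown, and ~late_user() should report that the registry is already gone. In a real program the destructor would be calling into a dead object instead of checking a flag.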

Better Implementation

An implementation that fixes this is:

singleton* singleton::instance() {
    static singleton *const obj = new singleton{};
    return obj;
}

That is, the function-local static object is changed to a pointer to said object. The caveat is that its destructor will never run. If this doesn’t actually matter for your use-case, this implementation is fine; however, if you need its destructor to run (for example, to flush a file buffer), then you need to do something else.

Nifty Counter

If you need the destructor of your singleton to run after its last use, you can use a Nifty Counter.

The trick is to add a distinct static object to every TU that uses the singleton. Each such object increments a shared counter upon construction and decrements it upon destruction. When the counter drops back to 0, the singleton is destroyed. The way to get a distinct object into every such TU is to define it in the header:

// Stream.h

struct Stream {
    Stream();
    ~Stream();
};
extern Stream &stream;  // global stream object

static struct StreamInitializer {
    StreamInitializer();
    ~StreamInitializer();
} streamInitializer;

This is similar to how the standard iostreams of cin, cout, and cerr are initialized.

Notice that:

The global stream “object” is declared as just a reference. (The actual object is defined in the .cpp file.)
Even though every TU defines an object named streamInitializer, they’re all distinct because they’re declared static.

The corresponding .cpp file is:

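// Stream.cpp

#include "Stream.h"
#include <cstddef>  // std::byte, std::size_t
#include <new>      // placement new

static std::size_t nifty_counter;
// Not const: the Stream object is constructed into, and later destroyed in, this buffer.
alignas(Stream) static std::byte stream_buf[ sizeof(Stream) ];
Stream &stream = reinterpret_cast<Stream&>( stream_buf );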

Stream::Stream()  { /* ... */ }
Stream::~Stream() { /* ... */ }

StreamInitializer::StreamInitializer() {
    if ( nifty_counter++ == 0 )
        new ( &stream ) Stream{};
}

StreamInitializer::~StreamInitializer() {
    if ( --nifty_counter == 0 )
        stream.~Stream();
}

The code shown above differs slightly from the original code. It’s been updated to avoid std::aligned_storage, which was deprecated in C++23, and to use the recommended technique (a suitably aligned byte array) instead.

Notice that:

stream_buf is the aligned raw memory for the Stream object. We need raw memory because we want to control exactly when the Stream object is constructed and destructed.
stream just refers to the raw memory.
The StreamInitializer constructor uses placement new to construct the actual Stream object in stream_buf.
~StreamInitializer calls ~Stream explicitly on the Stream object in stream_buf to return it to being raw memory.
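To watch the counter at work without several source files, the sketch below puts everything in one file and stands in for the per-TU initializers with ordinary local objects. The instrumentation counters (live, ctor_runs) and the nifty_counter_demo() helper are hypothetical additions for illustration, and std::launder is my own addition to obtain a pointer to the object living in the buffer.

```cpp
#include <cstddef>
#include <new>

struct Stream {
    static inline int live = 0;         // hypothetical: number of live Stream objects
    static inline int ctor_runs = 0;    // hypothetical: total constructor runs
    Stream()  { ++live; ++ctor_runs; }
    ~Stream() { --live; }
};

static std::size_t nifty_counter;
alignas(Stream) static std::byte stream_buf[ sizeof(Stream) ];

struct StreamInitializer {
    StreamInitializer() {
        if ( nifty_counter++ == 0 )     // first initializer constructs the Stream
            new ( stream_buf ) Stream{};
    }
    ~StreamInitializer() {
        if ( --nifty_counter == 0 )     // last initializer destroys it
            std::launder( reinterpret_cast<Stream*>( stream_buf ) )->~Stream();
    }
};

// Each local StreamInitializer plays the role of one TU's static initializer.
bool nifty_counter_demo() {
    {
        StreamInitializer tu1;
        {
            StreamInitializer tu2;
            if ( Stream::ctor_runs != 1 || Stream::live != 1 )
                return false;
        }
        if ( Stream::live != 1 )        // tu1 still holds the Stream alive
            return false;
    }
    return Stream::live == 0 && Stream::ctor_runs == 1;
}
```

The Stream is constructed when the first initializer appears and destroyed only when the last one goes away, regardless of how many come and go in between.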

This works well enough (though I think there are two issues with it; more on these later), but it might be nice if this technique were generalized so that:

You can have any number of such singleton objects. (While that seems contradictory, remember the example of cin, cout, and cerr.) This requires one counter for each.
You can easily make any class be a singleton.

A Generalized Nifty Counter Library

A generalized implementation requires three parts for each singleton:

A nifty counter.
A raw memory buffer for singleton object storage.
An initializer/deinitializer.

A Generalized Nifty Counter

A generalized nifty counter starts out with:

class singleton_counter;

template<typename T>
concept Singleton = std::is_base_of_v<singleton_counter,T>;

That is:

singleton_counter shall be the base class for all singletons.
Singleton is a concept to enforce that: for a type T, T is a singleton only if it’s derived from singleton_counter.

The singleton_counter is:

class singleton_counter {
protected:
    singleton_counter() { }
private:
    std::atomic<size_t> _ref_cnt{ 0 };

    void inc_ref() noexcept {
        _ref_cnt.fetch_add( 1, std::memory_order_relaxed );
    }

    bool dec_ref_is_last() noexcept {
        return _ref_cnt.fetch_sub( 1, std::memory_order_acq_rel ) == 1;
    }

    template<Singleton T>
    friend class singleton_init;

    singleton_counter( singleton_counter const& ) = delete;
    singleton_counter( singleton_counter&& ) = delete;
    singleton_counter& operator=( singleton_counter const& ) = delete;
    singleton_counter& operator=( singleton_counter&& ) = delete;
};

It contains:

A protected default constructor since only derived classes should be allowed to construct it.
An atomic size_t for the counter. (The original nifty_counter not being atomic is one of the two issues I mentioned earlier — more later.)
An inc_ref() function that increments the reference count.
A dec_ref_is_last() function that decrements the reference count and returns true only if the count was 1 before the decrement (meaning it’s the last surviving instance).
A friend declaration for singleton_init (the singleton initializer; more later) so that it can call the private member functions.

It also deletes copy and move constructors and assignment operators since this is for a singleton, so it should never be copied or moved.

See Appendix 1 at the end of this article for more on the use of memory_order_relaxed and memory_order_acq_rel. To go into it now would be too much of a digression into the weeds.

A Generalized Raw Memory Buffer

A generalized raw memory buffer for singleton storage is:

template<Singleton T>
class alignas(T) singleton_buf {
public:
    constexpr singleton_buf() : _buf{ 0 } { }

    constexpr T& ref() noexcept {
        return reinterpret_cast<T&>( _buf );
    }
private:
    char _buf[ sizeof(T) ];

    singleton_buf( singleton_buf const& ) = delete;
    singleton_buf( singleton_buf&& ) = delete;
    singleton_buf& operator=( singleton_buf const& ) = delete;
    singleton_buf& operator=( singleton_buf&& ) = delete;
};

Notice that:

It contains _buf, the raw memory for the singleton object.
It’s declared with alignas(T) so that it’s suitably aligned.

Like singleton_counter, it also deletes copy and move constructors and assignment operators.
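The sketch below isolates the two properties the buffer has to provide: correct alignment, and storage that holds no object until one is explicitly constructed in it. The raw_buf template, the over-aligned payload type, and raw_buf_demo() are hypothetical names, and the buffer here hands out a void* rather than a T& purely to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical payload with a strict alignment requirement.
struct alignas(32) payload {
    static inline int live = 0;     // number of live payload objects
    double values[4]{};
    payload()  { ++live; }
    ~payload() { --live; }
};

template<typename T>
class alignas(T) raw_buf {
    unsigned char _buf[ sizeof(T) ];
public:
    void* address() noexcept { return _buf; }
};

static_assert( alignof(raw_buf<payload>) == alignof(payload) );

static raw_buf<payload> payload_buf;    // zero-initialized storage; no payload yet

bool raw_buf_demo() {
    bool ok = payload::live == 0;       // declaring the buffer constructs nothing
    auto const addr = reinterpret_cast<std::uintptr_t>( payload_buf.address() );
    ok = ok && addr % alignof(payload) == 0;

    payload *p = new ( payload_buf.address() ) payload{};  // lifetime starts here
    ok = ok && payload::live == 1;
    p->~payload();                                          // and ends here
    return ok && payload::live == 0;
}
```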

A Generalized Initializer

A generalized initializer is:

template<Singleton T>
class singleton_init {
public:
    explicit singleton_init( T *buf )
        noexcept( noexcept( T{} ) ) : _singleton{ buf }
    {
        static auto const _ = new (buf) T{};
        _singleton->inc_ref();
    }

    ~singleton_init() noexcept {
        if ( _singleton->dec_ref_is_last() )
            _singleton->~T();
    }

private:
    T *const _singleton;

    singleton_init( singleton_init const& ) = delete;
    singleton_init( singleton_init&& ) = delete;
    singleton_init& operator=( singleton_init const& ) = delete;
    singleton_init& operator=( singleton_init&& ) = delete;
};

While the constructor takes a T*, that’s a bit deceptive because it’s initially a pointer to singleton_buf (raw memory) that will become a T shortly, but isn’t a T yet.
The static auto const _ = new (buf) T{}; constructs the T in the raw memory buffer exactly once by virtue of being an initializer to a function-local static variable that’s initialized only the first time the function is called.

It’s named just _ because we don’t use the variable but we have to name it something. We don’t need to use the variable because we already know what its address is: buf.

Unlike the original code, we can’t do something like:
```
if ( _singleton->inc_ref_is_first() )
    new (buf) T{};
```
since we can’t access the counter (that’s now part of the object) because the object hasn’t been constructed yet! Hence, the counter being zero can’t be used to determine whether to construct the object, so it’s now used only to determine whether we should destruct the object.
The _singleton->inc_ref() then increments the reference count of either the brand new or pre-existing fully constructed singleton.
The destructor decrements the reference count: if it’s the last surviving instance, it explicitly calls the destructor.

Like the other classes, it also deletes copy and move constructors and assignment operators.

And there you have a generalized nifty counter implementation.
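To see the three parts working together, here is a condensed single-file sketch. It is deliberately simplified and is not the library as presented: access control, the concept, and most deleted operations are dropped, the initializer takes a void* and keeps the pointer returned by placement new, and Journal is a hypothetical singleton type that just records its lifetime events.

```cpp
#include <atomic>
#include <cstddef>
#include <new>
#include <type_traits>

class singleton_counter {
    std::atomic<std::size_t> _ref_cnt{ 0 };
public:     // public here only to keep the sketch short
    void inc_ref() noexcept { _ref_cnt.fetch_add( 1, std::memory_order_relaxed ); }
    bool dec_ref_is_last() noexcept {
        return _ref_cnt.fetch_sub( 1, std::memory_order_acq_rel ) == 1;
    }
};

template<typename T>
class alignas(T) singleton_buf {
    unsigned char _buf[ sizeof(T) ];
public:
    void* address() noexcept { return _buf; }
};

template<typename T>
class singleton_init {
    static_assert( std::is_base_of_v<singleton_counter,T> );
    T *_singleton;
public:
    explicit singleton_init( void *buf ) {
        static T *const obj = new (buf) T{};    // runs once per T
        _singleton = obj;
        _singleton->inc_ref();
    }
    ~singleton_init() {
        if ( _singleton->dec_ref_is_last() )
            _singleton->~T();
    }
    singleton_init( singleton_init const& ) = delete;
    singleton_init& operator=( singleton_init const& ) = delete;
};

// Hypothetical singleton type that records its lifetime events.
struct Journal : singleton_counter {
    static inline int ctor_runs = 0, dtor_runs = 0;
    Journal()  { ++ctor_runs; }
    ~Journal() { ++dtor_runs; }
};

static singleton_buf<Journal> journal_buf;

// Call once only: after the last initializer is gone the Journal is dead,
// and the function-local static will not construct it a second time.
bool generalized_demo() {
    {
        singleton_init<Journal> tu1{ journal_buf.address() };
        {
            singleton_init<Journal> tu2{ journal_buf.address() };
        }
        if ( Journal::ctor_runs != 1 || Journal::dtor_runs != 0 )
            return false;
    }
    return Journal::ctor_runs == 1 && Journal::dtor_runs == 1;
}
```

The two local singleton_init objects play the role of two TUs’ static initializers: the Journal is constructed once and destroyed only when the last of them is destroyed.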

Retrofitting the Original Example

Retrofitting the new generalized classes onto the Stream example would result in:

// Stream.h

class Stream : private singleton_counter {
    Stream();
    ~Stream();

    template<Singleton T>
    friend class singleton_init;
public:
    // ...
};

extern Stream &stream;
static singleton_init<Stream> const stream_init{ &stream };

// Stream.cpp

static constinit singleton_buf<Stream> stream_buf;
Stream &stream = stream_buf.ref();

Notice that:

Stream uses private inheritance rather than public to embed the nifty counter inside the object as an implementation detail. Users of Stream objects don’t need to know or care.
The constructor and destructor are private since only stream_init should be allowed to construct and destruct Stream objects — which is why singleton_init is declared as a friend.

The original code could have (and should have) made the constructor and destructor private as well.

Issues with the Original Code

I mentioned that there are what I believe to be two issues with the original code:

The fact that nifty_counter isn’t atomic.

The fact that this code:

if ( nifty_counter++ == 0 )
    new ( &stream ) Stream{};

can allow a race condition to happen.

Both of these issues arise from the fact that the C++20 standard (§6.9.3.3¶5) does not guarantee that global objects are initialized before either main() is entered or a thread is spawned.

The first issue can be fixed easily by making nifty_counter atomic. However, that doesn’t fix the second issue because the counter increment, the check for zero, and the construction of Stream have to be done together as a single transaction.

Consider the following sequence of events with two threads, T1 and T2, where nifty_counter starts at 0:

T1 increments nifty_counter to 1, compares its old value (0) to 0, sees that it’s equal, and calls Stream’s constructor that starts running.
T2 increments nifty_counter to 2, compares its old value (1) to 0, sees that it’s not equal, and simply returns.
T2 now begins using stream before its constructor has finished on T1: boom, you’re dead.

To fix the original code, you would need to add a mutex:

std::mutex stream_mux;

StreamInitializer::StreamInitializer() {
    std::lock_guard<std::mutex> lock{ stream_mux };
    if ( nifty_counter++ == 0 )
        new ( &stream ) Stream{};
}

In the generalized implementation, the second issue goes away because it uses a function-local static object that, as mentioned, C++ guarantees is thread-safe.
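For the mutex-based fix to the original code, here is a sketch that exercises it from several threads. It assumes the destructor takes the same lock as the constructor (the snippet above shows only the constructor), and the ctor_runs counter and the construct_from_threads() helper are hypothetical names added for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

struct Stream {
    static inline int ctor_runs = 0;    // hypothetical instrumentation
    Stream() { ++ctor_runs; }
};

static std::size_t nifty_counter;
static std::mutex  stream_mux;
alignas(Stream) static std::byte stream_buf[ sizeof(Stream) ];

struct StreamInitializer {
    StreamInitializer() {
        std::lock_guard<std::mutex> lock{ stream_mux };
        if ( nifty_counter++ == 0 )     // increment, test, and construct as one unit
            new ( stream_buf ) Stream{};
    }
    ~StreamInitializer() {
        // Assumption: destruction is guarded by the same mutex.
        std::lock_guard<std::mutex> lock{ stream_mux };
        if ( --nifty_counter == 0 )
            std::launder( reinterpret_cast<Stream*>( stream_buf ) )->~Stream();
    }
};

// Constructs one initializer per thread, keeps them all alive until every
// thread has finished, and returns how many times Stream was constructed.
int construct_from_threads( unsigned n_threads ) {
    std::vector<std::unique_ptr<StreamInitializer>> inits( n_threads );
    std::vector<std::thread> threads;
    for ( unsigned i = 0; i < n_threads; ++i )
        threads.emplace_back( [&inits, i] {
            inits[i] = std::make_unique<StreamInitializer>();
        } );
    for ( auto &t : threads )
        t.join();
    return Stream::ctor_runs;   // the initializers are destroyed on return
}
```

Because a thread that loses the race blocks on the mutex until the winner has finished constructing the Stream, no thread can leave the StreamInitializer constructor while the Stream is only half built.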

Singletons are Bad

As mentioned in the introduction, you generally should avoid using singletons except for either trivial programs or in very specific cases. Why?

Singletons (that are really just global variables in prettier packaging) are bad for the same reasons global variables are bad.
It’s sometimes the case that, as a program grows, it turns out that you actually need more than one “singleton” object. Retroactively fixing this is difficult so it would have been better to future-proof your program by not using singletons in the first place.
Singletons make it difficult to test your program. For example, if you have a singleton that accesses a database and uses it like:
```
void insert_record( record const *r ) {
    auto connection = Database::instance()->connect();
    // ...
}
```
but you want to test how your program behaves when the database returns errors, you’d like to mock the database so that it always returns specific errors to test against. That’s very hard to do when you’ve hard-coded Database::instance() throughout your program.

Alternative to Singletons

What’s an alternative to using singletons? Pass pointers to “context” objects (in this example, Database*) around instead:

void insert_record( Database *db, record const *r ) {
    auto connection = db->connect();
    // ...
}

Yes, it’s more verbose to have to pass around a Database* all over the place, but it’s much easier to mock because you can create a DatabaseForTest class that’s derived from Database, where all member functions are virtual, so you can override them to do whatever you need for your test. In the long run, the added verbosity is outweighed by the added flexibility.
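A minimal sketch of that idea follows. The article doesn’t specify the Database interface, so everything here is an assumption made for illustration: connect() returns a bool rather than a connection object, record has a single key field, and insert_record() reports success or failure.

```cpp
#include <string>

struct record { std::string key; };

// Hypothetical interface: connect() reports success or failure.
class Database {
public:
    virtual ~Database() = default;
    virtual bool connect() { return true; /* real connection logic goes here */ }
};

// Test double that always simulates a connection failure.
class DatabaseForTest : public Database {
public:
    bool connect() override { return false; }
};

// The function under test depends only on the Database* it is handed.
bool insert_record( Database *db, record const *r ) {
    if ( !db->connect() )
        return false;           // the error path a test wants to exercise
    return !r->key.empty();     // placeholder for the real insertion
}
```

A test can now pass a DatabaseForTest* to insert_record() and check the error path directly, which isn’t possible when Database::instance() is hard-coded inside the function.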

Specific Cases for Singletons

So what are the specific cases when using singletons is OK? One such case is when:

The behavior of the singleton in no way affects the behavior of your program; and:
You never test your program in response to what the singleton does.

An easy example is the aforementioned iostreams of cin, cout, and cerr. You just print stuff to cout and cerr. Your program (reasonably) assumes that cout and cerr are always there and “just work”: you never test cout or cerr for failure. Another similar example would be a logging library where it’s reasonable to have something like:

Logger::instance()->info( "hello, world!" );

Conclusion

Singletons, while often convenient, make your code more difficult to maintain and test in the long run and therefore should generally be avoided. But, if you are going to use them, at least implement them correctly using the techniques shown.

Appendix 1: Atomic Memory Orders

The code in singleton_counter includes:

void inc_ref() noexcept {
    _ref_cnt.fetch_add( 1, std::memory_order_relaxed );
}

bool dec_ref_is_last() noexcept {
    return _ref_cnt.fetch_sub( 1, std::memory_order_acq_rel ) == 1;
}

Both fetch_add() and fetch_sub() use memory_order_seq_cst (the safest, but least performant memory_order) by default when the second argument is omitted.

When using an atomic as a reference counter, the more performant memory orders of memory_order_relaxed can be used for fetch_add() and memory_order_acq_rel can be used for fetch_sub(). However, why that’s the case is a story for another time.
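Without telling the whole story, the usual reasoning for reference counts is roughly this: an increment only has to be atomic, because nothing else is published through it, while a decrement needs release semantics so that a thread’s earlier writes to the object are visible, and acquire semantics so that whichever thread performs the final decrement sees all of those writes before it runs the destructor. The sketch below checks only the counting behavior, not the ordering: however the releasing threads interleave, exactly one of them is told it held the last reference. The ref_counter class and count_last_releases() helper are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

class ref_counter {
    std::atomic<std::size_t> _ref_cnt{ 0 };
public:
    void inc_ref() noexcept {
        _ref_cnt.fetch_add( 1, std::memory_order_relaxed );
    }
    bool dec_ref_is_last() noexcept {
        // fetch_sub() returns the value held before the decrement.
        return _ref_cnt.fetch_sub( 1, std::memory_order_acq_rel ) == 1;
    }
};

// Takes n_refs references up front, releases each one on its own thread, and
// returns how many threads were told they held the last reference.
int count_last_releases( unsigned n_refs ) {
    ref_counter counter;
    for ( unsigned i = 0; i < n_refs; ++i )
        counter.inc_ref();

    std::atomic<int> last_seen{ 0 };
    std::vector<std::thread> threads;
    for ( unsigned i = 0; i < n_refs; ++i )
        threads.emplace_back( [&counter, &last_seen] {
            if ( counter.dec_ref_is_last() )
                ++last_seen;
        } );
    for ( auto &t : threads )
        t.join();
    return last_seen.load();
}
```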

Appendix 2: Singleton Constructors with Parameters

The singleton_init class as implemented above contains:

        static auto const _ = new (buf) T{};

But what if T doesn’t have a default constructor? Or, even if it does, you want to pass arguments to a non-default constructor? You can augment singleton_init to take any number of arguments (including zero) for T’s constructor as follows:

template<Singleton T>
class singleton_init {
public:
    template<typename... Args>
    explicit singleton_init( T *buf, Args&&... args )
        noexcept( noexcept( T{ std::forward<Args>( args )... } ) ) :
        _singleton{ buf }
    {
        static auto const _ = new (buf) T{ std::forward<Args>( args )... };
        _singleton->inc_ref();
    }

    // ...

Why didn’t I just show this implementation initially? The added template and use of std::forward() would have been too distracting from the main points.
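The forwarding part can be tried in isolation. The sketch below pulls the construct-once line out into a free function; construct_once(), Config, and config_buf are hypothetical names for illustration, not part of singleton_init.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical type with no default constructor.
struct Config {
    std::string name;
    int         level;
    Config( std::string n, int l ) : name{ std::move( n ) }, level{ l } { }
};

alignas(Config) static unsigned char config_buf[ sizeof(Config) ];

// Constructs a T in buf from the forwarded arguments the first time it is
// called; later calls (with the same argument types) return the same object
// and ignore their arguments.
template<typename T, typename... Args>
T* construct_once( void *buf, Args&&... args ) {
    static T *const obj = new (buf) T{ std::forward<Args>( args )... };
    return obj;
}
```

Only the first call’s arguments are used; a second call with different values gets the already constructed object back unchanged. One caveat worth keeping in mind: a function-local static in a template exists once per specialization, so calls with different argument types would each have their own static and would each try to construct into the buffer. That shouldn’t arise when every TU gets the same initializer declaration from one header, but it’s something to be aware of.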
