C++: Is my Double-Checked Locking Pattern implementation right?

An example from Meyers's book Effective Modern C++, Item 16:

In a class caching an expensive-to-compute int, you might try to use a
pair of std::atomic variables instead of a mutex:

class Widget {
public:
    int magicValue() const {
        if (cacheValid) {
            return cachedValue;
        } else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2;
            cacheValid = true;
            return cachedValue;
        }
    }
private:
    mutable std::atomic<bool> cacheValid { false };
    mutable std::atomic<int> cachedValue;
};

This will work, but sometimes it will work a lot harder than it
should. Consider: A thread calls Widget::magicValue, sees cacheValid as
false, performs the two expensive computations, and assigns their sum
to cachedValue. At that point, a second thread calls
Widget::magicValue, also sees cacheValid as false, and thus carries
out the same expensive computations that the first thread has just
finished.
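
To make the scenario in the quote concrete, here is a minimal sketch that forces it to happen. The counters `computations` and `arrived`, the spin loop standing in for slow work, and the helper `countComputationsWithTwoThreads` are all hypothetical instrumentation added for illustration; they are not part of the book's example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical instrumentation, not part of the book's example.
std::atomic<int> computations{0};  // times the expensive work ran
std::atomic<int> arrived{0};       // threads that got past the cacheValid check

int expensiveComputation1() {
    ++computations;
    ++arrived;
    // Stand-in for slow work: spin until both threads are in here, so that
    // neither can have set cacheValid before the other tested it.
    while (arrived.load() < 2) std::this_thread::yield();
    return 40;
}

int expensiveComputation2() { return 2; }

class Widget {
public:
    int magicValue() const {
        if (cacheValid) return cachedValue;
        auto val1 = expensiveComputation1();
        auto val2 = expensiveComputation2();
        cachedValue = val1 + val2;
        cacheValid = true;
        return cachedValue;
    }
private:
    mutable std::atomic<bool> cacheValid{false};
    mutable std::atomic<int> cachedValue{0};
};

// Runs magicValue() on two threads and reports how often the work was done.
int countComputationsWithTwoThreads() {
    Widget w;
    int r1 = 0, r2 = 0;
    std::thread t1([&] { r1 = w.magicValue(); });
    std::thread t2([&] { r2 = w.magicValue(); });
    t1.join();
    t2.join();
    assert(r1 == 42 && r2 == 42);  // the result is still correct...
    return computations.load();    // ...but the work ran twice
}
```

Calling `countComputationsWithTwoThreads()` once from `main` returns 2: the program is well defined because both members are atomic, but the expensive work is duplicated.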

He then gives a solution that uses a mutex:

class Widget {
public:
    int magicValue() const {
        std::lock_guard<std::mutex> guard(m);
        if (cacheValid) {
            return cachedValue;
        } else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2;
            cacheValid = true;
            return cachedValue;
        }
    }
private:
    mutable std::mutex m;
    mutable bool cacheValid { false };
    mutable int cachedValue;
};

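But I don't think that solution is very efficient, so I'm considering combining a mutex with atomics to form a Double-Checked Locking Pattern, as shown below.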

class Widget {
public:
    int magicValue() const {
        if (!cacheValid) {
            std::lock_guard<std::mutex> guard(m);
            if (!cacheValid) {
                auto val1 = expensiveComputation1();
                auto val2 = expensiveComputation2();

                cachedValue = val1 + val2;
                cacheValid = true;
            }
        }
        return cachedValue;
    }
private:
    mutable std::mutex m;
    mutable std::atomic<bool> cacheValid { false };
    mutable std::atomic<int> cachedValue;
};

Because I'm new to multithreaded programming, I'd like to know:

Is my code right?
Does it perform better?

EDIT:

Fixed the code: if (!cachedValue) -> if (!cacheValid)

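Comments

In principle, yes, you are right. But if we assume the computation takes far longer than setting the atomic, your scenario is very unlikely, and using atomics avoids the expensive mutex.
I think your approach is correct and better than the previous two. It simply illustrates the double-checked locking pattern.
I think this is not correct if a second thread evaluates cacheValid after the first thread has evaluated it but before the guard is instantiated. I think the effect could be the same as in the first example.
Without seeing how cachedValue is set to false, it's hard to tell whether your code is legal/safe.
@Lingxi I just fixed the code.
@DavidSchwartz mutable std::atomic<bool> cacheValid { false };. magicValue can only be called on a constructed object, so cacheValid must already have been initialized to false by then.
@HappyCactus I've fixed the typo, sorry for misleading you.
@prehistoricpenguin: The cachedValid variable is actually named cacheValid. Please fix this in all the code you provide (even in the snippets from the book).
@Tsyvarev Fixed, thanks.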

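As HappyCactus pointed out, the second check if (!cachedValue) should actually be if (!cacheValid). Apart from that typo, I think your demonstration of the Double-Checked Locking Pattern is correct. However, I don't think it's necessary to use std::atomic on cachedValue. The only place cachedValue is written is cachedValue = val1 + val2;. Until that write has completed, no thread can reach the statement return cachedValue;, which is the only place cachedValue is read. So a write and a read can never happen at the same time, and concurrent reads are not a problem.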

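Comments

Making it atomic is necessary: cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html. Because the flag is atomic, its write is synchronized across processors, while the write to the non-atomic value is not. So you could see cacheValid = true but cachedValue = /* uninitialized on this processor */
@BillyONeal No. The memory ordering used on cacheValid is memory_order_seq_cst, which establishes the required synchronizes-with relationship. When cacheValid is read as true, the modification of cachedValue must also be visible on that same processor. That is precisely why atomics can be used to order non-atomics.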
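
The point in the last comment, that an atomic flag can order accesses to a plain variable, can be sketched with a small publish/read example. The names `payload`, `ready`, `producer`, `consumer` and `publishAndRead` are hypothetical and chosen only for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // atomic flag that publishes the data

// Writer: fill in the data first, then raise the flag.
void producer() {
    payload = 42;
    ready.store(true);  // default memory_order_seq_cst, which implies release
}

// Reader: once the flag is seen as true, the earlier write is visible too.
int consumer() {
    while (!ready.load()) std::this_thread::yield();  // seq_cst implies acquire
    return payload;  // no data race: the store above happens-before this read
}

int publishAndRead() {
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    assert(seen == 42);
    return seen;
}
```

The reader never touches `payload` until it has observed `ready == true`, which is the same discipline the double-checked version follows with cacheValid and cachedValue.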

Is my code right?

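Yes. Your application of the Double-Checked Locking Pattern is correct. But see below for some improvements.

Does it perform better?

Compared to the fully locked variant (the second in your post), it mostly performs better, unless magicValue() is called only once (and even in that case the performance loss is negligible).

Compared to the lockless variant (the first in your post), your code performs better unless the value computation is faster than waiting on the mutex.

For example, summing 10 values is (usually) faster than waiting on a mutex. In that case the first variant is preferable. On the other hand, 10 reads from a file are slower than waiting on a mutex, so your variant is better than the first.

Actually, there are some simple improvements to your code that make it faster (at least on some machines) and easier to understand:

1. The cachedValue variable doesn't need atomic semantics at all. It is protected by the cacheValid flag, whose atomicity does all the work. Moreover, a single atomic flag can protect several non-atomic values.

2. Also, as noted in this answer https://stackoverflow.com/a/30049946/3440745, you don't need sequentially consistent ordering (which is applied by default when you read or write an atomic variable) when accessing the cacheValid flag. Release-acquire ordering is sufficient.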

class Widget {
public:
    int magicValue() const {
        // Acquire semantics when reading the flag.
        if (!cacheValid.load(std::memory_order_acquire)) {
            std::lock_guard<std::mutex> guard(m);
            // Reading the flag with the mutex held doesn't require any memory ordering.
            if (!cacheValid.load(std::memory_order_relaxed)) {
                auto val1 = expensiveComputation1();
                auto val2 = expensiveComputation2();

                cachedValue = val1 + val2;
                // Release semantics when writing the flag.
                cacheValid.store(true, std::memory_order_release);
            }
        }
        return cachedValue;
    }
private:
    mutable std::mutex m;
    mutable std::atomic<bool> cacheValid { false };
    mutable int cachedValue; // Atomic isn't needed here.
};
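
For anyone who wants to try the pattern, here is a self-contained sketch of the same acquire/release double-checked locking with a small multi-threaded check. The stub computations, the `computations` counter and the `computedOnce` helper are hypothetical additions for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

std::atomic<int> computations{0};  // hypothetical counter, used only by the check

int expensiveComputation1() { ++computations; return 40; }  // stand-in
int expensiveComputation2() { return 2; }                   // stand-in

class Widget {
public:
    int magicValue() const {
        if (!cacheValid.load(std::memory_order_acquire)) {      // first check, no lock
            std::lock_guard<std::mutex> guard(m);
            if (!cacheValid.load(std::memory_order_relaxed)) {  // second check, locked
                cachedValue = expensiveComputation1() + expensiveComputation2();
                cacheValid.store(true, std::memory_order_release);
            }
        }
        return cachedValue;
    }
private:
    mutable std::mutex m;
    mutable std::atomic<bool> cacheValid{false};
    mutable int cachedValue = 0;
};

// Calls magicValue() from several threads; returns true if every thread saw
// 42 and the expensive work ran exactly once.
bool computedOnce(int threadCount) {
    Widget w;
    std::vector<int> results(threadCount, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&w, &results, i] { results[i] = w.magicValue(); });
    for (auto& t : threads) t.join();
    for (int r : results)
        if (r != 42) return false;
    return computations.load() == 1;
}
```

Calling `computedOnce(8)` once from `main` returns true. Threads that lose the race either wait on the mutex and then see the flag set, or skip the lock entirely after the acquire load.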

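Comments

cachedValue doesn't need to be atomic in this solution, because writes to it are protected by the mutex. In the code from Meyers' book (and in the solution suggested by Maxim Egorushkin below), there is no mutex preventing multiple threads from assigning to cachedValue simultaneously (all of them having seen cacheValid as false at the top of the function). In that case it is important that cachedValue be atomic, because otherwise simultaneous writes to it yield undefined behavior.
@KnowItAllWannabe: cachedValue is already non-atomic in this solution. If you mean cacheValid, it should be atomic because it is (first) checked outside the mutex's protection. That is the main feature of DCLP: the flag should be volatile (in languages where that term relates to threading, e.g. Java) or atomic.

My comment was just that: a comment. I agree that the given implementation is fine (and said so), but in point 1 above the code you wrote "cachedValue variable doesn't need atomic semantics at all [because] it is protected by the cacheValid flag." That's not true. cachedValue doesn't have to be atomic because it is protected by the mutex. As I pointed out in my comment on Maxim Egorushkin's answer below, without the mutex both cacheValid and cachedValue would have to be atomic.
@KnowItAllWannabe: The expression return cachedValue; is not protected by the mutex, so you cannot say that the cachedValue variable is protected by the mutex. The mutex protects only some of the accesses, and prevents concurrent writes.
The return statement is a read. Simultaneous reads are not a problem, so they need no protection.
@KnowItAllWannabe: No. Access to a variable is said to be protected by a mutex only when all accesses to that variable are performed with the mutex locked. Without proper accesses to the cacheValid variable, the given example would operate incorrectly (and break) on the cachedValue variable.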

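You can make your solution slightly more efficient by relaxing the memory ordering requirements. The default sequentially consistent memory ordering of atomic operations isn't needed here.

The performance difference may be negligible on x86 but noticeable on ARM, because sequentially consistent memory ordering is expensive on ARM. See Herb Sutter's "Strong" and "Weak" Hardware Memory Models for details.

Suggested change: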

class Widget {
public:
    int magicValue() const {
        if (cacheValid.load(std::memory_order_acquire)) { // Acquire semantics.
            return cachedValue;
        } else {
            auto val1 = expensiveComputation1();
            auto val2 = expensiveComputation2();

            cachedValue = val1 + val2; // Non-atomic write.

            // Release semantics.
            // Prevents compiler and CPU store reordering.
            // Makes this and preceding stores by this thread visible to other threads.
            cacheValid.store(true, std::memory_order_release);
            return cachedValue;
        }
    }
private:
    mutable std::atomic<bool> cacheValid { false };
    mutable int cachedValue; // Non-atomic.
};

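Comments

Have you considered this scenario: a thread calls Widget::magicValue, sees cacheValid as false, performs the two expensive computations, and assigns their sum to cachedValue. At that point, a second thread calls Widget::magicValue, also sees cacheValid as false, and thus carries out the same expensive computations that the first thread has just finished.
@prehistoricpenguin If computing cachedValue must happen only once, you may prefer to use a mutex.
Downvoted, because if two threads read cacheValid as false, both will compute cachedValue, and if both assign to cachedValue at the same time, you have two threads writing a non-atomic simultaneously, which yields undefined behavior.
@KnowItAllWannabe: Simultaneously writing the same value to a variable is OK on all known C++ implementations. The C++ standard probably notes this too, but I don't remember where.
@Tsyvarev: As far as I know, it is undefined behavior, and this paper has an example of how it can go wrong.
@KnowItAllWannabe: Section 2.4 of the paper you cite says that it seems unlikely that any conventional architecture would implement stores in such a way that rewriting the same bits yields a final value different from either written value. None of the other examples apply to the given code, because it doesn't access cachedValue before writing it. The only exception is "note 6", which describes SPARC's "block initializing store" instruction.
@Tsyvarev: As far as undefined behavior is concerned, the only document that matters is the C++ standard. As far as I know, simultaneous writes to a single memory location always yield undefined behavior, even if the two written values are the same. The paper I cited simply explains situations, even theoretical ones, in which the UB could show up. Do you know of anything in the standard saying that unsynchronized writes to a single memory location have well-defined behavior if both writes store the same value?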
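
One way to sidestep the disagreement above, following the earlier comment that without a mutex both members must be atomic, is to keep the lock-free shape but make cachedValue a std::atomic<int> accessed with relaxed ordering. This is only a sketch under the assumption that every thread computes the same value; the stub computations and the `allThreadsAgree` helper are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

int expensiveComputation1() { return 40; }  // hypothetical stand-in
int expensiveComputation2() { return 2; }   // hypothetical stand-in

class Widget {
public:
    int magicValue() const {
        if (cacheValid.load(std::memory_order_acquire))
            return cachedValue.load(std::memory_order_relaxed);
        auto val1 = expensiveComputation1();
        auto val2 = expensiveComputation2();
        // Several threads may reach this store at once. Because cachedValue
        // is atomic, that is a duplicate write, not a data race.
        cachedValue.store(val1 + val2, std::memory_order_relaxed);
        cacheValid.store(true, std::memory_order_release);
        return val1 + val2;
    }
private:
    mutable std::atomic<bool> cacheValid{false};
    mutable std::atomic<int> cachedValue{0};
};

// Returns true if every thread obtained the same, fully computed value.
bool allThreadsAgree(int threadCount) {
    Widget w;
    std::vector<int> results(threadCount, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < threadCount; ++i)
        threads.emplace_back([&w, &results, i] { results[i] = w.magicValue(); });
    for (auto& t : threads) t.join();
    for (int r : results)
        if (r != 42) return false;
    return true;
}
```

The work may still be repeated by threads that arrive before the flag is set, so this trades redundant computation for the absence of a lock, but it no longer depends on how an implementation treats racing non-atomic writes.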

Not correct:

int magicValue() const {
if (!cachedValid) {

// this part is unprotected, what if a second thread evaluates
// the previous test when this first is here? it behaves
// exactly like in the first example.

std::lock_guard<std::mutex> guard(m);
if (!cachedValue) {
auto val1 = expensiveComputation1();
auto val2 = expensiveComputation2();