MySQL: Select max value on multiple tables, without counting them twice

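I am working on a query that lets me order recipes by score.

Table structure

A flyer contains one or more flyer_items, and each flyer item can have one or more rows in ingredient_to_flyer_item (this table links ingredients to flyer items). Another table, ingredient_to_recipe, links the same ingredients to one or more recipes. A link to a .sql file is included at the end.

Example query

I want to get the recipe_id and the sum of the MAX price_weight of each ingredient that is part of the recipe (linked through ingredient_to_recipe), but if a recipe has several ingredients that belong to the same flyer_item, that flyer item should be counted only once.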

SELECT itr.recipe_id,
       SUM(itr.weight),
       SUM(max_price_weight),
       SUM(itr.weight + max_price_weight) AS score
FROM (
    SELECT MAX(itf.max_price_weight) AS max_price_weight,
           itf.flyer_item_id,
           itf.ingredient_id
    FROM (
        SELECT ifi.ingredient_id,
               MAX(i.price_weight) AS max_price_weight,
               ifi.flyer_item_id
        FROM flyer_items i
        JOIN ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
        WHERE i.flyer_id IN (1, 2)
        GROUP BY ifi.ingredient_id
    ) itf
    GROUP BY itf.flyer_item_id
) itf2
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
WHERE recipe_id = 5730
GROUP BY itr.`recipe_id`
ORDER BY score DESC
LIMIT 0,10

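The query almost works: most results are correct, but for some rows certain ingredients are ignored and do not count toward the score as they should.

Test cases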

| recipe_id | score with current query | what score should be | explanation |
|---|---|---|---|
| 8376 | 51 | 51 | Good result |
| 3152 | 1 | 18 | Only 1 ingredient having a score of one is counted, should be 4 ingredients |
| 4771 | 41 | 45 | One ingredient worth score 4 is ignored |
| 10230 | 40 | 40 | Good result |
| 8958 | 39 | 39 | Good result |
| 4656 | 28 | 34 | One ingredient worth 6 is ignored |
| 11338 | 1 | 10 | 2 ingredients, worth 4 and 5 are ignored |

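I am having a hard time finding a simple way to explain this. Let me know if anything else would help.

Here is a link to a demo database for running the query, the example, and the test cases: https://nofile.io/f/F4YSEu8DWmT/meta.zip

Thanks a lot.

Update (as asked by Rick James):

This is the furthest I could get. The results are always correct, in the subquery too, but I have completely taken out the grouping by 'flyer_item_id'. So with this query I get the right score per ingredient, but if several ingredients of a recipe belong to the same flyer_item, they are counted multiple times (for recipe_id = 10557 the score would be 59 instead of the correct 56, because 2 ingredients worth 3 are in the same flyer_item). The only thing left to do is to count one MAX(price_weight) per flyer_item_id per recipe (which I originally tried to do by grouping by 'flyer_item_id' on top of the first GROUP BY ingredient_id).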

SELECT itr.recipe_id,
       SUM(itr.weight) as total_ingredient_weight,
       SUM(itf.price_weight) as total_price_weight,
       SUM(itr.weight+itf.price_weight) as score
FROM (
    SELECT fi1.id,
           MAX(fi1.price_weight) as price_weight,
           ingredient_to_flyer_item.ingredient_id as ingredient_id,
           recipe_id
    FROM flyer_items fi1
    INNER JOIN (
        SELECT flyer_items.id as id,
               MAX(price_weight) as price_weight,
               ingredient_to_flyer_item.ingredient_id as ingredient_id
        FROM flyer_items
        JOIN ingredient_to_flyer_item ON flyer_items.id = ingredient_to_flyer_item.flyer_item_id
        GROUP BY id
    ) fi2 ON fi1.id = fi2.id AND fi1.price_weight = fi2.price_weight
    JOIN ingredient_to_flyer_item ON fi1.id = ingredient_to_flyer_item.flyer_item_id
    JOIN ingredient_to_recipe ON ingredient_to_flyer_item.ingredient_id = ingredient_to_recipe.ingredient_id
    GROUP BY ingredient_to_flyer_item.ingredient_id
) AS itf
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itf`.`ingredient_id` = `itr`.`ingredient_id`
GROUP BY `itr`.`recipe_id`
ORDER BY `score` DESC
LIMIT 10

Here is the EXPLAIN, though I am not sure it is useful since the last working part is still missing:

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | itr | NULL | ALL | recipe_id,ingredient_id | NULL | NULL | NULL | 151800 | 100.00 | Using temporary; Using filesort | |
| 1 | PRIMARY | <derived2> | NULL | ref | | | 4 | metadata3.itr.ingredient_id | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | fi1 | NULL | eq_ref | id_2,id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
| 2 | DERIVED | <derived3> | NULL | ref | | | 9 | metadata3.ingredient_to_flyer_item.flyer_item_id,m… | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_recipe | NULL | ref | ingredient_id | ingredient_id | 4 | metadata3.ingredient_to_flyer_item.ingredient_id | 40 | 100.00 | NULL | |
| 3 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 3 | DERIVED | flyer_items | NULL | eq_ref | id_2,id,flyer_id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |

Update 2

I managed to find a query that works, but now I have to make it faster: it takes over 500 ms to run.

SELECT sum(ff.price_weight) as price_weight,
       sum(ff.weight) as weight,
       sum(ff.price_weight+ff.weight) as score,
       ff.recipe_id
FROM (
    SELECT DISTINCT
           itf.flyer_item_id as flyer_item_id,
           itf.recipe_id,
           itf.weight,
           aprice_weight AS price_weight
    FROM (
        SELECT itfin.flyer_item_id AS flyer_item_id,
               itfin.price_weight AS aprice_weight,
               itfin.ingredient_id,
               itr.recipe_id,
               itr.weight
        FROM (
            SELECT ifi2.flyer_item_id,
                   ifi2.ingredient_id as ingredient_id,
                   MAX(ifi2.price_weight) as price_weight
            FROM ingredient_to_flyer_item ifi1
            INNER JOIN (
                SELECT id,
                       MAX(price_weight) as price_weight,
                       ingredient_to_flyer_item.ingredient_id as ingredient_id,
                       ingredient_to_flyer_item.flyer_item_id
                FROM ingredient_to_flyer_item
                GROUP BY ingredient_id
            ) ifi2 ON ifi1.price_weight = ifi2.price_weight AND ifi1.ingredient_id = ifi2.ingredient_id
            WHERE flyer_id IN (1,2)
            GROUP BY ifi1.ingredient_id
        ) AS itfin
        INNER JOIN `ingredient_to_recipe` AS `itr` ON `itfin`.`ingredient_id` = `itr`.`ingredient_id`
    ) AS itf
) ff
GROUP BY recipe_id
ORDER BY `score` DESC
LIMIT 20

Here is the EXPLAIN:

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1318 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | <derived4> | NULL | ALL | NULL | NULL | NULL | NULL | 37 | 100.00 | Using temporary | |
| 2 | DERIVED | itr | NULL | ref | ingredient_id | ingredient_id | 4 | itfin.ingredient_id | 35 | 100.00 | NULL | |
| 4 | DERIVED | <derived5> | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 4 | DERIVED | ifi1 | NULL | ref | ingredient_id,itx_full,price_weight,flyer_id | ingredient_id | 4 | ifi2.ingredient_id | 1 | 12.50 | Using where | |
| 5 | DERIVED | ingredient_to_flyer_item | NULL | index | ingredient_id,itx_full | ingredient_id | 4 | NULL | 249 | 100.00 | NULL | |

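Related discussion

Could you please provide an MCVE as text only, rather than as a zip file?
@Strawberry Even for a simple example, that is a lot to put inline. :\
Remove the ORDER BY price_weight DESC. Sorting in a subquery makes no sense.
In the subquery you have the column flyer_id. You are not grouping by this column, so you should use an aggregate function on it, e.g. max(flyer_id). The same error exists for the ingredient_id column.
Please prefix the columns. We cannot guess which tables the columns come from.
@TheImpaler The query has been updated following your suggestions.
Have you verified that the subqueries produce the correct values?
@RickJames I have updated the question with a version that is less complete, but where I am sure the data is correct up to the missing part.
@JeffB. - One of the subqueries is SELECT id, …, but id is never used. Get rid of it. And check for other small mistakes.
@JeffB. - To better focus on the 500ms problem, please time each subquery that can be cleanly extracted from the big query. That may help narrow down what we need to focus on.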

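I have been meaning to look at this, but unfortunately did not have the time until now. I think this query will give you the results you are looking for.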

SELECT recipe_id,
       SUM(weight) AS weight,
       SUM(max_price_weight) AS price_weight,
       SUM(weight + max_price_weight) AS score
FROM (
    SELECT recipe_id, ingredient_id, MAX(weight) AS weight, MAX(price_weight) AS max_price_weight
    FROM (
        SELECT itr.recipe_id,
               MIN(itr.ingredient_id) AS ingredient_id,
               MAX(itr.weight) AS weight,
               fi.id,
               MAX(fi.price_weight) AS price_weight
        FROM ingredient_to_recipe itr
        JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
        JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
        GROUP BY itr.recipe_id, fi.id
    ) ri
    GROUP BY recipe_id, ingredient_id
) r
GROUP BY recipe_id
ORDER BY score DESC
LIMIT 10

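It groups first by flyer item (per recipe) and then by MIN(ingredient_id), to account for ingredients within a recipe that share the same flyer_item_id. It then sums the results to get the score you want. If I use the query with a

HAVING recipe_id IN (8376, 3152, 4771, 10230, 8958, 4656, 11338)

clause, it gives the following results, which match your "what score should be" column above: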

| recipe_id | weight | price_weight | score |
|---|---|---|---|
| 8376 | 10 | 41 | 51 |
| 4771 | 5 | 40 | 45 |
| 10230 | 10 | 30 | 40 |
| 8958 | 15 | 24 | 39 |
| 4656 | 15 | 19 | 34 |
| 3152 | 0 | 18 | 18 |
| 11338 | 0 | 10 | 10 |

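I am not sure how fast this query will run on your system; on my laptop it is comparable to your query (I expected it to be slower). I am pretty sure some optimizations are possible, but again, I have not had time to look into them thoroughly.

I hope this helps you get a bit further toward a workable solution.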
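
As a minimal sketch of why the inner GROUP BY itr.recipe_id, fi.id matters, the following uses a tiny made-up data set. The rows and the reduced column lists are assumptions for illustration and are not taken from the demo database. Two of the three ingredients of the recipe sit on the same flyer item, so that item's price_weight should be counted once.

```sql
-- Hypothetical sample data; only the columns the queries need are created.
CREATE TABLE flyer_items (id INT PRIMARY KEY, flyer_id INT, price_weight INT);
CREATE TABLE ingredient_to_flyer_item (ingredient_id INT, flyer_item_id INT);
CREATE TABLE ingredient_to_recipe (recipe_id INT, ingredient_id INT, weight INT);

INSERT INTO flyer_items VALUES (100, 1, 3), (101, 1, 5);
-- Ingredients 1 and 2 are both on flyer item 100; ingredient 3 is on item 101.
INSERT INTO ingredient_to_flyer_item VALUES (1, 100), (2, 100), (3, 101);
-- Recipe 1 uses all three ingredients.
INSERT INTO ingredient_to_recipe VALUES (1, 1, 0), (1, 2, 0), (1, 3, 0);

-- Naive version: one joined row per ingredient, so item 100 is counted twice.
SELECT itr.recipe_id, SUM(fi.price_weight) AS naive_price_weight
FROM ingredient_to_recipe itr
JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
GROUP BY itr.recipe_id;
-- naive_price_weight = 3 + 3 + 5 = 11

-- Collapse to one row per (recipe, flyer item) first, then sum.
SELECT recipe_id, SUM(price_weight) AS price_weight
FROM (SELECT itr.recipe_id, fi.id, MAX(fi.price_weight) AS price_weight
      FROM ingredient_to_recipe itr
      JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
      JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
      GROUP BY itr.recipe_id, fi.id) ri
GROUP BY recipe_id;
-- price_weight = 3 + 5 = 8
```

The second query is a reduced form of the inner two levels of the full query; the weight column is left out here to keep the arithmetic easy to follow.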

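This sounds like "explode-implode". That is what happens when a query has both a JOIN and a GROUP BY:

The JOIN gathers the appropriate combinations of rows from the joined tables; then

the GROUP BY COUNTs, SUMs, etc., giving you inflated values for the aggregates.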
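
To make the inflation concrete, here is a small sketch with three hypothetical tables (customers, orders, payments are invented for this illustration and have nothing to do with the recipe schema). One customer has 2 orders and 3 payments, so the double JOIN produces 2 x 3 = 6 rows before the GROUP BY collapses them.

```sql
-- Hypothetical tables, used only to demonstrate the effect.
CREATE TABLE customers (id INT PRIMARY KEY);
CREATE TABLE orders   (customer_id INT, amount INT);
CREATE TABLE payments (customer_id INT, paid INT);

INSERT INTO customers VALUES (1);
INSERT INTO orders   VALUES (1, 10), (1, 20);
INSERT INTO payments VALUES (1, 5), (1, 6), (1, 7);

SELECT c.id,
       SUM(o.amount) AS sum_amount,  -- 90: each order is repeated once per payment (true total is 30)
       COUNT(*)      AS ct,          -- 6: the number of joined rows, not of orders
       MAX(o.amount) AS max_amount   -- 20: repeating rows does not change a maximum
FROM customers c
JOIN orders   o ON o.customer_id = c.id
JOIN payments p ON p.customer_id = c.id
GROUP BY c.id;
```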

There are two common fixes, both of which involve doing the aggregation separately from the JOIN.

Case 1:

SELECT …
( SELECT SUM(x) FROM t2 WHERE id = … ) AS sum_x,
…
FROM t1 …

This gets clumsy if you need multiple aggregates from t2, since it allows only one at a time.

Case 2:

SELECT …
FROM ( SELECT grp,
              SUM(x) AS sum_x,
              COUNT(*) AS ct
       FROM t2
       GROUP BY grp ) AS s
JOIN t1 ON t1.grp = s.grp
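
Both cases can be sketched on a small hypothetical data set. The customers, orders, and payments tables below are invented for this illustration; in the recipe problem the same pattern would apply to the flyer and recipe mapping tables.

```sql
-- Hypothetical tables: one customer with 2 orders and 3 payments.
CREATE TABLE customers (id INT PRIMARY KEY);
CREATE TABLE orders   (customer_id INT, amount INT);
CREATE TABLE payments (customer_id INT, paid INT);

INSERT INTO customers VALUES (1);
INSERT INTO orders   VALUES (1, 10), (1, 20);
INSERT INTO payments VALUES (1, 5), (1, 6), (1, 7);

-- Case 1: one correlated subquery per aggregate, no JOIN at the outer level.
SELECT c.id,
       (SELECT SUM(o.amount) FROM orders o   WHERE o.customer_id = c.id) AS sum_amount,  -- 30
       (SELECT SUM(p.paid)   FROM payments p WHERE p.customer_id = c.id) AS sum_paid     -- 18
FROM customers c;

-- Case 2: aggregate each child table in a derived table, then JOIN the results.
SELECT c.id, o.sum_amount, o.ct AS order_ct, p.sum_paid   -- 30, 2, 18
FROM customers c
JOIN (SELECT customer_id, SUM(amount) AS sum_amount, COUNT(*) AS ct
      FROM orders
      GROUP BY customer_id) AS o ON o.customer_id = c.id
JOIN (SELECT customer_id, SUM(paid) AS sum_paid
      FROM payments
      GROUP BY customer_id) AS p ON p.customer_id = c.id;
```

Because each derived table already has one row per customer, the outer JOIN no longer multiplies rows, and several aggregates can come from the same derived table.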

You have 2 JOINs and 3 GROUP BYs, so I recommend that you debug (and rewrite) your query from the inside out.

SELECT ifi.ingredient_id,
MAX(price_weight) as max_price_weight,
flyer_item_id
from flyer_items i
join ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
where flyer_id in (1, 2)
group by ifi.ingredient_id

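But I cannot help you with it, because you have not qualified price_weight with the table (or alias) it belongs to. (The same goes for some other columns.)

(Actually, MAX and MIN will not get inflated values; AVG will get slightly wrong values; COUNT and SUM will get "wrong" values.)

So I will leave the rest as an "exercise for the reader".

Indexes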

itr: (ingredient_id, recipe_id) -- for the JOIN and WHERE and GROUP BY
itr: (recipe_id, ingredient_id, weight) -- for 1st Update
(There is no optimization available for the ORDER BY and LIMIT)
flyer_items: (flyer_id, price_weight) -- unless flyer_id is the PRIMARY KEY
ifi: (flyer_item_id, ingredient_id)
ifi: (ingredient_id, flyer_item_id) -- for 1st Update
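
Written out as DDL, the list above might look like the sketch below. It assumes that itr and ifi stand for ingredient_to_recipe and ingredient_to_flyer_item, as in the question's queries; the index names are made up, and any index that already exists under another name should be skipped.

```sql
-- Index names are hypothetical; the column lists follow the list above.
ALTER TABLE ingredient_to_recipe
    ADD INDEX idx_itr_ingredient_recipe (ingredient_id, recipe_id),
    ADD INDEX idx_itr_recipe_ingredient_weight (recipe_id, ingredient_id, weight);

-- Not needed if flyer_id is already the PRIMARY KEY of flyer_items.
ALTER TABLE flyer_items
    ADD INDEX idx_fi_flyer_price (flyer_id, price_weight);

ALTER TABLE ingredient_to_flyer_item
    ADD INDEX idx_ifi_item_ingredient (flyer_item_id, ingredient_id),
    ADD INDEX idx_ifi_ingredient_item (ingredient_id, flyer_item_id);
```

Re-running EXPLAIN afterwards shows whether the optimizer actually picks the new indexes.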

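Please provide SHOW CREATE TABLE for the relevant tables.

Please provide EXPLAIN SELECT ….

If ingredient_to_flyer_item is a many:many mapping table, please follow the tips here. Ditto for ingredient_to_recipe?

GROUP BY itf.flyer_item_id is probably invalid, since it does not include the non-aggregated ifi.ingredient_id. See "only_full_group_by".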
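
One way to check this is to enable ONLY_FULL_GROUP_BY for the session and rerun the grouping. The sketch below uses a hypothetical table demo_ifi with invented rows, so it can be tried without touching the real data.

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TABLE demo_ifi (ingredient_id INT, flyer_item_id INT, price_weight INT);
INSERT INTO demo_ifi VALUES (1, 100, 3), (2, 100, 4);

-- Turn on strict GROUP BY checking for this session only.
SET SESSION sql_mode = CONCAT(@@sql_mode, ',ONLY_FULL_GROUP_BY');

-- Rejected with error 1055: ingredient_id is neither grouped nor aggregated,
-- so the server would otherwise pick an arbitrary value per group.
SELECT flyer_item_id, ingredient_id, MAX(price_weight) AS max_price_weight
FROM demo_ifi
GROUP BY flyer_item_id;

-- Accepted: every selected column is either grouped or aggregated.
SELECT flyer_item_id, MIN(ingredient_id) AS ingredient_id, MAX(price_weight) AS max_price_weight
FROM demo_ifi
GROUP BY flyer_item_id;
```

With the mode off, the first SELECT runs but the ingredient_id it returns for flyer item 100 is not deterministic, which is one way ingredients can silently drop out of a later join.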

Reformulate

After evaluating the INDEXes, try the following. Caution: I do not know whether it will work correctly. Change

JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`

to

JOIN ( SELECT recipe_id,
              ingredient_id,
              SUM(weight) AS sum_weight
       FROM ingredient_to_recipe
       GROUP BY recipe_id, ingredient_id ) AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`

and change the initial SELECT to use these computed sums in place of the SUMs. (I suspect I have not handled ingredient_id correctly.)

What version of MySQL/MariaDB are you running?
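
The details requested in this answer can be collected with a few statements; the table names are the ones used in the question's queries.

```sql
-- Server version (MySQL or MariaDB).
SELECT VERSION();

-- Current SQL mode, which shows whether ONLY_FULL_GROUP_BY is enabled.
SELECT @@sql_mode;

-- Full table definitions, including existing indexes.
SHOW CREATE TABLE flyer_items;
SHOW CREATE TABLE ingredient_to_flyer_item;
SHOW CREATE TABLE ingredient_to_recipe;
```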
