Java: Find all numbers that appear in each of a set of lists

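I have several ArrayLists of Integer objects, stored in a HashMap.

I want to get a list (ArrayList) of all the numbers (Integer objects) that appear in every one of those lists.

My thinking so far is:

1. Iterate through each ArrayList and put all the values into a HashSet.

This gives us a "listing" of all the values in the lists, with each value appearing only once.

2. Iterate through the HashSet.
2.1 On each iteration, call ArrayList.contains() on every list.
2.2 If none of the ArrayLists returns false, add the number to a "master list" that holds all the final values.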
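The two steps above can be sketched roughly as follows. This is only an illustration of the approach described in the question, not the asker's actual code: the class name `NaiveIntersection`, the method name `inAllLists`, and the choice to return an empty list when no lists are supplied are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NaiveIntersection {

    // Returns every value that occurs in all of the given lists.
    public static List<Integer> inAllLists(Collection<List<Integer>> lists) {
        List<Integer> master = new ArrayList<>();
        if (lists.isEmpty()) {
            return master; // assumption: no lists means no common values
        }

        // Step 1: collect each distinct value exactly once.
        Set<Integer> candidates = new HashSet<>();
        for (List<Integer> list : lists) {
            candidates.addAll(list);
        }

        // Step 2: keep a candidate only if no list reports it missing.
        for (Integer candidate : candidates) {
            boolean inEvery = true;
            for (List<Integer> list : lists) {
                if (!list.contains(candidate)) { // linear scan of this list
                    inEvery = false;
                    break;
                }
            }
            if (inEvery) {
                master.add(candidate);
            }
        }
        return master;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
                Arrays.asList(100, 200),
                Arrays.asList(100, 200, 300),
                Arrays.asList(200, 100, 400));
        // The order of the result follows HashSet iteration order.
        System.out.println(inAllLists(lists));
    }
}
```

Each `contains()` call is a linear scan, so the cost grows with the number of distinct values times the total size of all lists.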

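Can you come up with something faster or more efficient? Funnily enough, as I wrote this I came up with a reasonably good solution, but I will still post it in case it is useful to someone else.

Of course, if you have a better way, please let me know.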

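Related discussion

It might be worth adding an emptiness check on resSet in your while loop.
Oh, and you don't need to construct a new HashSet for each it.next(): retainAll works on collections, and duplicate elements in it.next() won't affect the operation.
Edit: I suppose there are some savings in certain retainAll cases, but in this particular case a custom method is probably in order anyway.
@Carl: If I used retainAll on the lists themselves, it would increase the time complexity. When Y is a simple List implementation, X.retainAll(Y) runs in O(|X|*|Y|) time. When Y is a HashSet, it runs in O(|X|) time on average, so the copy is worth it.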

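You have to change step 1:
- use the shortest list instead of your HashSet (if a value is not in the shortest list, it is not in all of the lists...)

Then call contains on the other lists and remove the value as soon as one of them returns false (and skip any further tests for that value).

In the end, the shortest list will contain the answer...

Some code: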

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class TestLists {

    private static List<List<Integer>> listOfLists = new ArrayList<List<Integer>>();

    private static List<Integer> filter(List<List<Integer>> listOfLists) {

        // find the shortest list
        List<Integer> shortestList = null;
        for (List<Integer> list : listOfLists) {
            if (shortestList == null || list.size() < shortestList.size()) {
                shortestList = list;
            }
        }

        // nothing to intersect when no lists were supplied
        if (shortestList == null) {
            return new LinkedList<Integer>();
        }

        // create result list from the shortest list
        final List<Integer> result = new LinkedList<Integer>(shortestList);

        // remove elements not present in all lists from the result list
        for (Integer valueToTest : shortestList) {
            for (List<Integer> list : listOfLists) {
                // no need to compare to itself
                if (shortestList == list) {
                    continue;
                }

                // if one list doesn't contain the value, remove it from the result and break the loop
                if (!list.contains(valueToTest)) {
                    result.remove(valueToTest); // remove(Object): removes the value, not an index
                    break;
                }
            }
        }

        return result;
    }

    public static void main(String[] args) {
        List<Integer> l1 = Arrays.asList(100, 200);
        List<Integer> l2 = Arrays.asList(100, 200, 300);
        List<Integer> l3 = Arrays.asList(100, 200, 300);
        List<Integer> l4 = Arrays.asList(100, 200, 300);
        List<Integer> l5 = Arrays.asList(100, 200, 300);
        listOfLists.add(l1);
        listOfLists.add(l2);
        listOfLists.add(l3);
        listOfLists.add(l4);
        listOfLists.add(l5);
        System.out.println(filter(listOfLists)); // prints [100, 200]
    }
}

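Using a Google Collections Multiset makes this (representation-wise) a cakewalk (though I like Eyal's answer too). It may not be as time- or memory-efficient as some of the other answers here, but it is very clear what is going on.

Assuming the lists themselves contain no duplicates: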

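import com.google.common.collect.HashMultiset;
import com.google.common.collect.Lists;
import com.google.common.collect.Multiset;

// inside a method; listOfLists stands in for your own collection of ArrayLists, e.g. map.values()
Multiset<Integer> counter = HashMultiset.create();
int totalLists = 0;
for (List<Integer> list : listOfLists) {
    counter.addAll(list);
    totalLists++;
}

// a value is in every list exactly when it was counted once per list
List<Integer> inAll = Lists.newArrayList();
for (Integer candidate : counter.elementSet()) {
    if (counter.count(candidate) == totalLists) {
        inAll.add(candidate);
    }
}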

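If the lists may contain duplicate elements, each one can be passed through a set first:

counter.addAll(list) => counter.addAll(Sets.newHashSet(list))

Finally, this is also ideal if you think you might want some additional data later (such as how close a particular value came to making the cut).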
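For readers without Guava on the classpath, the same counting idea can be approximated with a plain `HashMap`. The sketch below is a stand-in, not the Multiset API: the class name `CountingIntersection` and the methods `countLists` and `inAll` are hypothetical names chosen for illustration. It de-duplicates each list before counting, and the returned counts show how many lists each value appeared in, which is the "how close to the cut" information mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class CountingIntersection {

    // For each value, counts how many of the lists contain it at least once.
    public static Map<Integer, Integer> countLists(Collection<List<Integer>> lists) {
        Map<Integer, Integer> counter = new HashMap<>();
        for (List<Integer> list : lists) {
            // De-duplicate first so a value repeated within one list counts once.
            for (Integer value : new HashSet<>(list)) {
                counter.merge(value, 1, Integer::sum);
            }
        }
        return counter;
    }

    // A value is in every list exactly when its count equals the number of lists.
    public static List<Integer> inAll(Collection<List<Integer>> lists) {
        List<Integer> result = new ArrayList<>();
        int totalLists = lists.size();
        for (Map.Entry<Integer, Integer> entry : countLists(lists).entrySet()) {
            if (entry.getValue() == totalLists) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = Arrays.asList(
                Arrays.asList(1, 2, 2),
                Arrays.asList(2, 3),
                Arrays.asList(2, 3, 4));
        System.out.println(countLists(lists)); // per-value list counts
        System.out.println(inAll(lists));      // values present in every list
    }
}
```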

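Here is another approach, a slight modification of Eyal's (it essentially folds together the acts of filtering a list through a set and then retaining all overlapping elements), which is more lightweight than the one above: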

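import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public List<Integer> intersection(Iterable<List<Integer>> lists) {
    Iterator<List<Integer>> listsIter = lists.iterator();
    if (!listsIter.hasNext()) return Collections.emptyList();
    // bag holds the values common to every list seen so far
    Set<Integer> bag = new HashSet<Integer>(listsIter.next());
    while (listsIter.hasNext() && !bag.isEmpty()) {
        Iterator<Integer> itemIter = listsIter.next().iterator();
        Set<Integer> holder = new HashSet<Integer>(); // perhaps also pre-size it to the bag size
        Integer held;
        // moving each hit from bag to holder ignores duplicates in this list
        // and lets the scan stop early once every surviving value has been found
        while (itemIter.hasNext() && !bag.isEmpty()) {
            if (bag.remove(held = itemIter.next())) {
                holder.add(held);
            }
        }
        bag = holder;
    }
    return new ArrayList<Integer>(bag);
}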

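1. Create a Set (e.g. a HashSet) from the first List.

2. For each remaining list:

if both the List and the Set are small enough, call set.retainAll(list);
otherwise call set.retainAll(new HashSet<Integer>(list)).

I can't say at what threshold the second variant of step 2 becomes faster, but my guess is somewhere above a size of about 20. If your lists are all small, you can do without this check.

I recall that Apache collections have more efficient integer-only structures, if you care about the constant factors and not only the O(*) part.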
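The retainAll recipe above might look like the following sketch. The class name `RetainAllIntersection` and the `SMALL` constant are assumptions made for illustration, and the value 20 is just the answer's guess at the cutoff, not a measured threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RetainAllIntersection {

    // Assumed cutoff below which copying a list into a HashSet is not worth it.
    private static final int SMALL = 20;

    public static Set<Integer> intersection(List<List<Integer>> lists) {
        if (lists.isEmpty()) {
            return new HashSet<>(); // assumption: no lists means no common values
        }

        // Step 1: start from the distinct values of the first list.
        Set<Integer> result = new HashSet<>(lists.get(0));

        // Step 2: narrow the set down with every remaining list.
        for (List<Integer> list : lists.subList(1, lists.size())) {
            if (result.isEmpty()) {
                break; // nothing left that could be common
            }
            if (list.size() <= SMALL && result.size() <= SMALL) {
                // retainAll calls list.contains() once per element of result
                result.retainAll(list);
            } else {
                // hashing the list first makes each contains() O(1) on average
                result.retainAll(new HashSet<>(list));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(Arrays.asList(1, 2, 3));
        lists.add(Arrays.asList(2, 3, 4));
        lists.add(Arrays.asList(3, 2, 5));
        System.out.println(intersection(lists)); // the set containing 2 and 3
    }
}
```

Returning a `Set` rather than an `ArrayList` is a choice made here for brevity; wrapping the result in `new ArrayList<>(...)` gives the list form the question asks for.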
