R: Reduce a data frame to fewer rows

Suppose I have a data frame "dat", for example:

col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c

Now I have a vector:

vec <- c("a", "a", "a", "b", "c", "c")

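What I want to do is remove the extra rows from the data frame "dat" according to the vector "vec". That means keeping only the first 3 rows corresponding to "a", only the first 1 row corresponding to "b", and only the first 2 rows corresponding to "c".

The output I should get is: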

col1 col2
12 a
43 a
54 a
33 b
34 c
34 c

What is the fastest way to do this without a for loop?

Related discussion

OK. I had been trying to work out a DT answer for a while.

Here is another Map() approach.

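fvec <- factor(vec)
## find the row index of the first occurrence of each level in dat$col2
m <- match(levels(fvec), dat$col2)

## keep tabulate(fvec) consecutive rows starting at each first occurrence
## (relies on rows with the same col2 value being adjacent in dat)
dat[unlist(Map(seq, from = m, length.out = tabulate(fvec))), ]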
# col1 col2
# 1 12 a
# 2 43 a
# 3 54 a
# 5 33 b
# 7 34 c
# 8 34 c
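
To make the steps of the Map() approach easier to follow, here is a minimal sketch that rebuilds the sample data from the question and exposes each intermediate value. The names first_row, n_keep, idx and result are illustrative and not part of the original answer, and the sketch assumes that rows sharing a col2 value are adjacent in dat.

```r
# Sample data from the question
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

fvec <- factor(vec)

# Row index of the first occurrence of each level in dat$col2
first_row <- match(levels(fvec), dat$col2)   # 1 5 7

# Number of rows to keep per level, in level order
n_keep <- tabulate(fvec)                     # 3 1 2

# One index sequence per level, flattened into a single vector
idx <- unlist(Map(seq, from = first_row, length.out = n_keep))  # 1 2 3 5 7 8

result <- dat[idx, ]
print(result)
```

If vec asks for more rows of a value than dat contains, the sequence would run on into the next group, so a real use would probably need a guard for that case.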

Or you could use rle() after matching:

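## match() maps each element of vec to the first row of its group in dat;
## rle() then yields that start row (values) and how many rows to keep (lengths)
rl <- rle(match(vec, dat$col2))
dat[unlist(Map(seq, rl$values, length.out = rl$lengths)), ]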
# col1 col2
# 1 12 a
# 2 43 a
# 3 54 a
# 5 33 b
# 7 34 c
# 8 34 c
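
As a rough illustration of why the rle() variant works, the sketch below prints what match() and rle() return for the question's data. It assumes that equal values are adjacent in vec and that rows with the same col2 value are adjacent in dat. The names pos and idx are illustrative only.

```r
# Sample data from the question
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# Each element of vec is mapped to the FIRST matching row of dat
pos <- match(vec, dat$col2)          # 1 1 1 5 7 7

# Run-length encoding collapses the repeats into (start row, count) pairs
rl <- rle(pos)
rl$values                            # 1 5 7  (start rows)
rl$lengths                           # 3 1 2  (rows to keep)

# Expand each pair into consecutive row indices
idx <- unlist(Map(seq, rl$values, length.out = rl$lengths))
print(dat[idx, ])
```

If vec were not grouped, for example c("a", "c", "a"), rle() would produce separate runs for the two "a" entries and the first "a" row would be selected twice, so sorting vec first may be necessary.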

With dplyr you could do this:

library(dplyr)

# create a data frame with frequencies
tv <- data.frame(table(vec))

# within each col2 group, keep rows up to that group's frequency in vec
group_by(dat, col2) %>%
  filter(row_number() <= tv$Freq[tv$vec %in% col2])

which gives:

#Source: local data frame [6 x 2]
#Groups: col2
#
# col1 col2
#1 12 a
#2 43 a
#3 54 a
#4 33 b
#5 34 c
#6 34 c
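
The filter above compares a per-group row counter with the frequency of that group in vec. The same logic can be sketched in base R, which may help when dplyr is not available. This is an illustrative alternative rather than part of the original answer. It swaps in ave() for row_number(), and the names counts, rn, allowed and keep are assumptions made for this sketch.

```r
# Sample data from the question
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# How many rows each value is allowed to keep
counts <- table(vec)

# Running row number within each col2 group (base R analogue of row_number())
rn <- ave(seq_len(nrow(dat)), dat$col2, FUN = seq_along)

# Allowed count for each row's group; NA when the group does not occur in vec
allowed <- as.integer(counts)[match(dat$col2, names(counts))]

# Keep a row only if its group is in vec and its counter is within the limit
keep <- !is.na(allowed) & rn <= allowed
result <- dat[keep, ]
print(result)
```

Unlike the index-sequence approaches, this version does not rely on rows with the same col2 value being adjacent, because the counter is computed per group.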

Here is a way using split and Map:

Data

dat <- read.table(header = TRUE, text = ' col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c', stringsAsFactors = FALSE)

vec <- c('a', 'a', 'a', 'b', 'c', 'c')

Solution

# count frequencies
tabvec <- table(vec)

# use split to split the data frame according to col2,
# use head to keep only the first n rows of each piece according to tabvec,
# then rbind the pieces and convert the result into a data.frame
data.frame(do.call(rbind,
  Map(function(x, y) head(x, y), split(dat, as.factor(dat$col2)), tabvec)
))

Output:

col1 col2
a.1 12 a
a.2 43 a
a.3 54 a
b 33 b
c.7 34 c
c.8 34 c

Source: https://www.codenong.com/29291080/
