R - tidyr spread function generates sparse matrix when compact vector expected
I'm learning dplyr, having come from plyr, and I want to generate (per group) columns (per interaction) from the output of xtabs.
Short summary: I'm getting

    a    b
    1    NA
    NA   2

when I wanted

    a    b
    1    2

The xtabs data looks like this:

    > xtabs(data=data.frame(p=c(F,T,F,T,F),a=c(F,F,T,T,T)))
           a
    p       FALSE TRUE
      FALSE     1    2
      TRUE      1    1

Now do() wants its data in data frames, like this:

    > xtabs(data=data.frame(p=c(F,T,F,T,F),a=c(F,F,T,T,T))) %>% as.data.frame
          p     a Freq
    1 FALSE FALSE    1
    2  TRUE FALSE    1
    3 FALSE  TRUE    2
    4  TRUE  TRUE    1

Now I want a single row of output, with the columns being the interaction of the levels. Here's what I'm looking for:
    FALSE_FALSE TRUE_TRUE FALSE_TRUE TRUE_FALSE
              1         1          2          1

But instead I get:

    > xtabs(data=data.frame(p=c(F,T,F,T,F),a=c(F,F,T,T,T))) %>%
         as.data.frame %>%
         unite(s,a,p) %>%
         spread(s,Freq)
      FALSE_FALSE FALSE_TRUE TRUE_FALSE TRUE_TRUE
    1           1         NA         NA        NA
    2          NA          1         NA        NA
    3          NA         NA          2        NA
    4          NA         NA         NA         1

I'm misunderstanding something here. I'm looking for the equivalent of this reshape2 code (using magrittr pipes for consistency):

    > xtabs(data=data.frame(p=c(F,T,F,T,F),a=c(F,F,T,T,T))) %>%
         as.data.frame %>% # can be omitted. (safely??)
         melt %>%
         mutate(s=interaction(p,a),value=value) %>%
         dcast(NA~s)
    Using p, a as id variables
      NA FALSE.FALSE TRUE.FALSE FALSE.TRUE TRUE.TRUE
    1 NA           1          1          2         1

(Note that NA is used here because I don't have a grouping variable in this simplified example.)
Update - interestingly, adding a single grouping column seems to fix this. But why does spread synthesise a grouping column (presumably from the row names) without me telling it to?

    > xtabs(data=data.frame(h="foo",p=c(F,T,F,T,F),a=c(F,F,T,T,T))) %>%
       as.data.frame %>%
       unite(s,a,p) %>%
       spread(s,Freq)
        h FALSE_FALSE FALSE_TRUE TRUE_FALSE TRUE_TRUE
    1 foo           1          1          2         1

This seems to be only a partial solution.
The key here is that spread doesn't aggregate the data.
Hence, if you hadn't used xtabs to aggregate first, you would be doing this:
    a <- data.frame(p=c(F,T,F,T,F),a=c(F,F,T,T,T), freq = 1) %>%
        unite(s,a,p)
    a
    ##             s freq
    ## 1 FALSE_FALSE    1
    ## 2  FALSE_TRUE    1
    ## 3  TRUE_FALSE    1
    ## 4   TRUE_TRUE    1
    ## 5  TRUE_FALSE    1

    a %>% spread(s, freq)
    ##   FALSE_FALSE FALSE_TRUE TRUE_FALSE TRUE_TRUE
    ## 1           1         NA         NA        NA
    ## 2          NA          1         NA        NA
    ## 3          NA         NA          1        NA
    ## 4          NA         NA         NA         1
    ## 5          NA         NA          1        NA

which wouldn't make sense any other way (without aggregation).
This is predictable based on the help file for the fill parameter:

"If there isn't a value for every combination of the other variables and the key column, this value will be substituted."

In your case, there aren't any other variables to combine with the key column. Had there been, then...
    b <- data.frame(p=c(F,T,F,T,F),a=c(F,F,T,T,T), freq = 1
                    , h = rep(c("foo", "bar"), length.out = 5)) %>%
        unite(s,a,p)
    b
    ##             s freq   h
    ## 1 FALSE_FALSE    1 foo
    ## 2  FALSE_TRUE    1 bar
    ## 3  TRUE_FALSE    1 foo
    ## 4   TRUE_TRUE    1 bar
    ## 5  TRUE_FALSE    1 foo

    b %>% spread(s, freq)
    ## Error: Duplicate identifiers for rows (3, 5)

...it would fail, because it can't aggregate rows 3 and 5 (it isn't designed to).
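To see in advance which cells would trigger that error, one option is to count the rows per combination of id column and key. The following is a minimal sketch, not part of the original answer: it rebuilds a data frame with the same shape as `b` and uses dplyr's `count()` and `filter()`; the name `dupes` is only illustrative.

```r
library(dplyr)

# Same shape as `b` above: a key column `s`, a value column `freq`,
# and an id column `h`.
b <- data.frame(
  s    = c("FALSE_FALSE", "FALSE_TRUE", "TRUE_FALSE", "TRUE_TRUE", "TRUE_FALSE"),
  freq = 1,
  h    = rep(c("foo", "bar"), length.out = 5),
  stringsAsFactors = FALSE
)

# Count rows per (h, s) cell. Any cell with n > 1 holds several values
# that spread() would have to combine, which it does not do.
dupes <- b %>%
  count(h, s) %>%
  filter(n > 1)

dupes
```

Here the only cell holding more than one row is `h == "foo"`, `s == "TRUE_FALSE"`, which corresponds to rows 3 and 5 in the error message.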
The tidyr/dplyr way would be to use group_by and summarize instead of xtabs, because summarize preserves the grouping column, so spread can tell which observations belong in the same row:
    b %>%
      group_by(h, s) %>%
        summarize(freq = sum(freq))
    ## Source: local data frame [4 x 3]
    ## Groups: h
    ##
    ##     h           s freq
    ## 1 bar  FALSE_TRUE    1
    ## 2 bar   TRUE_TRUE    1
    ## 3 foo FALSE_FALSE    1
    ## 4 foo  TRUE_FALSE    2

    b %>%
      group_by(h, s) %>%
        summarize(freq = sum(freq)) %>%
        spread(s, freq)
    ## Source: local data frame [2 x 5]
    ##
    ##     h FALSE_FALSE FALSE_TRUE TRUE_FALSE TRUE_TRUE
    ## 1 bar          NA          1         NA         1
    ## 2 foo           1         NA          2        NA
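In later tidyr releases spread() was superseded by pivot_wider(), which can aggregate while it reshapes through its `values_fn` argument. The sketch below is an addition, not part of the original answer, and assumes a tidyr version in which `values_fn` accepts a bare function (1.1.0 or later, as far as I know). The names `raw`, `keyed` and `wide` are illustrative.

```r
library(tidyr)

# The un-aggregated data from the answer: one row per observation.
raw <- data.frame(
  p    = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  a    = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  freq = 1,
  h    = rep(c("foo", "bar"), length.out = 5)
)

# Build the key column `s` from `a` and `p`, as in the answer.
keyed <- unite(raw, s, a, p)

# values_fn = sum collapses rows that land in the same (h, s) cell,
# so no separate group_by/summarize step is needed.
wide <- pivot_wider(keyed, names_from = s, values_from = freq, values_fn = sum)

wide
```

This should give one row per value of `h`, with the two `foo` observations for `TRUE_FALSE` summed to 2 and the empty cells left as NA.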