Nov 19, 2022

# Solution 1:

Here is a `data.table` solution. I tried to avoid manual calculations and thought of an approach based on a long-to-wide transformation.
Here is the full code, with a step-by-step walkthrough after:

``````r
library(lubridate)
library(data.table)

dt <- setDT(dt)
dt[, Date := date(Date)]
dt[, type := fifelse(Type == "SS_RT", fifelse(Remark == "AT_1_O", "A1", "A2"), "B")]
## transform to wide
df2 <- rbind(dcast(data = dt, Date ~ type, value.var = "Price", fill = 0)[, linetype := "count"],
             dcast(data = dt, Date ~ type, value.var = "Price", fill = 0, fun.aggregate = sum)[, linetype := "value"])
## A and tot
df2[, tot := rowSums(.SD), .SDcols = c("A1","A2","B")]
df2[, A := A1 + A2]
## create pc
cols <- c("A","A1","A2","B")
df2[, paste0(cols, "_pc") := lapply(.SD, function(x) round(x/tot*100)), .SDcols = cols]
cols <- c("A1","A2")
df2[, paste0(cols, "_exc") := lapply(.SD, function(x) round(x/(A1+A2)*100)), .SDcols = cols]
df2 <- merge(CJ(Date = seq(min(dt$Date), max(dt$Date), 1), linetype = c("count","value")),
             df2, all = TRUE, by = c("Date","linetype"))

df2[is.na(df2)] <- 0
df2[,linetype := NULL]
df2

Date   A1   A2    B  tot    A A_pc A1_pc A2_pc B_pc A1_exc A2_exc
1: 2020-12-01    3    1    3    7    4   57    43    14   43     75     25
2: 2020-12-01 3800 1200 4200 9200 5000   54    41    13   46     76     24
3: 2020-12-02    0    0    0    0    0    0     0     0    0      0      0
4: 2020-12-02    0    0    0    0    0    0     0     0    0      0      0
5: 2020-12-03    0    0    0    0    0    0     0     0    0      0      0
6: 2020-12-03    0    0    0    0    0    0     0     0    0      0      0
7: 2020-12-04    0    0    0    0    0    0     0     0    0      0      0
8: 2020-12-04    0    0    0    0    0    0     0     0    0      0      0
9: 2020-12-05    0    0    0    0    0    0     0     0    0      0      0
10: 2020-12-05    0    0    0    0    0    0     0     0    0      0      0
11: 2020-12-06    0    0    0    0    0    0     0     0    0      0      0
12: 2020-12-06    0    0    0    0    0    0     0     0    0      0      0
13: 2020-12-07    0    1    2    3    1   33     0    33   67      0    100
14: 2020-12-07    0 1600 3200 4800 1600   33     0    33   67      0    100
``````

The first step is to create the `type` variable following your rules:

``````r
dt[, Date := date(Date)]
dt[, type := fifelse(Type == "SS_RT", fifelse(Remark == "AT_1_O", "A1", "A2"), "B")]
``````
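As a side note, the nested `fifelse` can also be written with `data.table::fcase`, which evaluates conditions in order and often reads more clearly. A minimal sketch on toy rows (the values are illustrative):

``````r
library(data.table)

# toy rows mimicking the Type/Remark columns (illustrative values)
toy <- data.table(Type   = c("SS_RT", "SS_RT", "SS_RG"),
                  Remark = c("AT_1_O", "AT_1_Y", "AT_1_A"))

# fcase tries each condition in order; `default` catches everything else
toy[, type := fcase(Type == "SS_RT" & Remark == "AT_1_O", "A1",
                    Type == "SS_RT",                      "A2",
                    default = "B")]
toy$type  # "A1" "A2" "B"
``````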

Since `A` is just `A1 + A2`, I only need the `A1`, `A2`, and `B` types. This lets me transform the table to wide format. I do it twice: once to count, once to sum the prices per type:

``````r
dcast(data = dt, Date ~ type, value.var = "Price", fill = 0)

Date A1 A2 B
1: 2020-12-01  3  1 3
2: 2020-12-07  0  1 2
``````

Here I count the number of occurrences of each type, because `dcast` falls back on the default aggregate function, `length`.
If I use `sum` as the aggregate function instead:

``````r
dcast(data = dt, Date ~ type, value.var = "Price", fill = 0, fun.aggregate = sum)

Date   A1   A2    B
1: 2020-12-01 3800 1200 4200
2: 2020-12-07    0 1600 3200
``````
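Passing `fun.aggregate = length` explicitly does the same thing as the default, without the "Aggregate function missing, defaulting to 'length'" message. A minimal sketch on toy data (the values are illustrative):

``````r
library(data.table)

# two rows share the same Date/type combination, so dcast must aggregate
toy <- data.table(Date  = as.Date(c("2020-12-01", "2020-12-01", "2020-12-07")),
                  type  = c("A1", "A1", "B"),
                  Price = c(1000, 1200, 1600))

# length counts the rows per Date/type cell; fill = 0 covers absent combinations
counts <- dcast(toy, Date ~ type, value.var = "Price",
                fill = 0, fun.aggregate = length)
counts
#          Date A1 B
# 1: 2020-12-01  2 0
# 2: 2020-12-07  0 1
``````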

I add the `linetype` variable, which will later help me add the missing dates (I use it to keep two lines per date).

Binding the two together, I obtain:

``````
         Date   A1   A2    B linetype
1: 2020-12-01    3    1    3    count
2: 2020-12-07    0    1    2    count
3: 2020-12-01 3800 1200 4200    value
4: 2020-12-07    0 1600 3200    value
``````

I then calculate `A` and the total:

``````r
df2[, tot := rowSums(.SD), .SDcols = c("A1","A2","B")]
df2[, A := A1 + A2]
``````

I then calculate the percentage columns (`_pc`) and the Excl columns (which I name `_exc` for brevity), using `lapply` over a vector of the columns I want to transform. Note that the division yields `NaN` when the denominator is 0; you could wrap it in `fifelse` to guard against that:

``````r
cols <- c("A","A1","A2","B")
df2[, paste0(cols, "_pc") := lapply(.SD, function(x) round(x/tot*100)), .SDcols = cols]
cols <- c("A1","A2")
df2[, paste0(cols, "_exc") := lapply(.SD, function(x) round(x/(A1+A2)*100)), .SDcols = cols]

Date   A1   A2    B linetype  tot    A A_pc A1_pc A2_pc B_pc A1_exc A2_exc
1: 2020-12-01    3    1    3    count    7    4   57    43    14   43     75     25
2: 2020-12-01 3800 1200 4200    value 9200 5000   54    41    13   46     76     24
3: 2020-12-07    0    1    2    count    3    1   33     0    33   67      0    100
4: 2020-12-07    0 1600 3200    value 4800 1600   33     0    33   67      0    100
``````
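Such a `fifelse` guard could look like the sketch below (the helper name `pct` is mine, not part of the solution above):

``````r
library(data.table)

# return 0 instead of NaN when the denominator is 0
pct <- function(x, total) fifelse(total == 0, 0, round(x / total * 100))

pct(c(3, 0), c(7, 0))  # 43  0
``````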

I then add the missing dates by merging with all combinations of `linetype` and `Date`, keeping all rows. The `CJ` function creates a `data.table` with all combinations of the two variables:

``````r
CJ(Date = seq(min(dt$Date), max(dt$Date), 1), linetype = c("count","value"))
Date linetype
1: 2020-12-01    count
2: 2020-12-01    value
3: 2020-12-02    count
4: 2020-12-02    value
5: 2020-12-03    count
6: 2020-12-03    value
7: 2020-12-04    count
8: 2020-12-04    value
9: 2020-12-05    count
10: 2020-12-05    value
11: 2020-12-06    count
12: 2020-12-06    value
13: 2020-12-07    count
14: 2020-12-07    value
``````

Finally, I replace the missing values with 0 and drop the `linetype` variable.

You can then reorder the columns with `setcolorder`, and use `kableExtra` to produce your HTML output.
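To merge the repeated dates in that HTML table (so each date visually spans its count and value rows), `kableExtra::collapse_rows` can span identical consecutive cells. A minimal sketch, assuming `kableExtra` is installed (the toy data is illustrative):

``````r
library(kableExtra)

toy <- data.frame(Date = c("2020-12-01", "2020-12-01"),
                  A1   = c(3, 3800))

# collapse_rows() merges identical consecutive cells in the given column(s)
html <- kbl(toy, format = "html") %>%
  kable_styling() %>%
  collapse_rows(columns = 1)
``````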

You can do the same with `dplyr`: use `pivot_wider` to transform to wide, `mutate` with `across` instead of `lapply(.SD, ...)` for the calculations, and `expand.grid` (or `tidyr::complete`) instead of `CJ` to generate the table of missing dates.
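For the missing-dates step specifically, `tidyr::complete` does the cross-join and the zero-filling in one call; a minimal sketch on toy data (the values are illustrative):

``````r
library(dplyr)
library(tidyr)

# toy wide table with a gap between the two observed dates (illustrative values)
toy <- tibble(Date = as.Date(c("2020-12-01", "2020-12-07")),
              A1   = c(3, 0))

# complete() inserts every missing date; fill= sets their value instead of NA
filled <- toy %>%
  complete(Date = full_seq(Date, period = 1), fill = list(A1 = 0))
filled
``````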

# Solution 2:

This was a lot of busy work. It is basically everything you said, done in `data.table` grouped by the date. Note, I'm not sure if the prices in the OP are accurate or if I had other issues.

``````r
dt[, Date := as.POSIXct(Date, "UTC")]
dt[,
   {
     ## counts per type
     t_ss_rt = Type == 'SS_RT'
     Type_A = sum(t_ss_rt)
     Type_B = .N - Type_A

     ## price sums per type
     tot_Price = sum(Price)
     Type_A_price = sum(Price[t_ss_rt])
     Type_B_price = tot_Price - Type_A_price

     ## split A into A1 (Remark == 'AT_1_O') and A2
     rm_ss_rt = t_ss_rt & Remark == 'AT_1_O'
     Type_A1 = sum(rm_ss_rt)
     Type_A2 = Type_A - Type_A1

     tot_An_Price = sum(Price[t_ss_rt])
     Type_A1_Price = sum(Price[rm_ss_rt])
     Type_A2_Price = tot_An_Price - Type_A1_Price

     ## "Excl" shares, relative to A1 + A2 only
     Type_A1_Excl = Type_A1 / (Type_A1 + Type_A2)
     Type_A2_Excl = Type_A2 / (Type_A1 + Type_A2)

     ## each c(count, value) pair becomes one output column (V1..V11)
     .(c(Type_A, Type_A_price), c(Type_A / .N, Type_A_price / tot_Price),
       c(Type_A1, Type_A1_Price), c(Type_A1 / .N, Type_A1_Price / tot_Price),
       c(Type_A2, Type_A2_Price), c(Type_A2 / .N, Type_A2_Price / tot_Price),
       c(Type_B, Type_B_price), c(Type_B / .N, NA_real_), c(.N, tot_Price),
       c(Type_A1_Excl, Type_A1_Price / tot_An_Price), c(Type_A2_Excl, Type_A2_Price / tot_An_Price))
   },
   by = .(Date)]
``````

For these results:

``````
                  Date   V1        V2   V3        V4   V5        V6   V7        V8   V9  V10  V11
1: 2020-12-01 10:12:14    4 0.5714286    3 0.4285714    1 0.1428571    3 0.4285714    7 0.75 0.25
2: 2020-12-01 10:12:14 5000 0.5434783 3800 0.4130435 1200 0.1304348 4200        NA 9200 0.76 0.24
3: 2020-12-07 10:12:14    1 0.3333333    0 0.0000000    1 0.3333333    2 0.6666667    3 0.00 1.00
4: 2020-12-07 10:12:14 1600 0.3333333    0 0.0000000 1600 0.3333333 3200        NA 4800 0.00 1.00
``````
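The `.(...)` list above returns unnamed columns, which is why the result shows `V1`..`V11`. Naming the elements of that list would give readable headers instead; a minimal sketch on toy data (names and values are illustrative):

``````r
library(data.table)

toy <- data.table(g = c("a", "a", "b"), Price = c(1, 2, 5))

# naming the elements of the j-list sets the output column names
res <- toy[, .(n = .N, total = sum(Price)), by = g]
res
#    g n total
# 1: a 2     3
# 2: b 1     5
``````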

Data source:

``````r
library(data.table)

dt <- fread(
"ID,Date,Type,Remark,Price
PRT-11,2020-12-01 10:12:14,SS_RT,AT_1_O,1000
PRT-11,2020-12-01 10:12:14,SS_RT,AT_1_O,1200
PRT-11,2020-12-01 10:12:14,SS_RT,AT_1_O,1600
PRT-11,2020-12-01 10:12:14,SS_RG,AT_1_A,1600
PRT-11,2020-12-01 10:12:14,SS_RG,AT_1_B,1600
PRT-11,2020-12-01 10:12:14,SS_RG,AT_1_C,1000
PRT-11,2020-12-01 10:12:14,SS_RT,AT_1_Y,1200
PRT-11,2020-12-07 10:12:14,SS_RT,AT_1_U,1600
PRT-11,2020-12-07 10:12:14,SS_RI,AT_1_M,1600
PRT-11,2020-12-07 10:12:14,SS_RO,AT_1_P,1600")
``````

# Problem:

I have created the following data frame in R.

My_DF

``````
ID        Date                  Type       Remark      Price
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_O      1000
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_O      1200
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_O      1600
PRT-11    2020-12-01 10:12:14   SS_RG      AT_1_A      1600
PRT-11    2020-12-01 10:12:14   SS_RG      AT_1_B      1600
PRT-11    2020-12-01 10:12:14   SS_RG      AT_1_C      1000
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_Y      1200
PRT-11    2020-12-07 10:12:14   SS_RT      AT_1_U      1600
PRT-11    2020-12-07 10:12:14   SS_RI      AT_1_M      1600
PRT-11    2020-12-07 10:12:14   SS_RO      AT_1_P      1600
``````

I want to convert the above data frame into the following structure and then into HTML format, which can be used to send an email with the `mailR` library.

I have followed these conditions:

• If `Type` is equal to `SS_RT` then it is `Type - A`

• If `Type` is anything other than `SS_RT` then it is `Type - B`

• If `Type` is equal to `SS_RT` with `Remark` is equal to `AT_1_O` then it is `Type - A1`

• If `Type` is equal to `SS_RT` with `Remark` other than `AT_1_O` then it is `Type - A2`

• The formula for `Type - A1 (Excl)` is `Type - A1` divided by the sum of `Type - A1` and `Type - A2`

• The formula for `Type - A2 (Excl)` is `Type - A2` divided by the sum of `Type - A1` and `Type - A2`

All the other percentage formulas are straightforward, with `Total` in the denominator.

It is possible that the data frame has no entry for a particular date. To handle this, we need to take the min and max of the available dates and, for every date in between that has no entry, show 0 in both the count and the sum columns.

I have merged each date across two rows: the first row is used for the count and the second for the sum, grouped by the logical definition above.

### Comment posted by Cole

Hi. Could you confirm that the first total price should be 4000? My calculation shows that it should be 5000 (i.e., 1000 + 1200 + 1600 + 1200). Also, your expected output includes dates for every month, but the provided dataset spans only 6 days. I feel like there are already answers for completing datasets, but you should provide details: what's the sequence of dates?

### Comment posted by Viper

@Cole: Your calculation is correct, it should be 5000. Also, the sequence of dates should be in descending order, considering the minimum and maximum dates. If in the dataset the min date is 1 Dec and the max date is 10 Dec, and there are no entries for 2, 4, and 6 Dec, then we need to create a blank entry for each of those dates.

### Comment posted by Cole

Thanks for clarifying. I do not follow the blank entries. Why would December 8 not be created?

### Comment posted by Viper

@Cole: So that users know there was no entry for a particular date and can check accordingly.

### Comment posted by Viper

Thanks. How can I merge rows with the same date in the HTML output?


### Comment posted by denis

yep, dt must be a

### Comment posted by Viper

Getting some random error at