Solution 1 :

Here is a data.table solution. I tried to avoid manual calculations and though of a solution based on long to wide transformation.
Here is my solution, with step by step detail after:

library(lubridate)
library(data.table)

dt <- setDT(dt)
dt[,Date := date(Date)]
dt[,type := fifelse(Type == "SS_RT",fifelse(Remark == "AT_1_O","A1","A2"),"B")]
## transform to wide
df2 <- rbind(dcast(data = dt,Date~type ,value.var = "Price",fill = 0)[,linetype := "count"],
             dcast(data = dt,Date~type ,value.var = "Price",fill = 0,fun.aggregate = sum)[,linetype := "value"])
## A and tot
df2[,tot := rowSums(.SD),.SDcols = c("A1","A2","B")]
df2[,A := A1+A2]
## create pc
cols <- c("A","A1","A2","B")
df2[,paste0(cols,"_pc") := lapply(.SD,function(x) round(x/tot*100) ),.SDcols = cols]
cols <- c("A1","A2")
df2[,paste0(cols,"_exc") := lapply(.SD,function(x) round(x/(A1+A2)*100) ),.SDcols = cols]
## add missing dates
df2 <- merge(CJ(Date = seq(min(dt$Date),max(dt$Date),1),linetype = c("count","value")),
             df2,all = T,by = c("Date","linetype"))

df2[is.na(df2)] <- 0
df2[,linetype := NULL]
df2

          Date   A1   A2    B  tot    A A_pc A1_pc A2_pc B_pc A1_exc A2_exc
 1: 2020-12-01    3    1    3    7    4   57    43    14   43     75     25
 2: 2020-12-01 3800 1200 4200 9200 5000   54    41    13   46     76     24
 3: 2020-12-02    0    0    0    0    0    0     0     0    0      0      0
 4: 2020-12-02    0    0    0    0    0    0     0     0    0      0      0
 5: 2020-12-03    0    0    0    0    0    0     0     0    0      0      0
 6: 2020-12-03    0    0    0    0    0    0     0     0    0      0      0
 7: 2020-12-04    0    0    0    0    0    0     0     0    0      0      0
 8: 2020-12-04    0    0    0    0    0    0     0     0    0      0      0
 9: 2020-12-05    0    0    0    0    0    0     0     0    0      0      0
10: 2020-12-05    0    0    0    0    0    0     0     0    0      0      0
11: 2020-12-06    0    0    0    0    0    0     0     0    0      0      0
12: 2020-12-06    0    0    0    0    0    0     0     0    0      0      0
13: 2020-12-07    0    1    2    3    1   33     0    33   67      0    100
14: 2020-12-07    0 1600 3200 4800 1600   33     0    33   67      0    100

So first step is I create the type variable following your rule:

dt[,Date := date(Date)]
dt[,type := fifelse(Type == "SS_RT",fifelse(Remark == "AT_1_O","A1","A2"),"B")]

We know A is just A1 + A2. It allows me to transform the table to wide format. I do it twice: once to count, once to make the sum per type:

dcast(data = dt,Date ~ type ,value.var = "Price",fill = 0)

         Date A1 A2 B 
1: 2020-12-01  3  1 3    
2: 2020-12-07  0  1 2    

Here I count the number of occurrence for each type, because it uses the default aggregate: lenght.
If I use sum as aggregate function:

dcast(data = dt,Date~type ,value.var = "Price",fill = 0,fun.aggregate = sum)

         Date   A1   A2    B
1: 2020-12-01 3800 1200 4200
2: 2020-12-07    0 1600 3200

I add the linetype variable, which will help me after to add the missing dates (I use it to keep two lines per dates).

I bind the two, I obtain:

         Date   A1   A2    B linetype
1: 2020-12-01    3    1    3    count
2: 2020-12-07    0    1    2    count
3: 2020-12-01 3800 1200 4200    value
4: 2020-12-07    0 1600 3200    value

I then calculate A and the total:

df2[,tot := rowSums(.SD),.SDcols = c("A1","A2","B")]
df2[,A := A1+A2]

I then calculate the percentage (_pc) and Excl variables (that I names _exc for simplicity), using lapply and a vector of the column I want to transform. I use fifelse to avoid dividing by 0:

cols <- c("A","A1","A2","B")
df2[,paste0(cols,"_pc") := lapply(.SD,function(x) round(x/tot*100) ),.SDcols = cols]
cols <- c("A1","A2")
df2[,paste0(cols,"_exc") := lapply(.SD,function(x) round(x/(A1+A2)*100) ),.SDcols = cols]


         Date   A1   A2    B linetype  tot    A A_pc A1_pc A2_pc B_pc A1_exc A2_exc
1: 2020-12-01    3    1    3    count    7    4   57    43    14   43     75     25
2: 2020-12-01 3800 1200 4200    value 9200 5000   54    41    13   46     76     24
3: 2020-12-07    0    1    2    count    3    1   33     0    33   67      0    100
4: 2020-12-07    0 1600 3200    value 4800 1600   33     0    33   67      0    100

I then add the missing dates, by merging with all combination of linetype and Date and keeping all rows. I use the CJ function to create a data.table with all combinations of the two variables:

CJ(Date = seq(min(dt$Date),max(dt$Date),1),linetype = c("count","value"))
          Date linetype
 1: 2020-12-01    count
 2: 2020-12-01    value
 3: 2020-12-02    count
 4: 2020-12-02    value
 5: 2020-12-03    count
 6: 2020-12-03    value
 7: 2020-12-04    count
 8: 2020-12-04    value
 9: 2020-12-05    count
10: 2020-12-05    value
11: 2020-12-06    count
12: 2020-12-06    value
13: 2020-12-07    count
14: 2020-12-07    value

And then replace missing values with 0 and supress the linetype variable.

You can then reorder the columns with setcolorder, and use kabbleExtra (see here) to produce your html output.

You can do the same with dplyr, using pivot_wider to transform to wide, mutate_all instead of the lapply(.SD,...) to do your calculations, expand.grid instead of CJ to generate the table of missing dates.

Solution 2 :

This was a lot of busy work. It is basically everything you said in base R grouped by the date in . Note, I’m not sure if the prices are accurate in OP or if I had other issues.

dt[, Date := as.POSIXct(Date, "UTC")]
dt[,
   {
     t_ss_rt = Type == 'SS_RT'
     Type_A = sum(t_ss_rt)
     Type_B = .N - Type_A
     
     tot_Price = sum(Price)
     Type_A_price = sum(Price[t_ss_rt])
     Type_B_price = tot_Price - Type_A_price
     
     rm_ss_rt = t_ss_rt & Remark == 'AT_1_O'
     Type_A1 = sum(rm_ss_rt)
     Type_A2 = Type_A - Type_A1
     
     tot_An_Price = sum(Price[t_ss_rt])
     Type_A1_Price = sum(Price[rm_ss_rt])
     Type_A2_Price = tot_An_Price - Type_A1_Price
     
     Type_A1_Excl = Type_A1 / (Type_A1 + Type_A2)
     Type_A2_Excl = Type_A2 / (Type_A1 + Type_A2)
     
     .(c(Type_A, Type_A_price), c(Type_A / .N, Type_A_price / tot_Price),
       c(Type_A1, Type_A1_Price), c(Type_A1 / .N, Type_A1_Price / tot_Price),
       c(Type_A2, Type_A2_Price), c(Type_A2 / .N, Type_A2_Price / tot_Price),
       c(Type_B, Type_B_price), c(Type_B / .N, NA_real_), c(.N, tot_Price),
       c(Type_A1_Excl, Type_A1_Price / (tot_An_Price)), c(Type_A2_Excl, Type_A2_Price / tot_An_Price))
   },
   by = .(Date)]

For these results:

                  Date   V1        V2   V3        V4   V5        V6   V7        V8   V9  V10  V11
1: 2020-12-01 10:12:14    4 0.5714286    3 0.4285714    1 0.1428571    3 0.4285714    7 0.75 0.25
2: 2020-12-01 10:12:14 5000 0.5434783 3800 0.4130435 1200 0.1304348 4200        NA 9200 0.76 0.24
3: 2020-12-07 10:12:14    1 0.3333333    0 0.0000000    1 0.3333333    2 0.6666667    3 0.00 1.00
4: 2020-12-07 10:12:14 1600 0.3333333    0 0.0000000 1600 0.3333333 3200        NA 4800 0.00 1.00

Data source:

library(data.table)

dt = data.table::fread(
"ID   ,     Date      ,            Type ,      Remark,      Price
PRT-11,    2020-12-01 10:12:14,   SS_RT,      AT_1_O   ,   1000
PRT-11,    2020-12-01 10:12:14,   SS_RT ,     AT_1_O  ,    1200
PRT-11,    2020-12-01 10:12:14,   SS_RT  ,    AT_1_O ,     1600
PRT-11,    2020-12-01 10:12:14,   SS_RG   ,   AT_1_A,      1600
PRT-11,    2020-12-01 10:12:14,   SS_RG    ,  AT_1_B     , 1600
PRT-11,    2020-12-01 10:12:14,   SS_RG     , AT_1_C    ,  1000
PRT-11,    2020-12-01 10:12:14,   SS_RT,      AT_1_Y   ,   1200
PRT-11,    2020-12-07 10:12:14,   SS_RT ,     AT_1_U  ,    1600
PRT-11,    2020-12-07 10:12:14,   SS_RI  ,    AT_1_M ,     1600
PRT-11,    2020-12-07 10:12:14,   SS_RO   ,   AT_1_P,      1600")

Problem :

I have created the below-mentioned data frame in R.

My_DF

ID        Date                  Type       Remark      Price
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_O      1000
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_O      1200
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_O      1600
PRT-11    2020-12-01 10:12:14   SS_RG      AT_1_A      1600
PRT-11    2020-12-01 10:12:14   SS_RG      AT_1_B      1600
PRT-11    2020-12-01 10:12:14   SS_RG      AT_1_C      1000
PRT-11    2020-12-01 10:12:14   SS_RT      AT_1_Y      1200
PRT-11    2020-12-07 10:12:14   SS_RT      AT_1_U      1600
PRT-11    2020-12-07 10:12:14   SS_RI      AT_1_M      1600
PRT-11    2020-12-07 10:12:14   SS_RO      AT_1_P      1600

I want to covert the above-mentioned DF in the following structure Dataframe and convert it in HTML format which can be used to send email using mailR library.

enter image description here

Where I have followed the following condition.

  • If Type is equal to SS_RT then it is Type - A

  • If Type is anything other than SS_RT then it is Type - B

  • If Type is equal to SS_RT with Remark is equal to AT_1_O then it is Type - A1

  • If Type is equal to SS_RT with Remark other than AT_1_O then it is Type - A2

  • The Formula for Type - A1 (Excl) is Type - A1 divided by the sum of Type - A1and Type - A2

  • The Formula for Type - A1 (Excl) is Type - A2 divided by the sum of Type - A1and Type - A2

Rest all the %age formula are pretty straightforward by Total in the denominator.

In the data frame, it is possible the there is no entry for a particular date. For that, we need to ensure that of all the available dates we need to take min and max date and ensure that for that date which is not available we show value as 0 in bother count and sum column.

I have merged the date in two rows the first row is used for the count and the second one is for the sum group by their logical defination.

Comments

Comment posted by Cole

Hi. Could you confirm that the first total price should be 4000? My calculation shows that it should be 5000 (e.g., 1000 + 1200 + 1600 + 1200). Also, your expected output includes the dates by every month. The provided dataset is only 6 days a part. I feel like there are already answers for completing datasets but you should provide details – what’s the sequence of dates?

Comment posted by Viper

@Cole: Your calculation is correct, it should be 5000. Also, the sequance of date should be in descending order considering the minimum and maximum date. If in dataset the Min Date is 1st Dec and Max Date is 10 Dec and entry for 2,4,6 Dec is not present in dataset then we need to create a blank entry for that.

Comment posted by Cole

Thanks for clarifying. I do not follow the blank entries. Why would December 8 not be created?

Comment posted by Viper

@Cole: So that users get to know that for a particular date there was no entry so that they can check accordingly.

Comment posted by Viper

Thanks, How can I merge the same date in HTML output?

Comment posted by cran.r-project.org/web/packages/kableExtra/vignettes/…

you can use

Comment posted by Viper

Getting Error

Comment posted by denis

yep, dt must be a

Comment posted by Viper

Getting some random error at

By