Solution 1:

So, apparently the bug above has the following explanation:

  • The outputs shown below the code chunks in RStudio (.Rmd) are based on the data held in the Global Environment.
  • The knitted HTML, on the contrary, is rendered by running the .Rmd script from scratch in a fresh session.

Normally this shouldn't pose a problem. But if some code chunks are set to eval=FALSE to skip a repeated lengthy computation (in my case, data imputation using MICE), then the Global Environment holds the imputed data while the non-imputed data gets knitted. So the models in the knitted HTML are run on an incomplete data set and are all off.
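To make the failure mode concrete, here is a minimal sketch (file, variable, and chunk names are made up for illustration):

````
```{r load}
data <- read.csv("my_data.csv")  # contains missing values
```

```{r impute, eval=FALSE}
# Run once by hand in RStudio; skipped entirely at knit time
imp  <- mice::mice(data, m = 5, seed = 123)
data <- mice::complete(imp)
```

```{r model}
# Interactively, `data` is the imputed version; at knit time it
# still contains NAs, so lm() silently drops incomplete rows and
# the summary no longer matches what RStudio showed inline
summary(lm(y ~ x, data = data))
```
````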

Before receiving the suggestion with cache=TRUE, I found another workaround: do all the required transformations and imputations once, save the data in a new code chunk, then set eval=FALSE for that chunk and for the chunks above it that no longer have to be run (even though some of them still have to be shown).
Then I import the treated data in a hidden chunk (eval=TRUE, include=FALSE) and run the rest of the training, etc., as sketched below. While technically it's not the best in terms of reproducibility, it saved my neck and a lot of computation time.
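In chunk form, the workaround looks roughly like this (file and object names are again placeholders):

````
```{r impute, eval=FALSE}
# Heavy chunk: still shown in the report, but no longer executed.
# Run it once by hand, then persist the result to disk.
imp  <- mice::mice(data, m = 5, seed = 123)
data <- mice::complete(imp)
saveRDS(data, "data_imputed.rds")
```

```{r reload, eval=TRUE, include=FALSE}
# Hidden chunk: runs at knit time and restores the treated data,
# so everything downstream works on the imputed data set
data <- readRDS("data_imputed.rds")
```
````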

Problem:

I am sorry in advance if this question sounds stupid: I am at ease with R but relatively new to R Markdown. I realize that an R Markdown (.Rmd) script is meant to be reproducible, so whatever is in it has to come from the script itself and not from the Global Environment or another script. I have done the tedious work of copying my very long initial .R script into .Rmd, with explanations, like a report. My problem is the following: after running the code in the .Rmd script I get the outputs below each chunk. I then knit it, and the outputs in the HTML document are not the same. The essentials are the same, but the model summaries are not. I simply cannot understand why.
I have of course tried restarting RStudio, cleaning up the Global Environment and starting again from a blank script. The tooth-grinding problem is that my script is long and some chunks are heavy (like the imputation of missing data using MICE), so every time I hit a problem and have to re-compute everything, it's a very long coffee break.

While I cannot include the code for this reason, I still very much hope that someone has encountered this problem before and can share their experience. In particular, I want to know what happens if you leave some chunks as {r eval=FALSE} and run them manually the first time only. Could this be a source of the problems? If so, how do you knit long, computation-heavy scripts?

Thanks very much in advance.

P.S. After throwing this bottle into the sea, I'll go and try splitting my script into a few smaller scripts to pinpoint the problem (and to be able to include the part that causes it).

Comments

Comment posted by Bruno

I did not understand your problem; it is probably some render output setting. But let me save you from caffeine addiction: you can set a chunk with cache=TRUE, and it will be skipped if nothing in it has changed. If that still does not save you enough time, look into the DALEX package.
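For reference, the option Bruno mentions is set per chunk; knitr then re-runs the chunk only when its code changes, and otherwise reloads the cached objects from disk (chunk name is illustrative):

````
```{r impute, cache=TRUE}
# Executed on the first knit; subsequent knits load the cached
# `imp` and `data` objects unless this chunk's code is edited
imp  <- mice::mice(data, m = 5, seed = 123)
data <- mice::complete(imp)
```
````

Note that a cached chunk does not automatically notice changes in earlier chunks; knitr's dependson chunk option exists for declaring such dependencies.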

Comment posted by Ekaterina Kinsht

Thanks for the tip, sounds like cache=TRUE will be a great time saver.
