Still experimental and may change. This form cannot pass any arguments to Cache, such as cacheRepo, so it is of limited utility. However, it is a clean alternative for simple cases.

Cache(FUN, ..., notOlderThan = NULL, .objects = NULL,
  outputObjects = NULL, algo = "xxhash64", cacheRepo = NULL,
  length = getOption("reproducible.length", Inf),
  compareRasterFileLength, userTags = c(), digestPathContent,
  omitArgs = NULL, classOptions = list(), debugCache = character(),
  sideEffect = FALSE, makeCopy = FALSE,
  quick = getOption("reproducible.quick", FALSE),
  verbose = getOption("reproducible.verbose", 0), cacheId = NULL,
  useCache = getOption("reproducible.useCache", TRUE),
  showSimilar = getOption("reproducible.showSimilar", FALSE))

# S4 method for ANY
Cache(FUN, ..., notOlderThan = NULL, .objects = NULL,
  outputObjects = NULL, algo = "xxhash64", cacheRepo = NULL,
  length = getOption("reproducible.length", Inf),
  compareRasterFileLength, userTags = c(), digestPathContent,
  omitArgs = NULL, classOptions = list(), debugCache = character(),
  sideEffect = FALSE, makeCopy = FALSE,
  quick = getOption("reproducible.quick", FALSE),
  verbose = getOption("reproducible.verbose", 0), cacheId = NULL,
  useCache = getOption("reproducible.useCache", TRUE),
  showSimilar = getOption("reproducible.showSimilar", FALSE))

lhs %<% rhs

Arguments

FUN

Either a function or an unevaluated function call (e.g., using quote).

...

Arguments passed to FUN.

notOlderThan

Load an artifact from the database only if it was created after notOlderThan.

.objects

Character vector of names of objects to be digested. This is only applicable if an argument is a list, environment (or similar) with named objects within it. Only these objects will be considered for caching, i.e., only a subset of the list, environment or similar object is used.

outputObjects

Optional character vector indicating which objects to return. This is only relevant for list, environment (or similar) objects.

algo

The algorithm to be used; currently available choices are md5, sha1, crc32, sha256, sha512, xxhash32, xxhash64 and murmur32. The default in Cache is "xxhash64" (see the usage above).

cacheRepo

A repository used for storing cached objects. This is optional if Cache is used inside a SpaDES module.

length

Numeric. If the element passed to Cache is a Path class object (from e.g., asPath(filename)) or it is a Raster with file-backing, then this will be passed to digest::digest, essentially limiting the number of bytes to digest (for speed). This will only be used if quick = FALSE. Default is getOption("reproducible.length"), which is set to Inf.

compareRasterFileLength

Being deprecated; use length.

userTags

A character vector with Tags. These Tags will be added to the repository along with the artifact.

digestPathContent

Being deprecated. Use quick.

omitArgs

Optional character string of arguments in the FUN to omit from the digest.

classOptions

Optional list. This will pass into .robustDigest for specific classes. Should be options that the .robustDigest knows what to do with.

debugCache

Character or Logical. Either "complete" or "quick" (uses partial matching, so "c" or "q" work). TRUE is equivalent to "complete". If "complete", then the returned object from the Cache function will have two attributes, debugCache1 and debugCache2, which are the entire list(...) and that same object after all .robustDigest calls, at the moment that it is digested using digest, respectively. This attr(mySimOut, "debugCache2") can then be compared to a subsequent call, and individual items within the object attr(mySimOut, "debugCache1") can be compared. If "quick", then it will return the same two objects directly, without evaluating the FUN(...).

sideEffect

Logical or path. Determines where the function will look for new files following function completion. See Details. NOTE: this argument is experimental and may change in future releases.

makeCopy

Logical. If sideEffect = TRUE, and makeCopy = TRUE, a copy of the downloaded files will be made and stored in the cacheRepo to speed up subsequent file recovery in the case where the original copy of the downloaded files are corrupted or missing. Currently only works when set to TRUE during the first run of Cache. Default is FALSE. NOTE: this argument is experimental and may change in future releases.

quick

Logical. If TRUE, little or no disk-based information will be assessed, i.e., mostly its memory content. This is relevant for objects of class character, Path and Raster currently. For class character, it is ambiguous whether this represents a character string or a vector of file paths. The function will assess if it is a path to a file or directory first. If not, it will treat the object as a character string. If it is known that character strings should not be treated as paths, then quick = TRUE will be much faster, with no loss of information. If it is file or directory, then it will digest the file content, or basename(object). For class Path objects, the file's metadata (i.e., filename and file size) will be hashed instead of the file contents if quick = TRUE. If set to FALSE (default), the contents of the file(s) are hashed. If quick = TRUE, length is ignored. Raster objects are treated as paths, if they are file-backed.

verbose

Numeric, with 0 being off, 1 being a little, 2 being more verbose etc. Above 1 will output much more information about the internals of Caching, which may help diagnose Caching challenges.

cacheId

Character string. If passed, this will override the calculated hash of the inputs, and return the result from this cacheId in the cacheRepo. Setting this is equivalent to manually saving the output of this function, i.e., the object will be on disk, and will be recovered in subsequent calls. This may help in some particularly finicky situations where Cache is not correctly detecting unchanged inputs. This will guarantee the object will be identical each time; this may be useful in operational code.

useCache

Logical or "overwrite" or "devMode". See details.

showSimilar

A logical or numeric. Useful for debugging. If TRUE or 1, then if the Cache does not find an identical archive in the cacheRepo, it will report (via message) the next most similar archive, and indicate which argument(s) is/are different. If a number larger than 1, then it will report the N most similar archived objects.

lhs

A name to assign to.

rhs

A function call.

Value

As with cache, returns the value of the function call or the cached version (i.e., the result from a previous call to this same cached function with identical arguments).

Details

Caching R objects using cache has five important limitations:

  1. the archivist package detects different environments as different;

  2. it also does not detect S4 methods correctly due to method inheritance;

  3. it does not detect objects that have file-base storage of information (specifically RasterLayer-class objects);

  4. the default hashing algorithm is relatively slow;

  5. heavily nested function calls may want Cache arguments to propagate through.

This version of the Cache function accommodates those five special, though quite common, cases by:

  1. converting any environments into list equivalents;

  2. identifying the dispatched S4 method (including those made through inheritance) before hashing so the correct method is being cached;

  3. hashing the linked file, rather than the Raster object. Currently, only file-backed Raster* objects are digested (e.g., not ff objects, or any other R object where the data are on disk instead of in RAM);

  4. using digest (formerly fastdigest, which does not translate between operating systems). This is used for file-backed objects as well;

  5. Cache will save arguments passed by user in a hidden environment. Any nested Cache functions will use arguments in this order 1) actual arguments passed at each Cache call, 2) any inherited arguments from an outer Cache call, 3) the default values of the Cache function. See section on Nested Caching.
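
The first point can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: it only shows why two environments with identical content compare as different, and how a list equivalent removes that difference. The helper envToList is a hypothetical name introduced here for illustration.

```r
# Two distinct environments holding identical content
e1 <- new.env()
e2 <- new.env()
assign("a", 1:3, envir = e1)
assign("a", 1:3, envir = e2)
assign("b", "x", envir = e1)
assign("b", "x", envir = e2)

# Compared as environments, they are different objects
sameAsEnvs <- identical(e1, e2)

# Hypothetical helper: list equivalent, with names sorted for a stable order
envToList <- function(env) mget(sort(ls(env, all.names = TRUE)), envir = env)

# Compared as lists, only the contents matter
sameAsLists <- identical(envToList(e1), envToList(e2))

print(c(sameAsEnvs = sameAsEnvs, sameAsLists = sameAsLists))
```

A hash computed on the list form would therefore be stable across sessions, whereas anything tied to the environment's identity would not be.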

If Cache is called within a SpaDES module, then the cached entry will automatically get 3 extra userTags: eventTime, eventType, and moduleName. These can then be used in clearCache to selectively remove cached objects by eventTime, eventType or moduleName.

Cache will add a tag to the artifact in the database called accessed, which will assign the time that it was accessed, either read or write. That way, artifacts can be shown (using showCache) or removed (using clearCache) selectively, based on their access dates, rather than only by their creation dates. See example in clearCache. Cache (uppercase C) is used here so that it is not confused with, and does not mask, the archivist::cache function.

Note

As indicated above, several objects require pre-treatment before caching will work as expected. The function .robustDigest accommodates this. It is an S4 generic, meaning that developers can produce their own methods for different classes of objects. Currently, there are methods for several classes.

See .robustDigest for other specifics for other classes.

Nested Caching

Commonly, Caching is nested, i.e., an outer function is wrapped in a Cache function call, and one or more inner functions are also wrapped in a Cache function call. A user can always specify arguments in every Cache function call, but this can get tedious and can be prone to errors. The normal way that R handles arguments is that it takes the user-passed arguments, if any, and default arguments for all those that have no user-passed arguments. We have inserted a middle step. The order of precedence for any given Cache function call is 1. user arguments, 2. inherited arguments, 3. default arguments. At this time, the top level Cache arguments will propagate to all inner functions unless each individual Cache call has other arguments specified, i.e., "middle" nested Cache function calls don't propagate their arguments to further "inner" Cache function calls. See example.

userTags is unique among the arguments: its values will be appended to the inherited userTags.
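
The precedence rule can be sketched in base R. This is an illustration only, not how Cache stores or looks up inherited arguments; resolveArgs and the defaults list are hypothetical names, and only three arguments are modelled.

```r
# Hypothetical defaults, standing in for the defaults of Cache's formals
defaults <- list(quick = FALSE, verbose = 0, userTags = character())

# Hypothetical resolver: user > inherited > defaults; userTags are appended
resolveArgs <- function(user = list(), inherited = list(), defaults = list()) {
  out <- modifyList(defaults, inherited)  # inherited overrides defaults
  out <- modifyList(out, user)            # user overrides inherited
  # userTags is the exception: values accumulate instead of being replaced
  out$userTags <- unique(c(defaults$userTags, inherited$userTags, user$userTags))
  out
}

outerArgs <- list(quick = TRUE, userTags = "outer")  # from the outer Cache call
innerUser <- list(verbose = 2, userTags = "inner")   # passed to the inner call

resolved <- resolveArgs(user = innerUser, inherited = outerArgs,
                        defaults = defaults)
str(resolved)
```

Here the inner call inherits quick = TRUE from the outer call, keeps its own verbose = 2, and ends up with both tags.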

Caching Speed

Caching speed may become a critical aspect of a final product. For example, if the final product is a shiny app, rerunning the entire project may need to take less than a few seconds at most. There are 3 arguments that affect Cache speed: quick, length, and algo. quick is passed to .robustDigest, which currently only affects Path and Raster* class objects. In both cases, quick means that little or no disk-based information will be assessed.
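
The trade-off behind length can be shown with a small base R sketch. It emulates "digest only the first n bytes" using tools::md5sum on a truncated copy; hashFile is a hypothetical helper, and this is not the code path that Cache or digest::digest actually uses.

```r
# Hypothetical helper: md5 of the first `n` bytes of a file (n = Inf for all)
hashFile <- function(path, n = Inf) {
  size <- file.size(path)
  bytes <- readBin(path, what = "raw", n = min(size, n))
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(bytes, tmp)
  unname(tools::md5sum(tmp))
}

# Two files that differ only in their final byte
f1 <- tempfile()
f2 <- tempfile()
writeBin(as.raw(c(rep(1, 1000), 7)), f1)
writeBin(as.raw(c(rep(1, 1000), 8)), f2)

# Hashing only the first 100 bytes is cheaper but misses the difference
truncatedEqual <- hashFile(f1, n = 100) == hashFile(f2, n = 100)

# Hashing the whole file detects it
fullEqual <- hashFile(f1) == hashFile(f2)

print(c(truncatedEqual = truncatedEqual, fullEqual = fullEqual))
```

A finite length is therefore a speed optimisation that is safe only when differences between files are expected to show up early in the file.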

Filepaths

If a function has a path argument, there is some ambiguity about what should be done. Possibilities include:

  1. hash the string as is (this will be very system specific, meaning a Cache call will not work if copied between systems or directories);

  2. hash the basename(path);

  3. hash the contents of the file.

If paths are passed in as is (i.e., character string), the result will not be predictable. Instead, one should use the wrapper function asPath(path), which sets the class of the string to a Path, and one should decide whether one wants to digest the content of the file (using quick = FALSE), or just the filename (quick = TRUE). See examples.
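
The three possibilities give different answers for the same pair of files. The sketch below uses only base R (tools::md5sum) to make that concrete; hashString is a hypothetical helper, and none of this is the hashing that Cache itself performs.

```r
# Hypothetical helper: md5 of a character string, via a temporary file
hashString <- function(x) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(x, tmp)
  unname(tools::md5sum(tmp))
}

# Same file name in two directories, with different contents
dirA <- file.path(tempdir(), "a")
dirB <- file.path(tempdir(), "b")
dir.create(dirA, showWarnings = FALSE)
dir.create(dirB, showWarnings = FALSE)
fileA <- file.path(dirA, "data.txt")
fileB <- file.path(dirB, "data.txt")
writeLines("1,2,3", fileA)
writeLines("4,5,6", fileB)

# 1. hash the path string as is: differs with the directory
fullPathSame <- hashString(fileA) == hashString(fileB)

# 2. hash basename(path): cannot tell the two files apart
basenameSame <- hashString(basename(fileA)) == hashString(basename(fileB))

# 3. hash the contents of the file: detects the different data
contentSame <- unname(tools::md5sum(fileA)) == unname(tools::md5sum(fileB))

print(c(fullPathSame = fullPathSame, basenameSame = basenameSame,
        contentSame = contentSame))
```

Which behaviour is appropriate depends on the function being cached, which is why the choice is left explicit.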

Stochasticity

In general, it is expected that caching will only be used when stochasticity is not relevant, or if a user has achieved sufficient stochasticity (e.g., via sufficient number of calls to experiment) such that no new explorations of stochastic outcomes are required. It will also be very useful in a reproducible workflow.
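
The consequence for stochastic functions can be seen with a toy memoiser written in base R. toyCache and store are hypothetical names; the sketch keys on the deparsed call and is far simpler than Cache, but it shows the same effect: the second call returns the stored draw rather than a new one.

```r
# In-memory store for the toy cache
store <- new.env()

# Hypothetical toy cache keyed on the function name and its arguments
toyCache <- function(FUN, ...) {
  key <- paste(deparse(list(substitute(FUN), ...)), collapse = "")
  if (!exists(key, envir = store, inherits = FALSE)) {
    assign(key, FUN(...), envir = store)  # first call: evaluate and store
  }
  get(key, envir = store, inherits = FALSE)  # later calls: return stored value
}

first  <- toyCache(rnorm, 5)
second <- toyCache(rnorm, 5)  # same call, so no new random draw is made
fresh  <- rnorm(5)            # an uncached call draws new values

print(identical(first, second))
```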

useCache

If FALSE, then the entire Caching mechanism is bypassed and the function is evaluated as if it was not being Cached. Default is getOption("reproducible.useCache"), which is TRUE by default, meaning use the Cache mechanism. This may be useful to turn all Caching on or off in very complex scripts and nested functions.

If "overwrite" (which can be set with options("reproducible.useCache" = "overwrite")), then the function will invoke the caching mechanism but will purge any entry that is matched, and it will be replaced with the results of the current call.

If "devMode": The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In devMode, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the cacheRepo, but it does find an entry that matches based on userTags. In this case, it will delete the old entry in the cacheRepo (identified based on matching userTags), then continue with normal Cache. For this to work correctly, userTags must be unique for each function call. This should be used with caution as it is still experimental. Currently, if userTags are not unique to a single entry in the cacheRepo, it will default to the behaviour of useCache = TRUE with a message. This means that "devMode" is most useful if used from the start of a project.

sideEffect

If sideEffect is not FALSE, then metadata about any files that are added to sideEffect will be added as an attribute to the cached copy. Subsequent calls to this function will assess for the presence of the new files in the sideEffect location. If the files are identical (quick = FALSE) or their file size is identical (quick = TRUE), then the cached copy of the function will be returned (and no files changed). If there are missing or incorrect files, then the function will re-run. This will accommodate the situation where the function call is identical, but somehow the side effect files were modified. If sideEffect is logical, then the function will check the cacheRepo; if it is a path, then it will check the path. The function will assess whether the files to be downloaded are found locally prior to download. If it fails the local test, then it will try to recover from a local copy if makeCopy had been set to TRUE the first time the function was run (currently, local recovery will only work if makeCopy was set to TRUE the first time Cache was run). Default is FALSE.
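
The difference between the two file checks can be sketched in base R. This is an illustration of the idea only, assuming a size comparison for quick = TRUE and a content hash for quick = FALSE as described above; the recorded list is a hypothetical stand-in for the metadata attribute, not its real structure.

```r
# A side-effect file written during the "first run"
f <- tempfile()
writeLines("abc", f)

# Hypothetical metadata recorded alongside the cached copy
recorded <- list(size = file.size(f), md5 = unname(tools::md5sum(f)))

# The file is later modified, but keeps the same size
writeLines("abd", f)

# Size-only check (analogous to quick = TRUE) does not notice the change
quickCheckPasses <- file.size(f) == recorded$size

# Content check (analogous to quick = FALSE) does notice it
fullCheckPasses <- unname(tools::md5sum(f)) == recorded$md5

print(c(quickCheckPasses = quickCheckPasses, fullCheckPasses = fullCheckPasses))
```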

Examples

tmpDir <- file.path(tempdir())

# Basic use
ranNumsA <- Cache(rnorm, 10, 16, cacheRepo = tmpDir)

# All same
ranNumsB <- Cache(rnorm, 10, 16, cacheRepo = tmpDir) # recovers cached copy
#> loading cached result from previous rnorm call, adding to memoised copy
ranNumsC <- Cache(cacheRepo = tmpDir) %C% rnorm(10, 16) # recovers cached copy
#> loading memoised result from previous 'rnorm' pipe sequence call.
ranNumsD <- Cache(quote(rnorm(n = 10, 16)), cacheRepo = tmpDir) # recovers cached copy
#> loading memoised result from previous rnorm call.
###############################################
# experimental devMode
###############################################
opt <- options("reproducible.useCache" = "devMode")
clearCache(tmpDir, ask = FALSE)
centralTendency <- function(x) mean(x)
funnyData <- c(1,1,1,1,10)
uniqueUserTags <- c("thisIsUnique", "reallyUnique")
ranNumsB <- Cache(centralTendency, funnyData, cacheRepo = tmpDir,
                  userTags = uniqueUserTags) # sets new value to Cache
showCache(tmpDir) # 1 unique artifact -- cacheId is 8be9cf2a072bdbb0515c5f0b3578f474
#> Cache size:
#> Total (including Rasters): 246 bytes
#> Selected objects (not including Rasters): 246 bytes
#>                             artifact         tagKey                         tagValue         createdDate
#>  1: 49cde26ceb1f40fa2a552450f35281ee         format                              rda 2019-03-18 10:05:55
#>  2: 49cde26ceb1f40fa2a552450f35281ee           name 49cde26ceb1f40fa2a552450f35281ee 2019-03-18 10:05:55
#>  3: 49cde26ceb1f40fa2a552450f35281ee          class                          numeric 2019-03-18 10:05:55
#>  4: 49cde26ceb1f40fa2a552450f35281ee           date              2019-03-18 10:05:55 2019-03-18 10:05:55
#>  5: 49cde26ceb1f40fa2a552450f35281ee        cacheId                 71cd24ec3b0d0cac 2019-03-18 10:05:55
#>  6: 49cde26ceb1f40fa2a552450f35281ee   thisIsUnique                     thisIsUnique 2019-03-18 10:05:55
#>  7: 49cde26ceb1f40fa2a552450f35281ee   reallyUnique                     reallyUnique 2019-03-18 10:05:55
#>  8: 49cde26ceb1f40fa2a552450f35281ee       function                  centralTendency 2019-03-18 10:05:55
#>  9: 49cde26ceb1f40fa2a552450f35281ee    object.size                              984 2019-03-18 10:05:55
#> 10: 49cde26ceb1f40fa2a552450f35281ee       accessed              2019-03-18 10:05:55 2019-03-18 10:05:55
#> 11: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                          saveRDS 2019-03-18 10:05:55
#> 12: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                          do.call 2019-03-18 10:05:55
#> 13: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                 build_site_local 2019-03-18 10:05:55
#> 14: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                  build_reference 2019-03-18 10:05:55
#> 15: 49cde26ceb1f40fa2a552450f35281ee otherFunctions             data_reference_topic 2019-03-18 10:05:55
#> 16: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                          as_data 2019-03-18 10:05:55
#> 17: 49cde26ceb1f40fa2a552450f35281ee otherFunctions             as_data.tag_examples 2019-03-18 10:05:55
#> 18: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                        timing_fn 2019-03-18 10:05:55
#> 19: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                           handle 2019-03-18 10:05:55
#> 20: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                              try 2019-03-18 10:05:55
#> 21: 49cde26ceb1f40fa2a552450f35281ee otherFunctions                      withVisible 2019-03-18 10:05:55
#> 22: 49cde26ceb1f40fa2a552450f35281ee      preDigest               x:e4aa8de28dc6c1bb 2019-03-18 10:05:55
#> 23: 49cde26ceb1f40fa2a552450f35281ee      preDigest            .FUN:d5f5f91cbb662db9 2019-03-18 10:05:55
# During development, we often redefine function internals
centralTendency <- function(x) median(x)

# When we rerun, we don't want to keep the "old" cache because the function will
# never again be defined that way. Here, because of userTags being the same,
# it will replace the entry in the Cache, effectively overwriting it, even though
# it has a different cacheId
ranNumsD <- Cache(centralTendency, funnyData, cacheRepo = tmpDir,
                  userTags = uniqueUserTags)
#> This call to cache differs from the next closest due to:
#> ... different .FUN
#> Overwriting Cache entry with userTags: 'thisIsUnique, reallyUnique, function:centralTendency'
showCache(tmpDir) # 1 unique artifact -- cacheId is bb1195b40c8d37a60fd6004e5d526e6b
#> Cache size:
#> Total (including Rasters): 246 bytes
#> Selected objects (not including Rasters): 246 bytes
#>                             artifact         tagKey                         tagValue         createdDate
#>  1: ba72850f0c54ec67f8a00614cd17f365         format                              rda 2019-03-18 10:05:56
#>  2: ba72850f0c54ec67f8a00614cd17f365           name ba72850f0c54ec67f8a00614cd17f365 2019-03-18 10:05:56
#>  3: ba72850f0c54ec67f8a00614cd17f365          class                          numeric 2019-03-18 10:05:56
#>  4: ba72850f0c54ec67f8a00614cd17f365           date              2019-03-18 10:05:56 2019-03-18 10:05:56
#>  5: ba72850f0c54ec67f8a00614cd17f365        cacheId                 632cd06f30e111be 2019-03-18 10:05:56
#>  6: ba72850f0c54ec67f8a00614cd17f365   thisIsUnique                     thisIsUnique 2019-03-18 10:05:56
#>  7: ba72850f0c54ec67f8a00614cd17f365   reallyUnique                     reallyUnique 2019-03-18 10:05:56
#>  8: ba72850f0c54ec67f8a00614cd17f365       function                  centralTendency 2019-03-18 10:05:56
#>  9: ba72850f0c54ec67f8a00614cd17f365    object.size                              984 2019-03-18 10:05:56
#> 10: ba72850f0c54ec67f8a00614cd17f365       accessed              2019-03-18 10:05:56 2019-03-18 10:05:56
#> 11: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                          saveRDS 2019-03-18 10:05:56
#> 12: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                          do.call 2019-03-18 10:05:56
#> 13: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                 build_site_local 2019-03-18 10:05:56
#> 14: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                  build_reference 2019-03-18 10:05:56
#> 15: ba72850f0c54ec67f8a00614cd17f365 otherFunctions             data_reference_topic 2019-03-18 10:05:56
#> 16: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                          as_data 2019-03-18 10:05:56
#> 17: ba72850f0c54ec67f8a00614cd17f365 otherFunctions             as_data.tag_examples 2019-03-18 10:05:56
#> 18: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                        timing_fn 2019-03-18 10:05:56
#> 19: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                           handle 2019-03-18 10:05:56
#> 20: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                              try 2019-03-18 10:05:56
#> 21: ba72850f0c54ec67f8a00614cd17f365 otherFunctions                      withVisible 2019-03-18 10:05:56
#> 22: ba72850f0c54ec67f8a00614cd17f365      preDigest               x:e4aa8de28dc6c1bb 2019-03-18 10:05:56
#> 23: ba72850f0c54ec67f8a00614cd17f365      preDigest            .FUN:af11d20d957667d9 2019-03-18 10:05:56
# If it finds it by cacheID, doesn't matter what the userTags are
ranNumsD <- Cache(centralTendency, funnyData, cacheRepo = tmpDir,
                  userTags = "thisIsUnique")
#> loading cached result from previous centralTendency call, adding to memoised copy
options(opt)

# For more in depth uses, see vignette
# NOT RUN {
browseVignettes(package = "reproducible")
# }

# Equivalent
a <- Cache(rnorm, 1)
#> loading cached result from previous rnorm call, adding to memoised copy
b %<% rnorm(1)
#> loading memoised result from previous rnorm call.