Saves a wide variety of function call outputs to disk and optionally RAM, for later recovery

maturing

A function that can be wrapped around other functions to cache their calls for later use. This is normally most effective when the function to cache is slow to run, yet the inputs and outputs are small. The benefit of caching therefore declines when the "first" function call is fast to compute and/or the argument values and return objects are large. The default setting (and first call to Cache) will always save to disk. The 2nd call to the same function will return from disk, unless options("reproducible.useMemoise" = TRUE), in which case the 2nd call will recover the object from RAM, which is normally much faster (at the expense of RAM use).

Cache(
  FUN,
  ...,
  dryRun = getOption("reproducible.dryRun", FALSE),
  notOlderThan = NULL,
  .objects = NULL,
  .cacheExtra = NULL,
  .functionName = NULL,
  .cacheChaining = getOption("reproducible.cacheChaining", NULL),
  outputObjects = NULL,
  algo = "xxhash64",
  cachePath = NULL,
  length = getOption("reproducible.length", Inf),
  userTags = c(),
  omitArgs = NULL,
  classOptions = list(),
  debugCache = character(),
  quick = getOption("reproducible.quick", FALSE),
  verbose = getOption("reproducible.verbose", 1),
  cacheId = NULL,
  cacheSaveFormat = getOption("reproducible.cacheSaveFormat"),
  useCache = getOption("reproducible.useCache", TRUE),
  useCloud = getOption("reproducible.useCloud", FALSE),
  cloudFolderID = getOption("reproducible.cloudFolderID", NULL),
  showSimilar = getOption("reproducible.showSimilar", FALSE),
  drv = getOption("reproducible.drv", NULL),
  conn = getOption("reproducible.conn", NULL)
)

cache2(
  FUN,
  ...,
  dryRun = getOption("reproducible.dryRun", FALSE),
  notOlderThan = NULL,
  .objects = NULL,
  .cacheExtra = NULL,
  .functionName = NULL,
  .cacheChaining = getOption("reproducible.cacheChaining", NULL),
  outputObjects = NULL,
  algo = "xxhash64",
  cachePath = NULL,
  length = getOption("reproducible.length", Inf),
  userTags = c(),
  omitArgs = NULL,
  classOptions = list(),
  debugCache = character(),
  quick = getOption("reproducible.quick", FALSE),
  verbose = getOption("reproducible.verbose", 1),
  cacheId = NULL,
  cacheSaveFormat = getOption("reproducible.cacheSaveFormat"),
  useCache = getOption("reproducible.useCache", TRUE),
  useCloud = getOption("reproducible.useCloud", FALSE),
  cloudFolderID = getOption("reproducible.cloudFolderID", NULL),
  showSimilar = getOption("reproducible.showSimilar", FALSE),
  drv = getOption("reproducible.drv", NULL),
  conn = getOption("reproducible.conn", NULL)
)

CacheV2(
  FUN,
  ...,
  notOlderThan = NULL,
  .objects = NULL,
  .cacheExtra = NULL,
  .functionName = NULL,
  outputObjects = NULL,
  algo = "xxhash64",
  cacheRepo = NULL,
  cachePath = NULL,
  length = getOption("reproducible.length", Inf),
  compareRasterFileLength,
  userTags = c(),
  omitArgs = NULL,
  classOptions = list(),
  debugCache = character(),
  makeCopy = FALSE,
  quick = getOption("reproducible.quick", FALSE),
  verbose = getOption("reproducible.verbose", 1),
  cacheId = NULL,
  useCache = getOption("reproducible.useCache", TRUE),
  useCloud = FALSE,
  cloudFolderID = NULL,
  showSimilar = getOption("reproducible.showSimilar", FALSE),
  drv = getDrv(getOption("reproducible.drv", NULL)),
  conn = getOption("reproducible.conn", NULL)
)

Arguments

FUN: Either a function (e.g., rnorm), a function call (e.g., rnorm(1)), or an unevaluated function call (e.g., using quote()).
...: Arguments passed to FUN, if FUN is not an expression.
dryRun: See reproducibleOptions.
notOlderThan: A time. Load an object from the Cache if it was created after this.
.objects: Character vector of objects to be digested. This is only applicable if there is a list, environment (or similar) with named objects within it. Only this/these objects will be considered for caching, i.e., only use a subset of the list, environment or similar objects. In the case of nested list-type objects, this will only be applied outermost first.
.cacheExtra: An arbitrary R object that will be included in the CacheDigest, but otherwise not passed into the FUN. If the user supplies a named list, then Cache will report which individual elements of .cacheExtra have changed when options("reproducible.showSimilar" = TRUE). This can allow a user more control and understanding for debugging.
.functionName: An arbitrary character string that provides a name that is different than the actual function name (e.g., "rnorm") which will be used for messaging. This can be useful when the actual function name is not helpful for a user, such as do.call.
.cacheChaining: A logical or the name of a function. If TRUE, then the current Cache call will evaluate the function "outside" the Cache call (via sys.function(-1)) and attach the digest of that outer function to the entry for this Cache call. This will then be used by any subsequent Cache call within the same function. If the outer function is unchanged, and there is one or more objects that had been returned by a previous Cache call, then those objects will not be digested; rather their cacheId tag will be used in place of a new digest. This should cause no change in Caching outcomes, and it should be faster in cases where there are several Cache calls within the same function. If FALSE, then this feature is not used. If set to NULL (i.e., unset, the current default), then it will not use cache chaining, but it will attach more information to the Cache entries for each cacheId, as well as new entries for "surroundingFunction" digest, so that if a user switches to .cacheChaining = TRUE, then it will be able to begin using cache chaining without needing to rerun the calls again. Can be set by an option.
outputObjects: Optional character vector indicating which objects to return. This is only relevant for list, environment (or similar) objects.
algo: The digest algorithm to use. Default xxhash64 (see digest::digest() for others).
cachePath: A repository used for storing cached objects. This is optional if Cache is used inside a SpaDES module.
length: Numeric. If the element passed to Cache is a Path class object (from e.g., asPath(filename)) or it is a Raster with file-backing, then this will be passed to digest::digest, essentially limiting the number of bytes to digest (for speed). This will only be used if quick = FALSE. Default is getOption("reproducible.length"), which is set to Inf.
userTags: A character vector with descriptions of the Cache function call. These will be added to the Cache so that this entry in the Cache can be found using userTags e.g., via showCache().
omitArgs: Optional. A character vector of argument names in FUN to omit from the cache digest, or TRUE to omit every captured argument (the digest is then based on FUN itself – including its body, so a meaningful edit to the function source still busts the cache – and on .cacheExtra). Useful when the developer wants the cache to be insensitive to the function's inputs and pin freshness via .cacheExtra instead.
classOptions: Optional list. This will pass into .robustDigest for specific classes. Should be options that the .robustDigest knows what to do with.
debugCache: Character or Logical. Either "complete" or "quick" (uses partial matching, so "c" or "q" work). TRUE is equivalent to "complete". If "complete", then the returned object from the Cache function will have two attributes, debugCache1 and debugCache2, which are the entire list(...) and that same object, but after all .robustDigest calls, at the moment that it is digested using digest, respectively. This attr(mySimOut, "debugCache2") can then be compared to a subsequent call and individual items within the object attr(mySimOut, "debugCache1") can be compared. If "quick", then it will return the same two objects directly, without evaluating the FUN(...).
quick: Logical or character. If TRUE, no disk-based information will be assessed, i.e., only memory content. See Details section about quick in Cache().
verbose: Numeric, -1 silent (where possible), 0 being very quiet, 1 showing more messaging, 2 being more messaging, etc. Default is 1. Above 3 will output much more information about the internals of Caching, which may help diagnose Caching challenges. Can set globally with an option, e.g., options('reproducible.verbose' = 0) to reduce messaging to a minimum.
cacheId: Character string. If passed, this will override the calculated hash of the inputs, and return the result from this cacheId in the cachePath. Setting this is equivalent to manually saving the output of this function, i.e., the object will be on disk, and will be recovered in subsequent calls. This may help in some particularly finicky situations where Cache is not correctly detecting unchanged inputs. This will guarantee the object will be identical each time; this may be useful in operational code.
cacheSaveFormat: Character string: currently either qs or rds. Defaults to getOption("reproducible.cacheSaveFormat"). qs may be faster but appears to have narrower range of conditions that work; rds is safer, and may be slower.
useCache: Logical, numeric or "overwrite" or "devMode". See details.
useCloud: Logical (TRUE / FALSE / NULL) or one of "pull" / "push". See Details.
cloudFolderID: A googledrive dribble of a folder, e.g., using drive_mkdir(). If left as NULL, the function will create a cloud folder with a name built from the last two folder levels of the cachePath path: paste0(basename(dirname(cachePath)), "_", basename(cachePath)). This cloudFolderID will be added to options("reproducible.cloudFolderID"), but this will not persist across sessions. If this is a character string, it will treat this as a folder name to create or use on GoogleDrive.
showSimilar: A logical or numeric. Useful for debugging. If TRUE or 1, then if the Cache does not find an identical archive in the cachePath, it will report (via message) the next most recent similar archive, and indicate which argument(s) is/are different. If a number larger than 1, then it will report the N most recent similar archived objects.
drv: If using a database backend, drv must be an object that inherits from DBIDriver (e.g., RSQLite::SQLite).
conn: an optional DBIConnection object, as returned by dbConnect().
cacheRepo: Same as cachePath, but kept for backwards compatibility.
compareRasterFileLength: Being deprecated; use length.
makeCopy: Now deprecated. Ignored if used.

Value

Returns the value of the function call or the cached version (i.e., the result from a previous call to this same cached function with identical arguments).

Details

There are other similar functions in the R universe. This version of Cache has been used as part of a robust continuous workflow approach. As a result, we have tested it with many "non-standard" R objects (e.g., RasterLayer, Spat* objects) and environments (which are always unique, so do not cache readily).

This version of the Cache function accommodates those four special, though quite common, cases by:

converting any environments into list equivalents;
identifying the dispatched S4 method (including those made through inheritance) before hashing so the correct method is being cached;
hashing the linked file, rather than the raster object. Currently, only file-backed Raster* or Spat* objects are digested (e.g., not ff objects, or any other R object where the data are on disk instead of in RAM);
using digest::digest() for hashing. This is used for file-backed objects as well.

Cache will save arguments passed by user in a hidden environment. Any nested Cache functions will use arguments in this order: 1) actual arguments passed at each Cache call; 2) any inherited arguments from an outer Cache call; 3) the default values of the Cache function. See section on Nested Caching.
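The first of the cases listed above (environments) can be illustrated with a minimal sketch. It is not the package's internal code: envToList is a hypothetical helper, and digest::digest() is called directly here only to show why a content-based list representation hashes consistently when two environments hold the same objects.

```r
# Two distinct environments holding the same named objects
e1 <- new.env(parent = emptyenv())
e2 <- new.env(parent = emptyenv())
assign("a", 1:3, envir = e1)
assign("b", "x", envir = e1)
assign("b", "x", envir = e2) # same content, different insertion order
assign("a", 1:3, envir = e2)

# Environments are compared by reference, so two separate ones never match
sameEnv <- identical(e1, e2)

# Hypothetical helper: a sorted list is a content-based representation
envToList <- function(env) as.list(env, all.names = TRUE, sorted = TRUE)

# Hashing the list equivalents gives matching digests
sameHash <- identical(
  digest::digest(envToList(e1), algo = "xxhash64"),
  digest::digest(envToList(e2), algo = "xxhash64")
)
```

Here sameEnv is FALSE while sameHash is TRUE, which is the property a cache needs in order to recognise repeated inputs.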

Cache will add a tag to the entry in the cache database called accessed, which will assign the time that it was accessed, either read or write. That way, cached items can be shown (using showCache) or removed (using clearCache) selectively, based on their access dates, rather than only by their creation dates. See example in clearCache().
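As a rough sketch of how the accessed tag might be inspected and used, the following assumes the reproducible package is installed; the cache folder name and the "demo" tag are made up for illustration, and the tagKey column name is taken to be the one returned by showCache().

```r
library(reproducible)

# Hypothetical cache folder for this illustration
cacheDir <- file.path(tempdir(), "cacheAccessed")

out <- Cache(rnorm, 3, cachePath = cacheDir, userTags = "demo")

# showCache returns a table of tags; "accessed" is one of the tag keys
sc <- showCache(cacheDir, userTags = "demo")
accessedRows <- sc[sc$tagKey == "accessed", ]

# Remove only the entries accessed within the last hour
clearCache(cacheDir, after = Sys.time() - 60 * 60, ask = FALSE)
```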

Note

As indicated above, several objects require pre-treatment before caching will work as expected. The function .robustDigest accommodates this. It is an S4 generic, meaning that developers can produce their own methods for different classes of objects. Currently, there are methods for several types of classes. See .robustDigest().

Nested Caching

Commonly, Caching is nested, i.e., an outer function is wrapped in a Cache function call, and one or more inner functions are also wrapped in a Cache function call. A user can always specify arguments in every Cache function call, but this can get tedious and can be prone to errors. The normal way that R handles arguments is it takes the user passed arguments if any, and default arguments for all those that have no user passed arguments. We have inserted a middle step. The order of precedence for any given Cache function call is:

1. user arguments, 2. inherited arguments, 3. default arguments. At this time, the top level Cache arguments will propagate to all inner functions unless each individual Cache call has other arguments specified, i.e., "middle" nested Cache function calls don't propagate their arguments to further "inner" Cache function calls. See example.

userTags is unique among the arguments: its values will be appended to the inherited userTags.
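The precedence described in this section can be sketched in base R. This is only an illustration of the rule, not the package's implementation: resolveCacheArgs and the three example lists are hypothetical.

```r
# Hypothetical helper showing the order: defaults < inherited < user
resolveCacheArgs <- function(user = list(), inherited = list(), defaults = list()) {
  # Each later list overrides matching names in the earlier one
  resolved <- utils::modifyList(utils::modifyList(defaults, inherited), user)
  # userTags is the exception: user values are appended to inherited ones
  tags <- c(inherited$userTags, user$userTags)
  if (length(tags)) resolved$userTags <- tags
  resolved
}

defaults  <- list(verbose = 1, quick = FALSE, userTags = character())
inherited <- list(verbose = 0, userTags = "outer") # from an outer Cache call
user      <- list(quick = TRUE, userTags = "inner") # passed at the inner call

resolved <- resolveCacheArgs(user, inherited, defaults)
# resolved$verbose is 0 (inherited), resolved$quick is TRUE (user),
# resolved$userTags is c("outer", "inner")
```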

quick

The quick argument is attempting to sort out an ambiguity with character strings: are they file paths or are they simply character strings? When quick = TRUE, Cache will treat these as character strings; when quick = FALSE, Cache will first attempt to treat them as file paths; if there is no file, it will revert to treating them as character strings. If the user passes a character vector to quick, it will behave like omitArgs: quick = "file" will treat the argument "file" as a character string.

The most often encountered situation where this ambiguity matters is in arguments about filenames: is the filename an input pointing to an object whose content we want to assess (e.g., a file-backed raster), or an output (as in saveRDS) and it should not be assessed. If only run once, the output file won't exist, so it will be treated as a character string. However, once the function has been run once, the output file will exist, and Cache(...) will assess it, which is incorrect. In these cases, the user is advised to use quick = "TheOutputFilenameArgument" to specify the argument whose content on disk should not be assessed, but whose character string should be assessed (distinguishing it from omitArgs = "TheOutputFilenameArgument", which will not assess the file content nor the character string).

This is relevant for objects of class character, Path and Raster currently. For class character, it is ambiguous whether this represents a character string or a vector of file paths. If it is known that character strings should not be treated as paths, then quick = TRUE is appropriate, with no loss of information. If it is file or directory, then it will digest the file content, or basename(object). For class Path objects, the file's metadata (i.e., filename and file size) will be hashed instead of the file contents if quick = TRUE. If set to FALSE (default), the contents of the file(s) are hashed. If quick = TRUE, length is ignored. Raster objects are treated as paths, if they are file-backed.
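The output-filename situation described above might look like the following sketch, assuming the reproducible package is installed. saveAndReturn, the cache folder, and the output file are hypothetical names chosen for illustration.

```r
library(reproducible)

cacheDir <- file.path(tempdir(), "cacheQuick") # hypothetical cache folder
outFile  <- file.path(tempdir(), "quickDemo.rds") # hypothetical output file

# Hypothetical function whose `file` argument is an output, not an input
saveAndReturn <- function(x, file) {
  saveRDS(x, file)
  x
}

# quick = "file": digest the string in `file`, not the content on disk,
# so the file created by the first call should not change the digest
first  <- Cache(saveAndReturn, 1:10, file = outFile,
                quick = "file", cachePath = cacheDir)
second <- Cache(saveAndReturn, 1:10, file = outFile,
                quick = "file", cachePath = cacheDir)
```

With omitArgs = "file" instead, changing the output filename would not change the digest either, which is the distinction drawn above.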

Caching Speed

Caching speed may become a critical aspect of a final product. For example, if the final product is a shiny app, rerunning the entire project may need to take less than a few seconds at most. There are 3 arguments that affect Cache speed: quick, length, and algo. quick is passed to .robustDigest, which currently only affects Path and Raster* class objects. In both cases, quick means that little or no disk-based information will be assessed.
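To see the trade-off behind length, the sketch below calls digest::digest() directly on two temporary files that share their first 1000 bytes. It is an illustration under that assumption, not a description of how Cache calls digest internally.

```r
# Two files with an identical 1000-byte start and different endings
f1 <- tempfile()
f2 <- tempfile()
header <- paste(rep("a", 1000), collapse = "")
writeLines(c(header, "tail one"), f1)
writeLines(c(header, "tail two"), f2)

# Full-content digests differ
full1 <- digest::digest(f1, algo = "xxhash64", file = TRUE)
full2 <- digest::digest(f2, algo = "xxhash64", file = TRUE)

# Digesting only the first 100 bytes is cheaper but cannot see the difference
part1 <- digest::digest(f1, algo = "xxhash64", file = TRUE, length = 100)
part2 <- digest::digest(f2, algo = "xxhash64", file = TRUE, length = 100)
```

A small length therefore speeds up digesting of large files at the risk of treating files that differ only later on as unchanged.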

Filepaths

If a function has a path argument, there is some ambiguity about what should be done. Possibilities include:

hash the string as is (this will be very system specific, meaning a Cache call will not work if copied between systems or directories);
hash the basename(path);
hash the contents of the file.

If paths are passed in as is (i.e., as a character string), the result will not be predictable. Instead, one should use the wrapper function asPath(path), which sets the class of the string to a Path, and one should decide whether one wants to digest the content of the file (using quick = FALSE), or just the filename (quick = TRUE). See examples.
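The three possibilities listed above can be compared with a small sketch that uses digest::digest() directly. The folder and file names are hypothetical, and the three helper functions only illustrate the options; they are not functions from the package.

```r
# The same file content stored in two different folders
dirA <- file.path(tempdir(), "siteA")
dirB <- file.path(tempdir(), "siteB")
dir.create(dirA, showWarnings = FALSE)
dir.create(dirB, showWarnings = FALSE)
pathA <- file.path(dirA, "input.csv")
pathB <- file.path(dirB, "input.csv")
writeLines(c("x,y", "1,2"), pathA)
file.copy(pathA, pathB, overwrite = TRUE)

# Hypothetical helpers, one per option
hashString   <- function(p) digest::digest(p, algo = "xxhash64")
hashBasename <- function(p) digest::digest(basename(p), algo = "xxhash64")
hashContents <- function(p) digest::digest(p, algo = "xxhash64", file = TRUE)

stringSame   <- hashString(pathA) == hashString(pathB)     # FALSE: full paths differ
basenameSame <- hashBasename(pathA) == hashBasename(pathB) # TRUE: same file name
contentsSame <- hashContents(pathA) == hashContents(pathB) # TRUE: same bytes
```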

Stochasticity or randomness

In general, it is expected that caching will only be used when randomness is not desired, e.g., Cache(rnorm(1)) is unlikely to be useful in many cases. However, Cache captures the call that is passed to it, leaving all functions unevaluated. As a result Cache(glm, x ~ y, rnorm(1)) will not work as a means of forcing a new evaluation each time, as the rnorm(1) is not evaluated before the call is assessed against the cache database. To force a new call each time, evaluate the randomness prior to the Cache call, e.g., ran = rnorm(1), then pass this to .cacheExtra, e.g., Cache(glm, x ~ y, .cacheExtra = ran).
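A minimal sketch of this pattern, assuming the reproducible package is installed and using a hypothetical cache folder: the random value is drawn before the Cache call and passed through .cacheExtra, so each call has a different digest.

```r
library(reproducible)

cacheDir <- file.path(tempdir(), "cacheRandom") # hypothetical cache folder

# Evaluate the randomness first, then include it in the digest
ran1 <- rnorm(1)
out1 <- Cache(rnorm, 1, .cacheExtra = ran1, cachePath = cacheDir)

ran2 <- rnorm(1)
out2 <- Cache(rnorm, 1, .cacheExtra = ran2, cachePath = cacheDir)

# Both calls should be new entries rather than recoveries
isNew1 <- attr(out1, ".Cache")$newCache
isNew2 <- attr(out2, ".Cache")$newCache
```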

`drv` and `conn`

By default, drv uses an SQLite database. This can be sufficient for most cases. However, if a user has dozens or more cores making requests to the Cache database, it may be insufficient. A user can set up a different database backend, e.g., PostgreSQL that can handle multiple simultaneous read-write situations. See https://github.com/PredictiveEcology/SpaDES/wiki/Using-alternate-database-backends-for-Cache.

`useCache`

Logical or numeric. If FALSE or 0, then the entire Caching mechanism is bypassed and the function is evaluated as if it was not being Cached. Default is getOption("reproducible.useCache"), which is TRUE by default, meaning use the Cache mechanism. This may be useful to turn all Caching on or off in very complex scripts and nested functions. Increasing levels of numeric values will cause deeper levels of Caching to occur (though this may not work as expected in all cases). The following is no longer supported: Currently, only implemented in postProcess: to do both caching of inner cropInputs, projectInputs and maskInputs, and caching of outer postProcess, use useCache = 2; to skip the inner sequence of 3 functions, use useCache = 1. For large objects, this may prevent many duplicated save to disk events.

If useCache = "overwrite" (which can be set with options("reproducible.useCache" = "overwrite")), then the function will invoke the caching mechanism but will purge any entry that is matched, replacing it with the results of the current call.

If useCache = "devMode": The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In devMode, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the cachePath, but it does find an entry that matches based on userTags. In this case, it will delete the old entry in the cachePath (identified based on matching userTags), then continue with normal Cache. For this to work correctly, userTags must be unique for each function call. This should be used with caution as it is still experimental. Currently, if userTags are not unique to a single entry in the cachePath, it will default to the behaviour of useCache = TRUE with a message. This means that "devMode" is most useful if used from the start of a project.
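The difference between the TRUE, FALSE and "overwrite" settings can be sketched as follows, assuming the reproducible package is installed; the cache folder is hypothetical, and "devMode" is left out because it depends on userTags being unique.

```r
library(reproducible)

cacheDir <- file.path(tempdir(), "cacheModes") # hypothetical cache folder

saved     <- Cache(rnorm, 1, cachePath = cacheDir) # first call: saved
recovered <- Cache(rnorm, 1, cachePath = cacheDir) # second call: recovered

# useCache = FALSE bypasses the cache, so rnorm is simply run again
bypassed <- Cache(rnorm, 1, cachePath = cacheDir, useCache = FALSE)

# "overwrite" purges the matching entry and stores the new result
replaced <- Cache(rnorm, 1, cachePath = cacheDir, useCache = "overwrite")

# A later normal call should recover the replaced value, not the original
afterOverwrite <- Cache(rnorm, 1, cachePath = cacheDir)
```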

`useCloud`

This is experimental and there are many conditions under which this is known to not work correctly. This is a way to store all or some of the local Cache in the cloud. Currently, the only cloud option is Google Drive, via googledrive. For this to work, the user must be or be able to be authenticated with googledrive::drive_auth. The principle behind this useCloud is that it will be a full or partial mirror of a local Cache. It is not intended to be used independently from a local Cache. To share objects that are in the Cloud with another person, it requires 2 steps. 1) share the cloudFolderID$id, which can be retrieved by getOption("reproducible.cloudFolderID")$id after at least one Cache call has been made. 2) The other user must then set their cloudFolderID in a Cache(..., reproducible.cloudFolderID = "the ID here") call or set their option manually options("reproducible.cloudFolderID" = "the ID here").

If TRUE, then this Cache call will download (if local copy doesn't exist, but cloud copy does exist), upload (local copy does or doesn't exist and cloud copy doesn't exist), or will not download nor upload if object exists in both. Setting this to TRUE will be at least 1 second slower than setting it to FALSE, and likely even slower as the cloud folder gets large. If a user wishes to keep "high-level" control, set this to getOption("reproducible.useCloud", FALSE) or getOption("reproducible.useCloud", TRUE) (if the default behaviour should be FALSE or TRUE, respectively) so it can be turned on and off with this option. NOTE: This argument will not be passed into inner/nested Cache calls.

Two character values are also accepted, intended for separating developer and user roles when sharing a cloud-cache folder:

"push" is equivalent to TRUE (developer role) – bidirectional; downloads on a cloud hit, uploads on a miss.
"pull" is read-only (user role) – downloads on a cloud hit, but never uploads. If the local cache already has the object, the cloud is not consulted at all (the Google Drive listing is deferred until after the local lookup fails). When neither local nor cloud has the object, the call falls back to a normal local-only Cache run.

Object attributes

Users should be cautioned that object attributes may not be preserved, especially in the case of objects that are file-backed, such as Raster or SpatRaster objects. If a user needs to keep attributes, they may need to manually re-attach them to the object after recovery. With the example of SpatRaster objects, saving to disk requires terra::wrap if it is a memory-backed object. When running terra::unwrap on this object, any attributes that a user had added are lost.

`sideEffect`

This feature is now deprecated. Do not use as it is ignored.

Author

Eliot McIntire

Examples

data.table::setDTthreads(2)
tmpDir <- tempdir()
opts <- options(reproducible.cachePath = tmpDir)

# Usage -- All below are equivalent; even where args are missing or provided,
#   Cache evaluates using default values, if these are specified in formals(FUN)
a <- list()
b <- list(fun = rnorm)
bbb <- 1
ee <- new.env(parent = emptyenv())
ee$qq <- bbb

a[[1]] <- Cache(rnorm(1)) # no evaluation prior to Cache
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: ca275879d5116967.rds; fn: rnorm
a[[2]] <- Cache(rnorm, 1) # no evaluation prior to Cache
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[3]] <- Cache(do.call, rnorm, list(1))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[4]] <- Cache(do.call(rnorm, list(1)))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[5]] <- Cache(do.call(b$fun, list(1)))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: b$fun, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous b$fun call
a[[6]] <- Cache(do.call, b$fun, list(1))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: do.call, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous do.call call
a[[7]] <- Cache(b$fun, 1)
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: b$fun, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous b$fun call
a[[8]] <- Cache(b$fun(1))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: $, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous $ call
a[[10]] <- Cache(quote(rnorm(1)))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[11]] <- Cache(stats::rnorm(1))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[12]] <- Cache(stats::rnorm, 1)
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[13]] <- Cache(rnorm(1, 0, get("bbb", inherits = FALSE)))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[14]] <- Cache(rnorm(1, 0, get("qq", inherits = FALSE, envir = ee)))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[15]] <- Cache(rnorm(1, bbb - bbb, get("bbb", inherits = FALSE)))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[16]] <- Cache(rnorm(sd = 1, 0, n = get("bbb", inherits = FALSE))) # change order
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call
a[[17]] <- Cache(rnorm(1, sd = get("ee", inherits = FALSE)$qq), mean = 0)
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call

# with base pipe -- this is put in quotes ('') because R version 4.0 can't understand this
#  if you are using R >= 4.1 or R >= 4.2 if using the _ placeholder,
#  then you can just use pipe normally
usingPipe1 <- "b$fun(1) |> Cache()"  # base pipe

# For long pipe, need to wrap sequence in { }, or else only last step is cached
usingPipe2 <-
  '{"bbb" |>
      parse(text = _) |>
      eval() |>
      rnorm()} |>
    Cache()'
a[[9]] <- eval(parse(text = usingPipe1)) # recovers cached copy
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: $, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous $ call
a[[18]] <- eval(parse(text = usingPipe2)) # recovers cached copy
#> There is an `eval` call in a chain of calls for Cache; 
#>   eval is evaluated before Cache which may be undesired. 
#>   Perhaps use `do.call` if the evaluation should not occur prior to Cache
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, ca275879d5116967.rds) ...
#> Loaded! Cached result from previous rnorm call

length(unique(a)) == 1 #  all same
#> [1] FALSE

### Pipe -- have to use { } or else only final function is Cached
b1a <- 'sample(1e5, 1) |> rnorm() |> Cache()'
b1b <- 'sample(1e5, 1) |> rnorm() |> Cache()'
b2a <- '{sample(1e5, 1) |> rnorm()} |> Cache()'
b2b <- '{sample(1e5, 1) |> rnorm()} |> Cache()'
b1a <- eval(parse(text = b1a))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: 72bf1c853b8a84d4.rds; fn: rnorm
b1b <- eval(parse(text = b1b))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: 1b1ed95a5a47bfc4.rds; fn: rnorm
b2a <- eval(parse(text = b2a))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: fd995d3fc9a1ad45.rds; fn: rnorm
b2b <- eval(parse(text = b2b))
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, fd995d3fc9a1ad45.rds) ...
#> Loaded! Cached result from previous rnorm call
all.equal(b1a, b1b) # Not TRUE because the sample is run first
#> [1] "Attributes: < Component “tags”: 1 string mismatch >"
#> [2] "Numeric: lengths (30566, 20863) differ"             
all.equal(b2a, b2b) # TRUE because of {  }, sample is not run
#> [1] "Attributes: < Component “.Cache”: Component “newCache”: 1 element mismatch >"

#########################
# Advanced examples
#########################

# .cacheExtra -- add something to digest
Cache(rnorm(1), .cacheExtra = "sfessee11") # adds something other than fn args
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: 50940d5e9b89ae8d.rds; fn: rnorm
#> [1] 0.6860401
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] TRUE
#> 
#> attr(,"tags")
#> [1] "cacheId:50940d5e9b89ae8d"
#> attr(,"callInCache")
#> [1] ""
Cache(rnorm(1), .cacheExtra = "nothing") # even though fn is same, the extra is different
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: 02113a5728217388.rds; fn: rnorm
#> [1] 0.8778937
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] TRUE
#> 
#> attr(,"tags")
#> [1] "cacheId:02113a5728217388"
#> attr(,"callInCache")
#> [1] ""

# omitArgs -- remove something from digest (kind of the opposite of .cacheExtra)
Cache(rnorm(2, sd = 1), omitArgs = "sd") # removes one or more args from cache digest
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Saved! Cache file: 3b12b9a1308253c5.rds; fn: rnorm
#> [1]  0.3459470 -0.8371393
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] TRUE
#> 
#> attr(,"tags")
#> [1] "cacheId:3b12b9a1308253c5"
#> attr(,"callInCache")
#> [1] ""
Cache(rnorm(2, sd = 2), omitArgs = "sd") # because sd is omitted from the digest, this matches the previous call
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> Object to retrieve (fn: rnorm, 3b12b9a1308253c5.rds) ...
#> Loaded! Cached result from previous rnorm call
#> [1]  0.3459470 -0.8371393
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] FALSE
#> 
#> attr(,"tags")
#> [1] "cacheId:3b12b9a1308253c5"
#> attr(,"callInCache")
#> [1] ""

# cacheId -- force the use of a digest -- can give undesired consequences
Cache(rnorm(3), cacheId = "k323431232") # sets the cacheId for this call
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> cacheId passed to override automatic digesting; using k323431232
#> Saved! Cache file: k323431232.rds; fn: rnorm
#> [1] -0.1486125  0.3507219 -0.2208678
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] TRUE
#> 
#> attr(,"tags")
#> [1] "cacheId:k323431232"
#> attr(,"callInCache")
#> [1] ""
Cache(runif(14), cacheId = "k323431232") # recovers the same object as above, i.e., the rnorm(3) result
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> cacheId passed to override automatic digesting; using k323431232
#> Object to retrieve (fn: runif, k323431232.rds) ...
#> Loaded! Cached result from previous runif call
#> [1] -0.1486125  0.3507219 -0.2208678
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] FALSE
#> 
#> attr(,"tags")
#> [1] "cacheId:k323431232"
#> attr(,"callInCache")
#> [1] ""

# Turn off Caching session-wide
opts <- options(reproducible.useCache = FALSE)
Cache(rnorm(3)) # doesn't cache
#> useCache is FALSE; skipping Cache on function rnorm(3) (currently running
#>   nested Cache level 1)
#> [1] 1.4744105 1.2351648 0.1556144
options(opts)
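
# A per-call alternative (a minimal sketch, not part of the original examples):
#   `useCache` is also an argument of Cache (see Usage), so a single call can
#   skip caching without changing the session-wide option.
#   `skipDir` is a hypothetical scratch folder.
library(reproducible)
skipDir <- file.path(tempdir(), "useCacheExample")
noCache <- Cache(rnorm(3), useCache = FALSE, cachePath = skipDir)
length(noCache) # 3 values, computed without consulting the cache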

# showSimilar can help with debugging why a Cache call isn't picking up a cached copy
Cache(rnorm(4), showSimilar = TRUE) # shows that the argument `n` is different
#> No cachePath supplied and getOption('reproducible.cachePath') is inside a temporary directory;
#>   this will not persist across R sessions.
#> There are 8 calls with same fn (rnorm) in the Cache repository.
#> With fewest differences (1), there are 4 similar calls in the Cache repository.
#> with different elements (4 most recent at top):
#> ----------------------
#> Compared to cacheId: fd995d3fc9a1ad45
#>       arg   cacheIdInCache     valueInCache cacheIdOfThisCall    valueThisCall
#>    <char>           <char>           <char>            <char>           <char>
#> 1:      n fd995d3fc9a1ad45 cafa3d7cd8ffbcb3  adf21923cd1e50d0 7eef4eae85fd9229._noPrefix
#> ----------------------
#> Compared to cacheId: 1b1ed95a5a47bfc4
#>       arg   cacheIdInCache     valueInCache cacheIdOfThisCall    valueThisCall
#>    <char>           <char>           <char>            <char>           <char>
#> 1:      n 1b1ed95a5a47bfc4 e9ce8ecadcc174b4  adf21923cd1e50d0 7eef4eae85fd9229._noPrefix
#> ----------------------
#> Compared to cacheId: 72bf1c853b8a84d4
#>       arg   cacheIdInCache     valueInCache cacheIdOfThisCall    valueThisCall
#>    <char>           <char>           <char>            <char>           <char>
#> 1:      n 72bf1c853b8a84d4 a3d9a40fe8caa766  adf21923cd1e50d0 7eef4eae85fd9229._noPrefix
#> ----------------------
#> Compared to cacheId: ca275879d5116967
#>       arg   cacheIdInCache     valueInCache cacheIdOfThisCall    valueThisCall
#>    <char>           <char>           <char>            <char>           <char>
#> 1:      n ca275879d5116967 853b1797f54b229c  adf21923cd1e50d0 7eef4eae85fd9229._noPrefix
#> ----------------------
#> Saved! Cache file: adf21923cd1e50d0.rds; fn: rnorm
#> [1] -0.7219279 -1.3192453 -0.7930681 -0.5921492
#> attr(,".Cache")
#> attr(,".Cache")$newCache
#> [1] TRUE
#> 
#> attr(,"tags")
#> [1] "cacheId:adf21923cd1e50d0"
#> attr(,"callInCache")
#> [1] ""

###############################################
# devMode -- enables cache database to stay
#            small even when developing code
###############################################
opt <- options("reproducible.useCache" = "devMode")
clearCache(tmpDir, ask = FALSE)
centralTendency <- function(x) {
  mean(x)
}
funnyData <- c(1, 1, 1, 1, 10)
uniqueUserTags <- c("thisIsUnique", "reallyUnique")
ranNumsB <- Cache(centralTendency, funnyData, cachePath = tmpDir,
                  userTags = uniqueUserTags) # sets new value to Cache
#> There is no similar item in the cachePath of centralTendency
#> Saved! Cache file: de10a5200e4aff1d.rds; fn: centralTendency
showCache(tmpDir) # 1 unique cacheId -- cacheId is de10a5200e4aff1d
#> Cache size:
#> Total (including Rasters): 14 bytes
#> Selected objects (not including Rasters): 14 bytes
#>              cacheId              tagKey                  tagValue
#>               <char>              <char>                    <char>
#>  1: de10a5200e4aff1d            function           centralTendency
#>  2: de10a5200e4aff1d            userTags              thisIsUnique
#>  3: de10a5200e4aff1d            userTags              reallyUnique
#>  4: de10a5200e4aff1d            accessed 2026-05-18 13:47:26.20955
#>  5: de10a5200e4aff1d             inCloud                     FALSE
#>  6: de10a5200e4aff1d   elapsedTimeDigest          0.005343914 secs
#>  7: de10a5200e4aff1d           preDigest     .FUN:3df5c81377ae4909
#>  8: de10a5200e4aff1d           preDigest        x:e4aa8de28dc6c1bb
#>  9: de10a5200e4aff1d               class                   numeric
#> 10: de10a5200e4aff1d         object.size                        56
#> 11: de10a5200e4aff1d            fromDisk                     FALSE
#> 12: de10a5200e4aff1d          resultHash                          
#> 13: de10a5200e4aff1d elapsedTimeFirstRun           0.04159045 secs
#>                    createdDate
#>                         <char>
#>  1: 2026-05-18 13:47:26.252151
#>  2: 2026-05-18 13:47:26.252151
#>  3: 2026-05-18 13:47:26.252151
#>  4: 2026-05-18 13:47:26.252151
#>  5: 2026-05-18 13:47:26.252151
#>  6: 2026-05-18 13:47:26.252151
#>  7: 2026-05-18 13:47:26.252151
#>  8: 2026-05-18 13:47:26.252151
#>  9: 2026-05-18 13:47:26.252151
#> 10: 2026-05-18 13:47:26.252151
#> 11: 2026-05-18 13:47:26.252151
#> 12: 2026-05-18 13:47:26.252151
#> 13: 2026-05-18 13:47:26.252151

# During development, we often redefine function internals
centralTendency <- function(x) {
  median(x)
}
# When we rerun, we don't want to keep the "old" cache because the function will
#   never again be defined that way. Here, because of userTags being the same,
#   it will replace the entry in the Cache, effectively overwriting it, even though
#   it has a different cacheId
ranNumsD <- Cache(centralTendency, funnyData, cachePath = tmpDir, userTags = uniqueUserTags)
#> ------ devMode -------
#> Previous call(s) exist in the cache with identical userTags (thisIsUnique,
#>   reallyUnique)
#> This call to cache will replace entry with cacheId(s):
#> with different elements (1 most recent at top):
#> ----------------------
#> Compared to cacheId: de10a5200e4aff1d
#>                arg   cacheIdInCache     valueInCache cacheIdOfThisCall
#>             <char>           <char>           <char>            <char>
#> 1: centralTendency de10a5200e4aff1d 3df5c81377ae4909  d03daaa68c73e7e9
#>       valueThisCall
#>              <char>
#> 1: 88d14fe75c352ad5._noPrefix
#> ----------------------
#> ------ devMode -------
#> Saved! Cache file: d03daaa68c73e7e9.rds; fn: centralTendency
showCache(tmpDir) # 1 unique artifact -- cacheId is d03daaa68c73e7e9
#> Cache size:
#> Total (including Rasters): 14 bytes
#> Selected objects (not including Rasters): 14 bytes
#>              cacheId              tagKey                  tagValue
#>               <char>              <char>                    <char>
#>  1: d03daaa68c73e7e9            function           centralTendency
#>  2: d03daaa68c73e7e9            userTags              thisIsUnique
#>  3: d03daaa68c73e7e9            userTags              reallyUnique
#>  4: d03daaa68c73e7e9            accessed 2026-05-18 13:47:26.27770
#>  5: d03daaa68c73e7e9             inCloud                     FALSE
#>  6: d03daaa68c73e7e9   elapsedTimeDigest          0.005672693 secs
#>  7: d03daaa68c73e7e9           preDigest     .FUN:88d14fe75c352ad5
#>  8: d03daaa68c73e7e9           preDigest        x:e4aa8de28dc6c1bb
#>  9: d03daaa68c73e7e9               class                   numeric
#> 10: d03daaa68c73e7e9         object.size                        56
#> 11: d03daaa68c73e7e9            fromDisk                     FALSE
#> 12: d03daaa68c73e7e9          resultHash                          
#> 13: d03daaa68c73e7e9 elapsedTimeFirstRun           0.06726694 secs
#>                    createdDate
#>                         <char>
#>  1: 2026-05-18 13:47:26.346027
#>  2: 2026-05-18 13:47:26.346027
#>  3: 2026-05-18 13:47:26.346027
#>  4: 2026-05-18 13:47:26.346027
#>  5: 2026-05-18 13:47:26.346027
#>  6: 2026-05-18 13:47:26.346027
#>  7: 2026-05-18 13:47:26.346027
#>  8: 2026-05-18 13:47:26.346027
#>  9: 2026-05-18 13:47:26.346027
#> 10: 2026-05-18 13:47:26.346027
#> 11: 2026-05-18 13:47:26.346027
#> 12: 2026-05-18 13:47:26.346027
#> 13: 2026-05-18 13:47:26.346027

# If the entry is found by cacheId, it doesn't matter what the userTags are
ranNumsD <- Cache(centralTendency, funnyData, cachePath = tmpDir, userTags = "thisIsUnique")
#> Object to retrieve (fn: centralTendency, d03daaa68c73e7e9.rds) ...
#> Loaded! Cached result from previous centralTendency call
options(opt)
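
# A minimal sketch (not part of the original examples): userTags can also be
#   used to inspect or remove entries after the fact, via the `userTags`
#   argument of showCache and clearCache.
#   `tagDir` is a hypothetical scratch folder.
library(reproducible)
tagDir <- file.path(tempdir(), "userTagsExample")
tagged <- Cache(rnorm, 4, cachePath = tagDir, userTags = "thisIsUnique")
tagEntries <- showCache(tagDir, userTags = "thisIsUnique") # entries carrying this tag
clearCache(tagDir, userTags = "thisIsUnique", ask = FALSE) # removes those entries
remaining <- showCache(tagDir) # expected to be empty after the clear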

#########################################
# For more in-depth uses, see the package vignettes
if (interactive())
  browseVignettes(package = "reproducible")