reproducible.messageColourQuestion for questions that require user input. Defaults are
green respectively. These are user-visible colour changes.
Cache cases where a file.link is used instead of saving.
options(reproducible.verbose = 0) will turn off almost all messaging.
postProcess and family now have filename2 = NULL as the default, so output is not saved to disk. This is a change.
verbose is now an argument throughout, whose default is getOption("reproducible.verbose"), which is set by default to 1. Thus, individual function calls can be made more or less verbose, or the whole session can be adjusted via the option.
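The pattern described above, an argument whose default is read from a session option, can be sketched in base R. `describe_step()` is a hypothetical function used only for illustration; it is not part of reproducible, and the fallback value of 1 simply mirrors the default stated in the entry.

```r
# Hypothetical function (not part of reproducible): its `verbose` default is
# read from the session option, falling back to 1 if the option is unset.
describe_step <- function(x, verbose = getOption("reproducible.verbose", 1)) {
  if (verbose > 0) message("Processing ", length(x), " element(s)")
  invisible(length(x))
}

old <- options(reproducible.verbose = 0)  # quieten the whole session
describe_step(1:3)                        # emits no message
describe_step(1:3, verbose = 1)           # emits a message for this call only
options(old)                              # restore the previous setting
```

Because the default is evaluated at call time, changing the option affects every later call that does not pass `verbose` explicitly.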
postProcess now uses a simpler single call to gdalwarp, if available, for RasterLayer class to accomplish
writeOutputs all at once. This should be faster, simpler and, perhaps, more stable. It will only be invoked if the RasterLayer is too large to fit into RAM. To force it to be used the user must set useGDAL = "force" in postProcess or globally with options("reproducible.useGDAL" = "force")
postProcess, when using the new gdalwarp, has better persistence of colour table and NA values, as these are kept with better reliability
Cache now works as expected (e.g., with parallel processing, it will avoid collisions) with SQLite thanks to the suggestion here: https://stackoverflow.com/a/44445010
Raster class objects to account for more of the metadata (including the colortable). This will change the digest value of all
Raster layers, causing re-run of
checkPath that were moved to
Require package. For backwards compatibility, these are imported and reexported
file.move used to rename/copy files across disks (a situation where
DBI type functions now have default
Cache(prepInputs, ...) on a file-backed Raster* class object now gives the non-Cache repository folder as the filename(returnRaster). Previously, the return object would contain the cache repository as the folder for the file-backed
versions; moved to Suggests:
Require. Now there are 12 non-base packages listed in Imports. This is down from 31 prior to Ver 1.0.0.
saveToCache. This would have resulted in C Stack overflow errors due to missing original file in the
unzip when extracting large (>= 4GB) files (#145, @tati-micheletti)
projectInputs when converting to longlat projections,
Filenames now consistently returns a character vector (#149)
raster) are updated.
options('reproducible.cacheSaveFormat') on the fly; cache will look for the file by cacheId and write it using options('reproducible.cacheSaveFormat'). If it is in another format, Cache will load it and resave it with the new format. Experimental still.
ANY as it would be dispatched for unknown classes that inherit from environment, of which there are many and this should be intercepted
Require can now handle minimum version numbers, e.g., Require("bit (>=1.1-15.2)"); this can be worked into downstream tools. Still experimental.
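To illustrate what a minimum-version specification of this form carries, here is a minimal base R sketch that splits the string and compares versions. `parse_pkg_spec()` and `meets_minimum()` are hypothetical helpers written for this note; they are an assumption about one way to do it, not the code used by Require.

```r
# Hypothetical helpers (not the Require implementation).
parse_pkg_spec <- function(spec) {
  # Split "pkg (>=1.2.3)" into a package name and an optional minimum version
  pattern <- "^\\s*([A-Za-z][A-Za-z0-9.]*)\\s*(\\(\\s*>=\\s*([0-9.-]+)\\s*\\))?\\s*$"
  m <- regmatches(spec, regexec(pattern, spec))[[1]]
  if (length(m) == 0) stop("Unrecognised package specification: ", spec)
  has_min <- !is.na(m[4]) && nzchar(m[4])
  list(package = m[2], min_version = if (has_min) m[4] else NA_character_)
}

meets_minimum <- function(installed, minimum) {
  # compareVersion() returns -1, 0 or 1; an NA minimum means "any version"
  is.na(minimum) || utils::compareVersion(installed, minimum) >= 0
}

spec <- parse_pkg_spec("bit (>=1.1-15.2)")
meets_minimum("1.1-15.2", spec$min_version)  # the same version satisfies the bound
meets_minimum("1.1-14", spec$min_version)    # an older version does not
```

In practice the installed version would come from something like `as.character(utils::packageVersion("bit"))`, which is omitted here so the sketch runs without the package installed.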
file.symlink if an existing Cache entry with identical output exists and it is large (currently 1e6 bytes); this will save disk space.
preProcess). Includes 2 new functions,
tempfile2 for use with
reproducible.tempPath, which is used for the new control of temporary files. Defaults to
file.path(tempdir(), "reproducible"). This feature was requested to help manage large amounts of temporary objects that were not being easily and automatically cleaned
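The idea of routing temporary files through one dedicated folder can be sketched as follows. `temp_file_in()` is a hypothetical helper and its arguments are assumptions; it only mimics the described default location of file.path(tempdir(), "reproducible") and is not the package's tempfile2.

```r
# Hypothetical helper (not reproducible's tempfile2): create temporary file
# names inside a dedicated root so they can all be removed in one step.
temp_file_in <- function(sub = "",
                         temp_root = file.path(tempdir(), "reproducible"),
                         ...) {
  target_dir <- if (nzchar(sub)) file.path(temp_root, sub) else temp_root
  dir.create(target_dir, recursive = TRUE, showWarnings = FALSE)
  tempfile(tmpdir = target_dir, ...)
}

f <- temp_file_in("downloads", fileext = ".zip")
basename(dirname(f))  # the sub-folder the file name was generated in

# Cleaning up is then a single call on the dedicated root
unlink(file.path(tempdir(), "reproducible"), recursive = TRUE)
```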
conn; user may need to manually call movedCache if cache is not responding correctly. File-backed Rasters are automatically updated with new paths.
Raster* will have their filenames updated on the fly during a Cache recovery. User doesn’t need to do anything.
postProcess now will perform simple tests and skip projectInputs with a message if it can, rather than using Cache to “skip”. This should speed up postProcess in many cases.
Cache has changed. Now, cacheId is shown in all cases, making it easier to identify specific items in the cache.
Copy only creates a temporary directory for file-backed rasters; previously any Copy command was creating a temporary directory, regardless of whether it was needed
cropInputs.spatialObjects had a bug when object was a large non-Raster class.
cropInputs may have failed due to “self intersection” error when x was a SpatialPolygons* object; now catches error, runs
crop. Great reprex by @tati-micheletti. Fixed in commit
Filenames bugfix related to
prepInputs does a better job of keeping all temporary files in a temporary folder; and cleans up after itself better.
prepInputs now will not show message that it is loading object into R if fun = NULL (#135).
options("reproducible.useDBI" = FALSE)
DBI package directly, without archivist. This has much improved speed.
options("reproducible.cacheSaveFormat"). This can be either
qs. All cached objects will be saved with this format. Previously it was
qs::qsave. In many cases, this has much improved speed and file sizes compared to
rds; however, testing across a wide range of conditions will occur before it becomes the default.
Cache is now much faster, the default is to turn memoising off, via
options("reproducible.useMemoise" = FALSE). In cases of large objects, memoising should still be faster, so user can still activate it, setting the option to
useGDAL can now take
"force" as the default behaviour is to not use GDAL if the problem can fit into RAM and
raster tools will be faster than
Cache and family has slightly modified functionality (see ?Cache new section
useCloud) and now has more tests including edge cases, such as
useCloud = TRUE, useCache = 'overwrite'. The cloud version now will also follow the
archivist; moved to Suggests.
tidyselect. Some of these went to Suggests.
postProcess calls that use GDAL made more robust (including #93).
dplyr as a direct dependency. It is still an indirect dependency through
reproducible.showSimilarDepth allows for a deeper assessment of nested lists for differences between the nearest cached object and the present object. This greater depth may allow a more fine-tuned understanding of why an object is not correctly caching.
options("reproducible.futurePlan") to something other than FALSE, then it will show download progress if the file is “large”.
googledrive v 1.0.0 (#119)
pkgDep2, a new convenience function to get the dependencies of the “first order” dependencies.
useCache, used in many functions (incl postProcess) can now be numeric, a qualitative indicator of “how deep” nested
Cache calls should set useCache = TRUE – implemented as 1 or 2 in
pkgDep was becoming unreliable for unknown reasons. It has been reimplemented, much faster, without memoising. The speed gains should be immediately noticeable (6 seconds to 0.1 seconds for
retry to use exponential backoff when attempting to access online resources (#121)
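Exponential backoff, as mentioned in the retry entry, means doubling the wait between successive attempts. A minimal base R sketch follows; `retry_with_backoff()`, its argument names and its defaults are assumptions made for illustration and do not reproduce the package's retry function.

```r
# Hypothetical helper (not reproducible's retry): call `fun` until it succeeds
# or `max_attempts` is reached, doubling the wait after each failure.
retry_with_backoff <- function(fun, max_attempts = 5, base_delay = 0.1) {
  for (attempt in seq_len(max_attempts)) {
    result <- try(fun(), silent = TRUE)
    if (!inherits(result, "try-error")) return(result)
    if (attempt < max_attempts) {
      # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop("All ", max_attempts, " attempts failed; last error: ",
       conditionMessage(attr(result, "condition")))
}

# Simulate a flaky resource that only succeeds on the third call
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "downloaded"
}
retry_with_backoff(flaky, base_delay = 0.01)
```

Growing the delay gives an interrupted connection or a rate-limited service progressively more time to recover, while the first retry still happens quickly.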
cloudFolderID. This is a new approach to cloud caching. It has been tested with file backed
RasterBrick and all normal R objects. It will not work for any other class of disk-backed files, e.g.,
bigmatrix, nor is it likely to work for R6 class objects.
downloadData from Google Drive now protects against HTTP2 error by capturing error and retrying. This is a curl issue for interrupted connections.
rcnst errors on R-devel, tested using
devtools::check(env_vars = list("R_COMPILE_PKGS"=1, "R_JIT_STRATEGY"=4, "R_CHECK_CONSTANTS"=5))
retry, a new function, wraps try with an explicit attempt to retry the same code upon error. Useful for flaky functions, such as googledrive::drive_download which sometimes fails due to
Rcpp functionality as the functions were no longer faster than their R base alternatives.
prepInputs was not correctly passing
cropInputs was reprojecting extent of y as a time saving approach, but this was incorrect if
SpatialPolygon that is not close to filling the extent. It now reprojects studyArea directly which will be slower, but correct. (#93)
CHECKSUMS.txt should now be ordered consistently across operating systems (note: base::order will not succeed in doing this –> now using
cloudSyncCache has a new argument: cacheIds. Now user can control entries by cacheId, so can delete/upload individual objects by
%>% pipe that was long ago deprecated. User should use %C% if they want a pipe that is Cache-aware. See examples.
options descriptions now in
options("reproducible.cachePath") can take a vector of paths. Similar to how .libPaths() works for libraries, Cache will search first in the first entry in the cacheRepo, then the second etc. until it finds an entry. It will only write to the first entry.
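The search-then-write order described for a vector of cache paths can be sketched with plain files. `find_entry()` and `write_entry()` are hypothetical helpers, and the `.rds` file names stand in for real cache entries; none of this reflects the package's internal storage layout.

```r
# Hypothetical helpers illustrating the lookup order (not reproducible's internals).
find_entry <- function(cache_paths, file_name) {
  for (path in cache_paths) {
    candidate <- file.path(path, file_name)
    if (file.exists(candidate)) return(candidate)  # first hit wins
  }
  NULL
}

write_entry <- function(cache_paths, file_name, value) {
  target <- file.path(cache_paths[1], file_name)   # writes only go to the first path
  saveRDS(value, target)
  target
}

personal <- file.path(tempdir(), "cache_personal")
shared <- file.path(tempdir(), "cache_shared")
for (d in c(personal, shared)) dir.create(d, showWarnings = FALSE)
saveRDS("from shared", file.path(shared, "abc123.rds"))

cache_paths <- c(personal, shared)
readRDS(find_entry(cache_paths, "abc123.rds"))  # found in the second path
write_entry(cache_paths, "def456.rds", 42)      # stored in the first path only
```

This arrangement lets a read-only shared cache sit behind a writable personal one, much like a system library behind a user library.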
options("reproducible.useCache" = "devMode"). The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In devMode, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the cacheRepo, but it does find an entry that matches based on userTags. In this case, it will delete the old entry in the cacheRepo (identified based on matching userTags), then continue with normal Cache. For this to work correctly, userTags must be unique for each function call. This should be used with caution as it is still experimental.
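The devMode decision logic can be mimicked with a toy in-memory cache. Everything here is hypothetical: `cache_dev_mode()`, the environment-based store and the deparse-based key are stand-ins chosen so the sketch runs in base R, and they do not reflect how Cache computes digests or stores entries.

```r
# Toy in-memory cache; all names are hypothetical and only mimic the described logic.
cache <- new.env()

cache_dev_mode <- function(fun, args, user_tags) {
  # Stand-in for a real digest of the function and its arguments
  key <- paste(c(deparse(body(fun)), deparse(args)), collapse = " ")
  if (!is.null(cache[[key]])) return(cache[[key]]$value)  # normal cache hit

  # No exact hit: delete stale entries that carry the same userTags
  for (k in ls(cache, all.names = TRUE)) {
    if (identical(cache[[k]]$tags, user_tags)) rm(list = k, envir = cache)
  }
  value <- do.call(fun, args)
  cache[[key]] <- list(value = value, tags = user_tags)
  value
}

cache_dev_mode(function(x) x + 1, list(x = 1), user_tags = "step1")  # computes and stores
cache_dev_mode(function(x) x + 2, list(x = 1), user_tags = "step1")  # function changed: old entry replaced
length(ls(cache, all.names = TRUE))  # number of entries left for the "step1" tag
```

The sketch also shows why the tags must be unique per call: two different calls sharing a tag would keep evicting each other.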
options("reproducible.useNewDigestAlgorithm" = FALSE). There is a message of this change on package load.
cloudCache which allows sharing of Cache among collaborators. Currently only works with
assessDataType into single function (#71, @ianmseddy)
cc: new function – a shortcut for some commonly used options for
.rar archives, on systems with correct binaries to deal with them (#86, @tati-micheletti)
fastdigest::fastdigest as it does not return the identical hash across operating systems
prepInputs on GIS objects that don’t use raster::raster to load object were skipping
prepInputs would cause virtually all entries in CHECKSUMS.txt to be deleted. 2 cases where this happened were identified and corrected.
data.table class objects would give an error sometimes due to use of attr(DT). Internally, attributes are now added with data.table::setattr to deal with this.
postProcess now correctly matches extent (#73, @tati-micheletti)
New value possible for options(reproducible.useCache = 'overwrite'), which allows use of Cache in cases where the function call already has an entry in the cacheRepo: it will purge that entry and add the output of the current call instead.
FALSE), which will be used in
prepInputs as possible directory sources (searched recursively or not) for files being downloaded/extracted/prepared. This allows local copies of files in one or more other locations to be used instead of downloading them. If the local location does not have the required files, it will proceed to download, so there is little cost in setting this option. If files do exist on the local system, the function will attempt to use a hardlink before making a copy.
CHECKSUMS now sorted alphabetically.
Checksums can now have a
CHECKSUMS.txt file located in a different place than the
assessDataTypeGDAL, used in
postProcess, to identify smallest
datatype for large Raster* objects passed to GDAL system call
gdalwarp system call if raster::canProcessInMemory(x,4) = FALSE for faster and memory-safe processing
Raster objects, including factor rasters
extractFromArchive for large (>2GB) zip files. In the
unzip fails for zip files >2GB. This uses a system call if the zip file is too large and fails using
Cache() when deeply nested, due to grep(sys.calls(), ...) that would take long and hang.
preProcess(url = NULL) (#65, @tati-micheletti)
clearCache (#67), especially for large Raster objects that are stored as binary
raster package changes in development version of
.robustDigest now does not include
Cache saving to SQLite database, via options("reproducible.futurePlan"), if the future package is installed. This is
do.call function is Cached, previously, it would be labelled in the database as do.call. Now it attempts to extract the actual function being called by the do.call. Messaging is similarly changed.
reproducible.ask, logical, indicating whether clearCache should ask for deletions when in an interactive session
dlFun, to pass a custom function for downloading (e.g., “raster::getData”)
prepInputs will automatically use readRDS if the file is a
prepInputs will return a
fun = "base::load", with a message; can still pass an envir to obtain standard behaviour of
clearCache – new argument
assessDataType, used in postProcess, to identify smallest datatype for Raster* objects, if user does not pass an explicit
git2r update (@stewid, #36).
.prepareRasterBackedFile – now will postpend an incremented numeric to a cached copy of a file-backed Raster object, if it already exists. This mirrors the behaviour of the
.rda file. Previously, if two Cache events returned the same file name backing a Raster object, even if the content was different, it would allow the same file name. If either cached object was deleted, therefore, it would cause the other one to break as its file-backing would be missing.
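Postpending an incremented number to avoid file-name collisions can be sketched in base R as below. `next_free_filename()` and the `_1`, `_2` suffix format are assumptions for illustration; the package's own naming scheme may differ.

```r
# Hypothetical helper: append _1, _2, ... before the extension until the
# resulting file name is not yet in use.
next_free_filename <- function(path) {
  if (!file.exists(path)) return(path)
  ext <- tools::file_ext(path)
  stem <- tools::file_path_sans_ext(path)
  i <- 1
  repeat {
    candidate <- paste0(stem, "_", i, if (nzchar(ext)) paste0(".", ext) else "")
    if (!file.exists(candidate)) return(candidate)
    i <- i + 1
  }
}

raster_dir <- tempfile("rasters")  # fresh, empty directory for the demonstration
dir.create(raster_dir)
first <- next_free_filename(file.path(raster_dir, "elev.tif"))
file.create(first)                 # the first cached copy takes the plain name
second <- next_free_filename(file.path(raster_dir, "elev.tif"))
basename(c(first, second))         # the second copy gets a numbered name
```

With distinct backing files, deleting one cached object no longer removes the file that another cached object depends on.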
spades.XXX and should have been
copyFile did not perform correctly under all cases; now better handling of these cases, often sending to file.copy (slower, but more reliable)
extractFromArchive needed a new Checksum function call under some circumstances
extractFromArchive – when dealing with nested zips, not all args were passed in recursively (#37, @CeresBarros)
prepInputs – arguments that were same as Cache were not being correctly passed internally to Cache, and if wrapped in Cache, it was not passed into prepInputs. Fixed.
.prepareFileBackedRaster was failing in some cases (specifically if it was inside a do.call) (#40, @CeresBarros).
Cache was failing under some cases of Cache(do.call, ...). Fixed.
Cache – when arguments to Cache were the same as the arguments in FUN, Cache would “take” them. Now, they are correctly passed to the
preProcess – writing to checksums may have produced a warning if CHECKSUMS.txt was not present. Now it does not.
convertRasterPaths to assist with renaming moved files.
prepInputs – new features
alsoExtract now has more options (
"similar") and defaults to extracting all files in an archive (
postProcess altogether if no
rasterToMatch. Previously, this would invoke Cache even if there was nothing to
prepInputs to aid in data downloading and preparation problems, solved in a reproducible, Cache-aware way.
postProcess which is a wrapper for sequences of several other new functions (
downloadFile can handle Google Drive and ftp/http(s) files
compareNA does comparisons with NA as a possible value e.g., compareNA(c(1,NA), c(2, NA)) returns
Cache – new features:
verbose which can help with debugging
useCache which allows turning caching on and off at a high level (e.g., options("useCache"))
cacheId which allows user to hard code a result from a Cache
Cache function calls, unless explicitly set on the inner functions
userTags added automatically to cache entries so much more powerful searching via
checksums now returns a data.table with the same columns whether
write = TRUE or
write = FALSE.
showCache now gives messages and requires user intervention if a request to clearCache would delete large quantities of data
memoise::memoise now used on 3rd run through an identical
Cache call, dramatically speeding up in most cases
asPath has a new argument indicating how deep the path should be considered when included in caching (only relevant when quick = TRUE)
New vignette on using Cache
parallel-safe, meaning there are
tryCatch around every attempt at writing to SQLite database so it can be used safely on multi-threaded machines
bug fixes, unit tests, more
imports for packages e.g.,
updates for R 3.6.0 compact storage of sequence vectors
several performance enhancements
mergeCache: a new function to merge two different Cache repositories
memoise::memoise is now used on
loadFromLocalRepo, meaning that the 3rd time
Cache() is run on the same arguments (and the 2nd time in a session), the returned Cache will be from a RAM object via memoise. To stop this behaviour and use only disk-based Caching, set
options(reproducible.useMemoise = FALSE).
Cache assign –
%<% can be used instead of normal assign, equivalent to
lhs <- Cache(rhs).
New option: reproducible.verbose, set to FALSE by default, but if set to TRUE it may help in understanding caching behaviour, especially for complex, highly nested code.
all options now described in
All Cache arguments other than FUN and … will now propagate to internal, nested Cache calls, if they are not specified explicitly in each of the inner Cache calls.
Cached pipe operator
%C% – use to begin a pipe sequence, e.g.,
Cache() %C% ...
sideEffect can now be a path
digestPathContent default changed from FALSE (was for speed) to TRUE (for content accuracy)
searchFull, which shows the full search path, known alternatively as “scope”, or “binding environments”. It is where R will search for a function when requested by a user.
memoise::memoise for several functions (
available.packages) for speed – this uses more memory in exchange for speed.
require on those 20 packages, but require does not check for dependencies and deal with them if missing: it just errors. This speed should be fast enough for many purposes.
dplyr from Imports
RCurl to Imports
change name of
digestRaster affecting in-memory rasters