cloudSyncCache has more options implemented and many unit tests
prepInputs wasn’t correctly passing
cropInputs was reprojecting the extent of y as a time-saving approach, but this was incorrect if studyArea is a SpatialPolygon that is not close to filling the extent. It now reprojects studyArea directly, which will be slower but correct (fixes issue #93).
CHECKSUMS.txt should now be ordered consistently across operating systems (note: base::order will not succeed in doing this –> now using
cloudSyncCache has a new argument: cacheIds. Now the user can control entries by cacheId, so can delete/upload individual objects by
%>% pipe that was long ago deprecated. Users should use %C% if they want a pipe that is Cache-aware. See examples.
options descriptions now in
options("reproducible.cachePath") can take a vector of paths. Similar to how .libPaths() works for libraries, Cache will search first in the first entry in the cacheRepo, then the second, etc., until it finds an entry. It will only write to the first entry.
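The search order described above can be sketched in base R. This is a toy model under stated assumptions, not the package's implementation: `entry_file`, `find_entry` and `write_entry` are hypothetical helpers, and entries are stored as plain .rds files named by a key rather than in the package's own cache format.

```r
# Toy model of a multi-path cache: read from the first path that holds the
# entry, write only to the first path. All helper names are hypothetical.
entry_file <- function(path, key) file.path(path, paste0(key, ".rds"))

find_entry <- function(paths, key) {
  for (p in paths) {
    f <- entry_file(p, key)
    if (file.exists(f)) return(readRDS(f))  # first hit wins
  }
  NULL  # not found in any path
}

write_entry <- function(paths, key, value) {
  dir.create(paths[1], recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, entry_file(paths[1], key))  # only the first path is written
  invisible(value)
}

# Two cache directories, analogous to a vector of paths in the option
paths <- file.path(tempdir(), c("cacheA", "cacheB"))
dir.create(paths[2], recursive = TRUE, showWarnings = FALSE)
saveRDS("from B", entry_file(paths[2], "k1"))  # entry exists only in the second path

found <- find_entry(paths, "k1")   # located in the second path
write_entry(paths, "k2", "new")    # lands in the first path only
```

The read path falls through the vector in order, while the write path never touches anything but the first element, which is the same asymmetry .libPaths() has for installing versus loading.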
options("reproducible.useCache" = "devMode"). The point of this mode is to facilitate using the Cache when functions and datasets are continually in flux, and old Cache entries are likely stale very often. In devMode, the cache mechanism will work as normal if the Cache call is the first time for a function OR if it successfully finds a copy in the cache based on the normal Cache mechanism. It differs from the normal Cache if the Cache call does not find a copy in the cacheRepo, but it does find an entry that matches based on userTags. In this case, it will delete the old entry in the cacheRepo (identified based on matching userTags), then continue with normal Cache. For this to work correctly, userTags must be unique for each function call. This should be used with caution as it is still experimental.
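The devMode decision described above can be illustrated with a toy in-memory cache. This is only a sketch of the logic, not the package's code: `dev_mode_lookup` is a hypothetical function, the cache is a plain list, the hash is passed in directly, and treating two tag sets as matching when `setequal()` is TRUE is an assumption.

```r
# Toy in-memory model of the devMode decision; not the package's code.
# `cache` is a list of entries, each with a hash, userTags and a value.
dev_mode_lookup <- function(cache, hash, userTags, compute) {
  hashes <- vapply(cache, function(e) e$hash, character(1))
  hit <- match(hash, hashes)
  if (!is.na(hit)) {
    # Normal cache hit: behave exactly like the regular mechanism
    return(list(cache = cache, value = cache[[hit]]$value))
  }
  # No hash match: drop any stale entry carrying the same userTags
  # (assumption: tags match when the two sets are equal)
  same_tags <- vapply(cache, function(e) setequal(e$userTags, userTags), logical(1))
  cache <- cache[!same_tags]
  # Then continue as a normal cache miss: compute and store
  value <- compute()
  cache[[length(cache) + 1]] <- list(hash = hash, userTags = userTags, value = value)
  list(cache = cache, value = value)
}

cache <- list()
r1 <- dev_mode_lookup(cache, "h1", c("myFun", "step1"), function() 1)
# Same userTags, but the function or data changed, so the hash differs
r2 <- dev_mode_lookup(r1$cache, "h2", c("myFun", "step1"), function() 2)
# Identical call again: a normal hit, nothing is recomputed
r3 <- dev_mode_lookup(r2$cache, "h2", c("myFun", "step1"), function() stop("not run"))
```

After the second call the cache holds a single entry, because the stale "h1" entry was removed before the new result was stored. This also shows why userTags must be unique per call: two different calls sharing tags would keep evicting each other.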
options("reproducible.useNewDigestAlgorithm" = FALSE). There is a message of this change on package load.
cloudCache which allows sharing of Cache among collaborators. Currently only works with
assessDataType into single function (#71, @ianmseddy)
cc: new function – a shortcut for some commonly used options for
.rar archives, on systems with correct binaries to deal with them (#86, @tati-micheletti)
fastdigest::fastdigest as it does not return the identical hash across operating systems
prepInputs on GIS objects that don’t use raster::raster to load object were skipping
prepInputs would cause virtually all entries in CHECKSUMS.txt to be deleted. 2 cases where this happened were identified and corrected.
data.table class objects would give an error sometimes due to use of attr(DT). Internally, attributes are now added with data.table::setattr to deal with this.
postProcess now correctly matches extent (#73, @tati-micheletti)
options(reproducible.useCache = 'overwrite'), which allows use of Cache in cases where the function call already has an entry in the cacheRepo: it will purge that entry and add the output of the current call instead.
FALSE), which will be used in prepInputs as possible directory sources (searched recursively or not) for files being downloaded/extracted/prepared. This allows using local copies of files in (an)other location(s) instead of downloading them. If the local location does not have the required files, it will proceed to download, so there is little cost in setting this option. If files do exist on the local system, the function will attempt to use a hardlink before making a copy.
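The "hardlink before making a copy" strategy can be sketched with base R's `file.link()` and `file.copy()`. This is an illustration of the general technique only; `link_or_copy` is a hypothetical helper and is not claimed to be how the package does it internally.

```r
# Try a hard link first; fall back to a regular copy if linking fails
# (for example across file systems). `link_or_copy` is a hypothetical helper.
link_or_copy <- function(from, to) {
  linked <- suppressWarnings(file.link(from, to))
  if (isTRUE(linked)) return("link")
  copied <- file.copy(from, to, overwrite = FALSE)
  if (isTRUE(copied)) "copy" else stop("could not link or copy ", from)
}

# A file that already exists locally, and the place it is needed
src <- tempfile(fileext = ".txt")
writeLines("local copy", src)
dest <- tempfile(fileext = ".txt")

how <- link_or_copy(src, dest)  # "link" where hard links are supported, else "copy"
```

A hard link costs no extra disk space and is near-instant, which is why it is worth attempting first; the copy is the safe fallback when the two locations are on different volumes.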
options(httr_oob_default = TRUE) if using RStudio Server.
CHECKSUMS now sorted alphabetically.
Checksums can now have a CHECKSUMS.txt file located in a different place than the
assessDataTypeGDAL, used in postProcess, to identify smallest datatype for large Raster* objects passed to GDAL system call
gdalwarp system call if raster::canProcessInMemory(x, 4) = FALSE for faster and memory-safe processing
Raster objects, including factor rasters
extractFromArchive for large (>2GB) zip files. In the
unzip fails for zip files >2GB. This uses a system call if the zip file is too large and fails using
Cache() when deeply nested, due to grep(sys.calls(), ...) that would take long and hang.
preProcess(url = NULL) (#65, @tati-micheletti)
clearCache (#67), especially for large Raster objects that are stored as binary
raster package changes in development version of
.robustDigest now does not include
Cache saving to SQLite database, via options("reproducible.futurePlan"), if the future package is installed. This is
do.call function is Cached; previously, it would be labelled in the database as do.call. Now it attempts to extract the actual function being called by the do.call. Messaging is similarly changed.
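Extracting the function that a do.call wraps can be sketched on an unevaluated call. This is a simplified illustration, not the package's code: `called_function_name` is a hypothetical helper, and it assumes the function is given as the first argument to do.call.

```r
# Given an unevaluated call, report the function actually being invoked.
# `called_function_name` is a hypothetical helper for illustration only.
called_function_name <- function(cl) {
  fn <- cl[[1]]
  if (identical(fn, quote(do.call)) && length(cl) >= 2) {
    fn <- cl[[2]]                     # assumption: `what` is the first argument
    if (is.character(fn)) return(fn)  # handles do.call("rnorm", ...)
  }
  paste(deparse(fn), collapse = "")
}

n1 <- called_function_name(quote(do.call(rnorm, list(n = 3))))    # "rnorm"
n2 <- called_function_name(quote(do.call("rnorm", list(n = 3))))  # "rnorm"
n3 <- called_function_name(quote(mean(1:10)))                     # "mean"
```

Labelling by the inner function keeps entries distinguishable; otherwise every call routed through do.call would share one name.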
reproducible.ask, logical, indicating whether clearCache should ask for deletions when in an interactive session
dlFun, to pass a custom function for downloading (e.g., “raster::getData”)
prepInputs will automatically use readRDS if the file is a
prepInputs will return a
fun = "base::load", with a message; can still pass an envir to obtain standard behaviour of
clearCache - new argument
assessDataType, used in postProcess, to identify smallest datatype for Raster* objects, if user does not pass an explicit
git2r update (@stewid, #36).
.prepareRasterBackedFile – now will append an incremented numeric to a cached copy of a file-backed Raster object, if it already exists. This mirrors the behaviour of the .rda file. Previously, if two Cache events returned the same file name backing a Raster object, even if the content was different, it would allow the same file name. If either cached object was then deleted, the other one would break because its file backing would be missing.
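The idea of appending an incremented number when a file name is already taken can be sketched in base R. The helper name `next_free_filename` and the `_1`, `_2` suffix format are assumptions made for illustration; the package's actual naming scheme may differ.

```r
# Return `path` unchanged if unused; otherwise append _1, _2, ... before the
# extension until the name is free. `next_free_filename` is hypothetical.
next_free_filename <- function(path) {
  if (!file.exists(path)) return(path)
  ext  <- tools::file_ext(path)
  stem <- tools::file_path_sans_ext(path)
  i <- 1
  repeat {
    candidate <- if (nzchar(ext)) paste0(stem, "_", i, ".", ext) else paste0(stem, "_", i)
    if (!file.exists(candidate)) return(candidate)
    i <- i + 1
  }
}

# Two cached objects that would otherwise share one backing file name
dir <- tempfile("rasters")
dir.create(dir)
f <- file.path(dir, "layer.grd")
file.create(f)

second <- next_free_filename(f)  # .../layer_1.grd
file.create(second)
third <- next_free_filename(f)   # .../layer_2.grd
```

Because each cached object gets its own backing file, deleting one entry can no longer remove the file another entry depends on.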
spades.XXX and should have been
copyFile did not perform correctly under all cases; now better handling of these cases, often sending to file.copy (slower, but more reliable)
extractFromArchive needed a new Checksum function call under some circumstances
extractFromArchive– when dealing with nested zips, not all args were passed in recursively (#37, @CeresBarros)
prepInputs – arguments that were same as Cache were not being correctly passed internally to Cache, and if wrapped in Cache, it was not passed into prepInputs. Fixed.
.prepareFileBackedRaster was failing in some cases (specifically if it was inside a do.call) (#40, @CeresBarros).
Cache was failing under some cases of Cache(do.call, ...). Fixed.
Cache – when arguments to Cache were the same as the arguments in FUN, Cache would “take” them. Now, they are correctly passed to the
preProcess – writing to checksums may have produced a warning if CHECKSUMS.txt was not present. Now it does not.
convertRasterPaths to assist with renaming moved files.
prepInputs – new features
alsoExtract now has more options (
"similar") and defaults to extracting all files in an archive (
postProcess altogether if no
rasterToMatch. Previously, this would invoke Cache even if there was nothing to
prepInputs to aid in data downloading and preparation problems, solved in a reproducible, Cache-aware way.
postProcess which is a wrapper for sequences of several other new functions (
downloadFile can handle Google Drive and ftp/http(s) files
compareNA does comparisons with NA as a possible value e.g., compareNA(c(1,NA), c(2, NA)) returns
Cache – new features:
verbose which can help with debugging
useCache which allows turning caching on and off at a high level (e.g., options("useCache"))
cacheId which allows user to hard code a result from a Cache
Cache function calls, unless explicitly set on the inner functions
userTags added automatically to cache entries so much more powerful searching via
checksums now returns a data.table with the same columns whether write = TRUE or write = FALSE.
showCache now give messages and require user intervention if a request to clearCache would delete large quantities of data
memoise::memoise now used on 3rd run through an identical Cache call, dramatically speeding up in most cases
asPath has a new argument indicating how deep the path should be considered when included in caching (only relevant when quick = TRUE)
parallel-safe, meaning there are tryCatch around every attempt at writing to SQLite database so it can be used safely on multi-threaded machines
imports for packages e.g.,
%C%) and assign
several performance enhancements
mergeCache: a new function to merge two different Cache repositories
memoise::memoise is now used on loadFromLocalRepo, meaning that the 3rd time Cache() is run on the same arguments (and the 2nd time in a session), the returned Cache will be from a RAM object via memoise. To stop this behaviour and use only disk-based Caching, set options(reproducible.useMemoise = FALSE).
%<% can be used instead of normal assign, equivalent to lhs <- Cache(rhs).
%C% – use to begin a pipe sequence, e.g., Cache() %C% ...
sideEffect can now be a path
digestPathContent default changed from FALSE (was for speed) to TRUE (for content accuracy)
searchFull, which shows the full search path, known alternatively as “scope”, or “binding environments”. It is where R will search for a function when requested by a user.
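The search path that this entry refers to can be walked by hand in base R, which may help in reading the output of such a function. The sketch below is not searchFull itself; `full_search_path` is a hypothetical helper that starts at the global environment and follows parent links.

```r
# List every environment on the search path, starting at the global
# environment and following parent links until the empty environment.
full_search_path <- function() {
  envs <- character(0)
  e <- globalenv()
  while (!identical(e, emptyenv())) {
    envs <- c(envs, environmentName(e))
    e <- parent.env(e)
  }
  envs
}

sp <- full_search_path()
sp[1]           # "R_GlobalEnv"
sp[length(sp)]  # "base"
```

The result has one element per entry of base R's `search()`: attached packages appear in between, in the order R consults them when resolving a function name typed at the prompt.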
memoise::memoise for several functions (
available.packages) for speed – this uses more memory in exchange for speed.
require on those 20 packages, but require does not check for dependencies and deal with them if missing: it just errors. This speed should be fast enough for many purposes.
change name of
digestRaster affecting in-memory rasters