This does downloading (via downloadFile), checksumming (Checksums),
and extracting from archives (extractFromArchive), plus cleaning up of input
arguments (e.g., paths, function names).
This is the first stage of three used in prepInputs.
preProcessParams(n = NULL)

preProcess(
  targetFile = NULL,
  url = NULL,
  archive = NULL,
  alsoExtract = NULL,
  destinationPath = getOption("reproducible.destinationPath", "."),
  fun = NULL,
  dlFun = NULL,
  quick = getOption("reproducible.quick"),
  overwrite = getOption("reproducible.overwrite", FALSE),
  purge = FALSE,
  verbose = getOption("reproducible.verbose", 1),
  .tempPath,
  ...
)
Number of non-null arguments passed to preProcess.
E.g., passing n = 1 returns combinations with only a single non-NULL parameter.
If NULL (default), all parameter combinations are returned.
Character string giving the filename (without relative or
absolute path) to the eventual file
(raster, shapefile, csv, etc.) after downloading and extracting from a zip
or tar archive. This is the file before it is passed to
postProcess. The internal checksumming does not checksum
the file after it is postProcessed (e.g., cropped/reprojected/masked).
Using Cache around prepInputs will do a sufficient job in these cases.
See table in preProcess().
Optional character string indicating the URL to download from.
If not specified, then no download will be attempted. If no entry
exists in the CHECKSUMS.txt (in destinationPath), one
will be created or appended to the file. This CHECKSUMS.txt entry will be used
in subsequent calls to prepInputs or preProcess, comparing the file on hand
with the ad hoc CHECKSUMS.txt. See table in preProcess().
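The idea of comparing the file on hand with a recorded entry can be sketched in base R. This is only an illustration under stated assumptions: it uses tools::md5sum and a small in-memory table, and the helper names are hypothetical. It does not reproduce the actual CHECKSUMS.txt layout or the algorithm used by Checksums.

```r
# Hypothetical helpers (not part of reproducible): record a file's checksum,
# then verify the file against the recorded value on a later call.
record_checksum <- function(path) {
  data.frame(
    file = basename(path),
    checksum = unname(tools::md5sum(path)),
    stringsAsFactors = FALSE
  )
}

matches_record <- function(path, record) {
  expected <- record$checksum[record$file == basename(path)]
  length(expected) == 1L && identical(unname(tools::md5sum(path)), expected)
}

tmp <- tempfile(fileext = ".csv")
writeLines("a,b\n1,2", tmp)
rec <- record_checksum(tmp)

unchanged <- matches_record(tmp, rec)  # file still matches its record

writeLines("a,b\n1,3", tmp)            # simulate a changed or corrupted file
changed <- matches_record(tmp, rec)    # file no longer matches its record
```

A mismatch such as the second check is the kind of situation in which a fresh download would be warranted.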
Optional character string giving the path of an archive
containing targetFile, or a vector giving a set of nested archives
(e.g., c("xxx.tar", "inner.zip", "inner.rar")). If there is/are (an) inner
archive(s), but they are unknown, the function will try all until it finds
the targetFile. See table in preProcess(). If it is NA,
then it will not attempt to treat the file as an archive, even if it has an
archive-like file extension (e.g., .zip). This may be useful when an R function
is expecting an archive directly.
Optional character string naming files other than
targetFile that must be extracted from the archive. If
NULL, the default, then it will extract all files. Other options:
"similar" will extract all files with the same filename without
file extension as targetFile. NA will extract nothing other
than targetFile. A character string of specific file names will cause
only those to be extracted. See table in preProcess().
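The "similar" rule can be illustrated with a short base R sketch. The helper similar_files and the file names are hypothetical, and the matching used inside the package may differ in detail (for example, for compound extensions); the sketch only shows selection by shared file name without extension.

```r
# Hypothetical helper (not part of reproducible): pick the archive members that
# share targetFile's base name once the file extension is dropped.
similar_files <- function(targetFile, archiveFiles) {
  stem <- tools::file_path_sans_ext(basename(targetFile))
  stems <- tools::file_path_sans_ext(basename(archiveFiles))
  archiveFiles[stems == stem]
}

# Example archive listing: a shapefile with its sidecar files plus an unrelated layer.
members <- c("roads.shp", "roads.dbf", "roads.prj", "roads.shx", "rivers.shp")
selected <- similar_files("roads.shp", members)
```

This is the typical reason to use "similar" with shapefiles, where the .dbf, .prj and .shx files must sit next to the .shp file.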
Character string of a directory in which to download
and save the file that comes from url and is also where the function
will look for archive or targetFile. NOTE (still experimental):
To prevent repeated downloads in different locations, the user can also set
options("reproducible.inputPaths") to one or more local file paths to
search for the file before attempting to download. Default for that option is
NULL, meaning do not search locally.
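As a minimal sketch, both options can be set and read back with base R alone. The option names come from the text above; the directory names are placeholders chosen for illustration.

```r
# Set the options, keeping the previous values so they can be restored.
old <- options(
  reproducible.destinationPath = file.path(tempdir(), "inputs"),
  reproducible.inputPaths = file.path(tempdir(), "shared-inputs")  # placeholder folder
)

# Same lookup as the default for destinationPath in the usage above.
dest <- getOption("reproducible.destinationPath", ".")
searchPaths <- getOption("reproducible.inputPaths")

options(old)  # restore the previous settings
```

Restoring the old values afterwards keeps the change local to a script or a test.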
Optional. If specified, this will attempt to load whatever
file was downloaded during preProcess via dlFun. This can be either a
function (e.g., sf::st_read), a character string (e.g., "base::load"),
NA (for no loading, useful if dlFun already loaded the file) or,
if extra arguments are required in the function call, a call naming
targetFile (e.g., sf::st_read(targetFile, quiet = TRUE))
as the file path to the file to load. See details and examples below.
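The accepted forms of fun can be mimicked with a small base R dispatcher. The helper load_with is hypothetical and is not how preProcess resolves fun internally; it only shows how a function, a character string, NA, and a call naming targetFile can each lead to a loaded object.

```r
# Hypothetical helper (not part of reproducible): load a file with `fun` given
# as a function, a "pkg::name" string, NA, or a quoted call naming targetFile.
load_with <- function(fun, targetFile) {
  if (is.call(fun)) {
    # Evaluate the call with targetFile bound to the file path.
    return(eval(fun, list(targetFile = targetFile), parent.frame()))
  }
  if (!is.function(fun) && length(fun) == 1L && is.na(fun)) {
    return(NULL)  # NA: no loading requested
  }
  if (is.character(fun)) {
    fun <- eval(parse(text = fun))  # turn "pkg::name" into the function itself
  }
  fun(targetFile)
}

tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, name = c("a", "b")), tmp, row.names = FALSE)

viaFunction <- load_with(utils::read.csv, tmp)
viaString   <- load_with("utils::read.csv", tmp)
viaCall     <- load_with(quote(utils::read.csv(targetFile, nrows = 1)), tmp)
viaNA       <- load_with(NA, tmp)
```

The call form is the only one of these that can carry extra arguments, here nrows = 1.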
Optional "download function" name, such as "raster::getData", which does
custom downloading, in addition to loading into R. Still experimental.
Logical. This is passed internally to Checksums()
(the quickCheck argument), and to Cache() (the quick argument). This results
in faster, though less robust checking of inputs. See the respective functions.
Logical. Should downloading and all the other actions occur even if they pass the checksums or the files are all there?
Logical or Integer. 0/FALSE (default) keeps existing
CHECKSUMS.txt file and prepInputs will write or append to it. 1/TRUE
will delete the entire CHECKSUMS.txt file. Other options, see details.
Numeric, -1 silent (where possible), 0 being very quiet,
1 showing more messaging, 2 being more messaging, etc.
Default is 1. Above 3 will output much more information about the internals of
Caching, which may help diagnose Caching challenges. Can set globally with an
option, e.g., options('reproducible.verbose' = 0) to reduce messaging to a minimum.
Optional temporary path for internal file intermediate steps. Will be cleared on.exit from this function.
Additional arguments passed to postProcess() and Cache().
Since ... is passed to postProcess(), these arguments
will also be passed into the inner functions, e.g., cropInputs().
Possibly useful other arguments include dlFun, which is passed to preProcess.
See details and examples.
A list whose elements include checkSums (the result of a Checksums call
after downloading), dots (cleaned up ..., including deprecated argument checks),
fun (the function to be used to load the preProcessed object from disk),
and targetFilePath (the fully qualified path to the targetFile).
targetFile, url, archive, alsoExtract
Use preProcessParams() for a table describing various parameter combinations
and their outcomes.
* If the url is a file on Google Drive, checksumming will work
even without a targetFile specified because there is an initial attempt
to get the remote file information (e.g., file name). With that, the connection
between the url and the filename used in the CHECKSUMS.txt file can be made.