This function handles downloading (via downloadFile), checksumming (via Checksums), and extracting from archives (via extractFromArchive), and it also cleans up input arguments (e.g., paths, function names). It is the first of the three stages used in prepInputs.

preProcess(targetFile = NULL, url = NULL, archive = NULL,
  alsoExtract = NULL,
  destinationPath = getOption("reproducible.destinationPath", "."),
  fun = NULL, dlFun = NULL, quick = getOption("reproducible.quick"),
  overwrite = getOption("reproducible.overwrite", FALSE),
  purge = FALSE, useCache = getOption("reproducible.useCache", FALSE),
  ...)

Arguments

targetFile

Character string giving the path to the eventual file (raster, shapefile, csv, etc.) after it has been downloaded and extracted from a zip or tar archive. This is the file as it exists before it is passed to postProcess. Currently, the internal checksumming does not checksum the file after it has been postProcessed (e.g., cropped, reprojected, or masked); wrapping prepInputs in Cache is sufficient in these cases. See the table below (Combinations of targetFile, url, archive, alsoExtract).

url

Optional character string giving the URL to download from. If not specified, no download will be attempted. If no entry for the file exists in CHECKSUMS.txt (in destinationPath), one will be written, creating the file or appending to it. This entry is used in subsequent calls to prepInputs or preProcess to compare the file on hand against the ad hoc CHECKSUMS.txt. See the table below (Combinations of targetFile, url, archive, alsoExtract).

archive

Optional character string giving the path of an archive containing targetFile, or a vector giving a set of nested archives (e.g., c("xxx.tar", "inner.zip", "inner.rar")). If there are inner archives whose names are unknown, the function will try each one until it finds targetFile. See the table below (Combinations of targetFile, url, archive, alsoExtract).

alsoExtract

Optional character string naming files other than targetFile that must be extracted from the archive. If NULL (the default), all files are extracted. Other options: "similar" extracts all files whose name, without the file extension, matches that of targetFile; NA extracts nothing other than targetFile; a character vector of specific file names extracts only those files. See the table below (Combinations of targetFile, url, archive, alsoExtract).

destinationPath

Character string giving the directory into which the file from url is downloaded and saved; this is also where the function looks for archive or targetFile. NOTE (still experimental): to prevent repeated downloads to different locations, the user can also set options("reproducible.inputPaths") to one or more local file paths that are searched for the file before a download is attempted. The default for that option is NULL, meaning no local search.

fun

Function or character string indicating the function used to load targetFile into an R object, e.g., including the package name, as in "raster::raster".

dlFun

Optional "download function" name, such as "raster::getData", which does custom downloading, in addition to loading into R. Still experimental.

quick

Logical. This is passed internally to Checksums (as the quickCheck argument) and to Cache (as the quick argument). This results in faster, though less robust, checking of inputs. See the respective functions.

overwrite

Logical. Should downloading and all other actions occur even if the files pass the checksums or are already present?

purge

Logical or integer. 0/FALSE (default) keeps the existing CHECKSUMS.txt file, and prepInputs will write or append to it. 1/TRUE will delete the entire CHECKSUMS.txt file. For other options, see details.

useCache

Passed to Cache in various places. Defaults to getOption("reproducible.useCache").

...

Additional arguments passed to fun (i.e., user supplied), postProcess, and Cache. Since ... is passed to postProcess, these arguments will also be passed into its inner functions, e.g., cropInputs. See details and examples.
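Example for url: the write-or-append behaviour of CHECKSUMS.txt can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: readChecksums(), writeChecksumEntry() and checksumMatches() are hypothetical helpers, the CHECKSUMS.txt layout (columns file, checksum, filesize) is assumed, and MD5 via tools::md5sum stands in for whatever algorithm Checksums actually uses.

```r
# Hypothetical helpers (not reproducible's API). Assumes a whitespace-delimited
# CHECKSUMS.txt with columns file, checksum (MD5 via tools::md5sum), filesize.
readChecksums <- function(csFile) {
  utils::read.table(csFile, header = TRUE, stringsAsFactors = FALSE,
                    colClasses = c(file = "character", checksum = "character",
                                   filesize = "numeric"))
}

# Create CHECKSUMS.txt if absent, otherwise append to it
# (replacing any stale row recorded for the same file)
writeChecksumEntry <- function(file, destinationPath = ".") {
  csFile <- file.path(destinationPath, "CHECKSUMS.txt")
  entry <- data.frame(file = basename(file),
                      checksum = unname(tools::md5sum(file)),
                      filesize = file.size(file),
                      stringsAsFactors = FALSE)
  if (file.exists(csFile)) {
    old <- readChecksums(csFile)
    entry <- rbind(old[old$file != entry$file, , drop = FALSE], entry)
  }
  utils::write.table(entry, csFile, row.names = FALSE)
  invisible(entry)
}

# On a later call: does the file on hand match its recorded entry?
checksumMatches <- function(file, destinationPath = ".") {
  csFile <- file.path(destinationPath, "CHECKSUMS.txt")
  if (!file.exists(csFile)) return(FALSE)
  cs <- readChecksums(csFile)
  row <- cs[cs$file == basename(file), , drop = FALSE]
  nrow(row) == 1 && row$checksum == unname(tools::md5sum(file))
}
```

A matching entry is what lets a second call skip the download; a mismatch signals that the file on hand differs from the one recorded.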
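Example for archive: when inner archives are unknown, the "try all until targetFile is found" search can be sketched recursively. This is an illustrative sketch only: findInTar() is a hypothetical helper, and it handles only uncompressed tar archives (via utils::untar with R's internal tar), whereas the archive argument also covers formats such as zip and rar.

```r
# Hypothetical helper (not part of reproducible): look for `targetFile` in a tar
# archive and, if it is not at the top level, in any tar archives nested inside it.
# Returns the chain of archives leading to the file, or NULL if it is not found.
findInTar <- function(archive, targetFile, exdir = tempfile("nested_")) {
  contents <- utils::untar(archive, list = TRUE, tar = "internal")
  if (targetFile %in% basename(contents)) {
    return(archive)  # found at this level
  }
  # The inner archive names are not known in advance, so try each one in turn
  inner <- contents[grepl("\\.tar$", contents)]
  for (f in inner) {
    utils::untar(archive, files = f, exdir = exdir, tar = "internal")
    hit <- findInTar(file.path(exdir, f), targetFile)
    if (!is.null(hit)) return(c(archive, hit))
  }
  NULL  # not found in this archive or any archive nested inside it
}
```

The returned chain plays the same role as a user-supplied vector of nested archives such as c("xxx.tar", "inner.zip").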
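Example for alsoExtract: the four documented forms (NULL, NA, "similar", and explicit file names) can be expressed as a small selection rule over an archive's file list. filesToExtract() is a hypothetical helper written for illustration; the case of "similar" without a targetFile follows the combinations table, where "similar" cannot be understood and all files are extracted.

```r
# Hypothetical helper: which archive members would be extracted for a given
# targetFile / alsoExtract combination.
filesToExtract <- function(archiveFiles, targetFile = NULL, alsoExtract = NULL) {
  if (is.null(alsoExtract)) return(archiveFiles)            # NULL: everything
  if (length(alsoExtract) == 1 && is.na(alsoExtract)) {      # NA: targetFile only
    return(intersect(archiveFiles, targetFile))
  }
  if (identical(alsoExtract, "similar")) {
    if (is.null(targetFile)) return(archiveFiles)            # cannot resolve "similar"
    base <- tools::file_path_sans_ext(basename(targetFile))
    keep <- tools::file_path_sans_ext(basename(archiveFiles)) == base
    return(archiveFiles[keep])                               # same name, any extension
  }
  intersect(archiveFiles, c(targetFile, alsoExtract))        # only the named files
}
```

For a shapefile, "similar" picks up the companion .dbf and .prj files without naming them individually.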
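Example for destinationPath: the experimental options("reproducible.inputPaths") lookup can be pictured as a search over local directories before any download. findLocalCopy() is hypothetical, and both the search order (destinationPath first) and the non-recursive lookup are assumptions of this sketch rather than documented behaviour.

```r
# Hypothetical helper: return the first existing copy of targetFile, checking
# destinationPath first and then each directory in options("reproducible.inputPaths").
# The search order and the non-recursive lookup are assumptions of this sketch.
findLocalCopy <- function(targetFile, destinationPath = ".") {
  searchDirs <- c(destinationPath, getOption("reproducible.inputPaths"))
  candidates <- file.path(searchDirs, basename(targetFile))
  hits <- candidates[file.exists(candidates)]
  if (length(hits) > 0) hits[1] else NULL  # NULL means a download is needed
}
```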
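Example for fun: a "package::function" string such as "raster::raster" can be turned into a callable with base R's getExportedValue(). resolveFun() is a hypothetical helper; it is not necessarily how preProcess resolves fun internally.

```r
# Hypothetical helper: accept a function, a "pkg::fun" string, or a bare name.
resolveFun <- function(fun) {
  if (is.function(fun)) return(fun)
  parts <- strsplit(fun, "::", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    getExportedValue(parts[1], parts[2])  # e.g. "utils::read.csv"
  } else {
    match.fun(fun)                        # bare name found on the search path
  }
}
```

For example, resolveFun("utils::read.csv") returns the same function object as utils::read.csv.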
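Example for quick: the trade-off between faster and more robust checking can be illustrated by comparing a metadata-only check with a content hash. checkFile() is hypothetical, and the assumption that a quick check compares file size alone is made for illustration; see Checksums for what quickCheck actually compares.

```r
# Hypothetical helper contrasting a quick check with a full check of one file.
# Assumption: the quick check compares file size only, the full check an MD5 hash.
checkFile <- function(path, expected, quick = FALSE) {
  if (!file.exists(path)) return(FALSE)
  if (quick) {
    file.size(path) == expected$filesize              # metadata only, very fast
  } else {
    unname(tools::md5sum(path)) == expected$checksum  # reads the whole file
  }
}
```

A file edited in place without changing its size passes the quick check but fails the full one, which is the sense in which the quick check is less robust.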
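Example for purge: the two documented settings, 0/FALSE and 1/TRUE, can be sketched as follows. applyPurge() is a hypothetical helper, and the other integer values mentioned in the details are deliberately not modelled.

```r
# Hypothetical helper illustrating only purge = 0/FALSE and purge = 1/TRUE;
# other integer values have further meanings (see details) not modelled here.
applyPurge <- function(destinationPath = ".", purge = FALSE) {
  purge <- as.integer(purge)  # TRUE -> 1L, FALSE -> 0L
  if (!purge %in% c(0L, 1L)) stop("only purge = 0/FALSE or 1/TRUE are sketched here")
  csFile <- file.path(destinationPath, "CHECKSUMS.txt")
  if (purge == 1L && file.exists(csFile)) unlink(csFile)  # delete the whole file
  invisible(file.exists(csFile))  # TRUE if existing entries are kept and appended to
}
```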
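Example for ...: one common R idiom for feeding a single ... to several functions is to pass each function only the arguments named in its formals. This is offered as an illustration of the idea, not as a description of how preProcess routes ...; callWithMatchingDots(), loader() and cropper() are hypothetical.

```r
# Hypothetical helper: call `fun` with only the arguments from `...` that
# appear among its formal arguments, so one `...` can serve several functions.
callWithMatchingDots <- function(fun, ...) {
  dots <- list(...)
  fmls <- names(formals(fun))
  if ("..." %in% fmls) return(do.call(fun, dots))  # fun accepts anything
  do.call(fun, dots[names(dots) %in% fmls])
}

# Hypothetical stand-ins for a loading function and a post-processing step
loader <- function(file, header = TRUE) list(file = file, header = header)
cropper <- function(x, studyArea = NULL) list(x = x, studyArea = studyArea)
```

With this idiom, callWithMatchingDots(loader, file = "a.csv", header = FALSE, studyArea = "poly") ignores studyArea, while the same studyArea value reaches cropper.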

Value

A list with 5 elements, including checkSums (the result of Checksums after downloading), dots (the cleaned-up ..., including deprecated argument checks), fun (the function used to load the preProcessed object from disk), and targetFilePath (the fully qualified path to targetFile).

Combinations of targetFile, url, archive, alsoExtract

# Params | url | targetFile | archive | alsoExtract | Result | Checksum 1st time | Checksum 2nd time
---|---|---|---|---|---|---|---
1 | char | NULL | NULL | NULL | Download, extract all files if an archive, guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile*
1 | NULL | char | NULL | NULL | Load targetFile into R | write or append targetFile | no downloading, so no checksums use
1 | NULL | NULL | char | NULL | Extract all files, guess at targetFile, load into R | write or append all new files | no downloading, so no checksums use
1 | NULL | NULL | NULL | char | Guess at targetFile from files in alsoExtract, load into R | write or append all new files | no downloading, so no checksums use
2 | char | char | NULL | NULL | Download, extract all files if an archive, load targetFile into R | write or append all new files | use Checksums, skip downloading
2 | char | NULL | char | NULL | Download, extract all files, guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile*
2 | char | NULL | NULL | char | Download, extract only named files in alsoExtract, guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile*
2 | NULL | char | NULL | char | Load targetFile into R | write or append all new files | no downloading, so no checksums use
2 | NULL | char | char | NULL | Extract all files, load targetFile into R | write or append all new files | no downloading, so no checksums use
2 | NULL | NULL | char | char | Extract only named files in alsoExtract, guess at targetFile, load into R | write or append all new files | no downloading, so no checksums use
3 | char | char | char | NULL | Download, extract all files, load targetFile into R | write or append all new files | use Checksums, skip downloading
3 | char | NULL | char | char | Download, extract files named in alsoExtract, guess at targetFile, load into R | write or append all new files | use Checksums, skip downloading
3 | char | NULL | char | "similar" | Download, extract all files (can't understand "similar"), guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile*
3 | char | char | NULL | char | Download, if an archive, extract files named in targetFile and alsoExtract, load targetFile into R | write or append all new files | use Checksums, skip downloading
3 | char | char | NULL | "similar" | Download, if an archive, extract files with same base as targetFile, load targetFile into R | write or append all new files | use Checksums, skip downloading
3 | char | char | char | NULL | Download, extract all files from archive, load targetFile into R | write or append all new files | use Checksums, skip downloading
3 | NULL | char | char | char | Extract files named in alsoExtract from archive, load targetFile into R | write or append all new files | no downloading, so no checksums use
4 | char | char | char | char | Download, extract files named in targetFile and alsoExtract, load targetFile into R | write or append all new files | use Checksums, skip downloading
4 | char | char | char | "similar" | Download, extract all files with same base as targetFile, load targetFile into R | write or append all new files | use Checksums, skip downloading

* If the url points to a file on Google Drive, checksumming will work even without a targetFile specified, because there is an initial attempt to get the remote file information (e.g., file name). With that, the url can be linked to the filename used in CHECKSUMS.txt.
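The "Checksum 2nd time" column of the combinations table follows a compact rule: with no url there is nothing to download; otherwise checksums can short-circuit the download whenever the files to check are known, either through targetFile or through an archive plus explicitly named alsoExtract files. The sketch below encodes that rule as read from the table; secondCallChecksumBehaviour() is a hypothetical helper, and it ignores the Google Drive exception in the footnote.

```r
# Hypothetical helper encoding the "Checksum 2nd time" column of the
# combinations table (the Google Drive exception is not modelled).
secondCallChecksumBehaviour <- function(url = NULL, targetFile = NULL,
                                        archive = NULL, alsoExtract = NULL) {
  if (is.null(url)) return("no downloading, so no checksums use")
  # The files to verify are known if targetFile is given, or if an archive is
  # given together with explicitly named (not "similar") alsoExtract files
  knowsFiles <- !is.null(targetFile) ||
    (!is.null(archive) && !is.null(alsoExtract) && !identical(alsoExtract, "similar"))
  if (knowsFiles) "use Checksums, skip downloading" else "same as 1st -- no targetFile"
}
```

Checking it against every row of the table is a quick way to confirm that the rule and the table agree.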