Manage R packages from the OnDemand DCC command line
The following instructions describe how to install R packages from a DCC command line. The cases covered are:
- Case one: Base case (package is available on CRAN and has no dependencies)
- Case two: Package with dependencies
- Case three: Installing from a local file (package not available on CRAN, or a custom package)
- Case four: Installing a package for Slurm batch processing
- Case five: Installing a past version of a package
Open the DCC command line
In an interactive DCC session, you enter commands at the operating system (Linux) command line. One sequence of commands (entered at the $ prompt) is:
$ module load R/4.1.1-rhel
$ R
These commands load R version 4.1.1, launch R, and present the R command prompt (>).
Background
To install a package, R downloads its source files from a repository, then decompresses, compiles, and saves them in a local directory. Various R commands allow you to review and specify the current source repository, install a package, and load installed packages.
To review the current source repository (the repository from which packages are downloaded by default), use the options("repos") command. Example (the text following the > options("repos") line is the result):
> options("repos")
$repos
"https://packagemanager.rstudio.com/all/__linux__/focal/latest"
To specify an alternate repository, use the options(repos="x") command. The following example sets the source repository to the Duke CRAN mirror:
> options(repos="https://archive.linux.duke.edu/cran/")
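Because options() returns the previous values of the options it changes, a script can switch to the Duke mirror temporarily and restore the original setting afterwards. A minimal sketch; naming the entry CRAN follows common R convention and is an assumption here, not a DCC requirement.

```r
# Switch the default source repository to the Duke CRAN mirror.
# options() invisibly returns the old value(s), which we keep for later.
old_opts <- options(repos = c(CRAN = "https://archive.linux.duke.edu/cran/"))

# Show the new setting; install.packages() uses it when repos= is omitted.
print(getOption("repos"))

# ... install.packages() calls would go here ...

# Restore whatever repository was configured before.
options(old_opts)
```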
To install a package, use the command
> install.packages(pkgs="x", lib="y", repos="z")
This instructs R to search for package x in repository z and to install it in local directory y. The lib and repos arguments may be omitted, in which case the first directory in .libPaths() and the value of options("repos"), respectively, will be used.
To review the current local package installation and search directories, use .libPaths(). Note that, in the example below, complete paths (beginning at /) are shown. If relative paths are used, they are interpreted relative to the current working directory, which is reported by getwd().
Example:
> .libPaths()
[1] "/hpc/home/tjb48/R/x86_64-pc-linux-gnu-library/4.1"
[2] "/usr/local/lib/R/site-library"
[3] "/usr/local/lib/R/library"
To include an additional directory in the library search path, use .libPaths() as follows (note that, in this example, the additional directory /hpc/group/rescomp/tjb48/rlib is placed at the head of the list):
> .libPaths(c("/hpc/group/rescomp/tjb48/rlib", .libPaths()))
The new search path is:
> .libPaths()
[1] "/hpc/group/rescomp/tjb48/rlib"
[2] "/hpc/home/tjb48/R/x86_64-pc-linux-gnu-library/4.1"
[3] "/usr/local/lib/R/site-library"
[4] "/usr/local/lib/R/library"
The first directory in this path is also used by install.packages() when lib is not specified. In the above example, install.packages("x") will install package x in /hpc/group/rescomp/tjb48/rlib.
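.libPaths() silently drops directories that do not exist, so prepending a directory that has not been created yet has no effect. A minimal sketch of a helper (add_lib_path is a hypothetical name) that creates the directory if needed and puts it at the head of the search path, which also makes it the default lib= for install.packages():

```r
# Hypothetical helper: create `dir` if necessary, then place it first in the
# library search path. .libPaths() ignores directories that do not exist.
add_lib_path <- function(dir) {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  .libPaths(c(dir, .libPaths()))
  invisible(.libPaths())
}

# Example (replace with your own group and NetID):
# add_lib_path("/hpc/group/your_group/your_netID/rlib")
# .libPaths()[1]   # now the default lib= for install.packages()
```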
To load a package, use library(package="x", lib.loc="y"). If lib.loc is specified, then directory y is searched for a subdirectory named x. If y is a relative path (does not begin with /), it is interpreted relative to the current working directory, as reported by getwd(): subdirectory y is located within the working directory, and then a subdirectory named x is searched for within y. If lib.loc is not specified, then the .libPaths() directories are searched, in order, for a subdirectory named x.
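The lookup described above can be imitated in a few lines. This is a simplified sketch, not library()'s actual implementation (base R's find.package() answers the same question); the function name locate_package is hypothetical, and it treats a subdirectory as a package only if it contains a DESCRIPTION file.

```r
# Simplified imitation of library()'s search: return the first directory
# named `pkg` found under lib.loc (or under .libPaths() when lib.loc is NULL).
locate_package <- function(pkg, lib.loc = NULL) {
  dirs <- if (is.null(lib.loc)) .libPaths() else lib.loc
  # Relative paths are interpreted relative to the current working directory.
  dirs <- ifelse(startsWith(dirs, "/"), dirs, file.path(getwd(), dirs))
  for (d in dirs) {
    candidate <- file.path(d, pkg)
    # An installed package directory always contains a DESCRIPTION file.
    if (file.exists(file.path(candidate, "DESCRIPTION"))) {
      return(candidate)
    }
  }
  NA_character_  # not found in any searched directory
}
```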
Note that, when loading a package, library() searches the directories you specify, either with the lib.loc parameter or through .libPaths(), for a subdirectory whose name equals the specified package, then loads the functions, help files, and related resources contained in the first appropriately named subdirectory it finds. How the package subdirectory came to be in its current state is of no concern to library(), only that it is a properly constructed (installed) package directory. Copying a properly constructed package subdirectory from one R installation to another (of the same R version and platform) makes it available for loading by library().
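Before relying on a package directory copied from another installation, it can help to check that it looks like an installed package and was built under the same R major.minor version. A hedged sketch: it assumes an installed package contains Meta/package.rds and a DESCRIPTION file with a Built field (both written by R CMD INSTALL); check_copied_package is a hypothetical name, and a passing check does not guarantee that compiled code will run on a different platform.

```r
# Hypothetical check for a copied package directory.
check_copied_package <- function(pkg_dir) {
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  meta_file <- file.path(pkg_dir, "Meta", "package.rds")
  if (!file.exists(desc_file) || !file.exists(meta_file)) {
    return(FALSE)  # not an installed package (e.g. an unbuilt source tree)
  }
  built <- read.dcf(desc_file, fields = "Built")[1, "Built"]
  if (is.na(built)) {
    return(FALSE)
  }
  # Built looks like "R 4.1.1; x86_64-pc-linux-gnu; <date>; unix".
  built_mm <- sub("^R ([0-9]+\\.[0-9]+).*$", "\\1", built)
  current_mm <- paste(R.version$major, sub("\\..*$", "", R.version$minor), sep = ".")
  identical(built_mm, current_mm)
}
```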
Package installation
Case one: Base case (standard repository, package with no dependencies)
Space in your /hpc/home directory (the default location that R attempts to use) is limited and will likely be insufficient for more than a handful of packages. Much more space is available in /hpc/group, and this top-level directory should be used instead. First, on the Terminal tab, create a subdirectory within your group's directory, named with your NetID. Within your NetID directory, create a subdirectory called rlib. This subdirectory will contain your installed, compiled packages. Example:
$ mkdir /hpc/group/your_group/your_netID
$ mkdir /hpc/group/your_group/your_netID/rlib
Now, on the Console tab, install your package:
> options(repos="https://archive.linux.duke.edu/cran/")
> install.packages("packageOfInterest", lib="/hpc/group/your_group/your_netID/rlib")
Test with:
> library(packageOfInterest, lib.loc="/hpc/group/your_group/your_netID/rlib")
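In scripts, it is often convenient to install a package only when it is not already present in the target directory. A minimal sketch; ensure_installed is a hypothetical name, and its default lib is the placeholder directory used above.

```r
# Hypothetical helper: install `pkg` into `lib` only if it is not already there.
ensure_installed <- function(pkg,
                             lib = "/hpc/group/your_group/your_netID/rlib",
                             repos = "https://archive.linux.duke.edu/cran/") {
  # system.file() returns "" when the package is not found under `lib`.
  already_there <- nzchar(system.file(package = pkg, lib.loc = lib))
  if (!already_there) {
    install.packages(pkg, lib = lib, repos = repos)
  }
  invisible(!already_there)  # TRUE if an installation was attempted
}

# ensure_installed("packageOfInterest")
# library(packageOfInterest, lib.loc = "/hpc/group/your_group/your_netID/rlib")
```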
Case two: Package with dependencies
In many cases, functions in one package call functions in another package. This constitutes a dependency. When a source repository and target directory are specified, install.packages() attempts to install dependent packages that are not already installed. The source repository can be specified using the repos parameter of install.packages() or by setting the repos option with options(repos=). The lib parameter of install.packages() specifies the target installation directory, into which both the package being installed and its dependencies will be installed. Note that this behavior can be modified using the dependencies parameter of install.packages(). The following example illustrates a common use of library paths, package installation, and package loading when dependencies exist. If package pkgA requires package pkgB, then an attempt to install pkgA will cause an attempt to install pkgB. Once installed, pkgB can be loaded only if its location is included in .libPaths(). Otherwise, it cannot be located, will not be loaded, and attempts to load pkgA will fail.
> options(repos="https://archive.linux.duke.edu/cran/")
> .libPaths(c("/hpc/group/your_group/your_netID/rlib", .libPaths()))
> install.packages("pkgA", lib="/hpc/group/your_group/your_netID/rlib")
> library(pkgA)
Include the above .libPaths() command in any script that loads the installed package (and the lib= parameter in any script that installs it). Otherwise, R will not be able to locate the package and its dependencies.
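To diagnose a failed library(pkgA) call, one can list which of an installed package's Depends, Imports, and LinkingTo entries cannot be found in the current .libPaths(). A hedged sketch: dep_names and missing_dependencies are hypothetical names, and the field parsing is deliberately simple (it strips version requirements such as "(>= 1.0.0)" and ignores the "R" entry).

```r
# Parse a DESCRIPTION dependency field such as "Rcpp (>= 1.0.0), stats"
# into bare package names, dropping version requirements and "R" itself.
dep_names <- function(field) {
  if (is.null(field) || is.na(field)) return(character(0))
  parts <- trimws(strsplit(field, ",")[[1]])
  parts <- sub("\\s*\\(.*\\)$", "", parts)
  parts[nzchar(parts) & parts != "R"]
}

# Dependencies of an installed package that are not found in .libPaths().
missing_dependencies <- function(pkg, lib.loc = NULL) {
  desc <- utils::packageDescription(pkg, lib.loc = lib.loc)
  if (!inherits(desc, "packageDescription")) stop("package not found: ", pkg)
  deps <- unique(unlist(lapply(c("Depends", "Imports", "LinkingTo"),
                               function(f) dep_names(desc[[f]]))))
  found <- vapply(deps, function(d) {
    length(find.package(d, lib.loc = .libPaths(), quiet = TRUE)) > 0
  }, logical(1))
  deps[!found]
}

# missing_dependencies("pkgA", lib.loc = "/hpc/group/your_group/your_netID/rlib")
```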
Case three: Installing a package from a local (non-CRAN) file
You may want to install a package that is not available on CRAN. Perhaps you are developing your own package and need to test loading it before publishing it on CRAN. To do this, first download the package source, then install from that file (consider the note on copying package subdirectories in the Background section, above). This method is illustrated by an example. A package named feXTXc is available as a gzip-compressed file in a GitHub repository (its URL appears in step 2 below). The name of the file is feXTXc_1.0.tar.gz. Note that a gzip-compressed tar archive (.tar.gz) is the standard distribution format for R source packages. Using the Terminal and Console tabs within RStudio, do the following:
1. If you have not created a package subdirectory within your group structure, do so now with
the following commands (Terminal tab):
$ mkdir /hpc/group/your_group/your_netID
$ mkdir /hpc/group/your_group/your_netID/rlib
2. Download the package to your current Linux directory (Terminal tab):
$ wget https://raw.githubusercontent.com/tbalmat/StatisticsAndComputation/master/FixedEffectsRegression/RPackage/feXTXc_1.0.tar.gz
Result:
--2022-06-17 12:42:52--  https://raw.githubusercontent.com/tbalmat/StatisticsAndComputation/master/FixedEffectsRegression/RPackage/feXTXc_1.0.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 35481 (35K) [application/octet-stream]
Saving to: ‘feXTXc_1.0.tar.gz’
feXTXc_1.0.tar.gz   100%[==================================================>]  34.65K  --.-KB/s    in 0.009s
2022-06-17 12:42:52 (3.65 MB/s) - ‘feXTXc_1.0.tar.gz’ saved [35481/35481]
3. Install the package (Console tab; this assumes that the current R working directory, reported by getwd(), is the Linux directory into which the file was downloaded):
> install.packages("feXTXc_1.0.tar.gz", repos=NULL,
lib="/hpc/group/your_group/your_netID/rlib")
Result:
* installing *source* package ‘feXTXc’ ...
** using staged installation
** libs
g++ -std=gnu++14 -I"/usr/local/lib/R/include" -DNDEBUG -I'/usr/local/lib/R/site-library/Rcpp/include' -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++14 -I"/usr/local/lib/R/include" -DNDEBUG -I'/usr/local/lib/R/site-library/Rcpp/include' -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c cholInvDiag.cpp -o cholInvDiag.o
g++ -std=gnu++14 -I"/usr/local/lib/R/include" -DNDEBUG -I'/usr/local/lib/R/site-library/Rcpp/include' -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c choleskyDecomp.cpp -o choleskyDecomp.o
g++ -std=gnu++14 -I"/usr/local/lib/R/include" -DNDEBUG -I'/usr/local/lib/R/site-library/Rcpp/include' -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c choleskyXInverse.cpp -o choleskyXInverse.o
g++ -std=gnu++14 -I"/usr/local/lib/R/include" -DNDEBUG -I'/usr/local/lib/R/site-library/Rcpp/include' -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c feXTXc-Yest.cpp -o feXTXc-Yest.o
g++ -std=gnu++14 -I"/usr/local/lib/R/include" -DNDEBUG -I'/usr/local/lib/R/site-library/Rcpp/include' -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c feXTXc.cpp -o feXTXc.o
g++ -std=gnu++14 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o feXTXc.so RcppExports.o cholInvDiag.o choleskyDecomp.o choleskyXInverse.o feXTXc-Yest.o feXTXc.o -L/usr/local/lib/R/lib -lR
installing to /hpc/group/rescomp/tjb48/rlib/00LOCK-feXTXc/00new/feXTXc/libs
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (feXTXc)
Note how source files are compiled using local compilers, linkers, and OS libraries. This is a key mechanism for ensuring portability of R packages across platforms and versions.
4. Test loading the package (Console tab; note that the directory to load from is specified, and that it is the same directory specified during installation):
> library(feXTXc, lib.loc="/hpc/group/your_group/your_netID/rlib")
5. Delete the downloaded package file, since it is no longer needed (Terminal tab):
$ rm feXTXc_1.0.tar.gz
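The download, install, and clean-up steps above can also be done entirely from the R console. A minimal sketch, assuming a writable tempdir(); install_source_url is a hypothetical name, and the downloaded file keeps its pkg_version.tar.gz name.

```r
# Sketch: download a source tarball, install it, and always delete the
# downloaded file afterwards (steps 2, 3 and 5 above, done from within R).
install_source_url <- function(url, lib) {
  dest <- file.path(tempdir(), basename(url))  # keep the pkg_version.tar.gz name
  on.exit(unlink(dest), add = TRUE)            # remove the tarball on exit
  utils::download.file(url, dest, mode = "wb")
  utils::install.packages(dest, repos = NULL, type = "source", lib = lib)
  invisible(dest)
}

# install_source_url(
#   "https://raw.githubusercontent.com/tbalmat/StatisticsAndComputation/master/FixedEffectsRegression/RPackage/feXTXc_1.0.tar.gz",
#   lib = "/hpc/group/your_group/your_netID/rlib")
```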
Notes:
- An alternative to downloading the package source file is to specify its URL directly in the install.packages() command, as in:
> install.packages("https://raw.githubusercontent.com/tbalmat/StatisticsAndComputation/master/FixedEffectsRegression/RPackage/feXTXc_1.0.tar.gz",
repos=NULL, lib="/hpc/group/your_group/your_netID/rlib")
- When installing from a local source file instead of from a repository, install.packages("pkg", lib="x", repos=NULL) does not install dependent packages; they must be installed separately. When loading a package, all dependent packages must be located in .libPaths() directories, even when the lib.loc parameter is specified.
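Because a local install does not fetch dependencies, it can help to read the dependency list from the tarball before installing. A hedged sketch: tarball_dependencies is a hypothetical name; it assumes the archive follows the usual pkg_version.tar.gz layout with a top-level pkg/DESCRIPTION file, and its parsing simply strips version requirements.

```r
# Hypothetical helper: list the dependencies declared in a source tarball's
# DESCRIPTION file, and which of them cannot currently be found.
tarball_dependencies <- function(tarball) {
  pkg <- sub("_.*$", "", basename(tarball))  # "feXTXc_1.0.tar.gz" -> "feXTXc"
  exdir <- tempfile("desc_")
  utils::untar(tarball, files = paste0(pkg, "/DESCRIPTION"), exdir = exdir)
  fields <- c("Depends", "Imports", "LinkingTo")
  desc <- read.dcf(file.path(exdir, pkg, "DESCRIPTION"), fields = fields)
  deps <- unique(unlist(lapply(desc[1, ], function(x) {
    if (is.na(x)) return(character(0))
    parts <- trimws(strsplit(x, ",")[[1]])
    parts <- sub("\\s*\\(.*\\)$", "", parts)  # drop version requirements
    parts[nzchar(parts) & parts != "R"]
  }), use.names = FALSE))
  installed <- vapply(deps, function(d) length(find.package(d, quiet = TRUE)) > 0,
                      logical(1))
  unlink(exdir, recursive = TRUE)
  list(dependencies = deps, missing = deps[!installed])
}

# tarball_dependencies("feXTXc_1.0.tar.gz")$missing
```

Any packages reported as missing can then be installed from a repository, as in Case two, before installing the local file.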
Case four: Installing a package for Slurm batch processing
Within a Slurm batch job, R scripts are generally executed with the Rscript command. Before Rscript can be executed, a version of R must be loaded. At present, two versions are available on the DCC: 4.0.3 and 4.1.1. Our example uses version 4.1.1. Any package referenced in a library() call within an R script must first be installed. We will use an interactive R session to install a package, making it available for loading in R scripts. Because package versions built for R 4.0.3 and 4.1.1 may differ from each other, and from those required for OnDemand sessions (currently R 4.1.0), we will install and load packages in a directory dedicated to R version 4.1.1. As an example, the following sequence installs the SparseM package. $ indicates the Linux prompt; > indicates the R prompt.
First, create a directory to contain version 4.1.1 packages:
$ mkdir /hpc/group/your_group/your_netID
$ mkdir /hpc/group/your_group/your_netID/rlib
$ mkdir /hpc/group/your_group/your_netID/rlib/r4.1.1
Launch an interactive R session:
$ module load R/4.1.1-rhel
$ R
From within the R session, specify a CRAN mirror and install SparseM, targeting the new package directory for storage:
> options(repos="https://archive.linux.duke.edu/cran/")
> install.packages("SparseM",
lib="/hpc/group/your_group/your_netID/rlib/r4.1.1")
Result:
trying URL 'https://archive.linux.duke.edu/cran/src/contrib/SparseM_1.81.tar.gz'
Content type 'application/octet-stream' length 735100 bytes (717 KB)
==================================================
downloaded 717 KB
* installing *source* package ‘SparseM’ ...
** package ‘SparseM’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c bckslv.f -o bckslv.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c chol.f -o chol.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c chol2csr.f -o chol2csr.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c cholesky.f -o cholesky.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c csr.f -o csr.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c extract.f -o extract.o
gcc -I"/opt/apps/rhel8/R-4.1.1/lib64/R/include" -DNDEBUG -I/opt/apps/rhel8/bzip2-1.0.8/include -I/opt/apps/rhel8/xz-5.2.5/include -I/opt/apps/rhel8/libpng-1.6.37/include -I/opt/apps/rhel8/libjpeg-turbo/include -I/opt/apps/rhel8/pcre2-10.37/include -fpic -I/opt/apps/rhel8/bzip2-1.0.8/include -I/opt/apps/rhel8/xz-5.2.5/include -I/opt/apps/rhel8/libpng-1.6.37/include -I/opt/apps/rhel8/libjpeg-turbo/include -I/opt/apps/rhel8/pcre2-10.37/include -c init.c -o init.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c sparskit.f -o sparskit.o
gfortran -fno-optimize-sibling-calls -fpic -g -O2 -c subscr.f -o subscr.o
gcc -shared -L/opt/apps/rhel8/bzip2-1.0.8/lib -L/opt/apps/rhel8/xz-5.2.5/lib -L/opt/apps/rhel8/libpng-1.6.37/lib -L/opt/apps/rhel8/libjpeg-turbo/lib -L/opt/apps/rhel8/pcre2-10.37/lib -o SparseM.so bckslv.o chol.o chol2csr.o cholesky.o csr.o extract.o init.o sparskit.o subscr.o -lgfortran -lm -lquadmath
installing to /hpc/group/rescomp/tjb48/rlib/r4.1.1/00LOCK-SparseM/00new/SparseM/libs
** R
** data
** demo
** inst
** byte-compile and prepare package for lazy loading
Creating a generic function for ‘diag’ from package ‘base’ in package ‘SparseM’
Creating a generic function for ‘diag<-’ from package ‘base’ in package ‘SparseM’
Creating a generic function for ‘norm’ from package ‘base’ in package ‘SparseM’
Creating a new generic function for ‘backsolve’ in package ‘SparseM’
Creating a generic function for ‘forwardsolve’ from package ‘base’ in package ‘SparseM’
Creating a generic function for ‘model.response’ from package ‘stats’ in package ‘SparseM’
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (SparseM)
The downloaded source packages are in
‘/tmp/RtmpYO94G7/downloaded_packages’
Verify that the package can be loaded:
> library(SparseM,
lib.loc="/hpc/group/your_group/your_netID/rlib/r4.1.1")
The SparseM package can now be loaded, using library()
as above, from within an R script passed to
the Rscript command of a Slurm batch job.
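Since this case keeps one package directory per R version (r4.1.1 above), a script run by Rscript can derive the directory name from the running R version instead of hard-coding it. A minimal sketch, assuming the rlib/r<version> layout used above; use_version_lib and its default base path are placeholders.

```r
# Pick the package directory that matches the running R version,
# e.g. ".../rlib/r4.1.1" under R 4.1.1, and put it first in .libPaths().
use_version_lib <- function(base = "/hpc/group/your_group/your_netID/rlib") {
  lib <- file.path(base, paste0("r", getRversion()))
  if (!dir.exists(lib)) {
    stop("No package directory for this R version: ", lib)
  }
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}

# At the top of a script run with Rscript inside a Slurm job:
# use_version_lib()
# library(SparseM)
```

Failing loudly when the directory is missing avoids silently loading packages built for a different R version from elsewhere in the search path.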
To install a package with dependencies, install from a local file, or install a past version of a package, use the methods presented in the relevant sections of this document, entering R commands into an interactive R session, as in the example above, instead of an RStudio session.
Case five: Installing a past version of a package
In general, installing a package from CRAN or a mirror repository installs the latest version. To install a past version of a package, you can either download the corresponding package source file or specify the archived package source URL in the install.packages() command, as in:
> install.packages("https://cran.r-project.org/src/contrib/Archive/SparseM/SparseM_1.02.tar.gz",
repos=NULL, lib="/hpc/group/your_group/your_netID/rlib")
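The archive URL follows a fixed pattern, so it can be built from the package name and version. A small sketch; cran_archive_url is a hypothetical name. Note that the CRAN archive holds superseded versions only; the current version of a package is served from src/contrib itself.

```r
# Hypothetical helper: build the CRAN archive URL for a given package version,
# following the pattern of the SparseM example above.
cran_archive_url <- function(pkg, version, cran = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", cran, pkg, pkg, version)
}

# install.packages(cran_archive_url("SparseM", "1.02"), repos = NULL,
#                  lib = "/hpc/group/your_group/your_netID/rlib")
```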