Advanced Configuration

While the conda-based installation is usually sufficient to get started with pyiron on a workstation, it can be extended to support your own executables, include your own parameter files, support commercial codes like VASP, or improve database performance by switching from SQLite to PostgreSQL.

Custom Executables and Parameter Files

pyiron can be configured either with a configuration file named ~/.pyiron located in the user's home directory or with environment variables. The options are similar either way, so we start with the configuration file. If pyiron does not find a configuration file, it assumes the following default configuration:

[DEFAULT]
PROJECT_CHECK_ENABLED = False
FILE = ~/pyiron.db
RESOURCE_PATHS = ${CONDA_PREFIX}/share/pyiron

The first line, [DEFAULT], marks the settings that follow as overwriting the default configuration. The second line, PROJECT_CHECK_ENABLED, disables the project check, which allows pyiron to write to the whole file system. The third line defines that the object index is stored in an SQLite database file, FILE, located in the home directory at ~/pyiron.db. It is important to copy the database when you change the configuration; otherwise, existing calculations are lost. Finally, RESOURCE_PATHS provides the path to the parameter files. Inside pyiron, you can check the current configuration using:

from pyiron_base import Settings
s = Settings()
s._configuration
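To inspect a configuration file outside of pyiron, it can be read with Python's standard configparser module. The following is a minimal sketch, assuming that values from ~/.pyiron override the defaults shown above key by key and that the defaults apply unchanged when no file exists. The helpers load_pyiron_config and expand are hypothetical and not part of pyiron; how pyiron itself expands "~" and ${CONDA_PREFIX} is also an assumption here.

```python
import configparser
import os

# Fallback values copied from the default configuration shown above.
# This dict is an assumption of the sketch, not read from pyiron itself.
DEFAULTS = {
    "PROJECT_CHECK_ENABLED": "False",
    "FILE": "~/pyiron.db",
    "RESOURCE_PATHS": "${CONDA_PREFIX}/share/pyiron",
}


def load_pyiron_config(path="~/.pyiron"):
    """Return the [DEFAULT] options of a pyiron-style config file as a dict.

    Hypothetical helper: values found in the file override DEFAULTS key by
    key; if the file does not exist, a copy of DEFAULTS is returned.
    """
    config = dict(DEFAULTS)
    path = os.path.expanduser(path)
    if os.path.isfile(path):
        # interpolation=None keeps "${CONDA_PREFIX}" and "%" characters literal
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep option names upper-case as written
        parser.read(path)
        config.update(parser.defaults())
    return config


def expand(value):
    """Expand "~" and environment variables such as ${CONDA_PREFIX}."""
    return os.path.expanduser(os.path.expandvars(value))
```

For example, expand(load_pyiron_config()["FILE"]) yields the absolute path of the SQLite index file that this sketch would assume.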

Below, the individual options are explained one by one:

  • the [DEFAULT] section header marks the settings in ~/.pyiron that overwrite the default configuration.

  • the RESOURCE_PATHS option defines the resource paths, a ;-separated list of paths where pyiron checks for resource files. A template of such a resource directory is available on GitHub and can be downloaded as an archive from the release page. We recommend creating a folder ~/pyiron/resources and storing the parameter files and links to the executables there. The links are shell scripts, which can be modified to load modules. The conda path is added by default, so there is no need to add it manually.

  • the PROJECT_PATHS option is similar to the resource paths, but for storing simulation protocols rather than parameter files. When the PROJECT_CHECK_ENABLED option is set to true, read and write access within pyiron is limited to the directories defined in PROJECT_PATHS. Again, multiple directories can be separated by ;. An alternative but outdated name for this option is TOP_LEVEL_DIRS.
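The ;-separated path options and the effect of PROJECT_CHECK_ENABLED can be illustrated with a short sketch. It is a hypothetical illustration of the rule described above, not pyiron's own implementation: split_paths and is_allowed are made-up helper names, and comparing resolved paths with os.path.commonpath is a choice of this sketch.

```python
import os


def split_paths(value):
    """Split a ';'-separated option such as RESOURCE_PATHS or PROJECT_PATHS."""
    return [
        os.path.abspath(os.path.expanduser(p.strip()))
        for p in value.split(";")
        if p.strip()  # ignore empty entries, e.g. from a trailing ';'
    ]


def is_allowed(path, project_paths, check_enabled=True):
    """Return True if `path` may be accessed under the PROJECT_PATHS rule.

    Hypothetical illustration of PROJECT_CHECK_ENABLED, not pyiron's code.
    """
    if not check_enabled:
        return True  # project check disabled: the whole file system is accessible
    target = os.path.realpath(os.path.abspath(os.path.expanduser(path)))
    for root in split_paths(project_paths):
        root = os.path.realpath(root)
        # commonpath avoids treating /data/projects2 as inside /data/projects
        if os.path.commonpath([target, root]) == root:
            return True
    return False
```

Comparing whole path components rather than string prefixes is the design choice that keeps a sibling directory such as /data/projects2 outside of /data/projects.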

Besides the general variables in the ~/.pyiron configuration, the other settings define the database connection. More detailed configuration examples are given below; for now, we continue with the configuration of the database. pyiron can use a database to build an index of the HDF5 files on the file system, which accelerates job analysis. By default, pyiron uses an SQLite database for this index, but the database can also be disabled, or a PostgreSQL database can be used to improve performance.

  • By default, the database is defined by the FILE option, which is equivalent to the DATABASE_FILE option and gives the path to the SQLite database file. As SQLite is a file-based database, it struggles with parallel access on a shared file system (common on HPC clusters).

  • To address this limitation it is possible to disable the database on HPC clusters using the DISABLE_DATABASE option by setting it to true. This is commonly used when the calculations are only executed on the remote cluster but the analysis is done on a local workstation or a group server which supports an SQL-based database.

  • The other database options, namely TYPE, HOST, NAME, USER, PASSWD and JOB_TABLE, define the details for connecting to a PostgreSQL database. Internally, pyiron uses SQLAlchemy to support different SQL databases, so it is also possible to provide the SQLAlchemy connection string directly as CONNECTION.

  • Finally, some pyiron installations use a group management component, which is currently in development. They might have additional options in their ~/.pyiron configuration to enable sharing calculations between different users. These options are VIEWERUSER, VIEWERPASSWD and VIEWER_TABLE. As this is a development feature, it is not yet fully documented. In short, they provide the access details for the global database viewer, which can read the database entries of all users. With this configuration it is possible to load jobs of other users.
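The database options above map naturally onto an SQLAlchemy-style connection URL. The sketch below assumes the options have already been read into a dict keyed by the names used in ~/.pyiron; database_url is a hypothetical helper, and the order in which it checks DISABLE_DATABASE, CONNECTION and TYPE is an assumption rather than documented pyiron behaviour. JOB_TABLE names a table, not a connection detail, so it is not part of the URL.

```python
import os
from urllib.parse import quote_plus


def database_url(options):
    """Build an SQLAlchemy-style URL from pyiron-style database options.

    Hypothetical helper; `options` is a dict with option names as in ~/.pyiron.
    Returns None when the database is disabled.
    """
    disabled = str(options.get("DISABLE_DATABASE", "False")).strip().lower()
    if disabled in ("true", "1", "yes"):
        return None
    if "CONNECTION" in options:
        return options["CONNECTION"]  # connection string given directly
    if options.get("TYPE", "SQLite").lower() in ("postgres", "postgresql"):
        # escape characters such as '@' or ':' in the credentials
        user = quote_plus(options["USER"])
        password = quote_plus(options["PASSWD"])
        # recent SQLAlchemy versions expect the dialect name "postgresql"
        return f"postgresql://{user}:{password}@{options['HOST']}/{options['NAME']}"
    # fall back to the file-based SQLite database; FILE and DATABASE_FILE are aliases
    path = options.get("FILE", options.get("DATABASE_FILE", "~/pyiron.db"))
    # an absolute path yields four slashes, e.g. sqlite:////home/user/pyiron.db
    return "sqlite:///" + os.path.expanduser(path)
```

Quoting the user name and password matters because an unescaped '@' in a password would otherwise be read as the start of the host name.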

Analogous to the ~/.pyiron configuration file, pyiron also supports configuring the installation through environment variables. The available environment variables are:

  • the PYIRONCONFIG environment variable defines the location of the .pyiron configuration file.

  • the PYIRONRESOURCEPATHS environment variable defines the RESOURCE_PATHS option.

  • the PYIRONPROJECTPATHS environment variable defines the PROJECT_PATHS option.

  • the PYIRONPROJECTCHECKENABLED environment variable defines the PROJECT_CHECK_ENABLED option.

  • the PYIRONDISABLE environment variable defines the DISABLE_DATABASE option.

  • the PYIRONSQLTYPE, PYIRONSQLFILE, PYIRONSQHOST, PYIRONSQLDATABASE, PYIRONUSER and PYIRONSQLUSERKEY environment variables define the SQL database connection and can also be summarized in the PYIRONSQLCONNECTIONSTRING environment variable.

  • the PYIRONSQLVIEWTABLENAME, PYIRONSQLVIEWUSER and PYIRONSQLVIEWUSERKEY environment variables define the SQL viewer connection and can also be summarized in the PYIRONSQLVIEWCONNECTIONSTRING environment variable.
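How the environment variables relate to the file options can be sketched as an overlay on top of the parsed configuration. The variable names on the left are taken from the list above (PYIRONSQHOST is spelled exactly as listed there). The option names on the right are an assumption of this sketch for the SQL and viewer variables, as is the rule that environment variables take precedence over the file; apply_environment and config_file_location are hypothetical helpers.

```python
import os

# Environment variable -> ~/.pyiron option. The first four pairs follow the
# list above; the SQL and viewer pairs are assumptions of this sketch.
ENV_TO_OPTION = {
    "PYIRONRESOURCEPATHS": "RESOURCE_PATHS",
    "PYIRONPROJECTPATHS": "PROJECT_PATHS",
    "PYIRONPROJECTCHECKENABLED": "PROJECT_CHECK_ENABLED",
    "PYIRONDISABLE": "DISABLE_DATABASE",
    "PYIRONSQLTYPE": "TYPE",
    "PYIRONSQLFILE": "FILE",
    "PYIRONSQHOST": "HOST",  # spelled as in the list above
    "PYIRONSQLDATABASE": "NAME",
    "PYIRONUSER": "USER",
    "PYIRONSQLUSERKEY": "PASSWD",
    "PYIRONSQLCONNECTIONSTRING": "CONNECTION",
    "PYIRONSQLVIEWTABLENAME": "VIEWER_TABLE",
    "PYIRONSQLVIEWUSER": "VIEWERUSER",
    "PYIRONSQLVIEWUSERKEY": "VIEWERPASSWD",
}


def apply_environment(options, environ=None):
    """Return a copy of `options` updated from PYIRON* environment variables.

    Assumption: environment variables take precedence over the config file.
    """
    environ = os.environ if environ is None else environ
    merged = dict(options)
    for env_name, option in ENV_TO_OPTION.items():
        if env_name in environ:
            merged[option] = environ[env_name]
    return merged


def config_file_location(environ=None):
    """PYIRONCONFIG points to the configuration file; otherwise use ~/.pyiron."""
    environ = os.environ if environ is None else environ
    return os.path.expanduser(environ.get("PYIRONCONFIG", "~/.pyiron"))
```

Passing an explicit dict as environ, as in the checks one might write for such a helper, avoids modifying the real process environment.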

To further explain the usage of the different parameters, we discuss common use cases in the following:

Use Your Own Executable for LAMMPS, S/PHI/nX or GPAW

To add your own executables or parameter files it is necessary to initialise a user-defined configuration ~/.pyiron. You can start with a basic configuration like:

[DEFAULT]
FILE = ~/pyiron.db
PROJECT_PATHS = ~/pyiron/projects
RESOURCE_PATHS = ~/pyiron/resources

In this case, pyiron can only execute calculations in the ~/pyiron/projects directory and cannot delete files outside this directory. Next to the projects directory ~/pyiron/projects, we create a resource directory ~/pyiron/resources to store links to the executables and the corresponding parameter files. Both directories have to be created by the user, and in case no FILE option is defined, pyiron by default creates an SQLite database in the resource directory. Example resource directories are available on GitHub. Here we discuss the LAMMPS resource directory as one example.

resources/
  lammps/
    bin/
      run_lammps_2020.03.03.sh
      run_lammps_2020.03.03_mpi.sh
    potentials/
      potentials_lammps.csv

The resource directory contains two subfolders: bin, which includes links to the executables, and potentials, which includes links to the interatomic potentials. The links to the executables are shell scripts that follow the naming convention run_<code name>_<version>(_<tag>).sh, where the mpi tag indicates MPI-enabled executables. The run_lammps_2020.03.03_mpi.sh shell script, for example, contains the following lines:

#!/bin/bash
mpiexec -n $1 --oversubscribe lmp_mpi -in control.inp;

If you are running on a cluster with a module system, it may be a good idea to configure a clean environment in which your job can run, e.g.:

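#!/bin/bash
# start from a clean module environment
module purge
# load the LAMMPS module of your cluster (module name and version vary)
module load lammps/29Oct20
# $1 is the number of cores passed in by pyiron
mpiexec -n $1 --oversubscribe lmp_mpi -in control.inp;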

Scripts with the mpi tag are called with two parameters, the first being the number of cores and the second the number of threads, while regular shell scripts do not receive any input parameters. Shell scripts make it easy to link existing executables, which might require loading specific modules or setting environment variables. Similarly, the pyiron parameter files are stored in CSV format, which makes them human-editable. For shared installations, we recommend storing the pyiron resources in a shared directory.
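Because pyiron does not create these directories itself, the setup can be scripted. The following minimal sketch creates the projects directory and the LAMMPS resource layout shown above under a given home directory, writes the MPI run script from the example, and writes the basic ~/.pyiron configuration only if none exists. setup_lammps_resources and mpi_command are hypothetical helpers; potentials_lammps.csv is not created because its content is not described here, and mpi_command only illustrates the argument order described above (cores, then threads), not how pyiron actually launches the script.

```python
import stat
from pathlib import Path

# Run script from the LAMMPS example above.
MPI_SCRIPT = """#!/bin/bash
mpiexec -n $1 --oversubscribe lmp_mpi -in control.inp;
"""


def setup_lammps_resources(home):
    """Create the LAMMPS example layout under `home` (hypothetical helper).

    Pass Path.home() to write into your real home directory.
    """
    home = Path(home)
    projects = home / "pyiron" / "projects"
    resources = home / "pyiron" / "resources"
    bin_dir = resources / "lammps" / "bin"
    potentials_dir = resources / "lammps" / "potentials"
    for directory in (projects, bin_dir, potentials_dir):
        directory.mkdir(parents=True, exist_ok=True)

    script = bin_dir / "run_lammps_2020.03.03_mpi.sh"
    script.write_text(MPI_SCRIPT)
    # set the executable bits so the script can be run directly
    script.chmod(script.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

    config = home / ".pyiron"
    if not config.exists():  # never overwrite an existing configuration
        config.write_text(
            "[DEFAULT]\n"
            "FILE = ~/pyiron.db\n"
            "PROJECT_PATHS = ~/pyiron/projects\n"
            "RESOURCE_PATHS = ~/pyiron/resources\n"
        )
    return script


def mpi_command(script, cores, threads=1):
    """Arguments an mpi-tagged script receives: number of cores, then threads."""
    return [str(script), str(cores), str(threads)]
```

Trying it first with a temporary directory as home shows the resulting tree without touching an existing installation.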

Configure VASP

The Vienna Ab initio Simulation Package (VASP) is a popular commercial DFT code, commonly used for large DFT calculations or high-throughput studies. pyiron implements a VASP wrapper but does not provide a VASP license. Therefore, users have to compile their own VASP executable and provide their own VASP pseudopotentials (included with the VASP license). An example configuration for VASP in pyiron is available on GitHub:

resources/
  vasp/
    bin/
      run_vasp_5.4.4_default.sh
      run_vasp_5.4.4_default_mpi.sh
    potentials/
      potpaw/
      potpaw_PBE/
      potentials_vasp.csv
      potentials_vasp_lda_default.csv
      potentials_vasp_pbe_default.csv

Similar to the LAMMPS resource directory discussed above, the VASP resource directory also contains a bin directory and a potentials directory. The default tag marks the default executable, which is particularly useful when compiling multiple variants of the same VASP version. Finally, the directories potpaw and potpaw_PBE contain the VASP pseudopotentials, which are included with the VASP license and have to be added by the user.
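The naming convention run_<code name>_<version>(_<tag>).sh, together with the default tag, can be made concrete with a small parser. This sketch allows several tags per name, as in run_vasp_5.4.4_default_mpi.sh, and assumes that code names and versions contain no underscores. parse_script_name and default_executables are hypothetical helpers, and the selection rule (one default per code and MPI flavour) is an assumption, not pyiron's documented logic.

```python
import re

# run_<code>_<version> followed by zero or more _<tag> parts and ".sh";
# code names and versions are assumed to contain no underscores.
SCRIPT_NAME = re.compile(
    r"^run_(?P<code>[^_]+)_(?P<version>[^_]+)(?P<tags>(?:_[^_]+)*)\.sh$"
)


def parse_script_name(filename):
    """Split a run script name into (code, version, tags); None if no match."""
    match = SCRIPT_NAME.match(filename)
    if match is None:
        return None
    tags = [tag for tag in match.group("tags").split("_") if tag]
    return match.group("code"), match.group("version"), tags


def default_executables(filenames):
    """Map (code, uses_mpi) to the script carrying the 'default' tag."""
    defaults = {}
    for name in filenames:
        parsed = parse_script_name(name)
        if parsed is None or "default" not in parsed[2]:
            continue  # not a run script, or not marked as default
        code, _version, tags = parsed
        defaults[(code, "mpi" in tags)] = name
    return defaults
```

For the VASP directory above, default_executables would pick run_vasp_5.4.4_default.sh for serial runs and run_vasp_5.4.4_default_mpi.sh for MPI runs.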

PostgreSQL Database

To improve the performance of a pyiron installation, it is recommended to use a PostgreSQL database rather than the default SQLite database. To configure the database server, the following options can be added to ~/.pyiron:

  • TYPE the type of the database. While SQLAlchemy supports a wide range of different databases, PostgreSQL is recommended and can be selected by setting the type to Postgres.

  • HOST the database host where the database is running.

  • NAME the name of the database.

  • USER the database user. In contrast to many other software packages, pyiron requires one database user per system user who is using pyiron. The database only stores an index of the calculations executed with pyiron, so the knowledge gained from accessing the database is limited unless the user also has access to the file system.

  • PASSWD the database user password. While storing the database password in the configuration file is bad practice, the database only contains the job index. Still, it is important that users create a pyiron-specific password and never store their system user password in the .pyiron configuration file.

  • JOB_TABLE the name of the database table. pyiron commonly uses one table per user.

A typical .pyiron configuration with a PostgreSQL database might look like this:

[DEFAULT]
TYPE = Postgres
HOST = database.hpc-cluster.university.edu
NAME = pyiron
USER = janj
PASSWD = **********
JOB_TABLE = jobs_janj
PROJECT_PATHS = ~/pyiron/projects
RESOURCE_PATHS = ~/pyiron/resources
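Since a configuration like the one above stores PASSWD in plain text, a quick self-check before first use can help. The sketch below reports options from the list above that are missing and warns when the file is accessible by group or other users; restricting the file to its owner (chmod 600) is a common precaution suggested by this sketch, not a pyiron requirement. check_postgres_config is a hypothetical helper.

```python
import configparser
import os
import stat

# Options described above for a PostgreSQL setup.
REQUIRED = ("TYPE", "HOST", "NAME", "USER", "PASSWD", "JOB_TABLE")


def check_postgres_config(path="~/.pyiron"):
    """Return a list of problems found in a PostgreSQL-style config file.

    Hypothetical check; an empty list means no problems were detected.
    """
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return ["file not found"]
    # interpolation=None so that a '%' in the password is read literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names upper-case as written
    parser.read(path)
    options = parser.defaults()
    problems = [f"missing option {key}" for key in REQUIRED if key not in options]
    # the file holds a plain-text password, so only its owner should access it
    if os.stat(path).st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append("file is accessible by group/others; consider chmod 600")
    return problems
```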

Be careful when updating the database configuration, as pyiron does not transfer the content of the database automatically.
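Because the index is not transferred automatically, it is worth copying the existing SQLite database before changing the configuration. The sketch below uses the backup method of Python's sqlite3 module, which produces a consistent copy even if the file is in use, instead of a plain file copy. It only preserves the old index; it does not migrate it to PostgreSQL. backup_database and the ".bak" naming scheme are hypothetical choices of this sketch.

```python
import sqlite3
import time
from pathlib import Path


def backup_database(db_file="~/pyiron.db", suffix=None):
    """Copy an SQLite job index next to itself and return the copy's path.

    Hypothetical helper; the copy is named <name>.<suffix>.bak.
    """
    source = Path(db_file).expanduser()
    if not source.is_file():
        # sqlite3.connect would silently create an empty database instead
        raise FileNotFoundError(source)
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name(f"{source.name}.{suffix}.bak")
    src = sqlite3.connect(source)
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # page-by-page copy that stays consistent
    finally:
        dst.close()
        src.close()
    return target
```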