Installation
Quickstart installation (<15 mins)
Cluster/HPC install
First, fetch the Mambaforge (mamba is essentially a faster conda) install script from the web:
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
Note
If you already have a conda environment, conda install mamba also works!
Execute the script to install mambaforge:
bash Mambaforge-Linux-x86_64.sh
Note
Some systems have a very tight file/disk quota on the home directory. In that case, you may need to install to a different location; usually there is a /software or /group directory where users have permanent storage for software.
You can adjust where mamba is installed by changing the directory when the installer asks where it should be installed.
In this example, we install mamba in a folder named /software/abc123/.
Refresh/restart your shell:
source ~/.bashrc
Now you have the option of installing pyiron in an environment with:
mamba create -n YOURENVNAME
Change YOURENVNAME to your liking.
Then activate your environment with:
mamba activate YOURENVNAME
Call this to install pyiron:
mamba install -c conda-forge pyiron pyiron_contrib
This can take some time, so just hang tight.
Now, we create a pyiron_resources folder. This can be placed anywhere, but here we place it in our home folder (e.g. /home/abc123). You can find the absolute path of your home directory by calling echo $HOME:
mkdir /home/abc123/pyiron_resources
Now, create our pyiron configuration file, .pyiron, in the home folder. Paste the following lines into the file:
[DEFAULT]
RESOURCE_PATHS = /home/abc123/pyiron_resources, /software/abc123/mambaforge/envs/pyiron/share/pyiron
PROJECT_CHECK_ENABLED = False
#DISABLE_DATABASE = True
FILE = ~/pyiron.db
Note that RESOURCE_PATHS contains two entries:
/home/abc123/pyiron_resources
/software/abc123/mambaforge/envs/pyiron/share/pyiron
RESOURCE_PATHS tells pyiron where we store our executables, job scripts and queue configuration settings.
The first is the directory we just made. The second is where pyiron’s environment is located on the filesystem. You can find where it is using which python
with the environment activated, which yields something like:
/software/abc123/mambaforge/bin/python
Replace everything from bin/ onwards with envs/YOURENVNAME/share/pyiron to obtain the second entry.
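Alternatively, you can print this path directly from within the activated environment; a minimal sketch (it assumes the conda package places its resources under share/pyiron, as above):
import os
import sys

# sys.prefix points to the root of the currently activated conda environment
print(os.path.join(sys.prefix, "share", "pyiron"))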
Now enter the pyiron_resources folder and make the queues folder:
cd /home/abc123/pyiron_resources
mkdir queues
Configure the queue on your supercomputer (a SLURM setup is shown here; sample configurations for other queuing systems are available on Github). Create or edit a queue.yaml file in the queues folder with the following contents:
queue_type: SLURM
queue_primary: work
queues:
work: {cores_max: 128, cores_min: 1, run_time_max: 1440, script: work.sh}
express: {cores_max: 128, cores_min: 1, run_time_max: 1440, script: express.sh}
Change cores_max/cores_min/run_time_max to something fitting your HPC queue.
In the above example, jobs submitted using pyiron are limited to between 1 and 128 cores and a maximum run time of 1440 minutes (1 day).
You can usually find these resource limits on the information pages of your cluster.
The queue_primary string (“work” in the above script) is the name of the queue. Replace all instances of work if you would like to name the queue something else.
To add more queues, simply add more entries like the express entry and configure the corresponding queueing script template express.sh accordingly.
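To sanity-check the file, you can try loading it with pysqa, the queuing-system adapter pyiron uses under the hood; a minimal sketch, assuming the queues folder path from above:
from pysqa import QueueAdapter

# Point the adapter at the folder containing queue.yaml and the job templates
qa = QueueAdapter(directory="/home/abc123/pyiron_resources/queues")
print(qa.queue_list)  # expected to contain 'work' and 'express'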
Create the work.sh file in the same queues directory, modifying YOURACCOUNT, YOURQUEUENAME and YOURENVNAME accordingly:
#!/bin/bash
#SBATCH --output=time.out
#SBATCH --job-name={{job_name}}
#SBATCH --chdir={{working_directory}}
#SBATCH --get-user-env=L
#SBATCH --account=YOURACCOUNT
#SBATCH --partition=YOURQUEUENAME
#SBATCH --exclusive
{%- if run_time_max %}
#SBATCH --time={{ [1, run_time_max]|max }}
{%- endif %}
{%- if memory_max %}
#SBATCH --mem={{memory_max}}G
{%- endif %}
#SBATCH --cpus-per-task={{cores}}
source /software/abc123/mambaforge/bin/activate YOURENVNAME
{{command}}
In general, for the most pain-free experience, start from a jobscript that already works on your cluster and substitute in the {{…}} fields from the template above: replace the places where you specify the number of cores with {{cores}}, the memory with {{memory_max}}, and so on, to turn your working jobscript into a template for pyiron.
Notice that the environment is activated in this example script by the source …/activate line. Make sure you do this, or the queueing system cannot see the environment in which you installed pyiron.
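If you want to preview what such a template expands to before submitting real jobs, you can render it by hand with jinja2 (the same templating engine pysqa uses); a minimal sketch with example values:
from jinja2 import Template

# Load the jobscript template and fill in example values;
# at submission time pysqa provides these variables automatically.
with open("work.sh") as f:
    template = Template(f.read())

print(template.render(
    job_name="pi_1234",                        # example values only
    working_directory="/scratch/abc123/demo",
    cores=16,
    run_time_max=1440,
    memory_max=8,
    command="echo hello",                      # placeholder for pyiron's wrapper command
))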
Congrats! We’re almost there.
Now, to verify the installation is working, we will conduct a test LAMMPS calculation.
Install the conda-packaged version of LAMMPS:
mamba install -c conda-forge lammps
Create a python script test.py containing the following (anywhere, though preferably wherever you usually run calculations, e.g. /scratch). Change the username in the os.system("squeue -u abc123") call to your user.
from pyiron_atomistics import Project
import os
pr = Project("test_lammps")
basis = pr.create.structure.bulk('Al', cubic=True)
supercell_3x3x3 = basis.repeat([3, 3, 3])
job = pr.create_job(job_type=pr.job_type.Lammps, job_name='Al_T800K')
job.structure = supercell_3x3x3
job.calc_md(temperature=800, pressure=0, n_ionic_steps=10000)
pot = job.list_potentials()[0]
print('Selected potential:', pot)
job.potential = pot
job.run(delete_existing_job=True)
print(job['output/generic/energy_tot'])
print("If a list of numbers is printed above, running calculations on the head node works!")
# Test the queue submission
job_new = job.copy_to(new_job_name="test2")
job_new.run(run_mode="queue", delete_existing_job=True)
os.system("squeue -u abc123") # change abc123 to your username
print("If a queue table is printed out above, with the correct amount of resources, queue submission works!")
Call the script with:
python test.py
If the script runs and the appropriate messages print out, congratulations! You’re finished with the pyiron install.
If you’re experiencing problems, please consult the list of frequently encountered installation issues (coming soon).
For more complex tasks, such as configuring VASP or utilising on-cluster module-based executables, please see below.
Install pyiron so you can submit to remote HPCs from a local machine
Here, the local machine can be a laptop or workstation, and the remote machine (HPC/cluster) is one you can ssh into.
If you have already installed pyiron on your cluster and it works, we can proceed. If not, follow the cluster/HPC install instructions above first.
To install pyiron on your local machine, first install mamba via:
cd /root
wget https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-x86_64.sh
Execute the script to install mambaforge:
bash Mambaforge-Linux-x86_64.sh
Now you have the option of installing pyiron in an environment with:
mamba create -n YOURENVNAME
Change YOURENVNAME to your liking.
Then activate your environment with:
mamba activate YOURENVNAME
Call this to install pyiron:
mamba install -c conda-forge pyiron
Now, we create a pyiron_resources folder. This can be placed anywhere; here we place it at /home/pyiron_resources:
mkdir /home/pyiron_resources
Create the pyiron configuration file, .pyiron, in the home folder. Paste the following lines into the file:
[DEFAULT]
FILE = ~/pyiron.db
RESOURCE_PATHS = /home/pyiron_resources
Now enter the pyiron_resources folder and make the queues folder:
cd /home/pyiron_resources
mkdir queues
Copy the contents of the queues folder from your remote cluster into this folder, so that there is a queue.yaml file and a work.sh file in there.
Now we configure an ssh key for the connection between your cluster/HPC and your local machine.
Call ssh-keygen:
root@HanLaptop:~# ssh-keygen
Generating public/private rsa key pair.
When it prompts you with Enter file in which to save the key (/root/.ssh/id_rsa):, input:
/root/.ssh/id_rsa_YOURHPC
Rename the id_rsa_YOURHPC part accordingly.
When it prompts you for the passphrases, just press Enter
twice - we don’t need a passphrase:
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
And now, the final output in your local terminal looks something like:
root@HanLaptop:~# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): /root/.ssh/id_rsa_YOURHPC
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa_YOURHPC
Your public key has been saved in /root/.ssh/id_rsa_YOURHPC.pub
The key fingerprint is:
SHA256:AVNJ4qG55/fevDfgUb3OUWDePelBBiSJBtCEiicSCjI root@laptop
The key's randomart image is:
+---[RSA 3072]----+
| .X=+...oo. |
|E = *.o .. oo |
|+o. + . o oo+o|
|oo o . . o+=|
|. o . . S .. =|
| o o * |
| . . . oo .|
| . . o. oo |
| .o +o . |
+----[SHA256]-----+
Now, copy the contents of id_rsa_YOURHPC.pub over to the remote cluster, appending it to $HOME/.ssh/authorized_keys there.
If the file is not empty, make sure each key is on its own line.
Check that the key works by confirming that you can ssh into the remote cluster from your local terminal without a password:
ssh abc123@gadi.nci.org.au
If you can log in without a password prompt, the ssh key works and we can proceed.
Edit the queue.yaml file:
queue_type: REMOTE
queue_primary: work
ssh_host: gadi.nci.org.au
ssh_username: abc123
known_hosts: /root/.ssh/known_hosts
ssh_key: /root/.ssh/id_rsa_YOURHPC
ssh_remote_config_dir: /home/abc123/pyiron_resources/queues/
ssh_remote_path: /scratch/a01/abc123/pyiron/
ssh_local_path: /root/pyiron_remote_data/
ssh_continous_connection: True
queues:
work: {cores_max: 128, cores_min: 1, run_time_max: 1440, script: work.sh}
express: {cores_max: 128, cores_min: 1, run_time_max: 1440, script: express.sh}
Replace the following fields accordingly:
- queue_primary: The primary queue that you use. It must be present in the queues field at the bottom.
- ssh_host: The host address of your remote cluster. E.g. if you usually sign in with ssh abc123@gadi.nci.org.au, it is gadi.nci.org.au.
- ssh_username: The username that you usually sign in with. E.g. if you usually sign in with ssh abc123@gadi.nci.org.au, it is abc123.
- known_hosts: The file where your known hosts are stored locally. If you don’t know what this is, you most likely don’t need to change this field.
- ssh_key: The ssh key that you generated in the previous step.
- ssh_remote_config_dir: Path to where you have your queues configured on the remote cluster.
- ssh_remote_path: Path to where you want to run the calculations on the remote cluster.
- ssh_local_path: Local path where the calculation results fetched from the cluster are placed on your local machine.
- ssh_continous_connection: Whether to reuse a single SSH connection instead of opening a new one for each command (disable this if your connection is unreliable).
The entries underneath queues should read the same as in the queue.yaml file on the remote cluster, as previously configured.
At this point, the submission should work. Let’s test the submission of a small job. On the local machine, create a python script:
Warning
pyiron must be present in the environment that is active after you initialise a shell on the remote machine! If it is not, pyiron will fail to initialise the calculation!
To make the pyiron environment the default after shell initialisation, add the following line to your .bashrc:
source /software/abc123/mambaforge/bin/activate pyiron
Adjust the above path so that it activates a python environment containing pyiron.
from pyiron_atomistics import Project
import os
pr = Project("test_lammps")
job = pr.create_job(job_type=pr.job_type.Lammps, job_name='Al_T800K_remote')
basis = pr.create.structure.bulk('Al', cubic=True)
supercell_3x3x3 = basis.repeat([3, 3, 3])
job.structure = supercell_3x3x3
pot = job.list_potentials()[0]
print('Selected potential:', pot)
job.potential = pot
job.calc_md(temperature=800, pressure=0, n_ionic_steps=10000)
job.server.queue = "work"
job.server.cores = 2
job.server.memory_limit = 2
job.run(run_mode="queue", delete_existing_job=True)
Once the job is done on the queue, we can fetch the job back using:
pr = Project("test_lammps")
job_name = "Al_T800K_remote"
pr.wait_for_job(pr.load(job_specifier=job_name))
And then verify that the fetched job has results associated with it:
job = pr.load(job_name)
print(job["output/generic/energy_tot"])
If a list of numbers prints out in the output, the calculation was successful!
For more complex setups, such as those involving multiple remote clusters and one host machine, please see below.
Detailed instructions
The recommended way to install pyiron is via the conda package manager in a Linux environment. If you are using Windows, we recommend installing the Windows Subsystem for Linux before you install pyiron; if you are on macOS, we recommend using a virtual machine such as VirtualBox. Native installations on both Windows and macOS are possible but are restricted to molecular dynamics calculations with interatomic potentials and do not support density functional theory (DFT) codes. We collaborate with the open-source community at conda-forge to provide not only the pyiron package via their community channel, but also executables for compatible simulation codes like GPAW, LAMMPS and S/PHI/nX and their parameter files, such as pseudopotentials and interatomic potentials. To get started you can install pyiron using:
conda install -c conda-forge pyiron
Optional Dependencies
All the optional dependencies can also be installed via conda directly to simplify the setup of your simulation environment.
NGLview (Atomistic Structure Visualisation)
In pyiron we use the NGLview package to visualise atomistic structures directly in the jupyter notebook. To enable this feature, install NGLview:
conda install -c conda-forge nglview
In case you prefer jupyter lab over jupyter notebooks, you can also install NGLview for jupyter lab. This requires a few additional dependencies:
conda install -c conda-forge nodejs nglview
jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
jupyter labextension install nglview-js-widgets
In addition to NGLview, the first line also installs nodejs, which is required to install your own jupyterlab plugins and rebuild jupyter lab. The following two lines install the jupyterlab extensions: first the jupyterlab manager, followed by the NGLview javascript widget. During the installation it is important to confirm that the NGLview version installed via conda matches the version of the NGLview javascript widget:
conda list nglview
jupyter labextension list
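Once the two versions match, you can check that the widget renders with any pyiron structure; a minimal sketch using the same structure-creation API as the examples later in this guide:
from pyiron_atomistics import Project

pr = Project("nglview_test")
structure = pr.create.structure.bulk('Al', cubic=True)
structure.plot3d()  # renders an interactive NGLview widget in the notebook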
Supported simulation packages (quantum engines)
The following packages are supported to work out-of-the-box with pyiron, but must be installed independently, either using conda or by manual compilation. Manually compiled executables can be as much as 2-3x faster than conda-installed ones, and are therefore strongly recommended for high performance computing (HPC) usage. We discuss how to link any “homemade” executables to your pyiron installation in the advanced section.
LAMMPS (Molecular Dynamics with Interatomic Potentials)
LAMMPS stands for Large-scale Atomic/Molecular Massively Parallel Simulator and it is one of the most popular open-source molecular dynamics simulation codes for simulating solid-state materials (metals, semiconductors). As part of the pyiron project we maintain the conda package for LAMMPS to simplify its installation.
# serial + parallel, for linux and mac systems
conda install -c conda-forge lammps
# only serial (no python bindings), for native windows
conda install -c conda-forge -c pyiron lammps
On the conda-forge channel we provide LAMMPS executables for both serial and parallel (MPI) execution as well as their respective python bindings. The LAMMPS version on the pyiron channel is for native Windows installations only and is limited to serial execution with no Python bindings. We therefore highly recommend using the Windows Subsystem for Linux rather than the native Windows installation.
S/PHI/nX (Density Functional Theory)
The S/PHI/nX DFT code is an open-source DFT code developed in close collaboration with the pyiron developers; it is therefore the recommended DFT code to be used with pyiron. The applications of S/PHI/nX range from constrained magnetic calculations to charged defects, which makes it suitable for ab initio thermodynamics and beyond. The S/PHI/nX DFT code is only officially supported for Linux, so we recommend the use of the Windows Subsystem for Linux (on Windows) or a virtual machine (on macOS).
conda install -c conda-forge sphinxdft
GPAW (Density Functional Theory)
pyiron also supports GPAW, an open-source realspace DFT simulation code which is popular because of its Python bindings, which allow accessing parameters of the DFT code at runtime. GPAW can be installed on Linux directly via conda:
conda install -c conda-forge gpaw
Additional simulation packages
SQSgenerator
The sqsgenerator is a command line tool written in Python/Cython for finding optimized special quasirandom structures (SQS). It is available as a separate conda package; once it is installed, pyiron can use it inside simulation protocols without any additional imports:
conda install -c conda-forge sqsgenerator
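For illustration, an SQS job can then be set up through the regular pyiron job interface; a minimal sketch, where the SQSJob job type and the mole_fractions input name are assumptions based on the pyiron_atomistics API:
from pyiron_atomistics import Project

pr = Project("sqs_demo")
job = pr.create_job(pr.job_type.SQSJob, "sqs_AlMg")  # job type name assumed
job.structure = pr.create.structure.bulk('Al', cubic=True).repeat(2)
job.input['mole_fractions'] = {'Al': 0.75, 'Mg': 0.25}  # input field name assumed
job.run()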
Advanced Configuration
While the conda-based installation is usually sufficient for workstation installations to get started with pyiron, it can be extended to support your own executables, include your own parameter files, support commercial codes like VASP, or improve database performance by switching from SQLite to PostgreSQL.
Custom Executables and Parameter Files
pyiron can either be configured using a configuration file named ~/.pyiron located in the user’s home directory or by specifying environment variables. The options are similar either way, so we start with the configuration file. If pyiron does not find a configuration file, it assumes the following default configuration:
[DEFAULT]
PROJECT_CHECK_ENABLED = False
FILE = ~/pyiron.db
RESOURCE_PATHS = ${CONDA_PREFIX}/share/pyiron
The first line [DEFAULT] defines the current configuration to overwrite the default configuration. The second line PROJECT_CHECK_ENABLED disables the project check, which enables pyiron to write to the whole file system. The third line defines the object index to be stored in an SQLite database file FILE, which is located in the home directory at ~/pyiron.db. It is important to copy the database if you change the configuration, otherwise existing calculations are lost. Finally, RESOURCE_PATHS provides the path to the parameter files. Inside pyiron you can check the current configuration using:
from pyiron_base import Settings
s = Settings()
s._configuration
Below, the individual options are explained one by one:
- The [DEFAULT] option defines the current ~/.pyiron configuration to overwrite the default configuration.
- The RESOURCE_PATHS option defines the resource path, a list of ;-separated paths where pyiron checks for resource files. A template of such a resource directory is available on Github and can be downloaded as an archive from the release page. We recommend creating a folder ~/pyiron/resources and storing the parameter files and links to the executables there. The links are basically shell scripts which can be modified to load modules. By default the conda path is added, so there is no need to add it manually.
- The PROJECT_PATHS option is similar to the resource path but for storing simulation protocols rather than parameter files. When the PROJECT_CHECK_ENABLED option is set to true, read and write access within pyiron is limited to the directories defined in PROJECT_PATHS (see the sketch after this list). Again, multiple directories can be separated by ;. An alternative but outdated name for this option is TOP_LEVEL_DIRS.
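As an illustration of the project check, a minimal sketch, assuming PROJECT_CHECK_ENABLED = true and PROJECT_PATHS = ~/pyiron/projects; the exact exception raised for a blocked path is an assumption:
from pyiron_base import Project

# Inside PROJECT_PATHS: creating a project here is allowed.
pr = Project("/home/abc123/pyiron/projects/demo")

# Outside PROJECT_PATHS: rejected by the project check.
try:
    Project("/tmp/outside")
except Exception as err:  # exception type assumed
    print("Blocked by the project check:", err)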
Besides the general variables in the ~/.pyiron
configuration, the other settings are used to define the database connection. More detailed examples about the configuration can be found below; for now we continue with the configuration of the database. pyiron can use a database to build an index of the HDF5 files on the file system which accelerates job analysis. By default pyiron uses an SQLite database for this index, but the database can also be disabled or a PostgreSQL database can be used to improve performance.
- By default the database is defined by the FILE option, which is equal to the DATABASE_FILE option and gives the path to the SQLite database file. As the SQLite database is a file-based database, it struggles with parallel access on a shared file system (common for HPC clusters).
- To address this limitation it is possible to disable the database on HPC clusters using the DISABLE_DATABASE option by setting it to true. This is commonly used when the calculations are only executed on the remote cluster but the analysis is done on a local workstation or a group server which supports an SQL-based database.
- The other database options, namely TYPE, HOST, NAME, USER, PASSWD and JOB_TABLE, define the connection details for a PostgreSQL database. Inside pyiron, sqlalchemy is used to support different SQL-based databases, so it is also possible to provide the sqlalchemy connection string directly as CONNECTION.
- Finally, some pyiron installations use a group management component which is currently in development. They might have additional options in their ~/.pyiron configuration to enable sharing calculations between different users: VIEWERUSER, VIEWERPASSWD and VIEWER_TABLE. As this is a development feature it is not yet fully documented. Basically these are the access details for the global database viewer, which can read the database entries of all users, making it possible to load jobs of other users.
In analogy to the ~/.pyiron
configuration file pyiron also supports using environment variables to configure the pyiron installation. The available environment variables are:
- The PYIRONCONFIG environment variable defines the location of the .pyiron configuration file.
- The PYIRONRESOURCEPATHS environment variable defines the RESOURCE_PATHS option.
- The PYIRONPROJECTPATHS environment variable defines the PROJECT_PATHS option.
- The PYIRONPROJECTCHECKENABLED environment variable defines the PROJECT_CHECK_ENABLED option.
- The PYIRONDISABLE environment variable defines the DISABLE_DATABASE option.
- The PYIRONSQLTYPE, PYIRONSQLFILE, PYIRONSQHOST, PYIRONSQLDATABASE, PYIRONUSER and PYIRONSQLUSERKEY environment variables define the SQL database connection and can also be summarized in the PYIRONSQLCONNECTIONSTRING environment variable.
- The PYIRONSQLVIEWTABLENAME, PYIRONSQLVIEWUSER and PYIRONSQLVIEWUSERKEY environment variables define the SQL viewer connection and can also be summarized in the PYIRONSQLVIEWCONNECTIONSTRING environment variable.
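For example, a single process can be pointed at an alternative configuration file; a minimal sketch (the file path is an example), noting that the variable must be set before pyiron is first imported:
import os

# Must happen before the first pyiron import in this Python process.
os.environ["PYIRONCONFIG"] = "/home/abc123/.pyiron_cluster"  # example path

from pyiron_base import Settings
print(Settings()._configuration)  # confirm which settings were picked up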
To further explain the usage of the different parameters, we discuss common use cases in the following:
Use your own Executable for LAMMPS, S/PHI/nX or GPAW
To add your own executables or parameter files it is necessary to initialise a user-defined configuration ~/.pyiron
. You can start with a basic configuration like:
[DEFAULT]
FILE = ~/pyiron.db
PROJECT_PATHS = ~/pyiron/projects
RESOURCE_PATHS = ~/pyiron/resources
In this case pyiron can only execute calculations in the ~/pyiron/projects directory, and pyiron cannot delete files outside this directory. Next to the projects directory ~/pyiron/projects we create a resource directory ~/pyiron/resources to store links to the executables and the corresponding parameter files. Both directories have to be created by the user; if no FILE option is defined, pyiron by default creates an SQLite database in the resource directory. Example resource directories are available on Github. Here we just discuss the LAMMPS resource directory as one example.
resources/
lammps/
bin/
run_lammps_2020.03.03.sh
run_lammps_2020.03.03_mpi.sh
potentials/
potentials_lammps.csv
The resource directory contains two subfolders: bin, which includes links to the executables, and potentials, which includes links to the interatomic potentials. The links to the executables are shell scripts which follow the naming convention run_<code name>_<version>(_<tag>).sh, where the mpi tag indicates MPI-enabled executables. If we take a look at the run_lammps_2020.03.03_mpi.sh shell script, it contains the following lines:
#!/bin/bash
mpiexec -n $1 --oversubscribe lmp_mpi -in control.inp;
If you are running on a cluster with a module system, it may be a good idea to configure a clean environment in which your job can run, e.g.:
#!/bin/bash
module purge
module load lammps/29Oct20
mpiexec -n $1 --oversubscribe lmp_mpi -in control.inp;
Scripts with the mpi tag are called with two parameters: the first is the number of cores, the second the number of threads; regular shell scripts do not get any input parameters. By using shell scripts it is easy to link existing executables which might require loading specific modules or setting environment variables. In the same way, the parameter files for pyiron are stored in CSV format, which keeps them human-editable. For shared installations we recommend storing the pyiron resources in a shared directory.
Configure VASP
The Vienna Ab initio Simulation Package is a popular commercial DFT code which is commonly used for large DFT calculations or high-throughput studies. pyiron implements a VASP wrapper but does not provide a VASP license. Therefore users have to compile their own VASP executable and provide their own VASP pseudopotentials (included with the VASP license). An example configuration for VASP in pyiron is available on Github:
resources/
vasp/
bin/
run_vasp_5.4.4_default.sh
run_vasp_5.4.4_default_mpi.sh
potentials/
potpaw/
potpaw_PBE/
potentials_vasp.csv
potentials_vasp_lda_default.csv
potentials_vasp_pbe_default.csv
Similar to the LAMMPS resource directory discussed above, the VASP resource directory also contains a bin directory and a potentials directory. By adding the default tag we can set the default executable, which matters in particular when multiple variants of the same VASP version are compiled. Finally, the directories potpaw and potpaw_PBE contain the VASP pseudopotentials, which are included with the VASP license and have to be added by the user.
PostgreSQL Database
To improve the performance of the pyiron installation it is recommended to use a PostgreSQL database rather than the default SQLite database. To configure the database server, the following options can be added to the ~/.pyiron:
- TYPE: the type of the database; while sqlalchemy supports a wide range of different databases, PostgreSQL is recommended and can be selected by setting the type to Postgres.
- HOST: the database host where the database is running.
- NAME: the name of the database.
- USER: the database user. In contrast to many other software packages, pyiron requires one database user per system user who is using pyiron. The database only stores an index of the calculations executed with pyiron, so the knowledge gained from accessing the database is limited unless the user also has access to the file system.
- PASSWD: the database user password. While it is bad practice to store the database password in the configuration file, the database only contains the job index. Still, it is important that users create a pyiron-specific password and never store their system user password in the .pyiron configuration file.
- JOB_TABLE: the name of the database table. pyiron commonly uses one table per user.
A typical .pyiron configuration with a PostgreSQL database might look like this:
[DEFAULT]
TYPE = Postgres
HOST = database.hpc-cluster.university.edu
NAME = pyiron
USER = janj
PASSWD = **********
JOB_TABLE = jobs_janj
PROJECT_PATHS = ~/pyiron/projects
RESOURCE_PATHS = ~/pyiron/resources
Be careful when updating the database configuration as pyiron does not transfer the content of the database automatically.
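If existing calculations do need to move to a new database, one option is pyiron’s project archive round-trip; a minimal sketch, where the pack/unpack argument names are assumptions based on the pyiron_base API:
from pyiron_base import Project

# On the old installation: export a project tree to an archive.
pr_old = Project("old_calculations")
pr_old.pack(destination_path="calculations_archive")

# On the new installation: import the archive, which re-registers
# the contained jobs in the new database.
pr_new = Project("imported_calculations")
pr_new.unpack(origin_path="calculations_archive")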
Remote HPC Cluster
While the previous section discussed the installation of pyiron on a local workstation, the following section discusses how to configure a remote HPC cluster to transfer jobs to the HPC cluster for execution and back for analysis. For setting up pyiron on an HPC cluster there are basically three different configurations available:
Install pyiron on the HPC cluster, with jupyterhub running as a central service on the login node using the sudospawner to authorize users. In this configuration the user only needs a web browser and all simulation results will remain on the HPC cluster. The limitation of this approach is that both the global PostgreSQL database as well as the jupyterhub have to be running on the cluster with the PostgreSQL database being accessible from all compute nodes.
The second configuration is running pyiron on the HPC without the jupyterhub or a database, and storing the simulation results on a group server. Servers in the research group are commonly less strictly governed, so installing the jupyterhub on the group server as well as the PostgreSQL database for faster data analysis should be possible in most cases. From the user perspective the setup still only requires a web browser on the user’s end device, and leaves the task of backing up the simulation data with the group server rather than the end user.
Finally, the third configuration is the workstation installation, with a PostgreSQL database or even just a file-based SQLite database, using the HPC cluster only to execute the calculations and copying the simulation results back to the local workstation after every calculation.
We start by explaining the first configuration and then build on top of this setup to add the remote transfer capabilities.
HPC Cluster with PostgreSQL Database and Jupyterhub
The ~/.pyiron is structured just like a workstation installation with a PostgreSQL database as explained above. In addition to the previous resource directories we add another subfolder in the resource directory to configure the queuing system, using pysqa as the queuing system adapter. pysqa is based on the idea of using shell-script-based templates to configure the different queues, as modern queuing systems provide a wide range of settings but most users commonly submit their jobs with very similar settings. We discuss a sample configuration for SLURM; sample configurations for other queuing systems are available on Github.
resources/
queues/
queue_1.sh
queue_2.sh
queue.yaml
The queues directory contains one queue.yaml configuration file and multiple jinja-based shell script templates for submitting jobs. These templates define a commonly used set of parameters for submitting calculations; a template can contain a restriction to a specific queue or partition, but it does not have to. A typical queue template that might be used in queue_1.sh and queue_2.sh is shown below:
#!/bin/bash
#SBATCH --output=time.out
#SBATCH --job-name={{job_name}}
#SBATCH --workdir={{working_directory}}
#SBATCH --get-user-env=L
#SBATCH --partition=slurm
{%- if run_time_max %}
#SBATCH --time={{run_time_max // 60}}
{%- endif %}
{%- if memory_max %}
#SBATCH --mem={{memory_max}}
{%- endif %}
#SBATCH --cpus-per-task={{cores}}
{{command}}
Such a template contains the variables {{job_name}}
which is used to identify the job on the queuing system. Typically, pyiron job names are constructed using the prefix pi
followed by the pyiron job id. This allows pyiron to match the job on the queuing system with the job table. The second option is the {{working_directory}}
which is the directory where the job is located and the simulation code is executed. For pyiron this is typically a subdirectory of the simulation protocol, to simplify identifying broken calculations on the filesystem. The third option is the run_time
which specifies the run time in seconds, followed by the memory_max
which specifies the memory requirement of a given calculation. Both parameters are optional. Finally the cores
defines the number of CPU cores used for a calculation and the command
parameter is set by pyiron to load a pyiron object during the execution. When a pyiron job is executed on a compute node, a python process is first called to reload the pyiron object and then the pyiron object calls the shell script just like a regular job executed on the login node. By initially calling a python process, pyiron is able to track the progress of the calculation.
Besides the queue templates, the queues directory also contains the queue configuration queue.yaml
:
queue_type: SLURM
queue_primary: queue_one
queues:
queue_one: {cores_max: 40, cores_min: 1, run_time_max: 3600, script: queue_1.sh}
queue_two: {cores_max: 1200, cores_min: 40, run_time_max: 345600, script: queue_2.sh}
The queue configuration defines the limits of the individual queues which helps the user to select the appropriate queue for their simulation. The queue_type
defines the type of the queuing system, the queue_primary
defines the primary queue and finally queues
defines the available queues. Typically each queue is associated with a shell script template, like in this case queue_one
is associated with queue_1.sh
and queue_two
is associated with queue_2.sh
. Additional queue configuration templates are available on Github.
Submit to Remote HPC
Submitting calculations to a remote HPC requires some light configuration. On the HPC, disable the database in the .pyiron
with the following lines:
[DEFAULT]
DISABLE_DATABASE = True
PROJECT_PATHS = ~/pyiron/projects
RESOURCE_PATHS = ~/pyiron/resources
Then configure the remote HPC just like a regular HPC, by adding the queuing system configuration as described above. It is recommended to test the submission on the remote HPC before configuring the data transfer.
On the system that will be used to submit calculations to the remote HPC (e.g. your laptop or an in-between login machine), create the queues directory in the resource path, containing only the queue configuration:
resources/
queues/
queue.yaml
This queue configuration now includes additional options to handle the SSH connection to the remote cluster:
queue_type: REMOTE
queue_primary: queue_one
ssh_host: hpc-cluster.university.edu
ssh_username: janj
known_hosts: ~/.ssh/known_hosts
ssh_key: ~/.ssh/id_rsa
ssh_remote_config_dir: /u/share/pyiron/resources/queues/
ssh_remote_path: /u/janj/remote/
ssh_local_path: /home/jan/pyiron/projects/
ssh_continous_connection: True
queues:
queue_one: {cores_max: 40, cores_min: 1, run_time_max: 3600}
queue_two: {cores_max: 1200, cores_min: 40, run_time_max: 345600}
The ssh_host defines the name of the login node, ssh_username the user on the remote machine, and known_hosts and ssh_key the local configuration files used to connect to the remote host. Currently pyiron only supports ssh-key-based authentication for remote calculations. By setting ssh_continous_connection, the same connection is reused for data transfers, which is commonly more efficient than creating individual connections for each command. Still, this assumes that the connection between the workstation or group server and the remote HPC cluster is stable. If this is not the case - for example, when using a mobile connection - it is recommended to disable this option. The ssh_remote_config_dir defines the location of the queuing system configuration on the remote cluster. Finally, the calculations are copied from the local directory ssh_local_path to the remote directory ssh_remote_path. In the above example, if a calculation is submitted in the directory /home/jan/pyiron/projects/first/subproject then the files are copied to /u/janj/remote/first/subproject. By retaining the relative path when transferring the files, it is easier to debug failed calculations. The queues are defined locally to have quick access to the queue configurations, but it is not necessary to define the submission templates, as those are available on the remote machine. The other resources, however, have to be identical on both systems; the easiest way to achieve this is to copy the resource directory once the installation is working on the remote machine.
Submit to multiple Remote HPC Clusters
Finally, pyiron also supports configuring multiple HPC clusters. In this case, rather than creating a queue.yaml file in the queues resource directory, we create a clusters.yaml file with the following content:
cluster_primary: cluster_one
cluster:
cluster_one: cluster_1.yaml
cluster_two: cluster_2.yaml
The cluster_primary defines the default cluster, and each cluster is defined in its own cluster_*.yaml file. These cluster_*.yaml files have the same structure as the queue.yaml file discussed above, but they cannot be named queue.yaml, as pyiron otherwise assumes that only one cluster is available.
Alternative Installation Options
So far we discussed the installation of pyiron on an individual workstation via conda or on an HPC cluster. In the following we focus on developer-specific setups to install pyiron directly from its source. It is recommended to start with a conda installation and then replace only the pyiron version, so that conda can still automatically manage all dependencies and environment settings for you. In case this is not possible, e.g. if conda is not allowed on your HPC cluster, pyiron can be installed directly from the source code.
Install from Source
For development, it is recommended to first create a conda environment containing all of pyiron’s dependencies. The dependencies are available in pyiron’s environment.yml file.
If conda is not available on your machine, the next best thing would be to install pyiron and its dependencies via pip.
Using pip
The default installation via pip installs the latest release version of pyiron. So in case your HPC cluster does not support installing pyiron via conda you can install this release version via pip and then continue with the setup of your remote HPC cluster as described above.
pip install pyiron
For those who want to test the nightly releases of pyiron which include the latest status of the master branch you can install those via pip as well:
pip install --pre pyiron
Using git
To get the latest pyiron version and access changes on development branches, pyiron can also be installed via git. For example, you can download the pyiron source code to ~/pyiron/software
using:
git clone https://github.com/pyiron/pyiron.git ~/pyiron/software
Based on the previous workstation setup your ~/pyiron
directory should contain the following folders:
pyiron/
projects/
resources/
software/
To include this version in your PYTHONPATH
add the following line to your ~/.profile
or ~/.bashrc
configuration:
export PYTHONPATH=${HOME}/pyiron/software/:${PYTHONPATH}
When you import pyiron in any python shell or jupyter notebook it should load the version from ~/pyiron/software. Finally you can switch to other branches using git:
git checkout main
In this case we switch to the main branch.
Download pyiron Parameter Files
For source code based installations it is also possible to download the pyiron resources directly from within pyiron. Simply open a python shell and import pyiron:
>>> import pyiron
>>> pyiron.install()
It appears that pyiron is not yet configured, do you want to create a default start configuration (recommended: yes). [yes/no]:
yes
>>> exit()
This command does the following steps in the background:
- Create a ~/.pyiron config file with the default settings (for simple installations).
- Create a ~/pyiron/projects directory; pyiron can only execute calculations within this project directory to prevent any interference with other tools or simulation management solutions.
- Create a ~/pyiron/resources directory; this directory includes the links to the executables and potentials, sorted by code.
Demonstration and Training Environments
For workshops, tutorials, and lectures it is sometimes necessary to set up multiple computers with very similar configurations, and, depending on the conference location, internet access might be limited. For these cases pyiron provides setup instructions for demonstration and training environments.
Cloud Solutions
You can test pyiron on Mybinder.org (beta) without the need for a local installation. It is a flexible way to get a first impression of pyiron, but it does not provide any permanent storage by default. Loading the pyiron environment on mybinder can take 5 to 15 minutes in case a new docker container needs to be built. Mybinder is a free service, so sessions on its servers are limited in duration and memory, and their stability is not guaranteed. We recommend having a backup plan when using mybinder for presentations or interactive tutorials, since the mybinder instance might be shut down if it is idle for too long.
Docker Container
For demonstration purposes we provide Docker containers on Dockerhub; these can be downloaded and executed locally once docker is installed. Again, these container images do not provide any permanent storage, so all information is lost once the docker container is shut down. To download the docker container use:
docker pull pyiron/pyiron:latest
After downloading the docker container you can use it either with jupyter notebook:
docker run -i -t -p 8888:8888 pyiron/pyiron /bin/bash -c "source /opt/conda/bin/activate; jupyter notebook --notebook-dir=/home/pyiron/ --ip='*' --port=8888"
or with jupyter lab:
docker run -i -t -p 8888:8888 pyiron/pyiron /bin/bash -c "source /opt/conda/bin/activate; jupyter lab --notebook-dir=/home/pyiron/ --ip='*' --port=8888"
After the run command, the following line is displayed. Copy/paste this URL into your browser when you connect for the first time, to log in with a token:
http://localhost:8888/?token=<your_token>
Open the link with your personal jupyter token <your_token>
in the browser of your choice. Just like the Binder image, the Docker image comes with several pyiron examples preinstalled.
Install Utility
To set up a local lab with pyiron when the internet connection is limited, we provide a classical installer for Windows, macOS and Linux based on the conda constructor. If you do not have anaconda installed, you can download this installer and get started with just a single download.
Getting Started
Finally, once you have installed pyiron, you can quickly test your installation with the following minimalistic example. Many more examples are available in the Github repository.
First Calculation
After the successful configuration you can start your first pyiron calculation. Navigate to the projects directory and start a jupyter notebook or jupyter lab session:
cd ~/pyiron/projects
jupyter notebook
or
cd ~/pyiron/projects
jupyter lab
Open a new jupyter notebook and inside the notebook you can now validate your pyiron calculation by creating a test project, setting up an initial structure of bcc Fe, and visualising it using NGLview.
from pyiron import Project
pr = Project('test')
basis = pr.create_structure('Fe', 'bcc', 2.78)
basis.plot3d()
Finally, a first LAMMPS calculation can be executed by:
ham = pr.create_job(pr.job_type.Lammps, 'lammpstestjob')
ham.structure = basis
ham.potential = ham.list_potentials()[0]
ham.run()
Next Steps
To get a better overview of all the available functionality inside pyiron we recommend the examples provided in the examples section - Tutorials.