Generating an Input File

The first step for any kinetic model is to properly generate an input file. Input files are processed by a “parser” class of CatMAP. The job of the parser is to convert some standard input into a Python data structure compatible with the kinetic model. The default parser is the TableParser which has been designed to accept inputs in a tabular format. This tutorial is designed to explain how users can create their own input files for any model which uses the TableParser (currently the only “parser” class implemented). If users are interested in creating inputs in some other format then it is also possible to design custom “parser” classes, but this is advanced and will not be discussed further here.

The tutorial is broken into two parts: an overview of input file structure and an example of how to create an input file for methane synthesis on \({\rm{Rh}}(111)\) from DFT.

Input File Overview

File Structure

The TableParser accepts inputs in a tab-separated text file. An example of the header and first few lines are provided below:

surface_name    site_name   species_name  formation_energy    bulk_structure  frequencies reference
None            gas         CH4             0                   None            []          By Definition
None            gas         H2O             0                   None            []          By Definition
None            gas         H2              0                   None            [4401]      By Definition
None            gas         CO              2.74                None            [2170]      Energy Environ. Sci., 3, 1311-1315 (2010)
Pt              211         CO              1.113               fcc             []          J. Phys. Chem. C, 113 (24), 10548-10553 (2009)

(Note that spaces instead of tab characters have been used here for readability. A functional input file should have fields separated by 1 tab character).

The first line provides the titles for all columns. Note that column order does not matter but header names do matter. In other words, you could put the “site_name” column before the “surface_name” column, but you could not call the “site_name” column “site_labels”. The following column titles are required for a functional input file:

  • surface_name
  • site_name
  • species_name
  • formation_energy
  • frequencies
  • reference

Any number of other input columns may be added but they will be ignored by default (like the “bulk_structure” column in this example). Details on how to parse in additional data will be covered in a future tutorial.

The text input files can be generated by using LibreOffice, Excel, or other spreadsheet programs by creating a spreadsheet with the appropriate columns and exporting as a tab separated value. Alternatively they can fairly easily be generated within Python or other programming languages.

Field Values

A brief explanation of the inputs for each field is provided below. Some references are made to the ReactionModel attributes which are used to decide which fields are parsed in. If you have not yet read the other documentation some of this might not make sense at first, but after working through other examples it should be more clear.

surface_name

This is the name of the surface to which a species is adsorbed. The names are arbitrary, but only names which appear in the “surface_names” attribute of the ReactionModel will be parsed in. The only special value is “None” which will be parsed in regardless of whether or not it appears in the “surface_names” attribute. This is typically used for gas-phase species.

site_name

These names are used to distinguish different site types. They can correspond to a facet (“111” or “211”) or be more specific (“hcp”, “fcc” or “top”). Gas-phase species should have the site defined as “gas”. Site names which appear in the { {species_definitions[site][‘site_names’]}} list will be parsed in, where “site” is the designation of any site in the model.

species_name

The names of the adsorbates, transition-states, and gasses are defined here. In principle an arbitrary string can be used (e.g. “methoxy”) but it is often practical to use a chemical formula (e.g. CH3O) since the composition of such strings can be automatically determined (the ASE function ase.atoms.strings2symbols is used). If the name cannot be parsed by ase.atoms.strings2symbols then the atomic composition must be manually specified in the {species_definitions[species_name][‘composition’]} dictionary. For example:

species_definitions['methoxy']['composition'] = {'C':1,'H':3,'O':1}

Only species which appear in the “species_names”, “gas_names”, and “transition_state_names” attributes of the ReactionModel will be parsed. These attributes are typically specified automatically by the “rxn_expressions” strings, so if, for some reason, you want to parse in a species which is not in the reaction network you can do so by adding a “dummy” reaction like “CO_g -> CO_g” to have the model parse in the energetics of gas-phase CO.

formation_energy

This is the core of the input file since it defines the energetics of the system. It should be the “generalized formation energy” (see Formation Energy Approach) of the “species_name” on the “surface_name” and “site_name”. These energies are usually very hard to come by, and must be computed by an electronic structure method such as DFT, or in some cases they can be measured experimentally. It is extremely important that all energies share a common thermodynamic reservoir for each atomic constituent (see Formation Energy Approach).

frequencies

This is a list of the vibrational frequencies of the “species_name” on the “surface_name” at the “site_name”. Although this field is required, it is possible to input an empty list “[]” if the vibrational frequencies are not known. The vibrational frequencies are used to compute the zero-point and free energy corrections for gas phase and adsorbed species. By default the units are assumed to be “wavenumbers” or “cm^-1”, but this can be changed by editing the “frequency_unit_conversion” variable (1.239842e-4 by default) so that input_frequency*frequency_unit_conversion = input_frequency [eV]. Gas-phase vibrational frequencies can be found in NIST (be careful since redundant frequencies are listed only once) and some are compiled in the catmap.data.experimental_gas_frequencies dictionary. Vibrational frequencies of adsorbed species can be costly to compute, and hence a few approximations are sometimes employed. These approximations are controlled by the “estimate_frequencies” attribute of the TableParser. The values, in order of increasing accuracy, are:

  • estimate_frequencies >3: Use empty frequency set for species without any frequencies specified.
  • estimate_frequencies >= 3: Use frequencies of atomic species (e.g. \(\nu_{CH_4}\) = \(\nu_C\) + \(4*\nu_H\) where \(\nu_X\) is a Python list of the vibrational species of species X adsorbed)
  • estimate_frequencies >= 2: Estimate frequency of transition-states from the dissociated state frequency (e.g. \(\nu_{C-O}\) = \(\nu_C\) + \(\nu_O\))
  • estimate_frequencies >= 1: Estimate frequency of adsorbed state at one site using frequency from other sites (e.g. \(\nu_{CO(111)}\) = \(\nu_{CO(211)}\) )
  • estimate_frequencies = 0: Only accept frequencies from the exact adsorbate on the correct site. However, a single set of frequencies will still be used for all surfaces. If the attribute “frequency_surface_names” is defined then an average of the frequencies from the surface(s) in this list will be used. Otherwise an average of all available frequencies for each adsorbate will be used. For example, to use only Cu vibrational frequencies set {{frequency_surface_names = [‘Cu’]}}, or to average Cu and Pt vibrational frequencies use {{frequency_surface_names = [‘Cu’, ‘Pt’]}}. Allowing frequencies to vary with site would require a way of estimating frequency as a function of descriptors and is not currently implemented.

reference

This is an arbitrary string which notes the source of the information. Usually a publication/citation is provided for previously computed work, or for your own input you could use “Unpublished”, “This work”, “DFT/GPAW/RPBE”, etc. This is used when generating a summary file for the model, and it is always good practice to note the source of inputs.

Formation Energy Approach

One key point for generating input files is that the energies are computed as a “generalized formation energy” relative to a common reference:

\(E_i = U_i - \sum_j (n_j R_j)\)

where \(E_i\) is the “generalized formation energy” of species \(i\), \(U_i\) is the raw/DFT energy of species \(i\), \(nj\) is the number of atomic species \(j\) in \(i\), and \(\left|R_j\right|\) is the reference energy of that atomic species. Mathematically this looks a little confusing (especially with such crude notation) but in practice it is pretty easy. For example, say we want to find the energy of gas-phase CO relative to carbon (C) in methane (\({\rm{CH}}_4\)), oxygen (O) in \({\rm{H}}_2{\rm{O}}\), and hydrogen (H) in molecular hydrogen (\({\rm{H}}_2\)). We first compute the reference energies (\(\left|R_j\right|\)) for each atomic species:

\[\begin{split}R_{\rm{H}} &= 0.5(U_{\rm{H}_2}) \\ R_{\rm{C}} &= U_{\rm{CH_4}} - 4R_{\rm{H}} \\ R_{\rm{O}} &= U_{\rm{H_2O}} - 2R_{\rm{H}} \\\end{split}\]

(where again U is a “raw” energy from an ab-initio calculation, or a “regular” formation energy from NIST).

Now we can compute the “generalized formation energy” of CO as:

\(E_{\rm{CO}} = U_{\rm{CO}} - R_{\rm{C}} - R_{\rm{O}}\)

In the case where CO is adsorbed to a surface, say Pt(211), we can compute a “generalized” formation energy relative to the clean surface:

\(E_{{\rm{CO}}*@{\rm{Pt}}(211)} = U_{{\rm{Pt}}(211)+{\rm{CO}}*} - U_{{\rm{Pt}}(211)} - R_{\rm{C}} - R_{\rm{O}}\)

One nice thing about the formation energy approach is that it does not distinguish between thermodynamic minima (adsorbed states) and saddle points (transition-states). Thus, it is possible to compute a formation energy of the \({\rm{C-O}}\) dissociation transition-state on \({\rm{Pt}}(211)\) as:

\(E_{{\rm{C-O}}@{\rm{Pt}}(211)} = U_{{\rm{Pt}}(211)+{\rm{C-O}}} - U_{{\rm{Pt}}(211)} - R_{\rm{C}} - R_{\rm{O}}\)

Then one could compute the barrier for \({\rm{C-O}}\) dissociation as:

\(E_{{\rm{C-O}}@{\rm{Pt}}(211)} - E_{{\rm{CO}}*@{\rm{Pt}}(211)}\)

If this still doesn’t make sense, try working through the example below.

In principle the choice of reference species is arbitrary since the reference energies \(|R_j|\) cancel out in any relative quantities. However, in many cases it is necessary to use some correction scheme for the gas-phase energies if they are poorly described by the level of theory used (e.g. DFT). In this case it is best to select a reference set for which the reference species are well-described by the level of theory. For example, it is well-known that \({\rm{O}}_2\) and \({\rm{CO}}_2\) are not properly described by DFT, so it would not make sense to use these to compute the reference energies \(|R_j|\).

It is also worth re-iterating that the same reference energies \(|R_j|\) must be used for all energies in a given input file. One can usually see which gas-phase species are used as references since their formation energies will be 0 by definition (see overview).

Example

In this example we will generate an input file for methane synthesis from \({\rm{CO}}\) and \({\rm{H}}_2\) (methanation) on Rh(111) using some previously computed DFT values and a Python script. You can copy-paste the code as you go along, or find the whole script at GitHub.

Take the simplified methanation reaction mechanism:

  • \({\rm{CO}}_{\rm{gas}} + * \rightarrow {\rm{CO}}*\)
  • \({\rm{CO}}* + * \rightarrow {\rm{C}}* + {\rm{O}}*\)
  • \({\rm{O}}* + {\rm{H}}* \leftrightarrow {\rm{OH}}*\) (quasi-equilibrated)
  • \({\rm{OH}}* + {\rm{H}}* \rightarrow {\rm{H}}_2{\rm{O}}_{\rm{gas}} + 2*\)
  • \({\rm{C}}* + {\rm{H}}* \rightarrow {\rm{CH}}* + *\)
  • \({\rm{CH}}* + {\rm{H}}* \leftrightarrow {\rm{CH}}_2* + *\) (quasi-equilibrated)
  • \({\rm{CH}}_2* + {\rm{H}}* \leftrightarrow {\rm{CH}}_3* + *\) (quasi-equilibrated)
  • \({\rm{CH}}_3* + {\rm{H}}* \leftrightarrow {\rm{CH}}_{4,{\rm{gas}}} + 2*\) (quasi-equilibrated)

Where * denotes a Rh(111) site. For this example we need energies of the following species:

  • \({\rm{CO}}\) (gas)
  • \({\rm{H}}_2\) (gas)
  • \({\rm{CH}}_4\) (gas)
  • \({\rm{H}}_2{\rm{O}}\) (gas)
  • \({\rm{CO}}\) (adsorbed)
  • \({\rm{O}}\) (adsorbed)
  • \({\rm{C}}\) (adsorbed)
  • \({\rm{H}}\) (adsorbed)
  • \({\rm{CH}}\) (adsorbed)
  • \({\rm{OH}}\) (adsorbed)
  • \({\rm{CH}}_2\) (adsorbed)
  • \({\rm{CH}}_3\) (adsorbed)
  • \({\rm{C}}-{\rm{O}}\) (transition-state)
  • \({\rm{H}}-{\rm{OH}}\) (transition-state)
  • \({\rm{H}}-{\rm{C}}\) (transition-state)
  • (111 slab)

Let’s assume that we have computed the energies of these species on a Rh(111) surface using some ab-initio method and stored them in a Python dictionary:

abinitio_energies = {
         'CO_gas': -626.611970497,
         'H2_gas': -32.9625308725,
         'CH4_gas': -231.60983421,
         'H2O_gas': -496.411394229,
         'CO_111': -115390.445596,
         'C_111': -114926.212205,
         'O_111': -115225.106527,
         'H_111': -114779.038569,
         'CH_111': -114943.455431,
         'OH_111': -115241.861661,
         'CH2_111': -114959.776961,
         'CH3_111': -114976.7397,
         'C-O_111': -115386.76440668429,
         'H-OH_111': -115257.78796158083,
         'H-C_111': -114942.25042955727,
         'slab_111': -114762.254842,
         }

(in this case the energies were generated by Quantum Espresso)

Next, we need to decide on a choice of reference molecules. One simple option for this system is to take hydrogen relative to \({\rm{H}}_2\), carbon relative to \({\rm{CH}}_4\), and water relative to \({\rm{H}}_2{\rm{O}}\). We will take all adsorption energies relative to the clean (111) \({\rm{Rh}}\) slab.

ref_dict = {}
ref_dict['H'] = 0.5*abinitio_energies['H2_gas']
ref_dict['O'] = abinitio_energies['H2O_gas'] - 2*ref_dict['H']
ref_dict['C'] = abinitio_energies['CH4_gas'] - 4*ref_dict['H']
ref_dict['111'] = abinitio_energies['slab_111']

Now we can write a function to convert these “raw” energies to “reference” energies. Note that we use the function ase.atoms.string2symbols as a convenient way to get the composition from the chemical formula.

from ase.atoms import string2symbols

def get_formation_energies(energy_dict,ref_dict):
    formation_energies = {}
    for key in energy_dict.keys(): #iterate through keys
        E0 = energy_dict[key] #raw energy
        name,site = key.split('_') #split key into name/site
        if 'slab' not in name: #do not include empty site energy (0)
            if site == '111':
                E0 -= ref_dict[site] #subtract slab energy if adsorbed
            #remove - from transition-states
            formula = name.replace('-','')
            #get the composition as a list of atomic species
            composition = string2symbols(formula)
            #for each atomic species, subtract off the reference energy
            for atom in composition:
                E0 -= ref_dict[atom]
            #round to 3 decimals since this is the accuracy of DFT
            E0 = round(E0,3)
            formation_energies[key] = E0
    return formation_energies

We can check that the formation energies are reasonable (i.e. of order 1 eV):

formation_energies = get_formation_energies(abinitio_energies,ref_dict)

for key in formation_energies:
    print key, formation_energies[key]

>>
>> OH_111 0.323
>> H_111 -0.302
>> C_111 1.727
>> H2O_gas 0.0
>> CH_111 0.965
>> CO_111 0.943
>> H2_gas 0.0
>> C-O_111 4.624
>> CO_gas 2.522
>> O_111 0.597
>> CH3_111 0.644
>> CH4_gas 0.0
>> CH2_111 1.125
>> H-OH_111 0.878
>> H-C_111 2.17
>>

This looks pretty good. The energies of our reference species (\({\rm{H}}_{2,\rm{gas}}\), \({\rm{CH}}_{4,\rm{gas}}\), and \({\rm{H}}2{\rm{O}}_{\rm{gas}}\)) are all 0 as expected, and all the numbers are of order 1. Usually if something goes wrong then the numbers will be similar to the raw DFT numbers (i.e. > 100 eV). We can also compute the CO dissociation barrier as \({\rm{E}}_{\rm{C-O}} - E_{\rm{CO}} = 3.68\,{\rm{eV}}\). This is pretty high, but the surface is a close-packed (111) facet so this is not too surprising.

Before making an input file we will want to get some vibrational frequencies. Again, lets just assume that these have previously been calculated by DFT and are stored in a Python dictionary as:

frequency_dict = {
                'CO_gas': [2170],
                'H2_gas': [4401],
                'CH4_gas':[2917,1534,1534,3019,3019,3019,1306,
                           1306,1306],
                'H2O_gas': [3657, 1595, 3756],
                'CO_111': [60.8, 230.9, 256.0, 302.9, 469.9, 1747.3],
                'C_111': [464.9, 490.0, 535.9],
                'O_111': [359.5, 393.3, 507.0],
                'H_111': [462.8, 715.9, 982.5],
                'CH_111': [413.3, 437.5, 487.6, 709.6, 735.1, 3045.0],
                'OH_111': [55, 340.9, 396.1, 670.3, 718.0, 3681.7],
                'CH2_111': [55, 305.5, 381.3, 468.0, 663.4, 790.2, 1356.1,
                            2737.7, 3003.9],
                'CH3_111': [55, 113.5, 167.4, 621.8, 686.0, 702.5, 1381.3,
                            1417.5, 1575.8, 3026.6, 3093.2, 3098.9],
                'C-O_111': [],
                'H-OH_111': [],
                'H-C_111': []
                }

Now we just need a function which will put everything together into a tab-separated table with the appropriate headers. The following Python function will do this for us:

def make_input_file(file_name,energy_dict,frequency_dict):

    #create a header
    header = '\t'.join(['surface_name','site_name',
                        'species_name','formation_energy',
                        'frequencies','reference'])

    lines = [] #list of lines in the output
    for key in energy_dict.keys(): #iterate through keys
        E = energy_dict[key] #raw energy
        name,site = key.split('_') #split key into name/site
        if 'slab' not in name: #do not include empty site energy (0)
            frequency = frequency_dict[key]
            if site == 'gas':
                surface = None
            else:
                surface = 'Rh'
            outline = [surface,site,name,E,frequency,'Input File Tutorial.']
            line = '\t'.join([str(w) for w in outline])
            lines.append(line)

    lines.sort() #The file is easier to read if sorted (optional)
    lines = [header] + lines #add header to top
    input_file = '\n'.join(lines) #Join the lines with a line break

    input = open(file_name,'w') #open the file name in write mode
    input.write(input_file) #write the text
    input.close() #close the file

    print 'Successfully created input file'

Now use this function to create the text file - in this case we call it “energies.txt”:

file_name = 'energies.txt'
make_input_file(file_name,formation_energies,frequency_dict)

>> Successfully created input file

You can view the input in a human-readable format by opening energies.txt:

surface_name    site_name   species_name    formation_energy    frequencies reference
None    gas CH4 0.0 [2917, 1534, 1534, 3019, 3019, 3019, 1306, 1306, 1306]  Input File Tutorial.
None    gas CO  2.522   [2170]  Input File Tutorial.
None    gas H2  0.0 [4401]  Input File Tutorial.
None    gas H2O 0.0 [3657, 1595, 3756]  Input File Tutorial.
Rh  111 C   1.727   [464.9, 490.0, 535.9]   Input File Tutorial.
Rh  111 C-O 4.624   []  Input File Tutorial.
Rh  111 CH  0.965   [413.3, 437.5, 487.6, 709.6, 735.1, 3045.0] Input File Tutorial.
Rh  111 CH2 1.125   [55, 305.5, 381.3, 468.0, 663.4, 790.2, 1356.1, 2737.7, 3003.9] Input File Tutorial.
Rh  111 CH3 0.644   [55, 113.5, 167.4, 621.8, 686.0, 702.5, 1381.3, 1417.5, 1575.8, 3026.6, 3093.2, 3098.9] Input File Tutorial.
Rh  111 CO  0.943   [60.8, 230.9, 256.0, 302.9, 469.9, 1747.3]  Input File Tutorial.
Rh  111 H   -0.302  [462.8, 715.9, 982.5]   Input File Tutorial.
Rh  111 H-C 2.17    []  Input File Tutorial.
Rh  111 H-OH    0.878   []  Input File Tutorial.
Rh  111 O   0.597   [359.5, 393.3, 507.0]   Input File Tutorial.
Rh  111 OH  0.323   [55, 340.9, 396.1, 670.3, 718.0, 3681.7]    Input File Tutorial.

This particular example only creates input for a single surface, but it is fairly easy to see how one could construct a for-loop over several surfaces to create an input file with the energetics for multiple surfaces. Alternatively if you keep your data stored in a spreadsheet it should be possible to convert everything to a common reference and export the spreadsheet as tab-separated values (remember to get the header names right!).

In case we want to check that the input can be parsed correctly, we could create a “dummy” ReactionModel and ask it to parse everything in. Normally this won’t be necessary since you will have an actual ReactionModel that you want to use to test the parser (see the Creating a Microkinetic Model tutorial), but it is included here for reference.

#Test that input is parsed correctly
from catmap.model import ReactionModel
from catmap.parsers import TableParser
rxm = ReactionModel()
#The following lines are normally assigned by the setup_file
#and are thus not usually necessary.
rxm.surface_names = ['Rh']
rxm.adsorbate_names = ['CO','C','O','H','CH','OH','CH2','CH3']
rxm.transition_state_names = ['C-O','H-OH','H-C']
rxm.gas_names = ['CO_g','H2_g','CH4_g','H2O_g']
rxm.species_definitions = {'s':{'site_names':['111']}}
#Now we initialize a parser instance (also normally done by setup_file)
parser = TableParser(rxm)
parser.input_file = file_name
parser.parse()
#All structured data is stored in species_definitions; thus we can
#check that the parsing was successful by ensuring that all the
#data in the input file was collected in this dictionary.
for key in rxm.species_definitions:
    print key, rxm.species_definitions[key]

The output of this should contain all species in the model along with their energies, frequencies, etc.