Translate Route Into Tables

A simple two-step demo synthetic route that will be used in this documentation is shown below. The utilization of the reagents and solvents is being shown as molar equivalents and volumes (L/kg), respectively, which would have been determined in the lab. Fake prices ($/kg) of some compounds are provided as well. In essence, the costcalc2 algorigthm will use this information to calculate the unknown raw material cost (RMC, in $/kg) values for Intermediate A and Product. As part of this calculation, process mass intensity (PMI) will also be calculated as a measure of route waste. Note: All of the information in this figure is fake and is provided for pedagogical purposes only.

_images/br_route.png

Notes on Pricing

It is important to understand that the costcalc algorithm takes price information for reagents and solvents and calculates total raw material costs (RMCs) for the intermediates and products. In a simplified sense, these RMCs are the sum of the raw material prices used to synthesize the route intermediates/product. However, RMCs are only one component of the final price of a compound, in addition to labor, facilities charges, waste disposal, profit margin, etc. So the RMC for the final product is NOT the expected sales price and needs to be interpreted with that knowledge in mind.

Accurate raw material prices are a key component of the calculation, and they are not always trivial to obtain. Prices are highly dependent on the scale, region where purchase, date when prices were quoted, etc. As such, no price information (or material information of any kind) is provided in the costcalc module.

Tables Overview

The costcalc algorithm, including the web application interface, require the chemical synthetic route information above to be converted into two different input tables. A “Materials” table will contain all of the information about the raw materials, such as molecular weight, density, and price. A “Route” table will contain the route-specific information, such as equivalents of reagents, volumes of solvent, etc. These tables require some specific formatting, which will be outlined here. The web application uses an Excel file as the tabular input. A blank template is provided here, which contains the proper column names and tabs as well as well as some conditional formatting to help guide user input. A complete working Excel file containing this demo route information is also available.

Materials Table

The Materials table will be added to the Materials tab in the Excel file, and this must be populated first. (This is because of some data validation that is done to ensure correct inputs in the Route tabs.) For the web application, there should be only one Materials tab per Excel file and the name of the tab must remain unchanged. Although this table must contain all of the materials for the synthetic route to be costed, it can contain any arbitrary number of additional materials as well. This can be seen in the completed Excel file that can be downloaded above.

A snapshot of a completed Materials table for this demo route, along with some additional materials, is shown below. Detailed column descriptions are given in the next paragraph, but a few general notes are provide here. First, the order of the materials in this table is not important. In this demo, the compounds with known prices were separated from compounds without known prices for clarity. Second, blank lines and formatting are ignored, which can be helpful to make the input file a little easier to understand. In this case, a blank line between known/unknown compounds has been added, and clarifying comments have been added to the column name cells. However; be careful about adding inadvertent information into “blank” lines. For example, if information was added into cell E9 below, the costing calculation will fail with an error. Some of these notes do not apply to the Python programming interface to the costcalc algorithm, as will be discussed in later sections.

_images/materials.png

The Materials table contains the following columns, which must be present. Additional user-defined columns can be added, but will be ignored by the costcalc algorithm.

  • Compound: This is the desired name for the compounds in the route. These names must be unique, i.e. there cannot be two compounds with the same name. As stated above, at a minimum, all of the compounds for the route need to be defined; however, any number of additional compounds can be present. Starting a compound name with the number sign (#) will comment out that row, so it will be ignored when the table is processed. This can be useful if you want to maintain historical records of prices, for example.

  • MW: The molecular weight of the compound in kg/kmol (equivalent to g/mol). Every compound must have a MW, so if a compound name is given, the MW column will be highlighted automatically to indicate that this value must be provided. This value can be a little tricky for materials like celite, which may not have a defined MW. Some notes for corner cases like this is provided below.

  • Density: The density of the compound in kg/L (equivalent to g/mL). This value is only mandatory for compounds that will be used as solvents, as described in the Route table section, otherwise they can be left blank. If a density is added for a non-solvent reagent, it will be ignored.

  • $/kg: The price of the compound in $/kg. Most compounds will need to have a defined price with the following exception. If the compound represents a product of a reaction, then the price is optional. In these cases a price can be provided, if desired, but it will be ignored during the calculations.

  • Notes: (Optional) Notes for the compound information. For example, this is a good place to add the date and source of the pricing information and the name of the person who acquired that information.

Route Table

The route information is added into a separate tab in the Excel file with the required columns described in the list below. There can be one or more route tabs per Excel file, and the tab names can be arbitrarily chosen (except for the name “Materials” as described above). In the blank template file, the route tab is called “Route 1”; in the working demo file, the route tab has been renamed to “Bromine Route”. If a new route tab is desired, the column names are the most critical component that needs to be copied. However, the route tabs in the Excel files provided here do contain some conditional formatting and data validation to ensure that values are added correctly, so if using the provided template files, it is recommended to use the “Move or copy…” dialog (accessed by right-clicking on the tab name) to duplicate a current route tab.

A snapshot of the Route table for the demo route above is shown below. Although not shown, pictures (ChemDraw or regular images files) of the route can be added to this tab, and they will be ignored by the costcalc algorithm.

_images/reactions.png

Descriptions of the Route columns are provided in the list below. As with the Materials table, arbitrary user-defined columns (with two exceptions: Mass and OPEX) or blank rows can be added; they will be ignored during the costing operation.

  • Step: An unique identifier to delineate the synthetic step in the route. These can be simply numerical numbers (e.g. 1, 2, 3) and/or text (“1a” or “Int A”). Steps do not need to be added into the table in any particular order, as they will be automatically sorted during the costing calculation. In fact, the compounds from every step could be added in arbitrary order; however, this is not recommended from a clarity standpoint. Starting a Step with the number sign (#) will comment out that row, so it will be ignored when the table is processed.

  • Compound: The name of a reagent/solvent/product for the step. These names must exactly correspond to the Materials table, so a drop-down selector is provided in the template files above to ensure that a valid name is selected. (This is why the Materials table should be created first.) The order of the compounds per Step is arbitrary, with two important exceptions.

    • The first compound must be the reference compound for the reaction, which is typically the limiting reagent. If two or more compounds are added in equal molar amounts, this is the compound for which the mass will be used for solvent volume calculations.

    • The last compound must be the reaction product. This ensures that the correct order of reactions can be determined.

  • Equiv: Molar equivalents of a reagent or product. Although this value can be used for solvents, it is more common to define solvent utilization with Volumes, as described in the next column. These values can be scaled as needed, but they are typically scaled such that the limiting reagent is 1 equivalent. For a product, the equivalents are the theoretical equivalents multiplied by the fractional percent yield. For example, in a reaction with a starting material to product ratio of 1:1 and a 75% yield of product, the equivalents of product would be \(1*0.75=0.75\). If 2 moles of product are expected (e.g. breaking up a dimer) with the same reaction yield, the equivalents of product would be \(2*0.75=1.5\).

  • Volumes: The amount of solvent utilization in volumes. This value is only required if Equiv for a particular compound is not given. The unit for volumes is L/kg, which can be interpreted as “liters of this solvent per kg of a reference compound.” (This is numerically equivalent to mL/g.) The reference compound is assumed to be the first compound in the reaction.

  • Recycle: (Values are optional.) The fractional percentage of this compound that it is expected could be recycled, which could be useful for solvents or catalysts. For example, if 95% of the solvent can be recycled, then this cell will contain the value 0.95. In our demo example, we are assuming that 75% of the solvents can be recycled. This is largely a convenience column, because the input equivalents/volumes could simply be adjusted by hand. However, when this column is used, the Excel output file will have some dynamic character associated with the values given here. Some additional information on recycling is provided in the section below.

  • Notes: (Optional) Notes for this particular compound. For example, a reference can be included here if the reaction was taken from the literature, or a short bit of text can be added to acknowledge any assumptions in the numbers.

Mass-based Inputs

In some cases (e.g. using scale-up batch records), it is more convenient to input the amounts of materials as masses rather than equivalents/volumes. This can be done by adding a new column named “Mass” to the Route table. This column can be populated by mass values in metric mass units (g, kg, metric ton, etc.), as long as the units are the same for all values. In the costcalc code, these mass values are converted back to equivalents/volumes, which is somewhat counter-intuitive, but must be done for algorithmic purposes. As a result, there are some important notes here, which are given roughly order of importance.

  1. Masses can be added on a per reaction basis. This means that masses can be given for one or more reactions in the synthetic route, rather than having to define masses for every reaction. This can be helpful if you want to mix and match reaction information from different sources.

  2. Because the first compound in a reaction is assumed to be the limiting reagent, the mass must be given for this compound. Some mixing and matching of masses/equivalents/volumes is allowed (below), but this only works if the first compound follows the rule defined here.

  3. Mass values override equivalent and volume values. If a single compound defines an equivalent/volume value and a mass, the mass will be converted into equivalents and will override the original value. This makes it possible, for example, to double check that the conversions are correct, but it may lead to some unexpected behavior if you don’t check the tables carefully.

  4. Mixtures of masses and equivalents/volumes are acceptable. For example, solid reagents could be included as masses with solvents given as volumes; however, the two rules above must be followed.

  5. Solvent masses are converted to equivalents. This doesn’t affect the calculations in any way, but it will lead to some strange looking values in the equivalents column.

Below is a picture of an example Route table for our demo route using a Mass column; this table will give the same results in the costcalc costing routine. (The color coded cells are due to the conditional formatting in the template Excel files.) Notice that only one reaction has been given mass values (note #1 above). The amount of intermediate A was only given in percent yield (75%), so no mass has been given (note #4 above). Note: the Mass column will be dropped from the results table/Excel file.

_images/br_route_mass.png

Operational Expenses (OPEX)

In some cases, some indication of the per step operational expenses (OPEX, in $/kg) may be known. These can be incorporated into the calculation by adding a new column to the Route table, as described below.

  • OPEX: (Optional) An estimate, in $/kg, of the operating expenses for a given reaction step. This number is only valid for the product of any given step. Although these values are not given for the current demo route, they could have been given for Intermediate A in Step 1 (Cell H5) and/or Product in Step 2 (cell H10). For route intermediates, these values are added to the RMC values in subsequent steps; the OPEX for the final product is added to the final RMC value in the $/kg column.

This may be a bit confusing at first, so a second model using OPEX values will be presented in the interpretation section, which clarifies this input.

Special Case Inputs

kg/kg Solid Charges

Some solids - celite, heterogeneous catalysts, silica - may be charged into a reaction on a (kg solid)/(kg limiting reagent) basis. These materials may not have well-defined MW values, which makes these inputs a little tricky. One way to handle this is to make the MW of this compound the same as the limiting reagent for the step where it is being used. When two compounds have identical molecular weights, the molar equivalents are identical to a kg/kg ratio.

For example, if Pd/C was being used in a 0.1 kg/kg ratio in the second step of our demo route, we could set the MW of Pd/C in our Materials table to be the same as Intermediate A, or 230.9. Then the equivalents value for Pd/C in the Route table would be 0.1.

If the solid is used in more than one reaction, you can may need to make multiple entries for the solid in the Materials table. Because the materials must have unique names, you could give them identifiers to associate them with a particular limiting reagent. In our example above, for example, you may want to use the name “Pd/C_IntA” and “Pd/C_SM” for Pd/C with molecular weights for Intermediate A (Step 2) and Starting Material (Step 1), respectively.

Recycling and Batch Number

The recycling calculation that is done using the Recycle values is simplistic as implemented in the core costcalc algorithm. It essentially just reduces the kg utilization of the compound to take into account that some of the material has been recovered. (In other words, the kgs of this compound in the cost calculation are the amount of material lost in the batch.) This can be helpful to reduce the impact of solvents on the RMC calculation, for example. However, when the material can only be used in a certain number of batches, such as a catalyst that slowly loses activity, then some additional math is required.

If a material is simply being reused over several batches, the math is pretty simple. For the expected number of usages (\(n\)), the scaled equivalents (\(eq_{scaled}\)) will just be the initial equivalents (\(eq_{init}\)) divided by the number of usages. For a very large number of batches, this value is going to trend to zero. In other words, the relative cost contribution of this material will disappear.

\[eq_{scaled} = \frac{eq_{init}}{n}\]

In real situations, it may be hard to fully reuse all of the original material, so some extra material should be added on each subsequent run, a so-called “top-up” addition, to account for lost/deactivated material. If this top-up is given as a fractional percentage of the original addition (\(per_{tu}\), this is essentially 1 minus the Recycle column value), then the top-up amount (\(eq_{tu}\)) for a given number of usages (\(n\)) is as shown below. The less one for the number of usages takes into account that the first usage will not require a top-up addition; the \(n\) in the denominator divides the total amount of top-up between all of the batches (for costing purposes, not in practice). For a very large number of batches, the \(n\) terms in the numerator and denominator will cancel out. This makes the scaled number of equivalents just the initial equivalents multiplied by the top-up percentage. This is essentially how the costcalc algorithm handles recycling.

\[eq_{tu} = \frac{eq_{init}*per_{tu}*(n - 1)}{n}\]

These two equations can be combined to give the scaled equivalents given a certain number of material usages and a small top-up.

\[\begin{split}eq_{scaled} &= \frac{eq_{init}}{n} + \frac{eq_{init}*per_{tu}*(n - 1)}{n} \\ &= eq_{init}*\frac{(1 + per_{tu}*(n - 1))}{n}\end{split}\]

The second form of the equation above shows we are multiplying our initial equivalents (\(eq_{init}\)) by some scaling factor. This is essentially how costcalc handles the recycling; however, to make the scaling factors equivalent, we would need to subtract the above scaling factor from 1. So if a number of batches and the top-up percentage is known, the value to use in the Recycle column will be as follows.

\[Recycle = 1 - \frac{(1 + per_{tu}*(n - 1))}{n}\]

We can use some simple Python code to visualize these different scenarios. In this model, we will assume that the material is a catalyst, so its amount relative to the limiting reagent is 10% (eq_init). We will assume that this material can be recycled up to 10 cycles (n), but that each subsequent recycle requires that 10% of the initial amount of catalyst is added as a top-up (per_tu). The simulation code for these scenarios and the resulting plot are below.

import numpy as np
import matplotlib.pyplot as plt

n = 10
ns = np.arange(1, n+1)
eq_init = 0.1*np.ones(n)
per_tu = 0.1

no_recyc = eq_init # Assuming a new batch of material in each run
simple = eq_init/ns # Assuming a simple recycle of material per run
top_up = eq_init*(1+per_tu*(ns - 1))/ns # With top-up

plt.plot(ns, no_recyc, 'o-', label='No recycle')
plt.plot(ns, simple, 'o-', label='No top-up')
plt.plot(ns, top_up, 'o-', label='With top-up')

plt.title('Amount of material vs number of batches')
plt.xlabel('Number of batches')
plt.ylabel('Amount of material used')
plt.legend()

plt.grid()
plt.show()
_images/recycle.png