Filtering a search space using oxidation states
This notebook walks through the first steps of a high-throughput compound design workflow:
1. Generating the search space using SMACT.
2. Filtering the search space using the oxidation states model.
# Install the required packages (only needed when running on Google Colab)
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !uv pip install smact --quiet
# Imports
import re
from itertools import combinations
import matplotlib.pyplot as plt
import multiprocess
import numpy as np
import pandas as pd
from pymatgen.core import Composition
import smact
from smact import Species, screening
from smact.oxidation_states import Oxidation_state_probability_finder
Composition generation
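As a rough illustration of what composition generation involves, the sketch below enumerates A-B-X6 combinations from small species lists and keeps only the charge-neutral ones. The species lists, the fixed 1:1:6 stoichiometry and the helper name `charge_neutral_abx6` are assumptions made for this example. It is not the SMACT-based generation code, which would normally rely on the library's own screening utilities.

```python
from itertools import product

# Hypothetical species lists as (symbol, charge) pairs; illustrative only.
A_CANDIDATES = [("Rh", 4), ("Ti", 4), ("La", 3)]
B_CANDIDATES = [("Zn", 2), ("Mn", 2), ("Na", 1)]
X_CANDIDATES = [("F", -1), ("Cl", -1), ("O", -2)]


def charge_neutral_abx6(a_list, b_list, x_list):
    """Return (A, B, X) triples whose charges sum to zero for a 1:1:6 ratio."""
    neutral = []
    for a, b, x in product(a_list, b_list, x_list):
        total_charge = a[1] + b[1] + 6 * x[1]
        if total_charge == 0:
            neutral.append((a, b, x))
    return neutral


candidates = charge_neutral_abx6(A_CANDIDATES, B_CANDIDATES, X_CANDIDATES)
for (a_sym, _), (b_sym, _), (x_sym, _) in candidates:
    # Formula written in the same B-A-X order as the formula_pretty column.
    print(f"{b_sym}{a_sym}{x_sym}6")
```

Charge neutrality alone is a weak filter, which is why a further step such as the oxidation states model is useful for narrowing the search space.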
Applying the oxidation states model
The compound_probability method of the Oxidation_state_probability_finder class computes the likelihood of the metal species existing in the presence of particular anions. We will apply it to all the compositions generated by SMACT.
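To give a feel for what such a score represents, here is a minimal toy sketch. It assumes the compound probability is the mean of the individual cation-anion pair probabilities, which is my understanding of the approach but should be checked against the SMACT documentation. The lookup table values and the function name `toy_compound_probability` are hypothetical and are not taken from the library.

```python
from statistics import mean

# Hypothetical cation-anion pair probabilities; NOT values from the SMACT lookup table.
PAIR_PROBABILITY = {
    ("Zn2+", "F1-"): 1.0,
    ("Rh4+", "F1-"): 0.5,
    ("Zn2+", "I1-"): 1.0,
    ("Rh4+", "I1-"): 0.0,
}


def toy_compound_probability(cations, anion, table=PAIR_PROBABILITY):
    """Average the pair probabilities of every cation with the given anion.

    Pairs missing from the table are treated as probability 0.0, which is
    a choice made for this sketch only.
    """
    pair_probs = [table.get((cation, anion), 0.0) for cation in cations]
    return mean(pair_probs)


fluoride_score = toy_compound_probability(["Zn2+", "Rh4+"], "F1-")
iodide_score = toy_compound_probability(["Zn2+", "Rh4+"], "I1-")
print(fluoride_score, iodide_score)
```

With these toy numbers, a single unlikely cation-anion pair pulls the whole compound score down. This is the behaviour that makes the score usable as a filter.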
# Compute the compound probabilities
compound_probabilities = [ox_prob_finder.compound_probability(spec) for spec in list_of_species]
# Create a dataframe
data = {
"formula_pretty": pretty_formulas,
"A": A_species,
"B": B_species,
"X": X_species,
"compound_probability": compound_probabilities,
}
df = pd.DataFrame(data)
df.head()
| | formula_pretty | A | B | X | compound_probability |
|---|---|---|---|---|---|
| 0 | ZnRhF6 | Rh4+ | Zn2+ | F1- | 0.775000 |
| 1 | ZnRhCl6 | Rh4+ | Zn2+ | Cl1- | 0.625000 |
| 2 | ZnRhBr6 | Rh4+ | Zn2+ | Br1- | 0.500000 |
| 3 | ZnRhI6 | Rh4+ | Zn2+ | I1- | 0.500000 |
| 4 | MnRhF6 | Rh4+ | Mn2+ | F1- | 0.514796 |
# Compute the number of non-zero compound probabilities
# Convert the probability values from a pandas.Series object to a numpy array
probs = df.compound_probability.to_numpy()
# Create a numpy array of the non-zero probabilities
non_zero_probs = probs[probs > 0]
print(
    f"The original smact search space produced {len(probs)} compositions "
    f"of which {len(non_zero_probs)} had a non-zero compound probability"
)
The original smact search space produced 14832 compositions of which 12671 had a non-zero compound probability
Visualising the compound probabilities of the SMACT-generated compositions
We will now plot the number of compounds as a function of the probability threshold.
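Before running this on the full set, it may help to see the counting step on a handful of values. The sketch below uses the five probabilities shown in the df.head() output above and counts how many are at or above each threshold. Sorting once and using `bisect_left` is an alternative to re-scanning the array for every threshold. The helper name `count_at_or_above` and the chosen thresholds are assumptions for illustration.

```python
from bisect import bisect_left

# The five probabilities shown in the df.head() output above.
sample_probs = [0.775, 0.625, 0.5, 0.5, 0.514796]


def count_at_or_above(sorted_probs, threshold):
    """Count entries >= threshold in an ascending-sorted list."""
    # bisect_left returns the index of the first entry >= threshold.
    return len(sorted_probs) - bisect_left(sorted_probs, threshold)


sorted_probs = sorted(sample_probs)
counts = {t: count_at_or_above(sorted_probs, t) for t in (0.0, 0.5, 0.6, 0.8)}
print(counts)
```

The count can only stay the same or decrease as the threshold increases, so the plotted curve should be monotonically non-increasing.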
# Create a numpy array of 100 threshold values
thresh = np.linspace(0.0, 1.0, 100)
# Use a list comprehension to generate a list of the number of compounds as a function of the probability threshold
num_compounds = [len(probs[(probs >= t)]) for t in thresh]
# Set up the figure and plot the number of compounds against the threshold
fig, ax = plt.subplots()
ax.scatter(x=thresh, y=num_compounds, marker="x", color="orange")
ax.set_xlabel("Threshold")
ax.set_ylabel("Number of compounds")
plt.show()
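Once a threshold has been read off the plot, the search space can be reduced to the compositions that meet it. The sketch below shows the idea on the five rows from the df.head() output shown earlier, using plain Python records. The cut-off of 0.6 and the helper name `filter_by_probability` are hypothetical and chosen only for illustration.

```python
# Rows copied from the df.head() output shown earlier.
rows = [
    {"formula_pretty": "ZnRhF6", "compound_probability": 0.775},
    {"formula_pretty": "ZnRhCl6", "compound_probability": 0.625},
    {"formula_pretty": "ZnRhBr6", "compound_probability": 0.5},
    {"formula_pretty": "ZnRhI6", "compound_probability": 0.5},
    {"formula_pretty": "MnRhF6", "compound_probability": 0.514796},
]

THRESHOLD = 0.6  # hypothetical cut-off, not a recommended value


def filter_by_probability(records, threshold):
    """Keep only the records whose compound probability meets the threshold."""
    return [r for r in records if r["compound_probability"] >= threshold]


kept = filter_by_probability(rows, THRESHOLD)
kept_formulas = [r["formula_pretty"] for r in kept]
print(kept_formulas)
```

On the full DataFrame the same filter can be written with boolean indexing, for example `df[df["compound_probability"] >= THRESHOLD]`.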