Testing charge-neutrality of the compositions in the GNoME database

In this notebook, we will check the charge-neutrality of the compositions in the GNoME database (https://www.nature.com/articles/s41586-023-06735-9).


# Install the required packages
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !uv pip install smact[optional] --quiet
# Imports
from smact.screening import smact_validity
import matplotlib.pyplot as plt
import pandas as pd
from pandarallel import pandarallel


def initialize_parallel_processing():
    """
    Initialize parallel processing using pandarallel.

    This function sets up pandarallel for parallel processing of pandas operations,
    enabling a progress bar for better visibility of the computation progress.
    """
    pandarallel.initialize(progress_bar=True)


# Set up parallel processing
initialize_parallel_processing()

# The imported modules and functions are used as follows:
# - smact_validity: To check the charge neutrality of compositions
# - matplotlib.pyplot: For creating plots and visualizations
# - pandas: For data manipulation and analysis
# - pandarallel: To enable parallel processing of pandas operations
INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
# Load the data into a DataFrame

data_path = (
    "https://raw.githubusercontent.com/WMD-group/SMACT/refs/heads/master/docs/tutorials/stable_materials_hull.csv"
)
df = pd.read_csv(data_path)

# Get quick info about the data
df.info()  # info() prints its summary itself and returns None

# Show first five entries
df.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 384871 entries, 0 to 384870
Data columns (total 4 columns):
 #   Column                     Non-Null Count   Dtype
---  ------                     --------------   -----
 0   Unnamed: 0                 384871 non-null  int64
 1   Composition                384871 non-null  object
 2   Formation Energy Per Atom  384871 non-null  float64
 3   Corrected Energy           384871 non-null  float64
dtypes: float64(2), int64(1), object(1)
memory usage: 11.7+ MB
   Unnamed: 0       Composition  Formation Energy Per Atom  Corrected Energy
0           0          Cs1S6Zr3                    -1.9058          -70.4155
1           1      Nd7Os1Pr3Si5                    -0.5004          -94.0804
2           2    Ce2La3Pt16Tm15                    -1.2960         -237.6016
3           3  Mn1Ni1Os2Sb12Yb1                    -0.3064          -93.6931
4           4    Er8Ge4Si12Zr12                    -0.8718         -254.0953
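The Composition strings write an explicit count after every element symbol (for example Cs1S6Zr3). smact_validity accepts these strings directly, so the notebook never parses them itself. The hypothetical parse_composition helper below is only a sketch of how such a formula breaks down into element counts, assuming one- or two-letter element symbols each followed by an integer count; it is not SMACT's own parser.

```python
import re
from collections import Counter

# Hypothetical helper (not part of SMACT): split a GNoME-style formula such as
# "Cs1S6Zr3" into {element: count}. Assumes every element symbol is followed
# by an explicit integer count, as in the Composition column above.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+)")


def parse_composition(formula: str) -> dict:
    """Return a mapping from element symbol to atom count."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        # Reject any characters skipped between two element tokens.
        if match.start() != pos:
            raise ValueError(f"Unexpected text in formula: {formula!r}")
        element, count = match.groups()
        counts[element] += int(count)
        pos = match.end()
    # Reject trailing text, e.g. an element written without a count.
    if pos != len(formula):
        raise ValueError(f"Unexpected text in formula: {formula!r}")
    return dict(counts)


print(parse_composition("Cs1S6Zr3"))  # {'Cs': 1, 'S': 6, 'Zr': 3}
```

Requiring an explicit count keeps the regex simple; a general formula parser would also need to handle implicit counts of 1, parentheses and hydrates.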

smact_validity, from the SMACT Python package, tests a composition for charge neutrality. It is a rather simple function that cannot account for elements with mixed valency in a compound. It works by systematically trying every combination of the allowed oxidation states of the elements in a composition, with each element assigned a single oxidation state, and returns True if a charge-balanced combination is found.
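The enumeration described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not SMACT's implementation: EXAMPLE_OX_STATES is a hypothetical, hand-written oxidation-state table (SMACT uses its own curated sets, such as smact14), and the sketch covers only the oxidation-state search, not any other checks smact_validity may perform (such as the alloy handling mentioned in the comment in the next code cell).

```python
from itertools import product

# Hypothetical oxidation-state table for illustration only; SMACT ships its
# own curated sets (e.g. "smact14"), which are NOT reproduced here.
EXAMPLE_OX_STATES = {
    "Na": (1,),
    "Cl": (-1,),
    "Fe": (2, 3),
    "O": (-2,),
}


def is_charge_balanced(composition, ox_states=EXAMPLE_OX_STATES):
    """Return True if assigning ONE oxidation state per element can make the
    total charge zero.

    composition: mapping of element symbol -> atom count, e.g. {"Na": 1, "Cl": 1}.
    """
    elements = list(composition)
    choices = [ox_states[el] for el in elements]
    # Try every combination of oxidation states (Cartesian product).
    for states in product(*choices):
        total = sum(q * composition[el] for q, el in zip(states, elements))
        if total == 0:
            return True
    return False


print(is_charge_balanced({"Na": 1, "Cl": 1}))  # True
print(is_charge_balanced({"Fe": 2, "O": 3}))   # True: 2(+3) + 3(-2) = 0
print(is_charge_balanced({"Fe": 3, "O": 4}))   # False: needs mixed Fe(II)/Fe(III)
```

The last call shows the mixed-valency limitation concretely: Fe3O4 contains both Fe(II) and Fe(III), but with one oxidation state per element the total charge is either 3(+2) - 8 = -2 or 3(+3) - 8 = +1, never zero.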

# Run the SMACT validity test on the GNoME materials
df["smact_valid"] = df["Composition"].parallel_apply(
    smact_validity, oxidation_states_set="smact14"
)  # Alloys will pass the test
df.head()
   Unnamed: 0       Composition  Formation Energy Per Atom  Corrected Energy  smact_valid
0           0          Cs1S6Zr3                    -1.9058          -70.4155        False
1           1      Nd7Os1Pr3Si5                    -0.5004          -94.0804        False
2           2    Ce2La3Pt16Tm15                    -1.2960         -237.6016         True
3           3  Mn1Ni1Os2Sb12Yb1                    -0.3064          -93.6931         True
4           4    Er8Ge4Si12Zr12                    -0.8718         -254.0953         True
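The fraction of charge-balanced compositions quoted in the conclusion can be read straight off the boolean smact_valid column, because the mean of a boolean column is the fraction of True values. The sketch below assumes only pandas and uses a small toy DataFrame copied from the five rows shown above, so it runs without downloading the dataset; in the notebook, the same two expressions would be applied to the full df.

```python
import pandas as pd

# Toy frame built from the five rows displayed above; in the notebook,
# use the full `df` instead.
toy = pd.DataFrame(
    {
        "Composition": [
            "Cs1S6Zr3",
            "Nd7Os1Pr3Si5",
            "Ce2La3Pt16Tm15",
            "Mn1Ni1Os2Sb12Yb1",
            "Er8Ge4Si12Zr12",
        ],
        "smact_valid": [False, False, True, True, True],
    }
)

# Count of True values and the fraction of True values (mean of booleans).
n_valid = int(toy["smact_valid"].sum())
fraction_valid = toy["smact_valid"].mean()
print(f"{n_valid} of {len(toy)} compositions pass ({fraction_valid:.1%})")
# prints "3 of 5 compositions pass (60.0%)"
```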

Plotting the results

Let’s plot the number of materials that pass the SMACT validity test and the total number of materials in the GNoME database.

# Make a bar plot of the GNoME data
bar_labels = ["GNoME stable materials", "GNoME - SMACT valid"]
counts = [len(df), df["smact_valid"].sum()]


fig, ax = plt.subplots(figsize=(8, 6))

bars = ax.barh(bar_labels, counts)
ax.bar_label(bars)
ax.set_xlabel("Number of materials")
ax.set_title("GNoME analysis")
plt.xticks(rotation=45)
ax.set_xlim(right=max(counts) * 1.1)  # Adjust x-axis limit based on data
plt.tight_layout()
plt.show()
[Figure: horizontal bar chart titled "GNoME analysis" comparing the number of GNoME stable materials with the number that are SMACT valid; x-axis: Number of materials.]

Conclusion

The SMACT validity test is a simple way to check whether a composition can be charge-balanced. However, it is not perfect: because it assigns a single oxidation state to each element, it cannot account for elements with mixed valency in a compound. In this notebook, we used the SMACT validity test to check the charge neutrality of the compositions in the GNoME database and found that around 81% of them can be charge-balanced using SMACT.
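To make the mixed-valency limitation concrete, the sketch below relaxes the rule so that each atom, rather than each element, may take any of its element's oxidation states. This is not what SMACT does; it is a hypothetical contrast, and EXAMPLE_OX_STATES is again an assumed, illustrative table rather than SMACT data.

```python
from itertools import combinations_with_replacement, product

# Hypothetical oxidation-state table for illustration only (not SMACT data).
EXAMPLE_OX_STATES = {"Fe": (2, 3), "O": (-2,)}


def balanced_per_atom(composition, ox_states=EXAMPLE_OX_STATES):
    """Return True if the charge can be balanced when each ATOM (not each
    element) may take any of its element's oxidation states.

    This relaxation permits mixed valency, e.g. Fe3O4 = Fe(II) + 2 Fe(III) + 4 O(-II).
    """
    per_element_totals = []
    for el, n in composition.items():
        # Every total charge reachable by n atoms of this element.
        totals = {sum(combo) for combo in combinations_with_replacement(ox_states[el], n)}
        per_element_totals.append(totals)
    # Balanced if one reachable total per element sums to zero.
    return any(sum(t) == 0 for t in product(*per_element_totals))


print(balanced_per_atom({"Fe": 3, "O": 4}))  # True: 2 + 3 + 3 - 8 = 0
```

The trade-off is that this relaxation is far more permissive: it accepts any composition whose total charge falls anywhere within the reachable range, so it would flag more formulas as charge-balanced, including chemically implausible ones. That is one reason a stricter one-state-per-element test can be a useful first filter despite missing mixed-valence compounds.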