Exploring crystal space

This tutorial shows how to generate, analyze, and categorize chemical compositions so that interesting and useful materials are easier to find. It is based on a publication in Faraday Discussions.

1. Getting started

In this tutorial, we’ll:

Generate binary chemical compositions using the SMACT filter.
Explore whether these compositions exist in the Materials Project database.
Categorize the compositions based on whether they pass the SMACT filter and whether they are found in the database.

In the final step, each composition is assigned one of four labels according to two criteria: whether the SMACT filter allows it (smact_allowed) and whether it appears in the Materials Project database (mp). The labels are:

smact_allowed	mp	label
yes	yes	standard
yes	no	missing
no	yes	interesting
no	no	unlikely

2. Generating compositions

First, we’ll create binary chemical compositions using the SMACT filter. The filter screens compositions against basic chemical rules built on oxidation states and electronegativity.
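To make the idea of rule-based screening concrete, here is a minimal toy sketch of a filter built on oxidation states and electronegativity. It is not SMACT's implementation: the function name toy_filter is hypothetical, and the OX_STATES and PAULING_EN tables are a small hand-picked set for illustration, not the data SMACT ships with. The sketch assumes a composition passes if some assignment of oxidation states sums to zero charge and every cation is less electronegative than every anion.

```python
from itertools import product

# Illustrative data only (assumption): a few common oxidation states and
# Pauling electronegativities, NOT the tables used by SMACT.
OX_STATES = {"Na": [1], "Mg": [2], "O": [-2], "Cl": [-1, 1, 3, 5, 7]}
PAULING_EN = {"Na": 0.93, "Mg": 1.31, "O": 3.44, "Cl": 3.16}


def toy_filter(elements, stoichs):
    """Return True if any oxidation-state assignment is charge neutral
    and places the less electronegative elements as cations."""
    for states in product(*(OX_STATES[el] for el in elements)):
        # Rule 1: total charge of the formula unit must be zero.
        if sum(q * n for q, n in zip(states, stoichs)) != 0:
            continue
        # Rule 2: every cation must be less electronegative than every anion.
        cations = [PAULING_EN[el] for el, q in zip(elements, states) if q > 0]
        anions = [PAULING_EN[el] for el, q in zip(elements, states) if q < 0]
        if cations and anions and max(cations) < min(anions):
            return True
    return False


print(toy_filter(("Na", "Cl"), (1, 1)))  # NaCl
print(toy_filter(("Na", "Cl"), (1, 2)))  # NaCl2
print(toy_filter(("Mg", "Cl"), (1, 2)))  # MgCl2
```

With this toy data, NaCl and MgCl2 pass while NaCl2 fails, because no listed oxidation state of Cl balances a single Na+ against two Cl atoms.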

To generate these compositions, we’ll use a function called generate_composition_with_smact. It enumerates all binary compositions within the limits set by the parameters below and records whether each one passes the SMACT rules.

Key parameters:

num_elements: Number of elements in the composition (e.g., 2 for binary).
max_stoich: The maximum stoichiometric coefficient of each element (e.g., 8 allows up to 8 atoms of each element in the formula).
max_atomic_num: Maximum atomic number of the elements considered.
num_processes: Number of processes to run in parallel to speed up calculations.
save_path: Where to save the generated compositions.
oxidation_states_set: Which set of oxidation states the filter uses (the example below passes "smact14").

# Install the required packages
try:
    import google.colab

    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !uv pip install smact[crystal_space] --quiet

from smact.utils.crystal_space.generate_composition_with_smact import (
    generate_composition_with_smact,
)

df_smact = generate_composition_with_smact(
    num_elements=2,
    max_stoich=8,
    max_atomic_num=103,
    num_processes=8,
    save_path="data/binary/df_binary_label.pkl",
    oxidation_states_set="smact14",
)

df_smact

	smact_allowed
Ac2Ag	False
Ac2Ag3	False
Ac2Ag5	False
Ac2Ag7	False
Ac2Al	False
...	...
ZrZn4	False
ZrZn5	False
ZrZn6	False
ZrZn7	False
ZrZn8	False

225879 rows × 1 columns
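The row count can be checked by hand. The sketch below is not the SMACT code; it assumes, based on the output above (for example, Ac2Ag2 is absent), that each unordered pair of distinct elements is counted once and that only reduced formulas with coprime stoichiometries are kept.

```python
from itertools import combinations, product
from math import gcd

max_atomic_num = 103
max_stoich = 8

# Unordered pairs of distinct elements, identified here only by atomic number.
element_pairs = list(combinations(range(1, max_atomic_num + 1), 2))

# Stoichiometry pairs in lowest terms, so (2, 2) is not counted apart from (1, 1).
reduced_stoichs = [
    (a, b)
    for a, b in product(range(1, max_stoich + 1), repeat=2)
    if gcd(a, b) == 1
]

n_compositions = len(element_pairs) * len(reduced_stoichs)
print(len(element_pairs), len(reduced_stoichs), n_compositions)
# prints: 5253 43 225879
```

Under these assumptions, 5253 element pairs times 43 reduced stoichiometry pairs gives 225879, which matches the number of rows in df_smact.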

3. Download data from the Materials Project

Next, we download data from the Materials Project API using the download_mp_data function. It retrieves entries for a given number of elements and maximum stoichiometry, including the chemical formula, energy, and other properties of each compound.

The download_mp_data function takes the following parameters:

Key parameters:

mp_api_key: Your Materials Project API key.
num_elements: Number of elements in the composition (e.g., 2 for binary).
max_stoich: The maximum stoichiometric coefficient of each element (e.g., 8 allows up to 8 atoms of each element in the formula).
save_dir: Where to save the downloaded data.

mp_api_key = ""  # Add your Materials Project API key here

save_mp_dir = "data/binary/mp_data"

from smact.utils.crystal_space.download_compounds_with_mp_api import download_mp_data

# download data from MP for binary compounds
docs = download_mp_data(
    mp_api_key=mp_api_key,
    num_elements=2,
    max_stoich=8,
    save_dir=save_mp_dir,
)
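The categorization step in the next section reads the names of the JSON files in save_dir and treats each file stem as a chemical formula. The sketch below illustrates that lookup with a temporary directory and no API access. The file contents and the candidate formulas are hypothetical stand-ins, and the one-file-per-formula layout is an assumption inferred from the glob over "*.json" used in that step.

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    save_dir = Path(tmp)
    # Hypothetical stand-ins for downloaded entries; real files hold MP data.
    for formula in ["NaCl", "MgO"]:
        (save_dir / f"{formula}.json").write_text(json.dumps({"formula": formula}))

    # The file stem (name without ".json") is treated as the formula.
    found = {p.stem for p in save_dir.glob("*.json")}

# A composition counts as "in MP" only if a file with exactly that stem exists.
candidates = ["NaCl", "NaCl2", "MgO"]
in_mp = {formula: formula in found for formula in candidates}
print(in_mp)
```

Because the match is by exact string, the lookup only works if both sides spell each formula the same way, including element order.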

4. Categorise compositions

Finally, we categorize the compositions into four labels: standard, missing, interesting, and unlikely.

from pathlib import Path
import pandas as pd

mp_data = {p.stem: True for p in Path(save_mp_dir).glob("*.json")}
df_mp = pd.DataFrame.from_dict(mp_data, orient="index", columns=["mp"])

# make category dataframe
df_category = df_smact.join(df_mp, how="left")
df_category["mp"] = df_category["mp"].notna()

# make label for each category
dict_label = {
    (True, True): "standard",
    (True, False): "missing",
    (False, True): "interesting",
    (False, False): "unlikely",
}
df_category["label"] = df_category.apply(lambda x: dict_label[(x["smact_allowed"], x["mp"])], axis=1)

# count number of each label
print(df_category["label"].value_counts())

# save dataframe
df_category.to_pickle("data/binary/df_binary_category.pkl")

# show df_category
df_category.head()

label
unlikely       205910
missing          9787
interesting      6505
standard         3677
Name: count, dtype: int64

	smact_allowed	mp	label
Ac2Ag	False	False	unlikely
Ac2Ag3	False	False	unlikely
Ac2Ag5	False	False	unlikely
Ac2Ag7	False	False	unlikely
Ac2Al	False	False	unlikely
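A quick way to read the label counts is to turn them into fractions. The sketch below hard-codes the counts printed above instead of reloading the DataFrame; the variable names are illustrative only.

```python
# Label counts copied from the value_counts() output above.
counts = {"unlikely": 205910, "missing": 9787, "interesting": 6505, "standard": 3677}

total = sum(counts.values())
smact_allowed = counts["standard"] + counts["missing"]  # pass the SMACT filter
in_mp = counts["standard"] + counts["interesting"]      # present in Materials Project

# Share of the enumerated space taken by each label.
for label, n in counts.items():
    print(f"{label:<12}{n:>8}  {n / total:6.2%}")

# Overlap between the two criteria, seen from each side.
print(f"SMACT-allowed compositions found in MP: {counts['standard'] / smact_allowed:.1%}")
print(f"MP compositions allowed by SMACT: {counts['standard'] / in_mp:.1%}")
```

The four counts sum to 225879, the number of enumerated compositions, so every row of df_smact received exactly one label.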

Next steps

Move to crystal_space_visualisation.ipynb to visualize the data and explore the chemical space.
