Getting Started

This guide covers environment setup and installation so you can run the full screening pipeline.

Prerequisites

Python 3.11 (required by dependency constraints)
Git
uv (recommended) or pip

Python version constraint

This project requires Python 3.11 (>=3.11,<3.12). Python 3.12 or newer, and any version older than 3.11, will cause dependency resolution failures.
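
The constraint above can be checked before installing anything. The following is a minimal sketch, not part of the project; the helper name is_supported_python is hypothetical.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the interpreter is Python 3.11.x (>=3.11,<3.12)."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (3, 11)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {found} is supported.")
    else:
        print(f"Python {found} is not supported; this project needs 3.11.x.")
```

Running this with the interpreter you plan to use for the virtual environment shows whether dependency resolution is likely to succeed.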

Installation

Option 1: Using uv (Recommended)

uv manages the virtual environment, lockfile, and dependencies in a single command:

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/GarzonDiegoFEUP/chalcogenide-perovskite-screening.git
cd chalcogenide-perovskite-screening

# Install all dependencies
uv sync --extra dev --extra notebooks

Or equivalently via Make:

make install

Option 2: Using pip and venv

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

pip install -e ".[dev,notebooks]"

Option 3: Frozen lockfile (exact reproduction)

For reproducible results matching the exact development environment:

pip install -r requirements.txt

Exact reproduction

The frozen lockfile in requirements.txt pins every transitive dependency to the exact version used during development. Use this if you need to reproduce the development environment exactly.
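
To confirm that a requirements file really pins everything, a check along these lines may help. It is a rough sketch that assumes a conventional pip requirements format (one requirement per line, optional comments, markers, and --hash continuation lines); the function name unpinned_requirements is hypothetical.

```python
def unpinned_requirements(lines):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in lines:
        # Remove surrounding whitespace and any trailing line continuation.
        line = raw.strip().rstrip("\\").strip()
        # Skip blanks, comments, and pip options such as --hash or -e.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        # Drop inline comments and environment markers.
        requirement = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


sample = [
    "numpy==1.26.4",
    "# a comment",
    "pandas>=2.0",
    "torch==2.2.0 ; python_version >= '3.11'",
    "",
]
print(unpinned_requirements(sample))
```

For the real file, pass in Path("requirements.txt").read_text().splitlines() (with Path imported from pathlib); an empty result means every listed requirement uses an exact pin.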

Configuration

Materials Project API Key

Some notebooks query the Materials Project API for structural data. To enable this:

Get your API key from Materials Project
Copy the example environment file and fill in your key:

cp .env.example .env
# Edit .env and set: MP_API_KEY=your_api_key_here

Optional dependency

The Materials Project API key is only required for notebooks that retrieve crystal structure data. The core pipeline will run without it using cached data in data/.
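
This guide does not show how the notebooks read the key, so the following is only a sketch of one way to check that MP_API_KEY is visible from Python. It uses the standard library alone and assumes .env contains plain KEY=value lines; read_env_file and get_mp_api_key are hypothetical helpers, not project functions.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=value lines into a dict; a missing file gives {}."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Ignore blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_mp_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get("MP_API_KEY") or read_env_file(path).get("MP_API_KEY")
```

If get_mp_api_key() returns None, notebooks that query the Materials Project will most likely fail to authenticate, while the cached-data path described above should be unaffected.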

SISSO Features (Optional)

SISSO features are pre-cached in data/interim/features_sisso.csv. If you need to re-derive them from scratch, install sissopp manually after setting up the environment.
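
Before deciding whether sissopp is needed, it can be worth confirming that the cached file is present. A minimal sketch, assuming it is run from (or pointed at) the repository root; the helper sisso_cache_available is hypothetical.

```python
from pathlib import Path

# Path taken from this guide, relative to the repository root.
SISSO_CACHE = Path("data/interim/features_sisso.csv")


def sisso_cache_available(repo_root="."):
    """Return True if the cached SISSO feature file exists and is non-empty."""
    cache = Path(repo_root) / SISSO_CACHE
    return cache.is_file() and cache.stat().st_size > 0
```

If sisso_cache_available() returns True, the cached features can be used and installing sissopp is unnecessary.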

Jupyter Notebook Setup

Register the virtual environment as a Jupyter kernel:

uv run python -m ipykernel install --user --name chalc-pvk-sc --display-name "Python (chalc-pvk-sc)"

Then select the Python (chalc-pvk-sc) kernel when opening notebooks in VS Code or Jupyter Lab.
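
If the kernel does not appear in the picker, the registration can be checked from Python. This sketch relies on the jupyter kernelspec list --json command, which reports kernels under a "kernelspecs" key; it assumes jupyter is on PATH, and the helper names are hypothetical.

```python
import json
import subprocess


def kernel_names(kernelspec_json):
    """Extract kernel names from `jupyter kernelspec list --json` output."""
    return sorted(json.loads(kernelspec_json).get("kernelspecs", {}))


def installed_kernels():
    """Ask Jupyter for its registered kernels (requires jupyter on PATH)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return kernel_names(result.stdout)
```

For example, "chalc-pvk-sc" in installed_kernels() should evaluate to True once the registration command above has succeeded.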

Next Steps

Once installed, head to the Pipeline page to run the analysis notebooks in order.

Project Structure

chalcogenide-perovskite-screening/
├── data/                    # Data directory (raw, interim, processed)
│   ├── raw/                 # Original immutable datasets
│   ├── interim/             # Intermediate transformed data
│   ├── processed/           # Final canonical datasets
│   ├── crystaLLM/           # CrystaLLM-generated CIF files
│   └── sustainability_data/ # ESG, HHI, earth abundance data
├── notebooks/               # Jupyter analysis notebooks (numbered)
├── chalcogenide_perovskite_screening/  # Python package
│   ├── config.py            # Path configuration and constants
│   ├── dataset.py           # Data loading and processing
│   ├── features.py          # SISSO feature engineering
│   ├── plots.py             # Visualization utilities
│   ├── modeling/            # ML models (GCNN, CrabNet, tolerance factor)
│   └── synthesis_planning/  # Synthesis pathway optimization
├── models/                  # Trained model weights and results
├── docs/                    # Documentation (MkDocs)
├── pyproject.toml           # Package metadata and dependencies
└── requirements.txt         # Frozen dependency lockfile