ImpactHPC documentation#
ImpactHPC is a Python library designed to estimate the environmental impact of jobs on data centers. Its main features include:
- Explainable, sourced, and replicable results
- Uncertainty computation
- Support for different levels of precision in input data
- Multicriteria analysis (not fully supported yet)
- Whole lifecycle assessment (not fully supported yet)
The estimations provided by this library are approximate and not always accurate. You should therefore not rely on them alone; it is recommended to present the explanations produced by the library alongside the estimations.
Currently, it supports the estimation of usage impact (energy consumption and its impact) for the extraction, production, and distribution phases. The library computes the environmental impact based on three criteria: global warming potential (GWP, in gCO₂eq), abiotic resource depletion (ADPE, in gSbeq), and primary energy use (PE, in MJ).
Installation#
Make sure to use Python 3.12 or higher.
pip install impacts-hpc
You may have trouble installing the mip dependency using Python 3.13. In this case, a workaround is to use Python 3.12.
Example Code#
Impact of a single CPU#
Embodied impacts of a CPU:
from impacthpc import CPU, ExactName
impacts = CPU(name=ExactName("Intel Xeon Gold 6248")).estimate_embodied_impacts()
impacts_climate_change = impacts["gwp"]
print(impacts_climate_change.explain())
If you don’t know the exact name of your CPU, use find_close_cpu_model_name():
from impacthpc import CPU, ExactName, find_close_cpu_model_name
impacts = CPU(name=find_close_cpu_model_name("Intel 6248")).estimate_embodied_impacts()
impacts_climate_change = impacts["gwp"]
print(impacts_climate_change.explain())
Electric power of this CPU:
from impacthpc import CPU, ExactName
power = CPU(name=ExactName("Intel Xeon Gold 6248")).estimate_electric_power()
print(power.explain())
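To see what such a power figure implies for usage impact, the back-of-the-envelope arithmetic can be sketched in plain Python. The 150 W draw and the 60 gCO₂eq/kWh carbon intensity below are illustrative numbers, not values returned by the library:

```python
# Illustrative usage-impact arithmetic: energy = power x time,
# impact = energy x carbon intensity. All numbers are made up for the example.
power_w = 150             # hypothetical CPU electric power, in watts
duration_h = 100          # job duration, in hours
intensity_g_per_kwh = 60  # hypothetical grid carbon intensity (gCO2eq/kWh)

energy_kwh = power_w * duration_h / 1000
gwp_g = energy_kwh * intensity_g_per_kwh
print(f"{energy_kwh} kWh -> {gwp_g} gCO2eq")  # 15.0 kWh -> 900.0 gCO2eq
```

The library performs this kind of computation for you (with sourced intensity data and uncertainty tracking); the sketch only shows the underlying arithmetic.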
Impact of a server#
from impacthpc import CPU, ExactName, Server, RAM, GPU, MotherBoard
server = Server(
components=[
(CPU(name=ExactName("Intel Xeon Silver 4114")), 2),
(RAM(size="96 GB"), 1),
(GPU(name=ExactName("Nvidia Quadro P6000")), 4),
(MotherBoard(), 1),
]
)
impacts = server.estimate_embodied_impacts()
impact_climate_change = impacts["gwp"]
print(impact_climate_change.explain())
Using formatters#
explain() explains the result. You can specify a formatter for exporting to JSON or HTML instead of the default text.
from impacthpc import CPU, ExactName, Server, RAM, GPU, MotherBoard, HTMLFormatter
server = Server(
components=[
(CPU(name=ExactName("Intel Xeon Silver 4114")), 2),
(RAM(size="96 GB"), 1),
(GPU(name=ExactName("Nvidia Quadro P6000")), 4),
(MotherBoard(), 1),
]
)
impacts = server.estimate_embodied_impacts()
impact_climate_change = impacts["gwp"]
result = impact_climate_change.explain(HTMLFormatter())
with open("result.html", "w") as f:
    f.write(result)
print("Success! Result written to result.html.")
You can also define how uncertainties are displayed, either as min/max intervals or as plus/minus standard deviation. See Uncertainty for details.
from impacthpc import CPU, ExactName, Server, RAM, GPU, MotherBoard, UncertaintyFormat
from impacthpc.src.core.formatters import TextFormatter
server = Server(
components=[
(CPU(name=ExactName("Intel Xeon Silver 4114")), 2),
(RAM(size="96 GB"), 1),
(GPU(name=ExactName("Nvidia Quadro P6000")), 4),
(MotherBoard(), 1),
]
)
impacts = server.estimate_embodied_impacts()
impact_climate_change = impacts["gwp"]
print(impact_climate_change.explain(TextFormatter(uncertainty_format=UncertaintyFormat.MIN_MAX)))
Impact of a job on a park#
Here is the code for estimating the impact of a job that runs for 100 hours on the Jean Zay High Performance Computer. You can see its specifications here.
from impacthpc import (
ExactName,
Park,
Cluster,
Server,
CPU,
RAM,
MotherBoard,
PowerSupply,
GPU,
find_close_gpu_model_name,
EnergyIntensity,
Job,
HTMLFormatter,
UncertaintyFormat,
)
cpu = ExactName("Intel Xeon Gold 6248")
jean_zay = Park(
clusters={
"cpu_p1": Cluster(
servers_count=720,
server_model=Server(
components=[
(CPU(name=cpu), 2),
(RAM(size="192 GB"), 1),
(MotherBoard(), 1),
(PowerSupply(), 2),
],
),
),
"gpu_p2s": Cluster(
servers_count=20,
server_model=Server(
components=[
(CPU(name=cpu), 2),
(GPU(name=ExactName("NVIDIA Tesla V100 SXM2 32 GB")), 8),
(RAM(size="384 GB"), 1),
(MotherBoard(), 1),
(PowerSupply(), 2),
]
),
),
"gpu_p2l": Cluster(
servers_count=11,
server_model=Server(
components=[
(CPU(name=cpu), 2),
(GPU(name=ExactName("NVIDIA Tesla V100 SXM2 32 GB")), 8),
(RAM(size="768 GB"), 1),
(MotherBoard(), 1),
(PowerSupply(), 2),
]
),
),
"gpu_p5": Cluster(
servers_count=52,
server_model=Server(
components=[
(CPU(name=ExactName("AMD EPYC 7543")), 2),
(GPU(name=ExactName("NVIDIA A100 SXM4 80 GB")), 8),
(RAM(size="512 GB"), 1),
(MotherBoard(), 1),
(PowerSupply(), 2),
]
),
),
"gpu_p6": Cluster(
servers_count=364,
server_model=Server(
components=[
(CPU(name=ExactName("Intel Xeon Platinum 8468")), 2),
(RAM(size="512 GB"), 1),
(GPU(name=ExactName("NVIDIA H100 SXM5")), 4),
(MotherBoard(), 1),
(PowerSupply(), 2),
]
),
),
"prepost": Cluster(
servers_count=4,
server_model=Server(
components=[
(CPU(name=ExactName("Intel Xeon Gold 6132")), 2),
(RAM(size="3 TB"), 1),
(
# in the spec, we don't know the exact variant of Nvidia V100 used, so we can use find_close_gpu_model_name result and it will average the matching GPUs
GPU(name=find_close_gpu_model_name("Nvidia Tesla V100")),
4,
),
(MotherBoard(), 1),
(PowerSupply(), 2),
],
),
),
"visu": Cluster(
servers_count=5,
server_model=Server(
components=[
(CPU(name=ExactName("Intel Xeon Gold 6248")), 2),
(RAM(size="192 GB"), 1),
(GPU(name=ExactName("Nvidia Quadro P6000")), 4),
(MotherBoard(), 1),
]
),
),
"compil": Cluster(
servers_count=3,
server_model=Server(
components=[
(CPU(name=ExactName("Intel Xeon Silver 4114")), 2),
(RAM(size="96 GB"), 1),
(GPU(name=ExactName("Nvidia Quadro P6000")), 4),
(MotherBoard(), 1),
]
),
),
},
energy_intensity=EnergyIntensity.at_location("FRA"),
)
result = jean_zay.job_impact(
Job(
cluster_name="gpu_p5",
servers_count=1,
duration="10 h",
)
)[
"gwp"
].explain()
print(result)
Let’s break it down:
- Park represents an entire data center. It computes the job’s environmental impact based on the impact of its servers, casing, cooling (estimated from server energy consumption), location, and total energy consumption based on PUE.
- clusters is a dictionary containing the various clusters of the Jean Zay HPC. Each contains a Server, which itself includes components such as CPU, GPU, RAM, etc.
- CPU.name and GPU.name must be filled with an instance of the abstract class Name, or left empty. If you know the exact name (as listed in the file referenced in the Configuration under csv > cpu_specs), you can use ExactName("THE EXACT NAME"). Otherwise, use find_close_cpu_model_name() or find_close_gpu_model_name() to perform fuzzy matching. These functions return instances of FuzzymatchResult, a subclass of Name.
- job_impact() is a method of Park that estimates the impact of a Job on the park.
- Job describes the job for which the impact is to be estimated. Its attribute cluster_name must correspond to a cluster defined in the Park. It also includes a node count, indicating the number of nodes used by the job.
- ["gwp"] selects the global warming potential impact, expressed in gCO₂eq. This returns an instance of ReplicableValue.
- On instances of ReplicableValue, you can call explain(), which by default returns a detailed explanation including the formula and sources used to compute the impact. To get only the value, use the value attribute, which is a Quantity object from the Pint library; you can extract a float with jean_zay.job_impact(...)["gwp"].value.magnitude.
- You can also pass a formatter to explain() to output either JSON or HTML instead of text.
Unknown CPU/GPU model name#
If you don’t know the exact CPU and GPU models you are using, you can instead provide the information you do have about the components:
- CPU.die_size, GPU.die_size — the surface area of the die
- CPU.model_range — the model range, for example EPYC, Athlon X4, Core i5, Xeon Gold, etc.
I already know my CPU/GPU consumption/embodied impact#
If you already know the consumption or embodied impact of your component, you can specify it like this:
Park(
clusters={
"cluster_name": Cluster(
servers_count=720,
server_model=Server(
components=[
(CPU(electric_power="500 W"), 2),
# ...
],
),
),
# ...
}
# ...
)
Approximate values#
If you only have approximate values for a parameter, you can specify the uncertainty of your data.
Let’s say you only know that the number of nodes in the cpu_p1 cluster is between 400 and 600, and that each node has between 128 and 256 GB of RAM.
You can use SourcedValue to specify minimum and maximum values:
jean_zay = Park(
clusters={
# ...
"cpu_p1": Cluster(
servers_count=SourcedValue(name="cpu_p1_server_count", value="500", min="400", max="600"),
server_model=Server(
components=[
# ...
(RAM(size=SourcedValue(name="cpu_p1_server_model_RAM_size", value="196 GB", min="128 GB", max="256 GB")), 1),
# ...
],
),
),
# ...
}
# ...
)
Or, let’s say you know the number of nodes in the cpu_p1 cluster is around 700 ±30, and that each node has 128 GB of RAM ±64 GB:
jean_zay = Park(
clusters={
# ...
"cpu_p1": Cluster(
servers_count=SourcedValue(name="cpu_p1_server_count", value="700", standard_deviation="30"),
server_model=Server(
components=[
# ...
(RAM(size=SourcedValue(name="cpu_p1_server_model_RAM_size", value="128 GB", standard_deviation="64 GB")), 1),
# ...
],
),
),
# ...
}
# ...
)
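To build intuition for what the library does with these bounds, here is a minimal sketch of how min/central/max values propagate through a multiplication. This is plain interval arithmetic, not the library's actual uncertainty machinery:

```python
# Park-wide RAM under uncertainty: 400-600 servers, each with 128-256 GB.
# For a product of positive intervals, the bounds simply multiply through.
servers = {"min": 400, "value": 500, "max": 600}
ram_gb = {"min": 128, "value": 196, "max": 256}

total_ram_gb = {k: servers[k] * ram_gb[k] for k in ("min", "value", "max")}
print(total_ram_gb)  # {'min': 51200, 'value': 98000, 'max': 153600}
```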
See Uncertainty for more information.
Embodied Impacts vs. Usage Impacts#
The embodied impacts of a component, server, or data center refer to the impacts resulting from the extraction, production, and distribution phases.
Usage impacts refer to the impacts from the usage phase, primarily due to energy consumption.
The embedded impacts of a job are the share of embodied impacts attributed to that job. A job does not have a “production impact” on its own — it simply runs on a server within a data center. However, the components of that data center do have embodied impacts. The job, and the people running it, are considered responsible for a share of the data center’s embodied impacts. This share is what we call the embedded impact of the job. An Allocation Method is used to assign a portion of these embodied impacts to the job.
Allocation Method#
An allocation method defines how to attribute a share of the embodied impacts of the infrastructure and components to a specific job.
In ImpactHPC, allocation methods must be of type AllocationMethod, which is a type alias for functions that take the embodied impacts, the job duration and the usage rate of the component as parameters, and return the embedded impacts (i.e., the portion of the embodied impacts attributed to the job).
type AllocationMethod = Callable[[Impact, ReplicableValue, ReplicableValue], Impact]
ImpactHPC includes two allocation methods by default:
- naive_allocation() takes a lifetime parameter and returns an Allocation Method that assigns a share of the embodied impacts corresponding to the fraction of the component’s lifetime consumed by the job. For example, if a server has a 3-year lifetime and a job runs for 3 days, the job represents 0.27% of the total lifetime, so it is assigned 0.27% of the embodied impacts.
- impacthpc.decrease_over_time_allocation() takes an age parameter (the age of the component or infrastructure) and returns an Allocation Method that attributes 50% of the embodied impacts to jobs run during the first year of use, 25% to the second year, 12.5% to the third year, and so on. The naive allocation method may encourage the use of new infrastructure, which is often more energy efficient — but this can lead to premature renewal of hardware, increasing environmental impact. The decrease-over-time method mitigates this by attributing a larger share of impacts to early usage and decreasing shares as the equipment ages.
- Creating your own custom allocation method: any function matching the AllocationMethod type alias can be used. It must accept three arguments: the embodied impacts (of type Impact), the job duration (of type ReplicableValue), and the usage rate of the component (also of type ReplicableValue).
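The arithmetic behind both built-in methods can be sketched in plain Python. This uses bare floats instead of the library's Impact and ReplicableValue types, so it illustrates the allocation formulas only, not the actual API:

```python
def naive_share(job_duration_h: float, lifetime_h: float) -> float:
    """Share of embodied impacts allocated to a job: the fraction of the
    component's lifetime that the job consumes."""
    return job_duration_h / lifetime_h

def decrease_over_time_year_weight(year: int) -> float:
    """Share of embodied impacts attributed to year N of use (1-indexed):
    50% for year 1, 25% for year 2, 12.5% for year 3, and so on."""
    return 0.5 ** year

# A 3-day job on a server with a 3-year lifetime gets ~0.27% of the impacts.
print(f"{naive_share(3 * 24, 3 * 365 * 24):.2%}")  # 0.27%
print(decrease_over_time_year_weight(1))           # 0.5
print(decrease_over_time_year_weight(3))           # 0.125
```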
You can configure the allocation method used for server, cooling, and battery embodied impacts. For example:
jean_zay = Park(
clusters={
# ...
"cpu_p1": Cluster(
servers_count=720,
server_model=Server(
components=[
# ...
],
allocation_method=decrease_over_time_allocation(age="3 years")
),
cooling=Cooling(
cooling_power="200 kW",
allocation_method=naive_allocation(lifetime="10 years")
),
batteries=[
Battery(
cooling_power="200 kW",
allocation_method=naive_allocation(lifetime="10 years")
)
]
),
# ...
},
)
Configuration#
The default configuration file is located at ./impacthpc/config.yml.
To use a custom configuration file, define the environment variable IMPACTHPC_CONFIG_FILE before importing the library.
This configuration file is written in YAML and includes inline comments explaining the purpose of each entry.
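For example, pointing the library at a custom file could look like this (the path is illustrative):

```python
import os

# The variable must be set before the library is imported, since the
# configuration is read at import time.
os.environ["IMPACTHPC_CONFIG_FILE"] = "/path/to/my_config.yml"

# import impacthpc  # import only after the variable is set
```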
How to Modify or Adapt ImpactHPC to My Use Case?#
This library is designed to be easily readable and modifiable. You can start by following the Developer Guide.