Developer Guide#
This short document explains the core concepts of ImpactHPC and can help you modify and adapt it to your specific needs.
How-to Examples#
How to Edit a Formula#
Example use case: Let’s say that for your needs, you want to take into account the engraving thickness of the CPU die when computing its embodied impacts.
You can find the CPU class in impacthpc/cpu.py. The code is designed to be readable; you’ll find the formula die_size_factor * die_size + base in the method embodied_impacts(). Here, die_size is an instance of ReplicableValue, while die_size_factor and base are instances of Impact.
You don’t need to understand the internals of these classes to use them — just know that they behave like floats, but they also track the origin of the values and compute uncertainties automatically.
(If you’re curious, see the ReplicableValue section.)
To add support for engraving thickness, start by defining a new attribute in the class. This attribute should be a ReplicableValue, but since it’s easier for users to pass strings, you can accept a ReplicableValue | str | None in the constructor, with a default of None:
def __init__(
    # ...,
    engraving_thickness: ReplicableValue | str | None = None,
):
    # ...
Now, if engraving_thickness is a string, you’ll want to convert it to a ReplicableValue. If it’s None, you’ll want to fall back to a default value defined in the Configuration. To handle conversion from a str, int, or float to a ReplicableValue, use the constructor SourcedValue.from_argument. To handle the default value fallback, use Python’s or operator and the SourcedValue.from_config constructor:
def __init__(
    # ...,
    engraving_thickness: ReplicableValue | str | None = None,
):
    # ...
    self.engraving_thickness = (
        SourcedValue.from_argument("engraving_thickness", engraving_thickness)
        or SourcedValue.from_config("engraving_thickness", config["default_values_cpu"]["engraving_thickness"])
    )
from_argument() will convert the engraving_thickness parameter to a ReplicableValue if necessary. If the argument is None, the or operator ensures we fall back to the config-defined default using from_config(). Both methods take a first parameter, name, which is used by explain() to provide meaningful explanations in the output.
Of course, you’ll also need to update the default Configuration in impacthpc/config.yml to add a default value for engraving_thickness:
...
default_values_cpu:
  ...
  engraving_thickness:
    value: "12 nm"
    explaination: "explain what this value is here"
    source: "quote your source here."
    min: "4 nm" # optional
    max: "24 nm" # optional
    standard_deviation: "5 nm" # optional
Now that you’ve added a new attribute, you can edit the formula for estimating the embodied impacts in embodied_impacts().
embodied_impacts = (... your formula using self.engraving_thickness here ...)

embodied_impacts.make_intermediate_result(
    "cpu_embodied_impacts",
    "Embodied impacts of the CPU, estimated from its die surface area and the impact per mm² of die, plus a base impact that accounts for constant impacts common to all CPUs, such as power delivery, packaging, transport, etc.",
)

return embodied_impacts
make_intermediate_result() is used to create named intermediate results that will appear in the output of explain(). See the ReplicableValue section for more details.
You may also want to use data from impacthpc/data/crowdsourcing/cpu_specs.csv instead of relying solely on config values.
To do so, you can first check self.cpus_matching_by_name, a pandas.DataFrame that contains only the CPU entries matching the user-provided name. If no match is found, you can use CPUs_specs, which contains the full dataset.
def __init__(
    ...,
    engraving_thickness: ReplicableValue | None = ReplicableValue.from_config(
        "engraving_thickness",
        config["default_values_cpu"]["engraving_thickness"]
    )
):
    ...
    self.engraving_thickness = (
        engraving_thickness
        or self.cpus_matching_by_name["io_process_size"].mean()  # fallback: use the average process size of matched CPUs
    )
Project Structure#
- docs | Project documentation
  - build | HTML/CSS files generated by the Sphinx documentation builder
  - source | Source files for the Sphinx documentation
- impacthpc | Main ImpactHPC module
  - config.yml | Configuration file containing default values and settings
  - data | Data files used for estimations, including component specs and national energy mixes
  - src | Main source code of the library
    - cpu.py, ram.py, park.py… | Modules computing the impacts of each component or infrastructure. EDIT THESE!
    - core | Core engine of the library, containing most of the internal logic and computation tools
Core Concepts#
ImpactHPC’s philosophy is to concentrate complexity inside the core module.
This module contains reusable classes and functions, so that the rest of the code—responsible for component and infrastructure impact calculations—remains easy to read and modify.
ReplicableValue#
ReplicableValue is a class that encapsulates:
the origin and explanation of numerical values,
their uncertainty metadata, and
the operations performed on them.
In ImpactHPC, every number and computation is represented as a tree.

Any number used in the library is represented as a ReplicableValue. This is an abstract class, with two concrete subclasses:
SourcedValue represents a constant or externally sourced value. It includes a human-readable explanation and citation in its source attribute.
Operation represents a computation node. It stores the mathematical operator (plus, minus, divide, multiply, ln, …) and a list of operands (other ReplicableValue instances involved in the operation).
Every ReplicableValue instance has:
value: a string-formatted number with a unit (using the Pint unit system),
min and max: optional bounds (also using Pint units),
standard_deviation: see the Uncertainty section,
warnings, explanation, and ontology: fields for annotating results, used by explain().
The class overrides arithmetic operators (+, -, *, /) so you can manipulate ReplicableValue objects with standard syntax.
During these operations, the min, max, and standard_deviation are automatically propagated.
To make results interpretable, the method explain() returns a full explanation of how a value was obtained, including its sources and formula.
Three output formats are available:
TextFormatter – for terminal output
JSONFormatter – for machine-readable logs
HTMLFormatter – for interactive web-based output
The Impacts class is a dictionary-like structure, mapping impact indicators (like “GWP”, “ADPE”, “PE”…) to their respective ReplicableValue instances. It also supports arithmetic operations: adding two Impacts instances adds their corresponding indicators automatically.
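For example, here is a minimal sketch of how these pieces fit together. The import path, the variable names (gwp_per_mm2), the numeric values, and the exact call pattern of explain() are assumptions for illustration; the only calls taken directly from this guide are from_argument() and the overloaded arithmetic:

from impacthpc.src.core.replicable_value import SourcedValue  # import path assumed; adjust to where SourcedValue actually lives

# Two sourced values with units; from_argument() takes the name shown by explain() as its first parameter
die_size = SourcedValue.from_argument("die_size", "150 mm²")
gwp_per_mm2 = SourcedValue.from_argument("gwp_per_mm2", "1.97 gCO2eq / mm²")

# Arithmetic works as with floats; sources, bounds, and standard deviations are tracked automatically
embodied_gwp = die_size * gwp_per_mm2

# explain() returns the full derivation tree (formatter selection omitted here; see the Formatters section)
print(embodied_gwp.explain())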
Formatters#
TextFormatter, HTMLFormatter, and JSONFormatter are the three default formatters that come with ImpactHPC. You can create a formatter by subclassing the class Formatter. You should override the following methods:
format_sourced_value(): Returns a SourcedValue in the format you are targeting.
format_operation(): Returns an Operation in the format you are targeting. Warning: at this point, the operands parameter contains a list of already formatted Operations and SourcedValues. They are already in your target format, and you should build this operation’s formatted version from those already formatted operands.
format_extracted_important_values(): Your formatter can (but is not required to) extract important values and produce a short summary of their values and the hypotheses made. TextFormatter adds a short description of these values at the beginning of the text it returns. This method formats those important values, and the formatted important values are passed to format_result(). If you don’t want this short summary, return anything here and ignore the extracted_important_values parameter in format_result().
format_result(): The last call to format_operation() returns the whole formatted result. It is then passed to format_result() for final processing. For example, the format_operation of JSONFormatter returns dicts; the final dict is passed to format_result, where the whole dict is converted to JSON using Python’s json.dumps function. If you don’t want to do any processing, just return the result parameter.
Formatter is an abstract class that declares two Python generic types, called T and U. T is the return type of the methods format_sourced_value(), format_operation(), and format_extracted_important_values(). It is also the type of the result parameter of format_result(). U is the final return type of the formatter, returned by format_result().
Generics are used for type hinting (they are development-time only). Any subclass must give a concrete type to T and U. The concrete type of T must be the “working type” used by format_sourced_value(), format_operation(), and format_extracted_important_values(). U is the final return type of format_result(), the target type of this formatter. The two types can be the same, as in TextFormatter, or different, as in JSONFormatter. Using generics allows great flexibility in the types returned and used by the methods of Formatter subclasses, while ensuring that they are consistent across the different methods.
For example, JSONFormatter’s methods return dicts, format_operation composes those dicts into a bigger one, and the final dict is passed to format_result. At this point, the dict is converted to str. This looks like:
class JSONFormatter(Formatter[dict, str]):  # T is dict, U is str
    def format_sourced_value(
        # ...
    ) -> dict:  # the abstract signature says format_sourced_value must return something of type T, which is dict here
        # create a dict from the params...
        ...

    def format_operation(
        # ...
        operands: list[dict],  # the operands param is a list of values of type T, which is dict here
        # ...
    ) -> dict:  # format_operation must also return something of type T (dict here)
        # create a dict from the params, especially the sub-dicts contained in operands
        ...

    def format_extracted_important_values(
        # ...
    ) -> dict:  # format_extracted_important_values must return something of type T (dict here)
        ...

    def format_result(
        # ...
        result: dict,  # the result param of format_result must be of type T (dict here)
        # ...
    ) -> str:  # format_result must return something of type U, which is str here
        # convert result to a str using json.dumps
        ...
Uncertainty#
If you haven’t read the ReplicableValue section, we recommend doing so first.
Uncertainty propagation in ImpactHPC is handled by the ReplicableValue system. There are currently two complementary representations for uncertainty:
Bounds-based: via the min and max attributes
Probabilistic: via a standard_deviation
Both are optional, and can be combined or used independently depending on the data and context.
Min and Max#
When instantiating a SourcedValue, you can provide optional min and max values to define an uncertainty range. These bounds are propagated through arithmetic operations: the resulting Operation will compute its own min and max based on the operands.
For example, consider two SourcedValue instances:
A = 3, with min = 1 and max = 7
B = 5, with min = 2 and max = 6
If you compute A + B, the result will be:
value = 8
min = 1 + 2 = 3
max = 7 + 6 = 13
This kind of arithmetic is known as interval arithmetic. Similar rules apply for other operations; see Wikipedia: Interval operators.
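As a purely illustrative sketch in plain Python (independent of the library’s API), the interval rules for addition and multiplication look like this:

def interval_add(a, a_min, a_max, b, b_min, b_max):
    # Addition: the bounds are added pairwise
    return a + b, a_min + b_min, a_max + b_max

def interval_mul(a, a_min, a_max, b, b_min, b_max):
    # Multiplication: the bounds are the smallest and largest cross products of the operand bounds
    products = [a_min * b_min, a_min * b_max, a_max * b_min, a_max * b_max]
    return a * b, min(products), max(products)

print(interval_add(3, 1, 7, 5, 2, 6))  # (8, 3, 13), matching the example above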
A key limitation of this approach is that uncertainties always accumulate, even when the variables might compensate each other in practice. In reality, one variable may be slightly overestimated while another is underestimated, narrowing the range of the result—but interval arithmetic cannot account for this. This is why ImpactHPC also supports a second, complementary system: random variable-based uncertainty.
Random Variable#
Each ReplicableValue can also be interpreted as a random variable, where:
value represents the expected value (mean),
standard_deviation represents the statistical uncertainty.
When combining two values (e.g., A + B), their standard deviations are propagated mathematically—but only if we know how the two variables are correlated. This is a subtle but essential distinction.
Consider:
A and B are independent random variables: if A is overestimated, it tells us nothing about B.
C is a variable, and we compute C - C: the result is zero, with zero standard deviation, because the operands are perfectly correlated (they’re the same variable).
So, correlation affects the uncertainty propagation.
However, tracking the correlation between every value in a complex system like a server is infeasible. For example, what’s the correlation between the die size of an Intel Core i7-1360P and France’s electricity carbon intensity? We don’t know.
To manage this complexity, ImpactHPC uses a binary model of correlation:
Either values are independent (correlation = 0),
Or values are fully positively correlated (correlation = 1).
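Concretely, for a sum of two values these two modes reduce to the standard propagation formulas, illustrated here with plain Python rather than the library’s API:

import math

sigma_a, sigma_b = 2.0, 3.0

# Independent (correlation = 0): variances add
sigma_sum_independent = math.sqrt(sigma_a**2 + sigma_b**2)  # ~3.61

# Fully positively correlated (correlation = 1): standard deviations add
sigma_sum_dependent = sigma_a + sigma_b  # 5.0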
The correlation mode can be explicitly controlled using a Python context manager:
with ReplicableValue.set_correlation_mode(CorrelationMode.DEPENDENT):
    # All standard deviations computed here assume full correlation (1.0)
    result = C - C

# Outside the context, correlation is assumed to be 0 (independent)
result2 = A + B
In practice:
The impact of CPUs in a server is considered mutually dependent, even across different models.
The same goes for GPUs or other component groups.
However, CPUs and GPUs are considered independent of each other.
This assumption applies to any two different component types (CPU vs RAM, storage vs GPU, etc.).
This correlation model allows uncertainty to be propagated realistically while keeping the logic tractable.
Pint#
Pint is the library used by ImpactHPC to handle units. For example, if I have two values:
memory_size: the size of a RAM module, expressed in gigabytes
memory_power_factor: the power usage of 1 gigabyte of memory, expressed in watts per gigabyte
Using Pint, I can declare those values with their units:
from impacthpc.src.core.config import ureg # ureg is an instance of UnitRegistry with all the useful units like gCO2eq, gSbeq...
memory_size = ureg.Quantity("10 GB")
memory_power_factor = ureg.Quantity("0.284 W / GB")
And if I multiply those two values, Pint automatically finds the result’s unit:
print(memory_size * memory_power_factor) # 2.84 W
Here is another example: let’s say we have a job that runs for 48 hours on a CPU whose lifetime is 6 years. With a naive allocation method, we would compute the embedded impact of the CPU as job_duration / cpu_lifetime * cpu_embodied_impact, where:
job_duration is the duration of the job, expressed in hours
cpu_lifetime is the lifetime of the CPU, expressed in years
cpu_embodied_impact is the embodied impact of the CPU (let’s say its greenhouse gas emissions), expressed in gCO2eq.
The embedded impact of the CPU is:
from impacthpc.src.core.config import ureg # ureg is an instance of UnitRegistry with all the useful units like gCO2eq, gSbeq...
job_duration = ureg.Quantity("48 h")
cpu_lifetime = ureg.Quantity("6 years")
cpu_embodied_impact = ureg.Quantity("45 kgCO2eq")
cpu_embedded_impact = job_duration / cpu_lifetime * cpu_embodied_impact
print(cpu_embedded_impact) # 360.0 hour * kgCO2eq / year
By default, Pint does not reduce the units; it keeps the hour / year as is. We can call to_reduced_units() to reduce the units:
print(cpu_embedded_impact.to_reduced_units()) # 360.0 kgCO2eq
Nice! Because “48 h” and “6 years” are two durations, we can simplify the unit. Duration is the dimension of the hour and the year units. But Pint has some limitations. For example, physicists don’t think that the bit has a dimension; it is “dimensionless.” This means that when we call to_reduced_units() on a value whose unit is byte, kilobyte, gigabyte, etc., the unit disappears!
Overriding the definition of the bit isn’t possible without rewriting the whole unit file of Pint. That’s why the module impacthpc/src/core/config.py reads the file impacthpc/src/core/units/units.txt and loads its UnitRegistry from there. It declares an alternative definition of the bit, with its own dimension (the “information”), and declares two other units and dimensions: CO2Mass, expressed in gCO2eq, and SbMass, expressed in gSbeq. It also declares an alias for W.h (watt hour), called Wh.
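In practice, this means quantities should be built from the registry exported by ImpactHPC rather than a stock Pint registry, so that bytes keep their own dimension. A small sketch (the exact printed dimensionality string is indicative, not guaranteed):

from impacthpc.src.core.config import ureg  # custom registry loaded from units.txt

memory = ureg.Quantity("10 GB")
print(memory.dimensionality)  # with the custom definitions, bytes carry an "information" dimension instead of being dimensionless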
This limitation means that the maintainers should follow Pint’s unit-file updates and update impacthpc/src/core/units/units.txt accordingly.
The format_reduced() and format_not_reduced() functions are used to format values in the formatters. For CO2Mass and SbMass, if a value exceeds 1e06 gCO2eq or 1e06 gSbeq, it is converted to TCO2eq or TSbeq.
Deploy updates#
To deploy updates, make sure that requirements.txt is up to date if you have added new libraries. You can use pip-chill to list dependencies installed in your environment, or pipreqs to list the dependencies actually used in the project.
To start, run:
./pre-commit-checks.sh
This will:
- format the code
- run various checks
- run pytest
Update the project version in pyproject.toml and in docs/conf.py (change release = "x.y.z").
Build the project:
python -m build
Upload the tar.gz file to test.pypi.org:
python3 -m twine upload --verbose --repository testpypi dist/<package_name>-<latest_version>.tar.gz
Now you can test that everything is working nicely in a new Python project by creating a virtual environment and installing the package using:
python -m venv .venv
pip install --extra-index-url=https://test.pypi.org/simple/ impacts-hpc
If everything is working nicely, you can upload to the real PyPI repository:
python3 -m twine upload --verbose --repository pypi dist/<package_name>-<latest_version>.tar.gz