Developer Guide#
This short document explains the core concepts of ImpactHPC and can help you modify and adapt it to your specific needs.
How-to Examples#
How to Edit a Formula#
Example use case: Let’s say that for your needs, you want to take into account the engraving thickness of the CPU die when computing its embodied impacts.
You can find the CPU class in impacthpc/cpu.py. The code is designed to be readable; you’ll find the formula die_size_factor * die_size + base in the method embodied_impacts(). Here, die_size is an instance of ReplicableValue, while die_size_factor and base are instances of Impact.
You don’t need to understand the internals of these classes to use them — just know that they behave like floats, but they also track the origin of the values and compute uncertainties automatically.
(If you’re curious, see the ReplicableValue section.)
To add support for engraving thickness, start by defining a new attribute in the class. This attribute should be a ReplicableValue, but since it’s easier for users to pass strings, you can accept a ReplicableValue | str | None in the constructor, with a default of None:
def __init__(
    # ...,
    engraving_thickness: ReplicableValue | str | None = None,
):
    # ...
Now, if engraving_thickness is a string, you’ll want to convert it to a ReplicableValue. If it’s None, you’ll want to fall back to a default value defined in the Configuration. To handle conversion from a str, int, or float to a ReplicableValue, use the constructor SourcedValue.from_argument. To handle the default value fallback, use Python’s or operator and the SourcedValue.from_config constructor:
def __init__(
    # ...,
    engraving_thickness: ReplicableValue | str | None = None,
):
    # ...
    self.engraving_thickness = (
        SourcedValue.from_argument("engraving_thickness", engraving_thickness)
        or SourcedValue.from_config("engraving_thickness", config["default_values_cpu"]["engraving_thickness"])
    )
from_argument() will convert the engraving_thickness parameter to a ReplicableValue if necessary. If the argument is None, the or operator ensures we fall back to the config-defined default using from_config(). Both methods take a first parameter, name, which is used by explain() to provide meaningful explanations in the output.
Of course, you’ll also need to update the default Configuration in impacthpc/config.yml to add a default value for engraving_thickness:
...
default_values_cpu:
  ...
  engraving_thickness:
    value: "12 nm"
    explaination: "explain what this value is here"
    source: "quote your source here."
    min: "4 nm" # optional
    max: "24 nm" # optional
    standard_deviation: "5 nm" # optional
Now that you’ve added a new attribute, you can edit the formula for estimating the embodied impacts in embodied_impacts().
embodied_impacts = (... your formula using self.engraving_thickness here ...)

embodied_impacts.make_intermediate_result(
    "cpu_embodied_impacts",
    "Embodied impacts of the CPU, estimated from its die surface area and the impact per mm² of die, plus a base impact that accounts for constant impacts common to all CPUs, such as power delivery, packaging, transport, etc.",
)

return embodied_impacts
make_intermediate_result() is used to create named intermediate results that will appear in the output of explain(). See the ReplicableValue section for more details.
You may also want to use data from impacthpc/data/crowdsourcing/cpu_specs.csv instead of relying solely on config values.
To do so, you can first check self.cpus_matching_by_name, a pandas.DataFrame that contains only the CPU entries matching the user-provided name. If no match is found, you can use CPUs_specs, which contains the full dataset.
def __init__(
    ...,
    engraving_thickness: ReplicableValue | None = ReplicableValue.from_config(
        "engraving_thickness",
        config["default_values_cpu"]["engraving_thickness"]
    )
):
    ...
    self.engraving_thickness = (
        engraving_thickness
        or self.cpus_matching_by_name["io_process_size"].mean()  # fallback: use the average process size of matched CPUs
    )
Project Structure#
- docs | Project documentation
  - build | HTML/CSS files generated by the Sphinx documentation builder
  - source | Source files for the Sphinx documentation
- impacthpc | Main ImpactHPC module
  - config.yml | Configuration file containing default values and settings
  - data | Data files used for estimations, including component specs and national energy mixes
  - src | Main source code of the library
    - cpu.py, ram.py, park.py… | Modules computing the impacts of each component or infrastructure. EDIT THESE!
    - core | Core engine of the library, containing most of the internal logic and computation tools
Core Concepts#
ImpactHPC’s philosophy is to concentrate complexity inside the core module.
This module contains reusable classes and functions, so that the rest of the code—responsible for component and infrastructure impact calculations—remains easy to read and modify.
ReplicableValue#
ReplicableValue is a class that encapsulates:
the origin and explanation of numerical values,
their uncertainty metadata, and
the operations performed on them.
In ImpactHPC, every number and computation is represented as a tree.

Any number used in the library is represented as a ReplicableValue. This is an abstract class, with two concrete subclasses:
SourcedValue represents a constant or externally sourced value. It includes a human-readable explanation and citation in its source attribute.
Operation represents a computation node. It stores the mathematical operator (plus, minus, divide, multiply, ln, …) and a list of operands (other ReplicableValue instances involved in the operation).
Every ReplicableValue instance has:
value: a string-formatted number with a unit (using the Pint unit system),
min and max: optional bounds (also using Pint units),
standard_deviation: see the Uncertainty section,
warnings, explanation, and ontology: fields for annotating results, used by explain().
The class overrides arithmetic operators (+, -, *, /) so you can manipulate ReplicableValue objects with standard syntax.
During these operations, the min, max, and standard_deviation are automatically propagated.
To make results interpretable, the method explain() returns a full explanation of how a value was obtained, including its sources and formula.
Three output formats are available:
TextFormatter – for terminal output
JSONFormatter – for machine-readable logs
HTMLFormatter – for interactive web-based output
The Impacts class is a dictionary-like structure, mapping impact indicators (like “GWP”, “ADPE”, “PE”…) to their respective ReplicableValue instances. It also supports arithmetic operations: adding two Impacts instances adds their corresponding indicators automatically.
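For example, here is a minimal sketch of how these pieces fit together. The import path, the variable names (gwp_per_mm2), the numeric values, and the exact call pattern of explain() are assumptions for illustration; the only calls taken directly from this guide are from_argument() and the overloaded arithmetic:

from impacthpc.src.core.replicable_value import SourcedValue  # import path assumed; adjust to where SourcedValue actually lives

# Two sourced values with units; from_argument() takes the name shown by explain() as its first parameter
die_size = SourcedValue.from_argument("die_size", "150 mm²")
gwp_per_mm2 = SourcedValue.from_argument("gwp_per_mm2", "1.97 gCO2eq / mm²")

# Arithmetic works as with floats; sources, bounds, and standard deviations are tracked automatically
embodied_gwp = die_size * gwp_per_mm2

# explain() returns the full derivation tree (formatter selection omitted here; see the Formatters section)
print(embodied_gwp.explain())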
Formatters#
TextFormatter, HTMLFormatter, and JSONFormatter are the three default formatters that come with ImpactHPC. You can create a formatter by subclassing the class Formatter. You should override the following methods:
format_sourced_value(): Returns a SourcedValue in the format you are targeting.
format_operation(): Returns an Operation in the format you are targeting. Warning: at this point, the operands parameter contains a list of already formatted Operations and SourcedValues. They are already in your target format, and you should build this operation’s formatted version from those already formatted operands.
format_extracted_important_values(): Your formatter can (but is not required to) extract important values and produce a short summary of their values and the hypotheses made. TextFormatter adds a short description of these values at the beginning of the text it returns. This method formats those important values, and the formatted important values are passed to format_result(). If you don’t want this short summary, return anything here and ignore the extracted_important_values parameter in format_result().
format_result(): The last call to format_operation() returns the whole formatted result. It is then passed to format_result() for final processing. For example, the format_operation of JSONFormatter returns dicts; the final dict is passed to format_result, where the whole dict is converted to JSON using Python’s json.dumps function. If you don’t want to do any processing, just return the result parameter.
Formatter is an abstract class that declares two Python generic types, called T and U. T is the return type of the methods format_sourced_value(), format_operation(), and format_extracted_important_values(). It is also the type of the result parameter of format_result(). U is the final return type of the formatter, returned by format_result().
Generics are used for type hinting (they are development-time only). Any subclass must give a concrete type to T and U. The concrete type of T must be the “working type” used by format_sourced_value(), format_operation(), and format_extracted_important_values(). U is the final return type of format_result(), the target type of this formatter. The two types can be the same, as in TextFormatter, or different, as in JSONFormatter. Using generics allows great flexibility in the types returned and used by the methods of Formatter subclasses, while ensuring that they are consistent across the different methods.
For example, JSONFormatter’s methods return dicts, format_operation composes those dicts into a bigger one, and the final dict is passed to format_result. At this point, the dict is converted to str. This looks like:
class JSONFormatter(Formatter[dict, str]):  # T is dict, U is str
    def format_sourced_value(
        # ...
    ) -> dict:  # the abstract signature says format_sourced_value must return something of type T, which is dict here
        # create a dict from the params...
        ...

    def format_operation(
        # ...
        operands: list[dict],  # the operands param is a list of values of type T, which is dict here
        # ...
    ) -> dict:  # format_operation must also return something of type T (dict here)
        # create a dict from the params, especially the sub-dicts contained in operands
        ...

    def format_extracted_important_values(
        # ...
    ) -> dict:  # format_extracted_important_values must return something of type T (dict here)
        ...

    def format_result(
        # ...
        result: dict,  # the result param of format_result must be of type T (dict here)
        # ...
    ) -> str:  # format_result must return something of type U, which is str here
        # convert result to a str using json.dumps
        ...
Uncertainty#
If you haven’t read the ReplicableValue section, we recommend doing so first.
Uncertainty propagation in ImpactHPC is handled by the ReplicableValue system. There are currently two complementary representations for uncertainty:
Bounds-based: via the min and max attributes
Probabilistic: via a standard_deviation
Both are optional, and can be combined or used independently depending on the data and context.
Min and Max#
When instantiating a SourcedValue, you can provide optional min and max values to define an uncertainty range. These bounds are propagated through arithmetic operations: the resulting Operation will compute its own min and max based on the operands.
For example, consider two SourcedValue instances:
A = 3, with min = 1 and max = 7
B = 5, with min = 2 and max = 6
If you compute A + B, the result will be:
value = 8
min = 1 + 2 = 3
max = 7 + 6 = 13
This kind of arithmetic is known as interval arithmetic. Similar rules apply for other operations; see Wikipedia: Interval operators.
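As a purely illustrative sketch in plain Python (independent of the library’s API), the interval rules for addition and multiplication look like this:

def interval_add(a, a_min, a_max, b, b_min, b_max):
    # Addition: the bounds are added pairwise
    return a + b, a_min + b_min, a_max + b_max

def interval_mul(a, a_min, a_max, b, b_min, b_max):
    # Multiplication: the bounds are the smallest and largest cross products of the operand bounds
    products = [a_min * b_min, a_min * b_max, a_max * b_min, a_max * b_max]
    return a * b, min(products), max(products)

print(interval_add(3, 1, 7, 5, 2, 6))  # (8, 3, 13), matching the example above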
A key limitation of this approach is that uncertainties always accumulate, even when the variables might compensate each other in practice. In reality, one variable may be slightly overestimated while another is underestimated, narrowing the range of the result—but interval arithmetic cannot account for this. This is why ImpactHPC also supports a second, complementary system: random variable-based uncertainty.
Random Variable#
Each ReplicableValue can also be interpreted as a random variable, where:
value represents the expected value (mean),
standard_deviation represents the statistical uncertainty.
When combining two values (e.g., A + B), their standard deviations are propagated mathematically—but only if we know how the two variables are correlated. This is a subtle but essential distinction.
Consider:
A and B are independent random variables: if A is overestimated, it tells us nothing about B.
C is a variable, and we compute C - C: the result is zero, with zero standard deviation, because the operands are perfectly correlated (they’re the same variable).
So, correlation affects the uncertainty propagation.
However, tracking the correlation between every value in a complex system like a server is infeasible. For example, what’s the correlation between the die size of an Intel Core i7-1360P and France’s electricity carbon intensity? We don’t know.
To manage this complexity, ImpactHPC uses a binary model of correlation:
Either values are independent (correlation = 0),
Or values are fully positively correlated (correlation = 1).
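Concretely, for a sum of two values these two modes reduce to the standard propagation formulas, illustrated here with plain Python rather than the library’s API:

import math

sigma_a, sigma_b = 2.0, 3.0

# Independent (correlation = 0): variances add
sigma_sum_independent = math.sqrt(sigma_a**2 + sigma_b**2)  # ~3.61

# Fully positively correlated (correlation = 1): standard deviations add
sigma_sum_dependent = sigma_a + sigma_b  # 5.0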
The correlation mode can be explicitly controlled using a Python context manager:
with ReplicableValue.set_correlation_mode(CorrelationMode.DEPENDENT):
    # All standard deviations computed here assume full correlation (1.0)
    result = C - C

# Outside the context, correlation is assumed to be 0 (independent)
result2 = A + B
In practice:
The impact of CPUs in a server is considered mutually dependent, even across different models.
The same goes for GPUs or other component groups.
However, CPUs and GPUs are considered independent of each other.
This assumption applies to any two different component types (CPU vs RAM, storage vs GPU, etc.).
This correlation model allows uncertainty to be propagated realistically while keeping the logic tractable.
Pint#
Pint is the library used by ImpactHPC to handle units. For example, if I have two values:
memory_size: the size of a RAM module, expressed in gigabytes
memory_power_factor: the power usage of 1 gigabyte of memory, expressed in watts per gigabyte
Using Pint, I can declare those values with their units:
from impacthpc.src.core.config import ureg # ureg is an instance of UnitRegistry with all the useful units like gCO2eq, gSbeq...
memory_size = ureg.Quantity("10 GB")
memory_power_factor = ureg.Quantity("0.284 W / GB")
And if I multiply those two values, Pint automatically finds the result’s unit:
print(memory_size * memory_power_factor) # 2.84 W
Here is another example: let’s say we have a job that runs for 48 hours on a CPU whose lifetime is 6 years. With a naive allocation method, we would compute the embedded impact of the CPU as job_duration / cpu_lifetime * cpu_embodied_impact, where:
job_duration is the duration of the job, expressed in hours
cpu_lifetime is the lifetime of the CPU, expressed in years
cpu_embodied_impact is the embodied impact of the CPU (let’s say its greenhouse gas emissions), expressed in gCO2eq.
The embedded impact of the CPU is:
from impacthpc.src.core.config import ureg # ureg is an instance of UnitRegistry with all the useful units like gCO2eq, gSbeq...
job_duration = ureg.Quantity("48 h")
cpu_lifetime = ureg.Quantity("6 years")
cpu_embodied_impact = ureg.Quantity("45 kgCO2eq")
cpu_embedded_impact = job_duration / cpu_lifetime * cpu_embodied_impact
print(cpu_embedded_impact) # 360.0 hour * kgCO2eq / year
By default, Pint does not reduce the units; it keeps the hour / year as is. We can call to_reduced_units() to reduce the units:
print(cpu_embedded_impact.to_reduced_units()) # 360.0 kgCO2eq
Nice! Because “48 h” and “6 years” are two durations, we can simplify the unit. Duration is the dimension of the hour and the year units. But Pint has some limitations. For example, physicists don’t think that the bit has a dimension; it is “dimensionless.” This means that when we call to_reduced_units() on a value whose unit is byte, kilobyte, gigabyte, etc., the unit disappears!
Overriding the definition of the bit isn’t possible without rewriting the whole unit file of Pint. That’s why the module impacthpc/src/core/config.py reads the file impacthpc/src/core/units/units.txt and loads its UnitRegistry from there. It declares an alternative definition of the bit, with its own dimension (the “information”), and declares two other units and dimensions: CO2Mass, expressed in gCO2eq, and SbMass, expressed in gSbeq. It also declares an alias for W.h (watt hour), called Wh.
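In practice, this means quantities should be built from the registry exported by ImpactHPC rather than a stock Pint registry, so that bytes keep their own dimension. A small sketch (the exact printed dimensionality string is indicative, not guaranteed):

from impacthpc.src.core.config import ureg  # custom registry loaded from units.txt

memory = ureg.Quantity("10 GB")
print(memory.dimensionality)  # with the custom definitions, bytes carry an "information" dimension instead of being dimensionless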
This limitation means that the maintainers should follow Pint’s unit-file updates and update impacthpc/src/core/units/units.txt accordingly.
The format_reduced() and format_not_reduced() functions are used to format values in the formatters. For CO2Mass and SbMass, if a value exceeds 1e06 gCO2eq or 1e06 gSbeq, it is converted to TCO2eq or TSbeq.
Deploy updates#
To deploy updates, make sure that requirements.txt is up to date if you have added new libraries. You can use pip-chill to list dependencies installed in your environment, or pipreqs to list the dependencies actually used in the project.
To start, run:
./pre-commit-checks.sh
This will:
- format the code
- run various checks
- run pytest
Update the project version in pyproject.toml and in docs/conf.py (change release = "x.y.z").
Build the project:
python -m build
Upload the tar.gz file to test.pypi.org:
python3 -m twine upload --verbose --repository testpypi dist/<package_name>-<latest_version>.tar.gz
Now you can test that everything is working nicely in a new Python project by creating a virtual environment and installing the package using:
python -m venv .venv
pip install --extra-index-url=https://test.pypi.org/simple/ impacts-hpc
If everything is working nicely, you can upload to the real PyPI repository:
python3 -m twine upload --verbose --repository pypi dist/<package_name>-<latest_version>.tar.gz