GHG Emissions Simulator for Generative AI

This simulator estimates the greenhouse gas emissions (CO₂e) generated by generative AI systems, using the TokenFlop modeling method developed by Digital4Better. It covers training and inference phases, accounts for hardware manufacturing and usage footprints, and supports text, image, audio, and video modalities.

The TokenFlop method comes from Data4Impact, a research program conducted by Digital4Better that won the BPI/ADEME innovation competition. The program develops rigorous tools for evaluating the environmental impact of digital technology.

Token

Universal unit: ~4 characters for text, spatial patches for images, temporal frames for video/audio.

FLOPs

Compute load estimated by use case (6×P×tokens for training, 2×P×tokens for inference).

GPU time

FLOPs ÷ (GPU capacity × MFU). MFU between 25% and 50%, default 40%.

Energy

GPU power × time × PUE (data center efficiency, default 1.2).

Carbon

Energy × regional emission factor + hardware manufacturing amortized over 5 years.

Results are order-of-magnitude estimates from theoretical modeling based on publicly available data. They do not constitute a direct measurement of actual emissions. They depend on input parameters and assumptions; consult the methodology for scope and limitations.

TokenFlop Methodology

TokenFlop is a bottom-up model: it estimates the compute load (FLOPs) from model usage, converts it to GPU time, then to energy consumption and GHG emissions. It also integrates the manufacturing footprint, following LCA logic (ISO 14040 / ITU L.1410).^[3][4]

Base unit and input data

The base unit is the token, a discrete unit that the model reads as input and produces as output. Depending on the modality, a token can be a word fragment, a spatial position, or a coded temporal unit.

Modality	What is a token	Example
Text	Word fragment (3-4 characters average)	1,000 tokens ≈ 750 words in English
Image	Spatial patch (e.g. 16×16 px)
Audio	Temporal token (codec, e.g. EnCodec)
Video	Spatial token per frame × frame count
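The token definitions above can be turned into rough counts. The following is a minimal sketch, assuming 4 characters per text token and 16×16 px patches applied directly to pixels; the function names are illustrative and not part of the TokenFlop tooling, and real models often tokenize a compressed latent rather than raw pixels.

```python
import math


def text_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate text tokens with the ~4 characters per token rule of thumb."""
    return math.ceil(len(text) / chars_per_token)


def image_tokens(width_px: int, height_px: int, patch_px: int = 16) -> int:
    """Count spatial patches, rounding up partial patches at the edges."""
    return math.ceil(width_px / patch_px) * math.ceil(height_px / patch_px)


def video_tokens(width_px: int, height_px: int, frames: int, patch_px: int = 16) -> int:
    """Spatial tokens per frame multiplied by the frame count."""
    return image_tokens(width_px, height_px, patch_px) * frames


print(text_tokens("a" * 4000))     # 4000 characters -> 1000 tokens
print(image_tokens(512, 512))      # 32 x 32 patches -> 1024 tokens
print(video_tokens(512, 512, 24))  # 1024 tokens x 24 frames -> 24576 tokens
```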

Compute load estimation (FLOPs)

Computational load is estimated by usage phase:^[1]

Phase	Formula
Training	6 × P × T (P: parameter count, T: tokens processed)
Fine-tuning
Inference (prompt)	1 × P × T
Inference (text generation)	2 × P × T
Image generation
Video generation

Inference assumption: a KV cache is always present, which reduces the prompt cost to about 1 FLOP per parameter per token.
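As a rough illustration of the coefficients quoted in this document (6 FLOPs per parameter per training token, about 1 per prompt token with a KV cache, 2 per generated token), the sketch below computes the load for text models. The function names are hypothetical, and image and video generation are left out because their formulas are not given here.

```python
def training_flops(params: float, tokens: float) -> float:
    """Training load: about 6 FLOPs per parameter per training token."""
    return 6.0 * params * tokens


def inference_flops(params: float, prompt_tokens: float, generated_tokens: float) -> float:
    """Inference load with a KV cache: ~1 FLOP/param per prompt token,
    ~2 FLOPs/param per generated token."""
    prompt = 1.0 * params * prompt_tokens
    generation = 2.0 * params * generated_tokens
    return prompt + generation


# Training a 405B-parameter model on 15 trillion tokens.
print(f"{training_flops(405e9, 15e12):.3e}")      # 3.645e+25

# One request to a 70B-parameter model: 400 prompt tokens, 200 generated tokens.
print(f"{inference_flops(70e9, 400, 200):.2e}")   # 5.60e+13
```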

Conversion to GPU time (GPUh)

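FLOPs are converted to effective processing time:

$$ T_{\mathrm{GPU}} = \frac{\mathrm{FLOPs}}{C_{\mathrm{GPU}} \times \mathrm{MFU}} $$

$T_{\mathrm{GPU}}$: GPU time in hours (GPUh)
$C_{\mathrm{GPU}}$: theoretical GPU capacity in FLOP/h (e.g. 989 TFLOPS in dense BF16 for an H100, i.e. about 3.56 × 10¹⁸ FLOP/h)
MFU: Model FLOPs Utilization, the percentage of theoretical capacity actually usable, estimated between 25% and 50%. Default value: 40%.^[8]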
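The conversion is easy to get wrong on units, since GPU capacity is usually quoted per second while GPU time is counted in hours. Here is a minimal sketch, assuming the 989 TFLOPS H100 figure quoted above and a hypothetical workload of 10²¹ FLOPs, showing how the MFU range changes the result.

```python
H100_PEAK_FLOP_PER_S = 989e12  # peak figure quoted in the text, per second


def gpu_hours(flops: float,
              peak_flop_per_s: float = H100_PEAK_FLOP_PER_S,
              mfu: float = 0.40) -> float:
    """GPU time in hours: FLOPs / (hourly capacity x MFU)."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    flop_per_hour = peak_flop_per_s * 3600.0  # seconds -> hours
    return flops / (flop_per_hour * mfu)


# Hypothetical workload of 1e21 FLOPs across the stated MFU range.
for mfu in (0.25, 0.40, 0.50):
    print(f"MFU {mfu:.0%}: {gpu_hours(1e21, mfu=mfu):,.0f} GPUh")
```

Going from the default 40% down to 25% raises the GPU time by a factor of 1.6, so the MFU assumption is one of the most sensitive inputs.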

Energy consumption conversion

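GPU time is translated into energy consumed:

$$ E = \frac{P_{\mathrm{GPU}} \times T_{\mathrm{GPU}} \times \mathrm{PUE}}{1000} $$

$E$: energy consumed in kWh, with $T_{\mathrm{GPU}}$ the GPU time in hours
$P_{\mathrm{GPU}}$: GPU power in watts (e.g. 700 W for an H100)
PUE: Power Usage Effectiveness, the data center energy efficiency ratio. Default value: 1.2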
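As a worked example using only the default values quoted above, one hour of a single H100 at 700 W with a PUE of 1.2 corresponds to:

$$ E = 700\ \mathrm{W} \times 1\ \mathrm{h} \times 1.2 = 840\ \mathrm{Wh} = 0.84\ \mathrm{kWh} $$

This treats the GPU as drawing its full rated power for the whole hour, which is a simplifying assumption of this example.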

Operational environmental impact

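Energy is converted to GHG emissions via the regional electricity emission factor:

$$ I_{\mathrm{op}} = E \times EF_{\mathrm{region}} $$

$I_{\mathrm{op}}$: operational emissions in kgCO₂e, with $E$ the energy consumed in kWh
$EF_{\mathrm{region}}$: emission factor by region in kgCO₂e/kWh, from the Digital4Better open data repository (e.g. 0.420 kgCO₂e/kWh for the US, 0.040 kgCO₂e/kWh for France).^[6]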
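To show how strongly the region drives the result, the sketch below applies the two emission factors quoted above to the same energy consumption. The region keys and the function name are illustrative; a real implementation would load the factors from the open data repository rather than hard-code them.

```python
# Emission factors quoted in the text, in kgCO2e per kWh.
# The "US" / "FR" keys are labels chosen for this example.
EMISSION_FACTORS_KG_PER_KWH = {"US": 0.420, "FR": 0.040}


def operational_emissions_kg(energy_kwh: float, region: str) -> float:
    """Operational emissions: energy x regional emission factor."""
    try:
        factor = EMISSION_FACTORS_KG_PER_KWH[region]
    except KeyError:
        raise ValueError(f"no emission factor for region {region!r}") from None
    return energy_kwh * factor


# Same 1,000 kWh workload in both regions.
for region in ("US", "FR"):
    print(region, round(operational_emissions_kg(1000.0, region), 1))
# US 420.0
# FR 40.0
```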

Manufacturing impact (embodied footprint)

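Hardware manufacturing footprint is allocated proportionally to usage time:

$$ I_{\mathrm{emb}} = I_{\mathrm{mfg}} \times \frac{T_{\mathrm{GPU}}}{L \times 8760} $$

$I_{\mathrm{emb}}$: embodied emissions allocated to the usage, in kgCO₂e
$I_{\mathrm{mfg}}$: manufacturing footprint of one GPU, including its share of the server, in kgCO₂e
$T_{\mathrm{GPU}}$: GPU time in hours
$L$: hardware lifespan in years (8,760 hours per year)

Default lifespan: 5 years. The footprint of non-GPU server components (CPU, RAM, storage, chassis) is shared across the GPUs of the server, following LCA logic (ISO 14040 / ITU L.1410).^[3][4]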
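The allocation can be sketched as a simple pro rata of GPU hours over the hardware lifetime. This is a minimal sketch under two assumptions that are not stated in the text: the hardware is considered available 8,760 hours per year with no utilization correction, and the per-GPU manufacturing footprint of 1,000 kgCO₂e is a placeholder value, not a measured figure.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def embodied_emissions_kg(gpu_hours: float,
                          manufacturing_kg_per_gpu: float,
                          lifespan_years: float = 5.0) -> float:
    """Share of the manufacturing footprint allocated to the GPU hours used."""
    lifetime_hours = lifespan_years * HOURS_PER_YEAR
    return manufacturing_kg_per_gpu * gpu_hours / lifetime_hours


# PLACEHOLDER footprint (not real LCA data): 1,000 kgCO2e per GPU,
# including its share of CPU, RAM, storage and chassis.
share = embodied_emissions_kg(gpu_hours=438.0, manufacturing_kg_per_gpu=1000.0)
print(round(share, 2))  # 438 h is 1% of a 5-year lifetime -> 10.0
```

Adding this value to the operational emissions gives the total footprint, and the ratio between the two gives the operational versus embodied split.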

Validation — Llama 3.1 405B

As a consistency check, TokenFlop was applied to the open-source Llama 3.1 models, whose largest version (405B parameters) was trained on ~15 trillion tokens with 24,576 H100 GPUs:

Model	Estimated GPU time	Estimated emissions
Llama 3.1 8B	1.46 M GPUh	~420 tCO₂e
Llama 3.1 70B	7.0 M GPUh	~2,040 tCO₂e
Llama 3.1 405B	30.84 M GPUh	~8,930 tCO₂e

The deviation from the figures published on Hugging Face is below 2%, which supports the coherence of the modeling. For inference, with a 400-token average prompt on Llama 3.1 405B, the estimate is about 0.1 gCO₂e per request.^[5]

Assumptions and limitations

Results are estimates from theoretical modeling and do not constitute direct measurement of actual emissions. Main sources of uncertainty:

Actual model characteristics are often confidential (training data, effective MFU, number of hidden dimensions).
Lack of reliable LCA data on certain AI-specific equipment.
TPU, FPGA, and ASIC specificities are not accounted for.
Model-to-hardware memory adequacy is not verified.

The method is suited for relative scenario comparison, project framing, and prospective evaluation — not for certified emissions reporting.

Bibliography

[1] Schwartz, R., et al. (2020). Green AI. Communications of the ACM. arXiv: 1907.10597
[2] IEA (2024). Energy and AI.
[3] ISO 14040/14044. Environmental management: Life Cycle Assessment.
[4] ITU L.1410. Methodology for the assessment of the environmental life cycle impact of ICT goods, networks and services.
[5] Meta (2024). The Llama 3 Herd of Models. arXiv: 2407.21783
[6] Digital4Better. Open Data Repository. digital4better.github.io/data
[7] Digital4Better. Open Methodology for Generative AI. digital4better.github.io/methodology/ai
[8] NVIDIA (2025). Llama 3.1 70B DGXC Benchmarking.