AI’s Environmental Footprint: Carbon, Water, and Infrastructure

Every AI query rides an invisible supply chain: electricity drawn from grids with wildly different carbon intensities, water evaporated at cooling towers, and silicon manufactured in energy‑intensive fabs. A single month‑long training run on 10,000 GPUs can consume around 3–4 GWh after cooling overheads, emitting over a thousand tons of CO2e on a moderately clean grid and using over a million liters of water if evaporative cooling is employed.

This article maps the ecological footprint of AI across carbon, water, and infrastructure. You will find concrete numbers, mechanisms, and trade‑offs, plus decision rules to shrink impact without derailing performance or cost.

What Actually Makes Up AI’s Footprint

AI’s footprint clusters into three interacting buckets: operational carbon (electricity for training and inference), water (primarily for cooling, directly and indirectly via power generation), and embodied impacts (chips, servers, and data centers). For a realistic estimate, you must consider all three and how they shift with model scale, deployment geography, and time of day.

A practical formula for a project’s operational emissions is: Emissions ≈ IT Energy × PUE × Grid Emissions Factor. IT Energy depends on hardware power draw, utilization, and runtime; grid factors range from near‑zero in hydro/nuclear regions to 0.7–0.9 kgCO2e/kWh in coal‑heavy grids. Power Usage Effectiveness (PUE) multiplies IT energy by the overhead for cooling and power distribution; the closer PUE is to 1.0, the better.
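A minimal sketch of this formula in Python (the function and parameter names are illustrative, not from any particular library):

```python
def operational_emissions_tco2e(
    it_power_kw: float,           # average IT power draw (hardware x utilization)
    hours: float,                 # runtime in hours
    pue: float,                   # Power Usage Effectiveness (>= 1.0)
    grid_kgco2e_per_kwh: float,   # grid emissions factor
) -> float:
    """Operational emissions in metric tons of CO2e."""
    it_energy_kwh = it_power_kw * hours
    total_energy_kwh = it_energy_kwh * pue  # PUE adds cooling/power overhead
    return total_energy_kwh * grid_kgco2e_per_kwh / 1000.0  # kg -> t
```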

Water footprint has an analogous shorthand: Water Use ≈ Energy Use × Water Intensity of Electricity + Cooling Water. The first term captures freshwater used at power plants; the second reflects on‑site data center cooling. Regions using evaporative cooling and thermoelectric power can push total water intensities above 1 liter per kWh; dry cooling and low‑water power sources can drop below 0.2 liters per kWh, often at the cost of higher electricity consumption.
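And a matching sketch for the water shorthand (again, names are illustrative; intensities must come from your own sites and grid mix):

```python
def water_use_liters(
    total_energy_kwh: float,         # facility energy, including PUE overhead
    grid_water_l_per_kwh: float,     # indirect water used at power plants
    cooling_water_l_per_kwh: float,  # direct on-site cooling water
) -> float:
    """Total (direct + indirect) water use in liters."""
    return total_energy_kwh * (grid_water_l_per_kwh + cooling_water_l_per_kwh)
```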

IEA, 2024: Data centers consumed roughly 460 TWh in 2022 (~2% of global electricity), potentially rising to 620–1,050 TWh by 2026 as AI workloads scale.

Carbon: Training, Inference, and Location Effects

Training grabs headlines, but at product scale inference often dominates lifecycle emissions because it runs continuously and scales with users. A large training run might emit 500–2,000 tCO2e depending on size and siting, whereas serving millions of daily queries can exceed that within months if models are large and poorly optimized.

Consider a reference training job: 10,000 GPUs at 400 W average draw for 30 days. IT energy is 10,000 × 0.4 kW × 720 h ≈ 2,880 MWh. With a PUE of 1.2, total energy is about 3,456 MWh. On a grid at 0.4 kgCO2e/kWh, that’s ≈1,382 tCO2e. Move the same job to a low‑carbon grid at 0.05 kgCO2e/kWh and emissions fall to ≈173 tCO2e—an 87% drop without touching the code, simply by siting and scheduling.
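Reproducing that arithmetic as a back-of-envelope script (values taken from the example above):

```python
gpus, avg_kw, hours, pue = 10_000, 0.4, 720, 1.2
total_mwh = gpus * avg_kw * hours * pue / 1000        # ~3,456 MWh with overhead

for label, factor in [("moderate grid", 0.4), ("low-carbon grid", 0.05)]:
    tco2e = total_mwh * 1000 * factor / 1000          # kWh x kg/kWh -> tonnes
    print(f"{label}: {tco2e:,.0f} tCO2e")
# moderate grid: 1,382 tCO2e; low-carbon grid: 173 tCO2e (~87% lower)
```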

Inference intensity depends on model size, batching, precision, and hardware. Energy per 1,000 generated tokens for a 7–70B parameter model can span at least an order of magnitude, from single‑digit Wh with aggressive batching and 8‑bit quantization to tens of Wh at low batch and FP16 precision. Evidence is mixed across vendors and workloads because measurement methods vary; if you do not log joules per token in production, you are likely flying blind on the dominant contributor.
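A crude starting point for that logging, using NVIDIA’s management library to sample GPU power around a generation call (a sketch only; `generate_fn` is your own inference callable, and production metering should sample power continuously rather than at two endpoints):

```python
import time
import pynvml  # NVIDIA Management Library bindings; assumes an NVIDIA GPU

def joules_per_token(generate_fn, n_tokens: int, gpu_index: int = 0) -> float:
    """Rough energy-per-token estimate from two power samples."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    start = time.time()
    p0 = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0  # mW -> W
    generate_fn()                                         # produces n_tokens
    p1 = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0
    elapsed = time.time() - start
    pynvml.nvmlShutdown()
    avg_watts = (p0 + p1) / 2                             # coarse two-point average
    return avg_watts * elapsed / n_tokens                 # joules per token
```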

Carbon market choices matter. Annual renewable energy credits neutralize footprints on paper but can mask hourly mismatches: running a model at 7 p.m. on a coal‑heavy grid still emits today, even if you bought wind power generated at noon. Hourly or 24/7 carbon‑free energy matching aligns procurement with actual grid conditions and usually reveals that siting and load‑shifting can outperform certificates.
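A toy comparison makes the gap visible (the hourly profiles below are hypothetical):

```python
# One day of hourly load and contracted carbon-free supply, in MWh.
load = [10, 10, 10, 12, 15, 15, 14, 12] * 3   # 24 hourly values
cfe  = [ 0,  0,  0, 10, 30, 35, 30, 10] * 3   # solar-heavy procurement

annual_match = min(1.0, sum(cfe) / sum(load))                     # paper match
hourly_match = sum(min(l, c) for l, c in zip(load, cfe)) / sum(load)
print(f"annual: {annual_match:.0%}, hourly: {hourly_match:.0%}")
# annual: 100%, hourly: 65% -- the overnight hours are fossil-backed.
```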

Uptime Institute, 2023: Reported average PUE across data centers was ~1.58, with hyperscalers achieving near 1.1–1.2 under optimized conditions.

Water: Cooling, Seasonality, and Hidden Indirect Use

AI’s water footprint arises from two channels: on‑site cooling systems and off‑site power generation. Evaporative cooling can use 0.2–1.0 liters of water per kWh of IT load on‑site, depending on climate and control strategy, while the water intensity of electricity ranges from near‑zero (wind, solar PV) to significant (thermoelectric power with cooling towers). Summed, a water‑efficient site might be below 0.2 L/kWh; a hot, arid region with evaporative cooling and water‑intensive power can exceed 1.5 L/kWh.

Using the training example above (3,456 MWh including overhead), at a total water intensity of 0.5 L/kWh the job would consume around 1.7 million liters. If you push the job to a cooler coastal climate with seawater heat rejection or dry cooling and a low‑water grid mix (wind and nuclear), the same workload can drop to roughly 0.1 L/kWh (about 345,600 liters), while possibly increasing electricity use because dry cooling is less efficient. There is a straightforward trade‑off: saving water can raise energy and carbon, so optimization is multi‑objective.
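A sketch of that multi‑objective comparison (the site parameters, including the ~5% dry‑cooling energy penalty, are illustrative assumptions):

```python
# Compare two hypothetical sites for the same 3,456 MWh training job.
sites = {
    # (total water L/kWh, grid kgCO2e/kWh, energy multiplier for cooling choice)
    "arid, evaporative":   (0.5, 0.40, 1.00),
    "coastal, dry-cooled": (0.1, 0.05, 1.05),  # ~5% more energy, assumed
}
base_kwh = 3_456_000
for name, (water, carbon, penalty) in sites.items():
    kwh = base_kwh * penalty
    print(f"{name}: {kwh * water / 1e6:.2f} ML water, {kwh * carbon / 1000:,.0f} tCO2e")
# arid, evaporative: 1.73 ML water, 1,382 tCO2e
# coastal, dry-cooled: 0.36 ML water, 181 tCO2e
```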

Inference water impacts can be surprisingly high due to 24/7 load. A service handling 100 million queries per month on large models might draw tens of MWh per day; at 0.5 L/kWh that is several hundred thousand liters per day. Seasonality matters: in summer, evaporative cooling ramps up, and grid carbon often spikes with air‑conditioning demand. Scheduling non‑urgent fine‑tuning to cooler months and nights can shave both water and carbon without affecting user SLAs.

Public debate often cites per‑query water numbers, but estimates vary widely because they are sensitive to regional weather, cooling technology, and whether indirect power‑sector water is included. A defensible practice is to report both direct (on‑site) and total (direct + indirect) water use per kWh and per unit of work (e.g., liter per million tokens), with date and location stamps.

Infrastructure and Supply Chain: The Embodied Side

Embodied impacts are the “upfront” emissions and resource use baked into chips, servers, networking, and buildings before a single token is processed. Leading‑edge semiconductor fabrication is energy‑intensive: life‑cycle assessments indicate that a high‑end accelerator plus its share of a server can embody hundreds of kilograms to a few metric tons of CO2e. An 8‑GPU training server can carry 2–3 tCO2e; a rack‑scale system can total tens of tons when power and cooling gear are included. These figures vary by supplier and should be treated as estimates.

Data center construction adds tens of thousands of tons of CO2e for a 10–30 MW facility due to steel, concrete, and mechanical/electrical equipment. Locating in regions with low‑carbon cement options and reusing existing shells can materially lower that footprint. The payback equation is simple: the greener the grid and the higher the utilization, the faster operational savings amortize embodied carbon.

There are physical constraints too. Water rights can limit evaporative cooling in arid regions; interconnection queues can delay grid upgrades by years; and supply of high‑end accelerators is finite, pushing organizations toward sharing or colocation. Circular practices—component reuse, modular upgrades, and certified e‑waste pathways—shift impact curves. For example, extending accelerator lifetime from three to five years lowers the annualized embodied footprint by roughly 40%, assuming similar performance needs.
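The lifetime arithmetic is simple straight-line amortization; a quick check of the three-to-five-year claim, using the rough embodied figure from above:

```python
def annualized_embodied_tco2e(embodied_tco2e: float, lifetime_years: float) -> float:
    """Straight-line amortization of embodied carbon over useful life."""
    return embodied_tco2e / lifetime_years

three = annualized_embodied_tco2e(2.5, 3)   # 8-GPU server, ~2.5 tCO2e embodied
five = annualized_embodied_tco2e(2.5, 5)
print(f"3y: {three:.2f} t/yr, 5y: {five:.2f} t/yr, cut: {1 - five / three:.0%}")
# 3y: 0.83 t/yr, 5y: 0.50 t/yr, cut: 40%
```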

A Practical Playbook to Shrink the Footprint

Right‑size the model. For many tasks, a distilled 7–13B parameter model with retrieval performs within a few points of a 70B+ model while cutting inference energy by 3–10×. Use quantization (8‑bit or 4‑bit where accuracy holds), sparsity, and mixture‑of‑experts to activate fewer parameters per token. Track quality deltas with hold‑out datasets and only accept energy‑saving changes that meet pre‑agreed accuracy thresholds.
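One way to encode that acceptance rule as a gate in your evaluation pipeline (thresholds and metric names are placeholders to adapt):

```python
def accept_optimization(
    baseline_accuracy: float,
    new_accuracy: float,
    baseline_wh_per_1k_tokens: float,
    new_wh_per_1k_tokens: float,
    max_accuracy_drop: float = 0.01,   # pre-agreed threshold, e.g. one point
) -> bool:
    """Accept a quantization/distillation change only if quality holds on the
    hold-out set AND energy per unit of work actually improves."""
    quality_ok = (baseline_accuracy - new_accuracy) <= max_accuracy_drop
    energy_better = new_wh_per_1k_tokens < baseline_wh_per_1k_tokens
    return quality_ok and energy_better
```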

Engineer for utilization. Batch aggressively until latency SLA breaks; prefer hardware‑aware schedulers that co‑locate compatible workloads. Enable low‑precision kernels (FP8/INT8) and operator fusion. On training clusters, keep jobs at ≥70% utilization and backfill with elastic workloads to avoid idle GPUs. Simple math: improving average utilization from 40% to 70% achieves a 1.75× effective efficiency gain before buying any new hardware.

Shift in space and time. Place training in regions with grid factors below 0.1 kgCO2e/kWh and water‑smart cooling. Use workload managers to defer non‑urgent runs to hours when the grid is cleaner. For inference, edge‑deploy small models on devices to offload trivial prompts, reserving the large model for escalations; this reduces data center energy and network traffic. When you must centralize, consider heat reuse partnerships in cold climates to offset community heating demand.
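A minimal deferral sketch for the time-shifting part (the forecast input is a placeholder; a real deployment would query a marginal-emissions API and let the cluster scheduler handle the wait):

```python
import time

def run_when_clean(job, forecast, threshold_kg_per_kwh=0.15, max_wait_h=12):
    """Defer a non-urgent job until forecast grid intensity is below threshold.

    `forecast` maps hours-from-now -> kgCO2e/kWh. If no hour within the window
    beats the threshold, fall back to the cleanest hour available.
    """
    horizon = {h: f for h, f in forecast.items() if h <= max_wait_h}
    clean_hours = [h for h, f in sorted(horizon.items()) if f < threshold_kg_per_kwh]
    start_in = clean_hours[0] if clean_hours else min(horizon, key=horizon.get)
    time.sleep(start_in * 3600)  # placeholder for a scheduler-managed delay
    return job()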

Procure better electrons, not just certificates. Move from annual renewable matching to hourly carbon‑free energy targets and demand‑response participation. Co‑locate with generation where possible and secure additionality—new builds rather than existing capacity. Monitor grid marginal emissions to avoid “green‑looking” hours that are backed by fossil ramping.

Measure what matters. Instrument joules per token and liters per million tokens by region and over time. Declare PUE and, where relevant, Water Usage Effectiveness. Include embodied carbon in planning by amortizing server and facility footprints over expected useful life and utilization. Publish model cards with energy and water profiles so teams can make informed trade‑offs.
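A sketch of the per-region record this paragraph calls for (field names are assumptions, not a standard schema):

```python
from dataclasses import dataclass

@dataclass
class FootprintRecord:
    region: str
    date: str
    energy_kwh: float   # facility energy, including PUE overhead
    tokens: int
    water_liters: float # direct + indirect, per the reporting practice above
    pue: float

    @property
    def joules_per_token(self) -> float:
        return self.energy_kwh * 3_600_000 / self.tokens  # kWh -> J

    @property
    def liters_per_million_tokens(self) -> float:
        return self.water_liters / self.tokens * 1_000_000

rec = FootprintRecord("eu-north", "2024-06-01", 12_000, 4_000_000_000, 6_000, 1.15)
print(f"{rec.joules_per_token:.1f} J/token, {rec.liters_per_million_tokens:.2f} L/Mtok")
# 10.8 J/token, 1.50 L/Mtok
```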

FAQ

Q: Is training or inference the bigger driver of AI’s footprint?

It depends on scale. A frontier‑size training run can emit hundreds to thousands of tons of CO2e, but at product scale, inference usually dominates because it runs 24/7. If your service handles tens to hundreds of millions of queries per month on large models, inference will likely exceed training within months unless you right‑size, batch, and quantize.

Q: How much water does a single AI query use?

There is no single number. Direct on‑site water may be zero (dry cooling) or a fraction of a liter per query under evaporative cooling at high loads. Including the power sector’s water, totals can vary by more than 10× across regions and seasons. The most reliable approach is to report liters per kWh at your sites and multiply by measured energy per query.

Q: Do renewables make my AI carbon‑free?

Not automatically. Annual renewable credits can leave hourly gaps where you draw fossil power. Hourly carbon‑free matching, location‑aligned procurement, and load shifting reduce real emissions more effectively. If you cannot move workloads, target demand‑response programs and grid‑aware scheduling to shave peaks.

Q: What is a good PUE and why does it matter for AI?

A PUE near 1.1–1.2 is best‑in‑class for hyperscale sites; the global average is closer to 1.5+. AI’s high power density increases cooling load, so PUE directly multiplies your energy and emissions. Cutting PUE from 1.5 to 1.2 lowers total energy by 20% for the same IT work.

Q: How should we account for embodied carbon of GPUs and servers?

Use supplier LCAs where available, then amortize over expected life and utilization. As a rule of thumb, an 8‑GPU server may carry 2–3 tCO2e upfront; if you keep it for 4 years at high utilization on a very clean grid, operational emissions can be smaller than embodied. Low utilization on a dirty grid flips the balance; buying fewer machines and running them hotter, for longer, often wins.
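A quick check of where that balance flips, under assumed figures rather than supplier data:

```python
# Does operational or embodied carbon dominate over an 8-GPU server's life?
embodied_t = 2.5                   # upfront tCO2e, rough estimate from above
years, util, pue = 4, 0.7, 1.2
it_kw = 4.0                        # 8 GPUs at ~0.5 kW each, assumed average draw

for label, grid in [("very clean grid", 0.02), ("coal-heavy grid", 0.7)]:
    kwh = it_kw * util * pue * 8760 * years
    op_t = kwh * grid / 1000
    print(f"{label}: operational {op_t:.1f} t vs embodied {embodied_t} t")
# very clean grid: operational 2.4 t vs embodied 2.5 t  -> embodied dominates
# coal-heavy grid: operational 82.4 t vs embodied 2.5 t -> operational dominates
```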

Conclusion

Treat AI’s ecological footprint as a multi‑objective optimization: choose low‑carbon, low‑water locations; right‑size models and precision; maximize utilization; and adopt hourly carbon‑free power. Measure joules and liters per unit of work, include embodied impacts in planning, and make siting and scheduling first‑class levers. If a change does not reduce at least one of carbon, water, or cost without breaking quality or latency, it is noise; ignore it and focus on the next measurable win.