Magnetic control of tokamak plasmas through deep reinforcement learning.

Degrave J, Felici F, Buchli J, Neunert M, Tracey B, Carpanese F, Ewalds T, Hafner R, Abdolmaleki A, de Las Casas D, Donner C, Fritz L, Galperti C, Huber A, Keeling J, Tsimpoukelli M, Kay J, Merle A, Moret JM, Noury S, Pesamosca F, Pfau D, Sauter O, Sommariva C, Coda S, Duval B, Fasoli A, Kohli P, Kavukcuoglu K, Hassabis D
Nature 2022

Nuclear fusion using magnetic confinement, in particular in the tokamak configuration, is a promising path towards sustainable energy. A core challenge is to shape and maintain a high-temperature plasma within the tokamak vessel. This requires high-dimensional, high-frequency, closed-loop control using magnetic actuator coils, further complicated by the diverse requirements across a wide range of plasma configurations. In this work, we introduce a previously undescribed architecture for tokamak magnetic controller design that autonomously learns to command the full set of control coils. This architecture meets control objectives specified at a high level, at the same time satisfying physical and operational constraints. This approach has unprecedented flexibility and generality in problem specification and yields a notable reduction in design effort to produce new plasma configurations. We successfully produce and control a diverse set of plasma configurations on the Tokamak à Configuration Variable (TCV) [1,2], including elongated, conventional shapes, as well as advanced configurations, such as negative triangularity and 'snowflake' configurations. Our approach achieves accurate tracking of the location, current and shape for these configurations. We also demonstrate sustained 'droplets' on TCV, in which two separate plasmas are maintained simultaneously within the vessel. This represents a notable advance for tokamak feedback control, showing the potential of reinforcement learning to accelerate research in the fusion domain, and is one of the most challenging real-world systems to which reinforcement learning has been applied.

9 Figures Extracted
Fig. 1
Representation of the components of our controller design architecture. a, Depiction of the learning loop. The controller sends voltage commands on t...
Fig. 2
Fundamental capability demonstration. Demonstration of plasma current, vertical stability, position and shape control. Top, target shape points with 2...
Fig. 3
Control demonstrations. Control demonstrations obtained during TCV experiments. Target shape points with 2 cm radius (blue circles), compared with the...
Extended Data Fig. 1
Pictures and illustration of the TCV. a, b, Photographs showing the part of the TCV inside the bioshield. c, CAD drawing of the vessel and coils of t...
Fig. 4
Droplets. Demonstration of sustained control of two independent droplets on TCV for the entire 200-ms control window. Left, control of I_p for each ...
Extended Data Fig. 4
Further observations. a, When asked to stabilize the plasma without further specifications, the agent creates a round shape. The agent is in control ...
Extended Data Fig. 5
Training progress. Episodic reward for the deterministic policy smoothed across 20 episodes with parameter variations enabled, in which 100 means that...
Extended Data Fig. 2
A larger overview of the shots in Fig. 3. We plotted the reconstructed values for the normalized pressure β_p and safety factor q_A, along with ...
Extended Data Fig. 3
Control variability. To illustrate the variability of the performance that our deterministic controller achieves on the environment, we have plotted t...