Introduction

Computation forms the backbone of modern society while consuming a large amount of energy. New computation hardware with a smaller carbon footprint and higher computing speed is in great need1,2. Through a fruitful history3,4,5,6, optical computing has gained increasing attention in the past decades owing to its potential advantages such as scalability, low latency, and power efficiency. For example, integrated photonic circuits have been designed to perform neural network-based computing as stand-alone devices7,8,9 or accelerate it in data centers10,11. Various optical implementations of task-specific non-von-Neumann computers, such as spiking neural networks12,13 and reservoir computers14, have been developed to solve different computational problems. These specialized optical circuitries or networks lit up the future of optical computing, and yet showed relatively limited capability in performing general-purpose computation.

An important building block for universal computing is the logical NAND gate, as it can be cascaded to perform any logic operation. Numerous studies have been reported so far on all-optical realizations of NAND gates15,16,17,18,19,20,21,22,23,24,25,26. However, none of these previous optical approaches provided sufficient levels of cascadability to form complex logical circuits using their designed NAND gates as the building block of an optical processor. Some of these earlier techniques relied on nonlinear interferometers15,16,17,18, photonic crystals19,20,21, and plasmonic devices22,23,24, which brought stringent requirements on the input optical signals in terms of phase, intensity, and/or polarization states of light6, which partially complicate the hardware design, and more importantly, limit the cascadability of the optical NAND gates. Although various efforts have been made to reduce the required input power levels27,28, NAND gates based on e.g., nonlinear photonic crystals still require relatively strong optical fields to operate, which partially restricts their utility. Another set of implementations used micro-ring resonators25 that encode the input and output logical values into different frequencies, also creating fundamental challenges for cascadability.

Diffractive deep neural networks (D2NNs) are optical computing platforms that use coherent light to process the information encoded in the phase and/or amplitude channels of an input field-of-view29,30,31,32,33,34. The incident light from an input plane (encoding the information) propagates through successive passive diffractive layers, each of which comprises thousands of individual modulation units (termed neurons) that alter the phase and/or amplitude of the light at their corresponding location. The free-space propagation and engineered light-matter interaction through a diffractive network collectively perform a computational task that is learned through a one-time training process. These passive diffractive layers are designed (trained) in a computer using conventional deep learning tools, e.g., stochastic gradient descent and error back-propagation. Once the design converges through this deep learning-based training phase, the diffractive layers are fabricated to form a passive, physical computing unit that does not consume computing power except for the illumination light. Other than performing statistical inference tasks such as object classification29,32,35,36 and image reconstruction37,38, enabled by their data-driven training process, diffractive neural networks also show great potential in designing non-intuitive, task-specific, deterministic optical elements, including e.g., spectral filters39 and pulse shapers40. The broad design space provided by diffractive neural networks was also utilized to design logic gates26; however, similar to former all-optical implementations discussed earlier15,16,17,18,19,20,21,22,23,24,25, cascadability was not feasible in this earlier work, limiting its use as a diffractive building block to optically perform an arbitrary logic operation.

In this paper, we present a cascadable all-optical NAND gate based on diffractive neural networks (Fig. 1a). We encoded the input/output optical logical values in the relative power of two spatially-separated apertures, where the aperture corresponding to the larger optical signal determines whether the value is 0 (the bottom aperture signal is larger) or 1 (the top aperture signal is larger). A four-layered diffractive network was trained to all-optically perform NAND operation on two optical input logical values (Fig. 1b). By projecting the output optical signal/wave (i.e., phase and amplitude) of one diffractive NAND gate onto the input aperture of another diffractive NAND gate, one can cascade these NAND gates to perform complex logical computations (Fig. 1c). Despite the fact that the cascadability of the presented diffractive NAND gate design is a highly desired feature for all-optical computing applications, there are also some limitations of this approach. The optical fields at the output plane of a diffractive NAND gate do not uniquely represent ones or zeros since there are (in principle) infinitely many different complex fields that might result from a diffractive NAND gate positioned within an arbitrary, complex logic circuit. Due to the field intensity decay and phase distortions that the diffracted light goes through within a passive network, it is not guaranteed that all possible combinations of the diffractive NAND output fields can successfully serve as inputs for a successive diffractive NAND operation in a given logic circuit. To tackle this challenge, in this work we created a design map that specifically guides the users/designers on cascading of the same diffractive NAND gate to perform logic operations, avoiding the output optical field combinations that result in false inference at the next level of a successive NAND operation. We used this design map as a guide to numerically demonstrate the cascadability of our diffractive NAND gate by building all-optical AND and OR gates, as well as a half-adder. Our analysis indicates that, despite using passive optical components without any optical nonlinearity, it is possible to build cascaded optical networks that share the same repeating diffractive gate design to perform all-optical logic operations. An important future direction of research would be to automate the search of the correct cascading path that satisfies the desired logic operation with the least number of cascaded diffractive optical gates. Similarly, exploring the limits of this design map approach in order to understand for what classes of logic operations it fails to find a cascadable diffractive solution would be a valuable future research direction. Optical signal amplification/boost and the use of nonlinearities would, in general, be needed to expand the utility of the presented approach and generalize over any class of functions.

Figure 1
figure 1

Schematic of cascadable all-optical NAND gates using diffractive networks. (a) Design of a cascadable diffractive NAND gate. (b) The input plane intensity profile of the diffractive NAND gate with all the possible combinations of ideal optical logical values, i.e., (\({x}_{1}^{0}\),\({x}_{2}^{0}\)) = (T,T), (T,F), (F,T) and (F,F), as well as the corresponding optical intensity profiles at the output plane of the diffractive network, with output optical logical values of \({x}_{i}^{1}\)=(F, T, T, T) respectively. (c) Schematic of cascading diffractive NAND gates.

Results

Diffractive NAND gate design and numerical testing

One method to encode optical logical values of a NAND gate can be to use an additional pump/probe light17,21,22 in order to successfully handle the case of both input digits being zero, i.e.\(\mathrm{NAND}(\mathrm{0,0})=1\). However, the existence of the probe light creates challenges in the cascadability of a NAND gate while also consuming more energy. Instead, here in this work, we encoded the logical states of our diffractive network into the spatial distribution of the optical power (Fig. 1a). Each optical logical value is represented by the power within a pair of encoding apertures placed in a vertical column (Fig. 1a). The upper/top aperture denotes True (T) and the lower one denotes False (F) signal, and the larger optical signal determines the logical state. For example, True is encoded with more power in the top aperture when compared with the optical power of the lower aperture. Based on this definition, either one of the logical states (T, F) can inject the same amount of energy into the successive NAND gate that is cascaded. In our design, each encoding aperture was selected to have a size of 4λ × 4λ, and the vertical separation between the two apertures was designed to be 2λ, where λ is the wavelength of the incidence light, which can be selected at any part of the electromagnetic spectrum based on the availability of sources and high-resolution fabrication methods. Ideally, an optical logical value has uniform intensity within the corresponding encoding aperture, while no light should be present in the other aperture. However, deviations from this ideal scenario in successive, cascaded diffractive NAND gates do not create an issue as the definition of our encoding scheme compares the total power in each encoding aperture regardless of the uniformity of the optical wave intensity.

A four-layered diffractive neural network was trained to perform the NAND operation. The input plane of our diffractive NAND gate had two ports hosting the input optical logical values, where the two columns of apertures were placed side-by-side, with a horizontal separation of 2λ. This 2 × 2 aperture grid was placed at the center of the input plane, with zero transmittance elsewhere (i.e., blocking regions surrounding the signal apertures). The output port of the diffractive NAND gate was designed at the center of the output field of view of the network, which is marked using purple squares in Fig. 1b. This diffractive NAND gate architecture was iteratively trained using conventional deep learning-based optimization tools in a computer, during which only ideal optical logical values with uniform intensity profiles were used as inputs. A training loss function was applied to the diffractive network’s output field-of-view, comparing the resulting optical waves within the output apertures with the ideal output profiles. Further details regarding the training of this diffractive NAND gate can be found in the Methods section. The training log of the diffractive NAND gate design is reported in Supplementary Fig. S1.

After its training/design phase, the success of the all-optical NAND computation was first demonstrated by numerically testing it with ideal input optical logical values. The input plane of the diffractive NAND gate with different input values, i.e., (\({x}_{1}^{0}\),\({x}_{2}^{0}\)) = (T,T), (T,F), (F,T) and (F,F), as well as the corresponding optical intensity profiles at the output plane of the diffractive network (with output logical values of \({x}_{i}^{1}\)=(F, T, T, T) respectively) are reported in Fig. 1b. The notation \({x}_{1}^{0}\) (\({x}_{2}^{0}\)) denotes a set of ideal optical logical values, with uniform intensity in each aperture, being injected into the diffractive NAND gate using the left (right) input port at the input plane. Similarly, the notation \({x}_{i}^{1}\) denotes the set of output optical logical values that result from a diffractive NAND gate, taking ideal optical logical values as its inputs. The subscript \(i\) indicates that the output optical signals/waves can be projected to either input port (left or right) of another diffractive NAND gate that is cascaded. The superscripts in both the input and output notations denote the “level” of the optical signals, where the ideal input values are denoted as level 0 and the output optical signals resulting from two ideal input values are denoted as level 1. Therefore, this notation represents a collection of all the possible optical logical values that share the same origin. For example, \({x}_{i}^{1}\) represents all four possible output optical fields generated using combinations of level 0 input optical logical values ((\({x}_{1}^{0}\),\({x}_{2}^{0}\)) = (T,T), (T,F), (F,T) and (F,F)).

Cascading diffractive NAND gates to all-optically perform logic operations

A unique feature of the presented diffractive NAND gate lies in its cascadability. One NAND gate’s output optical wave can be injected into another diffractive NAND gate’s input ports for further computation through its diffractive layers. In this context, we would like to further extend the “level” definition to better describe the diverse origins of the optical signals/waves that are cascaded into successive diffractive NAND gates. If we assume that a diffractive NAND gate uses the input optical logical values from levels a and b (i.e., the input optical logical values belong to sets \({x}_{1}^{a}\) and \({x}_{2}^{b}\)), then the corresponding output level shall be defined as level (a,b), i.e., the output optical wave lies within the set of \({x}_{i}^{a,b}\). Given the fact that a diffractive NAND gate, after its training phase, does not necessarily possess output wave symmetry, \({x}_{i}^{a,b}\) and \({x}_{i}^{b,a}\) are different sets of optical signals/waves. For example, the output optical logical values calculated from input waves that belong to \({x}_{1}^{1}\) and \({x}_{2}^{0}\) should be denoted as \({x}_{i}^{\mathrm{1,0}}\), and it is different from the set of output waves defined by \({x}_{i}^{\mathrm{0,1}}\).

Using this notation, next we numerically demonstrate that our diffractive NAND gate design can successfully perform logic operations with input values arising from different levels of cascading. Figure 2a reports all the possible combinations of optical wave cascading using level 0, level 1, level (1,0), level (0,1) and level (1,1) as inputs to our diffractive NAND gate. For example, at level 0 we have 22 = 4 different optical waves that can result at the output of the diffractive NAND gate using ideal uniform input waves; these four optical waves are then combined into the same set along with the ideal uniform optical inputs, which result in a total of (4 + 2)2 = 36 unique optical waves at the output of the diffractive NAND. Following the same flow of logic, at the next level of cascading, we have in total (36 + 2)2 = 1444 different optical waves representing logical values. For these 1444 different wave combinations represented in Fig. 2a, the calculation/inference accuracy at the output of the diffractive NAND gate is measured to be 91.48%. The black squares in Fig. 2a indicate the input wave combinations leading to a miscalculation (i.e., an incorrect inference) at the output of the diffractive NAND gate (i.e., ~ 8.52% of the cases out of 1444 different combinations reported). The output plane intensity profiles of the optical waves that belong to sets \({x}_{i}^{1}\), \({x}_{i}^{\mathrm{1,0}}\), \({x}_{i}^{\mathrm{0,1}}\) and \({x}_{i}^{\mathrm{1,1}}\) and their corresponding input waves are also shown in Fig. 2b. In fact, further cascading of diffractive NAND gates is also possible: using all the optical logical values and the output optical waves presented in Fig. 2a, the diffractive NAND gate inference accuracy over the (1444 + 2)2 = 2,090,916 different input wave combinations is found to be 80.30%.

Figure 2
figure 2

Performance of cascaded diffractive NAND gates. (a) A design map for cascading diffractive NAND gates. The black squares indicate the input wave combinations that lead to a miscalculation, i.e., an inference error. (b) The output plane intensity profiles of the optical waves that belong to sets \({x}_{i}^{1}\), \({x}_{i}^{\mathrm{1,0}}\), \({x}_{i}^{\mathrm{0,1}}\) and \({x}_{i}^{\mathrm{1,1}}\) as well as their corresponding input waves.

These random inference errors that are observed in e.g., Fig. 2a do not constitute a roadblock for synthesizing an all-optical logic processor using cascaded diffractive NAND gates; instead, these error maps (e.g., the black squares in Fig. 2a) actually serve as a design map/guide for properly cascading diffractive NAND gates to build a logic operator without hitting any one of these error points. Stated differently, by knowing these input–output maps that reveal these rare combinations of faulty diffractive computing points, one can correctly design a diffractive logic processor composed of cascaded all-optical NANDs that avoid using these error points identified in the design guide (Fig. 2a). To shed more light on this cascadability design map, we first built basic logical gates (i.e., AND and OR gates) performed by cascading of our diffractive NAND gate. Mathematically, a logical AND operation can be formulated using NAND operations as shown in Fig. 3a, i.e.,

Figure 3
figure 3

Logical AND operator that is composed of cascaded diffractive NAND gates. (a) Digital implementation of an AND operator using NAND gates. (b) All-optical AND gate design that is composed of cascaded diffractive NAND gates. (c) A portion of the design map (Fig. 2b) showing the correctness of the intermediate all-optical calculation steps (\({x}_{i}^{1}\)) and the output of the AND gate (O) under different input combinations. (d) The intensity profiles at the output plane of the final diffractive NAND gate. The inserted numbers in white font color indicate the relative optical signal within each aperture. Each input optical logical value is assumed to have a relative optical signal level of 256 (a.u.), defining the ideal signal level per aperture.

$$\begin{array}{c}AND\left({I}_{1}, {I}_{2}\right)=NAND\left(\mathrm{NAND}\left({I}_{1}, {I}_{2}\right), \mathrm{NAND}\left({I}_{1},{I}_{2}\right)\right).\end{array}$$
(1)

Therefore, an all-optical implementation of the AND operation can be realized with two diffractive NAND gates and a beam splitter (see Fig. 3b). \({x}_{1}^{0}\) and \({x}_{2}^{0}\), with uniform optical intensity within the input apertures, were injected into both input ports of the first NAND gate. A beam splitter duplicated the resulting output wave with a 50/50 splitting ratio, and the duplicated logical values were used as the input waves for the second NAND gate that is cascaded. For different input logical value combinations, i.e., (\({x}_{1}^{0}\),\({x}_{2}^{0}\)) = (T,T), (T,F), (F,T) and (F,F), the correctness of the intermediate all-optical calculation steps (\({x}_{i}^{1}\)) and the output of the AND gate (O) are shown in Fig. 3c. The intensity profiles at the output plane of the final diffractive NAND gate are also shown in Fig. 3d, demonstrating the success of the cascaded diffractive NAND system for all-optically performing AND operation. The optical signal values within each output port are labelled in Fig. 3d.

A similar demonstration of an all-optical OR gate that is composed of cascaded diffractive NAND gates is shown in Fig. 4. OR operation can be formulated using NAND gates as shown in Fig. 4a, i.e.,

Figure 4
figure 4

Logical OR operator that is composed of cascaded diffractive NAND gates. (a) Digital implementation of an OR operator using NAND gates. (b) All-optical OR gate that is composed of cascaded diffractive NAND gates. (c) A portion of the design map (Fig. 2b) showing the correctness of the intermediate all-optical calculation steps (\({x}_{i}^{1}\)) and the output of the OR gate (O) under different input combinations. (d) The intensity profiles at the output plane of the final diffractive NAND gate. The inserted numbers in white font color indicate the relative optical signal within each aperture. Each input optical logical value is assumed to have a relative optical signal level of 256 (a.u.), defining the ideal signal level per aperture.

$$\begin{array}{c}OR\left({I}_{1}, {I}_{2}\right)=NAND\left(\mathrm{NAND}\left({I}_{1}, {I}_{1}\right), \mathrm{NAND}\left({I}_{2},{I}_{2}\right)\right).\end{array}$$
(2)

This all-optical OR gate implementation uses three diffractive NAND gates, as shown in Fig. 4b. In this case, each input optical logical value was first duplicated using beam splitters and injected into both input ports/apertures of the corresponding diffractive NAND gate. The outputs of the two diffractive NAND gates were then cascaded onto the input ports/apertures of the last diffractive NAND gate, as shown in Fig. 4b. For different input combinations, the correctness of the intermediate calculation steps (\({x}_{i}^{1}\)) and the output of the OR gate (O) are shown in Fig. 4c. The intensity profiles at the output plane of the final diffractive NAND gate are also shown in Fig. 4d, demonstrating the success of the cascaded diffractive system for all-optically performing OR logic operation. The optical signal values within each output port are labelled in Fig. 4d.

As another basic logic function, NOT operation can be formulated as:

$$\begin{array}{c}NOT\left(I\right)=NAND\left(I,I\right).\end{array}$$
(3)

Therefore, NOT can be realized using a single NAND gate without the need for diffractive cascading. In fact, since it is solely based on the correct implementation of the NAND gate, the accuracy of the NOT calculation using our diffractive NAND design has already been validated through the first and last columns of Fig. 1b as well as the diagonal elements of the optical logical values \({x}_{i}^{1}\) shown in Fig. 2b.

Other than these basic logical operations (AND, OR, NOT), we also designed an all-optical implementation of a half-adder using cascaded, five diffractive NAND gates. A half-adder adds two binary input values (I1 and I2) and returns their sum (S) and the carry (C). Figure 5a presents an electronic realization of a half-adder using 5 NAND gates. The same logical circuit design can be all-optically implemented using cascaded diffractive NANDs, as shown in Fig. 5b. Based on our design map/guide shown in Fig. 2a, one can see that a correct calculation of the sum can only be achieved when the NAND gate IV takes the input optical logical values from the set of \({x}_{i}^{\mathrm{1,0}}\). The correct routing of different optical waves was accordingly optimized to avoid the inference errors marked in Fig. 2a. A portion of the design map reflecting this optimized optical signal routing is reported in Fig. 5c, showing the correctness of the intermediate calculation steps. The output intensity profiles of the sum (diffractive NAND gate IV) and the carry (diffractive NAND gate V) are also shown in Fig. 5d, together with the corresponding inputs, demonstrating the success of the cascaded diffractive system for all-optically performing a half-adder operation. The optical signal values within each output port are labelled in Fig. 5d.

Figure 5
figure 5

A half-adder that is composed of cascaded diffractive NAND gates. (a) Digital implementation of a half-adder using NAND gates. (b) An all-optical half-adder composed of cascaded diffractive NAND gates. (c) The correctness of the intermediate all-optical calculation steps (\({x}_{i}^{1}\)) and the sum (S) and carry (C) of the half-adder using different input combinations. (d) The intensity profiles at the output plane of the diffractive NAND IV (representing the sum) and the diffractive NAND V (representing the carry) gates. The inserted numbers in white font color indicate the relative optical signal within each aperture. Each input optical logical value is assumed to have a relative optical signal level of 1024 (a.u.), defining the ideal signal level per aperture.

Discussion

In our optical designs and numerical simulations reported so far, the diffractive NAND gates are cascaded to each other using ideal optical projection systems to map an output optical field (phase and amplitude) of a diffractive NAND gate onto another input aperture of a cascaded diffractive NAND gate. However, it is not critical to use high numerical aperture (NA) image projection systems when building complex computational units formed by cascaded diffractive networks. A low NA projection system applies a low pass filter to the output profile of a diffractive gate, which in fact makes it similar to the ideal input signals, with a more uniform intensity profile within each input aperture. To shed more light on this phenomenon, we simulated the cascading of different diffractive NAND gates with projection systems that have lower NA values, i.e., 0.9, 0.75, 0.5 and 0.25. The all-optical calculation accuracy using level 0, level 1, level (1,0), level (0,1) and level (1,1) inputs, i.e., representing 1444 unique combinations of input optical waves, were found to be 91.48%, 91.27%, 90.72%, and 90.86%, respectively. These numerical results demonstrate the robustness of the diffractive NAND gate to a reduction in the NA of the cascading optical projection system between successive diffractive networks.

In summary, we illustrated a cascadable all-optical NAND gate design based on diffractive networks. Successful numerical demonstrations of basic logical gates (AND, OR, NOT), as well as a half-adder, were reported using cascaded diffractive NAND gates. These proof-of-concept results highlight that logic operations can be performed all-optically by cascading the same passive diffractive network through the use of a design map that helps avoid output fields that cannot be cascaded into a successive diffractive gate. Beyond these preliminary studies, future work will explore the limits of this design map approach to automatically find a viable cascading path (or prove the lack of it) to all-optically perform a given logic operation based on the complexity of the desired inference task. Overall, the presented cascadable all-optical NAND gates and the optical signal encoding schemes demonstrated in this work using spatially-engineered passive diffractive layers can potentially serve future optical computing platforms.

Methods

Forward propagation model

The input plane of a diffractive NAND gate is positioned at z = 0 and provides a complex optical field \({u}_{0}\left(x,y,z=0\right)\) that propagates within the diffractive network. The propagation within a diffractive network is modeled following the Rayleigh-Sommerfeld equation29,

$$\begin{array}{c}{n}_{0}\left(x,y,{z}_{0}\right)={u}_{0}\left(x,y,0\right)*w\left(x,y,{z}_{0}\right),\end{array}$$
(4)

where \({n}_{0}\) represents the optical wave right before the first diffractive layer and \(*\) denotes 2D convolution operation. \(w\) is the complex-valued propagation kernel, which is given by,

$$\begin{array}{c}w\left(x,y,z\right)=\frac{z}{{r}^{2}}\left(\frac{1}{2\pi r}+\frac{1}{j\lambda }\right)\mathrm{exp}\left(\frac{j2\pi r}{\lambda }\right),\end{array}$$
(5)

with \(r=\sqrt{{x}^{2}+{y}^{2}+{z}^{2}}\), \(j=\sqrt{-1}\) and \(\lambda\) being the illumination wavelength. The resulting optical field is further modulated by the spatially-engineered diffractive layers. Assuming each diffractive layer (positioned at \(z={z}_{l}, l=1,\dots ,4\)) to be a thin phase element, the wave modulation provided by the lth diffractive layer can be formulated as:

$$\begin{array}{c}{t}_{l}=\mathrm{exp}\left(j\phi \left(x,y,{z}_{l}\right)\right).\end{array}$$
(6)

Therefore, the optical field right after each diffractive layer can be written as

$$\begin{array}{c}{u}_{l}\left(x,y, {z}_{l}\right)={t}_{l}\left(x,y,{z}_{l}\right)\cdot {[u}_{l-1}\left(x,y,{z}_{l-1}\right)*w\left(x, y,{z}_{l}-{z}_{l-1}\right)]\end{array}$$
(7)

After being modulated by all the four diffractive layers of a NAND gate design, the optical field further propagates to the output plane, located at \(z={z}_{d}\), which can be written as:

$$\begin{array}{c}o\left(x,y\right)={u}_{4}*w\left(x,y,{z}_{d}-{z}_{4}\right) .\end{array}$$
(8)

Diffractive NAND gate training

Each diffractive layer of our NAND gate design contains 80 × 80 neurons (diffractive features) that provide structured phase modulation with a pixel pitch of λ/2. The axial distance between the input plane and the first diffractive layer, between successive diffractive layers, and between the last diffractive layer and the output plane of the NAND is selected to be 50λ. In the network training phase, each batch contained 80 pairs of ideal optical logical values (with uniform intensity profiles) as inputs to the diffractive network, and 200 batches formed one epoch. The logical state of each input value was randomly assigned to be True with ~ 70% probability and the rest ~ 30% to be False in order to ensure that the output of a NAND gate has an equal probability of being True or False during the training phase. The optical fields within the designated apertures for the output optical logical values were used to calculate the training loss. For example, if the output value should be True (False), the upper (lower) aperture should be denoted as \({A}_{correct}\) and the optical field within the corresponding aperture should be denoted as \({o}_{correct}(x,y)\), while the other remaining aperture is referred to as \({A}_{wrong}\), with the corresponding field denoted as \({o}_{wrong}\left(x,y\right)\). Note that the correct and wrong subscripts in this notation do not represent the output logical state but instead indicate if the corresponding aperture label matches the logical calculation result. Based on this notation, our training loss function can be expressed as:

$$\begin{array}{c}Loss=\alpha \times {L}_{correct}+\beta \times {L}_{wrong}+\gamma \times {L}_{unif}.\end{array}$$
(9)

where \({L}_{correct}\) calculates the l2 distance between the output intensity profile in the correct aperture and a plane wave with unit amplitude, i.e.,

$$\begin{array}{c}{L}_{correct}=\sum_{x,y\in {A}_{correct}}{\left(1-{\left|{o}_{correct}\left(x,y\right)\right|}^{2}\right)}^{2} .\end{array}$$
(10)

\({L}_{wrong}\) was accordingly defined to represent the optical power that resides in the wrong aperture, i.e.,

$$\begin{array}{c}{L}_{wrong}=\sum_{x,y\in {A}_{wrong}}{\left|{o}_{wrong}\left(x,y\right)\right|}^{2}.\end{array}$$
(11)

\({L}_{unif}\), on the other hand, quantified the phase (\(\phi\)) variation of the optical field within only the correct aperture, i.e.,

$$\begin{array}{c}{L}_{unif}=std\left(\phi \left({o}_{correct}\left(x,y\right)\right)\right).\end{array}$$
(12)

The relative weights \(\alpha\), \(\beta\) and \(\gamma\) in Eq. (9) were empirically selected to be 100, 10, and 50, respectively. After calculating the loss values, the phase profiles on each diffractive layer were updated using an Adam optimizer41, which concludes one training batch. The diffractive NAND gate model was trained using Python (v3.7.3) and TensorFlow (v.1.15.0, Google Inc.) for 50 epochs on a desktop computer, with a GeForce GTX 1080 Ti graphical processing unit (GPU, Nvidia Inc.), an Intel® Core™ i9-7900X central processing unit (CPU, Intel Inc.) and 64 GB of RAM. This training process of a diffractive NAND gate takes ~ 100 min to complete (50 epochs).