Wednesday, 25 August 2021 12:00 Written by Philipp Schneider

Parameter Reconstruction 2: Variance-Based Sensitivity Analysis¶ Jupyter Notebook

This is the second blog post in a series dedicated to methods and numerical tools in the context of a parameter reconstruction. For an introduction of the basic concepts, please see see the first blog post of the series.

Motivation¶

An important question in the context of parameter reconstruction is how to set up an experimental measurement in order to retrieve enough meaningful information for the accurate reconstruction of the parameters $p_1,\dots,p_M$ that are of interest. As in the previous post, the physical measurement process is modeled by a vectorial function \begin{equation} \mathbf{f}: \mathbf{p} \in \mathcal{X}\subset\mathbb{R}^M \mapsto \mathbf{y}\in\mathbb{R}^K \end{equation} that maps the $M$-dimensional parameter vector $\mathbf{p}$ to a $K$-dimensional output vector $\mathbf{y}$ that can be measured with some added noise.

In the previous post we have also introduced point estimates $\mathbf{p}_{\rm ML}$ of the most likely model parameters given a vector of measurements $\mathbf{y}^*$. The probability distribution close to this point estimate \begin{equation} p(\mathbf{p}) \propto {\rm exp}\left[-\frac{1}{2 a_{\rm ub}^2} (\mathbf{p} - \mathbf{p}_{\rm ML})^{\top}\mathrm{J}^{\top}\mathrm{W}\mathrm{J} (\mathbf{p} - \mathbf{p}_{\rm ML})\right] \end{equation} allowed the confidence intervals for each parameter to be inferred. Here, the entries of the Jacobian matrix are $J_{kl}=\left.\frac{\partial f_k(\mathbf{p})}{\partial p_l}\right|_{\mathbf{p} = \mathbf{p}_{\rm ML}}$, the weight matrix $\mathrm{W} = {\rm diag}\left(\frac{1}{\varepsilon_1^2},\cdots,\frac{1}{\varepsilon_K^2}\right)$ collects the information of the initially assumed measurement uncertainties $\varepsilon_i$, and $a_{\rm ub}$ is an unbiased correction factor for the measurement uncertainties.

Let us consider the behavior of $p(\mathbf{p})$ if we fix all but one parameter $p_l$ to the maximum likelihood point estimates, i.e. $p_i = (\mathbf{p}_{\rm ML})_i$ for $i \neq l$. Then, the probability distribution of $p_l$ is \begin{equation} p(p_l) \propto {\rm exp}\left[-\frac{1}{2 a_{\rm ub}^2} \left(p_l - (\mathbf{p}_{\rm ML})_l\right)^2\sum_{k=1}^K\frac{J_{k l}^2}{\varepsilon_k^2}\right]. \end{equation} We will call this the on-axis probability distribution which has an on-axis variance
\begin{equation} \sigma_l^2 = a_{\rm ub}^2 \left(\sum_{k=1}^K\frac{J_{k l}^2}{\varepsilon_k^2}\right)^{-1}. \end{equation} This variance decreases with an increasing derivative $J_{k l}$ leading to a reconstruction of $p_l$ with a smaller uncertainty.

Of course, we want to assess the performance of the measurement setup independent of a specific sample that is measured and thus independent of the derivatives of the model function for a specific maximum-likelihood parameter value $\mathbf{p}_{\rm ML}$. To this end, we first assume that the model function $f_k(\mathbf{p})$ of each channel $k$ can be well approximated by a sum $$ f_k(\mathbf{p}) \approx f_k^{(0)} + f_k^{(1)}(p_1) + f_k^{(2)}(p_2) + \cdots f_k^{(M)}(p_M),$$ such that $$ J_{kl}=\left.\frac{\partial f_k(\mathbf{p})}{\partial p_l}\right|_{\mathbf{p} = \mathbf{p}_{\rm ML}}\approx\left.\frac{\partial f_k^{(l)}(p_l)}{\partial p_l}\right|_{p_l = (\mathbf{p}_{\rm ML})_l}. $$ Below, we will see that this corresponds to neglecting second and higher order Sobol coefficients of $f_k(\mathbf{p})$.

Using a Taylor expansion of the moments one finds that for a small random perturbation of $p_l$ with ${\rm E}[p_l] = (\mathbf{p}_{\rm ML})_l$ the following approximation holds for the gradients $J_{k l}$ $${\rm Var}\left[f_k^{(l)}(p_l)\right] \approx J_{k l}^2 {\rm Var}\left[p_l\right].$$

Hence, we can estimate the average on-axis variance as \begin{equation} \label{eq:sigma_on_axis}\tag{5} \sigma_l^2 \approx a_{\rm ub}^2 \left(\sum_{k=1}^K\frac{{\rm Var}\left[f_k^{(l)}(p_l)\right]}{{\rm Var}\left[p_l\right] \varepsilon_k^2}\right)^{-1} = a_{\rm ub}^2 {\rm Var}\left[p_l\right] \left(\sum_{k=1}^K{\rm Var}\left[ \frac{f_k^{(l)}(p_l)}{\varepsilon_k}\right]\right)^{-1} \end{equation}

An accurate reconstruction of $p_l$ is more probable for small values of this estimate and hence for large values of ${\rm Var}\left[\frac{f_k^{(l)}(p_l)}{\varepsilon_k}\right]$, i.e. for a large variance of the model function relative to the measurement error.

Please note, that the on-axis variance $\sigma_l^2$ is typically much smaller than the total variance of the parameter $\sigma_{p_l}^2 = \mathrm{MSE} \cdot \mathrm{Cov}_{ii}$ which also takes non-diagonal elements of the matrix $\mathrm{Cov}^{-1} = \mathrm{J}^{\top}\mathrm{W}\mathrm{J}$ into account. Only, if the uncertainties of all other parameters $p_i$ with $i\neq l$ are negligible, the on-axis estimate of $\sigma_l$ is comparable to the total reconstruction uncertainty $\sigma_{p_l}$.

Sobol coefficients¶

The considerations above show that an analysis of the model function $\mathbf{f}(\mathbf{p})$ in terms of the variance of the function is an important tool for assessing if a measurement setup is suitable for parameter reconstruction. In order to assess the ability to reconstruct a specific parameter $p_l$, it is important to quantify how much of the total variance of the model function stems from variations of $p_l$.

To this end, we use the fact that for each channel $k$, the scalar function $g(\mathbf{p}) = f_k(\mathbf{p})/\varepsilon_k$ can be decomposed into a unique sum called a Sobol decomposition: $$ g(\mathbf{p}) = g_0 + \sum_{i=1}^M g_i(p_i) + \sum_{i<j}^{M} g_{ij}(p_i,p_j) + \cdots + g_{1,2,\dots,d}(p_1,p_2,\dots,p_d). $$ Here, $g_0$ is a constant and the expectation value of all other terms with respect to the prior density $\rho_k(p_k)$ of each parameter vanishes, i.e. $$ \int g_{i_1,i_2,\dots,i_s}(p_{i_1},p_{i_2},\dots,p_{i_s}) \rho_k(p_k) \text{d}p_k = 0 \text{ for all } k=i_1,i_2,\dots,i_s. $$ That is, all the terms in the functional decomposition are orthogonal. The variance of the function $g(\mathbf{p})$ is given as $$ \text{Var}[g] = \int g^2(\mathbf{p}) \rho_1(p_1)\cdots\rho_d(p_d)\text{d}p_1\cdots\text{d}p_k - g_0^2. $$ Due to the orthogonality, the variance can be decomposed into a sum of variances stemming from each term in the Sobol decomposition $$ \text{Var}[g] = \sum_i \text{Var}[g_i] + \sum_{i<j} \text{Var}[g_{ij}] + \cdots + \text{Var}[g_{1 2\dots d}] $$ where $$ \text{Var}[g_{i_1,i_2,\dots,i_s}] = \int g_{i_1,i_2,\dots,i_s}^2(p_{i_1},p_{i_2},\dots,p_{i_s}) \rho_{i_1}(p_{i_1})\cdots\rho_{i_s}(p_{i_s})\text{d}p_{i_1}\cdots\text{d}p_{i_s}. $$ The Sobol coefficient $S_{i_1,\dots,i_s}$ is defined as the fraction of the total variance that stems from the additive term $g_{i_1,i_2,\dots,i_s}$, i.e. \begin{equation} S_{i_1,\dots,i_s} = \frac{\text{Var}[g_{i_1,i_2,\dots,i_s}]}{\text{Var}[g]}. \end{equation} Hence, each Sobol coefficient determines the fraction of the total variation that stems from the joint random variation of the parameters $p_{i_1},\dots,p_{i_s}$. The sum of all Sobol coefficients is 1, $$ \sum_{i=1}^M S_i + \sum_{i<j}^M S_{i j} + \dots + S_{1 2 \dots d} = 1. $$ For example, if $S_{1} = 0.1$, then 10% of the variance of $g(\mathbf{p})$ stems from the variance of an additive part $g_{1}(p_1)$ of the function that only depends on the random variations of parameter $p_1$.

For the following considerations, we will neglect all second and higher order Sobol coefficients. That is, we assume that $g(\mathbf{p}) = f_k(\mathbf{p})/\varepsilon_k$ can be decomposed as $$ g(\mathbf{p}) \approx g_0 + \sum_{i=1}^M g_i(p_i). $$

Example¶

As a concrete example we consider again the scatterometry setup introduced in the first blog post. Monochromatic polarized UV light with a wavelength of $\lambda = 266\,{\rm nm}$ hits a periodic lamellar grating at a specific angle $\theta,\phi$. The light is reflected from the sub-wavelength grating and the intensities are measured experimentally. The grating is parameterized by a vector $\mathbf{p} = [cd,h,swa,t,r_{\rm top},r_{\rm bot}]^T$ with six entries (see image below).

A question of interest for the optimization of the measurement setup is, which polarizations and incoming angles provide most of the information for the reconstruction of the six parameters? To this end, we determine the Sobol coefficient for a set of angles and polarizations. We use the set that was also determined experimentally in M. Wurm et al. Meas. Sci. Technol. 22, 094024 (2011).

JCMsuite's Analysis and Optimization Toolbox includes the specific driver PCESensitivityAnalysis for performing a sensitivity analysis based on a polynomial chaos expansion (PCE). Details on the PCE approach regarding the very same example have been published by Nando Farchmin, et al. Efficient global sensitivity analysis for silicon line gratings using polynomial chaos Proc. SPIE 11057 (2019) (arXiv version).

For running the sensitivity analysis, we first import some relevant packages, including the python package jcmwave.

In [1]:

#=====================
# Import some packages
#=====================

import sys, os
import numpy as np
import itertools
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import set_matplotlib_formats, HTML, display
from forward_problem import ForwardProblem

%matplotlib inline
set_matplotlib_formats('svg') # render plots as svg

sys.path.insert(0,os.path.join(os.getenv('JCMROOT'), 'ThirdPartySupport', 'Python'))
import jcmwave

Next, we create a client object to communicate with the Analysis and Optimization Toolkit, which is automatically started in the background.

In [2]:

client = jcmwave.optimizer.client()

Next, we define the parameter domain of the sensitivity analysis:

In [3]:

domain = [
 {'name': 'cd', 'domain': [15, 35]},
 {'name': 'h', 'domain': [40, 60]},
 {'name': 'swa', 'domain': [84, 90]},
 {'name': 't', 'domain': [2.5, 6.5]},
 {'name': 'r_top', 'domain': [3, 18]},
 {'name': 'r_bot', 'domain': [0, 10]},
 {'name': 'pitch', 'type': 'fixed', 'domain': 50}
]

Not all combinations of parameters in the domain lead to a valid geometry. For example, the thickness of the oxide layer $t$ cannot be larger than the top radius $r_{\rm top}$, which itself cannot be larger than half of the top width of the structure $\frac{1}{2}\left(cd + \frac{h}{\tan(swa)}\right)$. In total four constraints must be fulfilled that are defined by functions that are negative if the constraints are fulfilled.

In [4]:

constraints = [
 {'name': 'r_top_t', 'constraint': 't-r_top+1'},
 {'name': 'r_top', 'constraint': 'r_top - cd/2 + h/2/tan(Pi*swa/180) + 1'},
 {'name': 'r_bot', 'constraint': 'r_bot - pitch/2 + cd/2 + h/2/tan(Pi*swa/180) + 1'},
 {'name': 'r_bot_t', 'constraint': 'r_bot + t - pitch/2 + cd/2 + h/2/tan(Pi*swa/180) - t*tan(Pi*swa/2/180) + 1'}
]

With this information, a new Study object can be created and we can configure the study by calling study.set_parameters().

In [5]:

study = client.create_study(domain=domain, constraints=constraints,
                            driver="PCESensitivityAnalysis",
                            name="PCE Sensitivity Analysis",
                            save_dir=os.getcwd(),
                            study_id='sensitivity_analysis')

study.set_parameters(max_iter=400)

2021-08-25 10:09:01: The dashboard is accessible via http://localhost:4554/dashboard/sensitivity_analysis
2021-08-25 10:09:02: Optimization of the PCE after 390 samples took 1.31s

The model function itself is implemented in a python package ForwardProblem that comes with this Jupyter notebook. It contains the experimental data and a function to run FEM simulations of the scattering process.

Using the package, we can define the objective function of the study that returns an Observation object with a list of all scaled intensities, i.e. the list $[f_1(\mathbf{p})/\varepsilon_1,\dots,f_k(\mathbf{p})/\varepsilon_k]$.

In [6]:

m = ForwardProblem()

intensities_exp, uncertainties, thetas = m.exp_data()
uncertainties = np.array([
        *uncertainties["S"]["phi_0"],
        *uncertainties["S"]["phi_90"],
        *uncertainties["P"]["phi_0"],
        *uncertainties["P"]["phi_90"],       
    ])

def objective(**kwargs):
    observation = study.new_observation()
    intensities = m.model_eval(kwargs)
    intensities = np.array([
        *intensities["S"]["phi_0"],
        *intensities["S"]["phi_90"],
        *intensities["P"]["phi_0"],
        *intensities["P"]["phi_90"],       
    ])
    observation.add(list(intensities/uncertainties))
    return observation
study.set_objective(objective)

Now we can acquire function values for the determination of the Sobol coefficients.

In [7]:

study.run()

After the study is complete, we can retrieve all relevant information from the function study.driver_info(). We use the determined Sobol coefficient values to assess the possibility of reconstructing certain parameters. Neglecting all higher-order contributions, we assume, that for each channel $k$ the function $g(\mathbf{p}) = f_k(\mathbf{p})/\varepsilon_k$ is a sum \begin{equation} g(\mathbf{p}) = g_0 + g_1(p_1) + g_2(p_2) + \cdots + g_M(p_M) \end{equation} of functions that only depend on one parameter

A good reconstruction is associated with a large value of the inverse on-axis variance estimate $\sigma_l^{-2} \approx a_{\rm ub}^{-2} \sum_{k=1}^K c_k$ with (see Eq. 5) \begin{equation} c_k = \frac{1}{ {\rm Var}\left[p_l\right] } {\rm Var}\left[ \frac{f_k(p_l)}{\varepsilon_k}\right]. \end{equation} Here, the variance of a uniform distribution of a parameter in the interval $[a,b]$ is given as ${\rm Var}\left[p_l\right]=(b-a)^2/12$. Let us plot the value of the first order Sobol coefficients and the value $c_k$ of the scaled variance for each parameter and each channel.

In [8]:

di = study.driver_info() #driver-specific information of PCESensitivityAnalysis 
var = np.array(di['variance']) #total variances of each output of the model function

# names of polarizations and incoming phi-angles
names = [r'S-pol $\phi$ = 0°',r'S-pol $\phi$ = 90°', r'P-pol $\phi$ = 0°', r'P-pol $\phi$ = 90°']

# map variances to polarizations and incoming phi-angles
l = len(thetas)
variances = np.array([var[:l],var[l:2*l],var[2*l:3*l],var[3*l:4*l]])

#Create two sets of figures
plt.rcParams.update({'font.size': 9})
fig1, axs1 = plt.subplots(2, 3, figsize=(11,6))
fig2, axs2 = plt.subplots(2, 3, figsize=(11,6))
plot_indices = itertools.product(range(2),range(3))
style = dict(marker='o', linestyle='-', linewidth=1, markersize=3)

for key, val in di['sobol_coefficients'].items():
    sobol_index = di['sobol_indices'][key] #list of indices of domain belonging to Sobol index
    if len(sobol_index) > 1: continue # we are only interested in 1st. order coefficients
    
    # map sobol indices to polarizations and incoming phi-angles
    sobol_values = [val[:l],val[l:2*l],val[2*l:3*l],val[3*l:4*l]]
    
    dom = domain[sobol_index[0]-1] #domain entry of corresponding parameter
    #name of parameter
    param_name = '${' + '}_{\\rm'.join(dom['name'].split('_')) + '}$'#name of parameter as latex
    
    var_p = (dom['domain'][1]-dom['domain'][0])**2/12 #variance of uniform distribution of parameter
    
    plot_idx = next(plot_indices)
    ax1, ax2 = axs1[plot_idx], axs2[plot_idx]
    for idx in range(4):
        #plot of sobol index values
        ax1.plot(thetas,sobol_values[idx], label=names[idx], **style)
        #plot of scaled variances c_k
        ax2.plot(thetas,variances[idx]*sobol_values[idx]/var_p, label=names[idx], **style)
        
    ax1.set_title(r'Parameter {}'.format(param_name), fontsize=10)
    if plot_idx[0] == 1: ax1.set_xlabel(r'$\theta$ (°)')
    if plot_idx[1] == 0: ax1.set_ylabel('Sobol coefficient')
    ax1.legend()
    ax2.set_title(r'Parameter {}'.format(param_name), fontsize=10)
    if plot_idx[0] == 1: ax2.set_xlabel(r'$\theta$ (°)')
    if plot_idx[1] == 0: ax2.set_ylabel(r'scaled variance $c_k$')
    ax2.legend()
    
t1 = fig1.suptitle('1st order Sobol coefficients', fontsize=14)
t2 = fig2.suptitle('Scaled variance of Sobol decomposition to 1st order', fontsize=14)

From the six plots of the scaled variances $c_k$ for each parameter we see that the S-polarization for an incident angle of $\phi=90^\circ$ (orange line) carries most of the information to reconstruct the parameters. Only the thickness $t$ of the silicon dioxide layer can be better reconstructed from P-polarized light at $\phi=90^\circ$ (red line) and S-polarized light at $\phi=0^\circ$ (blue line). On the other hand, P-polarized light at $\phi=0^\circ$ (green line) provides very little information on the grating parameters.

From the results, one can infer that acquiring more data for S-polarized light around $\phi=90^\circ$ and for $\theta \in [60^\circ,80^\circ]$ can enhance the accuracy of the parameter reconstruction. Reducing the corresponding measurement errors $\varepsilon_k$ would have the same effect.

Finally, we want to estimate the value of the on-axis standard deviation $\sigma_p$ for each parameter. We assume that the measurement errors are accurately estimated, such that the mean standard error $\mathrm{MSE} = a_{\rm ub}^2$ is approximately 1. Then \begin{equation} \sigma_p \approx \left(\sum_{k=1}^K c_k\right)^{-\frac{1}{2}} \end{equation} can be determined from the sum of $c_k$ from all measurement channels.

In [9]:

param_names, sigma_values = '', ''
for key, sobol_val in di['sobol_coefficients'].items():
    sobol_index = di['sobol_indices'][key] #list of indices of domain belonging to Sobol index
    if len(sobol_index) > 1: continue # we are only interested in 1st. order coefficients
    
    dom = domain[sobol_index[0]-1] #domain entrx of corresponding parameter
    param_name = '${' + '}_{\\rm '.join(dom['name'].split('_')) + '}$'#name of parameter as latex
    var_p = (dom['domain'][1]-dom['domain'][0])**2/12 #variance of uniform distribution of parameter
    
    c_k = var*sobol_val/var_p #array of c_k values
    param_names += '<td style="text-align:center">{}</td>'.format(param_name)
    sigma_values += '<td>{:.3f}</td>'.format(1/np.sqrt(np.sum(c_k)))
    
display(HTML(f'''
  <table>
     <thead>
         <tr>
             <th>Paramter $p$</th>
             {param_names}
         </tr>
     </thead>
     <tbody>
         <tr>
             <th>On axis standard deviation $\sigma_p$</th>
             {sigma_values}
         </tr>
     </tbody>
  </table>'''))

Paramter $p$	${cd}$	${h}$	${swa}$	${t}$	${r}_{\rm top}$	${r}_{\rm bot}$
On axis standard deviation $\sigma_p$	0.036	0.102	0.121	0.039	0.257	0.371

Apparently, the critical dimension $cd$ and the thickness of the oxide layer $t$ can be best reconstructed with this type of measurement. The value of radii of the corner rounding at the top and the bottom, $r_{\rm top}$ and $t_{\rm bot}$, can be reconstructed with a more limited resolution. As already pointed out earlier, the actual reconstruction uncertainties can be much larger that their on-axis estimates.

Conclusion and Outlook¶

This blog post introduced the concept of a variance based sensitivity analysis for indirect measurement setups. The goal was to see which measurements are sensitive to the values of the certain sample parameters and thus contain relevant information to reconstruct these values from measurements. Based on this information one can improve the measurement process, e.g. by acquiring a larger number of parameter-sensitive measurements.

In the next blog post on parameter reconstruction, we will perform a least-square minimization in order to determine the maximum-likelihood values of the parameter values from measurement data. Moreover, we will perform analyses in order to determine the probability distribution and confidence intervals of the parameter values. We will see that indeed parameters with a smaller estimated on-axis standard deviation can be reconstructed more accurately.

Additional Resources¶

Download a free trial version of JCMsuite.
Documentation of JCMsuite's Analysis and Optimization Toolkit for Python and Matlab.
Tutorials on scatterometry, ellipsometry, and scattering on a line grating.

Home

Logo Position

JCMsuite

Services

Company