Tutorial for Stacked Trapezoid model

This tutorial shows how to do simulations for stacked trapezoid model and how to fit the experimental data using this model.

[1]:

from cdsaxs.simulations.stacked_trapezoid import StackedTrapezoidSimulation
import numpy as np

Simulation

Prepare the data

[2]:

pitch = 100 #nm distance between two lines
qzs = np.linspace(-0.5, 0.5, 121)
qxs = 2 * np.pi / pitch * np.ones_like(qzs)

#parameters
dwx = 0.1
dwz = 0.1
i0 = 10
bkg = 0.1
y1 = 0.
height = 20. #same as [20., 20.]
bot_cd = 40.
swa = [80., 80.0]

langle = np.deg2rad(np.asarray(swa))
rangle = np.deg2rad(np.asarray(swa))

#simulation data
i_params = {'heights': height,
            'langles': langle,
            'rangles': rangle,
            'y_start': y1,
            'bot_cd': bot_cd,
            'dwx': dwx,
            'dwz': dwz,
            'i0': i0,
            'bkg_cste': bkg,
            }

Description of the parameters:

dwx, dwz, i0 and bkg are the parameters necessary to calculate the Debye-waller factor which accounts for the real world imperfections in the model. You can read more about it here.

The other parameters are the geometrical parameters of the model. The parameters are as follows:

y_start - starting y-coordinate of the base of the nano structure

bot_cd - bottom width (CD for critical dimension) of the nano structure

heights - height of each individual trapezoid can be a single value or a list of values in case of list of values, each value corresponds to the height of each individual trapezoid.

langles - list of all the left bottom angles of each individual trapezoid. The dictionary which is passed should have it in radians.

rangles - list of all the right bottom angles of each individual trapezoid. The dictionary which is passed should have it in radians.

weight - weight of each individual trapezoid to account for the fact that they could be made of different materials. here we will assume that all the trapezoids are made of the same material hence the weight is not necessary.

Note: In symmetric case either left or right angle can be passed and the other will be calculated using the symmetry.

So we are constructing a line space pattern and the cross section of each line looks like following:

Create instance of the Simulation class and call `simulate_diffraction` method

[3]:

Simulation1 = StackedTrapezoidSimulation(qys=qxs, qzs=qzs)

intensity = Simulation1.simulate_diffraction(params=i_params)

#plot
import matplotlib.pyplot as plt
plt.plot(qzs, intensity)
plt.xlabel(r'$q_{z}$')
plt.ylabel('Intensity')
plt.title('Stacked Trapezoid diffraction simulation')

[3]:

Text(0.5, 1.0, 'Stacked Trapezoid diffraction simulation')

Fitting

A txt or csv file with the experimental data containing the values for \(Q_{x}\), \(Q_{z}\) and intensities can be in the format as shown below:

```
qx, qz, intensity
0.1, 0.2, 0.3
0.2, 0.3, 0.4
0.3, 0.4, 0.5
...
```

for a text file you can read the data using the following code:

import numpy as np
data = np.genfromtxt('path_to_file.txt', delimiter=',', skip_header=1)
qx = data[:,0]
qz = data[:,1]
intensity = data[:,2]

We will suppose that we’ve read the data and stored it in the variables qx, qz and intensity. And we’ll use the simulated intensities used in the previous section.

Let’s introduce a bit of noise to the simulated intensities to make it more realistic.

[4]:

import matplotlib.pyplot as plt
intensity_noisy = intensity + np.sqrt(intensity) * np.random.normal(0, 300, intensity.shape)
plt.plot(qzs, intensity, label='original')
plt.plot(qzs, intensity_noisy, label='added noise')
plt.ylabel('Intensity')
plt.xlabel('qz')
plt.legend()

[4]:

<matplotlib.legend.Legend at 0x700729f7f1d0>

../_images/Tutorials_stacked_trapezoid_simulation_and_fitting_12_1.png

Now that we have our experimental data good to go for the fitting we need to provide a list of initial values and bounds for the parameters. The bounds are necessary to make sure that the fitting algorithm doesn’t lose its way looking for impossible values.

[5]:

initial_params = {'heights': {'value': height+2, 'variation': 10E-1},
                    'langles': {'value': langle-0.2, 'variation': 10E-1},
                    'rangles': {'value': rangle-0.50, 'variation': 10E-1},
                    'y_start': {'value': y1+1, 'variation': 10E-1},
                    'bot_cd': {'value': bot_cd-1, 'variation': 10E-2},
                    'dwx': {'value': dwx, 'variation': 10E-5},
                    'dwz': {'value': dwz, 'variation': 10E-5},
                    'i0': {'value': i0, 'variation': 10E-5},
                    'bkg_cste': {'value': bkg, 'variation': 10E-5}
                    }

Create an instance of the Simulation class and pass it to the Fitter class along with data to fit

[6]:

from cdsaxs.fitter import Fitter
Simulation2 = StackedTrapezoidSimulation(use_gpu=False, qys=qxs, qzs=qzs, initial_guess=initial_params)

Fitter1 = Fitter(Simulation=Simulation2, exp_data=intensity)

Then we’ll call the cmaes method of the Fitter class to fit the data. It’s important to know that this method provides a normalized set of values between -sigma and sigma and it is rescaled to the parameter values according to the model used.

sigma - the standard deviation of the values generated by the cmaes method.

Note: Similar to bound denoted by variation the parameter sigma helps to control the exploration of the algorithm. The larger the value of sigma the further the algorithm explores the parameter space and vice versa. Thus they need to be adjusted for the algorithm to converge to the optimal solution.

ngen - the number of generations to run the algorithm

popsize - the population size

mu - the number of parents to select for the next generation

n_default - the number parameters to fit

restarts - the number of times to restart the algorithm

tolhistfun - the tolerance for the history of the function value

ftarget - the target fitness value

restart_from_best - whether to restart from the best solution found so far

verbose - whether to print the progress of the algorithm

dir_save - the directory to save the results of the fitting

[7]:

bestfit, fitness = Fitter1.cmaes(sigma=1, ngen=100, popsize=500, mu=10, n_default=11, restarts=1, tolhistfun=10E-5, ftarget=0, restart_from_best=True, verbose=False)
print(bestfit)

Iteration terminated due to ngen criterion after 100 gens
Doubled popsize
Iteration terminated due to ngen criterion after 100 gens
[]
     height1   langle1  langle2   rangle1   rangle2  y_start     bot_cd  \
0  19.999831  1.397397  1.39493  1.395469  1.397299  0.15517  39.996793

        dwx       dwz        i0  bkg_cste
0  0.099187  0.100135  9.999407  0.099898

/nobackup/nd276333/Workspace/cdsaxs/src/cdsaxs/fitter.py:290: RuntimeWarning: invalid value encountered in scalar divide
  if(n_infs / fitness_arr.shape[0] > 0.5):

Now we have the bestfit parameters in the form of a pandas dataframe. We will look at the statistical information generated by the mcmc_bestfit_stats method.

[8]:

with np.errstate(divide='ignore', invalid='ignore', over='ignore'):
    stats = Fitter1.mcmc_bestfit_stats(N=11, sigma = 1., nsteps=100, nwalkers=500)

    print(stats)

11 parameters

  0%|          | 0/100 [00:00<?, ?it/s]100%|██████████| 100/100 [00:02<00:00, 41.05it/s]

               mean       std  count        min        max   lower_ci  \
height1   20.036754  0.244560  28695  18.724427  23.587721  20.033035
langle1    1.462781  0.235986  28695   0.056644   3.139290   1.459192
langle2    1.428477  0.225850  28695   0.060566   3.096835   1.425043
rangle1    1.442650  0.239630  28695   0.011439   3.079889   1.439006
rangle2    1.422168  0.236631  28695   0.036405   2.946014   1.418570
y_start    0.606899  0.224636  28695   0.001792   3.918192   0.603483
bot_cd    39.989187  0.037260  28695  39.633367  40.461651  39.988620
dwx        0.099176  0.000036  28695   0.098975   0.099463   0.099175
dwz        0.100136  0.000026  28695   0.099906   0.100385   0.100136
i0         9.999419  0.000030  28695   9.999232   9.999656   9.999419
bkg_cste   0.099896  0.000025  28695   0.099704   0.100075   0.099895

           upper_ci  uncertainity
height1   20.040473  3.719023e-03
langle1    1.466369  3.588627e-03
langle2    1.431912  3.434493e-03
rangle1    1.446294  3.644053e-03
rangle2    1.425767  3.598446e-03
y_start    0.610315  3.416029e-03
bot_cd    39.989753  5.666087e-04
dwx        0.099176  5.474902e-07
dwz        0.100137  3.993230e-07
i0         9.999420  4.615437e-07
bkg_cste   0.099896  3.858098e-07