API reference

Submodules

fitter

This module contains the Fitter class, which is designed to fit the cdsaxs experimental data using the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) and then perform a statistical analysis of the best fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm.

Classes:: Fitter: A class that fits the cdsaxs experimental data using the CMA-ES and MCMC algorithms.

class cdsaxs.fitter.Fitter(Simulation: Simulation, exp_data)

Bases: object

This class is designed to fit the cdsaxs experimental data using the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) and then perform a statistical analysis of the best fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm. It takes an instance of the Simulation class and fits the simulated data to the experimental data.

Simulation

An instance of the Simulation class representing the simulated diffraction pattern.

Type:: Simulation

exp_data

Experimental diffraction data.

Type:: numpy.ndarray

np

NumPy or CuPy module, depending on whether GPU acceleration is used.

Type:: module

best_fit_cmaes

List containing the best fit parameters obtained using the CMA-ES algorithm.

Type:: list or None
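The np attribute holds either NumPy or CuPy so that the same array code can run on CPU or GPU. A minimal sketch of such a switch, assuming the choice is driven by a boolean flag; select_array_module is a hypothetical helper, not part of cdsaxs, and how cdsaxs itself picks the module is not shown in this reference.

```python
import importlib
from types import ModuleType


def select_array_module(use_gpu: bool = False) -> ModuleType:
    """Hypothetical helper: return CuPy if requested and installed, else NumPy.

    Code written against the returned module (xp.asarray, xp.log10, ...)
    then runs unchanged on CPU or GPU.
    """
    if use_gpu:
        try:
            return importlib.import_module("cupy")
        except ImportError:
            pass  # CuPy not installed: fall back to NumPy
    return importlib.import_module("numpy")


xp = select_array_module(use_gpu=False)
values = xp.log10(xp.asarray([1.0, 10.0, 100.0]))
print(xp.__name__, values)
```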

cmaes()[source]: Perform fitting using the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm.

mcmc(): Give a set of statstical data on the best fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm.

cmaes(sigma, ngen, popsize, mu, n_default, restarts, tolhistfun, ftarget, restart_from_best=True, verbose=True, dir_save=None, test=False)

Fit experimental data using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm.

This method utilizes a modified version of the CMA-ES algorithm to fit experimental data.

Parameters:

sigma (float) – The initial standard deviation for each parameter.
ngen (int) – The number of generations to run the algorithm.
popsize (int) – The size of the population (number of candidate solutions) in each generation.
mu (int) – The number of parents/points for recombination.
n_default (int) – The number of parameters to be optimized.
restarts (int) – The number of restarts allowed during the optimization process.
tolhistfun (float) – The tolerance for the history of the best fitness value.
ftarget (float) – The target fitness value.
restart_from_best (bool, optional) – Determines whether to restart from the best individual found so far. Default is True.
verbose (bool, optional) – Controls whether to print progress information during optimization. Default is True.
dir_save (str, optional) – The directory to save the output. Default is None.
test (bool, optional) – If True, run in test mode and return the best value instead of performing the full optimization process. Default is False.
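For orientation when choosing popsize and mu, the textbook CMA-ES defaults (from Hansen's CMA-ES tutorial) are popsize = 4 + floor(3 ln n) and mu = floor(popsize / 2) for n parameters. The sketch below computes them; default_cmaes_sizes is a hypothetical helper, and these are not necessarily the values cdsaxs uses internally.

```python
import math


def default_cmaes_sizes(n_params: int) -> tuple:
    """Hypothetical helper: textbook CMA-ES (popsize, mu) for n_params parameters."""
    if n_params < 1:
        raise ValueError("n_params must be at least 1")
    popsize = 4 + math.floor(3 * math.log(n_params))  # lambda
    mu = popsize // 2  # number of parents used for recombination
    return popsize, mu


for n in (2, 10, 20):
    print(n, default_cmaes_sizes(n))
```

The population grows only logarithmically with the number of parameters, so larger models mainly cost more generations rather than much larger populations.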

Returns:

A tuple containing the best fit parameters and the corresponding fitness value.

Return type:

tuple

best_fit_cmaes

List containing the best fit parameters obtained using the CMA-ES algorithm.

Type:: list or None

Notes

This method is modified from deap/algorithms.py to return a list of populations instead of the final population and to incorporate additional termination criteria based on neuromorphic algorithms. The function was originally extracted from XiCam and has been modified for specific use cases.
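The ftarget and tolhistfun parameters correspond to two common stopping rules: stop when the best fitness is good enough, or stop when it has stalled. A minimal sketch, assuming tolhistfun is compared against the range of best fitness values over a fixed window of recent generations (the window length is an assumption); TerminationMonitor is hypothetical and does not reproduce the cdsaxs or DEAP code.

```python
from collections import deque
from typing import Optional


class TerminationMonitor:
    """Hypothetical helper illustrating ftarget- and tolhistfun-style stopping rules."""

    def __init__(self, ftarget: float, tolhistfun: float, window: int = 10):
        self.ftarget = ftarget
        self.tolhistfun = tolhistfun
        # Best fitness of the last `window` generations (oldest dropped first).
        self.history = deque(maxlen=window)

    def update(self, best_fitness: float) -> Optional[str]:
        """Record one generation's best fitness; return a stop reason or None."""
        self.history.append(best_fitness)
        if best_fitness <= self.ftarget:
            return "ftarget"  # target fitness reached
        full = len(self.history) == self.history.maxlen
        if full and max(self.history) - min(self.history) < self.tolhistfun:
            return "tolhistfun"  # best fitness barely changed over the window
        return None


monitor = TerminationMonitor(ftarget=1e-3, tolhistfun=1e-6, window=5)
stop = None
for generation, best in enumerate([1.0, 0.5, 0.2, 0.2, 0.2, 0.2, 0.2]):
    stop = monitor.update(best)
    if stop is not None:
        print(f"stopping at generation {generation}: {stop}")
        break
```

In a restart scheme, a "tolhistfun" stop is the natural trigger for a restart, while an "ftarget" stop ends the search.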

static do_stats(df, ci=0.95)

This method generates a set of statistical data on the best fit parameters obtained from the MCMC fitting process.

Parameters:

df (pandas.DataFrame) – The DataFrame containing the best fit parameters.
ci (float, optional) – The confidence interval. Default is 0.95.

Returns:

A DataFrame containing the statistical data on the best fit parameters.

Return type:

pandas.DataFrame
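A minimal sketch of one common way to summarize MCMC samples per parameter with pandas, assuming each column of the input holds samples of one parameter and the interval is the central ci quantile range. summarize_samples and its output columns (mean, std, lower, upper) are hypothetical and may differ from what do_stats returns.

```python
import numpy as np
import pandas as pd


def summarize_samples(df: pd.DataFrame, ci: float = 0.95) -> pd.DataFrame:
    """Hypothetical helper: mean, std and central ci interval for each column."""
    lower_q = (1.0 - ci) / 2.0  # e.g. 0.025 for ci = 0.95
    upper_q = (1.0 + ci) / 2.0  # e.g. 0.975 for ci = 0.95
    return pd.DataFrame(
        {
            "mean": df.mean(),
            "std": df.std(),
            "lower": df.quantile(lower_q),
            "upper": df.quantile(upper_q),
        }
    )  # one row per parameter


rng = np.random.default_rng(0)
samples = pd.DataFrame({"height": rng.normal(20.0, 0.5, 10_000),
                        "width": rng.normal(40.0, 1.0, 10_000)})
stats = summarize_samples(samples, ci=0.95)
print(stats)
```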

mcmc_bestfit_stats(N, sigma, nsteps, nwalkers, gaussian_move=False, seed=None, verbose=False, test=False, dir_save=None, tau=None, c=1e-05)

Generate a set of statistical data on the best fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm. Two kinds of moves for exploring the solution space are provided: Gaussian moves and stretch moves. Stretch moves are the default and are recommended.

This method uses the emcee package's implementation of the MCMC algorithm and generates a CSV file with the statistical data of the best fit parameters.
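To show what the default stretch move does, here is a minimal serial implementation of the Goodman and Weare affine-invariant stretch move, which emcee's StretchMove is based on. It is an illustration, not emcee's code: stretch_move_sampler is hypothetical, a = 2.0 is the usual stretch scale, and burn-in is handled by discarding the first third of the chain, mirroring the fallback described for tau below.

```python
import numpy as np


def stretch_move_sampler(log_prob, p0, nsteps, a=2.0, seed=None):
    """Hypothetical serial stretch-move ensemble sampler.

    log_prob maps a parameter vector to ln P; p0 has shape (nwalkers, ndim).
    Returns the chain with shape (nsteps, nwalkers, ndim).
    """
    rng = np.random.default_rng(seed)
    walkers = np.array(p0, dtype=float)
    nwalkers, ndim = walkers.shape
    lnp = np.array([log_prob(w) for w in walkers])
    chain = np.empty((nsteps, nwalkers, ndim))
    for step in range(nsteps):
        for k in range(nwalkers):
            # Pick a complementary walker j != k, uniformly.
            j = rng.integers(nwalkers - 1)
            if j >= k:
                j += 1
            # Draw z from g(z) ~ 1/sqrt(z) on [1/a, a] via the inverse CDF.
            z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
            proposal = walkers[j] + z * (walkers[k] - walkers[j])
            lnp_new = log_prob(proposal)
            # Accept with probability min(1, z**(ndim - 1) * P(new) / P(old)).
            log_accept = (ndim - 1) * np.log(z) + lnp_new - lnp[k]
            if np.log(rng.random()) < log_accept:
                walkers[k] = proposal
                lnp[k] = lnp_new
        chain[step] = walkers
    return chain


def log_gaussian(x):
    """Toy target: standard normal in every dimension (up to a constant)."""
    return -0.5 * np.sum(x ** 2)


p0 = np.random.default_rng(1).normal(0.0, 0.1, size=(32, 2))  # small initial ball
chain = stretch_move_sampler(log_gaussian, p0, nsteps=2000, seed=2)
flat = chain[len(chain) // 3:].reshape(-1, 2)  # discard the first third as burn-in
print(flat.mean(axis=0), flat.std(axis=0))
```

The move only needs the positions of other walkers, which is why it is insensitive to linear rescaling of the parameters and usually needs no tuning; a Gaussian move, by contrast, requires a proposal covariance.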

Parameters:

N (int) – The number of parameters to be optimized.
sigma (float or list) – The initial standard deviation for each parameter. If a float is provided, it is applied to all parameters. If a list is provided, each parameter is initialized with the corresponding value.
nsteps (int) – The number of MCMC steps to perform.
nwalkers (int) – The number of MCMC walkers to use.
gaussian_move (bool, optional) – Determines whether to use Gaussian moves for proposal distribution. If True, Gaussian moves are used. If False, stretch moves are used. Default is False.
seed (int, optional) – The seed for the random number generator. If None, a random seed is generated. Default is None.
verbose (bool, optional) – Controls whether to print progress information during fitting. If True, progress information is printed. Default is False.
test (bool, optional) – If True, run in test mode and return mean values instead of performing the full fitting process. Default is False.
dir_save (str, optional) – The directory to save the output. Default is None.
tau (float, optional) – The autocorrelation time used to determine the number of burn-in steps. If None, the emcee package is used to estimate the autocorrelation time; if emcee fails to do so, the first nsteps/3 steps are discarded. Default is None.
c (float, optional) – Empirical factor to modify the MCMC acceptance rate. Default is 1e-5.

Returns:

None

best_uncorr

The best uncorrected individual obtained from the MCMC fitting process.

Type:: numpy.ndarray

best_fitness

The fitness value of the best individual obtained from the MCMC fitting process.

Type:: float

minfitness_each_gen

The minimum fitness value at each generation during the MCMC fitting process.

Type:: numpy.ndarray

Sampler

An instance of emcee.EnsembleSampler with detailed output of the MCMC algorithm.

Type:: emcee.EnsembleSampler

plot_correlation(file, dir_save=None)

Generate a corner plot of the best fit parameters obtained from the MCMC fitting process.

The plot is created with the corner package.

Parameters:

file (str) – The path to the file containing the data.
dir_save (str, optional) – The directory to save the output. If not provided, the plot will be displayed instead of being saved.

Returns:

None

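save_population(population_arr, fitness_arr, dir_save, fit_mode='cmaes')

Save the population array to a CSV file.

Parameters:

population_arr (numpy.ndarray) – The population array to save.
fitness_arr (numpy.ndarray) – The fitness values of the population.
dir_save (str) – The directory to save the output.
fit_mode (str, optional) – The fitting mode, 'cmaes' or 'mcmc'. Default is 'cmaes'.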

Returns:

None
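A minimal sketch of saving a population with pandas, assuming population_arr is 2-D (one row per individual, one column per parameter) and fitness_arr holds one value per row. save_population_csv, the file name pattern, and the column names are all hypothetical; unlike save_population, the sketch returns the written path so it can be checked.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_population_csv(population_arr, fitness_arr, dir_save, fit_mode="cmaes"):
    """Hypothetical sketch: write parameters plus fitness to <dir_save>/<fit_mode>_population.csv."""
    population_arr = np.asarray(population_arr, dtype=float)
    fitness_arr = np.asarray(fitness_arr, dtype=float).reshape(-1)
    if population_arr.ndim != 2:
        raise ValueError("population_arr must be 2-D (individuals x parameters)")
    if len(population_arr) != len(fitness_arr):
        raise ValueError("population_arr and fitness_arr must have the same length")
    columns = [f"param_{i}" for i in range(population_arr.shape[1])]
    df = pd.DataFrame(population_arr, columns=columns)
    df["fitness"] = fitness_arr  # one fitness value per individual
    os.makedirs(dir_save, exist_ok=True)
    path = os.path.join(dir_save, f"{fit_mode}_population.csv")
    df.to_csv(path, index=False)
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = save_population_csv([[1.0, 2.0], [3.0, 4.0]], [0.1, 0.2], tmp, fit_mode="mcmc")
    saved = pd.read_csv(path)
print(saved)
```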

set_best_fit_cmaes(best_fit)

Set the best fit parameters obtained using the CMA-ES algorithm.

Parameters:: best_fit (pandas.DataFrame) – The best fit parameters obtained using the CMA-ES algorithm.
Returns:: None

residual

class cdsaxs.residual.Residual(data, fit_mode='cmaes', xp=numpy, Simulation: Simulation | None = None, c=1e-05, best_fit=None)

Bases: object

A class to calculate the residual between experimental and simulated data, used for fitness evaluation in optimization algorithms.

mdata: numpy.ndarray Experimental intensity data.

mfit_mode: str Method to calculate fitness, differentiating between ‘cmaes’ and ‘mcmc’.

xp: module NumPy or CuPy module.

Simulation: Optional[‘Simulation’] Class to simulate the diffraction pattern (for now only StackedTrapezoidSimulation).

c: float Empirical factor to modify the MCMC acceptance rate.

best_fit: list or None List containing the best fit parameters obtained from the optimization algorithm (optional).

__call__()[source]: Calculate the residual between experimental and simulated data.

log_error()[source]: Return the difference between experimental and simulated data using the log error.

fix_fitness_mcmc()[source]: Fix the fitness for the MCMC algorithm using the Metropolis-Hastings criterion.

fix_fitness_mcmc(fitness)

Metropolis-Hastings criterion: the acceptance probability equals the ratio P(new)/P(old), where P is proportional to the probability distribution we want to find. In our case we assume that the probability of our parameters being the best is proportional to a Gaussian centered at fitness = 0, where the fitness can be the log, absolute, or squared error, etc. emcee expects the fitness function to return ln(P(new)); P(old) is calculated automatically.
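The criterion can be made concrete with a Gaussian in the fitness. The sketch assumes ln P = -0.5 * (fitness / c)**2, treating c as the width of the Gaussian; this form is an assumption for illustration and is not necessarily the expression fix_fitness_mcmc uses. Both helper names are hypothetical.

```python
import math


def gaussian_log_prob(fitness: float, c: float = 1e-5) -> float:
    """Assumed ln P: Gaussian centered at fitness = 0 with width c (up to a constant)."""
    return -0.5 * (fitness / c) ** 2


def acceptance_probability(ln_p_new: float, ln_p_old: float) -> float:
    """Metropolis-Hastings acceptance probability min(1, P(new) / P(old))."""
    # Work in log space to avoid overflow, then clip at 1.
    return math.exp(min(0.0, ln_p_new - ln_p_old))


ln_old = gaussian_log_prob(0.2, c=0.1)  # current state
ln_new = gaussian_log_prob(0.3, c=0.1)  # worse proposal
p_accept = acceptance_probability(ln_new, ln_old)
print(f"{p_accept:.3f}")  # about 0.082, i.e. exp(-2.5)
```

A proposal with better fitness is always accepted, while a worse one is accepted with a probability that shrinks as c decreases. Raising c widens the Gaussian, so the same worse proposal is accepted more often, which matches the role of c as an empirical knob on the acceptance rate.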

Parameters:

fitness (float) – Fitness value to be fixed.

Returns:

float: Fixed fitness value.

log_error(exp_i_array, sim_i_array)

Return the difference between two sets of values (experimental and simulated data) using the log error.

Parameters:

exp_i_array (numpy.ndarray of shape (n,)) – Experimental intensity data.
sim_i_array (numpy.ndarray of shape (n,)) – Simulated intensity data.

Returns:

Difference between experimental and simulated data using the log error.

Return type:

numpy.ndarray
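A minimal sketch of a log error, assuming it is the absolute difference of base-10 logarithms of the intensities (a common choice for scattering data spanning several decades); the exact formula in cdsaxs may differ, and both function names are hypothetical. A NaN-aware mean turns the element-wise error into a single fitness value.

```python
import numpy as np


def log_error_elementwise(exp_i_array, sim_i_array):
    """Hypothetical sketch: |log10(I_exp) - log10(I_sim)| point by point (positive intensities)."""
    exp_i = np.asarray(exp_i_array, dtype=float)
    sim_i = np.asarray(sim_i_array, dtype=float)
    return np.abs(np.log10(exp_i) - np.log10(sim_i))


def mean_log_error(exp_i_array, sim_i_array):
    """Scalar fitness: mean log error, skipping NaN points (e.g. masked data)."""
    return float(np.nanmean(log_error_elementwise(exp_i_array, sim_i_array)))


exp_i = np.array([10.0, 100.0, np.nan])
sim_i = np.array([10.0, 10.0, 10.0])
errors = log_error_elementwise(exp_i, sim_i)
print(errors)
print(mean_log_error(exp_i, sim_i))
```

Working in log space makes a factor-of-ten mismatch cost the same at high and low intensity, so weak high-q peaks are not swamped by the strongest ones.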

Subpackages

cdsaxs.simulations package