API reference
Submodules
fitter
This module contains the Fitter class, which fits cdsaxs experimental data using the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm and then performs a statistical analysis of the best-fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm.
- Classes:
Fitter: A class that fits the cdsaxs experimental data using the CMA-ES and MCMC algorithms.
- class cdsaxs.fitter.Fitter(Simulation: Simulation, exp_data)[source]
Bases:
object
This class fits cdsaxs experimental data using the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm and then performs a statistical analysis of the best-fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm. It takes an instance of the Simulation class and fits the simulated data to the experimental data.
- Simulation
An instance of the Simulation class representing the simulated diffraction pattern.
- Type:
- exp_data
Experimental diffraction data.
- Type:
numpy.ndarray
- np
NumPy or CuPy module, depending on whether GPU acceleration is used.
- Type:
module
- best_fit_cmaes
List containing the best fit parameters obtained using the CMA-ES algorithm.
- Type:
list or None
- cmaes()[source]
Perform fitting using the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm.
- mcmc()
Generate statistical data on the best-fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm.
- cmaes(sigma, ngen, popsize, mu, n_default, restarts, tolhistfun, ftarget, restart_from_best=True, verbose=True, dir_save=None, test=False)[source]
Fit experimental data using the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm.
This method utilizes a modified version of the CMA-ES algorithm to fit experimental data.
- Parameters:
sigma (float) – The initial standard deviation for each parameter.
ngen (int) – The number of generations to run the algorithm.
popsize (int) – The size of the population (number of candidate solutions) in each generation.
mu (int) – The number of parents/points for recombination.
n_default (int) – The number of parameters to be optimized.
restarts (int) – The number of restarts allowed during the optimization process.
tolhistfun (float) – The tolerance for the history of the best fitness value.
ftarget (float) – The target fitness value.
restart_from_best (bool, optional) – Determines whether to restart from the best individual found so far. Default is True.
verbose (bool, optional) – Controls whether to print progress information during optimization. Default is True.
dir_save (str, optional) – The directory to save the output. Default is None.
test (bool, optional) – Controls whether to test the function and return best value instead of performing the full optimization process. If True, the function returns best value. Default is False.
- Returns:
A tuple containing the best fit parameters and the corresponding fitness value.
- Return type:
tuple
- best_fit_cmaes
List containing the best fit parameters obtained using the CMA-ES algorithm.
- Type:
list or None
Notes
This method is modified from deap/algorithms.py to return a list of populations instead of the final population and to incorporate additional termination criteria based on neuromorphic algorithms. The function was originally extracted from XiCam and has been modified for specific use cases.
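To make the roles of sigma, ngen, popsize, mu, and ftarget concrete, here is a minimal sketch of a plain (mu, lambda) evolution strategy. It is not the cdsaxs or DEAP implementation and it is not full CMA-ES: there is no covariance adaptation, and a fixed decay of sigma stands in for step-size control. The names simple_es and sphere, the seed argument, and the 0.95 decay factor are hypothetical and chosen only for illustration.

```python
import random


def simple_es(fitness, start, sigma, ngen, popsize, mu, ftarget, seed=0):
    """Minimise `fitness` with a basic (mu, lambda) evolution strategy.

    Illustrative only: real CMA-ES also adapts a full covariance matrix.
    """
    rng = random.Random(seed)
    mean = list(start)
    best, best_fit = list(start), fitness(start)
    for _ in range(ngen):
        # Sample `popsize` candidates around the current mean.
        population = [
            [m + rng.gauss(0.0, sigma) for m in mean] for _ in range(popsize)
        ]
        population.sort(key=fitness)
        # Remember the best individual seen in any generation.
        top_fit = fitness(population[0])
        if top_fit < best_fit:
            best, best_fit = population[0], top_fit
        # Stop early once the target fitness is reached.
        if best_fit <= ftarget:
            break
        # Recombine the `mu` best candidates into the new mean.
        parents = population[:mu]
        mean = [sum(p[i] for p in parents) / mu for i in range(len(mean))]
        # Assumed fixed decay; CMA-ES adapts the step size instead.
        sigma *= 0.95
    return best, best_fit


def sphere(x):
    # Toy fitness with its minimum at (1, -2).
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, best_fit = simple_es(
    sphere, start=[0.0, 0.0], sigma=0.5, ngen=100, popsize=20, mu=5, ftarget=1e-6
)
```

As with cmaes, the sketch returns a tuple of the best parameters and the corresponding fitness value.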
- static do_stats(df, ci=0.95)[source]
This method generates a set of statistical data on the best fit parameters obtained from the MCMC fitting process.
- Parameters:
df (pandas.DataFrame) – The DataFrame containing the best fit parameters.
ci (float, optional) – The confidence interval. Default is 0.95.
- Returns:
A DataFrame containing the statistical data on the best fit parameters.
- Return type:
pandas.DataFrame
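The columns returned by do_stats are not listed here, so the following is only a sketch of one plausible per-parameter summary: mean, standard deviation, sample count, and a normal-approximation confidence interval on the mean. The function name summarize and the dictionary keys are hypothetical, and the library may compute its interval differently.

```python
import math
from statistics import NormalDist, mean, stdev


def summarize(samples, ci=0.95):
    """Summarise one parameter's samples (illustrative, not the cdsaxs code)."""
    n = len(samples)
    centre = mean(samples)
    spread = stdev(samples)  # sample standard deviation
    # Two-sided critical value of the standard normal for the requested level.
    z = NormalDist().inv_cdf(0.5 + ci / 2.0)
    half_width = z * spread / math.sqrt(n)
    return {
        "mean": centre,
        "std": spread,
        "count": n,
        "lower_ci": centre - half_width,
        "upper_ci": centre + half_width,
    }


stats = summarize([1.0, 2.0, 3.0, 4.0, 5.0], ci=0.95)
```

With a pandas.DataFrame of samples, the same computation would be applied column by column.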
- mcmc_bestfit_stats(N, sigma, nsteps, nwalkers, gaussian_move=False, seed=None, verbose=False, test=False, dir_save=None, tau=None, c=1e-05)[source]
Generate statistical data on the best-fit parameters using the MCMC (Markov Chain Monte Carlo) algorithm. Two kinds of moves are available for exploring the solution space: Gaussian moves and stretch moves. The stretch move is the default and is recommended.
This method utilizes the emcee package’s implementation of the MCMC algorithm and generates a csv file with the statistical data of the best fit parameters.
- Parameters:
N (int) – The number of parameters to be optimized.
sigma (float or list) – The initial standard deviation for each parameter. If a float is provided, it is applied to all parameters. If a list is provided, each parameter is initialized with the corresponding value.
nsteps (int) – The number of MCMC steps to perform.
nwalkers (int) – The number of MCMC walkers to use.
gaussian_move (bool, optional) – Determines whether to use Gaussian moves for proposal distribution. If True, Gaussian moves are used. If False, stretch moves are used. Default is False.
seed (int, optional) – The seed for the random number generator. If None, a random seed is generated. Default is None.
verbose (bool, optional) – Controls whether to print progress information during fitting. If True, progress information is printed. Default is False.
test (bool, optional) – Controls whether to test the function and return mean values instead of performing the full fitting process. If True, the function returns mean values. Default is False.
tau (float, optional) – The autocorrelation time used to determine the number of burn-in steps. If None, the emcee package is used to estimate the autocorrelation time; if emcee fails to do so, the first third of nsteps is discarded. Default is None.
c (float, optional) – Empirical factor to modify the MCMC acceptance rate. Default is 1e-5.
- Returns:
None
- best_uncorr
The best uncorrected individual obtained from the MCMC fitting process.
- Type:
numpy.ndarray
- best_fitness
The fitness value of the best individual obtained from the MCMC fitting process.
- Type:
float
- minfitness_each_gen
The minimum fitness value at each generation during the MCMC fitting process.
- Type:
numpy.ndarray
- Sampler
An instance of emcee.EnsembleSampler with detailed output of the MCMC algorithm.
- Type:
emcee.EnsembleSampler
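Two of the parameter rules above can be sketched in a few lines: how a float or list sigma expands to one value per parameter, and how many burn-in steps are discarded. This is a sketch under stated assumptions, not the library code. The helper names expand_sigma and burnin_steps are hypothetical, and so is tau_factor: the text does not say how tau maps to a burn-in length. Here tau=None stands for the case where no autocorrelation estimate is available.

```python
def expand_sigma(sigma, n_params):
    """Return one standard deviation per parameter (hypothetical helper)."""
    if isinstance(sigma, (int, float)):
        # A single float is applied to all parameters.
        return [float(sigma)] * n_params
    values = [float(s) for s in sigma]
    if len(values) != n_params:
        raise ValueError("sigma must have one entry per parameter")
    return values


def burnin_steps(nsteps, tau=None, tau_factor=2.0):
    """Number of initial steps to discard (hypothetical helper)."""
    if tau is None:
        # Fallback described above: discard the first third of the chain.
        return nsteps // 3
    # Assumption: burn-in is a small multiple of the autocorrelation time.
    return min(nsteps, int(tau_factor * tau))


sigmas = expand_sigma(0.1, n_params=3)
fallback = burnin_steps(900)
with_tau = burnin_steps(900, tau=50.0)
```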
- plot_correlation(file, dir_save=None)[source]
Generate a corner plot of the best fit parameters obtained from the MCMC fitting process.
This method utilizes the corner package to generate a corner plot of the best fit parameters obtained from the MCMC fitting process.
- Parameters:
file (str) – The path to the file containing the data.
dir_save (str, optional) – The directory to save the output. If not provided, the plot will be displayed instead of being saved.
- Returns:
None
residual
- class cdsaxs.residual.Residual(data, fit_mode='cmaes', xp=<module 'numpy'>, Simulation: Simulation | None = None, c=1e-05, best_fit=None)[source]
Bases:
object
A class to calculate the residual between experimental and simulated data, used for fitness evaluation in optimization algorithms.
- mdata
Experimental intensity data.
- Type:
numpy.ndarray
- mfit_mode
Method to calculate fitness, differentiating between ‘cmaes’ and ‘mcmc’.
- Type:
str
- xp
NumPy or CuPy module.
- Type:
module
- Simulation
Class to simulate the diffraction pattern (for now only StackedTrapezoidSimulation).
- Type:
Optional[‘Simulation’]
- c
Empirical factor to modify the MCMC acceptance rate.
- Type:
float
- best_fit
List containing the best fit parameters obtained from the optimization algorithm (optional).
- Type:
list or None
- log_error()[source]
Return the difference between experimental and simulated data using the log error.
- fix_fitness_mcmc()[source]
Fix the fitness for the MCMC algorithm using the Metropolis-Hastings criterion.
- fix_fitness_mcmc(fitness)[source]
Apply the Metropolis-Hastings criterion. The acceptance probability is the ratio P(new)/P(old), where P is proportional to the probability distribution being sampled. Here, the probability that a set of parameters is the best is assumed to be proportional to a Gaussian centered at fitness = 0, where the fitness can be the log, absolute, or squared error, etc. emcee expects the fitness function to return ln(P(new)); P(old) is calculated automatically.
- Parameters:
fitness (float) – Fitness value to be fixed.
- Returns:
- float
Fixed fitness value.
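The criterion described for fix_fitness_mcmc can be illustrated with a short sketch. It assumes P is proportional to exp(-fitness² / (2c²)), that is, that c acts as the width of the Gaussian. How c actually enters the library's formula is not stated here, so treat that as an assumption. The function names are hypothetical.

```python
import math


def gaussian_log_prob(fitness, c=1e-5):
    """ln P for a Gaussian centered at fitness = 0.

    Assumption: c is the Gaussian width; the library may scale differently.
    """
    return -(fitness ** 2) / (2.0 * c ** 2)


def acceptance_probability(log_p_new, log_p_old):
    """Metropolis-Hastings acceptance: min(1, P(new) / P(old))."""
    diff = log_p_new - log_p_old
    if diff >= 0.0:
        # The new state is at least as probable, so it is always accepted.
        return 1.0
    return math.exp(diff)


log_p_old = gaussian_log_prob(1.0, c=1.0)  # fitness 1 -> ln P = -0.5
log_p_new = gaussian_log_prob(2.0, c=1.0)  # fitness 2 -> ln P = -2.0
p_accept = acceptance_probability(log_p_new, log_p_old)
```

Under this assumed form, a larger c flattens the Gaussian and raises the acceptance probability for moves to worse fitness values, which matches the description of c as a factor that modifies the acceptance rate.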
- log_error(exp_i_array, sim_i_array)[source]
Return the difference between two sets of values (experimental and simulated data), using the log error.
- Parameters:
exp_i_array (numpy.ndarray((n))) – Experimental intensity data
sim_i_array (numpy.ndarray((n))) – Simulated intensity data
- Returns:
- numpy.ndarray
Difference between experimental and simulated data using the log error.
- Return type:
numpy.ndarray
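The exact formula behind log_error is not given here. As an illustration only, one common definition is the absolute difference of the base-10 logarithms of the two intensity arrays, taken point by point. The sketch below uses plain lists instead of NumPy arrays so that it runs without extra dependencies. The name log_error_sketch, the choice of base 10, and the requirement of strictly positive intensities are all assumptions.

```python
import math


def log_error_sketch(exp_i_array, sim_i_array):
    """Point-by-point |log10(exp) - log10(sim)| (illustrative definition)."""
    if len(exp_i_array) != len(sim_i_array):
        raise ValueError("both arrays must have the same length")
    errors = []
    for exp_i, sim_i in zip(exp_i_array, sim_i_array):
        # The logarithm is only defined for strictly positive intensities.
        if exp_i <= 0 or sim_i <= 0:
            raise ValueError("intensities must be positive")
        errors.append(abs(math.log10(exp_i) - math.log10(sim_i)))
    return errors


errors = log_error_sketch([10.0, 100.0, 1000.0], [10.0, 10.0, 1000.0])
mean_error = sum(errors) / len(errors)
```

Comparing logarithms rather than raw intensities keeps weak diffraction orders from being swamped by strong ones, which is the usual reason for choosing a log error in scattering fits.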