PyUnfold: The Python Unfolding Package

Zigfried Hampel-Arias

IIHE -- Brussels, BE

23 May, 2018

The Measurement Process

An ideal detector:

  • Makes no error measuring a desired quantity
  • Makes no bias or shift in a desired quantity

Real world detectors have:

  • Finite resolutions
  • Characteristic biases
  • Efficiencies <100%
  • Statistical + systematic uncertainties

Measuring Distributions

  • Distribution $\phi(C)$ of causes $C_{\mu}$
  • Distribution $n(E)$ of effects $E_j$
  • Detector response matrix $R_{\mu j}$ connecting causes → effects

$$ n(E) = \mathbf{R} \, \phi(C) $$

  • $n(E)$ is what we observe
  • $\mathbf{R}$ is known or estimated
  • $\phi(C)$ is the truth what we want

Physics Example: Cosmic Ray Spectrum

effects response
$n(E)$ $\mathbf{R}$ $\phi(C)$

From Phys. Rev. D 96, 122001

Getting What We Want

Two ways to get $\phi(C)$:

  • Guess a form of $\phi(C)$ and fold with response $\mathbf{R}$
    • Easy, cheap to evaluate
    • Restricted to analytic forms
  • Start with $n(E)$ and unfold $\phi(C)$
    • No restrictions on form
    • Have to trust solution

Cosmic Rays Again

effects response
$n(E)$ $\mathbf{R}$ $\phi(C)$
Unfolding Folding

From Phys. Rev. D 96, 122001

Deconvolution

So why don't we just invert $\mathbf{R}$?

What if

  • an analytic form of $\mathbf{R}$ d.n.e.?
  • $\mathbf{R}$ is built from finite simulation?
  • $\mathbf{R}$ is not be invertible?

Proposal: build a matrix $\mathbf{M}$ s.t.

$$ \begin{align} \mathbf{M} &\approx \mathbf{R}^{-1} \\ \phi(C) &\approx \mathbf{M} \, n(E) \end{align} $$

PyUnfold

pyunfold

A Python package to account for imperfect measurements in an iterative unfolding.

PyUnfold

A Python package to account for imperfect measurements in an iterative unfolding*.

Authors: James Bourbeau, Zig Hampel

PyUnfold provides users:

  • A tool for the analysis of measurement effects on physical distributions, i.e. calculate $\mathbf{M}$
  • A straightforward API that's experimentally agnostic, beyond HEP
  • The ability to easily incorporate both $\sigma_{\text{stat}}$ and $\sigma_{\text{sys}}$

Installation: pip install pyunfold

* D'Agostini (1995)

PyUnfold Inputs

To use PyUnfold, one needs only to provide the

  • Measured effects distribution
  • Response matrix
  • Prior distribution
In [2]:
# Perform iterative unfolding
unfolded_result = iterative_unfold(data=data,
                                   data_err=data_err,
                                   response=response,
                                   response_err=response_err,
                                   efficiencies=efficiencies,
                                   efficiencies_err=efficiencies_err)

Features: Prior

Flexibility of prior to test results's robustness

  • Uniform
  • Jeffreys
  • User-defined

Other unfolding toolkits do not permit user defined prior.

In [4]:
from pyunfold.priors import jeffreys_prior, uniform_prior
cause_lim = np.logspace(0, 3, num_causes)
uni_prior = uniform_prior(num_causes)
jeff_prior = jeffreys_prior(cause_lim)

Features: Prior

priors

In [5]:
print('Running with uniform prior...')
unfolded_uniform = iterative_unfold(data=data_observed,
                                    data_err=data_observed_err,
                                    response=response,
                                    response_err=response_err,
                                    efficiencies=efficiencies,
                                    efficiencies_err=efficiencies_err,
                                    ts='ks',
                                    ts_stopping=0.01,
                                    callbacks=[Logger()])

print('\nRunning with Jeffreys prior...')
unfolded_jeffreys = iterative_unfold(data=data_observed,
                                     data_err=data_observed_err,
                                     response=response,
                                     response_err=response_err,
                                     efficiencies=efficiencies,
                                     efficiencies_err=efficiencies_err,
                                     prior=jeff_prior,
                                     ts='ks',
                                     ts_stopping=0.01,
                                     callbacks=[Logger()])
Running with uniform prior...
Iteration 1: ts = 0.1460, ts_stopping = 0.01
Iteration 2: ts = 0.0060, ts_stopping = 0.01

Running with Jeffreys prior...
Iteration 1: ts = 0.7231, ts_stopping = 0.01
Iteration 2: ts = 0.0171, ts_stopping = 0.01
Iteration 3: ts = 0.0006, ts_stopping = 0.01

prior-unfolded

Features: Convergence

Stopping criteria based on test statistic (TS)

  • User defined TS tolerance
  • K-S, $\chi^2$, RMD, Bayes factor

Other unfolding toolkits set default $N_{\text{iter}}=4$.

M. Kuusela: Countless LHC papers using $4$ iterations of D'Agostini unfolding.

Features: Regularization

  • Smoothing at each iteration with univariate spline
    • Optional callback
    • Strength parameter is required variable
  • Multidimensional capabilities
    • Groups or blocks of causes
    • $R_{\mu j} \to R_{\mu \nu j}$ unravelled
    • Regularization only in groups

Optimal reg. strength problem dependent, default does not exist.

Regularizing a Bad Prior

bumpy-prior

Regularizing a Bad Prior

regularization

Extensive Documentation

PyUnfold Site:

https://jrbourbeau.github.io/pyunfold/index.html

PyUnfold GitHub repo:

https://github.com/jrbourbeau/pyunfold

Other:

Introductory and advanced example notebooks

Mathematical details

Other Unfolding Resources

D'Agostini:

Statistician M. Kuusela:

Current Status

  • Can be used used out of the box right now: pip install pyunfold

  • Flexible framework. Submit an issue if you see something you want to change or see!

  • PyUnfold submitted to J.O.S.S. Under review

  • Designed to make it easier to:

    • run an iterative unfolding
    • test robustness of results
  • Hoping to make unfolding accessible beyond HEP

Thank you!

sunset