-- Main.PierreArtoisenet - 2011-06-27
I. Organization of the work
We want to investigate the following signatures:
- WH -> ell+- + 2b
- ZH -> ell+ ell- + 2b
- ttH -> ell+ ell- + 4b
We are now focusing on ttH
II. Goal
The project is to investigate the significance that can be achieved in the search for a Higgs boson produced in association with a top-quark pair,
in the dileptonic channel.
The main background is $ t \bar t $ + 2 jets.
The significance can be studied in different scenarios; for example, one can start by filling the following table:
| Statistical significance | "narrow" TF | "broad" TF |
| events with ISR | S_1 | S_2 |
| events with no ISR | S_3 | S_4 |
By estimating S_1, S_2, ..., we will provide a reasonable estimate
of the maximum significance that can be reached at the LHC, and we will
also identify the most important factors controlling this maximum significance.
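As a convention for quoting S_1, ..., S_4, one possibility is the usual counting-experiment significance. The sketch below only illustrates that convention; the yields are placeholder numbers, not results of this study.
<verbatim>
import math

def counting_significance(s, b):
    """Asymptotic discovery significance for a counting experiment with
    expected signal s and background b, Z = sqrt(2[(s+b)ln(1+s/b) - s]);
    it reduces to s/sqrt(b) when s << b."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Placeholder yields (NOT numbers of this study): expected events after
# cuts for some integrated luminosity, in one cell of the table above.
s_expected = 10.0    # ttH signal events
b_expected = 400.0   # tt + 2 jets background events
print(counting_significance(s_expected, b_expected))   # ~0.5 in this example
</verbatim>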
The strategy is to organize the work into a validation procedure (first step) and a pheno study (second step).
III. Validation procedure
Idea: apply MEM to reconstruct the mass of the Higgs or reconstruct the fraction of signal
events UNDER CONTROLLED CONDITIONS.
"under controlled conditions" means that the events in the prepared samples follow EXACTLY the probability
distribution function that is used to evaluate the matrix element weights.
In this way, one can set up the calculation of the weights and check that
NO BIAS is observed in the final result, and hence validate the whole procedure in the absence of systematic errors.
The procedure for the calculation of the weights should be the same as the one used later on for the second step
(i.e. one should consider a finite resolution on jet energies, correct for ISR if necessary, ...)
Only the samples of events are prepared in an artificial way so that we control exactly how the events are distributed in phase-space.
In particular:
- the energies of the final-state partons are smeared exactly according to the shape of the transfer functions
- the effect of ISR (if taken into account) is to boost the events in the transverse plane, according to a known distribution in pT (a minimal sketch of both steps is given right below)
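To make the two items above concrete, here is a minimal sketch of how such a controlled sample could be prepared. It assumes a simple single-Gaussian smearing and an externally supplied (pT, phi) for the ISR system; in the actual validation the smearing must follow exactly the double-Gaussian TF used by MadWeight, and the pT values must be drawn from the known ISR distribution.
<verbatim>
import math, random

def smear_energy(E_parton, sigma_rel=0.12):
    """Smear one parton energy with a Gaussian of relative width sigma_rel.
    Stand-in only: the validation requires smearing EXACTLY with the
    (double-Gaussian) TF that MadWeight will use to compute the weights."""
    return random.gauss(E_parton, sigma_rel * E_parton)

def boost_event_transverse(particles, pt_isr, phi_isr):
    """Rigidly boost all particles [E, px, py, pz] in the transverse plane
    so that the event recoils against an ISR system of transverse momentum
    pt_isr at azimuth phi_isr (assumes the event has zero total pT before
    the boost, as is the case for a 2 -> N parton-level event)."""
    E_tot = sum(p[0] for p in particles)
    norm = math.sqrt(E_tot ** 2 + pt_isr ** 2)
    bx = -pt_isr * math.cos(phi_isr) / norm
    by = -pt_isr * math.sin(phi_isr) / norm
    b2 = bx * bx + by * by
    gamma = 1.0 / math.sqrt(1.0 - b2)
    k = (gamma - 1.0) / b2 if b2 > 0.0 else 0.0
    boosted = []
    for E, px, py, pz in particles:
        bp = bx * px + by * py
        boosted.append([gamma * (E + bp),
                        px + k * bp * bx + gamma * bx * E,
                        py + k * bp * by + gamma * by * E,
                        pz])
    return boosted
</verbatim>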
The idea is that once this procedure is validated, it can be used in a reliable way for the pheno study.
The subsections below give the work plan for the validation procedure.
A. Reconstruction of m_H (signal events only)
1. Parton level + infinite resolution (DONE)
Reconstruction of the mass of the Higgs using a pure sample of parton-level signal events, and considering a narrow transfer function
for jet energies
This is done (Priscilla): there is no bias in the reconstructed mass of the Higgs -> OK
2. Parton level + finite resolution (DONE)
generation of a parton-level event sample (no showering), smearing of the parton energies according to a "broad" transfer
function, reconstruction of the mass of the Higgs with the same TF that was used to smear the parton energies.
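For concreteness, extracting the mass from the MadWeight output could look like the sketch below: a scan of -log L over Higgs-mass hypotheses. Here weights[m] stands for the list of per-event weights computed under the mass hypothesis m, assumed to be already normalised to the visible cross section of that hypothesis (the actual MadWeight output format is not reproduced here).
<verbatim>
import math

def neg_log_likelihood(weights_at_m):
    """-sum_i log w_i for one mass hypothesis.  Events with vanishing or
    failed weights must be dealt with before this point."""
    return -sum(math.log(w) for w in weights_at_m)

def reconstruct_mass(weights):
    """weights: dict {m_H hypothesis -> list of per-event weights}.
    Returns the hypothesis minimising -log L; in practice one would fit a
    parabola around the minimum to get the central value and its error."""
    nll = {m: neg_log_likelihood(w) for m, w in weights.items()}
    m_best = min(nll, key=nll.get)
    return m_best, nll
</verbatim>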
Things to keep in mind:
- when we smear the energy of the partons, we are forced to apply some cuts on jet energies -> one should include the acceptance term in the likelihood.
- to save some time, we are now considering only the gluon-gluon initiated process (for this sanity-check phase this is OK). But then we should also consider only the gluon-gluon initiated process when generating the parton-level events. Otherwise we may introduce a bias.
- one should also check the convergence of the evaluation of the matrix element weights in the regime where the resolution on b-jet energies is much worse than the width of the Higgs.
This is a delicate point, because the Higgs decay process $ H \rightarrow b \bar b$ is overconstrained.
By default MadWeight considers 2 integration channels:
- in channel 1, the invariant mass of the Higgs is mapped onto one variable of integration,
- in channel 2, the energies of the b-quarks originating from the Higgs are mapped onto 2 variables of integration,
but the invariant mass of the Higgs is not.
When the width of the Higgs is orders of magnitude smaller than the resolution on jet energies, we expect channel 1 to be the most appropriate. This is indeed the case. But I also observed that running with <em>only</em> channel 1 makes a big difference: when I compare the values of the weights calculated with the one-channel integrator and with the two-channel integrator, the weights are systematically underestimated in the second case. The difference is quite sizable when we look at the likelihood: roughly 4 units of $Log(L)$. So I would suggest running MadWeight with only one channel of integration (the first one). This can be done by copying the files main_code_one_channel.f and data_one_channel.inc (available in the DropBox) into the MW_P1_gg_bbxbmu+vmbxmu-vmx directory.
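For reference, "mapping the invariant mass of the Higgs onto one variable of integration" is usually done with the standard Breit-Wigner change of variables (the exact variables used internally by MadWeight may differ): substituting
$$ s_{b\bar b} = m_H^2 + m_H \Gamma_H \tan z, \qquad \mathrm{d}s_{b\bar b} = \frac{m_H \Gamma_H}{\cos^2 z}\,\mathrm{d}z, $$
turns the propagator factor $ 1/[(s_{b\bar b}-m_H^2)^2 + m_H^2\Gamma_H^2] $ into a constant in $z$. When $\Gamma_H$ is far below the jet-energy resolution, the integrand is completely dominated by this peak, which is consistent with channel 1 being the appropriate choice in this regime.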
UPDATE (06/10/2011, PA): I put a report in the DropBox ("Validation_A2") with a description of the results. I think these are good enough to move to the next step.
B. Testing S+B hypothesis against B-only hypothesis [or reconstructing S/(S+B)]
The idea is the same, i.e. generate a sample of S+B events for which the probability distribution is known exactly, then use MadWeight to reconstruct the fraction of signal events and check that there is no bias.
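As an illustration of what "reconstructing the fraction of signal events" amounts to, here is a minimal sketch of the one-parameter likelihood fit (and of the corresponding S+B vs. B-only test statistic). w_sig[i] and w_bkg[i] stand for the weights of event i under the ttH and tt+bb hypotheses, each assumed normalised to its visible cross section and strictly positive.
<verbatim>
import math

def neg_log_likelihood(f, w_sig, w_bkg):
    """-log L(f) for the mixture f * P_S + (1 - f) * P_B, with P_S and P_B
    approximated by the normalised MadWeight weights of each event."""
    return -sum(math.log(f * ws + (1.0 - f) * wb)
                for ws, wb in zip(w_sig, w_bkg))

def fit_signal_fraction(w_sig, w_bkg, n_grid=1000):
    """Brute-force scan of f in [0, 1]; a real analysis would use a proper
    minimiser and read the uncertainty off the shape of -log L."""
    grid = [i / n_grid for i in range(n_grid + 1)]
    f_hat = min(grid, key=lambda f: neg_log_likelihood(f, w_sig, w_bkg))
    # test statistic for B-only vs. S+B: q0 = 2 [ -log L(0) + log L(f_hat) ]
    q0 = 2.0 * (neg_log_likelihood(0.0, w_sig, w_bkg)
                - neg_log_likelihood(f_hat, w_sig, w_bkg))
    return f_hat, q0
</verbatim>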
Inputs:
- for the transfer function, we consider the TF described in A.2.
- for the cuts, we consider the cuts given in V.A below. ATTENTION: since we will smear the energy of the final-state partons, we need to apply a milder pT cut on the partons (pT > 15 GeV). The final cut pT(jet) > 30 GeV is applied after the smearing procedure.
- for the background processes: as a first check, only one background: gg>tt~+bb~
IV. Pheno study
- Redo the analysis, but with samples of events that are as realistic as possible.
- Evaluate all the systematic uncertainties.
V. Inputs of the analysis
We will do the analysis for the LHC at 14 TeV.
There are several input parameters that need to be fixed right now.
Even during the validation procedure, it will be very useful if we
consider realistic values for the parameters associated with the
final-state cuts, the reconstruction efficiencies,
the b-tagging and the energy resolution.
In this way, the significance obtained
at the end of the validation procedure will not be completely unrealistic,
and this will give us some insight before jumping into the second part
(e.g. if we find that the significance is extremely low for a given
signature even under ideal conditions, it may not be worth pushing
the analysis further for this signature).
For the theoretical parameters, we can stick to the default param_card.dat file on the web.
A. Cuts
We need to agree on a set of cuts to be applied on the jets and on the leptons.
I think a reasonable set of cuts is the following (see http://arxiv.org/pdf/1106.0902.pdf):
| eta(jet) | < 2.4 |
| delta R(p_i, p_j) | > 0.3, with p_i, p_j = jet or lepton |
| eta(e) | < 2.5 |
| pT(mu) | > 30 GeV |
| eta(mu) | < 2.5 |
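A short sketch of how these reconstructed-level cuts could be applied in code; the electron pT threshold is left as an argument since it is not fixed in the table above, and the jet pT > 30 GeV value is the one quoted in section III.B.
<verbatim>
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation Delta R = sqrt(d_eta^2 + d_phi^2)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def passes_cuts(jets, electrons, muons, pt_e_min):
    """jets, electrons, muons: lists of (pt, eta, phi) at reconstructed level."""
    if any(pt < 30.0 or abs(eta) > 2.4 for pt, eta, phi in jets):
        return False
    if any(pt < pt_e_min or abs(eta) > 2.5 for pt, eta, phi in electrons):
        return False
    if any(pt < 30.0 or abs(eta) > 2.5 for pt, eta, phi in muons):
        return False
    objects = list(jets) + list(electrons) + list(muons)
    for i in range(len(objects)):
        for j in range(i + 1, len(objects)):
            if delta_r(objects[i][1], objects[i][2],
                       objects[j][1], objects[j][2]) < 0.3:
                return False
    return True
</verbatim>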
<em>Parton-level cuts vs reconstructed-level cuts</em>:
In the validation procedure, parton-level cuts are different from reconstructed-level cuts
because:
- parton-level events are boosted in the transverse plane (if ISR is taken into account)
- final-state parton energies are smeared according to the shape of the transfer function
So one needs to apply looser cuts at the parton level, and then apply the correct set of cuts
at the "reconstructed level".
B. Transfer function
For the parametrization of the transfer functions,
we can stick to the usual assumptions: a superposition
of two Gaussian distributions for the energy of the jets,
a delta function for all other visible quantities.
The parametrization of the TF for jet energies is given below,
in terms of $ \delta=E_p-E_j $ (parton-level energy minus reconstructed energy).
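For completeness, the usual MadWeight double-Gaussian parametrization (to be checked against the transfer_card that is actually used) reads
$$ W(E_p,E_j) = \frac{1}{\sqrt{2\pi}\,(p_2 + p_3\, p_5)} \left[ \exp\!\left(-\frac{(\delta-p_1)^2}{2 p_2^2}\right) + p_3 \exp\!\left(-\frac{(\delta-p_4)^2}{2 p_5^2}\right) \right]. $$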
The parameters $p_i$ can be assumed to depend linearly on the parton-level energy ($ p_i=a_i+b_i\,E_p $).
|  | a_i | b_i |
| p_1 | XXXX | XXXX |
| p_2 | XXXX | XXXX |
| p_3 | XXXX | XXXX |
| p_4 | XXXX | XXXX |
| p_5 | XXXX | XXXX |
It would be good to choose values for the parameters $a_i, b_i$ in the TF that capture the typical resolution of the
CMS detector. Olivier, do you think you could get these values ?
ANSWER from Olivier:
In fact we can't use the CMS TFs since they are not public yet. If we use them, we will create trouble for Vincent/Arnaud (even more if they sign the paper). So the best we can do is to use the TFs computed for Delphes. Arnaud computed TFs which are very close to the CMS resolution, so this should be OK.
C. Efficiencies
At some point, we will also need to know the typical reconstruction efficiencies for each channel (taking into account the b-tagging). If we evaluate the matrix element weights for each channel and under each assumption separately, one can incorporate the relative efficiencies for each channel when the likelihood is evaluated. So we don't need to address this problem right now.
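One way the relative efficiencies could enter at that stage (a sketch only): if $ \epsilon^S_c $ and $ \epsilon^B_c $ denote the signal and background selection efficiencies of channel $c$, the likelihood built from the weights can be written as
$$ \mathcal{L}(f) = \prod_{c}\ \prod_{i\in c} \Big[ f_c\, P_S(x_i) + (1-f_c)\, P_B(x_i) \Big], \qquad f_c = \frac{f\,\epsilon^S_c}{f\,\epsilon^S_c + (1-f)\,\epsilon^B_c}, $$
so that only the effective signal fraction $f_c$ of each channel, and not the weights themselves, needs to know about the efficiencies.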