


 
[figure courtesy of G. Perdue]
[figure courtesy of G. Perdue]
tracking-based algorithms fail for high-energy events
the "by eye" method is very often more accurate
idea: use image analysis and pattern recognition algorithms
ImageNet is an image database

Siberian Husky or Alaskan Malamute?
If you can't explain it simply, you don't understand it well enough.
Albert Einstein
epoch = one loop over the whole training sample
for each feature vector, the weights are updated using the gradient descent method
target: \(y = 0, 1\)
linear regression is not really efficient for classification
imagine having a data point at \(x \sim 100\): a single outlier drags the fitted line and shifts the decision threshold
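A minimal sketch of these bullets (hand-rolled NumPy; the data, learning rate, and squared-error loss are illustrative assumptions): one epoch loops over the whole training sample, each feature vector triggers a gradient-descent weight update, and the single point at \(x \sim 100\) dominates the linear fit.

```python
import numpy as np

def run_epoch(X, y, w, alpha=1e-4):
    """One epoch = one loop over the whole training sample.
    Each feature vector updates the weights by a gradient-descent
    step on the squared error of the linear hypothesis h(x) = w . x."""
    for x_i, y_i in zip(X, y):
        h = w @ x_i                       # linear hypothesis
        w = w + alpha * (y_i - h) * x_i   # per-sample gradient step
    return w

# toy 0/1 targets with a bias column; the last point is the outlier
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 100.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)
for _ in range(100):
    w = run_epoch(X, y, w)
print(w)  # the fit is dominated by the x ~ 100 point
```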
We can do classification
We can do regression
But real problems are nonlinear


x XOR y = (x AND NOT y) OR (y AND NOT x)
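A quick truth-table check of this identity (plain Python, just to make the decomposition concrete):

```python
def XOR(x, y):
    # x XOR y = (x AND NOT y) OR (y AND NOT x)
    return (x and not y) or (y and not x)

for x in (False, True):
    for y in (False, True):
        print(int(x), int(y), int(XOR(x, y)))
# 0 0 0
# 0 1 1
# 1 0 1
# 1 1 0
```

Each of AND, OR, and NOT is linearly separable on its own, so one hidden layer of such units is enough to represent XOR, while a single linear unit is not.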



src: deeplearning.net



src: wildml.com

src: arXiv

src: wildml.com
The first goal is to use a CNN to find the vertex in the nuclear target region
Next steps: NC\(\pi^0\)? \(\pi\) momentum? hadron multiplicities?

test accuracy: 92.67 %

| target | accuracy |
| 0 | 75.861 % |
| 1 | 94.878 % |
| 2 | 94.733 % |
| 3 | 93.596 % |
| 4 | 90.404 % |
| 5 | 94.011 % |
| 6 | 87.775 % |
| 7 | 85.225 % |
| 8 | 94.109 % |
| 9 | 53.077 % |
| 10 | 96.608 % |
 
In order to attain the impossible, one must attempt the absurd.
Miguel de Cervantes

 
Logistic function: \[g(z) = \frac{1}{1 + e^{-z}}\]
Probability of 1: \[P (y = 1 | x, w) = h(x)\]
Probability of 0: \[P (y = 0 | x, w) = 1 - h(x)\]
Probability: \[p (y | x, w) = (h(x))^y\cdot(1 - h(x))^{1 - y}\]
Likelihood: \[L(w) = \prod\limits_{i=0}^n p(y^{(i)} | x^{(i)}, w) = \prod\limits_{i=0}^n (h(x^{(i)}))^{y^{(i)}}\cdot(1 - h(x^{(i)}))^{1 - y^{(i)}}\]
Log-likelihood: \[l(w) = \log L(w) = \sum\limits_{i=0}^n y^{(i)}\log h(x^{(i)}) + (1 - y^{(i)})\log (1-h(x^{(i)}))\]
Learning step (maximize \(l(w)\)): \[w_j = w_j + \alpha\frac{\partial l(w)}{\partial w_j} = w_j + \alpha\sum\limits_{i=0}^n\left(y^{(i)} - h (x^{(i)})\right)x_j^{(i)}\]
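A minimal NumPy sketch of exactly this update (batch gradient ascent on \(l(w)\); the toy data and learning rate are made up for illustration):

```python
import numpy as np

def h(X, w):
    """Logistic hypothesis g(w^T x) for every row of X."""
    return 1.0 / (1.0 + np.exp(-X @ w))

def fit(X, y, alpha=0.1, epochs=1000):
    """Gradient ascent on the log-likelihood:
    w_j <- w_j + alpha * sum_i (y_i - h(x_i)) * x_ij."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w += alpha * X.T @ (y - h(X, w))   # the learning step above
    return w

# toy 1-D problem with a bias column
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit(X, y)
print(w, h(X, w).round(2))  # probabilities close to the 0/1 targets
```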
 

Feature vector: \[(x,y) \rightarrow (x,y,x^2,y^2)\]
Hypothesis: \[h (x) = \frac{1}{1 + e^{-w_0 - w_1x - w_2y - w_3x^2 - w_4y^2}}\]
In general, adding extra dimensions by hand would be hard or impossible. Neural networks do that for us.
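A sketch of this hand-made expansion (NumPy; the circular toy data is invented, and fit() / h() are the functions from the logistic-regression sketch above): a linear separator in the expanded 5-D space is a circle in the original 2-D plane.

```python
import numpy as np

def expand(P):
    """(x, y) -> (1, x, y, x^2, y^2): bias plus the quadratic features."""
    x, y = P[:, 0], P[:, 1]
    return np.column_stack([np.ones(len(P)), x, y, x**2, y**2])

# toy data: class 1 inside the unit circle, class 0 outside
rng = np.random.default_rng(0)
P = rng.uniform(-2, 2, size=(200, 2))
labels = (P[:, 0]**2 + P[:, 1]**2 < 1).astype(float)

X = expand(P)                                # quadratic feature vectors
w = fit(X, labels, alpha=0.001, epochs=5000) # reuse fit() from above
print((h(X, w).round() == labels).mean())    # training accuracy near 1
```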


| \(x_1\) | 0 | 1 | 0 | 1 | 
| \(x_2\) | 0 | 0 | 1 | 1 | 
| AND | 0 | 0 | 0 | 1 | 
\[h(x) = \frac{1}{1 + e^{-w^Tx}}\]
Intuition:
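For instance (hand-picked weights, a standard textbook choice, not fitted): \(w = (-30, 20, 20)\) makes \(h\) reproduce the AND column, since \(g(-30)\approx 0\), \(g(-10)\approx 0\), and \(g(10)\approx 1\):

```python
import numpy as np

w = np.array([-30.0, 20.0, 20.0])     # bias, x1, x2 (hand-picked)

def h(x1, x2):
    z = w @ np.array([1.0, x1, x2])   # w^T x with a constant bias input
    return 1.0 / (1.0 + np.exp(-z))

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, round(h(x1, x2)))  # 0, 0, 0, 1 -- the AND column
```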

 
 