Early illustrations
We are interested in prices and carats of diamonds. To get a glimpse of their distribution, it is customary to plot histograms. The bins parameter corresponds to the number of small rectangles on the x-axis.
library(tidyverse)
diamonds %>% filter(carat < 2.55) %>% ggplot() + geom_histogram(aes(x = carat), bins = 100)
diamonds %>% filter(carat < 1.1) %>% ggplot() + geom_histogram(aes(x = carat), bins = 200) # Higher precision for small diamonds
diamonds %>% filter(carat < 2.55) %>% ggplot() + geom_histogram(aes(x = price), bins = 30)
diamonds %>% ggplot(aes(x = cut)) + geom_bar(aes(y = (..count..)/sum(..count..))) + ylab("Proportion")
Sometimes, it is easier to quantify the distribution of a variable with a few numbers: the mean, the standard deviation, etc. They are usually called the ‘descriptive statistics’.
diamonds %>% summary() # Canonical R function for descriptive statistics
carat cut color clarity depth table price
Min. :0.2000 Fair : 1610 D: 6775 SI1 :13065 Min. :43.00 Min. :43.00 Min. : 326
1st Qu.:0.4000 Good : 4906 E: 9797 VS2 :12258 1st Qu.:61.00 1st Qu.:56.00 1st Qu.: 950
Median :0.7000 Very Good:12082 F: 9542 SI2 : 9194 Median :61.80 Median :57.00 Median : 2401
Mean :0.7979 Premium :13791 G:11292 VS1 : 8171 Mean :61.75 Mean :57.46 Mean : 3933
3rd Qu.:1.0400 Ideal :21551 H: 8304 VVS2 : 5066 3rd Qu.:62.50 3rd Qu.:59.00 3rd Qu.: 5324
Max. :5.0100 I: 5422 VVS1 : 3655 Max. :79.00 Max. :95.00 Max. :18823
J: 2808 (Other): 2531
x y z
Min. : 0.000 Min. : 0.000 Min. : 0.000
1st Qu.: 4.710 1st Qu.: 4.720 1st Qu.: 2.910
Median : 5.700 Median : 5.710 Median : 3.530
Mean : 5.731 Mean : 5.735 Mean : 3.539
3rd Qu.: 6.540 3rd Qu.: 6.540 3rd Qu.: 4.040
Max. :10.740 Max. :58.900 Max. :31.800
diamonds %>% select(carat, price) %>% apply(2,sd) # Computing the standard deviation over carats & prices
carat price
0.4740112 3989.4397381
diamonds %>% select(carat, price) %>% filter(carat < 2.1) %>% summary()
carat price
Min. :0.200 Min. : 326
1st Qu.:0.400 1st Qu.: 942
Median :0.700 Median : 2357
Mean :0.775 Mean : 3765
3rd Qu.:1.030 3rd Qu.: 5166
Max. :2.090 Max. :18818
The last computation illustrates the sensitivity of the mean. In the first batch of stats, the average price was 3933 and it is 3765 over the filtered data (omitting large diamonds makes the average price go down). The mean is sensitive to extreme points. It’s not the case for the median (2401 => 2357). The median is much more stable and less sensitive to outliers.
The reason why it is convenient to work with a single figure is that it can easily be computed on many subsamples. It is harder to visually analyse many distributions. Below, we analyse many subcases, i.e., when working with subgroups pertaining to each combination of cut, clarity and color.
means <- diamonds %>% group_by(cut, clarity, color) %>% # We build a pivot table over cut, clarity and color
summarize(avg_carat = mean(carat), avg_price = mean(price))
means %>% ggplot(aes(x = avg_carat, y = avg_price)) + geom_point(aes(color = clarity, size = cut))
means %>% ggplot(aes(x = avg_carat, y = avg_price)) + geom_point(aes(color = cut, size = clarity))
The above plots reveal different clusters! Typically, they show the impact of the clarity on the size of the diamonds (and hence on their price). Or is it the other way around? Heuristically, it is almost impossible to reach flawlessness on a large diamond.
For the sake of illustration, we plot a series of histograms that characterize the price distribution over various subsets of diamonds. The x-axis and y-axis are common across all histograms to ease comparison.
diamonds %>% ggplot(aes(x = price)) +
geom_histogram(bins = 10) +
facet_grid(cut ~ color) +
theme(text = element_text(size=12))
While the view is much more complete, the analysis takes longer.
Convergence from histogram (discrete) to density (continuous)
Below, we illustrate the convergence of a histogram to the density of a continuous distribution. We use the canonical Gaussian law as example. If we want to switch from counts to density values, we must specify ‘y = ..density..’ inside the aes() of ggplot.
x <- rnorm(10^6) %>% data.frame()
colnames(x) <- "simulation"
x %>% ggplot(aes(x = simulation)) + geom_histogram(bins = 20)
x %>% ggplot(aes(x = simulation)) + geom_histogram(bins = 100)
x %>% ggplot(aes(x = simulation, y = ..density..)) + geom_histogram(bins = 1000)
Parametric families
For modelling purposes, it is convenient to specify simple functions that characterize distributions. An exhaustive list of probability distributions is provided here: https://en.wikipedia.org/wiki/List_of_probability_distributions
Continuous distributions over the real line
The most famous of them is the Gaussian (normal) distribution, which was just used above. We show the impact of its parameters on its density.
ggplot(data.frame(x = c(-6, 9)), aes(x)) +
stat_function(fun = dnorm, args = list(mean = 0, sd = 1), aes(color = "m = 0.0, s = 1.0")) +
stat_function(fun = dnorm, args = list(mean = 0.5, sd = 1.5), aes(color = "m = 0.5, s = 1.5")) +
stat_function(fun = dnorm, args = list(mean = 1, sd = 2), aes(color = "m = 1.0, s = 2.0")) +
stat_function(fun = dnorm, args = list(mean = 1.5, sd = 2.5), aes(color = "m = 1.5, s = 2.5")) +
scale_colour_manual("Legend", values = c("#B266FF", "#6666FF", "#66B2FF", "#00FF80"))
The coding of many RGB colors can be found here: https://www.rapidtables.com/web/color/RGB_Color.html
The Gaussian distribution has very light tails: extreme points are very rare. Its heavy-tailed cousin is the Student distribution. Below, we show the difference between the two: it is salient for extreme values (to the left or to the right).
ggplot(data.frame(x = c(-7, 7)), aes(x)) +
stat_function(fun = dt, args = list(df = 0.5), aes(color = "df = 0.5 (Student)")) +
stat_function(fun = dt, args = list(df = 1.0), aes(color = "df = 1.0 (Student)")) +
stat_function(fun = dt, args = list(df = 2.0), aes(color = "df = 2.0 (Student)")) +
stat_function(fun = dnorm, args = list(mean = 0, sd = 2), aes(color = "m = 0, s = 2.0 (Gaussian)"))
ggplot(data.frame(x = c(-9, -3)), aes(x)) +
stat_function(fun = dt, args = list(df = 0.5), aes(color = "df = 0.5 (Student)")) +
stat_function(fun = dt, args = list(df = 1.0), aes(color = "df = 1.0 (Student)")) +
stat_function(fun = dt, args = list(df = 2.0), aes(color = "df = 2.0 (Student)")) +
stat_function(fun = dnorm, args = list(mean = 0, sd = 2), aes(color = "m = 0, s = 2.0 (Gaussian)"))
On the second graph, we have zoomed on the left tail to see the difference.
Positive continuous distributions
The gamma distribution is one archetypal such distribution. It nests the exponential family as a special case. Below, the cases a = 1 correspond to exponential distributions.
ggplot(data.frame(x = c(0, 6)), aes(x)) +
stat_function(fun = dgamma, args = list(rate = 1, shape = 1), aes(color = "a = 1, b = 1")) +
stat_function(fun = dgamma, args = list(rate = 1, shape = 2), aes(color = "a = 2, b = 1")) +
stat_function(fun = dgamma, args = list(rate = 2, shape = 1), aes(color = "a = 1, b = 2")) +
stat_function(fun = dgamma, args = list(rate = 2, shape = 2), aes(color = "a = 2, b = 2")) +
scale_colour_manual("Legend", values = c("#FF3333", "#FF9933", "#66B2FF", "#00FF80"))
The beta distribution is a convenient choice when working with continuous data over a finite interval. Its density is very flexible with only two parameters. The diversity in shapes is very useful.
ggplot(data.frame(x = c(0, 1)), aes(x)) +
stat_function(fun = dbeta, args = list(shape1 = 0.4, shape2 = 0.6), aes(color = "a = 0.4, b = 0.6")) +
stat_function(fun = dbeta, args = list(shape1 = 1, shape2 = 3), aes(color = "a = 1, b = 3")) +
stat_function(fun = dbeta, args = list(shape1 = 1, shape2 = 5), aes(color = "a = 1, b = 5")) +
stat_function(fun = dbeta, args = list(shape1 = 2, shape2 = 2), aes(color = "a = 2, b = 2")) +
stat_function(fun = dbeta, args = list(shape1 = 2, shape2 = 6), aes(color = "a = 2, b = 6")) +
scale_colour_manual("Legend", values = c("#9933FF","#FF3333", "#FF9933", "#66B2FF", "#00FF80"))
Other examples include the Pareto and the lognormal distributions.
Another class of continuous distributions is that of laws with bounded support. The uniform and the beta laws belong to this class.
Discrete distributions
The most classical ones are the Poisson (infinite support) and binomial distributions (finite support).
ggplot(data.frame(x = c(0,8)), aes(x)) +
stat_function(aes(color = "lambda = 1"), geom = "point", n = 9, fun = dpois, args = list(lambda = 1)) +
stat_function(aes(color = "lambda = 2"), geom = "point", n = 9, fun = dpois, args = list(lambda = 2)) +
stat_function(aes(color = "lambda = 3"), geom = "point", n = 9, fun = dpois, args = list(lambda = 3))
Other usual families: hypergeometric, beta-binomial (finite support), negative binomial (infinite support).
One last graph to show the shape of cumulative distribution functions.
ggplot(data.frame(x = c(-3,5)), aes(x)) +
stat_function(fun = pnorm, args = list(mean = 0, sd = 1), aes(color = "GAUSSIAN, m = 0.0, s = 1.0")) +
stat_function(aes(color = "POISSON, lambda = 2"), geom = "point", n = 9, fun = ppois, args = list(lambda = 2))
Strange mixes
Densities can have strange shapes.
dens = function(x){(dnorm(x,2,3) + dt(x,2))/2 } # DENSITY NOT EQUAL TO THAT OF THE SUM OF THE TWO CORRESPONDING RVs!!!
dens_2 = function(x){(dnorm(x,-2,0.8) + dnorm(x,1,1.2))/2 }
ggplot(data.frame(x = c(-6, 9)), aes(x)) + stat_function(fun = dens, aes(color = "mix_1")) +
stat_function(fun = dens_2, aes(color = "mix_2"))
Correlations
Very often, variables come in groups. Studying their univariate properties is of course useful, but understanding how they interact is sometimes even more insightful. Because we expect the size of a diamond to have a predominant impact on its price, the variables price and carat should be strongly positively correlated. In R, we simply use the cor() function.
cor(diamonds$carat, diamonds$price) # Correlation
[1] 0.9215913
And it is indeed the case. Correlations above 0.9 indicate a strong link between the two variables.
A stylized sample is visually clear.
library(MASS)
Attachement du package : ‘MASS’
The following object is masked from ‘package:dplyr’:
select
m <- c(0,0) # Means
rho <- 0.9 # Correlation
sig <- rbind(c(1,rho), c(rho,1)) # Covariance matrix
z <- mvrnorm(1000, mu = m, Sigma = sig) %>% data.frame() # Data generation
ggplot(z, aes(x = X1, y = X2)) + geom_point()
Let’s see what happens with diamonds! Below, we use a great feature of R and ggplot that allows to show a multitude of subplots at the same time: facets!
diamonds %>%
ggplot(aes(x = carat, y = price, color = clarity)) +
geom_point() + # Scatter plots
facet_grid(color ~ cut) + # The faceting
theme(text = element_text(size=12)) # Font size
Obviously, the correlation is positive. The slopes are probably non-linear and depend both on clarity and color. Note that there are almost no pure diamonds with Fair cuts. Pure diamonds are those associated to the strongest slopes: their prices increase steadily with their weights.
LS0tCnRpdGxlOiAiRGlzdHJpYnV0aW9ucyIKb3V0cHV0OiAKICBodG1sX25vdGVib29rOgogICAjdG9jOiB0cnVlCiAgICN0b2NfZmxvYXQ6IHRydWUKICAjIGdpdGh1Yl9kb2N1bWVudDogZGVmYXVsdAotLS0KCiMjIEVhcmx5IGlsbHVzdHJhdGlvbnMKV2UgYXJlIGludGVyZXN0ZWQgaW4gcHJpY2VzIGFuZCBjYXJhdHMgb2YgZGlhbW9uZHMuIFRvIGdldCBhIGdsaW1wc2Ugb2YgdGhlaXIgZGlzdHJpYnV0aW9uLCBpdCBpcyBjdXN0b21hcnkgdG8gcGxvdCBoaXN0b2dyYW1zLiBUaGUgYmlucyBwYXJhbWV0ZXIgY29ycmVzcG9uZHMgdG8gdGhlIG51bWJlciBvZiBzbWFsbCByZWN0YW5nbGVzIG9uIHRoZSB4LWF4aXMuCmBgYHtyIGV4MSwgbWVzc2FnZSA9IEZBTFNFfQpsaWJyYXJ5KHRpZHl2ZXJzZSkKZGlhbW9uZHMgJT4lIGZpbHRlcihjYXJhdCA8IDIuNTUpICU+JSBnZ3Bsb3QoKSArIGdlb21faGlzdG9ncmFtKGFlcyh4ID0gY2FyYXQpLCBiaW5zID0gMTAwKQpkaWFtb25kcyAlPiUgZmlsdGVyKGNhcmF0IDwgMS4xKSAlPiUgZ2dwbG90KCkgKyBnZW9tX2hpc3RvZ3JhbShhZXMoeCA9IGNhcmF0KSwgYmlucyA9IDIwMCkgIyBIaWdoZXIgcHJlY2lzaW9uIGZvciBzbWFsbCBkaWFtb25kcwpkaWFtb25kcyAlPiUgZmlsdGVyKGNhcmF0IDwgMi41NSkgJT4lIGdncGxvdCgpICsgZ2VvbV9oaXN0b2dyYW0oYWVzKHggPSBwcmljZSksIGJpbnMgPSAzMCkKZGlhbW9uZHMgJT4lIGdncGxvdChhZXMoeCA9IGN1dCkpICsgZ2VvbV9iYXIoYWVzKHkgPSAoLi5jb3VudC4uKS9zdW0oLi5jb3VudC4uKSkpICsgeWxhYigiUHJvcG9ydGlvbiIpCmBgYAoKU29tZXRpbWVzLCBpdCBpcyBlYXNpZXIgdG8gcXVhbnRpZnkgdGhlIGRpc3RyaWJ1dGlvbiBvZiBhIHZhcmlhYmxlIHdpdGggYSBmZXcgbnVtYmVyczogdGhlIG1lYW4sIHRoZSBzdGFuZGFyZCBkZXZpYXRpb24sIGV0Yy4gVGhleSBhcmUgdXN1YWxseSBjYWxsZWQgdGhlICdkZXNjcmlwdGl2ZSBzdGF0aXN0aWNzJy4KCmBgYHtyIGRlc2Nfc3RhdHN9CmRpYW1vbmRzICU+JSBzdW1tYXJ5KCkgICAgICAgICAgICAgICAgICAgICAgICAgICAgIyBDYW5vbmljYWwgUiBmdW5jdGlvbiBmb3IgZGVzY3JpcHRpdmUgc3RhdGlzdGljcwpkaWFtb25kcyAlPiUgc2VsZWN0KGNhcmF0LCBwcmljZSkgJT4lIGFwcGx5KDIsc2QpICMgQ29tcHV0aW5nIHRoZSBzdGFuZGFyZCBkZXZpYXRpb24gb3ZlciBjYXJhdHMgJiBwcmljZXMKZGlhbW9uZHMgJT4lIHNlbGVjdChjYXJhdCwgcHJpY2UpICU+JSBmaWx0ZXIoY2FyYXQgPCAyLjEpICU+JSBzdW1tYXJ5KCkgCmBgYAoKVGhlIGxhc3QgY29tcHV0YXRpb24gaWxsdXN0cmF0ZXMgdGhlIHNlbnNpdGl2aXR5IG9mIHRoZSBtZWFuLiBJbiB0aGUgZmlyc3QgYmF0Y2ggb2Ygc3RhdHMsIHRoZSBhdmVyYWdlIHByaWNlIHdhcyAzOTMzIGFuZCBpdCBpcyAzNzY1IG92ZXIgdGhlIGZpbHRlcmVkIGRhdGEgKG9taXR0aW5nIGxhcmdlIGRpYW1vbmRzIG1ha2VzIHRoZSBhdmVyYWdlIHByaWNlIGdvIGRvd24pLiBUaGUgbWVhbiBpcyBzZW5zaXRpdmUgdG8gZXh0cmVtZSBwb2ludHMuIEl0J3Mgbm90IHRoZSBjYXNlIGZvciB0aGUgbWVkaWFuICgyNDAxID0+IDIzNTcpLiBUaGUgbWVkaWFuIGlzIG11Y2ggbW9yZSBzdGFibGUgYW5kIGxlc3Mgc2Vuc2l0aXZlIHRvIG91dGxpZXJzLgoKVGhlIHJlYXNvbiB3aHkgaXQgaXMgY29udmVuaWVudCB0byB3b3JrIHdpdGggYSBzaW5nbGUgZmlndXJlIGlzIHRoYXQgaXQgY2FuIGVhc2lseSBiZSBjb21wdXRlZCBvbiBtYW55IHN1YnNhbXBsZXMuIEl0IGlzIGhhcmRlciB0byB2aXN1YWxseSBhbmFseXNlIG1hbnkgZGlzdHJpYnV0aW9ucy4gQmVsb3csIHdlIGFuYWx5c2UgbWFueSBzdWJjYXNlcywgaS5lLiwgd2hlbiB3b3JraW5nIHdpdGggc3ViZ3JvdXBzIHBlcnRhaW5pbmcgdG8gZWFjaCBjb21iaW5hdGlvbiBvZiBjdXQsIGNsYXJpdHkgYW5kIGNvbG9yLgoKYGBge3IgZ3JvdXBfYnl9Cm1lYW5zIDwtIGRpYW1vbmRzICU+JSBncm91cF9ieShjdXQsIGNsYXJpdHksIGNvbG9yKSAlPiUgICMgV2UgYnVpbGQgYSBwaXZvdCB0YWJsZSBvdmVyIGN1dCwgY2xhcml0eSBhbmQgY29sb3IKICBzdW1tYXJpemUoYXZnX2NhcmF0ID0gbWVhbihjYXJhdCksIGF2Z19wcmljZSA9IG1lYW4ocHJpY2UpKQptZWFucyAlPiUgZ2dwbG90KGFlcyh4ID0gYXZnX2NhcmF0LCB5ID0gYXZnX3ByaWNlKSkgKyBnZW9tX3BvaW50KGFlcyhjb2xvciA9IGNsYXJpdHksIHNpemUgPSBjdXQpKQptZWFucyAlPiUgZ2dwbG90KGFlcyh4ID0gYXZnX2NhcmF0LCB5ID0gYXZnX3ByaWNlKSkgKyBnZW9tX3BvaW50KGFlcyhjb2xvciA9IGN1dCwgc2l6ZSA9IGNsYXJpdHkpKQpgYGAKICAKVGhlIGFib3ZlIHBsb3RzIHJldmVhbCBkaWZmZXJlbnQgY2x1c3RlcnMhIFR5cGljYWxseSwgdGhleSBzaG93IHRoZSBpbXBhY3Qgb2YgdGhlIGNsYXJpdHkgb24gdGhlIHNpemUgb2YgdGhlIGRpYW1vbmRzIChhbmQgaGVuY2Ugb24gdGhlaXIgcHJpY2UpLiBPciBpcyBpdCB0aGUgb3RoZXIgd2F5IGFyb3VuZD8gSGV1cmlzdGljYWxseSwgaXQgaXMgYWxtb3N0IGltcG9zc2libGUgdG8gcmVhY2ggZmxhd2xlc3NuZXNzIG9uIGEgbGFyZ2UgZGlhbW9uZC4KCkZvciB0aGUgc2FrZSBvZiBpbGx1c3RyYXRpb24sIHdlIHBsb3QgYSBzZXJpZXMgb2YgaGlzdG9ncmFtcyB0aGF0IGNoYXJhY3Rlcml6ZSB0aGUgcHJpY2UgZGlzdHJpYnV0aW9uIG92ZXIgdmFyaW91cyBzdWJzZXRzIG9mIGRpYW1vbmRzLiBUaGUgeC1heGlzIGFuZCB5LWF4aXMgYXJlIGNvbW1vbiBhY3Jvc3MgYWxsIGhpc3RvZ3JhbXMgdG8gZWFzZSBjb21wYXJpc29uLgpgYGB7ciBmYWNldHMsIGZpZy53aWR0aCA9IDEyfQpkaWFtb25kcyAlPiUgZ2dwbG90KGFlcyh4ID0gcHJpY2UpKSArIAogICAgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDEwKSArIAogICAgZmFjZXRfZ3JpZChjdXQgfiBjb2xvcikgKwogICAgdGhlbWUodGV4dCA9IGVsZW1lbnRfdGV4dChzaXplPTEyKSkKYGBgCgpXaGlsZSB0aGUgdmlldyBpcyBtdWNoIG1vcmUgY29tcGxldGUsIHRoZSBhbmFseXNpcyB0YWtlcyBsb25nZXIuCgojIyMgQ29udmVyZ2VuY2UgZnJvbSBoaXN0b2dyYW0gKGRpc2NyZXRlKSB0byBkZW5zaXR5IChjb250aW51b3VzKQpCZWxvdywgd2UgaWxsdXN0cmF0ZSB0aGUgY29udmVyZ2VuY2Ugb2YgYSBoaXN0b2dyYW0gdG8gdGhlIGRlbnNpdHkgb2YgYSBjb250aW51b3VzIGRpc3RyaWJ1dGlvbi4gV2UgdXNlIHRoZSBjYW5vbmljYWwgR2F1c3NpYW4gbGF3IGFzIGV4YW1wbGUuIElmIHdlIHdhbnQgdG8gc3dpdGNoIGZyb20gY291bnRzIHRvIGRlbnNpdHkgdmFsdWVzLCB3ZSBtdXN0IHNwZWNpZnkgJ3kgPSAuLmRlbnNpdHkuLicgaW5zaWRlIHRoZSBhZXMoKSBvZiBnZ3Bsb3QuCmBgYHtyIGNvbnZlcmdlbmNlfQp4IDwtIHJub3JtKDEwXjYpICU+JSBkYXRhLmZyYW1lKCkKY29sbmFtZXMoeCkgPC0gInNpbXVsYXRpb24iCnggJT4lIGdncGxvdChhZXMoeCA9IHNpbXVsYXRpb24pKSArIGdlb21faGlzdG9ncmFtKGJpbnMgPSAyMCkKeCAlPiUgZ2dwbG90KGFlcyh4ID0gc2ltdWxhdGlvbikpICsgZ2VvbV9oaXN0b2dyYW0oYmlucyA9IDEwMCkKeCAlPiUgZ2dwbG90KGFlcyh4ID0gc2ltdWxhdGlvbiwgeSA9IC4uZGVuc2l0eS4uKSkgKyBnZW9tX2hpc3RvZ3JhbShiaW5zID0gMTAwMCkKYGBgCgoKIyMgUGFyYW1ldHJpYyBmYW1pbGllcwpGb3IgbW9kZWxsaW5nIHB1cnBvc2VzLCBpdCBpcyBjb252ZW5pZW50IHRvIHNwZWNpZnkgc2ltcGxlIGZ1bmN0aW9ucyB0aGF0IGNoYXJhY3Rlcml6ZSBkaXN0cmlidXRpb25zLiAKQW4gZXhoYXVzdGl2ZSBsaXN0IG9mIHByb2JhYmlsaXR5IGRpc3RyaWJ1dGlvbnMgaXMgcHJvdmlkZWQgaGVyZTogaHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvTGlzdF9vZl9wcm9iYWJpbGl0eV9kaXN0cmlidXRpb25zCgojIyMgQ29udGludW91cyBkaXN0cmlidXRpb25zIG92ZXIgdGhlIHJlYWwgbGluZQpUaGUgbW9zdCBmYW1vdXMgb2YgdGhlbSBpcyB0aGUgR2F1c3NpYW4gKG5vcm1hbCkgZGlzdHJpYnV0aW9uLCB3aGljaCB3YXMganVzdCB1c2VkIGFib3ZlLiBXZSBzaG93IHRoZSBpbXBhY3Qgb2YgaXRzIHBhcmFtZXRlcnMgb24gaXRzIGRlbnNpdHkuCmBgYHtyIG5vcm1fZGVuc30KZ2dwbG90KGRhdGEuZnJhbWUoeCA9IGMoLTYsIDkpKSwgYWVzKHgpKSArIAogIHN0YXRfZnVuY3Rpb24oZnVuID0gZG5vcm0sIGFyZ3MgPSBsaXN0KG1lYW4gPSAwLCBzZCA9IDEpLCBhZXMoY29sb3IgPSAibSA9IDAuMCwgcyA9IDEuMCIpKSArCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkbm9ybSwgYXJncyA9IGxpc3QobWVhbiA9IDAuNSwgc2QgPSAxLjUpLCBhZXMoY29sb3IgPSAibSA9IDAuNSwgcyA9IDEuNSIpKSArCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkbm9ybSwgYXJncyA9IGxpc3QobWVhbiA9IDEsIHNkID0gMiksIGFlcyhjb2xvciA9ICJtID0gMS4wLCBzID0gMi4wIikpICsKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGRub3JtLCBhcmdzID0gbGlzdChtZWFuID0gMS41LCBzZCA9IDIuNSksIGFlcyhjb2xvciA9ICJtID0gMS41LCBzID0gMi41IikpICsKICBzY2FsZV9jb2xvdXJfbWFudWFsKCJMZWdlbmQiLCB2YWx1ZXMgPSBjKCIjQjI2NkZGIiwgIiM2NjY2RkYiLCAiIzY2QjJGRiIsICIjMDBGRjgwIikpCmBgYAoKVGhlIGNvZGluZyBvZiBtYW55IFJHQiBjb2xvcnMgY2FuIGJlIGZvdW5kIGhlcmU6IGh0dHBzOi8vd3d3LnJhcGlkdGFibGVzLmNvbS93ZWIvY29sb3IvUkdCX0NvbG9yLmh0bWwgIApUaGUgR2F1c3NpYW4gZGlzdHJpYnV0aW9uIGhhcyB2ZXJ5IGxpZ2h0IHRhaWxzOiBleHRyZW1lIHBvaW50cyBhcmUgdmVyeSByYXJlLiBJdHMgaGVhdnktdGFpbGVkICpjb3VzaW4qIGlzIHRoZSBTdHVkZW50IGRpc3RyaWJ1dGlvbi4KQmVsb3csIHdlIHNob3cgdGhlIGRpZmZlcmVuY2UgYmV0d2VlbiB0aGUgdHdvOiBpdCBpcyBzYWxpZW50IGZvciBleHRyZW1lIHZhbHVlcyAodG8gdGhlIGxlZnQgb3IgdG8gdGhlIHJpZ2h0KS4KYGBge3IgdF9kZW5zfQpnZ3Bsb3QoZGF0YS5mcmFtZSh4ID0gYygtNywgNykpLCBhZXMoeCkpICsgCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkdCwgYXJncyA9IGxpc3QoZGYgPSAwLjUpLCBhZXMoY29sb3IgPSAiZGYgPSAwLjUgKFN0dWRlbnQpIikpICsKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGR0LCBhcmdzID0gbGlzdChkZiA9IDEuMCksIGFlcyhjb2xvciA9ICJkZiA9IDEuMCAoU3R1ZGVudCkiKSkgKyAKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGR0LCBhcmdzID0gbGlzdChkZiA9IDIuMCksIGFlcyhjb2xvciA9ICJkZiA9IDIuMCAoU3R1ZGVudCkiKSkgKyAKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGRub3JtLCBhcmdzID0gbGlzdChtZWFuID0gMCwgc2QgPSAyKSwgYWVzKGNvbG9yID0gIm0gPSAwLCBzID0gMi4wIChHYXVzc2lhbikiKSkgCmdncGxvdChkYXRhLmZyYW1lKHggPSBjKC05LCAtMykpLCBhZXMoeCkpICsgCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkdCwgYXJncyA9IGxpc3QoZGYgPSAwLjUpLCBhZXMoY29sb3IgPSAiZGYgPSAwLjUgKFN0dWRlbnQpIikpICsKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGR0LCBhcmdzID0gbGlzdChkZiA9IDEuMCksIGFlcyhjb2xvciA9ICJkZiA9IDEuMCAoU3R1ZGVudCkiKSkgKyAKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGR0LCBhcmdzID0gbGlzdChkZiA9IDIuMCksIGFlcyhjb2xvciA9ICJkZiA9IDIuMCAoU3R1ZGVudCkiKSkgKyAKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGRub3JtLCBhcmdzID0gbGlzdChtZWFuID0gMCwgc2QgPSAyKSwgYWVzKGNvbG9yID0gIm0gPSAwLCBzID0gMi4wIChHYXVzc2lhbikiKSkgCmBgYAoKT24gdGhlIHNlY29uZCBncmFwaCwgd2UgaGF2ZSB6b29tZWQgb24gdGhlIGxlZnQgdGFpbCB0byBzZWUgdGhlIGRpZmZlcmVuY2UuCgojIyMgUG9zaXRpdmUgY29udGludW91cyBkaXN0cmlidXRpb25zIApUaGUgZ2FtbWEgZGlzdHJpYnV0aW9uIGlzIG9uZSBhcmNoZXR5cGFsIHN1Y2ggZGlzdHJpYnV0aW9uLiBJdCBuZXN0cyB0aGUgZXhwb25lbnRpYWwgZmFtaWx5IGFzIGEgc3BlY2lhbCBjYXNlLiBCZWxvdywgdGhlIGNhc2VzIGEgPSAxIGNvcnJlc3BvbmQgdG8gZXhwb25lbnRpYWwgZGlzdHJpYnV0aW9ucy4KYGBge3IgZ2FtbWFfZGVuc30KZ2dwbG90KGRhdGEuZnJhbWUoeCA9IGMoMCwgNikpLCBhZXMoeCkpICsgCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkZ2FtbWEsIGFyZ3MgPSBsaXN0KHJhdGUgPSAxLCBzaGFwZSA9IDEpLCBhZXMoY29sb3IgPSAiYSA9IDEsIGIgPSAxIikpICsKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGRnYW1tYSwgYXJncyA9IGxpc3QocmF0ZSA9IDEsIHNoYXBlID0gMiksIGFlcyhjb2xvciA9ICJhID0gMiwgYiA9IDEiKSkgKwogIHN0YXRfZnVuY3Rpb24oZnVuID0gZGdhbW1hLCBhcmdzID0gbGlzdChyYXRlID0gMiwgc2hhcGUgPSAxKSwgYWVzKGNvbG9yID0gImEgPSAxLCBiID0gMiIpKSArCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkZ2FtbWEsIGFyZ3MgPSBsaXN0KHJhdGUgPSAyLCBzaGFwZSA9IDIpLCBhZXMoY29sb3IgPSAiYSA9IDIsIGIgPSAyIikpICsKICBzY2FsZV9jb2xvdXJfbWFudWFsKCJMZWdlbmQiLCB2YWx1ZXMgPSBjKCIjRkYzMzMzIiwgIiNGRjk5MzMiLCAiIzY2QjJGRiIsICIjMDBGRjgwIikpCmBgYAoKVGhlIGJldGEgZGlzdHJpYnV0aW9uIGlzIGEgY29udmVuaWVudCBjaG9pY2Ugd2hlbiB3b3JraW5nIHdpdGggY29udGludW91cyBkYXRhIG92ZXIgYSBmaW5pdGUgaW50ZXJ2YWwuIEl0cyBkZW5zaXR5IGlzIHZlcnkgZmxleGlibGUgd2l0aCBvbmx5IHR3byBwYXJhbWV0ZXJzLiBUaGUgZGl2ZXJzaXR5IGluIHNoYXBlcyBpcyB2ZXJ5IHVzZWZ1bC4KCmBgYHtyIGJldGFfZGVuc30KZ2dwbG90KGRhdGEuZnJhbWUoeCA9IGMoMCwgMSkpLCBhZXMoeCkpICsgCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkYmV0YSwgYXJncyA9IGxpc3Qoc2hhcGUxID0gMC40LCBzaGFwZTIgPSAwLjYpLCBhZXMoY29sb3IgPSAiYSA9IDAuNCwgYiA9IDAuNiIpKSArCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkYmV0YSwgYXJncyA9IGxpc3Qoc2hhcGUxID0gMSwgc2hhcGUyID0gMyksIGFlcyhjb2xvciA9ICJhID0gMSwgYiA9IDMiKSkgKwogIHN0YXRfZnVuY3Rpb24oZnVuID0gZGJldGEsIGFyZ3MgPSBsaXN0KHNoYXBlMSA9IDEsIHNoYXBlMiA9IDUpLCBhZXMoY29sb3IgPSAiYSA9IDEsIGIgPSA1IikpICsKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGRiZXRhLCBhcmdzID0gbGlzdChzaGFwZTEgPSAyLCBzaGFwZTIgPSAyKSwgYWVzKGNvbG9yID0gImEgPSAyLCBiID0gMiIpKSArCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBkYmV0YSwgYXJncyA9IGxpc3Qoc2hhcGUxID0gMiwgc2hhcGUyID0gNiksIGFlcyhjb2xvciA9ICJhID0gMiwgYiA9IDYiKSkgKwogIHNjYWxlX2NvbG91cl9tYW51YWwoIkxlZ2VuZCIsIHZhbHVlcyA9IGMoIiM5OTMzRkYiLCIjRkYzMzMzIiwgIiNGRjk5MzMiLCAiIzY2QjJGRiIsICIjMDBGRjgwIikpCmBgYAoKT3RoZXIgZXhhbXBsZXMgaW5jbHVkZSB0aGUgUGFyZXRvIGFuZCB0aGUgbG9nbm9ybWFsIGRpc3RyaWJ1dGlvbnMuICAKQW5vdGhlciBjbGFzcyBvZiBjb250aW51b3VzIGRpc3RyaWJ1dGlvbnMgaXMgdGhhdCBvZiBsYXdzIHdpdGggYm91bmRlZCBzdXBwb3J0LiBUaGUgdW5pZm9ybSBhbmQgdGhlIGJldGEgbGF3cyBiZWxvbmcgdG8gdGhpcyBjbGFzcy4KCiMjIyBEaXNjcmV0ZSBkaXN0cmlidXRpb25zClRoZSBtb3N0IGNsYXNzaWNhbCBvbmVzIGFyZSB0aGUgUG9pc3NvbiAoaW5maW5pdGUgc3VwcG9ydCkgYW5kIGJpbm9taWFsIGRpc3RyaWJ1dGlvbnMgKGZpbml0ZSBzdXBwb3J0KS4KYGBge3IgZGlzY3JldGV9CmdncGxvdChkYXRhLmZyYW1lKHggPSBjKDAsOCkpLCBhZXMoeCkpICsgCiAgc3RhdF9mdW5jdGlvbihhZXMoY29sb3IgPSAibGFtYmRhID0gMSIpLCBnZW9tID0gInBvaW50IiwgbiA9IDksIGZ1biA9IGRwb2lzLCBhcmdzID0gbGlzdChsYW1iZGEgPSAxKSkgKwogIHN0YXRfZnVuY3Rpb24oYWVzKGNvbG9yID0gImxhbWJkYSA9IDIiKSwgZ2VvbSA9ICJwb2ludCIsIG4gPSA5LCBmdW4gPSBkcG9pcywgYXJncyA9IGxpc3QobGFtYmRhID0gMikpICsKICBzdGF0X2Z1bmN0aW9uKGFlcyhjb2xvciA9ICJsYW1iZGEgPSAzIiksIGdlb20gPSAicG9pbnQiLCBuID0gOSwgZnVuID0gZHBvaXMsIGFyZ3MgPSBsaXN0KGxhbWJkYSA9IDMpKQpgYGAKCk90aGVyIHVzdWFsIGZhbWlsaWVzOiBoeXBlcmdlb21ldHJpYywgYmV0YS1iaW5vbWlhbCAoZmluaXRlIHN1cHBvcnQpLCBuZWdhdGl2ZSBiaW5vbWlhbCAoaW5maW5pdGUgc3VwcG9ydCkuCgpPbmUgbGFzdCBncmFwaCB0byBzaG93IHRoZSBzaGFwZSBvZiBjdW11bGF0aXZlIGRpc3RyaWJ1dGlvbiBmdW5jdGlvbnMuCmBgYHtyIGNkZn0KZ2dwbG90KGRhdGEuZnJhbWUoeCA9IGMoLTMsNSkpLCBhZXMoeCkpICsgCiAgc3RhdF9mdW5jdGlvbihmdW4gPSBwbm9ybSwgYXJncyA9IGxpc3QobWVhbiA9IDAsIHNkID0gMSksIGFlcyhjb2xvciA9ICJHQVVTU0lBTiwgbSA9IDAuMCwgcyA9IDEuMCIpKSArCiAgc3RhdF9mdW5jdGlvbihhZXMoY29sb3IgPSAiUE9JU1NPTiwgbGFtYmRhID0gMiIpLCBnZW9tID0gInBvaW50IiwgbiA9IDksIGZ1biA9IHBwb2lzLCBhcmdzID0gbGlzdChsYW1iZGEgPSAyKSkgCmBgYAoKIyMjIFN0cmFuZ2UgbWl4ZXMKRGVuc2l0aWVzIGNhbiBoYXZlIHN0cmFuZ2Ugc2hhcGVzLgpgYGB7ciBtaXh9CmRlbnMgPSBmdW5jdGlvbih4KXsoZG5vcm0oeCwyLDMpICsgZHQoeCwyKSkvMiB9ICMgREVOU0lUWSBOT1QgRVFVQUwgVE8gVEhBVCBPRiBUSEUgU1VNIE9GIFRIRSBUV08gQ09SUkVTUE9ORElORyBSVnMhISEKZGVuc18yID0gZnVuY3Rpb24oeCl7KGRub3JtKHgsLTIsMC44KSArIGRub3JtKHgsMSwxLjIpKS8yIH0KZ2dwbG90KGRhdGEuZnJhbWUoeCA9IGMoLTYsIDkpKSwgYWVzKHgpKSArIHN0YXRfZnVuY3Rpb24oZnVuID0gZGVucywgYWVzKGNvbG9yID0gIm1peF8xIikpICsKICBzdGF0X2Z1bmN0aW9uKGZ1biA9IGRlbnNfMiwgYWVzKGNvbG9yID0gIm1peF8yIikpCmBgYAoKCiMjIENvcnJlbGF0aW9ucwpWZXJ5IG9mdGVuLCB2YXJpYWJsZXMgY29tZSBpbiBncm91cHMuIFN0dWR5aW5nIHRoZWlyIHVuaXZhcmlhdGUgcHJvcGVydGllcyBpcyBvZiBjb3Vyc2UgdXNlZnVsLCBidXQgdW5kZXJzdGFuZGluZyBob3cgdGhleSBpbnRlcmFjdCBpcyBzb21ldGltZXMgZXZlbiBtb3JlIGluc2lnaHRmdWwuIEJlY2F1c2Ugd2UgZXhwZWN0IHRoZSBzaXplIG9mIGEgZGlhbW9uZCB0byBoYXZlIGEgcHJlZG9taW5hbnQgaW1wYWN0IG9uIGl0cyBwcmljZSwgdGhlIHZhcmlhYmxlcyBwcmljZSBhbmQgY2FyYXQgc2hvdWxkIGJlIHN0cm9uZ2x5IHBvc2l0aXZlbHkgY29ycmVsYXRlZC4gSW4gUiwgd2Ugc2ltcGx5IHVzZSB0aGUgY29yKCkgZnVuY3Rpb24uCgpgYGB7ciBjb3JyZWxhdGlvbn0KY29yKGRpYW1vbmRzJGNhcmF0LCBkaWFtb25kcyRwcmljZSkgIyBDb3JyZWxhdGlvbgpgYGAKQW5kIGl0IGlzIGluZGVlZCB0aGUgY2FzZS4gQ29ycmVsYXRpb25zIGFib3ZlIDAuOSBpbmRpY2F0ZSBhIHN0cm9uZyBsaW5rIGJldHdlZW4gdGhlIHR3byB2YXJpYWJsZXMuCgpBIHN0eWxpemVkIHNhbXBsZSBpcyB2aXN1YWxseSBjbGVhci4KYGBge3IgY29yX3NpbXVsYXRlZH0KbGlicmFyeShNQVNTKQptIDwtIGMoMCwwKSAjIE1lYW5zCnJobyA8LSAwLjkgICMgQ29ycmVsYXRpb24Kc2lnIDwtIHJiaW5kKGMoMSxyaG8pLCBjKHJobywxKSkgIyBDb3ZhcmlhbmNlIG1hdHJpeAp6IDwtIG12cm5vcm0oMTAwMCwgbXUgPSBtLCBTaWdtYSA9IHNpZykgJT4lIGRhdGEuZnJhbWUoKSAjIERhdGEgZ2VuZXJhdGlvbgpnZ3Bsb3QoeiwgYWVzKHggPSBYMSwgeSA9IFgyKSkgKyBnZW9tX3BvaW50KCkKYGBgCgpMZXQncyBzZWUgd2hhdCBoYXBwZW5zIHdpdGggZGlhbW9uZHMhIEJlbG93LCB3ZSB1c2UgYSBncmVhdCBmZWF0dXJlIG9mIFIgYW5kIGdncGxvdCB0aGF0IGFsbG93cyB0byBzaG93IGEgbXVsdGl0dWRlIG9mIHN1YnBsb3RzIGF0IHRoZSBzYW1lIHRpbWU6IGZhY2V0cyEKYGBge3IgY29yX2RpYW1vbmRzLCBmaWcud2lkdGggPSA2fQpkaWFtb25kcyAlPiUgCiAgICBnZ3Bsb3QoYWVzKHggPSBjYXJhdCwgeSA9IHByaWNlLCBjb2xvciA9IGNsYXJpdHkpKSArIAogICAgZ2VvbV9wb2ludCgpICsgICAgICAgICAgICAgICAgICAgICAgICMgU2NhdHRlciBwbG90cwogICAgZmFjZXRfZ3JpZChjb2xvciB+IGN1dCkgKyAgICAgICAgICAgICMgVGhlIGZhY2V0aW5nCiAgICB0aGVtZSh0ZXh0ID0gZWxlbWVudF90ZXh0KHNpemU9MTIpKSAgIyBGb250IHNpemUKYGBgCgpPYnZpb3VzbHksIHRoZSBjb3JyZWxhdGlvbiBpcyBwb3NpdGl2ZS4gVGhlIHNsb3BlcyBhcmUgcHJvYmFibHkgbm9uLWxpbmVhciBhbmQgZGVwZW5kIGJvdGggb24gY2xhcml0eSBhbmQgY29sb3IuIE5vdGUgdGhhdCB0aGVyZSBhcmUgYWxtb3N0IG5vIHB1cmUgZGlhbW9uZHMgd2l0aCBGYWlyIGN1dHMuIFB1cmUgZGlhbW9uZHMgYXJlIHRob3NlIGFzc29jaWF0ZWQgdG8gdGhlIHN0cm9uZ2VzdCBzbG9wZXM6IHRoZWlyIHByaWNlcyBpbmNyZWFzZSBzdGVhZGlseSB3aXRoIHRoZWlyIHdlaWdodHMuCgojIyBFeGVyY2lzZXMKCiMjIyBQbG90dGluZyAKMSkgUGxvdCB0aGUgZGVuc2l0eSBvZiBhIENhdWNoeSBkaXN0cmlidXRpb24gd2l0aCBwYXJhbWV0ZXJzOiBsb2NhdGlvbiA9IDEsIHNjYWxlID0gMi4KMikgUGxvdCB0aGUgZm9sbG93aW5nIGRlbnNpdHk6IGYoeCkgPSA0KigxK2Ficyh4KSleKC0xLjUpLgoKIyMjIERlc2NyaXB0aXZlIHN0YXRpc3RpY3MgZm9yIHRoZSBDYXVjaHkgZGlzdHJpYnV0aW9uCjEpIFRoZSBzdGFuZGFyZCBDYXVjaHkgZGlzdHJpYnV0aW9uIGhhcyBhIGRlbnNpdHkgZih4KSA9IDEgLyAocGkqKDEreF4yKSkuIFdoYXQgYXJlIGl0cyBtZWFuIGFuZCB2YXJpYW5jZT8KMikgVXNpbmcgdGhlIHJjYXVjaHkobiwwLDEpIGZ1bmN0aW9uIHRvIHNpbXVsYXRlIHNhbXBsZXMsIGNyZWF0ZSBhIGRhdGFmcmFtZSB0aGF0IHN0b3JlcyB0aGUgbWVhbiBhbmQgdmFyaWFuY2UgZm9yIG4gPSAxMF5rIHNhbXBsZXMgd2l0aCBrID0gMSwuLi4sNy4KMykgQ29tbWVudCB0aGUgcmVzdWx0cyBhbmQgcGxvdCB0aGUgdmFyaWFuY2UgdXNpbmcgYSBsb2dhcml0aG1pYyBzY2FsZSBmb3IgdGhlIHktYXhpczogKyBzY2FsZV95X2xvZzEwKCkuIElzIHRoZSB0aGVvcmV0aWNhbCByZXN1bHQgZnJvbSAxKSB2ZXJpZmllZD8KCiMjIyBTaWduaWZpY2FuY2UgbGV2ZWxzIGZvciB0aGUgR2F1c3NpYW4gZGlzdHJpYnV0aW9uClVzaW5nIHRoZSBybm9ybShuLDAsMSkgZnVuY3Rpb24sIGFwcHJveGltYXRlIHZpYSBhIGxhcmdlIG51bWJlciBvZiBzYW1wbGVzIHRoZSBwcm9iYWJpbGl0eSB0aGF0IGEgc3RhbmRhcmQgTm9ybWFsIHZhcmlhYmxlIGJlIGdyZWF0ZXIgdGhhbiAxLjk2IGluIGFic29sdXRlIHZhbHVlLgoKIyMjIEhBUkRFUjogZnVuY3Rpb24gY3JlYXRpb24KQ3JlYXRlIGEgZnVuY3Rpb24gd2hpY2ggY29tcHV0ZXMsIGZyb20gYSBnaXZlbiBzZXJpZXMgaXRzIHNhbXBsZSBza2V3bmVzcy4gU2FtZSBmb3Iga3VydG9zaXMuCg==