Commit bd10d910 authored by J. Gergaud (commit message: "maj", parent 6e41d20f)

%% Cell type:markdown id: tags:
# Introduction to `julia` for statistics
- To contact me, here is my email: [joseph.gergaud@toulouse-inp.fr](mailto:joseph.gergaud@toulouse-inp.fr)
- The files are here: [https://gitlab.irit.fr/toc/etu-n7/julia](https://gitlab.irit.fr/toc/etu-n7/julia), directory M2
- Evaluation: homework (notebook)
%% Cell type:markdown id: tags:
## Empirical cumulative distribution function (eCDF)
### Exercise 1
1. Build the empirical cumulative distribution function.
Let $t$ be a vector of reals and $x$ a real; then
$F(t,x)$ is the number of elements of $t$ that are strictly less than $x$.
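In symbols, $F(t,x) = \#\{i : t_i < x\}$, and dividing by the length $n$ of $t$ gives the proportion of the data below $x$. The sketch below shows two ways of doing this count, with `count` and with broadcasting; `count_below` is a hypothetical helper for illustration, not the notebook's `empirique`, which is left for you to complete.
%% Cell type:code id: tags:
``` julia
# Hypothetical helper, not the notebook's `empirique`:
# number of entries of t that are strictly less than x
count_below(t::AbstractVector{<:Real}, x::Real) = count(ti -> ti < x, t)

t = [1.0, 2.0, 3.0]
println(count_below(t, 1.5))              # 1
println(sum(t .< 1.5))                    # 1: t .< 1.5 is a BitVector, sum counts its true entries
println(count_below(t, 3.0))              # 2: the strict inequality excludes 3.0
println(count_below(t, 1.5) / length(t))  # proportion of the data below 1.5
```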
%% Cell type:code id: tags:
``` julia
# use broadcasting: .< compares each element of a with 2
a = [1,2,3.5]
a .< 2
```
%% Output
%% Cell type:code id: tags:
``` julia
using Test # for tests
"""
Compute the number of elements of the vector t that are strictly less than the value x.
Input
t : Vector of Real
x : Real
Output
Integer
"""
function empirique(t::Vector{<:Real}, x::Real)::Int
    # to complete and modify
    return 0
end
println("empirique([1.,2,3],1.5) = ", empirique([1.,2,3],1.5))
@test empirique([1.,2,3],1.5) == 1
```
%% Output
empirique([1.,2,3],1.5) = 0
Test Failed at /Users/gergaud/git-ENS/Julia-TSE/etudiants/M2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_W4sZmlsZQ==.jl:19
Expression: empirique([1.0, 2, 3], 1.5) == 1
Evaluated: 0 == 1
Test.FallbackTestSetException("There was an error during testing")
Stacktrace:
[1] record(ts::Test.FallbackTestSet, t::Union{Test.Error, Test.Fail})
@ Test /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:1000
[2] do_test(result::Test.ExecutionResult, orig_expr::Any)
@ Test /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:705
[3] macro expansion
@ /Applications/Julia-1.10.app/Contents/Resources/julia/share/julia/stdlib/v1.10/Test/src/Test.jl:672 [inlined]
[4] top-level scope
@ ~/git-ENS/Julia-TSE/etudiants/M2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_W4sZmlsZQ==.jl:19
%% Cell type:code id: tags:
``` julia
# If the elements of the vector are not real numbers, there is an error
println("empirique([1.,2+2im,3],1.5) = ", empirique([1.,2+2im,3],1.5))
```
%% Output
MethodError: no method matching empirique(::Vector{ComplexF64}, ::Float64)
Closest candidates are:
empirique(!Matched::Vector{<:Real}, ::Real)
@ Main ~/git-ENS/Julia-TSE/etudiants/M2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_W4sZmlsZQ==.jl:11
Stacktrace:
[1] top-level scope
@ ~/git-ENS/Julia-TSE/etudiants/M2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_W5sZmlsZQ==.jl:2
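%% Cell type:markdown id: tags:
The `MethodError` comes from multiple dispatch: `ComplexF64` is not a subtype of `Real`, so `Vector{ComplexF64}` does not match the annotation `Vector{<:Real}`. The same annotation also rejects ranges such as `1:3`, which are `AbstractVector`s but not `Vector`s. The sketch below checks these subtype relations; the functions `only_vector` and `any_vector` are hypothetical and only illustrate that an `AbstractVector{<:Real}` annotation would also accept ranges.
%% Cell type:code id: tags:
``` julia
# Subtype checks behind the MethodError above
println(Vector{Float64} <: Vector{<:Real})     # true
println(Vector{Int} <: Vector{<:Real})         # true
println(Vector{ComplexF64} <: Vector{<:Real})  # false: ComplexF64 is not a subtype of Real

# A range is an AbstractVector but not a Vector
println((1:3) isa Vector{<:Real})              # false
println((1:3) isa AbstractVector{<:Real})      # true

# Hypothetical functions illustrating the two possible annotations
only_vector(t::Vector{<:Real}) = length(t)
any_vector(t::AbstractVector{<:Real}) = length(t)
println(any_vector(1:3))                                # 3
println(hasmethod(only_vector, Tuple{UnitRange{Int}}))  # false
```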
%% Cell type:markdown id: tags:
2. Generate a sample of N = 1000 data points from the uniform distribution on [0,2] and plot the eCDF of this sample.
%% Cell type:code id: tags:
``` julia
using Plots # for plots
N = 1000 # number of data points
u = 2*rand(N) # sample of N draws from the uniform law on [0,2]
x_grid = -1:0.1:3
# Plot of the empirical cumulative distribution function (count normalized by N)
F(x) = empirique(u,x)/N
p_uniform_cdf = plot(x_grid,F,xlabel="x", ylabel="F(x)", legend=false)
```
%% Output
%% Cell type:markdown id: tags:
3. Add the cumulative distribution function to the plot.
For this, use the Distributions package.
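The CDF of the uniform distribution on $[0,2]$ is known in closed form: $F(x)=0$ for $x<0$, $F(x)=x/2$ for $0\le x\le 2$ and $F(x)=1$ for $x>2$; with the Distributions package the same curve is `cdf(Uniform(0,2), x)`. The sketch below is a stdlib-only check, not the intended solution: it draws its own sample with a fixed seed (the names `sample02`, `uniform02_cdf` and `ecdf_value` are hypothetical) and measures the largest gap between the normalized eCDF and $F$ on a grid, which should shrink roughly like $1/\sqrt{N}$.
%% Cell type:code id: tags:
``` julia
using Random

Random.seed!(1234)                        # fixed seed, only for reproducibility
sample02 = 2 * rand(1000)                 # 1000 draws from the uniform law on [0,2]

# Exact CDF of the uniform law on [0,2], written by hand:
# 0 for x < 0, x/2 for 0 <= x <= 2, 1 for x > 2
uniform02_cdf(x::Real) = clamp(x / 2, 0.0, 1.0)

# Proportion of the data strictly below x (hypothetical helper)
ecdf_value(t::AbstractVector{<:Real}, x::Real) = count(<(x), t) / length(t)

grid = -1:0.1:3
gaps = [abs(ecdf_value(sample02, x) - uniform02_cdf(x)) for x in grid]
println("largest gap on the grid = ", maximum(gaps))
```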
%% Cell type:code id: tags:
``` julia
# add the cumulative distribution function
using Distributions
```
%% Cell type:markdown id: tags:
## Distributions Package
### Introduction
There are many libraries (called packages in `julia`): [https://julialang.org/packages/](https://julialang.org/packages/)
For the documentation of the Distributions package, see
[https://juliastats.org/Distributions.jl/stable/](https://juliastats.org/Distributions.jl/stable/)
%% Cell type:code id: tags:
``` julia
using Distributions
using Plots
using LaTeXStrings
a = 0; b = 2;
dist = Uniform(a,b) # dist is an object: the uniform distribution on [a,b]
println("type of dist = ",typeof(dist))
# you can access the mean or the median of the distribution
println("mean(dist) = ", mean(dist))
println("median(dist) = ", median(dist))
# and the PDF, the CDF and the inverse CDF (quantile function) of the distribution
println("pdf(1.2) = ", pdf(dist,1.2))
println("pdf(3) = ", pdf(dist,3))
println("cdf(1.2) = ", cdf(dist,1.2))
println("cdf(3) = ", cdf(dist,3))
println("inverse of cdf(0.75) = ", quantile(dist,0.75))
```
%% Output
type of dist = Uniform{Float64}
mean(dist) = 1.0
median(dist) = 1.0
pdf(1.2) = 0.5
pdf(3) = 0.0
cdf(1.2) = 0.6
cdf(3) = 1.0
inverse of cdf(0.75) = 1.5
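%% Cell type:markdown id: tags:
For the uniform distribution on $[a,b]$ the quantities printed above have closed forms: mean and median $(a+b)/2$, density $1/(b-a)$ on $[a,b]$ and $0$ elsewhere, CDF $(x-a)/(b-a)$ on $[a,b]$ ($0$ before $a$, $1$ after $b$) and quantile $F^{-1}(u) = a + u(b-a)$. The sketch below codes these formulas by hand to check the printed values; the function names are hypothetical and are not part of Distributions.
%% Cell type:code id: tags:
``` julia
# Closed-form formulas for the uniform law on [a,b] (hypothetical helper names)
unif_mean(a, b) = (a + b) / 2
unif_pdf(a, b, x) = a <= x <= b ? 1 / (b - a) : 0.0
unif_cdf(a, b, x) = clamp((x - a) / (b - a), 0.0, 1.0)
unif_quantile(a, b, u) = a + u * (b - a)

a, b = 0, 2
println("mean = ", unif_mean(a, b))                      # 1.0
println("pdf(1.2) = ", unif_pdf(a, b, 1.2))              # 0.5
println("pdf(3) = ", unif_pdf(a, b, 3))                  # 0.0
println("cdf(1.2) = ", unif_cdf(a, b, 1.2))              # 0.6
println("cdf(3) = ", unif_cdf(a, b, 3))                  # 1.0
println("quantile(0.75) = ", unif_quantile(a, b, 0.75))  # 1.5
```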
%% Cell type:markdown id: tags:
#### Exercise 2
Plot the CDF of the uniform distribution on [0,2] on the first graph (the eCDF plot).
%% Cell type:code id: tags:
``` julia
# to complete
```
%% Cell type:markdown id: tags:
### Triangular Distribution
We consider the distribution with the following probability density function:
$$f(x) = \begin{cases}
x\quad\textrm{for}\quad x\in[0,1]\\
2-x\quad\textrm{for}\quad x\in[1,2]\\
0\quad\textrm{otherwise}
\end{cases}$$
Plot the density, the cumulative distribution function and the inverse cumulative distribution function.
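Integrating $f$ gives the CDF, and solving $F(x)=u$ on each piece gives the inverse CDF:
$$F(x) = \begin{cases} 0 & x<0\\ x^2/2 & x\in[0,1]\\ 1-(2-x)^2/2 & x\in[1,2]\\ 1 & x>2 \end{cases}
\qquad
F^{-1}(u) = \begin{cases} \sqrt{2u} & u\in[0,1/2]\\ 2-\sqrt{2(1-u)} & u\in[1/2,1] \end{cases}$$
These should agree with `cdf(dist,x)` and `quantile(dist,u)` for `TriangularDist(0,2,1)`. The sketch below codes the three functions by hand (hypothetical names, no package needed) and checks that $F^{-1}(F(x)) = x$ on $[0,2]$.
%% Cell type:code id: tags:
``` julia
# Triangular law on [0,2] with mode 1, coded by hand (hypothetical names)
tri_pdf(x) = 0 <= x <= 1 ? x : (1 < x <= 2 ? 2 - x : 0.0)
tri_cdf(x) = x < 0 ? 0.0 : x <= 1 ? x^2 / 2 : x <= 2 ? 1 - (2 - x)^2 / 2 : 1.0
tri_quantile(u) = u <= 0.5 ? sqrt(2u) : 2 - sqrt(2 * (1 - u))

# The quantile function inverts the CDF on [0,2]
for x in 0:0.25:2
    @assert isapprox(tri_quantile(tri_cdf(x)), x; atol = 1e-12)
end
println("tri_cdf(1) = ", tri_cdf(1), ", tri_quantile(0.5) = ", tri_quantile(0.5))  # 0.5 and 1.0
```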
%% Cell type:code id: tags:
``` julia
using Distributions
using Plots
using LaTeXStrings
a = 0; b = 2;
dist = TriangularDist(a,b,1) # min = a; max = b; mode = 1
println("type of dist = ",typeof(dist))
println("params(dist) = ", params(dist))
x_grid = -1:0.1:3
# Density function
p1 = plot(x_grid, x->pdf(dist,x), color = :blue, linewidth=2, xlabel=(L"x"), ylabel=(L"f(x)"))
# Cumulative distribution function
p2 = plot(a-1:0.01:b+1, x->cdf(dist,x), linewidth=2, xlabel=(L"x"), ylabel=(L"F(x)"))
# Inverse cumulative distribution function (quantile function)
p3 = plot(0:0.01:1, x->quantile(dist,x), xlims=(0,1), ylims=(0,2), color = :green, linewidth=2, xlabel=(L"u"), ylabel=(L"F^{-1}(u)"))
plot(p1,p2,p3, layout=(1,3),legend = false,size = (1200,300), margin = 0.6Plots.cm)
```
%% Output
type of dist = TriangularDist{Float64}
params(dist) = (0.0, 2.0, 1.0)
%% Cell type:markdown id: tags:
#### Histogram
Generate a sample of 100 data points from the triangular distribution and plot, on the same graph, the histogram of the sample and the PDF.
%% Cell type:code id: tags:
``` julia
# Sample of 100 data points
t = rand(dist,100)
histogram(t)
# pdf(dist,x) already returns a number for a scalar x, no broadcasting needed
plot!(a-0.5:0.1:b+0.5, x->pdf(dist,x), linewidth=2, xlabel=(L"x"), ylabel=(L"f(x)"))
```
%% Output
%% Cell type:markdown id: tags:
#### Question
What is the problem?
1. Use the `normalize=true` parameter of the `histogram` function to solve the problem.
2. Run the code again with a sample of N = 10000 data points.
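By default `histogram` plots counts, so the bar heights grow with the sample size, whereas the PDF is a density whose values are at most 1 here. With `normalize=true` (equivalently `normalize=:pdf` in Plots), each bar height becomes count / (N × bin width), so the bars have total area 1 and are on the same scale as the PDF. The sketch below computes these normalized heights by hand with the standard library only; as an assumption made to avoid Distributions, it samples the triangular distribution as the sum of two independent $\mathcal{U}(0,1)$ draws, which has exactly the density $f$ above. All names are hypothetical.
%% Cell type:code id: tags:
``` julia
using Random

Random.seed!(42)                          # fixed seed, only for reproducibility
n_draws = 100_000
# Sum of two independent U(0,1) draws: triangular law on [0,2] with mode 1
tri_sample = rand(n_draws) .+ rand(n_draws)

w = 0.25                                  # bin width
left_edges = 0:w:(2 - w)                  # left edges of the bins [lo, lo + w)
bin_counts = [count(s -> lo <= s < lo + w, tri_sample) for lo in left_edges]
heights = bin_counts ./ (n_draws * w)     # normalized heights: the bars have total area 1

tri_pdf(x) = 0 <= x <= 1 ? x : (1 < x <= 2 ? 2 - x : 0.0)
centers = left_edges .+ w / 2
for (c, h) in zip(centers, heights)
    println("bin center ", c, ": normalized height ", round(h; digits = 3), ", pdf ", tri_pdf(c))
end
println("total area = ", sum(heights) * w)
```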
%% Cell type:markdown id: tags:
### Example of a discrete distribution: the binomial distribution
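The binomial distribution $\mathcal{B}(n,p)$ has probability mass function $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$ for $k=0,\ldots,n$, mean $np$ and variance $np(1-p)$; these are the values that `pdf.(bDist,xgrid)` returns in the next cell. The sketch below recomputes them from the formula with Base's `binomial` (the name `binom_pmf` is hypothetical) and checks that they sum to 1 and have mean $np = 2$.
%% Cell type:code id: tags:
``` julia
# Binomial pmf written from the formula (hypothetical name)
binom_pmf(n, p, k) = binomial(n, k) * p^k * (1 - p)^(n - k)

n, p = 10, 0.2
probs = [binom_pmf(n, p, k) for k in 0:n]
println("sum of probabilities = ", sum(probs))          # 1 up to rounding
println("mean = ", sum(k * probs[k + 1] for k in 0:n))  # n*p = 2 up to rounding
println("most likely value = ", argmax(probs) - 1)      # 2
```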
%% Cell type:code id: tags:
``` julia
n, p, N = 10, 0.2, 10^3
bDist = Binomial(n,p) # binomial distribution: n trials, success probability p
xgrid = 0:n
# probability mass function P(X = k) for k = 0,...,n: points, then stems
plot(xgrid,pdf.(bDist,xgrid), color=:orange, seriestype = :scatter)
plot!(xgrid,pdf.(bDist,xgrid), line = :stem, linewidth=2, color=:orange)
```
%% Output
%% Cell type:markdown id: tags:
## Central limit theorem
We are going to illustrate the central limit theorem:
Suppose $X_1,X_2,\ldots$ is a sequence of independent and identically distributed random variables with $E(X_i)=\mu$ and $Var(X_i)=\sigma^2 < +\infty$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then, as $n$ approaches infinity, the random variables $\sqrt{n}(\bar{X}_n - \mu)$ converge in distribution to the normal distribution $\mathcal{N}(0,\sigma^2)$.
### Exercise
1. Choose a distribution dist, compute its mean $\mu$ and its variance $\sigma^2$, and choose $N$, the number of samples.
2. For $n$ in (1,2,5,20):
    1. Generate $N=10000$ samples of length $n$ from the distribution dist.
    2. Compute the means of the $N$ samples and the $N$ values $\sqrt{n}(\bar{X}_n - \mu)$.
    3. Plot the histogram of these $N$ values together with the density of the normal distribution $\mathcal{N}(0,\sigma^2)$.
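A minimal stdlib-only sketch of steps 1 to 2.2, assuming the distribution `Uniform(0,12)` used in the next cell, for which $\mu = 6$ and $\sigma^2 = 12^2/12 = 12$. As an assumption made to avoid Distributions, draws from $\mathcal{U}(0,12)$ are generated as `12 .* rand(...)`; the names `results`, `xbar` and `z` are hypothetical. For every $n$ the values $\sqrt{n}(\bar{X}_n-\mu)$ should have mean close to 0 and variance close to $\sigma^2$; the central limit theorem is about their shape, which step 2.3 checks with a histogram.
%% Cell type:code id: tags:
``` julia
using Random, Statistics

Random.seed!(2024)                        # fixed seed, only for reproducibility
μ, σ2 = 6.0, 12.0                         # mean and variance of the uniform law on [0,12]
N = 10_000                                # number of samples
results = Dict{Int,Vector{Float64}}()     # n => the N values sqrt(n)(X̄ₙ - μ)

for n in (1, 2, 5, 20)
    samples = 12 .* rand(N, n)            # N samples of length n, one per row
    xbar = vec(mean(samples; dims = 2))   # the N sample means
    z = sqrt(n) .* (xbar .- μ)            # the N centered and scaled means
    results[n] = z
    println("n = ", n, ": mean = ", round(mean(z); digits = 3),
            ", variance = ", round(var(z); digits = 3), " (σ² = ", σ2, ")")
end
```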
%% Cell type:code id: tags:
``` julia
dist = Uniform(0,12)
# To complete
```
%% Output
%% Cell type:code id: tags:
``` julia
```