Financial Statistics and Econometrics: Homework 2

Part I

  1. Suppose we have a sample Y_1,Y_2,\cdots,Y_n;X_1,X_2,\cdots,X_n and adopt the linear model Y_i=\alpha+\beta\cdot X_i+u_i.

(a) Derive the OLS estimators and compute the variance of the estimator \widehat\beta.

To derive the OLS estimators, we assume E(u_i)=0, Cov(u_i,X_i)=0, Var(u_i)=\sigma_u^2, and Cov(u_i,u_j)=0 for i\neq j.

Define M as the sum of squared residuals:
M=\sum_{i=1}^n\widehat u_i^2=\sum_{i=1}^n\left(Y_i-\widehat\alpha-\widehat\beta\cdot X_i\right)^2
Take the first-order partial derivatives with respect to \widehat\alpha and \widehat\beta:
\frac{\partial M}{\partial\widehat\alpha}=-2\sum_{i=1}^n(Y_i-\widehat\alpha-\widehat\beta\cdot X_i)

\frac{\partial M}{\partial\widehat\beta}=-2\sum_{i=1}^n(Y_i-\widehat\alpha-\widehat\beta\cdot X_i)\cdot X_i

Set \frac{\partial M}{\partial\widehat\alpha}=0 and \frac{\partial M}{\partial\widehat\beta}=0, and write n\overline X=\sum_{i=1}^nX_i, n\overline Y=\sum_{i=1}^nY_i.

This yields:
\widehat\alpha=\overline Y-\widehat\beta\cdot\overline X

\widehat\beta=\frac{\sum_{i=1}^nX_iY_i-\overline Y\sum_{i=1}^nX_i}{\sum_{i=1}^nX_i^2-\overline X\sum_{i=1}^nX_i}=\frac{\sum_{i=1}^n(X_i-\overline X)(Y_i-\overline Y)}{\sum_{i=1}^n(X_i-\overline X)^2}=\frac{\sum_{i=1}^nx_iy_i}{\sum_{i=1}^nx_i^2}
where x_i=X_i-\overline X and y_i=Y_i-\overline Y denote deviations from the sample means.

To see the variance, write:
\widehat\beta=\frac{\sum x_iy_i}{\sum x_i^2}=\frac{\sum x_i(Y_i-\overline Y)}{\sum x_i^2}=\frac{\sum x_iY_i}{\sum x_i^2}-\frac{\overline Y\sum x_i}{\sum x_i^2}
Since \sum x_i=0, letting k_i=\frac{x_i}{\sum x_i^2} gives:
\widehat\beta=\sum k_iY_i
Therefore, the variance of the estimator \widehat\beta is:
Var(\widehat\beta)=Var\left(\sum k_iY_i\right)=\sum k_i^2\cdot\sigma^2=\frac{\sum x_i^2}{(\sum x_i^2)^2}\cdot\sigma^2=\frac{\sigma^2}{\sum x_i^2}
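As a sanity check, the formula Var(\widehat\beta)=\sigma^2/\sum x_i^2 can be compared against a simulation. The sample size, parameter values, and fixed X grid below are illustrative choices, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, alpha, beta = 50, 2.0, 1.0, 3.0
X = np.linspace(0.0, 10.0, n)            # fixed regressor values
x = X - X.mean()                         # deviations from the mean
theoretical = sigma**2 / np.sum(x**2)    # Var(beta_hat) = sigma^2 / sum x_i^2

# Re-estimate beta on many simulated samples and compare the empirical
# variance of beta_hat with the formula above
betas = []
for _ in range(20000):
    Y = alpha + beta * X + rng.normal(0.0, sigma, n)
    betas.append(np.sum(x * (Y - Y.mean())) / np.sum(x**2))
print(theoretical, np.var(betas))        # the two values should be close
```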

(b) If the u_i are i.i.d. N(0,\sigma^2), can you derive the maximum likelihood estimators of \alpha and \beta?

Since the u_i are i.i.d. N(0,\sigma^2), the classical assumptions are satisfied, and the density of each observation under this linear model is:
f(Y_i\vert X_i;\alpha,\beta,\sigma)=\frac1{\sigma\sqrt{2\pi}}\cdot e^{-\frac{(Y_i-\alpha-\beta\cdot X_i)^2}{2\sigma^2}}
The likelihood function is:
L(\theta\vert x_1,x_2,\cdots,x_n)=\prod_{i=1}^n\frac1{\sigma\sqrt{2\pi}}\cdot e^{-\frac{(Y_i-\alpha-\beta\cdot X_i)^2}{2\sigma^2}}=\left(\frac1{\sigma\sqrt{2\pi}}\right)^n\cdot e^{-\frac{\sum(Y_i-\alpha-\beta\cdot X_i)^2}{2\sigma^2}}
Taking the logarithm of the likelihood:
\ln L(\theta\vert x_1,x_2,\cdots,x_n)=-\frac n2\ln\left(2\pi\right)-n\ln\left(\sigma\right)-\frac1{2\sigma^2}\cdot\sum(Y_i-\alpha-\beta\cdot X_i)^2
From the log-likelihood above, we have:
\mathrm{Max}\left[L(\theta\vert x_1,x_2,\cdots,x_n)\right]\Leftrightarrow\mathrm{Max}\left[\ln L(\theta\vert x_1,x_2,\cdots,x_n)\right]\Leftrightarrow\mathrm{Min}\left[\sum(Y_i-\widehat\alpha-\widehat\beta\cdot X_i)^2\right]
Setting:
\frac{\partial\sum(Y_i-\widehat\alpha-\widehat\beta\cdot X_i)^2}{\partial\widehat\alpha}=0

\frac{\partial\sum(Y_i-\widehat\alpha-\widehat\beta\cdot X_i)^2}{\partial\widehat\beta}=0

yields the same estimators as OLS:
\widehat\alpha=\overline Y-\widehat\beta\cdot\overline X

\widehat\beta=\frac{\sum_{i=1}^nX_iY_i-\overline Y\sum_{i=1}^nX_i}{\sum_{i=1}^nX_i^2-\overline X\sum_{i=1}^nX_i}=\frac{\sum_{i=1}^n(X_i-\overline X)(Y_i-\overline Y)}{\sum_{i=1}^n(X_i-\overline X)^2}=\frac{\sum_{i=1}^nx_iy_i}{\sum_{i=1}^nx_i^2}
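That the Gaussian MLE reproduces the OLS solution can also be checked numerically by maximizing the log-likelihood directly. The simulated data and the use of scipy.optimize.minimize below are illustrative, not part of the derivation:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 40
X = rng.uniform(-5, 5, n)
Y = 1.0 + 2.0 * X + rng.normal(0.0, 1.0, n)

# Closed-form OLS estimates
x, y = X - X.mean(), Y - Y.mean()
beta_ols = np.sum(x * y) / np.sum(x**2)
alpha_ols = Y.mean() - beta_ols * X.mean()

# Negative Gaussian log-likelihood (additive constants dropped);
# sigma is parameterized as exp(log_s) to keep it positive
def negloglik(theta):
    a, b, log_s = theta
    s = np.exp(log_s)
    resid = Y - a - b * X
    return n * np.log(s) + np.sum(resid**2) / (2.0 * s**2)

res = minimize(negloglik, x0=[0.0, 0.0, 0.0])
alpha_mle, beta_mle = res.x[0], res.x[1]
print(alpha_mle - alpha_ols, beta_mle - beta_ols)   # both differences ≈ 0
```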

(c) Under the classical assumptions, what properties do the OLS estimators have?

  • Consistency

    From the variance derived in (a):
    Var(\widehat\beta)=\frac{\sum x_i^2\cdot\sigma^2}{(\sum x_i^2)^2}=\frac{\sigma^2}{\sum x_i^2}
    so \lim_{n\rightarrow\infty}Var(\widehat\beta)=0.

    By Chebyshev's inequality, for any c>0: P(\left|\widehat\beta-\beta\right|\geq c)\leq\frac{Var(\widehat\beta)}{c^2}

    Since \lim_{n\rightarrow\infty}Var(\widehat\beta)=0, it follows that \lim_{n\rightarrow\infty}P(\left|\widehat\beta-\beta\right|\geq c)=0,

    so \widehat\beta is consistent.

  • Unbiasedness

    Substituting Y_i=\alpha+\beta\cdot X_i+u_i into \widehat\beta=\sum k_iY_i gives:
    \widehat\beta=\sum k_i(\alpha+\beta\cdot X_i+u_i)=\alpha\sum k_i+\beta\sum k_iX_i+\sum k_iu_i
    Since \sum k_i=\frac{\sum x_i}{\sum x_i^2}=0 and \sum k_iX_i=\frac{\sum x_iX_i}{\sum x_i^2}=1, this simplifies to:
    \widehat\beta=\beta+\sum k_iu_i
    Hence:
    E(\widehat\beta)=E(\beta+\sum k_iu_i)=\beta+\sum k_iE(u_i)=\beta
    so \widehat\beta is an unbiased estimator of \beta.

  • Efficiency

    By the Gauss–Markov theorem, in a linear regression model whose errors u_i have zero mean, constant variance, and are mutually uncorrelated, the best linear unbiased estimator of the coefficients is the OLS estimator. Hence Var(\widehat\beta) is the smallest variance among all linear unbiased estimators of \beta, i.e. OLS is efficient.
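The two weight identities \sum k_i=0 and \sum k_iX_i=1 used in the unbiasedness argument, and the conclusion E(\widehat\beta)=\beta itself, are easy to verify numerically. The sample below (30 uniform draws, a fixed seed, and \beta=2) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-10, 10, 30)
x = X - X.mean()
k = x / np.sum(x**2)              # k_i = x_i / sum(x_i^2)

# The two identities used in the proof
sum_k = np.sum(k)                 # should be ~0
sum_kX = np.sum(k * X)            # should be ~1

# Unbiasedness: beta_hat = beta + sum(k_i u_i), averaged over many samples
beta = 2.0
draws = [beta + np.sum(k * rng.normal(0, 1, 30)) for _ in range(20000)]
mean_beta_hat = np.mean(draws)
print(sum_k, sum_kX, mean_beta_hat)   # ~0, ~1, ~2
```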

(d) Suppose a variable Z also affects Y but has been omitted for some reason. What effects might this have on the estimates of the model above?

  • If the omission of Z makes E(u\vert X)\neq0 (i.e. Z is correlated with X), the estimators become biased and inconsistent.
  • If the omission of Z makes Var(u\vert X)\neq\sigma^2I, i.e. introduces heteroskedasticity, the estimators remain unbiased but are no longer efficient, and the usual standard errors are invalid.
  • If the omission of Z makes Cov(u_i,u_j)\neq0, i.e. introduces serial correlation, the estimators likewise remain unbiased but lose efficiency, and the usual standard errors are invalid.
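A short simulation with a hypothetical data-generating process makes the first case concrete: when the omitted factor Z is correlated with X, the slope from regressing Y on X alone centers on 2+1.5\times0.8=3.2 instead of the true 2 (the coefficients 1.5 and 0.8 below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 5000
slopes = []
for _ in range(reps):
    X = rng.normal(0.0, 1.0, n)
    Z = 0.8 * X + rng.normal(0.0, 1.0, n)   # omitted factor, correlated with X
    Y = 1.0 + 2.0 * X + 1.5 * Z + rng.normal(0.0, 1.0, n)
    x = X - X.mean()
    slopes.append(np.sum(x * (Y - Y.mean())) / np.sum(x**2))  # OLS on X only

# plim(beta_hat) = 2 + 1.5 * Cov(X,Z)/Var(X) = 2 + 1.5*0.8 = 3.2, not 2
print(np.mean(slopes))
```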

Part II

  1. Download the most recent 34 days of data for 601666 (Y) and the Shanghai Composite Index (X). Using the first 30 days (reserving the last four observations), regress \ln Y on \ln X: (\ln Y = \alpha + \beta\ln X + u).

(a) Report the estimates of \alpha and \beta;

(b) Report the magnitudes of the test statistics for \alpha and \beta, their p-values, and the 95% confidence intervals.

(c) Suppose someone suspects that \alpha = 0. How would you carry out an F-test?

(d) Can you use this model to predict the stock price over the following 4 days? How large is the gap from the actual prices?

Program:

# -*- coding: utf-8 -*-
"""
Created on Tue Oct  9 22:48:41 2018

@author: Wengsway

"""

import numpy as np
import tushare as ts
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Part (a)
# Download data and arrange in ascending order
Y = ts.get_hist_data('601666',start = '2018-08-22',end = '2018-10-09')
X = ts.get_hist_data('sh',start = '2018-08-22',end = '2018-10-09')
Y = Y.sort_index(axis = 0,ascending = True)
X = X.sort_index(axis = 0,ascending = True)
# Get the results of OLS model
new_X = sm.add_constant(np.log(X['close']))
model = sm.OLS(np.log(Y['close']),new_X)
results = model.fit()
print('The value of estimator α and β:',"\n",results.params)

# Part (b)
print('\n','The Statistics and P-Value in the output:',"\n",results.summary())
# Calculate a 95% interval for the daily return (note that the scale
# argument of stats.norm is the standard deviation, not the variance)
average_ror = np.mean(Y['p_change'])
var_ror = np.var(Y['p_change'])
std_ror = np.sqrt(var_ror)
interval = stats.norm.interval(0.95,average_ror,std_ror)
# Draw a Probability Density Map
plt.rcParams['axes.unicode_minus']=False
x=np.linspace(-3,3,60)
y=stats.norm.pdf(x,average_ror,std_ror)
plt.plot(x,y)
plt.title('Probability Density Map')
plt.vlines(average_ror,0,0.35,linestyle='--')
# Output result
print('\n','The average return of ratio is: ',average_ror)
print('The variance of return of ratio is: ',var_ror)
print('\n','The 95% confidence interval is: ',interval)

# Part (c)
# Restriction matrix selecting the intercept: tests H0: const = 0
# (equivalently, results.f_test('const = 0'))
A = np.identity(len(results.params))
A = A[0:1,:]
print('\n','F-test results:',results.f_test(A))

# Part (d)
future_Y = ts.get_hist_data('601666',start = '2018-10-10',end = '2018-10-15')
future_X = ts.get_hist_data('sh',start = '2018-10-10',end = '2018-10-15')
future_X = future_X.sort_index(axis = 0,ascending = True)
future_Y = future_Y.sort_index(axis = 0,ascending = True)
pre_logY = results.params.const+results.params.close*np.log(future_X['close'])
gap = np.exp(pre_logY) - future_Y['close']
gap_ratio = gap / future_Y['close']
print('\n','Predicted absolute gap:',gap)
print('\n','Predicted relative gap:',gap_ratio)

Answer:

The value of estimator α and β: 
const   -4.349430
close    0.725556
dtype: float64

The Statistics and P-Value in the output: 
                             OLS Regression Results                            
==============================================================================
Dep. Variable:                  close   R-squared:                       0.521
Model:                            OLS   Adj. R-squared:                  0.504
Method:                 Least Squares   F-statistic:                     29.40
Date:                Tue, 16 Oct 2018   Prob (F-statistic):           9.82e-06
Time:                        16:41:13   Log-Likelihood:                 88.113
No. Observations:                  29   AIC:                            -172.2
Df Residuals:                      27   BIC:                            -169.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -4.3494      1.059     -4.108      0.000      -6.522      -2.177
close          0.7256      0.134      5.422      0.000       0.451       1.000
==============================================================================
Omnibus:                        4.093   Durbin-Watson:                   0.487
Prob(Omnibus):                  0.129   Jarque-Bera (JB):                3.132
Skew:                           0.805   Prob(JB):                        0.209
Kurtosis:                       3.054   Cond. No.                     3.81e+03
==============================================================================

The average return of ratio is:  0.04965517241379311
The variance of return of ratio is:  1.336044708680143

The 95% confidence interval is:  (-2.568944338334596, 2.668254683162182)

F-test results: <F test: F=array([[16.87895733]]), p=0.00033203674441426603, df_denom=27, df_num=1>

Predicted absolute gap: date
2018-10-10   -0.074725
2018-10-11    0.021987
2018-10-12    0.017390
2018-10-15    0.105314
Name: close, dtype: float64

Predicted relative gap: date
2018-10-10   -0.018270
2018-10-11    0.005726
2018-10-12    0.004494
2018-10-15    0.028159
Name: close, dtype: float64

![Probability Density Map](E:\Documents\HUST\20180913 – Financial_Statistics_and_Econometrics\20180916 – Homework\Homework 2\Probability Density Map.png)

  1. Monte Carlo simulation: first generate random sequences X and u, where X consists of 30 draws from the uniform distribution on (-10,10) and u consists of 30 draws from N(0,1); then generate the sequence Y according to Y = 1+2X+u.

(a) Using the generated Y and X, run an OLS regression of Y on X to obtain the coefficients. Repeat this process 500 times (yielding 500 values of \alpha and \beta), plot the distribution of these 500 estimates, and compute their averages. Also compute the variance of the parameter estimates in each replication and the mean of the 500 estimated variances.

(b) Now generate a sequence X_2 = 0.5X+u and another sequence Y_1=1+2X+X_2+u. Regress Y_1 on X alone to obtain the coefficients. Repeat this process 500 times (yielding 500 values of \alpha and \beta), plot the distribution of these 500 estimates, and compute their averages. (How far are these averages from the true values?)

(c) Suppose u_2\sim N(0,1) initially, and generate u_{2i}=0.4\cdot u_{2,i-1}+\varepsilon_i, where \varepsilon_i comes from N(0,0.25). Then generate Y_2=1+2X+u_2 and run an OLS regression of Y_2 on X to obtain the coefficients. Repeat this process 500 times (yielding 500 values of \alpha and \beta), plot the distribution of these 500 estimates, and compute their averages. Also compute the parameter variances in each replication and the mean of the 500 variances. Do these differ from the results in (a)?

Program:

# -*- coding: utf-8 -*-
"""
Created on Tue Oct 16 09:35:19 2018

@author: Wengsway

"""

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# The problem asks for 30 draws of X from U(-10,10); a fixed, evenly spaced
# grid is used here instead (np.random.uniform(-10,10,30) would give random draws)
X = np.linspace(-10,10,30)
new_X = sm.add_constant(X)

# Part (a)
# Define a function to get Y
def linearmodel():
      u = np.random.normal(0,1,30)
      Y = 1 + 2*X + u
      return Y
# Generate lists to store values
alpha_a, beta_a, var_alpha_a, var_beta_a = [],[],[],[]
# Repeat OLS process 500 times
for i in range(500):
      model_a = sm.OLS(linearmodel(),new_X)
      results_a = model_a.fit()
      alpha_a.append(results_a.params[0])
      beta_a.append(results_a.params[1])
      var_alpha_a.append((results_a.bse[0])**2)
      var_beta_a.append((results_a.bse[1])**2)
mean_alpha_a, mean_beta_a = np.mean(alpha_a), np.mean(beta_a)
mean_var_alpha_a, mean_var_beta_a = np.mean(var_alpha_a), np.mean(var_beta_a)
# Output result
print('the mean value of all alpha in(a) is:',mean_alpha_a)
print('the mean value of all beta in (a) is:',mean_beta_a)
print('the mean value of var_alpha in (a) is:',mean_var_alpha_a)
print('the mean value of var_beta in (a) is:',mean_var_beta_a)
# Draw a graphic
plt.figure(1,figsize = (10,5))
plt.plot(alpha_a,label = 'alpha_a')
plt.plot(beta_a,label = 'beta_a')
plt.hlines(mean_alpha_a,0,500, colors = "r", linestyles = "--",label = 
           'mean_alpha_a')
plt.hlines(np.max(alpha_a),0,500,colors = "k",linestyle = ":")
plt.hlines(np.min(alpha_a),0,500,colors = "k",linestyle = ":")
plt.hlines(mean_beta_a,0,500, colors = "g", linestyles = "dashed",label = 
           'mean_beta_a')
plt.hlines(np.max(beta_a),0,500,colors = "k",linestyle = ":")
plt.hlines(np.min(beta_a),0,500,colors = "k",linestyle = ":")
plt.legend(loc = 'best')

# Part (b)
# Define a function to get Y_1
def linearmodel2():
      u = np.random.randn(30)
      Y_1 = 1 + 2 * X + (0.5 * X + u) + u
      return(Y_1)
# Generate lists to store values
alpha_b, beta_b, var_alpha_b, var_beta_b = [],[],[],[]
# Repeat OLS process 500 times
for i in range(500):
      model_b = sm.OLS(linearmodel2(),new_X)
      results_b = model_b.fit()
      alpha_b.append(results_b.params[0])
      beta_b.append(results_b.params[1])
      var_alpha_b.append((results_b.bse[0])**2)
      var_beta_b.append((results_b.bse[1])**2)
mean_alpha_b, mean_beta_b = np.mean(alpha_b), np.mean(beta_b)
mean_var_alpha_b, mean_var_beta_b = np.mean(var_alpha_b), np.mean(var_beta_b)
# Output result
print('\n','the mean value of all alpha in(b) is:',mean_alpha_b)
print('the mean value of all beta in (b) is:',mean_beta_b)
print('the mean value of var_alpha in (b) is:',mean_var_alpha_b)
print('the mean value of var_beta in (b) is:',mean_var_beta_b)
# Draw a graphic
plt.figure(2,figsize = (10,5))
plt.plot(alpha_b,label = 'alpha_b')
plt.plot(beta_b,label = 'beta_b')
plt.hlines(mean_alpha_b,0,500, colors = "r", linestyles = "--", label = 
           'mean_alpha_b')
plt.hlines(np.max(alpha_b),0,500,colors = "k",linestyle = ":")
plt.hlines(np.min(alpha_b),0,500,colors = "k",linestyle = ":")
plt.hlines(mean_beta_b,0,500, colors = "g", linestyles = "dashed",label = 
           'mean_beta_b')
plt.hlines(np.max(beta_b),0,500,colors = "k",linestyle = ":")
plt.hlines(np.min(beta_b),0,500,colors = "k",linestyle = ":")
plt.legend(loc = 'best')

# Part (c)
# Define a function to get Y_2
def linearmodel3():
      # epsilon_i ~ N(0,0.25), i.e. standard deviation 0.5
      eps = np.random.normal(0,0.5,30)
      u_2 = np.random.normal(0,1,1)
      u_2 = u_2.tolist()
      for i in range(1,30):
            u_2.append(0.4*u_2[i-1] + eps[i])
      Y_2 = 1 + 2 * X + np.array(u_2)
      return Y_2
# Generate lists to store values
alpha_c, beta_c, var_alpha_c, var_beta_c = [],[],[],[]
# Repeat OLS process 500 times
for i in range(500):
      model_c = sm.OLS(linearmodel3(),new_X)
      results_c = model_c.fit()
      alpha_c.append(results_c.params[0])
      beta_c.append(results_c.params[1])
      var_alpha_c.append((results_c.bse[0])**2)
      var_beta_c.append((results_c.bse[1])**2)
mean_alpha_c, mean_beta_c = np.mean(alpha_c), np.mean(beta_c)
mean_var_alpha_c, mean_var_beta_c = np.mean(var_alpha_c), np.mean(var_beta_c)
gap_alpha_a_to_alpha_c = [alpha_a[i]-alpha_c[i] for i in range(len(alpha_a))] 
gap_beta_a_to_beta_c = [beta_a[i]-beta_c[i] for i in range(len(beta_a))]
# Output result
print('\n','the mean value of all alpha in(c) is:',mean_alpha_c)
print('the mean value of all beta in (c) is:',mean_beta_c)
print('the mean value of var_alpha in (c) is:',mean_var_alpha_c)
print('the mean value of var_beta in (c) is:',mean_var_beta_c)
# Draw a graphic
plt.figure(3,figsize = (10,5))
plt.plot(alpha_c,label = 'alpha_c')
plt.plot(beta_c,label = 'beta_c')
plt.hlines(mean_alpha_c,0,500,colors = "r", linestyles = "--", label = 
           'mean_alpha_c')
plt.hlines(np.max(alpha_c),0,500,colors = "k",linestyle = ":")
plt.hlines(np.min(alpha_c),0,500,colors = "k",linestyle = ":")
plt.hlines(mean_beta_c,0,500,colors = "g", linestyles = "dashed", label = 
           'mean_beta_c')
plt.hlines(np.max(beta_c),0,500,colors = "k",linestyle = ":")
plt.hlines(np.min(beta_c),0,500,colors = "k",linestyle = ":")
plt.legend(loc = 'best')
# Compare the results of part (c) and part (a)
plt.figure(4,figsize = (10,5))
plt.plot(alpha_a,label = 'alpha_a')
plt.plot(beta_a,label = 'beta_a')
plt.plot(alpha_c,label = 'alpha_c')
plt.plot(beta_c,label = 'beta_c')
plt.plot(gap_alpha_a_to_alpha_c,label = 'gap_alpha_a_to_alpha_c')
plt.plot(gap_beta_a_to_beta_c, label = 'gap_beta_a_to_beta_c')
plt.legend(loc = 'best')

Answer:

the mean value of all alpha in(a) is: 1.004444338639324
the mean value of all beta in (a) is: 1.9984596329704911
the mean value of var_alpha in (a) is: 0.03353353834968027
the mean value of var_beta in (a) is: 0.0009411025278781239

the mean value of all alpha in(b) is: 0.976883356505685
the mean value of all beta in (b) is: 2.50194908657034
the mean value of var_alpha in (b) is: 0.1341837121384748
the mean value of var_beta in (b) is: 0.0037658009535636475

the mean value of all alpha in(c) is: 0.9953336873646347
the mean value of all beta in (c) is: 2.0032752239851077
the mean value of var_alpha in (c) is: 0.009653422845796808
the mean value of var_beta in (c) is: 0.000270918641156233

[Figure 1: the 500 estimates of \alpha and \beta in part (a)]

[Figure 2: the 500 estimates of \alpha and \beta in part (b)]

As the figure shows, the mean of the 500 \alpha estimates (about 0.98) is close to the true value 1, but the mean of the 500 \beta estimates (about 2.50) is far from the coefficient 2 on X: omitting X_2=0.5X+u, which is correlated with X, biases \widehat\beta toward 2+1\times0.5=2.5.

[Figure 3: the 500 estimates of \alpha and \beta in part (c)]

[Figure 4: comparison of the estimates from parts (a) and (c)]

As the figure shows, the \beta estimates in part (c) differ little from those in part (a), while the \alpha estimates in (c) differ somewhat more from those in (a).
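The variance comparison between (c) and (a) hides a deeper point: with serially correlated errors, the textbook variance formula \sigma^2/\sum x_i^2 that OLS reports is no longer valid. A self-contained numpy sketch (using the same AR(1) design as part (c); the replication count and seed are illustrative) compares the empirical variance of \widehat\beta across replications with the mean of the formula-based variances:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 30, 2000
X = np.linspace(-10, 10, n)
x = X - X.mean()
Sxx = np.sum(x**2)

betas, formula_vars = [], []
for _ in range(reps):
    # AR(1) errors as in (c): u_i = 0.4*u_{i-1} + eps_i, eps ~ N(0, 0.25)
    u = np.zeros(n)
    u[0] = rng.normal(0, 1)
    for i in range(1, n):
        u[i] = 0.4 * u[i - 1] + rng.normal(0, 0.5)
    Y = 1 + 2 * X + u
    b = np.sum(x * (Y - Y.mean())) / Sxx
    a = Y.mean() - b * X.mean()
    resid = Y - a - b * X
    betas.append(b)
    formula_vars.append(np.sum(resid**2) / (n - 2) / Sxx)  # sigma_hat^2 / Sxx

# With positive serial correlation the i.i.d. formula understates the
# true sampling variance of beta_hat
print(np.var(betas), np.mean(formula_vars))
```

So the reported var_beta values in (c) are computed under an assumption that the AR(1) errors violate, and should not be compared with those in (a) at face value.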
