利用SVM（支持向量机）预测股价

简要介绍：
使用SKlearn和LinearRegression两种方法进行股价预测，最后对两种方法的结果进行对比。

一、安装sklearn
在网上查了一下，要安装sklearn，需要先安装scipy,
要安装scipy，需要先安装numpy+mkl，不过我在自己的电脑上测试一下，发现可以直接import sklearn，所以一切OK。

二、测试
1.代码出错
然后测试代码，打印几次之后，报错：

raise klass(message, resp.status_code, resp.text, resp.headers, code)
quandl.errors.quandl_error.LimitExceededError: (Status 429) (Quandl Error QELx01
) You have exceeded the anonymous user limit of 50 calls per day. To make more c
alls today, please register for a free Quandl account and then include your API
key with your requests.

好吧，我去注册账号吧，结果发现注册到step 3 of 3的时候，发现无法显示验证码，重新换一个浏览器也不行。

好吧，改用米筐的数据吧，这一次发现自己找的代理可以使用https://www.2980.com来注册邮箱。这样就解决了以后注册米筐的问题。

好吧，这台电脑上好像又没装rqdata，继续改用joinquant，成功下载到了数据。

三、小知识
1.SVM（SVR）算法
是英文support vector machine的缩写，翻译成中文为支持向量机。它是一种分类算法，但是也可以做回归，根据输入的数据不同可做不同的模型（若输入标签为连续值则做回归，若输入标签为分类值则用SVC()做分类）

SVR(kernel='rbf', C=1e3, gamma=0.1)
（1）kernel:核函数的类型，一般常用的有’rbf’，’linear’，’poly’，等如图4-1-2-1所示，发现使用rbf参数时函数模型的拟合效果最好。

（2）C:惩罚因子。C表征你有多么重视离群点，C越大越重视，越不想丢掉它们。C值大时对误差分类的惩罚增大，C值小时对误差分类的惩罚减小。当C越大，趋近无穷的时候，表示不允许分类误差的存在，margin越小，容易过拟合；当C趋于0时，表示我们不再关注分类是否正确，只要求margin越大，容易欠拟合。如图4-1-2-2所示发现当使用1e3时最为适宜。

（3）gamma: 是’rbf’，’poly’和’sigmoid’的核系数且gamma的值必须大于0。随着gamma的增大，存在对于测试集分类效果差而对训练分类效果好的情况，并且容易泛化误差出现过拟合。如图4-1-2-3所示发现gamma=0.01时准确度最高。

可参考：https://blog.csdn.net/qq_16633405/article/details/70243030

四、源码：

import quandl
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

#Get the stock data
# df = quandl.get("WIKI/AMZN")
# Take a look at the data
# print(df.head())

from jqdatasdk import *
auth('138xxxxxxx','a...4')
# quotes = pd.read_csv('002.csv')  # 读取训练数据

stock_code = '601318.XSHG'
start_date = '2016-02-05'
end_date = '2017-02-07'



df = get_price(stock_code, start_date, end_date, frequency='daily',skip_paused=False,fq='pre',fields=['open','high','low','close','money'])
print(df.head())

df = df[['close']]
print(df.head())


# A variable for predicting 'n' days out into the future
forecast_out = 30 #'n=30' days
#Create another column (the target or dependent variable) shifted 'n' units up
df['Prediction'] = df[['close']].shift(-forecast_out)
#print the new data set
print(df.tail())


### Create the independent data set (X)  #######
# Convert the dataframe to a numpy array
X = np.array(df.drop(['Prediction'],1))

#Remove the last 'n' rows
X = X[:-forecast_out]
# print(X)


### Create the dependent data set (y)  #####
# Convert the dataframe to a numpy array (All of the values including the NaN's)
y = np.array(df['Prediction'])
# Get all of the y values except the last 'n' rows
y = y[:-forecast_out]
# print(y)


# Split the data into 80% training and 20% testing
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train the Support Vector Machine (Regressor)
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_rbf.fit(x_train, y_train)

# Testing Model: Score returns the coefficient of determination R^2 of the prediction. 
# The best possible score is 1.0
svm_confidence = svr_rbf.score(x_test, y_test)
print("svm confidence: ", svm_confidence)


# Create and train the Linear Regression  Model
lr = LinearRegression()
# Train the model
lr.fit(x_train, y_train)

# Testing Model: Score returns the coefficient of determination R^2 of the prediction. 
# The best possible score is 1.0
lr_confidence = lr.score(x_test, y_test)
print("lr confidence: ", lr_confidence)


# Set x_forecast equal to the last 30 rows of the original data set from Adj. Close column
x_forecast = np.array(df.drop(['Prediction'],1))[-forecast_out:]
# print(x_forecast)


# Print linear regression model predictions for the next 'n' days
lr_prediction = lr.predict(x_forecast)
print("linear regression model predictions for the next n days:\n",lr_prediction)
#将numpy数据存为csv
np.savetxt("lr_prediction.csv", lr_prediction, delimiter=',')

# Print support vector regressor model predictions for the next 'n' days
svm_prediction = svr_rbf.predict(x_forecast)
print("support vector regressor model predictions for the next n days:\n",svm_prediction)
#将numpy数据存为csv
np.savetxt("svm_prediction.csv", svm_prediction, delimiter=',')

#Original stock price
z = np.array(df['close'])
print("Original stock price for the next n days:\n",z[-forecast_out:])
np.savetxt("z.csv", z[-forecast_out:], delimiter=',')

五、执行结果

https://www.youtube.com/watch?v=EYnC4ACIt2g&t=9s

六、分析

很明显，在执行“df['Prediction'] = df[['close']].shift(-forecast_out)”这句的时候，其实就是将未来30天的价格移到了现在，又去预测未来价格，使用了未来函数。

参考资料：
https://www.joinquant.com/view/community/detail/3f5f73ce173b157abac9aef8c0a71a96

https://www.youtube.com/watch?v=JgC9BEVS9Tk
https://www.youtube.com/watch?v=wxKLtaUryCw

https://www.youtube.com/watch?v=zwqwlR48ztQ

蜗牛之路

最新

利用SVM（支持向量机）预测股价

暧昧帖

发表评论点击这里取消回复。

最新

推荐

暧昧帖

发表评论 点击这里取消回复。

发表评论点击这里取消回复。