Quantitative Trading Study Notes (15): Quick Start with Qlib, an AI Quant Framework (2)

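The previous post followed a research report through the installation process and a few data-retrieval commands. While running the examples from that report today, I found that some of the code no longer works, presumably because some Qlib APIs have changed. The official getting-started examples are in the examples directory of the Qlib source.

[Image]

workflow_by_code.ipynb and workflow_by_code.py contain the complete workflow code for running a backtest with Qlib. detailed_workflow.ipynb in the tutorial directory contains the code for the detailed workflow.

Next, let's run the whole workflow by following workflow_by_code.ipynb.

Import the qlib library: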

import os
import sys, site
from pathlib import Path

import qlib
import pandas as pd
from qlib.constant import REG_CN
from qlib.utils import exists_qlib_data, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
from qlib.utils import flatten_dict

Initialize the data:

# use default data
# NOTE: need to download data from remote: python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
provider_uri = "~/.qlib/qlib_data/cn_data" # target_dir
qlib.init(provider_uri=provider_uri, region=REG_CN)

Set the stock pool to the constituents of the CSI 300, with the CSI 300 index as the benchmark:

market = "csi300"
benchmark = "SH000300"

Train the model:
The model is a GBDT (LGBModel), and the training features come from Alpha158.

###################################
# train model
###################################
data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,
}

task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse",
            "colsample_bytree": 0.8879,
            "learning_rate": 0.0421,
            "subsample": 0.8789,
            "lambda_l1": 205.6999,
            "lambda_l2": 580.9768,
            "max_depth": 8,
            "num_leaves": 210,
            "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}

# model initialization
model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])

# start exp to train model
with R.start(experiment_name="train_model"):
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    R.save_objects(trained_model=model)
    rid = R.get_recorder().id
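
The task dictionary above describes each object as a "class" name, a "module_path", and "kwargs". The idea behind turning such a description into an instance can be sketched as follows. This is only an illustration of the pattern, not Qlib's actual init_instance_by_config implementation; build_from_config and demo_config are hypothetical names, and the demo points at a standard-library class so that it runs without Qlib installed.

```python
import importlib


def build_from_config(config):
    """Import config["module_path"], look up config["class"], and call it with config["kwargs"]."""
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# Hypothetical config that targets a standard-library class instead of a Qlib model.
demo_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}

frac = build_from_config(demo_config)
print(frac)  # 3/4
```

Keeping the class name and module path as plain strings is what lets the same script swap in a different model or dataset by editing only the dictionary.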

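Prediction, backtest, and analysis:

The backtest uses the TopkDropout strategy.

The TopkDropout strategy has two parameters:

  • TopK: the number of stocks held
  • Drop: the number of stocks sold on each trading day

On each trading day, the strategy sells the Drop held stocks with the lowest prediction scores and buys the same number of unheld stocks with the highest scores. This fixes the number of names traded each day, which keeps the turnover rate under control.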
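
To make the TopK / Drop rule concrete, here is a minimal sketch of one rebalancing step. It is a simplification written for this post and not the code of Qlib's TopkDropoutStrategy; the function name topk_dropout_step and the sample scores are hypothetical.

```python
def topk_dropout_step(holdings, scores, topk, n_drop):
    """Return (new_holdings, sold, bought) for one trading day.

    holdings: list of stock codes currently held
    scores:   dict mapping stock code -> today's prediction score
    """
    held_set = set(holdings)
    # Held stocks ordered from best to worst score.
    held = sorted(holdings, key=lambda s: scores[s], reverse=True)
    # Stocks not yet held, ordered from best to worst score.
    candidates = sorted(
        (s for s in scores if s not in held_set),
        key=lambda s: scores[s],
        reverse=True,
    )

    # Sell the n_drop held stocks with the lowest scores.
    sold = held[-n_drop:] if n_drop > 0 else []
    kept = held[: len(held) - len(sold)]

    # Refill the portfolio up to topk with the best-scoring candidates.
    bought = candidates[: topk - len(kept)]
    return kept + bought, sold, bought


# Hypothetical scores for five stocks; three of them are currently held.
scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.8, "E": 0.3}
new_holdings, sold, bought = topk_dropout_step(["A", "B", "C"], scores, topk=3, n_drop=1)
print(new_holdings, sold, bought)
```

With topk=50 and n_drop=5, as in the configuration used in this post, at most 5 of the 50 positions change per day under this rule.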

###################################
# prediction, backtest & analysis
###################################
port_analysis_config = {
    "executor": {
        "class": "SimulatorExecutor",
        "module_path": "qlib.backtest.executor",
        "kwargs": {
            "time_per_step": "day",
            "generate_portfolio_metrics": True,
        },
    },
    "strategy": {
        "class": "TopkDropoutStrategy",
        "module_path": "qlib.contrib.strategy.signal_strategy",
        "kwargs": {
            "model": model,
            "dataset": dataset,
            "topk": 50,
            "n_drop": 5,
        },
    },
    "backtest": {
        "start_time": "2017-01-01",
        "end_time": "2020-08-01",
        "account": 100000000,
        "benchmark": benchmark,
        "exchange_kwargs": {
            "freq": "day",
            "limit_threshold": 0.095,
            "deal_price": "close",
            "open_cost": 0.0005,
            "close_cost": 0.0015,
            "min_cost": 5,
        },
    },
}

# backtest and analysis
with R.start(experiment_name="backtest_analysis"):
    # load the model trained in the "train_model" experiment
    recorder = R.get_recorder(recorder_id=rid, experiment_name="train_model")
    model = recorder.load_object("trained_model")

    # prediction
    recorder = R.get_recorder()
    ba_rid = recorder.id
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()

    # backtest & analysis
    par = PortAnaRecord(recorder, port_analysis_config, "day")
    par.generate()

Analysis charts:

from qlib.contrib.report import analysis_model, analysis_position
from qlib.data import D

recorder = R.get_recorder(recorder_id=ba_rid, experiment_name="backtest_analysis")
print(recorder)
pred_df = recorder.load_object("pred.pkl")
report_normal_df = recorder.load_object("portfolio_analysis/report_normal_1day.pkl")
positions = recorder.load_object("portfolio_analysis/positions_normal_1day.pkl")
analysis_df = recorder.load_object("portfolio_analysis/port_analysis_1day.pkl")

Position analysis

Report:

analysis_position.report_graph(report_normal_df)

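[Image]

In the chart, the x-axis is the trading day, and each curve on the y-axis has a different meaning:

  • cum bench: cumulative return of the benchmark
  • cum return wo cost: cumulative return of the portfolio without trading costs
  • cum return w cost: cumulative return of the portfolio with trading costs
  • return wo mdd: maximum drawdown of the cumulative return without trading costs
  • return w cost mdd: maximum drawdown of the cumulative return with trading costs
  • cum ex return wo cost: cumulative excess return of the portfolio over the benchmark, without trading costs
  • cum ex return w cost: cumulative excess return of the portfolio over the benchmark, with trading costs
  • turnover: turnover rate
  • cum ex return wo cost mdd: maximum drawdown of the cumulative excess return, without trading costs
  • cum ex return w cost mdd: maximum drawdown of the cumulative excess return, with trading costs
  • Upper shaded area: the maximum drawdown corresponding to the cumulative return without trading costs
  • Lower shaded area: the maximum drawdown corresponding to the cumulative excess return without trading costs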
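
The curves listed above can be illustrated with a small sketch. It assumes that cumulative return is the running sum of daily returns and that drawdown is the distance below the running peak; the helper names cumulative and drawdown and the sample returns are hypothetical, and the real chart is built from report_normal_df.

```python
from itertools import accumulate


def cumulative(returns):
    """Running sum of daily returns (assumed accumulation rule)."""
    return list(accumulate(returns))


def drawdown(cum):
    """Distance of each point below the running peak; always <= 0."""
    running_max = accumulate(cum, max)
    return [c - m for c, m in zip(cum, running_max)]


# Hypothetical daily returns of the portfolio and of the benchmark.
portfolio = [0.02, -0.01, 0.03, -0.05, 0.01]
bench = [0.01, 0.00, 0.01, -0.02, 0.01]

cum_return = cumulative(portfolio)                        # like "cum return wo cost"
cum_bench = cumulative(bench)                             # like "cum bench"
excess = [p - b for p, b in zip(portfolio, bench)]
cum_ex_return = cumulative(excess)                        # like "cum ex return wo cost"

return_mdd = min(drawdown(cum_return))                    # deepest point of "return wo mdd"
ex_return_mdd = min(drawdown(cum_ex_return))              # deepest point of "cum ex return wo cost mdd"
print(round(return_mdd, 4), round(ex_return_mdd, 4))
```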

Risk analysis:

analysis_position.risk_analysis_graph(analysis_df, report_normal_df)

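[Image]

Chart legend:

  • std: standard deviation
  • annualized_return: annualized return
  • information_ratio: information ratio (IR)
  • max_drawdown: maximum drawdown
  • excess_return_without_cost: cumulative excess return (CAR) without trading costs
  • excess_return_with_cost: cumulative excess return (CAR) with trading costs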
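
As a rough guide to how these four numbers relate to a daily return series, here is a minimal sketch. The annualization factor periods_per_year=252 and the summation-based formulas are assumptions made for illustration and may differ from what Qlib uses internally; risk_metrics and the sample data are hypothetical.

```python
from itertools import accumulate
from math import sqrt
from statistics import mean, stdev


def risk_metrics(daily_returns, periods_per_year=252):
    """Compute std, annualized return, information ratio and max drawdown.

    periods_per_year is an assumed annualization factor.
    """
    mu = mean(daily_returns)
    sigma = stdev(daily_returns)  # sample standard deviation
    cum = list(accumulate(daily_returns))
    running_max = accumulate(cum, max)
    return {
        "std": sigma,
        "annualized_return": mu * periods_per_year,
        "information_ratio": mu / sigma * sqrt(periods_per_year),
        "max_drawdown": min(c - m for c, m in zip(cum, running_max)),
    }


# Hypothetical daily excess returns of the portfolio over the benchmark.
excess_returns = [0.01, -0.02, 0.03, 0.00]
metrics = risk_metrics(excess_returns)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```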

In the charts below, the x-axis is trading days aggregated by month.

[Image]

[Image]

[Image]

[Image]

Model analysis

# labels of the test segment
label_df = dataset.prepare("test", col_set="label")
label_df.columns = ["label"]

# Score IC: align labels with predictions, then plot
pred_label = pd.concat([label_df, pred_df], axis=1, sort=True).reindex(label_df.index)
analysis_position.score_ic_graph(pred_label)

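[Image]

In the chart, the x-axis is the trading day, and the y-axis series are:

  • ic: the Pearson correlation coefficient between the label and the prediction score. In this example the label is formulated as Ref($close, -2)/Ref($close, -1)-1
  • rank_ic: the Spearman rank correlation coefficient between the label and the prediction score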
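
For a single trading day, the two coefficients above can be computed as in the sketch below. It is a plain-Python illustration under the assumption that there are no tied values; the helper names and the four-stock cross-section are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Ascending ranks starting at 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical cross-section of four stocks on one trading day.
pred_scores = [0.1, 0.4, 0.3, 0.8]
labels = [-0.02, 0.01, 0.03, 0.05]

ic = pearson(pred_scores, labels)
rank_ic = spearman(pred_scores, labels)
print(round(ic, 4), round(rank_ic, 4))
```

Repeating this for every trading day gives the two time series drawn in the chart.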

Model performance

analysis_model.model_performance_graph(pred_label)

[Image]

Cumulative return chart

  • Group1: The Cumulative Return series of stocks group with (ranking ratio of label <= 20%)
  • Group2: The Cumulative Return series of stocks group with (20% < ranking ratio of label <= 40%)
  • Group3: The Cumulative Return series of stocks group with (40% < ranking ratio of label <= 60%)
  • Group4: The Cumulative Return series of stocks group with (60% < ranking ratio of label <= 80%)
  • Group5: The Cumulative Return series of stocks group with (80% < ranking ratio of label)
  • long-short: The Difference series between Cumulative Return of Group1 and of Group5
  • long-average: The Difference series between Cumulative Return of Group1 and average Cumulative Return for all stocks.

The ranking ratio can be formulated as follows.

$$\text{ranking ratio} = \frac{\text{Ascending Ranking of label}}{\text{Number of Stocks in the Portfolio}}$$
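
The ranking ratio and the five groups can be illustrated with a short sketch that follows the thresholds exactly as listed above. It is not taken from Qlib's plotting code; ranking_groups and the sample labels are hypothetical, and ties are assumed not to occur.

```python
def ranking_groups(labels, n_groups=5):
    """Return (ranking_ratios, groups) for one day's cross-section.

    ranking ratio = ascending rank / number of stocks;
    group k covers (k - 1) / n_groups < ranking ratio <= k / n_groups.
    """
    n = len(labels)
    order = sorted(range(n), key=lambda i: labels[i])
    rank = [0] * n
    for position, index in enumerate(order, start=1):
        rank[index] = position
    ratios = [r / n for r in rank]
    # Integer ceiling of n_groups * rank / n avoids float comparisons at the boundaries.
    groups = [(n_groups * r + n - 1) // n for r in rank]
    return ratios, groups


# Hypothetical labels (next-period returns) for five stocks.
labels = [0.03, -0.01, 0.05, 0.00, 0.02]
ratios, groups = ranking_groups(labels)
print(ratios)
print(groups)
```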

[Image]

long-short/long-average

The distribution of long-short/long-average returns on each trading day

[Image]

  • Information Coefficient
    • The Pearson correlation coefficient series between labels and prediction scores of stocks in portfolio.
    • The graphics reports can be used to evaluate the prediction scores.

[Image]

  • Monthly IC
    • Monthly average of the Information Coefficient

[Image]

  • IC: The distribution of the Information Coefficient on each trading day.
  • IC Normal Dist. Q-Q: The Quantile-Quantile plot compares the distribution of the Information Coefficient on each trading day against a normal distribution.

[Image]

  • Auto Correlation
    • The Pearson correlation coefficient series between the latest prediction scores and the prediction scores lag days ago of stocks in portfolio on each trading day.
    • The graphics reports can be used to estimate the turnover rate.

江达小记