量化交易学习（十五）Qlib人工智能量化框架快速入门（2）

发表于 2024-02-04 分类于量化

上一篇文章跟着一篇研报过了一遍安装流程，以及一些数据获取相关的命令。今天在跑研报提供的例子的时候发现有些代码已经跑不通了，估计是qlib有些api已经改了，在qlib源码中的examples找到了官方的入门例子。

workflow_by_code.ipynb 与 workflow_by_code.py 是用qlib跑回测的完整流程代码。在 tutorial 目录中的 detailed_workflow.ipynb 是详细工作流程的代码。

接下来照着 workflow_by_code.ipynb 把整个流程跑一遍。

导入 qlib 库：

import os
import sys, site
from pathlib import Path

import qlib
import pandas as pd
from qlib.constant import REG_CN
from qlib.utils import exists_qlib_data, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
from qlib.utils import flatten_dict

初始化数据

# use default data
# NOTE: need to download data from remote: python scripts/get_data.py qlib_data_cn --target_dir ~/.qlib/qlib_data/cn_data
provider_uri = "~/.qlib/qlib_data/cn_data"  # target_dir
qlib.init(provider_uri=provider_uri, region=REG_CN)

设置股票池为沪深300中的股票，基准为沪深300指数：

1 2	market = "csi300" benchmark = "SH000300"

训练模型：
模型采用GBDT，训练特征采用Alpha158

###################################
# train model
###################################
data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,
}

task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse",
            "colsample_bytree": 0.8879,
            "learning_rate": 0.0421,
            "subsample": 0.8789,
            "lambda_l1": 205.6999,
            "lambda_l2": 580.9768,
            "max_depth": 8,
            "num_leaves": 210,
            "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}

# model initiaiton
model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])

# start exp to train model
with R.start(experiment_name="train_model"):
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    R.save_objects(trained_model=model)
    rid = R.get_recorder().id

预测、回测以及数据分析：

回测的策略使用TopkDropout。

TopkDropout 策略如下：

TopK: 持有的股票数
Drop: 每天卖出的股票数

在每个交易日，卖掉预测分最低的Drop个股票，同时买入同等数量的评分最高的其他股票。这样就保证了换手率。

###################################
# prediction, backtest & analysis
###################################
port_analysis_config = {
    "executor": {
        "class": "SimulatorExecutor",
        "module_path": "qlib.backtest.executor",
        "kwargs": {
            "time_per_step": "day",
            "generate_portfolio_metrics": True,
        },
    },
    "strategy": {
        "class": "TopkDropoutStrategy",
        "module_path": "qlib.contrib.strategy.signal_strategy",
        "kwargs": {
            "model": model,
            "dataset": dataset,
            "topk": 50,
            "n_drop": 5,
        },
    },
    "backtest": {
        "start_time": "2017-01-01",
        "end_time": "2020-08-01",
        "account": 100000000,
        "benchmark": benchmark,
        "exchange_kwargs": {
            "freq": "day",
            "limit_threshold": 0.095,
            "deal_price": "close",
            "open_cost": 0.0005,
            "close_cost": 0.0015,
            "min_cost": 5,
        },
    },
}

# backtest and analysis
with R.start(experiment_name="backtest_analysis"):
    recorder = R.get_recorder(recorder_id=rid, experiment_name="train_model")
    model = recorder.load_object("trained_model")

    # prediction
    recorder = R.get_recorder()
    ba_rid = recorder.id
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()

    # backtest & analysis
    par = PortAnaRecord(recorder, port_analysis_config, "day")
    par.generate()

分析图表：

from qlib.contrib.report import analysis_model, analysis_position
from qlib.data import D

recorder = R.get_recorder(recorder_id=ba_rid, experiment_name="backtest_analysis")
print(recorder)
pred_df = recorder.load_object("pred.pkl")
report_normal_df = recorder.load_object("portfolio_analysis/report_normal_1day.pkl")
positions = recorder.load_object("portfolio_analysis/positions_normal_1day.pkl")
analysis_df = recorder.load_object("portfolio_analysis/port_analysis_1day.pkl")

分析头寸

报告：

1	analysis_position.report_graph(report_normal_df)

图片上，x轴为交易日，y轴上不同的曲线含义不同：

cum bench 基准累积收益率
cum return wo cost 不含交易费的投资组合累积收益率
cum return w cost 包含交易费的投资组合累积收益率
return wo mdd 不含交易费的累积收益最大回撤
return w cost mdd 包含交易费的累积收益最大回撤
cum ex return wo cost 不含交易成本基准下，投资组合的总超额收益
cum ex return w cost 计入交易成本基准下，投资组合的总超额收益
turnover 换手率
cum ex return wo cost mdd 不计交易成本，投资组合超额收益的最大跌幅
cum ex return w cost mdd 计入交易成本，投资组合超额收益的最大跌幅
上半部分的阴影表示不含交易成本的累积收益所对应的最大亏损
下半部分的阴影表示不含交易成本的超额收益对应的最大回撤

风险分析：

1	analysis_position.risk_analysis_graph(analysis_df, report_normal_df)

图表说明：

std 标准差
annualized_return 年化收益率
information_ratio 信息比率（Information Ratio – IR）.
max_drawdown 最大回撤
excess_return_without_cost 不计交易成本累计超额收益 (CAR)
excess_return_with_cost 含交易成本的累计超额收益 (CAR)

下面图表中，x轴为按月聚合的交易日。

分析模型

label_df = dataset.prepare("test", col_set="label")
label_df.columns = ["label"]
Score IC
pred_label = pd.concat([label_df, pred_df], axis=1, sort=True).reindex(label_df.index)
analysis_position.score_ic_graph(pred_label)

图中x轴为交易日，y轴的含义如下：

ic 标签和预测分数之间的皮尔逊相关系数，在例子被公式化为 Ref($close, -2)/Ref($close, -1)-1
rank_ic 标签和预测分数之间的斯皮尔曼等级相关系数

模型性能

1	analysis_model.model_performance_graph(pred_label)

累积回报图

Group1:The Cumulative Return series of stocks group with (ranking ratio of label <= 20%)
Group2:The Cumulative Return series of stocks group with (20% < ranking ratio of label <= 40%)
Group3:The Cumulative Return series of stocks group with (40% < ranking ratio of label <= 60%)
Group4:The Cumulative Return series of stocks group with (60% < ranking ratio of label <= 80%)
Group5:The Cumulative Return series of stocks group with (80% < ranking ratio of label)
long-short:The Difference series between Cumulative Return of Group1 and of Group5
long-averageThe Difference series between Cumulative Return of Group1 and average Cumulative Return for all stocks.

The ranking ratio can be formulated as follows.

𝑟𝑎𝑛𝑘𝑖𝑛𝑔 𝑟𝑎𝑡𝑖𝑜=(𝐴𝑠𝑐𝑒𝑛𝑑𝑖𝑛𝑔 𝑅𝑎𝑛𝑘𝑖𝑛𝑔 𝑜𝑓 𝑙𝑎𝑏𝑒𝑙) / (𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑆𝑡𝑜𝑐𝑘𝑠 𝑖𝑛 𝑡ℎ𝑒 𝑃𝑜𝑟𝑡𝑓𝑜𝑙𝑖𝑜)

long-short/long-average

The distribution of long-short/long-average returns on each trading day

Information Coefficient
- The Pearson correlation coefficient series between labels and prediction scores of stocks in portfolio.
- The graphics reports can be used to evaluate the prediction scores.

Monthly IC
- Monthly average of the Information Coefficient

ICThe distribution of the Information Coefficient on each trading day.
IC Normal Dist. Q-QThe Quantile-Quantile Plot is used for the normal distribution of Information Coefficient on each trading day.

Auto Correlation
- The Pearson correlation coefficient series between the latest prediction scores and the prediction scores lag days ago of stocks in portfolio on each trading day.
- The graphics reports can be used to estimate the turnover rate.

江达小记