Behind the quant door

The math behind the lists

Santa’s Naughty & Nice Lists is a toy on the surface and a disciplined signal underneath. Here is the rigorous backtest: every number is point-in-time and reproducible from the code.

Why 93% and 62% are both true

A single flagged name only tells you up or down, and on that sign test it lands the predicted way about 62% of the time. The list, however, is scored on size, not just direction, and that is where the jump comes from: the big moves sit disproportionately on the right side, so a few large correct moves outweigh the many small misses. Add that more of the 50 names move right than wrong, and the list’s average return lands on the predicted side about 93% of days. The 62% is the sign of one name; the 93% is the magnitude-weighted verdict on the whole list. Both are true because they measure different things.

62% / 60%
per-name hit, sign only (naughty / goody)
95% / 93%
basket hit, magnitude-weighted, day over day
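The gap between a per-name hit rate and a basket hit rate can be illustrated with a small calculation. This is a minimal sketch of the averaging channel only, not the backtest itself: it assumes every name's signed return is normally distributed with the same mean and standard deviation, and that all pairs share one correlation `rho`. Those distributional assumptions, the function name, and the `rho` parameter are illustrative and do not come from the source. It does not model the magnitude skew described above.

```python
import math
from statistics import NormalDist


def basket_hit_rate(per_name_hit, n_names=50, rho=0.0):
    """Probability that the equal-weighted average of n_names lands on the predicted side.

    ASSUMPTIONS (not from the source): identical normal returns per name and a
    single pairwise correlation rho between names.
    """
    std_normal = NormalDist()
    # Mean-to-sd ratio of one name implied by its sign hit rate.
    z_single = std_normal.inv_cdf(per_name_hit)
    # Variance of the average, in units of one name's variance.
    variance_of_mean = (1.0 + (n_names - 1) * rho) / n_names
    z_basket = z_single / math.sqrt(variance_of_mean)
    return std_normal.cdf(z_basket)


if __name__ == "__main__":
    for rho in (0.0, 0.02, 0.05):
        print(rho, round(basket_hit_rate(0.62, 50, rho), 3))
```

Under these assumptions a 62% per-name hit rate already implies a basket hit rate well above 90% when names are uncorrelated, and positive correlation between names pulls that figure down.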

Information coefficient (IC)

Daily cross-sectional rank correlation between the score and the next open move, 2016–2026. Adding the news-tone term lifts the mean daily IC from 0.065 (event code alone) to 0.0727.

+0.0727
mean daily IC · t = 78.0
1.628
ICIR: daily information ratio of the IC series
[Chart: IC by year] ’17: 0.10 · ’18: 0.08 · ’19: 0.09 · ’20: 0.09 · ’21: 0.09 · ’22: 0.06 · ’23: 0.05 · ’24: 0.06 · ’25: 0.05 · ’26: 0.05
IC by year: positive every year. These are open-prediction numbers (the move completes in the auction), not a tradeable Sharpe.
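The statistics in this section can be sketched in a few lines. The code below is an illustrative implementation under standard definitions, not the page's own code: it assumes the daily IC is a Spearman rank correlation, the ICIR is the mean of the daily IC series divided by its sample standard deviation, and the t-statistic is the ICIR times the square root of the number of days. All function names are hypothetical.

```python
import math
from statistics import mean, stdev


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_ic(scores, next_open_moves):
    """One day's cross-sectional rank (Spearman) correlation."""
    return pearson(average_ranks(scores), average_ranks(next_open_moves))


def summarize_ic(daily_ics):
    """Return (mean IC, ICIR, t-stat) for a series of daily ICs."""
    m = mean(daily_ics)
    icir = m / stdev(daily_ics)
    return m, icir, icir * math.sqrt(len(daily_ics))
```

As a consistency check on the quoted figures, an ICIR of 1.628 over 2,295 trading days gives 1.628 × √2295 ≈ 78.0, which matches the reported t-statistic.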

The rank earns its order

Average open move by list depth. The top of the list moves far more than the tail. The news-tone term (dashed = event code alone) sharpens the top: the Naughty Top-10 deepens from -81.7 to -123.9 bps.

[Chart: average open move in bps, Goody / Naughty] Top 10: +67 / -124 · Top 25: +59 / -89 · Top 50: +48 / -64
Solid bar = combined signal · dashed line = event code only (λ=0). Goody up (green), Naughty down (red).
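The depth comparison can be sketched as a simple top-k average. This is an illustrative version only: it assumes the Goody list is the highest-scored names and the Naughty list the lowest-scored names on a given day, and the function name and input layout are hypothetical.

```python
from statistics import mean


def depth_averages(scores, moves_bps, depths=(10, 25, 50)):
    """Average open move (bps) of the top-k Goody and top-k Naughty names.

    ASSUMPTION: Goody = highest scores, Naughty = lowest scores.
    Returns {k: (goody_average, naughty_average)}.
    """
    # Sort the day's names from highest score to lowest.
    ranked = sorted(zip(scores, moves_bps), key=lambda pair: pair[0], reverse=True)
    moves_sorted = [move for _, move in ranked]
    result = {}
    for k in depths:
        goody = mean(moves_sorted[:k])      # k highest-scored names
        naughty = mean(moves_sorted[-k:])   # k lowest-scored names
        result[k] = (goody, naughty)
    return result
```

If the ranking carries information beyond list membership, the averages shrink in magnitude as k grows, which is the pattern the chart above reports.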

What’s driving it

Fama-MacBeth decomposition on a company’s own news: the marginal contribution of each ingredient, in bps per standard deviation. The event taxonomy is the workhorse; news tone is a real, significant secondary refinement.

+35.8
bps/σ, event taxonomy (the workhorse) · t = 59
+7.0
bps/σ, news tone (significant secondary) · t = 28
The co-mention / network sentiment channel tested null and is excluded.
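A Fama-MacBeth decomposition of this kind can be sketched as follows. This is a generic two-factor illustration, not the page's own code: it assumes one cross-sectional OLS regression per day of the open move on the two ingredients (standardized beforehand, so slopes read as bps per standard deviation), then averages the daily slopes and forms t = mean / (sd / √T). Function names and the input layout are hypothetical.

```python
import math
from statistics import mean, stdev


def ols_two_factor(x1, x2, y):
    """Slopes of y on x1 and x2 (with intercept), from the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def fama_macbeth(days):
    """days: list of (x1, x2, y) cross-sections, one per trading day.

    Returns [(mean slope, t-stat), (mean slope, t-stat)] for the two factors.
    """
    daily_slopes = [ols_two_factor(x1, x2, y) for x1, x2, y in days]
    summary = []
    for k in range(2):
        series = [slopes[k] for slopes in daily_slopes]
        avg = mean(series)
        t_stat = avg / (stdev(series) / math.sqrt(len(series)))
        summary.append((avg, t_stat))
    return summary
```

The time-series average of daily slopes is what makes each coefficient a marginal contribution: each ingredient's slope is estimated with the other one held in the same regression.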

The pop fades; the drop deepens

List average excess move from the open outward. The Goody pop completes in the auction and then fades (+48.1 → +19.7 bps over 20 days), so don’t chase it. The Naughty drop, by contrast, keeps going (-63.8 → -96.8 bps).

[Chart: horizons open, +1d, +5d, +10d, +20d; end labels at +20d: Goody +20 bps, Naughty -97 bps]
Market-neutral average excess return of each 50-name list, at the open and 1 / 5 / 10 / 20 trading days after.
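The post-open drift follows directly from the two endpoints quoted in this section. The snippet below is plain arithmetic on those figures; only the open and +20d values appear in the text, so the intermediate horizons are left out, and the variable and function names are illustrative.

```python
# Endpoints quoted in the text, in bps of market-neutral excess return.
GOODY = {"open": 48.1, "+20d": 19.7}
NAUGHTY = {"open": -63.8, "+20d": -96.8}


def post_open_drift(path):
    """Change in excess return from the open to 20 trading days later, in bps."""
    return round(path["+20d"] - path["open"], 1)


if __name__ == "__main__":
    print("Goody drift:", post_open_drift(GOODY))
    print("Naughty drift:", post_open_drift(NAUGHTY))
```

On these figures the Goody list gives back 28.4 bps after the open and the Naughty list falls a further 33.0 bps.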

What can you do with this?

The open pop or drop completes in the auction, but the prediction is still genuinely useful. Four honest uses:

1
Time what you already own. If you hold a Goody name you meant to sell, the open is statistically your best exit of the next several weeks: the pop is largest at the open and then fades. If you hold a Naughty name, do not sit on it, because it keeps sliding for weeks after the open.
2
Avoid the bad entry. Do not buy a Naughty name this morning, and do not chase a Goody pop at the open. Avoiding a loss is doing something with the information, and for a long-only investor, which is most of the A-share market, that alone is an edge.
3
Lean on the drift after the open. Over the 20 trading days after the open, the Naughty list falls a further 33 bps or so (-63.8 → -96.8) and the Goody pop gives back about 28 (+48.1 → +19.7). The effect is real and statistically strong but modest in size, and shorting A-shares is restricted, so the downside is hard to capture directly. As a tilt or an overlay it still has value.
4
Feed it into a bigger process. The list is a clean, point-in-time signal. A quant or a portfolio manager can blend it into an existing book as one input among many. The value is the information itself, not a standalone strategy.

Methodology

Signal. Per company-day, score = sscale(event signal) + λ·sscale(news tone), λ=1.0. The event signal sums a company’s own qualifying events (weight 1) plus events at materially connected companies (filings ∪ bid awards ∪ ownership, connection-weighted): Σ weight × ln(1+articles) × event polarity. Polarity comes from SmarTag’s official event taxonomy (a media-tone extension fills neutral types). The news tone term is SmarTag entity sentiment on the company’s own news (entity-level, not per-article).

Everything is point-in-time: news is timestamped by availability and assigned to the morning window before the open it predicts; connection edges use the prior month-end snapshot. No model judgment enters the ranking.

Backtest 2016–2026, 2,295 trading days, 5,283,606 company-day observations. This is a prediction engine, not a trading strategy: the move completes in the opening auction.
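The scoring formula can be sketched in code. This is an illustration under stated assumptions, not the production pipeline: the source does not define `sscale`, so a cross-sectional z-score is used here purely as a stand-in, and the function names and the `(weight, articles, polarity)` tuple layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def event_signal(events):
    """Sum of weight * ln(1 + articles) * polarity over one company's events.

    events: iterable of (weight, articles, polarity) tuples, where weight is 1
    for the company's own events and a connection weight for linked companies.
    """
    return sum(w * math.log1p(n) * p for w, n, p in events)


def zscore(values):
    """ASSUMPTION: stand-in for the source's undefined `sscale` (cross-sectional z-score)."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


def combined_scores(event_signals, news_tones, lam=1.0):
    """score = sscale(event signal) + lam * sscale(news tone), per company on one day."""
    scaled_events = zscore(event_signals)
    scaled_tones = zscore(news_tones)
    return [e + lam * t for e, t in zip(scaled_events, scaled_tones)]
```

Setting `lam=0` reproduces the event-code-only variant shown as the dashed line in the charts, which is how the contribution of the news-tone term can be isolated.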