When News Meets Algorithms: Using AI Sentiment to Build Smarter Portfolios
https://ssrn.com/abstract=5344082
Financial markets are driven not just by numbers on a screen but also by human emotions. Fear, optimism, panic, and hype: these intangible forces often move stock prices just as much as earnings reports or balance sheets. For decades, investors have known that sentiment matters, but it has been notoriously difficult to capture in models.
Now, with the rise of large language models (LLMs) and advanced reinforcement learning (RL), researchers are finding ways to systematically measure and use investor sentiment for portfolio optimization. A recent paper by Kemal Kirtac and Guido Germano introduces an innovative framework that combines both worlds: reinforcement learning with a sentiment-aware twist.
The approach is called Sentiment-Augmented Proximal Policy Optimization (SAPPO). It builds on a well-known RL method called PPO (Proximal Policy Optimization) but adds a new ingredient: daily sentiment extracted from financial news using the LLaMA 3.3 model.
Their findings are striking:
SAPPO boosts the Sharpe ratio from 1.55 (PPO baseline) to 1.90.
It improves annualized returns (30.2% vs. 26.5%).
It lowers maximum drawdowns (-13.8% vs. -17.5%).
Why Portfolio Optimization Needs More Than Prices
Traditional portfolio optimization dates back to Harry Markowitz’s mean-variance framework (1952). The idea was simple but powerful: combine assets in a way that maximizes expected return for a given level of risk. Later came refinements, like the Sharpe ratio, to measure risk-adjusted performance.
But all of these methods rely almost exclusively on historical price data. That’s both their strength and weakness. Prices reflect information, but they don’t capture everything. Market participants constantly react to news headlines, policy changes, analyst opinions, and broader economic sentiment. Ignoring these signals can leave a strategy blind to important shifts.
Reinforcement learning (RL) seemed to offer a way forward. Instead of statically optimizing based on past returns, RL agents learn dynamically by interacting with market environments. They adapt to changes, explore new allocation strategies, and adjust in real time.
PPO in particular has become a popular RL algorithm for finance. It’s stable, robust, and works well with continuous decision spaces (like portfolio weights). But even PPO, in its standard form, is still price-only. It doesn’t listen to the “mood of the market.”
That’s where SAPPO comes in.
Sentiment: The Missing Signal
The central insight of Kirtac and Germano’s paper is straightforward: if markets are influenced by investor sentiment, then reinforcement learning models should incorporate it.
Thanks to recent advances in natural language processing (NLP) and large language models, sentiment can now be measured at scale. Models like FinBERT (fine-tuned for finance) and LLaMA 3.3 (a general-purpose transformer) can process massive volumes of financial news and generate sentiment scores.
In SAPPO, each day’s financial news is run through LLaMA 3.3 to produce sentiment values normalized between -1 and +1:
Positive sentiment → optimism, good news.
Negative sentiment → fear, pessimism, bad news.
These scores are then integrated directly into the RL agent’s decision-making process.
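To make that pipeline concrete, here is a minimal Python sketch of turning headline-level LLM outputs into a daily score in [-1, +1]. The classify_with_llm stub and the simple averaging are illustrative assumptions, not the paper's exact prompt or aggregation scheme.

def classify_with_llm(headline: str) -> float:
    # Placeholder for a LLaMA 3.3 call; assume the model is prompted to
    # return a sentiment value in [-1, +1] for a single headline.
    raise NotImplementedError("wire this to your LLaMA 3.3 deployment")

def daily_sentiment(headlines: list[str]) -> float:
    # Average the headline scores for one asset and one day, then clamp.
    if not headlines:
        return 0.0  # no news -> neutral
    scores = [classify_with_llm(h) for h in headlines]
    avg = sum(scores) / len(scores)
    return max(-1.0, min(1.0, avg))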
How SAPPO Works
At its core, SAPPO modifies PPO in two main ways:
State Representation
Instead of using only portfolio weights and asset prices, SAPPO adds a third component: daily sentiment scores. So the agent’s state vector becomes:
state = (portfolio_weights, asset_prices, sentiment_scores)
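As a rough illustration (not the authors' code), the observation could be assembled like this, with hypothetical field names and a simple price normalization:

import numpy as np

def build_state(weights, prices, sentiment):
    # Concatenate portfolio weights, normalized prices, and per-asset
    # sentiment scores into one observation vector; all inputs have
    # length n_assets.
    weights = np.asarray(weights, dtype=np.float32)
    prices = np.asarray(prices, dtype=np.float32)
    sentiment = np.asarray(sentiment, dtype=np.float32)
    prices = prices / prices.mean()  # keep price scale from dominating
    return np.concatenate([weights, prices, sentiment])

# Example with three assets and made-up numbers:
state = build_state([0.4, 0.3, 0.3], [185.2, 412.7, 505.1], [0.2, -0.1, 0.6])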
Sentiment-Weighted Advantage Function
PPO relies on an "advantage function" to decide whether an action is good or bad compared to expectations. SAPPO tweaks this by adding a sentiment influence term:
A′(s, a) = A(s, a) + λ (w · m)
A(s, a): traditional advantage (price-based)
w: portfolio weights
m: sentiment scores
λ: a parameter controlling how much sentiment matters
After testing different values, they found that λ = 0.1 works best.
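The adjustment itself is small. Here is a minimal sketch, assuming w and m are aligned per-asset vectors; exactly where the term enters PPO's loss is a detail of the paper not reproduced here.

import numpy as np

def sentiment_augmented_advantage(advantage, weights, sentiment, lam=0.1):
    # A'(s, a) = A(s, a) + lambda * (w . m)
    # advantage: price-based PPO advantage estimate
    # weights:   current portfolio weights w
    # sentiment: per-asset sentiment scores m in [-1, +1]
    # lam:       sentiment influence (the paper reports 0.1 works best)
    tilt = lam * float(np.dot(weights, sentiment))
    return advantage + tilt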
Filtering Sentiment Noise
Not all news is useful. To avoid duplicate or redundant signals, SAPPO uses cosine similarity to filter out repeated news stories that are too similar within a five-day window. This ensures that the model isn’t biased by the same story echoed across multiple outlets.
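A rough sketch of such a filter, using TF-IDF vectors and cosine similarity; the 0.9 threshold and the TF-IDF representation are illustrative choices rather than the paper's exact settings.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def deduplicate_news(stories, threshold=0.9):
    # Keep a story only if it is not too similar to anything already kept
    # within the rolling five-day window represented by `stories`.
    kept = []
    vectorizer = TfidfVectorizer()
    for story in stories:
        if kept:
            vectors = vectorizer.fit_transform(kept + [story])
            sims = cosine_similarity(vectors[len(kept)], vectors[:len(kept)])
            if sims.max() >= threshold:
                continue  # near-duplicate of an earlier story
        kept.append(story)
    return kept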
Training Setup
Assets: Google, Microsoft, Meta
Period: 2013–2019 (training), 2020 (testing)
Execution: VWAP (volume-weighted average price) of the first 10 minutes of trading each day
Transaction cost: 0.05% per trade
Framework: Stable-Baselines3 with PyTorch
Both PPO and SAPPO used identical architectures: two hidden layers (128 and 64 units), trained with the Adam optimizer.
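For readers who want to see what that baseline looks like in code, here is a hedged sketch using Stable-Baselines3. ToyPortfolioEnv is invented purely for illustration; a real environment would feed actual prices and sentiment and charge the 0.05% transaction cost.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
import torch
from stable_baselines3 import PPO

class ToyPortfolioEnv(gym.Env):
    # Minimal stand-in: 3 assets, observation = (weights, prices, sentiment),
    # action = new target weights, reward = random noise placeholder.
    def __init__(self, n_assets=3):
        super().__init__()
        self.n_assets = n_assets
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3 * n_assets,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, 1.0, shape=(n_assets,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(3 * self.n_assets, dtype=np.float32), {}

    def step(self, action):
        obs = np.zeros(3 * self.n_assets, dtype=np.float32)
        reward = float(np.random.normal())  # placeholder, not a real P&L
        return obs, reward, False, False, {}

# Two hidden layers of 128 and 64 units; SB3 uses the Adam optimizer by default.
model = PPO(
    "MlpPolicy",
    ToyPortfolioEnv(),
    policy_kwargs=dict(net_arch=[128, 64], activation_fn=torch.nn.ReLU),
    verbose=0,
)
model.learn(total_timesteps=10_000)  # illustrative training budget only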
Results: Numbers That Matter
So, what happened when SAPPO was unleashed on the data?
Outperformance Across the Board
Compared to the vanilla PPO model, SAPPO delivered:
A higher Sharpe ratio (1.90 vs. 1.55).
Higher annualized returns (30.2% vs. 26.5%).
Smaller maximum drawdowns (-13.8% vs. -17.5%).
The results are clear: SAPPO isn't just more profitable; it's also less risky. Lower drawdowns mean better downside protection. Higher Sharpe ratios mean more efficient returns per unit of risk.
Sharpe Ratio Matters
For professional investors, the Sharpe ratio is one of the most important metrics. A jump from 1.55 to 1.90 is significant. The improvement was also statistically validated with a t-test (p < 0.001), meaning it’s unlikely to be just luck.
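For reference, here is a standard way to compute an annualized Sharpe ratio from daily returns, with a paired t-test as one plausible significance check; the authors' exact test setup may differ.

import numpy as np
from scipy import stats

def annualized_sharpe(daily_returns, risk_free=0.0, periods=252):
    # Annualized Sharpe ratio from a series of daily returns.
    excess = np.asarray(daily_returns) - risk_free / periods
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

# Hypothetical usage, given two aligned daily return series:
# t_stat, p_value = stats.ttest_rel(sappo_daily_returns, ppo_daily_returns)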
Volatility and Turnover
Interestingly, SAPPO was more active with a 12% daily turnover rate compared to 3.5% for PPO. This reflects its sensitivity to news sentiment, leading to more frequent rebalancing. That activity did increase volatility slightly, but because the returns were stronger and drawdowns smaller, the tradeoff was worth it.
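Turnover conventions vary; one common definition, used in the sketch below, is half the sum of absolute daily weight changes (not necessarily the paper's exact formula).

import numpy as np

def daily_turnover(weights_today, weights_yesterday):
    # Half the sum of absolute weight changes between two rebalances.
    w_t = np.asarray(weights_today)
    w_y = np.asarray(weights_yesterday)
    return 0.5 * np.abs(w_t - w_y).sum()

# Example: shifting 10% of capital from the first asset to the third.
print(daily_turnover([0.4, 0.3, 0.3], [0.5, 0.3, 0.2]))  # 0.1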
Why This Matters
The research highlights a broader trend in finance: multi-modal models that combine structured numerical data with unstructured text.
For investors: It shows that ignoring sentiment means leaving money (and risk management) on the table.
For hedge funds and institutions: SAPPO points to a path for designing adaptive strategies that can respond to real-world news in near real-time.
For AI in finance: It's further proof that LLMs are not just toys for chatbots but can meaningfully enhance decision-making in high-stakes environments.
Limitations and Open Questions
As promising as SAPPO is, it’s important to keep the results in perspective. The authors acknowledge several limitations:
Data Source
Sentiment was derived only from Refinitiv financial news. That's high quality, but limited. What about Twitter, Reddit, earnings call transcripts, or analyst reports? Adding more diverse sources could make the model stronger.
Asset Universe
The experiments focused on just three tech stocks: Google, Microsoft, and Meta. These are large, liquid, and well-covered. Would SAPPO work as well on small caps, emerging markets, or commodities?
Backtesting Only
The model was tested on historical data (2013–2020). That's good for proof-of-concept but doesn't reflect live trading constraints like slippage, liquidity crunches, or flash crashes.
Daily Frequency
SAPPO used end-of-day sentiment applied to next-day trades. In reality, markets move on headlines minute by minute. Could a higher-frequency version capture even more?
Future Directions
The paper opens exciting paths for further research and practical deployment:
Multi-source sentiment: Integrating social media, earnings calls, and analyst notes.
Expanded portfolios: Applying SAPPO to diverse sectors, ETFs, or even cross-asset strategies.
Live simulations: Testing with paper trading or real capital in production systems.
Intra-day trading: Adapting the framework for real-time sentiment streams.
These steps could bring sentiment-aware reinforcement learning closer to being a staple of institutional trading desks.
Broader Impact
Beyond finance, SAPPO illustrates a fundamental shift: AI systems are becoming multi-modal decision-makers, combining numbers, language, and context.
Think about it: just a few years ago, reinforcement learning in finance was all about crunching price data. Today, it can also “read the news” and adjust strategies accordingly. This is a big leap toward models that mirror how human traders think, integrating both hard data and soft signals.
For investors, this could mean:
Smarter risk management (cutting exposure when sentiment turns negative).
Faster reaction times (catching trends earlier).
New alpha opportunities (capitalizing on mood-driven mispricings).
Conclusion
Markets are more than math. They are living systems of information, perception, and emotion. By combining reinforcement learning with LLM-based sentiment analysis, Kirtac and Germano’s SAPPO framework offers a way to bring that reality into algorithms. While still limited in scope, SAPPO is a compelling demonstration that the future of investing will not be price-only. It will be sentiment-aware, adaptive, and multi-modal.