Societies of LLM-powered agents have advanced automated problem-solving, particularly in finance. Yet, most frameworks don’t replicate the collaborative workflows of real trading firms. TradingAgents addresses this gap by assigning specialized LLM-powered agents—analysts, researchers, traders, and risk managers—to simulate a dynamic, team-based environment. These agents collaborate through debates, structured outputs, and risk checks. Experiments show that TradingAgents significantly improves key performance metrics over baseline models, highlighting the promise of multi-agent LLM frameworks in financial trading.
Autonomous agents equipped with Large Language Models (LLMs) can mimic human problem-solving in finance—an intricate domain shaped by fundamentals, market sentiment, and macro factors. While deep learning models have long struggled with explainability, LLM-based systems show promise by pairing structured reasoning with interpretability. However, current solutions often lack organizational realism and rely on purely conversational interfaces susceptible to context loss.
TradingAgents fills these gaps by emulating the multi-agent decision-making processes of trading firms. The framework includes fundamental, sentiment, news, and technical analysts, along with bullish and bearish researchers, traders, and a risk management team. They coordinate using structured documents and concise dialogues. Our architecture leverages specialized LLM roles, combining clarity with deeper debates. Through extensive evaluations, TradingAgents delivers robust performance across multiple assets, validating the importance of multi-agent collaboration for real-world trading systems.
Specialized LLMs in finance have improved domain understanding via fine-tuning or from-scratch training on financial corpora (e.g., FinGPT, BloombergGPT). These models often excel at classification tasks but face challenges in generative quality compared to powerful general-purpose models like GPT-4.
Fine-Tuned LLMs for Finance: Fine-tuning boosts performance on tasks such as financial sentiment analysis. Examples include PIXIU (FinMA) and Instruct-FinGPT. They outperform generic open-source LLMs but still lag behind top-tier proprietary models in some generative tasks.
Finance LLMs Trained from Scratch: Models like BloombergGPT and XuanYuan 2.0 blend general corpora with specialized financial data, delivering strong domain-specific results. While they may not match larger closed-source models, they remain competitive among open-source counterparts.
LLMs directly executing trades often rely on news-driven or reasoning-driven prompts, sometimes enhanced by reinforcement learning. Debate and reflection modules help overcome hallucinations and bolster factual accuracy.
News-Driven Agents: These agents use market news to gauge sentiment. Both closed-source (GPT-4) and open-source (Qwen) models show promising gains via simple sentiment-driven strategies.
Reasoning-Driven Agents: Frameworks like FinMem and TradingGPT integrate multi-round reasoning, reflection, and debates between agents with different stances, enabling more robust trading signals.
Reinforcement Learning-Driven Agents: RL aligns LLM outputs with backtest rewards, often leveraging memorized states and technical signals to refine decision-making.
Some frameworks focus on generating alpha factors rather than final trades. Systems like QuantAgent and AlphaGPT iteratively refine alpha scripts through feedback from an LLM-based judge and real-market performance, accelerating systematic strategy development.
TradingAgents assigns each LLM agent a clear role. This mirrors how real trading firms split responsibilities—e.g., fundamental, sentiment, news, and technical analysts gather data, while researchers balance bullish and bearish arguments. A trader synthesizes these inputs, and risk managers ensure exposures stay within safe limits. This structured approach fosters comprehensive coverage of market signals.
The analyst team (Figure 2) covers fundamental, sentiment, news, and technical aspects. Each member focuses on different market signals, providing the basis for research and trading decisions.
Bullish and bearish researchers (Figure 3) debate the analysts’ findings, challenging each other’s viewpoints to produce a balanced outcome.
Trader agents (Figure 4) synthesize all insights to form buy/sell decisions, weighing returns against potential downside.
Risk managers (Figure 5) ensure safety by evaluating volatility, liquidity, and other exposures. They enforce stop-loss measures and signal portfolio rebalancing when necessary.
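As an illustration of the kind of check a risk manager agent might apply, here is a minimal sketch of a stop-loss and volatility test. The `Position` class, the `risk_check` function, and both thresholds are hypothetical and are not taken from the TradingAgents implementation.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Position:
    entry_price: float
    quantity: int


def risk_check(position, prices, stop_loss_pct=0.05, max_daily_vol=0.03):
    """Return 'exit', 'reduce', or 'hold' for a long position.

    prices: recent closing prices, oldest first (at least two values).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    last = prices[-1]
    # Stop-loss: exit if the latest close is too far below the entry price.
    if last <= position.entry_price * (1 - stop_loss_pct):
        return "exit"
    # Volatility: population standard deviation of simple daily returns.
    returns = [b / a - 1 for a, b in zip(prices, prices[1:])]
    if pstdev(returns) > max_daily_vol:
        return "reduce"
    return "hold"


position = Position(entry_price=100.0, quantity=10)
decision = risk_check(position, [100, 101, 94])  # close is 6% below entry
```

The stop-loss is tested first so that a large loss always triggers an exit, even when recent volatility is also high.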
All agents follow a ReAct-style prompting framework. Their actions—like research, debate, or trade execution—are tracked in a shared environment, creating a cohesive multi-agent ecosystem reminiscent of real trading firms.
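A shared environment of this kind can be pictured as an append-only log of ReAct-style steps, each holding a thought, an action, and an observation. The sketch below is an assumption about how such a log could look; the `Step` and `Environment` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    agent: str        # which role produced the step
    thought: str      # reasoning text
    action: str       # e.g. "research", "debate", "trade"
    observation: str  # what the agent saw after acting


@dataclass
class Environment:
    log: List[Step] = field(default_factory=list)

    def record(self, agent, thought, action, observation):
        """Append one step to the shared log."""
        self.log.append(Step(agent, thought, action, observation))

    def history(self, agent: Optional[str] = None):
        """Return all steps, or only those of one agent."""
        return [s for s in self.log if agent is None or s.agent == agent]


env = Environment()
env.record("news_analyst", "Check headlines first.", "research", "3 articles found")
env.record("trader", "Signals lean positive.", "trade", "buy order proposed")
trader_steps = env.history("trader")
```

Because every step is stored with its reasoning, the same log can later be read back to explain why a trade was made.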
Relying solely on natural language can lead to “telephone effect” issues for complex, long-horizon tasks. TradingAgents introduces structured reports to preserve key details and reduce message distortion, drawing inspiration from frameworks like MetaGPT. Each agent produces or queries structured entries—concise and focused—to streamline interactions.
Instead of lengthy dialogues, agents in TradingAgents exchange structured documents containing critical data. Short natural-language debates occur only when contrasting opinions (e.g., bullish vs. bearish) must be merged.
Debates among researchers or risk managers occur in natural language but are recorded as structured entries. This approach maintains clarity while enabling multi-round reasoning.
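The idea of recording a natural-language debate as structured entries can be sketched as follows. This is an illustrative assumption, not the framework's actual schema: `DebateEntry`, `ResearchReport`, and `run_debate` are hypothetical names, and the two stub functions stand in for LLM calls.

```python
from dataclasses import dataclass, field, asdict
from typing import Callable, List


@dataclass
class DebateEntry:
    round_no: int
    speaker: str   # "bull" or "bear"
    argument: str  # natural-language text produced by the agent


@dataclass
class ResearchReport:
    ticker: str
    date: str
    entries: List[DebateEntry] = field(default_factory=list)


# An agent is any callable that maps the report so far to an argument.
Agent = Callable[[ResearchReport], str]


def run_debate(report: ResearchReport, bull: Agent, bear: Agent,
               rounds: int = 2) -> ResearchReport:
    """Alternate bull and bear turns, storing each turn as an entry."""
    for round_no in range(1, rounds + 1):
        for speaker, agent in (("bull", bull), ("bear", bear)):
            report.entries.append(DebateEntry(round_no, speaker, agent(report)))
    return report


# Stub agents standing in for LLM calls.
def bull_stub(report):
    return f"Upside case, given {len(report.entries)} prior entries."


def bear_stub(report):
    return f"Downside case, given {len(report.entries)} prior entries."


report = run_debate(ResearchReport("AAPL", "2024-06-19"), bull_stub, bear_stub)
as_dict = asdict(report)  # plain dict, ready to serialize or pass on
```

Each agent sees the full report rather than a chat transcript, so later turns can query earlier entries directly instead of relying on conversational context.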
We employ both “quick-thinking” and “deep-thinking” LLMs, choosing models based on complexity and speed requirements. Analysts and traders use robust reasoning models for decision-making, while simpler tasks (e.g., data retrieval) rely on faster LLMs. This modular design, requiring no GPUs, allows easy swapping of different local or API-based models and ensures future scalability.
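One simple way to express this split is a lookup from task type to model tier. The sketch below is hypothetical: the model identifiers are placeholders, and the task names other than data retrieval are assumptions made for illustration.

```python
# Placeholder identifiers; swap in any local or API-based model names.
MODEL_TIERS = {
    "quick": "quick-model-placeholder",
    "deep": "deep-model-placeholder",
}

# Assumed task-to-tier mapping, for illustration only.
TASK_TIER = {
    "data_retrieval": "quick",
    "analysis": "deep",
    "trade_decision": "deep",
}


def pick_model(task, default_tier="deep"):
    """Return the model identifier for a task, defaulting to the deep tier."""
    tier = TASK_TIER.get(task, default_tier)
    return MODEL_TIERS[tier]


retrieval_model = pick_model("data_retrieval")
decision_model = pick_model("trade_decision")
```

Keeping the mapping in data rather than in agent code means a model can be replaced by editing one entry.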
We evaluate our framework on multi-asset data from June 19, 2024, to November 19, 2024, combining historical prices, news, social sentiment, insider transactions, and more. Baselines include Buy-and-Hold and rule-based strategies such as MACD, KDJ&RSI, ZMR, and SMA.
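To make the rule-based baselines concrete, here is a minimal sketch of a generic short/long SMA crossover rule. The exact SMA configuration used in the evaluation is not stated here, so the window lengths and the crossover logic are assumptions.

```python
def sma(values, window):
    """Simple moving average; None until `window` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def sma_crossover_signals(closes, short=3, long=5):
    """Return 'buy', 'sell', or 'hold' per day. Requires short <= long."""
    fast, slow = sma(closes, short), sma(closes, long)
    signals = []
    for i in range(len(closes)):
        if i == 0 or slow[i] is None or slow[i - 1] is None:
            signals.append("hold")  # not enough history yet
        elif fast[i - 1] <= slow[i - 1] and fast[i] > slow[i]:
            signals.append("buy")   # fast average crosses above slow
        elif fast[i - 1] >= slow[i - 1] and fast[i] < slow[i]:
            signals.append("sell")  # fast average crosses below slow
        else:
            signals.append("hold")
    return signals


closes = [10, 9, 8, 7, 6, 7, 9, 12, 15]
signals = sma_crossover_signals(closes)
```

In this toy series the only crossover happens on the eighth day, when the 3-day average rises above the 5-day average.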
Our dataset includes stocks like Apple and Google, daily news, social media sentiment, and technical indicators. Agents process only the data available up to each trading day, avoiding look-ahead bias.
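Avoiding look-ahead bias amounts to filtering every data source by date before an agent sees it. The sketch below assumes each record carries a `date` field and treats the trading day itself as visible; a stricter setup might exclude same-day items published after the decision time. The function name and record layout are hypothetical.

```python
from datetime import date


def visible_records(records, as_of):
    """Return records dated on or before `as_of`, oldest first."""
    return sorted(
        (r for r in records if r["date"] <= as_of),
        key=lambda r: r["date"],
    )


records = [
    {"date": date(2024, 6, 20), "headline": "C"},
    {"date": date(2024, 6, 18), "headline": "A"},
    {"date": date(2024, 6, 19), "headline": "B"},
]
today_view = visible_records(records, as_of=date(2024, 6, 19))
```

Applying the same filter to prices, news, and sentiment keeps all inputs on a common information cutoff.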
The simulation runs from June 19, 2024, to November 19, 2024. TradingAgents autonomously generates buy, sell, or hold signals, then records performance metrics. This daily cycle repeats for each asset under study.
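We benchmark against a market baseline (Buy-and-Hold, B&H) and four rule-based strategies (MACD, KDJ&RSI, ZMR, and SMA); Table 1 reports the results.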
| Category | Model | AAPL CR%↑ | AAPL ARR%↑ | AAPL SR↑ | AAPL MDD%↓ | GOOGL CR%↑ | GOOGL ARR%↑ | GOOGL SR↑ | GOOGL MDD%↓ | AMZN CR%↑ | AMZN ARR%↑ | AMZN SR↑ | AMZN MDD%↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Market | B&H | -5.23 | -5.09 | -1.29 | 11.90 | 7.78 | 8.09 | 1.35 | 13.04 | 17.1 | 17.6 | 3.53 | 3.80 |
| Rule-based | MACD | -1.49 | -1.48 | -0.81 | 4.53 | 6.20 | 6.26 | 2.31 | 1.22 | - | - | - | - |
| Rule-based | KDJ&RSI | 2.05 | 2.07 | 1.64 | 1.09 | 0.4 | 0.4 | 0.02 | 1.58 | -0.77 | -0.76 | -2.25 | 1.08 |
| Rule-based | ZMR | 0.57 | 0.57 | 0.17 | 0.86 | -0.58 | 0.58 | 2.12 | 2.34 | -0.77 | -0.77 | -2.45 | 0.82 |
| Rule-based | SMA | -3.2 | -2.97 | -1.72 | 3.67 | 6.23 | 6.43 | 2.12 | 2.34 | 11.01 | 11.6 | 2.22 | 3.97 |
| Ours | TradingAgents | 26.62 | 30.5 | 8.21 | 0.91 | 24.36 | 27.58 | 6.39 | 1.69 | 23.21 | 24.90 | 5.60 | 2.11 |
| | Improvement(%) | 24.57 | 28.43 | 6.57 | - | 16.58 | 19.49 | 4.26 | - | 6.10 | 7.30 | 2.07 | - |
Table 1: TradingAgents: Comparison of Performance Metrics across AAPL, GOOGL, and AMZN.
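The column abbreviations stand for cumulative return (CR), annualized return (ARR), Sharpe ratio (SR), and maximum drawdown (MDD). The sketch below computes them from a daily portfolio value series using standard textbook definitions; the 252-day year and the zero risk-free rate are assumptions, and the evaluation may use different conventions.

```python
import math
from statistics import mean, stdev


def cumulative_return(values):
    """Total return over the series, in percent."""
    return (values[-1] / values[0] - 1) * 100


def annualized_return(values, periods_per_year=252):
    """Geometric return scaled to one year, in percent."""
    years = (len(values) - 1) / periods_per_year
    return ((values[-1] / values[0]) ** (1 / years) - 1) * 100


def sharpe_ratio(values, risk_free_daily=0.0, periods_per_year=252):
    """Annualized mean excess daily return over its sample std deviation."""
    rets = [b / a - 1 for a, b in zip(values, values[1:])]
    excess = [r - risk_free_daily for r in rets]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


def max_drawdown(values):
    """Largest peak-to-trough decline, in percent."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst * 100


portfolio = [100, 110, 99, 121]
cr = cumulative_return(portfolio)  # about 21 percent
mdd = max_drawdown(portfolio)      # about 10 percent (110 down to 99)
```

Higher is better for CR, ARR, and SR, while lower is better for MDD, which matches the arrows in the table header.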
TradingAgents beats all baselines on risk-adjusted return for every asset in Table 1, with Sharpe Ratios of at least 5.60 that exceed the best baseline by at least 2.07 points. Its adaptability and robust debate mechanism enable high returns with controlled risk.
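The Improvement(%) row is not defined in the text. Assuming it is the gap, in absolute points, between TradingAgents and the best baseline for each metric, the AAPL figures can be reproduced from the table values as follows; the `improvement` helper is hypothetical.

```python
# Values copied from Table 1 (AAPL columns: CR%, ARR%, SR).
trading_agents = {"CR": 26.62, "ARR": 30.5, "SR": 8.21}
baselines = {
    "B&H": {"CR": -5.23, "ARR": -5.09, "SR": -1.29},
    "MACD": {"CR": -1.49, "ARR": -1.48, "SR": -0.81},
    "KDJ&RSI": {"CR": 2.05, "ARR": 2.07, "SR": 1.64},
    "ZMR": {"CR": 0.57, "ARR": 0.57, "SR": 0.17},
    "SMA": {"CR": -3.2, "ARR": -2.97, "SR": -1.72},
}


def improvement(ours, others):
    """Gap between our value and the best baseline value, per metric."""
    return {
        metric: round(value - max(b[metric] for b in others.values()), 2)
        for metric, value in ours.items()
    }


aapl_gap = improvement(trading_agents, baselines)
```

For AAPL, KDJ&RSI is the best baseline on all three metrics, and the computed gaps of 24.57, 28.43, and 6.57 match the Improvement(%) row.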
Rule-based baselines limit downside but sacrifice overall returns. TradingAgents balances both, keeping maximum drawdown at or below 2.11% on all three assets while generating superior returns, aided by dedicated risk-control agents.
Unlike dense deep-learning models, TradingAgents provides transparent logs of its ReAct-style reasoning for every trade decision. This approach greatly enhances human interpretability, facilitating debugging and fine-tuning in real markets.
We introduced TradingAgents, a multi-agent LLM trading framework inspired by professional trading firms. Its specialized analysts, researcher debates, and risk management teams create a rich decision-making ecosystem. By combining structured reports with targeted dialogues, TradingAgents exceeds baseline performance on cumulative return, annualized return, and Sharpe ratio while keeping maximum drawdown low. Future work will explore live trading, expanded agent roles, and real-time data integration.