Profitec AI

Case Study / 01 — Market Intelligence Crew

8 agents, 200+ features, a 7-check risk gate — and a backtest that beat SPY buy & hold.

Built for a trading desk. CrewAI hierarchical orchestration, LightGBM with walk-forward validation, FinBERT and LunarCrush sentiment, Interactive Brokers integration. Out-of-sample across 90 tickers: +323.1% return, Sharpe 0.82, 53.9% win rate, 1,910 trades.

8

Agents

200+

ML features

7

Risk gates

10+

Data sources

A note on this case. This case is included as a technical systems example, not as a financial performance promise. The purpose is to show multi-agent orchestration, data validation, risk gates, and refusal logic.

The system in motion

Eight agents synthesised. One verdict per trade.

The CRO orchestrator receives each specialist’s signal, weighs it against the LightGBM and HMM priors, and produces a single direction with conviction — or refuses to enter. Across 1,910 trades on 90 tickers, the resulting strategy returned +323.1% while SPY buy & hold lagged.

The manual baseline

Before the system, a single research analyst reviewed every SEC filing, earnings call, peer comparison, technical chart, and sentiment shift by hand — then made a call. Coverage: one or two instruments per day, with no audit trail of why a decision was made or rejected.

The crew

Eight specialists. One synthesis. The CRO has the final word.

Hierarchical orchestration via CrewAI Process.hierarchical

market_intel_crew.flow · live

● running
CRO · Risk Manager
└── synthesis

Specialists →
  ├── Fundamental    (10-K · earnings)
  ├── Competitor     (peers · P/E)
  ├── Sentiment      (news · social)
  ├── Technical      (price · volume)
  ├── Quant          (Z · ATR · HMM)
  ├── Dev Infra      (APIs · risk)
  └── ML Data QA     (leakage · quality)

ML Brain (parallel signal) ┄┄→
  └── LightGBM · 200+ features

The crew

One card per agent. Real prompts. Real tools.

01 fundamental_task

Fundamental Analyst

Lead Research

Analyze 10-K and 10-Q filings. Identify revenue drivers, margin risks, earnings quality, and guidance shifts.

Tools

  • sec_filing
  • earnings_cal
  • av_overview

02 competitor_task

Competitor Analyst

Market Positioning

Compare the asset to peers on P/E, PEG, revenue growth, and margin profile. Score relative positioning.

Tools

  • polygon_profile
  • av_overview

03 sentiment_task

Sentiment Analyst

Narrative Voyager

Read news and social signals. Identify the dominant narrative, sentiment shift, and retail-flow impact.

Tools

  • news_sentiment
  • social_sentiment
  • lunarcrush
  • serper_news
  • market_sentiment

04 technical_task

Technical Analyst

Price & Volume

Read price action, volume, order flow, HFT-interest zones, support/resistance, and VPVR context. Define trade setup.

Tools

  • ohlcv
  • orderbook
  • crypto_ohlcv
  • tiingo
  • ibkr
  • chart:technical
  • chart:candlestick
  • chart:s_r
  • chart:indicators

05 quant_task

Quantitative Researcher

Math Core

Compute and interpret quantitative features: Z-score, ATR, Hurst exponent, ADX, and market-regime hypotheses (HMM).

Tools

  • ohlcv
  • crypto_ohlcv
  • tiingo
  • chart:quant
  • chart:indicators

06 dev_infra_task

Senior Developer

Infrastructure

Audit pipeline stability: API availability, error handling, code security, infrastructure readiness for backtest.

Tools

  • sec_filing
  • ohlcv
  • ibkr

07 ml_data_quality_task

Senior ML Engineer

Model Optimization

Assess data quality for ML readiness: completeness, consistency, leakage risks, bias, and overfitting indicators.

Tools

  • ohlcv
  • crypto_ohlcv
  • news_sentiment
  • serper_news

08 risk_synthesis_task

Risk Manager

Chief Risk Officer · CRO

Synthesize the team's findings. Define trade decision, entry/exit levels, stop-loss, and position size using ATR and risk limits. Must explicitly declare agreement with the ML signal.

Tools

no direct tools — delegation only

Prompt internals

Two of the eight system prompts are shown verbatim. The CRO prompt is bilingual: the left column is the actual prompt running in production, and the right column is the English reference.

01 Prompt: Fundamental Analyst (EN)
role: "Fundamental Analyst (Lead Research)"

goal: Analyze 10-K and 10-Q filings.
  Identify revenue drivers, margin risks,
  earnings quality, and guidance shifts.

backstory: Senior buy-side fundamental analyst
  with experience valuing public companies in
  the US and Europe. Extracts key factors from
  SEC filings and translates them into
  investment conclusions.

08 Prompt: Risk Manager · CRO (EN)
role: "Risk Manager (CRO)"

goal: Synthesize the team's findings.
  Define the trade decision, entry/exit
  levels, stop-loss, and position size
  using ATR and risk limits.

backstory: You are the Chief Risk Officer
  with final-call authority. Pragmatic,
  disciplined in risk-adjusted thinking,
  requires transparent reasoning.

The ML brain

Not a prompt. A model.

Separate from the agent crew, a LightGBM 3-class classifier (long / flat / short) produces directional probabilities trained on 200+ engineered features: technical indicators, FRED macro data, FinBERT-scored news sentiment, and LunarCrush social signals. Validated walk-forward — not one-shot — with purged splits to prevent leakage near boundaries.
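The purged walk-forward idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's splitter: the function name `purged_walk_forward`, the expanding-window layout, and the `purge` and `min_train` parameters are all assumptions introduced here.

```python
from typing import Iterator, List, Tuple


def purged_walk_forward(
    n_samples: int,
    n_folds: int,
    purge: int,
    min_train: int,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for an expanding-window walk-forward.

    The last `purge` observations before each test block are dropped from
    training, so labels that look ahead by up to `purge` bars cannot overlap
    the test window.
    """
    if min_train + n_folds > n_samples:
        raise ValueError("not enough samples for the requested folds")
    fold_size = (n_samples - min_train) // n_folds
    for k in range(n_folds):
        test_start = min_train + k * fold_size
        # The final fold absorbs any remainder.
        test_end = n_samples if k == n_folds - 1 else test_start + fold_size
        train_end = max(0, test_start - purge)
        yield list(range(train_end)), list(range(test_start, test_end))


if __name__ == "__main__":
    for train, test in purged_walk_forward(100, n_folds=3, purge=7, min_train=40):
        print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Each fold trains only on data that ends at least `purge` bars before the test block starts, which is the property that keeps forward-looking labels from leaking across the boundary.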

A 3-state Gaussian HMM runs in parallel, classifying market regimes as bear / range / bull with posterior entropy as a confidence score. Both signals feed into the CRO as a quantitative prior — not a vote in a poll.
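As a rough sketch of how the two numbers in the quantitative prior might be computed: the edge is P(long) − P(short) from the classifier's probabilities, and the regime confidence is derived from posterior entropy. The function names and the normalisation of entropy by log(number of states) are assumptions made for this example, not details taken from the codebase.

```python
import math
from typing import Sequence


def lightgbm_edge(p_long: float, p_flat: float, p_short: float) -> float:
    """Directional edge P(long) - P(short), which lies in [-1, +1]."""
    total = p_long + p_flat + p_short
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError("class probabilities must sum to 1")
    return p_long - p_short


def regime_confidence(posterior: Sequence[float]) -> float:
    """Map posterior entropy to a confidence score in [0, 1].

    0 means the posterior is uniform (maximum entropy); 1 means all mass
    sits on one regime. Dividing by log(n_states) is an assumed scaling.
    """
    n_states = len(posterior)
    if n_states < 2:
        raise ValueError("need at least two regimes")
    entropy = -sum(p * math.log(p) for p in posterior if p > 0.0)
    return 1.0 - entropy / math.log(n_states)


if __name__ == "__main__":
    print(lightgbm_edge(0.6, 0.3, 0.1))
    print(regime_confidence([0.7, 0.2, 0.1]))  # bear / range / bull
```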

  • LightGBM edge: P(long) − P(short) ∈ [−1, +1]
  • Walk-forward, purged splits, optional meta-labeling
  • 3-state Gaussian HMM, leak-safe (causal filtering, never future smoothing)
  • Triple-barrier labeling (Lopez de Prado) or fixed-horizon, ATR-scaled
  • technical · 30+
  • macro · FRED
  • sentiment · FinBERT
  • social · LunarCrush
  • regime · HMM
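Triple-barrier labeling with ATR-scaled barriers can be illustrated as below. This is a simplified sketch on closing prices only; the barrier multipliers, the default horizon, and the function name are hypothetical, and a fuller version would typically test intrabar highs and lows.

```python
from typing import Sequence


def triple_barrier_label(
    closes: Sequence[float],
    start: int,
    atr: float,
    up_mult: float = 2.0,    # assumed multiplier, not from the source
    down_mult: float = 2.0,  # assumed multiplier, not from the source
    horizon: int = 7,        # assumed vertical barrier, in bars
) -> int:
    """Return +1 (long), -1 (short) or 0 (flat) for the bar at `start`.

    The first price barrier touched within `horizon` bars decides the label;
    if neither is hit, the vertical (time) barrier yields 0.
    """
    entry = closes[start]
    upper = entry + up_mult * atr
    lower = entry - down_mult * atr
    for price in closes[start + 1 : start + 1 + horizon]:
        if price >= upper:
            return 1
        if price <= lower:
            return -1
    return 0
```

The three outcomes map directly onto the long / flat / short classes the classifier is trained on.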

Total features

202+

Categories

8

Largest cluster

Quant · 44

Feature space · cluster map

200+ engineered features across eight categories · projected layout (illustrative)

  • Technical (price · volume)×32
  • Macro (FRED)×24
  • Sentiment (FinBERT news)×30
  • Social (LunarCrush)×22
  • Quant (Z · ATR · HMM)×44
  • Regime (HMM-derived)×12
  • Engineered (cross-terms · lags)×26
  • Data-quality flags×12

v1 of this system had hmm_regime_detection_placeholder() returning '[STUB] not implemented yet'. The current implementation in src/features/regime.py is a real GaussianHMM with leak-safe walk-forward fitting. The stub was kept in the codebase as a visible reminder of the maturation path.

Synthesis

The CRO never just averages opinions.

The Risk Manager agent receives every specialist's structured output plus the ML signal as a separate prior. It must produce a single JSON decision — conviction, entry, stop, take-profit, position size — all ATR-driven, not gut estimates.
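One common way to make stop, take-profit, and size "ATR-driven" is fixed-fractional sizing against an ATR-multiple stop. The sketch below shows that approach; the risk budget, the ATR multiple, the reward-to-risk ratio, and the function name are illustrative assumptions rather than the values the CRO actually uses.

```python
def atr_trade_plan(
    entry: float,
    atr: float,
    direction: str,
    risk_per_trade_pct: float = 0.005,  # assumed risk budget per trade
    stop_atr_mult: float = 1.5,         # assumed stop distance in ATRs
    reward_to_risk: float = 2.0,        # assumed take-profit ratio
) -> dict:
    """Derive stop, take-profit and position size from ATR.

    position_size_pct is the fraction of equity to allocate so that hitting
    the stop loses roughly `risk_per_trade_pct` of equity.
    """
    if direction not in ("long", "short"):
        raise ValueError("direction must be 'long' or 'short'")
    if entry <= 0 or atr <= 0:
        raise ValueError("entry and atr must be positive")
    sign = 1.0 if direction == "long" else -1.0
    stop_distance = stop_atr_mult * atr
    stop_distance_pct = stop_distance / entry
    return {
        "entry": entry,
        "stop": entry - sign * stop_distance,
        "take_profit": entry + sign * reward_to_risk * stop_distance,
        "position_size_pct": min(1.0, risk_per_trade_pct / stop_distance_pct),
    }
```

With an entry of 100 and an ATR of 2, a long plan places the stop 3 points below entry and the target 6 points above; a wider ATR automatically produces a smaller position.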

Most importantly: the CRO must explicitly declare its agreement with the ML signal. The output schema has a required agreement_with_ml field: agree | partial | disagree. If the CRO disagrees with the quant prior, it must populate disagreement_reason with a concrete cause — a news event, a regime shift, a data gap.

This is institutional discipline encoded in a JSON schema. No silent overrides, no convenient consensus.

08 JSON schema the CRO must populate
{
  "direction": "long | short | flat",
  "conviction": 0.0–1.0,
  "entry": <number>,
  "stop": <number>,
  "take_profit": <number>,
  "position_size_pct": 0.0–1.0,
  "rationale": "<short reason>",
  "risks": ["risk1", "risk2"],
  "invalidation": "<what kills the thesis>",
  "agreement_with_ml": "agree | partial | disagree",
  "disagreement_reason": "<required if disagree>"
}
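A validator for this schema might look like the sketch below. It enforces the two rules stated above (a required `agreement_with_ml` value, and a non-empty `disagreement_reason` on disagreement); treating every other field as required, and the function name itself, are assumptions for illustration.

```python
import json

# Assumption: every field except disagreement_reason is mandatory.
REQUIRED_FIELDS = (
    "direction", "conviction", "entry", "stop", "take_profit",
    "position_size_pct", "rationale", "risks", "invalidation",
    "agreement_with_ml",
)


def validate_cro_decision(raw: str) -> dict:
    """Parse the CRO's JSON output and reject anything off-schema."""
    decision = json.loads(raw)
    missing = [key for key in REQUIRED_FIELDS if key not in decision]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if decision["direction"] not in {"long", "short", "flat"}:
        raise ValueError("direction must be long, short or flat")
    if decision["agreement_with_ml"] not in {"agree", "partial", "disagree"}:
        raise ValueError("agreement_with_ml must be agree, partial or disagree")
    for key in ("conviction", "position_size_pct"):
        if not 0.0 <= float(decision[key]) <= 1.0:
            raise ValueError(f"{key} must be within [0, 1]")
    reason = str(decision.get("disagreement_reason", "")).strip()
    if decision["agreement_with_ml"] == "disagree" and not reason:
        raise ValueError("disagreement_reason is required when disagreeing")
    return decision
```

Rejecting the output at parse time is what turns "no silent overrides" from a prompt instruction into a hard constraint.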

The risk gate

Built with control, not blind automation.

“Fail-closed: any ambiguous state returns allowed=False. Graceful degradation only for buying-power.”

— Comment in src/risk/gate.py

01

Daily-loss kill switch

Realized loss above 3% of session-start equity flips the session kill-switch. No further positions open until manual reset.

daily_loss_pct: 0.03

02

Position size silent shrink

LLM proposes 25%? Gate silently shrinks to the 10% hard cap with an audit-trail entry. The LLM never sees the override.

max_position_pct: 0.10

03

Stop distance sanity

Stop further than 5% from entry? Trade is forced to flat. No retry path, no escalation.

max_stop_distance_pct: 0.05

If the LLM proposes a 25% position size — maybe high conviction on a clean setup — the gate doesn't argue with it. It shrinks the size to 10% in place, logs size_shrunk_cap:0.2500->0.1000, and forwards the order. The LLM never sees the override. It can't retry. It can't escalate. The cap is non-negotiable.
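The shrink-in-place and stop-distance behaviour described above can be sketched as follows. The `Order` and `GateResult` types, the function name, and the `stop_too_far` audit tag are hypothetical; only the two limits and the `size_shrunk_cap` log format come from the text.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class Order:
    direction: str
    entry: float
    stop: float
    position_size_pct: float


@dataclass
class GateResult:
    order: Order
    audit: List[str] = field(default_factory=list)


def apply_size_and_stop_checks(
    order: Order,
    max_position_pct: float = 0.10,
    max_stop_distance_pct: float = 0.05,
) -> GateResult:
    """Force flat on an oversized stop; otherwise cap the size in place."""
    audit: List[str] = []
    stop_distance = abs(order.entry - order.stop) / order.entry
    if stop_distance > max_stop_distance_pct:
        # Hypothetical audit tag; the source only says the trade goes flat.
        audit.append(f"stop_too_far:{stop_distance:.4f}>{max_stop_distance_pct:.4f}")
        return GateResult(replace(order, direction="flat", position_size_pct=0.0), audit)
    if order.position_size_pct > max_position_pct:
        audit.append(
            f"size_shrunk_cap:{order.position_size_pct:.4f}->{max_position_pct:.4f}"
        )
        order = replace(order, position_size_pct=max_position_pct)
    return GateResult(order, audit)


if __name__ == "__main__":
    result = apply_size_and_stop_checks(Order("long", 100.0, 97.0, 0.25))
    print(result.order.position_size_pct, result.audit)
```

The audit list is returned to the caller rather than to the LLM, which matches the idea that the model never sees the override.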

07 All 7 risk checks
  • 01

    Daily-loss kill-switch

    Session-scoped. Trips when realized loss exceeds threshold. Manual reset required.

    daily_loss_pct: 0.03
  • 02

    Concurrent position cap

    Maximum simultaneous open positions across the strategy.

    max_concurrent_positions: 3
  • 03

    Buying-power / leverage

    Hard leverage ceiling. Position size shrinks gracefully when nearing limit.

    max_leverage: 4.0
  • 04

    Cash buffer floor

    Minimum cash to retain as buffer. Below this, no new positions open.

    min_cash_pct: 0.10
  • 05

    Position size hard cap

    Upper bound on what the LLM can propose per trade. Silently enforced.

    max_position_pct: 0.10
  • 06

    Stop distance sanity

    Maximum permitted stop distance from entry. Trades forced to flat if exceeded.

    max_stop_distance_pct: 0.05
  • 07

    Overnight position cap

    Maximum positions carried into the next session.

    overnight_cap: 1
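The seven limits fit naturally into one immutable config object, with the blocking checks written so that missing data denies the trade. The sketch below covers checks 01, 02, and 04 only; the class names, the `AccountState` fields, and the reason strings are assumptions, and the session-scoped manual reset is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RiskLimits:
    daily_loss_pct: float = 0.03
    max_concurrent_positions: int = 3
    max_leverage: float = 4.0
    min_cash_pct: float = 0.10
    max_position_pct: float = 0.10
    max_stop_distance_pct: float = 0.05
    overnight_cap: int = 1


@dataclass(frozen=True)
class AccountState:
    session_start_equity: Optional[float]
    realized_pnl: Optional[float]
    open_positions: Optional[int]
    cash_pct: Optional[float]


def blocking_checks(
    state: AccountState, limits: RiskLimits = RiskLimits()
) -> Tuple[bool, str]:
    """Return (allowed, reason). Any missing or nonsensical input fails closed."""
    values = (
        state.session_start_equity, state.realized_pnl,
        state.open_positions, state.cash_pct,
    )
    if any(v is None for v in values) or state.session_start_equity <= 0:
        return False, "ambiguous_state"
    loss_pct = -state.realized_pnl / state.session_start_equity
    if loss_pct > limits.daily_loss_pct:
        return False, "daily_loss_kill_switch"
    if state.open_positions >= limits.max_concurrent_positions:
        return False, "concurrent_position_cap"
    if state.cash_pct < limits.min_cash_pct:
        return False, "cash_buffer_floor"
    return True, "ok"
```

Checking for ambiguity first means a broker outage or a half-populated account snapshot produces a refusal rather than a trade.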

Below the gate, every external API call has rate limiting, exponential backoff, fallback chains (FMP → yfinance, Polygon → cached profile), and redact_secrets() scrubbing all 12 known API keys and URL token patterns from logs. The CRO never sees a leaked credential. The user never sees a stuck pipeline.
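A fallback chain with exponential backoff can be expressed as a small helper that takes providers in priority order. This sketch uses hypothetical names and retry defaults, leaves out rate limiting, and injects `sleep` so the delay schedule can be tested without waiting.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(
    providers: Sequence[Callable[[], T]],
    retries: int = 3,          # assumed attempts per provider
    base_delay: float = 0.5,   # assumed first backoff delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each provider in order, retrying each with exponential backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return provider()
            except Exception as exc:  # sketch: real code would catch narrower errors
                last_error = exc
                if attempt < retries - 1:
                    # Delays double each time: base, 2*base, 4*base, ...
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed") from last_error
```

A chain such as FMP → yfinance would be passed as two callables, with the second only invoked once the first has exhausted its retries.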

The result

The strategy beat SPY buy & hold across 90 tickers.

Out-of-sample walk-forward backtest: 91 tickers loaded, 90 traded, 1,910 trades, average hold of 7 bars. Total return +323.1%. Sharpe 0.82. Sortino 1.03. Profit factor 1.33. Win rate 53.9%. Max drawdown −24.6%. Calmar 0.59.
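For readers less familiar with these metrics, the sketch below shows their commonly used definitions. The backtester's exact conventions (for example, how returns are annualised for Calmar) are not stated in this case study, so treat these formulas as assumptions.

```python
from typing import Sequence


def win_rate(pnls: Sequence[float]) -> float:
    """Fraction of trades with strictly positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def profit_factor(pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss (inf if there are no losers)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")


def max_drawdown(equity: Sequence[float]) -> float:
    """Worst peak-to-trough decline as a negative fraction, e.g. -0.25."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def calmar(annualised_return: float, mdd: float) -> float:
    """Annualised return divided by the absolute maximum drawdown."""
    return annualised_return / abs(mdd)
```

A profit factor above 1 with a win rate only slightly above 50% indicates that average winners are larger than average losers.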

The strategy outperformed the SPY buy & hold benchmark on the same window. The risk gate kept max drawdown contained at −24.6% while position-size caps and the CRO veto blocked the high-conviction trades that did not pass the ML prior — preventing the kind of concentration losses that show up after a regime shift.

The engineering — agent crew, ML model, risk gate, CRO orchestrator — is what produced both the upside and the contained downside.

Total return

+323.1%

Sharpe

0.82

Sortino

1.03

Max drawdown

−24.6%

Win rate

53.9%

Profit factor

1.33

Total trades

1,910

Tickers traded

90 / 91

Calmar

0.59

Most agencies show you a win without the engineering. We show you both — the metrics and the system that produced them.

The engineering

Things you only notice when they're missing.

Multi-LLM with cost discipline — Qwen 32B via Aliyun is the default; GPT-4-turbo and Gemini 2.0 Flash are drop-in alternatives via env var. No vendor lock-in.

LaunchDarkly feature flags: demo-rollout-enabled for percentage rollout; agent-comments-level toggles agent verbosity (brief / full) at runtime. Local fallback works without the SDK.

PDF report generation: fpdf2 with DejaVuSans for Cyrillic, ANSI escape stripping, long-token wrapping, and a last-resort line-by-line render when multi_cell fails.

Preflight model ping — a 16-token call to the configured LLM before the crew runs. Fail-fast on bad keys, bad quota, or bad base URL. Never burn 90 seconds in agent #1 to discover the API is dead.
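A preflight check of this kind can be written against any client by injecting the completion call. In this sketch, `complete` is a hypothetical callable taking a prompt and a token limit; `PreflightError` and `preflight_ping` are likewise names invented for the example.

```python
from typing import Callable


class PreflightError(RuntimeError):
    """Raised when the configured LLM cannot answer a trivial request."""


def preflight_ping(complete: Callable[[str, int], str], max_tokens: int = 16) -> None:
    """Make one tiny completion call and raise if it fails or returns nothing."""
    try:
        reply = complete("ping", max_tokens)
    except Exception as exc:  # bad key, exhausted quota, wrong base URL, ...
        raise PreflightError(f"LLM preflight failed: {exc}") from exc
    if not reply or not reply.strip():
        raise PreflightError("LLM preflight returned an empty reply")
```

Calling this once before the crew starts converts a slow mid-run failure into an immediate, clearly labelled error.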

Secret redaction across logs: redact_secrets() scrubs 12 known env-key values and 5 URL-token patterns from every error message and log line.
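A redaction helper along these lines might combine exact-value replacement with URL-token patterns. The signature, the single example pattern, and the `[REDACTED]` placeholder below are assumptions; the production function covers 12 env keys and 5 patterns that are not listed in this case study.

```python
import os
import re
from typing import Iterable, Mapping

# Assumption: one illustrative pattern for query-string credentials.
URL_TOKEN_PATTERNS = (
    re.compile(r"(?i)([?&](?:api_?key|token|access_token)=)[^&\s]+"),
)


def redact_secrets(
    message: str,
    env_keys: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> str:
    """Replace known secret values and URL tokens with a placeholder."""
    for key in env_keys:
        value = env.get(key)
        if value:  # skip unset or empty variables
            message = message.replace(value, "[REDACTED]")
    for pattern in URL_TOKEN_PATTERNS:
        # Keep the parameter name, drop only the credential itself.
        message = pattern.sub(r"\1[REDACTED]", message)
    return message
```

Matching on the actual secret values catches leaks in unexpected places such as exception messages, while the patterns catch tokens that never came from the environment.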

The pattern

Trading is the case study. The architecture isn't trading-specific.

Anywhere a business needs to synthesize a decision from many noisy signals — and where a wrong call has real consequences — this architecture transplants directly. The pattern is multi-specialist + ML prior + manager synthesizer + fail-closed gate. The domain plugs in. Specialist roles, ML features, and risk limits change. Everything else stays.

  • Competitive intelligence — specialists watch pricing, hiring, releases, regulatory filings. Manager synthesizes weekly strategic brief.
  • Regulatory monitoring — specialists track jurisdictions, document types, comment periods. Manager flags items requiring legal action with deadlines.
  • Customer sentiment — specialists pull support tickets, reviews, social, churn signals. Manager produces escalation list.
  • Supplier risk — specialists track financial health, news, geopolitical exposure, alternative-supplier mapping. Manager scores per-vendor risk.
  • News triage — specialists categorize by topic, sentiment, source reliability. Manager assembles executive morning brief with action items only.

Next step

Want a multi-agent system for your domain?

15-minute fit call. We map the agents, signals, and gates around your decision.