Our Methodology

How gAInXalpha generates alpha picks & AI-driven portfolios

Every forecast we produce is the output of a rigorous machine learning pipeline grounded in peer-reviewed research. Here’s exactly how it works — from raw market data to calibrated investment signals.

Principle I
Liquidity-based clustering
Models tailored to each security’s trading environment, not a one-size-fits-all approach
Principle II
Competitive model selection
Three AI architectures compete; the best performer wins and is re-evaluated each period
Principle III
Conformal prediction
Every forecast includes statistically guaranteed uncertainty bounds — not just a number
Market data → Technical signals → Liquidity clusters → Model competition → Point forecast → Top picks & portfolios

1
Feature construction
Daily · Per security

Each trading day, the engine computes a panel of technical signals from end-of-day open, high, low, and close prices and traded volumes. These span five economic families, all with strong empirical evidence for predicting future returns beyond standard pricing factors.

Momentum (RSI, EMA) · Trend (Aroon, CCI) · Liquidity (Amihud proxy) · Volume (MFI, Force Index) · Volatility (Bollinger Bands)
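To make the signal families concrete, here is a minimal sketch of three of the listed indicators computed from end-of-day closes and volumes. The function names, the 14-day RSI window, the simple-average RSI variant, and the use of close times volume as a dollar-volume approximation are all illustrative assumptions, not the engine's actual implementation.

```python
from typing import List, Optional


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first observation
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def rsi(closes: List[float], period: int = 14) -> Optional[float]:
    """RSI from simple sums of gains and losses over the last `period` changes.

    Assumption: the simple-average variant, not Wilder's smoothed version.
    Returns None when there is not enough history.
    """
    if len(closes) < period + 1:
        return None
    changes = [b - a for a, b in zip(closes[-period - 1:-1], closes[-period:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0  # no down moves in the window
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)


def amihud(closes: List[float], volumes: List[float]) -> float:
    """Mean of |daily return| / dollar volume; higher values mean less liquid."""
    ratios = []
    for prev, cur, vol in zip(closes[:-1], closes[1:], volumes[1:]):
        dollar_volume = cur * vol  # assumption: close * shares traded
        if dollar_volume > 0:
            ratios.append(abs(cur / prev - 1.0) / dollar_volume)
    return sum(ratios) / len(ratios) if ratios else float("nan")
```

In a sketch like this, each function would be called once per security per trading day, and the outputs stacked into one feature row.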

2
Liquidity-based clustering
10 clusters per asset class

Securities are partitioned into ten clusters based on market liquidity, proxied by trading volume. Thinly traded stocks exhibit different return dynamics and sensitivity to market shocks than heavily traded ones. Fitting a single global model across the entire market would either over-fit the liquid tail or miss the signal in illiquid securities.

Each cluster gets its own dedicated forecasting model, allowing the engine to capture local predictive structure with precision.

Why it matters: Academic research shows that machine learning-based return predictability is concentrated in harder-to-arbitrage, illiquid securities. Partitioning by liquidity unlocks this edge rather than diluting it across the full cross-section.
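The partitioning step can be sketched as a rank-based split on average trading volume. The text above does not say how the ten clusters are formed, so equal-count buckets are an assumption here, and `liquidity_clusters` is a hypothetical name.

```python
from typing import Dict


def liquidity_clusters(avg_volume: Dict[str, float], n_clusters: int = 10) -> Dict[str, int]:
    """Assign each ticker to a cluster 0..n_clusters-1 by rank of average volume.

    Cluster 0 holds the most thinly traded securities, the last cluster the
    most heavily traded. Ties in volume are broken by ticker for determinism.
    """
    ranked = sorted(avg_volume, key=lambda t: (avg_volume[t], t))
    n = len(ranked)
    # Integer arithmetic maps rank i in [0, n) to a bucket in [0, n_clusters).
    return {ticker: i * n_clusters // n for i, ticker in enumerate(ranked)}


# Usage with 20 made-up tickers: two securities land in each of the ten clusters.
volumes = {f"T{i:02d}": 1000.0 * (i + 1) for i in range(20)}
clusters = liquidity_clusters(volumes)
```

A separate forecasting model would then be fitted on the securities in each bucket.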


3
Three model families compete
Per cluster · Three forecast horizons

Within each cluster, three state-of-the-art model architectures are trained in parallel. Forecasts are produced across three investment horizons: approximately two weeks, one month, and one quarter of trading days. No single architecture uniformly dominates in return prediction, so we let all three compete.

Neural network
LSTM
Long Short-Term Memory networks capture non-linear temporal dependencies and long-memory patterns in price sequences.
Gradient boosting
LightGBM
Histogram-based gradient boosting with leaf-wise tree growth. Fast to train and typically highly accurate on structured tabular data.
Gradient boosting
GBDT
Gradient-boosted regression trees with a complementary bias–variance profile to LightGBM, providing ensemble diversity.

4
Cluster-specific model selection
Re-evaluated periodically

The best model for each cluster is chosen by out-of-sample predictive accuracy — not assumption. For each cluster and horizon pair, candidates are evaluated on held-out data. The winner is promoted to production. Selection is revisited regularly to track market regime changes, since the best model today may not be the best model tomorrow.

Selection metric
MSE & MAE
Mean squared and absolute error evaluated on held-out, unseen data
Selection scope
Per cluster
The winning model family may differ across liquidity tiers and horizons
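A minimal sketch of the selection step for one cluster and horizon follows. The page names MSE and MAE as metrics but not how they are combined, so ranking by MSE with MAE as a tie-breaker is an assumption, as are the function names and the toy numbers.

```python
from typing import Dict, Sequence, Tuple


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_model(
    candidates: Dict[str, Sequence[float]], y_true: Sequence[float]
) -> Tuple[str, Dict[str, Dict[str, float]]]:
    """Return the candidate with the lowest held-out MSE (MAE breaks ties)."""
    scores = {
        name: {"mse": mse(y_true, preds), "mae": mae(y_true, preds)}
        for name, preds in candidates.items()
    }
    winner = min(scores, key=lambda n: (scores[n]["mse"], scores[n]["mae"], n))
    return winner, scores


# Toy held-out returns and predictions for a single cluster/horizon pair.
held_out = [0.01, -0.02, 0.03]
predictions = {
    "LSTM": [0.00, -0.01, 0.01],
    "LightGBM": [0.01, -0.02, 0.02],
    "GBDT": [0.02, 0.00, 0.03],
}
winner, scores = select_model(predictions, held_out)
```

Running the same routine for every cluster and horizon pair yields one winner per pair, which is why the chosen family can differ across liquidity tiers.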

5
Conformal prediction intervals
Distribution-free · Finite-sample guarantee

Every point forecast is wrapped in a statistically calibrated prediction interval using conformal prediction. This framework works with any underlying model and requires no assumptions about the shape of the return distribution. Provided the calibration data and future observations are exchangeable, it offers a formal guarantee: for a chosen confidence level, the realized price falls inside the interval at least that often on average. The property holds in finite samples, not just asymptotically.

Why this matters for investors: A price target alone is insufficient for building portfolios or managing risk. Conformal intervals provide the calibrated uncertainty required to size positions intelligently, set stop-losses with confidence, and communicate risk to clients with statistical backing.
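One common way to build such intervals is split conformal prediction, sketched below. The page does not say which conformal variant the engine uses, so this is an assumed illustration with hypothetical function names. It takes residuals from a held-out calibration set and relies on the exchangeability of calibration and future data.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected (1 - alpha) quantile of absolute residuals.

    `residuals` are (realized - forecast) values on a calibration set that
    was not used to fit the model.
    """
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration points for this confidence level
    return scores[k - 1]


def conformal_interval(point_forecast: float, q: float) -> Tuple[float, float]:
    """Symmetric interval around the point forecast with half-width q."""
    return (point_forecast - q, point_forecast + q)


# Toy calibration residuals with alternating signs and magnitudes 1..19.
calibration = [i if i % 2 else -i for i in range(1, 20)]
q90 = conformal_quantile(calibration, alpha=0.1)
interval = conformal_interval(100.0, q90)
```

The `(n + 1)` term in the rank is what gives the finite-sample coverage property. With very small calibration sets the quantile becomes infinite, which signals that the requested confidence level cannot be supported yet.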


6
Ongoing monitoring & quality control
Rolling diagnostics

Forecast quality is tracked continuously using two complementary diagnostics. Together, they implement the modern standard for probabilistic forecast evaluation: maximizing sharpness subject to calibration. When the engine detects distributional drift or degraded accuracy, model re-selection is triggered.

Empirical coverage
Realized prices inside interval match target confidence
Systematic under-coverage triggers model review; over-coverage signals excessive conservatism
Interval sharpness
Narrower intervals signal higher forecast precision
Among models with equal coverage, the one with tighter intervals carries more actionable signal
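The two diagnostics can be sketched as follows. The review rule and its 5-point tolerance are hypothetical placeholders, since the actual trigger thresholds are not stated here.

```python
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def empirical_coverage(intervals: Sequence[Interval], realized: Sequence[float]) -> float:
    """Share of realized values that fall inside their prediction interval."""
    hits = sum(1 for (lo, hi), y in zip(intervals, realized) if lo <= y <= hi)
    return hits / len(realized)


def mean_width(intervals: Sequence[Interval]) -> float:
    """Average interval width; smaller means sharper forecasts."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


def needs_review(coverage: float, target: float, tolerance: float = 0.05) -> bool:
    """Flag under-coverage beyond an assumed tolerance below the target."""
    return coverage < target - tolerance


# Toy rolling window of four forecasts and their realized prices.
window = [(90.0, 110.0), (95.0, 105.0), (100.0, 120.0), (80.0, 90.0)]
prices = [100.0, 106.0, 110.0, 85.0]
coverage = empirical_coverage(window, prices)
sharpness = mean_width(window)
flag = needs_review(coverage, target=0.9)
```

Comparing candidates on `mean_width` only makes sense once their coverage is close to the target, which mirrors the "sharpness subject to calibration" principle described above.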