Our Methodology

How gAInXalpha generates alpha picks & AI-driven portfolios

Every forecast we produce is the output of a rigorous machine learning pipeline grounded in peer-reviewed research. Here’s exactly how it works — from raw market data to calibrated investment signals.

Principle I

Liquidity-based clustering

Models tailored to each security’s trading environment, not a one-size-fits-all approach

Principle II

Competitive model selection

Three AI architectures compete; the best performer wins and is re-evaluated each period

Principle III

Conformal prediction

Every forecast includes statistically guaranteed uncertainty bounds — not just a number

Market data › Technical signals › Liquidity clusters › Model competition › Point forecast › Top picks & portfolios

Feature construction

Daily · Per security

Each trading day, the engine computes a rich panel of technical signals from end-of-day open, high, low, and close prices and traded volumes. These span five economic families, all with strong empirical evidence for predicting future returns beyond standard pricing factors.

Momentum (RSI, EMA) · Trend (Aroon, CCI) · Liquidity (Amihud proxy) · Volume (MFI, Force Index) · Volatility (Bollinger Bands)
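As a rough illustration of what a few of these signals look like in code, here is a minimal NumPy sketch. The function names, window lengths, and the simple-average RSI variant are assumptions made for this example, not the engine's actual implementation, and the price and volume series are synthetic.

```python
import numpy as np

def ema(x, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    x = np.asarray(x, dtype=float)
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(x))
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out

def rsi(close, window=14):
    """RSI from simple averages of the last `window` gains and losses."""
    close = np.asarray(close, dtype=float)
    delta = np.diff(close)
    gains = np.clip(delta, 0.0, None)
    losses = np.clip(-delta, 0.0, None)
    out = np.full(len(close), np.nan)
    for i in range(window, len(close)):
        avg_gain = gains[i - window:i].mean()
        avg_loss = losses[i - window:i].mean()
        # By convention, RSI is 100 when there were no losses in the window.
        out[i] = 100.0 if avg_loss == 0.0 else 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out

def amihud(close, volume, window=21):
    """Amihud-style illiquidity: mean of |return| / dollar volume (assumes volume > 0)."""
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    ratio = np.abs(np.diff(close) / close[:-1]) / (close[1:] * volume[1:])
    out = np.full(len(close), np.nan)
    for i in range(window, len(close)):
        out[i] = ratio[i - window:i].mean()
    return out

def bollinger_width(close, window=20, k=2.0):
    """Relative width of the Bollinger Bands: (upper - lower) / middle."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    for i in range(window - 1, len(close)):
        w = close[i - window + 1:i + 1]
        out[i] = 2.0 * k * w.std() / w.mean()
    return out

# Synthetic example data: steadily rising prices and constant volume.
close = np.linspace(100.0, 130.0, 31)
volume = np.full(31, 1_000_000.0)
features = np.column_stack(
    [ema(close, 10), rsi(close), amihud(close, volume), bollinger_width(close)]
)
```

Each row of `features` corresponds to one trading day for one security; early rows contain NaN until each indicator has a full window of history.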

Liquidity-based clustering

10 clusters per asset class

Securities are partitioned into ten clusters based on market liquidity, proxied by trading volume. Thinly traded stocks exhibit different return dynamics and sensitivity to market shocks than heavily traded ones. Fitting a single global model across the entire market would either over-fit the liquid tail or miss the signal in illiquid securities.

Each cluster gets its own dedicated forecasting model, allowing the engine to capture local predictive structure with precision.

Why it matters: Academic research shows that machine learning-based return predictability is concentrated in harder-to-arbitrage, illiquid securities. Partitioning by liquidity unlocks this edge rather than diluting it across the full cross-section.
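One simple way to form ten liquidity clusters is to rank securities by average trading volume and cut the ranking into equal-sized buckets. The sketch below assumes that decile-style bucketing; the text above does not specify the exact clustering rule, so treat `liquidity_clusters` as a hypothetical helper.

```python
import numpy as np

def liquidity_clusters(avg_volume, n_clusters=10):
    """Label each security 0..n_clusters-1 by its rank in average volume.

    Cluster 0 holds the least-traded securities, the highest label the most traded.
    """
    avg_volume = np.asarray(avg_volume, dtype=float)
    # Double argsort converts values to ranks (0 = smallest volume).
    ranks = np.argsort(np.argsort(avg_volume, kind="stable"), kind="stable")
    return ranks * n_clusters // len(avg_volume)

# Synthetic example: 20 securities with distinct average volumes.
labels = liquidity_clusters(np.arange(20, 0, -1) * 1_000.0)
```

A separate forecasting model would then be fitted on the securities carrying each label.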

Three model families compete

Per cluster · Three forecast horizons

Within each cluster, three state-of-the-art model architectures are trained in parallel. Forecasts are produced across three investment horizons: approximately two weeks, one month, and one quarter of trading days. No single architecture uniformly dominates in return prediction, so we let all three compete.
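To make the three horizons concrete, the sketch below builds one forward-return target per horizon from a closing-price series. The day counts (10, 21, and 63 trading days) and the choice of forward return as the prediction target are assumptions for illustration; the text gives only approximate horizon lengths.

```python
import numpy as np

# Assumed trading-day counts for "two weeks", "one month", and "one quarter".
HORIZONS = {"two_weeks": 10, "one_month": 21, "one_quarter": 63}

def forward_returns(close, horizon):
    """Return close[t + horizon] / close[t] - 1, with NaN where the future is unknown."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    out[:-horizon] = close[horizon:] / close[:-horizon] - 1.0
    return out

# Synthetic example: prices 1, 2, ..., 100.
close = np.arange(1.0, 101.0)
targets = {name: forward_returns(close, h) for name, h in HORIZONS.items()}
```

Each model family would be trained once per cluster and horizon against the matching target column.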

Neural network

LSTM

Long Short-Term Memory networks capture non-linear temporal dependencies and long-memory patterns in price sequences.

Gradient boosting

LightGBM

Histogram-based gradient boosting with leaf-wise tree growth. Best-in-class speed and accuracy on structured tabular data.

Gradient boosting

GBDT

Gradient-boosted regression trees with a complementary bias–variance profile to LightGBM, providing ensemble diversity.

Cluster-specific model selection

Re-evaluated periodically

The best model for each cluster is chosen by out-of-sample predictive accuracy — not assumption. For each cluster and horizon pair, candidates are evaluated on held-out data. The winner is promoted to production. Selection is revisited regularly to track market regime changes, since the best model today may not be the best model tomorrow.

Selection metric

MSE & MAE

Mean squared and absolute error evaluated on held-out, unseen data

Selection scope

Per cluster

The winning model family may differ across liquidity tiers and horizons
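The selection step can be sketched as a comparison of held-out errors across candidates. In this example the winner is the model with the lowest MSE, with MAE used only to break ties; that combination rule is an assumption, since the text names both metrics without saying how they are weighed. The predictions are made-up numbers, not real model output.

```python
import numpy as np

def select_model(y_true, predictions):
    """Pick the candidate with the lowest held-out MSE (MAE breaks ties).

    `predictions` maps a model name to its held-out predictions.
    Returns the winning name and a dict of (mse, mae) per model.
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = {}
    for name, y_pred in predictions.items():
        err = np.asarray(y_pred, dtype=float) - y_true
        scores[name] = (float(np.mean(err ** 2)), float(np.mean(np.abs(err))))
    # Tuples compare element by element: MSE first, then MAE.
    winner = min(scores, key=scores.get)
    return winner, scores

# Illustrative held-out returns and candidate predictions for one cluster and horizon.
y_holdout = np.array([0.01, -0.02, 0.03])
candidates = {
    "lstm": y_holdout + 0.010,
    "lightgbm": y_holdout + 0.002,
    "gbdt": y_holdout - 0.020,
}
winner, scores = select_model(y_holdout, candidates)
```

Running the same comparison for every cluster and horizon pair yields the per-cluster winners described above.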

Conformal prediction intervals

Distribution-free · Finite-sample guarantee

Every point forecast is wrapped in a statistically calibrated prediction interval using conformal prediction. This framework works with any underlying model, requires no assumptions about the shape of the return distribution, and provides a formal guarantee: for a chosen confidence level, the realized price will fall inside the interval at least that often on average, provided the calibration data and new observations are exchangeable. The property holds in finite samples, not just asymptotically.

Why this matters for investors: A price target alone is insufficient for building portfolios or managing risk. Conformal intervals provide the calibrated uncertainty required to size positions intelligently, set stop-losses with confidence, and communicate risk to clients with statistical backing.
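A minimal sketch of split conformal prediction, one common variant of the framework, is shown below. It assumes absolute residuals as the conformity score and a separate calibration set; whether the engine uses this exact variant is not stated here. The coverage guarantee relies on calibration and new observations being exchangeable.

```python
import numpy as np

def conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Symmetric split-conformal interval with target coverage 1 - alpha.

    cal_true / cal_pred: realized values and model forecasts on a calibration set.
    new_pred: point forecasts to wrap in intervals.
    """
    scores = np.sort(np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float)))
    n = len(scores)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1.0 - alpha)))
    # With too few calibration points the only valid interval is unbounded.
    q = np.inf if k > n else scores[k - 1]
    new_pred = np.asarray(new_pred, dtype=float)
    return new_pred - q, new_pred + q

# Synthetic example: calibration residuals of 1, 2, ..., 9 and a 50% target level.
cal_true = np.arange(1.0, 10.0)
cal_pred = np.zeros(9)
lower, upper = conformal_interval(cal_true, cal_pred, [10.0], alpha=0.5)
```

Because the interval half-width comes from observed calibration errors rather than a distributional assumption, the same wrapper works around any of the three model families.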

Ongoing monitoring & quality control

Rolling diagnostics

Forecast quality is tracked continuously using two complementary diagnostics. Together, they implement the modern standard for probabilistic forecast evaluation: maximizing sharpness subject to calibration. When the engine detects distributional drift or degraded accuracy, model re-selection is triggered.

Empirical coverage

The share of realized prices inside the interval matches the target confidence level

Systematic under-coverage triggers model review; over-coverage signals excessive conservatism

Interval sharpness

Narrower intervals signal higher forecast precision

Among models with equal coverage, the one with tighter intervals carries more actionable signal
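Both diagnostics reduce to simple averages over a rolling window of past forecasts. The sketch below computes them for one window; the function name and the example numbers are illustrative assumptions.

```python
import numpy as np

def coverage_and_width(y_true, lower, upper):
    """Return (empirical coverage, mean interval width) for a batch of forecasts."""
    y_true, lower, upper = (np.asarray(a, dtype=float) for a in (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return float(inside.mean()), float((upper - lower).mean())

# Illustrative window of four forecasts: three realized values fall inside their intervals.
coverage, width = coverage_and_width(
    y_true=[1.0, 2.0, 3.0, 4.0],
    lower=[0.0, 0.0, 0.0, 5.0],
    upper=[2.0, 4.0, 6.0, 7.0],
)
```

Coverage below the target level would flag a model for review, while two models with similar coverage can be ranked by the smaller mean width.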
