How do I A/B test answer formats for AI citation share?
A/B testing AEO is mechanically different from traditional CRO because you're optimizing for retrieval and quotation behavior across opaque models, not user clicks. You can't run a standard split test: a URL can serve only one version of the page to an AI crawler at a time, so there is no way to split engine traffic between variants. Instead, you run sequential tests: publish one version, measure citation share for 14–28 days, then swap to the variant and measure again.
The 5-step AEO A/B test protocol: (1) Define a fixed prompt set of 50–200 prompts that should plausibly cite the page under test, and keep the wording stable. (2) Establish a baseline by running the prompt set daily for 14 days against the current page. The average citation share across engines is your control. (3) Ship the variant, changing only one element (lead paragraph, schema type, FAQ block, or citation density). (4) Re-run the same prompt set daily for 14–28 days post-change. AI engines need 7–14 days to re-crawl and re-rank most pages, so early post-change runs may still reflect the old version; freshness-sensitive topics re-index faster. (5) Compare citation share with a confidence interval. Variance is high, so require at least a 5-percentage-point lift on a 100-prompt set before calling a winner.
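Step 5 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the names `PhaseResult`, `lift_with_ci`, and `is_winner` are hypothetical, and the interval is a plain normal-approximation (Wald) interval for the difference of two proportions. It treats every prompt run as independent, which repeated daily runs of the same prompts are not, so the real interval is likely wider than the one computed here.

```python
import math
from dataclasses import dataclass


@dataclass
class PhaseResult:
    """Aggregated prompt runs for one phase (baseline or variant). Hypothetical type."""
    cited: int  # runs in which the page under test was cited
    total: int  # total prompt runs in the phase

    @property
    def share(self) -> float:
        return self.cited / self.total


def lift_with_ci(baseline: PhaseResult, variant: PhaseResult, z: float = 1.96):
    """Return (lift, lower, upper) for the change in citation share.

    Wald interval for a difference of two proportions; z = 1.96 gives
    roughly 95% confidence under the independence assumption.
    """
    p1, p2 = baseline.share, variant.share
    se = math.sqrt(p1 * (1 - p1) / baseline.total + p2 * (1 - p2) / variant.total)
    lift = p2 - p1
    return lift, lift - z * se, lift + z * se


def is_winner(baseline: PhaseResult, variant: PhaseResult, min_lift: float = 0.05) -> bool:
    """Call a winner only if the lift clears the 5-point bar and the interval excludes zero."""
    lift, lower, _ = lift_with_ci(baseline, variant)
    return lift >= min_lift and lower > 0
```

For example, a 100-prompt set run daily for 14 days yields 1,400 runs per phase. With 280 cited runs in the baseline (20%) and 378 in the variant (27%), `is_winner(PhaseResult(280, 1400), PhaseResult(378, 1400))` returns `True`, because the 7-point lift clears the threshold and the lower bound stays above zero.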
High-leverage variables worth testing, in order of typical impact: lead paragraph format (declarative vs. expository), FAQPage schema (present/absent), HowTo schema (where applicable), citation density (number of named statistics per 500 words), word count (800 vs. 1,500 vs. 3,000), and heading question phrasing. Surfaced runs these tests automatically across customer content, accumulating a multi-tenant dataset that shows which patterns generalize.
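Citation density is the one variable in this list that is defined as a formula, so it is easy to measure before and after a change. The sketch below is a rough proxy only: it assumes that any numeric token (such as `12%`, `2024`, or `1,500`) counts as a statistic, which overcounts dates and list numbers and does not check whether the figure is attributed to a named source. The function name `citation_density` and the regular expression are illustrative choices.

```python
import re

# Assumption: a "statistic" is approximated by any numeric token,
# optionally with thousands separators, a decimal part, or a percent sign.
STAT_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?%?")


def citation_density(text: str, per_words: int = 500) -> float:
    """Approximate number of statistics per `per_words` words of text."""
    words = text.split()
    if not words:
        return 0.0
    stats = STAT_PATTERN.findall(text)
    return len(stats) * per_words / len(words)
```

Comparing this number for the control and the variant confirms that a citation-density test actually changed the variable it was meant to change.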