TL;DR. RAG systems chunk pages by heading hierarchy. A flat or broken structure (all H2s, or H1 → H4 with no H2/H3) means the engine cannot isolate the right chunk to cite. Rule: one H1, question-format H2s, scannable H3s, never skip a level. Inspect your page in 2 minutes with the accessibility tree.
Why hierarchy matters for AI
When a RAG system retrieves a page, it does not read top-to-bottom. It extracts chunks defined by heading boundaries. A well-structured page looks like this to the engine:
H1: Main topic
H2: Question 1 → [chunk 1 = content under H2]
H3: Sub-point → [sub-chunk]
H2: Question 2 → [chunk 2]
H2: Question 3 → [chunk 3]
A broken page looks like this:
H1: Main topic
H4: Random sub-point (skipped H2, H3)
(no H2 at all)
Random bolded text that should be an H2 but is just a <strong> tag, not a heading.
On the broken page, the engine cannot decide where chunks begin and end. It either quotes the whole page (too long, gets down-weighted) or quotes nothing (you are not cited).
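To make the chunking idea concrete, here is a minimal Python sketch of heading-based splitting. It is an illustration only, not how any particular engine works: real retrieval pipelines differ, and the names `HeadingChunker` and `chunk_by_headings` are hypothetical. It assumes reasonably well-formed HTML and uses only the standard library.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingChunker(HTMLParser):
    """Hypothetical helper: start a new chunk at every real heading tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []          # one dict per heading: level, heading, text
        self._in_heading = False  # True while inside an <h1>..<h6> element

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.chunks.append({"level": int(tag[1]), "heading": "", "text": ""})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.chunks:
            return  # text before the first heading belongs to no chunk
        key = "heading" if self._in_heading else "text"
        self.chunks[-1][key] += data


def chunk_by_headings(html):
    """Return the page as a list of chunks keyed by heading boundaries."""
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    return [
        {
            "level": c["level"],
            "heading": c["heading"].strip(),
            "text": " ".join(c["text"].split()),  # collapse whitespace
        }
        for c in parser.chunks
    ]
```

In this sketch, bolded text inside a `<strong>` tag never opens a new chunk, so a section that only looks like a heading is folded into whatever chunk precedes it.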
The 4 rules
Rule 1: Exactly one H1 per page - The H1 is the page's title statement. Matches or closely paraphrases the target prompt. One per page. Not two. Not "H1 styled as H2" with a div class. A real <h1> tag.
Rule 2: H2s in question format - H2s are the major section questions. Phrase them as the reader would ask them. Not:
- "Features"
- "Use Cases"
- "Why Choose Us"
Use:
- "What features does Rankscale support?"
- "Who should use Rankscale?"
- "Why use Rankscale over [alternative]?"
Question-format H2s double as FAQ candidates the engine can extract as direct answers.
Rule 3: H3s are scannable sub-points - H3s break an H2 section into sub-points. Keep them short, specific, and parallel. If your H3s under an H2 do not logically belong together, your H2 is too broad.
Rule 4: Never skip a level - H1 → H2 → H3 → H4. Never H1 → H3 directly. Skipping levels breaks the chunking logic and most accessibility tools flag it.
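Rule 4 is easy to check mechanically. The sketch below assumes you already have the heading levels in document order as integers (1 for H1, 2 for H2, and so on); `find_skipped_levels` is a hypothetical helper name, not part of any tool mentioned here.

```python
def find_skipped_levels(levels):
    """Return (previous, current) pairs where a heading jumps down more than one level.

    Moving back up (for example H3 to H2) is allowed; only downward jumps
    of more than one level count as a skip.
    """
    skips = []
    previous = 0  # assumption: treat the page start as level 0, so the first heading should be an H1
    for current in levels:
        if current > previous + 1:
            skips.append((previous, current))
        previous = current
    return skips
```

For example, `find_skipped_levels([1, 4])` reports the H1 → H4 jump, while `[1, 2, 3, 2, 3, 4]` passes cleanly.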
The 2-minute diagnostic
- Run a Page Audit V2 pass in Rankscale on the page
- Read the HTML / heading-structure scores ("schema" here means the page outline)
- Fix the page outline using the missing aspects and recommended actions the audit lists
Pass criteria:
- One H1, matching the page topic
- 3 to 8 H2s, each a question
- H3s grouped logically under H2s
- No skipped levels
- No <div> or <span> masquerading as headings with CSS
Any fail = hierarchy gap.
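The mechanical parts of these criteria can be sketched as a small Python check. This is an assumption-laden illustration, not the Page Audit V2 logic: `audit_outline` is a hypothetical function, "is a question" is approximated as "ends with a question mark", and the H3 grouping and styled-element checks are left out because they cannot be judged from a heading list alone.

```python
def audit_outline(headings):
    """Check an outline against the mechanical pass criteria.

    headings: list of (level, text) tuples in document order.
    Returns a list of failure messages; an empty list means a pass.
    """
    failures = []

    h1_count = sum(1 for level, _ in headings if level == 1)
    if h1_count != 1:
        failures.append(f"expected exactly one H1, found {h1_count}")

    h2_texts = [text for level, text in headings if level == 2]
    if not 3 <= len(h2_texts) <= 8:
        failures.append(f"expected 3 to 8 H2s, found {len(h2_texts)}")
    for text in h2_texts:
        # Assumption: a trailing "?" is a good-enough proxy for question format.
        if not text.strip().endswith("?"):
            failures.append(f"H2 is not a question: {text!r}")

    previous = 0  # page start counts as level 0
    for level, text in headings:
        if level > previous + 1:
            failures.append(f"skipped level before H{level}: {text!r}")
        previous = level

    return failures
```

Any message in the returned list corresponds to one failed criterion.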
The most common failure modes
- Designer-built pages. Marketing pages often use styled <div> tags instead of real heading tags. Visual hierarchy looks right; semantic hierarchy is absent.
- CMS autogenerated pages. Blog templates often wrap the title in <h2> and use <h1> for the site logo. Invert it.
- Multiple H1s from template modules. Hero banners, CTA sections, and testimonial widgets sometimes ship with their own H1. Audit every imported component.
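The first failure mode can be roughly flagged in code. The sketch below is a heuristic under a loud assumption: that styled pseudo-headings carry class names containing words like "heading" or "title". Your templates may use different names, so treat `SUSPECT_WORDS` and `find_fake_headings` as hypothetical and adjust them to your own CSS conventions.

```python
from html.parser import HTMLParser

# Assumption: class-name fragments that suggest a styled pseudo-heading.
SUSPECT_WORDS = ("heading", "headline", "title")


class FakeHeadingFinder(HTMLParser):
    """Hypothetical helper: collect div/span elements whose class looks heading-like."""

    def __init__(self):
        super().__init__()
        self.suspects = []  # (tag, class attribute) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in ("div", "span"):
            return
        class_attr = dict(attrs).get("class") or ""
        if any(word in class_attr.lower() for word in SUSPECT_WORDS):
            self.suspects.append((tag, class_attr))


def find_fake_headings(html):
    """Return div/span elements that may be headings in disguise."""
    finder = FakeHeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.suspects
```

Each hit is only a candidate for manual review; a class name alone does not prove the element is meant to be a heading.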
Do this now:
If multiple H1s, skipped levels, or marketing-style H2s that are not phrased as questions still show up, go back to Page Audit V2 and use its outline diagnostics, with screenshots, to justify the rewrite ticket. Fix time is typically 1 hour per page.