v1.0.7
Lociator

Orphan Page Detection

Orphan pages are one of the most common — and most impactful — internal linking issues. Lociator uses a 3-phase crawl architecture with sitemap discovery to accurately identify pages that have zero incoming internal links.

What are Orphan Pages?

An orphan page is a page on your website that has zero incoming internal links (in-degree = 0). While it may exist in your sitemap or be accessible via direct URL, no other page on your site links to it. Search engine crawlers following internal links will never discover it naturally.

Why BFS Alone Can't Find Orphans

A pure BFS crawl starts from the root URL and follows internal links. Every non-root page it discovers was reached through a link, so its in_degree is always ≥ 1. With BFS alone, the orphan check (in_degree === 0 && depth !== 0) can therefore never be true.

ℹ️ To detect orphan pages, the crawler needs an external URL source (like a sitemap) to compare against the BFS-visited set. Pages found in the sitemap but not found by BFS are orphan candidates.
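
The limitation can be reproduced on a toy site. The sketch below is illustrative only: the `LinkGraph` type, the `bfsCrawl` helper, and the sample URLs are hypothetical and not part of Lociator. It runs a plain BFS over an in-memory link graph and shows that the page nothing links to is never visited, so the orphan check has nothing to flag.

```typescript
// Hypothetical in-memory site: each key is a page, each value lists the pages it links to.
type LinkGraph = Record<string, string[]>;

interface BfsPage {
  depth: number;
  inDegree: number;
}

function bfsCrawl(graph: LinkGraph, root: string): Map<string, BfsPage> {
  const pages = new Map<string, BfsPage>();
  pages.set(root, { depth: 0, inDegree: 0 });
  const queue: string[] = [root];

  while (queue.length > 0) {
    const current = queue.shift() as string;
    const currentDepth = (pages.get(current) as BfsPage).depth;

    for (const target of graph[current] ?? []) {
      if (target === current) continue; // a self-link is not a link from another page
      let page = pages.get(target);
      if (page === undefined) {
        // First sighting: the page is discovered because something links to it.
        page = { depth: currentDepth + 1, inDegree: 0 };
        pages.set(target, page);
        queue.push(target);
      }
      page.inDegree += 1;
    }
  }
  return pages;
}

const site: LinkGraph = {
  "/": ["/about", "/blog"],
  "/about": ["/"],
  "/blog": ["/blog/post-1"],
  "/blog/post-1": [],
  "/landing/old-campaign": ["/blog"], // exists, but no page links to it
};

const crawled = bfsCrawl(site, "/");

// Apply the orphan check to everything BFS found.
const bfsOrphans: string[] = [];
crawled.forEach((page, url) => {
  if (page.inDegree === 0 && page.depth !== 0) bfsOrphans.push(url);
});
```

Here `bfsOrphans` stays empty and `/landing/old-campaign` never appears in `crawled`, which is why a second URL source is needed.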

3-Phase Crawl Architecture

The crawler uses a three-phase approach to combine BFS link-following with sitemap-based discovery:

  1. Phase 1 (BFS Crawl): Standard breadth-first crawl starting from the root URL. Follows internal links and builds the core graph. This is the same as a normal crawl.
  2. Phase 2 (Sitemap Discovery): After BFS finishes, the crawler fetches and parses the site's sitemap.xml. URLs found in the sitemap but not visited during BFS are added to the queue at depth = maxBfsDepth + 1 with no parent page.
  3. Phase 3 (Orphan Expansion): The BFS loop continues for sitemap-discovered pages. If an orphan page links to other new pages, those pages get normal BFS treatment at depth + 1.

// 3-Phase Crawl (simplified from CrawlService)

// Phase 1: BFS crawl
await processBfsQueue();  // Standard link-following

// Phase 2: Enqueue sitemap-only URLs
const sitemapUrls = await sitemapParser.fetchSitemapUrls(rootUrl);
const maxBfsDepth = Math.max(...results.map(p => p.depth), 0);

for (const url of sitemapUrls) {
  if (!visited.has(url)) {
    queue.push({
      url,
      depth: maxBfsDepth + 1,
      parentNormalizedUrl: null,  // No BFS parent; in_degree stays 0 unless a crawled page links here
    });
  }
}

// Phase 3: BFS for orphan candidates
await processBfsQueue();  // Same loop, now processes sitemap URLs
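
For readers who want to trace the three phases end to end, here is a self-contained simulation. It is a sketch under stated assumptions, not the CrawlService code: the site is a hypothetical in-memory link graph instead of real HTTP fetches, the sitemap is a plain array, and `threePhaseCrawl`, `QueueItem`, and `CrawledPage` are illustrative names. In-degree is counted as the number of distinct other crawled pages linking to a page.

```typescript
type LinkGraph = Record<string, string[]>;

interface QueueItem {
  url: string;
  depth: number;
  parentUrl: string | null;
}

interface CrawledPage {
  url: string;
  depth: number;
  inDegree: number;
}

function threePhaseCrawl(graph: LinkGraph, rootUrl: string, sitemapUrls: string[]): CrawledPage[] {
  const visited = new Set<string>([rootUrl]);
  const queue: QueueItem[] = [{ url: rootUrl, depth: 0, parentUrl: null }];
  const results: CrawledPage[] = [];

  // Shared loop: used unchanged by Phase 1 and Phase 3.
  const processBfsQueue = (): void => {
    while (queue.length > 0) {
      const item = queue.shift() as QueueItem;
      results.push({ url: item.url, depth: item.depth, inDegree: 0 });
      for (const target of graph[item.url] ?? []) {
        if (visited.has(target)) continue;
        visited.add(target);
        queue.push({ url: target, depth: item.depth + 1, parentUrl: item.url });
      }
    }
  };

  // Phase 1: standard link-following from the root.
  processBfsQueue();

  // Phase 2: enqueue sitemap-only URLs below the deepest BFS level, with no parent.
  const maxBfsDepth = results.reduce((max, p) => Math.max(max, p.depth), 0);
  for (const url of sitemapUrls) {
    if (visited.has(url)) continue;
    visited.add(url);
    queue.push({ url, depth: maxBfsDepth + 1, parentUrl: null });
  }

  // Phase 3: same loop, now expanding the sitemap-discovered pages.
  processBfsQueue();

  // In-degree: count distinct other crawled pages that link to each page.
  const byUrl = new Map<string, CrawledPage>();
  for (const page of results) byUrl.set(page.url, page);
  for (const page of results) {
    new Set(graph[page.url] ?? []).forEach((target) => {
      const targetPage = byUrl.get(target);
      if (targetPage !== undefined && target !== page.url) targetPage.inDegree += 1;
    });
  }
  return results;
}

const demoSite: LinkGraph = {
  "/": ["/docs", "/pricing"],
  "/docs": ["/docs/setup"],
  "/pricing": [],
  "/docs/setup": [],
  "/promo": ["/promo/terms"], // listed in the sitemap, linked from nowhere
  "/promo/terms": [],
};

const demoPages = threePhaseCrawl(demoSite, "/", ["/", "/docs", "/promo"]);
const demoOrphans = demoPages
  .filter((p) => p.inDegree === 0 && p.depth !== 0)
  .map((p) => p.url);
```

In this run BFS reaches depth 2, so `/promo` enters at depth 3 and is the only orphan. `/promo/terms` is found in Phase 3 at depth 4 and is not an orphan, because `/promo` links to it.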

Sitemap Discovery

The SitemapParserService handles sitemap fetching and parsing:

  • Fetches /sitemap.xml with fallback to /sitemap_index.xml
  • Recursively handles sitemap index files (max depth 3)
  • Parses <loc> elements using Cheerio
  • Filters to internal-only, non-asset URLs
  • Normalizes all discovered URLs
  • Caps results at 10,000 URLs and applies a 5-second timeout
  • Gracefully returns an empty set on any error (sitemap is optional)
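
The parsing, filtering, and normalizing steps above can be sketched as a pure function over an XML string. This is a simplified stand-in, not the SitemapParserService implementation: it uses a regular expression instead of Cheerio to stay dependency-free, it skips fetching, sitemap index recursion, and the timeout, and the asset extension list and normalization rules (drop the fragment, strip a trailing slash) are assumptions. `extractSitemapUrls` and `normalizeUrl` are hypothetical names.

```typescript
// Assumed list of asset extensions; the real filter may differ.
const ASSET_EXTENSIONS = /\.(png|jpe?g|gif|svg|webp|ico|css|js|pdf|zip|woff2?)$/i;

// Assumed normalization: drop the fragment and any trailing slash (except on the root path).
function normalizeUrl(raw: string): URL | null {
  try {
    const url = new URL(raw.trim());
    url.hash = "";
    if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
      url.pathname = url.pathname.slice(0, -1);
    }
    return url;
  } catch {
    return null; // not an absolute URL
  }
}

function extractSitemapUrls(xml: string, rootUrl: string, maxUrls: number = 10000): Set<string> {
  const urls = new Set<string>();
  let rootHost: string;
  try {
    rootHost = new URL(rootUrl).hostname;
  } catch {
    return urls; // mirror the "empty set on any error" behavior
  }

  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while (urls.size < maxUrls && (match = locPattern.exec(xml)) !== null) {
    // Sitemaps escape "&" as "&amp;" inside <loc>.
    const url = normalizeUrl(match[1].replace(/&amp;/g, "&"));
    if (url === null) continue;
    if (url.hostname !== rootHost) continue; // internal-only
    if (ASSET_EXTENSIONS.test(url.pathname)) continue; // skip assets
    urls.add(url.toString());
  }
  return urls;
}

const sampleXml = `
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
  <url><loc>https://example.com/blog#comments</loc></url>
  <url><loc>https://example.com/logo.png</loc></url>
  <url><loc>https://other.example.org/page</loc></url>
</urlset>`;

const sitemapSet = extractSitemapUrls(sampleXml, "https://example.com");
```

With this sample, `sitemapSet` holds `https://example.com/` and `https://example.com/blog`. The two blog entries collapse into one after normalization, and the image and the external URL are filtered out.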

Orphan Classification Logic

After all pages and links are inserted into the database, the GraphService computes the in-degree of every page. The classification is straightforward:

// From GraphService.buildGraph()
const isOrphan = inDegree === 0 && page.depth !== 0;

  • in-degree = 0: No other crawled page links to this page.
  • depth ≠ 0: Excludes the root page (homepage), which naturally has no incoming internal links from the crawl's perspective.

Sitemap-discovered pages enter the queue with parentNormalizedUrl: null, meaning they have no BFS parent. Their in_degree depends entirely on whether other crawled pages link to them — if no page links to them, they are classified as orphans.
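
As a rough illustration of that classification step, the sketch below computes in-degree from a list of link rows and then applies the orphan check. The `PageRow` and `LinkRow` shapes and the `classifyPages` function are hypothetical and do not reflect Lociator's schema. It also assumes that self-links are ignored and that in-degree counts distinct linking pages rather than individual links.

```typescript
interface PageRow {
  url: string;
  depth: number;
}

interface LinkRow {
  sourceUrl: string;
  targetUrl: string;
}

interface ClassifiedPage extends PageRow {
  inDegree: number;
  isOrphan: boolean;
}

function classifyPages(pages: PageRow[], links: LinkRow[]): ClassifiedPage[] {
  const known = new Set<string>(pages.map((p) => p.url));
  const sourcesByTarget = new Map<string, Set<string>>();

  for (const { sourceUrl, targetUrl } of links) {
    if (sourceUrl === targetUrl) continue; // assumption: self-links do not count
    if (!known.has(sourceUrl) || !known.has(targetUrl)) continue; // only links between crawled pages
    let sources = sourcesByTarget.get(targetUrl);
    if (sources === undefined) {
      sources = new Set<string>();
      sourcesByTarget.set(targetUrl, sources);
    }
    sources.add(sourceUrl);
  }

  return pages.map((page) => {
    const sources = sourcesByTarget.get(page.url);
    const inDegree = sources === undefined ? 0 : sources.size;
    // Same rule as above: no incoming links, and not the root page.
    return { ...page, inDegree, isOrphan: inDegree === 0 && page.depth !== 0 };
  });
}

const classified = classifyPages(
  [
    { url: "/", depth: 0 },
    { url: "/a", depth: 1 },
    { url: "/b", depth: 2 }, // sitemap-discovered: entered the queue with no parent
  ],
  [
    { sourceUrl: "/", targetUrl: "/a" },
    { sourceUrl: "/b", targetUrl: "/a" },
    { sourceUrl: "/b", targetUrl: "/b" }, // self-link, ignored
  ],
);
```

In this example the root has in-degree 0 but is excluded by its depth, `/a` has in-degree 2, and `/b` is the only page classified as an orphan.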

Impact on SEO & Scoring

Orphan pages directly affect your Architecture Score through the Orphan Score (weighted at 10%):

  • Crawlability: Search engines may never discover orphan pages via link crawling.
  • Indexing: Even when discovered via a sitemap, orphan pages tend to receive lower crawl priority because no internal links signal their importance.
  • Link equity: Orphan pages receive zero internal link equity from the rest of your site.
  • Scoring formula: orphanScore = (1 - orphanCount / totalPages) × 100
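
A quick worked example of the scoring formula, written as a small function. The formula comes from the list above. The function name and the handling of an empty crawl (returning 100 when there are no pages) are assumptions made for this sketch.

```typescript
function orphanScore(orphanCount: number, totalPages: number): number {
  // Assumption: an empty crawl has no orphans to penalize, so it scores 100.
  if (totalPages <= 0) return 100;
  return (1 - orphanCount / totalPages) * 100;
}

const typicalSite = orphanScore(25, 200); // 87.5
const cleanSite = orphanScore(0, 50);     // 100
const worstCase = orphanScore(50, 50);    // 0
```

A site with 25 orphans among 200 pages scores 87.5, and the score only reaches 100 when no orphans remain.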

Fixing Orphan Pages

Strategies to resolve orphan pages:

  1. Add contextual links: Link to the orphan page from topically related content within your site.
  2. Update navigation: Include the page in your site's navigation, footer, or sidebar menus.
  3. Create hub/pillar pages: Build category pages that link to groups of related content — this also boosts your Pillar Score.
  4. Check for link issues: Sometimes orphans result from broken links, JavaScript-only navigation, or nofollow attributes.
  5. Remove if unnecessary: If the page serves no purpose, consider removing or redirecting it.

⚠️ After fixing orphan pages, re-crawl your site to verify the changes. The orphan count and Architecture Score will update automatically.