
merch-connector


An MCP server that gives AI agents eyes on any e-commerce storefront.

Scrape product listings, extract facets, badges, sort options, and B2B signals; run AI-powered merchandising audits; compare two storefronts side-by-side; detect what changed between visits; and build persistent memory about sites — all through the Model Context Protocol.


Why merch-connector?

E-commerce merchandising analysis is manual, repetitive, and fragmented. A merchandiser might spend hours clicking through competitor sites, checking if filters work, comparing product grids, and noting what's changed. AI agents can do this work — but they can't see storefronts the way shoppers do.

merch-connector bridges that gap. It gives any MCP-compatible AI agent (Claude, custom agents, etc.) the ability to:

  • Browse any storefront with a stealth headless browser that handles bot protection
  • Extract structured product data, facets, performance metrics, and page structure
  • Analyze merchandising quality through five expert personas or a full roundtable debate
  • Remember site quirks across sessions so the agent gets smarter over time
  • Track changes across visits — new products, price moves, facet/sort changes

Quick start

npx merch-connector

The server communicates over stdio and is designed to be launched by an MCP client, not run standalone.

Configuration

Add to your Claude Desktop claude_desktop_config.json or Claude Code .mcp.json:

{
  "mcpServers": {
    "merch-connector": {
      "command": "npx",
      "args": ["-y", "merch-connector"],
      "env": {
        "ANTHROPIC_API_KEY": "your_key_here"
      }
    }
  }
}

To enable Firecrawl (bypasses bot-protected sites like Ferguson/Akamai) or pass any other env vars, add them to the env block:

"env": {
  "ANTHROPIC_API_KEY": "your_key_here",
  "FIRECRAWL_API_KEY": "fc-..."
}

Or install globally: npm install -g merch-connector

Environment variables

| Variable | Required | Description |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | One of these | Anthropic Claude API key |
| `GEMINI_API_KEY` | One of these | Google Gemini API key |
| `OPENAI_API_KEY` | One of these | OpenAI or OpenAI-compatible API key |
| `OPENAI_BASE_URL` | No | Base URL for an OpenAI-compatible endpoint. Defaults to `https://api.openai.com/v1` |
| `MODEL_PROVIDER` | No | Force `"anthropic"`, `"gemini"`, `"openai"`, or `"ollama"`. Auto-detected if omitted. |
| `MODEL_NAME` | No | Override the default model. Required when using Ollama. |
| `OPENAI_VISION` | No | Set `"true"` to pass screenshots to OpenAI-compatible vision models |
| `FIRECRAWL_API_KEY` | No | Enables Firecrawl as a fallback scraper in `acquire` — only used when Puppeteer is blocked by a WAF (0 products + FCP=0). Puppeteer always runs first. |
| `MERCH_CONNECTOR_DATA_DIR` | No | Custom path for site memory files. Default: `~/.merch-connector/data/` |
| `TOOL_TIMEOUT_MS` | No | AI tool timeout in ms. Default: `120000` (2 min) |
| `MERCH_LOG_FILE` | No | Path to an NDJSON log file. If set, every server log entry is appended. |
| `LIGHTPANDA_CDP_URL` | No | Connect to an external Lightpanda/Chrome CDP endpoint instead of launching Puppeteer's bundled Chromium. Server-side optimization only — standard `npx` users can ignore this. |

You only need an API key for AI-powered tools (ask_page, merch_roundtable, analyze_products). Scraping tools work without one.
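The Firecrawl fallback trigger described for `FIRECRAWL_API_KEY` (0 products + FCP=0) can be sketched as a small predicate. The shape of the scrape result below is an assumption for illustration, not the server's actual internal type:

```javascript
// Hypothetical sketch of the WAF-block heuristic that gates the Firecrawl
// fallback: a Puppeteer pass that found no products AND never produced a
// first contentful paint is treated as blocked.
function looksWafBlocked(result) {
  const productCount = (result.products || []).length;
  const fcp = result.performance ? result.performance.fcp : 0;
  return productCount === 0 && (!fcp || fcp === 0);
}

// A healthy scrape is never treated as blocked.
console.log(looksWafBlocked({ products: [{ title: "Laptop" }], performance: { fcp: 812 } })); // false
console.log(looksWafBlocked({ products: [], performance: { fcp: 0 } })); // true
```

The two conditions are joined with AND because an empty product list alone can also mean a genuinely empty category, while FCP=0 alone can mean a measurement failure.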

Using Ollama (local models)

{
  "mcpServers": {
    "merch-connector": {
      "command": "npx",
      "args": ["-y", "merch-connector"],
      "env": {
        "MODEL_PROVIDER": "ollama",
        "MODEL_NAME": "qwen2.5:14b",
        "OPENAI_BASE_URL": "http://localhost:11434/v1"
      }
    }
  }
}

AI analysis tools degrade gracefully if no provider is configured — scraping still works, analysis returns an error instead of crashing.
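The provider auto-detection and `hasProvider()` gate live in the server's `analyzer.js`; this standalone sketch only illustrates a plausible env-based detection order under the documented variables — the real precedence may differ:

```javascript
// Sketch of provider auto-detection, assuming the documented env vars.
// MODEL_PROVIDER forces a provider; otherwise the first configured key wins.
function detectProvider(env) {
  if (env.MODEL_PROVIDER) return env.MODEL_PROVIDER;
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.GEMINI_API_KEY) return "gemini";
  if (env.OPENAI_API_KEY) return "openai";
  return null;
}

// Mirrors the documented behavior: scraping tools never need a provider,
// and AI tools return a setup hint instead of crashing when this is false.
function hasProvider(env) {
  return detectProvider(env) !== null;
}

console.log(detectProvider({ MODEL_PROVIDER: "ollama", MODEL_NAME: "qwen2.5:14b" })); // "ollama"
console.log(hasProvider({})); // false
```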


Tools

| Tool | Description | Needs AI key? |
| --- | --- | --- |
| `acquire` | Primary scraping tool. One-pass audit payload — products, facets, screenshots, performance, trust signals, navigation, data quality, analytics, and PDP samples in a single call | No |
| `analyze_products` | Run persona analysis on pre-scraped data. Pass a products/facets JSON payload (from `acquire`, a CSV export, or any source) and get the full 5-persona analysis without touching a browser | Yes |
| `merch_roundtable` | Three expert personas analyze in parallel, then a moderator synthesizes consensus (results stream as each persona completes) | Yes |
| `ask_page` | Scrape a page and ask any question about it in plain language | Yes |
| `compare_storefronts` | Structured side-by-side diff of two URLs: facet gaps, trust signals, sort options, B2B mode, performance | No |
| `scrape_pdp` | Scrape a single product detail page — description fill rate, image count, reviews, spec table, cross-sell modules, CTA text, price | No |
| `get_category_sample` | Sample PDPs from a category page using a spread/random/top strategy | No |
| `interact_with_page` | Execute one or more search/click actions in sequence, then extract the result | No |
| `site_memory` | Read/write persistent notes and learned data about any domain | No |
| `clear_session` | Reset stored cookies and page cache for a domain | No |
| `save_eval` | Persist a roundtable run as a structured eval record with convergence score | No |
| `list_evals` | Retrieve eval history for a domain or all domains | No |
| `get_logs` | Retrieve recent server log entries from the in-memory buffer, filterable by level or tool name | No |
| `scrape_page` | (Deprecated — use `acquire`) Raw structured extraction from any category page | No |

Examples

acquire

Pull everything needed for a full storefront audit in one call

{
  "url": "https://www.zappos.com/women/CK_XARC81wHAAQHiAgMBAhg.zso",
  "pdp_sample": 2
}

Returns the complete audit payload: products with trust signals, facets, sort, navigation structure, data quality scores, analytics platform detection, performance timings, desktop + mobile screenshots, and 2 sampled PDPs — ready for the plugin to score.

ask_page

"Recommend facet changes for this laptop category page"

{
  "url": "https://www.insight.com/en_US/shop/category/notebooks/store.html",
  "question": "Recommend facet changes?"
}

Brand/Manufacturer — Most glaring omission. 50 products span 6+ brands (HP, Lenovo, Apple, Microsoft, Dell, Crucial). B2B buyers with vendor agreements need this as facet #1.

Price range buckets are misaligned. "Below $50" (2 items) signals category contamination — confirmed by a Crucial RAM stick appearing in laptop results. Clean up category mapping and re-bucket starting at $500.

merch_roundtable

The roundtable scrapes once, then runs three AI analyses in parallel followed by a moderator synthesis:

  1. Floor Walker — reacts as a real shopper ("I can't find Dell laptops without scrolling through 50 products")
  2. Auditor — evaluates Trust/Guidance/Persuasion/Friction ("0% facet detection rate, title normalization at 70%")
  3. Scout — identifies competitive gaps ("every competitor in B2B tech has brand filtering as facet #1")
  4. Moderator — synthesizes consensus, surfaces disagreements, produces prioritized recommendations

B2B Auditor automatically substitutes for Auditor when B2B signals are detected.


Personas

Five expert lenses for merchandising analysis. Use individually via ask_page or merch_roundtable.

| Persona | Role | Voice |
| --- | --- | --- |
| Floor Walker | A shopper visiting for the first time | First-person, casual, instinctive — "I don't know what button to click" |
| Auditor | Compliance analyst with a framework | Metric-driven, precise — "Fill rate is 82%, 3/10 titles lack brand prefix" |
| Scout | VP of Merchandising at a competitor | Strategic, comparative — "This is table-stakes for the category" |
| B2B Auditor | Procurement buyer evaluating a vendor | Process-driven — scores steps-to-PO, spec completeness, pricing transparency, self-serve viability |
| Conversion Architect | CRO specialist mapping the purchase funnel | Analytical, hypothesis-driven — "checkout button is below the fold on mobile, estimated −8% conversion" |

Each persona returns score (0–100), severity (1–5), findings[] (3–5 concrete observations), and uniqueInsight — the one thing only that lens would catch.
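The unified result shape makes persona outputs comparable. A minimal validator for that contract, using only the field names and ranges stated above (the validation itself is a sketch, not the server's code):

```javascript
// Illustrative validator for the unified persona result schema:
// score 0-100, severity 1-5, 3-5 findings, non-empty uniqueInsight.
function isValidPersonaResult(r) {
  return (
    typeof r.score === "number" && r.score >= 0 && r.score <= 100 &&
    Number.isInteger(r.severity) && r.severity >= 1 && r.severity <= 5 &&
    Array.isArray(r.findings) && r.findings.length >= 3 && r.findings.length <= 5 &&
    typeof r.uniqueInsight === "string" && r.uniqueInsight.length > 0
  );
}

const example = {
  score: 62,
  severity: 3,
  findings: ["No brand facet", "Price buckets misaligned", "Category contamination"],
  uniqueInsight: "B2B buyers with vendor agreements need brand filtering first.",
};
console.log(isValidPersonaResult(example)); // true
```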


Architecture

MCP Client (Claude, etc.)
    |
    | stdio (JSON-RPC)
    |
merch-connector (Node.js MCP server)
    |
    +-- acquire.js       One-pass audit entry point; Puppeteer-first waterfall (Firecrawl fallback for WAF-blocked sites)
    +-- scraper.js       Puppeteer + stealth plugin, structure detection, PageFingerprint
    +-- analyzer.js      Multi-provider AI (Anthropic / Gemini / OpenAI), 5 personas
    +-- network-intel.js XHR interception, 35-platform fingerprint, dataLayer/GA4 parsing
    +-- site-memory.js   Persistent per-domain JSON store + change detection snapshots
    +-- eval-store.js    JSONL eval index + full run storage, convergence scoring
    +-- prompts/         Persona prompt files (floor-walker, auditor, scout, b2b-auditor, conversion-architect)

  • Scraping: Puppeteer with stealth plugin bypasses bot detection. Two-pass heuristic structure detection finds product grids on unknown sites. Extracts products, facets, trust signals (ratings, badges, stock warnings), performance timing, and screenshots. Firecrawl integration (FIRECRAWL_API_KEY) provides LLM-based extraction as a fallback for bot-protected sites.
  • Network intelligence: Intercepts XHR/fetch during page load to fingerprint the commerce stack (Algolia, Bloomreach, SFCC, Shopify, Elasticsearch, and 30+ more). When a high-confidence match is found, extracts product and facet data directly from the API response — bypassing DOM parsing failures on enterprise storefronts.
  • Analysis: Three-provider AI — Anthropic uses tool_choice forcing for structured JSON; Gemini uses responseSchema; OpenAI-compatible uses function calling with a JSON-prompt fallback. Dynamic imports load only the needed SDK. ask_page uses Haiku-class models for fast Q&A; persona analysis uses Sonnet-class.
  • Personas: Five expert lenses. merch_roundtable runs Floor Walker, Auditor, and Scout in parallel then passes results to a moderator that synthesizes consensus and disagreements. B2B Auditor auto-substitutes for Auditor when B2B mode is detected.
  • Memory: Auto-learns site patterns on every scrape. Normalized snapshots enable change detection across visits — price moves, new/removed products, facet/sort changes. Manual notes persist across sessions.
  • Evals: Two-tier storage — compact JSONL index (100 runs/domain) + full run JSON (10/domain). Convergence score (0–100) measures inter-persona agreement. Dedup hashing prevents double-saves.
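The convergence score above measures inter-persona agreement; the real scoring lives in `eval-store.js`. As a stand-in, a sketch using average pairwise Jaccard overlap of top-concern sets (the metric choice is an assumption — only the 0–100 range and the null-for-single-persona behavior come from this README):

```javascript
// Jaccard overlap of two concern lists: |intersection| / |union|.
function jaccard(a, b) {
  const A = new Set(a), B = new Set(b);
  const inter = [...A].filter((x) => B.has(x)).length;
  const union = new Set([...A, ...B]).size;
  return union === 0 ? 0 : inter / union;
}

// Average pairwise agreement across personas, scaled to 0-100.
function convergenceScore(concernsByPersona) {
  const lists = Object.values(concernsByPersona);
  if (lists.length < 2) return null; // matches v1.6.4: null for single-persona runs
  let total = 0, pairs = 0;
  for (let i = 0; i < lists.length; i++)
    for (let j = i + 1; j < lists.length; j++) {
      total += jaccard(lists[i], lists[j]);
      pairs++;
    }
  return Math.round((total / pairs) * 100);
}

console.log(convergenceScore({
  floorWalker: ["no-brand-facet", "contamination"],
  auditor: ["no-brand-facet", "low-fill-rate"],
  scout: ["no-brand-facet"],
}));
```

All three personas flagging the same top concern pushes the score up; disjoint concern lists pull it toward 0.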

Development

git clone https://github.com/grahamton/merchGent.git
cd merchGent
npm install
cp .env.example .env   # fill in at least one AI API key

Running tests

npm test                              # scrape-only (no API key needed)
npm run test:audit                    # full merchandising audit
npm run test:persona                  # single persona (floor_walker)
npm run test:roundtable               # all 3 personas + moderator
node test/smoke.js --b2b              # B2B validation: Insight.com laptops + b2b_auditor
node test/smoke.js --ask "question"   # ask anything about a page
node test/smoke.js --url https://...  # override default URL
node test/protocol.js                 # MCP protocol compliance (no browser/API key needed)

MCP Inspector

npx @modelcontextprotocol/inspector -- node bin/merch-connector.js

Opens a browser UI where you can call any tool interactively.


Tool reference

acquire

One-pass audit payload. The primary tool in v2 — replaces the multi-step scrape_page + analysis workflow. Returns everything the audit pipeline needs in a single call.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | Full URL to acquire |
| `pdp_sample` | No | Number of PDP samples to include (0–5, default 2). Auto-selects median-priced + premium (80th percentile) products. |

Returns:

  • page — title, metaDescription, pageType, breadcrumb, h1
  • commerce — mode (B2B/B2C/Hybrid), platform, priceTransparency, loginRequired
  • products[] — normalized with trust signals, B2B/B2C indicators, description quality
  • facets[], sort — filter panel and sort state
  • navigation — hasFilterPanel, filterPanelPosition, hasStickyNav, breadcrumbPresent
  • trustSignals — ratingsOnCards, freeShippingPromised, returnPolicyVisible, urgencyMessaging
  • dataQuality — descriptionFillRate, ratingFillRate, priceFillRate
  • analytics — platform detection, GTM containers, ecommerce tracking status, productImpressionsFiring
  • performance — fcp, lcp, cls, domContentLoaded, loadComplete
  • pdpSamples[] — sampled PDP detail pages
  • screenshots — desktop + mobile base64 JPEG
  • warnings[] — structured quality flags with severity
  • scraper — "firecrawl" or "puppeteer" (which path was used)
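The `pdp_sample` auto-selection rule (median-priced + premium 80th-percentile products) can be sketched as follows; the index arithmetic and tie-breaking are assumptions, since the README only states the rule:

```javascript
// Sketch of pdp_sample auto-selection: pick the median-priced product
// and a premium one at roughly the 80th price percentile.
function pickPdpSamples(products) {
  const priced = products
    .filter((p) => typeof p.price === "number")
    .sort((a, b) => a.price - b.price);
  if (priced.length === 0) return [];
  const median = priced[Math.floor(priced.length / 2)];
  const premium = priced[Math.min(priced.length - 1, Math.floor(priced.length * 0.8))];
  return median === premium ? [median] : [median, premium];
}

const picks = pickPdpSamples([
  { title: "A", price: 49 },
  { title: "B", price: 99 },
  { title: "C", price: 149 },
  { title: "D", price: 299 },
  { title: "E", price: 999 },
]);
console.log(picks.map((p) => p.title)); // [ 'C', 'E' ]
```

Sampling one mid-market and one premium PDP gives the audit a spread of description quality and spec depth without scraping every product page.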

scrape_page

(Deprecated — use acquire) Raw structured extraction. Returns products (title, price, stock, CTA, description, B2B/B2C signals, trust signals), facets/filters, sort options, B2B mode + conflict score, page metadata, performance timing, data layers, interactable elements, and PageFingerprint. On repeat visits, also returns a changes diff.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | Full URL to scrape |
| `depth` | No | Pagination pages to follow (1–5, default 1) |
| `max_products` | No | Max products per page (default 10) |
| `include_screenshot` | No | Include base64 JPEG desktop screenshot (default false) |
| `mobile_screenshot` | No | Also capture a 390×844 (iPhone 14) mobile screenshot (default false) |

Trust signals per product: star rating, review count, sale badge + text, best seller flag, stock warning ("Only 3 left"), sustainability label, raw badge texts.
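On repeat visits, the changes diff compares the current scrape against the stored snapshot. A minimal sketch of that diff — the real implementation in `site-memory.js` uses normalized snapshots, and keying on product title here is a simplification:

```javascript
// Minimal sketch of visit-to-visit change detection: new/removed
// products and price moves, keyed by product title.
function diffSnapshots(prev, curr) {
  const prevByTitle = new Map(prev.map((p) => [p.title, p]));
  const currByTitle = new Map(curr.map((p) => [p.title, p]));
  const added = curr.filter((p) => !prevByTitle.has(p.title)).map((p) => p.title);
  const removed = prev.filter((p) => !currByTitle.has(p.title)).map((p) => p.title);
  const priceMoves = curr
    .filter((p) => prevByTitle.has(p.title) && prevByTitle.get(p.title).price !== p.price)
    .map((p) => ({ title: p.title, from: prevByTitle.get(p.title).price, to: p.price }));
  return { added, removed, priceMoves };
}

const diff = diffSnapshots(
  [{ title: "Shoe A", price: 90 }, { title: "Shoe B", price: 120 }],
  [{ title: "Shoe B", price: 99 }, { title: "Shoe C", price: 150 }]
);
console.log(diff);
```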

compare_storefronts

Scrape two URLs concurrently and return a structured diff. No AI call — pure structural analysis.

| Parameter | Required | Description |
| --- | --- | --- |
| `url_a` | Yes | First URL (your site or baseline) |
| `url_b` | Yes | Second URL (competitor or variant) |
| `max_products` | No | Max products per page (default 10) |

Returns: product count delta, facet gap analysis (onlyInA / onlyInB / shared count), trust signal coverage per site, sort option gaps, B2B mode + conflict score for each, performance delta (FCP + full load).
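The facet gap analysis is pure set arithmetic. A sketch of the onlyInA / onlyInB / shared computation — lowercase normalization is an assumption about how facet names are matched:

```javascript
// Sketch of the facet gap analysis in compare_storefronts:
// set difference in each direction plus a shared count,
// computed on trimmed, lowercased facet names.
function facetGap(facetsA, facetsB) {
  const norm = (f) => f.trim().toLowerCase();
  const a = new Set(facetsA.map(norm));
  const b = new Set(facetsB.map(norm));
  return {
    onlyInA: [...a].filter((f) => !b.has(f)),
    onlyInB: [...b].filter((f) => !a.has(f)),
    shared: [...a].filter((f) => b.has(f)).length,
  };
}

console.log(facetGap(["Brand", "Price", "Size"], ["price", "Color", "Size"]));
// { onlyInA: [ 'brand' ], onlyInB: [ 'color' ], shared: 2 }
```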

interact_with_page

Execute one or more search/click actions in sequence, then extract the resulting page.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | Full URL to load |
| `actions` | One of these | Array of { action, selector?, value? } for multi-step flows |
| `action` | One of these | Single action shorthand: "search" or "click" |
| `selector` | Depends | CSS selector (required for click) |
| `value` | Depends | Text to type (required for search) |
| `include_screenshot` | No | Include screenshot of result |

Multi-step example: [{ "action": "search", "value": "laptop" }, { "action": "click", "selector": ".filter-in-stock" }]

ask_page

Scrape + AI Q&A. The model sees full product data, facets, performance, and a screenshot. Supports Anthropic (Haiku), Gemini, and OpenAI-compatible providers.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | Full URL to scrape and ask about |
| `question` | Yes | Plain language question |
| `depth` | No | Pagination pages (default 1) |
| `max_products` | No | Max products per page (default 10) |

merch_roundtable

Multi-persona analysis with moderator synthesis. Floor Walker, Auditor, and Scout run in parallel — each result is streamed as a notifications/message as it completes. B2B Auditor auto-substitutes for Auditor when B2B signals are detected.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | Full URL to analyze |
| `depth` | No | Pagination pages (default 1) |
| `max_products` | No | Max products per page (default 10) |

Returns: perspectives (each persona's typed result), debate.consensus, debate.disagreements, debate.finalRecommendations (with impact + endorsing personas).

site_memory

Persistent per-domain memory. Auto-accumulates on every scrape.

| Parameter | Required | Description |
| --- | --- | --- |
| `action` | Yes | "read", "write", "list", or "delete" |
| `url` | Depends | Any URL on the domain (required for read/write/delete) |
| `note` | No | Text note to append (with write) |
| `key` | No | Custom field name (with write) |
| `value` | No | Value for the field (with write + key) |

clear_session

Reset cookies and cached page data for a domain.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | Any URL on the domain to clear |

save_eval

Persist the most recent roundtable or audit run as a structured eval record. Reads from the session persona cache — no data round-trip through the model. Must call merch_roundtable on the same URL first.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | Yes | URL of the run to save (must match a cached session) |
| `note` | No | Optional free-text annotation |

Returns: eval ID, convergence score (0–100 inter-persona agreement), top concerns per persona, moderator summary excerpt, dedup hash.

list_evals

Retrieve eval history for a domain or all domains.

| Parameter | Required | Description |
| --- | --- | --- |
| `url` | No | Filter to a specific domain. Omit to return all domains with eval history. |

get_logs

Retrieve recent server log entries from the in-memory circular buffer (500 entries).

| Parameter | Required | Description |
| --- | --- | --- |
| `level` | No | Filter by level: "error", "warn", "info", "debug" |
| `tool` | No | Filter by tool name (e.g. "merch_roundtable") |
| `limit` | No | Max entries to return (default 50) |
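A bounded in-memory buffer with level/tool filtering can be sketched in a few lines; the 500-entry capacity comes from this README, while the entry shape and eviction details are assumptions:

```javascript
// Sketch of a capped in-memory log buffer with get_logs-style filtering.
class LogBuffer {
  constructor(capacity = 500) {
    this.capacity = capacity;
    this.entries = [];
  }
  push(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.capacity) this.entries.shift(); // drop oldest
  }
  query({ level, tool, limit = 50 } = {}) {
    return this.entries
      .filter((e) => (!level || e.level === level) && (!tool || e.tool === tool))
      .slice(-limit); // most recent matches
  }
}

const logs = new LogBuffer(3); // tiny capacity to show eviction
logs.push({ level: "info", tool: "acquire", msg: "start" });
logs.push({ level: "error", tool: "acquire", msg: "WAF block" });
logs.push({ level: "info", tool: "merch_roundtable", msg: "start" });
logs.push({ level: "info", tool: "merch_roundtable", msg: "done" }); // evicts the oldest
console.log(logs.query({ level: "error" }).length); // 1
```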

History

v2.0.14 — Ollama local provider support + graceful no-AI degradation

  • Ollama support: MODEL_PROVIDER=ollama routes through the OpenAI-compatible API at http://localhost:11434/v1 — no API key required; MODEL_NAME selects the local model
  • Graceful degradation: ask_page and merch_roundtable now return raw scrape data + a setup hint when no AI provider is configured, instead of throwing
  • hasProvider() export: callers can gate on AI availability before invoking analysis
  • Docs: .env.example and CLAUDE.md updated with Ollama configuration examples

v2.0.13 — Layered data quality model + Firecrawl schema refinement

  • Data quality model: acquire now returns dataQuality.overall.usabilityTier (full/degraded/minimal/failed) and dataQuality.dimensions with graded description tiers (empty, spec, thin, rich), separating extraction confidence from site quality
  • Commerce-mode-aware warnings: generateWarnings() uses B2C/B2B/Hybrid threshold maps; new codes: LOW_DESCRIPTION_FILL_CRITICAL, DESCRIPTIONS_SPEC_ONLY, RATINGS_ABSENT, PRICING_INCONSISTENT, EXTRACTION_CONFIDENCE_LOW, FACETS_MINIMAL
  • Firecrawl schema: description is extracted as cardSubtitle internally with visual hierarchy cues + few-shot examples; remapped back to description in the payload (no breaking change)
  • Fixed: Puppeteer extractionConfidence false positive when structureConfidence is null — now falls back to product-count + priceFillRate signals

v2.0.12 — MCP-026: stabilize Firecrawl product description extraction

  • MCP-026: description field in Firecrawl EXTRACT_SCHEMA now carries a JSON Schema annotation explaining what to look for (subtitle text, attribute summaries, model/color/finish specs visible on the card); acquireWithFirecrawl() also passes an explicit prompt to the extract call — eliminates non-deterministic empty-description runs caused by the LLM not knowing category cards carry spec text rather than marketing copy

v2.0.11 — MCP-020/023/024/025: breadcrumb heuristic, Hybrid detection, star rating guard, PDP timeout

  • MCP-024: starRating guard added — values above 5 are discarded (review count bleed); ratingEl now prefers content attribute (schema.org) before falling back to aria-label/innerText
  • MCP-020: getBreadcrumb() gets two new fallback passes — data-testid breadcrumb variants (React/Next.js), then a URL-depth heuristic over nav a/header a elements to recover multi-level paths like Ferguson's 4-level hierarchy
  • MCP-023: PRO_TRADE_PATTERN extended with are you a pro, pro login, become a pro; hasProTradeCta() now checks pageText, page.h1, and page.title; Firecrawl path falls back to testing full raw.content when nav items are sparse
  • MCP-025: Per-PDP AbortSignal.timeout(12000) added to Firecrawl PDP path — 12s cap per PDP keeps total acquire wall time under 60s (Claude Desktop client limit); timed-out PDPs fall through to Puppeteer fallback

v2.0.10 — MCP-017–023: data extraction gaps and Firecrawl routing

  • MCP-017: PDP sub-scrapes now route through Firecrawl first (bypasses WAF/Akamai); fall back to Puppeteer per-URL; PDP_SAMPLES_BLOCKED warning emitted when all PDPs fail
  • MCP-018/023: freeShippingPromised now checks trustBadges[] in addition to b2cIndicators; commerce.mode upgraded B2C→Hybrid when Pro/Trade pricing CTAs are detected in page interactables or nav items
  • MCP-019/021: New warnings — FACETS_INCOMPLETE when Firecrawl returns fewer than 4 facets; PERFORMANCE_UNAVAILABLE (info) when Firecrawl is active scraper
  • MCP-020/022: Breadcrumb selector expanded to capture span/li/schema.org elements with dedup + separator filtering; ratingFillRate now requires rating > 0 (zero-star no longer counted as filled)

v2.0.9 — Bot-block resilience: blocked/blockType/fallbackSuggestions

  • acquire now surfaces block state explicitly: top-level blocked (bool) and blockType (WAF | TIMEOUT | EMPTY_RENDER) are set whenever FIRECRAWL_FAILED, LOW_CARD_CONFIDENCE, or NO_PRODUCTS_FOUND warnings are present — skill layer can branch without parsing warnings[]
  • fallbackSuggestions[]: three pre-computed search strings (site:, keyword, cache:) derived from the input URL, ready to pass to a search fallback workflow
  • Blocked responses skip the cache — retries after stealth changes or a different entry point always get a fresh scrape attempt

v2.0.8 — MCP-002: facet extraction for Shopify/Allbirds filter patterns

  • Strategy 2 expanded: candidate selector list now includes form[action*="filter"], [class*="FilterPanel"], [class*="filter-panel"] and similar patterns that Headless Shopify storefronts use — previously missed because filters weren't inside aside/nav/sidebar elements
  • Strategy 3 added: dedicated <details>-based extractor for Shopify filter groups (Allbirds and similar) where each facet is a standalone <details> with a <summary> label and checkbox inputs — no shared sidebar container required

v2.0.7 — MCP-016: acquire silent timeout fix + progress logging

  • Silent hang fixed: Firecrawl mobile screenshot call had no timeout — bot-blocked URLs caused the entire acquire handler to freeze indefinitely with zero log output; added timeout: 30000 to the mobile scrape call
  • Progress logging: acquire now emits sendLog entries at every major step (Firecrawl start/complete, Puppeteer start/complete, PDP sampling start/complete, cache hit) so timeouts are diagnosable from get_logs
  • sendLog wired into acquire: passed via sessionOps from index.js — no circular dependency, no architectural change

v2.0.6 — Fix acquire screenshot crash when using Firecrawl

  • Root cause: Firecrawl returns screenshot as a CDN URL, not base64; the MCP SDK's base64 validator rejected it, crashing every acquire call when FIRECRAWL_API_KEY is set
  • Fix: acquire handler now detects URL-format screenshots, fetches and converts to base64 before sending as MCP image content items

v2.0.5 — Fix dotenv stdout corruption on startup

  • MCP JSON-RPC broken by dotenv v17: dotenv v17.3+ prints a [dotenv@17.x] banner to stdout by default; on a stdio transport this corrupted the JSON-RPC stream before the first message was parsed
  • Fix: Added quiet: true to the user config fallback loadEnv call in index.js — both dotenv calls are now silent on startup

v2.0.4 — Fix acquire field truncation

  • Root cause of missing fields: Screenshot base64 was included in the JSON text payload AND as a separate image content item — the duplicate filled the MCP token budget before performance, trustSignals, analytics, navigation, dataQuality, pdpSamples, and warnings appeared in the serialized output
  • Fix: Screenshots are now stripped from the JSON text and sent only as image content items; all 7 structured fields are now fully visible to the MCP client on every acquire call

v2.0.3 — MCP-013 API key fix + user config fallback

  • MCP-013 root cause: plugin.json was explicitly setting ANTHROPIC_API_KEY="" and FIRECRAWL_API_KEY="", overriding system env vars before they reached the server — fixed in plugin v0.5.1
  • User config fallback: Server now loads ~/.merch-connector/.env as a fallback for any env var that is absent or empty, so API keys survive npx cache clears and work regardless of how the launcher passes env vars
  • Deduped imports: Merged fs import consolidation in index.js startup block

v2.0.2 — MCP-014 acquire field fixes

  • trustSignals.avgRating: Renamed from avgRatingAcrossProducts to match the field name the plugin audit command expects — was causing silent scoring failures on every acquire call
  • Warning severity values: Remapped from "high"/"medium"/"low" to "error"/"warn" across all warnings[] entries to match the plugin's expected enum

v2.0.1 — Model alias fix + full multi-provider ask_page

  • MCP-013: Replaced retired claude-3-5-sonnet-latest alias with claude-sonnet-4-6 across all Anthropic calls — fixes ask_page, merch_roundtable, and all persona analysis tools that were returning 404 errors
  • ask_page multi-provider: Added full Gemini and OpenAI-compatible implementations (were placeholder stubs). Anthropic path now uses Haiku-class model for fast, cost-effective Q&A; all persona analysis continues to use Sonnet.
  • MCP-015 docs: FIRECRAWL_API_KEY documented in README and CLAUDE.md; configuration example updated with env passthrough pattern

v2.0.0 — acquire tool: one-pass v2 architecture

  • New acquire tool: Single call replaces the 6–8 step scrape_page + analysis workflow — returns products, facets, screenshots, performance, trust signals, navigation, data quality, PDP samples, analytics, and warnings[] in one payload
  • Firecrawl integration: LLM extraction via Firecrawl as primary scraper with automatic Puppeteer fallback; scraper field reports which path was used and any fallback reason
  • audit_storefront retired: Returns a hard error directing callers to acquire; scrape_page marked deprecated with log warning
  • Protocol tests updated: 34/34 passing; acquire in tool list, audit_storefront absent, scrape_page deprecation asserted

v1.9.2 — MCP-002 & MCP-005 fixes, roundtable refactor, B2B persona routing

  • MCP-002: Restored extractFacetsGeneric fallback + hasFacetStructure structural scoring bonus (+20); added nested wrapper key support (response.*, data.*) and wired generic extraction as a fallback in extractFromBestApi — "Unknown Facet" no longer appears when XHR data is available
  • MCP-005: Mobile screenshots now dismiss OneTrust, Cookiebot, and TrustArc consent overlays before capture; blank-image threshold raised to 20 KB to reliably reject consent-blocked frames
  • Roundtable refactor: Collapsed per-provider per-persona duplicates into generic dispatch functions (~1000 lines removed); merch_roundtable auto-substitutes the B2B auditor persona when B2B signals are detected

v1.9.1 — Bug fixes from Cowork plugin QA sweep

  • CSS selector safety: compare_storefronts no longer crashes on Tailwind JIT arbitrary-value class names — all class-to-selector conversions now use CSS.escape()
  • Paint timing: FCP and first-paint captured via pre-navigation PerformanceObserver — no longer returns 0 on SPA category pages
  • Mobile screenshot: renders in a fresh browser page with UA + viewport set before navigation, fixing blank white screen on UA-gated SPAs
  • PDP pageType: URL pattern signals (/product/, /p/, /buy/product/, /pdp/) now take priority over DOM product-count heuristics, fixing misclassification on PDPs with related-product carousels
  • AI timeout resilience: audit_storefront and merch_roundtable cap the product payload sent to AI at 20 items, reducing prompt size and inference time
  • scrape_pdp price extraction: falls back to CTA button text when no dedicated price element is found; hasReviews and specTable.present now require count > 0
  • Facet resolution: "Unknown Facet" placeholders replaced with real names from intercepted XHR when a search API is detected
  • get_category_sample: error response now includes reason and suggestion when no product URLs are found

v1.9.0 — PDP sampling, smarter facets, B2B fingerprint depth

  • scrape_pdp tool: dedicated PDP scraper returning description fill rate, image count, review schema, spec table, cross-sell modules, CTA text, and primary/sale prices — purpose-built for single product pages
  • get_category_sample tool: scrapes a category page and runs scrape_pdp in parallel on a spread/random/top selection of products — one call for a multi-PDP spot check
  • Facet detection hardened: Strategy 1 now skips parent containers that wrap multiple filter groups (fixes the "all filters collapsed into one facet" bug on obfuscated-class sites like Zappos); Strategy 2 replaced with heading-to-heading tree walker so filter groups segment correctly regardless of CSS class names
  • B2B fingerprint depth: three new fingerprint fields — contractPricingVisible, loginRequired, accountPersonalization; audit_storefront now uses a dedicated AUDIT_TIMEOUT_MS (default 240s); PageSpeed Insights Core Web Vitals available via include_pagespeed: true on scrape_page

v1.8.0 — Persona architecture v2

  • PA-2 Fingerprint context injection: every persona now receives a ## Page Intelligence (pre-scan) block prepended to its prompt — pageType, platform, commerceMode, trust signal inventory, top risks, and recommended personas — so the AI orients before reading raw product data
  • PA-4 Unified base schema: all personas return score (0–100), severity (1–5), findings[] (3–5 observations), uniqueInsight — enabling structured cross-persona comparison
  • PA-5 Smart auto-selection: audit_storefront accepts persona: "auto"; selectPersonas(fingerprint) picks the best-fit lens based on pageType and commerceMode
  • PA-6 Conversion Architect: new CRO persona maps funnel stages, catalogs friction inventory, identifies top drop-off risk, generates A/B hypotheses with estimated lift ranges
  • Perf: roundtable log entries no longer embed full result objects — get_logs payload reduced ~95% for cached re-runs

v1.7.0 — PageFingerprint + synchronous moderator

  • PA-3 Synchronous moderator: merch_roundtable now awaits the moderator synthesis before returning — debate.consensus and debate.finalRecommendations[] are guaranteed in the tool response
  • PA-1 PageFingerprint: every scrape result now includes a fingerprint field with no extra AI call — pageType, platform, commerceMode, priceTransparency, trustSignalInventory, discoveryQuality, funnelReadiness, topRisks[], recommendedPersonas[]
  • Category contamination detector: scrape_page returns contamination: { detected, suspectCount, suspects[] } when off-category products appear in results
  • get_logs tool + file logging: retrieves recent server log entries from an in-memory buffer (500 entries), filterable by level and tool name; set MERCH_LOG_FILE for NDJSON file logging

v1.6.4

save_eval now works with all tool types, not just merch_roundtable. Convergence score returns null (not 0) for single-persona runs. Auto-detects toolName from whichever persona cache slots are populated.

v1.6.3 — Eval store

Two new tools (save_eval, list_evals) add persistent run tracking. Convergence score (0–100) measures inter-persona agreement on top concerns. Two-tier storage: compact JSONL index (100/domain) + full run JSON (10/domain). Dedup hashing prevents double-saving identical runs.

v1.6.2

Roundtable personas now run in parallel via Promise.all, cutting wall-clock time from ~90s to ~30s. Persona results are written to cache the moment each resolves, so a retry after a timeout picks up where it left off.

v1.6.0 — Network Intelligence Layer

Every scrape_page call now intercepts XHR/fetch responses and fingerprints the commerce stack from 35 platform signatures: Elasticsearch, Algolia, Coveo, Lucidworks Fusion, Bloomreach, Searchspring, SFCC, SAP Hybris, Shopify, Bazaarvoice, and more. When a high-confidence API match is found (≥70%), products and facets are extracted directly from the API response. Deep dataLayer/digitalData parsing surfaces GA4 events, GTM container IDs, A/B experiment assignments, and user segments. Discovered API endpoints are persisted to site memory so the discovery pass only runs once per domain.

v1.5.0 — Scraper expansion

Per-product trust signals (ratings, badges, stock warnings), sort order detection, b2bMode + b2bConflictScore, change detection on repeat visits. New compare_storefronts tool. Multi-step interact_with_page actions array. Optional mobile screenshot. Roundtable streams each persona result as it completes.

v1.4.0

10-minute in-memory page cache. ask_page, audit_storefront, and merch_roundtable reuse recent scrape results, cutting latency in half. Configurable TOOL_TIMEOUT_MS.

v1.3.0

OpenAI-compatible provider support (OpenAI, Groq, Together AI, any OpenAI-compatible endpoint). OPENAI_VISION=true for multimodal models.

v1.2.0

Complete rewrite — lean MCP server replacing the original React + Express UI. Four expert personas, roundtable mode, persistent site memory, dual AI provider support (Anthropic + Gemini).

v1.0.0

Original React + Express application with Gemini-powered merchandising analysis.


License

MIT
