Competitor deep-research method & tooling

Dashboard: Dashboard · landscape: competitor-landscape-report · roll-up: SENTIMENT-SYNTHESIS. This documents how the dossiers are built so the findings are defensible and the process is repeatable across tools.

Tool stack (evaluated 2026-06-16)

Source	Tool	Notes
App Store metadata + marketing screenshots	`rzn-tools app-store search/lookup` (public iTunes API)	rating, rating count, price, `screenshotUrls` (downloaded)
App Store on-page reviews + screens	`rzn-browser appstore/app-details`	supplementary; web App Store shows limited reviews
Capterra reviews (volume)	`rzn-browser capterra/product-details-reviews` paginated `?page=N`	~25 reviews/page via JSON-LD; primary volume engine
G2 reviews	`rzn-browser g2/product-details-reviews`	first-page set; G2 anti-bot limits depth
Web search / review round-ups	`rzn-tools exa search` + `exa answer` (cited)	configured key; entity search + content extraction
Web search backup	`rzn-tools xai-search`	configured key
YouTube demos	`rzn-tools search youtube` (public)	walkthrough/demo videos
Real in-app screenshots (fallback)	`rzn-phone` (device “iGoobey”, iOS 18.7.8, available)	drive the real app for true in-app UI; last resort
Capterra real segmentation	`rzn-browser run _research/capterra_dom_reviews.json` + `parse_capterra_rsc.py`	RSC scrape → REAL firm-size/role/industry/sub-ratings/solicitation/competitive-intel
App Store hi-res screens	`_research/harvest_visuals.py --match <bundle>`	swaps mzstatic size token for full-res; filter by bundle id (search JSON carries many apps)
Walkthrough UI frames	`_research/harvest_frames.sh` (yt-dlp + ffmpeg scene-detect)	find videos via `yt-dlp "ytsearch12:<tool> demo walkthrough"` (rzn-tools youtube returned empty); frames show the web/office UI the App Store doesn’t
Review analysis	Haiku subagents (batched)	classify sentiment/role/theme/bias per `_research/REVIEW_ANALYSIS_SCHEMA.md`

Not available: Parallel, Serper, SerpAPI, Firecrawl, Tavily, Perplexity (no keys); App Store RSS reviews feed (Apple deprecated — returns empty); rzn-tools reddit (permission_denied → use browser reddit or exa site:reddit.com).

Anti-bot reality: G2/Capterra throttle rapid repeated hits. Mitigation: 22–25s delay between page calls, retry-once on an empty page, seed from prior single-page pulls. Realistic yield ~100–250 Capterra reviews for well-reviewed tools; thin/newer tools yield less.

Per-tool pipeline

_research/pull.py — App Store (search/lookup/reviews + download screenshots), Play Store, exa (4 query angles + cited answer), YouTube. → raw/, screens/.
_research/pull_browser.py — Capterra + G2 search → match product → first-page reviews. → raw/.
_research/pull_reviews_volume.py — paginate Capterra (+G2) with delays → raw/reviews_corpus.json.
_research/make_batches.py — split corpus into Haiku batches (raw/batches/).
Haiku subagents — one per batch: Read batch → classify per schema → Write raw/analysis/batch_NN_out.json.
_research/aggregate.py — roll up → raw/analysis/SUMMARY.{json,md} (theme freq, sentiment/role split, bias counts, quotes).
Hand-write dossier.md — profile, pricing, UX narrative, embedded screens, quantified + bias-filtered review synthesis, gap-vs-our-wedge.

Bias / credibility method

Every review tagged by reviewer role (field vs office vs unknown), specificity, and a bias_flag: solicited_vague (down-weight vague 5★), disgruntled_nonproduct (price/sales/culture rants), fit_mismatch (wrong-segment, not a defect), or none (credible). Headline numbers report the credible share; complaint themes are ranked by negative-sentiment mentions from credible reviews.

Dossier structure (the locked template — read like a coherent report)

The brief must read as a continuous report in full sentences, not a worksheet. A stranger should understand what the company does before any strategic judgement appears, so the narrative builds understanding first and reaches our verdict as a conclusion. No emojis, no telegraphic bullet-fragments for the analytical sections, and no internal jargon (terms like “counter-positioning” or “talk-vs-ship gap” are tools for our thinking in _OPPORTUNITY-LENS; in the brief, explain the idea in plain English). Tables are welcome for the coverage grid, ratings and the closing assessment; prose carries everything else. Order:

Opening orientation — one short paragraph stating what the brief is for, then What the company is and what it does (plain prose: the product, the buyer, the core job, the business model).
Where it plays across the market — score 0–100 against the 21 areas in _MARKET-PROBLEM-MAP, sorted, with a prose takeaway on deep-vs-thin. Same 21 areas every time, enabling the cross-competitor coverage heatmap.
The input side — what is captured, the input methods, onboarding and ease, and the entry friction users report.
The management side — what data lands in the dashboard, which teams consume it, what they value, and whether the view is operational or commercial.
Where the value actually comes from — the sales story versus the real source of stickiness; state what not to attack and where the value stops.
What users say, both sides — lead with the credibility caveat (solicited vs organic share), then the praise and the criticism in their own words, ending with the signal that matters for us.
The opportunity for AI in this space — which parts of the job inexpensive AI can now take over versus which are structured plumbing; then what we would build (the baseline to match, the recurring pains we can solve, the specific niche to target first).
How open the platform is — API, webhooks and integrations, and what that openness means for building on top versus replacing.
The vendor’s own AI — what they have actually shipped (each item tagged shipped/beta/announced) versus what the industry is talking about; state the gap plainly; close with a confidence score on how much of their AI ambition they can realistically deliver, with the structural reasoning.
Who actually uses it — real firm-size, role and industry segmentation from the review scrape; alternatives considered and switched-from.
Our read: can we enter and win? — the strategic conclusion in prose: what is entrenched and off-limits, the verified gap, the way in, and the one situation that would undermine the approach (the “what would make us walk away” paragraph). Close with a compact plain-language assessment table. This is where the _OPPORTUNITY-LENS thinking surfaces, translated into readable judgement.
The app itself — the public mobile app and its ratings as the concrete closing artifact.
Screenshots — an Obsidian-embedded gallery (“, never copied bytes), starting with the contact sheets, linking screens/README.
Sources and method — what was verified where, and a pointer to _RESEARCH-METHOD.

DISCIPLINE: evidence over bias (no absence-claims without proof)

Never assert a competitor “lacks X” (e.g. “no commercial event”) from not-having-looked. State only what facts show. If unverified, write “not verified yet,” not “absent.” Actively search the vendor site, RFI/claims/change-order pages, dashboard/insights pages, tier matrices, API docs, and walkthrough videos before concluding a gap. (Caught in Raken pilot: Raken DOES have RFIs with cost/schedule/plan impact + a public API — earlier “no commercial event / closed” claims were bias, corrected to the precise evidenced gap: no dedicated claims/entitlement module, operational-only dashboard.)

REQUIRED: map the full product surface (don’t research only the app)

The app-store listing is the front door, not the product. Sales-led / quote-only / seat-based pricing almost always means a field→office platform (web dashboards, analytics, time→payroll, integrations) where the office side carries the invoice. Every brief must answer:

Full surface — field (mobile) vs office (web/admin) vs finance vs integrations.
Who consumes centrally — PM/ops, executives, payroll/finance, bid managers.
Why they actually get paid / what P&L line — labor-cost control? payroll automation? risk? productivity-vs-estimate? (vs the visible “nicer reports”).
Real moat — usually integrations + the office data loop, not the capture UX.
Sales motion — PLG vs sales-led, land-and-expand, contract size, buyer. Source it from the vendor site (features/product/pricing/integrations pages) + exa answer + parallel-search + funding (Tracxn/PR). Caught in the Raken pilot: it looked like a voice→PDF app; it’s actually a labor/production/payroll platform wired into Sage/QuickBooks/Viewpoint.

Lessons from the Raken pilot (apply at scale)

Capterra body = headline only (JSON-LD), not full pros/cons. Great for volume + rating distribution + bias detection, shallow for deep themes. G2 + exa carry the qualitative.
Firm-size / industry / role / solicitation — SOLVED via the RSC payload. Capterra is Next.js; the page streams complete structured review objects in self.__next_f (keys: reviewer.companySize/industry/jobTitle/timeUsedProduct, six sub-ratings, prosText/consText, reviewSource.code, incentivized, alternativeProducts, switchedProducts). Tooling: _research/capterra_dom_reviews.json (workflow: navigate → execute_javascript returns cards + raw size-script) + _research/parse_capterra_rsc.py (unescapes __next_f chunks, brace-matches each {"reviewId"...}, emits clean records) + _research/run_capterra_dom.sh (loops ?page=N) + _research/aggregate_dom.py (real firm-size/role/industry/solicitation/sub-ratings/competitive-intel rollup). Raken: 248 reviews, real segmentation, 66% vendor-solicited surfaced. ~25/page, same Cloudflare-warm-profile rule. rzn-browser act --url needs an OpenAI key (absent) → drive navigation via the workflow’s navigate_to_url step, not act. The --output-file flag needs a file_write side-effect declared; just redirect stdout (strip the log header before the first {).
Reddit is low-reliability: rzn-tools reddit = HTTP 403 (anonymous block); rzn-browser reddit search returned off-topic junk. parallel-search is the usable Reddit/community path (finds real threads) — pair with exa answer for synthesis.
EXA discipline: prefer browser / parallel-search / public connectors for raw pulls; reserve exa for synthesis + hard-to-get (cited answers, people/company specifics). exa answer gave strong independent sentiment cheaply.
HN has ~no construction-SaaS coverage — skip for this niche.
Volume that works: Capterra pagination ?page=N with ~22s delays + seed-from-prior + retry-once → all 248 in 10 pages.
Haiku batch analysis: ~43 reviews/agent, 6 agents parallel, ~60s each; disjoint file writes; aggregate.py rolls up. Cheap and consistent.

Tier plan (user chose broadest coverage)

Tier A — full dossier (screens + volume reviews + Haiku analysis + gap): Raken (sample, complete), Fieldwire, PlanRadar, Procore (Daily Log + AI agents), ConstructionDailyReport.ai, Voice Log Pro, Document Crunch, Trunk Tools, OpenSpace — plus tender/cost adds (ContraVault, Civils.ai, Pype, Kreo) per “broadest.”
Tier B — medium (reviews + pricing + summary): SiteLogs, EasySiteLog, BuildPass, Autodesk ACC, Trimble/ProjectSight, Buildots, DroneDeploy, Togal, STACK, Matrak, Sablono, Zutec, Operance.
Tier C — light (matrix line + pricing): the takeoff/AR/handover tail.

Scaling at fan-out — the packet + pipeline (10-competitor run, 2026-06-16)

The dossiers now scale via _DOSSIER-PACKET.md — the runbook handed to one background agent per competitor, each owning one folder dossiers/<slug>/. Roll-up of every agent’s returned MATRIX + COVERAGE lines lives in _CROSS-COMPETITOR (the opportunity matrix + the 21-area coverage heatmap).

Pipeline split (the architecture that makes fan-out safe):

Stage 0 — central, serial, main thread: the Capterra RSC scrape (run_capterra_dom.sh). The browser is a single shared session with anti-bot delays — NEVER parallelise it across agents. Resolve Capterra URLs with rzn-browser run capterra search (shape: output.result.results[].product_url), scrape each competitor one at a time, hand the agent its capterra_dom_corpus.json.
Stage 1+ — per-agent, parallel: pull.py (App Store/exa/reddit/youtube — API, no browser), harvest_visuals.py, harvest_frames.sh, aggregate_dom.py, vendor-site reading, and the dossier write. None touch the shared browser, so N agents run concurrently with zero contention on disjoint folders.

Fixes baked in this run (all durable):

parse_capterra_rsc.py now skips unparseable pages instead of aborting the whole corpus — a thin product (≤1 review page) returns non-JSON for the requested ?page=2, which used to kill the entire parse (Kreo lost all 25 reviews to this).
aggregate_dom.py now excludes sub-ratings < 1 — Capterra’s RSC encodes an unrated sub-dimension as 0, and averaging those zeros understated value-for-money / support by 0.5–1.2 stars and reversed conclusions (Raken value 3.92→4.54, Fieldwire 3.6→4.36, Kreo 3.16→4.39, Pype 2.75→4.12; only Procore’s “value is the weakest sub-rating” survived, 3.46→4.07). Lesson: never headline a “weakest sub-rating” claim without confirming blanks are excluded.
render_dossier_html.py now self-titles from the H1 and writes <slug>.html directly — no mv step, no stale “Raken” title.
macOS has no timeout binary — agents must run commands directly, not wrapped in timeout.

Known gap (fix before the next browser wave): some Capterra products (e.g. PlanRadar) 301-redirect /reviews/ → /#reviews (the overview), where reviews lazy-load and the workflow captures none (__next_f present but zero reviewId objects). Workaround used: the thin-Capterra path. Proper fix: add a scroll-the-reviews-container step to capterra_dom_reviews.json before the extract.

Thin-data path (validated): AI-native tools with no Capterra and no app (Document Crunch, Trunk Tools, Civils.ai, OpenSpace) still yield strong briefs from exa + funding/press + walkthrough-video frames + vendor site, provided the credibility caveat leads. Ultra-thin tools (ContraVault, ConstructionDailyReport.ai, Voice Log Pro) are folded into one cohort note (_micro-entrants/cohort) rather than padded into solo 14-section briefs.