What is CiteScope, and how does it differ from commercial GEO monitoring tools like Topify / Profound / Peec.ai?

CiteScope is an MIT-licensed, self-hosted GEO (Generative Engine Optimization) monitoring platform. It runs your probe questions against six AI search engines — ChatGPT, Perplexity, Gemini, Doubao, Kimi, DeepSeek — and records the answers and citations. Commercial tools (Topify $99-199/mo, Profound $499+/mo, Peec.ai €80-120/mo) do the same thing. Three differences: (1) CiteScope is fully open source under MIT — you can audit the 100 lines of services/ai_monitors/chatgpt.py to see exactly how citations are parsed; (2) it implements span attribution — telling you whether a specific citation actually backs the sentence that mentions your brand, something none of the commercial tools do; (3) it's self-hosted — your data lives in your own SQLite file, never uploaded to any SaaS.

What is span attribution, and why doesn't any commercial tool offer it?

Span attribution is the attribution of 'which specific passage in the AI answer this citation actually supports.' Topify can tell you 'Reddit is cited 78% of the time'; CiteScope adds: 'of that 78%, 23% precisely backs the sentence that mentions your brand, and 55% backs unrelated competitor mentions or generic industry commentary.' The algorithm is 90 lines of Python (services/citation_analysis/span_attribution.py) — sentence-segment the AI answer (handling mixed Chinese/English punctuation), then find the interval intersection between brand-keyword positions and citation marker positions. Two likely reasons commercial tools don't ship it: (1) it's computationally non-trivial and the marketing pitch isn't obvious, and (2) once you ship it, you expose how shallow most citation tracking actually is.

What do I need to run CiteScope? What's the budget?

Hardware: a 1 vCPU / 1 GB RAM VM is enough (single SQLite DB + single uvicorn process). One line — docker compose up -d — gets the service running. You need at least 3 API keys: OpenAI Official (platform.openai.com/api-keys, for ChatGPT web search), Perplexity Sonar or OpenRouter, and Google AI Studio (aistudio.google.com/apikey). Optional: Volcengine Ark for Doubao, Moonshot for Kimi. Budget estimate: 100 prompts × 3 platforms × once-a-week — a typical cadence — comes out to $5-15/month in API spend. Topify charges $99/month for the same data.

How do you integrate Doubao and DeepSeek? Do they have official search APIs?

Doubao goes through Volcengine's Ark Responses API + built-in web_search tool (launched in Q1 2026), a single-turn call returning structured url_citation annotations with url / title / start_index / end_index — almost aligned with the OpenAI Responses API. CiteScope's doubao.py adapter automatically reinserts [citation:N] markers back into the body text in reverse end_index order, so span attribution works out of the box. DeepSeek's official API still has no web search tool as of 2026-05; CiteScope uses chat.deepseek.com's refresh_token + WASM POW reverse-engineering as a best-effort fallback, and will switch back to the official endpoint once DeepSeek ships one.

Is anyone running it in production? How is data security guaranteed?

It's already live — a B2B export customer (pseudonymized as X) has been running English-language AI search GEO monitoring on it for three weeks. All data sits on the customer's own 1GB VPS; every probe prompt, AI response, and citation lands in local SQLite, never uploaded to any third-party SaaS. This is the core security advantage of open-source + self-hosted over Topify / Profound: the customer's probe prompts are themselves commercial strategy (which competitors they're watching) and shouldn't exist in a third-party vendor's database.

Why open source instead of just selling SaaS?

Three reasons: (1) I'm a solo dev maintaining several other projects in parallel, with no 6-12 month full-time runway to face VC-backed Topify / Profound head-on in GTM; (2) the Open Core path is already validated by PostHog / Plausible / Cal.com / Supabase / n8n — OSS for 6-12 months to accumulate stars and feedback, then layer on a hosted SaaS — capital-efficient, with the GitHub repo itself serving as the cheapest lead-gen channel; (3) customer cases can only be cited under open source (SaaS contracts forbid surfacing customer internals). So the current playbook: open-source for 6 months to see if there's traction; if 500+ stars and repeat demand for hosted, spin up citescope.cloud at $19-49/mo (undercutting Topify).

OPEN SOURCE · GEO · CITATION ANALYSIS

Open-Sourcing CiteScope: Same Thing Topify Does for $99/Month, Plus One Move They Skipped

Customer-case-driven + span attribution as the differentiator + a solo dev's Open Core calculus between SaaS and OSS

May 24, 2026 · Yao Yuheng GEO/AEO Open Source Citation Analysis

By mid-2026, how customers find suppliers has quietly shifted. The B2B export customer I work with (pseudonymized as X) currently sits at 0% AI visibility on ChatGPT / Perplexity / Gemini. The CEO wants to know two things: how do we get into the AI shortlist, and where are the competitors' AI munitions stockpiled? I surveyed the GEO monitoring tools on the market — Topify at $99-199/mo, Profound at $499+/mo, Peec.ai at €80-120/mo — and they all do roughly the same thing. But there's one move none of them makes. So I spent two weeks building an open-source alternative, CiteScope, which launches today.

GitHub piglet12138/CiteScope → vs commercial tools architecture deep dive

$An Edward Tufte-style modern infographic: a beam of horizontal light enters from the left into a transparent triangular prism, which refracts it into five parallel colored bars (cobalt blue / teal / deep orange / mustard / violet) fanning out toward the right. Each colored bar has a blank circular annotation pin at its top. Below the prism a horizontal axis scatters six small dots representing citation source domains. White background, no text, no logo, pure geometric abstraction — a visual metaphor for how CiteScope refracts AI output (white light) into an analyzable spectrum of sources (the citation spectrum).$

AI search traffic is taking purchase intent away from Google

If you're still running brand growth in 2026, you've probably noticed an uncomfortable pattern: a meaningful slice of high-purchase-intent traffic is migrating from Google to AI chat windows.

When a customer asks ChatGPT "who are the best B2B suppliers in category X for 2026" or asks Perplexity "which service providers should I look at for industry Y in North America," what they get back is a 200-word answer naming 3-5 vendors with clickable citation links. They don't scroll through ten blue links anymore. They pick directly from the AI's shortlist.

If your brand isn't on that shortlist, you're invisible — even if you rank #1 on Google for the same keyword.

This is GEO (Generative Engine Optimization), or AEO (Answer Engine Optimization), the term that started gaining wider use around mid-2026. Same meaning: optimize for visibility in AI search, not just Google.

The space is hot. Topify, Profound, Peec.ai, Otterly, AthenaHQ — all surfaced in the past 18 months, all charging $80-500/mo. But they're fundamentally selling the same thing: fire your probe questions at the AI engines, record the answers and citations, draw a few charts.

The customer's real need: how to climb out of 0% AI visibility

My GEO practice started with a real customer diagnosis. Pseudonym X, mainly B2B export business (OEM/ODM categories), target markets Europe and the US. Background:

Owns its own factory (real production capacity), holds audit certifications (within the LRQA / Intertek systems)
Historically acquired customers via trade shows + Alibaba International Station
Mid-2026 the CEO realized large buyers increasingly start supplier shortlisting on ChatGPT / Perplexity / Gemini
Initial audit: 0% visibility for the customer's brand in English-language AI search. Competitors (Competitor A 29%, Competitor B 34%, Competitor C 24%) are already AI-recommended

The CEO's ask boils down to three questions:

Is my brand mentioned by AI at all? Mention rate across engines and time windows
Which sites does AI cite when answering category questions? Those sites are the "munitions stockpile" I need to seed content into
What are my competitors' AI assets, exactly? When AI mentions a specific competitor, which URLs back that mention — letting me reverse-engineer their GEO content placement

The first question is easy — run probes, look at mention rate; every commercial GEO tool does this. The second is medium difficulty — a Top Domains report aggregated by domain; most mainstream tools also handle this. The third is the critical one: commercial tools mostly stop at "list the total URLs cited for this competitor," but can't precisely say "in the AI sentence that mentions this competitor, which URL is the source backing it." That's the real reverse attribution.

I surveyed the existing tools

Before writing any code I signed up for trials of all the major English-language GEO monitoring SaaS. Rough taxonomy:

Tool	Pricing	AI engine coverage	Differentiator	What's missing
Topify	$99-199/mo	7+, the only one covering Doubao/DeepSeek/Qwen	Broadest Chinese-ecosystem coverage + Source Analysis	Basic plan's 100-prompt cap is tight for agencies
Profound	$499+/mo (quote-based)	10+ (no Doubao)	Enterprise polish + HubSpot/Salesforce integrations	Expensive; weak Chinese-engine coverage
Peec.ai	€80-120/mo + add-ons	5 (DeepSeek is an €80 add-on)	Lean European pricing	Narrowest engine coverage
Otterly	~$50/mo	5	Friendly to small brands / freelancers	Shallow data depth
AthenaHQ	Enterprise quote	5+	Multi-client management for agencies	Low transparency + no Chinese

What they all share: all SaaS, none disclose their data-processing details, none do span attribution, and none let you build custom AI engine adapters. Differentiation lives mostly in workflow polish + integration breadth + AI engine coverage count. The underlying data isn't a moat — the same probe prompt hitting the same OpenAI Responses API returns the same citations. What commercial SaaS sells is the "save two days of self-hosting work" delta.

The entire category is selling a $20/month service for $200/month. Objectively, for small-to-mid brands without engineering capacity, the markup is reasonable. But for teams with engineering capacity + a desire to customize + a stake in data sovereignty, there's a huge arbitrage gap.

Three reasons I built my own

1. Data sovereignty: probe prompts are themselves commercial strategy

A customer's probe prompts look like generic industry questions on the surface, but a carefully designed prompt set (which categories, which competitor names, which use-case scenarios) actually exposes their go-to-market thinking. Putting that into a third-party SaaS database = your strategy is transparent to the vendor. For someone like X, mid-flight on a GEO breakout plan, uploading the prompt set to a US SaaS vendor (however reputable) is an internal compliance event requiring sign-off. Self-hosted SQLite eliminates that friction entirely.

2. Differentiation: I wanted to ship span attribution

This is the real differentiating feature. Commercial tools, at best, tell you:

"Last week, Reddit was cited in 78% of AI answers for your category."

What I want is one more sentence:

"Reddit was cited 78%. Of that, 23% precisely backs the sentence that mentions your brand — versus 55% that backs unrelated competitor mentions or generic industry commentary."

The data-driven actions for those two versions are completely different. Version one only lets you say "Reddit is a high-frequency citation source" — useless, every competitor knows that. Version two lets you distinguish "Reddit is a generic citation source in my category" vs. "Reddit is a real endorsement source for my brand" vs. "Reddit cites everyone in my category except me." That last case signals a real content gap (gap analysis).

3. China AI engine coverage: Doubao / Kimi / DeepSeek

If the customer only does Europe/US export, international AI engines are enough. But X also wants to cover domestic inquiry scenarios (some large Chinese buyers ask Doubao for supplier recommendations), and at the next stage I plan to expand GEO services to domestic brands. That means Doubao / Kimi / DeepSeek must be supportable. Topify is the only commercial tool covering international + Chinese engines together, and I talked to them about a trial, but their Doubao path runs cookie reverse-engineering with high account-token failure rates. I wanted the official API route — and Volcengine Ark's Responses API + built-in web_search tool, launched in Q1 2026, fits perfectly.

What CiteScope is

PROJECT · MIT LICENSE

CiteScope

A self-hosted GEO monitoring platform. See who AI search engines actually cite — the real visibility picture for your brand and your competitors. An MIT-licensed open-source alternative to Topify / Profound / Peec.ai, with one extra capability none of the commercial tools ship: span attribution.

Unified schema across 6 AI engines (ChatGPT / Perplexity / Gemini / Doubao / Kimi / DeepSeek) Citation domain aggregation (Top Domains) + competitor reverse attribution (Competitor Assets) Gemini grounding redirect auto-resolved by background worker Span attribution — sentence-level link between citations and brand mentions (exclusive) Run experiment comparison — matrix + radar chart for evaluating GEO interventions side-by-side In-app embedded tutorial hub + 8 standalone module guides Runs on a single SQLite + 1 vCPU / 1 GB

GitHub → vs commercial tools Chinese README

The UI looks like this — the overview page shows 7-day mention rates across all customers:

CiteScope overview page screenshot: a dark-themed left sidebar lists navigation groups (Overview, Brands, Task Queue, LLM Usage, Tutorials, File Library, Settings). The main area shows a 'Monitored Targets Overview' card displaying Acme Sourcing's 56.5% mention rate over the past 7 days, industry B2B Export · Apparel & Accessories, status active, trend 0%. — Overview · cross-customer mention rate aggregation — see at a glance which brand's AI visibility is rising and which has stalled.

Each monitored target has its own "probe question bank," importable via CSV, with questions tagged by category / competitor / use-case:

Customer detail · probe question bank + CSV bulk import. Categories drive downstream aggregation dimensions.

The core unit of work is a Run — one Run = one set of probes × multiple AI engines × one timestamp. The monitoring center's "Overview" tab gives you the customer's KPI dashboard:

CiteScope monitoring center Overview tab screenshot: three KPI cards at top show 56.5% mention rate, 56.5% first-position rate, 56.5% cited-with-link rate. Below is a 'mention rate trend' area chart with a 0%-100% y-axis and date points on the x-axis. Further down is a per-platform breakdown. Top right: Today / This Week / This Month toggles + refresh + new experiment run buttons. — Monitoring center Overview tab · KPIs + trend + per-platform breakdown. This page answers "what's the current state."

Architecture, 30-second version:

React + Vite + Antd 5
        │ HTTP/REST
        ▼
FastAPI + uvicorn + SQLAlchemy + APScheduler
        │
        ├──→ AI adapters ×6 (ChatGPT/Perplexity/Gemini/Doubao/Kimi/DeepSeek)
        ├──→ SQLite (WAL mode)
        └──→ background citation resolver job (one Gemini wrapper batch every 5 min)

Single process + single DB file, no Redis / Celery / Postgres required to start. 1 vCPU / 1 GB RAM is enough. One line — docker compose up -d — and you're running, open localhost:3000.

The core differentiator: what span attribution is

This is the biggest single difference between CiteScope and the commercial tools. Worth a section to itself.

AI answers usually carry [citation:N] markers (or OpenAI's url_citation annotations, Gemini's groundingMetadata, Volcengine Ark's annotations). Each citation has a start_index and end_index, pointing to the span of text in the AI answer it's meant to support.

Commercial tools use only half of this information when they ingest citations: URL + title. They throw away the span info. So the best they can tell you is "this domain was cited N times."

CiteScope does one extra step: it computes both the positions of [citation:N] markers and the positions of brand keywords, splits the AI answer into sentences, and checks whether a citation falls in the same sentence as a brand mention. If yes, that citation actually backs your brand; if not, it's just generic citation.

The algorithm is 90 lines of Python (backend/app/services/citation_analysis/span_attribution.py). Core logic:

def compute_supports_brand_mention(
    raw_answer: str,
    brand_keywords: list[str],
    cite_indices: list[int],
) -> dict[int, bool]:
    # 1) find all char-spans where brand keywords appear (case-insensitive)
    brand_spans = find_keyword_spans(raw_answer, brand_keywords)

    # 2) find all [citation:N] marker positions, grouped by N
    markers_by_n = find_citation_markers(raw_answer)

    # 3) sentence-split (mixed Chinese/English punctuation),
    #    for each sentence: if it contains both a brand_span and a marker → that N = True
    supported = set()
    for s, e, _ in iter_sentence_spans(raw_answer):
        has_brand = any(span_overlaps(bs, be, s, e) for bs, be in brand_spans)
        if not has_brand:
            continue
        for n, ms in markers_by_n.items():
            if any(span_overlaps(ms_s, ms_e, s, e) for ms_s, ms_e in ms):
                supported.add(n)

    return {n: (n in supported) for n in cite_indices}

The hard part isn't the algorithm, it's three details: (1) sentence segmentation for mixed Chinese/English text (Chinese full-stops typically have no trailing space, and English periods butted up against Chinese characters also need to split); (2) normalizing the citation marker formats returned by each AI engine; (3) reinserting markers into the original body text (OpenAI / Ark return span indices, so you need to insert [citation:N] back in reverse end_index order so downstream code can run).

I stepped on each of these once. OpenAI's chat completions + search-preview doesn't return spans (only an annotations array), so the ChatGPT adapter is currently stuck doing "answer-level" attribution only; Gemini's grounding_supports has segment.start_index/end_index that needs manual reassembly; Ark's url_citation aligns closely with OpenAI Responses but its token-accounting field names are input_tokens/output_tokens. The full alignment work took two weeks.

Why don't commercial tools do this? My guess: two reasons. (1) Compute cost isn't trivial and the marketing pitch isn't obvious — "Reddit cited 78%" looks more striking than "Reddit backs your brand 30%"; (2) once you ship it, you expose how shallow most "citation tracking" actually is. It's a committing-to-ship problem, not an algorithm problem.

Real data: Top Domains pulled from customer X

Customer X ran three Runs (baseline → after editing llms.txt → after adding FAQ schema), across 3 weeks, ~700 citations. The Top Domains report — the "AI munitions stockpile" — came back as:

Domain	Citations	Cross-platform	Reading
`made-in-china.com`	40	3 platforms	B2B sourcing top tier, AI treats it as authoritative
`alibaba.com`	32	2 platforms	Same tier, slightly lower ChatGPT preference
`competitor-a.com`	31	2 platforms	Competitor A's own site — strong SEO signal
`competitor-b.com`	20	3 platforms	Competitor B's own site
`competitor-c.com`	19	3 platforms	Competitor C's own site
`industry-listicle-a.com`	19	3 platforms	Industry "top 10" listicle article
`jingsourcing.com`	12	3 platforms	Sourcing-agent blog
`industry-listicle-b.com`	11	1 platform (Gemini)	"Hidden" authority surfaced by Gemini

This list translates directly into three content placement actions:

marketplace listing optimization: round out the customer's listing on made-in-china.com / alibaba.com (certifications, product specs, video) — top-tier authorities
cluster journalism inclusion: for "top 10 industry supplier" blogs like industry-listicle-a.com / industry-listicle-b.com, find author emails and pitch them to add the customer (with certifications + case studies)
competitor reverse engineering: study competitor-a.com / competitor-b.com / competitor-c.com — their page schema, FAQ blocks, customer case write-ups — and benchmark the gap

More importantly, the Run comparison view lets you validate the causal effect of GEO interventions. The figure below shows X's three-Run comparison — the matrix shows mention rate climbing from baseline 0% → 30.4% after editing llms.txt → 56.5% after adding FAQ schema, and the radar chart shows the per-platform preference differences simultaneously:

CiteScope monitoring-center Compare tab screenshot: the top half is a 'Platform × Run mention-rate matrix' table, listing After FAQ schema markup at 56.5% (n=23) overall, ChatGPT 42.9% (n=7), Google AI 50.0% (n=8), Perplexity 75.0% (n=8); After llms.txt deploy at 30.4% (n=23) overall, etc. The bottom half is a 'per-platform mention rate radar chart' with the blue polygon (After FAQ schema markup) noticeably wider than the green polygon (After llms.txt deploy), with three vertices labeled ChatGPT / Google AI / Perplexity. — Run comparison view · matrix + radar chart for side-by-side intervention evaluation. Editing llms.txt boosted Perplexity the most (30.4% → 75%); FAQ schema impacted ChatGPT most deeply.

Without Run comparison you can't do this kind of causal attribution. CiteScope treats each monitoring pass as a versioned Run (with a name + note), and cross-Run comparison is the essential difference from Topify's "run automatically once a week and draw a line chart" — the latter only shows trends, the former lets you test hypotheses.

How to run it yourself

One command:

git clone https://github.com/piglet12138/CiteScope.git
cd CiteScope
cp backend/.env.example backend/.env
# open backend/.env and fill in three keys:
#   OPENAI_OFFICIAL_API_KEY=sk-...
#   PERPLEXITY_API_KEY=...
#   GOOGLE_AI_API_KEY=...
docker compose up -d
# open http://localhost:3000 in your browser

The key-entry page looks like this — it lands you on the "AI Search / Monitoring Platforms" tab by default, with one card per platform, plus a sign-up link and a test-connection button:

API key signup:

OpenAI Official: platform.openai.com/api-keys (for ChatGPT web search)
Perplexity Sonar: perplexity.ai/settings/api (or via OpenRouter — set PERPLEXITY_API_BASE to https://openrouter.ai/api/v1)
Google AI Studio: aistudio.google.com/apikey
Optional Volcengine Ark: volcengine.com/product/ark (Doubao)

Budget estimate: 100 prompts × 3 platforms × once a week — a typical cadence — comes out to $5-15/month in API spend. Topify charges $99/month for the same data.

After setup, hit localhost:3000 → Brands → New Customer → enter probe questions → Monitoring Center → New Experiment Run. Five minutes later you have Top Domains + Competitor Assets reports under the Citation Sources tab.

The full user manual is embedded in-app — no shoving users out to a GitHub wiki, just hit the /guides route. 8 standalone module guides, markdown-rendered:

CiteScope tutorials overview hub page screenshot: header text explains 'CiteScope is a GEO (Generative Engine Optimization) monitoring platform. Below are user guides organized by module,' with the suggestion that newcomers start with '5-step quickstart.' Below is a grid of 8 guide cards organized by category (Getting Started / Configuration / Workflow), titled '5-step quickstart,' 'API configuration deep dive,' 'Monitored targets + probe questions,' 'Experiment runs (Run),' etc., each with title + number + one-line summary. — In-app tutorial overview · 8 guides organized by Getting Started / Configuration / Workflow / Data Analysis / Operations.

CiteScope single guide reading page screenshot: 'Citation Sources user guide,' top-left 'Back to Tutorials Overview' button, below is markdown body showing sections like 'What problem does this feature solve,' 'Three blocks on this page,' and detailed field-explanation tables (Field / Meaning two columns: Total citation count / Total citation rows captured across all monitoring runs for this customer, etc.). — Single guide reading page · markdown-rendered, with field-explanation tables + how-to-read + FAQ.

Why open source instead of selling SaaS

This is the question I asked myself most over the past two weeks. Packaging this real customer case + the differentiating feature + the already-built product into a $49/mo SaaS is theoretically viable. I still chose open source. Three reasons.

1. A solo dev doesn't have a 12-month full-time runway

I'm running the customer's GEO work, Lucky GitHub agent, claude-ai-harness, the claude-zh CC fork, and several other projects in parallel. Commercializing CiteScope means 6-12 months of full-time GTM (landing page / SEO / content marketing / customer support / international payment compliance) going head-to-head with VC-backed Topify / Profound. Solo + China-based entity + selling international B2B SaaS — the odds are too thin.

2. Open Core is already a validated playbook

PostHog (analytics) / Plausible / Cal.com / Supabase / n8n are all OSS core + later-layered hosted SaaS, starting with 1-2 people, hitting $100K+ ARR within 2-3 years. The pattern is identical:

First 6 months: focus on open source — accumulate stars / community feedback / early power users
Watch for the recurring "I'd use this but can't self-host" signal
Then launch the hosted version, sold to that segment (selling hosting + multi-tenant + priority support, not the core feature)

This path is the most capital-efficient — near-zero marginal cost in the first 6 months, with time investment concentrated on maintenance + issue triage + blog posts (5-10 hours/week), parallelizable with my other projects.

3. Customer cases can only be cited under open source

SaaS contracts forbid surfacing a customer's probe prompts / citations as a public case study (violates the customer contract). Open source can: because the customer self-hosts, all data lives in their own SQLite, and what I show is "data they ran with my OSS," not "data I pulled out for them." That kind of case study is more credible than any number of product screenshots.

Roadmap

The concrete 6-month path:

Week 1-2 (now): post + submit to Show HN / Reddit / Jike / Zhihu, collect the first batch of stars
Month 1: collect signals — GitHub stars / issues / who forks, watch for anyone opening "feat: hosted version please" issues
Month 2-3: ship v0.2 features (Postgres / more AI adapters / Slack alerting)
Month 4: decision point — if 500+ stars + 5+ unsolicited inquiries about hosted, launch hosted version (citescope.cloud)
Month 5-6 (if hosted launches): multi-tenant + Stripe + pricing at $19/49/199 per month (all below Topify's $99 starting tier). Customer case studies on the homepage

The inverse — when to skip open source and go straight to SaaS: 12 months of full-time runway / 5+ customers already lined up / a EU/US legal entity for clean payment processing. None of those hold for me, so the path is set.

Feedback I'm hoping to get

Bug reports, any flavor — adapter output not matching the vendor's real response, span attribution mishandling some Unicode edge case, operations issues
New adapter contributions, for engines I don't yet cover — Bing Copilot / Claude.ai / You.com / Brave Search are all on the wishlist. The adapter template is simple — copy chatgpt.py, change 60 lines, and you've got a new engine
Use-case stories — what would you do with this data that commercial tools don't let you do? Tell me, and I'll prioritize features that unlock those workflows

GitHub Issues / Discussions are open. English or Chinese, both work.

One last thing

GEO becomes table stakes by 2027. Every brand that cares about discoverability will measure it. The question is how — a $99/mo Topify dashboard, or your own $0/mo SQLite file plus the freedom to fork and extend.

I'm betting on the second world. If you're betting the same way, give us a star, run a Run, and tell me what's missing.

Original · First Publish

This is an original technical article, first published on sg.yaoyuheng2001.me

Canonical URL: https://sg.yaoyuheng2001.me/posts/open-source-citescope/en
First Published: May 24, 2026
Last Revised: May 24, 2026
Author: Yao Yuheng (姚钰珩) · NTU Data Science M.Sc.
Related Project: piglet12138/CiteScope
Length: ~5,800 words · reading time ≈ 12 minutes

This article is licensed under CC BY-NC 4.0. For reuse, please credit the source, preserve author attribution, and include the canonical link. For AI training / summarization citation, please preserve the canonical URL: https://sg.yaoyuheng2001.me/posts/open-source-citescope/en. For commercial use, please contact the author via the blog for authorization.

Yao Yuheng / 姚钰珩

NTU Data Science M.Sc. Focused on AI Agent systems engineering, eval-driven development, LLM applications + GEO/AEO monitoring. Building CiteScope and claude-ai-harness.

Blog· GitHub· Juejin· Substack· RSS