Method

How Prewave finds early signals.

We track papers, code, patents, package ecosystems, and operator attention. Each issue is a ranked view of the shifts that still look early.

3 · Primary evidence families
7 · Core ingestion sources
1-5y · Horizon models
As-of · Evidence discipline
The Pipeline

8 steps. Evidence to issue.

This page is a public snapshot, not a frozen spec. Sources are frozen as-of a cutoff date, ranked by horizon, then reviewed into a weekly issue.

01
Ingest the core
The current benchmark core pulls from arXiv, OpenAlex, PatentsView, GH Archive, Hacker News, npm, and PyPI. Broader tracking sources exist in the wider stack, but not every source is weighted equally.
7 core sources · Source freezes · Public data
02
Build series and embeddings
We materialize source-specific time series and compute local embeddings where they help. In the current stack, embeddings are computed for arXiv, OpenAlex, and PatentsView while event and package sources stay structured.
DuckDB · LanceDB · As-of views
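The "as-of" discipline behind these views can be shown with a minimal sketch. The records, field layout, and cutoff date below are invented for illustration; the point is only that any feature computed from the view cannot see evidence that arrived after the cutoff:

```python
from datetime import date

# Hypothetical event records: (topic, observed_on, value).
events = [
    ("topic-a", date(2025, 1, 10), 3),
    ("topic-a", date(2025, 2, 20), 5),
    ("topic-a", date(2025, 3, 15), 8),
]

def as_of_view(events, cutoff):
    """Return only the evidence that was visible on the cutoff date.

    Anything observed after the cutoff is excluded, so features
    computed from this view cannot leak future information.
    """
    return [e for e in events if e[1] <= cutoff]

visible = as_of_view(events, date(2025, 2, 28))
assert len(visible) == 2  # the March record is not visible yet
```

In practice the same rule is expressed as a filtered query over the materialized tables rather than a Python list comprehension.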
03
Align micro-topics
Topics are aligned into finer micro-topic views so papers, code, patents, and package ecosystems can be compared on the same semantic footing instead of collapsing into one hype bucket.
Micro-topics · Cross-source · Lead-lag ready
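Cross-source alignment of this kind reduces to nearest-neighbor matching in embedding space. A minimal sketch, with invented topic names and toy 3-dimensional vectors standing in for real embedding-model output:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy embeddings; in practice these come from a local embedding model.
paper_topics = {"sparse-attention": [0.9, 0.1, 0.0]}
repo_topics = {
    "fast-attn-kernels": [0.85, 0.2, 0.05],
    "web-framework": [0.0, 0.1, 0.95],
}

# Align a paper micro-topic to its nearest repo micro-topic.
best = max(
    repo_topics,
    key=lambda r: cosine(paper_topics["sparse-attention"], repo_topics[r]),
)
assert best == "fast-attn-kernels"
```

Because each source keeps its own micro-topic view, the match is a mapping between sources rather than a merge into one shared bucket.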
04
Run structural signals
The current signal layer is built around the modules present in Veille-2: CD Index, transfer entropy, mutual information, wavelets, dependency graphs, developer migration, contrarian signals, and adversarial checks.
Structural · Graph · Causal
05
Infer source precedence
We look at which source moves first and which ones follow. That helps label a shift as paper-led, code-led, patent-led, or synchronized instead of pretending every mention means the same thing.
Paper-led · Code-led · Patent-led
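The simplest version of this precedence question is a lagged cross-correlation scan: shift one series against the other and see which lag fits best. A pure-Python sketch with invented weekly counts (the production stack uses transfer entropy for the directional version):

```python
def mean(xs):
    return sum(xs) / len(xs)

def corr(x, y):
    """Pearson correlation between two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def best_lag(leader, follower, max_lag=4):
    """Return the lag (in periods) at which leader best predicts follower."""
    scores = {lag: corr(leader[:-lag], follower[lag:])
              for lag in range(1, max_lag + 1)}
    return max(scores, key=scores.get)

# Toy weekly counts: code activity rises two weeks before paper activity.
code = [1, 2, 5, 9, 9, 8, 7, 6, 5, 4]
papers = [1, 1, 1, 2, 5, 9, 9, 8, 7, 6]
assert best_lag(code, papers) == 2  # looks code-led by ~2 periods
```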
06
Rank by horizon
Ranking is horizon-specific. The protocol models 1 to 5 year outcomes separately, with calibration and a preference for the simplest model that still holds up.
1-5 years · Calibrated · Benchmark policy
07
Build evidence packs
Before something becomes publishable, it needs a traceable evidence pack: source links, precedence, timing view, and what would invalidate the signal.
Linkable · Reviewable · Audit trail
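As a data structure, an evidence pack is small. The field names and example values below are illustrative, not the production schema:

```python
from dataclasses import dataclass, field

@dataclass
class EvidencePack:
    """Minimal sketch of a traceable evidence pack; field names are illustrative."""
    claim: str                 # the shift being asserted
    source_links: list         # URLs to the primary evidence
    precedence: str            # "paper-led", "code-led", "patent-led", "synchronized"
    first_seen: str            # as-of date the signal became visible
    invalidators: list = field(default_factory=list)  # what would falsify it

pack = EvidencePack(
    claim="Sparse-attention kernels gaining dependency pull",
    source_links=["https://example.org/evidence/1"],
    precedence="code-led",
    first_seen="2026-02-14",
    invalidators=["activity reverts to baseline for 8 weeks"],
)
assert pack.precedence == "code-led"
```

The invalidators field is what makes the pack reviewable: a signal without a stated failure condition is a story, not evidence.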
08
Write the issue
Editorial synthesis happens last. Each issue compresses the ranked watchlist into a short market-intelligence read: what changed, why it matters, and why it still feels early.
Weekly issue · Market intelligence · Human-reviewed
Evidence Stack

Primary evidence first. Attention second.

The benchmark core is centered on OpenAlex, GH Archive, and PatentsView or PatentSearch. The broader Veille-2 tracking universe adds arXiv, package ecosystems, and attention layers for context, confirmation, and crowding risk.

Benchmark Core
OpenAlex · Core
Academic metadata and citation structure. Useful for seeing where technical attention clusters before product narratives catch up.
Primary
Academic evidence
Good for paper-led shifts and citation structure, not just surface-level paper volume.
GH Archive · Core
Repository and event activity. Useful when builders move before analysts do, especially for code-led shifts and ecosystem formation.
Primary
Code evidence
This is often the earliest practical signal because behavior shows up before the narrative does.
PatentsView / PatentSearch · Core
Patent activity for formalized R&D and longer-dated commercial intent. More useful for precedence and horizon work than for daily noise.
Primary
Patent evidence
Patent layers matter more when asking whether a shift is investable over longer horizons.
Technical Expansion
arXiv · Active
Preprint layer for research acceleration, model shifts, and new technical language before citation graphs fully mature.
Expansion
Technical depth
Useful when a change is genuinely paper-led and still too new to look settled anywhere else.
npm / PyPI · Active
Package ecosystems expose dependency formation and integration pull. That matters when adoption shows up as infrastructure use rather than blog posts.
Expansion
Ecosystem pull
Dependency graphs often reveal which tools are quietly becoming upstream to many others.
Libraries.io / code context · Track
The broader tracking stack also watches package and code context outside the core benchmark when it helps interpret ecosystem structure.
Context
Broader code layer
These layers are useful for interpretation, but the public method no longer pretends every tracked source has equal weight.
Attention Context
Hacker News · Core
Used as an early-attention and contrarian layer. Helpful for timing and for spotting when technical movement is still under-discussed.
Context
Attention timing
Attention helps with timing. It should not outrank primary evidence on its own.
Reddit / Stack Exchange / Product Hunt · Track
Part of the wider Veille-2 tracking universe. These layers help check whether something is spreading, getting crowded, or staying niche.
Context
Crowding check
Useful as confirmation or penalty layers, not as stand-alone proof that a shift matters.
Selection

Wide intake. Narrow output.

The volume changes by source window and horizon, so the site no longer claims a fixed document count. What matters is the compression logic: public evidence in, few signals out.

Tracked stream
Raw public evidence and early-attention updates across the tracked stack
↓ source-normalized views
Aligned topics
Time series, embeddings, dependency structure, and micro-topic slices
↓ structural signals + lead-lag
Candidate shifts
Items where disruption, source precedence, or ecosystem movement actually show up
↓ horizon ranking + evidence review
Ranked watchlist
Shortlist scored by horizon, evidence quality, and crowding risk
↓ editorial compression
Issue
A short issue with the few shifts worth reading this week
Signals

Current signal stack.

We no longer present the method as a frozen detector list. This is a clearer public summary of what the current code and protocol actually use.

Structural Disruption
Change inside the technical graph, not just louder conversation.
📑 CD Index
Measures whether a document or cluster opens a new direction instead of merely amplifying the old one.
Use: Good for spotting work that breaks from prior citation paths rather than riding the same ones.
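The CD index is commonly computed over the papers that cite either the focal work or its references: citing the focal work alone signals disruption, citing both signals consolidation. A minimal sketch on an invented citation graph:

```python
def cd_index(focal, citations):
    """CD index of a focal node in a citation graph.

    citations: dict mapping each paper to the set of papers it cites.
    Scores run from -1 (consolidating) to +1 (disruptive).
    """
    predecessors = citations.get(focal, set())
    # Forward set: papers citing the focal work or any of its references.
    forward = [p for p, refs in citations.items()
               if p != focal and (focal in refs or refs & predecessors)]
    if not forward:
        return 0.0
    total = 0
    for p in forward:
        f = 1 if focal in citations[p] else 0          # cites the focal work
        b = 1 if citations[p] & predecessors else 0    # cites its references too
        total += -2 * f * b + f
    return total / len(forward)

# Toy graph: "new" cites "old"; later papers cite only "new", ignoring its roots.
cites = {
    "new": {"old"},
    "a": {"new"},
    "b": {"new"},
}
assert cd_index("new", cites) == 1.0  # fully disruptive in this toy graph
```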
Wavelet Transitions
Looks for regime changes in noisy time series rather than reacting to every spike or headline.
Use: Helpful when the shape of activity changes before the absolute level looks dramatic.
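The "shape before level" idea can be illustrated with first-level Haar detail coefficients, the simplest wavelet: they measure local variability rather than absolute magnitude. The series below are invented, and the real stack presumably uses a full multi-level decomposition:

```python
import math

def haar_details(x):
    """First-level Haar wavelet detail coefficients of an even-length series."""
    return [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]

def detail_energy(x):
    """Total energy in the detail band: high when local shape is changing."""
    return sum(d * d for d in haar_details(x))

# A series whose *shape* changes, flat early and oscillating later,
# even though the average level barely moves.
early = [10, 10, 10, 10, 10, 10, 10, 10]
late = [10, 14, 9, 15, 8, 16, 9, 15]
assert detail_energy(early) == 0
assert detail_energy(late) > detail_energy(early)
```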
🔗 Mutual Information
Checks whether variables start moving together in a way that carries information, not just correlation by chance.
Use: Useful for detecting emerging coupling between technical areas before the market names the pattern.
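For discrete series, mutual information has a direct plug-in estimate. A self-contained sketch with toy binary series (real inputs would be binned activity counts):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information in bits between two aligned series."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Coupled series share one bit of information; an unrelated one scores zero.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]   # perfectly coupled to a
c = [0, 1, 0, 1, 0, 1, 0, 1]   # balanced but carries no information about a
assert abs(mutual_information(a, b) - 1.0) < 1e-9
assert mutual_information(a, c) < 1e-9
```

Note that `c` is as active as `a`; only the joint distribution distinguishes coupling from coincidence, which is the point of using MI over raw correlation of levels.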
🔎
Precedence & Flow
Who moves first, who follows, and whether the move still looks under-attended.
Transfer Entropy
Builds directional lead-lag relationships across sources to see whether code leads papers, papers lead patents, or attention is only catching up later.
Use: The point is source precedence, not raw volume.
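Transfer entropy makes the lead-lag directional: it asks whether the past of X improves prediction of Y beyond Y's own past. A plug-in estimate with history length 1, on invented binary series where Y copies X with a one-step delay:

```python
import math
from collections import Counter

def transfer_entropy(x, y):
    """Transfer entropy X -> Y in bits, history length 1 (plug-in estimate)."""
    triples = list(zip(y[1:], y[:-1], x[:-1]))     # (y_next, y_prev, x_prev)
    n = len(triples)
    p_xyz = Counter(triples)
    p_yz = Counter((yn, yp) for yn, yp, _ in triples)
    p_z = Counter(yp for _, yp, _ in triples)
    p_zx = Counter((yp, xp) for _, yp, xp in triples)
    te = 0.0
    for (yn, yp, xp), c in p_xyz.items():
        p1 = c / p_zx[(yp, xp)]                    # p(y_next | y_prev, x_prev)
        p2 = p_yz[(yn, yp)] / p_z[yp]              # p(y_next | y_prev)
        te += (c / n) * math.log2(p1 / p2)
    return te

# y copies x with a one-step delay: information flows x -> y, not y -> x.
x = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
y = [0] + x[:-1]
assert transfer_entropy(x, y) > transfer_entropy(y, x)
```

Applied across sources, the asymmetry is what licenses a "code leads papers" or "papers lead patents" label rather than a symmetric "these move together".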
🧭 Contrarian Signal
Compares technical velocity to attention velocity. Strong technical movement with muted attention is often more interesting than synchronized hype.
Use: When builders move faster than the narrative, timing can still be attractive.
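One way to operationalize the comparison is a velocity difference between the technical and attention series. The heuristic and the toy counts below are illustrative, not the production definition:

```python
def velocity(series):
    """Recent growth: last value relative to the trailing mean."""
    base = sum(series[:-1]) / len(series[:-1])
    return series[-1] / base if base else float("inf")

def contrarian_score(technical, attention):
    """Positive when technical movement outpaces attention (illustrative heuristic)."""
    return velocity(technical) - velocity(attention)

commits = [10, 12, 15, 40]   # builders accelerating
mentions = [5, 5, 6, 6]      # attention flat
assert contrarian_score(commits, mentions) > 0  # quiet technical surge
```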
👥 Developer Migration
Tracks where maintainers and contributors shift focus, which can signal real ecosystem pull before press coverage or consensus labels appear.
Use: Talent movement can be an earlier signal than mainstream market attention.
📡
Ecosystem Structure
How a capability becomes usable, composable, or quietly central.
🔗 Dependency Graph
Reads package ecosystems as structure, not chatter. Dependency formation can reveal adoption before end-user awareness.
Use: A tool becoming upstream to many others matters more than one launch post.
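"Upstream to many others" is a reverse-reachability count over the dependency graph. A sketch with invented package names and edges:

```python
from collections import defaultdict

# Hypothetical edges: (package, dependency), "package depends on dependency".
edges = [
    ("app-a", "fast-attn"), ("app-b", "fast-attn"),
    ("lib-x", "fast-attn"), ("app-c", "lib-x"),
    ("app-a", "left-pad"),
]

def downstream_count(edges, target):
    """How many packages depend on `target`, directly or transitively."""
    dependents = defaultdict(set)
    for pkg, dep in edges:
        dependents[dep].add(pkg)
    seen, stack = set(), [target]
    while stack:
        for pkg in dependents[stack.pop()]:
            if pkg not in seen:
                seen.add(pkg)
                stack.append(pkg)
    return len(seen)

# "fast-attn" sits upstream of four packages; "left-pad" of one.
assert downstream_count(edges, "fast-attn") == 4
assert downstream_count(edges, "left-pad") == 1
```

The transitive count is what separates infrastructure from a popular leaf: app-c never names fast-attn, yet depends on it through lib-x.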
💻 Package / Repo Graph
GH Archive and package data help surface new hubs, bridges, and bottlenecks across the builder layer.
Use: Useful for seeing where usage clusters are quietly forming before the market gives them a name.
🔗 Bridge Actors
The protocol also tracks people and organizations that connect previously separate areas, because convergence is often carried by actors as much as documents.
Use: A lab, maintainer, or company appearing across both sides of a shift can matter before broad adoption does.
Guardrails
What stops the system from turning noise into a story.
🛡️ Adversarial Confidence
Stress-tests whether a signal survives perturbation instead of disappearing the moment the sampling window shifts.
Use: A fragile spike is treated differently from a persistent regime change.
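The perturbation idea can be sketched as recomputing a signal across shifted sampling windows and checking whether its sign is stable. The growth measure and the series below are invented for illustration:

```python
def growth(series):
    """Second-half mean over first-half mean, minus 1."""
    half = len(series) // 2
    first = sum(series[:half]) / half
    second = sum(series[half:]) / (len(series) - half)
    return second / first - 1

def survives_perturbation(series, window=6):
    """Does the growth signal keep its sign as the sampling window shifts?"""
    signs = set()
    for start in range(len(series) - window + 1):
        signs.add(growth(series[start:start + window]) > 0)
    return len(signs) == 1

regime_change = [1, 1, 2, 3, 5, 8, 9, 11]   # persistent rise: sign is stable
one_off_spike = [1, 1, 1, 9, 1, 1, 1, 1]    # fragile spike: sign flips with the window
assert survives_perturbation(regime_change)
assert not survives_perturbation(one_off_spike)
```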
As-of Evidence Bounds
Evidence retrieval and features are bounded in time. If it was not visible then, it does not get to explain the past.
Use: This is the basic anti-leakage rule behind the public method.
📊 Horizon Benchmarking
1 to 5 year horizons are modeled separately, so the short-term watchlist does not pretend to answer the long-term question.
Use: Different horizons can favor different models and different evidence mixes.
Ranking

Ranking, not a magic number.

The public site no longer claims a single frozen formula. Candidate shifts are ranked from structural change, source precedence, evidence quality, and horizon-specific benchmarking.

Structural change · core
Source precedence · core
Cross-source confirmation · support
Ecosystem pull · support
Attention divergence · support
Crowded narrative ⊖ · penalty
Weak evidence ⊖ · penalty
Short-lived spike ⊖ · penalty
Ranking blends structural signals, source precedence, horizon-specific model outputs, and evidence quality.
Penalty layers apply when a move looks crowded, weakly supported, or too short-lived.
Editorial judgment happens after ranking, not before.
This is a living method snapshot, not a promise of a permanent public formula.
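The blend-plus-penalty structure can be sketched in a few lines. The weights and factor names below are invented for the sketch; the site deliberately claims no fixed formula:

```python
def rank_score(signals):
    """Illustrative blend: core factors weigh most, support adds, penalties subtract.

    `signals` maps factor names to scores in [0, 1]. All weights are
    invented for illustration, not the production formula.
    """
    core = {"structural_change": 0.35, "source_precedence": 0.25}
    support = {"cross_source": 0.15, "ecosystem_pull": 0.15,
               "attention_divergence": 0.10}
    penalty = {"crowded": 0.3, "weak_evidence": 0.3, "short_lived": 0.2}
    score = sum(w * signals.get(k, 0) for k, w in {**core, **support}.items())
    score -= sum(w * signals.get(k, 0) for k, w in penalty.items())
    return score

early = rank_score({"structural_change": 0.8, "source_precedence": 0.7,
                    "ecosystem_pull": 0.5})
late = rank_score({"structural_change": 0.8, "source_precedence": 0.7,
                   "crowded": 0.9})
assert early > late  # same structural motion, but the crowded story ranks lower
```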
Lead
Pre-consensus
Paper-led or code-led movement with reinforcing evidence and relatively thin attention. This is the part of the curve Prewave cares about most.
Watch
Needs confirmation
Interesting structural motion, but the precedence or evidence quality is not decisive yet. Worth tracking, not narrating too early.
Late
Crowded
The story is already everywhere, or the evidence is too narrative-led to feel early anymore. That matters, but not for the same reason.
Output

What lands in each issue.

Each issue is intentionally short. It should tell you what changed, why it matters, what still looks early, and what to monitor next.

Active — March 2026
Few signals
Shortlist
Not a dashboard dump. Only the handful of shifts that survive ranking and editorial review.
Evidence pack
Linkable sources
Source links, evidence snippets, and the stack that made the signal interesting in the first place.
Timing view
Precedence + crowding
Whether the move looks paper-led, code-led, patent-led, synchronized, or already too crowded to feel early.
Market angle
Why now
A concise read on where it could matter, what would strengthen it, and what would invalidate it.
Benchmarking

Benchmarking is ongoing.

The protocol is designed for historical replay and 1 to 5 year evaluation, but we do not market fake certainty. The public commitment is simpler: time-bounded evidence, horizon-specific tests, and continuous method updates.

1-5y
Separate horizons
Short and long horizons are treated as different tasks, not one recycled score.
As-of
Anti-leakage
Features and evidence are bounded in time before evaluation or editorial use.
Local-first
Benchmark core
DuckDB, Parquet, LanceDB, and local LLM tooling keep the core reproducible.

Subscribe before consensus forms.

If a shift is already obvious, the timing edge is smaller. Prewave is built to surface it earlier.

Subscribe Free →