Methodology

How The Good Index finds, scores, and organizes the signals that matter most.

How We Find What Matters

Step 1

Ingest

RSS and Atom feeds are fetched on a configurable schedule. New articles are deduplicated by URL and stored with their full content.
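As a rough illustration of URL-based deduplication, the sketch below keeps only articles whose URL has not been seen before. The function name, the dictionary layout, and the trailing-slash normalization are assumptions made for this example, not details of the actual pipeline.

```python
def deduplicate_by_url(incoming, seen_urls):
    """Return only the articles whose URL has not been stored before.

    `incoming` is a list of dicts with a "url" key (hypothetical layout).
    `seen_urls` is a set that is updated in place, so repeated calls
    across fetch cycles stay consistent.
    """
    fresh = []
    for article in incoming:
        # Assumed normalization: trim whitespace and a trailing slash so
        # trivially different spellings of one URL count as the same key.
        key = article["url"].strip().rstrip("/")
        if key in seen_urls:
            continue
        seen_urls.add(key)
        fresh.append(article)
    return fresh
```

Calling the function a second time with the same batch and the same `seen_urls` set returns an empty list, which is the behavior a scheduled fetcher needs.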

Step 2

Embed

Each article's title and content are converted into a 1536-dimensional vector embedding using OpenAI's text-embedding model. These vectors capture semantic meaning.

Step 3

Cluster

Related articles are grouped using agglomerative clustering based on cosine similarity between their embeddings. This ensures coverage of the same story is unified.
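Cosine similarity, the measure named above, can be sketched in a few lines. The example uses short toy vectors in place of real 1536-dimensional embeddings, and the function name is chosen for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score 1.0 and orthogonal vectors score 0.0, regardless of their lengths, which is why the measure suits comparing embeddings of articles that differ in size.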

Step 4

Score

Each article is evaluated by AI across multiple configurable dimensions (e.g., novelty, commercial viability, funding). Each dimension is scored from 1 to 10 against a rubric.

Step 5

Synthesize

AI generates landscape reports connecting themes across clusters. These syntheses identify trends, patterns, and emerging developments in the domain.

Step 6

Track

Signals are monitored over time with trajectory analysis. Trends are classified as accelerating, growing, plateauing, or declining based on score velocity.
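One possible way to turn score velocity into the four labels is sketched below. The page does not state how velocity is computed or where the cut-offs lie, so the per-step differences, the early-versus-late comparison, and the `flat_band` value are all assumptions for illustration.

```python
def classify_trend(scores, flat_band=0.1):
    """Label a chronological, evenly spaced series of scores.

    `flat_band` is a hypothetical tolerance: average changes smaller
    than this (in either direction) count as a plateau.
    """
    if len(scores) < 3:
        raise ValueError("need at least three observations")
    # Velocity: change in score from one observation to the next.
    deltas = [later - earlier for earlier, later in zip(scores, scores[1:])]
    velocity = sum(deltas) / len(deltas)
    # Compare early and late velocity to detect acceleration.
    half = len(deltas) // 2
    early = sum(deltas[:half]) / half
    late = sum(deltas[half:]) / (len(deltas) - half)
    if velocity < -flat_band:
        return "declining"
    if velocity <= flat_band:
        return "plateauing"
    if late - early > flat_band:
        return "accelerating"
    return "growing"
```

Under these assumptions, a series such as 1, 2, 4, 8 is labeled accelerating because its later steps are larger than its earlier ones, while 2, 3, 4, 5 is labeled growing because it rises at a constant rate.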

Filtering for Relevance

Before scoring, every article is assessed for relevance to the ethical innovation domain. Articles that don't pass the relevance threshold are filtered out to keep the feed focused and high-signal.

Relevance Scoring Rubric

80-100: Core ethical innovation (breakthrough, major investment, new technology)

60-79: Strongly related (sustainability initiative, social impact, public health advance, progressive social policy, urban mobility innovation)

40-59: Tangentially related (business with ethical angle, infrastructure improvement)

20-39: Weakly related (minor environmental mention)

0-19: Not relevant
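The rubric bands can be expressed as a simple lookup, sketched here. The band labels come from the rubric above; the function names and the `relevance` field are hypothetical, and the threshold is left as a parameter because the page does not state its value.

```python
# Lower bound of each band, highest first, with the rubric's label.
RELEVANCE_BANDS = [
    (80, "Core ethical innovation"),
    (60, "Strongly related"),
    (40, "Tangentially related"),
    (20, "Weakly related"),
    (0, "Not relevant"),
]


def relevance_band(score):
    """Map a 0-100 relevance score to its rubric label."""
    if not 0 <= score <= 100:
        raise ValueError("relevance score must be between 0 and 100")
    for lower_bound, label in RELEVANCE_BANDS:
        if score >= lower_bound:
            return label


def filter_relevant(articles, threshold):
    """Keep articles whose "relevance" field meets the given threshold."""
    return [a for a in articles if a["relevance"] >= threshold]
```

For example, `filter_relevant(articles, threshold=40)` would keep everything from "Tangentially related" upward; the value 40 is only an illustration.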

How Signals Are Scored

Every signal is evaluated across multiple dimensions that capture real-world impact, momentum, and novelty.

Reading the Radar

The radar chart visualizes an article's score profile across all dimensions. Different shapes indicate different types of signals.

Early signal

Novel approach to an underserved problem. Worth watching closely.

Opportunity signal

A major problem that demands attention, with few teams building solutions.

Established player

Mature sector with incremental progress.

Trusting the Sources

Not all sources are equal. We assess credibility to ensure high-quality signals rise to the top.

Domain Reputation

Established track record, publishing history, and recognition within the domain. Major outlets and peer-reviewed journals score highest.

Editorial Standards

Evidence of editorial oversight, correction policies, and separation of opinion from reporting. Sources with clear editorial processes are weighted more heavily.

Fact-Checking

Track record of accuracy, use of primary sources, and citation of evidence. Sources that regularly cite data and link to primary research score higher.

Transparency

Clear disclosure of funding, conflicts of interest, and methodology. Organizations that openly share their methods and affiliations are rated higher.

Grouping Related Stories

Articles covering the same story are automatically grouped using vector embeddings and agglomerative clustering. Each article's content is converted to a 1536-dimensional embedding, then compared using cosine similarity to find related coverage.

Similarity Threshold

0.65

Minimum cosine similarity for two articles to be considered related

Merge Threshold

0.70

If two cluster centroids exceed this similarity, they are merged

Min Cluster Size

Minimum number of articles required to form a cluster

Window

7 days

Only articles within this time window are clustered together
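The settings above could drive a centroid-based agglomerative pass like the sketch below. It is an illustration under stated assumptions, not the production algorithm: it applies only the 0.70 merge threshold and the 7-day window, treats the window as "published within 7 days of a reference time", and takes the minimum cluster size as an argument because no value is given. The 0.65 pairwise threshold is not modeled, since the page does not say how the two thresholds interact. The article dictionary layout is hypothetical.

```python
import math
from datetime import datetime, timedelta

MERGE_THRESHOLD = 0.70      # merge threshold from the settings above
WINDOW = timedelta(days=7)  # clustering window from the settings above


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cluster_articles(articles, now, min_cluster_size):
    """Greedy centroid-linkage agglomerative clustering (illustrative).

    Each article is a dict with "embedding" (list of floats) and
    "published" (datetime) keys; this layout is an assumption.
    """
    # Assumed window rule: keep articles published in the last 7 days.
    recent = [a for a in articles
              if timedelta(0) <= now - a["published"] <= WINDOW]
    clusters = [[a] for a in recent]  # start with one cluster per article
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i in range(len(clusters)):
            ci = centroid([a["embedding"] for a in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([a["embedding"] for a in clusters[j]])
                sim = cosine(ci, cj)
                if sim > best_sim:
                    best_sim, best_pair = sim, (i, j)
        # Stop once no pair of centroids exceeds the merge threshold.
        if best_sim <= MERGE_THRESHOLD:
            break
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))
    return [c for c in clusters if len(c) >= min_cluster_size]
```

Recomputing every centroid on each pass is quadratic and only suitable for small batches; it keeps the sketch short and makes the stopping rule easy to see.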

Spotting Red Flags

Greenwash Risk

Risk that this cluster contains misleading environmental claims. Based on ratio of corporate PR to independent sources, specificity of claims, and coordinated timing.


AI Assessment Guidance

“Consider ratio of corporate PR to independent sources, vague vs specific claims, coordinated release timing.”
