Last updated: June 2026

How We Score and Review AI Courses, Tools, and Platforms

Disclosed methodology for every comparison published on forzebras.ai. Hands-on testing, transparent weighting, and explicit disclosure of where commercial relationships influence placement.

The composite score

Every product on every comparison page receives a single composite score from 1.0 to 10.0. The score is the weighted combination of three components. Exact weights vary by category to reflect what matters most in that space.

1. Capability and quality - 40% to 60%

Hands-on evaluation by our editorial team against a fixed task set per category:

AI courses: curriculum depth, learning outcomes, pacing, instructor credibility, accuracy, and how current the material is.
General AI tools (ChatGPT, Claude, Copilot, Gemini, etc.): task performance across writing, analysis, and reasoning; output quality and reliability; context handling.
AI tools for seniors: ease of setup and daily use, accessibility, default safety settings, quality of help documentation and support.
AI coding agents (Cursor, GitHub Copilot, Claude Code, etc.): code quality and correctness, IDE integration, context window usage, autonomy vs. safety trade-offs.
MCP servers and clients: install time, latency under load, error handling, protocol coverage, maintenance freshness.

This is the highest-weighted component. A reliable tool that works for your use case beats a feature-rich one that doesn't.

2. Adoption signal - 20% to 35%

Independent, third-party signals that a product is genuinely used: install volume, platform ratings, GitHub activity, mentions in credible publications, and community breadth. Adoption is a proxy for maturity - widely used products tend to get faster bug fixes, more integrations, and lower onboarding friction.

3. Trust and value - 15% to 25%

Pricing transparency, vendor accountability, license clarity, and support quality. For courses: certificate value and whether the price is honest relative to what you get. Tools with opaque pricing, absent maintenance, or misleading claims score lower here regardless of raw capability.
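
To make the arithmetic concrete, here is a minimal sketch of how a three-component composite could be computed. It assumes the "weighted combination" is a plain weighted average, that each component is itself scored from 1.0 to 10.0, and that the result is rounded to one decimal place; none of these details are stated above. The function name, the 50/30/20 split, and the component scores are hypothetical illustrations, not figures from a real review.

```python
# Published weight ranges for the three components (taken from the text above).
WEIGHT_RANGES = {
    "capability": (0.40, 0.60),
    "adoption": (0.20, 0.35),
    "trust": (0.15, 0.25),
}


def composite_score(scores, weights):
    """Weighted average of component scores (assumed to be on a 1.0-10.0 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    for name, weight in weights.items():
        low, high = WEIGHT_RANGES[name]
        if not low <= weight <= high:
            raise ValueError(f"{name} weight {weight} is outside {low}-{high}")
    total = sum(scores[name] * weights[name] for name in weights)
    return round(total, 1)  # assumption: scores are reported to one decimal


# Hypothetical category weights and component scores.
weights = {"capability": 0.50, "adoption": 0.30, "trust": 0.20}
scores = {"capability": 8.0, "adoption": 6.0, "trust": 7.0}
print(composite_score(scores, weights))  # prints 7.2
```

Because the weights sum to 1.0 and every component sits between 1.0 and 10.0, the result in this sketch always lands in the 1.0 to 10.0 range without extra clamping.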

Scoring matrix by content type

The exact criteria and weights differ by content type. The tables below show what we measure and how much each criterion counts.

AI courses and certificates

Criterion	Weight	What we evaluate
Teaching quality and depth	30%	Instructor expertise, explanation clarity, pacing, problem-set quality, curriculum structure
Learning outcomes	20%	What you can actually do after completing it; skill transferability to real work
Content freshness	15%	How current the material is; whether examples reflect tools and models available today
Adoption signal	20%	Enrollment numbers, completion rate signals, third-party reviews, employer recognition
Pricing and value	15%	Cost vs. what you get, audit availability, certificate cost relative to market

General AI tools (ChatGPT, Claude, Copilot, Gemini, Perplexity)

Criterion	Weight	What we evaluate
Task performance	35%	Writing, analysis, summarization, and reasoning quality across a fixed task set
Context and memory handling	15%	Context window size, custom instructions, conversation coherence over long sessions
Reliability	15%	Hallucination frequency, factual accuracy, refusal handling, uptime
Adoption signal	20%	Active user count, third-party integrations, press and community signals
Pricing and transparency	15%	Free tier value, paid plan clarity, overage policy, data handling transparency

AI tools for seniors and non-technical users

Criterion	Weight	What we evaluate
Ease of setup and daily use	35%	Onboarding steps, interface clarity, error recovery, mobile vs. desktop accessibility
Safety defaults	20%	Content filters, scam resistance, data-sharing defaults, account recovery options
Help and support quality	15%	Documentation readability, human support availability, community resources
Adoption signal	15%	Install counts, user reviews mentioning ease of use, media coverage
Pricing and value	15%	Free tier sufficiency, pricing clarity, cancellation ease

AI coding agents (Cursor, GitHub Copilot, Claude Code, Devin Desktop)

Criterion	Weight	What we evaluate
Code quality and correctness	35%	Accuracy on a fixed task set, refactoring quality, test generation, bug introduction rate
IDE and workflow integration	20%	Supported editors, latency, inline vs. panel UX, terminal awareness
Context and codebase handling	20%	Multi-file awareness, repo-scale understanding, rules and instructions support
Adoption signal	10%	GitHub stars, install counts, community size, enterprise adoption
Pricing and trust	15%	Free tier limits, per-seat cost, code-privacy defaults, data retention policy

MCP servers

Criterion	Weight	What we evaluate
Protocol coverage and correctness	30%	Tool surface completeness, schema accuracy, adherence to MCP spec
Installation and reliability	25%	Install steps, authentication complexity, latency, error handling under load
Maintenance freshness	20%	Commit recency, open issue response time, changelog activity
Adoption signal	15%	GitHub stars, forks, mentions in MCP client docs or community channels
Trust and security	10%	Known vulnerabilities, prompt-injection mitigations, auth model transparency

MCP clients

Criterion	Weight	What we evaluate
Server compatibility and tool surface	30%	Number of tested servers that work correctly, tool-call reliability, multi-server support
Configuration experience	20%	Setup steps, config format clarity, error messaging when a server fails to connect
Context and workflow UX	20%	How naturally tool use fits into chat or coding workflow; visibility of tool calls
Adoption signal	15%	Install counts, community size, third-party integrations
Trust and pricing	15%	Pricing clarity, data-handling policy, update frequency

As we add new categories, we publish their scoring matrix here before the first comparison goes live.
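
One sanity check that applies to any scoring matrix, existing or new, is that its criterion weights total 100%. The sketch below runs that check over the six tables above, with the weights copied row by row; the function name and data layout are hypothetical and only illustrate the idea.

```python
# Criterion weights in percent, copied row by row from the tables above.
MATRICES = {
    "AI courses and certificates": [30, 20, 15, 20, 15],
    "General AI tools": [35, 15, 15, 20, 15],
    "AI tools for seniors and non-technical users": [35, 20, 15, 15, 15],
    "AI coding agents": [35, 20, 20, 10, 15],
    "MCP servers": [30, 25, 20, 15, 10],
    "MCP clients": [30, 20, 20, 15, 15],
}


def invalid_matrices(matrices):
    """Return the names of matrices whose weights do not total 100%."""
    return [name for name, weights in matrices.items() if sum(weights) != 100]


print(invalid_matrices(MATRICES))  # prints [] because every table sums to 100
```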

How commercial relationships affect placement

We accept advertising compensation from some - but not all - of the brands listed on this site. Compensation can influence the order in which brands appear on a page, and which brands appear in highlighted positions. Compensation does not change our scoring methodology, the editorial content of our reviews, or whether a brand is included or excluded from a comparison.

Two specific protections apply:

We disclose this practice at the top of every page, in the methodology block on every comparison page, and in full on this page.
We do not include a product in a comparison solely because of a commercial relationship. If a product doesn't qualify on the scoring methodology, it doesn't make the list.

How we keep comparisons current

Comparisons are updated quarterly at minimum. We re-test the top five ranked products each quarter and the rest annually. When a major product release, pricing change, or security incident occurs, we re-test immediately and update the page within seven days. The "Last updated" date on every comparison page is the date of the most recent re-test or material edit - not an automatically generated current date.
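
The schedule above can be read as a simple rule. The following is a rough sketch only: it assumes a quarter is approximated as 91 days and a year as 365 days, and that a triggering event only matters if it happens after the last re-test. The function and constant names are hypothetical.

```python
from datetime import date, timedelta

QUARTER = timedelta(days=91)        # assumption: a quarter approximated as 91 days
YEAR = timedelta(days=365)          # assumption: a year approximated as 365 days
EVENT_DEADLINE = timedelta(days=7)  # "update the page within seven days"


def update_due(rank, last_tested, event_date=None):
    """Return the latest date by which a product's entry should be refreshed.

    rank: current position on the comparison page (1 is the top spot).
    event_date: date of a major release, pricing change, or security incident.
    """
    interval = QUARTER if rank <= 5 else YEAR
    due = last_tested + interval
    if event_date is not None and event_date > last_tested:
        due = min(due, event_date + EVENT_DEADLINE)
    return due


print(update_due(3, date(2026, 1, 5)))                      # 2026-04-06
print(update_due(12, date(2026, 1, 5)))                     # 2027-01-05
print(update_due(12, date(2026, 1, 5), date(2026, 2, 10)))  # 2026-02-17
```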

Who writes these comparisons

All comparisons are written by humans with hands-on experience in the relevant category. We use AI-assisted research and drafting, but every comparison is reviewed and fact-checked by the AI for Zebras Team before publication.

How to report a problem

If you find an error - outdated pricing, a missing product, a factual mistake - email us at [email protected] and we will fix it. Material corrections are noted at the bottom of the page with the date.

Read the full affiliate disclosure →