How AI Product Comparison Actually Works (And Why It Beats Traditional Reviews)
A behind-the-scenes look at how CompareXY uses large language models to compare products — and why this approach is faster, more personalized, and more honest than traditional review sites.

When people see CompareXY for the first time, the reaction is usually a polite version of "wait, is this just ChatGPT in a wrapper?" Fair question. Let me explain what's actually happening when you compare two products on CompareXY, why it's a different problem from general-purpose chat, and how it stacks up against traditional review sites like Wirecutter, RTINGS, or YouTube reviewers.
The traditional review model is slow and shallow on purpose
Traditional review sites work like this: a small team of editors picks a category (say, "best wireless earbuds"), tests 8-15 products in person, writes a long article, and publishes it. They update it once or twice a year.
That model has real strengths — hands-on testing is genuinely valuable, and there's no AI substitute for plugging in a pair of headphones and listening to them. But it has three structural weaknesses:
- Coverage is narrow. Wirecutter has reviewed maybe 0.1% of the products you can buy on Amazon. Anything niche, regional, or recently launched isn't there.
- Updates lag reality. Product lines refresh every year. The "best of 2024" article you're reading in 2026 might recommend a product that's been replaced twice.
- The recommendation is one-size-fits-all. "Best wireless earbuds" assumes everyone wants the same thing. They don't. A runner, an audiophile, and someone who takes a lot of calls have completely different needs.
CompareXY is built to address all three.
What happens when you hit "Compare"
Here's the actual flow when you compare two products:
- Input parsing. You type two product names. We canonicalize them: fixing typos, normalizing model numbers, and disambiguating near-identical names (so "iPhone 15 Pro Max" and "iPhone 15 Pro" don't get confused).
- Product lookup. We pull what we know about each product — specs, price ranges, release date, key features — from a combination of structured product data and Amazon listing info.
- Category detection. A first AI pass figures out what category these are (smartphones, blenders, mattresses, coffee makers) and decides which axes of comparison matter most for that category. For two blenders, you care about wattage, jar size, blade design, and noise. For two phones, none of those matter.
- Personalized weighting. If you've set your priorities — budget-first, ecosystem fit, future-proofing, ease of use — those become weights on the categories. A budget-first user comparing two laptops will get a different "winner" than a feature-first user.
- Structured comparison. A second AI pass scores each product across the chosen categories using a strict structured output format. This is the part that's hardest to get right: language models love to hedge, so we force them to commit to a per-category winner with reasoning.
- Final recommendation. A short summary explaining the choice, plus a direct link to the better deal.
The whole thing takes 5-10 seconds.
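To make the personalized weighting step concrete, here is a minimal Python sketch. It illustrates the idea under assumptions and is not CompareXY's actual code: the `CategoryResult` type, the `overall_winner` function, the weight values, and the default weight of 1.0 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CategoryResult:
    category: str   # e.g. "price", "performance"
    winner: str     # "A" or "B"; a tie is not an allowed answer
    reasoning: str


def overall_winner(results: list[CategoryResult], weights: dict[str, float]) -> str:
    """Add each category's user weight to the total of the product that won it."""
    totals = {"A": 0.0, "B": 0.0}
    for r in results:
        # Assumption: categories the user never ranked get a neutral weight of 1.0.
        totals[r.winner] += weights.get(r.category, 1.0)
    # Assumption: an exact tie in totals falls back to product A.
    return "A" if totals["A"] >= totals["B"] else "B"


# Illustrative per-category verdicts for two hypothetical laptops.
results = [
    CategoryResult("price", "A", "A is cheaper"),
    CategoryResult("performance", "B", "B has a faster chip"),
    CategoryResult("display", "B", "B has a brighter panel"),
]

budget_first = {"price": 3.0, "performance": 1.0, "display": 1.0}
feature_first = {"price": 1.0, "performance": 2.0, "display": 2.0}

print(overall_winner(results, budget_first))   # A (3.0 vs 2.0)
print(overall_winner(results, feature_first))  # B (1.0 vs 4.0)
```

The per-category verdicts are identical in both calls. Only the weights change, which is how the same pair of products can produce a different winner for a budget-first user and a feature-first user.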
Why it's not "just ChatGPT"
If you ask ChatGPT directly to compare two products, you'll get a response. It might even be a good one. But there are a few things CompareXY does that ChatGPT alone doesn't:
- Up-to-date product data. Models have training cutoffs. We supplement with current product info so we don't claim a product has features it doesn't have or miss features it does.
- Forced structured output. General chat tends to give you "well, on the one hand… on the other hand…" CompareXY uses strict schemas that force the model to commit to per-category winners and an overall recommendation.
- Personalization. ChatGPT doesn't know if you're a college student on a budget or a professional photographer. CompareXY does, because you tell it once.
- Affiliate-aware presentation. We surface the link to the cheapest legitimate place to buy the recommended product, with the price baked into the comparison.
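Here is a rough Python sketch of what "forced structured output" can look like. It is an illustration, not CompareXY's real schema: the field names (`categories`, `winner`, `reasoning`, `overall_winner`) and the `parse_comparison` function are hypothetical. It also validates the response after the model returns it, which is a simpler technique than constraining the model at generation time through a provider's structured output API.

```python
import json

ALLOWED_WINNERS = {"A", "B"}  # deliberately no "tie" or "it depends" option


def parse_comparison(raw: str, categories: list[str]) -> dict:
    """Validate a model response against a strict, hypothetical schema.

    Raises ValueError if the model hedged, skipped a category, or
    returned anything outside the expected shape.
    """
    data = json.loads(raw)  # invalid JSON raises JSONDecodeError, a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("response must be a JSON object")
    per_category = data.get("categories")
    if not isinstance(per_category, dict) or set(per_category) != set(categories):
        raise ValueError("response must cover exactly the requested categories")
    for name, entry in per_category.items():
        if not isinstance(entry, dict) or entry.get("winner") not in ALLOWED_WINNERS:
            raise ValueError(f"no committed winner for {name!r}")
        if not str(entry.get("reasoning", "")).strip():
            raise ValueError(f"missing reasoning for {name!r}")
    if data.get("overall_winner") not in ALLOWED_WINNERS:
        raise ValueError("no committed overall winner")
    return data


good = '{"categories": {"noise": {"winner": "A", "reasoning": "quieter motor"}}, "overall_winner": "A"}'
hedged = '{"categories": {"noise": {"winner": "it depends", "reasoning": "both are fine"}}, "overall_winner": "A"}'

print(parse_comparison(good, ["noise"])["overall_winner"])  # A
try:
    parse_comparison(hedged, ["noise"])
except ValueError as err:
    print("rejected:", err)
```

A rejected response would typically be retried or surfaced as an error rather than shown to the user, so a hedged answer never reaches the comparison page.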
The honest tradeoffs
I want to be straight about where this approach is weaker than traditional reviews:
- No hands-on testing. If you want to know whether a fabric feels scratchy or whether a button has a satisfying click, AI can't help you. Read a hands-on review for that.
- Niche specialty knowledge. For something like high-end audio gear or pro-grade cameras, dedicated enthusiast sites have depth that AI struggles to match.
- Specs vs. real-world performance. AI is excellent at comparing what's claimed; it's worse at flagging when a manufacturer's claimed battery life is dramatically inflated. We try to surface known issues, but we won't always catch them.
CompareXY shines in the 80% of buying decisions where you just need a clear, fast, personalized answer between two roughly equivalent products. You can see this approach in action in our iPhone 15 vs Pixel 8 breakdown and our Dyson V15 vs Shark Stratos comparison.
Why this is the right time for this product
Two things that weren't available three years ago make this possible now:
- Models are good enough to reason about products. Earlier LLMs would happily hallucinate specs. Current frontier models, with structured output and grounded data, are reliable enough to deploy as a comparison engine.
- Structured output APIs. We can guarantee the model returns a per-category winner in a known schema, every time. That's what turns a chatty paragraph into a clean comparison table.
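As a small illustration of why a known schema matters, the Python sketch below turns per-category winners into a plain-text table. The `render_table` function, the input shape, and the product names are hypothetical and only meant to show the idea.

```python
def render_table(product_a: str, product_b: str, categories: dict[str, dict]) -> str:
    """Render per-category winners as a fixed-width plain-text table."""
    names = {"A": product_a, "B": product_b}
    rows = [("Category", "Winner", "Why")]
    for category, entry in categories.items():
        rows.append((category, names[entry["winner"]], entry["reasoning"]))
    # Pad the first two columns to their widest cell so the columns line up.
    widths = [max(len(row[i]) for row in rows) for i in range(2)]
    lines = [
        f"{row[0]:<{widths[0]}}  {row[1]:<{widths[1]}}  {row[2]}"
        for row in rows
    ]
    return "\n".join(lines)


# Illustrative input in the assumed schema; the products are made up.
categories = {
    "wattage": {"winner": "A", "reasoning": "stronger motor"},
    "noise": {"winner": "B", "reasoning": "quieter at full speed"},
}
print(render_table("Blender X", "Blender Y", categories))
```

Because every response has the same fields, the rendering code never has to guess where the verdict is inside a paragraph of prose.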
The next year or two will be wild for AI-assisted shopping. CompareXY is my bet on what the future of buying decisions looks like — and you can try it right now, for free. (If you missed it, here's the story of why I built CompareXY.)