Discussion about this post

Josh:
For me, the most striking comment: "Tidalwave published a benchmark with Columbia University showing its mortgage-trained agent outperforming general-purpose LLMs on underwriting and compliance tasks - 84% overall accuracy vs. 71% for Claude 4.5, with the largest gap on yes-or-no compliance checks."

Tidalwave built its own specially trained agent, and it only outperformed a two-generation-old Claude model (with undetermined thinking intensity) by a modest margin. How will 4.6 and 4.7 perform? What about 5.0? What about 5.0 with grounding skills and context?

The one lesson from the past two years is that verticalized AI gets clobbered on quality by rapid improvements in foundation models, and on cost by rapid improvements in open-weight/open-source models.

This is not a space I'd want to compete in.

