We’re measuring AI all wrong—and missing what matters most

There's a peculiar irony in how we evaluate artificial intelligence: We've created systems to mimic and enhance human capabilities, yet we measure their success using metrics that capture everything except what makes them truly valuable to humans.

The tech industry's dashboards overflow with impressive numbers on AI: processing speeds, parameter counts, benchmark scores, user growth rates. Silicon Valley's greatest minds tweak algorithms endlessly to nudge these metrics higher. But in this maze of measurements, we've lost sight of a fundamental truth: The most sophisticated AI in the world is worthless if it doesn't meaningfully improve human lives.

    Consider the story of early search engines. Before Google, companies competed fiercely on the sheer number of web pages indexed. Yet Google prevailed not because it had the biggest database, but because it understood something deeper about human behavior—that relevance and trustworthiness matter more than raw quantity.

    AI that builds trust

    Today's AI landscape feels remarkably similar, with companies racing to build bigger models while potentially missing the more nuanced elements of human-centered design that actually drive adoption and impact.

    The path to better AI evaluation begins with trust. Emerging research demonstrates that users engage more deeply and persistently with AI systems that clearly explain their reasoning, even when those systems occasionally falter. This makes intuitive sense—trust, whether in technology or humans, grows from transparency and reliability rather than pure performance metrics.

    Yet trust is merely the foundation. The most effective AI systems forge genuine emotional connections with users by demonstrating true understanding of human psychology. The research reveals a compelling pattern: When AI systems adapt to users' psychological needs rather than simply executing tasks, they become integral parts of people's daily lives. This isn't about programming superficial friendliness—it's about creating systems that genuinely comprehend and respond to the human experience.

    Trust matters more than technical prowess when it comes to AI adoption. A study of nearly 1,100 consumers interacting with AI chatbots found that people are willing to forgive service failures and maintain brand loyalty not based on how quickly an AI resolves their problem, but on whether they trust the system trying to help them.

    AI that gets you

    The researchers discovered three key elements that build this trust: First, the AI needs to demonstrate a genuine ability to understand and address the issue. Second, it needs to show benevolence—a sincere desire to help. Third, it must maintain integrity through consistent, honest interactions. When AI chatbots embodied these qualities, customers were significantly more likely to forgive service problems and less likely to complain to others about their experience.

    How do you make an AI system trustworthy? The study found that simple things make a big difference: anthropomorphizing the AI, programming it to express empathy through its responses ("I understand how frustrating this must be"), and being transparent about data privacy. In one telling example, a customer dealing with a delayed delivery was more likely to remain loyal when a chatbot named Russell acknowledged their frustration and clearly explained both the problem and solution, compared to an unnamed bot that just stated facts.

    This insight challenges the common assumption that AI just needs to be fast and accurate. In health care, financial services, and customer support, the most successful generative AI systems aren't necessarily the most sophisticated—they're the ones that build genuine rapport with users. They take time to explain their reasoning, acknowledge concerns, and demonstrate consistent value for the user's needs.

    And yet traditional metrics don’t always capture these crucial dimensions of performance. We need frameworks that evaluate AI systems not just on their technical proficiency, but on their ability to create psychological safety, build genuine rapport, and most importantly, help users achieve their goals.

    New AI metrics

    At Cleo, where we're focused on improving financial health through an AI assistant, we're exploring these new measurements. This might mean measuring factors like user trust and the depth and quality of user engagement, as well as looking at entire conversational journeys. What matters most to us is understanding whether Cleo can actually help users accomplish what they set out to do in any given interaction.

    A more nuanced evaluation framework doesn't mean abandoning performance metrics—they remain vital indicators of commercial and technical success. But they need to be balanced with deeper measures of human impact. That's not always easy: these measures are inherently subjective, which means reasonable people can disagree on what good looks like. Still, they are worth pursuing.

    As AI becomes more deeply woven into the fabric of daily life, the companies that understand this shift will be the ones that succeed. The metrics that got us here won't be sufficient for where we're going. It's time to start measuring what truly matters: not just how well AI performs, but how well it helps humans thrive.

    The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.

    Read more:

    Genesys CEO: How empathetic AI can scale our humanity during economic uncertainty
    When AI builds AI: The next great inventors might not be human
    The AI cost collapse is changing what’s possible—with massive implications for tech startups
    I’ve spent years helping female founders access capital. Now that they have AI, they might not need to

    This story was originally featured on Fortune.com
