Success metrics are everything for a product manager. They show how good your decisions were and help determine future enhancements.
During my AI Product Management capstone, my team focused on business metrics for our imaginary app—a tool that leverages GPT-4o mini from OpenAI to create personalized recipes based on ingredients users already have (through text and image input), helping reduce food waste.
Why GPT-4o mini? Simple: it was the cheapest option for the complexity we needed.
But here’s where we slipped.
In our presentation, we proudly announced we wouldn't track AI model success metrics because we were using a third-party model. Makes sense, right?
WRONG.
Turns out it is crucial to measure the AI model's success, not to improve the model itself, but to make sure we chose the right one in the first place. And that testing should have happened before we shipped the app.
To measure the models’ success, we could:
1. Set benchmarks on core tasks such as recognizing ingredients and generating relevant recipes.
2. Run 1,000 prompts through each model against those benchmarks.
3. Score and compare models, factoring in performance and cost.
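To make the three steps above concrete, here is a minimal Python sketch of the "score and compare" part. It is one possible approach, not what our team built: the names `ModelResult` and `rank_models`, the 90% accuracy bar, and all the numbers are hypothetical placeholders. It assumes you have already run the benchmark prompts and counted how many each model got right.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Benchmark outcome for one candidate model (hypothetical structure)."""
    name: str
    passed: int       # prompts the model handled correctly
    total: int        # prompts run against the benchmark
    cost_usd: float   # total spend for the benchmark run

    @property
    def accuracy(self) -> float:
        return self.passed / self.total


def rank_models(results, min_accuracy=0.9):
    """Keep models that clear the accuracy bar, then sort cheapest first."""
    eligible = [r for r in results if r.accuracy >= min_accuracy]
    return sorted(eligible, key=lambda r: r.cost_usd)


# Made-up numbers for illustration only; not real benchmark results.
results = [
    ModelResult("model-a", passed=930, total=1000, cost_usd=0.60),
    ModelResult("model-b", passed=960, total=1000, cost_usd=4.50),
    ModelResult("model-c", passed=850, total=1000, cost_usd=0.30),
]

for r in rank_models(results):
    print(f"{r.name}: accuracy={r.accuracy:.0%}, cost=${r.cost_usd:.2f}")
```

Filtering on a quality bar first and then sorting by cost mirrors the "cheapest for the complexity we needed" reasoning. A weighted score combining both would be an equally valid choice.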
The chosen model's test score would then become the baseline for measuring its success after launch, on an ongoing basis.
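As a rough illustration of using that baseline after launch, the check below flags when accuracy on a sample of live prompts drops noticeably below the pre-launch score. The function name `has_regressed`, the 3-point tolerance, and the sample numbers are all assumptions for the sake of the example.

```python
def has_regressed(baseline_accuracy, live_passed, live_total, tolerance=0.03):
    """Return True if live accuracy fell more than `tolerance` below baseline."""
    live_accuracy = live_passed / live_total
    return live_accuracy < baseline_accuracy - tolerance


# Hypothetical pre-launch baseline of 93%, checked against a weekly sample.
print(has_regressed(0.93, live_passed=178, live_total=200))  # 89% live accuracy
print(has_regressed(0.93, live_passed=184, live_total=200))  # 92% live accuracy
```

A drop like the first case would be a signal to rerun the benchmark and reconsider the model choice.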
Lesson learned: Business metrics are critical, but never underestimate the importance of AI model metrics—even when using third-party models.