This is post #2 in my Ivory Tower Notes series. In post #1, I wrote about the problem: how every data and AI project starts.
This time, the topic is the methodology, and why “prompt in, slop out” is what often happens when we skip it.
Prompt in, slop out
I smirked slightly when one of my connections commented, “You Sent Me AI Slop” under some random post that had hundreds of likes. The post, which contained a decision matrix, offered guidance on which platform to use for specific data workloads, albeit with questionable criteria. Quality aside, it really looked great.
My amusement didn’t end there, as I thought about how “AIS”, i.e., “AI Slop”, deserves its own button on every social media platform now, right alongside the like button.
If any YouTube folks are reading this: consider it a feature idea, instead of quizzing people with “Does this feel like AI slop?”
Nonetheless, YouTube nailed the “feel” part because we all tend to make decisions based on emotions, often at the expense of critical thinking.
Why would we invest energy in empiricism, rationalism, and scepticism when we have AI now? Deadlines are not on our side, and we have this new tool that delivers outputs for us, regardless of the “prompt in, slop out” effect.
But let’s assume you are genuinely interested in how Platform A compares to Platform B in terms of machine learning (ML) capabilities, because you’ve noticed two data teams in your company using separate platforms for almost identical ML use cases. So, your goal is to compile an objective overview of both and propose reducing development costs by keeping only one.
What now? How do you determine whether you should consolidate ML workloads?
Surely not by relying purely on AI, but rather on…
The path of inquiry
And so you’re back to Ivory Tower days again, where you were taught that every discovery is covered by “The methodology”:
The problem → The hypothesis → Testing the hypothesis → The conclusions
Moreover, you were taught that finding the problem is half the work, and the art of getting there lies in asking good questions to narrow it down to something specific and testable.
Hence, you take the vague question, “Should we consolidate onto one ML platform?”, and you keep rewriting it until it becomes something a test can answer:
Does Platform A run our churn pipeline at comparable accuracy and lower cost than Platform B?
Now you have defined a subject, a comparison, and things you can measure, which is enough to turn a business question into a testable hypothesis.
But first, you do your homework and gather additional information, such as what Platform B costs per job today, what accuracy it hits, and how it is designed (e.g., the data, algorithm, and hyperparameters it uses), so that you can reproduce the pipeline on Platform A.
Then, before opinions on your question start to roll in, you state:
If we run the same churn pipeline on Platform A instead of Platform B, using identical data, algorithm and hyperparameters, then the median per-job cost will drop by at least 15%, while the mean accuracy stays within 1 percentage point of Platform B’s.
With this “if-then” formulation, you managed to quiet down (at least some of) the opinionated answers, knowing that a proof of concept (PoC) comes next. To test the stated presumption, you design and run the PoC, changing only the independent variable: the platform. You freeze the control variables (the dataset, the algorithm, and the hyperparameters) and measure cost and accuracy, which are your dependent variables.
You also repeat the run several times to separate the signal from the noise by collecting multiple data points, since a single run can be skewed by environmental noise (e.g., caching). Then you account for further nuances, e.g., triggering runs at different times of day (morning, evening, and night), to expose both platforms to the same mix of conditions.
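To make that concrete, here is a minimal sketch of such a balanced run schedule. The slot names, run counts, and the `run_churn_pipeline` function mentioned in the comments are all hypothetical; the point is only that each platform sees every time slot the same number of times:

```python
import random

# Hypothetical experiment parameters; adjust to your own setup.
TIME_SLOTS = ["morning", "evening", "night"]
PLATFORMS = ["A", "B"]
RUNS_PER_SLOT = 3  # repeated runs, to separate signal from noise

# Pair every platform with every time slot the same number of times,
# so neither platform gets systematically luckier conditions.
schedule = [
    (slot, platform)
    for slot in TIME_SLOTS
    for platform in PLATFORMS
    for _ in range(RUNS_PER_SLOT)
]
random.shuffle(schedule)  # randomize execution order within the experiment

# Each entry would then be dispatched to the platform's job scheduler,
# e.g., run_churn_pipeline(platform, at=slot)  (hypothetical function).
print(len(schedule))  # 18 runs: 3 slots x 2 platforms x 3 repeats
```

Shuffling the order is a cheap guard against one platform always running first and warming (or fighting) a shared cache.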
Finally, you collect all the results and evaluate the data against your hypothesis, which leads you to one of these three outcomes:
- Outcome 1: The data supports your hypothesis*. The multiple runs show that Platform A is at least 15% cheaper, and accuracy stayed within the defined threshold. (*A note: the data will support, but not prove, your hypothesis; it gives you a reason to hold on to it, which in science is as close to a “yes” as you get.)
- Outcome 2: The data fails to support your hypothesis. The multiple runs show that Platform A missed one or both criteria: it was only 5% cheaper, or the cost dropped but the accuracy degraded beyond the defined threshold.
- Outcome 3: Your runs are too noisy to call it either way, and the only answer is to keep testing before drawing any conclusions.
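The evaluation step itself can be as small as a few lines. Here is a sketch, using made-up per-run numbers (cost in dollars per job, accuracy in percent) and the thresholds from the if-then statement above:

```python
import statistics

# Made-up illustrative numbers: one value per repeated run.
platform_a_costs = [82.0, 85.0, 80.0, 84.0, 83.0]    # $ per job
platform_b_costs = [100.0, 104.0, 98.0, 101.0, 99.0]
platform_a_acc = [90.8, 90.9, 90.6, 90.7, 91.0]      # accuracy, %
platform_b_acc = [91.2, 91.0, 91.4, 91.1, 91.3]

# Thresholds straight from the if-then hypothesis.
MIN_COST_DROP = 0.15  # median per-job cost at least 15% lower
MAX_ACC_GAP = 1.0     # mean accuracy within 1 percentage point

cost_drop = 1 - statistics.median(platform_a_costs) / statistics.median(platform_b_costs)
acc_gap = statistics.mean(platform_b_acc) - statistics.mean(platform_a_acc)

if cost_drop >= MIN_COST_DROP and abs(acc_gap) <= MAX_ACC_GAP:
    print("Outcome 1: the data supports the hypothesis.")
else:
    print("Outcome 2: the data fails to support the hypothesis.")
# Outcome 3 ("too noisy to call") needs a spread check on top of this,
# e.g., comparing the variance across runs against the effect size.
```

With these particular numbers, the median cost drop is 17% and the mean accuracy gap is 0.4 points, so the sketch lands in Outcome 1; swap in your own runs and the same few lines answer the question for you.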
Whichever scenario you land in, you have findings: you either confirmed your educated guess, learned something new, or discovered that you need to keep testing.
And to be clear about this short example: even the first two outcomes won’t, on their own, give you the green light to consolidate the two platforms. Corporate reality (and a thorough evaluation) is a bit messier than that, and there’s more data to collect and evaluate, affecting both people and processes, than a single-scoped PoC can settle.
All right, we can stop with the methodology now, because most of you are probably reading the steps above and wondering…
What the dickens? Where’s AI in all this?
I can only imagine that something like “MCP, agentic frameworks, agents, …” was going through your head while reading them. Couldn’t agree more: all good stuff, and all ways to speed up the process.
However, simply posting AI outputs from a prompt like “Give me an overview of how Platform A compares to Platform B for ML workloads” is where the slop occurs. Because:
“If you aren’t doing the hands-on, your opinion about it is very likely to be completely wrong.”
Relevance and positive influence don’t come from pretty AI posts or presentation infographics; if anything, those can damage work relationships.
If you want to influence and be seen as an authority, it is more effective to share views and findings from real-life experiments and your own proven experience.
Instead of starting your posts with “This is where you should use Platform A over Platform B for…”, try something more concrete (if it’s true, of course):
“When we (I) changed the [independent variable] to see how it affects the [dependent variable], while keeping the [control variables] the same, our (mine) findings were…”
And then see whether the number of your followers increases, and report back the findings.
The inspiration for this post came from a Croatian paper by Professor Mladen Šolić, “Uvod u znanstveni rad” (Introduction to Scientific Research, 2005, [LINK]). I first read it as a student, and it’s still one of the clearest explanations of how to conduct scientific research I’ve come across.
Thank you for reading.
If you found this post valuable, feel free to share it with your network. 👏
Connect for more stories on Medium ✍️ and LinkedIn 🖇️.

