Lately, I have been feeling a constant sense of AI FOMO. Every day, I see people sharing AI tips, new agents and skills they built, and vibe-coded apps. I am increasingly realizing that adapting quickly to AI is becoming a requirement for staying competitive as a data scientist today.
But I am not only talking about brainstorming with ChatGPT, generating code with Cursor, or polishing a report with Claude. The bigger shift is that AI can now participate in a much more end-to-end data science workflow.
To make the idea concrete, I tried it on a real project using my Apple Health data.
A Simple Example — Apple Health Analysis
Context
I have been wearing an Apple Watch every day since 2019 to track my health data, such as heart rate, energy burned, sleep quality, etc. This data contains years of behavioral signals about my daily life, but the Apple Health app mostly surfaces it with simple trend views.
I tried to analyze a two-year Apple Health export six years ago, but it ended up becoming one of those side projects that never get finished… My goal this time is to extract more insights from the raw data quickly with the help of AI.
What I had to work with
Here are the relevant resources I have:
- Raw Apple Health export data: 1.85GB in XML, uploaded to my Google Drive.
- Sample code in my GitHub repo from six years ago that parses the raw export into structured datasets. The code might be outdated, though.
Raw XML data screenshot by the author
Workflow without AI
A standard workflow without AI would look a lot like what I tried six years ago: inspect the XML structure, write Python to parse it into structured local datasets, conduct EDA with pandas and NumPy, and summarize the insights.
I am sure every data scientist is familiar with this process — it is not rocket science, but it takes time to build. Getting to a polished insights report would take at least a full day. That’s why that six-year-old repo is still marked as WIP…
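For reference, the parsing step in that manual workflow can be sketched roughly as follows. This is a sketch, not my actual repo code: it assumes the flat `<Record type=... startDate=... value=...>` element layout found in recent exports, which may differ across Apple Health versions.

```python
import xml.etree.ElementTree as ET
import pandas as pd

def parse_export(path: str) -> pd.DataFrame:
    """Stream-parse an Apple Health export.xml into a DataFrame.

    iterparse keeps memory flat, so a 1.85GB file never has to be
    loaded at once. Attribute names follow the <Record> layout of
    recent exports; older exports may differ.
    """
    records = []
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "Record":
            records.append({
                "type": elem.get("type"),
                "unit": elem.get("unit"),
                "start": elem.get("startDate"),
                "end": elem.get("endDate"),
                "value": elem.get("value"),
            })
            elem.clear()  # drop the parsed element to free memory
    df = pd.DataFrame(records)
    # Real exports carry UTC offsets like "-0800"; pass utc=True if needed
    df["start"] = pd.to_datetime(df["start"])
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    return df
```

From here the EDA is the usual pandas routine: filter by `type`, resample by day, and aggregate.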
AI end-to-end workflow
My updated workflow with AI is:
- AI locates the raw data in my Google Drive and downloads it.
- AI references my old GitHub code and writes a Python script to parse the raw data.
- AI uploads the parsed datasets to Google BigQuery. Of course, the analysis could also be done locally without BigQuery, but I set it up this way to better resemble a real work environment.
- AI runs SQL queries against BigQuery to conduct the analysis and compile an analysis report.
Essentially, AI handles nearly every step from data engineering to analysis, with me acting more as a reviewer and decision-maker.
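To make step 4 concrete, the analysis queries look something like the sketch below. The project, dataset, table, and column names are all illustrative, not from my actual BigQuery setup:

```sql
-- Average daily active energy burned, per year (illustrative schema)
SELECT
  EXTRACT(YEAR FROM DATE(start_time)) AS year,
  ROUND(SUM(value) / COUNT(DISTINCT DATE(start_time)), 1) AS avg_daily_kcal
FROM `my-project.health.records`
WHERE type = 'HKQuantityTypeIdentifierActiveEnergyBurned'
GROUP BY year
ORDER BY year;
```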
AI-generated report
Now, let’s see what Codex was able to generate with my guidance and some back-and-forth in 30 minutes, excluding the time to set up the environment and tooling.
I chose Codex because I mainly use Claude Code at work and wanted to explore a different tool. I also used this chance to set up my Codex environment from scratch so I could better evaluate the effort required.
You can see that this report is well structured and visually polished. It summarized valuable insights into annual trends, exercise consistency, and the impact of travel on activity levels. It also provided recommendations and stated limitations and assumptions. What impressed me most was not just the speed, but how quickly the output began to look like a stakeholder-facing analysis instead of a rough notebook.
Please note that the report is sanitized for my data privacy.
Codex-generated report (numbers adjusted for data privacy, screenshot by the author)
How I Actually Did It
Now that we have seen the impressive work AI can generate in 30 minutes, let me break it down and show you all the steps I took to make it happen. I used Codex for this experiment. Like Claude Code, it can run in the desktop app, an IDE, or the CLI.
1. Set up MCP
To enable Codex to access external tools (Google Drive, GitHub, and Google BigQuery), the first step was to set up Model Context Protocol (MCP) servers.
The easiest way to set up MCP is to ask Codex to do it for you. For example, when I asked it to set up Google Drive MCP, it configured my local files quickly with clear next steps on how to create an OAuth client in the Google Cloud Console.
It does not always succeed on the first try, but persistence helps. When I asked it to set up BigQuery MCP, it failed at least 10 times before the connection succeeded. But each time, it provided me with clear instructions on how to test it and what info was helpful for troubleshooting.
Codex MCP set up screenshots by the author
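If you prefer to configure MCP servers by hand, Codex CLI reads them from a config file. The snippet below is a sketch of the general shape: the exact file location and schema may change between Codex versions, and the server package and token are placeholders, not my actual setup.

```toml
# ~/.codex/config.toml -- schema may vary across Codex CLI versions
[mcp_servers.github]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
env = { GITHUB_PERSONAL_ACCESS_TOKEN = "<your-token>" }
```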
2. Make a plan with the Plan Mode
After setting up the MCPs, I moved to the actual project. For a complicated project that involves multiple data sources, tools, and questions, I usually start with Plan Mode to settle on the implementation steps. In both Claude Code and Codex, you can enable Plan Mode with /plan. It works like this: you outline the task and your rough plan; the model asks clarifying questions and proposes a more detailed implementation plan for you to review and refine. In the screenshots below, you can find my first iteration with it.
Plan Mode screenshots by the author – Part 1
Plan Mode screenshots by the author – Part 2
Plan Mode screenshots by the author – Part 3
3. Execution and iteration
After I hit “Yes, implement this plan”, Codex started executing on its own, following the steps. It worked for 13 minutes and generated the first analysis below. It moved fast across different tools, but it did the analysis locally at first because it kept hitting issues with the BigQuery MCP. After another round of troubleshooting, it was able to upload the datasets and run queries in BigQuery properly.
First analysis output screenshot by the author
However, the first-pass output was still shallow, so I guided it to go deeper with follow-up questions. For example, I have flight tickets and itineraries from past trips in my Google Drive. I asked Codex to find them and analyze my activity patterns during travel. It successfully located the files, extracted my travel days, and ran the analysis.
After a few iterations, it was able to generate a much more comprehensive report, as I shared at the beginning, within 30 minutes. You can find its code here. That was probably one of the most important lessons from the exercise: AI moved fast, but depth still came from iteration and better questions.
Codex locating my past travel dates (screenshot by the author)
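Under the hood, that travel comparison reduces to a small grouped aggregation. Here is a minimal sketch with made-up column names (my real pipeline's schema differs):

```python
import pandas as pd

def travel_vs_home(daily: pd.DataFrame, travel_days: set) -> pd.DataFrame:
    """Compare average daily activity on travel days vs. home days.

    `daily` has one row per date with an activity column, and
    `travel_days` is a set of dates pulled from itineraries.
    Column names here are illustrative, not my real schema.
    """
    daily = daily.copy()
    daily["is_travel"] = daily["day"].isin(travel_days)
    return daily.groupby("is_travel")["active_kcal"].agg(["mean", "count"])
```

Given day-level activity and a set of extracted travel dates, the difference in means falls out of a single groupby.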
Takeaways for Data Scientists
What AI Changes
Above is a small example of how I used Codex and MCPs to run an end-to-end analysis without manually writing a single line of code. What are the takeaways for data scientists at work?
- Think beyond coding assistance. Rather than using AI only for coding and writing, it is worth expanding its role across the full data science lifecycle. Here, I used AI to locate raw data in Google Drive and upload parsed datasets to BigQuery. There are many more AI use cases related to data pipelining and model deployment.
- Context becomes a force multiplier. MCPs are what made this workflow much more powerful. Codex scanned my Google Drive to locate my travel dates and read my old GitHub code to find sample parsing code. Similarly, you can enable other company-approved MCPs to help your AI (and yourself) better understand the context. For example:
– Connect to Slack MCP and Gmail MCP to search for past relevant conversations.
– Use Atlassian MCP to access the table documentation on Confluence.
– Set up Snowflake MCP to explore the data schema and run queries.
- Rules and reusable skills matter. Although I did not demonstrate it explicitly in this example, you should customize rules and create skills to guide your AI and extend its capabilities. These topics are worth their own article next time 🙂
How the Role of Data Scientists Will Evolve
But does this mean AI will replace data scientists? Not quite. This example also sheds light on how the data scientist role will pivot in the future.
- Less manual execution, more problem-solving. In the example above, the initial analysis Codex generated was very basic. The quality of AI-generated analysis depends heavily on the quality of your problem framing. You need to define the question clearly, break it into actionable tasks, identify the right approach, and push the analysis deeper.
- Domain knowledge is critical. Domain knowledge is still very much required to interpret results correctly and provide recommendations. For example, AI noticed my activity level had declined significantly since 2020. It could not find a convincing explanation, but said: “Possible causes include routine changes, work schedule, lifestyle shifts, injury, motivation, or less structured training, but those are inferences, not findings.” But the real reason behind it, as you might have realized, is the pandemic. I started working from home in early 2020, so naturally, I burned fewer calories. This is a very simple example of why domain knowledge still matters — even if AI can access all the past docs in your company, it does not mean it will understand all the business nuances, and that is your competitive advantage.
- Know where not to trust AI yet. This example was relatively straightforward, but there are still many classes of work where I would not trust AI to operate independently today, especially projects that require stronger technical and statistical judgment, such as causal inference.
Important Caveats
Last but not least, there are some considerations you have to keep in mind while using AI:
- Data security. I am sure you’ve heard this many times already, but let me repeat it once more: the data security risk of using AI is real. For a personal side project, I can set things up however I want and accept the risk myself (honestly, granting AI full access to Google Drive feels like a risky move, so this is more for illustration purposes). But at work, always follow your company’s guidance on which tools are safe to use and how. And make sure to read through every single command before clicking “approve”.
- Double-check the code. For my simple project, AI wrote accurate SQL without problems. But in more complicated business settings, I still see AI make mistakes in its code from time to time. Sometimes it joins tables with different granularities, causing fan-out and double-counting. Other times it misses critical filters and conditions.
- AI is convenient, but it might accomplish your ask with unexpected side effects… Let me tell you a funny story to end this article. This morning, I turned on my laptop and saw an alert that there was no disk storage left — I have a 512GB SSD MacBook Pro, and I was pretty sure I had used only around half of it. Since I had been playing with Codex the night before, it became my first suspect. So I asked it directly: “hey, did you do anything? My ‘system data’ grew by 150GB overnight”. It responded, “No, Codex only takes xx MB”. Then I dug through my files and found a 142GB “bigquery-mcp-wrapper.log”… Codex had likely created this log while troubleshooting the BigQuery MCP setup, and during the actual analysis it exploded into a giant file. So yes, this magical wishing machine comes at a cost.
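The granularity pitfall from the second caveat is easy to reproduce. Here is a toy pandas example (made-up tables) showing how a join fans out and silently inflates a sum:

```python
import pandas as pd

# One row per order (order-level granularity)
orders = pd.DataFrame({"order_id": [1, 2], "revenue": [100, 200]})

# Several rows per order (item-level granularity)
items = pd.DataFrame({"order_id": [1, 1, 2], "sku": ["a", "b", "c"]})

# Joining order-level revenue onto item-level rows duplicates revenue:
# order 1 now appears twice, once per item
joined = orders.merge(items, on="order_id")
print(joined["revenue"].sum())  # 400, not the true 300
print(orders["revenue"].sum())  # 300: aggregate before joining instead
```

The same mistake in SQL looks perfectly reasonable in a code review, which is why checking join keys and granularities is worth the extra minute.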
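If you ever suspect a similar runaway file, a quick way to hunt for it is a standard find/du one-liner like this (adjust the path and size threshold to taste):

```shell
# List files over 1GB under your home directory, largest first --
# a quick way to spot a runaway log like bigquery-mcp-wrapper.log
find ~ -type f -size +1G -exec du -h {} + 2>/dev/null | sort -rh | head
```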
This experience summed up the tradeoff well for me: AI can dramatically compress the distance between raw data and useful analysis, but getting the most out of it still requires judgment, oversight, and a willingness to debug the workflow itself.

