Large language models (LLMs) like OpenAI’s GPT-5.4 and Anthropic’s Opus 4.6 have demonstrated outstanding capabilities in executing long-running agentic tasks.
As a result, we see an increased use of LLM agents across individual and enterprise settings to accomplish complex tasks, such as running financial analyses, building apps, and conducting extensive research.
These agents, whether part of a highly autonomous setup or a pre-defined workflow, can execute multi-step tasks using tools to achieve goals with minimal human oversight.
However, ‘minimal’ does not mean zero human oversight.
On the contrary, human review remains important because of LLMs’ inherent probabilistic nature and the potential for errors.
These errors can propagate and compound along the workflow, especially when we string numerous agentic components together.
You may have noticed the impressive progress agents have made in the coding domain. This is because code is relatively easy to verify: it either runs or fails, and feedback is visible immediately.
But in areas like content creation, research, or decision-making, correctness is often subjective and harder to evaluate automatically.
That is why human-in-the-loop (HITL) design remains critical.
In this article, we will walk through how to use LangGraph to set up a human-in-the-loop agentic workflow for content generation and publication on Bluesky.
Contents
(1) Primer to LangGraph
(2) Example Workflow
(3) Key Concepts
(4) Code Walkthrough
(5) Best Practices for Interrupts
You can find the accompanying GitHub repo here.
(1) Primer to LangGraph
LangGraph (part of the LangChain ecosystem) is a low-level agent orchestration framework and runtime for building agentic workflows.
It is my go-to framework given its high degree of control and customizability, which is vital for production-grade solutions.
While LangChain offers a middleware object (HumanInTheLoopMiddleware) to easily get started with human oversight in agent calls, it is done at a high level of abstraction that masks the underlying mechanics.
LangGraph, by contrast, does not abstract the prompts or architecture, thereby giving us the finer degree of control that we need. It explicitly lets us define:
- How data flows between steps
- Where decisions and code executions happen
- Where human intervention is required
Therefore, we will use LangGraph to demonstrate the HITL concept within an agentic workflow.
It is also helpful to distinguish between agentic workflows and autonomous AI agents.
Agentic workflows have predetermined paths and are designed to execute in a defined order, with LLMs and/or agents integrated into one or more components. On the other hand, AI agents autonomously plan, execute, and iterate towards a goal.
In this article, we focus on agentic workflows, in which we deliberately insert human checkpoints into a pre-defined flow.
Comparing agentic workflows and LLM agents | Image used under license
(2) Example Workflow
For our example, we shall build a social media content generation workflow as follows:
Content generation workflow | Image by author
- User enters a topic of interest (e.g., “latest news about Anthropic”).
- The web search node uses the Tavily tool to search online for articles matching the topic.
- The top search result is selected and fed into an LLM in the content-creation node to generate a social media post.
- In the review node, there are two human review checkpoints:
(i) Present generated content for humans to approve, reject, or edit;
(ii) Upon approval, the workflow triggers the Bluesky API tool and requests final confirmation before posting it online.
Here is what it looks like when run from the terminal:
Workflow run in terminal | Image by author
And here is the live post on my Bluesky profile:
Bluesky social media post generated from workflow | Image by author
Bluesky is a social platform similar to Twitter (X), and it is chosen in this demo because its API is much easier to access and use.
(3) Key Concepts
The core mechanism behind the HITL setup in LangGraph is the concept of interrupts.
Interrupts (using interrupt() and Command in LangGraph) enable us to pause graph execution at specific points, display certain information to the human, and await their input before resuming the workflow.
Command is a versatile object that allows us to update the graph state (update), specify the next node to execute (goto), or capture the value to resume graph execution with (resume).
Here is what the flow looks like:
(1) Upon reaching the interrupt() function, execution pauses, and the payload passed into it is shown to the user. The payload should be JSON-serializable, typically a string or a dict, e.g.,
decision = interrupt("Should we get KFC for lunch?")  # String shown to user
(2) After the user responds, we pass the response values to the graph to resume execution. It involves using Command and its resume parameter as part of re-invoking the graph:
if human_response == "yes":
    return graph.invoke(Command(resume="KFC"), config)  # config carries the same thread ID
else:
    return graph.invoke(Command(resume="McDonalds"), config)
(3) The response value in resume is returned in the decision variable, which the node will use for the rest of the node execution and subsequent graph flow:
if decision == "KFC":
    return Command(goto="kfc_order_node", update={"lunch_choice": "KFC"})
else:
    return Command(goto="mcd_order_node", update={"lunch_choice": "McDonalds"})
Interrupts are dynamic and can be placed anywhere in the code, unlike static breakpoints, which are fixed before or after specific nodes.
That said, we typically place interrupts either within the nodes or within the tools called during graph execution.
Finally, let’s talk about checkpointers. When a workflow pauses at an interrupt, we need a way to save its current state so it can resume later.
We therefore need a checkpointer to persist the state so that it is not lost during the interrupt pause. Think of a checkpoint as a snapshot of the graph state at a given point in time.
For development, it is acceptable to save the state in memory with the InMemorySaver checkpointer.
For production, durable stores like Postgres or Redis are better suited. As a middle ground, we shall use the SQLite checkpointer in this example instead of an in-memory store.
To ensure the graph resumes exactly at the point where the interrupt occurred, we need to pass and use the same thread ID.
Think of a thread as a single execution session (like a separate individual conversation), where each one has a unique ID and maintains its own state and history.
The thread ID is passed into config on each graph invocation so that LangGraph knows which state to resume from after the interrupt.
Now that we have covered the concepts of interrupts, Command, checkpoints, and threads, let’s get into the code walkthrough.
As the focus will be on the human-in-the-loop mechanics, we will not be covering the comprehensive code setup. Visit the GitHub repo for the full implementation.
(4) Code Walkthrough
(4.1) Initial Setup
We start by installing the required dependencies and generating API keys for Bluesky, OpenAI, LangChain, LangGraph, and Tavily.
# requirements.txt
langchain-openai>=1.1.9
langgraph>=1.0.8
langgraph-checkpoint-sqlite>=3.0.3
openai>=2.20.0
tavily-python>=0.7.21
# env.example
export OPENAI_API_KEY=your_openai_api_key
export TAVILY_API_KEY=your_tavily_api_key
export BLUESKY_HANDLE=yourname.bsky.social
export BLUESKY_APP_PASSWORD=your_bluesky_app_password
(4.2) Define State
We set up the State, which is the shared, structured data object serving as the graph’s central memory. It includes fields that capture key information, like post content and approval status.
The post_data key is where the generated post content will be stored.
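As a sketch, the State could be defined as a TypedDict like the one below. Apart from post_data, the exact field names here are assumptions for illustration; the repo defines the authoritative version:

```python
from typing import Optional, TypedDict


class WorkflowState(TypedDict):
    query: str                # user's topic of interest
    search_results: list      # raw results from the Tavily search
    post_data: Optional[str]  # generated post content awaiting review
    approved: bool            # outcome of the human review


# Every node reads from and writes to this shared structure.
state = WorkflowState(
    query="latest news about Anthropic",
    search_results=[],
    post_data=None,
    approved=False,
)
```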
(4.3) Interrupt at node level
We mentioned earlier that interrupts can occur at the node level or within tool calls. Let us see how the former works by setting up the human review node.
The purpose of the review node is to pause execution and present the draft content to the user for review.
Here we see the interrupt() in action (lines 8 to 13), where the graph execution pauses at the first section of the node function.
The details key passed into interrupt() contains the generated content, while the action key triggers a handler function (handle_content_interrupt()) to support the review:
The generated content is printed in the terminal for the user to view, and they can approve it as-is, reject it outright, or edit it directly in the terminal before approving.
Based on the decision, the handler function returns one of three values:
- True (approved),
- False (rejected), or
- String value corresponding to the user-edited content (edited).
This return value is passed back to the review node via graph.invoke(Command(resume=...)), which resumes execution from where interrupt() was called (line 15) and determines the next node: approve, reject, or edit the content and proceed to approve.
(4.4) Interrupt at Tool level
Interrupts can also be defined at the tool call level. This is demonstrated in the next human review checkpoint in the approve node before the content is published online on Bluesky.
Instead of placing interrupt() inside a node, we place it within the publish_post tool that creates posts via the Bluesky API:
Just like what we saw at the node level, we call a handler function (handle_publish_interrupt) to capture the human decision:
The return value from this review step is either:
- {"action": "confirm"}, or
- {"action": "cancel"}
The latter part of the code (i.e., from line 19) in the publish_post tool uses this return value to determine whether to proceed with post publication on Bluesky or not.
(4.5) Setup Graph with Checkpointer
Next, we connect the nodes in a graph for compilation and introduce a SQLite checkpointer to capture snapshots of the state at each interrupt.
SQLite by default only allows the thread that created the database connection to use it. Since LangGraph uses a thread pool for checkpoint writes, we need to set check_same_thread=False to allow those threads to access the connection too.
(4.6) Setup Full Workflow with Config
With the graph ready, we now place it into a workflow that kickstarts the content generation pipeline.
This workflow includes configuring a thread ID, which is passed to each graph.invoke(). This ID is the link that ties the invocations together, so that the graph pauses at an interrupt and resumes from where it left off.
You might have noticed the __interrupt__ key in the code above. It is simply a special key that LangGraph adds to the result whenever an interrupt() is hit.
In other words, it is the primary signal indicating that graph execution has paused and is waiting for human input before continuing.
By checking for the __interrupt__ key inside a while loop, we keep polling whether an interrupt is still pending. Once the interrupt is resolved, the key disappears from the result and the loop exits.
With the workflow complete, we can run it like this:
run_hitl_workflow(query="latest news about Anthropic")
(5) Best Practices for Interrupts
While interrupts are powerful in enabling HITL workflows, they can be disruptive if used incorrectly.
As such, I recommend reading this LangGraph documentation. Here are some practical rules to keep in mind:
- Do not wrap interrupt() calls in try/except blocks; interrupt works by raising a special exception, so catching it prevents the graph from pausing properly
- Keep interrupt calls in the same order every time and do not skip or rearrange them
- Only pass JSON-safe values into interrupts and avoid complex objects
- Make sure any code before an interrupt can safely run multiple times (i.e., idempotency) or move it after the interrupt
For example, I faced an issue in the web search node where I placed an interrupt right after the Tavily search. The intention was to pause and allow users to review the search results for content generation.
But because interrupts work by rerunning the nodes they were called from, the node just reran the web search and passed along a different set of search results than the ones I approved earlier.
Therefore, interrupts work best as a gate before an action, but if we use them after a non-deterministic step (like search), we need to persist the result or risk getting something different on resume.
Wrapping It Up
Human review can seem like a bottleneck in agentic tasks, but it remains critical, especially in domains where outcomes are subjective or hard to verify.
LangGraph makes it straightforward to build HITL workflows with interrupts and checkpointing.
Therefore, the challenge is deciding where to place those human decision points to strike a good balance between oversight and efficiency.

