Before the LLM era, we had spaCy, the de facto NLP library for both beginners and advanced users. It made it easy to dip your toes into NLP, even if you weren’t a deep learning expert. However, with the rise of ChatGPT and other LLMs, it seems to have been pushed aside.
While LLMs like Claude or Gemini can do all sorts of NLP things automagically, you don’t always want to bring a rocket launcher to a fist fight. GLiNER is spearheading the return of smaller, focused models for classic NLP techniques like entity and relationship extraction. It’s lightweight enough to run on a CPU, yet powerful enough to have built a thriving community around it.
Released earlier this year, GLiNER2 is a significant leap forward. Where the original GLiNER focused on entity recognition (spawning various spin-offs like GLiREL for relations and GLiClass for classification), GLiNER2 unifies named entity recognition, text classification, relation extraction, and structured data extraction into a single framework.
The core shift in GLiNER2 is its schema-driven approach, which lets you define extraction requirements declaratively and execute multiple tasks in a single inference call. Despite these expanded capabilities, the model remains CPU-efficient, making it an ideal solution for transforming messy, unstructured text into clean data without the overhead of a large language model.
As a knowledge graph enthusiast at Neo4j, I’ve been particularly drawn to the newly added structured data extraction via the extract_json method. While entity and relation extraction are valuable on their own, the ability to define a schema and pull structured JSON directly from text is what really excites me. It’s a natural fit for knowledge graph ingestion, where structured, consistent output is essential.
Constructing knowledge graphs with GliNER2. Image by author.
In this blog post, we’ll evaluate GLiNER2’s capabilities, specifically the model fastino/gliner2-large-v1, with a focus on how well it can help us build clean, structured knowledge graphs.
The code is available on GitHub.
Dataset selection
We’re not running formal benchmarks here, just a quick vibe check to see what GliNER2 can do. Here’s our test text, pulled from the Ada Lovelace Wikipedia page:
Augusta Ada King, Countess of Lovelace (10 December 1815–27 November 1852), also known as Ada Lovelace, was an English mathematician and writer chiefly known for work on Charles Babbage’s proposed mechanical general-purpose computer, the analytical engine. She was the first to recognise the machine had applications beyond pure calculation. Lovelace is often considered the first computer programmer. Lovelace was the only legitimate child of poet Lord Byron and reformer Anne Isabella Milbanke. All her half-siblings, Lord Byron’s other children, were born out of wedlock to other women. Lord Byron separated from his wife a month after Ada was born, and left England forever. He died in Greece during the Greek War of Independence, when she was eight. Lady Byron was anxious about her daughter’s upbringing and promoted Lovelace’s interest in mathematics and logic, to prevent her developing her father’s perceived insanity. Despite this, Lovelace remained interested in her father, naming one son Byron and the other, for her father’s middle name, Gordon. Lovelace was buried next to her father at her request. Although often ill in childhood, Lovelace pursued her studies assiduously. She married William King in 1835. King was a Baron, and was created Viscount Ockham and 1st Earl of Lovelace in 1838. The name Lovelace was chosen because Ada was descended from the extinct Baron Lovelaces. The title given to her husband thus made Ada the Countess of Lovelace.
At 322 tokens, it’s a solid chunk of text to work with. Let’s dive in.
Entity extraction
Let’s start with entity extraction. At its core, entity extraction is the process of automatically identifying and categorizing key entities within text, such as people, locations, organizations, or technical concepts. The original GLiNER already handled this well, but GLiNER2 takes it further by letting you add descriptions to entity types, giving you finer control over what gets extracted.
```python
from gliner2 import GLiNER2

# Load the extractor once; it runs fine on CPU
extractor = GLiNER2.from_pretrained("fastino/gliner2-large-v1")

entities = extractor.extract_entities(
    text,
    {
        "Person": "Names of people, including nobility titles.",
        "Location": "Countries, cities, or geographic places.",
        "Invention": "Machines, devices, or technological creations.",
        "Event": "Historical events, wars, or conflicts.",
    },
)
```
The results are the following:
Entity extraction results. Image by author.
Providing custom descriptions for each entity type helps resolve ambiguity and improves extraction accuracy. This is especially useful for broad categories like Event, where, on its own, the model might not know whether to include wars, ceremonies, or personal milestones. Adding “historical events, wars, or conflicts” clarifies the intended scope.
Relation extraction
Relation extraction identifies relationships between pairs of entities in text. For example, in the sentence “Steve Jobs founded Apple”, a relation extraction model would identify the relationship Founded between the entities Steve Jobs and Apple.
With GLiNER2, you define only the relation types you want to extract; you can’t constrain which entity types are allowed as the head or tail of each relation. This simplifies the interface but may require post-processing to filter unwanted pairings.
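Since the model itself won’t enforce type constraints, one option is a small post-processing filter. A minimal sketch, assuming relations come back grouped by type as (head, tail) string pairs and that an entity-to-type mapping is available from a separate entity extraction pass (all names and data shapes here are illustrative, not part of the GLiNER2 API):

```python
def filter_relations(relations, entity_types, allowed):
    """Keep only (head, tail) pairs whose entity types match the
    allowed (head_type, tail_type) signature for each relation type."""
    filtered = {}
    for rel_type, pairs in relations.items():
        head_t, tail_t = allowed.get(rel_type, (None, None))
        filtered[rel_type] = [
            (h, t) for h, t in pairs
            if entity_types.get(h) == head_t and entity_types.get(t) == tail_t
        ]
    return filtered

# Illustrative data shaped like relation extraction output
relations = {"invented": [("Charles Babbage", "analytical engine"),
                          ("England", "analytical engine")]}
entity_types = {"Charles Babbage": "Person",
                "analytical engine": "Invention",
                "England": "Location"}
allowed = {"invented": ("Person", "Invention")}

print(filter_relations(relations, entity_types, allowed))
```

This drops the spurious ("England", "analytical engine") pair because a Location can’t plausibly be the head of an invented relation.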
As a simple experiment, I added both an alias and a same_as relationship with identical descriptions.
```python
relations = extractor.extract_relations(
    text,
    {
        "parent_of": "A person is the parent of another person",
        "married_to": "A person is married to another person",
        "worked_on": "A person contributed to or worked on an invention",
        "invented": "A person created or proposed an invention",
        "alias": "Entity is an alias, nickname, title, or alternate reference for another entity",
        "same_as": "Entity is an alias, nickname, title, or alternate reference for another entity",
    },
)
```
The results are the following:
Relation extraction results. Image by author.
The extraction correctly identified key relationships: Lord Byron and Anne Isabella Milbanke as Ada’s parents, her marriage to William King, Babbage as inventor of the analytical engine, and Ada’s work on it. Notably, the model detected Augusta Ada King as an alias of Ada Lovelace, but same_as wasn’t captured despite having an identical description. The selection doesn’t seem random as the model always populates the alias but never the same_as relationship. This highlights how sensitive relation extraction is to label naming, not just descriptions.
Conveniently, GLiNER2 allows combining multiple extraction types in a single call so you can get entity types alongside relation types in one pass. However, the operations are independent: entity extraction doesn’t filter or constrain which entities appear in relation extraction, and vice versa. Think of it as running both extractions in parallel rather than as a pipeline.
```python
schema = (
    extractor.create_schema()
    .entities({
        "Person": "Names of people, including nobility titles.",
        "Location": "Countries, cities, or geographic places.",
        "Invention": "Machines, devices, or technological creations.",
        "Event": "Historical events, wars, or conflicts.",
    })
    .relations({
        "parent_of": "A person is the parent of another person",
        "married_to": "A person is married to another person",
        "worked_on": "A person contributed to or worked on an invention",
        "invented": "A person created or proposed an invention",
        "alias": "Entity is an alias, nickname, title, or alternate reference for another entity",
    })
)
results = extractor.extract(text, schema)
```
The results are the following:
Combined entity and relation extraction results. Image by author.
The combined extraction now gives us entity types, which are distinguished by color. However, several nodes appear isolated (Greece, England, Greek War of Independence) since not every extracted entity participates in a detected relationship.
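Spotting such isolated nodes programmatically is a quick set difference. A sketch, assuming entities come grouped by type and relations as (head, tail) pairs (the data below is illustrative, not a guaranteed output format):

```python
def isolated_entities(entities, relations):
    """Return entity names that never appear as head or tail of any relation."""
    all_names = {name for names in entities.values() for name in names}
    connected = {e for pairs in relations.values() for pair in pairs for e in pair}
    return all_names - connected

# Illustrative data shaped like combined extraction output
entities = {"Person": ["Ada Lovelace", "Charles Babbage"],
            "Location": ["Greece", "England"]}
relations = {"worked_on": [("Ada Lovelace", "analytical engine")],
             "invented": [("Charles Babbage", "analytical engine")]}

print(sorted(isolated_entities(entities, relations)))
```

Depending on your use case, you might keep these nodes (they still carry MENTIONS provenance) or prune them before import.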
Structured JSON extraction
Perhaps the most powerful feature is structured data extraction via extract_json. This mimics the structured output functionality of LLMs like ChatGPT or Gemini but runs entirely on CPU. Unlike entity and relation extraction, this lets you define arbitrary fields and pull them into structured records. The syntax follows a field_name::type::description pattern, where type is str or list.
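To make the pattern concrete, here is a toy parser for such field specs (purely illustrative; this parsing happens inside GLiNER2 and is nothing you need to write yourself):

```python
def parse_field_spec(spec):
    """Split a field_name::type::description spec; type and description are optional."""
    parts = spec.split("::", 2)
    name = parts[0]
    dtype = parts[1] if len(parts) > 1 else "str"
    description = parts[2] if len(parts) > 2 else None
    return {"name": name, "dtype": dtype, "description": description}

print(parse_field_spec("gender::str::male or female"))
```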
```python
results = extractor.extract_json(
    text,
    {
        "person": [
            "name::str",
            "gender::str::male or female",
            "alias::str",
            "description::str::brief summary of included information about the person",
            "birth_date::str",
            "death_date::str",
            "parent_of::str",
            "married_to::str",
        ]
    },
)
```
Here we’re experimenting with some overlap: alias, parent_of, and married_to could also be modeled as relations. It’s worth exploring which approach works better for your use case. One interesting addition is the description field, which pushes the boundaries a bit: it’s closer to summary generation than pure extraction.
The results are the following:
```json
{
  "person": [
    {
      "name": "Augusta Ada King",
      "gender": null,
      "alias": "Ada Lovelace",
      "description": "English mathematician and writer",
      "birth_date": "10 December 1815",
      "death_date": "27 November 1852",
      "parent_of": "Ada Lovelace",
      "married_to": "William King"
    },
    {
      "name": "Charles Babbage",
      "gender": null,
      "alias": null,
      "description": null,
      "birth_date": null,
      "death_date": null,
      "parent_of": "Ada Lovelace",
      "married_to": null
    },
    {
      "name": "Lord Byron",
      "gender": null,
      "alias": null,
      "description": "reformer",
      "birth_date": null,
      "death_date": null,
      "parent_of": "Ada Lovelace",
      "married_to": null
    },
    {
      "name": "Anne Isabella Milbanke",
      "gender": null,
      "alias": null,
      "description": "reformer",
      "birth_date": null,
      "death_date": null,
      "parent_of": "Ada Lovelace",
      "married_to": null
    },
    {
      "name": "William King",
      "gender": null,
      "alias": null,
      "description": null,
      "birth_date": null,
      "death_date": null,
      "parent_of": "Ada Lovelace",
      "married_to": null
    }
  ]
}
```
The results reveal some limitations. All gender fields are null: even though Ada is explicitly called a daughter, the model doesn’t infer that she’s female. The description field captures only surface-level phrases (“English mathematician and writer”, “reformer”) rather than generating meaningful summaries, which isn’t useful for workflows like Microsoft’s GraphRAG that rely on richer entity descriptions. There are also clear errors: Charles Babbage and William King are incorrectly marked as parent_of Ada, and Lord Byron is labeled a reformer (that description belongs to Anne Isabella). These parent_of errors didn’t come up during relation extraction, so perhaps that’s the better method here. Overall, the results suggest the model excels at extraction but struggles with reasoning or inference, likely a tradeoff of its compact size.
Additionally, all attributes are optional, which makes sense and simplifies things. However, you have to be careful: sometimes the name attribute comes back null, making the record invalid. Lastly, we could use something like Pydantic to validate the results, cast values to appropriate types like floats or dates, and handle invalid records.
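As a dependency-free alternative to Pydantic, here is a standard-library sketch of that validation: drop records with a null name and cast the date strings (assuming the extract_json output shape shown above; the "%d %B %Y" format matches dates like “10 December 1815”):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class PersonRecord:
    name: str
    birth_date: Optional[datetime] = None
    death_date: Optional[datetime] = None

def parse_date(value):
    """Parse dates like '10 December 1815'; return None if missing or unparseable."""
    if not value:
        return None
    try:
        return datetime.strptime(value, "%d %B %Y")
    except ValueError:
        return None

def validate_people(records):
    """Discard records with a null name; cast date strings to datetimes."""
    valid = []
    for r in records:
        if not r.get("name"):
            continue  # a record without a name is unusable as a graph node
        valid.append(PersonRecord(
            name=r["name"],
            birth_date=parse_date(r.get("birth_date")),
            death_date=parse_date(r.get("death_date")),
        ))
    return valid

people = validate_people([
    {"name": "Augusta Ada King", "birth_date": "10 December 1815",
     "death_date": "27 November 1852"},
    {"name": None, "birth_date": None, "death_date": None},
])
print(people)  # only the named record survives, with parsed dates
```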
Constructing knowledge graphs
Since GLiNER2 allows multiple extraction types in a single pass, we can combine all of the above methods to construct a knowledge graph. Rather than running separate pipelines for entity, relation, and structured data extraction, a single schema definition handles all three. This makes it straightforward to go from raw text to a rich, interconnected representation.
```python
schema = (
    extractor.create_schema()
    .entities({
        "Person": "Names of people, including nobility titles.",
        "Location": "Countries, cities, or geographic places.",
        "Invention": "Machines, devices, or technological creations.",
        "Event": "Historical events, wars, or conflicts.",
    })
    .relations({
        "parent_of": "A person is the parent of another person",
        "married_to": "A person is married to another person",
        "worked_on": "A person contributed to or worked on an invention",
        "invented": "A person created or proposed an invention",
    })
    .structure("person")
    .field("name", dtype="str")
    .field("alias", dtype="str")
    .field("description", dtype="str")
    .field("birth_date", dtype="str")
)
results = extractor.extract(text, schema)
```
How you map these outputs to your graph (nodes, relationships, properties) depends on your data model. In this example, we use the following data model:
Knowledge graph construction result. Image by author.
Notice that we also include the original text chunk in the graph, which lets us retrieve and reference the source material when querying the graph, enabling more accurate and traceable results. The import Cypher looks like the following:
```python
import_cypher_query = """
// Create Chunk node from text
CREATE (c:Chunk {text: $text})
// Create Person nodes with properties
WITH c
CALL (c) {
  UNWIND $data.person AS p
  WITH p
  WHERE p.name IS NOT NULL
  MERGE (n:__Entity__ {name: p.name})
  SET n.description = p.description,
      n.birth_date = p.birth_date
  MERGE (c)-[:MENTIONS]->(n)
  WITH p, n WHERE p.alias IS NOT NULL
  MERGE (m:__Entity__ {name: p.alias})
  MERGE (n)-[:ALIAS_OF]->(m)
}
// Create entity nodes dynamically with __Entity__ base label + dynamic label
CALL (c) {
  UNWIND keys($data.entities) AS label
  UNWIND $data.entities[label] AS entityName
  MERGE (n:__Entity__ {name: entityName})
  SET n:$(label)
  MERGE (c)-[:MENTIONS]->(n)
}
// Create relationships dynamically
CALL (c) {
  UNWIND keys($data.relation_extraction) AS relType
  UNWIND $data.relation_extraction[relType] AS rel
  MATCH (a:__Entity__ {name: rel[0]})
  MATCH (b:__Entity__ {name: rel[1]})
  MERGE (a)-[:$(toUpper(relType))]->(b)
}
RETURN DISTINCT 'import completed' AS result
"""
```
The Cypher query takes the GLiNER2 output and stores it in Neo4j. We could also include embeddings for the text chunks, entities, and so on.
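To run the import end to end, you pass the chunk text and the extraction output as query parameters. A sketch of assembling them, assuming the combined results dictionary uses the keys the query references (person, entities, relation_extraction); the driver call in the comments uses the official neo4j Python package with placeholder credentials:

```python
def build_params(text, results):
    """Assemble the $text / $data parameters the import query expects,
    defaulting any missing section to an empty container."""
    return {
        "text": text,
        "data": {
            "person": results.get("person", []),
            "entities": results.get("entities", {}),
            "relation_extraction": results.get("relation_extraction", {}),
        },
    }

# With the official neo4j driver (pip install neo4j), the import would look like:
# from neo4j import GraphDatabase
# with GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password")) as driver:
#     driver.execute_query(import_cypher_query, **build_params(text, results))

params = build_params("Ada Lovelace was an English mathematician.",
                      {"entities": {"Person": ["Ada Lovelace"]}})
print(params["data"])
```

Defaulting the missing sections keeps the Cypher UNWINDs safe: an empty list or map simply produces no rows instead of failing on a null parameter.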
Summary
GLiNER2 is a step in the right direction for structured data extraction. With the rise of LLMs, it’s easy to reach for ChatGPT or Claude whenever you need to pull information from text, but that’s often overkill. Running a multi-billion-parameter model to extract a few entities and relationships feels wasteful when smaller, specialized tools can do the job on a CPU.
GLiNER2 unifies named entity recognition, relation extraction, and structured JSON output into a single framework. It’s well-suited for tasks like knowledge graph construction, where you need consistent, schema-driven extraction rather than open-ended generation.
The model has its limitations: it works best for direct extraction rather than inference or reasoning, and results can be inconsistent. But the progress from the original GLiNER to GLiNER2 is encouraging, and hopefully we’ll see continued development in this space. For many use cases, a focused extraction model beats an LLM that’s doing far more than you need.
The code is available on GitHub.

