Concise Concepts: Few-Shot NER with Entity Scoring

Anote
3 min read · May 24, 2023

Applying Named Entity Recognition (NER) to a small, well-defined set of concepts poses a familiar problem: coming up with a few example terms is usually straightforward, but training an entire pipeline can be quite challenging. Concise Concepts addresses this with few-shot NER based on word embedding similarity. You supply a handful of example words per category, and the tool uses words that are similar in embedding space to recognize related terms. It now also offers entity scoring, which attaches a score to each detected entity.

This post walks through setting up Concise Concepts and uses a worked example to show what it can do. Let’s get started!

Getting Started

To begin using Concise Concepts, you will need to import the necessary libraries, including spacy and concise_concepts. Here's an example of the initial setup:

import spacy
from spacy import displacy
import concise_concepts

Next, you can define your data, which consists of various categories and their corresponding concepts. For instance:

data = {
    "fruit": ["apple", "pear", "orange"],
    "vegetable": ["broccoli", "spinach", "tomato"],
    "meat": ["beef", "pork", "turkey", "duck"],
}
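To build some intuition for how a few seed words can stand in for a trained model, here is a minimal sketch of similarity-based expansion. It is not the library's implementation: the three-dimensional vectors are invented for illustration, and `expand_seeds` and its `topn` argument are hypothetical names. A real setup would use the word vectors that ship with a model such as en_core_web_md.

```python
import math

# Toy 3-dimensional "embeddings", invented purely for illustration.
TOY_VECTORS = {
    "apple": (0.9, 0.1, 0.0),
    "pear": (0.8, 0.2, 0.0),
    "banana": (0.85, 0.15, 0.05),
    "broccoli": (0.1, 0.9, 0.0),
    "spinach": (0.2, 0.8, 0.1),
    "beef": (0.0, 0.1, 0.9),
    "pork": (0.1, 0.0, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_seeds(seeds, vectors, topn):
    """Return up to `topn` non-seed words, ranked by best similarity to any seed."""
    scored = []
    for word, vec in vectors.items():
        if word in seeds:
            continue  # seeds are already part of the category
        best = max(cosine(vec, vectors[s]) for s in seeds if s in vectors)
        scored.append((word, best))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:topn]]


# Hypothetical usage: find the single closest neighbor of the fruit seeds.
fruit_candidates = expand_seeds(["apple", "pear"], TOY_VECTORS, topn=1)
print(fruit_candidates)
```

With these toy vectors, "banana" sits closest to the fruit seeds, so it is the word that would be added to the category.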

Configuring the NER Pipeline

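To incorporate Concise Concepts into the NER pipeline, load the en_core_web_md model with spaCy and disable its built-in NER component, so that the entities in the output come from Concise Concepts alone. The model has to be installed first, for example with python -m spacy download en_core_web_md. Here's an example: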

nlp = spacy.load("en_core_web_md", disable=["ner"])

Now, it’s time to add the Concise Concepts component to the pipeline using nlp.add_pipe(). You can configure this component by providing a dictionary of settings. Let's go through the important options:

  • data: This option expects the data you defined earlier, containing the categories and concepts.
  • ent_score: When set to True, entity scoring will be enabled. We'll explore this feature in detail later.
  • verbose: Enabling this option provides additional information during the pipeline execution for debugging purposes.
  • exclude_pos: You can exclude specific parts of speech (POS) tags from being considered as entities. In the example, we exclude verbs and auxiliaries.
  • exclude_dep: Similarly, you can exclude specific dependency labels from being considered as entities. In the example, we exclude direct objects and prepositional complements.
  • include_compound_words: When set to False, compound words will not be considered as entities.
  • json_path: This option allows you to specify the path to a JSON file where the extracted patterns will be saved for future use.
  • topn: This option determines the number of similar words to consider when calculating word embedding similarity. In the example, we pass (100, 500, 300), which supplies one value for each of the three categories in data.
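As a rough illustration of what the exclude_pos and exclude_dep options describe, the sketch below filters a few hand-written candidates by their tags. This is an assumption about the general idea, not the library's actual filtering logic: the `Candidate` class, the `keep_candidate` helper, and the tags assigned to each word are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical candidate word with hand-assigned tags."""
    text: str
    pos: str  # part-of-speech tag
    dep: str  # dependency label


def keep_candidate(candidate, exclude_pos, exclude_dep):
    """Keep a candidate only if neither its POS tag nor its dependency label is excluded."""
    return candidate.pos not in exclude_pos and candidate.dep not in exclude_dep


candidates = [
    Candidate("orange", "NOUN", "NSUBJ"),
    Candidate("cook", "VERB", "ROOT"),
    Candidate("duck", "NOUN", "DOBJ"),
]

kept = [
    c.text
    for c in candidates
    if keep_candidate(c, exclude_pos={"VERB", "AUX"}, exclude_dep={"DOBJ", "PCOMP"})
]
print(kept)
```

Here "cook" is dropped because it is tagged as a verb, and "duck" is dropped because of its dependency label, which leaves only "orange".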

Here’s how you can add the Concise Concepts component to the pipeline:

nlp.add_pipe(
    "concise_concepts",
    config={
        "data": data,
        "ent_score": True,
        "verbose": True,
        "exclude_pos": ["VERB", "AUX"],
        "exclude_dep": ["DOBJ", "PCOMP"],
        "include_compound_words": False,
        "json_path": "./fruitful_patterns.json",
        "topn": (100, 500, 300),
    },
)

Analyzing Text with Entity Scoring

Now that the pipeline is set up, you can process your text and analyze it for entities using nlp(text). Let's consider an example text:

text = """ 
Heat the oil in a large pan and add the Onion, celery and carrots.
Then, cook over a medium–low heat for 10 minutes, or until softened.
Add the courgette, garlic, red peppers and oregano and cook for 2–3 minutes.
Later, add some oranges and chickens.
"""
doc = nlp(text)
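One plausible way to think about an entity score is as the similarity between a matched word and the seed words of its category. The sketch below compares a word's vector against the mean of the seed vectors using cosine similarity. Treat it as an assumption made for illustration rather than the library's documented formula: the vectors are invented, and `entity_score` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return tuple(sum(component) / count for component in zip(*vectors))


def entity_score(entity_vec, seed_vecs):
    """Hypothetical score: similarity of the entity to the centroid of the seeds."""
    return cosine(entity_vec, mean_vector(seed_vecs))


# Invented toy vectors: two fruit seeds, a fruit-like word and a meat-like word.
fruit_seeds = [(0.9, 0.1, 0.0), (0.8, 0.2, 0.0)]
orange_vec = (0.85, 0.15, 0.05)
beef_vec = (0.0, 0.1, 0.9)

print(f"orange vs fruit: {entity_score(orange_vec, fruit_seeds):.2f}")
print(f"beef vs fruit:   {entity_score(beef_vec, fruit_seeds):.2f}")
```

The fruit-like vector scores close to 1 against the fruit seeds and the meat-like vector scores close to 0, which is the kind of signal that makes a score useful for filtering out doubtful matches.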

Once the text is processed, you can visualize the extracted entities using displacy.render(). To enhance the visualization, you can provide options such as colors for each entity category. Here's an example of how you can configure the options:

options = {
    "colors": {"fruit": "darkorange", "vegetable": "limegreen", "meat": "salmon"},
    "ents": ["fruit", "vegetable", "meat"],
}

Next, you can access the entities detected in the document using doc.ents. To incorporate entity scoring into the visualization, you can modify the entity labels by appending the entity score to them. Here's how you can achieve that:

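ents = doc.ents
for ent in ents:
    # Build a new label from the category and its score, formatted as a percentage.
    new_label = f"{ent.label_} ({ent._.ent_score:.0%})"
    # Reuse the base category's color for the score-annotated label.
    options["colors"][new_label] = options["colors"].get(ent.label_.lower(), None)
    options["ents"].append(new_label)
    ent.label_ = new_label
# Write the relabeled spans back to the document.
doc.ents = ents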
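To see what the relabeling produces without running a spaCy pipeline, here is a small standalone sketch of the same formatting logic applied to plain (label, score) pairs. The `scored_label` and `build_options` helpers and the scores are hypothetical. Unlike the loop in the post, this version also skips labels that have already been added.

```python
def scored_label(label, score):
    """Format a label and a 0..1 score as 'label (NN%)'."""
    return f"{label} ({score:.0%})"


def build_options(base_colors, scored_entities):
    """Extend a displaCy-style options dict with score-annotated labels."""
    options = {"colors": dict(base_colors), "ents": list(base_colors)}
    for label, score in scored_entities:
        new_label = scored_label(label, score)
        # The annotated label inherits the color of its base category.
        options["colors"][new_label] = base_colors.get(label.lower())
        if new_label not in options["ents"]:
            options["ents"].append(new_label)
    return options


base_colors = {"fruit": "darkorange", "vegetable": "limegreen", "meat": "salmon"}
# Made-up scores, including a repeated entity to show the de-duplication.
demo_options = build_options(base_colors, [("fruit", 0.87), ("fruit", 0.87), ("meat", 0.5)])
print(demo_options["ents"])
```

Because the score becomes part of the label, every distinct percentage becomes its own entry in the options, so the number of labels grows with the variety of scores in the text.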

Finally, you can render the entities using displacy.render():

displacy.render(doc, style="ent", options=options)

Conclusion

Concise Concepts simplifies the process of applying NER to concise concepts. By leveraging few-shot NER based on word embedding similarity, it lets you get started from a handful of example words per category, without training a model. The recently introduced entity scoring feature adds a score to each detected entity, which you can surface in the visualization.

In this blog post, we walked through setting up and using Concise Concepts: defining the data, configuring the pipeline component, processing a text, and visualizing the extracted entities together with their scores. If you can describe your categories with a few examples, it is a useful addition to an NER workflow.
