GLiNER: Unlock Zero-Shot NER Annotation using Annolive

Welcome to this guide on using a custom model in Annolive, with GLiNER as the example. We will cover the steps for creating a Flask server for GLiNER and integrating it with auto-annotation in Annolive.

Annolive enables users to utilize both publicly available models and their own custom models in a secure, on-premise environment. Here, we will discuss the steps to integrate GLiNER into Annolive and refine your data.

What we are going to do:

  1. Create a GLiNER model server link
  2. Integrate the GLiNER model server with auto-annotation in the task

What is GLiNER?

GLiNER is a zero-shot Named Entity Recognition (NER) model that demonstrates strong performance, outperforming both ChatGPT and fine-tuned Large Language Models (LLMs) in zero-shot evaluations across various NER benchmarks.

The paper GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer by Urchade Zaratiana et al. introduced this innovative zero-shot NER model.

GLiNER employs a Bidirectional Language Model (BiLM) that takes as input entity type prompts along with a sentence or text; each entity type in the prompt is separated by a learned token, [ENT]. The BiLM generates representations for each token. The entity embeddings are then passed through a feed-forward network, while the input word representations are directed into a span representation layer to compute an embedding for each candidate span. Finally, the model computes a matching score between each entity representation and each span representation using a dot product followed by a sigmoid activation.

GLiNER Architecture
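
To make that final matching step concrete, here is a minimal, illustrative sketch of the score computation (not GLiNER's actual implementation): it assumes we already have one embedding per entity type and one embedding per candidate span, both in the same hidden dimension, and scores every span against every entity type.

import torch

# Illustrative stand-ins for the representations GLiNER computes internally:
# entity_reps: one embedding per entity-type prompt (num_entity_types x hidden_dim)
# span_reps:   one embedding per candidate text span (num_spans x hidden_dim)
num_entity_types, num_spans, hidden_dim = 3, 10, 768
entity_reps = torch.randn(num_entity_types, hidden_dim)
span_reps = torch.randn(num_spans, hidden_dim)

# Matching score: dot product between every span and every entity type,
# followed by a sigmoid so each (span, entity type) pair gets a score in (0, 1).
scores = torch.sigmoid(span_reps @ entity_reps.T)  # shape: (num_spans, num_entity_types)

# A span is assigned an entity type when its score exceeds a decision threshold.
threshold = 0.5
matches = (scores > threshold).nonzero(as_tuple=False)
print(scores.shape, matches.shape)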

1. Create a GLiNER model server link

Create an endpoint that accepts the text and the list of possible labels, and returns the detected entities grouped by label.

Here we are using a Flask endpoint for GLiNER. First, create an app.py file as shown below:

Create the file app.py
from flask import Flask, request, jsonify
from gliner import GLiNER
from collections import defaultdict

app = Flask(__name__)

model = GLiNER.from_pretrained("urchade/gliner_largev2")

@app.route('/', methods=['POST'])
def gliner():
    # Get the JSON data from the request
    if request.is_json:
        # Handle JSON data
        data = request.get_json()
    else:
        # Handle form data
        data = request.form.to_dict()
    # Extract the text and the list of candidate labels from the request
    text = data.get('text')
    labels = data.get('labels')

    entities = model.predict_entities(text, labels)
    answer = defaultdict(list)
    for entity in entities:
        answer[entity["label"]].append(entity["text"])
    # Return the answer as JSON response
    return jsonify({"data": answer})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
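
(Optional) Before moving the server to Colab, you can sanity-check the route locally with Flask's built-in test client. The snippet below is a minimal sketch, assuming app.py sits in the current directory (importing it will download the model weights) and using placeholder text and labels:

# test_app.py - quick local check of the GLiNER route
from app import app  # importing app.py loads the GLiNER model

client = app.test_client()
response = client.post("/", json={
    "text": "Barack Obama visited Paris last week.",
    "labels": ["person", "location"],
})
print(response.status_code)         # 200 if the route handled the request
print(response.get_json()["data"])  # e.g. {"location": ["Paris"], "person": ["Barack Obama"]}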

Now save the app.py file to your Drive. Then start a Google Colab notebook and run the code below:

Install all the required libraries
!pip install gliner
Download the model weights
from gliner import GLiNER
model = GLiNER.from_pretrained("urchade/gliner_largev2")
Clone the Git repo
!git clone https://github.com/AnnoliveAI/AnnoliveExamples
Run the Flask server
!python /content/AnnoliveExamples/NLP/NER/GLiNER/app.py & npx localtunnel --port 5000

This will give an HTTPS endpoint running GLiNER (for example: https://major-parts-drive.loca.lt). Now you have the model server link.
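
You can verify the endpoint from any machine with a short client script before wiring it into Annolive. This is just a sketch: replace the placeholder URL with the one printed by localtunnel in your Colab session.

import requests

# Replace with the URL printed by localtunnel in your Colab session.
MODEL_SERVER_URL = "https://major-parts-drive.loca.lt"

payload = {
    "text": "Apple opened a new office in Berlin in 2023.",
    "labels": ["organization", "location", "date"],
}
response = requests.post(MODEL_SERVER_URL, json=payload)
response.raise_for_status()

# The Flask route returns {"data": {label: [matched entity texts, ...]}}
print(response.json()["data"])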

All the files can be accessed in the AnnoliveExamples repository (https://github.com/AnnoliveAI/AnnoliveExamples).

More details on creating the task in Annolive can be found here

2. Integrate the model server link in Annolive auto-annotate

On the task page:

  1. Click on the auto-annotate option within the Settings tab.
  2. Activate auto-annotation by toggling the enable button.
  3. Select the custom model option from the dropdown menu.
  4. Provide the link to the custom model server (the HTTPS endpoint from Step 1).
Annolive GLiNER integration

Now, auto-annotation using the custom model is enabled for both the annotation playground and bulk annotation.

Get started with Annolive for free: SignUp

FAQs

1. What is GLiNER?
GLiNER is a zero-shot Named Entity Recognition (NER) model designed to identify various entity types in text. Unlike traditional NER models, which require training on specific entity types, GLiNER can recognize any entity by using natural language prompts. This flexibility is achieved through a Bidirectional Transformer-based architecture, making it efficient and adaptable.
2. How is GLiNER different from traditional NER models?
Traditional NER models are limited by predefined categories and often require retraining to expand their capabilities. GLiNER, however, operates in an open-type NER framework. By leveraging Bidirectional Transformers, it matches entity type prompts with text spans directly, allowing it to work without retraining even for unseen entities. This makes it more efficient and scalable, especially in resource-limited environments.
3. Can GLiNER compete with larger models like ChatGPT?
Yes, GLiNER has demonstrated superior performance compared to ChatGPT and other fine-tuned large language models in zero-shot NER tasks. For example, GLiNER-L achieves an average F1 score of 60.9 across several datasets, significantly outperforming ChatGPT’s score of 47.5. Additionally, GLiNER’s lightweight architecture makes it more cost-effective and practical for deployment.
4. How does GLiNER handle zero-shot Named Entity Recognition?
GLiNER uses a unique approach to zero-shot NER by treating the task as a matching problem between entity type embeddings and text spans. It combines entity type prompts with input text in a unified format. The Bidirectional Transformer processes this input to produce span representations and compute matching scores, allowing it to identify entities without prior fine-tuning on specific datasets.
5. Is GLiNER suitable for multilingual tasks?
GLiNER supports multilingual Named Entity Recognition. It has been evaluated on datasets in multiple languages and shows strong performance in many, including those not included in its training data. For instance, GLiNER performs well in Spanish and German, demonstrating its adaptability across different languages.
6. What is required to use GLiNER with Annolive for data annotation?
Integrating GLiNER with Annolive involves setting up a model server for GLiNER and linking it to the annotation platform. This requires creating a Flask API endpoint for the GLiNER model, enabling auto-annotation in Annolive’s settings, and providing the server link. Once integrated, GLiNER can automatically annotate data based on the provided entity type prompts.
7. What datasets were used to train GLiNER?
GLiNER was trained on Pile-NER, a dataset curated from 50,000 passages in the Pile corpus. The passages were annotated with a wide range of entity types using ChatGPT-generated prompts. This diverse dataset ensures GLiNER’s robust performance across various domains.
8. How resource-intensive is GLiNER?
GLiNER is designed to be lightweight and resource-efficient compared to larger models. Its largest variant, GLiNER-L, requires a single high-performance GPU, such as an NVIDIA A100, for training. This efficiency makes it suitable for organizations with limited computational resources.
9. Is GLiNER available for public use?
Yes, GLiNER is open source and accessible through platforms like GitHub and Hugging Face. This allows developers and researchers to explore, customize, and deploy the model for various applications without licensing restrictions.
10. What are GLiNER’s main use cases?
GLiNER is ideal for scenarios requiring flexible and efficient Named Entity Recognition. It can be used in constructing knowledge graphs, information extraction from multilingual texts, and automating data annotation processes. Its zero-shot capabilities also make it valuable for tasks involving novel entity types or domains.
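
As a small illustration of the zero-shot behaviour discussed above, the same pretrained checkpoint can be prompted with entity types it was never explicitly trained on; the text and labels below are arbitrary examples:

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_largev2")

text = "The James Webb Space Telescope captured images of the Carina Nebula in 2022."
# Arbitrary, plain-language entity types supplied at inference time.
labels = ["space telescope", "astronomical object", "year"]

for entity in model.predict_entities(text, labels):
    # Each prediction is a dict containing at least the matched text and its label.
    print(entity)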

References

Urchade Zaratiana, Nadi Tomeh, Pierre Holat, Thierry Charnois. GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer. arXiv:2311.08526.

Thanks for reading. Please contact us for any queries.

Last Updated on 19/11/2024