Entity Model Update

As of July 2021, we have updated our entity model (Entities V3.1). All legacy models will be deprecated on 5 April 2023.

Customers currently using legacy versions of this model will need to migrate to the latest version to benefit from a range of feature improvements.

This page walks you through the differences between the old and new models, the benefits of the new model, and what you need to do to move from the old version to the new one.

With the latest entities version, users can perform enhanced entity searches: as well as searching for entities, they can refine their queries with the following features.

AQL

AYLIEN Query Language (AQL) is a Lucene-based syntax that lets users write more powerful searches. You can read more about how to use it in the AQL documentation.

ELSA

Previously, AYLIEN supported sentiment analysis for the title and body text. With Entity Level Sentiment Analysis (ELSA), users get sentiment predictions for individual entities. This is useful for finding positive or negative mentions of the entities you care about.

Prominence

Prominence is a prediction of how significant the mention of an entity is in a document. Using this filter, users can restrict results to significant mentions of the entities they care about, which reduces noise and surfaces the news that matters.

Element

Users can search for entities and specify which element the entity must appear in, i.e. the title or the body.

New Entity Model Payload Structure - V3 JSON vs V3.1

There are some differences between the data payload for legacy version V3 JSON and the new V3.1 version.

In V3 JSON, if an entity is mentioned in both the title and the body, it is represented by two objects: one in the title list and one in the body list. The example below shows how the entity Messi is represented in the legacy model:

{'body': [{'id': 'Q615',
           'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
                     'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
           'prominence_score': 0.9800000190734863,
           'sentiment': {'confidence': 0.81, 'polarity': 'neutral'},
           'surface_forms': [{'frequency': 3,
                              'indices': [[45, 57], [485, 497], [557, 569]],
                              'text': 'Lionel Messi'},
                             {'frequency': 5,
                              'indices': [[1075, 1080],
                                          [1225, 1230],
                                          [1512, 1517],
                                          [1714, 1719],
                                          [1828, 1833]],
                              'text': 'Messi'},
                             {'frequency': 1,
                              'indices': [[1387, 1396]],
                              'text': 'Leo Messi'}],
           'types': ['Human']}],
 'title': [{'id': 'Q615',
            'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
                      'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
            'prominence_score': 0.9800000190734863,
            'sentiment': {'confidence': 0.9, 'polarity': 'neutral'},
            'surface_forms': [{'frequency': 1,
                               'indices': [[3, 15]],
                               'text': 'Lionel Messi'}]}]}
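Because the legacy model splits one entity across the title and body lists, post-processing code usually has to re-join the two objects itself before it can reason about the entity as a whole. A minimal sketch of that step, assuming the payload has already been parsed into a dict shaped like the legacy example above; `merge_legacy_entities` is a hypothetical helper name, not part of any AYLIEN library:

```python
from collections import defaultdict


def merge_legacy_entities(entities):
    """Group legacy (V3 JSON) entity objects by their Wikidata id.

    Assumes `entities` looks like {'body': [entity, ...], 'title': [entity, ...]}.
    Returns {entity_id: {'title': entity_or_None, 'body': entity_or_None}}.
    """
    merged = defaultdict(lambda: {"title": None, "body": None})
    for element in ("title", "body"):
        for entity in entities.get(element, []):
            merged[entity["id"]][element] = entity
    return dict(merged)


# Trimmed copy of the legacy Messi example above.
legacy = {
    "body": [{"id": "Q615",
              "sentiment": {"confidence": 0.81, "polarity": "neutral"},
              "surface_forms": [{"frequency": 3, "text": "Lionel Messi"},
                                {"frequency": 5, "text": "Messi"},
                                {"frequency": 1, "text": "Leo Messi"}]}],
    "title": [{"id": "Q615",
               "sentiment": {"confidence": 0.9, "polarity": "neutral"},
               "surface_forms": [{"frequency": 1, "text": "Lionel Messi"}]}],
}

merged = merge_legacy_entities(legacy)
messi = merged["Q615"]
# Summing frequencies requires visiting both element objects.
total = sum(sf["frequency"]
            for element in ("title", "body") if messi[element]
            for sf in messi[element]["surface_forms"])
print(total)  # 10
```

The total of 10 is what the new model reports directly as overall_frequency.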

In the latest version, each entity is represented by a single parent object, and the data for the title and body elements is nested inside it as child objects. Below is the same Messi entity from the same document, represented in the latest model:

{'body': {'sentiment': {'confidence': 0.81, 'polarity': 'neutral'},
           'surface_forms': [{'frequency': 3,
                              'mentions': [{'index': {'end': 57, 'start': 45},
                                            'sentiment': {'confidence': 0.6,
                                                          'polarity': 'positive'}},
                                           {'index': {'end': 497, 'start': 485},
                                            'sentiment': {'confidence': 0.7,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 569, 'start': 557},
                                            'sentiment': {'confidence': 0.81,
                                                          'polarity': 'neutral'}}],
                              'text': 'Lionel Messi'},
                             {'frequency': 5,
                              'mentions': [{'index': {'end': 1080,
                                                      'start': 1075},
                                            'sentiment': {'confidence': 0.77,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 1230,
                                                      'start': 1225},
                                            'sentiment': {'confidence': 0.86,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 1517,
                                                      'start': 1512},
                                            'sentiment': {'confidence': 0.89,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 1719,
                                                      'start': 1714},
                                            'sentiment': {'confidence': 0.76,
                                                          'polarity': 'positive'}},
                                           {'index': {'end': 1833,
                                                      'start': 1828},
                                            'sentiment': {'confidence': 0.78,
                                                          'polarity': 'positive'}}],
                              'text': 'Messi'},
                             {'frequency': 1,
                              'mentions': [{'index': {'end': 1396,
                                                      'start': 1387},
                                            'sentiment': {'confidence': 0.92,
                                                          'polarity': 'positive'}}],
                              'text': 'Leo Messi'}]},
  'external_ids': {},
  'id': 'Q615',
  'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
            'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
  'overall_frequency': 10,
  'overall_prominence': 0.98,
  'overall_sentiment': {'confidence': 0.86, 'polarity': 'neutral'},
  'stock_tickers': [],
  'title': {'sentiment': {'confidence': 0.9, 'polarity': 'neutral'},
            'surface_forms': [{'frequency': 1,
                               'mentions': [{'index': {'end': 15, 'start': 3},
                                             'sentiment': {'confidence': 0.9,
                                                           'polarity': 'neutral'}}],
                               'text': 'Lionel Messi'}]},
  'types': ['Human']}

This structure keeps all of an entity's data in one place, which makes post-processing more efficient, and it enables us to introduce aggregate measures, described below.
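With everything about an entity nested under one object, post-processing becomes a walk down a single tree: element, then surface forms, then mentions. A minimal sketch, assuming each entity is a parsed dict shaped like the V3.1 example above; `iter_mentions` is a hypothetical helper name:

```python
def iter_mentions(entity):
    """Yield (element, text, start, end, polarity) for every mention in a V3.1
    entity object, walking element -> surface_forms -> mentions."""
    for element in ("title", "body"):
        data = entity.get(element)
        if not data:
            continue  # the entity may not be mentioned in this element
        for surface_form in data.get("surface_forms", []):
            for mention in surface_form.get("mentions", []):
                index = mention["index"]
                yield (element, surface_form["text"], index["start"],
                       index["end"], mention["sentiment"]["polarity"])


# Two mentions taken from the V3.1 Messi example above (other fields trimmed).
entity = {
    "id": "Q615",
    "title": {"surface_forms": [{
        "text": "Lionel Messi",
        "mentions": [{"index": {"end": 15, "start": 3},
                      "sentiment": {"confidence": 0.9, "polarity": "neutral"}}]}]},
    "body": {"surface_forms": [{
        "text": "Leo Messi",
        "mentions": [{"index": {"end": 1396, "start": 1387},
                      "sentiment": {"confidence": 0.92, "polarity": "positive"}}]}]},
}

for row in iter_mentions(entity):
    print(row)
# ('title', 'Lionel Messi', 3, 15, 'neutral')
# ('body', 'Leo Messi', 1387, 1396, 'positive')
```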

New Features

As the payload examples above show, the latest version of the entities model adds the following features.

overall_sentiment

An aggregate sentiment prediction for the entity over the whole document.

The previous version of entities only had separate sentiment predictions for the title and the body.
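Since overall_sentiment sits directly on the entity object, selecting entities by their document-level sentiment needs no title/body reconciliation. A minimal sketch, assuming a list of parsed V3.1 entity dicts; the function name and its `min_confidence` parameter are illustrative, not part of the API:

```python
def entities_with_polarity(entities, polarity, min_confidence=0.0):
    """Return the ids of V3.1 entities whose document-level overall_sentiment
    has the given polarity with at least `min_confidence`."""
    selected = []
    for entity in entities:
        overall = entity.get("overall_sentiment") or {}
        if (overall.get("polarity") == polarity
                and overall.get("confidence", 0.0) >= min_confidence):
            selected.append(entity["id"])
    return selected


# The Messi entity from the V3.1 example, trimmed to the fields used here.
entities = [{"id": "Q615",
             "overall_sentiment": {"confidence": 0.86, "polarity": "neutral"}}]

print(entities_with_polarity(entities, "neutral", min_confidence=0.8))  # ['Q615']
print(entities_with_polarity(entities, "negative"))                      # []
```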

Granular sentiment predictions

Users get a sentiment prediction for each individual mention of the entity; previously there was only an element-level prediction.
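Per-mention predictions can reveal detail that an element-level label hides. A minimal sketch that tallies mention polarities for one element, assuming parsed V3.1 entity dicts; `mention_polarities` is a hypothetical helper. The sample uses the polarities of the nine body mentions in the Messi example above (other fields trimmed): the body is labelled neutral overall, yet four of its mentions are positive.

```python
from collections import Counter


def mention_polarities(entity, element="body"):
    """Count per-mention sentiment polarities for one element of a V3.1 entity."""
    data = entity.get(element) or {}
    return Counter(
        mention["sentiment"]["polarity"]
        for surface_form in data.get("surface_forms", [])
        for mention in surface_form.get("mentions", [])
    )


def _mentions(*polarities):
    # Build trimmed mention objects that carry only a polarity.
    return [{"sentiment": {"polarity": p}} for p in polarities]


messi = {"body": {
    "sentiment": {"confidence": 0.81, "polarity": "neutral"},
    "surface_forms": [
        {"text": "Lionel Messi", "mentions": _mentions("positive", "neutral", "neutral")},
        {"text": "Messi", "mentions": _mentions("neutral", "neutral", "neutral",
                                                "positive", "positive")},
        {"text": "Leo Messi", "mentions": _mentions("positive")},
    ]}}

print(mention_polarities(messi))  # Counter({'neutral': 5, 'positive': 4})
```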

overall_prominence

Prominence is a prediction of how significant the mention of an entity is; overall_prominence gives a single value for the entity across the whole document. Users can use it to search for significant mentions of the entities they care about, which is very useful for reducing noise. The previous version of entities had separate prominence predictions for the title and body elements.
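In post-processing, overall_prominence can be used as a simple filter on the entity list. A minimal sketch, assuming parsed V3.1 entity dicts; the helper name and the 0.65 threshold are illustrative assumptions, not a documented cut-off, and the second entity below is made up purely to show the filter dropping something:

```python
def prominent_entities(entities, threshold=0.65):
    """Keep V3.1 entities whose overall_prominence is at least `threshold`.

    The 0.65 default is an illustrative assumption, not a documented cut-off.
    """
    return [e for e in entities if e.get("overall_prominence", 0.0) >= threshold]


entities = [
    {"id": "Q615", "overall_prominence": 0.98},  # Messi, from the example above
    {"id": "hypothetical-passing-mention", "overall_prominence": 0.2},  # made up
]

print([e["id"] for e in prominent_entities(entities)])  # ['Q615']
```

Tune the threshold against your own data; a higher value trades recall for less noise.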

Query format

In the previous entity model, users could perform enhanced entity searches using the following JSON format.

import json

params = {
    "published_at.start": "NOW-5DAYS",
    "published_at.end": "NOW",
    # Legacy enhanced entity search: a JSON-encoded boolean query
    "query": json.dumps({
        "$or": [
            {"entity": {"$and": [{"surface_forms.text": {"$text": "Biden"}}]}},
            {"entity": {"$and": [{"surface_forms.text": {"$text": "Washington"}}]}},
        ]
    }),
    "language": ["en"],
}

This format let users express complex boolean relationships between entity features, but queries were hard to write, read and debug.

In the new version, enhanced entity searches are written in AQL, which uses a Lucene-based syntax. Here is the same query written for the newest version:

params = {
    "published_at.start": "NOW-5DAYS",
    "published_at.end": "NOW",
    # The same search expressed in AQL
    "aql": "entities{{surface_forms: Biden}} OR entities{{surface_forms: Washington}}",
    "language": ["en"],
}

The AQL version is shorter and more intuitive, so users can more easily write, read and debug their query criteria.

What do I need to do to migrate?

Users on older versions who want to move to the latest version will need to take action in three areas:

Query Format

If you are currently querying the News API using a legacy entity model, you will need to update your queries to use the AQL syntax.
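For simple legacy queries that only match surface forms, the translation to AQL is mechanical. A minimal sketch that emits clauses in the entities{{surface_forms: ...}} shape used in the AQL example on this page; `surface_forms_to_aql` is a hypothetical helper, and it inserts names verbatim without any quoting or escaping, which is an assumption worth revisiting for multi-word names or special characters:

```python
def surface_forms_to_aql(names, operator="OR"):
    """Build an AQL string with one entities{{surface_forms: ...}} clause per name.

    operator="OR" corresponds to a top-level legacy {"$or": [...]} query and
    operator="AND" to a top-level legacy {"$and": [...]} query.
    Names are inserted verbatim: no quoting or escaping is applied.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = ["entities{{surface_forms: " + name + "}}" for name in names]
    return f" {operator} ".join(clauses)


print(surface_forms_to_aql(["Biden", "Washington"]))
# entities{{surface_forms: Biden}} OR entities{{surface_forms: Washington}}
```

Queries that combine other entity features (types, sentiment, prominence) need translating by hand against the AQL documentation.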

Post Processing

If you post-process your data payloads to extract entity-level information, you will need to update your code to reflect the new data structure.
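If rewriting all of your post-processing at once is impractical, one interim option is a small adapter that reshapes V3.1 surface forms into the legacy layout, converting each {'start', 'end'} index back into a [start, end] pair. A minimal sketch, assuming parsed dicts like the examples above; `legacy_surface_forms` is a hypothetical helper, and it drops per-mention sentiment because the legacy layout has nowhere to put it:

```python
def legacy_surface_forms(entity, element):
    """Rebuild legacy-style surface forms, with [[start, end], ...] indices,
    for one element ("title" or "body") of a V3.1 entity.

    Per-mention sentiment is dropped because the legacy layout has no field for it.
    """
    data = entity.get(element) or {}
    return [
        {
            "frequency": sf["frequency"],
            "indices": [[m["index"]["start"], m["index"]["end"]]
                        for m in sf.get("mentions", [])],
            "text": sf["text"],
        }
        for sf in data.get("surface_forms", [])
    ]


# The title element of the V3.1 Messi example above.
entity = {"title": {"surface_forms": [{
    "frequency": 1,
    "mentions": [{"index": {"end": 15, "start": 3},
                  "sentiment": {"confidence": 0.9, "polarity": "neutral"}}],
    "text": "Lionel Messi"}]}}

print(legacy_surface_forms(entity, "title"))
# [{'frequency': 1, 'indices': [[3, 15]], 'text': 'Lionel Messi'}]
print(legacy_surface_forms(entity, "body"))  # []
```

The output matches the title surface form in the legacy example, so existing index-based code can keep running while you migrate.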

Migration from SDKs

AYLIEN will deprecate its SDKs on 15 March 2023.

To use the latest features, users will need to migrate away from the SDKs and call the HTTP API endpoints directly.
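SDK calls can be replaced with plain HTTP requests. A minimal sketch using only the Python standard library; the endpoint URL and the X-AYLIEN-NewsAPI-Application-ID / X-AYLIEN-NewsAPI-Application-Key header names are assumptions here and should be checked against the News API reference for your account. The request is only built, not sent, and the credentials are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: endpoint URL and header names; confirm against the News API reference.
STORIES_URL = "https://api.aylien.com/news/stories"


def build_stories_request(params, app_id, app_key):
    """Prepare (but do not send) a GET request for the stories endpoint."""
    # doseq=True expands list values, e.g. {"language": ["en"]} -> language=en
    query = urlencode(params, doseq=True)
    return Request(
        f"{STORIES_URL}?{query}",
        headers={
            "X-AYLIEN-NewsAPI-Application-ID": app_id,
            "X-AYLIEN-NewsAPI-Application-Key": app_key,
        },
        method="GET",
    )


params = {
    "published_at.start": "NOW-5DAYS",
    "published_at.end": "NOW",
    "aql": "entities{{surface_forms: Biden}}",
    "language": ["en"],
}
request = build_stories_request(params, "YOUR_APP_ID", "YOUR_APP_KEY")
print(request.full_url)
```

Passing the request to urllib.request.urlopen sends it; the response body can then be parsed with json.load to get the stories and their entity objects.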

Switching On the New Version & Support

The AYLIEN team will be happy to provide help and support with migrating your queries before your account is updated to the latest version.

Please contact customer.support@aylien.com with any queries.