Entity Model Update

As of July 2021, we have updated our entity model (Entities V3.1). All legacy models will be deprecated on 5 April 2023.

Customers currently leveraging legacy versions of this model will need to migrate to the latest version to benefit from a range of feature improvements.

This page will walk you through the differences and benefits of the new model and what you need to do to move from using the old version to the new one.

DBPedia vs Wikidata Trained Model

While the legacy Entities V2 model was trained on DBPedia, the newest model is trained on Wikidata.

Training the entity model on Wikidata has increased the number of entities from 1.6 million to 5.6 million.

This means end users can benefit from an even wider range of named entities.

AYLIEN can also expand the entities knowledge base at customer request.

The latest entities version supports enhanced entity searches: as well as searching for entities, users can refine their queries with the following features.

AQL

AYLIEN Query Language (AQL) is a Lucene-based syntax that enables users to perform more powerful searches. You can read more about how to leverage it here.

ELSA

Previously, AYLIEN supported sentiment analysis for title and body text. With Entity Level Sentiment Analysis (ELSA), users can get sentiment predictions for individual entities. This is useful for finding positive or negative mentions of the entities you care about.

Prominence

Prominence is a prediction of how significant the mention of an entity is in a document. By using this filter, users can limit results to significant mentions of the entities they care about, reducing noise and finding the news that matters.
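
The same idea can be applied on the client side once stories have been retrieved. The sketch below keeps only entities whose `overall_prominence` score meets a threshold. It assumes entity objects shaped like the V3.1 payload shown later on this page; the function name and the 0.8 default are illustrative choices, not AYLIEN recommendations.

```python
def filter_by_prominence(entities, threshold=0.8):
    """Return the entities whose overall prominence is at least `threshold`.

    Entities without an `overall_prominence` key are treated as 0.0.
    """
    return [e for e in entities if e.get("overall_prominence", 0.0) >= threshold]


entities = [
    {"id": "Q615", "overall_prominence": 0.98},
    {"id": "Q253414", "overall_prominence": 0.35},  # made-up score for illustration
]
prominent = filter_by_prominence(entities)
print([e["id"] for e in prominent])
```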

Element

Users can search for entities and specify which element the entity must appear in, i.e. the title or the body.

New Entity Model Payload Structure - V2 vs V3.1

There are some differences between the data payload for versions V2 and V3.1.

As mentioned above, DBPedia was used to train the V2 knowledge base and DBPedia links can be used to identify the entities.

Entity objects in V2 are children of the title and body elements, i.e. if an entity is mentioned in both the title and the body of a document, it is represented in both the title and body arrays.

There is a separate entity object for each distinct entity surface form, even if they link to the same DBPedia entity, e.g. Messi, Leo Messi and Lionel Messi are all represented as separate entity objects. See the example below.

{'title': [
        {'indices': [[3, 15]],
         'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
         'text': 'Lionel Messi',
         'types': ['Athlete', 'Agent', 'SoccerPlayer', 'Person', 'Footballer']}
        ]},
{'body': [
        {'indices': [[45, 57], [485, 497], [557, 569]],
         'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
         'text': 'Lionel Messi',
         'types': ['Athlete',
                   'Agent',
                   'SoccerPlayer',
                   'Person',
                   'Organisation',
                   'Footballer']},
        {'indices': [[52, 57],
                     [492, 497],
                     [564, 569],
                     [1075, 1080],
                     [1225, 1230],
                     [1391, 1396],
                     [1512, 1517],
                     [1714, 1719],
                     [1828, 1833]],
         'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
         'text': 'Messi',
         'types': ['Athlete',
                   'Agent',
                   'SoccerPlayer',
                   'Person',
                   'Organisation',
                   'Footballer']},
        {'indices': [[1387, 1396]],
         'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
         'text': 'Leo Messi',
         'types': ['Athlete', 'Agent', 'SoccerPlayer', 'Person', 'Footballer']}
        ]}
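
Because V2 splits one entity across several objects, post-processing code typically has to merge them again. The following is a minimal sketch of that merge, grouping the entity objects of one element (title or body) by their DBPedia link. The function name and the shape of the returned dictionary are assumptions made for illustration; the sample data is abbreviated from the payload above.

```python
from collections import defaultdict


def group_v2_by_link(entity_objects):
    """Merge V2 entity objects that share the same DBPedia link.

    Returns {dbpedia_link: {"surface_forms": [...], "indices": [...]}}.
    """
    grouped = defaultdict(lambda: {"surface_forms": [], "indices": []})
    for obj in entity_objects:
        link = obj.get("links", {}).get("dbpedia")
        if link is None:
            continue  # skip entities that were not linked to DBPedia
        entry = grouped[link]
        if obj["text"] not in entry["surface_forms"]:
            entry["surface_forms"].append(obj["text"])
        entry["indices"].extend(obj["indices"])
    return dict(grouped)


MESSI = "http://dbpedia.org/resource/Lionel_Messi"
body = [
    {"indices": [[45, 57]], "links": {"dbpedia": MESSI}, "text": "Lionel Messi"},
    {"indices": [[52, 57], [1075, 1080]], "links": {"dbpedia": MESSI}, "text": "Messi"},
    {"indices": [[1387, 1396]], "links": {"dbpedia": MESSI}, "text": "Leo Messi"},
]
merged = group_v2_by_link(body)
print(merged[MESSI]["surface_forms"])
```

Note that V2 spans can overlap: in the payload above, the "Messi" span [52, 57] lies inside the "Lionel Messi" span [45, 57], so the merged index list is not a count of distinct mentions.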

In the latest version, there is one parent object representing the entity and data pertaining to the title and body elements are child objects of the entity object. All surface form variations that link to the entity are part of the same entity object.

The new version also has the additional entity enrichments of Entity Level Sentiment Analysis and prominence, as mentioned above.

See below how the Messi entity from the same document is represented in the latest model.

{'body': {'sentiment': {'confidence': 0.81, 'polarity': 'neutral'},
           'surface_forms': [{'frequency': 3,
                              'mentions': [{'index': {'end': 57, 'start': 45},
                                            'sentiment': {'confidence': 0.6,
                                                          'polarity': 'positive'}},
                                           {'index': {'end': 497, 'start': 485},
                                            'sentiment': {'confidence': 0.7,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 569, 'start': 557},
                                            'sentiment': {'confidence': 0.81,
                                                          'polarity': 'neutral'}}],
                              'text': 'Lionel Messi'},
                             {'frequency': 5,
                              'mentions': [{'index': {'end': 1080,
                                                      'start': 1075},
                                            'sentiment': {'confidence': 0.77,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 1230,
                                                      'start': 1225},
                                            'sentiment': {'confidence': 0.86,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 1517,
                                                      'start': 1512},
                                            'sentiment': {'confidence': 0.89,
                                                          'polarity': 'neutral'}},
                                           {'index': {'end': 1719,
                                                      'start': 1714},
                                            'sentiment': {'confidence': 0.76,
                                                          'polarity': 'positive'}},
                                           {'index': {'end': 1833,
                                                      'start': 1828},
                                            'sentiment': {'confidence': 0.78,
                                                          'polarity': 'positive'}}],
                              'text': 'Messi'},
                             {'frequency': 1,
                              'mentions': [{'index': {'end': 1396,
                                                      'start': 1387},
                                            'sentiment': {'confidence': 0.92,
                                                          'polarity': 'positive'}}],
                              'text': 'Leo Messi'}]},
  'external_ids': {},
  'id': 'Q615',
  'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
            'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
  'overall_frequency': 10,
  'overall_prominence': 0.98,
  'overall_sentiment': {'confidence': 0.86, 'polarity': 'neutral'},
  'stock_tickers': [],
  'title': {'sentiment': {'confidence': 0.9, 'polarity': 'neutral'},
            'surface_forms': [{'frequency': 1,
                               'mentions': [{'index': {'end': 15, 'start': 3},
                                             'sentiment': {'confidence': 0.9,
                                                           'polarity': 'neutral'}}],
                               'text': 'Lionel Messi'}]},
  'types': ['Human']}

This new structure makes interrogating the data more powerful and efficient through:

  • new enrichments enabling enhanced searches
  • collapsing all entity data into one object, making post-processing easier and more efficient
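
As a rough illustration of how the single-object structure simplifies post-processing, the sketch below flattens one V3.1 entity object into one record per mention, carrying the mention-level sentiment with it. The function name and the record layout are assumptions; the field names it reads (`surface_forms`, `mentions`, `index`, `sentiment`) come from the payload above, and the sample entity is a trimmed copy of it.

```python
def iter_mentions(entity):
    """Yield one flat record per mention of a V3.1 entity object."""
    for element in ("title", "body"):
        for form in entity.get(element, {}).get("surface_forms", []):
            for mention in form.get("mentions", []):
                yield {
                    "entity_id": entity["id"],
                    "element": element,
                    "text": form["text"],
                    "start": mention["index"]["start"],
                    "end": mention["index"]["end"],
                    "polarity": mention["sentiment"]["polarity"],
                    "confidence": mention["sentiment"]["confidence"],
                }


entity = {
    "id": "Q615",
    "title": {"surface_forms": [{"text": "Lionel Messi", "mentions": [
        {"index": {"start": 3, "end": 15},
         "sentiment": {"polarity": "neutral", "confidence": 0.9}}]}]},
    "body": {"surface_forms": [{"text": "Leo Messi", "mentions": [
        {"index": {"start": 1387, "end": 1396},
         "sentiment": {"polarity": "positive", "confidence": 0.92}}]}]},
}
positive = [m for m in iter_mentions(entity) if m["polarity"] == "positive"]
print(positive)
```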

Query format

In the previous V2 entity model, users could only include or exclude entities in their searches. The query below searches for stories that mention Messi OR FIFA in the title but do not mention Ronaldo in the title.

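params = {
    # Stories must mention at least one of these entities in the title
    'entities.title.links.dbpedia[]': [
        'http://dbpedia.org/resource/Lionel_Messi',
        'http://dbpedia.org/resource/FIFA',
    ],
    # The leading "!" excludes stories that mention this entity in the title
    '!entities.title.links.dbpedia[]': ['http://dbpedia.org/resource/Cristiano_Ronaldo'],
}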

In the new version, AYLIEN have made it easy to write enhanced entity searches using AQL, which follows a Lucene syntax. Note the use of the boolean operators AND and NOT, and the ability to search by sentiment, by element, and by both entity ID and entity surface form.

params = {
  'aql': 'entities:{{element:title AND id:(Q615) AND sentiment:(positive)}} AND entities:{{id:(Q253414)}} NOT entities:{{surface_forms.text:"Cristiano Ronaldo"}}'
}

This empowers users to be even more specific about the type of news they want to retrieve.
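
Writing these strings by hand is error-prone, so it can help to assemble them programmatically. The helper below is a minimal sketch that builds a single `entities:{{...}}` clause from the fields used in the query above (`element`, `id`, `sentiment`, `surface_forms.text`). The function name and its parameters are hypothetical, and it makes no attempt to cover the full AQL syntax.

```python
def entity_clause(entity_id=None, surface_form=None, element=None, sentiment=None):
    """Build one AQL entity clause from the given filters."""
    parts = []
    if element:
        parts.append(f"element:{element}")
    if entity_id:
        parts.append(f"id:({entity_id})")
    if surface_form:
        if '"' in surface_form:
            # Quote escaping is out of scope for this sketch
            raise ValueError("surface_form must not contain double quotes")
        parts.append(f'surface_forms.text:"{surface_form}"')
    if sentiment:
        parts.append(f"sentiment:({sentiment})")
    if not parts:
        raise ValueError("at least one filter is required")
    # Plain concatenation keeps the literal double braces out of the f-strings
    return "entities:{{" + " AND ".join(parts) + "}}"


aql = (
    entity_clause(entity_id="Q615", element="title", sentiment="positive")
    + " AND " + entity_clause(entity_id="Q253414")
    + " NOT " + entity_clause(surface_form="Cristiano Ronaldo")
)
params = {"aql": aql}
print(params["aql"])
```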

What Do I Need to Do to Migrate?

Users on older versions that want to move to the latest version will need to take action in three ways:

Query Format

If you are currently querying the News API using a legacy entity model, you will need to update your queries to leverage the latest syntax.
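
As a starting point, a legacy query can be translated mechanically. The sketch below converts V2 `entities.<element>.links.dbpedia[]` parameters into an AQL string. It rests on several assumptions: the caller supplies a mapping from DBPedia links to Wikidata IDs (no such lookup is described on this page), and AQL accepts parenthesised groups and the OR operator as in standard Lucene syntax. The function name is hypothetical, and the Wikidata ID used for Cristiano Ronaldo should be verified before use.

```python
import re

# Matches V2 keys such as "entities.title.links.dbpedia[]", optionally negated with "!"
V2_KEY = re.compile(r"^(!?)entities\.(title|body)\.links\.dbpedia\[\]$")


def v2_params_to_aql(params, dbpedia_to_wikidata):
    """Translate V2 entity link filters into a single AQL string.

    Raises KeyError if a DBPedia link has no entry in the mapping.
    """
    include, exclude = [], []
    for key, links in params.items():
        match = V2_KEY.match(key)
        if not match:
            continue  # not an entity link filter; left for the caller to handle
        negated, element = match.groups()
        clauses = [
            "entities:{{element:%s AND id:(%s)}}" % (element, dbpedia_to_wikidata[link])
            for link in links
        ]
        if negated:
            exclude.extend(clauses)
        elif clauses:
            # V2 treats multiple values for one key as alternatives (OR)
            include.append("(" + " OR ".join(clauses) + ")")
    aql = " AND ".join(include)
    for clause in exclude:
        aql += " NOT " + clause
    return aql.strip()


legacy = {
    "entities.title.links.dbpedia[]": [
        "http://dbpedia.org/resource/Lionel_Messi",
        "http://dbpedia.org/resource/FIFA",
    ],
    "!entities.title.links.dbpedia[]": ["http://dbpedia.org/resource/Cristiano_Ronaldo"],
}
id_map = {  # caller-supplied mapping; check each ID against Wikidata
    "http://dbpedia.org/resource/Lionel_Messi": "Q615",
    "http://dbpedia.org/resource/FIFA": "Q253414",
    "http://dbpedia.org/resource/Cristiano_Ronaldo": "Q11571",
}
print(v2_params_to_aql(legacy, id_map))
```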

Post Processing

If you are post processing your data payloads to extract entity-level information, you will need to update your algorithms to reflect the newest data structure.
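
While algorithms are being updated, a temporary adapter can keep legacy post-processing code running. The sketch below reshapes one V3.1 entity object into V2-style title and body entries, with one entry per surface form. The function name is hypothetical, and the output is only an approximation of V2: links carry Wikidata and Wikipedia URLs rather than DBPedia ones, and types follow the V3.1 vocabulary.

```python
def to_v2_style(entity):
    """Convert one V3.1 entity object into V2-style per-element entries."""
    result = {"title": [], "body": []}
    for element in result:
        for form in entity.get(element, {}).get("surface_forms", []):
            result[element].append({
                "text": form["text"],
                # V2 stored each mention as a [start, end] pair
                "indices": [[m["index"]["start"], m["index"]["end"]]
                            for m in form.get("mentions", [])],
                "links": entity.get("links", {}),
                "types": entity.get("types", []),
            })
    return result


entity = {
    "id": "Q615",
    "links": {"wikidata": "https://www.wikidata.org/wiki/Q615"},
    "types": ["Human"],
    "title": {"surface_forms": [{"text": "Lionel Messi", "mentions": [
        {"index": {"start": 3, "end": 15}}]}]},
    "body": {"surface_forms": [{"text": "Leo Messi", "mentions": [
        {"index": {"start": 1387, "end": 1396}}]}]},
}
legacy_view = to_v2_style(entity)
print(legacy_view["title"])
```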

Migration from SDKs

AYLIEN are deprecating our SDKs on 15 March 2023.

In order to avail of the latest features, users will need to migrate away from the SDK and towards directly calling the HTTP API endpoints.
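
Calling the HTTP API directly needs nothing beyond a standard HTTP client. The sketch below uses only the Python standard library to build and send a GET request with an `aql` parameter. The endpoint URL and header name in the usage example are placeholders, not real AYLIEN values; take the actual endpoint and authentication headers from your account's API reference. Both function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_stories_request(base_url, params, headers):
    """Build a GET request with URL-encoded query parameters."""
    query = urlencode(params, doseq=True)  # doseq expands list values into repeated keys
    return Request(f"{base_url}?{query}", headers=headers, method="GET")


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_stories_request(
    "https://api.example.invalid/news/stories",  # placeholder endpoint
    {"aql": "entities:{{id:(Q615)}}"},
    {"X-Example-Api-Key": "YOUR_KEY"},  # placeholder authentication header
)
print(request.full_url)
# stories = fetch_json(request)  # uncomment once real values are in place
```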

Switching On the New Version & Support

The AYLIEN team will be happy to provide help and support in migrating your queries before updating your account to leverage the latest version.

Please contact customer.support@aylien.com with any queries.