Entity Object updates

As of July 2021, we have updated the structure of our entity objects and introduced entity prominence and frequency to improve the experience of News API users. The data has been backfilled to August 27th, 2020. Searching by entity frequency or prominence lets users retrieve articles that are relevant to their entities of interest, and querying with AQL enables more advanced entity-related queries.

If you have been using entities prior to this date (either as search parameters or in the story objects you return), you will need to update your workflow to use the new objects and avoid disruption. This page walks through what you need to do to move from the old entity object to the new one.

Updates to the entity object

The main changes from the old entity object are:

  • Wikipedia and Wikidata links have replaced DBPedia links
  • Entity types have been refined and updated
  • Sentiment is now predicted for every entity
  • Stock tickers have been added to the entity object (where applicable)
  • Entities are now recognised at the level of the article as a whole, with their occurrences in the title and/or body listed inside the same object
  • Entity prominence, a measure of how prominent an entity is in an article, is now calculated for every entity
  • Entity frequency, the number of mentions an entity has in an article, is now calculated for every entity

Old entity object:

{
    'title':
    {
        'indices': [
            [0, 6]
        ],
        'links': {'dbpedia': 'http://dbpedia.org/resource/Google'},
        'text': 'Google',
        'types': [
            'Organisation', 
            'Company',
        ]
    },
    'body':
    {
        'indices': [
            [829, 835]
        ],
        'links': {'dbpedia': 'http://dbpedia.org/resource/Google'},
        'text': 'Google',
        'types': [
            'Organisation', 
            'Company',
        ]
    }
}

New entity object:

{
    'title':
    {
        'id': 'Q95',
        'links': {'wikipedia': 'https://en.wikipedia.org/wiki/Google',
        'wikidata': 'https://www.wikidata.org/wiki/Q95'},
        'types': ['Organization', 'Business'],
        'sentiment': {'polarity': 'positive', 'confidence': 0.52},
        'surface_forms': [{'text': 'Google', 'indices': [[0, 6]]}]
    },
    'body':
    {
        'id': 'Q95',
        'links': {'wikipedia': 'https://en.wikipedia.org/wiki/Google',
        'wikidata': 'https://www.wikidata.org/wiki/Q95'},
        'types': ['Organization', 'Business'],
        'sentiment': {'polarity': 'neutral', 'confidence': 0.77},
        'surface_forms': [{'text': 'Google', 'indices': [[829, 835]]}]
    }
}

Updated entity object:

{
    'id': 'Q95',
    'links': {'wikipedia': 'https://en.wikipedia.org/wiki/Google',
    'wikidata': 'https://www.wikidata.org/wiki/Q95'},
    'stock_tickers': ['GOOG'],
    'types': ['Organization', 'Business'],
    'overall_sentiment': {'polarity': 'neutral', 'confidence': 0.77},
    'overall_prominence': 0.98,
    'overall_frequency': 3,
    'body': {
        'sentiment': {'polarity': 'neutral', 'confidence': 0.77},
        'surface_forms': [
            {
                'text': 'Google',
                'frequency': 2,
                'mentions': [
                    {
                        'index': {'start': 829, 'end': 835},
                        'sentiment': {'polarity': 'neutral', 'confidence': 0.7656157}
                    },
                    {
                        'index': {'start': 1598, 'end': 1604},
                        'sentiment': {'polarity': 'neutral', 'confidence': 0.7704393}
                    }
                ]
            }
        ]
    },
    'title': {
        'sentiment': {'polarity': 'positive', 'confidence': 0.52},
        'surface_forms': [
            {
                'text': 'Google',
                'frequency': 1,
                'mentions': [
                    {
                        'index': {'start': 0, 'end': 6},
                        'sentiment': {'polarity': 'positive', 'confidence': 0.52143073}
                    }
                ]
            }
        ]
    }
}
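
As an illustration of how the updated structure can be read, the sketch below walks the title and body elements of an entity object and lists every mention with its position. The helper name collect_mentions is hypothetical, and the sample object is a trimmed copy of the one shown above; this is not part of any official SDK.

```python
from typing import Dict, List, Tuple


def collect_mentions(entity: Dict) -> List[Tuple[str, str, int, int]]:
    """Return (element, surface form text, start, end) for every mention.

    Hypothetical helper: it assumes the layout of the updated entity object
    shown above ('title'/'body' -> 'surface_forms' -> 'mentions' -> 'index').
    """
    mentions = []
    for element in ("title", "body"):
        # An entity may appear in only one of the two elements.
        for form in entity.get(element, {}).get("surface_forms", []):
            for mention in form.get("mentions", []):
                index = mention["index"]
                mentions.append((element, form["text"], index["start"], index["end"]))
    return mentions


# Trimmed copy of the updated entity object from this page.
entity = {
    "id": "Q95",
    "overall_frequency": 3,
    "body": {
        "surface_forms": [
            {
                "text": "Google",
                "frequency": 2,
                "mentions": [
                    {"index": {"start": 829, "end": 835}},
                    {"index": {"start": 1598, "end": 1604}},
                ],
            }
        ]
    },
    "title": {
        "surface_forms": [
            {
                "text": "Google",
                "frequency": 1,
                "mentions": [{"index": {"start": 0, "end": 6}}],
            }
        ]
    },
}

for element, text, start, end in collect_mentions(entity):
    print(element, text, start, end)
```

In this sample the number of collected mentions equals overall_frequency, which is consistent with frequency being described as a count of mentions.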

New Entity flat search Parameters

Since the entity object has been restructured, several new parameters are available for querying the News API. Every parameter can also be used to exclude the given values by prefixing it with '!' (e.g., "!entities_id[]": ["Q2283"] returns stories that do not have Microsoft as an entity).

"entities_id": ["Q2283"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its ID
"entities_types": ["Organization"] ## returns stories that mention Organization-type entities in the story
"entities_stock_ticker": ["MSFT"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its stock ticker
"entities_links_wikipedia": ["https://en.wikipedia.org/wiki/Microsoft"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikipedia link
"entities_links_wikidata": ["https://www.wikidata.org/wiki/Q2283"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikidata link
"entities_surface_forms_text": ["Microsoft"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with the surface form 'Microsoft' 
entities_id: ["Q2283"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its ID
entities_types: ["Organization"] ## returns stories that mention Organization-type entities in the story
entities_stock_ticker: ["MSFT"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its stock ticker
entities_links_wikipedia: ["https://en.wikipedia.org/wiki/Microsoft"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikipedia link
entities_links_wikidata: ["https://www.wikidata.org/wiki/Q2283"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikidata link
entities_surface_forms_text: ["Microsoft"] ## returns stories that mention Microsoft in the story, where that entity has been tagged with the surface form 'Microsoft'
EntitiesId: optional.NewInterface([]string{"Q2283"}) // returns stories that mention Microsoft in the story, where that entity has been tagged with its ID
EntitiesTypes: optional.NewInterface([]string{"Organization"}) // returns stories that mention Organization-type entities in the story
EntitiesStockTicker: optional.NewInterface([]string{"MSFT"}) // returns stories that mention Microsoft in the story, where that entity has been tagged with its stock ticker
EntitiesLinksWikipedia: optional.NewInterface([]string{"https://en.wikipedia.org/wiki/Microsoft"}) // returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikipedia link
EntitiesLinksWikidata: optional.NewInterface([]string{"https://www.wikidata.org/wiki/Q2283"}) // returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikidata link
EntitiesSurfaceFormsText: optional.NewInterface([]string{"Microsoft"}) // returns stories that mention Microsoft in the story, where that entity has been tagged with the surface form 'Microsoft'
:entities_id => ['Q2283'] # returns stories that mention Microsoft in the story, where that entity has been tagged with its ID
:entities_types => ['Organization'] # returns stories that mention Organization-type entities in the story
:entities_stock_ticker => ['MSFT'] # returns stories that mention Microsoft in the story, where that entity has been tagged with its stock ticker
:entities_links_wikipedia => ['https://en.wikipedia.org/wiki/Microsoft'] # returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikipedia link
:entities_links_wikidata => ['https://www.wikidata.org/wiki/Q2283'] # returns stories that mention Microsoft in the story, where that entity has been tagged with its appropriate Wikidata link
:entities_surface_forms_text => ['Microsoft'] # returns stories that mention Microsoft in the story, where that entity has been tagged with the surface form 'Microsoft'
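
The '!' exclusion convention described above can be sketched as a small helper that combines included and excluded values into one parameter dictionary. The function name build_entity_params is hypothetical, and the exact key spelling expected by your client (for example, with or without a trailing "[]") is an assumption left to the caller.

```python
from typing import Dict, List


def build_entity_params(
    include: Dict[str, List[str]],
    exclude: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Merge include and exclude filters into one parameter dictionary.

    Hypothetical helper: excluded parameters are prefixed with '!',
    following the convention described on this page.
    """
    params: Dict[str, List[str]] = {}
    for name, values in include.items():
        params[name] = list(values)
    for name, values in exclude.items():
        params["!" + name] = list(values)
    return params


# Stories that mention an Organization-type entity but not Microsoft (Q2283).
params = build_entity_params(
    include={"entities_types": ["Organization"]},
    exclude={"entities_id": ["Q2283"]},
)
print(params)
```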

Several parameters have been removed from the entity object and are no longer usable for querying the News API. The parameters used to exclude values are also no longer available for querying.

"entities_title_id": ["Q2283"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its ID
"entities_body_id": ["Q2283"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its ID
"entities_title_types": ["Organization"] ## no longer returns stories that mention Organization-type entities in the title
"entities_body_types": ["Organization"] ## no longer returns stories that mention Organization-type entities in the body
"entities_title_stock_ticker": ["MSFT"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its stock ticker
"entities_body_stock_ticker": ["MSFT"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its stock ticker
"entities_title_links_wikipedia": ["https://en.wikipedia.org/wiki/Microsoft"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikipedia link
"entities_body_links_wikipedia": ["https://en.wikipedia.org/wiki/Microsoft"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikipedia link
"entities_title_links_wikidata": ["https://www.wikidata.org/wiki/Q2283"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikidata link
"entities_body_links_wikidata": ["https://www.wikidata.org/wiki/Q2283"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikidata link
entities_title_id: ["Q2283"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its ID
entities_body_id: ["Q2283"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its ID
entities_title_types: ["Organization"] ## no longer returns stories that mention Organization-type entities in the title
entities_body_types: ["Organization"] ## no longer returns stories that mention Organization-type entities in the body
entities_title_stock_ticker: ["MSFT"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its stock ticker
entities_body_stock_ticker: ["MSFT"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its stock ticker
entities_title_links_wikipedia: ["https://en.wikipedia.org/wiki/Microsoft"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikipedia link
entities_body_links_wikipedia: ["https://en.wikipedia.org/wiki/Microsoft"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikipedia link
entities_title_links_wikidata: ["https://www.wikidata.org/wiki/Q2283"] ## no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikidata link
entities_body_links_wikidata: ["https://www.wikidata.org/wiki/Q2283"] ## no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikidata link
EntitiesTitleId: optional.NewInterface([]string{"Q2283"}) // no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its ID
EntitiesBodyId: optional.NewInterface([]string{"Q2283"}) // no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its ID
EntitiesTitleTypes: optional.NewInterface([]string{"Organization"}) // no longer returns stories that mention Organization-type entities in the title
EntitiesBodyTypes: optional.NewInterface([]string{"Organization"}) // no longer returns stories that mention Organization-type entities in the body
EntitiesTitleStockTicker: optional.NewInterface([]string{"MSFT"}) // no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its stock ticker
EntitiesBodyStockTicker: optional.NewInterface([]string{"MSFT"}) // no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its stock ticker
EntitiesTitleLinksWikipedia: optional.NewInterface([]string{"https://en.wikipedia.org/wiki/Microsoft"}) // no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikipedia link
EntitiesBodyLinksWikipedia: optional.NewInterface([]string{"https://en.wikipedia.org/wiki/Microsoft"}) // no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikipedia link
EntitiesTitleLinksWikidata: optional.NewInterface([]string{"https://www.wikidata.org/wiki/Q2283"}) // no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikidata link
EntitiesBodyLinksWikidata: optional.NewInterface([]string{"https://www.wikidata.org/wiki/Q2283"}) // no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikidata link
:entities_title_id => ['Q2283'] # no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its ID
:entities_body_id => ['Q2283'] # no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its ID
:entities_title_types => ['Organization'] # no longer returns stories that mention Organization-type entities in the title
:entities_body_types => ['Organization'] # no longer returns stories that mention Organization-type entities in the body
:entities_title_stock_ticker => ['MSFT'] # no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its stock ticker
:entities_body_stock_ticker => ['MSFT'] # no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its stock ticker
:entities_title_links_wikipedia => ['https://en.wikipedia.org/wiki/Microsoft'] # no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikipedia link
:entities_body_links_wikipedia => ['https://en.wikipedia.org/wiki/Microsoft'] # no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikipedia link
:entities_title_links_wikidata => ['https://www.wikidata.org/wiki/Q2283'] # no longer returns stories that mention Microsoft in the title, where that entity has been tagged with its appropriate Wikidata link
:entities_body_links_wikidata => ['https://www.wikidata.org/wiki/Q2283'] # no longer returns stories that mention Microsoft in the body, where that entity has been tagged with its appropriate Wikidata link
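
If an existing workflow still builds the removed title/body parameters, one possible migration step is to rename them to their flat equivalents. The sketch below assumes that dropping the "_title"/"_body" segment yields the corresponding new parameter name, which matches the names listed on this page; migrate_params and "other_param" are hypothetical names. Note that this widens a title-only or body-only search to the whole story, so results may differ from the old query.

```python
import re
from typing import Dict, List

# Matches removed parameters such as 'entities_title_id' or '!entities_body_types'.
_REMOVED = re.compile(r"^(!?)entities_(?:title|body)_(.+)$")


def migrate_params(params: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rename removed title/body entity parameters to their flat equivalents.

    Hypothetical helper: values for the title and body variants of the same
    parameter are merged, with duplicates dropped.
    """
    migrated: Dict[str, List[str]] = {}
    for name, values in params.items():
        match = _REMOVED.match(name)
        new_name = f"{match.group(1)}entities_{match.group(2)}" if match else name
        merged = migrated.setdefault(new_name, [])
        for value in values:
            if value not in merged:
                merged.append(value)
    return migrated


old_params = {
    "entities_title_id": ["Q2283"],
    "entities_body_id": ["Q2283"],
    "entities_body_types": ["Organization"],
    "other_param": ["unchanged"],  # placeholder for any non-entity parameter
}
print(migrate_params(old_params))
```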

Updating your DBpedia searches to use Wikipedia and Wikidata instead

The new entity objects contain Wikipedia and Wikidata links instead of DBpedia links. To update your workflow to search by Wikipedia and Wikidata links instead of DBpedia ones, you will need to update both the parameter name and the links you are searching for.

Old parameters & values:

"entities.title.links.dbpedia[]": ["http://dbpedia.org/resource/Donald_Trump"]
"entities.body.links.dbpedia[]": ["http://dbpedia.org/resource/Donald_Trump"]

New parameters & values:

"entities.links.wikipedia[]": ["https://en.wikipedia.org/wiki/Donald_Trump"]
"entities.links.wikidata[]": ["https://www.wikidata.org/wiki/Q22686"]

Note that although most DBpedia URLs will map accurately to Wikipedia ones by simply replacing http://dbpedia.org/resource/ with https://en.wikipedia.org/wiki/, some will not. We recommend re-running the entity searches you currently make with DBpedia links using the corresponding Wikipedia links. If you notice any discrepancy in the results returned for an entity, check Wikipedia for that entity's correct URL.
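
The prefix replacement described above can be sketched as follows. The function name dbpedia_to_wikipedia is hypothetical, and, as noted, the result is only a first guess that should be checked against Wikipedia for entities whose titles differ.

```python
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"
WIKIPEDIA_PREFIX = "https://en.wikipedia.org/wiki/"


def dbpedia_to_wikipedia(url: str) -> str:
    """Swap the DBpedia resource prefix for the English Wikipedia prefix.

    Hypothetical helper: this is a plain prefix swap, so the returned URL
    is a candidate to verify, not a guaranteed match.
    """
    if not url.startswith(DBPEDIA_PREFIX):
        raise ValueError(f"not a DBpedia resource URL: {url}")
    return WIKIPEDIA_PREFIX + url[len(DBPEDIA_PREFIX):]


print(dbpedia_to_wikipedia("http://dbpedia.org/resource/Donald_Trump"))
```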

Updating the entity types being searched

The new entity model applies more refined type data to the entities it recognises. Although conceptually similar, the types in the new entity object are slightly different. This is because Wikidata is now leveraged under the hood instead of DBpedia.

To retrieve articles that contain Apple as an Organization (and not a fruit), simply call the Stories endpoint and supply the type parameter in the AQL query parameter, as shown in the example below:

"https://api.aylien.com/news/stories?aql=entities%3A%7B%7Bsurface_forms.text%3AApple+AND+type%3AOrganization%7D%7D"

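The encoded URL above can be produced from a readable AQL string with Python's standard library. This is a minimal sketch: the helper name stories_url is hypothetical, and authentication headers are omitted.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.aylien.com/news/stories"


def stories_url(aql: str) -> str:
    """Return the Stories endpoint URL with the AQL string percent-encoded."""
    # urlencode escapes ':', '{' and '}' and turns spaces into '+'.
    return BASE_URL + "?" + urlencode({"aql": aql})


url = stories_url("entities:{{surface_forms.text:Apple AND type:Organization}}")
print(url)
```
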
You can also test out the following common types on the new entity object, or see the full list here.

Organization Location Business Human
Country Currency Product Profession
Technology Corporation Bank Software
Financial_institution Stock_exchange U.S._state

Entity prominence and Frequency

Entity prominence is a measure of how prominent an entity is within an article. The prominence score, which ranges between 0 and 1, reflects how close to the top of an article the entity is mentioned, whether the entity is mentioned in the article title, and how many times the entity is mentioned in the overall article. The example query below returns articles that contain the entity Microsoft with a prominence score between 0.7 and 1.

{"aql": "entities: {{surface_forms:Microsoft AND overall_prominence:[0.7 TO *]}}"}

Entity frequency is simply the number of mentions an entity has in an article title, body, or in the overall article. Frequency can be queried in two ways: use the overall_frequency parameter to filter for articles in which an entity has a given overall (title + body) frequency, or use the frequency parameter together with the element parameter to filter for articles in which an entity has a given frequency in either the title or the body. The example below retrieves articles in which the entity with the surface form “Trump” is mentioned at least twice, title and body combined.

{"aql": "entities: {{surface_forms: Trump AND overall_frequency:[2 TO *]}}"}
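
Both queries in this section follow the same pattern: a surface form combined with an open-ended range on a numeric field. A minimal sketch of a builder for that pattern is shown below; the function name entity_range_query is hypothetical, and only the two fields used on this page (overall_prominence and overall_frequency) are assumed to be valid.

```python
from typing import Dict, Union


def entity_range_query(
    surface_form: str,
    field: str,
    minimum: Union[int, float],
) -> Dict[str, str]:
    """Build an AQL payload matching an entity whose field is >= minimum.

    Hypothetical helper: 'field' is expected to be 'overall_prominence'
    or 'overall_frequency', the two fields used on this page.
    """
    clause = f"surface_forms:{surface_form} AND {field}:[{minimum} TO *]"
    return {"aql": "entities: {{" + clause + "}}"}


print(entity_range_query("Microsoft", "overall_prominence", 0.7))
print(entity_range_query("Trump", "overall_frequency", 2))
```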

Entity-level Sentiment

Sentiment is now predicted at entity level for every entity extracted from a story: for its mentions in the body, for its mentions in the title, and for the story overall. Each sentiment object contains a polarity and a confidence value:

"sentiment": {"polarity": "positive", "confidence": 0.78}

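When post-processing returned stories, a sentiment object of this shape can be checked against a polarity and a minimum confidence. The helper name has_confident_sentiment and the 0.6 default threshold are assumptions chosen for illustration, not recommended values.

```python
from typing import Dict


def has_confident_sentiment(
    sentiment: Dict,
    polarity: str,
    min_confidence: float = 0.6,  # illustrative threshold, not an official value
) -> bool:
    """Return True if the sentiment has the given polarity with enough confidence."""
    return (
        sentiment.get("polarity") == polarity
        and sentiment.get("confidence", 0.0) >= min_confidence
    )


sentiment = {"polarity": "positive", "confidence": 0.78}
print(has_confident_sentiment(sentiment, "positive"))
```
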
Testing out the Enhanced Search Functionality

With this new data added to the entity object, we have introduced new search functionality so that users can make full use of it in their searches. Specifically, users can now search for content mentioning entities that meet multiple criteria. Take a look here at how you can make these queries.