V6 Migration Guide
What are the key changes?
The latest version of the News API is largely backwards compatible, meaning your existing workflows/queries will function with some minor tweaks. Follow the steps below to update your workflow to the latest version of the API and additionally take advantage of some of the new features.
Basic Migration Guide:
New Features Guide
Beyond a basic migration to the latest version of the News API, there are also some key improvements that can be taken advantage of as part of this release:
New advanced search functionality:
- By default, an ‘AND’ operator applies between each unique HTTP parameter, with parameters separated by the '&' symbol in the request URL.
- This is a limitation when applying more advanced logic, such as an ‘OR’ between fields.
- The full capability of AQL enables these types of queries, across all fields and within nested objects.
Improved error handling and messaging
- The News API previously lacked error messaging for basic syntax errors such as a misspelled parameter name. As a result, parameters could be silently ignored and an HTTP 200 OK code returned instead of an error.
- Since all queries now get parsed as AQL in the backend this has enabled query validation and accurate error messaging.
Flat search input fields with AQL parsing
- Currently the News API supports flat-param search and AQL search via the ‘aql’ parameter only. Flat search refers to simple searches with a key-value pair structure, i.e. parameter=value, whereas 'aql' caters for advanced queries with boolean logic, e.g. aql=parameter_1:(value OR value) AND parameter_2:(value NOT value)
- The ‘aql’ parameter is a single string and can become lengthy especially when searching for lists of items such as entities or categories.
- Both approaches have their advantages and this is the basis for the new unified syntax.
- The latest release has the simplicity of a flat input for the user, while being compiled as AQL in the backend.
Sample Use Cases/Code Snippets
- Advanced Search Example - Multilingual Search: For example, let’s say that you want to search for the text ‘money laundering’ across all articles. The ‘text’ field applies to only the original language of the article, therefore the ‘translations’ field is also required to search for ‘money laundering’ as a translated piece of text in translated articles.
- With the flat paradigm search this isn’t possible since ‘AND’ logic automatically applies between each parameter e.g.
#Sample query with old url endpoint and flat params
https://api.aylien.com/news/stories?text=("money laundering")&translations.en.body=("money laundering")
- Therefore it requires leveraging the power of AQL to apply an ‘OR’ condition between params as follows:
#Upgraded query with V6 url endpoint and 'aql' parameter
https://api.aylien.com/v6/news/stories?aql=(text:"money laundering") OR (translations.en.body:"money laundering")
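The OR query above contains spaces, quotes, and parentheses, so it needs URL encoding before it is sent. The following is a minimal sketch using only the Python standard library; the helper name `build_v6_url` is an assumption made for illustration, and the base URL is taken from the example above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

V6_STORIES_URL = "https://api.aylien.com/v6/news/stories"  # from the example above


def build_v6_url(params, base_url=V6_STORIES_URL):
    """Return base_url with params URL-encoded into the query string."""
    return f"{base_url}?{urlencode(params)}"


# OR across the original-language text and the English translation.
aql = '(text:"money laundering") OR (translations.en.body:"money laundering")'
url = build_v6_url({"aql": aql})

# Decoding the query string gives back the original AQL expression.
decoded = parse_qs(urlsplit(url).query)["aql"][0]
print(url)
print(decoded == aql)
```

HTTP clients that accept a params dictionary usually perform this encoding themselves; the sketch only makes the step visible.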
- Improved Error Handling and Messaging - Basic Syntax Errors: As referenced above proper error handling is something that the platform has lacked to date. The new capabilities in this area can be illustrated through a minor syntax error in a query, in this case misspelling a parameter name. Debugging becomes much simpler with accurate response codes and detailed error messages.
- For example, misspelling the name of a parameter (in this case source location) with the old URL endpoint returns a ‘200 OK’ HTTP response code. In fact, the query returns an inaccurate result set, since it completely ignores the parameter:
https://api.aylien.com/news/stories?source.location.country[]=US
response.status_code
200
- Whereas executing this query with the new V6 endpoint url returns a detailed error message that can quickly be debugged and resolved by referring to the documentation.
https://api.aylien.com/v6/news/stories?source.location.country[]=US
[{'code': 'KB400', 'detail': 'Unsupported query parameter', 'id': 'stories_params_source.location.country', 'links': {'about': 'https://docs.aylien.com/newsapi/#error-codes-amp-responses', 'docs': 'https://docs.aylien.com/newsapi'}, 'status': 400, 'title': 'Bad Request', 'type': 'http://httpstatus.es/400'}]}
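Error objects like the one above can be turned into readable log lines during debugging. The sketch below assumes the errors arrive as a list of dictionaries with the keys shown in the response above; the function name `summarize_errors` is hypothetical.

```python
def summarize_errors(errors):
    """Turn a list of News API error objects into one line per error."""
    lines = []
    for err in errors:
        lines.append(
            f"{err.get('status')} {err.get('title')} [{err.get('code')}]: "
            f"{err.get('detail')} ({err.get('id')})"
        )
    return lines


# Error object copied from the V6 response above (links omitted for brevity).
sample_errors = [{
    "code": "KB400",
    "detail": "Unsupported query parameter",
    "id": "stories_params_source.location.country",
    "status": 400,
    "title": "Bad Request",
}]

for line in summarize_errors(sample_errors):
    print(line)
```

The `id` field names the offending parameter, which is usually the quickest pointer to the typo.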
- Flat search input fields with AQL parsing
- Inputting all parameters into one long AQL string isn’t the best user experience in terms of ease of writing or legibility of queries. However, it’s important to maintain the advanced querying capability that AQL offers. Therefore, the latest iteration of the News API syntax caters for both, as illustrated through these examples. This example shows the current query structure, where all parameters are passed through the ‘aql’ string. User feedback has made it evident that this is challenging to decompose and edit where necessary:
'aql': 'language:(en) AND entities:({{id:Q95 AND overall_prominence:>=0.65}} OR {{id:Q312 AND overall_prominence:>=0.65}} OR {{id:Q2283 AND overall_prominence:>=0.65}} OR {{id:Q355 AND overall_prominence:>=0.65}}) AND categories:({{taxonomy:aylien AND id:ay.impact.joint}} OR {{taxonomy:aylien AND id:ay.biz.manda}} OR {{taxonomy:aylien AND id:ay.impact.ops}} OR {{taxonomy:aylien AND id:ay.biz.regulat}}) AND source.name.keyword:("Techcrunch" OR "FinExtra")'
- The new flat parameter structure allows you to submit each field individually including nested fields such as entities. Boolean operators are also accepted within the flat params, since all of these get parsed as AQL in the backend. It’s still possible to default back to the ‘aql' parameter for more advanced queries (example above with multilingual search).
'language':'en',
'entities': '{{id:Q95 AND overall_prominence:>=0.65}} OR {{id:Q312 AND overall_prominence:>=0.65}} OR {{id:Q2238 AND overall_prominence:>=0.65}} OR {{id:Q355 AND overall_prominence:>=0.65 }}',
'categories':'{{taxonomy:aylien AND id:ay.biz.regulat}} OR {{taxonomy:aylien AND id:ay.biz.manda}} OR {{taxonomy:aylien AND id:ay.impact.ops}} OR {{taxonomy:aylien AND id:ay.biz.regulat}}',
'source.name' : '("Techcrunch" OR "FinExtra" OR "CNBC")'
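Because each flat parameter is an ordinary string, long entity or category lists can be assembled programmatically rather than typed by hand. This is a minimal sketch; the helper names `entity_clause` and `category_clause` are assumptions for illustration, and the clause syntax simply mirrors the examples above.

```python
def entity_clause(entity_ids, min_prominence=0.65):
    """OR together one {{...}} block per entity ID."""
    return " OR ".join(
        "{{id:%s AND overall_prominence:>=%s}}" % (entity_id, min_prominence)
        for entity_id in entity_ids
    )


def category_clause(category_ids, taxonomy="aylien"):
    """OR together one {{...}} block per category ID in a taxonomy."""
    return " OR ".join(
        "{{taxonomy:%s AND id:%s}}" % (taxonomy, category_id)
        for category_id in category_ids
    )


# Flat params: one key per field, boolean logic inside each value.
params = {
    "language": "en",
    "entities": entity_clause(["Q95", "Q312"]),
    "categories": category_clause(["ay.biz.manda", "ay.impact.ops"]),
    "source.name": '("Techcrunch" OR "FinExtra")',
}

for key, value in params.items():
    print(key, "=", value)
```

Keeping the ID lists as Python lists makes it easier to add or remove an entity without editing a long query string.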
Deprecations
- SDK Deprecations - Follow our SDK Migration Guide.
- Legacy Category Fields - categories.taxonomy, categories.id, categories.level, categories.label to be replaced by the categories flat param or AQL, e.g. categories={{taxonomy:XXX AND id:XXX}}
- Entities V2/V3 Params - entities.title.text, entities.title.type, entities.title.links.dbpedia, entities.body.text, entities.body.type, entities.body.links.dbpedia replaced by entities V3.1 syntax through flat param or AQL, e.g. entities={{surface_forms.text: "Amazon" AND element:title AND type:Organization AND overall_prominence:>=0.65}}
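As a rough illustration of the entities deprecation, the sketch below rewrites a legacy text or type param into a V3.1-style clause. The function name is hypothetical and the mapping is an assumption based only on the example above; type values may differ between models and need remapping, and DBPedia-link params are left out because translating a DBPedia URL into an entity ID needs a lookup that is outside this sketch.

```python
def legacy_entity_param_to_clause(legacy_key, value):
    """Translate e.g. ('entities.title.text', 'Amazon') into a V3.1-style clause."""
    prefix, element, field = legacy_key.split(".", 2)
    if prefix != "entities" or element not in ("title", "body"):
        raise ValueError(f"not a legacy entities param: {legacy_key}")
    if field == "text":
        return '{{surface_forms.text:"%s" AND element:%s}}' % (value, element)
    if field == "type":
        return "{{type:%s AND element:%s}}" % (value, element)
    # e.g. links.dbpedia: no direct equivalent is sketched here.
    raise ValueError(f"no direct V3.1 equivalent sketched for: {legacy_key}")


print(legacy_entity_param_to_clause("entities.title.text", "Amazon"))
print(legacy_entity_param_to_clause("entities.body.type", "Organization"))
```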
Entities V3 JSON migrating to the latest entities model
Entity Model Update
As of July 2021, we have updated our entity model (Entities V3.1). All legacy models will be deprecated on 5 April 2023.
Customers currently leveraging legacy versions of this model will need to migrate to the latest version to benefit from a range of feature improvements.
This page will walk you through the differences and benefits of the new model and what you need to do to move from using the old version to the new one.
AQL & Enhanced Entity Search
With the latest entities version, users will be empowered to perform enhanced entity searches; not only will users be able to search for entities, but they will be able to supercharge their queries by availing of these features.
AQL
AYLIEN Query Language (or AQL) is a Lucene based syntax that enables users to perform even more powerful searches. You can read more about how to leverage it here.
ELSA
Previously AYLIEN supported sentiment analysis for title and body text. With our Entity Level Sentiment Analysis (ELSA) users can avail of sentiment predictions for individual entities. This can be very useful for finding positive or negative mentions of the entities you care about.
Prominence
Prominence is a prediction of how significant the mention of an entity is in a document. By using this filter, users can limit to only significant mentions of the entities they care about, reducing noise and finding the news that matters.
Element
Users can search for entities and specify what element the entity must appear in i.e. title or body.
New Entity Model Payload Structure - V3 JSON vs V3.1
There are some differences between the data payload for legacy version V3 JSON and the new V3.1 version.
In V3 JSON, if an entity is mentioned in both the title and body, there are two objects representing the entity - one in the title object and one in the body object. See example below of how the entity Messi is presented in the legacy model:
{'body': [
{'id': 'Q615',
'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
'prominence_score': 0.9800000190734863,
'sentiment': {'confidence': 0.81, 'polarity': 'neutral'},
'surface_forms': [{'frequency': 3,
'indices': [[45, 57], [485, 497], [557, 569]],
'text': 'Lionel Messi'},
{'frequency': 5,
'indices': [[1075, 1080],
[1225, 1230],
[1512, 1517],
[1714, 1719],
[1828, 1833]],
'text': 'Messi'},
{'frequency': 1,
'indices': [[1387, 1396]],
'text': 'Leo Messi'}],
'types': ['Human']}
],
'title':[
{'id': 'Q615',
'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
'prominence_score': 0.9800000190734863,
'sentiment': {'confidence': 0.9, 'polarity': 'neutral'},
'surface_forms': [{'frequency': 1,
'indices': [[3, 15]],
'text': 'Lionel Messi'}
]
}
]}
In the latest version, there is one parent object representing the entity and data pertaining to the title and body elements are child objects of the entity object. See below how the Messi entity from the same document is represented in the latest model.
{'body': {'sentiment': {'confidence': 0.81, 'polarity': 'neutral'},
'surface_forms': [{'frequency': 3,
'mentions': [{'index': {'end': 57, 'start': 45},
'sentiment': {'confidence': 0.6,
'polarity': 'positive'}},
{'index': {'end': 497, 'start': 485},
'sentiment': {'confidence': 0.7,
'polarity': 'neutral'}},
{'index': {'end': 569, 'start': 557},
'sentiment': {'confidence': 0.81,
'polarity': 'neutral'}}],
'text': 'Lionel Messi'},
{'frequency': 5,
'mentions': [{'index': {'end': 1080,
'start': 1075},
'sentiment': {'confidence': 0.77,
'polarity': 'neutral'}},
{'index': {'end': 1230,
'start': 1225},
'sentiment': {'confidence': 0.86,
'polarity': 'neutral'}},
{'index': {'end': 1517,
'start': 1512},
'sentiment': {'confidence': 0.89,
'polarity': 'neutral'}},
{'index': {'end': 1719,
'start': 1714},
'sentiment': {'confidence': 0.76,
'polarity': 'positive'}},
{'index': {'end': 1833,
'start': 1828},
'sentiment': {'confidence': 0.78,
'polarity': 'positive'}}],
'text': 'Messi'},
{'frequency': 1,
'mentions': [{'index': {'end': 1396,
'start': 1387},
'sentiment': {'confidence': 0.92,
'polarity': 'positive'}}],
'text': 'Leo Messi'}]},
'external_ids': {},
'id': 'Q615',
'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
'overall_frequency': 10,
'overall_prominence': 0.98,
'overall_sentiment': {'confidence': 0.86, 'polarity': 'neutral'},
'stock_tickers': [],
'title': {'sentiment': {'confidence': 0.9, 'polarity': 'neutral'},
'surface_forms': [{'frequency': 1,
'mentions': [{'index': {'end': 15, 'start': 3},
'sentiment': {'confidence': 0.9,
'polarity': 'neutral'}}],
'text': 'Lionel Messi'}]},
'types': ['Human']}
This new structure makes interrogating the data in post processing more efficient and also enables us to introduce aggregate measures - see below.
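As one example of post-processing against the new structure, the sketch below flattens a V3.1 entity object into one record per mention. The function name `iter_mentions` is an assumption for illustration, and the sample is a trimmed copy of the Messi entity above.

```python
def iter_mentions(entity):
    """Yield one flat record per mention of a V3.1 entity object."""
    for element in ("title", "body"):
        for form in entity.get(element, {}).get("surface_forms", []):
            for mention in form.get("mentions", []):
                yield {
                    "id": entity["id"],
                    "element": element,
                    "text": form["text"],
                    "start": mention["index"]["start"],
                    "end": mention["index"]["end"],
                    "polarity": mention["sentiment"]["polarity"],
                }


# Trimmed version of the Messi entity shown above.
entity = {
    "id": "Q615",
    "title": {"surface_forms": [{"text": "Lionel Messi", "frequency": 1, "mentions": [
        {"index": {"start": 3, "end": 15},
         "sentiment": {"confidence": 0.9, "polarity": "neutral"}}]}]},
    "body": {"surface_forms": [{"text": "Leo Messi", "frequency": 1, "mentions": [
        {"index": {"start": 1387, "end": 1396},
         "sentiment": {"confidence": 0.92, "polarity": "positive"}}]}]},
}

rows = list(iter_mentions(entity))
for row in rows:
    print(row)
```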
New Features
As you can see from the payload examples above, the latest version of the entities model has additional features.
overall_sentiment
An aggregate sentiment prediction for the entity over the whole document.
The previous version of entities had separate sentiment predictions for both title and body.
Granular sentiment predictions
Users can avail of a sentiment prediction for each individual mention of the entity - previously there was only an element level prediction.
overall_prominence
Prominence is a prediction of how significant the mention of an entity is. Users can leverage this to search for significant mentions of the entities they care about, which is very useful for reducing noise. The previous version of entities had separate prominence predictions for both title and body elements.
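The aggregate fields can also be used client-side once stories have been retrieved. The sketch below keeps only entities at or above a prominence threshold, optionally with a given overall polarity; the function name, the 0.65 default (borrowed from the query examples in this guide), and the second sample entity are assumptions for illustration.

```python
def significant_entities(entities, min_prominence=0.65, polarity=None):
    """Keep entities at or above min_prominence, optionally matching an overall polarity."""
    kept = []
    for entity in entities:
        if entity.get("overall_prominence", 0.0) < min_prominence:
            continue
        if polarity and entity.get("overall_sentiment", {}).get("polarity") != polarity:
            continue
        kept.append(entity)
    return kept


entities = [
    # Values taken from the Messi payload above.
    {"id": "Q615", "overall_prominence": 0.98,
     "overall_sentiment": {"confidence": 0.86, "polarity": "neutral"}},
    # Hypothetical low-prominence entity, for contrast only.
    {"id": "example-minor-entity", "overall_prominence": 0.2,
     "overall_sentiment": {"confidence": 0.7, "polarity": "positive"}},
]

print([e["id"] for e in significant_entities(entities)])
```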
Query format
In the previous entity model, users could perform enhanced entity searches using the following JSON format.
import json

params = {
    'published_at.start': 'NOW-5DAYS',
    'published_at.end': 'NOW',
    # The legacy entity query is a JSON document serialised into one string.
    'query': json.dumps({
        '$or': [
            {'entity': {'$and': [{'surface_forms.text': {'$text': 'Biden'}}]}},
            {'entity': {'$and': [{'surface_forms.text': {'$text': 'Washington'}}]}},
        ]
    }),
    'language': ['en'],
}
This enabled users to query features using complex boolean relationships. However, it was very complicated to write, read and debug.
In the new version, AYLIEN has made it easier to write enhanced entity searches using AQL, which follows a Lucene syntax. The same query is written below in the newest version.
params = {
    'published_at.start': 'NOW-5DAYS',
    'published_at.end': 'NOW',
    # OR matches the '$or' used in the legacy JSON query.
    'aql': 'entities:{{surface_forms: Biden}} OR entities:{{surface_forms: Washington}}',
    'language': ['en'],
}
Needless to say, this version is much more digestible and intuitive. Users can more easily write, read and debug their query criteria.
What do I need to do to Migrate?
Users on older versions that want to move to the latest version will need to take action in three ways:
Query Format
If you are currently querying the News API using a legacy entity model, you will need to update your queries to leverage the latest syntax.
Post Processing
If you are post processing your data payloads to extract entity-level information, you will need to update your algorithms to reflect the newest data structure.
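To make the post-processing change concrete, the sketch below reads an entity's title sentiment from both payload shapes. It assumes the legacy entities field is a dictionary with 'title' and 'body' lists (as in the V3 JSON example in this guide) and that V3.1 entities arrive as a list of entity objects; both function names are hypothetical.

```python
def title_sentiment_legacy(entities, entity_id):
    """Legacy V3 JSON: entities is {'title': [...], 'body': [...]}."""
    for entity in entities.get("title", []):
        if entity["id"] == entity_id:
            return entity["sentiment"]["polarity"]
    return None


def title_sentiment_v31(entities, entity_id):
    """V3.1: each entity object carries its own 'title' and 'body' children."""
    for entity in entities:
        if entity["id"] == entity_id and "title" in entity:
            return entity["title"]["sentiment"]["polarity"]
    return None


legacy = {"title": [{"id": "Q615",
                     "sentiment": {"confidence": 0.9, "polarity": "neutral"}}],
          "body": []}
v31 = [{"id": "Q615",
        "title": {"sentiment": {"confidence": 0.9, "polarity": "neutral"}}}]

print(title_sentiment_legacy(legacy, "Q615"))
print(title_sentiment_v31(v31, "Q615"))
```

The lookup changes from "pick the element, then find the entity" to "find the entity, then pick the element".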
Switching On the New Version & Support
The AYLIEN team will be happy to provide help and support in migrating your queries before updating your account to leverage the latest version.
Please contact customer.support@aylien.com with any queries.
Entities V2 migrating to the latest entities model
Entity Model Update
As of July 2021, we have updated our entity model (Entities V3.1). All legacy models will be deprecated on 5 April 2023.
Customers currently leveraging legacy versions of this model will need to migrate to the latest version to benefit from a range of feature improvements.
This page will walk you through the differences and benefits of the new model and what you need to do to move from using the old version to the new one.
DBPedia vs Wikidata Trained Model
While the legacy Entities V2 model was trained on DBPedia, the newest model is trained on Wikidata.
Training the entity model on Wikidata has increased the number of entities from 1.6 to 5.6 million.
This means end users can benefit from an even wider range of named entities.
AYLIEN can also expand the entities knowledge base at customer request.
AQL & Enhanced Entity Search
With the latest entities version, users will be empowered to perform enhanced entity searches; not only will users be able to search for entities, but they will be able to supercharge their queries by availing of these features.
AQL
AYLIEN Query Language (or AQL) is a Lucene based syntax that enables users to perform even more powerful searches. You can read more about how to leverage it here.
ELSA
Previously AYLIEN supported sentiment analysis for title and body text. With our Entity Level Sentiment Analysis (ELSA) users can avail of sentiment predictions for individual entities. This can be very useful for finding positive or negative mentions of the entities you care about.
Prominence
Prominence is a prediction of how significant the mention of an entity is in a document. By using this filter, users can limit to only significant mentions of the entities they care about, reducing noise and finding the news that matters.
Element
Users can search for entities and specify what element the entity must appear in i.e. title or body.
New Entity Model Payload Structure - V2 vs V3.1
There are some differences between the data payload for versions V2 and V3.1.
As mentioned above, DBPedia was used to train the V2 knowledge base and DBPedia links can be used to identify the entities.
Entity objects in V2 are children of title and body elements i.e. if an entity is mentioned in the title and body of a document it will be represented in both the title and body arrays.
There is a separate entity object for each distinct entity surface form, even if they link to the same DBPedia entity, e.g. Messi vs Leo Messi vs Lionel Messi are all represented as separate entity objects. See example below.
{'title': [
{'indices': [[3, 15]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
'text': 'Lionel Messi',
'types': ['Athlete', 'Agent', 'SoccerPlayer', 'Person', 'Footballer']}
]},
{'body': [
{'indices': [[45, 57], [485, 497], [557, 569]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
'text': 'Lionel Messi',
'types': ['Athlete',
'Agent',
'SoccerPlayer',
'Person',
'Organisation',
'Footballer']},
{'indices': [[52, 57],
[492, 497],
[564, 569],
[1075, 1080],
[1225, 1230],
[1391, 1396],
[1512, 1517],
[1714, 1719],
[1828, 1833]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
'text': 'Messi',
'types': ['Athlete',
'Agent',
'SoccerPlayer',
'Person',
'Organisation',
'Footballer']},
{'indices': [[1387, 1396]],
'links': {'dbpedia': 'http://dbpedia.org/resource/Lionel_Messi'},
'text': 'Leo Messi',
'types': ['Athlete', 'Agent', 'SoccerPlayer', 'Person', 'Footballer']}
]}
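Because V2 emits one object per surface form, post-processing code typically has to regroup the objects by their DBPedia link. The following is a minimal sketch of that regrouping, assuming the V2 entities field is a dictionary with 'title' and 'body' lists; the function name `group_v2_by_link` is hypothetical.

```python
from collections import defaultdict


def group_v2_by_link(v2_entities):
    """Map each DBPedia link to the surface forms seen in the title and body."""
    grouped = defaultdict(lambda: {"title": [], "body": []})
    for element in ("title", "body"):
        for entity in v2_entities.get(element, []):
            link = entity.get("links", {}).get("dbpedia")
            if link is None:
                continue  # entity without a DBPedia link cannot be grouped
            grouped[link][element].append(entity["text"])
    return dict(grouped)


messi = "http://dbpedia.org/resource/Lionel_Messi"
v2_entities = {
    "title": [{"text": "Lionel Messi", "links": {"dbpedia": messi}}],
    "body": [
        {"text": "Lionel Messi", "links": {"dbpedia": messi}},
        {"text": "Messi", "links": {"dbpedia": messi}},
        {"text": "Leo Messi", "links": {"dbpedia": messi}},
    ],
}

print(group_v2_by_link(v2_entities))
```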
In the latest version, there is one parent object representing the entity and data pertaining to the title and body elements are child objects of the entity object. All surface form variations that link to the entity are part of the same entity object.
The new version also has the additional entity enrichments of Entity Level Sentiment Analysis and prominence, as mentioned above.
See below how the Messi entity from the same document is represented in the latest model.
{'body': {'sentiment': {'confidence': 0.81, 'polarity': 'neutral'},
'surface_forms': [{'frequency': 3,
'mentions': [{'index': {'end': 57, 'start': 45},
'sentiment': {'confidence': 0.6,
'polarity': 'positive'}},
{'index': {'end': 497, 'start': 485},
'sentiment': {'confidence': 0.7,
'polarity': 'neutral'}},
{'index': {'end': 569, 'start': 557},
'sentiment': {'confidence': 0.81,
'polarity': 'neutral'}}],
'text': 'Lionel Messi'},
{'frequency': 5,
'mentions': [{'index': {'end': 1080,
'start': 1075},
'sentiment': {'confidence': 0.77,
'polarity': 'neutral'}},
{'index': {'end': 1230,
'start': 1225},
'sentiment': {'confidence': 0.86,
'polarity': 'neutral'}},
{'index': {'end': 1517,
'start': 1512},
'sentiment': {'confidence': 0.89,
'polarity': 'neutral'}},
{'index': {'end': 1719,
'start': 1714},
'sentiment': {'confidence': 0.76,
'polarity': 'positive'}},
{'index': {'end': 1833,
'start': 1828},
'sentiment': {'confidence': 0.78,
'polarity': 'positive'}}],
'text': 'Messi'},
{'frequency': 1,
'mentions': [{'index': {'end': 1396,
'start': 1387},
'sentiment': {'confidence': 0.92,
'polarity': 'positive'}}],
'text': 'Leo Messi'}]},
'external_ids': {},
'id': 'Q615',
'links': {'wikidata': 'https://www.wikidata.org/wiki/Q615',
'wikipedia': 'https://en.wikipedia.org/wiki/Lionel_Messi'},
'overall_frequency': 10,
'overall_prominence': 0.98,
'overall_sentiment': {'confidence': 0.86, 'polarity': 'neutral'},
'stock_tickers': [],
'title': {'sentiment': {'confidence': 0.9, 'polarity': 'neutral'},
'surface_forms': [{'frequency': 1,
'mentions': [{'index': {'end': 15, 'start': 3},
'sentiment': {'confidence': 0.9,
'polarity': 'neutral'}}],
'text': 'Lionel Messi'}]},
'types': ['Human']}
This new structure makes interrogating the data more powerful and efficient through:
- new enrichments enabling more enhanced searches
- collapsing all entity data into one object, making post processing easier and more efficient
Query format
In the previous entity V2 model, users were limited to searching for entities in an additive and extractive manner. Below, the query searches for instances of Messi OR FIFA in the title that do not mention Ronaldo in the title.
params = {
'entities.title.links.dbpedia[]': ['http://dbpedia.org/resource/Lionel_Messi', 'http://dbpedia.org/resource/FIFA']
, '!entities.title.links.dbpedia[]': ['http://dbpedia.org/resource/Cristiano_Ronaldo']
}
In the new version, AYLIEN has made it easier to write enhanced entity searches using AQL, which follows a Lucene syntax. Note the boolean operators AND and NOT, the sentiment filter, the element filter, and the search by both entity ID and entity surface form.
params = {
'aql' : 'entities:{{element:title AND id:(Q615) AND sentiment:(positive)}} AND entities:{{id:(Q253414)}} NOT entities:{{surface_forms.text: "Cristiano Ronaldo"}}'
}
This empowers users to be even more specific for the type of news they want to retrieve.
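Queries of this shape can be composed from smaller pieces so that each entity block stays readable. This is a minimal sketch; the helper name `entity_block` is an assumption, and the `entities:{{...}}` form with a colon follows the first two clauses of the query above.

```python
def entity_block(conditions):
    """Join conditions with AND inside one entities:{{...}} block."""
    return "entities:{{%s}}" % " AND ".join(conditions)


# Positive mentions of Q615 in the title.
include_messi = entity_block(["element:title", "id:(Q615)", "sentiment:(positive)"])
# Any mention of Q253414.
include_fifa = entity_block(["id:(Q253414)"])
# Exclude stories where this surface form appears.
exclude_ronaldo = entity_block(['surface_forms.text:"Cristiano Ronaldo"'])

aql = f"{include_messi} AND {include_fifa} NOT {exclude_ronaldo}"
params = {"aql": aql}
print(params["aql"])
```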
What do I need to do to Migrate?
Users on older versions that want to move to the latest version will need to take action in three ways:
Query Format
If you are currently querying the News API using a legacy entity model, you will need to update your queries to leverage the latest syntax.
Post Processing
If you are post processing your data payloads to extract entity-level information, you will need to update your algorithms to reflect the newest data structure.
SDK Migration Guide
What is happening?
With the release of the latest version of the News API, we have decided to move away from the SDK paradigm. There are a few key reasons for this, but mostly the goal was to improve our user experience:
- It has allowed us to simplify our Docs, which are no longer crowded with code examples from each SDK.
- Having multiple versions of SDKs made it confusing for customers and a burden when needing to update to the latest one.
- All customers will immediately benefit from new features and improvements, since there will no longer be a delay between the rollout on the standard News API and the implementation in SDKs.
How will this affect you?
- SDKs (including the latest versions) won’t be compatible with our V6 endpoints.
- Customers will need to move away from SDKs, by migrating to standard HTTP requests.
What is the recommended solution?
- If you’re a Python SDK user our updated Docs contain a number of useful code snippets leveraging Python HTTP requests.
- You will need to replace SDK specific syntax (as below) with standard HTTP syntax
- Or alternatively you can generate your own SDK with OpenAPI Generator
Topic | Deprecated SDK | V6 Raw HTTP |
---|---|---|
Authentication | Basic Config of App ID and App Key | OAuth Bearer Token |
Param Labels | Follow an underscore naming convention e.g. 'published_at_start' | Different labels with dots instead of underscores for param names e.g. 'published_at.start' |
Exception Handling | Built in out of the box for a variety of errors. | Need to write exception handling code. Example in common workflows. |
Endpoints | Method to call each endpoint e.g. api_instance.list_time_series(**opts) | Raw requests need to define the specific endpoint URL e.g. requests.get('https://api.aylien.com/v6/news/time_series', params=params, headers=headers) |
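The differences in the table can be sketched with the Python standard library as follows. Everything named here is an assumption for illustration: the `SDK_TO_HTTP` mapping contains only the pair given in the table, the bearer token is assumed to travel in a standard `Authorization: Bearer` header, and the `opener` argument exists only so the function can be exercised without a network call.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Only this pair comes from the table above; extend it for the params you use.
SDK_TO_HTTP = {"published_at_start": "published_at.start"}


def rename_params(sdk_params):
    """Swap SDK-style param names for their raw HTTP equivalents."""
    return {SDK_TO_HTTP.get(key, key): value for key, value in sdk_params.items()}


def get_json(url, params, token, opener=urllib.request.urlopen):
    """GET url with params and a bearer token; raise RuntimeError on HTTP errors."""
    query = urllib.parse.urlencode(params, doseq=True)
    request = urllib.request.Request(
        f"{url}?{query}",
        headers={"Authorization": f"Bearer {token}"},  # assumed header format
    )
    try:
        with opener(request) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The SDK used to do this for you; raw HTTP needs explicit handling.
        body = err.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"News API returned HTTP {err.code}: {body}") from err


# Example call (not executed here; needs a real token):
# stories = get_json("https://api.aylien.com/v6/news/stories",
#                    rename_params({"published_at_start": "NOW-5DAYS"}), token="...")
```

Surfacing the response body in the exception keeps the detailed V6 error message visible to the caller.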
When will SDKs be deprecated?
- These will be fully deprecated by 31st March 2024.
- Up until this deadline, SDKs will be available to use with the old version of the News API. However, we will provide comprehensive support to enable customers to adopt V6 well ahead of March 2024.
Switching On the New Version & Support
The AYLIEN team will be happy to provide help and support in migrating your queries before updating your account to leverage the latest version.
Please contact customer.support@aylien.com with any queries.