Sample workflows for working with the News API with handy code snippets.

Category and Industry taxonomies

Workflow examples

Filtering articles by Smart Tagger Categories tags

The parameters below filter for articles tagged with the Aylien category ay.spec.adverse ('Adverse Events') with a relevance score ranging from 0.7 to 1:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:(ay.spec.adverse) AND score: [0.7 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Alternatively, it can be searched by the category tag label:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND label:("Adverse Events") AND score: [0.7 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Filtering articles by Smart Tagger Industries tags

Below is an example of parameters that filter for articles tagged with the in.hcare.pharma ('Pharmaceuticals') industry with a minimum relevance score of 0.7:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "industries": '{{id:(in.hcare.pharma) AND score: [0.7 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Alternatively, it can be searched by the industry tag label:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "industries": '{{label:("Pharmaceuticals") AND score: [0.7 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Combining multiple taxonomies and tags with Boolean operators

Using Boolean operators, you can also filter for articles tagged with multiple industries or categories, as shown in the parameters below:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND label:(("Computer Science" AND "Digital Divide") OR "Artificial Intelligence") AND score: [0.6 TO 1]}}',
    "industries": '{{label:("Supercomputers") AND score:[0.6 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}
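
The filter strings above follow a regular pattern, so they can be assembled programmatically. The sketch below is one possible way to do that; the helper name tag_filter and its arguments are illustrative assumptions and are not part of the News API:

```python
def tag_filter(field, values, min_score, taxonomy=None):
    """Build a '{{...}}' tag filter string in the style of the examples above.

    Hypothetical helper: field is "id" or "label"; values are OR-ed together.
    """
    # Labels may contain spaces and commas, so they are wrapped in double quotes
    if field == "label":
        values = [f'"{value}"' for value in values]

    clauses = []
    if taxonomy is not None:
        clauses.append(f"taxonomy:{taxonomy}")
    clauses.append(f"{field}:({' OR '.join(values)})")
    clauses.append(f"score:[{min_score} TO 1]")
    return "{{" + " AND ".join(clauses) + "}}"


params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": tag_filter("id", ["ay.spec.adverse"], 0.7, taxonomy="aylien"),
    "industries": tag_filter("label", ["Pharmaceuticals"], 0.7),
    "per_page": 100,
}
```

Keeping the string assembly in one place makes it harder to mistype the double braces or the score range when filters are generated from user input.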

Filtering articles by IPTC category tags

The parameters below filter for articles tagged with the IPTC "economy, business and finance" or "arts, culture and entertainment" category with a minimum score of 0.8:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:iptc AND id:(04000000 OR 01000000) AND score: [0.8 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Alternatively, it can be searched by the category tag label:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:iptc AND label:("economy, business and finance" OR "arts, culture and entertainment") AND score: [0.8 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Filtering articles by IAB category tags

The parameters below filter for articles tagged with the IAB "Human Resources" or "Law, Gov’t & Politics" category with a minimum score of 0.8:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:iab-qag AND id:(IAB3-9 OR IAB11) AND score:[0.8 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Alternatively, it can be searched by the category tag label:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:iab-qag AND label:("Human Resources" OR "Law, Gov’t & Politics") AND score:[0.8 TO 1]}}',
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100
}

Time indexation

Workflow examples

Searching articles with date math expression

The query below uses a basic date math expression to search the last 14 days, rounded to the hour:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "per_page": 50,
}
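
The date math ranges used throughout these examples share one shape: an offset subtracted from NOW, with both ends rounded to a unit. As a rough sketch, such a range could be generated with a small helper; the function name and arguments below are assumptions made for illustration only:

```python
def date_math_range(amount, unit, start_round="HOUR", end_round="HOUR"):
    """Return a published_at range from `amount` units ago until now.

    Hypothetical helper: unit is passed exactly as it should appear in the
    expression, for example "DAYS" or "HOUR".
    """
    return f"[NOW-{amount}{unit}/{start_round} TO NOW/{end_round}]"


params = {
    "published_at": date_math_range(14, "DAYS"),
    "language": "(en)",
    "per_page": 50,
}
```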

Searching articles with exact datetime stamps

The query below searches with hardcoded timestamps, from the very start of the first day to the very end of the last day of the period:

params = {
    "published_at": "[2023-09-07T00:00:00.000Z TO 2023-09-21T23:59:59.999Z]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "per_page": 50,
}
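
If the boundaries come from Python datetime objects rather than hand-written strings, they can be formatted into the same millisecond-precision UTC form. The following is a minimal sketch; the helper names are hypothetical and not part of the API:

```python
from datetime import datetime, timezone


def to_api_timestamp(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS.mmmZ in UTC."""
    moment = moment.astimezone(timezone.utc)
    milliseconds = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{milliseconds:03d}Z"


def exact_range(start, end):
    """Build a published_at range from two timezone-aware datetimes."""
    return f"[{to_api_timestamp(start)} TO {to_api_timestamp(end)}]"


# Start of the first day to the last millisecond of the last day
start = datetime(2023, 9, 7, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2023, 9, 21, 23, 59, 59, 999000, tzinfo=timezone.utc)
published_at = exact_range(start, end)
```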

Multilingual content

Workflow examples

Searching articles with terms in the original text

The query below searches for articles where the terms passed in the parameter text match either the title or body of the article regardless of the original language:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "text": "'presidente' AND 'política'",
    "per_page": 100
}

Searching articles with terms in the translated text

The query below searches for articles where the terms passed in the parameter translations.en.title match the translated title of the article:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "translations.en.title": "'president' AND 'politics'",
    "per_page": 100
}

Combined search on the original and translated text

The query below searches for articles where the terms passed in the parameters title and translations.en.title match the original or the translated title of the article:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "translations.en.title": "'president' AND 'politics'",
    "title": "'presidente' AND 'política'",
    "per_page": 100
}

Searching by more than one language in the language parameter:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(es OR de)",
    "translations.en.title": "'president' AND 'politics'",
    "text": "('política' AND 'presidente') OR ('politik' AND 'präsident')",
    "per_page": 100
}
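
When the search terms are kept per language, the language and text parameters can be derived from the same mapping so they stay in sync. This is only a sketch under that assumption; the variable and function names are invented for the example:

```python
def all_of(terms):
    """Join terms with AND inside one parenthesised group."""
    return "(" + " AND ".join(f"'{term}'" for term in terms) + ")"


# Hypothetical mapping of language code to the terms to search in that language
terms_by_language = {
    "es": ["política", "presidente"],
    "de": ["politik", "präsident"],
}

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    # Dict keys are the language codes, OR-ed together
    "language": "(" + " OR ".join(terms_by_language) + ")",
    # One AND group per language, OR-ed together
    "text": " OR ".join(all_of(terms) for terms in terms_by_language.values()),
    "per_page": 100,
}
```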

Pagination of results

Workflow examples

Looping through pages

The get_stories() function below reads the cursor from the next_page_cursor field of each response and passes it to the next request, repeating until all the stories in the result set are collected:

import requests
import time
from pprint import pprint

username = "YOUR EMAIL"
password = "YOUR PASSWORD"
AppID = "YOUR APP ID"

def get_auth_header(username, password, appid):
    # Generate the authorization header for making requests to the Aylien API.

    token = requests.post("https://api.aylien.com/v1/oauth/token", auth=(username, password), data={"grant_type": "password"})

    token = token.json()["access_token"]

    headers = {"Authorization": f"Bearer {token}", "AppId": appid}

    return headers

def get_stories(params, headers):
    #  Fetch stories from the Aylien News API using the provided parameters and headers.

    fetched_stories = []
    stories = None

    while stories is None or len(stories) > 0:
        try:
            response = requests.get("https://api.aylien.com/v6/news/stories", params=params, headers=headers)

            # If the call is successful, collect the returned stories
            if response.status_code == 200:
                response_json = response.json()
                stories = response_json["stories"]
                fetched_stories += stories

                if len(stories) > 0:
                    print(
                        "Fetched %d stories. Total story count so far: %d"
                        % (len(stories), len(fetched_stories))
                    )

                if "next_page_cursor" in response_json:
                    params["cursor"] = response_json["next_page_cursor"]
                else:
                    # Without a new cursor the same page would be requested again, so stop here
                    print("No next_page_cursor")
                    break

            # If the application has reached the per-minute rate limit, sleep and retry until the limit resets
            elif response.status_code == 429:
                time.sleep(10)
                continue

            # If the API call hits a server error, sleep for a few minutes and try again
            elif 500 <= response.status_code <= 599:
                time.sleep(260)
                continue

            # For any other status code, print the error for further investigation and stop
            else:
                print(response.text)
                break

        except Exception as e:
            # Stop on any unexpected exception (for example, a network error)
            print(e)
            break

    return fetched_stories
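
The loop above retries rate-limited and server-error responses with no upper bound. If a cap on retries is preferred, one option is to wrap the request in a small helper such as the sketch below. The helper name, its arguments, and the default values are assumptions for illustration; the request itself is passed in as a callable so the retry logic stays independent of the HTTP client:

```python
import time


def get_with_retries(fetch, max_retries=3, wait_seconds=10, sleep=time.sleep):
    """Call fetch() until the response is not retryable or retries run out.

    fetch is any zero-argument callable returning an object with a
    status_code attribute. The last response is returned either way.
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        retryable = (
            response.status_code == 429 or 500 <= response.status_code <= 599
        )
        if not retryable:
            return response
        # Wait before the next attempt, but not after the final one
        if attempt < max_retries:
            sleep(wait_seconds)
    return response


# Possible usage inside get_stories(), assuming requests is imported:
# response = get_with_retries(
#     lambda: requests.get(url, params=params, headers=headers)
# )
```

The caller still has to check the returned status code, because the helper hands back the last response even when every attempt failed.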

Passing query parameters

The following example shows how to send a query that calls the function from the block above. It will retrieve all stories in English that are categorised as Sports and were published between 1 hour ago and now:

params = {
    "published_at": "[NOW-1HOUR/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.sports) AND score:>=0.65}}",
    "cursor": "*",
    "per_page": 100,
}

headers = get_auth_header(username, password, AppID)
stories = get_stories(params, headers)

Note that the cursor parameter was passed as "*" because the first request doesn't yet have a page cursor. If no value is passed, the cursor parameter falls back to the default value "*".

Sorting results

Workflow examples

Sorting by relevance

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "relevance",
    "sort_direction": "desc"
}

Sorting by recency

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "recency",
    "sort_direction": "desc"
}

Sorting by the published datetime stamp

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "published_at",
    "sort_direction": "desc"
}

Sorting by number of photos within the articles

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "media.images.count",
    "sort_direction": "desc"
}

Sorting by number of videos within the articles

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "media.videos.count",
    "sort_direction": "desc"
}

Sorting by web traffic rank

Sorting by the global traffic ranking:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "source.rankings.alexa.rank",
    "sort_direction": "desc"
}

Sorting by the traffic ranking in a specific country:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.fin.stockups OR ay.fin.stkclose) AND score:>=0.65}}",
    "entities": '{{surface_forms:("Apple" OR "Tesla") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100,
    "sort_by": "source.rankings.alexa.rank.US",
    "sort_direction": "desc"
}

Sorting results with Boosting

The query below returns articles whose titles contain the word "Apple" together with at least one of "Ireland", "UK", or "Germany". Because of the boost factors, the result set lists articles that mention Ireland in the title first, then UK, and finally Germany.

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "title": '"Apple" AND ("Ireland"^3 OR "UK"^2 OR "Germany")',
    "sort_by": "relevance",
    "per_page": 100,
    "return": ["title"]
}
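
Boosted clauses like the one above can also be generated from a mapping of term to boost factor. The helper below is a hypothetical sketch rather than an API feature; it omits the ^ suffix when the boost is 1:

```python
def boosted_any(boosts):
    """Build an OR group of quoted terms, appending ^boost where boost != 1."""
    parts = []
    for term, boost in boosts.items():
        suffix = f"^{boost}" if boost != 1 else ""
        parts.append(f'"{term}"{suffix}')
    return "(" + " OR ".join(parts) + ")"


params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "title": '"Apple" AND ' + boosted_any({"Ireland": 3, "UK": 2, "Germany": 1}),
    "sort_by": "relevance",
    "per_page": 100,
    "return": ["title"],
}
```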

Boolean operators

Boolean operators allow you to apply Boolean logic to queries, requiring the presence or absence of specific terms or conditions in fields to match documents.

The table below summarises the Boolean operators supported by the standard query parser.

Boolean operator   Alternative symbol   Description
NOT                !                    Requires that the following term not be present.
AND                &&                   Requires both terms on either side of the Boolean operator to be present for a match.
(none)             +                    Requires the following term to be present.
(none)             -                    Prohibits the following term (matches on fields or documents that do not include that term). The - operator is functionally similar to the Boolean operator ( ! ). Because it's used by popular search engines such as Google, it may be more familiar to some user communities.
OR                 ||                   Requires that either term (or both terms) be present for a match.
  • When specifying Boolean operators with keywords such as AND or NOT, the keywords must appear in all uppercase.

  • Alternative symbols use fewer characters in the query, which is useful if you are approaching the query character limit.

Operator NOT ( ! )

The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT.

The following queries search for documents that contain the phrase "jakarta apache" but do not contain the phrase "apache lucene":

"jakarta apache" NOT "apache lucene"

Alternatively:

"jakarta apache" ! "apache lucene"

Operator AND ( && )

The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. The symbol && can be used in place of the word AND.

To search for documents that contain both "jakarta apache" and "apache lucene", use either of the following queries:

"jakarta apache" AND "apache lucene"

Alternatively:

"jakarta apache" && "apache lucene"

Operator OR ( || )

The OR operator is the default conjunction operator. This means that the OR operator is used if there is no Boolean operator between two terms. The OR operator links two terms and finds a matching document if either of the terms exists in a document. This is equivalent to a union using sets. The symbol || can be used in place of the word OR.

For example, to search for documents that contain either "jakarta apache" or just "jakarta", use the query:

"jakarta apache" OR jakarta

Alternatively:

"jakarta apache" || jakarta

Alternatively, because OR is the default operator, it does not need to be explicitly added to a query. For example:

"jakarta apache" jakarta

Grouping terms to form sub-queries

The API supports using parentheses to group clauses to form sub-queries. This can be very useful if you want to control the Boolean logic for a query.

The query below searches for documents that contain "website" and either "jakarta" or "apache":

(jakarta OR apache) AND website

This adds precision to the query, requiring that the term "website" exist along with at least one of the terms "jakarta" or "apache".

Operator ( + )

The + symbol, also known as the "required" operator, requires the term after the + symbol to exist somewhere in a field in at least one document for the query to return a match.

For example, to search for documents that must contain "jakarta" and that may or may not contain "lucene", use the following query:

+jakarta lucene

Operator ( - )

The - symbol or "prohibit" operator excludes documents that contain the term after the - symbol.

For example, to search for documents that contain "jakarta apache" but not "apache lucene", use the following query:

"jakarta apache" -"apache lucene"
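
The set analogy used in this section (AND as intersection, OR as union, NOT as difference) can be tried out directly with Python sets. The snippet below is only an illustration on a made-up four-document corpus; it does not reproduce how the API tokenises or scores text:

```python
# Tiny illustrative corpus: document ID -> text (invented for this example)
docs = {
    1: "jakarta apache website",
    2: "apache lucene",
    3: "jakarta lucene website",
    4: "lucene website",
}


def matching(term):
    """Return the set of document IDs whose text contains the term."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}


jakarta = matching("jakarta")
apache = matching("apache")
website = matching("website")

and_result = jakarta & apache              # jakarta AND apache
or_result = jakarta | apache               # jakarta OR apache
not_result = jakarta - apache              # jakarta NOT apache
grouped = (jakarta | apache) & website     # (jakarta OR apache) AND website
```

Changing the parentheses in the last line changes the result in the same way that regrouping a sub-query does.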

Workflow examples

Building queries with the operator NOT ( ! )

The query below will return articles that contain the term "United States" but do not contain the term "United Kingdom":

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"United States" AND NOT "United Kingdom"',
    "cursor": "*",
    "per_page": 100
}

Alternatively, you can use the symbol (!) instead of the reserved word NOT:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"United States" ! "United Kingdom"',
    "cursor": "*",
    "per_page": 100
}

Building queries with the operator AND ( && )

The query below will return articles containing both the term "United States" and the term "United Kingdom":

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"United States" AND "United Kingdom"',
    "cursor": "*",
    "per_page": 100
}

Alternatively, you can use the symbol (&&) instead of the reserved word AND:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"United States" && "United Kingdom"',
    "cursor": "*",
    "per_page": 100
}

Building queries with the operator OR ( || )

The query below will return articles containing the term "Panama" or "Czech Republic":

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"Panama" OR "Czech Republic"',
    "cursor": "*",
    "per_page": 100
}

Alternatively, you can use the symbol (||) instead of the reserved word OR:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"Panama" || "Czech Republic"',
    "cursor": "*",
    "per_page": 100
}

Building subqueries with advanced boolean logic

params = {
    "published_at": "[NOW-90DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '("United States" OR "United Kingdom") AND "Germany"',
    "cursor": "*",
    "per_page": 100
}

params = {
    "published_at": "[NOW-5YEARS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '(("breakfast" OR "lunch") AND "food") AND ((("Ham" AND "cheese") OR "egg") AND "sandwich")',
    "cursor": "*",
    "per_page": 100
}

Building queries with operator ( + )

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '+Ford Mustang',
    "cursor": "*",
    "per_page": 100
}

Building queries with operator ( - )

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "text": '"Ford Mustang" -"Ford Maverick"',
    "cursor": "*",
    "per_page": 100
}

Locations

The Quantexa News API aggregates and enriches news content from over 90,000 global news sources in 15 languages, covering 200 countries and territories.

NLP models enrich every news article with 26 data points, creating structured news data served via Quantexa News API, including metadata on news sources.

The parameters below enable you to filter articles by source. As is the nature of news publishing, one source can post and publish many articles in a day.

Sources are geolocalised by their locations and scopes.

Source location

News source location refers to the location where the publisher is headquartered or the official address of the news company. You can filter by country, state, or city.

Source scope

News source scope coverage refers to a news publisher's geographical areas or regions. You can filter by country, state, or city, as well as level (i.e. international, national, local).

Note: Not all sources in the inventory have location or scope metadata; these fields are nullable. Filtering by geolocation can therefore narrow the results returned.

Workflow examples

Filtering articles based on the source location

The query below will return articles from sources whose location country is the US or Canada and whose location state is California or Montreal:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "source.locations.country": "(US OR CA)",
    "source.locations.state": '("California", "Montreal")',
    "sort_by": "published_at",
    "per_page": 100,
}

Filtering articles based on the source scope

The query below will return articles from sources whose scope country is GB or Germany and whose scope city is London or Berlin:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "source.scopes.country": '(GB OR DE)',
    "source.scopes.city": '("London", "Berlin")',
    "sort_by": "published_at",
    "per_page": 100,
}

Note: Country values are supplied in ISO format, whereas city and state values are not.

Filtering articles based on the mentioned locations

The query below will return articles where the given locations are mentioned in the title of a story:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{element:title AND surface_forms:("San Francisco" OR "Philadelphia") AND type:("Location", "City")}}',
    "sort_by": "published_at",
    "per_page": 100,
}

Website traffic rank

Workflow examples

Filtering articles by the source global traffic rank

The following example shows articles that are in English and are about Science, published by sources with an Alexa rank between 1 and 100, and published between 1 day ago and now:

params = {
    "published_at": "[NOW-1DAY/DAY TO NOW]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.appsci) AND score:>=0.65}}",
    "source.rankings.alexa.rank.min": 1,
    "source.rankings.alexa.rank.max": 100,
    "per_page": 100,
}

Filtering articles by the source global traffic rank in specific countries

The following example shows articles that are in English and are about Science, published by sources with an Alexa rank between 1 and 100 in the US and UK, and published between 1 day ago and now:

params = {
    "published_at": "[NOW-1DAY/DAY TO NOW]",
    "language": "(en)",
    "categories": "{{taxonomy:aylien AND id:(ay.appsci) AND score:>=0.65}}",
    "source.rankings.alexa.rank.US": "[1 TO 100]",
    "source.rankings.alexa.rank.UK": "[1 TO 100]",
    "per_page": 100,
}

Languages

Workflow examples

Filtering articles by language

The query below will retrieve only articles in the English language:

params = {
    "published_at": "[NOW-1DAY/DAY TO NOW]",
    "language": "(en)",
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100,
}

Filtering articles from many languages

The query below will retrieve articles in English, German, French or Spanish:

params = {
    "published_at": "[NOW-1DAY/DAY TO NOW]",
    "language": "(en OR de OR fr OR es)",
    "sort_by": "published_at",
    "cursor": "*",
    "per_page": 100,
}

Clusters

Workflow examples

Retrieving clusters using the Clusters endpoint

The Clusters endpoint allows you to retrieve clusters from a specific time window.

This endpoint is useful for monitoring the news for important “breaking” news events that are receiving a certain level of coverage. Since each cluster provides metadata on the number of stories in it, new clusters with many stories usually refer to a new, important event (you can additionally filter the events by country to localize your search).

Once you have retrieved the cluster objects, you can query the Stories endpoint with the cluster ID to gather the stories associated with the cluster.

The code below shows how you can gather clusters that were created in the last 6 hours and have at least 10 stories associated with them:

params = {
    "time.end": "NOW-6HOURS",
    "story_count.min": 10
}

To retrieve the stories associated with a cluster, you can either use the representative story or make an additional call to the Stories endpoint with the cluster ID.

Retrieving clusters using the Trends endpoint

The Trends endpoint allows you to filter clusters based on the stories contained within them. For example, you can filter clusters that contain stories with a specific category label, mention a specific entity, or monitor events about a specific topic or entity in real-time.

The Trends endpoint returns the ID of clusters sorted by the count of stories associated with them. Once you have each cluster’s ID, you can go on to:

  • Get the cluster metadata from the Clusters endpoint

  • Get the stories for each of the clusters from the Stories endpoint.

The Trends endpoint only returns the top 100 clusters for a given query. As such, if your intention is to support real-time monitoring, you should ensure that your query is very specific and covers a small enough interval to retrieve all of the relevant clusters.

The sample code below shows how you can gather clusters associated with stories classified under the politics category (ay.pol) and mentioning the US Congress over the last 12 hours.

params = {
    "published_at": "[NOW-12HOURS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:(ay.pol) AND score: [0.7 TO 1]}}',
    "entities": '{{surface_forms:("US Congress")}}',
    "field": "clusters"
}

It will return the trending clusters that match the criteria:

{'field': 'clusters',
 'published_at.end': '2023-09-22T15:00:00Z',
 'published_at.start': '2023-09-08T15:00:00Z',
 'trends': [{'count': 18, 'value': '512391127'},
            {'count': 17, 'value': '507818229'},
            {'count': 7, 'value': '511720407'},
            {'count': 7, 'value': '512435223'},
            {'count': 7, 'value': '514583888'},
            {...}]}

You can then pass the trending cluster IDs to the Stories endpoint to retrieve the associated stories, as described in the following section.

Retrieving clustered stories using the stories endpoint

The Stories endpoint allows you to gather a filtered stream of stories and retrieve the cluster ID associated with each one.

You can use this to effectively “collapse” stories in a real-time news stream that you are monitoring. This could be useful, for example, to avoid showing similar stories in a rolling news stream.

Once you have the cluster's ID, you can query the Clusters endpoint to retrieve its metadata.

The following parameters retrieve the stories belonging to a given set of clusters, here the trending cluster IDs from the Trends response shown earlier:

params = {
    "clusters": "(512391127 OR 507818229 OR 511720407 OR 512435223 OR 514583888)",
    "cursor": "*",
    "per_page": 100
}
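
Rather than copying cluster IDs by hand, the clusters filter can be built from a Trends response. The sketch below assumes the response has the shape shown in the Trends example (a trends list of count and value pairs); the helper name and the limit argument are hypothetical:

```python
def clusters_filter(trends_response, limit=5):
    """Build a '(id OR id ...)' clusters filter from the top trending clusters."""
    top_trends = trends_response["trends"][:limit]
    cluster_ids = [trend["value"] for trend in top_trends]
    return "(" + " OR ".join(cluster_ids) + ")"


# Shortened copy of the Trends response shown earlier
trends_response = {
    "field": "clusters",
    "trends": [
        {"count": 18, "value": "512391127"},
        {"count": 17, "value": "507818229"},
        {"count": 7, "value": "511720407"},
    ],
}

params = {
    "clusters": clusters_filter(trends_response),
    "cursor": "*",
    "per_page": 100,
}
```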

Entities

Workflow examples

Searching articles by recognized entities within

The following example searches English-language stories from the last 14 days that mention an entity referred to as "Apple" in the title, where that entity is recognised as an Organization, has an overall prominence of at least 0.65, carries positive or neutral sentiment, and has an overall frequency of at least 5:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{element:title AND surface_forms:("Apple") AND type:("Organization") AND overall_prominence:>=0.65 AND sentiment:(positive OR neutral) AND overall_frequency:>=5}}',
    "cursor": "*",
    "per_page": 100
}

Searching articles by multiple recognized entities within

The parameters below return stories that mention two entities that each meet given conditions:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{surface_forms:("Apple") AND type:("Organization") AND overall_prominence:>=0.65 AND sentiment:(positive OR neutral)}} AND {{surface_forms:("Tesla") AND type:("Organization") AND overall_prominence:>=0.65}}',
    "cursor": "*",
    "per_page": 100
}

Searching articles by entity-based sentiment over time

Searching by entity-level sentiment (ELSA) on the Time Series endpoint returns, for each period, the number of stories that express the given sentiment toward a given entity:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{element:title AND id:Q312 AND type:Organization AND sentiment:positive}}',
    "period": "+1DAY"
}

Response from time series endpoint:

{'period': '+1DAY',
 'published_at.end': '2023-09-22T15:00:00Z',
 'published_at.start': '2023-09-08T00:00:00Z',
 'time_series': [{'count': 381, 'published_at': '2023-09-08T00:00:00Z'},
                 {'count': 126, 'published_at': '2023-09-09T00:00:00Z'},
                 {'count': 131, 'published_at': '2023-09-10T00:00:00Z'},
                 {'count': 320, 'published_at': '2023-09-11T00:00:00Z'},
                 {'count': 1940, 'published_at': '2023-09-12T00:00:00Z'},
                 {'count': 1180, 'published_at': '2023-09-13T00:00:00Z'},
                 {'count': 363, 'published_at': '2023-09-14T00:00:00Z'},
                 {'count': 312, 'published_at': '2023-09-15T00:00:00Z'},
                 {'count': 131, 'published_at': '2023-09-16T00:00:00Z'},
                 {'count': 126, 'published_at': '2023-09-17T00:00:00Z'},
                 {'count': 503, 'published_at': '2023-09-18T00:00:00Z'},
                 {'count': 414, 'published_at': '2023-09-19T00:00:00Z'},
                 {'count': 295, 'published_at': '2023-09-20T00:00:00Z'},
                 {'count': 215, 'published_at': '2023-09-21T00:00:00Z'},
                 {'count': 264, 'published_at': '2023-09-22T00:00:00Z'}]}
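
A response like this is straightforward to summarise once it has been parsed into a dictionary. As a minimal sketch, assuming the time_series structure shown above, the following hypothetical helper returns the total story count and the period with the highest count:

```python
def summarise_time_series(response):
    """Return (total count, peak period start, peak count) for a response."""
    points = response["time_series"]
    total = sum(point["count"] for point in points)
    peak = max(points, key=lambda point: point["count"])
    return total, peak["published_at"], peak["count"]


# Three consecutive days taken from the response above
response = {
    "period": "+1DAY",
    "time_series": [
        {"count": 320, "published_at": "2023-09-11T00:00:00Z"},
        {"count": 1940, "published_at": "2023-09-12T00:00:00Z"},
        {"count": 1180, "published_at": "2023-09-13T00:00:00Z"},
    ],
}

total, peak_day, peak_count = summarise_time_series(response)
```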

Media elements

Workflow examples

Searching articles by the amount of media

The query below will return articles that have at least one image:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "media.images.count.min": 1,
    "per_page": 100
}

The query below will return articles that have at least three images:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "media.images.count.min": 3,
    "per_page": 100
}

The query below will return articles that have no images:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "media.images.count.max": 0,
    "per_page": 100
}

Sorting articles by media count

The query below will sort articles by the number of images in them. Articles with more images will be listed first:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "sort_by": "media.images.count",
    "sort_direction": "desc",
    "per_page": 100
}

Searching articles by media format & size

The query below will return articles that have images with a minimum resolution of 240x240 pixels (width x height), sorted by image count:

params = {
    "published_at": "[NOW-14DAYS/DAY TO NOW/HOUR]",
    "language": "(en)",
    "media.images.width.min": 240,
    "media.images.height.min": 240,
    "sort_by": "media.images.count",
    "sort_direction": "desc",
    "per_page": 100
}

Proximity search

Quantexa News API allows you to make 'smart' keyword queries by applying proximity operators to your search which add extra conditions for the searched keywords to meet.

Proximity operator

Frequently, entities or events of interest to us are mentioned in varying sequences of terms. For example, HSBC's division in China could appear in multiple forms: “HSBC China”, “HSBC’s branches in China”, “In China, HSBC is introducing new…” etc.

Proximity search is a feature that enables users to broaden the search criteria to return these combinations. "Proximity" refers to the distance, counted in words, between two searched terms in a story. For example, "HSBC China"~5 only returns stories that mention "HSBC" and "China" with at most five words between them, as in the title "HSBC appoints head of investment banking China".

Workflow examples

Searching articles with proximity keyword operator

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "title": '"HSBC China"~5',
    "per_page": 100,
    "return": ["title"]
}

Response:

[{'title': 'Contrasting China Construction Bank (OTCMKTS:CICHF) & HSBC (NYSE:HSBC)'},
 {'title': 'HSBC Increases Yum China (NYSE:YUMC) Price Target to $79.20'},
 {'title': 'Financial Service Outsourcing Market Share, Size and Forecast to 2030 | China Everbright Group, Axa, HSBC | 132 Pages Report'},
 {'title': 'HSBC Increases Yum China (NYSE:YUMC) Price Target to $79.20'},
 {'title': 'HSBC Boosts Yum China (NYSE:YUMC) Price Target to $79.20'},
 {'title': 'HSBC appoints head of investment banking China'},
 {'title': 'China: HSBC insurance broking unit strengthens position with fund distribution licence'}]

Excluding operator

Most parameters can be used to exclude results by prefixing the parameter name with the excluding operator (!).

Workflow examples

Excluding results by values passed on parameter

The query below will return articles from the last 14 days that are categorized as “politics”, excluding those whose title contains any of the phrases "breaking news", "live events", or "exclusive reporting":

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:(ay.pol)}}',
    "!title": '"breaking news" OR "live events" OR "exclusive reporting"',
    "per_page": 100
}
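
If you build queries programmatically, the exclusion prefix can be applied by a small helper rather than typed by hand. The sketch below is one possible approach; with_exclusions is a hypothetical helper name, not part of the API or any SDK, and it assumes the parameter you pass supports the ! operator.

```python
def with_exclusions(params, exclusions):
    """Return a copy of `params` with each excluded parameter prefixed by '!'.

    `exclusions` maps a parameter name (e.g. "title") to the value to exclude.
    The original `params` dict is left unchanged.
    """
    combined = dict(params)
    for name, value in exclusions.items():
        combined["!" + name] = value
    return combined


base_params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:(ay.pol)}}',
    "per_page": 100,
}

params = with_exclusions(
    base_params,
    {"title": '"breaking news" OR "live events" OR "exclusive reporting"'},
)
print(params["!title"])
```

Keeping the base filters and the exclusions separate makes it easier to reuse the same base query with different exclusion lists.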

Excluding results with boolean logic

The query below will search for articles that are categorized as “Sports” but exclude results that are also categorized as “Sports Fishing”:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:ay.sports}} AND NOT {{taxonomy:aylien AND id:ay.sports.fishing}}',
    "per_page": 100
}

Authors

Workflow examples

Searching articles by author name

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "author.name": '("Anusuya Lahiri" OR "Dan Benton")',
    "per_page": 100
}

Searching articles by author ID

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "author.id": "(10004156 OR 14563286)",
    "per_page": 100
}

Sentiment analysis

Workflow examples

Filter articles by sentiment towards the text

The query below will return articles where the sentiment of both the title and the body is positive or neutral:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "per_page": 100,
    "sentiment.body.polarity": "(positive OR neutral)",
    "sentiment.title.polarity": "(positive OR neutral)"
}

Filter articles by sentiment towards the entity

The query below will return articles where the sentiment towards the entity Apple, as mentioned in the title or the body, is positive or neutral:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{element:(title OR body) AND surface_forms:(Apple) AND type:(Organization) AND sentiment:(positive OR neutral)}}',
    "per_page": 100,
}
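
The entities value follows a regular pattern of field:(value) clauses joined with AND, so it can be assembled from its parts. The sketch below shows one way to do that; entity_sentiment_query and any_of are hypothetical helper names, and the only fields used are the ones that appear in the example above.

```python
def any_of(values):
    """Join values into a parenthesized OR group, e.g. (positive OR neutral)."""
    return "(" + " OR ".join(values) + ")"


def entity_sentiment_query(surface_form, entity_type, sentiments,
                           elements=("title", "body")):
    """Build an `entities` filter for sentiment towards a single entity."""
    # Multi-word surface forms are wrapped in double quotes so that they are
    # treated as a single phrase.
    if " " in surface_form:
        surface_form = f'"{surface_form}"'
    clauses = [
        "element:" + any_of(elements),
        "surface_forms:(" + surface_form + ")",
        "type:(" + entity_type + ")",
        "sentiment:" + any_of(sentiments),
    ]
    return "{{" + " AND ".join(clauses) + "}}"


params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "entities": entity_sentiment_query("Apple", "Organization",
                                       ["positive", "neutral"]),
    "per_page": 100,
}
print(params["entities"])
```

Called with the arguments shown, the helper reproduces the entities string used in the example above.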

Searching for entities with Autocomplete

Workflow examples

Searching for entity types

The query below compares the term passed (“organization”) with the entity types in the knowledge base. It performs a fuzzy match and returns entity types that match the value of the term parameter, e.g. nonprofit organization, standards organization, etc. Because the limit parameter is set to 3, only the top 3 matches are returned:

params = {
    "term": "organization",
    "limit": 3
}

Response:

{'href': 'https://api.aylien.com/v2/autocomplete/suggestions/entity-types?limit=3&term=organization',
 'items': [{'description': 'An organization or organisation is an entity, such '
                           'as a company, an institution, or an association, '
                           'comprising one or more people and having a '
                           'particular purpose.',
            'href': 'https://api.aylien.com/v1/entity-manager/entities/Q43229',
            'id': 'Q43229',
            'text': 'organization',
            'types': [],
            'weight': 1707369},
           {'description': 'A nonprofit organization (NPO), also known as a '
                           'non-business entity, not-for-profit organization, '
                           'or nonprofit institution, is an organization '
                           'traditionally dedicated to furthering a particular '
                           'social cause or advocating for a shared point of '
                           'view.',
            'href': 'https://api.aylien.com/v1/entity-manager/entities/Q163740',
            'id': 'Q163740',
            'text': 'nonprofit organization',
            'types': [],
            'weight': 1590643},
           {'description': 'A standards organization, standards body, '
                           'standards developing organization (SDO), or '
                           'standards setting organization (SSO) is an '
                           'organization whose primary activities are '
                           'developing, coordinating, promulgating, revising, '
                           'amending, reissuing, interpre...',
            'href': 'https://api.aylien.com/v1/entity-manager/entities/Q1328899',
            'id': 'Q1328899',
            'text': 'standards organization',
            'types': [],
            'weight': 103050}],
 'next': 'https://api.aylien.com/v2/autocomplete/suggestions/entity-types?limit=3&term=organization&page=2',
 'prev': 'https://api.aylien.com/v2/autocomplete/suggestions/entity-types?limit=3&term=organization&page=1'}

Values from this response, such as the type ID, can be passed when searching for entities in the knowledge base, as shown in the next example.

Searching for entities

This query will return entities whose name is a close match to the term “Apple”. It also disambiguates the search by returning only entities of type Organization (ID Q43229):

params = {
    "term": "Apple",
    "type_id": "Q43229",
    "limit": 3
}

Response:

{'href': 'https://api.aylien.com/v2/autocomplete/suggestions/entity-names?limit=3&term=Apple&type_id=Q43229',
 'items': [{'description': 'Apple Inc.',
            'href': 'https://api.aylien.com/v1/entity-manager/entities/Q312',
            'id': 'Q312',
            'text': 'Apple Inc.',
            'types': [{'id': 'Q43229', 'names': 'organization'},
                      {'id': 'Q4830453', 'names': 'business'},
                      {'id': 'Q783794', 'names': 'company'}],
            'weight': 17087},
           {'description': 'Apple Music is a music and video streaming service '
                           'developed by Apple Inc.',
            'href': 'https://api.aylien.com/v1/entity-manager/entities/Q20056642',
            'id': 'Q20056642',
            'text': 'Apple Music',
            'types': [{'id': 'Q15401930', 'names': 'product'},
                      {'id': 'Q2424752', 'names': 'product'},
                      {'id': 'Q43229', 'names': 'organization'},
                      {'id': 'Q4830453', 'names': 'business'},
                      {'id': 'Q7397', 'names': 'software'}],
            'weight': 4432},
           {'description': 'Apple Records is a record label founded by the '
                           'Beatles in 1968 as a division of Apple Corps Ltd.',
            'href': 'https://api.aylien.com/v1/entity-manager/entities/Q213710',
            'id': 'Q213710',
            'text': 'Apple',
            'types': [{'id': 'Q15401930', 'names': 'product'},
                      {'id': 'Q43229', 'names': 'organization'},
                      {'id': 'Q4830453', 'names': 'business'},
                      {'id': 'Q783794', 'names': 'company'}],
            'weight': 1429}],
 'next': 'https://api.aylien.com/v2/autocomplete/suggestions/entity-names?limit=3&term=Apple&type_id=Q43229&page=2',
 'prev': 'https://api.aylien.com/v2/autocomplete/suggestions/entity-names?limit=3&term=Apple&type_id=Q43229&page=1'}

Searching for sources with Autocomplete

Workflow examples

Searching sources by name

The query below will return matches from the source inventory that are close to the name term “The Mirror”:

params = {
    "name_term": "the mirror",
    "limit": 3
}

Response:

{'href': 'https://api.aylien.com/v2/autocomplete/suggestions/sources?limit=3&name_term=the+mirror',
 'items': [{'domain': 'mirror.co.uk',
            'href': 'https://api.aylien.com/v1/sources/1260',
            'id': 1260,
            'name': 'The Mirror',
            'url': 'https://www.mirror.co.uk/'}],
 'next': 'https://api.aylien.com/v2/autocomplete/suggestions/sources?limit=3&name_term=the+mirror&page=2',
 'prev': 'https://api.aylien.com/v2/autocomplete/suggestions/sources?limit=3&name_term=the+mirror&page=1'}

Searching sources by domain

The query below will return matches from the source inventory that are close to the domain term “usnews.com”:

params = {
    "domain_term": "usnews.com",
    "limit": 3
}

Response:

{'href': 'https://api.aylien.com/v2/autocomplete/suggestions/sources?domain_term=usnews.com&limit=3',
 'items': [{'domain': 'usnews.com',
            'href': 'https://api.aylien.com/v1/sources/8476',
            'id': 8476,
            'name': 'U.S. News & World Report Online',
            'url': 'https://www.usnews.com/'}],
 'next': 'https://api.aylien.com/v2/autocomplete/suggestions/sources?domain_term=usnews.com&limit=3&page=2',
 'prev': 'https://api.aylien.com/v2/autocomplete/suggestions/sources?domain_term=usnews.com&limit=3&page=1'}

Trends

Workflow examples

The query below will return the most frequently mentioned entities in articles from the last 14 days that mention Elon Musk:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{surface_forms:("Elon Musk") AND overall_prominence:>=0.8}}',
    "field": "entities.surface_forms.text"
}

Response:

{'field': 'entities.surface_forms.text',
 'published_at.end': '2023-09-25T05:00:00Z',
 'published_at.start': '2023-09-11T05:00:00Z',
 'trends': [{'count': 14458, 'value': 'Elon Musk'},
            {'count': 7186, 'value': 'Musk'},
            {'count': 4992, 'value': 'Tesla'},
            {'count': 4811, 'value': 'Twitter'},
            {'count': 2640, 'value': 'SpaceX'},
            {'count': 2345, 'value': 'Walter Isaacson'},
            {'count': 1698, 'value': 'Starlink'},
            {'count': 1614, 'value': 'US'},
            {'count': 1577, 'value': 'Isaacson'},
            {...}]}

The query below will return the number of articles for each sentiment polarity among articles from the last 14 days that mention Elon Musk:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "entities": '{{surface_forms:("Elon Musk") AND overall_prominence:>=0.8}}',
    "field": "sentiment.body.polarity"
}

Response:

{'field': 'sentiment.body.polarity',
 'published_at.end': '2023-09-25T05:00:00Z',
 'published_at.start': '2023-09-11T05:00:00Z',
 'trends': [{'count': 3434, 'value': 'positive'},
            {'count': 3355, 'value': 'negative'},
            {'count': 1982, 'value': 'neutral'}]}
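
Raw counts are often easier to compare as proportions. The sketch below converts the trends list from a response like the one above into a share per polarity; polarity_shares is a hypothetical helper name, and the input is a copy of the sample response shown above.

```python
def polarity_shares(trends_response):
    """Map each trend value to its share of the total count (0.0 to 1.0)."""
    counts = {t["value"]: t["count"] for t in trends_response["trends"]}
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when no articles matched.
        return {value: 0.0 for value in counts}
    return {value: count / total for value, count in counts.items()}


sample_response = {
    "field": "sentiment.body.polarity",
    "trends": [
        {"count": 3434, "value": "positive"},
        {"count": 3355, "value": "negative"},
        {"count": 1982, "value": "neutral"},
    ],
}

for polarity, share in polarity_shares(sample_response).items():
    print(f"{polarity}: {share:.1%}")
```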

Response objects

Workflow examples

Specifying return fields

The response to the query below will contain only the fields specified in the return[] parameter:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:ay.econ}}',
    "per_page": 100,
    "return[]": ["title", "body", "source", "entities"]
}
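
When these parameters are sent over HTTP, a list value such as return[] is typically serialized as one query parameter per element. The sketch below uses only the Python standard library to show the resulting query string; it makes no request, and how your own HTTP client encodes lists is something to verify rather than assume.

```python
from urllib.parse import urlencode

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "categories": '{{taxonomy:aylien AND id:ay.econ}}',
    "per_page": 100,
    "return[]": ["title", "body", "source", "entities"],
}

# doseq=True expands each list element into its own key=value pair
# instead of encoding the Python list's string representation.
query_string = urlencode(params, doseq=True)

for pair in query_string.split("&"):
    print(pair)
```

The square brackets in return[] are percent-encoded in the output, and the parameter appears once for each requested field.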

Sources

Workflow examples

Filtering articles by source

The query below will return only articles published by either BBC or US News:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "source.domain": '("bbc.com" OR "usnews.com")',
    "per_page": 100
}
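
When the list of domains comes from elsewhere, for example from the sources autocomplete results, the quoted OR group can be generated instead of written by hand. The sketch below is one way to do it; any_of_quoted is a hypothetical helper name, and it assumes the values themselves contain no double quotes.

```python
def any_of_quoted(values):
    """Build a parenthesized OR group of double-quoted values."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    return "(" + " OR ".join(f'"{value}"' for value in values) + ")"


params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "source.domain": any_of_quoted(["bbc.com", "usnews.com"]),
    "per_page": 100,
}
print(params["source.domain"])
```

The same helper produces the quoted format used in the author name example, e.g. any_of_quoted(["Anusuya Lahiri", "Dan Benton"]).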

Excluding articles by source

The query below will exclude articles published by the Washington Examiner from the results:

params = {
    "published_at": "[NOW-14DAYS/HOUR TO NOW/HOUR]",
    "language": "(en)",
    "source.domain": 'NOT ("washingtonexaminer.com")',
    "per_page": 100
}