Parameters by theme

Datetime

Here, you will find information on how the time stamp is composed and practical examples of searching for content on Quantexa News API based on news articles' time indexation.

Date Formatting

The format used is a restricted form of the canonical representation of DateTime in the XML Schema specification (ISO 8601):

YYYY-MM-DDThh:mm:ssZ

  • YYYY is the year.
  • MM is the month.
  • DD is the day of the month.
  • T is a literal 'T' character that indicates the beginning of the time string.
  • hh is the hour of the day as on a 24-hour clock.
  • mm is minutes.
  • ss is seconds.
  • Z is a literal 'Z' character, indicating that this string representation of the date is in UTC.

No time zone can be specified. String representations of dates are always expressed in Coordinated Universal Time (UTC). Here is an example value:

2022-03-27T13:47:26Z

You can optionally include fractional seconds, although any precision beyond milliseconds will be ignored. Here are examples of values with fractional seconds included:

  • 2016-03-27T13:47:26.822Z
  • 2016-03-27T13:47:26.82Z
  • 2016-03-27T13:47:26.8Z
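As a minimal sketch of producing values in this format, the helper below formats a Python datetime as a UTC timestamp string. The function name to_api_timestamp and the decision to treat naive datetimes as UTC are assumptions made for illustration, not part of the News API.

```python
from datetime import datetime, timezone


def to_api_timestamp(dt: datetime, with_millis: bool = False) -> str:
    """Format a datetime as YYYY-MM-DDThh:mm:ssZ, optionally with milliseconds."""
    if dt.tzinfo is None:
        # Assumption: a naive datetime is already expressed in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Convert aware datetimes to UTC, since no other time zone can be specified.
    dt = dt.astimezone(timezone.utc)
    text = dt.strftime("%Y-%m-%dT%H:%M:%S")
    if with_millis:
        # Precision beyond milliseconds is ignored, so truncate the microseconds.
        text += f".{dt.microsecond // 1000:03d}"
    return text + "Z"
```

For example, to_api_timestamp(datetime(2022, 3, 27, 13, 47, 26)) gives the example value shown above.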

Date Math

The date field types also support date math expressions, which makes it easy to create times relative to fixed moments in time, including the current time, which can be represented using the special value of "NOW".

Date Math Syntax

Date math expressions can do two things: specify a point in time by adding time units to, or subtracting them from, the current time, and round the time down to a specified unit. Expressions can be chained and are evaluated left to right.

This represents a point in time two months from now:

NOW+2MONTHS

This is one day ago:

NOW-1DAY

A slash is used to indicate rounding. Below is a point in time yesterday, rounded to the previous hour. If the current time is 15:42:17.2165, the point below is 15:00:00.0000 yesterday:

NOW-1DAY/HOUR

Below is the start of yesterday (00:00:00.0000):

NOW-1DAY/DAY

All date math expressions are evaluated to millisecond precision.

Date Math Keywords

Here are the supported keywords in Date Math:

Date keyword | Description
NOW | It represents the current date and time.
YEAR | It represents the year part of the date and time.
MONTH | It represents the month part of the date and time.
DAY | It represents the day part of the date and time.
HOUR | It represents the hour part of the date and time.
MINUTE | It represents the minute part of the date and time.
SECOND | It represents the second part of the date and time.

Note: All the keywords above can be passed in their plural form: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS.
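To make the left-to-right evaluation concrete, here is a small local evaluator for date math expressions. It is only a sketch of the semantics described in this section: the function names are hypothetical, and the handling of month ends (clamping the day when the target month is shorter) is an assumption, since the text does not say how the News API treats that case.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

UNITS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"]
# One step is either a signed amount plus a unit (+2MONTHS) or a rounding (/DAY).
_STEP = re.compile(r"([+-]\d+|/)([A-Z]+)")


def _unit(name: str) -> str:
    """Accept singular or plural keywords and return the singular form."""
    unit = name[:-1] if name.endswith("S") else name
    if unit not in UNITS:
        raise ValueError(f"unknown date math unit: {name}")
    return unit


def _shift(dt: datetime, amount: int, unit: str) -> datetime:
    if unit in ("YEAR", "MONTH"):
        months = amount * (12 if unit == "YEAR" else 1)
        year, month0 = divmod(dt.year * 12 + dt.month - 1 + months, 12)
        # Assumption: clamp the day if the target month is shorter.
        day = min(dt.day, calendar.monthrange(year, month0 + 1)[1])
        return dt.replace(year=year, month=month0 + 1, day=day)
    return dt + timedelta(**{unit.lower() + "s": amount})


def _round_down(dt: datetime, unit: str) -> datetime:
    """Reset every field smaller than the given unit."""
    smaller = [("month", 1), ("day", 1), ("hour", 0),
               ("minute", 0), ("second", 0), ("microsecond", 0)]
    return dt.replace(**dict(smaller[UNITS.index(unit):]))


def evaluate(expression: str, now: datetime = None) -> datetime:
    """Evaluate an expression such as NOW-1DAY/HOUR, left to right."""
    if not expression.startswith("NOW"):
        raise ValueError("this sketch only handles expressions starting with NOW")
    result = now or datetime.now(timezone.utc)
    rest, pos = expression[3:], 0
    for step in _STEP.finditer(rest):
        if step.start() != pos:
            break
        op, unit = step.group(1), _unit(step.group(2))
        result = _round_down(result, unit) if op == "/" else _shift(result, int(op), unit)
        pos = step.end()
    if pos != len(rest):
        raise ValueError(f"cannot parse date math expression: {expression}")
    return result
```

Passing a fixed value for now makes the result reproducible, which is convenient for checking the examples above: with a current time of 15:42:17, NOW-1DAY/HOUR evaluates to 15:00:00 on the previous day.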

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. The datetime parameter can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

This parameter enables you to find articles in Quantexa News API’s database based on the datetime stamp from when they were ingested.

Parameter | Values | Description
published_at | [* TO *] | This parameter is used for finding stories whose published-at time falls within the specified range. Here you can find more information about how to work with dates.
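The range value can be assembled from fixed timestamps or date math expressions. The snippet below is a sketch only: the helper name is hypothetical, and passing the result under a published_at key in a parameter dictionary is an assumption about how a client would send it.

```python
def published_at_range(start: str = "*", end: str = "*") -> str:
    """Build a range value in the [start TO end] form, where * leaves a side open."""
    return f"[{start} TO {end}]"


# Hypothetical query parameters: stories from the last seven days,
# counted from the start of the day seven days ago.
params = {"published_at": published_at_range("NOW-7DAYS/DAY", "NOW")}
```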

Common workflows

Time indexation

Sorting results

Languages

The News API offers content in 16 languages, and searching for stories in these specific languages is done by supplying the language of your choice to the language parameter.

Language is a mandatory field, so all articles available on Quantexa News API have the language field populated with the prediction made by the language model.

It is strongly recommended that you always supply a language parameter in your search, no matter how many languages you want to search across. If you do not supply the language parameter, your search will default to all languages available to your account plan but without the necessary language-specific filters like stop word removal and stemming. This could result in your search not retrieving all of the stories that are relevant to your query.

The supported languages are listed below, along with their language codes. Multilingual support requires an upgraded license key - contact sales to upgrade your account.

Translations

All articles written in non-English languages available on Quantexa News API will also have an English translation. The translation to English is in the response field translations. All news stories available through Quantexa News API are translated by a proprietary, in-house machine translation (MT) model. This model is optimised for speed whilst maintaining high-quality translations across diverse source languages. It is a transformer-based neural machine translation (NMT) model, trained on hundreds of millions of examples. Here is an example of part of a response from the stories endpoint that includes translations:

{
  "title": "original title",
  "body": "original body",
  "language": "<non-English language code>",

  "<...>": "<...>",

  "translations": {
    "en": {
      "body": "translated body",
      "title": "translated title"
    }
  }
}
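Given a story shaped like the response fragment above, a client might read the English text as follows. This is a minimal sketch: the function name is hypothetical, and falling back to the original fields when no translations.en entry exists is an assumption about how you may want to treat English-language stories.

```python
def english_text(story: dict) -> dict:
    """Return the English title and body of a story dictionary."""
    translation = story.get("translations", {}).get("en")
    # Assumption: stories without an English translation are already in English.
    source = translation if translation else story
    return {"title": source.get("title", ""), "body": source.get("body", "")}
```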

Analyzing multilingual content

All news articles accessible via Quantexa News API are available in their native language text as well as machine-translated English text.

All the translated content benefits from all of the enrichment features supported by the News API. Most NLP enrichments are performed on native English or translated English text.

In the case of article summarisation, this is conducted in the original text of some languages and in the translated text of others. See below:

Language | Text Used
en, de, fr, it, es, pt | Original text
ar, da, fi, nl, fa, ru, sv, tr, zh-cn, zh-tw | Translated text (en)
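The table can be expressed as a simple lookup, which is handy when you need to know which text a summary was derived from. The names below are hypothetical; only the language codes come from the table.

```python
ORIGINAL_TEXT = frozenset({"en", "de", "fr", "it", "es", "pt"})
TRANSLATED_TEXT = frozenset({"ar", "da", "fi", "nl", "fa", "ru", "sv", "tr", "zh-cn", "zh-tw"})


def summarisation_text(language: str) -> str:
    """Return 'original' or 'translated' for a supported language code."""
    code = language.lower()
    if code in ORIGINAL_TEXT:
        return "original"
    if code in TRANSLATED_TEXT:
        return "translated"
    raise ValueError(f"language not listed in the summarisation table: {language}")
```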

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. The language parameter can be used with the following Quantexa News API endpoints:

/clusters

/stories

/time_series

/trends

/histograms

Parameters

This parameter enables you to select which languages you wish to be included in search results. For more information on supported languages, see Common Workflows.

Parameter | Values | Description
language | ("string", "string", ... "string") | This parameter is used for finding stories whose language is one of the specified values. It supports ISO 639-1 language codes.

Common workflows

Languages

Keywords

These parameters enable you to perform keyword searches in Quantexa News API for articles where specific words or phrases match and retrieve stories that contain those keywords.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Keyword parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter | Values | Description
title | ("string", "string", ... "string") | This parameter is used for finding stories whose title contains a specific keyword. It supports advanced search operators.
body | ("string", "string", ... "string") | This parameter is used for finding stories whose body contains a specific keyword. It supports advanced search operators.
text | ("string", "string", ... "string") | This parameter is used for finding stories in which both the title and the body contain a specific keyword. It supports advanced search operators.
translations.{language}.title | ("string", "string", ... "string") | This parameter is used for filtering stories translated from a specified language that contain a query term in the title. To specify a language, use the ISO 639-1 standard. Translation requires an Advanced or Enterprise license key. Start a free trial or contact sales to upgrade your account.
translations.{language}.body | ("string", "string", ... "string") | This parameter is used for filtering stories translated from a specified language that contain a query term in the body. To specify a language, use the ISO 639-1 standard. Translation requires an Advanced or Enterprise license key. Start a free trial or contact sales to upgrade your account.

Common workflows

Proximity search

Boolean operators

Excluding operator

Entities

What is an entity?

An entity is a real-world thing that is mentioned in a story and then tagged with metadata by Quantexa News API so users can build an accurate picture of what is being talked about in news content. The following data points are applied to each entity:

  • The surface form(s) is the text in the story that mentions the entity.

  • The type of entity it is.

  • A Wikipedia link to that entity's Wikipedia page - if applicable.

  • A Wikidata link to that entity's Wikidata page - if applicable.

  • The sentiment expressed towards it in the story.

  • The indices of the surface forms - the index of the mention(s) of the entity in the story.

  • The prominence of the entity - how prominent the entity is in an article.

  • The frequency of the entity - the number of mentions the entity has in an article.

Why use entities instead of keywords?

Keywords can refer to multiple things, and things can be referred to by multiple keywords. Quantexa News API's Entities feature recognises and disambiguates real-world people, companies, and things that are mentioned in the news, going beyond keywords to provide far more accurate news analytics data.

Using Entities has two high-level benefits when building your search:

First, when multiple different keywords commonly refer to a single entity, Quantexa News API correctly recognises the entity in each mention. For example, take a look at how the News API recognises the entity “MetLife,” even when different names for the company are mentioned:

Data point | Sample 1: "Shares in Metropolitan Life Insurance fell sharply this morning." | Sample 2: "MetLife announces new insurance offerings."
Surface Form | "Metropolitan Life Insurance" | "MetLife"
Entity Name | MetLife | MetLife
Entity Type | Business, Organization | Business, Organization
Wikipedia URL | MetLife | MetLife

Second, the entities model disambiguates mentions for you: when a single keyword can refer to multiple entities, the News API will consider the rest of the document to make an accurate prediction about which thing is being referred to. As an example, take the following two sentences mentioning the keyword “square” and see how the News API will recognise each as a different entity and how it returns some key information:

Data point | Sample 1: "Protests commenced in the town square." | Sample 2: "Square was founded by Jack Dorsey."
Surface Form | "square" | "Square"
Entity Name | square | Square Inc.
Entity Type | Business, Organization | Organization
Wikipedia URL | None | Square Inc.
Wikidata URL | None | Square Inc.

Entity types

Just as Square and MetLife above have "Business" and "Organization" as their entity types, every entity recognized by Quantexa News API has a type.

These types can be extracted from Wikipedia & Wikidata, or, where that is not applicable, they can be predicted on the fly by the News API. For example, the New York Stock Exchange entity has the type "Stock_exchange," extracted from Wikipedia & Wikidata, but the entity "Jeremy Draper" has the type "Human", even though it doesn't have a Wikipedia page.

There are two ways to use entity types to build intelligent searches - simple searches and enhanced searches.

Entity types are structured in a parent/child relationship, with almost all top-level entity types having child entity types.

Top-level entity types

Currency Location Human Organization
Product_(business) Profession Technology Risk
Retail Regulation_(European_Union)

Child entity types of organisation type:

Advocacy_group Bank Bank_holding_company Brick_and_mortar
Business Certificate_authority Civil_service Commercial_bank
Community Company Conglomerate_(company) Conservation_authority_(Ontario,_Canada)
Consumer_organization Corporate_group Corporation Credit_bureau
Deliberative_assembly Educational_organization Emergency_service Environmental_organization
Financial_institution Government Holding_company Investment_banking
Investment_company Law_commission Law_enforcement_organization Local_federation
Local_government National_research_and_education_network Newspaper Nonprofit_organization
Parlement Political_organisation Private-equity_firm Privately_held_company
Public_company Ruling_party Social_movement_organization Standards_organization
Stock_exchange Subsidiary Technology_company Think_tank

Child entity types of geographic location type:

City Country Location Island_country
Sovereign_state State_(polity) U.S._state

Child entity types of risk type:

Business risks Endangerment External risk Financial risk
Operational_risk Vulnerability

Child entity types of product type:

Software Software_as_a_service Stock_market_index

Quantexa News API allows enhanced queries on the entity object, enabling you to specify multiple conditions for a single entity to meet. For example, you can specify stories that mention an entity "Square" when that entity is also of the type "Organization", or you can return stories where a specific entity was mentioned in a negative tone.

This is done by supplying a nested query to the aql parameter, which accepts entities:{{ }} as a value, in which you can supply a list of parameters in a Lucene-based syntax for an entity to meet. You can use the parameters below.

AQL Parameter | Description
element | The part of the story the entity should be mentioned in (accepts "title" or "body").
surface_forms | A specific form of the entity ("Apple" OR "Apple Computer").
id | The entity's ID on the Knowledge Base.
links.wikipedia | The entity's Wikipedia link.
links.wikidata | The entity's Wikidata link.
sentiment | The sentiment expressed about the entity.
stock_ticker | The entity's stock ticker.
overall_prominence | The prominence of the entity within an article. A value ranging from 0 to 1, where 0 indicates no article prominence and 1 indicates very high article prominence.
frequency | The number of times an entity is mentioned in an article title or body. It should be used in conjunction with the element parameter.
overall_frequency | The number of times an entity is mentioned in an article.
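As an illustration of how a nested entities query might be assembled, the sketch below joins several conditions inside the entities:{{ }} wrapper. The helper names are hypothetical, and combining the conditions with AND is an assumption based on the Lucene-based syntax mentioned above; the field names are taken from the table.

```python
def any_of(*terms: str) -> str:
    """Render terms as a quoted OR group, e.g. ("Apple" OR "Apple Computer")."""
    # Note: this sketch does not escape quotes inside the terms.
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def entities_query(conditions: dict) -> str:
    """Wrap field:value conditions in a single nested entities query."""
    clauses = [f"{field}:{value}" for field, value in conditions.items()]
    # Assumption: conditions that must all hold for one entity are joined with AND.
    return "entities:{{" + " AND ".join(clauses) + "}}"


aql = entities_query({
    "surface_forms": any_of("Apple", "Apple Computer"),
    "element": "title",
    "overall_prominence": "[0.6 TO *]",
})
```

The resulting string would then be supplied as the value of the aql parameter.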

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Entities parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to search for articles that contain entities recognized by Quantexa News API’s entity recognition model, which consists of over 5.6 million entities. For more information on entities, see the entry in Common Workflows.

Parameter | Subfield | Values | Description
entities | id | {{id:("string", "string", ... "string")}} | This parameter is used to find stories that mention entities with the specified IDs. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities | links.wikipedia | {{links.wikipedia:"url string"}} | This parameter is used to find stories that mention the entity with the specified Wikipedia URL. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities | stock_ticker | {{stock_ticker:"string"}} | This parameter is used to find stories that mention entities with the specified stock ticker. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities | surface_forms.text | {{surface_forms.text:("string")}} | This parameter is used to find stories that mention entities with the specified surface form text. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities | overall_prominence | {{overall_prominence:[* TO *]}} | This parameter is used to find stories based on how prominent the entity is in the article.
entities | element | {{element:(title OR body)}} | This parameter is used to indicate whether the entity search is performed on the title or the body of the article.

Common workflows

Entities

Locations

Categories taxonomy

All articles on Quantexa News API are enriched with categories and industry taxonomy tags. The tagged content helps to categorize and organize large volumes of content, making it easier to search, filter, and understand the information within the documents.

Altogether, the classifiers are capable of classifying content into four taxonomies. The complete list of taxonomies included with Quantexa News API is outlined in the table below.

Taxonomy | Supported Languages | Number of classes | Levels of depth | Commonly used for | Taxonomy ID
Smart Tagger (Aylien categories) | en | 2998 | 6 | News articles, Blog posts | aylien
Smart Tagger (Industries) | en | 1496 | 4 | News articles, Blog posts | industries
IPTC Subject Codes | en | 1400 | 3 | News articles, Blog posts | iptc-subjectcode
IAB QAG | en | 392 | 2 | Websites, Advertisement | iab-qag

The supported taxonomies are made up of categories and subcategories, or parent categories and child categories, for example, with Football being a child category of Sport.

Each taxonomy is standardised into a tree-like structure, which allows you to easily traverse from child categories to parent categories recursively.

Taxonomy lifecycle

Taxonomy tags for categories and industries are never removed from our catalogue and taxonomy models.

Over time, new tags are added and appended to the current catalogue.

Smart Tagger

The Smart Tagger is Quantexa News API’s most granular article tagging feature that leverages state-of-the-art classification models built using a vast collection of manually tagged news articles based on domain-specific industry and topical category taxonomies. With a taxonomy of ~3000 topical categories and ~1500 industries, Smart Tagger classifies articles with high precision, making it easier for users to filter for articles most relevant to them and their use cases. Tags can be passed as IDs or labels.

Smart Tagger Categories

Smart Tagger’s Categories taxonomy is a classification system that helps categorize news articles into broader thematic areas. It allows users to classify news stories into high-level topics such as business, politics, technology, as well as multiple sub-categories in the parent/child hierarchy. This taxonomy serves as a useful tool for organizing and filtering news articles based on their granular subject matter, enabling easier access to relevant information within specific content categories.

Examples of Smart Tagger Categories in articles:

{'categories': [{...,
                 'score': 0.39},
                {'id': 'ay.sports',
                 'label': 'Sports',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports'},
                 'score': 1},
                {'id': 'ay.sports.rugby',
                 'label': 'Rugby',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.team'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.rugby'},
                 'score': 1},
                {'id': 'ay.sports.team',
                 'label': 'Team Sports',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.team'},
                 'score': 1}]}

Smart Tagger Industries

Smart Tagger’s Industries taxonomy is a classification system designed to categorize news articles and content into various industry-related topics. This taxonomy aids in organizing and retrieving news content relevant to particular sectors, making it a valuable tool for those who are looking to access and analyze news data within specific industry contexts.

Examples of Smart Tagger Industries in articles:

{'industries': [{'id': 'in.tech',
                 'label': 'Technology',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech'},
                 'score': 0.7},
                {'id': 'in.tech.appsoft',
                 'label': 'Application Software',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.software'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.appsoft'},
                 'score': 0.7},
                {'id': 'in.tech.software',
                 'label': 'Software',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.software'},
                 'score': 0.7}]}

Score

The score is the level of confidence for the taxonomy prediction. Each time an article is linked to a taxonomy tag, a score between 0 and 1 is applied to that tagging, indicating how relevant the industry or category is to the article. The higher the score, the more relevant the tag is to the article.
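Because every tag carries a score, a client can keep only the tags it considers confident enough. The helper below is a minimal sketch with a hypothetical name; the threshold is something you would choose for your own use case.

```python
def tags_at_least(tags: list, min_score: float) -> list:
    """Keep the taxonomy tags whose score is at or above min_score."""
    # Tags without a score are treated as 0 and are therefore dropped.
    return [tag for tag in tags if tag.get("score", 0) >= min_score]
```

For instance, applying a threshold of 0.3 to tags scored 0.21 and 0.36 keeps only the second one.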

Taxonomy hierarchy

Documents tagged with a child node are tagged with all parent nodes too.

A search for a top-level node, for example the label "Adverse Events", returns articles tagged with this label as well as articles tagged with any of its child nodes.
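The parent links in each tag make it possible to walk up the hierarchy on the client side. The sketch below uses a trimmed copy of the Smart Tagger Categories example shown earlier. It assumes that the last path segment of a parent URL is the parent's ID, which matches the examples in this section, and it follows only the first parent of each tag; the function name is hypothetical.

```python
BASE = "https://api.aylien.com/api/v1/classify/taxonomy/aylien/"

# Trimmed sample data based on the Smart Tagger Categories example.
SAMPLE_TAGS = [
    {"id": "ay.sports", "links": {"self": BASE + "ay.sports"}},
    {"id": "ay.sports.rugby",
     "links": {"parents": [BASE + "ay.sports.team"], "self": BASE + "ay.sports.rugby"}},
    {"id": "ay.sports.team",
     "links": {"parents": [BASE + "ay.sports"], "self": BASE + "ay.sports.team"}},
]


def ancestors(tags: list, tag_id: str) -> list:
    """Return the IDs of a tag's ancestors, nearest parent first."""
    by_id = {tag["id"]: tag for tag in tags}
    chain, seen = [], {tag_id}
    current = by_id.get(tag_id)
    while current:
        parents = current.get("links", {}).get("parents", [])
        if not parents:
            break
        # Assumption: the parent ID is the last path segment of the parent URL.
        parent_id = parents[0].rstrip("/").rsplit("/", 1)[-1]
        if parent_id in seen:
            break  # guard against cycles in malformed data
        chain.append(parent_id)
        seen.add(parent_id)
        current = by_id.get(parent_id)
    return chain
```

With the sample data, the ancestors of ay.sports.rugby are ay.sports.team followed by ay.sports.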

Smart Tagger Recommender

Quantexa News API provides a UI tool that helps users find categories and industry tags based on input terms. If you have a term in mind, let's say "risk", but you are not sure which taxonomy IDs or labels are associated with this term, you can use the recommender available on app.aylien.com.

Please see below the main steps to find tags with the recommender:

1 - Go to app.aylien.com and enter your credentials. If you have questions about your login or password, please contact the support team.

2 - Enter the term(s) you are looking for.

3 - Select the items that match what you are looking for.

4 - Run your search and consult the developer section.

IPTC Taxonomy

The IPTC (International Press Telecommunications Council) taxonomy is a standardized classification system used in the media industry to tag and categorize news content. It consists of predefined categories, such as subjects, events, and genres, that help news organizations and content providers uniformly label their articles and images with relevant metadata.

Examples of IPTC taxonomy in articles:

{'categories': [{...},
                {'id': '01000000',
                 'label': 'arts, culture and entertainment',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01000000'},
                 'score': 0.58},
                {'id': '01005000',
                 'label': 'cinema',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01000000'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005000'},
                 'score': 0.58},
                {'id': '01005001',
                 'label': 'film festival',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005000'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005001'},
                 'score': 0.58},
                {'id': 'ay.culture',
                 'label': 'Culture, Entertainment and the Arts',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture'},
                 'score': 1},
                {'id': 'ay.culture.film',
                 'label': 'Film',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture.film'},
                 'score': 1}]}

IAB Taxonomy

The IAB (Interactive Advertising Bureau) taxonomy is a standardized system used in the digital advertising industry to categorize online content, advertising campaigns, and audience segments. It provides a structured framework for classifying digital advertising assets and defining audience interests and demographics.

Examples of IAB taxonomy in articles:

{'categories': [{'id': 'IAB1',
                 'label': 'Arts & Entertainment',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1'},
                 'score': 0.21},
                {'id': 'IAB1-5',
                 'label': 'Movies',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1-5'},
                 'score': 0.36},
                {...}]}

These parameters enable you to find articles categorized by Quantexa News API’s category taxonomies, including our proprietary Smart Tagger, which includes over 3,000 topical categories and 1,500 industries. We also support IPTC and IAB category taxonomies. For more information on how to use our category taxonomies, refer to Common Workflows.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Category parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/clusters

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to find articles categorized by Quantexa News API’s category taxonomies. For more information on how to use our category taxonomies, refer to Common Workflows.

Parameter | Subfield | Values | Description
categories | taxonomy | {{taxonomy:(aylien OR iptc-subjectcode OR iab-qag)}} | This parameter is used for defining the type of taxonomy for the rest of the categories query. It is available for our standard IPTC and IAB category taxonomies. Please click here to learn more about our Smart Tagger taxonomies, and about using the right taxonomy for you.
industries | | | This parameter is not available in flat search. However, you can filter for articles containing specific industries using AQL. You can read more about the industries filter here, and find industry IDs and labels here.
categories | id | {{id:("string", "string", ... "string")}} | This parameter is used for finding stories by category ID. It is available for our standard IPTC and IAB category taxonomies. Please click here to learn more about our Smart Tagger taxonomies, and about using the right taxonomy for you.
categories | label | {{label:("string", "string", ... "string")}} | This parameter is used for finding stories by category label, as an alternative to the ID. Please click here to learn more about our Smart Tagger taxonomies, and about using the right taxonomy for you.
categories | score | {{score:[* TO *]}} | This parameter is used for finding stories by the confidence score of the model prediction.

Common workflows

Category and Industry taxonomies

Sources

Quantexa News API aggregates and enriches news content from approximately 90,000 global sources.

Every article comes with its source metadata, which enables you to find articles from specific sources or filter out noise and irrelevant sources.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Source parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to filter articles by source, source location, and source ranking.

Parameter | Values | Description
source.id | (int, int, int, ...) | Filters stories to those from publisher sources whose ID is one of the specified values.
source.name | ("string", "string", "string", ...) | Filters stories to those from publisher sources whose name is one of the specified values.
source.domain | ("string", "string", "string", ...) | Filters stories to those from publisher sources whose website domain is one of the specified values.
source.locations.country | ("string", "string", "string", ...) | Filters stories to those from publisher sources located or headquartered in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.locations.state | ("string", "string", "string", ...) | Filters stories to those from publisher sources located or headquartered in the specified states/provinces.
source.locations.city | ("string", "string", "string", ...) | Filters stories to those from publisher sources located or headquartered in the specified cities.
source.scopes.country | ("string", "string", "string", ...) | Filters stories to those from publisher sources whose scope is in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.scopes.state | ("string", "string", "string", ...) | Filters stories to those from publisher sources whose scope is in the specified states/provinces.
source.scopes.city | ("string", "string", "string", ...) | Filters stories to those from publisher sources whose scope is in the specified cities.
source.scopes.level | ("international", "national", "local") | Filters stories to those from publisher sources whose scope is at one of the specified levels. Available values: international, national, local.
source.rankings.alexa.country | ("string", "string", "string", ...) | Filters stories to those from publisher sources that have an Alexa rank in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.rankings.alexa.rank | [0 TO *] | Filters stories to those from publisher sources whose Alexa rank is in the specified range.

Common workflows

Locations

Website traffic rank

Authors

Stories can be found in Quantexa News API based on the author. You can search for the author using either the author ID or the author’s name.

Note: The author metadata is not always populated. This is because either there's no associated author mentioned, or it’s not explicit on the page.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Author parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to filter by an author’s name or ID.

Parameter | Values | Description
author.id | (int, int, ... int) | Filters content created by a specific author, or a set of authors, using the authors' unique IDs.
author.name | ("string", "string", ... "string") | Filters content created by a specific author, or a set of authors, using the authors' names.

Common workflows

Authors

Clusters

Quantexa News API provides access to millions of news stories from over 90,000 sources across the world. Clustering groups these stories into clusters based on the real-world events they represent.

A cluster is a collection of news stories that all refer to the same real-world event. For example, multiple stories referring to a specific company’s earnings will appear in the same cluster, just as multiple stories about a single road accident will.

Lifecycle of a cluster

A newly published story is compared with representative stories of other clusters to see if a cluster already exists for this event. This is achieved by converting the new story’s body to a vector, which is compared against the other story vectors. If the story is found to be similar to other stories (i.e. having a small vector distance), it will be added to that cluster. If the new story is not similar to any existing stories (i.e. having a large vector distance), a new cluster is created containing that single story. Thus, a new cluster is born and subsequently grows with every new, related story that is published. This process happens in real-time, i.e. within an average of 15 minutes of Quantexa News API receiving the story from the publisher.

If no stories have been added to a cluster within a maximum two-week (14 days) period, the cluster is “frozen”. No more stories can be added to the cluster after that point. This cluster lifecycle (creation, growth, decline and freeze) repeats for each and every cluster.
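The lifecycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the service's actual implementation: the use of cosine distance, the MAX_DISTANCE threshold, and the Cluster class and assign function are all hypothetical. Only the 14-day freeze window comes from the text.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta

FREEZE_AFTER = timedelta(days=14)  # from the text: frozen after 14 days without new stories
MAX_DISTANCE = 0.3                 # hypothetical threshold for "small vector distance"


@dataclass
class Cluster:
    representative: list          # vector of the cluster's representative story
    last_added: datetime          # when the most recent story joined
    story_ids: list = field(default_factory=list)


def cosine_distance(a, b):
    """Distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def assign(story_id, vector, published_at, clusters):
    """Add the story to the closest open cluster, or create a new one."""
    # Frozen clusters no longer accept stories.
    open_clusters = [c for c in clusters
                     if published_at - c.last_added <= FREEZE_AFTER]
    best = min(open_clusters,
               key=lambda c: cosine_distance(vector, c.representative),
               default=None)
    if best is not None and cosine_distance(vector, best.representative) <= MAX_DISTANCE:
        best.story_ids.append(story_id)
        best.last_added = published_at
        return best
    new_cluster = Cluster(representative=vector, last_added=published_at,
                          story_ids=[story_id])
    clusters.append(new_cluster)
    return new_cluster
```

Because a story is only ever appended to one cluster and never moved, the sketch also reflects the rule that a story belongs to a single cluster for its whole lifetime.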

Can an article be in more than one cluster?

No. A story only belongs to the most appropriate, single cluster, and this remains the case through all stages of the cluster lifecycle. A story cannot move from one cluster to another.

Are all stories added to clusters?

No. Stories whose body is shorter than 400 characters are not clusterable. The cluster field on these stories is empty.

The Cluster Object

A cluster object is a type of JSON object that provides a cluster’s ID along with metadata about the stories associated with it.

A cluster has the following properties:

  • Each cluster has a unique ID in the News API

  • A cluster can have one or more stories associated with it

  • A story will always belong to just one cluster.

  • The relationship between the story and cluster does not change - it will not be reassigned to another cluster at a later time.

Example of a cluster in the JSON response from the clusters endpoint:

{
  "cluster_count": 2042945,
  "clusters": [
    {
      "id": 4992716,
      "time": "2019-07-20T07:16:03Z",
      "story_count": 26488,
      "earliest_story": "2019-07-20T07:16:03Z",
      "latest_story": "2019-08-03T07:41:09Z",
      "representative_story": {
        "id": 21466483,
        "title": "Analysts Offer Predictions for A. O. Smith Corp’s Q3 2019 Earnings (NYSE:AOS)",
        "permalink": "https://www.tickerreport.com/banking-finance/4497288/analysts-offer-predictions-for-a-o-smith-corps-q3-2019-earnings-nyseaos.html",
        "published_at": "2019-08-03T07:15:26Z"
      },
      "location": {
        "country": "US"
      }
    }
  ],
  "next_page_cursor": "<string to use in pagination of results>"
}
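As a rough illustration of working with a response shaped like the one above, the following Python sketch computes how long each cluster stayed active, i.e. the time between its earliest and latest story. The helper names parse_utc and cluster_spans are hypothetical, and the payload is a trimmed copy of the example.

```python
import json
from datetime import datetime, timedelta

# Trimmed copy of the example response; only the fields used below are kept.
response_text = """
{
  "clusters": [
    {
      "id": 4992716,
      "story_count": 26488,
      "earliest_story": "2019-07-20T07:16:03Z",
      "latest_story": "2019-08-03T07:41:09Z"
    }
  ]
}
"""


def parse_utc(stamp):
    """Parse a YYYY-MM-DDThh:mm:ssZ timestamp (always UTC) into a naive datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def cluster_spans(payload):
    """Map each cluster ID to the time between its earliest and latest story."""
    return {
        cluster["id"]: parse_utc(cluster["latest_story"]) - parse_utc(cluster["earliest_story"])
        for cluster in payload["clusters"]
    }


spans = cluster_spans(json.loads(response_text))
print(spans[4992716])  # 14 days, 0:25:06
```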

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Clustering parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/clusters

Parameters

These parameters help you find event clusters based on criteria such as location, story volume, and date range. For more information on how to use clusters, please refer to Common Workflows.

Parameters Values Description
id (INT64 OR INT64 OR … OR INT64) This parameter allows you to retrieve specific clusters by their id which is specified with a list of int64 values.
location.country (“string”, “string”, …, “string”) This parameter allows you to specify clusters that refer to events in specific countries. It supports ISO 3166-1 alpha-2 country codes.
story_count.max “INT64” This parameter allows you to specify clusters that have a maximum number of stories associated with them.
story_count.min “INT64” This parameter allows you to specify the minimum number of stories that retrieved clusters should be associated with. Default value is 2.
time.end "YYYY-MM-DDThh:mm:ssZ" This parameter allows you to retrieve clusters for which the associated event’s time is before a specified time stamp.
time.start "YYYY-MM-DDThh:mm:ssZ" This parameter allows you to retrieve clusters for which the associated event’s time is after a specified time stamp.
earliest_story.end "YYYY-MM-DDThh:mm:ssZ" This parameter allows you to specify clusters whose earliest story was published before a specified time stamp.
earliest_story.start "YYYY-MM-DDThh:mm:ssZ" This parameter allows you to specify clusters whose earliest story was published after a specified time stamp.
latest_story.end "YYYY-MM-DDThh:mm:ssZ" This parameter allows you to specify clusters whose latest story was published before a specified time stamp.
latest_story.start "YYYY-MM-DDThh:mm:ssZ" This parameter allows you to specify clusters whose latest story was published after a specified time stamp.
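A minimal sketch of how several of these parameters might be combined into a query string is shown below. The parameter names come from the table above. Passing multiple countries as repeated keys is an assumption about the wire format, and build_cluster_query is a hypothetical helper, so check the endpoint reference before relying on it.

```python
from urllib.parse import parse_qs, urlencode


def build_cluster_query(countries, min_stories, start, end):
    """Build a query string for /clusters (encoding of lists is an assumption)."""
    params = [("location.country", country) for country in countries]
    params += [
        ("story_count.min", min_stories),
        ("time.start", start),
        ("time.end", end),
    ]
    return urlencode(params)


query = build_cluster_query(
    ["US", "GB"], 10, "2019-07-20T00:00:00Z", "2019-08-03T00:00:00Z"
)
# Round-trip through parse_qs to inspect what was encoded.
print(parse_qs(query))
```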

Common workflows

Clusters

Stories metadata

These parameters enable you to filter stories by their metadata, e.g. to retrieve a single news article by its ID or URL, or to retrieve the stories associated with a specific cluster.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Stories metadata parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to filter by the source that published the article.

Parameters Values Description
id (INT64 OR INT64 OR … OR INT64) This parameter allows you to retrieve specific stories by their id, which is specified with a list of int64 values.
links.permalink “URL string” This parameter is used for finding stories with a specified url.
clusters (INT64 OR INT64 OR … OR INT64) This parameter is used for filtering stories associated with a specific cluster (currently accepts one cluster per search). Clustering requires an Advanced or Enterprise license key. Start a free trial or contact sales to upgrade your account.
source.links_in_count.min INT64 This parameter is used for finding stories from sources whose Links in count is greater than or equal to the specified value. You can read more about working with Links in count here.
source.links_in_count.max INT64 This parameter is used for finding stories from sources whose Links in count is less than or equal to the specified value. You can read more about working with Links in count here.

Common workflows

Website traffic rank

Sentiment analysis

There are two types of sentiment analysis in Quantexa News API: document-level and entity-level.

All stories in Quantexa News API contain sentiment predictions for:

  • Text in the title and body of the article. This is document-level sentiment analysis.

  • Text in each sentence where entities have been recognised. This is entity-level sentiment analysis.

Fields which apply to both document-level and entity-level sentiment:

Polarity

Polarity is the sentiment category predicted by the model: positive, negative or neutral.

It can be found in the field "polarity" within the document-level sentiment analysis, or for each mention of the entity within the article under the object "entities".

Score

A sentiment score is the confidence of the prediction made by the sentiment model.

Predictions are made in each article element for positive, negative and neutral sentiments. The prediction with the highest confidence score wins and is populated in the field "score". It ranges from 0 to 1.

Document-level sentiment analysis

Document-level sentiment analysis is the prediction of the sentiment expressed in the title and body of the story. It can be found in the object "sentiment". Example:

{
 <...>
 "sentiment": {"body": {"polarity": "negative", "score": 0.96},
               "title": {"polarity": "positive", "score": 0.45}},
 <...>
}
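As a small illustration of reading these fields, the Python sketch below returns the polarity of the title or body only when the model's confidence is high enough. The min_score cut-off and the confident_polarity helper are hypothetical and not part of the API.

```python
# The "sentiment" object from the example above, as a Python dict.
story = {
    "sentiment": {
        "body": {"polarity": "negative", "score": 0.96},
        "title": {"polarity": "positive", "score": 0.45},
    }
}


def confident_polarity(story, part, min_score=0.5):
    """Return the polarity of 'title' or 'body', or None below the cut-off."""
    sentiment = story["sentiment"][part]
    return sentiment["polarity"] if sentiment["score"] >= min_score else None


print(confident_polarity(story, "body"))   # negative
print(confident_polarity(story, "title"))  # None
```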

Entity-level sentiment analysis

Sentiment analysis of the entities recognised in the text is broken down by body and title, and also by each mention of the entity. Each object contains the sentiment polarity, score (the "confidence" field), frequency and mention index. Take a look at the following example:

{
  <...>
  "entities": {"body": {"sentiment": {"confidence": 0.75, "polarity": "neutral"},
                        "surface_forms": [{"frequency": 5,
                                           "mentions": [{"index": {"end": 5,   "start": 0},   
                                                         "sentiment": {"confidence": 0.77, 
                                                         "polarity": "positive"}},
                                                        {"index": {"end": 365, "start": 360}, 
                                                         "sentiment": {"confidence": 0.91, 
                                                         "polarity": "neutral"}},
                                                        {"index": {"end": 662, "start": 657}, 
                                                         "sentiment": {"confidence": 0.83, 
                                                         "polarity": "positive"}},
                                                        {"index": {"end": 795, "start": 790}, 
                                                         "sentiment": {"confidence": 0.51, 
                                                         "polarity": "neutral"}},
                                                        {"index": {"end": 832, "start": 827}, 
                                                         "sentiment": {"confidence": 0.83, 
                                                         "polarity": "neutral"}}],
                                           "text": "Apple"}]},
               "id": "Q312",
               "links": {"wikidata": "https://www.wikidata.org/wiki/Q312",
                         "wikipedia": "https://en.wikipedia.org/wiki/Apple_Inc."},
               "overall_frequency": 6,
               "overall_prominence": 0.98,
               "overall_sentiment": {"confidence": 0.63, "polarity": "neutral"},
               "stock_tickers": ["AAPL"],
               "title": {"sentiment": {"confidence": 0.51, "polarity": "neutral"},
                         "surface_forms": [{"frequency": 1,
                                            "mentions": [{"index": {"end": 5, "start": 0}, 
                                                          "sentiment": {"confidence": 0.51, 
                                                          "polarity": "neutral"}}],
                                            "text": "Apple"}]},
               "types": ["Business", "Organization"] }
 <...>
}

Overall sentiment

The overall sentiment is the predominant sentiment towards the entity in the article. It can be found in the field "overall_sentiment" inside the object "entities".

The overall sentiment is calculated as follows. First, the model performs an individual sentiment prediction for each mention of the entity in the article.

Once the individual predictions are made, the average confidence score is calculated for each polarity, giving one overall score per polarity for the entity. The polarity with the highest score is selected as the overall sentiment for that entity in the article. If the scores are tied, the model favours positive over negative and negative over neutral.

This is done separately for the mentions of the entity in the title and in the body, so the model calculates an entity sentiment for both title and body. The overall entity sentiment is then obtained by again selecting the polarity with the higher score.
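The per-polarity averaging and tie-break can be sketched as follows. This is a reading of the description only, under the assumption that mentions arrive as (polarity, confidence) pairs. The overall_sentiment function is hypothetical, it does not model how title and body results are combined, and it is not guaranteed to reproduce the values returned by the API.

```python
from collections import defaultdict

# Tie-break order from the text: positive over negative, negative over neutral.
TIE_BREAK = ["positive", "negative", "neutral"]


def overall_sentiment(mentions):
    """Average confidence per polarity, then pick the highest average."""
    if not mentions:
        raise ValueError("at least one mention is required")
    by_polarity = defaultdict(list)
    for polarity, confidence in mentions:
        by_polarity[polarity].append(confidence)
    averages = {p: sum(v) / len(v) for p, v in by_polarity.items()}
    # Higher average wins; on a tie, the polarity earlier in TIE_BREAK wins.
    best = max(averages, key=lambda p: (averages[p], -TIE_BREAK.index(p)))
    return best, averages[best]


print(overall_sentiment([("positive", 0.8), ("neutral", 0.9), ("neutral", 0.5)]))
print(overall_sentiment([("negative", 0.6), ("neutral", 0.6)]))
```

In the first call the single positive mention averages 0.8 against 0.7 for neutral, so positive is selected. In the second call the averages are tied and the tie-break picks negative.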

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Parameters for sentiment analysis can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to filter for positive, negative, or neutral sentiment within an article’s title or body.

N.B. For entity-level sentiment analysis (ELSA), which we recommend using over document-level sentiment analysis, go to Entities in Common Workflows.

Parameters Values Description
sentiment.title.polarity (positive OR neutral OR negative) This parameter is used for finding stories whose title sentiment is the specified value.
sentiment.body.polarity (positive OR neutral OR negative) This parameter is used for finding stories whose body sentiment is the specified value.

Common workflows

Sentiment analysis

Timeseries

This parameter enables you to query for story volume over a date range.

For more information on Timeseries, please read the API Endpoints section of our documentation.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. This parameter can be used with the following Quantexa News API endpoints:

/time_series

Parameters

Parameters Values Description
period +{int}{time unit} The size of each date range, expressed as an interval to be added to the lower bound. It supports Date Math Syntax. Valid values are a + followed by an integer greater than 0 and one of the Date Math keywords, e.g. +1DAY, +2MINUTES and +1MONTH. See Supported keywords for the full list.
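A client might check a period value before sending it. The sketch below is one way to do that in Python. The unit list is an assumption limited to keywords that appear in this document's examples (the full set of supported keywords may be larger), and parse_period is a hypothetical helper.

```python
import re

# Assumed subset of Date Math keywords, taken from examples in this document.
UNITS = ("MINUTE", "HOUR", "DAY", "MONTH")

# "+", an integer greater than 0, a unit keyword, and an optional plural "S".
PERIOD_RE = re.compile(r"^\+([1-9]\d*)(%s)S?$" % "|".join(UNITS))


def parse_period(value):
    """Split a period such as '+2MINUTES' into (2, 'MINUTE')."""
    match = PERIOD_RE.match(value)
    if match is None:
        raise ValueError(f"not a valid period: {value!r}")
    return int(match.group(1)), match.group(2)


print(parse_period("+1DAY"))      # (1, 'DAY')
print(parse_period("+2MINUTES"))  # (2, 'MINUTE')
```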

Common workflows

Time indexation

Autocomplete

Quantexa News API’s Autocomplete endpoint is a helper endpoint that enables you to find specific entities in our knowledge base, which consists of over 5 million entities, as well as sources based on name or domain from over 90,000 publishers.

Entities

The latest version of Autocomplete allows enhanced queries on the entity object, enabling you to specify multiple conditions for a single entity to meet. For example, you can specify stories that mention an entity by its ID and entity type. The addition of extra metadata helps with the manual disambiguation of entities.

In order to find entities and their metadata (e.g. ID, Wikipedia links, and type), you can pass terms to the auto-complete endpoints, and it returns the best match for the term in the knowledge base.

Sources

Once a source name term or domain is passed to this endpoint, it returns the closest matches to the term or domain searched. This enables you to check which sources are available on the News API inventory or search for source IDs and include them accurately in your queries.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Autocomplete parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/autocomplete/suggestions/entity-names

/autocomplete/suggestions/sources

/autocomplete/suggestions/entity-types

Parameters

These parameters enable you to match entities and sources that are in Quantexa News API’s knowledge base and inventory.

Parameters Compatible paths Values Description
term entity-names, entity-types “string” This parameter is used for finding autocomplete objects that contain the specified value.
type_id entity-names “string” This parameter filters results by entity type, as in the UI. It is useful for ambiguous entities, e.g. filtering by the organization entity type when looking for Amazon the company.
name_term sources “string” This parameter searches the source autocomplete by source name.
domain_term sources “string domain” This parameter searches the source autocomplete by source domain.
limit entity-names, sources, entity-types INT64 Similar to the ‘per_page’ parameter in existing endpoints. Limits the number of results returned from a request. If not specified, it defaults to 25.

Common workflows

Searching for entities with Autocomplete

Searching sources with autocompletes

Histograms

These parameters enable you to return the distribution of articles over a range of values for your specified parameter.

For more information on Histograms, please read the API Endpoints section of our documentation.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Histogram parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/histograms

Parameters

Parameters Values Description
interval.start "YYYY-MM-DDThh:mm:ssZ" This parameter is used for setting the start data point of histogram intervals.
interval.end "YYYY-MM-DDThh:mm:ssZ" This parameter is used for setting the end data point of histogram intervals.
interval.width “INT64” This parameter is used for setting the width of histogram intervals.

Common workflows

Time indexation

Media elements

Many stories contain media such as images or videos as well as text, which can be valuable to some users, depending on what they are building. Some users are only interested in stories with video content to increase click-through rates, whereas other users do not want videos in their results if they are concerned with their end users' loading time, as videos take slightly longer to load on poorer connections.

Quantexa News API allows you to:

  • Specify whether your results should include these media or not

  • Specify the number of images or videos your results should include (this can be an exact number, a range, or a minimum or maximum)

  • Sort your results according to how many images or videos they contain

  • Specify the format of the media in your stories

  • Display quantitative trends in media with the histogram endpoint

Amount of media in stories

It is possible to specify whether the stories returned by your query should contain media or not and also to specify the number of images and videos in each story by using the media.images.count or the media.videos.count parameter.

  • By setting media.videos.count.min to 1, you are specifying that your query only returns stories with at least one video.

  • By setting media.videos.count.min to 1 and media.videos.count.max also to 1, you are specifying that you only want results that contain exactly one video.

  • By setting media.videos.count.max to 0, you are excluding any stories with videos from your results.
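The three cases above map onto a small helper that builds the corresponding parameters. The parameter names come from the text, while video_count_filter itself is a hypothetical convenience function sketched in Python.

```python
def video_count_filter(minimum=None, maximum=None):
    """Build media.videos.count.min/.max parameters for a query."""
    if minimum is not None and maximum is not None and minimum > maximum:
        raise ValueError("minimum cannot be greater than maximum")
    params = {}
    if minimum is not None:
        params["media.videos.count.min"] = minimum
    if maximum is not None:
        params["media.videos.count.max"] = maximum
    return params


print(video_count_filter(minimum=1))             # at least one video
print(video_count_filter(minimum=1, maximum=1))  # exactly one video
print(video_count_filter(maximum=0))             # no videos at all
```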

Media count

To display stories with more images before stories with fewer images in your results, set the sort_by parameter to media.images.count and the sort_direction parameter to desc. Whenever you use the sort_by parameter, results are sorted in descending order by default (i.e. stories with the highest value for the chosen parameter are shown first). To reverse this default, set the sort_direction parameter to asc, which sorts the results in ascending order.

Media format & size

It is possible to return or exclude stories that contain images in a specified format by using the media.images.format[] parameter. This can help you avoid any technical issues you foresee with these formats.

The image formats you can use as a parameter are:

  • BMP

  • GIF

  • JPEG

  • PNG

  • TIFF

  • PSD

  • ICO

  • CUR

  • WEBP

  • SVG

It is also possible to specify maximum and minimum height, width, and content length by appending .min or .max to the following parameters:

media.images.width

media.images.height

media.images.content_length

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. These parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

These parameters enable you to filter articles based on the media they contain, e.g. images and videos.

Parameters Values Description
media.images.content_length [* TO *] This parameter is used for finding stories whose images' content length is greater than or equal to the specified value.
media.images.count [* TO *] This parameter is used for finding stories whose number of images is greater than or equal to the specified value.
media.images.height [* TO *] This parameter is used for finding stories whose images' height is greater than or equal to the specified value.
media.images.width [* TO *] This parameter is used for finding stories whose images' width is greater than or equal to the specified value.
media.videos.count [* TO *] This parameter is used for finding stories whose number of videos is greater than or equal to the specified value.
media.images.format ("string", "string", ... "string") This parameter is used for finding stories whose image format is one of the specified values. Available values: BMP, GIF, JPEG, PNG, TIFF, PSD, ICO, CUR, WEBP, SVG

Common workflows

Media elements

Sorting

You can choose how you want your results to be sorted by using the sort_by parameter. This allows you to retrieve the most relevant results of your query first, with relevance based on a value you choose as a parameter.

The sort_by parameter can take one of the following values:

Relevance

Using the relevance value returns the stories that most closely match your search input. The parameter value is relevance.

Recency

Using the recency value gives a higher rank to stories that were published most recently, whilst still giving weight to your query.

Published datetime stamp

Using published_at as the value here will rank your results based only on how recently your returned stories were published.

Web traffic rank

Website traffic refers to the flow of visitors and users who access a website. It encompasses the visitors who land on a website through various means, such as search engines, social media platforms, direct visits, or referral links.

Alexa is a ranking system that ranks websites based on the volume of traffic they have generated over the previous 3 months. The more traffic a website receives, the higher its ranking. For example, Google has a ranking of 1, BBC has a ranking of around 85, and so on. Alexa gives two options to users when seeking the ranking of sites. For more details on how web traffic data works, visit the page Sources and Website traffic rank.

  • Global ranking - The metric on how popular a website is in the global rankings. You can sort results passing the value source.rankings.alexa.rank on parameter sort_by.

  • National ranking - The metric on how popular a site is in a specific country. This is available for every country in the world and is accessed by adding the ISO 3166-1 alpha-2 country code to the parameter sort_by with the value pattern source.rankings.alexa.rank.{country}.

Note:

  • Not all sources contain the rank metadata, meaning that using this parameter value could narrow your search.

  • Alexa ranking was discontinued by Amazon in May 2022, so the rank we are using is a snapshot from May 2022.

Number of photos

This value allows users to rank results based on the number of photos on the page. The parameter value is media.images.count.

Number of videos

This value allows users to rank results based on the number of videos on the page. The parameter value is media.videos.count.

Keyword boosting

When making a query with multiple keywords, it might be the case that one keyword is more important to your search than others. Boosting enables you to add weight to the more important keyword/keywords so that results mentioning these keywords are given a “boost” to get them higher in the order of the results.

For example, searching ["John", "Frank", "Sarah"] gives equal weight to each term, but ["John", "Frank"^2, "Sarah"] is like saying a mention of “Frank” is twice as important as a mention of “John” or “Sarah”. Stories mentioning “Frank” will, therefore, appear higher in the rank of search results.

Boosting does not act as a strict filter. It simply allows the user to indicate which keywords in a list matter most (i.e. if a story contains many mentions of non-boosted keywords, it could still be returned ahead of many stories that mention a boosted keyword). Boosting, therefore, does not exclude stories from the results. It only affects the order of returned results.

Boosting only works in a search when there is more than one keyword, as it boosts the weight of a keyword compared to the other keywords being searched.
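To make the effect of boosting concrete, here is a toy scoring model in Python. It is an assumption made purely for illustration (a weighted sum of mention counts) and is not the ranking function the API uses. The toy_score function and the sample stories are hypothetical.

```python
def toy_score(mention_counts, boosts):
    """Weighted sum of keyword mentions; unboosted keywords have weight 1."""
    return sum(count * boosts.get(keyword, 1.0)
               for keyword, count in mention_counts.items())


boosts = {"Frank": 2.0}  # corresponds to "Frank"^2 in the query

stories = {
    "story_a": {"John": 5, "Sarah": 4},  # many mentions, none boosted
    "story_b": {"Frank": 1},             # one boosted mention
    "story_c": {"Frank": 3, "John": 1},  # several boosted mentions
}

ranked = sorted(stories, key=lambda name: toy_score(stories[name], boosts),
                reverse=True)
print(ranked)  # ['story_a', 'story_c', 'story_b']
```

Under this toy model story_a still ranks first despite never mentioning the boosted keyword, which mirrors the point that boosting changes the order of results without excluding anything.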

Sorting direction

Each of the parameters above can sort results by ascending or descending value. This is achieved by entering either asc or desc as a value of the sort_direction parameter. If this parameter is not declared, results will be returned in descending order.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Sorting parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/clusters

/stories

Parameters

These parameters enable you to arrange the results in a specific order, such as ascending or descending, based on chosen criteria, facilitating easier analysis and retrieval of relevant news articles.

Parameters Values Description
sort_by “string” This parameter allows you to specify the parameter by which your results will be sorted. The accepted values are: story_count, earliest_story, latest_story, time.
sort_direction (asc OR desc) This parameter allows you to specify the sort direction of your results. The accepted values are asc and desc.

Common workflows

Sorting results

Trends

Quantexa News API’s Trends endpoint allows you to identify the most frequent values for categorical attributes contained in stories, e.g. most frequent entities, concepts or keywords. You can set parameters such as a time period, a subject category, or an entity, and the endpoint will return the entities or keywords most frequently mentioned in relation to your query.

Similar to the Timeseries endpoint, Trends is useful when, rather than simply reviewing granular stories, you are interested in seeing themes and patterns over time that aren't immediately apparent when looking at individual documents. The Trends endpoint allows you to see the most frequently recurring entities, concepts or keywords that appear in articles that meet your search criteria. This enables you to generalise the data and make high-level assertions about content.

For more information on Trends, please read the API Endpoints section of our documentation.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. The parameter (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/trends

Parameters

This parameter enables you to find trending elements within news articles, including entities, sentiment, sources, categories, languages, and clusters.

Parameters Values Description
field “string” This parameter is used for specifying the story field whose most frequent values are returned.

Common workflows

Trends

Pagination

Quantexa News API returns up to 100 stories per call/page. This workflow shows you how to use the cursor to chain multiple calls together and retrieve more than 100 stories at a time.

Fetching a large number of sorted results: Cursor

The API supports using a cursor to scan through results. In the API, a cursor is a logical concept that doesn't cache any state information on the server. Instead, the sort values of the last story returned to the client are used to compute a next_page_cursor, representing a logical point in the ordered space of sort values. That next_page_cursor can be specified in the parameters of subsequent requests to tell the API where to continue.

Cursor

To use a cursor with the API, specify a cursor parameter with the value of *. This is the same as declaring page=1 to tell the API "start at the beginning of my sorted results," except it also informs the API that you want to use a cursor. The default value of the cursor is * unless you specify otherwise. In addition to returning the top N sorted results (where you can control N using the per_page parameter), the API response will also include an encoded String named next_page_cursor.

You then take the next_page_cursor String value from the response and pass it back to the API as the cursor parameter for your next request. You can repeat this process until you've fetched as many stories as you want or until the returned next_page_cursor matches the cursor you specified, which indicates that there are no more results.
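The cursor loop can be sketched in Python as below. To keep the example self-contained, the HTTP call is replaced by a stand-in fake_fetch function over canned pages. The fetch_all and fake_fetch names and the "stories" response key are assumptions for illustration, and stopping on an empty page is an extra safety check beyond what the text describes.

```python
def fetch_all(fetch_page, max_stories=None):
    """Follow next_page_cursor until it stops changing or enough stories are read."""
    stories = []
    cursor = "*"  # "*" means: start at the beginning of the sorted results
    while True:
        page = fetch_page(cursor)
        stories.extend(page["stories"])
        next_cursor = page["next_page_cursor"]
        # Same cursor back (or an empty page) means there are no more results.
        if next_cursor == cursor or not page["stories"]:
            break
        if max_stories is not None and len(stories) >= max_stories:
            break
        cursor = next_cursor
    return stories if max_stories is None else stories[:max_stories]


# Stand-in for real API calls: cursor -> (stories on that page, next cursor).
PAGES = {
    "*": (["s1", "s2"], "c1"),
    "c1": (["s3"], "c2"),
    "c2": ([], "c2"),
}


def fake_fetch(cursor):
    page_stories, next_cursor = PAGES[cursor]
    return {"stories": page_stories, "next_page_cursor": next_cursor}


print(fetch_all(fake_fetch))                 # ['s1', 's2', 's3']
print(fetch_all(fake_fetch, max_stories=2))  # ['s1', 's2']
```

In real use, fetch_page would issue the request with the cursor and per_page parameters and return the decoded JSON response.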

Per Page Attribute

The API supports a per_page parameter to specify the maximum number of stories per page. This parameter is used to paginate results from a query. Its value must be between 1 and 100. The default value is 10 if the per_page parameter is not passed in the query.

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. Pagination parameters (listed in full in the section below) can be used with the following Quantexa News API endpoints:

/clusters

/stories

/related_stories

/autocompletes

Parameters

These parameters enable you to chain API requests. The API returns up to 100 stories per call, and these parameters enable you to scroll through the full results set. This is crucial for results sets of over 100 stories.

Parameters Values Description
cursor (“*” OR “hash string”) This parameter allows you to chain your requests in calls that return more than 100 clusters by supplying the next_page_cursor response to your next call. For more information about using cursor with the News API, take a look at the documentation page.
per_page int This parameter allows you to specify the maximum number of clusters to be returned by your query. The maximum value is 100. To retrieve more than 100 results, use the cursor parameter.

Common workflows

Pagination of results

Response control

The response from the Quantexa News API consists of 26 data enrichments nested in JSON objects. For some queries not all of the 26 enrichments will be required, so with the parameter return[] you can specify only the fields you want to return in your API response. For example, if you only want to return entities, use return[]="entities".
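As a rough sketch, repeated return[] values can be encoded into a query string with Python's standard library. Sending one return[] key per field is an assumption about the wire format, and return_fields_query is a hypothetical helper.

```python
from urllib.parse import urlencode


def return_fields_query(fields):
    """Encode each requested field as its own return[] key (assumed format)."""
    return urlencode([("return[]", field) for field in fields])


# The square brackets are percent-encoded as %5B and %5D.
print(return_fields_query(["entities", "title"]))
# return%5B%5D=entities&return%5B%5D=title
```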

Endpoints

Quantexa News API includes endpoints that provide both retrieval and analysis features that allow you to search, collect, and analyze news content at scale. This parameter can be used with the following Quantexa News API endpoints:

/stories

/related_stories

Parameters

This parameter filters the objects returned in the API response. See the full list of available values in the table below and in Common Workflows.

Parameters Values Description
return[] ['id', 'title', 'body', 'summary', 'source', 'author', 'entities', 'keywords', 'hashtags', 'characters_count', 'words_count', 'sentences_count', 'paragraphs_count', 'categories', 'media', 'sentiment', 'language', 'published_at', 'links'] This parameter is used for specifying return fields.

Common workflows

Response objects