Parameters by theme

Datetime

This section describes the timestamp format and gives practical examples of searching for content based on news articles' time indexation.

Date Formatting

The format used is a restricted form of the canonical representation of DateTime in the XML Schema specification (ISO 8601):

YYYY-MM-DDThh:mm:ssZ

YYYY is the year.
MM is the month.
DD is the day of the month.
T is a literal 'T' character that indicates the beginning of the time string.
hh is the hour of the day as on a 24-hour clock.
mm is minutes.
ss is seconds.
Z is a literal 'Z' character, indicating that this string representation of the date is in UTC.

No time zone can be specified. String representations of dates are always expressed in Coordinated Universal Time (UTC). Example value:

2022-03-27T13:47:26Z

You can optionally include fractional seconds if you wish, although any precision beyond milliseconds will be ignored. Examples of values with sub-seconds include:

2016-03-27T13:47:26.822Z
2016-03-27T13:47:26.82Z
2016-03-27T13:47:26.8Z
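As a minimal sketch, a timestamp in the format above can be produced from a Python datetime as follows. The helper name to_api_timestamp is illustrative and is not part of the News API.

```python
from datetime import datetime, timezone


def to_api_timestamp(moment: datetime, with_millis: bool = False) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ (always UTC)."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    text = utc.strftime("%Y-%m-%dT%H:%M:%S")
    if with_millis:
        # Keep millisecond precision only; anything finer is ignored anyway.
        text += f".{utc.microsecond // 1000:03d}"
    return text + "Z"


example = datetime(2022, 3, 27, 13, 47, 26, 822000, tzinfo=timezone.utc)
print(to_api_timestamp(example))                    # 2022-03-27T13:47:26Z
print(to_api_timestamp(example, with_millis=True))  # 2022-03-27T13:47:26.822Z
```

Converting to UTC before formatting avoids sending a local time with a trailing Z by mistake.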

Date Math

The date field types also support date math expressions, which makes it easy to create times relative to fixed moments in time, including the current time, which can be represented using the special value of "NOW".

Date Math Syntax

Date math expressions can do two things: offset a time by adding or subtracting time units, and round the time down to a specified unit. Expressions can be chained and are evaluated left to right.

This represents a point in time two months from now:

NOW+2MONTHS

This is one day ago:

NOW-1DAY

A slash is used to indicate rounding. Below is a point in time yesterday, rounded to the previous hour. If the current time is 15:42:17.2165, the point below is 15:00:00.0000 yesterday:

NOW-1DAY/HOUR

Below is the start of yesterday (00:00:00.0000):

NOW-1DAY/DAY

All the math expressions will work to the millisecond precision.

Date Math Keywords

The supported keywords in Date Math:

Date keywords
NOW
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND

Note: All the keywords above can be passed in their plural form: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS.
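To make the left-to-right evaluation and rounding rules concrete, here is a small client-side sketch that evaluates a date math expression against a fixed "now". It is only an illustration of the syntax described above, not the API's own implementation. The function names are hypothetical, and clamping the day when shifting by months is an assumption.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

UNITS = ["YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"]
TOKEN = re.compile(r"([+-]\d+|/)([A-Z]+)")


def normalise_unit(name: str) -> str:
    """Map plural keywords (DAYS) to their singular form (DAY)."""
    unit = name[:-1] if name.endswith("S") else name
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {name}")
    return unit


def add_months(moment: datetime, months: int) -> datetime:
    """Shift by calendar months, clamping the day to the target month's length."""
    year, month_index = divmod(moment.year * 12 + moment.month - 1 + months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return moment.replace(year=year, month=month_index + 1, day=min(moment.day, last_day))


def round_down(moment: datetime, unit: str) -> datetime:
    """Reset every field smaller than the given unit."""
    smaller = ["month", "day", "hour", "minute", "second"][UNITS.index(unit):]
    reset = {name: 1 if name in ("month", "day") else 0 for name in smaller}
    return moment.replace(microsecond=0, **reset)


def evaluate(expression: str, now: datetime) -> datetime:
    """Evaluate an expression such as NOW-1DAY/HOUR, left to right."""
    if not expression.startswith("NOW"):
        raise ValueError("this sketch only handles expressions that start with NOW")
    rest, position, moment = expression[3:], 0, now
    for match in TOKEN.finditer(rest):
        if match.start() != position:
            break
        position = match.end()
        operator, unit = match.group(1), normalise_unit(match.group(2))
        if operator == "/":
            moment = round_down(moment, unit)
        elif unit in ("YEAR", "MONTH"):
            moment = add_months(moment, int(operator) * (12 if unit == "YEAR" else 1))
        else:
            moment += timedelta(**{unit.lower() + "s": int(operator)})
    if position != len(rest):
        raise ValueError(f"could not parse: {expression}")
    return moment


now = datetime(2024, 5, 10, 15, 42, 17, 216000, tzinfo=timezone.utc)
print(evaluate("NOW-1DAY/HOUR", now))  # 2024-05-09 15:00:00+00:00
print(evaluate("NOW-1DAY/DAY", now))   # 2024-05-09 00:00:00+00:00
```

A helper like this can be handy for previewing which window an expression selects before sending a query.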

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter	Values	Description
published_at	[* TO *]	Find stories whose published at time is in the specified range.
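The range value shown in the table can be assembled with a small helper. This is a sketch: the helper name is hypothetical, and using date math expressions as range bounds is an assumption based on the Date Math section rather than something the table states.

```python
def published_at_range(start: str = "*", end: str = "*") -> str:
    """Return a range value in the [start TO end] form used by published_at.

    Each bound can be a timestamp such as 2022-03-27T13:47:26Z, or "*" for
    an open bound. Date math bounds are an assumption of this sketch.
    """
    return f"[{start} TO {end}]"


print(published_at_range("2022-03-01T00:00:00Z", "2022-03-27T13:47:26Z"))
print(published_at_range("NOW-7DAYS/DAY"))  # open-ended upper bound
```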

Common workflows

Time indexation

Sorting results

Languages

The News API offers content in 16 languages. By filtering on the language parameter, you can find articles in specific languages.

Language is a mandatory field, so the language field of all articles available on News API is populated with the prediction made by the language model.

Language filter and keyword search

Always supply a language parameter in your search, no matter how many languages you want to search. When no language is passed, the search will default to all languages available to your account plan but without the necessary language-specific filters like stop word removal and stemming. This could result in your search not retrieving all of the relevant stories to your query.

The supported languages and their language codes are listed below. Multilingual support requires an upgraded license key. Contact sales to upgrade your account.

Translations

All articles written in non-English languages available on the News API also have an English translation, which is returned in the response field translations. All news stories available through the API are translated by a proprietary, in-house machine translation (MT) model. This model is optimized for speed whilst maintaining high-quality translations across diverse source languages. It is a transformer-based neural machine translation (NMT) model, trained on hundreds of millions of examples. Below is an excerpt of a response from the stories endpoint that includes translations:

{
  "title": "original title",
  "body": "original body",
  "language": "<non-English language code>",

  "<...>": "<...>",

  "translations": {
    "en": {
      "body": "translated body",
      "title": "translated title"
    }
  }
}
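Given a story shaped like the excerpt above, the English text can be picked out as in the sketch below. The function name is hypothetical, and the sketch assumes that stories written in English carry the language code "en" and use their original title and body.

```python
def english_text(story: dict) -> dict:
    """Return the English title and body of a story dictionary."""
    if story.get("language") == "en":
        # Assumption: English stories need no translation.
        return {"title": story.get("title"), "body": story.get("body")}
    translated = story.get("translations", {}).get("en")
    if translated is None:
        raise KeyError("story has no English translation")
    return {"title": translated.get("title"), "body": translated.get("body")}


story = {
    "title": "original title",
    "body": "original body",
    "language": "de",
    "translations": {"en": {"body": "translated body", "title": "translated title"}},
}
print(english_text(story)["title"])  # translated title
```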

Analyzing multilingual content

All news articles accessible via News API are available in their native language text and machine-translated English text.

All the translated content benefits from all of the enrichment features supported by the News API. Most NLP enrichments are performed on native English or translated English text.

In the case of article summarisation, this is conducted in the original text of some languages and in the translated text of others. See below:

Language	Text Used
en, de, fr, it, es, pt	Original text
ar, da, fi, nl, fa, ru, sv, tr, zh-cn, zh-tw	Translated text (en)
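The table above can be expressed as a simple lookup, for example when you need to know which text a summary was produced from. The names below are illustrative only.

```python
# Language codes copied from the table above.
ORIGINAL_TEXT_LANGUAGES = {"en", "de", "fr", "it", "es", "pt"}
TRANSLATED_TEXT_LANGUAGES = {"ar", "da", "fi", "nl", "fa", "ru", "sv", "tr", "zh-cn", "zh-tw"}


def summarisation_source(language_code: str) -> str:
    """Return "original" or "translated" for a supported language code."""
    code = language_code.lower()
    if code in ORIGINAL_TEXT_LANGUAGES:
        return "original"
    if code in TRANSLATED_TEXT_LANGUAGES:
        return "translated"
    raise ValueError(f"language not listed in the table: {language_code}")


print(summarisation_source("fr"))     # original
print(summarisation_source("zh-cn"))  # translated
```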

Endpoints

/clusters

/stories

/time_series

/trends

/histograms

Parameters

Parameter	Values	Description
language	("string", "string", ... "string")	Specify the language. It supports ISO 639-1 language codes.

Common workflows

Languages

Keywords

These parameters enable you to perform keyword searches in Quantexa News API for articles where specific words or phrases match and retrieve stories that contain those keywords.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter	Values	Description
title	("string", "string", ... "string")	Find stories whose title contains a specific keyword. It supports advanced search operators.
body	("string", "string", ... "string")	Find stories whose body contains a specific keyword. It supports advanced search operators.
text	("string", "string", ... "string")	Find stories whose title and body both contain a specific keyword. It supports advanced search operators.
translations.{language}.title	("string", "string", ... "string")	Filter stories translated from a non-English language containing an English-language query term in the title.
translations.{language}.body	("string", "string", ... "string")	Filter stories translated from a non-English language containing an English-language query term in the body.
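The sketch below renders a list of terms in the parenthesised shape used in the Values column and pairs a keyword filter with a language filter, as recommended in the Languages section. The helper name and the dictionary layout are assumptions for illustration. How the values are ultimately sent to an endpoint is not shown here.

```python
def format_values(terms: list[str]) -> str:
    """Render terms in the ("a", "b") shape shown in the Values column."""
    for term in terms:
        if '"' in term:
            # Escaping rules are not covered here, so reject rather than guess.
            raise ValueError("terms containing double quotes are not handled here")
    return "(" + ", ".join(f'"{term}"' for term in terms) + ")"


search = {
    "language": format_values(["en", "de"]),
    "title": format_values(["merger", "acquisition"]),
}
print(search["title"])  # ("merger", "acquisition")
```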

Common workflows

Proximity search

Boolean operators

Excluding operator

Entities

What is an entity?

An entity is a real-world thing that is mentioned in a story and then tagged with metadata by Quantexa News API so users can build an accurate picture of what is being talked about in news content. The following data points are applied to each entity:

The surface form(s) is the text in the story that mentions the entity.
The type of entity it is.
A Wikipedia link to that entity's Wikipedia page - if applicable.
A Wikidata link to that entity's Wikidata page - if applicable.
The sentiment expressed towards it in the story.
The indices of the surface forms - the index of the mention(s) of the entity in the story.
The prominence of the entity - how prominent the entity is in an article.
The frequency of the entity - the number of mentions the entity has in an article.

Why use entities instead of keywords?

Keywords can refer to multiple things, and things can be referred to by multiple keywords. Quantexa News API's Entities recognises and disambiguates real-world people, companies, and things that are mentioned in the news, going beyond keywords to provide far more accurate news analytics data.

Using Entities has two high-level benefits when building your search:

First, when multiple different keywords commonly refer to a single entity, Quantexa News API correctly recognises the entity in each mention. For example, take a look at how the News API recognises the entity “MetLife,” even when different names for the company are mentioned:

	Sample 1: "Shares in Metropolitan Life Insurance fell sharply this morning."	Sample 2: "MetLife announces new insurance offerings."
Surface Form	"Metropolitan Life Insurance"	"MetLife"
Entity Name	MetLife	MetLife
Entity Type	Business, Organization	Business, Organization
Wikipedia URL	MetLife	MetLife

Second, the entities model disambiguates mentions for you: when a single keyword can refer to multiple entities, the News API will consider the rest of the document to make an accurate prediction about which thing is being referred to. As an example, take the following two sentences mentioning the keyword “square” and see how the News API will recognise each as a different entity and how it returns some key information:

	Sample 1: “Protests commenced in the town square.”	Sample 2: “Square was founded by Jack Dorsey.”
Surface Form	"square"	"Square"
Entity Name	square	Square Inc.
Entity Type	Business, Organization	Organization
Wikipedia URL	None	Square Inc.
Wikidata URL	None	Square Inc.

Entity types

Just as Square and MetLife above have "Business" and "Organization" as their entity types, every entity recognized by Quantexa News API has a type.

These types can be extracted from Wikipedia & Wikidata, or, where that is not applicable, they can be predicted on the fly by the News API. For example, the New York Stock Exchange entity has the type "Stock_exchange," extracted from Wikipedia & Wikidata, but the entity "Jeremy Draper" has the type "Human", even though it doesn't have a Wikipedia page.

There are two ways to use entity types to build intelligent searches - simple searches and enhanced searches.

Entity types are structured in a parent/child relationship, with almost all top-level entity types having child entity types.

Top-level entity types


Currency	Location	Human	Organization
Product_(business)	Profession	Technology	Risk
Retail	Regulation_(European_Union)

Child entity types of organisation type:


Advocacy_group	Bank	Bank_holding_company	Brick_and_mortar
Business	Certificate_authority	Civil_service	Commercial_bank
Community	Company	Conglomerate_(company)	Conservation_authority_(Ontario,_Canada)
Consumer_organization	Corporate_group	Corporation	Credit_bureau
Deliberative_assembly	Educational_organization	Emergency_service	Environmental_organization
Financial_institution	Government	Holding_company	Investment_banking
Investment_company	Law_commission	Law_enforcement_organization	Local_federation
Local_government	National_research_and_education_network	Newspaper	Nonprofit_organization
Parlement	Political_organisation	Private-equity_firm	Privately_held_company
Public_company	Ruling_party	Social_movement_organization	Standards_organization
Stock_exchange	Subsidiary	Technology_company	Think_tank

Child entity types of geographic location type:


City	Country	Location	Island_country
Sovereign_state	State_(polity)	U.S._state

Child entity types of risk type:


Business risks	Endangerment	External risk	Financial risk
Operational_risk	Vulnerability

Child entity types of product type:


Software	Software_as_a_service	Stock_market_index

Enhanced entity search

Quantexa News API allows enhanced queries on the entity object, enabling you to specify multiple conditions for a single entity to meet. For example, you can specify stories that mention an entity "Square" when that entity is also of the type "Organization", or you can return stories where a specific entity was mentioned in a negative tone.

This is done by supplying a nested query to the aql parameter, which accepts entities:{{ }} as a value, in which you can supply a list of parameters in a Lucene-based syntax for an entity to meet. You can use the parameters below.

AQL Parameter	Description
`element`	The part of the story the entity should be mentioned in (accepts "title" or "body").
`surface_forms`	A specific form of the entity ("Apple" OR "Apple Computer").
`id`	The entity's ID on the Knowledge Base.
`links.wikipedia`	The entity's Wikipedia link.
`links.wikidata`	The entity's Wikidata link.
`sentiment`	The sentiment expressed about the entity.
`stock_ticker`	The entity's stock ticker.
`overall_prominence`	The prominence of the entity within an article. A value ranging from 0 to 1, where 0 indicates no article prominence and 1 indicates very high article prominence.
`frequency`	The number of times an entity is mentioned in an article title or body. It should be used in conjunction with the element parameter.
`overall_frequency`	The number of times an entity is mentioned in an article.
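As a rough sketch, a nested entities query can be assembled from the parameters in the table above. The helper name is hypothetical, and joining the conditions with AND is an assumption based on the Lucene-style syntax mentioned earlier.

```python
def entities_query(conditions: dict[str, str]) -> str:
    """Build an entities:{{ ... }} value from field/value pairs."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses = [f"{field}:{value}" for field, value in conditions.items()]
    # Assumption: conditions that must all hold are joined with AND.
    return "entities:{{" + " AND ".join(clauses) + "}}"


query = entities_query({
    "surface_forms": '"Square"',
    "element": "title",
    "overall_prominence": "[0.6 TO *]",
})
print(query)
# entities:{{surface_forms:"Square" AND element:title AND overall_prominence:[0.6 TO *]}}
```

Keeping all conditions inside one pair of double braces is what ties them to a single entity, rather than to any entity in the story.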

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter	Subfield	Values	Description
entities	id	{{id:("string", "string", ... "string")}}	Find stories based on the specified entities' id in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities	links.wikipedia	{{links.wikipedia:"url string"}}	Find stories based on the specified entities' Wikipedia URL in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities	stock_ticker	{{stock_ticker:"string"}}	Find stories based on the specified stock tickers of entities in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities	surface_forms.text	{{surface_forms.text:("string")}}	Find stories based on the specified entities' surface form text in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities	overall_prominence	{{overall_prominence:[* TO *]}}	Find stories based on the prominence of the entity in the article.
entities	element	{{element:(title OR body)}}	Specify whether the entity search is to be performed on the title or body of the article.

Common workflows

Entities

Locations

Categories taxonomy

All articles on Quantexa News API are enriched with categories and industry taxonomy tags. The tagged content helps to categorize and organize large volumes of content, making it easier to search, filter, and understand the information within the documents.

The classifiers are capable of classifying content into four taxonomies. The complete list of taxonomies included with Quantexa News API is outlined in the table below.

Taxonomy	Supported Languages	Number of classes	Levels of depth	Commonly used for	Taxonomy ID
Smart Tagger (Aylien categories)	en	2998	6	News articles, Blog posts	`aylien`
Smart Tagger (Industries)	en	1496	4	News articles, Blog posts	`industries`
IPTC Subject Codes	en	1400	3	News articles, Blog posts	`iptc-subjectcode`
IAB QAG	en	392	2	Websites, Advertisement	`iab-qag`

The supported taxonomies are made up of categories and subcategories, or parent categories and child categories, for example, with Football being a child category of Sport.

Each taxonomy is standardised into a tree-like structure, allowing you to traverse from child to parent categories recursively.

Taxonomy lifecycle

Taxonomy tags for categories and industries are never removed from our catalogue and taxonomy models.

Over time, new tags are added and appended to the current catalogue.

Smart Tagger

The Smart Tagger is Quantexa News API’s most granular article tagging feature that leverages state-of-the-art classification models built using a vast collection of manually tagged news articles based on domain-specific industry and topical category taxonomies. With a taxonomy of ~3000 topical categories and ~1500 industries, Smart Tagger classifies articles with high precision, making it easier for users to filter for articles most relevant to them and their use cases. Tags can be passed as IDs or labels.

Smart Tagger Categories

Smart Tagger’s Categories taxonomy is a classification system that helps categorize news articles into broader thematic areas. It allows users to classify news stories into high-level topics such as business, politics, and technology, as well as multiple sub-categories in the parent/child hierarchy. This taxonomy serves as a useful tool for organizing and filtering news articles based on their granular subject matter, enabling easier access to relevant information within specific content categories.

Examples of Smart Tagger Categories in articles:

{'categories': [{...,
                 'score': 0.39},
                {'id': 'ay.sports',
                 'label': 'Sports',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports'},
                 'score': 1},
                {'id': 'ay.sports.rugby',
                 'label': 'Rugby',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.team'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.rugby'},
                 'score': 1},
                {'id': 'ay.sports.team',
                 'label': 'Team Sports',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.team'},
                 'score': 1}]}

Smart Tagger Industries

Smart Tagger’s Industries taxonomy is a classification system designed to categorize news articles and content into various industry-related topics. This taxonomy aids in organizing and retrieving news content relevant to particular sectors, making it a valuable tool for those who are looking to access and analyze news data within specific industry contexts.

Examples of Smart Tagger Industries in articles:

{'industries': [{'id': 'in.tech',
                 'label': 'Technology',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech'},
                 'score': 0.7},
                {'id': 'in.tech.appsoft',
                 'label': 'Application Software',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.software'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.appsoft'},
                 'score': 0.7},
                {'id': 'in.tech.software',
                 'label': 'Software',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.software'},
                 'score': 0.7}]}

Score

The score is the level of confidence for the taxonomy prediction. Each time an article is linked to a taxonomy tag, a score between 0 and 1 is applied to that tagging, indicating how relevant the industry or category is to the article. The higher the score, the more relevant the tag is to the article.
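A common use of the score is to drop low-confidence tags on the client side. The sketch below uses hypothetical names and sample tags trimmed down to the id and score fields. The threshold is arbitrary.

```python
def confident_tags(tags: list[dict], minimum_score: float) -> list[dict]:
    """Keep only the tags whose score is at least minimum_score."""
    return [tag for tag in tags if tag["score"] >= minimum_score]


sample_tags = [
    {"id": "IAB1", "score": 0.21},
    {"id": "IAB1-5", "score": 0.36},
    {"id": "ay.sports", "score": 1},
]
kept = confident_tags(sample_tags, 0.3)
print([tag["id"] for tag in kept])  # ['IAB1-5', 'ay.sports']
```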

Taxonomy hierarchy

Documents tagged with a child node are tagged with all parent nodes, too.

A search for a top-level node, for example the label "Adverse Events", returns articles tagged with this label and with its child nodes.
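Because each tag links to its parents, the hierarchy can be walked from a child node up to its top-level node. The sketch below does this for tags shaped like the Smart Tagger examples. The function name is hypothetical, and the sketch follows only the first parent link and only parents that are present in the same list.

```python
BASE = "https://api.aylien.com/api/v1/classify/taxonomy/aylien/"

tags = [
    {"id": "ay.sports", "links": {"self": BASE + "ay.sports"}},
    {"id": "ay.sports.rugby",
     "links": {"parents": [BASE + "ay.sports.team"], "self": BASE + "ay.sports.rugby"}},
    {"id": "ay.sports.team",
     "links": {"parents": [BASE + "ay.sports"], "self": BASE + "ay.sports.team"}},
]


def ancestors(tag_id: str, tags: list[dict]) -> list[str]:
    """Return the ids of a tag's ancestors, nearest parent first."""
    by_id = {tag["id"]: tag for tag in tags}
    by_url = {tag["links"]["self"]: tag for tag in tags}
    chain, seen, current = [], {tag_id}, by_id[tag_id]
    while True:
        parents = current["links"].get("parents", [])
        parent = by_url.get(parents[0]) if parents else None
        if parent is None or parent["id"] in seen:
            return chain  # reached a top-level node or an unknown parent
        chain.append(parent["id"])
        seen.add(parent["id"])
        current = parent


print(ancestors("ay.sports.rugby", tags))  # ['ay.sports.team', 'ay.sports']
```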

IPTC Taxonomy

The IPTC (International Press Telecommunications Council) taxonomy is a standardized classification system used in the media industry to tag and categorize news content. It consists of predefined categories, such as subjects, events, and genres, that help news organizations and content providers uniformly label their articles and images with relevant metadata.

Examples of IPTC taxonomy in articles:

{'categories': [{...},
                {'id': '01000000',
                 'label': 'arts, culture and entertainment',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01000000'},
                 'score': 0.58},
                {'id': '01005000',
                 'label': 'cinema',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01000000'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005000'},
                 'score': 0.58},
                {'id': '01005001',
                 'label': 'film festival',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005000'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005001'},
                 'score': 0.58},
                {'id': 'ay.culture',
                 'label': 'Culture, Entertainment and the Arts',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture'},
                 'score': 1},
                {'id': 'ay.culture.film',
                 'label': 'Film',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture.film'},
                 'score': 1}]}

IAB Taxonomy

The IAB (Interactive Advertising Bureau) taxonomy is a standardized system used in the digital advertising industry to categorize online content, advertising campaigns, and audience segments. It provides a structured framework for classifying digital advertising assets and defining audience interests and demographics.

Examples of IAB taxonomy in articles:

{'categories': [{'id': 'IAB1',
                 'label': 'Arts & Entertainment',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1'},
                 'score': 0.21},
                {'id': 'IAB1-5',
                 'label': 'Movies',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1-5'},
                 'score': 0.36},
                {...}]}

These parameters enable you to find articles categorized by Quantexa News API’s category taxonomies, including our proprietary Smart Tagger, which includes approximately 3,000 topical categories and 1,500 industries. We also support IPTC and IAB category taxonomies. For more information on how to use our category taxonomies, refer to Common Workflows.

Endpoints

/clusters

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter	Subfield	Values	Description
categories	taxonomy	{{taxonomy:(aylien OR iptc-subjectcode OR iab-qag)}}	Define the taxonomy used for the rest of the categories query. Available for our standard IPTC and IAB category taxonomies.
industries			This parameter is not available in flat search. However, you can filter for articles containing specific industries using AQL.
categories	id	{{id:("string", "string", ... "string")}}	Find stories by categories id. Available for standard IPTC and IAB category taxonomies.
categories	label	{{label:("string", "string", ... "string")}}	Find stories by categories label - An alternative to the ID.
categories	score	{{score:[* TO *]}}	Filter the confidence score of the model prediction.

Common workflows

Category and Industry taxonomies

Sources

Quantexa News API aggregates and enriches news content from approximately 90,000 global sources.

Every article comes with its source metadata, which enables you to find articles from specific sources or filter out noise and irrelevant sources.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter	Values	Description
source.id	(int, int, int, ...)	Filter stories from publisher sources whose ID is one of the specified values.
source.name	("string", "string", "string", ...)	Filter stories from publisher sources whose name is one of the specified values.
source.domain	("string", "string", "string", ...)	Filter stories from publisher sources whose website domain is one of the specified values.
source.locations.country	("string", "string", "string", ...)	Filter stories from publisher sources located or headquartered in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.locations.state	("string", "string", "string", ...)	Filter stories from publisher sources located or headquartered in the specified states/provinces.
source.locations.city	("string", "string", "string", ...)	Filter stories from publisher sources located or headquartered in the specified cities.
source.scopes.country	("string", "string", "string", ...)	Filter stories from publisher sources whose scope is in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.scopes.state	("string", "string", "string", ...)	Filter stories from publisher sources whose scope is in the specified states/provinces.
source.scopes.city	("string", "string", "string", ...)	Filter stories from publisher sources whose scope is in the specified cities.
source.scopes.level	("international", "national", "local")	Filter stories from publisher sources whose scope is at one of the specified levels. Available values: international, national, local.
source.rankings.alexa.country	("string", "string", "string", ...)	Filter stories from publisher sources whose Alexa rank is in the specified country value. It supports ISO 3166-1 alpha-2 country codes.
source.rankings.alexa.rank	[0 TO *]	Filter stories from publisher sources whose Alexa rank is in the specified range.

Common workflows

Locations

Website traffic rank

Authors

Stories can be found in Quantexa News API based on the author. You can search for the author using either the author ID or the author’s name.

Note: The author metadata is not always populated. This is because either there's no associated author mentioned, or it’s not explicit on the page.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters	Values	Description
author.id	(int, int, ... int)	Filter content created by a specific individual or a set of authors by providing the author's unique identifier ID.
author.name	("string", "string", ... "string")	Filter content created by a specific individual or a set of authors by providing the author's name.

Common workflows

Authors

Clusters

Quantexa News API provides access to millions of news stories from over 90,000 sources across the world. Clustering groups these stories into clusters based on the real-world events they represent.

A cluster is a collection of news stories that all refer to the same real-world event. For example, multiple stories referring to a specific company’s earnings will appear in the same cluster, just as multiple stories about a single road accident will.

Lifecycle of a cluster

A newly published story is compared with representative stories of other clusters to see if a cluster already exists for this event. This is achieved by converting the new story’s body to a vector, which is compared against the other story vectors. If the story is found to be similar to other stories (i.e. having a small vector distance), it will be added to that cluster. If the new story is not similar to any existing stories (i.e. having a large vector distance), a new cluster is created containing that single story. Thus, a new cluster is born and subsequently grows with every new, related story that is published. This process happens in real-time, i.e. within an average of 15 minutes of Quantexa News API receiving the story from the publisher.

If no stories have been added to a cluster within a maximum two-week (14 days) period, the cluster is “frozen”. No more stories can be added to the cluster after that point. This cluster lifecycle (creation, growth, decline and freeze) repeats for each and every cluster.
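The lifecycle described above can be sketched as a simple incremental clustering loop. This is an illustration only, not the production algorithm: the cosine distance metric, the 0.3 threshold, the use of a single fixed representative vector per cluster, and all names are assumptions. Vectors are assumed to be non-zero.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta

FREEZE_AFTER = timedelta(days=14)  # from the text: frozen after two weeks without new stories
MAX_DISTANCE = 0.3                 # hypothetical similarity threshold


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


@dataclass
class Cluster:
    representative: list[float]
    last_added: datetime
    story_ids: list[int] = field(default_factory=list)

    def is_frozen(self, now: datetime) -> bool:
        return now - self.last_added > FREEZE_AFTER


def assign(story_id: int, vector: list[float], published: datetime,
           clusters: list[Cluster]) -> Cluster:
    """Add the story to the closest open cluster, or start a new cluster."""
    open_clusters = [c for c in clusters if not c.is_frozen(published)]
    best = min(open_clusters,
               key=lambda c: cosine_distance(vector, c.representative),
               default=None)
    if best is not None and cosine_distance(vector, best.representative) <= MAX_DISTANCE:
        best.story_ids.append(story_id)
        best.last_added = published
        return best
    created = Cluster(vector, published, [story_id])
    clusters.append(created)
    return created


start = datetime(2024, 1, 1)
clusters: list[Cluster] = []
first = assign(1, [1.0, 0.0], start, clusters)
second = assign(2, [0.9, 0.1], start + timedelta(hours=1), clusters)
print(first is second, len(clusters))  # True 1
```

In this sketch a story is never moved once assigned, which matches the rule that a story stays in a single cluster.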

Can an article be in more than one cluster?

No. A story only belongs to the most appropriate, single cluster, and this remains the case through all stages of the cluster lifecycle. A story cannot move from one cluster to another.

Are all stories added to clusters?

No. Stories whose body is shorter than 400 characters are not clusterable. The cluster field on these stories is empty.

The Cluster Object

A cluster object is a type of JSON object that provides a cluster’s ID along with metadata about the stories associated with it.

A cluster has the following properties:

Each cluster has a unique ID in the News API
A cluster can have one or more stories associated with it
A story will always belong to just one cluster.
The relationship between the story and cluster does not change - it will not be reassigned to another cluster at a later time.

Examples of clusters in articles from the clusters endpoint JSON API response:

{
  "cluster_count": 2042945,
  "clusters": [
    {
      "id": 4992716,
      "time": "2019-07-20T07:16:03Z",
      "story_count": 26488,
      "earliest_story": "2019-07-20T07:16:03Z",
      "latest_story": "2019-08-03T07:41:09Z",
      "representative_story": {
        "id": 21466483,
        "title": "Analysts Offer Predictions for A. O. Smith Corp’s Q3 2019 Earnings (NYSE:AOS)",
        "permalink": "https://www.tickerreport.com/banking-finance/4497288/analysts-offer-predictions-for-a-o-smith-corps-q3-2019-earnings-nyseaos.html",
        "published_at": "2019-08-03T07:15:26Z"
      },
      "location": {
        "country": "US"
      }
    }
  ],
  "next_page_cursor": "<string to use in pagination of results>"
}
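As a small example of working with this response, the sketch below computes how long a cluster was active from its earliest_story and latest_story fields. The helper names are illustrative, and the cluster dictionary is trimmed to the fields that are used.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a YYYY-MM-DDThh:mm:ssZ timestamp into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def cluster_lifespan(cluster: dict):
    """Return the time between a cluster's earliest and latest story."""
    return parse_timestamp(cluster["latest_story"]) - parse_timestamp(cluster["earliest_story"])


cluster = {
    "id": 4992716,
    "earliest_story": "2019-07-20T07:16:03Z",
    "latest_story": "2019-08-03T07:41:09Z",
}
print(cluster_lifespan(cluster))  # 14 days, 0:25:06
```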

Endpoints

/clusters

Parameters

Parameters	Values	Description
id	(INT64 OR INT64 OR … OR INT64)	Specify clusters by their ID, given as a list of int64 values.
location.country	("string", "string", …, "string")	Specify clusters that refer to events in specific countries. It supports ISO 3166-1 alpha-2 country codes.
story_count.max	"INT64"	Specify the maximum number of stories that retrieved clusters should be associated with.
story_count.min	"INT64"	Specify the minimum number of stories that retrieved clusters should be associated with. Default value is 2.
time.end	"YYYY-MM-DDThh:mm:ssZ"	Retrieve clusters for which the associated event’s time is before a specified time stamp.
time.start	"YYYY-MM-DDThh:mm:ssZ"	This parameter allows you to retrieve clusters for which the associated event’s time is after a specified time stamp.
earliest_story.end	"YYYY-MM-DDThh:mm:ssZ"	Specify clusters whose earliest story was published before a specified time stamp.
earliest_story.start	"YYYY-MM-DDThh:mm:ssZ"	Specify clusters whose earliest story was published after a specified time stamp.
latest_story.end	"YYYY-MM-DDThh:mm:ssZ"	Specify clusters whose latest story was published before a specified time stamp.
latest_story.start	"YYYY-MM-DDThh:mm:ssZ"	Specify clusters whose latest story was published after a specified time stamp.

Common workflows

Clusters

Stories metadata

These parameters enable you to filter metadata from stories, e.g., a single news article or a specific URL, or retrieve specific clusters by their ID.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters	Values	Description
id	(INT64 OR INT64 OR … OR INT64)	Retrieve specific stories by their ID, specified as a list of int64 values.
links.permalink	“URL string”	Find stories with a specified URL.
clusters	(INT64 OR INT64 OR … OR INT64)	Filter stories associated with a specific cluster (currently accepts one cluster per search). Clustering requires an Advanced or Enterprise license key. Start a free trial or contact sales to upgrade your account.
source.links_in_count.min	INT64	Find stories from sources whose links-in count is greater than or equal to the specified value.
source.links_in_count.max	INT64	Find stories from sources whose links-in count is less than or equal to the specified value.

Common workflows

Website traffic rank

Sentiment analysis

There are two types of sentiment analysis in Quantexa News API: document-level and entity-level.

All stories in Quantexa News API contain sentiment predictions for:

Text in the title and body of the article. This is document-level sentiment analysis.
Text in each sentence where entities have been recognised. This is entity-level sentiment analysis.

Fields which apply to both document-level and entity-level sentiment:

Polarity

Polarity is the sentiment category predicted by the model: positive, negative, or neutral.

It can be found in the field "polarity" within the document-level sentiment analysis, or for each mention of the entity within the article under the object "entities".

Score

A sentiment score is the confidence of the prediction made by the sentiment model.

Predictions are made in each article element for positive, negative and neutral sentiments. The prediction with the highest confidence score wins and is populated in the field "score". It ranges from 0 to 1.

Document-level sentiment analysis

Document-level sentiment analysis is the prediction of the sentiment expressed in the title and body of the story. It can be found in the object "sentiment". Example:

{
 <...>
 "sentiment": {"body": {"polarity": "negative", "score": 0.96},
               "title": {"polarity": "positive", "score": 0.45}}
 <...>
}
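As a minimal sketch of reading these fields, the Python snippet below parses a story fragment shaped like the example above and extracts the polarity and score for the title and body. The JSON is hand-written for illustration (it is not real API output), and the helper name summarize_sentiment is hypothetical.

```python
import json

# Hand-written story fragment mirroring the example above (not real API output).
STORY_JSON = """
{
  "sentiment": {
    "body": {"polarity": "negative", "score": 0.96},
    "title": {"polarity": "positive", "score": 0.45}
  }
}
"""


def summarize_sentiment(story):
    """Return {element: (polarity, score)} for the title and body, if present."""
    sentiment = story.get("sentiment", {})
    return {
        element: (values["polarity"], values["score"])
        for element, values in sentiment.items()
        if element in ("title", "body")
    }


story = json.loads(STORY_JSON)
summary = summarize_sentiment(story)
for element, (polarity, score) in sorted(summary.items()):
    print(f"{element}: {polarity} (confidence {score:.2f})")
```

Note that the title and body are scored independently, so they can disagree, as they do in this example.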

Entity-level sentiment analysis

Sentiment analysis of the entities recognised in the text is broken down by body and title, but also for each mention of the entity. Each object contains the sentiment polarity, score, frequency and mention index. Take a look at the following example:

{
  <...>
  "entities": {"body": {"sentiment": {"confidence": 0.75, "polarity": "neutral"},
                        "surface_forms": [{"frequency": 5,
                                           "mentions": [{"index": {"end": 5,   "start": 0},   
                                                         "sentiment": {"confidence": 0.77, 
                                                         "polarity": "positive"}},
                                                        {"index": {"end": 365, "start": 360}, 
                                                         "sentiment": {"confidence": 0.91, 
                                                         "polarity": "neutral"}},
                                                        {"index": {"end": 662, "start": 657}, 
                                                         "sentiment": {"confidence": 0.83, 
                                                         "polarity": "positive"}},
                                                        {"index": {"end": 795, "start": 790}, 
                                                         "sentiment": {"confidence": 0.51, 
                                                         "polarity": "neutral"}},
                                                        {"index": {"end": 832, "start": 827}, 
                                                         "sentiment": {"confidence": 0.83, 
                                                         "polarity": "neutral"}}],
                                           "text": "Apple"}]},
               "id": "Q312",
               "links": {"wikidata": "https://www.wikidata.org/wiki/Q312",
                         "wikipedia": "https://en.wikipedia.org/wiki/Apple_Inc."},
               "overall_frequency": 6,
               "overall_prominence": 0.98,
               "overall_sentiment": {"confidence": 0.63, "polarity": "neutral"},
               "stock_tickers": ["AAPL"],
               "title": {"sentiment": {"confidence": 0.51, "polarity": "neutral"},
                         "surface_forms": [{"frequency": 1,
                                            "mentions": [{"index": {"end": 5, "start": 0}, 
                                                          "sentiment": {"confidence": 0.51, 
                                                          "polarity": "neutral"}}],
                                            "text": "Apple"}]},
               "types": ["Business", "Organization"] }
 <...>
}

Overall sentiment

The overall sentiment is the predominant sentiment towards the entity in the article. It can be found in the field "overall_sentiment" inside the object "entities".

The overall sentiment is calculated as follows. First, the model performs an individual sentiment prediction for each mention of the entity in the article.

Once individual predictions are made, the average confidence score is calculated for each polarity, and there is one overall score per polarity for an entity. The polarity with the highest score is selected as the overall sentiment for that entity in the article. If the scores are tied, the model favours positive over negative and negative over neutral.

This is done for the entity in the title and the entity in the body. This means the model calculates the overall entity sentiment for both the body and title by simply selecting the polarity with the higher score again.
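The aggregation described above can be sketched in Python as follows. This is an illustration that follows the written description literally (average confidence per polarity, highest average wins, ties broken positive over negative over neutral); it is not the model's actual implementation, and the function name and the mention values are hypothetical.

```python
from collections import defaultdict

# Tie-break order as described above: positive over negative over neutral.
TIE_BREAK = ["positive", "negative", "neutral"]


def overall_sentiment(mentions):
    """Average the confidence per polarity and return the winning (polarity, score)."""
    by_polarity = defaultdict(list)
    for mention in mentions:
        sentiment = mention["sentiment"]
        by_polarity[sentiment["polarity"]].append(sentiment["confidence"])
    if not by_polarity:
        return None
    averages = {p: sum(v) / len(v) for p, v in by_polarity.items()}
    # Highest average wins; on a tie, the polarity listed earlier in TIE_BREAK wins.
    best = max(averages, key=lambda p: (averages[p], -TIE_BREAK.index(p)))
    return best, averages[best]


# Hypothetical mentions, shaped like the "mentions" objects in the example response.
mentions = [
    {"sentiment": {"polarity": "positive", "confidence": 0.5}},
    {"sentiment": {"polarity": "neutral", "confidence": 0.875}},
    {"sentiment": {"polarity": "neutral", "confidence": 0.625}},
    {"sentiment": {"polarity": "negative", "confidence": 0.75}},
]
result = overall_sentiment(mentions)
print(result)
```

Here neutral and negative both average 0.75, so the tie-break selects negative.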

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters	Values	Description
sentiment.title.polarity	(positive OR neutral OR negative)	Find stories whose title sentiment is the specified value.
sentiment.body.polarity	(positive OR neutral OR negative)	Find stories whose body sentiment is the specified value.

Common workflows

Sentiment analysis

Timeseries

This parameter enables you to query for story volume over a date range.

For more information on Timeseries, please read the API Endpoints section of our documentation.

Endpoints

/time_series

Parameters

Parameters	Values	Description
period	+{int}{time unit}	The size of each date range, expressed as an interval to be added to the lower bound. It supports Date Math Syntax. Valid values are + followed by an integer greater than 0 and one of the Date Math keywords, e.g. +1DAY, +2MINUTES and +1MONTH.
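A small Python sketch of validating a period value before sending it is shown below. The set of units in the pattern is an assumption based on the Date Math keywords listed earlier and may be incomplete; the optional plural "S" is accepted because the examples use both +1DAY and +2MINUTES. The helper names are hypothetical, and the sentiment filter is only there to show the period combined with another parameter.

```python
import re
from urllib.parse import urlencode

# Assumed unit list (taken from the Date Math keywords); may not be exhaustive.
PERIOD_RE = re.compile(r"^\+([1-9]\d*)(YEAR|MONTH|DAY|HOUR|MINUTE)S?$")


def is_valid_period(period):
    """True if period looks like +{int}{time unit} with an integer greater than 0."""
    return PERIOD_RE.match(period) is not None


def time_series_query(period, **filters):
    """Build a URL-encoded query string for a /time_series request."""
    if not is_valid_period(period):
        raise ValueError(f"invalid period: {period!r}")
    return urlencode({"period": period, **filters})


query = time_series_query("+1DAY", **{"sentiment.title.polarity": "positive"})
print(query)
```

The leading + must be percent-encoded as %2B in a query string, otherwise it is typically decoded as a space; urlencode takes care of this.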

Common workflows

Time indexation

Autocomplete

The News API’s Autocomplete endpoint is a helper endpoint that enables you to find specific entities in our knowledge base, which consists of over 5 million entities, as well as sources based on name or domain from over 90,000 publishers.

Entities

The latest version of Autocomplete allows enhanced queries on the entity object, enabling you to specify multiple conditions for a single entity to meet. For example, you can specify stories that mention an entity by its ID and entity type. The addition of extra metadata helps with the manual disambiguation of entities.

You can pass terms to the Autocomplete endpoints to find entities and their metadata (e.g., ID, Wikipedia links, and type). The Autocomplete endpoints return the best match for the term in the knowledge base.

Sources

Once a source name term or domain is passed to this endpoint, it returns the closest matches to the term or domain searched. This enables you to check which sources are available on the News API inventory or search for source IDs and include them accurately in your queries.

Endpoints

/autocomplete/suggestions/entity-names

/autocomplete/suggestions/sources

/autocomplete/suggestions/entity-types

Parameters

Parameters	Compatible paths	Values	Description
term	entity-names, entity-types	“string”	Find autocomplete objects that contain the specified value.
type_id	entity-names	“string”	Filter by entity type to disambiguate entities, e.g. filter by the organization entity type when looking for Amazon the company.
name_term	sources	“string”	Search the source autocomplete by source name.
domain_term	sources	“string domain”	Search the source autocomplete by source domain.
limit	entity-names, sources, entity-types	INT64	Limit the number of results returned from a request. Defaults to 25 if not specified.
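The parameter and path compatibility in the table above can be sketched as a small URL builder. The base URL is a placeholder, the helper name is hypothetical, and the type_id value "Organization" is an assumed example; check the values your account actually returns.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder, not the real API host

# Which parameters each autocomplete path accepts, per the table above.
COMPATIBLE_PATHS = {
    "term": {"entity-names", "entity-types"},
    "type_id": {"entity-names"},
    "name_term": {"sources"},
    "domain_term": {"sources"},
    "limit": {"entity-names", "sources", "entity-types"},
}


def autocomplete_url(path, **params):
    """Build an autocomplete URL, rejecting parameters the path does not support."""
    for name in params:
        if path not in COMPATIBLE_PATHS.get(name, set()):
            raise ValueError(f"{name!r} is not supported on {path!r}")
    return f"{BASE_URL}/autocomplete/suggestions/{path}?{urlencode(params)}"


# Look for Amazon the company rather than, say, the river.
url = autocomplete_url("entity-names", term="Amazon", type_id="Organization", limit=5)
print(url)
```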

Common workflows

Searching for entities with Autocomplete

Searching sources with autocompletes

Histograms

These parameters enable you to return the distribution of articles over a range of values for your specified parameter.

For more information on Histograms, please read the API Endpoints section of our documentation.

Endpoints

/histograms

Parameters

Parameters	Values	Description
interval.start	"YYYY-MM-DDThh:mm:ssZ"	Set the start data point of histogram intervals.
interval.end	"YYYY-MM-DDThh:mm:ssZ"	Set the end data point of histogram intervals.
interval.width	“INT64”	Set the width of histogram intervals.

Common workflows

Time indexation

Media elements

Many stories contain media such as images or videos as well as text, which can be valuable to some users, depending on what they are building. Some users are only interested in stories with video content to increase click-through rates, whereas other users do not want videos in their results if they are concerned with their end users' loading time, as videos take slightly longer to load on poorer connections.

Quantexa News API allows you to:

Specify whether your results should include these media or not
Specify the number of images or videos your results should include (this can be an exact number, a range, or a minimum or maximum)
Sort your results according to how many images or videos they contain
Specify the format of the media in your stories
Display quantitative trends in media with the histogram endpoint

Amount of media in stories

Using the media.images.count or media.videos.count parameter, you can specify whether the stories returned by your query should contain media and also the number of images and videos in each story.

By setting media.videos.count.min to 1, you are specifying that your query only returns stories with at least one video.
By setting media.videos.count.min to 1 and media.videos.count.max also to 1, you are specifying that you only want results that contain exactly one video.
By setting media.videos.count.max to 0, you are excluding any stories with videos from your results.
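The three cases above can be expressed as query parameters with a short Python helper. This is a sketch: the helper name is hypothetical, and it assumes the image parameters follow the same .min/.max pattern as the video parameters.

```python
from urllib.parse import urlencode


def media_count_params(kind, minimum=None, maximum=None):
    """Build media.<kind>.count.min/.max parameters for 'images' or 'videos'."""
    if kind not in ("images", "videos"):
        raise ValueError("kind must be 'images' or 'videos'")
    if minimum is not None and maximum is not None and minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    params = {}
    if minimum is not None:
        params[f"media.{kind}.count.min"] = minimum
    if maximum is not None:
        params[f"media.{kind}.count.max"] = maximum
    return params


at_least_one_video = media_count_params("videos", minimum=1)
exactly_one_video = media_count_params("videos", minimum=1, maximum=1)
no_videos = media_count_params("videos", maximum=0)

print(urlencode(exactly_one_video))
```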

Media count

To display stories with more images before stories with fewer images in your results, set the sort_by parameter to media.images.count and the sort_direction parameter to desc. Whenever you use the sort_by parameter, results are sorted in descending order by default (i.e. stories with the highest value for the chosen parameter are shown first). To reverse this default, set the sort_direction parameter to asc, which sorts the results in ascending order.

Media format & size

It is possible to return or exclude stories that contain images in a specified format by using the media.images.format[] parameter. This can help you avoid any technical issues you foresee with particular formats.

The image formats you can use as a parameter are:

BMP
GIF
JPEG
PNG
TIFF
PSD
ICO
CUR
WEBP
SVG

It is also possible to specify maximum and minimum height, width, and content length by appending .min or .max to the following parameters:

media.images.width

media.images.height

media.images.content_length

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters	Values	Description
media.images.content_length	[* TO *]	Find stories whose image content length is greater than or equal to the specified value.
media.images.count	[* TO *]	Find stories whose number of images is greater than or equal to the specified value.
media.images.height	[* TO *]	Find stories whose image height is greater than or equal to the specified value.
media.images.width	[* TO *]	Find stories whose image width is greater than or equal to the specified value.
media.videos.count	[* TO *]	Find stories whose number of videos is greater than or equal to the specified value.
media.images.format	("string", "string", ... "string")	Find stories whose image format is one of the specified values. Available values: BMP, GIF, JPEG, PNG, TIFF, PSD, ICO, CUR, WEBP, SVG.

Common workflows

Media elements

Sorting

You can choose how you want your results to be sorted by using the sort_by parameter. This allows you to retrieve the most relevant results of your query first, with relevance based on a value you choose as a parameter.

The sort_by parameter can take one of the following values:

Relevance

Using the relevance value returns the stories that most closely match your search input. The parameter value is relevance.

Recency

Using the recency value gives a higher rank to stories published most recently while still giving weight to your query.

Published datetime stamp

Using published_at as the value will rank your results based only on how recently your returned stories were published.

Web traffic rank

Website traffic refers to the flow of visitors and users who access a website. It encompasses the visitors who land on a website through various means, such as search engines, social media platforms, direct visits, or referral links.

Alexa is a ranking system that ranks websites based on the volume of traffic they have generated over the previous 3 months. The more traffic a website receives, the higher its ranking. For example, Google has a ranking of 1, BBC has a ranking of around 85, and so on. Alexa gives two options to users when seeking the ranking of sites. For more details on how web traffic data works, visit the page Sources and Website traffic rank.

Global ranking - The metric on how popular a website is in the global rankings. Sort results by passing the value source.rankings.alexa.rank on parameter sort_by.
National ranking - The metric on how popular a site is in a specific country. This is available for every country in the world and is accessed by adding the ISO 3166-1 alpha-2 country code to the parameter sort_by with the value pattern source.rankings.alexa.rank.{country}.

Note: Not all sources contain the rank metadata, meaning that using this parameter value could narrow your search.

Amazon discontinued the Alexa ranking service in May 2022, so the rank we use is a snapshot from May 2022.

Number of photos

This value allows users to rank results based on the number of photos on the page. The parameter value is media.images.count.

Number of videos

This value allows users to rank results based on the number of videos on the page. The parameter value is media.videos.count.

Keyword boosting

When making a query with multiple keywords, it might be the case that one keyword is more important to your search than others. Boosting enables you to add weight to the more important keyword/keywords so that results mentioning these keywords are given a “boost” to get them higher in the order of the results.

For example, searching ["John", "Frank", "Sarah"] gives equal weight to each term, but ["John", "Frank"^2, "Sarah"] is like saying a mention of “Frank” is twice as important as a mention of “John” or “Sarah”. Stories mentioning “Frank” will, therefore, appear higher in the rank of search results.

Boosting does not guarantee that stories mentioning a boosted keyword come first. It simply lets you mark the keywords in a list that should carry more weight (i.e. a story that contains many mentions of non-boosted keywords could still be returned ahead of stories that mention a boosted keyword). Boosting, therefore, does not exclude stories from the results. It only affects the order of returned results.

Boosting only works in a search when there is more than one keyword, as it boosts the weight of a keyword compared to the other keywords being searched.

Sorting direction

Each of the parameters above can sort results by ascending or descending value. This is achieved by entering either asc or desc as a value of the sort_direction parameter. If this parameter is not declared, results will be returned in descending order.
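The sorting options described in this section can be sketched as a small validator in Python. The helper name is hypothetical, the list of values is taken from the descriptions above, and accepting a two-letter country code in either case is an assumption.

```python
import re

# sort_by values described in this section.
FIXED_SORT_VALUES = {
    "relevance",
    "recency",
    "published_at",
    "source.rankings.alexa.rank",
    "media.images.count",
    "media.videos.count",
}
# National ranking: source.rankings.alexa.rank.{country}, ISO 3166-1 alpha-2 code.
NATIONAL_RANK = re.compile(r"^source\.rankings\.alexa\.rank\.[A-Za-z]{2}$")


def sort_params(sort_by, sort_direction="desc"):
    """Return sort parameters; the direction defaults to descending."""
    if sort_by not in FIXED_SORT_VALUES and not NATIONAL_RANK.match(sort_by):
        raise ValueError(f"unsupported sort_by value: {sort_by!r}")
    if sort_direction not in ("asc", "desc"):
        raise ValueError("sort_direction must be 'asc' or 'desc'")
    return {"sort_by": sort_by, "sort_direction": sort_direction}


most_images_first = sort_params("media.images.count")
national_rank_ascending = sort_params("source.rankings.alexa.rank.IE", "asc")
print(most_images_first)
print(national_rank_ascending)
```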

Endpoints

/clusters

/stories

Parameters

Parameters	Values	Description
sort_by	“string”	Specify the parameter by which your results will be sorted. The accepted values are `story_count`, `earliest_story`, `latest_story` and `time`.
sort_direction	(asc OR desc)	Specify the sort direction of your results. The accepted values are `asc` and `desc`.

Common workflows

Sorting results

Trends

Quantexa News API’s Trends endpoint allows you to identify the most frequent values for categorical attributes contained in stories, e.g., most frequent entities, concepts, or keywords. This endpoint allows you to set parameters like a time period, a subject category, or an entity, and it will return the most mentioned entities or keywords that are mentioned in relation to your query.

As with the Timeseries endpoint, rather than simply reviewing granular stories, you may be interested in seeing themes and patterns over time that aren't immediately apparent when looking at individual documents. The Trends endpoint allows you to see the most frequently recurring entities, concepts or keywords that appear in articles that meet your search criteria. This enables you to generalize the data and make high-level assertions about content.

For more information on Trends, please read the API Endpoints section of our documentation.

Endpoints

/trends

Parameters

Parameters	Values	Description
field	“string”	Specify the y-axis variable for the histogram.

Common workflows

Trends

Pagination

Quantexa News API returns up to 100 stories per call/page. This workflow shows you how to use the cursor to chain multiple calls together and retrieve more than 100 stories at a time.

Fetching a large number of sorted results: Cursor

The API supports using a cursor to scan through results. In the API, a cursor is a logical concept that doesn't cache any state information on the server. Instead, the sort values of the last story returned to the client are used to compute a next_page_cursor, representing a logical point in the ordered space of sort values. That next_page_cursor can be specified in the parameters of subsequent requests to tell the API where to continue.

Cursor

To use a cursor with the API, specify a cursor parameter with the value of *. This is the same as declaring page=1 to tell the API "start at the beginning of my sorted results," except it also informs the API that you want to use a cursor. The default value of the cursor is * unless you specify otherwise. In addition to returning the top N sorted results (where you can control N using the per_page parameter), the API response will also include an encoded String named next_page_cursor.

You then take the next_page_cursor String value from the response and pass it back to the API as the cursor parameter for your next request. You can repeat this process until you've fetched as many stories as you want or until the returned next_page_cursor matches the cursor you've already specified, indicating that there are no more results.
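The loop described above can be sketched in Python as follows. To keep the example runnable without network access, the request itself is abstracted behind a fetch_page callable and exercised with an in-memory stand-in. The function names, the "stories" response key, and the extra stop on an empty page are assumptions made for this illustration; only the cursor and next_page_cursor handling comes from the description above.

```python
def fetch_all(fetch_page, per_page=100, max_stories=None):
    """Follow next_page_cursor until it stops advancing or max_stories is reached.

    fetch_page(cursor, per_page) must return a dict with the keys
    "stories" (assumed name) and "next_page_cursor".
    """
    stories = []
    cursor = "*"  # start at the beginning of the sorted results
    while True:
        page = fetch_page(cursor, per_page)
        stories.extend(page["stories"])
        if max_stories is not None and len(stories) >= max_stories:
            return stories[:max_stories]
        next_cursor = page["next_page_cursor"]
        # Same cursor back (or an empty page) means there is nothing left.
        if next_cursor == cursor or not page["stories"]:
            return stories
        cursor = next_cursor


# In-memory stand-in for the API, for illustration only.
DATA = [{"id": i} for i in range(250)]


def fake_fetch_page(cursor, per_page):
    start = 0 if cursor == "*" else int(cursor)
    chunk = DATA[start:start + per_page]
    next_cursor = str(start + len(chunk)) if chunk else cursor
    return {"stories": chunk, "next_page_cursor": next_cursor}


everything = fetch_all(fake_fetch_page)
first_120 = fetch_all(fake_fetch_page, max_stories=120)
print(len(everything), len(first_120))
```

In real use, fetch_page would issue the HTTP request with the cursor and per_page parameters and return the parsed JSON response.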

Per Page Attribute

The API supports a per_page parameter to specify the maximum number of stories per page. This parameter is used to paginate results from a query. Its value must be between 1 and 100. The default value is 10 if the per_page parameter is not passed in the query.

Endpoints

/clusters

/stories

/related_stories

/autocompletes

Parameters

Parameters	Values	Description
cursor	(“*” OR “hash string”)	Chain your requests in calls that return more than 100 clusters by supplying the `next_page_cursor` response to your next call. For more information about using the cursor with the News API, take a look at the documentation page.
per_page	int	Specify the maximum number of clusters to be returned by your query. The maximum value is 100. For more than 100 results, use `cursor`.

Common workflows

Pagination of results

Response control

The response from the Quantexa News API consists of 26 data enrichments nested in JSON objects. For some queries, not all of the 26 enrichments will be required, so with the parameter return[], you can specify only the fields you want to return in your API response. For example, if you only want to return entities, use return[]="entities".
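As a rough Python sketch, a return[] selection could be turned into a query string like this. Repeating the return[] key once per field is an assumption about how array parameters are serialized, the helper name is hypothetical, and the allowed field names are copied from the parameter table in this section.

```python
from urllib.parse import urlencode

# Field names listed for return[] in the parameter table of this section.
ALLOWED_FIELDS = {
    "id", "title", "body", "summary", "source", "author", "entities",
    "keywords", "hashtags", "characters_count", "words_count",
    "sentences_count", "paragraphs_count", "categories", "media",
    "sentiment", "language", "published_at", "links",
}


def return_fields_query(fields):
    """Encode a list of fields as repeated return[] query parameters."""
    unknown = [f for f in fields if f not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"unknown return fields: {unknown}")
    return urlencode([("return[]", f) for f in fields])


fields_query = return_fields_query(["id", "title", "entities"])
print(fields_query)
```

The square brackets are percent-encoded (%5B%5D) by urlencode, which is the usual way to send them in a URL.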

Endpoints

/stories

/related_stories

Parameters

Parameters	Values	Description
return[]	['id', 'title', 'body', 'summary', 'source', 'author', 'entities', 'keywords', 'hashtags', 'characters_count', 'words_count', 'sentences_count', 'paragraphs_count', 'categories', 'media', 'sentiment', 'language', 'published_at', 'links']	Specify and limit the return fields on the response object.

Common workflows

Response objects