Parameters by theme


Datetime

This section describes how time stamps are composed and gives practical examples of searching for content by news articles' time indexation.

Date Formatting

The format used is a restricted form of the canonical representation of DateTime in the XML Schema specification (ISO 8601):

YYYY-MM-DDThh:mm:ssZ

  • YYYY is the year.
  • MM is the month.
  • DD is the day of the month.
  • T is a literal 'T' character that indicates the beginning of the time string.
  • hh is the hour of the day as on a 24-hour clock.
  • mm is minutes.
  • ss is seconds.
  • Z is a literal 'Z' character, indicating that this string representation of the date is in UTC.

No time zone can be specified; string representations of dates are always expressed in Coordinated Universal Time (UTC). Example value:

2022-03-27T13:47:26Z

You can optionally include fractional seconds, although any precision beyond milliseconds is ignored. Examples of values with fractional seconds:

  • 2016-03-27T13:47:26.822Z
  • 2016-03-27T13:47:26.82Z
  • 2016-03-27T13:47:26.8Z
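
As a minimal sketch of producing this format on the client side, the Python standard library can render a timezone-aware datetime as a UTC string. The helper name to_api_timestamp is hypothetical and not part of the News API.

```python
from datetime import datetime, timedelta, timezone


def to_api_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if moment.tzinfo is None:
        # A naive datetime has no zone, so converting it to UTC would be a guess.
        raise ValueError("expected a timezone-aware datetime")
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time two hours ahead of UTC is converted before formatting.
local = datetime(2022, 3, 27, 15, 47, 26, tzinfo=timezone(timedelta(hours=2)))
print(to_api_timestamp(local))  # 2022-03-27T13:47:26Z
```

Rejecting naive datetimes is a design choice of this sketch: it avoids silently treating local wall-clock time as UTC.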

Date Math

The date field types also support date math expressions, which makes it easy to create times relative to fixed moments in time, including the current time, which can be represented using the special value of "NOW".

Date Math Syntax

Date math expressions can do two things: specify a point in time by adding time units to, or subtracting them from, the current time, and round the time down to a specified unit. Expressions can be chained and are evaluated left to right.

This represents a point in time two months from now:

NOW+2MONTHS

This is one day ago:

NOW-1DAY

A slash is used to indicate rounding. Below is a point in time yesterday, rounded to the previous hour. If the current time is 15:42:17.2165, the point below is 15:00:00.0000 yesterday:

NOW-1DAY/HOUR

Below is the start of yesterday (00:00:00.0000):

NOW-1DAY/DAY

All the math expressions will work to the millisecond precision.
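
To preview what an expression resolves to, the rules above can be approximated locally. The sketch below is an illustration only, not the API's implementation: the function name evaluate_date_math is hypothetical, and clamping the day when a month shift lands on a shorter month is an assumption. The server's own evaluation remains authoritative.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

# One step of an expression: "+2MONTHS", "-1DAY" or "/HOUR" (plural forms allowed).
_STEP = re.compile(r"([+-]\d+|/)(YEAR|MONTH|DAY|HOUR|MINUTE|SECOND)S?")

# Fields reset when rounding down to each unit.
_FLOOR = {
    "YEAR": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "MONTH": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "DAY": dict(hour=0, minute=0, second=0, microsecond=0),
    "HOUR": dict(minute=0, second=0, microsecond=0),
    "MINUTE": dict(second=0, microsecond=0),
    "SECOND": dict(microsecond=0),
}


def _add_months(moment: datetime, months: int) -> datetime:
    # Assumption: clamp the day to the length of the target month.
    index = moment.year * 12 + (moment.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def evaluate_date_math(expression: str, now: datetime) -> datetime:
    """Evaluate an expression such as NOW-1DAY/HOUR, left to right."""
    if not expression.startswith("NOW"):
        raise ValueError("this sketch only handles expressions starting with NOW")
    rest, pos, result = expression[3:], 0, now
    while pos < len(rest):
        step = _STEP.match(rest, pos)
        if step is None:
            raise ValueError(f"cannot parse {rest[pos:]!r}")
        operator, unit = step.groups()
        if operator == "/":
            result = result.replace(**_FLOOR[unit])
        elif unit == "YEAR":
            result = _add_months(result, 12 * int(operator))
        elif unit == "MONTH":
            result = _add_months(result, int(operator))
        else:
            result += timedelta(**{unit.lower() + "s": int(operator)})
        pos = step.end()
    return result


now = datetime(2022, 3, 27, 15, 42, 17, 216000, tzinfo=timezone.utc)
print(evaluate_date_math("NOW-1DAY/HOUR", now).isoformat())  # 2022-03-26T15:00:00+00:00
```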

Date Math Keywords

The supported keywords in Date Math:

Date keywords
NOW
YEAR
MONTH
DAY
HOUR
MINUTE
SECOND

Note: All the keywords above can be passed in their plural form: YEARS, MONTHS, DAYS, HOURS, MINUTES, SECONDS.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter Values Description
published_at [* TO *] Find stories whose published at time is in the specified range.
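
A range value in the [* TO *] form can be assembled from fixed time stamps, date math expressions, or the open-ended "*". The following is a small sketch with a hypothetical helper, published_at_range. How the resulting value is attached to a request is not shown in this section and is left out here.

```python
import re

# Matches the restricted time stamp format, with optional fractional seconds.
_TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z")


def _check_bound(bound: str) -> str:
    """Accept '*', a date math expression starting with NOW, or a UTC time stamp."""
    if bound == "*" or bound.startswith("NOW") or _TIMESTAMP.fullmatch(bound):
        return bound
    raise ValueError(f"unsupported range bound: {bound!r}")


def published_at_range(start: str = "*", end: str = "*") -> str:
    """Build a range value such as [NOW-7DAYS/DAY TO NOW]."""
    return f"[{_check_bound(start)} TO {_check_bound(end)}]"


print(published_at_range("NOW-7DAYS/DAY", "NOW"))  # [NOW-7DAYS/DAY TO NOW]
print(published_at_range("2022-03-27T13:47:26Z"))  # [2022-03-27T13:47:26Z TO *]
```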

Common workflows

Time indexation

Sorting results


Languages

The News API offers content in 16 languages. By filtering on the language parameter, you can find articles in specific languages.

Language is a mandatory field, so the language field of all articles available on News API is populated with the prediction made by the language model.

Always supply a language parameter in your search, no matter how many languages you want to search. When no language is passed, the search defaults to all languages available to your account plan, but without the necessary language-specific filters such as stop word removal and stemming. This could result in your search not retrieving all of the stories relevant to your query.

The supported languages and their language codes are listed below. Multilingual support requires an upgraded license key. Contact sales to upgrade your account.

Translations

All articles written in non-English languages available on the News API also have an English translation, found in the response field translations. All news stories available through the API are translated by an in-house, proprietary machine translation (MT) model. This model is optimized for speed whilst maintaining high-quality translations across diverse source languages. It is a transformer-based neural machine translation (NMT) model, trained on hundreds of millions of examples. An example of part of a response from the stories endpoint, including translations:

{
  "title": "original title",
  "body": "original body",
  "language": "<non-English language code>",

  "<...>": "<...>",

  "translations": {
    "en": {
      "body": "translated body",
      "title": "translated title"
    }
  }
}
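
Given a story shaped like the response above, the English text can be read from translations.en, falling back to the original fields for English-language stories. This is a sketch over a plain dictionary; the helper name english_text is hypothetical, and the assumption that English stories carry no translations object is not confirmed by this section.

```python
def english_text(story: dict) -> dict:
    """Return the English title and body of a story dictionary."""
    if story.get("language") == "en":
        # Assumption: English stories are read from the original fields.
        return {"title": story.get("title"), "body": story.get("body")}
    english = story.get("translations", {}).get("en")
    if english is None:
        raise KeyError("story has no English translation")
    return {"title": english.get("title"), "body": english.get("body")}


story = {
    "title": "original title",
    "body": "original body",
    "language": "fr",
    "translations": {"en": {"body": "translated body", "title": "translated title"}},
}
print(english_text(story))  # {'title': 'translated title', 'body': 'translated body'}
```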

Analyzing multilingual content

All news articles accessible via News API are available in their native language text and machine-translated English text.

All the translated content benefits from all of the enrichment features supported by the News API. Most NLP enrichments are performed on native English or translated English text.

In the case of article summarisation, this is conducted in the original text of some languages and in the translated text of others. See below:

Language Text Used
en, de, fr, it, es, pt Original text
ar, da, fi, nl, fa, ru, sv, tr, zh-cn, zh-tw Translated text (en)

Endpoints

/clusters

/stories

/time_series

/trends

/histograms

Parameters

Parameter Values Description
language ("string", "string", ... "string") Specify the language. It supports ISO 639-1 language codes.

Common workflows

Languages


Keywords

These parameters enable you to perform keyword searches in Quantexa News API and retrieve stories that contain specific words or phrases.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter Values Description
title ("string", "string", ... "string") Find stories whose title contains a specific keyword. It supports advanced search operators.
body ("string", "string", ... "string") Find stories whose body contains a specific keyword. It supports advanced search operators.
text ("string", "string", ... "string") Find stories whose title and body both contain a specific keyword. It supports advanced search operators.
translations.{language}.title ("string", "string", ... "string") Filter stories translated from a non-English language containing an English-language term in the title.
translations.{language}.body ("string", "string", ... "string") Filter stories translated from a non-English language containing an English-language query term in the body.

Common workflows

Proximity search

Boolean operators

Excluding operator


Entities

What is an entity?

An entity is a real-world thing that is mentioned in a story and then tagged with metadata by Quantexa News API so users can build an accurate picture of what is being talked about in news content. The following data points are applied to each entity:

  • The surface form(s) - the text in the story that mentions the entity.

  • The type of entity it is.

  • A Wikipedia link to that entity's Wikipedia page - if applicable.

  • A Wikidata link to that entity's Wikidata page - if applicable.

  • The sentiment expressed towards it in the story.

  • The indices of the surface forms - the index of the mention(s) of the entity in the story.

  • The prominence of the entity - how prominent the entity is in an article.

  • The frequency of the entity - the number of mentions the entity has in an article.

Why use entities instead of keywords?

Keywords can refer to multiple things, and things can be referred to by multiple keywords. Quantexa News API's Entities recognises and disambiguates real-world people, companies, and things that are mentioned in the news, going beyond keywords to provide far more accurate news analytics data.

Using Entities has two high-level benefits when building your search:

First, when multiple different keywords commonly refer to a single entity, Quantexa News API correctly recognises the entity in each mention. For example, take a look at how the News API recognises the entity “MetLife,” even when different names for the company are mentioned:

Sample 1: "Shares in Metropolitan Life Insurance fell sharply this morning." Sample 2: "MetLife announces new insurance offerings."
Surface Form "Metropolitan Life Insurance" "MetLife"
Entity Name MetLife MetLife
Entity Type Business, Organization Business, Organization
Wikipedia URL MetLife MetLife

Second, the entities model disambiguates mentions for you: when a single keyword can refer to multiple entities, the News API will consider the rest of the document to make an accurate prediction about which thing is being referred to. As an example, take the following two sentences mentioning the keyword “square” and see how the News API will recognise each as a different entity and how it returns some key information:

Sample 1: “Protests commenced in the town square.” Sample 2: “Square was founded by Jack Dorsey.”
Surface Form "square" "Square"
Entity Name square Square Inc.
Entity Type Business, Organization Organization
Wikipedia URL None Square Inc.
Wikidata URL None Square Inc.

Entity types

Just as Square and MetLife above have "Business" and "Organization" as their entity types, every entity recognized by Quantexa News API has a type.

These types can be extracted from Wikipedia & Wikidata, or, where that is not applicable, they can be predicted on the fly by the News API. For example, the New York Stock Exchange entity has the type "Stock_exchange," extracted from Wikipedia & Wikidata, but the entity "Jeremy Draper" has the type "Human", even though it doesn't have a Wikipedia page.

There are two ways to use entity types to build intelligent searches - simple searches and enhanced searches.

Entity types are structured in a parent/child relationship, with almost all top-level entity types having child entity types.

Top-level entity types

Currency Location Human Organization
Product_(business) Profession Technology Risk
Retail Regulation_(European_Union)

Child entity types of organisation type:

Advocacy_group Bank Bank_holding_company Brick_and_mortar
Business Certificate_authority Civil_service Commercial_bank
Community Company Conglomerate_(company) Conservation_authority_(Ontario,_Canada)
Consumer_organization Corporate_group Corporation Credit_bureau
Deliberative_assembly Educational_organization Emergency_service Environmental_organization
Financial_institution Government Holding_company Investment_banking
Investment_company Law_commission Law_enforcement_organization Local_federation
Local_government National_research_and_education_network Newspaper Nonprofit_organization
Parlement Political_organisation Private-equity_firm Privately_held_company
Public_company Ruling_party Social_movement_organization Standards_organization
Stock_exchange Subsidiary Technology_company Think_tank

Child entity types of geographic location type:

City Country Location Island_country
Sovereign_state State_(polity) U.S._state

Child entity types of risk type:

Business risks Endangerment External risk Financial risk
Operational_risk Vulnerability

Child entity types of product type:

Software Software_as_a_service Stock_market_index

Quantexa News API allows enhanced queries on the entity object, enabling you to specify multiple conditions for a single entity to meet. For example, you can specify stories that mention an entity "Square" when that entity is also of the type "Organization", or you can return stories where a specific entity was mentioned in a negative tone.

This is done by supplying a nested query to the aql parameter. The parameter accepts entities:{{ }} as a value, inside which you list, in a Lucene-based syntax, the conditions an entity must meet. You can use the parameters below.

AQL Parameter Description
element The part of the story the entity should be mentioned in (accepts "title" or "body").
surface_forms A specific form of the entity ("Apple" OR "Apple Computer").
id The entity's ID on the Knowledge Base.
links.wikipedia The entity's Wikipedia link.
links.wikidata The entity's Wikidata link.
sentiment The sentiment expressed about the entity.
stock_ticker The entity's stock ticker.
overall_prominence The prominence of the entity within an article. A value ranging from 0 to 1, where 0 indicates no article prominence and 1 indicates very high article prominence.
frequency The number of times an entity is mentioned in an article title or body. It should be used in conjunction with the element parameter.
overall_frequency The number of times an entity is mentioned in an article.
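
As an illustration of the nested query described above, the sketch below assembles an entities:{{ }} value from field and value pairs. The helper name entities_clause is hypothetical, and joining the conditions with AND is an assumption based on the Lucene-style syntax; the value formats follow the patterns shown in this section.

```python
from typing import Mapping


def entities_clause(conditions: Mapping[str, str]) -> str:
    """Combine field:value conditions into a single entities:{{ }} value."""
    if not conditions:
        raise ValueError("at least one condition is required")
    # Assumption: conditions on the same entity are joined with AND.
    body = " AND ".join(f"{field}:{value}" for field, value in conditions.items())
    return "entities:{{" + body + "}}"


aql = entities_clause({
    "surface_forms.text": '("Square")',
    "element": "title",
    "overall_prominence": "[0.6 TO *]",
})
print(aql)
# entities:{{surface_forms.text:("Square") AND element:title AND overall_prominence:[0.6 TO *]}}
```

Keeping all conditions inside one pair of double braces is what ties them to the same entity, rather than to any entity in the story.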

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter Subfield Values Description
entities id {{id:("string", "string", ... "string")}} Find stories based on the specified entities' id in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities links.wikipedia {{links.wikipedia:"url string"}} Find stories based on the specified entities' Wikipedia URL in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities stock_ticker {{stock_ticker:"string"}} Find stories based on the specified stock tickers of entities in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities surface_forms.text {{surface_forms.text:("string")}} Find stories based on the specified entities' surface form text in stories. When querying for entities, we recommend that users search using nested queries. You can learn more about these queries on the Common Workflows page.
entities overall_prominence {{overall_prominence:[* TO *]}} Find stories based on the prominence of the entity in the article.
entities element {{element:(title OR body)}} Specify whether the entity search is to be performed on the title or body of the article.

Common workflows

Entities

Locations


Categories taxonomy

All articles on Quantexa News API are enriched with categories and industry taxonomy tags. The tagged content helps to categorize and organize large volumes of content, making it easier to search, filter, and understand the information within the documents.

The classifiers are capable of classifying content into four taxonomies. The complete list of taxonomies included with Quantexa News API is outlined in the table below.

Taxonomy Supported Languages Number of classes Levels of depth Commonly used for Taxonomy ID
Smart Tagger (Aylien categories) en 2998 6 News articles, Blog posts aylien
Smart Tagger (Industries) en 1496 4 News articles, Blog posts industries
IPTC Subject Codes en 1400 3 News articles, Blog posts iptc-subjectcode
IAB QAG en 392 2 Websites, Advertisement iab-qag

The supported taxonomies are made up of categories and subcategories, or parent categories and child categories, for example, with Football being a child category of Sport.

Each taxonomy is standardised into a tree-like structure, allowing you to traverse from child to parent categories recursively.

Taxonomy lifecycle

Taxonomy tags for categories and industries are never removed from our catalogue and taxonomy models.

Over time, new tags are added and appended to the current catalogue.

Smart Tagger

The Smart Tagger is Quantexa News API’s most granular article tagging feature that leverages state-of-the-art classification models built using a vast collection of manually tagged news articles based on domain-specific industry and topical category taxonomies. With a taxonomy of ~3000 topical categories and ~1500 industries, Smart Tagger classifies articles with high precision, making it easier for users to filter for articles most relevant to them and their use cases. Tags can be passed as IDs or labels.

Smart Tagger Categories

Smart Tagger’s Categories taxonomy is a classification system that helps categorize news articles into broader thematic areas. It allows users to classify news stories into high-level topics such as business, politics, and technology, as well as multiple sub-categories in the parent/child hierarchy. This taxonomy serves as a useful tool for organizing and filtering news articles based on their granular subject matter, enabling easier access to relevant information within specific content categories.

Examples of Smart Tagger Categories in articles:

{'categories': [{...,
                 'score': 0.39},
                {'id': 'ay.sports',
                 'label': 'Sports',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports'},
                 'score': 1},
                {'id': 'ay.sports.rugby',
                 'label': 'Rugby',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.team'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.rugby'},
                 'score': 1},
                {'id': 'ay.sports.team',
                 'label': 'Team Sports',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.sports.team'},
                 'score': 1}]}
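
The links.parents URLs in the example above can be followed to walk from a tag up to its top-level category. The sketch below does this over the tags returned with a single story, using the values from the example; the helper name ancestors is hypothetical, and it assumes each tag's parents are present in the same list.

```python
BASE = "https://api.aylien.com/api/v1/classify/taxonomy/aylien/"

categories = [
    {"id": "ay.sports", "label": "Sports",
     "links": {"self": BASE + "ay.sports"}, "score": 1},
    {"id": "ay.sports.rugby", "label": "Rugby",
     "links": {"parents": [BASE + "ay.sports.team"], "self": BASE + "ay.sports.rugby"},
     "score": 1},
    {"id": "ay.sports.team", "label": "Team Sports",
     "links": {"parents": [BASE + "ay.sports"], "self": BASE + "ay.sports.team"},
     "score": 1},
]


def ancestors(tag_id: str, tags: list) -> list:
    """Return the IDs of a tag's parents, nearest first, using links.parents."""
    by_id = {tag["id"]: tag for tag in tags}
    by_url = {tag["links"]["self"]: tag for tag in tags}
    chain, current = [], by_id[tag_id]
    while True:
        parent_urls = current["links"].get("parents", [])
        # Follow the first parent that is present among the story's tags.
        parent = next((by_url[url] for url in parent_urls if url in by_url), None)
        if parent is None or parent["id"] in chain:
            return chain
        chain.append(parent["id"])
        current = parent


print(ancestors("ay.sports.rugby", categories))  # ['ay.sports.team', 'ay.sports']
```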

Smart Tagger Industries

Smart Tagger’s Industries taxonomy is a classification system designed to categorize news articles and content into various industry-related topics. This taxonomy aids in organizing and retrieving news content relevant to particular sectors, making it a valuable tool for those who are looking to access and analyze news data within specific industry contexts.

Examples of Smart Tagger Industries in articles:

{'industries': [{'id': 'in.tech',
                 'label': 'Technology',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech'},
                 'score': 0.7},
                {'id': 'in.tech.appsoft',
                 'label': 'Application Software',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.software'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.appsoft'},
                 'score': 0.7},
                {'id': 'in.tech.software',
                 'label': 'Software',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/industries/in.tech.software'},
                 'score': 0.7}]}

Score

The score is the level of confidence for the taxonomy prediction. Each time an article is linked to a taxonomy tag, a score between 0 and 1 is applied to that tagging, indicating how relevant the industry or category is to the article. The higher the score, the more relevant the tag is to the article.

Taxonomy hierarchy

Documents tagged with a child node are tagged with all parent nodes, too.

Searching for a top-level node, for example the label "Adverse Events", returns articles tagged with this label and articles tagged with its child nodes.

IPTC Taxonomy

The IPTC (International Press Telecommunications Council) taxonomy is a standardized classification system used in the media industry to tag and categorize news content. It consists of predefined categories, such as subjects, events, and genres, that help news organizations and content providers uniformly label their articles and images with relevant metadata.

Examples of IPTC taxonomy in articles:

{'categories': [{...},
                {'id': '01000000',
                 'label': 'arts, culture and entertainment',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01000000'},
                 'score': 0.58},
                {'id': '01005000',
                 'label': 'cinema',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01000000'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005000'},
                 'score': 0.58},
                {'id': '01005001',
                 'label': 'film festival',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005000'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iptc-subjectcode/01005001'},
                 'score': 0.58},
                {'id': 'ay.culture',
                 'label': 'Culture, Entertainment and the Arts',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture'},
                 'score': 1},
                {'id': 'ay.culture.film',
                 'label': 'Film',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/aylien/ay.culture.film'},
                 'score': 1}]}

IAB Taxonomy

The IAB (Interactive Advertising Bureau) taxonomy is a standardized system used in the digital advertising industry to categorize online content, advertising campaigns, and audience segments. It provides a structured framework for classifying digital advertising assets and defining audience interests and demographics.

Examples of IAB taxonomy in articles:

{'categories': [{'id': 'IAB1',
                 'label': 'Arts & Entertainment',
                 'links': {'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1'},
                 'score': 0.21},
                {'id': 'IAB1-5',
                 'label': 'Movies',
                 'links': {'parents': ['https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1'],
                           'self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB1-5'},
                 'score': 0.36},
                {...}]}

These parameters enable you to find articles categorized by Quantexa News API’s category taxonomies, including our proprietary Smart Tagger, which includes over 3,000 topical categories and 1,500 industries. We also support IPTC and IAB category taxonomies. For more information on how to use our category taxonomies, refer to Common Workflows.

Endpoints

/clusters

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter Subfield Values Description
categories taxonomy {{taxonomy:(aylien OR iptc-subjectcode OR iab-qag)}} Define the type of taxonomy for the rest of the categories query. Available for our standard IPTC and IAB category taxonomies.
industries This parameter is not available in flat search. However, you can filter for articles containing specific industries using AQL.
categories id {{id:("string", "string", ... "string")}} Find stories by categories id. Available for standard IPTC and IAB category taxonomies.
categories label {{label:("string", "string", ... "string")}} Find stories by categories label - An alternative to the ID.
categories score {{score:[* TO *]}} Filter by the confidence score of the model prediction.

Common workflows

Category and Industry taxonomies


Sources

Quantexa News API aggregates and enriches news content from approximately 90,000 global sources.

Every article comes with its source metadata, which enables you to find articles from specific sources or filter out noise and irrelevant sources.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameter Values Description
source.id (int, int, int, ...) Filter stories from publisher sources whose ID is one of the specified values.
source.name ("string", "string", "string", ...) Filter stories from publisher sources whose name is one of the specified values.
source.domain ("string", "string", "string", ...) Filter stories from publisher sources whose website domain is one of the specified values.
source.locations.country ("string", "string", "string", ...) Filter stories from publisher sources located or headquartered in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.locations.state ("string", "string", "string", ...) Filter stories from publisher sources located or headquartered in the specified states/provinces.
source.locations.city ("string", "string", "string", ...) Filter stories from publisher sources located or headquartered in the specified cities.
source.scopes.country ("string", "string", "string", ...) Filter stories from publisher sources whose scopes are in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.scopes.state ("string", "string", "string", ...) Filter stories from publisher sources whose scopes are in the specified states/provinces.
source.scopes.city ("string", "string", "string", ...) Filter stories from publisher sources whose scopes are in the specified cities.
source.scopes.level ("international", "national", "local") Filter stories from publisher sources whose scope is at one of the specified levels.
Available values: international, national, local
source.rankings.alexa.country ("string", "string", "string", ...) Filter stories from publisher sources that have an Alexa rank in the specified countries. It supports ISO 3166-1 alpha-2 country codes.
source.rankings.alexa.rank [0 TO *] Filter stories from publisher sources whose Alexa rank is in the specified range.

Common workflows

Locations

Website traffic rank


Authors

Stories can be found in Quantexa News API based on the author. You can search for the author using either the author ID or the author’s name.

Note: The author metadata is not always populated. This is because either there's no associated author mentioned, or it’s not explicit on the page.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters Values Description
author.id (int, int, ... int) Filter content created by a specific individual or a set of authors by providing the author's unique identifier ID.
author.name ("string", "string", ... "string") Filter content created by a specific individual or a set of authors by providing the author's name.

Common workflows

Authors


Clusters

Quantexa News API provides access to millions of news stories from over 90,000 sources across the world. Clustering groups these stories into clusters based on the real-world events they represent.

A cluster is a collection of news stories that all refer to the same real-world event. For example, multiple stories referring to a specific company’s earnings will appear in the same cluster, just as multiple stories about a single road accident will.

Lifecycle of a cluster

A newly published story is compared with representative stories of other clusters to see if a cluster already exists for this event. This is achieved by converting the new story’s body to a vector, which is compared against the other story vectors. If the story is found to be similar to other stories (i.e. having a small vector distance), it will be added to that cluster. If the new story is not similar to any existing stories (i.e. having a large vector distance), a new cluster is created containing that single story. Thus, a new cluster is born and subsequently grows with every new, related story that is published. This process happens in real-time, i.e. within an average of 15 minutes of Quantexa News API receiving the story from the publisher.
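
The assignment step described above can be sketched as a nearest-representative comparison. This is a simplified illustration and not the production method: the use of cosine distance, the threshold of 0.3, the choice of the first story as a cluster's representative, and all names are assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


@dataclass
class Cluster:
    representative: Sequence[float]  # vector of the representative story
    story_ids: List[int] = field(default_factory=list)


def assign(story_id: int, vector: Sequence[float], clusters: List[Cluster],
           threshold: float = 0.3) -> Cluster:
    """Add the story to the closest cluster, or start a new single-story cluster."""
    best = min(clusters, key=lambda c: cosine_distance(vector, c.representative),
               default=None)
    if best is not None and cosine_distance(vector, best.representative) <= threshold:
        best.story_ids.append(story_id)
        return best
    created = Cluster(representative=vector, story_ids=[story_id])
    clusters.append(created)
    return created


clusters: List[Cluster] = []
assign(1, [1.0, 0.0], clusters)   # first story starts a cluster
assign(2, [0.9, 0.1], clusters)   # small distance: joins the first cluster
assign(3, [0.0, 1.0], clusters)   # large distance: starts a second cluster
print([c.story_ids for c in clusters])  # [[1, 2], [3]]
```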

If no stories have been added to a cluster within a maximum two-week (14 days) period, the cluster is “frozen”. No more stories can be added to the cluster after that point. This cluster lifecycle (creation, growth, decline and freeze) repeats for each and every cluster.

Can an article be in more than one cluster?

No. A story only belongs to the most appropriate, single cluster, and this remains the case through all stages of the cluster lifecycle. A story cannot move from one cluster to another.

Are all stories added to clusters?

No. Stories whose body is shorter than 400 characters are not clusterable. The cluster field on these stories is empty.

The Cluster Object

A cluster object is a type of JSON object that provides a cluster’s ID along with metadata about the stories associated with it.

A cluster has the following properties:

  • Each cluster has a unique ID in the News API

  • A cluster can have one or more stories associated with it

  • A story will always belong to just one cluster.

  • The relationship between the story and cluster does not change - it will not be reassigned to another cluster at a later time.

Example of a JSON response from the clusters endpoint:

{
  "cluster_count": 2042945,
  "clusters": [
    {
      "id": 4992716,
      "time": "2019-07-20T07:16:03Z",
      "story_count": 26488,
      "earliest_story": "2019-07-20T07:16:03Z",
      "latest_story": "2019-08-03T07:41:09Z",
      "representative_story": {
        "id": 21466483,
        "title": "Analysts Offer Predictions for A. O. Smith Corp’s Q3 2019 Earnings (NYSE:AOS)",
        "permalink": "https://www.tickerreport.com/banking-finance/4497288/analysts-offer-predictions-for-a-o-smith-corps-q3-2019-earnings-nyseaos.html",
        "published_at": "2019-08-03T07:15:26Z"
      },
      "location": {
        "country": "US"
      }
    }
  ],
  "next_page_cursor": "<string to use in pagination of results>"
}
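
The time stamps in a cluster object can be used to measure how long a cluster was active and to estimate whether it has been frozen under the 14-day rule described earlier. This is a sketch under stated assumptions: the helper names are hypothetical, and latest_story (a publication time) is used as a stand-in for the time the last story was added to the cluster.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a YYYY-MM-DDThh:mm:ssZ string (no fractional seconds) as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def active_span(cluster: dict) -> timedelta:
    """Time between the earliest and latest story in the cluster."""
    return parse_timestamp(cluster["latest_story"]) - parse_timestamp(cluster["earliest_story"])


def is_probably_frozen(cluster: dict, now: datetime) -> bool:
    """Estimate freezing: no story published in the last 14 days."""
    return now - parse_timestamp(cluster["latest_story"]) >= timedelta(days=14)


cluster = {
    "id": 4992716,
    "earliest_story": "2019-07-20T07:16:03Z",
    "latest_story": "2019-08-03T07:41:09Z",
}
print(active_span(cluster))  # 14 days, 0:25:06
print(is_probably_frozen(cluster, datetime(2019, 8, 10, tzinfo=timezone.utc)))  # False
```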

Endpoints

/clusters

Parameters

Parameters Values Description
id (INT64 OR INT64 OR … OR INT64) Specify clusters by their ID, given as a list of int64 values.
location.country (“string”, “string”, …, “string”) Specify clusters that refer to events in specific countries. It supports ISO 3166-1 alpha-2 country codes.
story_count.max “INT64” Specify the maximum number of stories that retrieved clusters should be associated with.
story_count.min “INT64” Specify the minimum number of stories that retrieved clusters should be associated with. Default value is 2.
time.end "YYYY-MM-DDThh:mm:ssZ" Retrieve clusters for which the associated event’s time is before a specified time stamp.
time.start "YYYY-MM-DDThh:mm:ssZ" Retrieve clusters for which the associated event’s time is after a specified time stamp.
earliest_story.end "YYYY-MM-DDThh:mm:ssZ" Specify clusters whose earliest story was published before a specified time stamp.
earliest_story.start "YYYY-MM-DDThh:mm:ssZ" Specify clusters whose earliest story was published after a specified time stamp.
latest_story.end "YYYY-MM-DDThh:mm:ssZ" Specify clusters whose latest story was published before a specified time stamp.
latest_story.start "YYYY-MM-DDThh:mm:ssZ" Specify clusters whose latest story was published after a specified time stamp.

Common workflows

Clusters


Stories metadata

These parameters enable you to filter metadata from stories, e.g., a single news article or a specific URL, or retrieve specific clusters by their ID.

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters Values Description
id (INT64 OR INT64 OR … OR INT64) Retrieve specific stories by their ID, given as a list of int64 values.
links.permalink “URL string” Find stories with a specified URL.
clusters (INT64 OR INT64 OR … OR INT64) Filter stories associated with a specific cluster (currently accepts one cluster per search). Clustering requires an Advanced or Enterprise license key. Start a free trial or contact sales to upgrade your account.
source.links_in_count.min INT64 Find stories from sources whose links-in count is greater than or equal to the specified value.
source.links_in_count.max INT64 Find stories from sources whose links-in count is less than or equal to the specified value.

Common workflows

Website traffic rank


Sentiment analysis

There are two types of sentiment analysis in Quantexa News API: document-level and entity-level.

All stories in Quantexa News API contain sentiment predictions for:

  • Text in the title and body of the article. This is document-level sentiment analysis.

  • Text in each sentence where entities have been recognised. This is entity-level sentiment analysis.

Fields which apply to both document-level and entity-level sentiment:

Polarity

Polarity is the sentiment category predicted by the model: positive, negative, or neutral.

It can be found in the field "polarity" within the document-level sentiment analysis, or for each mention of the entity within the article under the object "entities".

Score

A sentiment score is the confidence of the prediction made by the sentiment model.

Predictions are made in each article element for positive, negative and neutral sentiments. The prediction with the highest confidence score wins and is populated in the field "score". It ranges from 0 to 1.

Document-level sentiment analysis

Document-level sentiment analysis is the prediction of the sentiment expressed in the title and body of the story. It can be found in the object "sentiment". Example:

{
 <...>
 "sentiment": {"body": {"polarity": "negative", "score": 0.96},
               "title": {"polarity": "positive", "score": 0.45}}
 <...>
}

Entity-level sentiment analysis

Sentiment analysis of the entities recognised in the text is broken down by body and title, but also for each mention of the entity. Each object contains the sentiment polarity, score, frequency and mention index. Take a look at the following example:

{
  <...>
  "entities": {"body": {"sentiment": {"confidence": 0.75, "polarity": "neutral"},
                        "surface_forms": [{"frequency": 5,
                                           "mentions": [{"index": {"end": 5,   "start": 0},   
                                                         "sentiment": {"confidence": 0.77, 
                                                         "polarity": "positive"}},
                                                        {"index": {"end": 365, "start": 360}, 
                                                         "sentiment": {"confidence": 0.91, 
                                                         "polarity": "neutral"}},
                                                        {"index": {"end": 662, "start": 657}, 
                                                         "sentiment": {"confidence": 0.83, 
                                                         "polarity": "positive"}},
                                                        {"index": {"end": 795, "start": 790}, 
                                                         "sentiment": {"confidence": 0.51, 
                                                         "polarity": "neutral"}},
                                                        {"index": {"end": 832, "start": 827}, 
                                                         "sentiment": {"confidence": 0.83, 
                                                         "polarity": "neutral"}}],
                                           "text": "Apple"}]},
               "id": "Q312",
               "links": {"wikidata": "https://www.wikidata.org/wiki/Q312",
                         "wikipedia": "https://en.wikipedia.org/wiki/Apple_Inc."},
               "overall_frequency": 6,
               "overall_prominence": 0.98,
               "overall_sentiment": {"confidence": 0.63, "polarity": "neutral"},
               "stock_tickers": ["AAPL"],
               "title": {"sentiment": {"confidence": 0.51, "polarity": "neutral"},
                         "surface_forms": [{"frequency": 1,
                                            "mentions": [{"index": {"end": 5, "start": 0}, 
                                                          "sentiment": {"confidence": 0.51, 
                                                          "polarity": "neutral"}}],
                                            "text": "Apple"}]},
               "types": ["Business", "Organization"] }
 <...>
}
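
As a rough illustration of reading this structure, the sketch below walks an entity object shaped like the example above and lists every mention with its polarity and confidence. The function name iter_mentions is hypothetical, and the entity literal is a trimmed copy of the example rather than a real response.

```python
from typing import Iterator, Tuple


def iter_mentions(entity: dict) -> Iterator[Tuple[str, str, str, float]]:
    """Yield (element, surface text, polarity, confidence) for every mention."""
    for element in ("title", "body"):
        for form in entity.get(element, {}).get("surface_forms", []):
            for mention in form.get("mentions", []):
                sentiment = mention["sentiment"]
                yield element, form["text"], sentiment["polarity"], sentiment["confidence"]


# Trimmed copy of the entity object shown above (two body mentions only).
entity = {
    "id": "Q312",
    "body": {
        "sentiment": {"confidence": 0.75, "polarity": "neutral"},
        "surface_forms": [{
            "frequency": 5,
            "text": "Apple",
            "mentions": [
                {"index": {"start": 0, "end": 5},
                 "sentiment": {"confidence": 0.77, "polarity": "positive"}},
                {"index": {"start": 360, "end": 365},
                 "sentiment": {"confidence": 0.91, "polarity": "neutral"}},
            ],
        }],
    },
    "title": {
        "sentiment": {"confidence": 0.51, "polarity": "neutral"},
        "surface_forms": [{
            "frequency": 1,
            "text": "Apple",
            "mentions": [
                {"index": {"start": 0, "end": 5},
                 "sentiment": {"confidence": 0.51, "polarity": "neutral"}},
            ],
        }],
    },
}

mentions = list(iter_mentions(entity))
for element, text, polarity, confidence in mentions:
    print(f"{element}: {text!r} -> {polarity} ({confidence})")
```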

Overall sentiment

The overall sentiment is the predominant sentiment towards the entity in the article. It can be found in the field "overall_sentiment" inside the object "entities".

The overall sentiment is calculated as follows. First, the model performs an individual sentiment prediction for each mention of the entity in the article.

Once individual predictions are made, the average confidence score is calculated for each polarity, and there is one overall score per polarity for an entity. The polarity with the highest score is selected as the overall sentiment for that entity in the article. If the scores are tied, the model favours positive over negative and negative over neutral.

The same procedure is applied to the entity's mentions in the title and to its mentions in the body. The model then calculates the overall entity sentiment across both body and title by again selecting the polarity with the higher score.
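
The selection rule can be sketched as follows. This is one possible reading of the description, not the model's actual implementation: it groups mentions by their predicted polarity, averages the confidences in each group, and applies the tie-break order. The API may average scores that are not exposed in the response, so this sketch is not guaranteed to reproduce the overall_sentiment values you receive. The function name and the input data are hypothetical.

```python
from collections import defaultdict

# Tie-break order described above: positive beats negative, negative beats neutral.
TIE_BREAK_PRIORITY = {"positive": 2, "negative": 1, "neutral": 0}


def overall_sentiment(mentions):
    """Average confidence per polarity, then pick the highest average."""
    if not mentions:
        return None
    scores = defaultdict(list)
    for mention in mentions:
        scores[mention["polarity"]].append(mention["confidence"])
    averages = {p: sum(v) / len(v) for p, v in scores.items()}
    # Compare by average first; the priority only matters when averages tie.
    polarity = max(averages, key=lambda p: (averages[p], TIE_BREAK_PRIORITY[p]))
    return {"polarity": polarity, "confidence": round(averages[polarity], 2)}


# Hypothetical per-mention predictions, not taken from a real response.
mentions = [
    {"polarity": "negative", "confidence": 0.9},
    {"polarity": "negative", "confidence": 0.7},
    {"polarity": "neutral", "confidence": 0.6},
]
print(overall_sentiment(mentions))
```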

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters Values Description
sentiment.title.polarity (positive OR neutral OR negative) Find stories whose title sentiment is the specified value.
sentiment.body.polarity (positive OR neutral OR negative) Find stories whose body sentiment is the specified value.

Common workflows

Sentiment analysis


Timeseries

This parameter enables you to query for story volume over a date range.

For more information on Timeseries, please read the API Endpoints section of our documentation.

Endpoints

/time_series

Parameters

Parameters Values Description
period +{int}{time unit} The size of each date range, expressed as an interval to be added to the lower bound. It supports Date Math Syntax. A valid value is a + followed by an integer greater than 0 and one of the Date Math keywords, e.g., +1DAY, +2MINUTES and +1MONTH.
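
A small client-side check of the period format might look like the sketch below. It only verifies the general shape (a plus sign, a positive integer, an upper-case unit) and does not check the unit against the list of Date Math keywords. The function name parse_period is hypothetical.

```python
import re

# "+", a positive integer without leading zeros, then an upper-case unit.
PERIOD_PATTERN = re.compile(r"\+([1-9][0-9]*)([A-Z]+)")


def parse_period(period: str):
    """Split a period such as "+2MINUTES" into (2, "MINUTES")."""
    match = PERIOD_PATTERN.fullmatch(period)
    if match is None:
        raise ValueError(f"invalid period: {period!r}")
    return int(match.group(1)), match.group(2)


for value in ("+1DAY", "+2MINUTES", "+1MONTH"):
    print(value, parse_period(value))
```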

Common workflows

Time indexation


Autocomplete

The News API’s Autocomplete endpoint is a helper endpoint that enables you to find specific entities in our knowledge base, which consists of over 5 million entities, as well as sources based on name or domain from over 90,000 publishers.

Entities

The latest version of Autocomplete allows enhanced queries on the entity object, enabling you to specify multiple conditions for a single entity to meet. For example, you can specify stories that mention an entity by its ID and entity type. The addition of extra metadata helps with the manual disambiguation of entities.

You can pass terms to the Autocomplete endpoints to find entities and their metadata (e.g., ID, Wikipedia links, and type). The Autocomplete endpoints return the best match for the term in the knowledge base.

Sources

Once a source name term or domain is passed to this endpoint, it returns the closest matches to the term or domain searched. This enables you to check which sources are available on the News API inventory or search for source IDs and include them accurately in your queries.

Endpoints

/autocomplete/suggestions/entity-names

/autocomplete/suggestions/sources

/autocomplete/suggestions/entity-types

Parameters

Parameters Compatible paths Values Description
term entity-names, entity-types “string” Find autocomplete objects that contain the specified value.
type_id entity-names “string” Filter by entity type for entity disambiguation, e.g., filter by the organization entity type when looking for Amazon the company.
name_term sources “string” Search the source autocomplete by source name.
domain_term sources “string domain” Search for a source by source domain.
limit entity-names, sources, entity-types INT64 Limit the number of results returned from a request. Defaults to 25 if not specified.
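
As an illustration of how these parameters combine, the sketch below builds a request URL for the entity-names path. The host in BASE_URL is a placeholder, the type_id value "organization" is a hypothetical example, and authentication is left out; only the path and parameter names come from this section.

```python
from urllib.parse import urlencode

# Placeholder host: substitute the base URL from your own account documentation.
BASE_URL = "https://api.example.com"


def entity_suggestions_url(term: str, type_id=None, limit: int = 25) -> str:
    """Build a request URL for /autocomplete/suggestions/entity-names."""
    params = {"term": term, "limit": limit}
    if type_id is not None:
        params["type_id"] = type_id
    return f"{BASE_URL}/autocomplete/suggestions/entity-names?{urlencode(params)}"


# "organization" is a hypothetical type_id value used for disambiguation.
url = entity_suggestions_url("Amazon", type_id="organization", limit=5)
print(url)
```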

Common workflows

Searching for entities with Autocomplete

Searching sources with autocompletes


Histograms

These parameters enable you to return the distribution of articles over a range of values for your specified parameter.

For more information on Histograms, please read the API Endpoints section of our documentation.

Endpoints

/histograms

Parameters

Parameters Values Description
interval.start "YYYY-MM-DDThh:mm:ssZ" Set the start data point of histogram intervals.
interval.end "YYYY-MM-DDThh:mm:ssZ" Set the end data point of histogram intervals.
interval.width “INT64” Set the width of histogram intervals.

Common workflows

Time indexation


Media elements

Many stories contain media such as images or videos as well as text, which can be valuable to some users, depending on what they are building. Some users are only interested in stories with video content to increase click-through rates, whereas other users do not want videos in their results if they are concerned with their end users' loading time, as videos take slightly longer to load on poorer connections.

Quantexa News API allows you to:

  • Specify whether your results should include these media or not

  • Specify the number of images or videos your results should include (this can be an exact number, a range, or a minimum or maximum)

  • Sort your results according to how many images or videos they contain

  • Specify the format of the media in your stories

  • Display quantitative trends in media with the histogram endpoint

Amount of media in stories

Using the media.images.count or media.videos.count parameter, you can specify whether the stories returned by your query should contain media and also the number of images and videos in each story.

  • By setting media.images.count.min to 1, you are specifying that your query only returns stories with at least one image.

  • By setting media.videos.count.min to 1 and media.videos.count.max also to 1, you are specifying that you only want results that contain exactly one video.

  • By setting media.videos.count.max to 0, you are excluding any stories with videos from your results.
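
The count filters above can be written as query parameters along these lines. This is a sketch that only builds the query strings. The third combination is a hypothetical example, and media.images.count.min is assumed to behave like its video counterpart.

```python
from urllib.parse import urlencode

# Stories with exactly one video.
exactly_one_video = {"media.videos.count.min": 1, "media.videos.count.max": 1}

# Stories with no videos at all.
no_videos = {"media.videos.count.max": 0}

# Stories with at least two images and no videos (hypothetical combination).
images_only = {"media.images.count.min": 2, "media.videos.count.max": 0}

for params in (exactly_one_video, no_videos, images_only):
    print(urlencode(params))
```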

Media count

To display stories with more images before stories with fewer images in your results, set the sort_by parameter to media.images.count and the sort_direction parameter to desc. Whenever you use the sort_by parameter, the sort direction defaults to descending order (i.e. stories with the highest value for the parameter are shown first). To reverse this default, set the sort_direction parameter to asc, which sorts the results in ascending order.

Media format & size

It is possible to return or exclude stories that contain images in a specified format by using the media.images.format[] parameter. This can help you avoid any technical issues you foresee with particular formats.

The image formats you can use as a parameter are:

  • BMP

  • GIF

  • JPEG

  • PNG

  • TIFF

  • PSD

  • ICO

  • CUR

  • WEBP

  • SVG

It is also possible to specify maximum and minimum height, width, and content length by appending .min or .max to the following parameters:

media.images.width

media.images.height

media.images.content_length

Endpoints

/stories

/time_series

/related_stories

/trends

/histograms

Parameters

Parameters Values Description
media.images.content_length [* TO *] Find stories whose image content length is greater than or equal to the specified value.
media.images.count [* TO *] Find stories whose number of images is greater than or equal to the specified value.
media.images.height [* TO *] Find stories whose image height is greater than or equal to the specified value.
media.images.width [* TO *] Find stories whose image width is greater than or equal to the specified value.
media.videos.count [* TO *] Find stories whose number of videos is greater than or equal to the specified value.
media.images.format ("string", "string", ... "string") Find stories whose image format is one of the specified values. Available values: BMP, GIF, JPEG, PNG, TIFF, PSD, ICO, CUR, WEBP, SVG.

Common workflows

Media elements


Sorting

You can choose how you want your results to be sorted by using the sort_by parameter. This allows you to retrieve the most relevant results of your query first, with relevance based on a value you choose as a parameter.

The sort_by parameter can take one of the following values:

Relevance

Using the relevance value returns the stories that most closely match your search input. The parameter value is relevance.

Recency

Using the recency value gives a higher rank to stories published most recently while still giving weight to your query.

Published datetime stamp

Using published_at as the value will rank your results based only on how recently your returned stories were published.

Web traffic rank

Website traffic refers to the flow of visitors and users who access a website. It encompasses the visitors who land on a website through various means, such as search engines, social media platforms, direct visits, or referral links.

Alexa is a ranking system that ranks websites based on the volume of traffic they have generated over the previous 3 months. The more traffic a website receives, the higher its ranking. For example, Google has a ranking of 1, BBC has a ranking of around 85, and so on. For more details on how web traffic data works, visit the page Sources and Website traffic rank. Alexa gives two options to users when seeking the ranking of sites:

  • Global ranking - The metric on how popular a website is in the global rankings. Sort results by passing the value source.rankings.alexa.rank on parameter sort_by.

  • National ranking - The metric on how popular a site is in a specific country. This is available for every country in the world and is accessed by adding the ISO 3166-1 alpha-2 country code to the parameter sort_by with the value pattern source.rankings.alexa.rank.{country}.

Note:

  • Not all sources contain the rank metadata, meaning that using this parameter value could narrow your search.

  • Alexa ranking was discontinued by Amazon in May 2022, so the rank we are using is a snapshot from May 2022.

Number of photos

This value allows users to rank results based on the number of photos on the page. The parameter value is media.images.count.

Number of videos

This value allows users to rank results based on the number of videos on the page. The parameter value is media.videos.count.

Keyword boosting

When making a query with multiple keywords, it might be the case that one keyword is more important to your search than others. Boosting enables you to add weight to the more important keyword/keywords so that results mentioning these keywords are given a “boost” to get them higher in the order of the results.

For example, searching ["John", "Frank", "Sarah"] gives equal weight to each term, but ["John", "Frank"^2, "Sarah"] is like saying a mention of “Frank” is twice as important as a mention of “John” or “Sarah”. Stories mentioning “Frank” will, therefore, appear higher in the rank of search results.

Boosting does not filter your search. It simply allows you to mark the more important keywords in a list, so a story that contains many mentions of non-boosted keywords can still be returned ahead of many stories that mention a boosted keyword. Boosting, therefore, does not exclude stories from the results. It only affects the order of returned results.

Boosting only works in a search when there is more than one keyword, as it boosts the weight of a keyword compared to the other keywords being searched.
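
To make the notation concrete, the sketch below renders a mapping of keywords to boost factors in the bracketed form used in the example above. The helper name boosted_keywords is hypothetical, and how the resulting string is passed to the API is not covered here.

```python
def boosted_keywords(keywords: dict) -> str:
    """Render {"keyword": boost} as a list in the notation used above."""
    parts = []
    for keyword, boost in keywords.items():
        term = f'"{keyword}"'
        # A boost of 1 means "no boost", so the ^ suffix is omitted.
        parts.append(term if boost == 1 else f"{term}^{boost}")
    return "[" + ", ".join(parts) + "]"


query = boosted_keywords({"John": 1, "Frank": 2, "Sarah": 1})
print(query)
```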

Sorting direction

Each of the parameters above can sort results by ascending or descending value. This is achieved by entering either asc or desc as a value of the sort_direction parameter. If this parameter is not declared, results will be returned in descending order.
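
A sorted query could be parameterised as in the sketch below, which combines a sort_by value with an explicit sort_direction. The helper name alexa_sort_value and the upper-case country code are assumptions; the value pattern itself is the one described in the Web traffic rank section.

```python
from urllib.parse import urlencode


def alexa_sort_value(country: str = "") -> str:
    """Return the sort_by value for the global or a national ranking."""
    base = "source.rankings.alexa.rank"
    # The national pattern appends an ISO 3166-1 alpha-2 code, e.g. "IE".
    return f"{base}.{country}" if country else base


params = {"sort_by": alexa_sort_value("IE"), "sort_direction": "asc"}
print(urlencode(params))
```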

Endpoints

/clusters

/stories

Parameters

Parameters Values Description
sort_by “string” Specify the parameter by which your results will be sorted. The accepted values are: - story_count - earliest_story - latest_story - time
sort_direction (asc OR desc) Specify the sort direction of your results. The accepted values are asc and desc.

Common workflows

Sorting results


Trends

Quantexa News API’s Trends endpoint allows you to identify the most frequent values for categorical attributes contained in stories, e.g., most frequent entities, concepts, or keywords. You can set parameters such as a time period, a subject category, or an entity, and the endpoint returns the entities or keywords most frequently mentioned in relation to your query.

As with the Timeseries endpoint, rather than simply reviewing granular stories, you may be interested in seeing themes and patterns over time that aren't immediately apparent when looking at individual documents. The Trends endpoint allows you to see the most frequently recurring entities, concepts or keywords that appear in articles that meet your search criteria. This enables you to generalize the data and make high-level assertions about content.

For more information on Trends, please read the API Endpoints section of our documentation.

Endpoints

/trends

Parameters

Parameters Values Description
field “string” Specify the y-axis variable for the histogram.

Common workflows

Trends


Pagination

Quantexa News API returns up to 100 stories per call/page. This workflow shows you how to use the cursor to chain multiple calls together and retrieve more than 100 stories at a time.

Fetching a large number of sorted results: Cursor

The API supports using a cursor to scan through results. In the API, a cursor is a logical concept that doesn't cache any state information on the server. Instead, the sort values of the last story returned to the client are used to compute a next_page_cursor, representing a logical point in the ordered space of sort values. That next_page_cursor can be specified in the parameters of subsequent requests to tell the API where to continue.

Cursor

To use a cursor with the API, specify a cursor parameter with the value of *. This is the same as declaring page=1 to tell the API "start at the beginning of my sorted results," except it also informs the API that you want to use a cursor. The default value of the cursor is * unless you specify otherwise. In addition to returning the top N sorted results (where you can control N using the per_page parameter), the API response will also include an encoded String named next_page_cursor.

You then take the next_page_cursor String value from the response and pass it back to the API as the cursor parameter for your next request. You can repeat this process until you've fetched as many stories as you want or until the returned next_page_cursor matches the cursor you've already specified, which indicates that there are no more results.
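
The loop described above can be sketched as follows. The HTTP call is replaced by a stand-in function so the example runs offline. The response key "stories", the function names, and the numeric cursors used by the stand-in are assumptions (real cursors are opaque encoded strings); only cursor, per_page and next_page_cursor come from this section.

```python
def fetch_all(fetch_page, per_page=100, max_stories=1000):
    """Follow next_page_cursor until it stops changing or max_stories is hit."""
    stories, cursor = [], "*"
    while len(stories) < max_stories:
        page = fetch_page(cursor=cursor, per_page=per_page)
        stories.extend(page["stories"])
        next_cursor = page["next_page_cursor"]
        if next_cursor == cursor:  # same cursor returned: no more results
            break
        cursor = next_cursor
    return stories[:max_stories]


# Stand-in for a real HTTP call: serves 250 fake stories from memory.
FAKE_STORIES = [{"id": i} for i in range(250)]


def fake_fetch_page(cursor, per_page):
    start = 0 if cursor == "*" else int(cursor)
    batch = FAKE_STORIES[start:start + per_page]
    # Past the end, echo the cursor back to signal that nothing is left.
    next_cursor = str(start + len(batch)) if batch else cursor
    return {"stories": batch, "next_page_cursor": next_cursor}


print(len(fetch_all(fake_fetch_page)))
```

In a real client, fake_fetch_page would be replaced by a function that sends the request and returns the parsed JSON body.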

Per Page Attribute

The API supports a per_page parameter to specify the maximum number of stories per page. This parameter is used to paginate results from a query. Its value must be between 1 and 100. The default value is 10 if the per_page parameter is not passed in the query.

Endpoints

/clusters

/stories

/related_stories

/autocompletes

Parameters

Parameters Values Description
cursor (“*” OR “hash string”) Chain your requests in calls that return more than 100 clusters by supplying the next_page_cursor response to your next call. For more information about using the cursor with the News API, take a look at the documentation page.
per_page int Specify the maximum number of clusters to be returned by your query. The maximum value is 100; to retrieve more than 100 results, use the cursor.

Common workflows

Pagination of results


Response control

The response from the Quantexa News API consists of 26 data enrichments nested in JSON objects. For some queries, not all of the 26 enrichments will be required, so with the parameter return[], you can specify only the fields you want to return in your API response. For example, if you only want to return entities, use return[]="entities".
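
A query that restricts the response fields might be assembled as in the sketch below. Repeating the return[] key once per field is an assumption about how the array parameter is encoded, so check it against your client library before relying on it.

```python
from urllib.parse import urlencode

# Ask only for the fields needed; the field names are taken from the return[] list.
params = {"return[]": ["id", "title", "entities"], "per_page": 10}

# doseq=True repeats the key once per list item (brackets are percent-encoded).
query_string = urlencode(params, doseq=True)
print(query_string)
```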

Endpoints

/stories

/related_stories

Parameters

Parameters Values Description
return[] ['id', 'title', 'body', 'summary', 'source', 'author', 'entities', 'keywords', 'hashtags', 'characters_count', 'words_count', 'sentences_count', 'paragraphs_count', 'categories', 'media', 'sentiment', 'language', 'published_at', 'links'] Specify and limit the return fields on the response object.

Common workflows

Response objects