Getting Started

Text Analysis by AYLIEN is an Extension made up of Operators that let you analyze and make sense of textual data from within RapidMiner. The Operators in "Text Analysis by AYLIEN" include the following:

  • Sentiment Analysis
  • Entity Extraction
  • Language Detection
  • Hashtag Suggestion
  • Related Phrases

You can read more about each feature on RapidMiner's Marketplace page.

Installation

Text Analysis by AYLIEN can be found with the rest of the Text Processing Extensions in the RapidMiner Marketplace, which you can open from the Help > Marketplace (Updates and Extensions) menu. Alternatively, go to the Extension's Marketplace entry on the web and download or install it from there.

Aylien rm 2

Aylien rm 3

Once you've installed the Text Analysis extension, you can find its Operators from within RapidMiner by simply searching for AYLIEN. Here you'll see the list of Operators that were installed as part of the Text Analysis Extension:

Aylien rm 4

Credentials and Connecting

Before we can start analyzing text, we need to make sure we're connected to the AYLIEN API. You can configure your connections under settings, in Manage Connections.

Aylien rm 5

To connect to the AYLIEN API you need an App ID and an API Key. If you don't have these yet, you can grab them for free here.

Create a new connection of type "AYLIEN Text Analysis Connection", add your credentials (App ID and API Key) as shown below and hit "Save all changes".

Aylien rm 6

Aylien rm 7

Now we're pretty much good to go, and can reuse the connection we just created in all Text Analysis Operators.
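
For readers curious about what the Connection stands for outside RapidMiner, here is a minimal Python sketch of how the same two credentials might accompany a direct request to the AYLIEN Text Analysis API. The endpoint URL and the header names are assumptions recalled from AYLIEN's public REST documentation, not something this guide specifies, and build_sentiment_request is a hypothetical helper. The sketch only assembles the request pieces and does not send anything.

```python
from urllib.parse import urlencode

# Assumption: endpoint and header names are not taken from this guide.
SENTIMENT_URL = "https://api.aylien.com/api/v1/sentiment"


def build_sentiment_request(app_id, api_key, text):
    """Return (url, headers, body) for a sentiment request, without sending it."""
    if not app_id or not api_key:
        raise ValueError("Both an App ID and an API Key are required")
    headers = {
        "X-AYLIEN-TextAPI-Application-ID": app_id,    # assumed header name
        "X-AYLIEN-TextAPI-Application-Key": api_key,  # assumed header name
    }
    # Form-encode the text to analyze, as an HTML form would.
    body = urlencode({"text": text}).encode("utf-8")
    return SENTIMENT_URL, headers, body


url, headers, body = build_sentiment_request(
    "YOUR_APP_ID", "YOUR_API_KEY", "I love puppies"
)
```

Inside RapidMiner none of this is needed: the saved Connection supplies both values to every Text Analysis Operator that uses it.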

Test your Setup

Finally, to make sure everything is working, let's walk through a basic Sentiment Analysis procedure:

As shown below, the first thing we do is add an Analyze Sentiment (Document) Operator to our Process:

Aylien rm 8

In this case, we've also added a Create Document Operator as shown in the screenshot below, where we'll type or paste the text we want to analyze:

Aylien rm 9

Add the text you want to analyze to the Document, for instance "I love puppies":

Aylien rm 10

Now we've completed the bare bones of our Process to analyze the sentiment of a piece of text we've written in a Document.

Before you run the Process, make sure your Operators are connected to each other and to the results ports, and that the Connection you created earlier is selected in the Analyze Sentiment Operator. Then hit Run.

Your results will be displayed in a results tab, similar to the one below:

Aylien rm 11

Now that you've got the Extension up and running, please proceed to our more in-depth examples in the next sections.

Text Analysis Tutorials

To help you get started with the Text Analysis Extension we’ve put together some simple step-by-step guides and sample processes which showcase how the Extension can be used.

Tutorial 1: Twitter Sentiment Analysis

In this tutorial we’re going to walk you through using the Text Analysis by AYLIEN Extension for RapidMiner to collect and analyze tweets. If you’re new to RapidMiner, or it’s your first time using the Text Analysis Extension, you should first read our Getting Started tutorial, which takes you through the installation process. Also, if you haven’t got an AYLIEN account, which you’ll need to use the Extension, you can grab one here.

So, here’s what we’re going to do:

  1. Collect tweets using the Search Twitter Operator
  2. Analyze their Sentiment using the Analyze Sentiment Operator
  3. Assign the tweets to different categories using the Categorize Operator
  4. Visualize our results and make them more consumable and understandable

You can download the finished Process from our Sample Processes page.

Step 1. Gathering tweets

Create a new Process in RapidMiner and add a Search Twitter Operator. Build your desired search as you would using the Twitter search API. You can see from the screenshot below we’re searching for tweets containing the keyword "Samsung". We’ve cleaned up our search a little by removing retweets (-rt) and links (-http). We’ve also restricted the number of tweets to collect to 20 and decided we only want to see English tweets by adding "en" in the language parameter. We’ve also indicated that we want only recent or popular tweets to be returned using the Result type parameter.

Aylien rm 12
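
The search settings described above can be sketched in a few lines of Python. This is only an illustration of how the query string is put together: build_search_query and the keys of search_settings are hypothetical names that mirror the Operator's parameters, not an actual RapidMiner or Twitter API.

```python
def build_search_query(keyword, exclude=("rt", "http")):
    """Combine a keyword with '-term' exclusions, as in the search above."""
    parts = [keyword] + ["-" + term for term in exclude]
    return " ".join(parts)


# Hypothetical parameter names mirroring the Operator settings in this step.
search_settings = {
    "query": build_search_query("Samsung"),  # "Samsung -rt -http"
    "limit": 20,                             # number of tweets to collect
    "language": "en",                        # English tweets only
    "result_type": "recent or popular",
}
```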

Let's have a look at what kind of results our search returns. Once you hit Run (don’t forget to connect your Operators) the results from the Twitter search are displayed in an ExampleSet tab, like the one below:

Aylien rm 13

Step 2. Analyzing tweets for Sentiment

So now we have a collection of 20 tweets stored in an ExampleSet, ready to be analyzed further. The first thing we’re going to do is try to determine the Sentiment of each tweet, i.e. whether it is Positive, Negative or Neutral.

We do this by adding the Analyze Sentiment Operator to our Process and selecting "text" as our "Input attribute" on the right-hand side, as shown in the screenshot below:

Aylien rm 14

So now we have a relatively simple Twitter Sentiment Analysis Process that collects tweets about "Samsung" and analyzes them to determine the Polarity (i.e. positive, neutral or negative) and Subjectivity (i.e. subjective or objective) of each tweet.

As shown in the ExampleSet below, the results now contain not only the tweets that were pulled in but also their corresponding Polarity and Subjectivity, as well as a confidence score for each:

Aylien rm 15

Step 3. Categorizing tweets

So we’ve determined the sentiment of the tweets but, like we said in the beginning, we also want to categorize them in some way. We can do this pretty easily by using the Categorize Operator from the Text Analysis Extension, but before we do, we need to prepare our data for analysis.

First, we’re going to use a Data to Documents Operator to generate Documents from our existing data set, making it easier to categorize:

Aylien rm 16

We’ll then add a Categorize Operator, which classifies our text based on a particular taxonomy (simply put, a set of predefined categories). In this case we’re using the IAB QAG taxonomy, a standard used in the digital advertising industry for categorizing content:

Aylien rm 17

Now our Process is starting to take shape. Because we transformed our data into Documents before categorizing them, we need to reverse that step and create a data set from the resulting categorized Documents, which in turn makes the results easier to visualize and understand as a whole.

Aylien rm 18
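
To make the round trip concrete, here is a small Python sketch of the idea: rows become documents, each document gets a category, and the documents are turned back into rows. It is an analogy under stated assumptions, not how RapidMiner implements its Operators. The function names, the document layout and the sample data (including the category label) are all made up for illustration.

```python
def rows_to_documents(rows, text_attribute="text"):
    """Turn each row into a document: its text plus the other fields as metadata."""
    documents = []
    for row in rows:
        metadata = {k: v for k, v in row.items() if k != text_attribute}
        documents.append({"text": row[text_attribute], "metadata": metadata})
    return documents


def documents_to_rows(documents, text_attribute="text"):
    """Reverse step: merge each document's metadata and text back into one row."""
    rows = []
    for doc in documents:
        row = dict(doc["metadata"])
        row[text_attribute] = doc["text"]
        rows.append(row)
    return rows


# Made-up sample row standing in for one analyzed tweet.
rows = [{"text": "Loving my new Samsung phone", "polarity": "positive"}]
documents = rows_to_documents(rows)

# Stand-in for categorization: attach a made-up label to each document.
for doc in documents:
    doc["metadata"]["category"] = "Technology & Computing"

result = documents_to_rows(documents)
```

After the round trip, each row carries its original columns plus the new category column, which is the shape the final ExampleSet takes in this tutorial.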

So here’s what our completed Process looks like:

Aylien rm 19

Connect the Operators and hit Run.

The Process we've built now collects tweets, analyzes the Sentiment of those tweets, prepares them for categorization against a taxonomy and finally displays the results in an ExampleSet, like the one below:

Aylien rm 20

Step 4. Visualizing the results

We have our results stored in a table (ExampleSet), but to make them more presentable we want to visualize them.

RapidMiner lets you display and visualize the results of your Process easily using simple charts and visualizations like the ones below, which can all be created using the Charts widget on the left-hand side of your results display:

Bar chart showing the total number of positive, negative and neutral tweets

Pie chart showing the percentage of positive, negative and neutral tweets

Pie chart showing a breakdown of tweets by their top-level category
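
The numbers behind the first two charts are simple counts and percentages per Polarity label. A minimal Python sketch of that arithmetic follows; polarity_breakdown is a hypothetical helper and the 20 sample labels are made up, not results from the Process above.

```python
from collections import Counter


def polarity_breakdown(polarities):
    """Return {label: (count, percentage)} for a list of polarity labels."""
    counts = Counter(polarities)
    total = sum(counts.values())
    return {
        label: (count, round(100.0 * count / total, 1))
        for label, count in counts.items()
    }


# Made-up labels for 20 tweets, purely for illustration.
sample = ["positive"] * 9 + ["neutral"] * 7 + ["negative"] * 4
breakdown = polarity_breakdown(sample)
# breakdown["positive"] is (9, 45.0): 9 of the 20 tweets, i.e. 45%.
```

The counts feed the bar chart and the percentages feed the pie chart. The same breakdown applied to the category column gives the third chart.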

Tutorial 2: News Analysis

In this tutorial we’re going to walk you through using the Text Analysis by AYLIEN Extension for RapidMiner to build a "News Analyzer" that monitors and analyzes articles from one or more RSS feeds.

If you’re new to RapidMiner, or it’s your first time using the Text Analysis Extension, you should first read our Getting Started tutorial, which takes you through the installation process. Also, if you haven’t got an AYLIEN account, which you’ll need to use the Extension, you can grab one here.

So, here’s what we’re going to do:

  1. Monitor an RSS feed collecting Article updates using the Read RSS Feed Operator
  2. Extract the main body of text and Title from the article with the Extract Article Operator
  3. Analyze and categorize these articles using the Categorize Operator
  4. Extract Entities from the article (mentions of People, Places, Organizations etc.) using the Extract Entities Operator
  5. Visualize our results and make them more consumable and understandable

You can download the finished Process from our Sample Processes page.

Step 1. Collecting news articles

The first step in building our News Analyzer is to add a Read RSS Feed Operator to our Process. When you add the Read RSS Feed Operator, you need to specify which RSS feed you want to monitor by entering its URL in the RSS feed input, and set your timeout counters. We’ve kept the default timeout values:

Aylien rm 24
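
Conceptually, reading an RSS feed boils down to collecting the title and link of each item, and the links are what the next step extracts articles from. The Python sketch below shows that idea using only the standard library. It parses a tiny made-up RSS 2.0 string rather than fetching a real feed, and read_feed_items is a hypothetical helper, not part of RapidMiner.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 document standing in for a real feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>https://example.com/articles/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>https://example.com/articles/2</link>
    </item>
  </channel>
</rss>"""


def read_feed_items(feed_xml):
    """Return one {'title', 'link'} dict per <item> element in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return items


items = read_feed_items(SAMPLE_FEED)
```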

Step 2. Extracting the article titles and body

To extract the relevant pieces of text from the URLs collected we can use the Extract Article Operator. This will pull the main body of text, the title and any image present directly from the URL.

Aylien rm 25

To prepare our extracted text for analysis we use a Data to Documents Operator. This transforms the data set of text into a collection of Documents, making it easier to categorize.

Aylien rm 26

Now we need to specify which column(s) in the ExampleSet contain the text we want to create a Document from:

Aylien rm 27

The first thing we’re going to do with the extracted text is try to get a high-level understanding of what it’s about by categorizing it based on a particular taxonomy, in this case the IAB QAG taxonomy:

Aylien rm 28

We’ll then add a Documents to Data Operator, which transforms our Documents back into an ExampleSet, making it easier to process the data further:

Aylien rm 29

Step 3. Extracting Entities

The last piece of analysis we’ll do on the text is to extract any mention of an Entity (Keywords, People, Places, Organizations, % values, $ values etc.) using an Extract Entities Operator:

Aylien rm 30

Step 4. Results

Now, if you connect the Operators and run the Process, your results will be displayed in an ExampleSet tab like the one below. Each row contains the extracted text and title, its categories, and any Entities that were extracted, separated out into columns:

Aylien rm 31
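
The "Entities separated out into columns" layout can be pictured with a short Python sketch. It assumes a hypothetical article record in which entities are grouped by type, and produces one flat row with a column per entity type. The record layout, the type names and the sample article are illustrative assumptions, not the exact output format of the Extract Entities Operator.

```python
def flatten_entities(article, entity_types=("person", "location", "organization")):
    """Spread an article's entities across one column per entity type."""
    row = {"title": article["title"], "text": article["text"]}
    entities = article.get("entities", {})
    for entity_type in entity_types:
        # Join multiple mentions into a single cell; empty if none were found.
        row[entity_type] = ", ".join(entities.get(entity_type, []))
    return row


# Made-up article record, purely for illustration.
article = {
    "title": "Example headline",
    "text": "Jane Doe of Example Corp spoke in Dublin and Cork.",
    "entities": {
        "person": ["Jane Doe"],
        "location": ["Dublin", "Cork"],
        "organization": ["Example Corp"],
    },
}
row = flatten_entities(article)
```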

Step 5. Visualizing the results

RapidMiner lets you display and visualize the results of your Process easily using simple charts and visualizations like the ones below, which can all be created using the Charts widget on the left-hand side of your results display.

We put together a simple pie chart below visualizing the categories of the articles extracted with our News Analyzer:

Aylien rm 32

Sample Processes

Below you can find a list of sample Processes for RapidMiner, which make use of the Text Analysis by AYLIEN Extension. You can download and import these Processes into RapidMiner easily, and use them as a foundation for your Text Analysis operations:

Title | Description | Download | Notes
Twitter Sentiment Analysis | A complete process for retrieving tweets, analyzing them (extracting Sentiment and Categories) and visualizing the results in RapidMiner. | Download | Requires a Twitter connection
News Analysis (from RSS feeds) | A complete process for retrieving news articles from an RSS feed and extracting their title and main body of text, analyzing them to find their high-level category as well as the Entities mentioned in each article, and visualizing the results. | Download | Requires the Web Mining Extension
News Retrieval and Analysis (using AYLIEN News API) | A sample process that shows you how you can integrate with our News API through RapidMiner, to retrieve news content and use it for further visualization and analysis in RapidMiner. | Download | Requires the Web Mining Extension and valid credentials for AYLIEN News API

Importing Sample Processes

You can import a Process by navigating to File > Import Process and choosing the Process file (*.rmp):

Aylien rm 33