{"__v":18,"_id":"5624e72b06e8040d005ed6e4","category":{"__v":4,"_id":"5624db675a86b423009462e1","pages":["5624db8f85a31117001c5428","5624e72b06e8040d005ed6e4","5624ed3b6ff1010d009b1646","562502255473621900689514"],"project":"55a7aee84a33f92b00b7a150","version":"55a7aee84a33f92b00b7a153","sync":{"url":"","isSync":false},"reference":false,"createdAt":"2015-10-19T12:00:39.932Z","from_sync":false,"order":9999,"slug":"rapidminer-extension","title":"RapidMiner Extension"},"parentDoc":null,"project":"55a7aee84a33f92b00b7a150","user":"55a7ae50bf1be93100d89df1","version":{"__v":6,"_id":"55a7aee84a33f92b00b7a153","project":"55a7aee84a33f92b00b7a150","createdAt":"2015-07-16T13:17:28.411Z","releaseDate":"2015-07-16T13:17:28.411Z","categories":["55a7aee94a33f92b00b7a154","55a7fefa3efe0c2f0074cbdf","55a8fb10c8bd450d000dd130","55a936b1cf45e1390093f362","55abddeaa36ccd0d00fdebe1","5624db675a86b423009462e1"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2015-10-19T12:50:51.701Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"results":{"codes":[]},"settings":"","auth":"required","params":[],"url":""},"isReference":false,"order":1,"body":"In this tutorial we’re going to walk you through using the “Text Analysis by AYLIEN” Extension for RapidMiner, to collect and analyze tweets. If you’re new to RapidMiner, or it’s your first time using the Text Analysis Extension you should first read our [Getting Started](doc:getting-started-rapidminer) tutorial which takes you through the installation process. Also, If you haven’t got an AYLIEN account, which you’ll need to use the Extension, you can grab one [here](http://developer.aylien.com/signup?source=rapidminer).\n\nSo, here’s what we’re going to do:\n\n1. Collect tweets using the **Twitter Search** Operator\n2. Analyze their Sentiment using the **Analyze Sentiment** Operator\n3. Assign the tweets to different categories using the **Categorize** Operator\n4. Visualize our results and make them more consumable and understandable\n\n**You can download the finished Process from our [Sample Processes](doc:sample-processes-rapidminer) page.** \n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Step 1. Gathering tweets\"\n}\n[/block]\nCreate a new Process in RapidMiner and add a **Search Twitter** Operator. Build your desired search as you would using the Twitter search API. You can see from the screenshot below we’re searching for tweets containing the keyword “Samsung”. We’ve cleaned up our search a little by removing retweets (`-rt`) and links (`-http`). We’ve also restricted the number of tweets to collect to 20 and decided we only want to see English tweets by adding “en” in the `language` parameter. We’ve also indicated that we want only recent or popular tweets to be returned using the `Result type` parameter.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/dByapgjWRdawNRCUUGio_Screen%20Shot%202015-10-13%20at%205.16.00%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.16.00 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#5ab278\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nLet's have a look at what kind of results our search returns. Once you hit Run (don’t forget to connect your Operators) the results from the Twitter search are displayed in an ExampleSet tab, like the one below:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/XoEo24LLSeaR3a1vt6GI_Screen%20Shot%202015-10-13%20at%205.16.33%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.16.33 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#988a66\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Step 2. Analyzing tweets for Sentiment\"\n}\n[/block]\nSo now we have a collection of 20 tweets stored in an ExampleSet that are ready to be  further analyzed. The first thing we’re going to do from an analysis point of view is, try and determine what the Sentiment of each tweet is, i.e. whether they are Positive, Negative or Neutral.\n\nWe do this by adding the **Analyze Sentiment** Operator to our Process and selecting “text” as our “Input attribute” on the right hand side, as shown in the screenshot below:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/EXQtKIPURvezpeCUacHi_Screen%20Shot%202015-10-13%20at%205.20.04%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.20.04 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#5bb36f\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nSo now we have a relatively simple Twitter Sentiment Analysis Process that collects tweets about “Samsung” and analyzes them to determine the Polarity (i.e. positive, neutral or negative) and Subjectivity (i.e. subjective or objective) of each tweet.\n\nAs is displayed in the ExampleSet below, the results now contain not only the tweets that were pulled in but their corresponding Polarity and Subjectivity as well as a confidence score for both:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/NiVzXIScQCX8sIe7PB5c_Screen%20Shot%202015-10-13%20at%205.21.12%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.21.12 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#d36133\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Step 3. Categorizing tweets\"\n}\n[/block]\nSo we’ve determined the sentiment of the tweets but like we said in the beginning, we also want to categorize them in some way. We can do this pretty easily by using the **Categorize** Operator from the Text Analysis Extension, but before we do we need to prepare our data for analysis.\n\nFirstly we’re going to use a **Data to Documents** Operator to generate Documents from our existing data set making it easier to categorize:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/NAn2Zs99Tme9JW4x30l5_Screen%20Shot%202015-10-14%20at%207.36.47%20PM.png\",\n        \"Screen Shot 2015-10-14 at 7.36.47 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#c85f33\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nWe’ll then add a Categorize Operator which will basically classify our text based on a particular taxonomy (simply put, a set of predefined categories), in this case we’re using the IAB QAG taxonomy, which is a standard used in the digital advertising industry for categorizing content:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/2tSKdqQmTUOtcjTt4xD1_Screen%20Shot%202015-10-14%20at%207.36.49%20PM.png\",\n        \"Screen Shot 2015-10-14 at 7.36.49 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#c95f33\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nNow our Process is starting to take shape, but because we previously transformed our data into documents before they were categorized, we need to reverse the process and create a dataset from the resulting categorized documents, which in turn will make it easier to visualize and understand as a whole.\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/pI9UJLKdQvyYFhsrcv3q_Screen%20Shot%202015-10-14%20at%207.36.50%20PM.png\",\n        \"Screen Shot 2015-10-14 at 7.36.50 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#c95f33\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nSo here’s what our completed Process looks like:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/a9Y4OdtwRaGHiQbg63AD_Screen%20Shot%202015-10-14%20at%207.36.40%20PM.png\",\n        \"Screen Shot 2015-10-14 at 7.36.40 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#5bb375\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\nConnect the Operators and hit Run.\n\nThe Process we've built now collects tweets, analyzes the Sentiment of those tweets, prepares them for categorization against a taxonomy and finally displays the results in an ExampleSet, like the one below:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/mT4sSxeSrCrtiePN0uUz_Screen%20Shot%202015-10-13%20at%205.40.53%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.40.53 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#a78c58\",\n        \"\"\n      ]\n    }\n  ]\n}\n[/block]\n\n[block:api-header]\n{\n  \"type\": \"basic\",\n  \"title\": \"Step 4. Visualizing the results\"\n}\n[/block]\nWe have our results stored in a table (ExampleSet) but in order to make them more presentable we want to visualize them a bit better.\n\nRapidMiner let’s you display and visualize results of your Process really easily using simple charts and visualizations like the ones below, which can all be created using the Charts widget on the left hand side of your results display:\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/eqWg5RjQBKZPs6hDaoXz_Screen%20Shot%202015-10-13%20at%205.22.35%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.22.35 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#1f1fbf\",\n        \"\"\n      ],\n      \"caption\": \"Bar chart showing the total number of positive, negative and neutral tweets\"\n    }\n  ]\n}\n[/block]\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/QtP7cD2zRA2w6xpN4QZU_Screen%20Shot%202015-10-13%20at%205.22.53%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.22.53 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#1c1dc4\",\n        \"\"\n      ],\n      \"caption\": \"Pie chart showing the percentage of positive, negative and neutral tweets\"\n    }\n  ]\n}\n[/block]\n\n[block:image]\n{\n  \"images\": [\n    {\n      \"image\": [\n        \"https://files.readme.io/uVsZoHgKReetZgjWVPTu_Screen%20Shot%202015-10-13%20at%205.45.08%20PM.png\",\n        \"Screen Shot 2015-10-13 at 5.45.08 PM.png\",\n        \"1552\",\n        \"1072\",\n        \"#1eba82\",\n        \"\"\n      ],\n      \"caption\": \"Pie chart showing a breakdown of tweets by their top-level category\"\n    }\n  ]\n}\n[/block]","excerpt":"","slug":"tutorial-tweet-sentiment-rapidminer","type":"basic","title":"Tutorial: Building a Twitter Sentiment Analysis Process"}

Tutorial: Building a Twitter Sentiment Analysis Process


In this tutorial we’re going to walk you through using the “Text Analysis by AYLIEN” Extension for RapidMiner, to collect and analyze tweets. If you’re new to RapidMiner, or it’s your first time using the Text Analysis Extension you should first read our [Getting Started](doc:getting-started-rapidminer) tutorial which takes you through the installation process. Also, If you haven’t got an AYLIEN account, which you’ll need to use the Extension, you can grab one [here](http://developer.aylien.com/signup?source=rapidminer). So, here’s what we’re going to do: 1. Collect tweets using the **Twitter Search** Operator 2. Analyze their Sentiment using the **Analyze Sentiment** Operator 3. Assign the tweets to different categories using the **Categorize** Operator 4. Visualize our results and make them more consumable and understandable **You can download the finished Process from our [Sample Processes](doc:sample-processes-rapidminer) page.** [block:api-header] { "type": "basic", "title": "Step 1. Gathering tweets" } [/block] Create a new Process in RapidMiner and add a **Search Twitter** Operator. Build your desired search as you would using the Twitter search API. You can see from the screenshot below we’re searching for tweets containing the keyword “Samsung”. We’ve cleaned up our search a little by removing retweets (`-rt`) and links (`-http`). We’ve also restricted the number of tweets to collect to 20 and decided we only want to see English tweets by adding “en” in the `language` parameter. We’ve also indicated that we want only recent or popular tweets to be returned using the `Result type` parameter. [block:image] { "images": [ { "image": [ "https://files.readme.io/dByapgjWRdawNRCUUGio_Screen%20Shot%202015-10-13%20at%205.16.00%20PM.png", "Screen Shot 2015-10-13 at 5.16.00 PM.png", "1552", "1072", "#5ab278", "" ] } ] } [/block] Let's have a look at what kind of results our search returns. Once you hit Run (don’t forget to connect your Operators) the results from the Twitter search are displayed in an ExampleSet tab, like the one below: [block:image] { "images": [ { "image": [ "https://files.readme.io/XoEo24LLSeaR3a1vt6GI_Screen%20Shot%202015-10-13%20at%205.16.33%20PM.png", "Screen Shot 2015-10-13 at 5.16.33 PM.png", "1552", "1072", "#988a66", "" ] } ] } [/block] [block:api-header] { "type": "basic", "title": "Step 2. Analyzing tweets for Sentiment" } [/block] So now we have a collection of 20 tweets stored in an ExampleSet that are ready to be further analyzed. The first thing we’re going to do from an analysis point of view is, try and determine what the Sentiment of each tweet is, i.e. whether they are Positive, Negative or Neutral. We do this by adding the **Analyze Sentiment** Operator to our Process and selecting “text” as our “Input attribute” on the right hand side, as shown in the screenshot below: [block:image] { "images": [ { "image": [ "https://files.readme.io/EXQtKIPURvezpeCUacHi_Screen%20Shot%202015-10-13%20at%205.20.04%20PM.png", "Screen Shot 2015-10-13 at 5.20.04 PM.png", "1552", "1072", "#5bb36f", "" ] } ] } [/block] So now we have a relatively simple Twitter Sentiment Analysis Process that collects tweets about “Samsung” and analyzes them to determine the Polarity (i.e. positive, neutral or negative) and Subjectivity (i.e. subjective or objective) of each tweet. As is displayed in the ExampleSet below, the results now contain not only the tweets that were pulled in but their corresponding Polarity and Subjectivity as well as a confidence score for both: [block:image] { "images": [ { "image": [ "https://files.readme.io/NiVzXIScQCX8sIe7PB5c_Screen%20Shot%202015-10-13%20at%205.21.12%20PM.png", "Screen Shot 2015-10-13 at 5.21.12 PM.png", "1552", "1072", "#d36133", "" ] } ] } [/block] [block:api-header] { "type": "basic", "title": "Step 3. Categorizing tweets" } [/block] So we’ve determined the sentiment of the tweets but like we said in the beginning, we also want to categorize them in some way. We can do this pretty easily by using the **Categorize** Operator from the Text Analysis Extension, but before we do we need to prepare our data for analysis. Firstly we’re going to use a **Data to Documents** Operator to generate Documents from our existing data set making it easier to categorize: [block:image] { "images": [ { "image": [ "https://files.readme.io/NAn2Zs99Tme9JW4x30l5_Screen%20Shot%202015-10-14%20at%207.36.47%20PM.png", "Screen Shot 2015-10-14 at 7.36.47 PM.png", "1552", "1072", "#c85f33", "" ] } ] } [/block] We’ll then add a Categorize Operator which will basically classify our text based on a particular taxonomy (simply put, a set of predefined categories), in this case we’re using the IAB QAG taxonomy, which is a standard used in the digital advertising industry for categorizing content: [block:image] { "images": [ { "image": [ "https://files.readme.io/2tSKdqQmTUOtcjTt4xD1_Screen%20Shot%202015-10-14%20at%207.36.49%20PM.png", "Screen Shot 2015-10-14 at 7.36.49 PM.png", "1552", "1072", "#c95f33", "" ] } ] } [/block] Now our Process is starting to take shape, but because we previously transformed our data into documents before they were categorized, we need to reverse the process and create a dataset from the resulting categorized documents, which in turn will make it easier to visualize and understand as a whole. [block:image] { "images": [ { "image": [ "https://files.readme.io/pI9UJLKdQvyYFhsrcv3q_Screen%20Shot%202015-10-14%20at%207.36.50%20PM.png", "Screen Shot 2015-10-14 at 7.36.50 PM.png", "1552", "1072", "#c95f33", "" ] } ] } [/block] So here’s what our completed Process looks like: [block:image] { "images": [ { "image": [ "https://files.readme.io/a9Y4OdtwRaGHiQbg63AD_Screen%20Shot%202015-10-14%20at%207.36.40%20PM.png", "Screen Shot 2015-10-14 at 7.36.40 PM.png", "1552", "1072", "#5bb375", "" ] } ] } [/block] Connect the Operators and hit Run. The Process we've built now collects tweets, analyzes the Sentiment of those tweets, prepares them for categorization against a taxonomy and finally displays the results in an ExampleSet, like the one below: [block:image] { "images": [ { "image": [ "https://files.readme.io/mT4sSxeSrCrtiePN0uUz_Screen%20Shot%202015-10-13%20at%205.40.53%20PM.png", "Screen Shot 2015-10-13 at 5.40.53 PM.png", "1552", "1072", "#a78c58", "" ] } ] } [/block] [block:api-header] { "type": "basic", "title": "Step 4. Visualizing the results" } [/block] We have our results stored in a table (ExampleSet) but in order to make them more presentable we want to visualize them a bit better. RapidMiner let’s you display and visualize results of your Process really easily using simple charts and visualizations like the ones below, which can all be created using the Charts widget on the left hand side of your results display: [block:image] { "images": [ { "image": [ "https://files.readme.io/eqWg5RjQBKZPs6hDaoXz_Screen%20Shot%202015-10-13%20at%205.22.35%20PM.png", "Screen Shot 2015-10-13 at 5.22.35 PM.png", "1552", "1072", "#1f1fbf", "" ], "caption": "Bar chart showing the total number of positive, negative and neutral tweets" } ] } [/block] [block:image] { "images": [ { "image": [ "https://files.readme.io/QtP7cD2zRA2w6xpN4QZU_Screen%20Shot%202015-10-13%20at%205.22.53%20PM.png", "Screen Shot 2015-10-13 at 5.22.53 PM.png", "1552", "1072", "#1c1dc4", "" ], "caption": "Pie chart showing the percentage of positive, negative and neutral tweets" } ] } [/block] [block:image] { "images": [ { "image": [ "https://files.readme.io/uVsZoHgKReetZgjWVPTu_Screen%20Shot%202015-10-13%20at%205.45.08%20PM.png", "Screen Shot 2015-10-13 at 5.45.08 PM.png", "1552", "1072", "#1eba82", "" ], "caption": "Pie chart showing a breakdown of tweets by their top-level category" } ] } [/block]