{"_id":"55aba4ba63a6b60d006616e1","editedParams":true,"parentDoc":null,"__v":4,"user":"55a7ae50bf1be93100d89df1","editedParams2":true,"project":"55a7aee84a33f92b00b7a150","version":{"_id":"55a7aee84a33f92b00b7a153","__v":6,"project":"55a7aee84a33f92b00b7a150","createdAt":"2015-07-16T13:17:28.411Z","releaseDate":"2015-07-16T13:17:28.411Z","categories":["55a7aee94a33f92b00b7a154","55a7fefa3efe0c2f0074cbdf","55a8fb10c8bd450d000dd130","55a936b1cf45e1390093f362","55abddeaa36ccd0d00fdebe1","5624db675a86b423009462e1"],"is_deprecated":false,"is_hidden":false,"is_beta":false,"is_stable":true,"codename":"","version_clean":"1.0.0","version":"1.0"},"category":{"_id":"55a7aee94a33f92b00b7a154","__v":21,"project":"55a7aee84a33f92b00b7a150","version":"55a7aee84a33f92b00b7a153","pages":["55a7aee94a33f92b00b7a156","55a7ea403efe0c2f0074cb75","55a80eb23ec2ec0d00bd66ea","55a811fc6e61e619004f703e","55a8c513cf45e1390093f18c","55a8d9fa27a17d210052516d","55a8f048cf45e1390093f248","55aba4ba63a6b60d006616e1","55abab0a0685ce0d0049fb8d","55abb04da36ccd0d00fdebc1","55abc1cc63a6b60d006616f4","55abcdbd63a6b60d006616fa","55abd36763a6b60d00661700","55abd50ea36ccd0d00fdebdb","55abd8c3a36ccd0d00fdebdf","55abd9cc0685ce0d0049fbb5","55abdb6b63a6b60d00661706","55abdc9b0685ce0d0049fbb8","55c1f5fca131980d005be95b","55fac407bc972f0d0005f644","569f68e766a5640d00efa52d"],"sync":{"url":"","isSync":false},"reference":false,"createdAt":"2015-07-16T13:17:29.102Z","from_sync":false,"order":2,"slug":"endpoints","title":"Endpoints"},"updates":[],"next":{"pages":[],"description":""},"createdAt":"2015-07-19T13:23:06.050Z","link_external":false,"link_url":"","githubsync":"","sync_unique":"","hidden":false,"api":{"examples":{"codes":[{"language":"curl","code":"curl https://api.aylien.com/api/v1/extract \\\n   -H \"X-AYLIEN-TextAPI-Application-Key: [[app:key]]\" \\\n   -H \"X-AYLIEN-TextAPI-Application-ID: [[app:id]]\" \\\n   -d best_image=true \\\n   -d 
url=\"http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate\""},{"code":"textapi.extract({\n  url: 'http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate',\n  best_image: true\n}, function(error, response) {\n  if (error === null) {\n    console.log(response);\n  }\n});","language":"javascript"},{"code":"url = \"http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate\"\nextract = client.Extract({\"url\": url, \"best_image\": True})\nprint extract","language":"python"},{"language":"php","code":"<?php\n$url = 'http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate';\n$extract = $textapi->Extract(array('url' => $url, 'best_image' => 'true'));\nvar_dump($extract);\n?>"},{"code":"ExtractParams.Builder builder = ExtractParams.newBuilder();\njava.net.URL url = new java.net.URL(\"http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate\");\nbuilder.setUrl(url);\nbuilder.setBestImage(true);\nArticle extract = client.extract(builder.build());\nSystem.out.println(extract);","language":"java"},{"code":"url = \"http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate\"\n\nextract = client.extract(url: url, best_image: true)\n\nputs extract","language":"ruby"},{"code":"url := \"http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate\"\nextractParams := &textapi.ExtractParams{URL: url, BestImage: true}\narticle, err := client.Extract(extractParams)\nif err != nil {\n  panic(err)\n}\nfmt.Printf(\"%v\\n\", article)","language":"go"},{"code":"string url = \"http://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate\";\n\nvar extract = client.Extract(url: url, bestImage: true);\n\nConsole.WriteLine(\"Title: \" + extract.Title);\nConsole.WriteLine(\"Author: \" + 
extract.Author);","language":"csharp"}]},"method":"get","results":{"codes":[{"language":"json","status":200,"name":"","code":"{\n    \"article\": \"Remember Edward Snowden?\\r\\n\\r\\nFor many Americans who talked to John Oliver\\u00a0on Last Week Tonight, the answer is no.\\r\\n\\r\\nIt's been almost two years since the world was captivated by Snowden's leaks to The Guardian and The Washington Post about American surveillance programs. For weeks it seemed there was a new headline everyday about\\u00a0another previously classified surveillance program or another government official calling for action on this issue. Presidential review groups were called.\\u00a0President Obama called for major changes to the programs.\\r\\n\\r\\nAlthough the Snowden leaks certainly proved to be much more than a \\\"three-day story,\\\" American surveillance practices remain largely\\u00a0the same two years later.\\u00a0The main difference is this issue no longer dominates\\u00a0our political discourse. In November, the FREEDOM Act -- legislation under development for two years that would have overhauled NSA surveillance programs --\\u00a0died in a Senate procedural vote to little display.\\r\\n\\r\\nLast night we found out Oliver's HBO program was off last week because he was in Russia interviewing \\\"the most famous hero and/or traitor in recent American history.\\\" Oliver hit on many points that have been lacking in past interviews with the former government contractor -- from whether Snowden misses Hot Pockets to whether he's actually read all of the documents he's released to reporters (Snowden maintained he understands the scope of the documents).\\r\\n\\r\\nOliver's interview is timely\\u00a0as we approach an important\\u00a0deadline for surveillance reform on June 1. 
In response to the public outcry that followed the Snowden revelations, President Obama stipulated that congress must renew or reform the Patriot Act provision authorizing the bulk collection of Americans' phone records by that date, or else the program will expire. \\r\\n\\r\\nOnline this morning, Twitter, Reddit and the expected\\u00a0publications\\u00a0were abuzz with how \\\"John Oliver killed it\\\" and or \\\"slayed it\\\" in this new segment.\\r\\nSo John Oliver didn't ~destroy~ anything last night. He just interviewed Snowden.\\r\\n\\r\\n\\u2014 Steve Kovach (:::at:::stevekovach) April 6, 2015\\r\\nDespite what network news chose to lead with, the interview's\\u00a0significance extends beyond Snowden saying\\u00a0you should keep sending dirty pictures.\\u00a0It's important because millions of people who haven't thought about surveillance reform in months\\u00a0are going to click on this YouTube video and care about it again.\\r\\n\\r\\nLast summer we saw Oliver's ability to captivate the public's attention when it came to the complex, technical issue of net neutrality. His commentary prompted tens of thousands to submit comments to the Federal Communications Commission, helping net neutrality\\u00a0receive more public input than any other FCC docket item\\u00a0that came before it. In just 13 minutes, Oliver helped change\\u00a0the future of the Internet.\\r\\n\\r\\nSo what will the 33 minutes he spent on government surveillance reform do?\\r\\n\\r\\nThe\\u00a0FREEDOM Act's failure last year can largely be attributed to a lack of public support and understanding. 
If the average voter doesn't know the name Edward Snowden, how can they be expected to know about Section 215 of the Patriot Act?\\r\\n\\r\\nEver able to make us laugh about even the driest news topics, Oliver\\u00a0changed the topic of discussion from vague hypotheticals about civil liberties to something tangible he knew many Americans would care about -- dick pics.\\r\\n\\r\\n\\\"Well the good news is there's no program named, 'the dick pic program,'\\\" Snowden said. \\u00a0\\\"The bad news is they are still collecting everybody's information, including your dick pics.\\\"\\r\\n\\r\\nSnowden then went on to explain how the government uses different programs to access those pictures, from Executive Order 12333 to Section 702 of the Foreign Intelligence Surveillance Act.\\u00a0He was able to explain confusing provisions of American surveillance law in just a few sentences.\\r\\n\\r\\nHopefully as this debate returns to Washington and we\\u00a0turn to more serious conversations about surveillance reform, Oliver's jokes will keep Americans engaged and calling on their representatives to act on surveillance reform.\",\n    \"author\": \"Cat Zakrzewski\",\n    \"feeds\": [\n        \"https://techcrunch.com/feed/\",\n        \"https://techcrunch.com/comments/feed/\",\n        \"https://techcrunch.com/2015/04/06/john-oliver-just-changed-the-surveillance-reform-debate/feed/\"\n    ],\n    \"image\": \"\",\n    \"keywords\": [],\n    \"publishDate\": \"2015-04-06T17:45:49+00:00\",\n    \"title\": \"John Oliver Just Changed The Surveillance Reform Debate\",\n    \"videos\": [\n        \"https://www.youtube.com/embed/XEVlyP4_11M?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent\"\n    ]\n}"}]},"settings":"","auth":"required","params":[{"_id":"55a8f048cf45e1390093f249","ref":"","in":"query","required":false,"desc":"Article or webpage 
URL","default":"","type":"string","name":"url"},{"_id":"55aba5a5a36ccd0d00fdebba","ref":"","in":"query","required":false,"desc":"Raw HTML to extract data from","default":"","type":"string","name":"html"},{"_id":"55aba608a36ccd0d00fdebbb","ref":"","in":"query","required":false,"desc":"Whether or not the API should try to extract the best image (might affect processing time)","default":"false","type":"boolean","name":"best_image"},{"_id":"55a8f18bc8bd450d000dd113","ref":"","in":"query","required":false,"desc":"Language (refer to [Language Support](/docs/language-support))","default":"auto","type":"string","name":"language"}],"url":"/extract"},"isReference":false,"order":7,"body":"If you are dealing with webpages and articles, chances are the text you'd like to analyze is surrounded by some 'clutter' such as site navigation or ads. In order to get accurate results in your text analysis, you might want to remove such clutter and extract the main text of the webpage or article. **Article Extraction** allows you to do that, and in addition to removing clutter, also helps you extract the following information:\n\n- **Title**: raw title of the webpage or article\n- **Article**: full text of the webpage or article\n- **Author**: name of the author\n- **Image**: the main image on the webpage or article\n- **Videos**: an array of videos embedded in the webpage or article\n- **Feeds**: an array of RSS feeds found on the webpage or article\n- **Publish Date**: publish date of the article\n- **Keywords**: an array of keywords extracted from the webpage\n[block:callout]\n{\n  \"type\": \"warning\",\n  \"body\": \"Knowing the language of the target webpage or article often helps improve the quality of extraction, so it must be supplied via the `language` parameter whenever possible. 
See [Language Support](/docs/language-support) for more information.\",\n  \"title\": \"Extraction and language\"\n}\n[/block]","excerpt":"/extract","slug":"extract","type":"endpoint","title":"Article Extraction"}

Article Extraction

GET /extract

Definition

https://api.aylien.com/api/v1/extract

Parameters

Query Params

- **url** (string): Article or webpage URL
- **html** (string): Raw HTML to extract data from
- **best_image** (boolean, default: `false`): Whether or not the API should try to extract the best image (might affect processing time)
- **language** (string, default: `auto`): Language (refer to [Language Support](/docs/language-support))

Examples
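
The request described by the parameters above can be sketched with Python's standard library. This is a minimal sketch, not an official client: the helper names `build_extract_request` and `extract` are hypothetical, the endpoint URL and the two `X-AYLIEN-TextAPI-*` header names are taken from the curl example for this page, and sending the parameters as a GET query string is an assumption based on the documented method (curl's `-d` flag sends them as a POST body instead).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl example for this page.
API_ENDPOINT = "https://api.aylien.com/api/v1/extract"


def build_extract_request(app_id, app_key, url=None, html=None,
                          best_image=False, language="auto"):
    """Build (but do not send) a GET request for the /extract endpoint.

    Hypothetical helper: parameter names mirror the documented query params.
    """
    params = {
        "best_image": "true" if best_image else "false",
        "language": language,
    }
    # Both inputs are documented as optional, so only include what was given.
    if url is not None:
        params["url"] = url
    if html is not None:
        params["html"] = html

    headers = {
        "X-AYLIEN-TextAPI-Application-ID": app_id,
        "X-AYLIEN-TextAPI-Application-Key": app_key,
    }
    # Assumption: parameters travel in the query string of a GET request.
    return Request(API_ENDPOINT + "?" + urlencode(params),
                   headers=headers, method="GET")


def extract(app_id, app_key, **kwargs):
    """Send the request and decode the JSON response body."""
    request = build_extract_request(app_id, app_key, **kwargs)
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the final URL and headers without network access or valid credentials. Large `html` payloads may not fit comfortably in a query string, in which case a POST body, as in the curl example, is likely the better choice.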


Result Format


Documentation

If you are dealing with webpages and articles, chances are the text you'd like to analyze is surrounded by 'clutter' such as site navigation or ads. To get accurate results from your text analysis, you will usually want to remove that clutter and extract the main text of the webpage or article. **Article Extraction** does this and, in addition to removing clutter, extracts the following information:

- **Title**: raw title of the webpage or article
- **Article**: full text of the webpage or article
- **Author**: name of the author
- **Image**: the main image on the webpage or article
- **Videos**: an array of videos embedded in the webpage or article
- **Feeds**: an array of RSS feeds found on the webpage or article
- **Publish Date**: publish date of the article
- **Keywords**: an array of keywords extracted from the webpage
[block:callout]
{
  "type": "warning",
  "body": "Knowing the language of the target webpage or article often improves the quality of extraction, so supply it via the `language` parameter whenever possible. See [Language Support](/docs/language-support) for more information.",
  "title": "Extraction and language"
}
[/block]
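
The fields listed above can be read from the JSON response along these lines. This is a minimal sketch: `ExtractedArticle` and `parse_extract_response` are hypothetical names, the JSON keys (`article`, `author`, `feeds`, `image`, `keywords`, `publishDate`, `title`, `videos`) follow the sample response for this endpoint, and the date handling assumes the ISO 8601 form seen in that sample (`2015-04-06T17:45:49+00:00`). Other pages may return empty or differently formatted values.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ExtractedArticle:
    """Hypothetical container for the fields returned by /extract."""
    title: str = ""
    article: str = ""
    author: str = ""
    image: str = ""
    videos: List[str] = field(default_factory=list)
    feeds: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    publish_date: Optional[datetime] = None


def parse_extract_response(body: str) -> ExtractedArticle:
    """Map a JSON response body onto ExtractedArticle, tolerating missing keys."""
    data = json.loads(body)
    raw_date = data.get("publishDate") or ""
    return ExtractedArticle(
        title=data.get("title", ""),
        article=data.get("article", ""),
        author=data.get("author", ""),
        image=data.get("image", ""),
        videos=list(data.get("videos", [])),
        feeds=list(data.get("feeds", [])),
        keywords=list(data.get("keywords", [])),
        # Assumption: dates use the ISO 8601 form shown in the sample response.
        publish_date=datetime.fromisoformat(raw_date) if raw_date else None,
    )
```

An empty `image` string or an empty `keywords` array, as in the sample response, simply comes through as an empty value rather than an error.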
