+ 1
How to extract all the text from a website to one file
I want to extract all the text from a simple website, but I don't want the ids, pictures and so on in it. The problem is that there is just too much stuff I would need to remove, I mean the irrelevant tags that aren't part of the text, like <div> or <table>. How can I get around this? My language is PHP.
2 Answers
+ 2
Use file_get_contents to get all the website contents:
$contents = file_get_contents('http://www.blogger.com/');
Form a regex pattern, $pattern, that matches the text between the tags.
Then use preg_match_all to collect every match (preg_match stops after the first one):
preg_match_all($pattern, $contents, $matches, PREG_OFFSET_CAPTURE, 0);
print_r($matches);
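A regex that copes with every possible tag is hard to get right, so one alternative worth trying is PHP's built-in strip_tags. Below is a minimal sketch under that assumption, not a complete solution: html_to_text is a hypothetical helper name, page.txt is a placeholder output file, and the script/style cleanup is a rough heuristic that may miss unusual markup.

```php
<?php
// Hypothetical helper: reduce an HTML string to its plain text.
function html_to_text(string $html): string
{
    // Remove <script> and <style> blocks first; strip_tags() alone
    // would drop the tags but keep the code between them.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Insert a space before each tag so that words from adjacent
    // elements (table cells, paragraphs) do not run together.
    $html = str_replace('<', ' <', $html);

    // Drop every remaining tag: <div>, <table>, <img>, and so on.
    $text = strip_tags($html);

    // Turn entities such as &amp; back into ordinary characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

// Example usage (the URL is the one from the answer above; the output
// filename is a placeholder):
// $contents = file_get_contents('http://www.blogger.com/');
// if ($contents !== false) {
//     file_put_contents('page.txt', html_to_text($contents));
// }
```

This only handles one page. To cover a whole site you would still have to collect the links and call it once per page.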
+ 1
Use cURL to download the page.
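A rough sketch of what that could look like with PHP's curl extension. fetch_html is a hypothetical function name, and the timeout and redirect settings are assumptions you may want to change.

```php
<?php
// Hypothetical helper: download a page with the curl extension.
// Returns the response body, or null if the request failed.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }

    // Return the body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow redirects (for example http to https).
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Give up after 10 seconds; an arbitrary value for this sketch.
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $body = curl_exec($ch);

    return is_string($body) ? $body : null;
}

// Example usage:
// $contents = fetch_html('http://www.blogger.com/');
```

The result is still raw HTML, so the tags have to be removed afterwards in either case.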