Getting an Excerpt from HTML in PHP
Solution 1:
The simplest way is to strip all HTML from the item text using strip_tags()
before truncating it.
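A minimal sketch of that approach might look like the following. The function name plainExcerpt, the 100-character default, and the '...' trail are illustrative assumptions rather than part of the answer; lengths are counted in bytes, as with plain strlen()/substr().

```php
<?php
// Hypothetical helper: strip the markup first, then cut at a word boundary.
function plainExcerpt(string $html, int $maxLen = 100, string $trail = '...'): string
{
    // Remove tags, decode entities such as &amp;, and collapse whitespace.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    if (strlen($text) > $maxLen) {
        // Cut to the limit, then back up to the last space so no word is split.
        $cut = substr($text, 0, $maxLen);
        $lastSpace = strrpos($cut, ' ');
        if ($lastSpace !== false) {
            $cut = substr($cut, 0, $lastSpace);
        }
        $text = $cut . $trail;
    }

    // Re-escape so the plain text is safe to print inside an HTML page.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo plainExcerpt('<p>Breaking <b>news</b>: the quick brown fox jumps over the lazy dog.</p>', 30);
```

Stripping before truncating matters: cutting the raw HTML first could split a tag in half, and strip_tags() would then have malformed input to deal with.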
Solution 2:
I would take the second option if it's important to retain the HTML structure of the original news item.
A simple way to implement this is to run your truncated fragment through Tidy, which closes any unclosed tags. In particular, see the tidy::cleanRepair method.
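As a rough illustration, assuming the optional Tidy extension is installed, the idea could be sketched like this. The helper name tidyExcerpt and the chosen Tidy options are assumptions for the example, not something the answer prescribes.

```php
<?php
// Hypothetical helper: cut the markup, then let Tidy close the open tags.
function tidyExcerpt(string $html, int $maxLen): ?string
{
    if (!extension_loaded('tidy')) {
        return null; // the Tidy extension is optional and may not be installed
    }

    // Naive byte cut; this can leave tags such as <b> or <p> unclosed.
    $fragment = substr($html, 0, $maxLen);

    $tidy = new tidy();
    $tidy->parseString($fragment, [
        'show-body-only' => true, // return only the fragment, not a full document
        'wrap'           => 0,    // do not re-wrap long lines
    ], 'utf8');
    $tidy->cleanRepair();

    return trim(tidy_get_output($tidy));
}

$excerpt = tidyExcerpt('<p>Hello <b>brave new world</b></p>', 20);
var_dump($excerpt);
```

Note that a naive cut can still land in the middle of a tag (for example right after "<b"), so a more careful version would move the cut point outside any tag before handing the fragment to Tidy.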
Solution 3:
I guess what you are looking for is called website scraping. You can scrape a website with a library such as PHP Simple HTML DOM Parser.
Here is code that scrapes Slashdot:
// Load the library (adjust the path to wherever simple_html_dom.php lives)
require_once 'simple_html_dom.php';

// Create DOM from URL
$html = file_get_html('http://slashdot.org/');

// Find all article blocks
$articles = array();
foreach ($html->find('div.article') as $article) {
    $item['title'] = $article->find('div.title', 0)->plaintext;
    $item['intro'] = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}
print_r($articles);
Solution 4:
You could try parsing your data as XML and then truncating only the "pure" text nodes.
Note: This solution requires the input to be valid XML and to always have roughly the same structure.
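One way this could look, assuming PHP's DOM extension is available, is sketched below. The function name truncateXmlText and the character budget are illustrative assumptions; the input must be well-formed XML with a single root element.

```php
<?php
// Hypothetical helper: keep the markup, shorten only the text nodes.
function truncateXmlText(string $xml, int $limit): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    // Snapshot all text nodes in document order before modifying the tree.
    $xpath = new DOMXPath($doc);
    $textNodes = iterator_to_array($xpath->query('//text()'));

    $remaining = $limit; // characters we are still allowed to keep
    foreach ($textNodes as $node) {
        if ($remaining <= 0) {
            // Budget used up: drop any further text entirely.
            $node->parentNode->removeChild($node);
        } elseif ($node->length > $remaining) {
            // This node crosses the limit: keep only its first part.
            $node->deleteData($remaining, $node->length - $remaining);
            $remaining = 0;
        } else {
            $remaining -= $node->length;
        }
    }

    // Serialize the root element only, without the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

echo truncateXmlText('<p>Hello <b>brave new</b> world</p>', 11);
```

Because the elements themselves are never cut, the result stays well-formed; elements whose text was removed entirely are left behind as empty tags, which a fuller version might prune.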
Solution 5:
This trims the content down to its first paragraph without cutting words, and appends an optional trail.
// called from inside the class that defines both methods
$excerpt = self::excerpt_paragraph($html, 180);
/**
 * Excerpt the first paragraph from HTML content.
 */
public static function excerpt_paragraph($html, $max_char = 100, $trail = '...')
{
    // temp var to capture the p tag(s)
    $matches = array();
    // first <p>...</p> block; also allows attributes and inline tags
    if (preg_match('/<p\b[^>]*>.*?<\/p>/is', $html, $matches)) {
        // found <p></p>
        $p = strip_tags($matches[0]);
    } else {
        $p = strip_tags($html);
    }
    // shorten without cutting words
    $p = self::short_str($p, $max_char);
    // remove trailing comma, full stop, colon, semicolon, 'a', 'A', space
    $p = rtrim($p, ',.;: aA');
    // return nothing if just spaces or too short
    if (ctype_space($p) || $p == '' || strlen($p) < 10) {
        return '';
    }
    return '<p>' . $p . $trail . '</p>';
}
/**
 * Shorten a string without cutting words.
 */
public static function short_str($str, $len, $cut = false)
{
    if (strlen($str) <= $len) {
        return $str;
    }
    if ($cut) {
        return substr($str, 0, $len);
    }
    // cut at the last space inside the limit; hard cut if there is none
    $head = substr($str, 0, $len);
    $pos = strrpos($head, ' ');
    return $pos === false ? $head : substr($head, 0, $pos);
}