27 Reader Comments
ILoveJackDaniels
I wrote something similar (based on the same principles) back in May, posted at http://www.ilovejackdaniels.com/php/google-style-keyword-highlighting/ – I have to say though, this version is very nicely done. The textarea and script bug was something I hadn’t considered, and the accent replacement is a nice touch. Good work!
Justin Greer
Before we go implementing this everywhere, we should ask if this is really a good thing? Usability is mentioned in the article, and it says that search term highlighting is good, but where’s the proof? What’s the reality?
A quick informal survey this morning (of web developers and web users both) shows that people I know don’t seem to like search term highlighting. In fact, it distracts attention from the content you’re looking for. Each person I talked to said, at best, they ignore the highlighting because they’re in a different mode of scanning the page once they reach it.
How do people then know the difference between explicitly highlighted items on the page and the ones that are done by the search term highlighter? Also, if they’re really looking for the terms (that they already know are on the page), why not use the “Find” feature in the browser? That’s what it’s for. This way it’s left to the user.
If nothing else, maybe it would be better to put a little toggle near the top of the page to turn highlighting on and off—so they could use that rather than the “Find” command when they’re actually interested in finding the terms.
In any case, I just want to make sure that people consider whether it’s really useful to their users before they implement this. I’d rather not have every site I visit highlighting random words throughout the pages.
Kim Siever
Better yet, why not just make the text a different colour rather than highlighting it? It would single out the text still, but would do so much more subtly.
Andri Sigurðsson
How about doing this with JavaScript? No extra meaningless markup, and we could even have a link to disable the highlighting without the extra trip to the server.
Stuart Langridge
Without wishing to blow my own trumpet too severely, my searchhi routine (http://www.kryogenix.org/code/browser/searchhi/) highlights search terms client-side, and is unobtrusive DHTML (meaning that it only requires a script src tag at the top of the page, and a definition for span.searchword in your stylesheets). It’s simple to implement; obviously it won’t work in non-JS browsers.
Michael Kellen
The problem with your naive expression is that it is easily broken by perfectly valid HTML. A few examples:
What about an image with ALT="> Comment"? What about a comment with a > in it? What about pages with scripting that does numeric > comparisons?
I highly recommend looking at a full-on HTML parser if you want to avoid potential problems.
http://php-html.sourceforge.net/ is one such.
You can see the general method (albeit implemented in perl) in the second code listing at this link:
http://perlmonks.org/index.pl?node_id=370246
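To illustrate the parser-based approach, here is a minimal sketch in Python rather than the script's PHP, using the standard library's html.parser. The class name, the function name and the list of skipped elements are assumptions made for the example, and the span.searchword class is borrowed from elsewhere in this thread. Only text nodes are searched, so a > inside an attribute, a comment or a script is passed through untouched:

```python
import re
from html.parser import HTMLParser

# Elements whose contents are left alone (an assumed list; extend as needed).
SKIP = {"script", "style", "textarea"}


class TermHighlighter(HTMLParser):
    """Re-emit a page, wrapping search terms that occur in text nodes only."""

    def __init__(self, terms):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0
        # Longest terms first, so "category" is tried before "cat".
        ordered = sorted(terms, key=len, reverse=True)
        self.pattern = re.compile(
            "|".join(re.escape(t) for t in ordered), re.IGNORECASE
        )

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # raw tag, attributes untouched

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            data = self.pattern.sub(
                lambda m: f'<span class="searchword">{m.group(0)}</span>', data
            )
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def highlight(page, terms):
    """Return page with every term wrapped in a span, outside tags only."""
    terms = [t for t in terms if t]
    if not terms:
        return page
    parser = TermHighlighter(terms)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

Calling highlight(page, ["comment"]) on a page containing the ALT example above should leave the image tag as it was and wrap only the word in the body text.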
Urban Clothing - FOS
Thanks for the great article. I will definitely consider using this for my site. Thanks a ton.
Trevor
I agree with Justin Greer. Check out Firefox’s “find” feature. It makes it easy for those who WANT search term highlighting to access it.
Amit
I have just coded an advanced search in my website http://www.rolex-replica.net,
This highlight thingy seems to be an option… Will be adding it.
Thanks,
Amit
andrew
One additional problem is a search utilizing inclusionary or exclusionary syntax (’+’ or ‘-’). I pretty much always search using advanced filtering techniques. While this works well for basic searches, it fails to take into account common advanced search strategies.
Brendan Taylor
…but it does seem that using Javascript would be more appropriate. Wouldn’t doing it server side have odd results if a browser is using a proxy?
Anon
entering <! breaks it…
Simon F. P. Murray
I agree that a clientside scripted solution would be far better, and easier to implement, in that you don’t have to go dabbling with every bit of text that’s output in a serverside script.
For semantics, it’d be good to have said script run through the text, and wrap the highlighted terms in <strong class="highlight"> elements, as well as dynamically adding a stylesheet with (ideally) something like:
strong.highlight {font-weight:inherit}
strong.highlight:contains(keyword1) { background:#8FF }
strong.highlight:contains(keyword2) { background:#FF0 }
-or more realistically than ideally, using a different classname for each highlighted term.
Using :contains would complicate things anyway, in that you’d have to make sure that the rule for :contains(cat) comes before the one for :contains(category).
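The "different classname for each highlighted term" variant might look roughly like this. It is a sketch in Python for brevity, working on plain text rather than markup; the function name and the highlight1, highlight2 naming scheme are made up for the example. Trying longer terms first means "category" is matched whole and gets its own class, which sidesteps the rule-ordering problem:

```python
import re


def wrap_terms(text, terms):
    """Wrap each term in <strong> with its own class (highlight1, highlight2, ...)."""
    terms = [t for t in terms if t]
    if not terms:
        return text
    # Class names follow the order in which the terms were given.
    classes = {t.lower(): f"highlight{i}" for i, t in enumerate(terms, start=1)}
    # Longest terms first, so "category" is tried before "cat".
    ordered = sorted(classes, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)

    def repl(match):
        cls = classes.get(match.group(0).lower(), "highlight0")
        return f'<strong class="highlight {cls}">{match.group(0)}</strong>'

    return pattern.sub(repl, text)
```

The stylesheet then only needs one plain class rule per term (strong.highlight1, strong.highlight2 and so on), with no dependence on :contains.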
Brian and Matt
Thanks for the comments so far; we’ll take them on board for future releases of the script. Here’s our considered response to some of the questions and criticisms so far:
“Why not just make the text a different colour rather than highlighting it?”
That’s up to you and your style sheets. The keyword highlighter may be better described as a keyword tagger; it surrounds the words the visitor searched for with a span tag that has a special class. You can use this class to highlight the words any way you want with CSS (without needing to touch the PHP code at all). Whether that is by changing the colour of the text or adding verbal stress to the word for screen readers is up to you. You may even want to change the use of the span element to that of strong or em for a more semantic value; it is your decision.
“How about doing this with Javascript”
You could implement this in Javascript (as some have done) or any client-side technology. While we have no qualms about doing this, we decided a server-side method held more benefits. For example, you can guarantee that all user-agents will receive the same output on the server-side, something you cannot say about a client-side script (e.g. clients with no understanding of Javascript will not highlight anything). You can also augment the server-side script in ways you couldn’t client-side, such as integrating it with a site-wide search as we mention in the article.
“Using Javascript allows you to disable the highlighting without an extra trip to the server”
Let’s polarise this: instead of using Javascript to highlight the search terms, why not continue to highlight server-side, and add Javascript to the page to allow the added span elements to be removed from the DOM? That would remove the need to make a request to the server, without the main functionality relying on client-side code. Another way would be to add an alternative stylesheet that turns off the highlighting.
“Your regular expressions for parsing the HTML are naive”
Implementing a full SGML/XML parser was well beyond the scope of our project, although it may be a good idea for a future version. The main focus of the project was on usability and allowing users to find the information they want faster.
“One additional problem is a search utilising inclusive or exclusive syntax (’+’ or ‘-’)”
We hadn’t thought about this, so thanks for pointing it out. If a user searches for “food -dog”, the highlighting function would try to highlight the words “food” and “-dog”, rather than “dog”. We’ll put it on the to-do list and fix it as soon as we can.
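One possible shape for that fix, sketched in Python rather than the script's PHP (the function name is hypothetical). Whether an excluded term should be skipped or highlighted without its prefix is a design choice; this sketch assumes it should be skipped, since an excluded word is not something the visitor was looking for:

```python
def parse_query(query):
    """Split a search query into the terms worth highlighting."""
    terms = []
    for token in query.split():
        if token.startswith("-"):
            continue  # excluded term: nothing to highlight
        token = token.lstrip("+")  # "+food" is highlighted as "food"
        if token:
            terms.append(token)
    return terms
```

With this, parse_query("food -dog") gives ["food"], and a bare "+" or "-" contributes nothing.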
“Searching for <! breaks the highlighting function”
Good catch! We missed that one, but it’s been fixed in version 1.8.1 (available from http://suda.co.uk/projects/SEHL/). The problem was that while the special HTML characters were being properly escaped when being searched for, they weren’t when being displayed in the little advisory at the top of the page. So the highlighter was never broken, but the advisory note at the top was.
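The general rule is to escape at the point of output as well as at the point of matching. A minimal sketch in Python, not the script's actual code: the function name and the markup of the note are assumptions, and html.escape plays the role that an escaping function would play in the PHP version.

```python
import html


def advisory(terms):
    """Build the note shown at the top of the page, escaping every term."""
    shown = ", ".join(f"<strong>{html.escape(t)}</strong>" for t in terms)
    return f'<p class="advisory">Why is {shown} highlighted?</p>'
```

A search for <! then shows up in the note as the literal characters, instead of being read by the browser as the start of a declaration.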
“Why not use Firefox’s find-in-page feature instead?”
You can, but you can only search for one word or phrase at a time. The highlighting function is available to any visitor without extra effort on their part, and it shows any number of distinct phrases on the page simultaneously.
“Wouldn’t doing it server side have odd results if a browser is using a proxy?”
If we understand you correctly the implication here is that the referring URL would be removed by the proxy; thus nothing will be highlighted, the user is none the wiser, and the system degrades gracefully. Can anyone think of any other problems with a client behind a proxy?
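The graceful degradation amounts to treating a missing referrer as "no terms". A rough sketch in Python, where the function name is hypothetical and the q parameter is an assumption that happens to match Google's search URLs (other engines use other names):

```python
from urllib.parse import parse_qs, urlsplit


def terms_from_referrer(referrer, param="q"):
    """Return the search terms carried in a referring URL, or [] if there are none."""
    if not referrer:
        return []  # header missing or stripped: highlight nothing
    query = parse_qs(urlsplit(referrer).query)
    values = query.get(param, [])
    return values[0].split() if values else []
```

An empty list simply means the page is sent out unchanged, so a visitor behind such a proxy sees the normal page.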
On a final note, we hope you see our code as a seed for future ideas, to allow you to think about how best to provide information to web users, to do more than just provide static information. Remember, our code is released under the GPL, so please feel free to take it, fork it, and make it better! If you have any more comments please let us know. Cheers!
Marcin Brzezinski
Overall, the script is good, but v1.8.1 still spoils our <!-- comments containing other HTML tags :(
I’ve sent you an e-mail about the subject.
Cheers!
coda
PHP Highlighter (http://www.hotscripts.com/Detailed/21112.html) is another alternative. I recently implemented it in my own site search engine without much hassle.
Snik
resumé seems to break it. This word was on the test search site, but it just highlights the whole thing!
Says up the top “Why is resum鼯span> highlighted?“
Erki Esken
Instead of using auto_prepend php file you can achieve the same thing with Apache 2.0 output filters, and this way you can add search term highlighting to static html files also.
You need to have mod_ext_filter loaded, and then use this in server level config:
ExtFilterDefine highlight-search-terms cmd="/path/to/php /path/to/highlight_search_terms.php"
And use this directive in <Location> <Directory> or <Files> block depending on where you want this filter to apply:
SetOutputFilter highlight-search-terms
The highlighting script (be it PHP, Python or whatever) should read from stdin, do its thing and output to stdout.
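A bare-bones version of such a filter, sketched in Python. The function names are made up, the tag matching is deliberately naive, and how the search terms reach the filter is not covered above, so the sketch simply takes them as a parameter:

```python
import re

TAG = re.compile(r"(<[^>]*>)")  # naive tag matcher, good enough for a sketch


def highlight(text, terms):
    """Wrap each term in a span, touching only the text between tags."""
    terms = [t for t in terms if t]
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    parts = TAG.split(text)
    for i in range(0, len(parts), 2):  # even indexes are text, odd are tags
        parts[i] = pattern.sub(
            lambda m: f'<span class="searchword">{m.group(0)}</span>', parts[i]
        )
    return "".join(parts)


def filter_stream(src, dst, terms):
    """Read the whole page from src, highlight it, and write it to dst."""
    dst.write(highlight(src.read(), terms))
```

Used as an output filter, the script would import sys and finish with filter_stream(sys.stdin, sys.stdout, terms).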
dusoft
suda.co.uk doesn’t work.
Mark Tranchant
http://tranchant.plus.com/notes/sehl
I’ve made a couple of modifications to fit in with my dynamic generation from template files and database content; and to cope with my caching strategy. See the above URL for details.
Martin Kliehm
Nice script, but when your search term includes a German “Umlaut” like “ü” (ü), all text vanishes!
Also I got strange results when the searched text includes several <div>s, <!-- comments -->, and <?php ?> tags, don’t know which is responsible. Then the first <div> block is ignored and only text in the second is highlighted…
Matijs van Zuijlen
Highlighting is nice if the words you were looking for are in an obscure part of the page. However, it quickly becomes annoying when the searched words are very common in the main text. Precisely because those words stand out, it gets very hard to read the rest.
porneL
I’d recommend using an XML/DOM parser instead of risky and costly regular expressions. And here you go – another advantage of using valid XHTML.
diseño web
congratulations for the article.
Jeff
Rather than defining colors to use for highlighting, you can use css colors to automatically specify whichever colors the browser uses by default for highlighting. Look here: http://bombingpixels.com/css/testing/css-user-interface.htm
Dante
“Gracias” in English is “Thanks”, not “Congratulations”.