Highlight words in webbrowser control

In Windows Forms, the WebBrowser control lets you embed a fully functional browser in your application. The interesting thing is that you can interact with the HTML of the page without any trouble. For example, you can load a page and highlight some words in its text. Here is the result of loading www.nablasoft.com and highlighting "laureati" and "passione".

[Image: www.nablasoft.com loaded in the WebBrowser control, with "laureati" and "passione" highlighted]

As you can see, the two words are highlighted. The code is really simple.

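// Needs "using System.Text;" and "using mshtml;" (reference to the MSHTML COM library).
private void button1_Click(object sender, EventArgs e)
{
    // Subscribe before navigating, and remove any previous subscription first,
    // so that repeated clicks do not stack up handlers.
    webBrowser1.DocumentCompleted -= webBrowser1_DocumentCompleted;
    webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
    webBrowser1.Navigate("http://www.nablasoft.com/");
}

void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    IHTMLDocument2 doc2 = webBrowser1.Document.DomDocument as IHTMLDocument2;
    StringBuilder html = new StringBuilder(doc2.body.outerHTML);

    var words = new[] { "laureati", "passione" };
    foreach (String key in words)
    {
        // Wrap every occurrence of the keyword in a span with a yellow background.
        String substitution = "<span style='background-color: rgb(255, 255, 0);'>" + key + "</span>";
        html.Replace(key, substitution);
    }

    doc2.body.innerHTML = html.ToString();
}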

First of all, you need to wait until the document has finished loading before you can interact with its content. This is simple, because you can use the DocumentCompleted event. To access the content, you should reference the MSHTML COM library in your project, since the WebBrowser control uses mshtml internally. To highlight words I simply grab the whole content, surround each keyword with a span that sets a background style, and then write the whole HTML text back. Really simple.
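
The replacement step can be tried on its own, without the control. The following is only a minimal sketch: the class name HighlightSketch and the helper HighlightWords are hypothetical names introduced for illustration, and the input is a hand-written HTML fragment rather than a real page.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class HighlightSketch
{
    // Hypothetical helper: wraps every occurrence of each word in a span
    // with a yellow background, using the same plain text replacement as the post.
    public static string HighlightWords(string html, IEnumerable<string> words)
    {
        var sb = new StringBuilder(html);
        foreach (string word in words)
        {
            string substitution =
                "<span style='background-color: rgb(255, 255, 0);'>" + word + "</span>";
            sb.Replace(word, substitution);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Hand-written fragment standing in for doc2.body.outerHTML.
        string html = "<p>Siamo laureati con passione.</p>";
        Console.WriteLine(HighlightWords(html, new[] { "laureati", "passione" }));
    }
}
```

Because this is a plain text replacement, a keyword that also appears inside a tag or an attribute value (or inside the inserted span itself, for example "style") would be wrapped as well and could break the markup.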

alk.

Published by

Ricci Gian Maria

.Net programmer, User group and community enthusiast, programmer - aspiring architect - and guitar player :). Visual Studio ALM MVP

21 thoughts on “Highlight words in webbrowser control”

  1. Hello! I'm using C#. I tried your code and converted it to C#.

    My program doesn't run the code in the DocumentCompleted event handler.

    When it reaches the first statement (HtmlDocument abc = (HtmlDocument)webBrowser1.Document.DomDocument;), it steps out.

    Can you tell me which part is causing the problem?

    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
    HtmlDocument abc = (HtmlDocument)webBrowser1.Document.DomDocument;
    StringBuilder html = new StringBuilder(abc.Body.OuterHtml);

    string text = "school";

    string substitution = "" + text.ToString() + "";

    html.Replace(text.ToString(), substitution);

    abc.Body.InnerHtml = html.ToString();
    }

  2. Nice and simple. A simple step further would be to add a button and a textbox for searching.

    This is the code for the button:

    private void button8_Click(object sender, EventArgs e)
    {
    //if it contains at least one word
    if(searchKeywords.Length > 0)
    {
    //reverse the highlighted keywords (if only it worked!)
    foreach(string keyword in searchKeywords)
    {
    var temp_str = "" + keyword + "";
    webBrowser1.DocumentText = webBrowser1.DocumentText.Replace(temp_str, keyword);
    }
    webBrowser1.Invalidate();
    }
    //the textbox may contain many words separated by spaces
    var keywords = textBox2.Text.Split(' ');
    //keep the values so the highlighting can be reversed later
    searchKeywords = keywords;

    //highlight search keywords in the WebBrowser control
    foreach (string keyw in keywords)
    {
    //MessageBox.Show(index.ToString());
    webBrowser1.DocumentText = webBrowser1.DocumentText.Replace(keyw, @"" + keyw + "");
    }
    //control repaint
    webBrowser1.Invalidate();

    }

    But reversing the highlighting doesn't work. What do you think?

  3. private void RealBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
    //Watch list
    KeywordManager keys = new KeywordManager();
    string[] words = keys.GetKeywordList();
    try
    {

    mshtml.HTMLDocument doc2 = (mshtml.HTMLDocument)RealBrowser.Document.DomDocument;
    StringBuilder html = new StringBuilder(doc2.body.outerHTML);

    foreach (String key in words)
    {
    String substitution = "" + key + "";
    html.Replace(key, substitution);

    doc2.body.innerHTML = html.ToString();
    }
    }
    catch (Exception ex)
    {
    MessageBox.Show(ex.Message);
    }
    }

    I use this, and when I click on a highlighted HTML link it does not work.

  4. The problem with this kind of code is that it manipulates the DOM directly, in a very crude way. For example, the code replaces the keyword even inside tags, and sometimes this completely destroys the HTML page. The main problem is that you need an HTML parser to avoid corrupting the HTML structure.

    A better solution I found, as I said before, was to switch to an already tested JavaScript function to highlight text.

    To reverse the highlighting, the best way is to save the original content of the page. When you need to remove the highlighting, you simply reassign the old content to the innerHTML property.

    alk.

  5. I tried your code above and it worked for me. My question is: can I use outerText instead of outerHTML from the body, so that I can highlight text inside an HTML tag, like inside an “A” element? A sample is

    “This is a HighLight Me please sample”.

    My search phrase is “This is a HighLight Me please sample”, and it should match the example above, ignoring the HTML element.

  6. Doing what you want is just a matter of writing JavaScript that is able to do it, but it does not seem simple to me. The JavaScript function I used was taken from a third-party blog.

    Alk.

  7. Highlight results in the WebBrowser control for WPF …

    I needed a small variation of this code to make it work in WPF. Here is the code:

    1- Add the reference Microsoft.mshtml (Solution Explorer / References)

    2- Add using mshtml;

    3- Code for the LoadCompleted event (wbContenido = WebBrowser control)

    private void wbContenido_LoadCompleted(object sender, NavigationEventArgs e)
    {
    mshtml.HTMLDocumentClass dom = (mshtml.HTMLDocumentClass)wbContenido.Document;
    //Console.Write(dom.body.innerHTML);

    IHTMLDocument2 doc2 = dom;// wbContenido.Document.DomDocument as IHTMLDocument2;
    StringBuilder html = new StringBuilder(doc2.body.outerHTML);

    var words = new[] { "Examen", "passione" };
    foreach (String key in words)
    {
    String substitution = "" + key + "";
    html.Replace(key, substitution);
    }

    doc2.body.innerHTML = html.ToString();
    }

  8. Thanks for your code. It works. Can you help us count the highlighted words in the WebBrowser content using C#?

  9. Unfortunately I have not worked with the browser control for a long time, but you should be able to simply read the innerText property of doc2.body and then use whatever technique you want (e.g. a regular expression) to count the occurrences of the text in a string.

  10. I tried your code for our thesis and it helped us a lot. Thank you.
    But we have a problem: we need to generate a report with the count of highlighted words, the date, and who posted them. Our study is about filtering words on online forums using data mining.

  11. That is more complex: you need to feed the complete HTML of the page into a parser, then know the structure of the HTML to identify single posts, and finally parse the posts to find author and date. This is complicated by the fact that each forum can use a different layout.

  12. I have the code below, which identifies the bad words you have entered in the WebBrowser control (the words are stored in the database) and turns them into asterisks (*). I have been struggling with case sensitivity, since a word can be entered in either lower or upper case (example: HeLlo).

    string query;
    query = @"select Word from ListWords";

    List<string> words = new List<string>();

    DataSet ds;
    DataRow drow;

    ds = DatabaseConnection.Connection1(query);
    int index, total;

    total = ds.Tables[0].Rows.Count;

    string current_word;

    for (index = 0; index < total; index++ )
    {
    drow = ds.Tables[0].Rows[index];
    current_word = drow.ItemArray.GetValue(0).ToString();

    words.Add(current_word);
    }

    Console.WriteLine(query);

    Console.WriteLine("array:" + words);
    foreach (String key in words)
    {

    int len = key.Length;
    string replace = "";

    for ( index = 0; index < len; index++)
    {
    replace += "*";
    }

    html.Replace(key, replace);

    }

    doc2.body.innerHTML = html.ToString();
    }

    Thank you
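
Several questions in the comments above come back to the same points: keywords being replaced inside tags (which breaks links), case-insensitive matching, and counting the matches. The following is only a rough sketch of one way to approach them with regular expressions. It is not the JavaScript function mentioned in comment 4, and it is not a real HTML parser; the class SafeHighlightSketch and the method Highlight are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SafeHighlightSketch
{
    // Rough approximation of "one HTML tag". The capturing group makes
    // Regex.Split return the tags as well as the text between them.
    static readonly Regex Tag = new Regex("(<[^>]*>)");

    const string Open = "<span style='background-color: rgb(255, 255, 0);'>";
    const string Close = "</span>";

    // Hypothetical helper: highlights `word` (ignoring case) only in the text
    // between tags and reports how many occurrences were wrapped.
    public static string Highlight(string html, string word, out int count)
    {
        var wordRegex = new Regex(Regex.Escape(word), RegexOptions.IgnoreCase);
        int found = 0;

        // With a single capturing group the result alternates: text, tag, text, ...
        string[] parts = Tag.Split(html);
        for (int i = 0; i < parts.Length; i += 2) // even indexes hold text
        {
            parts[i] = wordRegex.Replace(parts[i], m =>
            {
                found++;
                return Open + m.Value + Close; // keep the original casing
            });
        }

        count = found;
        return string.Concat(parts);
    }

    static void Main()
    {
        int count;
        string html = "<a href=\"school.html\">School and school</a>";
        Console.WriteLine(Highlight(html, "school", out count));
        Console.WriteLine(count);
    }
}
```

In a DocumentCompleted handler this could be applied as doc2.body.innerHTML = SafeHighlightSketch.Highlight(doc2.body.innerHTML, "school", out count). Keeping the original innerHTML string in a field before that assignment gives a simple way to undo the highlighting, as suggested in comment 4. The sketch still treats the content of script and style elements as ordinary text, and it does not match a phrase that is interrupted by a tag, so for anything beyond simple pages a proper HTML parser remains the safer choice.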
