Suppose you need to programmatically analyze some web pages that are protected by a login procedure, and you have a valid login for the site. A simple solution is to issue a POST request to the login page with the correct credentials, then keep using the same cookie container for subsequent downloads, but in some situations this is not enough. Suppose the site uses an unusual login procedure that relies on redirects.

This sometimes happens: you do a postback with your credentials, then a page is rendered where some JavaScript code automatically does another postback to another page, and finally more JavaScript takes you to the landing page for a successful login. Another example is a login procedure that involves JavaScript code that must be executed before the postback.
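For comparison, the plain approach mentioned at the start (one POST, then reuse of the same cookie container) could look roughly like the sketch below. The class name SimpleLogin, the form field names, and any URL you pass in are assumptions for illustration; a real site will use its own field names and may require extra hidden fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the original post.
public static class SimpleLogin
{
    // Encodes form fields as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }

    // Posts the credentials; cookies set by the response are stored in 'cookies'.
    public static void Login(string loginUrl, string body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;
        byte[] payload = Encoding.UTF8.GetBytes(body);
        request.ContentLength = payload.Length;
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(payload, 0, payload.Length);
        }
        using (request.GetResponse())
        {
            // Nothing to read here: we only need the cookies.
        }
    }

    // Downloads a protected page, sending back the cookies collected at login.
    public static string Download(string url, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

This only works when the server completes the login with ordinary HTTP responses and redirects. As soon as a step depends on JavaScript running in the page, the request above never executes it, which is the case the rest of this post deals with.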

A possible solution is to use the WebBrowser control to navigate to the login page, locate the textbox controls for the username and password, fill them in, locate the “submit” button and click it, wait for all the redirects, and finally grab the cookies from the WebBrowser control. This solution is simple, because the login procedure is executed inside a real browser and we only need to grab the cookies when the whole procedure ends.

This is a sample of possible code.

    hiddenBrowser = new WebBrowser();
    hiddenBrowser.DocumentCompleted += webBrowser1_DocumentCompleted;
    hiddenBrowser.Navigate("myloginsite");

Step 1: Create a WebBrowser control, handle the DocumentCompleted event, and then navigate to the login page.

DocumentCompleted is raised when the page is fully loaded, and it is where we need to start the login procedure.

    var theBrowser = (WebBrowser)sender;

    // No username field means this is not the login page (for example a later redirect).
    HtmlElement inputElementByName = GetInputElementByName("name_of_the_input_control_for_username", theBrowser);
    if (inputElementByName == null) return;
    inputElementByName.SetAttribute("value", "username");

    HtmlElement elementByName = GetInputElementByName("name_of_the_input_control_for_password", theBrowser);
    if (elementByName == null) return;
    elementByName.SetAttribute("value", "********");

    HtmlElement htmlElement = GetInputElementByName("name_of_submit_button", theBrowser);
    if (htmlElement == null) return;
    htmlElement.InvokeMember("click");

Step 2: Locate the two input controls for username and password, fill them with the right values, then locate the submit button and invoke its “click” method.
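Since DocumentCompleted is raised again for every page in the redirect chain (and for frames), the handler has to decide on each call whether to submit the credentials, grab the cookies, or do nothing. One way to keep that decision separate from the UI code is a small state tracker like the sketch below. LoginFlowTracker, LoginStep, and the idea of recognizing the landing page by its URL are assumptions of this example, not something the original code does.

```csharp
using System;

// Hypothetical names, introduced only for this sketch.
public enum LoginStep { Ignore, SubmitCredentials, HarvestCookies }

public sealed class LoginFlowTracker
{
    private readonly Uri _loginPage;
    private readonly Uri _landingPage;
    private bool _submitted;

    public LoginFlowTracker(Uri loginPage, Uri landingPage)
    {
        _loginPage = loginPage;
        _landingPage = landingPage;
    }

    // Call with the URL of each completed document; returns what to do next.
    public LoginStep OnDocumentCompleted(Uri completedUrl)
    {
        if (!_submitted && SamePage(completedUrl, _loginPage))
        {
            _submitted = true; // submit the credentials only once
            return LoginStep.SubmitCredentials;
        }
        if (_submitted && SamePage(completedUrl, _landingPage))
        {
            return LoginStep.HarvestCookies;
        }
        return LoginStep.Ignore; // intermediate redirect or frame
    }

    // Compares scheme, host, port and path; the query string is ignored.
    private static bool SamePage(Uri a, Uri b)
    {
        return Uri.Compare(
            a, b,
            UriComponents.SchemeAndServer | UriComponents.Path,
            UriFormat.Unescaped,
            StringComparison.OrdinalIgnoreCase) == 0;
    }
}
```

Inside the DocumentCompleted handler you would pass e.Url to OnDocumentCompleted and fill in the form only when it returns SubmitCredentials. If the landing page URL is not known in advance, some other signal is needed, such as the presence of an element that only appears after a successful login.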

As you can see, the code is really simple. Input controls can be located by name, by id, or by class; for this simple example I locate them by name with the following function.

    private HtmlElement GetInputElementByName(
        string fieldName,
        WebBrowser webBrowser)
    {
        HtmlElementCollection allInput =
            webBrowser.Document.GetElementsByTagName("input");
        foreach (HtmlElement htmlElement in allInput)
        {
            if (htmlElement.Name.Equals(fieldName, StringComparison.InvariantCultureIgnoreCase))
            {
                return htmlElement;
            }
        }
        return null;
    }

Step 3: Function to locate an input control by name.

This function is really simple: it iterates over all the HtmlElement objects of type “input” present in the page, checks whether the name of each one is equal to the desired one, and returns the first match, or null if none is found.
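Locating a control by class takes a little more care than locating it by name, because the class attribute holds a space-separated list and a plain substring test would also match longer class names. The helper below is a minimal sketch of the token check; CssClassMatcher is a hypothetical name, and the class names in the usage note are made up.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class CssClassMatcher
{
    // True when the space-separated class attribute contains cssClass as a whole token.
    public static bool HasCssClass(string classAttribute, string cssClass)
    {
        if (string.IsNullOrEmpty(classAttribute) || string.IsNullOrEmpty(cssClass))
        {
            return false;
        }
        string[] tokens = classAttribute.Split(
            new[] { ' ', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (token.Equals(cssClass, StringComparison.Ordinal))
            {
                return true;
            }
        }
        return false;
    }
}
```

In a loop like the one in GetInputElementByName, the test would become something like CssClassMatcher.HasCssClass(htmlElement.GetAttribute("className"), "login-user"); the WebBrowser control generally exposes the class attribute under the name className. Lookup by id needs no loop at all, since webBrowser.Document.GetElementById does it directly.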

OK, now we simply let the WebBrowser control navigate to the login page, wait for every possible redirect, and finally grab the cookies. One of the problems you face when you try to get the cookies is due to HttpOnly cookies. HttpOnly cookies live only inside the browser and cannot be read by JavaScript or other code running in the page, but we really need to grab them to be able to use a WebRequest to download pages protected by the login.

Houston, we have a cookie problem

HttpOnly cookies are meant to prevent malicious JavaScript code from accessing them, but clearly they are stored somewhere in the system, so we need to resort to the Windows API to retrieve them.

    // Base URI of the site (scheme + host) the browser ended up on.
    String hostName = _webBrowser.Url.Scheme + Uri.SchemeDelimiter +
                      _webBrowser.Url.Host;
    Uri hostUri = new Uri(hostName);
    CookieContainer container = CookieHelpers.GetUriCookieContainer(hostUri);
    // GetUriCookieContainer returns null when no cookie could be read.
    if (container != null)
    {
        CookieCollection cookieCollection = container.GetCookies(hostUri);
        _container.Add(cookieCollection);
    }

Step 4: Determine the base URI of the site and grab all the cookies with the CookieHelpers.GetUriCookieContainer function.
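As a side note, the base URI can also be derived with Uri.GetLeftPart instead of string concatenation. The sketch below shows that variant; SiteUri and GetBaseUri are hypothetical names. Unlike the concatenation above, it keeps a non-default port, and whether that matters depends on the site.

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class SiteUri
{
    // Returns scheme + host (+ port when it is not the default), without path or query.
    public static Uri GetBaseUri(Uri pageUrl)
    {
        return new Uri(pageUrl.GetLeftPart(UriPartial.Authority));
    }
}
```

With this helper, the first three lines of the snippet above would collapse to a single call such as SiteUri.GetBaseUri(_webBrowser.Url).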

All the work is done inside the GetUriCookieContainer method, which uses the Windows API to retrieve the cookies. Once the cookies used by the WebBrowser have been copied into a CookieContainer, you can simply get the CookieCollection and add it to another CookieContainer that will be used by subsequent WebRequest objects.

    // Needs: using System; using System.Runtime.InteropServices; using System.Text;
    [DllImport("wininet.dll", SetLastError = true)]
    public static extern bool InternetGetCookieEx(
      string url,
      string cookieName,
      StringBuilder cookieData,
      ref int size,
      Int32 dwFlags,
      IntPtr lpReserved);

    // INTERNET_COOKIE_HTTPONLY: also return cookies marked HttpOnly.
    private const Int32 InternetCookieHttponly = 0x2000;

Step 5: Declare the import needed to call the Windows API.

Now we can use InternetGetCookieEx to grab all the cookies.

    /// <summary>
    /// Gets the URI cookie container.
    /// </summary>
    /// <param name="uri">The URI.</param>
    /// <returns>A container with the cookies for the URI, or null if none could be read.</returns>
    public static CookieContainer GetUriCookieContainer(Uri uri)
    {
        CookieContainer cookies = null;
        // Start with a generously sized buffer for the cookie data
        int datasize = 8192 * 16;
        StringBuilder cookieData = new StringBuilder(datasize);
        if (!InternetGetCookieEx(uri.ToString(), null, cookieData, ref datasize, InternetCookieHttponly, IntPtr.Zero))
        {
            if (datasize < 0)
                return null;
            // Retry with a StringBuilder of the size reported back by the first call
            cookieData = new StringBuilder(datasize);
            if (!InternetGetCookieEx(
                uri.ToString(),
                null, cookieData,
                ref datasize,
                InternetCookieHttponly,
                IntPtr.Zero))
                return null;
        }
        if (cookieData.Length > 0)
        {
            cookies = new CookieContainer();
            // The API separates cookies with ';', while SetCookies expects ','
            cookies.SetCookies(uri, cookieData.ToString().Replace(';', ','));
        }
        return cookies;
    }

Step 6: Grab all the cookies with the InternetGetCookieEx API; this is needed to retrieve HttpOnly cookies.
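The last part of the method, where the semicolons are replaced with commas, can be tried in isolation without the WebBrowser or the Windows API. The sketch below feeds a hand-written cookie string to the same SetCookies call; CookieStringDemo and the sample cookie names and values are assumptions for illustration.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original post.
public static class CookieStringDemo
{
    // Converts a "name=value; name2=value2" string into a CookieContainer for 'uri'.
    public static CookieContainer ToContainer(Uri uri, string cookieString)
    {
        var container = new CookieContainer();
        if (!string.IsNullOrEmpty(cookieString))
        {
            // SetCookies expects cookies delimited by commas, not semicolons.
            container.SetCookies(uri, cookieString.Replace(';', ','));
        }
        return container;
    }
}
```

For example, CookieStringDemo.ToContainer(new Uri("http://example.com/"), "session=abc123; theme=dark") yields a container with two cookies for that URI. One limitation to keep in mind is that a cookie value that itself contains a comma would confuse this simple replacement.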

Now the game is done. As a last warning, I suggest you clear all the WebBrowser cookies before starting the login procedure, because leftover cookies could lead to problems. I found this solution on StackOverflow (I do not remember the link, sorry).

    private const int INTERNET_OPTION_END_BROWSER_SESSION = 42;

    [DllImport("wininet.dll", SetLastError = true)]
    private static extern bool InternetSetOption(IntPtr hInternet, int dwOption, IntPtr lpBuffer, int lpdwBufferLength);

    public static void ClearCookie()
    {
        InternetSetOption(IntPtr.Zero, INTERNET_OPTION_END_BROWSER_SESSION, IntPtr.Zero, 0);
    }

Snippet 1: Method to clear all the cookies; this is needed to be sure that the WebBrowser control has no cookies when the login procedure begins.

alk.

2 Responses to “Use a WebBrowser to login into a site that use HttpOnly cookie”

  1. Thanks for this post. Excellent!

    One question though:
    Why 42 on this line “private const int INTERNET_OPTION_END_BROWSER_SESSION = 42;”?

  2. It is simply a constant of the Windows API; you can find its value in the .h files (wininet.h in this case), or by using tools that let you look up the value of a specific API constant.