Use a WebBrowser to log in to a site that uses HttpOnly cookies
Suppose you need to programmatically analyze some web pages that are protected by a login procedure, and you have a valid login to the site. A simple solution is to issue a POST request to the login page with the correct credentials, then keep using the same cookie container for subsequent downloads, but in some situations this is not enough.
Suppose the site uses an unusual login procedure that relies on redirects.
This sometimes happens: you post your credentials, then a page is rendered whose JavaScript automatically posts to another page, and finally another script takes you to the landing page for a successful login. Another example is a login procedure that requires some JavaScript code to run before the postback.
A possible solution is to use the WebBrowser control to navigate to the login page, locate the textbox controls for UserName and password, fill them in, locate and click the submit button, wait for all the redirects, and finally grab the cookies from the WebBrowser control. This solution is simple because the login procedure runs inside a real browser, and we only need to grab the cookies when the whole procedure ends.
This is a sample of possible code.
Step 1: Create a WebBrowser control, handle the DocumentCompleted event, and then navigate to the login page.
DocumentCompleted is raised when the page is fully loaded, and its handler is where we need to carry out the login procedure.
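A minimal sketch of this first step could look like the following. The form class, the handler name, and the login URL are placeholders of mine, not taken from the original code; the check on `e.Url` is there because DocumentCompleted is also raised for frames inside the page.

```csharp
using System;
using System.Windows.Forms;

public class LoginForm : Form
{
    // Hypothetical address (assumption); replace it with the real login page.
    private const string LoginUrl = "https://example.com/login";

    private readonly WebBrowser browser = new WebBrowser();

    public LoginForm()
    {
        browser.Dock = DockStyle.Fill;
        Controls.Add(browser);

        // Subscribe before navigating so the first page load is not missed.
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate(LoginUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event also fires for frames; react only to the top-level document.
        if (e.Url != browser.Url) return;

        // The login procedure (fill the form, submit, wait for redirects) starts here.
        Text = "Loaded: " + e.Url;
    }

    [STAThread]
    private static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new LoginForm());
    }
}
```

The handler is called again after every redirect, so the same method is also the natural place to notice that the landing page has been reached.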
As you can see, the code is really simple. Input controls can be located by name, by id, or by class; for this example I locate them by name with a small helper function.
Step 3: Function to locate an input control by name.
This function is really simple: it iterates over every HtmlElement of type input in the page, checks whether its name equals the desired one, and returns the first element that matches.
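The lookup by name and the login step that uses it can be sketched as below. This is only an illustration: the field names "UserName" and "Password" and the button name "Submit" are assumptions, and the real page may need a different way to find the submit button.

```csharp
using System;
using System.Windows.Forms;

public static class LoginSteps
{
    // Returns the first <input> element whose name attribute matches, or null.
    public static HtmlElement FindInputByName(HtmlDocument document, string name)
    {
        foreach (HtmlElement input in document.GetElementsByTagName("input"))
        {
            if (input.GetAttribute("name") == name)
                return input;
        }
        return null;
    }

    // Fills in the credentials and clicks the submit button.
    // Returns false when the expected controls are not on the page.
    public static bool SubmitLogin(HtmlDocument document, string userName, string password)
    {
        HtmlElement userBox = FindInputByName(document, "UserName");     // assumed field name
        HtmlElement passwordBox = FindInputByName(document, "Password"); // assumed field name
        HtmlElement submit = FindInputByName(document, "Submit");        // assumed button name

        if (userBox == null || passwordBox == null || submit == null)
            return false;

        userBox.SetAttribute("value", userName);
        passwordBox.SetAttribute("value", password);
        submit.InvokeMember("click"); // same effect as the user clicking the button
        return true;
    }
}
```

From a DocumentCompleted handler this would be called as `LoginSteps.SubmitLogin(browser.Document, user, password)`; a `false` result usually means the browser is already past the login page.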
OK, now we simply let the WebBrowser control navigate to the login page, wait for every possible redirect, and finally grab the cookies. One of the problems you face when you try to get the cookies is due to HttpOnly cookies. HttpOnly cookies live only inside the browser and cannot be read by JavaScript or other code running in the page, but we really need to grab them to be able to use a WebRequest to download pages protected by the login.
Houston, we have a cookie problem
HttpOnly cookies are meant to prevent malicious JavaScript code from accessing them, but they are clearly stored somewhere in the system, so we need to resort to the Windows API to retrieve them.
Step 4: Determine the base URI of the site and grab all the cookies with the function CookieHelpers.GetUriCookieContainer.
All the work is done inside the GetUriCookieContainer method, which uses the Windows API to retrieve the cookies. Once you have the CookieContainer holding the WebBrowser's cookies, you can simply get the CookieCollection and add it to another CookieContainer that will be used by subsequent WebRequest objects.
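As a rough sketch of this step, the helper below takes the container obtained from the browser as a parameter, so it does not depend on how the cookies were retrieved. The class and method names are mine, and treating scheme plus host as the base URI is an assumption about what "base uri" means here.

```csharp
using System;
using System.Net;

public static class CookieTransfer
{
    // Keeps only scheme and authority, e.g. https://example.com/a/b?x=1 -> https://example.com/
    public static Uri GetBaseUri(Uri pageUri)
    {
        return new Uri(pageUri.GetLeftPart(UriPartial.Authority));
    }

    // Copies the cookies that apply to the site into a fresh container.
    public static CookieContainer CopyCookies(CookieContainer browserCookies, Uri pageUri)
    {
        Uri baseUri = GetBaseUri(pageUri);
        CookieCollection collection = browserCookies.GetCookies(baseUri);

        var target = new CookieContainer();
        target.Add(baseUri, collection);
        return target;
    }

    // Creates a request that sends the copied cookies along.
    public static HttpWebRequest CreateRequest(Uri pageUri, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(pageUri);
        request.CookieContainer = cookies;
        return request;
    }
}
```

Every protected page can then be downloaded with `CookieTransfer.CreateRequest(pageUri, cookies)`, reusing the same container for the whole session.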
Step 5: Declare the import needed to use the Windows API.
Now we can use InternetGetCookieEx to grab all the cookies.
Step 6: Grab all the cookies with the InternetGetCookieEx API; this is needed to retrieve HttpOnly cookies.
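The import and the retrieval can be sketched together as follows. `InternetGetCookieEx` and the `INTERNET_COOKIE_HTTPONLY` flag (0x2000) come from wininet, and the method name GetUriCookieContainer is the one used in this post; the initial buffer size, the retry logic, and the separate ToCookieContainer helper are my own assumptions, not necessarily how the original code is written.

```csharp
using System;
using System.Net;
using System.Runtime.InteropServices;
using System.Text;

public static class CookieHelpers
{
    // INTERNET_COOKIE_HTTPONLY from wininet.h: include HttpOnly cookies in the result.
    private const int InternetCookieHttpOnly = 0x2000;

    [DllImport("wininet.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern bool InternetGetCookieEx(
        string url, string cookieName, StringBuilder cookieData,
        ref int size, int flags, IntPtr reserved);

    // Returns the cookies stored for the given URI, or null when none are available.
    public static CookieContainer GetUriCookieContainer(Uri uri)
    {
        int size = 8192; // initial guess (assumption)
        var data = new StringBuilder(size);
        if (!InternetGetCookieEx(uri.ToString(), null, data, ref size, InternetCookieHttpOnly, IntPtr.Zero))
        {
            if (size <= 0) return null;

            // Retry once with the size reported by the API.
            data = new StringBuilder(size);
            if (!InternetGetCookieEx(uri.ToString(), null, data, ref size, InternetCookieHttpOnly, IntPtr.Zero))
                return null;
        }
        return data.Length > 0 ? ToCookieContainer(uri, data.ToString()) : null;
    }

    // wininet returns "a=1; b=2", while CookieContainer.SetCookies expects
    // commas between cookies, so the separators are replaced.
    public static CookieContainer ToCookieContainer(Uri uri, string cookieData)
    {
        var cookies = new CookieContainer();
        cookies.SetCookies(uri, cookieData.Replace(';', ','));
        return cookies;
    }
}
```

One caveat of the separator replacement is that a cookie value containing a comma or a semicolon would be split incorrectly; for typical session identifiers this is not a concern.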
That's all. As a last warning, I suggest clearing all the WebBrowser cookies before starting the login procedure, because leftover cookies could lead to problems. I found this solution on StackOverflow (sorry, I do not remember the link).
Snippet 1: Method to clear all the cookies; this is needed to be sure that the WebBrowser control has no cookies when the login procedure begins.
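The StackOverflow snippet itself is not reproduced here, so what follows is only one commonly used approach and may differ from it. It asks wininet to end the browser session of the current process through `InternetSetOption` with `INTERNET_OPTION_END_BROWSER_SESSION` (42), which in practice discards the session cookies held by the WebBrowser control. The class and method names are hypothetical, and persistent cookies stored on disk are not removed by this call.

```csharp
using System;
using System.Runtime.InteropServices;

public static class BrowserSession
{
    // INTERNET_OPTION_END_BROWSER_SESSION from wininet.h.
    private const int InternetOptionEndBrowserSession = 42;

    [DllImport("wininet.dll", SetLastError = true)]
    private static extern bool InternetSetOption(
        IntPtr hInternet, int option, IntPtr buffer, int bufferLength);

    // Ends the wininet session of the current process; no buffer is needed for this option.
    // Returns false when the call fails.
    public static bool Reset()
    {
        return InternetSetOption(IntPtr.Zero, InternetOptionEndBrowserSession, IntPtr.Zero, 0);
    }
}
```

Calling `BrowserSession.Reset()` before navigating to the login page is enough for this sketch.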
alk.