How to Deploy a Web Site with PowerShell DSC

I do not want to create yet another tutorial on DSC, so I suggest reading an introductory article such as Introducing PowerShell Desired State Configuration before reading this one. Since I’m fairly new to PowerShell and I’m starting to experiment with DSC, I decided to write a script to deploy my favorite test application (TailspinToys :) ) on a single Windows 2012 R2 server using only DSC. This post aims to share my thoughts on the subject.

I was able to complete the script, even if I encountered some difficulties, and I managed to automate almost everything except the installation of SQL Server 2012 (I’m working on it). The goal is to deploy an ASP.NET 4.5 application that uses a SQL Server database to a freshly installed Windows Server, using only DSC goodness.

First of all, I warn you that most of the resources I needed to deploy my site are not available in the basic distribution of PowerShell and must be downloaded from Microsoft. There is a single page on MSDN where you can download the entire DSC Resource Kit in one package.

After you download the Resource Kit you should keep a couple of important points in mind. The first is that these resources are not production ready; they are all experimental, which is why their names all start with an x. Do not expect any official support program for them: if you have a problem you should ask in the forums, where you will usually find a solution. The other aspect is that if you, like me, prefer the push model, you need to install all of these modules on every target server. This somewhat violates my requirement of installing on a clean server, because a server is not really “clean” if it needs DSC resources deployed on it. This problem will be mitigated by WMF 5.0, which introduces PowerShellGet to automatically discover, install and update PowerShell modules, so in practice it is a non-issue.
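As a hedged preview of that workflow (these are the standard PowerShellGet cmdlets; the module name is just one of those used later in this post), installing a DSC resource should look roughly like this:

Find-Module -Name xWebAdministration      # discover the module in the public repository
Install-Module -Name xWebAdministration   # install it on the local machine
Update-Module -Name xWebAdministration    # keep it up to date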

Once everything was in place I started creating the script. The first part is the standard one you can find in every PowerShell DSC article, plus some import instructions to load all the DSC resources I want to use in the configuration.

Configuration TailspinToys
{
   
  Import-DscResource -Module xWebAdministration
  Import-DscResource -Module xNetworking
  Import-DscResource -Module xSqlPs
  Import-DscResource -Module xDatabase
  #http://www.vexasoft.com/blogs/powershell/9561687-powershell-4-desired-state-configuration-enforce-ntfs-permissions
  Import-DscResource -Module NTFSPermission

  Node $AllNodes.NodeName 
  { 
    

    #Install the IIS Role 
    WindowsFeature IIS 
    { 
      Ensure = "Present" 
      Name = "Web-Server" 
    } 

    # Required for SQL Server 
    WindowsFeature installdotNet35 
    {             
        Ensure = "Present" 
        Name = "Net-Framework-Core" 
        Source = "\\neuromancer\Share\Sources_sxs\?Win2012R2" 
    } 

    #Install ASP.NET 4.5 
    WindowsFeature ASP 
    { 
      Ensure = "Present" 
      Name = "Web-Asp-Net45" 
    } 

At the beginning of the script, Import-DscResource allows me to import the various resources I’ve installed. The NTFS Permission resource is taken from an article on the VexaSoft site; many thanks to the author for writing this module. That article is also really useful because it shows how easy it is to create a DSC resource when there is nothing pre-made for your purpose.

I use a configuration block, and the special variable $AllNodes contains the name of the single server I want to use for the installation. The part of the script above takes care of all the prerequisites of my TailspinToys application. I’m installing .NET 3.5 because it is needed for the SQL Server installation, but sadly I was not able to make xSqlServerInstall work to install SQL Server automatically (it asks me to reboot, and even after rebooting the DSC script stops running). I’ve decided to install SQL Server manually and wait for a better, more stable version of xSqlServerInstall. Then I request IIS and ASP.NET 4.5.

Running the above script with the right configuration data produces a MOF file that can be used to actually configure the target. Here is the configuration data I’m using.

$ConfigurationData = @{
    AllNodes = @(
        @{
            NodeName="WebTest2"
            SourceDir = "\\neuromancer\Drops\TailspinToys_CD_WebTest1\TailspinToys_CD_WebTest1_20140213.1\"
            PSDscAllowPlainTextPassword=$true
            RebootNodeIfNeeded = $true
         }
   )
}

I need the name of the server and the source directory where I stored the distribution of my web site. In this example I’m using a standard drop folder of a TFS build, so I have my binaries indexed with my symbol server. The creation of the MOF file is simply triggered by calling the newly defined TailspinToys function, passing the configuration above.

TailspinToys -ConfigurationData $ConfigurationData 

Now I have a MOF file that contains everything needed for the deployment, and I can push the configuration to the desired nodes with:

Start-DscConfiguration -Path .\TailspinToys -Wait -Verbose

This starts the configuration, connects to all the nodes (in this example the single machine WebTest2) and “makes it so”, moving the state of the nodes to the desired state. The cool part of DSC is that you specify the state you want on the target nodes without caring about how that state is achieved; that is the job of the various resources. Another interesting aspect is that if a resource is already in the desired state, Start-DscConfiguration does nothing for it. The first time you run the above script it takes a little while, because it installs IIS, but if IIS is already installed on the target node nothing happens.
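If you want to verify the result afterwards, here is a hedged sketch using the standard WMF 4.0 DSC cmdlets (the session targets the WebTest2 node used above):

$session = New-CimSession -ComputerName WebTest2
Test-DscConfiguration -CimSession $session   # returns True when the node matches the pushed MOF
Get-DscConfiguration -CimSession $session    # shows the current state of each configured resource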

With a few lines of PowerShell I was able to install IIS and ASP.NET 4.5, plus .NET 3.5, on my machines.

In the next article I’ll cover how to deploy the web site bits.

Gian Maria.

NDepend 4 and CQLinq

You already know that I’m a fan of NDepend, because it is a really useful tool for getting deep insight into your code and especially for spotting troublesome areas in your project.

Version 4 added a really cool capability called CQLinq, or Code Query LINQ: an amazing feature that lets you query your code with LINQ-style queries. Basically, after I analyzed a solution I was presented with a simple dialog asking what I primarily want to do with the result of the analysis.

This feature is really amazing because you can query your code to find almost anything. It would take too long to show every capability of CQLinq, and you can read about it in the official documentation, but I want to give you just a taste of what you can achieve with it.

As an example, the query below selects all methods that contain more than 30 lines of code and, in the select clause, reports the number of comment lines. Long methods are painful, but if you find a 40-line method with 30 lines of comments you are in real trouble: when a developer puts that much comment into a long method, there is almost surely some weird logic inside it.
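In CQLinq syntax the query reads roughly like this (a sketch to run in the NDepend query editor; the metric names are the ones documented by NDepend):

from m in Methods
where m.NbLinesOfCode > 30
select new { m, m.NbLinesOfComment }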

CQLinq is full of interesting features. As an example, if you select from JustMyCode.Methods you are selecting only your own code, not code generated by the UI designer. This lets you focus primarily on your code and avoids polluting the query results with designer-generated code.

The NDepend query editor fully supports IntelliSense, which really helps you get results quickly: the result is shown below the query as soon as it compiles, and if you make a mistake you are immediately presented with a simple list of compile errors that makes it easy to understand what is wrong with the query.

As usual, I think NDepend is a must-have tool, especially if you need to find problems in legacy code. I strongly suggest you check out all the possibilities on the official documentation page to get an idea of the capabilities of the tool.

You can also download a trial of the product to try it out yourself on your own project.

Gian Maria.

Git is fantastic but IMHO too complex to use

I blogged some time ago about the fact that I’m not a Git lover, and a couple of days ago I stumbled upon this post (10 things I hate about Git), where the author gives a full explanation of the various reasons why he does not like Git.

If you have read David Platt’s excellent book Why Software Sucks, you will probably agree that, while Git is surely a really good, fast, powerful and complete source control system that lets you do marvelous stuff, it lacks simplicity. Even though it is a tool for programmers, I think it is too difficult to use for basic everyday operations. In my opinion Git is a perfect example of a tool built to make complex things possible, but not to make simple things simpler (as suggested in David Platt’s book).

The result is a tool that is surely the most complete and powerful source control system you can use today, but that requires a lot of training to avoid getting lost. I hope that in the future some effort will be devoted to making everyday work simpler and clearer, so that you face complexity only when it is time to do complex things. That would really make Git an exceptional and outstanding product.

Gian Maria.

Getting started with Lucene.NET–Searching

Previous part of the series

In the previous part I showed how easy it is to create an index with Lucene.NET; in this post I’ll start to explain how to search it. First of all I need a more interesting example, so I downloaded a dump of Stack Overflow and extracted the Posts.xml file (a 10 GB XML file), then I wrote this simple piece of code to create the Lucene index.

// Assumed using directives: System, System.Xml, Lucene.Net.Analysis,
// Lucene.Net.Documents, Lucene.Net.Index, Lucene.Net.Store.
// luceneDirectory points to the folder that will hold the index on disk.
using (FSDirectory directory = FSDirectory.Open(luceneDirectory))
using (Analyzer analyzerStandard =
    new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29))
using (IndexWriter indexWriter = new IndexWriter(directory, analyzerStandard, IndexWriter.MaxFieldLength.UNLIMITED))
{
    Int32 i = 0;
    using (XmlReader reader = XmlReader.Create(@"D:\posts.xml"))
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element &&
                reader.Name == "row")
            {
                Document luceneDocument = new Document();
                luceneDocument.Add(new Field("Id", reader.GetAttribute("Id"), Field.Store.YES, Field.Index.NO));
                luceneDocument.Add(new Field("content", reader.GetAttribute("Body"), Field.Store.NO, Field.Index.ANALYZED));
                indexWriter.AddDocument(luceneDocument);
                Console.CursorLeft = 0;
                Console.Write("Indexed Documents:" + ++i);
            }
        }
    }

    indexWriter.Optimize();
    indexWriter.Commit();
}

This code is really similar to the previous post; the only difference is that the Directory used to store the index is an FSDirectory (File System Directory), because I want a permanent index on disk. I simply use an XmlReader to scan the file (a 10 GB XML file really needs to be read with an XmlReader; other approaches quickly run into performance trouble), and I decided to analyze the “Body” attribute of each <row> node, storing the Id of the post.

Indexing such a huge amount of data takes time, but the most important thing I want to point out is the call to Optimize() before the call to Commit(). Lucene.NET indexes are composed of segments of various lengths; the more segments an index has, the slower searches become, but if you call the Optimize() method Lucene collapses the index into a single segment, maximizing search performance. Remember that optimization is a long, time-consuming process, because Lucene needs to read the whole index, merge the segments and rewrite a single file, so it is worth calling it during periods of low system usage (e.g. during the night when the system is idle) or after a big change to the index. You can also pass an integer to Optimize specifying the maximum number of segments you want in the index; for example you can pass 5 if you want your index to contain at most 5 segments (a good tradeoff, because it saves time and still gives a well-performing index).
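In code, the two flavors on the same IndexWriter used above look like this (just a sketch):

indexWriter.Optimize();    // full optimization: collapse everything into a single segment (slow on big indexes)
indexWriter.Optimize(5);   // partial optimization: merge down to at most 5 segments (faster)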

In the above example the call to Optimize could be avoided entirely, because if you keep adding documents to the very same index, Lucene tries to keep it optimized while writing: if you run this program you will see lots of files of about 7 MB created in the FSDirectory, and after a while files get merged together, so you see fewer and larger files. The call to Optimize is really necessary if you modify the index many times, closing and reopening the IndexWriter. Remember also that until you call Commit, an IndexSearcher opened on the index directory will not see any of the newly indexed documents.

After the index is created you can search it simply by using an IndexSearcher and a QueryParser.

// Assumed using directives: the ones from the previous snippet,
// plus Lucene.Net.QueryParsers and Lucene.Net.Search.
using (FSDirectory directory = FSDirectory.Open(luceneDirectory))
using (Analyzer analyzerStandard = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29))
{
    QueryParser parser = new QueryParser("", analyzerStandard);
    using (IndexSearcher indexSearcher = new IndexSearcher(directory, true))
    {
        var query = parser.Parse("content:child*");
        TopDocs result = indexSearcher.Search(query, 20);
        Console.WriteLine("N° results in index:" + result.TotalHits);
        for (int i = 0; i < result.ScoreDocs.Length; i++)
        {
            var score = result.ScoreDocs[i].Score;
            Document doc = indexSearcher.Doc(result.ScoreDocs[i].Doc);
            var Id = doc.Get("Id");
            Console.WriteLine("Match id " + Id + " score " + score);
        }
    }
}

The code is really simple: I open the directory and create an analyzer, then I need a QueryParser, an object capable of parsing a query from a string, which is really useful for parsing user-entered text. In my example I search for all the documents whose content field contains the word child*, because the asterisk character matches any number of characters, so this query matches Child, Children, and so on. The query has the form fieldname:searchstring, but the first parameter of the QueryParser constructor is the default field; this means that if you create the QueryParser like this

QueryParser parser = new QueryParser("content", analyzerStandard);

you can simply parse the query “child*” instead of specifying “content:child*”, because the QueryParser automatically issues the query against the content field. Lucene query syntax lets you specify more complex queries, such as “+child* +position*”, which matches all documents that contain both child* and position*. You can use AND, OR, and other advanced constructs, for example the proximity query “child* position*”~10, which searches for child* and position* but matches only if the distance between them is at most 10 words. You can also search by similarity: if you search for Children~ you are searching for terms similar to Children, so you can match terms like Chidren, a misspelled version of the word you are looking for.
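Here is a hedged sketch of parsing a few of these syntaxes with the parser created above (the proximity example uses plain terms; Query comes from Lucene.Net.Search):

Query wildcard  = parser.Parse("child*");                 // matches Child, Children, ...
Query required  = parser.Parse("+child* +position*");     // both terms must be present
Query proximity = parser.Parse("\"child position\"~10");  // terms at most 10 words apart
Query fuzzy     = parser.Parse("Children~");              // similarity search, also matches misspellings like Chidren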

The result of a search is a simple object of type TopDocs that contains all the matching documents in the ScoreDocs array, plus the total number of matches in the TotalHits field. To show results you simply loop over ScoreDocs to get information about the documents that matched the query. In this example, since I did not include the body in the index (Field.Store.NO), I can only retrieve the “Id” field from the documents returned by the query, and I need to reopen the original XML file if I want to know the Body of a matching post. If you do not want to reopen the original XML file to get the Body, you can change the storage of the content field to Field.Store.COMPRESS.

luceneDocument.Add(new Field("Id", reader.GetAttribute("Id"), Field.Store.YES, Field.Index.NOT_ANALYZED));
luceneDocument.Add(new Field("content", reader.GetAttribute("Body"), Field.Store.COMPRESS, Field.Index.ANALYZED));

In this example I changed the Id field from Field.Index.NO to Field.Index.NOT_ANALYZED; this means the Id field is not analyzed, but you can search it for exact matches. If you leave the value as Field.Index.NO, as in the previous snippet, a query like “Id:100” to find the document with Id = 100 returns no results. The content field changed from Field.Store.NO to Field.Store.COMPRESS, which means the entire unchanged value of the field is included in the index in compressed form and can be retrieved from query results: now you can get the original content by calling doc.Get(“content”). The reason you need to store the content with Field.Store.COMPRESS is that with Field.Index.ANALYZED the index completely loses the original structure of the field; it only contains terms, so the original text is lost. Clearly such an index occupies more space, but with compression it is a good tradeoff, because you can immediately retrieve the original text without going back to the original store (our 10 GB XML file in this example).

Just to conclude this second part, I want to summarize the various combinations of the Field.Store and Field.Index values for document fields; a short code sketch follows the descriptions.

A combination of Field.Store.YES and Field.Index.NO is usually used to store data inside the document that will not be searched; it is useful for database primary keys or other metadata that you need to retrieve from the result of a search, but that you do not need to use in a search query.

A combination of Field.Store.YES and Field.Index.NOT_ANALYZED or Field.Index.NOT_ANALYZED_NO_NORMS is used for fields that you want to search but that should be treated as a single value, such as URLs, single words, database keys you want to use in queries, and identifiers in general. Use NOT_ANALYZED_NO_NORMS if you want to save index space and you will not use index boosting (an advanced feature of Lucene).

A combination of Field.Store.YES (or Field.Store.COMPRESS) and Field.Index.ANALYZED is used for text you want to search in and also retrieve from query results. This is useful when the original text is part of a document or of a large file (as in this example) and retrieving it is time consuming, so it can be better to store it in the index.

A combination of Field.Store.NO and Field.Index.ANALYZED is used for text you want to search in but are not interested in retrieving from query results. This is useful when you have the original text in a database and can fetch it with a single fast query when needed.
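In code, those four combinations look roughly like this (field names and values are just examples):

Document sketch = new Document();
sketch.Add(new Field("Id", "42", Field.Store.YES, Field.Index.NO));                              // retrieve only, never searched
sketch.Add(new Field("Url", "http://example.org", Field.Store.YES, Field.Index.NOT_ANALYZED));   // searchable as a single exact value
sketch.Add(new Field("content", "text to search", Field.Store.COMPRESS, Field.Index.ANALYZED));  // searchable and retrievable
sketch.Add(new Field("body", "text to search", Field.Store.NO, Field.Index.ANALYZED));           // searchable, retrieved elsewhere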

Gian Maria

Getting Started with Lucene.net

I started working with Lucene.NET and I must admit it is a really powerful library, but it is also huge and needs a little time to be mastered completely. Probably one of the best resources to keep in mind is the FAQ, because it answers most of the common questions about Lucene and is a good place to start. Another good place is the wiki, which contains other useful information and many links to relevant resources.

Getting started with Lucene.NET is really simple: after you grab the bits and add a reference to your project, you are ready to search your “documents”. Lucene has a set of basic concepts you need to grasp before using it. Analyzers process documents to build indexes, which are stored in a Directory and allow fast searches; searches are done with an IndexSearcher, which can search the data inside a Directory previously populated by an IndexWriter. Now let’s see how you can index two long strings of text:

// Assumed using directives: System, Lucene.Net.Analysis, Lucene.Net.Documents,
// Lucene.Net.Index, Lucene.Net.Store.
using (RAMDirectory directory = new RAMDirectory())
using (Analyzer analyzer = new Lucene.Net.Analysis.Standard.StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29))
{
    String test = "la marianna la va in campagna......";
    String test2 = "Lorem Ipsum è un testo segnaposto .....";
    using (IndexWriter ixw = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
    {

        Document document = new Document();

        document.Add(new Field("Id", test.GetHashCode().ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.NO));
        document.Add(new Field("content", test, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        ixw.AddDocument(document);

        document = new Document();
        document.Add(new Field("Id", test2.GetHashCode().ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED, Field.TermVector.NO));
        document.Add(new Field("content", test2, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
        ixw.AddDocument(document);
        ixw.Commit();
    }
}

The code may seem complex, but it is simpler than you might think if you look at it carefully. First of all we need to create a Directory where we want to store the index; for this sample I use a RAMDirectory, which simply stores everything in RAM and is really fast and useful when you need to do quick searches over some text and you do not want to keep the index for future searches. After the Directory you need to create an Analyzer, the component capable of analyzing the text. Notice how both the Directory and the Analyzer are in a using clause, because you need to dispose of them when you no longer need them.

Then I have two long strings to index, so I created an IndexWriter that uses the Directory and the Analyzer created earlier, and finally I called AddDocument() to add documents to the index. In Lucene a Document is nothing more than a bunch of key/value pairs containing the data you want to put into the index. The complexity of creating a document lies in deciding what to do with each pair, because you need to tell Lucene exactly what you want indexed and/or stored in the index. If you look at the Field constructor, the first two parameters are the name and value of the field, and they are followed by some Lucene-specific enum values.

The first one is the storage option; it can be YES, COMPRESS or NO, and it basically tells Lucene whether the content of the field should be stored in the index (YES), stored compressed (COMPRESS) or not stored at all (NO). You need to store content in the document only if you want to retrieve it from a search. Suppose you are writing an indexing system for data stored in an external relational database, where you have a table with two columns called Id and Content: if you want to index that table with Lucene to find the ids of rows that contain specific text, you want to store the original Id of the database row in the index so you can retrieve it from a search. When you issue a search you will be able to read that value from the documents returned by the query.
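A hedged sketch of that scenario (the row variable and the column names are hypothetical) could look like this:

// Index the Content column, store only the Id so it can be read back from search results.
Document dbDoc = new Document();
dbDoc.Add(new Field("Id", row.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
dbDoc.Add(new Field("Content", row.Content, Field.Store.NO, Field.Index.ANALYZED));
ixw.AddDocument(dbDoc);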

The second parameter is an enum that tells Lucene whether the field should be analyzed; in my example the Id field (I used the hash code of the string to create a unique id for this quick example, but in the previous scenario it would be the id of a database row) is not analyzed, because I do not need to search inside its content. The final constant tells Lucene whether we want to store in the index the positions of the various words contained in the text; again, for the Id field there is nothing to analyze. For the content field I decided to store it in the index (Field.Store.YES) because I want the original text included in the index, to analyze it (Field.Index.ANALYZED) because I want to search the text inside it, and finally to store term vectors with positions and offsets (Field.TermVector.WITH_POSITIONS_OFFSETS).

Finally I call the Commit() method of the IndexWriter to make it flush everything to the Directory. If you forget to call Commit and you open an IndexSearcher that points to the same RAMDirectory, you will probably not find all the indexed documents, because the IndexWriter caches results in memory and does not write to the directory every time AddDocument is called.

In the next post I’ll show how easy it is to search inside a Lucene index directory.

Gian Maria.