Corporate Wiki Part 1: Building Your First Wiki Parser

  • Corporate Wiki Part 2: Writing Your Own Wiki Search Engine
  • Corporate Wiki Part 3: Files Up and Down - Adding More Features

    Simplicity is the watchword. Before you spend thousands of dollars on a collaboration and knowledge management system, try a corporate wiki. You may find that it fills your needs.

    After a brief introduction to the origin and nature of wikis, this article, the first in a series on building a corporate wiki, will focus on creating a wiki parser, the heart of any wiki.

    Each article in this series will leave you with a complete, usable application. The code listings here illustrate the primary topic of discussion, but you can download the entire solution source code from www.sys-con.com/dotnet/sourcec.cfm.

    Many hypertext navigation and authoring systems have been developed over the years, but not until Ward Cunningham created the WikiWikiWeb did the concept really take off on the Internet. Wiki is reportedly the Hawaiian word for "quick," an alternative to calling it QuickWeb, a name that's not nearly as fun.

    Eight years later, search for "wiki" on Google and you'll get nearly 3 million results. Many personal and enthusiast sites now have a wiki where all visitors may read and contribute.

    There are now as many definitions for "wiki" as there are open source and commercial implementations of wiki Web apps. Along with a variety of features, these applications provide two fundamental services: view and edit.

    The wiki viewer fetches a topic's text and parses it, transforming it to HTML for display (see Figure 1). The editor allows you to edit any topic and save it to the data store from which the viewer fetches topics.


    The first step in building your wiki parser is to decide exactly which plain-text syntax you will support. I have chosen the syntax rules found in Table 1, as they seem to be most common in the wiki sites I have visited. You are certainly free to define your own or modify these.

    If you are already familiar with wikis, you will notice that I have made a strong departure from the traditional Pascal case of wiki topic titles in favor of using case-insensitive topic titles connected or trailed by an underscore "_" character.

    After experimenting with a wiki in my own corporate culture, I chose to do this because users were more comfortable with the underscore. Of course, you could easily modify the code to support any topic title formatting scheme you want to use.

    Some of the plain-text syntax rules here may not be found in Cunningham's original wiki implementation, but I have found all of them in one wiki site or another, with the exception of the ::code:: tag I created to let me insert HTML preformatted text into a topic.

    Given the rules established in Table 1, you will discover that there are three types of text blocks possible: (1) standard plain text, one paragraph per line; (2) bulleted or numbered lists, one bullet per line; and (3) code sections, set off by the ::code:: text on a unique line.


    Each text block type requires a slightly different parsing approach, so I found that it was much simpler to make a quick pass through the text to isolate each block and then iterate through that collection of blocks to parse each by type. To help with this, I created the WikiTextBlock class and WikiTextBlockType enum to handle the job (see Listing 1).
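
    The first pass might be sketched as follows. This is a Python illustration rather than the C# of Listing 1, and several details are assumptions: that list lines begin with "*" or "#", that a line containing only ::code:: toggles a code section on and off, and the names split_into_blocks, block_type, and lines.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class WikiTextBlockType(Enum):
    PLAIN = "plain"
    LIST = "list"
    CODE = "code"


@dataclass
class WikiTextBlock:
    block_type: WikiTextBlockType
    lines: List[str] = field(default_factory=list)


def split_into_blocks(text: str) -> List[WikiTextBlock]:
    """Group consecutive lines of the same kind into typed blocks."""
    blocks: List[WikiTextBlock] = []
    in_code = False
    for line in text.splitlines():
        # Assumed convention: a line holding only ::code:: opens or closes a code section.
        if line.strip() == "::code::":
            in_code = not in_code
            if in_code:
                blocks.append(WikiTextBlock(WikiTextBlockType.CODE))
            continue
        if in_code:
            kind = WikiTextBlockType.CODE
        elif line.lstrip().startswith(("*", "#")):  # assumed list markers
            kind = WikiTextBlockType.LIST
        else:
            kind = WikiTextBlockType.PLAIN
        # Start a new block whenever the kind changes.
        if not blocks or blocks[-1].block_type is not kind:
            blocks.append(WikiTextBlock(kind))
        blocks[-1].lines.append(line)
    return blocks
```

    Keeping the marker lines out of the code block means the later code formatter only ever sees the text it has to escape.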

    The parser first builds an ArrayList containing each of the text blocks. It then iterates through the list, appending to a StringBuilder the result of each call to the parser code that handles that particular block type. Once all of the blocks are parsed, the StringBuilder's ToString() method is called to return the formatted HTML to the page code.
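
    The loop itself is small. A minimal Python sketch, assuming each block is a (type name, lines) pair and that the per-type formatters are supplied in a dictionary; the function name and the stub formatters below are hypothetical, and a list of strings joined at the end stands in for the StringBuilder:

```python
from typing import Callable, Dict, List, Tuple

Block = Tuple[str, List[str]]            # (block type name, lines in the block)
Formatter = Callable[[List[str]], str]   # turns one block's lines into HTML


def convert_wiki_text_to_html(blocks: List[Block],
                              formatters: Dict[str, Formatter]) -> str:
    parts: List[str] = []  # plays the role of the StringBuilder
    for block_type, lines in blocks:
        # Dispatch each block to the formatter registered for its type.
        parts.append(formatters[block_type](lines))
    return "".join(parts)


# Usage with deliberately trivial stub formatters.
demo_formatters: Dict[str, Formatter] = {
    "plain": lambda lines: "".join(f"<p>{line}</p>" for line in lines),
    "code": lambda lines: "<pre>" + "\n".join(lines) + "</pre>",
}
demo_html = convert_wiki_text_to_html(
    [("plain", ["Hello"]), ("code", ["x = 1"])], demo_formatters
)
```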

    The only requirements for the Code block type are to replace HTML tag characters ( <, >, and & ) with their HTML-escaped cousins and to surround the text block with the <pre></pre> tag set. That's the easy block type.
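
    In Python terms, the code block handling could look like the sketch below. The function name is hypothetical; html.escape from the standard library performs the three replacements described above when quote=False is passed.

```python
import html
from typing import List


def format_code_block(lines: List[str]) -> str:
    # Escape &, < and > (quotes are left alone), then wrap the block in <pre>.
    escaped = html.escape("\n".join(lines), quote=False)
    return f"<pre>{escaped}</pre>"
```

    The ampersand has to be escaped before the angle brackets, otherwise the "&" in "&lt;" would itself be escaped; html.escape takes care of that ordering.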

    The FormatListWikiText method handles the List type and is the most complex to parse because you have to deal with indentation and closing up the <ul> or <ol> tag sets. This is handily done with a Stack: I push the closing tag of each list onto the stack when the list's opening tag is created, then pop those same closing tags when the list is completed.
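
    The stack technique can be sketched in Python as follows. This is not the FormatListWikiText code; it assumes that "*" marks a bulleted item, "#" a numbered item, and that the number of repeated markers gives the nesting depth. It also does not handle a switch between bullet and number styles at the same depth.

```python
import re
from typing import List

# Assumed syntax: one or more * or # markers, then the item text.
LIST_LINE = re.compile(r"^\s*([*#]+)\s*(.*)$")


def format_list_block(lines: List[str]) -> str:
    out: List[str] = []
    closing: List[str] = []  # stack of closing tags for the lists currently open
    for line in lines:
        match = LIST_LINE.match(line)
        if not match:
            continue
        markers, text = match.group(1), match.group(2)
        # Never open more than one new level per line.
        depth = min(len(markers), len(closing) + 1)
        if depth > len(closing):
            # Open a nested list inside the still-open parent item.
            tag = "ol" if markers[len(closing)] == "#" else "ul"
            out.append(f"<{tag}>")
            closing.append(f"</{tag}>")
        else:
            # Close any deeper lists, then the previous item at this level.
            while len(closing) > depth:
                out.append("</li>")
                out.append(closing.pop())
            out.append("</li>")
        # In the article, the item text would next go through FormatStandardWikiLine.
        out.append(f"<li>{text}")
    # Pop whatever is still open when the block ends.
    while closing:
        out.append("</li>")
        out.append(closing.pop())
    return "\n".join(out)
```

    Because every opening tag pushes its own closer, the cleanup at the end of the block never has to remember which kind of list was opened at which depth.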

    After the HTML bullet tags are handled, the code calls the common parser method FormatStandardWikiLine to prepare the text of each line in the list block. This is also the primary formatting method used for the Plain type text block.

    The FormatStandardWikiLine method breaks the task down into a series of specific methods, called in a careful order, each of which uses regular expressions to find the plain-text syntax in a line and replace it with HTML formatting.

    In some cases, if the required order of transformations is not followed, unexpected results will occur. The steps in converting the plain text of the wiki entry into HTML are:

    1. Topic links
    2. Hyperlinks
    3. Horizontal rules
    4. Bold italics - phrase
    5. Bold italic underline - word
    6. Headings
    7. Block quotes

    Topic Links
    To parse for topic titles and convert them to HTML links, you need to do two things: (1) find the title in the line; and (2) create a link to the view page or edit page, depending on whether the topic already exists.

    First I created the regular expression pattern string and an instance of the Regex class that uses that pattern, both as static members of the parser (see Listing 2). The parser is entirely static, making it more efficient and easier to use throughout the application.

    The FormatTopicLinks method is called for each line in the text block, with the line passed by reference. The code (see Listing 3) finds all topic matches in the text using the Regex object, called RxTopic. It then iterates through each match.

    If the match is not the same as the topic of the page being viewed, you process the match to create the link to either the view or edit page. Otherwise, you convert the match to plain text without the underscores, since there is no sense in linking a page to itself.

    To create the link, the match is checked against the TopicManager's RevCount hash table collection of existing topics to determine whether the matched topic exists. If it does exist, the match is transformed using the Regex.Replace static method to create a link to the view.aspx page using the formatted topic title for the text in the link.

    If the topic does not exist, the link points to the edit.aspx page with a different CSS class and a link title attribute of "create this topic". In this way, links to topics that do not exist can be easily distinguished from links to those that do.
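
    The logic described above can be sketched in Python. The pattern is a guess at the underscore rule, not the one in Listing 2, and the "topic" query parameter, the "create" CSS class, and the rev_count dictionary keyed by lower-cased title are all assumptions made for illustration.

```python
import re
from typing import Dict

# Compiled once at module level, much like a static Regex member.
# Assumed rule: words joined by underscores, or a single word with a trailing underscore.
RX_TOPIC = re.compile(r"\b(?:[A-Za-z0-9]+_)+[A-Za-z0-9]*\b")


def format_topic_links(line: str, current_topic: str,
                       rev_count: Dict[str, int]) -> str:
    def replace(match: "re.Match[str]") -> str:
        raw = match.group(0)
        title = raw.replace("_", " ").strip()  # display text without underscores
        key = title.lower()                    # topic titles are case-insensitive
        slug = raw.rstrip("_")
        if key == current_topic.lower():
            return title                       # no point linking a page to itself
        if key in rev_count:
            return f'<a href="view.aspx?topic={slug}">{title}</a>'
        return (f'<a class="create" href="edit.aspx?topic={slug}" '
                f'title="create this topic">{title}</a>')

    return RX_TOPIC.sub(replace, line)
```

    A pattern this loose would also match underscores inside URLs or identifiers, so a production pattern needs to be stricter than this one.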

    Linking URLs
    Standard wiki formatting for URLs is a simple <A HREF> with the text of the link being a copy of the actual URL. For simple and short URLs, this works well, but so many URLs are quite lengthy. Consider the URL for Microsoft's MSDN coverage of regular expressions (see resources).

    To solve the long URL problem, the FormatHyperLinks method (see Listing 4, available at www.sys-con.com/dotnet/sourcec.cfm) performs three types of text transformations: (1) a link to a URL preceded by descriptive text between [ ] brackets; (2) a simple URL; and (3) a mailto URL.
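
    A rough Python sketch of the three transformations follows. It is not Listing 4: the exact bracket syntax is assumed to be "[text]" immediately followed by the URL (optionally after whitespace), only http and https URLs are recognized, and the names are hypothetical. A single combined pattern is used so that a URL already wrapped in a link is not matched a second time.

```python
import html
import re

RX_LINK = re.compile(
    r"\[(?P<text>[^\]]+)\]\s*(?P<labeled>https?://[^\s<]+)"  # (1) [text] URL
    r"|(?P<url>https?://[^\s<]+)"                            # (2) bare URL
    r"|mailto:(?P<mail>[^\s<]+@[^\s<]+)"                     # (3) mailto URL
)


def format_hyperlinks(line: str) -> str:
    def replace(match: "re.Match[str]") -> str:
        if match.group("labeled"):
            href = html.escape(match.group("labeled"))
            text = html.escape(match.group("text"), quote=False)
            return f'<a href="{href}">{text}</a>'
        if match.group("url"):
            href = html.escape(match.group("url"))
            return f'<a href="{href}">{href}</a>'
        addr = html.escape(match.group("mail"))
        return f'<a href="mailto:{addr}">{addr}</a>'

    return RX_LINK.sub(replace, line)
```

    This sketch treats any run of non-space characters as part of the URL, so sentence punctuation typed directly after a URL ends up inside the link.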

    Finishing the Parsing
    The remaining steps in parsing each line are simpler, but each takes advantage of regular expression language that probably looks more like chicken scratches to someone unfamiliar with this arcane parsing syntax. Even for seasoned regexers, it's very helpful to keep a reference guide handy. For my own reference, I copied a number of pages from the Microsoft online guide into a static HTML page and stuck it right on my desktop.

    Once the horizontal rules, word and phrase formatting, headings, and block quotes are set into HTML, the code adds the formatted string to the StringBuilder instance in the ConvertWikiTextToHTML method of the parser, which iterates through each text block sequentially to get the entire topic completed. The StringBuilder's ToString() method is called to return the completely formatted HTML to the calling page (in this case, the view.aspx page).

    Source Code
    Due to space limitations, not all of the code can be printed here, so download the sample code from www.sys-con.com/dotnet/sourcec.cfm and get started building your own wiki. The source includes the VS.NET 2003 solution source code and the MS SQL 2000 create and stored procedure scripts.

    I have only scratched the surface of the regular expressions language. Check out Microsoft's MSDN reference on regular expressions for .NET.

    This initial stab at a wiki will get you going on your own corporate wiki, but business users will undoubtedly want more. In future installments in this series I will walk you through building the bells and whistles your users will want, while keeping your wiki simple and easy to use. After all, if it's not simple, it's not wiki.

    Next you'll get search, recent changes, revision history, and like topics lookup, as well as delete functionality to remove topics. Future articles will cover uploading and downloading files, parsing images into topics, and implementing teams with forms security and data-based user and groups management.

    Last, I'll help you create a subscription-based topic change e-mail notification service. This will allow users to get immediate e-mail notification when topics in which they are interested are changed by other users.

    Until next time, good luck and have fun building your own corporate wiki.


    Resources
  • Ward Cunningham's WikiWikiWeb: http://c2.com/cgi/wiki
  • .NET Framework Regular Expressions (MSDN)
  • Sparx Systems Enterprise Architect: www.sparxsystems.com.au

    Corporate Wiki Architecture
    When you need a garage, a Sistine Chapel model might be overkill. The architectural framework of the corporate wiki presented here is simple and effective. The view and edit pages use several convenient classes for accessing configuration information, retrieving and persisting topics, parsing topic text to HTML, and maintaining a list of existing topics (see Figure 2).


    The parser is the heart of the wiki. It performs the transformation of the easy-to-enter plain wiki topic text into standard HTML. Its static methods and properties make it easier to use and improve the performance of the regular expressions, as they are compiled once and used over and over.

    Topic Manager
    Named simply for its primary functionality, the TopicManager maintains a static list of existing topics with a revision count value in a hash table accessible via the RevCount property. The SaveWiki method increments a topic's revision count after it updates the database. This class also provides the GetWikiTopic method to retrieve the latest version of a specific topic.
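
    The shape of this class can be sketched in Python. The in-memory _store dictionary is only a stand-in for the database and its stored procedures, and the method signatures are assumptions rather than those of the downloadable source.

```python
from typing import Dict, List, Optional


class TopicManager:
    # Class-level dictionaries play the role of the static hash table.
    rev_count: Dict[str, int] = {}
    _store: Dict[str, List[str]] = {}  # stand-in for the database

    @classmethod
    def save_wiki(cls, title: str, text: str) -> int:
        key = title.lower()  # topic titles are case-insensitive
        cls._store.setdefault(key, []).append(text)         # "database" update first
        cls.rev_count[key] = cls.rev_count.get(key, 0) + 1  # then bump the count
        return cls.rev_count[key]

    @classmethod
    def get_wiki_topic(cls, title: str) -> Optional[str]:
        revisions = cls._store.get(title.lower())
        return revisions[-1] if revisions else None  # latest revision, if any
```

    Keeping the revision counts in memory lets the topic link check answer "does this topic exist?" without a database round trip for every match.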

    WikiTextBlock and WikiTextBlockType
    The WikiTextBlock provides a simple data container object that is added to an ArrayList collection on the parser's first pass through a topic's text. The WikiTextBlockType enum records whether each block is of the Plain, List, or Code type, so the parser can pass it to the matching formatting code. This makes for much easier parsing of different types of text blocks.

    Another data container class, a WikiTopic object, is returned by the TopicManager in the GetWikiTopic method. This simplifies the page code that deals with the presentation of the data contained in a topic when it's retrieved from the database.

    The DataAccess class provides simplified access to the database's three stored procedures. It uses Microsoft.ApplicationBlocks.Data's SqlHelper class to make using the stored procedures even easier.

    I like to create a Config class in all of my ASP.NET applications. This little bit of work makes it very easy to keep your application highly configurable and introduces a convenient layer of abstraction between the web.config file and your code.

    UML Design Tools
    If you're looking for a great UML tool at a great price, look no further. While I have used Rational Rose for building UML models, I prefer to use Sparx Systems Enterprise Architect (EA). Surprisingly affordable compared to other tools, EA provides an IDE-like user interface with all the resources a software architect needs. The image in Figure 1 was created with EA.

  • More Stories By Tyler Jensen

    Engrossed in enterprise application architecture and development for over ten years, Tyler Jensen is a senior technical consultant in a large health intelligence company, designing and developing claims processing and analysis software. In his spare time he does a little writing and outside consulting.
