Corporate Wiki Part 1: Building Your First Wiki Parser

  • Corporate Wiki Part 2: Writing Your Own Wiki Search Engine
  • Corporate Wiki Part 3: Files Up and Down - Adding More Features

    Simplicity is the watchword. Before you spend thousands of dollars on a collaboration and knowledge management system, try a corporate wiki. You may find that it fills your needs.

    This article is the first in a series on building a corporate wiki. After a brief introduction to the origin and nature of wikis, it focuses on creating a wiki parser, the heart of any wiki.

    Each article in this series will leave you with a complete, usable application. The code listings here illustrate the primary topic of discussion, but you can download the entire solution source code from www.sys-con.com/dotnet/sourcec.cfm.

    Many hypertext navigation and authoring systems have been developed over the years, but the concept did not really take off on the Internet until Ward Cunningham created the WikiWikiWeb. Wiki is reportedly the Hawaiian word for "quick"; Cunningham used it to avoid calling his site QuickWeb, a name that's not nearly as fun.

    Eight years after WikiWikiWeb went online in 1995, a Google search for "wiki" returns nearly 3 million results. Many personal and enthusiast sites now host a wiki that any visitor may read and contribute to.

    There are now as many definitions for "wiki" as there are open source and commercial implementations of wiki Web apps. Along with a variety of features, these applications provide two fundamental services: view and edit.

    The wiki viewer fetches a topic's text and parses it, transforming it to HTML for display (see Figure 1). The editor allows you to edit any topic and save it to the data store from which the viewer fetches topics.


    The first step in building your wiki parser is to decide exactly which plain-text syntax you will support. I have chosen the syntax rules found in Table 1, as they seem to be most common in the wiki sites I have visited. You are certainly free to define your own or modify these.

    If you are already familiar with wikis, you will notice that I have made a strong departure from the traditional Pascal case of wiki topic titles in favor of using case-insensitive topic titles connected or trailed by an underscore "_" character.

    After experimenting with a wiki in my own corporate culture, I chose to do this because users were more comfortable with the underscore. Of course, you could easily modify the code to support any topic title formatting scheme you want to use.

    Some of the plain-text syntax rules here may not be found in Cunningham's original wiki implementation, but I have found all of them in one wiki site or another, with the exception of the ::code:: tag I created to let me insert HTML preformatted text into a topic.

    Given the rules established in Table 1, you will discover that there are three types of text blocks possible: (1) standard plain text, one paragraph per line; (2) bulleted or numbered lists, one bullet per line; and (3) code sections, set off by the ::code:: text on a unique line.


    Each text block type requires a slightly different parsing approach, so I found that it was much simpler to make a quick pass through the text to isolate each block and then iterate through that collection of blocks to parse each by type. To help with this, I created the WikiTextBlock class and WikiTextBlockType enum to handle the job (see Listing 1).
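
    A minimal Python sketch of that first pass (the .NET listings are not reproduced here). The WikiTextBlock and WikiTextBlockType names come from the article, but their members are assumptions, as is the list syntax: Table 1 is missing, so lines starting with * or # are treated as list items. The ::code:: marker on a line of its own is taken from the text above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class WikiTextBlockType(Enum):
    PLAIN = 1
    LIST = 2
    CODE = 3


@dataclass
class WikiTextBlock:
    block_type: WikiTextBlockType
    lines: List[str] = field(default_factory=list)


CODE_MARKER = "::code::"    # from the article: code is set off by this text on its own line
LIST_PREFIXES = ("*", "#")  # assumption: bullets start with *, numbered items with #


def split_into_blocks(text: str) -> List[WikiTextBlock]:
    """One quick pass over the topic text, grouping consecutive lines by block type."""
    blocks: List[WikiTextBlock] = []
    in_code = False
    for line in text.splitlines():
        if line.strip() == CODE_MARKER:
            in_code = not in_code  # the marker opens or closes a code section
            if in_code:
                blocks.append(WikiTextBlock(WikiTextBlockType.CODE))
            continue
        if in_code:
            blocks[-1].lines.append(line)  # code lines are kept verbatim
            continue
        if not line.strip():
            continue  # blank lines outside code carry no content
        if line.lstrip().startswith(LIST_PREFIXES):
            block_type = WikiTextBlockType.LIST
        else:
            block_type = WikiTextBlockType.PLAIN  # one paragraph per line
        if blocks and blocks[-1].block_type is block_type:
            blocks[-1].lines.append(line)
        else:
            blocks.append(WikiTextBlock(block_type, [line]))
    return blocks
```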

    The parser makes quick work of creating an ArrayList containing each of the text blocks and iterates through the array using a StringBuilder to store the results of each call to the specific parser code that handles that particular block type. Once all of the blocks are parsed, the StringBuilder's ToString() method is called to return the formatted HTML to the page code.
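
    A minimal Python sketch of that dispatch loop, assuming each block arrives as a (type, lines) pair standing in for WikiTextBlock. The three formatters are deliberately simplified placeholders, not the article's parser methods; collecting strings in a list and joining them once plays the part of the StringBuilder.

```python
import html
from typing import Callable, Dict, List, Tuple

# Assumed shape: each block is (type name, lines), a stand-in for WikiTextBlock.
Block = Tuple[str, List[str]]


def format_plain(lines: List[str]) -> str:
    # Placeholder: the real parser runs FormatStandardWikiLine on each line.
    return "".join(f"<p>{line}</p>\n" for line in lines)


def format_code(lines: List[str]) -> str:
    return "<pre>" + html.escape("\n".join(lines), quote=False) + "</pre>\n"


def format_list(lines: List[str]) -> str:
    # Placeholder: a flat bulleted list with no nesting.
    items = "".join(f"<li>{line.lstrip('*# ')}</li>" for line in lines)
    return f"<ul>{items}</ul>\n"


FORMATTERS: Dict[str, Callable[[List[str]], str]] = {
    "plain": format_plain,
    "list": format_list,
    "code": format_code,
}


def convert_wiki_text_to_html(blocks: List[Block]) -> str:
    parts: List[str] = []  # appended to, then joined once: the StringBuilder role
    for block_type, lines in blocks:
        parts.append(FORMATTERS[block_type](lines))  # pick the parser for this block type
    return "".join(parts)
```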

    The only requirements for the Code block type are to replace the HTML special characters (<, >, and &) with their escaped entities (&lt;, &gt;, and &amp;) and to surround the text block with the <pre></pre> tag set. That's the easy block type.
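
    The one subtlety is ordering: & must be escaped first, or the & in a freshly produced &lt; would be escaped again into &amp;lt;. A minimal Python sketch (the function name is hypothetical):

```python
from typing import List


def format_code_block(lines: List[str]) -> str:
    text = "\n".join(lines)
    # Escape & before < and >, otherwise "&lt;" would be re-escaped to "&amp;lt;".
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return f"<pre>{text}</pre>"
```

    Python's standard html.escape(text, quote=False) performs the same three replacements in the correct order.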

    The FormatListWikiText method handles the List type, which is the most complex to parse because you have to deal with indentation and with closing the <ul> or <ol> tag sets. A Stack handles this neatly: when I emit a list's opening tag, I push its closing tag onto the stack, then pop that same closing tag off when the list ends.
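
    A minimal Python sketch of the stack technique. Table 1 is not reproduced here, so it assumes * marks a bullet, # marks a numbered item, and repeating the marker (** or ##) nests a level; format_line is a hypothetical hook standing in for FormatStandardWikiLine. For brevity, nested lists are emitted as siblings of the preceding <li> rather than inside it, which browsers render but strict HTML does not allow.

```python
import re
from typing import Callable, List

LIST_MARKER = re.compile(r"[*#]+")  # assumption: * = bullet, # = numbered, repeat to nest


def format_list_block(lines: List[str], format_line: Callable[[str], str] = lambda s: s) -> str:
    out: List[str] = []
    stack: List[str] = []  # closing tags of the open lists, innermost on top
    for line in lines:
        stripped = line.strip()
        match = LIST_MARKER.match(stripped)
        if match is None:
            continue  # not a list line; the block splitter should not send these
        marker = match.group(0)
        wanted = ["ol" if ch == "#" else "ul" for ch in marker]
        text = stripped[len(marker):].strip()

        # Keep the open lists that still agree with this item's marker...
        keep = 0
        while keep < min(len(stack), len(wanted)) and stack[keep] == f"</{wanted[keep]}>":
            keep += 1
        # ...pop the closing tag of every list that has ended...
        while len(stack) > keep:
            out.append(stack.pop())
        # ...and open any list that starts here, pushing its closing tag.
        while len(stack) < len(wanted):
            tag = wanted[len(stack)]
            out.append(f"<{tag}>")
            stack.append(f"</{tag}>")
        out.append(f"<li>{format_line(text)}</li>")

    while stack:  # close whatever is still open at the end of the block
        out.append(stack.pop())
    return "".join(out)
```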

    After the HTML list tags are handled, the code calls the common parser method FormatStandardWikiLine to format the text of each line in the list block. This is also the primary formatting method used for the Plain text block type.

    The FormatStandardWikiLine method breaks the task into a series of smaller methods and calls them in a fixed order; each one uses regular expressions to find a piece of plain-text syntax in the line and replace it with HTML formatting.

    The order matters: if the transformations are applied out of sequence, some lines will produce unexpected results. The steps in converting the plain text of the wiki entry into HTML are:

    1. Topic links
    2. Hyperlinks
    3. Horizontal rules
    4. Bold italics - phrase
    5. Bold italic underline - word
    6. Headings
    7. Block quotes
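
    A small example of why the order matters. Table 1 is not reproduced here, so this Python sketch assumes the classic WikiWikiWeb quote syntax, '''bold''' and ''italic''. If the two-quote italic rule runs before the three-quote bold rule, it consumes part of each bold marker and leaves stray quotes in the output.

```python
import re

# Assumed syntax (classic WikiWikiWeb style): '''bold''' and ''italic''.
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")


def format_emphasis(line: str, bold_first: bool = True) -> str:
    """Apply the emphasis rules; bold_first=False shows what goes wrong out of order."""
    steps = [(BOLD, r"<b>\1</b>"), (ITALIC, r"<i>\1</i>")]
    if not bold_first:
        steps.reverse()
    for pattern, replacement in steps:
        line = pattern.sub(replacement, line)
    return line
```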

    Topic Links
    To parse for topic titles and convert them to HTML links, you need to do two things: (1) find the title in the line; and (2) create a link to the view page or edit page, depending on whether the topic already exists.

    First I declared the regular expression pattern string, and a Regex instance built from that pattern, as static members of the parser (see Listing 2). The parser is entirely static, which makes it more efficient and easier to use throughout the application.

    The FormatTopicLinks method is called for each line of the text block, with the line passed by reference. The code (see Listing 3) uses the Regex object, named RxTopic, to find all topic matches in the text and then iterates through each match.

    If the match is not the same as the topic of the page being viewed, you process the match to create the link to either the view or edit page. Otherwise, you convert the match to plain text without the underscores, since there is no sense in linking a page to itself.

    To create the link, the match is checked against the TopicManager's RevCount hash table collection of existing topics to determine whether the matched topic exists. If it does exist, the match is transformed using the Regex.Replace static method to create a link to the view.aspx page using the formatted topic title for the text in the link.

    If the topic does not exist, the link points to the edit.aspx page instead, with a different CSS class and a title attribute of "create this topic". This way, links to topics that have not been created yet are easy to distinguish from links to existing ones.
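
    A minimal Python sketch of the topic-link step. The topic pattern is an assumption consistent with the underscore convention described earlier (a word joined to or trailed by an underscore). The view.aspx and edit.aspx page names come from the article, but the topic query parameter, the newtopic CSS class, and the rev_count dictionary standing in for TopicManager.RevCount are hypothetical. The pattern is compiled once at module level, the Python analogue of a static Regex member.

```python
import re
from typing import Dict

# Assumed topic syntax: a word joined to or trailed by an underscore (Corporate_Wiki, Wiki_).
RX_TOPIC = re.compile(r"\b[A-Za-z0-9]+_[A-Za-z0-9_]*")


def display_title(topic: str) -> str:
    """Corporate_Wiki -> 'Corporate Wiki'; Wiki_ -> 'Wiki'."""
    return topic.replace("_", " ").strip()


def format_topic_links(line: str, current_topic: str, rev_count: Dict[str, int]) -> str:
    """Replace each topic title with a view link, a create link, or plain text."""

    def to_link(match) -> str:
        topic = match.group(0)
        key = topic.lower()  # topic titles are case-insensitive
        title = display_title(topic)
        if key == current_topic.lower():
            return title  # no sense in linking a page to itself
        if key in rev_count:
            return f'<a href="view.aspx?topic={key}">{title}</a>'
        return (f'<a class="newtopic" href="edit.aspx?topic={key}" '
                f'title="create this topic">{title}</a>')

    return RX_TOPIC.sub(to_link, line)
```

    Because the assumed pattern admits only letters, digits, and underscores, the lower-cased key can go into the query string without URL encoding; a looser pattern would need it.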

    Linking URLs
    Standard wiki formatting for URLs is a simple <A HREF> whose link text is a copy of the actual URL. For short URLs this works well, but many URLs are quite long. Consider the URL for Microsoft's MSDN coverage of regular expressions (see Resources).

    To solve the long URL problem, the FormatHyperLinks method (see Listing 4, available at www.sys-con.com/dotnet/source.cfm) performs three types of text transformations: (1) a link to a URL preceded by descriptive text between [ ] brackets; (2) a simple URL; and (3) a mailto URL.
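
    A minimal Python sketch of the three transformations. Listing 4 is not reproduced here, so the labeled-link syntax ([descriptive text] followed by the URL) and the character classes are assumptions. Putting all three patterns in one alternation and making a single pass means a URL already consumed as part of a labeled link is never matched again as a bare URL.

```python
import re

# Assumed syntax: "[descriptive text] url" for labeled links; bare http(s):// and mailto: URLs.
RX_LINK = re.compile(
    r'\[(?P<text>[^\]]+)\]\s*(?P<url>(?:https?://|mailto:)[^\s<>"]+)'  # 1. labeled link
    r'|(?P<bare>https?://[^\s<>"]+)'                                   # 2. simple URL
    r'|(?P<mail>mailto:(?P<addr>[^\s<>"]+))'                           # 3. mailto URL
)


def _to_anchor(m) -> str:
    if m.group("text") is not None:
        return f'<a href="{m.group("url")}">{m.group("text").strip()}</a>'
    if m.group("bare") is not None:
        url = m.group("bare")
        return f'<a href="{url}">{url}</a>'
    return f'<a href="{m.group("mail")}">{m.group("addr")}</a>'


def format_hyperlinks(line: str) -> str:
    # One pass: each URL is consumed by exactly one of the three alternatives.
    return RX_LINK.sub(_to_anchor, line)
```

    Note that this bare-URL pattern would swallow trailing punctuation, such as a sentence-ending period, into the link; a production version would need to trim it.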

    Finishing the Parsing
    The remaining steps in parsing each line are simpler, but each takes advantage of regular expression language that probably looks more like chicken scratches to someone unfamiliar with this arcane parsing syntax. Even for seasoned regexers, it's very helpful to keep a reference guide handy. For my own reference, I copied a number of pages from the Microsoft online guide into a static HTML page and stuck it right on my desktop.

    Once the horizontal rules, word and phrase formatting, headings, and block quotes are converted to HTML, the code appends the formatted string to the StringBuilder instance in the parser's ConvertWikiTextToHTML method, which iterates through each text block in sequence until the entire topic is complete. The StringBuilder's ToString() method is then called to return the fully formatted HTML to the calling page (in this case, view.aspx).

    Source Code
    Due to space limitations, not all of the code can be printed here, so download the sample code from www.sys-con.com/dotnet/sourcec.cfm and get started building your own wiki. The download includes the VS.NET 2003 solution source code and the SQL Server 2000 scripts that create the database and its stored procedures.

    I have only scratched the surface of the regular expressions language. Check out Microsoft's MSDN reference on regular expressions for .NET.

    This initial stab at a wiki will get you going on your own corporate wiki, but business users will undoubtedly want more. In future installments in this series I will walk you through building the bells and whistles your users will want, while keeping your wiki simple and easy to use. After all, if it's not simple, it's not wiki.

    Next you'll get search, recent changes, revision history, and a similar-topics lookup, as well as delete functionality to remove topics. Future articles will cover uploading and downloading files, parsing images into topics, and implementing teams with forms-based security and database-driven user and group management.

    Last, I'll help you create a subscription-based topic change e-mail notification service. This will allow users to get immediate e-mail notification when topics in which they are interested are changed by other users.

    Until next time, good luck and have fun building your own corporate wiki.


    Resources
  • Ward Cunningham's WikiWikiWeb: http://c2.com/cgi/wiki
  • .NET Framework Regular Expressions (MSDN reference)
  • Sparx Systems Enterprise Architect: www.sparxsystems.com.au

    Corporate Wiki Architecture
    When you need a garage, a Sistine Chapel model might be overkill. The architectural framework of the corporate wiki presented here is simple and effective. The view and edit pages use several convenient classes for accessing configuration information, retrieving and persisting topics, parsing topic text to HTML, and maintaining a list of existing topics (see Figure 2).


    The parser is the heart of the wiki. It transforms easy-to-enter plain-text topics into standard HTML. Its static methods and properties make it easy to use and help performance, because each regular expression is compiled once and reused for every line.

    Topic Manager
    Named simply for its primary job, the TopicManager maintains a static list of existing topics, each with a revision count, in a hash table exposed through the RevCount property. The SaveWiki method updates the database and then increments the topic's RevCount value. This class also provides the GetWikiTopic method to retrieve the latest version of a specific topic.
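
    A minimal Python sketch of the TopicManager's bookkeeping, assuming an in-memory dictionary in place of the database and its stored procedures. The class-level dictionaries behave like static members shared by every request, which is why the sketch guards updates with a lock; the original's locking strategy is not described here.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WikiTopic:
    title: str
    body: str
    revision: int


class TopicManager:
    """Class-level (static-like) state shared by every request."""

    rev_count: Dict[str, int] = {}     # topic key -> revision count, like RevCount
    _store: Dict[str, WikiTopic] = {}  # hypothetical stand-in for the database
    _lock = threading.Lock()           # shared static tables need guarding under concurrency

    @classmethod
    def save_wiki(cls, title: str, body: str) -> WikiTopic:
        key = title.lower()  # topic titles are case-insensitive
        with cls._lock:
            revision = cls.rev_count.get(key, 0) + 1
            topic = WikiTopic(title, body, revision)
            cls._store[key] = topic        # persist first (the "database" write)...
            cls.rev_count[key] = revision  # ...then increment the revision count
        return topic

    @classmethod
    def get_wiki_topic(cls, title: str) -> Optional[WikiTopic]:
        return cls._store.get(title.lower())
```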

    WikiTextBlock and WikiTextBlockType
    WikiTextBlock is a simple data container object; on its first pass through a topic's text, the parser adds one to an ArrayList for each block it finds, which makes the different block types much easier to parse. The WikiTextBlockType enum identifies each block as Plain, List, or Code.

    Another data container class, WikiTopic, is returned by the TopicManager's GetWikiTopic method. It simplifies the page code that presents a topic's data after the topic is retrieved from the database.

    The DataAccess class provides simplified access to the database's three stored procedures. It uses Microsoft.ApplicationBlocks.Data's SqlHelper class to make using the stored procedures even easier.

    I like to create a Config class in all of my ASP.NET applications. This little bit of work makes it very easy to keep your application highly configurable and introduces a convenient layer of abstraction between the web.config file and your code.
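
    A minimal Python sketch of such a Config class. It reads the standard <appSettings><add key="..." value="..."/></appSettings> section of a web.config file; the ConnectionString and HomeTopic keys and the default value are hypothetical, chosen only to show typed properties hiding the raw key lookups from the rest of the code.

```python
import xml.etree.ElementTree as ET
from typing import Dict


class Config:
    """A thin layer between web.config's appSettings and application code."""

    def __init__(self, settings: Dict[str, str]):
        self._settings = settings

    @classmethod
    def from_web_config(cls, xml_text: str) -> "Config":
        root = ET.fromstring(xml_text)  # the <configuration> element
        settings = {add.get("key"): add.get("value", "")
                    for add in root.iterfind("appSettings/add")}
        return cls(settings)

    @property
    def connection_string(self) -> str:
        return self._settings["ConnectionString"]  # hypothetical key; required

    @property
    def home_topic(self) -> str:
        return self._settings.get("HomeTopic", "Home_")  # hypothetical key with a default
```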

    UML Design Tools
    If you're looking for a great UML tool at a great price, look no further. While I have used Rational Rose for building UML models, I prefer to use Sparx Systems Enterprise Architect (EA). Surprisingly affordable compared to other tools, EA provides an IDE-like user interface with all the resources a software architect needs. The image in Figure 1 was created with EA.

    About the Author

    Engrossed in enterprise application architecture and development for over ten years, Tyler Jensen is a senior technical consultant in a large health intelligence company, designing and developing claims processing and analysis software. In his spare time he does a little writing and outside consulting.
