Corporate Wiki Part 1: Building Your First Wiki Parser

  • Corporate Wiki Part 2: Writing Your Own Wiki Search Engine
  • Corporate Wiki Part 3: Files Up and Down - Adding More Features

    Simplicity is the watchword. Before you spend thousands of dollars on a collaboration and knowledge management system, try a corporate wiki. You may find that it fills your needs.

    After a brief introduction to the origin and nature of wikis, this article, the first in a series on building a corporate wiki, will focus on creating a wiki parser, the heart of any wiki.

    Each article in this series will leave you with a complete, usable application. The code listings here illustrate the primary topic of discussion, but you can download the entire solution source code from www.sys-con.com/dotnet/sourcec.cfm.

    Many hypertext navigation and authoring systems have been developed over the years, but not until Ward Cunningham created the WikiWikiWeb did the concept really take off on the Internet. Wiki is reportedly the Hawaiian word for "quick"; the alternative would have been QuickWeb, a name that's not nearly as fun.

    Eight years later, search for "wiki" on Google and you'll get nearly 3 million results. Many personal and enthusiast sites now have a wiki where all visitors may read and contribute.

    There are now as many definitions for "wiki" as there are open source and commercial implementations of wiki Web apps. Along with a variety of features, these applications provide two fundamental services: view and edit.

    The wiki viewer fetches a topic's text and parses it, transforming it to HTML for display (see Figure 1). The editor allows you to edit any topic and save it to the data store from which the viewer fetches topics.

     

    The first step in building your wiki parser is to decide exactly which plain-text syntax you will support. I have chosen the syntax rules found in Table 1, as they seem to be most common in the wiki sites I have visited. You are certainly free to define your own or modify these.

    If you are already familiar with wikis, you will notice that I have made a strong departure from the traditional Pascal case of wiki topic titles in favor of using case-insensitive topic titles connected or trailed by an underscore "_" character.

    After experimenting with a wiki in my own corporate culture, I chose to do this because users were more comfortable with the underscore. Of course, you could easily modify the code to support any topic title formatting scheme you want to use.

    Some of the plain-text syntax rules here may not be found in Cunningham's original wiki implementation, but I have found all of them in one wiki site or another, with the exception of the ::code:: tag I created to let me insert HTML preformatted text into a topic.

    Given the rules established in Table 1, you will discover that there are three types of text blocks possible: (1) standard plain text, one paragraph per line; (2) bulleted or numbered lists, one bullet per line; and (3) code sections, set off by the ::code:: text on a unique line.

     

    Each text block type requires a slightly different parsing approach, so I found that it was much simpler to make a quick pass through the text to isolate each block and then iterate through that collection of blocks to parse each by type. To help with this, I created the WikiTextBlock class and WikiTextBlockType enum to handle the job (see Listing 1).
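    The first pass described above can be sketched roughly as follows. This is a Python illustration rather than the C# of Listing 1. The list markers ("*" and "#") and the assumption that the same ::code:: tag both opens and closes a code section are guesses, since Table 1 is not reproduced here. The names split_into_blocks, LIST_MARKERS, and CODE_TAG are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class WikiTextBlockType(Enum):
    PLAIN = "plain"
    LIST = "list"
    CODE = "code"


@dataclass
class WikiTextBlock:
    block_type: WikiTextBlockType
    lines: list = field(default_factory=list)


# Assumed syntax: Table 1 defines the real markers.
LIST_MARKERS = ("*", "#")
CODE_TAG = "::code::"


def split_into_blocks(text):
    """Make one pass over the topic text and group lines into typed blocks."""
    blocks = []
    in_code = False
    for line in text.splitlines():
        if line.strip() == CODE_TAG:
            # Assumption: the tag toggles code mode; the tag line is dropped.
            in_code = not in_code
            if in_code:
                blocks.append(WikiTextBlock(WikiTextBlockType.CODE))
            continue
        if in_code:
            blocks[-1].lines.append(line)
            continue
        kind = (WikiTextBlockType.LIST if line.startswith(LIST_MARKERS)
                else WikiTextBlockType.PLAIN)
        # Start a new block whenever the block type changes.
        if not blocks or blocks[-1].block_type is not kind:
            blocks.append(WikiTextBlock(kind))
        blocks[-1].lines.append(line)
    return blocks
```

    Each block in the returned list can then be handed to the parser code for its type.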

    The parser makes quick work of creating an ArrayList containing each of the text blocks and iterates through the array using a StringBuilder to store the results of each call to the specific parser code that handles that particular block type. Once all of the blocks are parsed, the StringBuilder's ToString() method is called to return the formatted HTML to the page code.

    The only requirements for the Code block type are to replace HTML tag characters ( <, >, and & ) with their HTML-escaped cousins and to surround the text block with the <pre></pre> tag set. That's the easy block type.
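    As a minimal sketch of the Code block handling, here is the same idea in Python instead of the article's C#. The function name format_code_block is hypothetical; html.escape is the Python standard library call, used with quote=False so that only the three characters named above are replaced.

```python
import html


def format_code_block(lines):
    """Escape <, >, and & in the block, then wrap it in <pre></pre>."""
    escaped = html.escape("\n".join(lines), quote=False)
    return "<pre>" + escaped + "</pre>"
```

    For example, the line "if (a < b && c > d)" comes back as "<pre>if (a &lt; b &amp;&amp; c &gt; d)</pre>".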

    The FormatListWikiText method handles the List type and is the most complex to parse because you have to deal with indentation and closing up the <ul> or <ol> tag sets. This is handily done with a Stack: I push the closing tag of each list when its opening tag is created, then pop those same closing tags when the list is completed.
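    The stack technique might look something like this in Python. It is not the author's FormatListWikiText. It assumes that "*" marks a bulleted item and "#" a numbered one, and that the number of repeated markers gives the nesting depth; Table 1 may define these differently. The names format_list_block and LIST_TAGS are hypothetical, and every line is assumed to start with a marker.

```python
# Assumed markers: "*" for bulleted lists, "#" for numbered lists.
LIST_TAGS = {"*": "ul", "#": "ol"}


def format_list_block(lines):
    """Convert marker-prefixed lines to nested HTML lists using a stack."""
    out = []
    closers = []  # stack of closing tags for the lists currently open
    for line in lines:
        marker = line[0]
        depth = len(line) - len(line.lstrip(marker))
        text = line[depth:].strip()
        # Pop closing tags for lists deeper than this line.
        while len(closers) > depth:
            out.append(closers.pop())
        # Same depth as an open list: close its previous item.
        if len(closers) == depth:
            out.append("</li>")
        # Open lists until this line's depth is reached, pushing each closer.
        while len(closers) < depth:
            tag = LIST_TAGS[marker]
            out.append("<" + tag + ">")
            closers.append("</li></" + tag + ">")
            if len(closers) < depth:
                out.append("<li>")  # wrapper item when a level is skipped
        # The real parser would format the text as a standard line here.
        out.append("<li>" + text)
    # Pop whatever is still open once the block is finished.
    while closers:
        out.append(closers.pop())
    return "".join(out)
```

    This sketch does not handle a switch between bulleted and numbered markers at the same depth; a fuller version would close one list and open the other.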

    After the HTML bullet tags are handled, the code calls the common parser method FormatStandardWikiLine to prepare the text of each line in the list block. This is also the primary formatting method used for the Plain type text block.

    The FormatStandardWikiLine method breaks the task down into a series of specific methods, called in a careful order, each of which uses regular expressions to find the plain-text syntax in a line and replace it with HTML formatting.

    In some cases, if the required order of transformations is not followed, unexpected results will occur. The steps in converting the plain text of the wiki entry into HTML are:

    1. Topic links
    2. Hyperlinks
    3. Horizontal rules
    4. Bold italics - phrase
    5. Bold italic underline - word
    6. Headings
    7. Block quotes

    Topic Links
    To parse for topic titles and convert them to HTML links, you need to do two things: (1) find the title in the line; and (2) create a link to the view page or edit page, depending on whether the topic already exists.

    First I created the regular expression pattern string and an instance of the Regex class built from that pattern, both as static members of the parser (see Listing 2). The parser is entirely static, making it more efficient and easier to use throughout the application.

    The FormatTopicLinks method is called for each line in the text block, with the line passed by reference. The code (see Listing 3) finds all topic matches in the text using the Regex object, called RxTopic. It then iterates through each match.

    If the match is not the same as the topic of the page being viewed, you process the match to create the link to either the view or edit page. Otherwise, you convert the match to plain text without the underscores, since there is no sense in linking a page to itself.

    To create the link, the match is checked against the TopicManager's RevCount hash table collection of existing topics to determine whether the matched topic exists. If it does exist, the match is transformed using the Regex.Replace static method to create a link to the view.aspx page using the formatted topic title for the text in the link.

    If the topic does not exist, the link points to the edit.aspx page with a different CSS style class and a link title attribute of "create this topic". In this way, links to topics that do not exist can be easily distinguished from links to those that do.
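    A rough Python sketch of this topic-link logic follows. The pattern in RX_TOPIC is only a guess at "words connected or trailed by an underscore" and is not the pattern from Listing 2. The "topic" query-string parameter and the "create" CSS class name are also assumptions, and rev_count is a plain dictionary standing in for the TopicManager's RevCount hash table.

```python
import re

# Hypothetical pattern: words joined by underscores, or one word trailed by one.
RX_TOPIC = re.compile(r"\b[A-Za-z0-9]+_(?:[A-Za-z0-9]+_?)*")


def format_topic_links(line, current_topic, rev_count):
    """Replace topic titles in a line with view or edit links."""
    def link(match):
        topic = match.group(0)
        label = topic.replace("_", " ").strip()
        key = topic.lower()  # topic titles are case-insensitive
        if key == current_topic.lower():
            # No sense in linking a page to itself: plain text only.
            return label
        if key in rev_count:
            return '<a href="view.aspx?topic=%s">%s</a>' % (topic, label)
        return ('<a class="create" href="edit.aspx?topic=%s" '
                'title="create this topic">%s</a>' % (topic, label))
    return RX_TOPIC.sub(link, line)
```

    With rev_count = {"project_plan": 3} and "Home_" as the current topic, the text "Project_Plan" becomes a view.aspx link, "New_Idea" becomes an edit.aspx link, and "Home_" is rendered as the plain text "Home".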

    Linking URLs
    Standard wiki formatting for URLs is a simple <A HREF> with the text of the link being a copy of the actual URL. For simple and short URLs, this works well, but many URLs are quite lengthy. Consider the URL for Microsoft's MSDN coverage of regular expressions (see Resources).

    To solve the long URL problem, the FormatHyperLinks method (see Listing 4, available at www.sys-con.com/dotnet/source.cfm) performs three types of text transformations: (1) a link to a URL preceded by descriptive text between [ ] brackets; (2) a simple URL; and (3) a mailto URL.
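    The three transformations could be approximated in Python as shown here. This is not Listing 4. The exact plain-text syntax is assumed to be "[descriptive text] URL" for the first case, and the pattern in RX_LINK is illustrative only. A single pattern with alternation is used so that a URL already consumed by the bracket form is not linked a second time.

```python
import re

# Hypothetical pattern covering the three cases in one pass.
RX_LINK = re.compile(
    r"\[(?P<text>[^\]]+)\]\s*(?P<target>https?://[^\s<]+)"  # [text] URL
    r"|(?P<url>https?://[^\s<]+)"                            # simple URL
    r"|(?P<mail>mailto:[^\s<]+)"                             # mailto URL
)


def format_hyperlinks(line):
    """Turn described URLs, bare URLs, and mailto URLs into HTML links."""
    def link(m):
        if m.group("target"):
            return '<a href="%s">%s</a>' % (m.group("target"), m.group("text"))
        if m.group("url"):
            return '<a href="%s">%s</a>' % (m.group("url"), m.group("url"))
        address = m.group("mail")
        # Show only the address, without the mailto: prefix, as the link text.
        return '<a href="%s">%s</a>' % (address, address[len("mailto:"):])
    return RX_LINK.sub(link, line)
```

    One known limitation of this sketch is that punctuation directly following a URL is treated as part of it.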

    Finishing the Parsing
    The remaining steps in parsing each line are simpler, but each takes advantage of regular expression language that probably looks more like chicken scratches to someone unfamiliar with this arcane parsing syntax. Even for seasoned regexers, it's very helpful to keep a reference guide handy. For my own reference, I copied a number of pages from the Microsoft online guide into a static HTML page and stuck it right on my desktop.

    Once the horizontal rules, word and phrase formatting, headings, and block quotes are set into HTML, the code adds the formatted string to the StringBuilder instance in the ConvertWikiTextToHTML method of the parser, which iterates through each text block sequentially to get the entire topic completed. The StringBuilder's ToString() method is called to return the completely formatted HTML to the calling page (in this case, the view.aspx page).

    Source Code
    Due to space limitations, not all of the code can be printed here, so download the sample code from www.sys-con.com/dotnet/sourcec.cfm and get started building your own wiki. The source includes the VS.NET 2003 solution source code and the MS SQL 2000 create and stored procedure scripts.

    I have only scratched the surface of the regular expressions language. Check out Microsoft's MSDN reference on regular expressions for .NET.

    This initial stab at a wiki will get you going on your own corporate wiki, but business users will undoubtedly want more. In future installments in this series I will walk you through building the bells and whistles your users will want, while keeping your wiki simple and easy to use. After all, if it's not simple, it's not wiki.

    Next you'll get search, recent changes, revision history, and like topics lookup, as well as delete functionality to remove topics. Future articles will cover uploading and downloading files, parsing images into topics, and implementing teams with forms security and data-based user and groups management.

    Last, I'll help you create a subscription-based topic change e-mail notification service. This will allow users to get immediate e-mail notification when topics in which they are interested are changed by other users.

    Until next time, good luck and have fun building your own corporate wiki.

    Resources

  • Ward Cunningham's WikiWikiWeb: http://c2.com/cgi/wiki
  • .NET Framework Regular Expressions (Microsoft MSDN reference)
  • Sparx Systems Enterprise Architect: www.sparxsystems.com.au

    Corporate Wiki Architecture
    When you need a garage, a Sistine Chapel model might be overkill. The architectural framework of the corporate wiki presented here is simple and effective. The view and edit pages use several convenient classes for accessing configuration information, retrieving and persisting topics, parsing topic text to HTML, and maintaining a list of existing topics (see Figure 2).

     

    Parser
    The parser is the heart of the wiki. It performs the transformation of the easy-to-enter plain wiki topic text into standard HTML. Its static methods and properties make it easier to use and improve the performance of the regular expressions, as they are compiled once and used over and over.

    Topic Manager
    Named simply for its primary functionality, the TopicManager maintains a static list of existing topics with a revision count value in a hash table accessible via the RevCount property. RevCount is incremented when the topic is updated in the SaveWiki method after updating the database. This class also provides the GetWikiTopic method to retrieve the latest version of a specific topic.
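    To make the bookkeeping concrete, here is a small Python sketch of the same idea. It is not the author's class: an in-memory dictionary (_store) stands in for the SQL database and its stored procedures, and the method names are Python-style stand-ins for SaveWiki and GetWikiTopic.

```python
class TopicManager:
    """Static registry of existing topics and their revision counts."""

    rev_count = {}  # topic title (lower-cased) -> number of revisions
    _store = {}     # stand-in for the database: title -> list of revisions

    @classmethod
    def save_wiki(cls, title, text):
        key = title.lower()
        # Update the "database" first, then increment the revision count.
        cls._store.setdefault(key, []).append(text)
        cls.rev_count[key] = cls.rev_count.get(key, 0) + 1

    @classmethod
    def get_wiki_topic(cls, title):
        """Return the latest text of a topic, or None if it does not exist."""
        revisions = cls._store.get(title.lower())
        return revisions[-1] if revisions else None
```

    Because the parser only needs to know whether a topic exists, a membership test against rev_count is enough to choose between a view link and an edit link.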

    WikiTextBlock and WikiTextBlockType
    The WikiTextBlock provides a simple data container object that gets stuck into an ArrayList collection on the parser's first pass through a topic's text. This makes for much easier parsing of different types of text blocks. The WikiTextBlockType enum identifies the type of each block: Plain, List, or Code.

    WikiTopic
    Another data container class, a WikiTopic object, is returned by the TopicManager in the GetWikiTopic method. This simplifies the page code that deals with the presentation of the data contained in a topic when it's retrieved from the database.

    DataAccess
    The DataAccess class provides simplified access to the database's three stored procedures. It uses Microsoft.ApplicationBlocks.Data's SqlHelper class to make using the stored procedures even easier.

    Config
    I like to create a Config class in all of my ASP.NET applications. This little bit of work makes it very easy to keep your application highly configurable and introduces a convenient layer of abstraction between the web.config file and your code.

    UML Design Tools
    If you're looking for a great UML tool at a great price, look no further. While I have used Rational Rose for building UML models, I prefer to use Sparx Systems Enterprise Architect (EA). Surprisingly affordable compared to other tools, EA provides an IDE-like user interface with all the resources a software architect needs. The image in Figure 1 was created with EA.

    About the Author: Tyler Jensen

    Engrossed in enterprise application architecture and development for over ten years, Tyler Jensen is a senior technical consultant in a large health intelligence company, designing and developing claims processing and analysis software. In his spare time he does a little writing and outside consulting.
