The traditional term data refers to facts concerning objects and events that can be recorded on computer storage media (Hoffer, Prescott, & Topi, 2009, p.6). For example, a customer database might include facts such as customer name, address, and telephone number. This type of data is called “structured data”. Structured data are stored in a table format and are found in traditional databases and data warehouses (p.6).
The traditional definition of data has to be expanded to reflect current databases, which are used to store objects such as maps, photos, sound images, and videos in addition to structured data (Hoffer et al., 2009, p.6). For example, a current customer database that holds customer name, address, and phone number might also include a photo of the customer. Such content is referred to as unstructured data (p.7). Unstructured data typically requires more storage space than structured data, and this increase in data size demands a good understanding of data retrieval methods in order to meet the response times that the end user expects. Data can now be defined as stored representations of objects and events that are important and have meaning to the end user (Hoffer et al., 2009, p.8). This definition covers both structured and unstructured data, which are usually combined to create a multimedia environment for the end user.
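The customer example above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table name, column names, and sample values are hypothetical and do not come from Hoffer et al., and the photo is represented by placeholder bytes rather than a real image file.

```python
import sqlite3

# Hypothetical schema for illustration; names are not taken from the cited text.
def build_customer_db():
    """Create an in-memory database with structured columns plus a photo BLOB."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " address TEXT NOT NULL,"
        " phone TEXT NOT NULL,"
        " photo BLOB)"  # unstructured content stored next to the structured facts
    )
    return conn

def add_customer(conn, name, address, phone, photo_bytes=None):
    """Insert one customer and return the generated row id."""
    cur = conn.execute(
        "INSERT INTO customer (name, address, phone, photo) VALUES (?, ?, ?, ?)",
        (name, address, phone, photo_bytes),
    )
    conn.commit()
    return cur.lastrowid

def storage_footprint(conn, customer_id):
    """Return (characters of structured text, bytes of photo) for one customer."""
    return conn.execute(
        "SELECT length(name) + length(address) + length(phone), length(photo)"
        " FROM customer WHERE id = ?",
        (customer_id,),
    ).fetchone()

conn = build_customer_db()
fake_photo = bytes(50_000)  # placeholder bytes standing in for an image file
cid = add_customer(conn, "Ada Smith", "12 Main St", "555-0100", fake_photo)
structured_chars, photo_bytes = storage_footprint(conn, cid)
print(structured_chars, photo_bytes)
```

Comparing the two numbers returned by storage_footprint makes the storage point concrete: the structured fields occupy a few dozen characters, while even a small image occupies tens of thousands of bytes.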
The management of unstructured data has long been acknowledged as one of the most important unresolved problems in data management and business intelligence (BI) (Manoj & Deepak, 2011). The main reason is that the methods, tools, and systems that succeed at converting structured data into BI are ineffective when applied to unstructured data. Large amounts of information can now be shared by organizations worldwide over the Internet (Manoj & Deepak, 2011). This worldwide information explosion has prompted the creation of new tools, methods, and systems for data management and BI, with the primary focus on unstructured data.
The Internet is best known as a huge repository of shared documents, but it also contains a large amount of structured data covering a wide range of topics, including product, financial, public-records, scientific, hobby-related, and government data (Cafarella & Madhavan, 2011). Structured data on the web is similar to the traditional data managed by commercial database systems. However, it also has some distinct characteristics of its own. For example, because it is embedded in the text of web pages, it must be extracted before it can be used. There is no centralized data design as there is in traditional database systems, and unlike those systems, which focus on one domain, web data covers everything (Cafarella & Madhavan, 2011). Most existing database systems do not address these challenges because they assume that their data is modeled within a well-defined domain (p.72).
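The extraction step described above can be illustrated with a minimal sketch that uses Python's standard html.parser module. The sample page, the class name TableExtractor, and the helper extract_records are hypothetical, and the sketch assumes a single, simple HTML table with a header row; it is not the extraction method used by Cafarella and Madhavan.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the cell text of each table row found in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside table cells (ordinary prose) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def extract_records(html):
    """Turn the first row into field names and the rest into records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

# Hypothetical page: structured data embedded in ordinary web text.
PAGE = """
<html><body>
<p>Current prices for our products are listed below.</p>
<table>
<tr><th>product</th><th>price</th></tr>
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td>24.50</td></tr>
</table>
</body></html>
"""

records = extract_records(PAGE)
print(records)
```

Note that the extracted values are still plain strings with no declared types or schema, which reflects the absence of a centralized data design on the web.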
A third category of data, which sits between structured and unstructured data and is a result of Web 2.0 technologies, is called social data (Chapman, 2010). Social data is structured data that contains a presentation layer. For example, blog entries look like unstructured documents, with paragraphs of text and their associated headings, but a blog entry may actually be stored on a SharePoint server as structured data in a SQL database.
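The relationship between the stored structure and the presentation layer can be sketched as follows. The BlogEntry fields and the render_html function are hypothetical names chosen for illustration; they do not describe how SharePoint or any particular product stores or renders blog entries.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class BlogEntry:
    """Structured form of a blog entry, as it might sit in a database row."""
    title: str
    author: str
    paragraphs: list

def render_html(entry):
    """Presentation layer: turn the structured record into document-like HTML."""
    parts = [
        f"<h1>{escape(entry.title)}</h1>",
        f'<p class="byline">{escape(entry.author)}</p>',
    ]
    parts += [f"<p>{escape(text)}</p>" for text in entry.paragraphs]
    return "\n".join(parts)

# Hypothetical sample entry.
entry = BlogEntry(
    title="Q&A on records retention",
    author="J. Doe",
    paragraphs=["First paragraph.", "Second paragraph."],
)
page = render_html(entry)
print(page)
```

The end user sees only the rendered page, which reads like an unstructured document, while the system holds discrete fields that can be queried individually.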
For enterprise content management purposes, social data looks like unstructured data to the end user, but the underlying content has some predetermined structure (Chapman, 2010). In most cases the underlying content will be stored in a relational database. In other cases, such as HTML and XML, the underlying content is embedded in the actual object. From an enterprise records management perspective, a decision has to be made on whether to retain separate copies of the data and presentation layers of social data or a standalone copy that combines both layers into one (Chapman, 2010). For example, to store a web page, should you store both the HTML file and its associated style sheets, or should the web page be rendered into a single PDF document and stored?
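The two retention options can be contrasted in a short sketch. Because rendering to PDF requires third-party tooling, the combined copy here is a single HTML file with the style sheet inlined, which is a simpler stand-in for the PDF option and not the same thing. The function names, the file names, and the assumption that style sheets are referenced by an exact link tag of the form shown are all hypothetical.

```python
def keep_separate(html, stylesheets):
    """Option 1: retain the HTML file and each style sheet as separate records."""
    records = {"page.html": html}
    records.update(stylesheets)
    return records

def combine_standalone(html, stylesheets):
    """Option 2: inline each linked style sheet so one file carries both layers.

    Only matches link tags written exactly as below; a real page would
    need a proper HTML parser.
    """
    combined = html
    for href, css in stylesheets.items():
        link_tag = f'<link rel="stylesheet" href="{href}">'
        combined = combined.replace(link_tag, f"<style>\n{css}\n</style>")
    return combined

# Hypothetical page and style sheet.
html = (
    '<html><head><link rel="stylesheet" href="site.css"></head>'
    "<body><p>Hello</p></body></html>"
)
stylesheets = {"site.css": "p { color: navy; }"}

separate = keep_separate(html, stylesheets)
standalone = combine_standalone(html, stylesheets)
print(sorted(separate))
print(standalone)
```

The first option preserves each layer exactly as it was served but leaves the record dependent on several files staying together; the second yields one self-contained record at the cost of altering the original objects.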