Friday, April 6, 2012

Structured, semi-structured, and unstructured data


Structured data is data that is organized into entities that have a defined format, such as XML documents or database tables that conform to a particular predefined schema. This is the realm of the RDBMS.

Semi-structured data, on the other hand, is looser: there may be a schema, but it is often ignored, so it serves only as a guide to the structure of the data. A spreadsheet is an example: its structure is the grid of cells, but the cells themselves may hold any form of data.

Unstructured data does not have any particular internal structure: for example, plain text or image data.
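
To make the three categories concrete, here is a minimal Python sketch. It is only an illustration, not something taken from the book: the table name `readings`, the sample values, and the helper `parse_temp` are all made up for this example. It shows a database table rejecting a row that breaks its schema, a spreadsheet-like grid accepting anything in its cells, and plain text that carries no structure until the reader imposes one.

```python
import csv
import io
import sqlite3

# Structured: a table with a predefined schema that the database enforces.
# (Table and column names are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT NOT NULL, temp_c REAL NOT NULL)")
conn.execute("INSERT INTO readings VALUES (?, ?)", ("A1", 21.5))
try:
    # This row omits temp_c, so it violates the NOT NULL constraint.
    conn.execute("INSERT INTO readings (station) VALUES (?)", ("B2",))
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True

# Semi-structured: the grid of cells is the structure, but a cell may
# hold a number, free text, or nothing at all.
sheet = io.StringIO("station,temp_c\nA1,21.5\nB2,n/a\nC3,\n")
rows = list(csv.reader(sheet))

def parse_temp(cell):
    """Treat the header as a guide only: return None when a cell is not a number."""
    try:
        return float(cell)
    except ValueError:
        return None

temps = [parse_temp(row[1]) for row in rows[1:]]

# Unstructured: plain text has no internal structure; splitting on
# whitespace is an interpretation chosen by the reader, not by the data.
note = "Station A1 reported 21.5 degrees this morning."
words = note.split()

print(schema_enforced, temps, len(words))
```

The point of the middle case is that the code reading the data, not the data itself, decides how strictly the column header is honored.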

Source: Hadoop: The Definitive Guide
