Hi, Scott. Let's see: if 80% of the world's data is unstructured in the SQL sense, and if most of that will be represented in XML by various editors and expected to be in XML by various search engines and applications, why wouldn't a "native" store be the most efficient method? That leaves the only layering required to the SQL boys, who have 20 years of legacy to deal with. So why wouldn't XML replace .txt as the common denominator, and be stored that way? Not on the raw file system, of course, but one step above that.
So XML conversion, in-the-flow, makes sense, but why not store that as XML once converted and not re-incur conversion overhead the next time?
I totally agree on schema, and one non-Microsoft approach (I almost said proprietary, but have a New Year's resolution to be nice to those poor guys) is the combination of XML namespaces (xmlns) and RDF-based inference engines. There will be a lot of schemas. No one can control that; we just need to figure out how to "fix up" collisions. A brave new world.
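To make the namespace half of that concrete, here is a minimal sketch of two schemas that both define a "title" element and still coexist in one document. The namespace URIs, element names, and values are made up for illustration, and this covers only the xmlns part, not any RDF inference.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration only.
BOOKS_NS = "http://example.org/schemas/books"
PEOPLE_NS = "http://example.org/schemas/people"

# Both schemas define an element called "title"; the prefixes keep them apart.
DOC = f"""
<record xmlns:b="{BOOKS_NS}" xmlns:p="{PEOPLE_NS}">
  <b:title>An Example Book</b:title>
  <p:title>Dr.</p:title>
</record>
"""

def titles_by_schema(xml_text):
    """Return each 'title' keyed by the schema it belongs to."""
    root = ET.fromstring(xml_text)
    # ElementTree addresses namespaced elements as "{uri}localname".
    return {
        "book": root.findtext(f"{{{BOOKS_NS}}}title"),
        "person": root.findtext(f"{{{PEOPLE_NS}}}title"),
    }

result = titles_by_schema(DOC)
print(result)
```

The collision is avoided only because each schema owner picked a distinct URI; deciding whether two differently named elements mean the same thing is the harder "fix up" problem and is not addressed here.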
Hello Steve,
You bring up some very interesting points ... several of which I believe are still the areas of confusion with XML ...
> I believe that ALL that unstructured data is a candidate to be
> converted to semi-structured using XML, and that database
> structured data will be represented as XML, as is already happening
> in Oracle, Universal DB, and SQL Server.
"Converted" is a very strong word, and tends to mean many different things. For some reason, many people think that data has to be "converted" and then "stored" in XML format ... and I believe this is where these same people have lost sight of layered development of software, and forgotten about the separation of storage, access protocols, and language (or formatting).
The "conversion" is something that can easily be done with the proper implementation of an access protocol and a service which performs this translation "on the fly" ... and I believe that if you look at Oracle, Universal DB, and SQL Server *this* is what you will see ...
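A rough sketch of that kind of "on the fly" translation, assuming an in-memory SQLite database as a stand-in for the relational store. The table, column names, and the rows_as_xml helper are hypothetical; this is not how Oracle, Universal DB, or SQL Server actually implement it.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_as_xml(conn, table):
    """Render every row of `table` as XML at request time.

    The data stays in relational form; XML exists only in the response.
    """
    # The table name is interpolated directly, so it must be trusted (sketch only).
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)
    for row in cur:
        item = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            ET.SubElement(item, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical storage layer: a plain SQL table with one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")

xml_out = rows_as_xml(conn, "customer")
print(xml_out)
```

Storage (the SQL table), access (the rows_as_xml call), and format (the XML string) stay in separate layers, which is the separation argued for above.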
> My point is that all the data in the world is a candidate for XML
> structuring, which is powerful if we figure out how to reconcile
> schema collisions. That is part of what Berners-Lee addresses.
And as I argued with some folks while at Novell, it is really the *schema* that is important, not XML or any particular "encoding" of the data. The "naming" of data or information is crucial for us to know that we are discussing the same thing ...
If you look at Microsoft's BizTalk ... sure they are using XML, but the point of the site is the "naming" and definition of schema. This allows us to exchange information even if we have two different access protocols and storage methods ...
(I'll predict that there *will* be another format/language post-XML ... it's just a matter of time ... ;-)
Scott C. Lemon