[Rets-dev] 2.0 overhead redux (long)
Paul Stusiak
pstusiak at falcontechnologies.com
Wed Apr 25 18:59:38 CDT 2007
Comments inline.
Matt Lavallee wrote:
>
> As discussed at last week’s session, one of RETS 2.0’s drawbacks is a
> several-fold increase in per-transaction overhead. Consider an example
> listing line from RETS 1.x to 2.0:
>
I heard that as a comment as opposed to a drawback.
Increases described as 'fold' usually refer to geometric growth. I'm
guessing that you aren't suggesting that there is a geometric expansion
in the size of the response. The size of the response in RETS2 using XML
is substantially larger than that of RETS1 Compact. It is correct to
state that this will require more bandwidth. It may well be double
or more. It is not 10x or the square of the original value.
Historically, bandwidth has been a valuable commodity, just like memory
and CPU cycles. In recent years, these valuable commodities have been
transformed into much smaller components of any engineering decision.
Going forward, this should become an insignificant consideration -
bandwidth between data centers is basically not worth considering,
especially when compared with the cost of development and maintenance.
> 1.x:
>
> <COLUMNS>Col1Name Col2Name Col3Name Col4Name</COLUMNS>
>
> <DATA>Col1 Col2 Col3 Col4</DATA>
>
> 2.0:
>
> <listings>
>
> <listing>
>
> <Col1Name>Col1</Col1Name>
>
> <Col2Name>Col2</Col2Name>
>
> <Col3Name>Col3</Col3Name>
>
> <Col4Name>Col4</Col4Name>
>
> </listing>
>
> </listings>
>
> … a 60%+ increase in bandwidth consumption for the “same” data (even
> worse if you factor-in metadata overheads). And, when I consider my
> 11+MB transmissions from my MLS, that rolls up to some fairly large
> numbers.
>
Let's separate the data (150,000; 123 Elm Street, Active) from the
information (list price, address element 1, address element 2, listing
status). While there is a significant increase in the bandwidth required
to transmit the raw data (the overhead), there is also a significant
increase in the amount of information transmitted. In the first case, there is
essentially no information transmitted about the structure of the data.
What belongs with what? Does address element 1 come before or after
address element 2? A person must be involved to interpret the
information to build any useful software. In the second case, there is
structure information provided. A person may be involved to interpret
the information before building the software, but that is not necessary. In
the first case, there is no information transmitted about the type or
precision of the data - everything is a string. In the second case,
there is a substantial amount of information provided in the schema
(xsd) that is referenced in the xml instance document (and is missing
from your example).
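As a rough illustration of the two cases, here is a minimal Python sketch. The field names (ListPrice, StreetNumber, StreetName, Status) and the tab-delimited layout are assumptions made up for this example; they are not taken from the RETS schema or from Matt's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names and tab delimiter, for illustration only.
columns = "ListPrice\tStreetNumber\tStreetName\tStatus"
data = "150000\t123\tElm Street\tActive"

# First case: the row only means something when paired with the column
# list, and every value arrives as a plain string.
compact_record = dict(zip(columns.split("\t"), data.split("\t")))

# Second case: each value carries its own name and its place in the
# structure, so the address parts are visibly grouped together.
xml_doc = """
<listing>
  <ListPrice>150000</ListPrice>
  <Address>
    <StreetNumber>123</StreetNumber>
    <StreetName>Elm Street</StreetName>
  </Address>
  <Status>Active</Status>
</listing>
"""
listing = ET.fromstring(xml_doc)
street_name = listing.findtext("Address/StreetName")

print(compact_record["StreetName"])
print(street_name)
```

Both lookups return the same string; the difference is that the second document says where the value belongs without a person having to map positions to meanings.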
Since you are claiming 11+ MB, I am assuming that you are doing a
distributed database type of solution. For this, as Sergio suggested,
there is the compact format that has little or no additional structure
or typing information and is essentially the same as in RETS1. If you
are building either a dynamic client (pull from the provider, through
your server to the end user, only adding value at your server and not
replicating the information) or you are building a transaction system
(single record - mostly) this is not an issue.
>
> With this, a thought occurred to me this morning: Why isn’t the spec
> using [more] attributes? By example, the above transmission could be
> reduced to:
>
> <listings>
>
> <listing Col1Name="Col1" Col2Name="Col2" Col3Name="Col3"
> Col4Name="Col4" />
>
> </listings>
>
> While not quite as terse as RETS 1.x, it does dramatically reduce the
> overhead (from 60% to just 10%) in this example, and by proportionally
> more as the number of listings in the payload increases.
>
There is no value in doing this. As I claimed above, there is real value
in adding structure and type information to the data. In an attempt to
make it easier - always difficult, since it is so subjective - the
decision was made to model the domain as opposed to flattening the
domain. It isn't intended to be normalized in the database sense, but it
is intended to be more regular in the modeling sense. Like things are
grouped. Attributes are applied to elements. Your suggestion, that
everything be an attribute of the resource (listing in your example),
significantly reduces this value. And that is before taking into account
the comments made by Eron, which are spot on. XML Schema provides some
important value to the equation. Compare the schema of RETS2 with the
DTD of RETS1. Both are used to emit an XML document. In RETS1, there is
no structure information included. In RETS2 there is structure
information.
Some of the benefits of modeling the domain using schema best
practices include the ability to use tools to create stubs (a starting
point, if you will, for your application), validation and extensibility.
While most if not all of these tools will handle attributes in the
manner that you propose, it is different from their intended use
and will reduce the efficacy of the tools. Attributes are also required
on the element (generally speaking) while elements can be made optional.
If there is no data for the element and the element is optional, that
element can be omitted in the response. Parsers expect to see this type
of behavior. Attributes, while they can be optional, do not have the
same behavior in the parser. Specifically, I don't believe that
attributes can be used to derive behavior on children (prohibited and
required) and they cannot be used to indicate repeat values. An example
of a repeat value is that there may be more than one owner on a
property. The current schema, using elements and types, permits multiple
owners to be represented. Attributes do not permit this, at least not the
way you describe.
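The repeat-value point can be checked with any conforming XML parser. A minimal Python sketch, assuming a hypothetical Owner element and attribute (these names are for illustration and are not taken from the RETS2 schema):

```python
import xml.etree.ElementTree as ET

# Hypothetical names, for illustration only.
with_elements = "<listing><Owner>A. Smith</Owner><Owner>B. Jones</Owner></listing>"
owners = [owner.text for owner in ET.fromstring(with_elements).findall("Owner")]

# Repeating an attribute name on one element is not well-formed XML,
# so the parser rejects the whole document.
with_attributes = '<listing Owner="A. Smith" Owner="B. Jones" />'
try:
    ET.fromstring(with_attributes)
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True

print(owners)
print(duplicate_rejected)
```

Working around this with attributes means inventing names such as Owner1 and Owner2, which is the kind of flattening the element-based model avoids.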
So, for RETS2, the decision, based on accepted best practices as
described in the RETS2 documentation, was to apply attributes to
elements only when they modify the behavior of the element, not when they
describe data (mostly - there may be one or two exceptions). Examples
are the use of a currency attribute to modify the SecureMoney type and
measurementUnits to modify the SecureArea type. Both modify the data of
the element. Yes, yes, I hear the pedantic voice saying that the
attributes described are themselves data; however, I can limit the scope
of that to the element in question and not at a more global level. To do
this as Matt suggests, I would need to create separately named attributes
to apply to each of the attributes that I wanted to modify - e.g.
attPropertyBarnLengthUnits="feet" attPropertyBarnLength="80" - rather than
creating a supertype of SecureLength and then using it where needed.
There are a couple of tricks that can be used to make this simpler with
attributes, but in any case, it makes the maintenance and reuse of the
schema much, much more difficult.
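The reuse argument can be sketched in a few lines of Python. The attribute names currency and measurementUnits are the ones mentioned above; the element names (Property, ListPrice, BarnLength) and the helper read_measure are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

def read_measure(element, unit_attribute):
    """Return (value, unit) for an element whose unit lives in an attribute."""
    return float(element.text), element.get(unit_attribute)

# Hypothetical instance fragment; element names are not from the schema.
doc = ET.fromstring(
    "<Property>"
    '<ListPrice currency="USD">150000</ListPrice>'
    '<BarnLength measurementUnits="feet">80</BarnLength>'
    "</Property>"
)

# One helper serves every element that follows the value-plus-unit
# pattern, because the unit is scoped to the element it modifies.
price = read_measure(doc.find("ListPrice"), "currency")
barn = read_measure(doc.find("BarnLength"), "measurementUnits")

print(price, barn)
```

With the flattened form, each value/unit pair needs its own pair of attribute names, so a generic helper like this has nothing regular to key on.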
> Of course, for the nerds out there, there’s also the ever-svelte JSON
> format:
>
> {listings:{listing:{Col1Name:'Col1',Col2Name:'Col2',Col3Name:'Col3',Col4Name:'Col4'}}}
>
> … which weighs in with the same character count as the RETS 1.x
> original for a single line, although you’ll have to repeat the
> attribute names in subsequent listings.
>
> Using the above “sample data”, here are some numbers to consider for
> 100 listings:
>
> RETS 1.7 (per spec)              3355 bytes
> RETS 2.0 (per spec)             12023 bytes
> RETS 2.0 (attribute notation)    7623 bytes
> RETS 2.0 (JSON)                  6722 bytes
>
> So, circling back around, was there a conscious decision *against*
> attributes? If so, for what purpose?
>
Yes. As described above.
I'd like to understand your concern about bandwidth and size. Why is
this a concern for you? In particular, even with RETS1 it is possible to
use the compression flag in HTTP to significantly reduce the size of the
response. Are you servicing a large number of modem users?
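To get a feel for what compression does to this kind of markup, here is a minimal Python sketch that gzips Matt's four-column sample repeated for 100 listings. It assumes gzip as the encoding (what an HTTP client typically requests with Accept-Encoding: gzip) and compresses in memory rather than over a real RETS connection.

```python
import gzip

# Matt's four-column sample listing, repeated 100 times.
listing = (
    "<listing><Col1Name>Col1</Col1Name><Col2Name>Col2</Col2Name>"
    "<Col3Name>Col3</Col3Name><Col4Name>Col4</Col4Name></listing>"
)
payload = ("<listings>" + listing * 100 + "</listings>").encode("utf-8")

compressed = gzip.compress(payload)

# Raw size versus size on the wire with gzip applied.
print(len(payload), len(compressed))
```

Identical repeated records compress far better than real listings will, so treat the ratio as an upper bound; the repeated tag names are still the part of a real response that compresses best.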
If modem users are the concern, I would suggest that you consider the difference between the
connection from your server to the MLS server (RETS2 by the book) and
the connection between your server and the client (anything that you
want). Very few systems that I can envision will make a direct
connection from a browser to the RETS2 server. It is much more likely
that the system design will route the data request through your server
before getting the data from the RETS2 server. The rendering step,
converting from COMPACT or XML to your display format, will add a
substantial amount of markup. The process of rendering will remove the
XML and replace it with the HTML that you organize the display with.
Assuming that you are a Javascript or Flash wizard and are using only
the elements of the standard without local extensions and choose to
design a true browser only solution, I would still suggest that you run
this through a server to transform the XML to a more compact rendering
of your own choosing. That means taking the admittedly verbose
description of the standard - intended to transmit meaning to humans by
naming the elements and types something that can be interpreted easily
by a person - and re-emitting the XML in some much more compact format.
There is nothing to stop you from taking
<Address><StreetName>Broadway</StreetName></Address> and compacting it
into <ad><sn>Broadway</sn></ad> for use by your application.
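A server-side step of that kind could look roughly like the following Python sketch. The mapping table SHORT_NAMES and the function compact_tags are hypothetical; the short names are whatever your own application agrees on.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from standard names to short names of your choosing.
SHORT_NAMES = {"Address": "ad", "StreetName": "sn"}

def compact_tags(xml_text, mapping):
    """Rename every element found in the mapping; leave the rest unchanged."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = mapping.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

compacted = compact_tags(
    "<Address><StreetName>Broadway</StreetName></Address>", SHORT_NAMES
)
print(compacted)  # <ad><sn>Broadway</sn></ad>
```

The link between your server and the RETS2 server stays by the book; only the hop to your own client uses the shortened names.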
--
Paul Stusiak
Falcon Technologies Corp.