[Rets-dev] 2.0 overhead redux (long)

Paul Stusiak pstusiak at falcontechnologies.com
Wed Apr 25 18:59:38 CDT 2007


Comments inline.

Matt Lavallee wrote:
>
> As discussed at last week’s session, one of RETS 2.0’s drawbacks is a 
> several-fold increase in per-transaction overhead. Consider an example 
> listing line from RETS 1.x to 2.0:
>
I heard that as a comment as opposed to a drawback.

Increases described as 'fold' usually refer to geometric growth. I'm 
guessing that you aren't suggesting that there is a geometric expansion 
in the size of the response. The size of the response in RETS2 using XML 
is substantially larger than that of RETS1 Compact. It is correct to 
state that this will require more bandwidth. It may be double or more, 
but it is not 10x or the square of the original value.

Historically, bandwidth has been a valuable commodity, just like memory 
and CPU cycles. In recent years, these commodities have become much 
smaller factors in any engineering decision. Going forward, bandwidth 
should become an insignificant consideration: bandwidth between data 
centers is basically not worth considering, especially when compared 
with the cost of development and maintenance.

> 1.x:
>
> <COLUMNS>Col1Name Col2Name Col3Name Col4Name</COLUMNS>
> <DATA>Col1 Col2 Col3 Col4</DATA>
>
> 2.0:
>
> <listings>
>   <listing>
>     <Col1Name>Col1</Col1Name>
>     <Col2Name>Col2</Col2Name>
>     <Col3Name>Col3</Col3Name>
>     <Col4Name>Col4</Col4Name>
>   </listing>
> </listings>
>
> … a 60%+ increase in bandwidth consumption for the “same” data (even 
> worse if you factor-in metadata overheads). And, when I consider my 
> 11+MB transmissions from my MLS, that rolls up to some fairly large 
> numbers.
>
Let's separate the data (150,000; 123 Elm Street; Active) from the 
information (list price, address element 1, address element 2, listing 
status). While there is a significant increase in the bandwidth required 
to transmit the raw data (the overhead), there is also a significant 
increase in the amount of information transmitted. In the first case, 
there is essentially no information transmitted about the structure of 
the data. What belongs with what? Does address element 1 come before or 
after address element 2? A person must be involved to interpret the 
information before any useful software can be built. In the second case, 
structure information is provided. A person may be involved to interpret 
the information before building the software, but that is not necessary. 
In the first case, there is no information transmitted about the type or 
precision of the data; everything is a string. In the second case, 
there is a substantial amount of information provided in the schema 
(XSD) that is referenced in the XML instance document (and is missing 
from your example).
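
To make the distinction concrete, here is a minimal Python sketch. The 
element names (ListPrice, Address, StreetNumber, StreetName, Status) and 
the tab delimiter are assumptions made for illustration, not taken from 
the RETS schema. It shows that the compact form yields a flat set of 
strings, while the element form carries its grouping with it.

```python
import xml.etree.ElementTree as ET

# Compact style: column names and values travel as two delimited strings.
# Nothing says which columns belong together or what type each value has.
columns = "ListPrice\tStreetNumber\tStreetName\tStatus".split("\t")
data = "150000\t123\tElm Street\tActive".split("\t")
compact_record = dict(zip(columns, data))

# Element style (hypothetical element names): the nesting itself records
# that StreetNumber and StreetName are parts of one Address.
xml_doc = """
<listing>
  <ListPrice>150000</ListPrice>
  <Address>
    <StreetNumber>123</StreetNumber>
    <StreetName>Elm Street</StreetName>
  </Address>
  <Status>Active</Status>
</listing>
"""
listing = ET.fromstring(xml_doc)
address_parts = [child.tag for child in listing.find("Address")]

print(compact_record)   # flat mapping, every value a string
print(address_parts)    # the parts grouped under Address, in document order
```

Type and precision would come from the schema rather than from the 
instance document, so this sketch only illustrates the structural half 
of the argument.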

Since you are claiming 11+ MB, I am assuming that you are doing a 
distributed database type of solution. For this, as Sergio suggested, 
there is the compact format that has little or no additional structure 
or typing information and is essentially the same as in RETS1. If you 
are building either a dynamic client (pull from the provider, through 
your server to the end user, only adding value at your server and not 
replicating the information) or you are building a transaction system 
(single record - mostly) this is not an issue.
>
> With this, a thought occurred to me this morning: Why isn’t the spec 
> using [more] attributes? By example, the above transmission could be 
> reduced to:
>
> <listings>
>   <listing Col1Name="Col1" Col2Name="Col2" Col3Name="Col3" 
>            Col4Name="Col4" />
> </listings>
>
> While not quite as terse as RETS 1.x, it does dramatically reduce the 
> overhead (from 60% to just 10%) in this example, and by proportionally 
> more as the number of listings in the payload increases.
>
There is no value in doing this. As I claimed above, there is real value 
in adding structure and type information to the data. In an attempt to 
make it easier (always difficult, since it is so subjective), the 
decision was made to model the domain as opposed to flattening it. It 
isn't intended to be normalized in the database sense, but it is 
intended to be more regular in the modeling sense. Like things are 
grouped. Attributes are applied to elements. Your suggestion, that 
everything be an attribute of the resource (listing, in your example), 
significantly reduces this value. And that is before taking into account 
the comments made by Eron, which are spot on. XML Schema brings some 
important value to the equation. Compare the schema of RETS2 with the 
DTD of RETS1. Both are used to emit an XML document. In RETS1, there is 
no structure information included. In RETS2, there is.

Some of the benefits of modeling the domain using schema best practices 
include the ability to use tools to create stubs (a starting point, if 
you will, for your application), validation, and extensibility. While 
most if not all of these tools will handle attributes in the manner that 
you propose, it is different from their intended use and will reduce the 
efficacy of the tools. Attributes are also required on the element 
(generally speaking), while elements can be made optional. If there is 
no data for an element and the element is optional, that element can be 
omitted from the response. Parsers expect to see this type of behavior. 
Attributes, while they can be optional, do not have the same behavior in 
the parser. Specifically, I don't believe that attributes can be used to 
derive behavior on children (prohibited and required), and they cannot 
be used to indicate repeated values. An example of a repeated value is 
that there may be more than one owner on a property. The current schema, 
using elements and types, permits multiple owners to be represented. 
Attributes do not permit this, at least not the way you describe.
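
The repeated-value point can be checked with any XML parser. Below is a 
small Python sketch using the standard library; the Property and Owner 
names and the owner values are made up for illustration and are not the 
RETS2 element names.

```python
import xml.etree.ElementTree as ET

# Repeated child elements: two owners on one property parse without trouble.
with_elements = """
<Property>
  <Owner>Pat Smith</Owner>
  <Owner>Lee Jones</Owner>
</Property>
"""
owners = [o.text for o in ET.fromstring(with_elements).findall("Owner")]

# The same attribute name twice on one element is not well-formed XML,
# so the parser rejects the whole document.
with_attributes = '<Property Owner="Pat Smith" Owner="Lee Jones" />'
try:
    ET.fromstring(with_attributes)
    duplicate_attribute_accepted = True
except ET.ParseError:
    duplicate_attribute_accepted = False

print(owners)
print(duplicate_attribute_accepted)
```

An attribute-only design would have to fall back on numbered names 
(Owner1, Owner2) or a delimited list inside one attribute value, and 
either choice moves the structure out of the markup and into convention.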

So, for RETS2, the decision, based on accepted best practices as 
described in the RETS2 documentation, was to apply attributes to 
elements only when they modify the behavior of the element, not when 
they describe data (mostly; there may be one or two exceptions). 
Examples are the use of a currency attribute to modify the SecureMoney 
type and measurementUnits to modify the SecureArea type. Both modify the 
data of the element. Yes, yes, I hear the pedantic voice saying that the 
attributes described are themselves data; however, I can limit the scope 
of that to the element in question rather than a more global level. To 
do this as Matt suggests, I would need to create separately named 
attributes for each of the attributes that I wanted to modify, e.g. 
attPropertyBarnLengthUnits="feet" attPropertyBarnLength="80", rather 
than creating a supertype of SecureLength and then using it where 
needed. There are a couple of tricks that can be used to make this 
simpler with attributes, but in any case, it makes the maintenance and 
reuse of the schema much, much more difficult.
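
A rough sketch of why the scoped modifier is easier to reuse follows. 
The element names (LotSize, BarnArea, ListPrice) and the unit values are 
hypothetical; only the currency and measurementUnits attribute names 
come from the discussion above. One helper reads any element that 
carries a modifier, with no per-field attribute names to maintain.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance fragment: each value element carries its own modifier.
doc = ET.fromstring("""
<Property>
  <LotSize measurementUnits="acres">2.5</LotSize>
  <BarnArea measurementUnits="squareFeet">3200</BarnArea>
  <ListPrice currency="USD">150000</ListPrice>
</Property>
""")

def read_modified(element, modifier):
    """Return (numeric value, modifier value) for one element."""
    return float(element.text), element.get(modifier)

# The same helper serves every area-like element.
lot = read_modified(doc.find("LotSize"), "measurementUnits")
barn = read_modified(doc.find("BarnArea"), "measurementUnits")
price = read_modified(doc.find("ListPrice"), "currency")

print(lot, barn, price)
```

With the flattened form, the reader would instead need to know a pair of 
attribute names for every field it wanted to interpret.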

> Of course, for the nerds out there, there’s also the ever-svelte JSON 
> format:
>
> {listings:{listing:{Col1Name:'Col1',Col2Name:'Col2',Col3Name:'Col3',Col4Name:'Col4'}}}
>
> … which weighs in with the same character count as the RETS 1.x 
> original for a single line, although you’ll have to repeat the 
> attribute names in subsequent listings.
>
> Using the above “sample data”, here are some numbers to consider for 
> 100 listings:
>
> RETS 1.7 (per spec)               3355 bytes
> RETS 2.0 (per spec)              12023 bytes
> RETS 2.0 (attribute notation)     7623 bytes
> RETS 2.0 (JSON)                   6722 bytes
>
> So, circling back around, was there a conscious decision *against* 
> attributes? If so, for what purpose?
>
Yes. As described above.

I'd like to understand your concern about bandwidth and size. Why is 
this a concern for you? In particular, even with RETS1 it is possible to 
use the compression flag in HTTP to significantly reduce the size of the 
response. Are you servicing a large number of modem users?

If so, I would suggest that you consider the difference between the 
connection from your server to the MLS server (RETS2 by the book) and 
the connection between your server and the client (anything that you 
want). Very few systems that I can envision will make a direct 
connection from a browser to the RETS2 server. It is much more likely 
that the system design will route the data request through your server 
before getting the data from the RETS2 server. The rendering step, 
converting from COMPACT or XML to your display format, will add a 
substantial amount of markup. The process of rendering will remove the 
XML and replace it with the HTML that you organize the display with.
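
On the server-to-MLS leg, the HTTP compression mentioned above is the 
obvious way to shrink repeated markup. Here is a minimal Python sketch 
that builds 100 synthetic listings in both shapes and gzips them. The 
payload builders and the values are invented for illustration, and such 
repetitive sample data compresses far better than real listings would, 
so treat the printed numbers as indicative only.

```python
import gzip

NAMES = ["Col1Name", "Col2Name", "Col3Name", "Col4Name"]

def compact_payload(n):
    """Build n rows in a COLUMNS/DATA style, tab-delimited (assumed)."""
    header = "<COLUMNS>" + "\t".join(NAMES) + "</COLUMNS>\n"
    rows = "".join(
        "<DATA>" + "\t".join(f"Col{c}-{i}" for c in range(1, 5)) + "</DATA>\n"
        for i in range(n)
    )
    return (header + rows).encode("utf-8")

def element_payload(n):
    """Build the same n rows with one element per column."""
    rows = "".join(
        "<listing>"
        + "".join(f"<{name}>Col{c}-{i}</{name}>" for c, name in enumerate(NAMES, 1))
        + "</listing>\n"
        for i in range(n)
    )
    return ("<listings>\n" + rows + "</listings>\n").encode("utf-8")

for label, payload in (("compact", compact_payload(100)),
                       ("elements", element_payload(100))):
    packed = gzip.compress(payload)
    print(f"{label}: {len(payload)} bytes raw, {len(packed)} bytes gzipped")
```

In practice the client would send an Accept-Encoding header and let the 
HTTP stack do the decompression; the sketch only isolates the size 
effect.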

Assuming that you are a Javascript or Flash wizard, are using only the 
elements of the standard without local extensions, and choose to design 
a true browser-only solution, I would still suggest that you run this 
through a server to transform the XML to a more compact rendering of 
your own choosing. That means taking the admittedly verbose description 
of the standard (intended to transmit meaning to humans by naming the 
elements and types something that can be interpreted easily by a person) 
and re-emitting the XML in some much more compact format. There is 
nothing to stop you from taking 
<Address><StreetName>Broadway</StreetName></Address> and compacting it 
into <ad><sn>Broadway</sn></ad> for use by your application.
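
That transformation is only a few lines on the server. A minimal Python 
sketch follows; the SHORT_NAMES mapping and the compact_tags helper are 
hypothetical names of your own choosing, and namespaces are ignored for 
simplicity.

```python
import xml.etree.ElementTree as ET

# Private mapping from standard element names to short local names.
SHORT_NAMES = {"Address": "ad", "StreetName": "sn"}

def compact_tags(xml_text, short_names):
    """Re-emit the document with each known tag replaced by its short name."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Unknown tags pass through unchanged.
        element.tag = short_names.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

verbose = "<Address><StreetName>Broadway</StreetName></Address>"
print(compact_tags(verbose, SHORT_NAMES))  # prints <ad><sn>Broadway</sn></ad>
```

Because the mapping lives on your server, the browser-side code and the 
RETS2 side can each change without the other noticing.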

-- 
Paul Stusiak
Falcon Technologies Corp.


