[Rets-dev] Email validation regular expression
Paul Stusiak
pstusiak at falcontechnologies.com
Wed Apr 25 20:17:43 CDT 2007
There were many valid observations made at the meeting. Many of the
observations were precisely what was needed - what is missing. I think
that you overstate the case that it is not possible to improve the type
information.
Please explain to me how someone would search on the records you
describe below that do not contain meaningful information? How would I
obtain a match for a property that had an age of 04/2 when I see an
element name of Year Built?
Specifically, I disagree with your definition of meaningful and do not
see any meaningful data in the case of 3310 records (7835 - 4525, i.e.
those records that are not 4 digits and not NEW; the NEW question was
discussed and a solution was proposed to fix it). There is the very
valid adage of garbage in, garbage out, and if you want the data to have
some value and meaning, it had better be something that other people can
use.
Basically, it sounds like your MLS does not have any validation rules on
input or has a minimal set of rules that do not enforce any meaningful
limit on what can be entered.
I do not agree with your assessment that your MLS cannot implement RETS2
using stronger data typing. If they wish to retain the information, it
would be meaningful for the MLS to extend the schema to provide a
YearBuiltDescription element to which the non-numeric, non-NEW records
would map their data, while the numeric and NEW records would map to the
standard YearBuilt element.
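As a rough sketch of that mapping (Python, purely illustrative): the
function name and the returned tuple shape are my own assumptions, and
new="true" is only the attribute that was requested for the year built
element, not settled schema.

```python
import re

# Exactly four ASCII digits, e.g. "1987". This is the only shape treated
# as a real year in this sketch.
FOUR_DIGIT_YEAR = re.compile(r"[0-9]{4}")


def map_year_built(raw):
    """Map a legacy free-text "Year Built" value to a target element.

    Returns (element_name, attributes, text), or None for a blank value.
    The name and the return shape are hypothetical, for illustration only.
    """
    value = (raw or "").strip()
    if not value:
        # Blank field: emit nothing and leave it to a nullability rule.
        return None
    if FOUR_DIGIT_YEAR.fullmatch(value):
        # Numeric records map to the standard element.
        return ("YearBuilt", {}, value)
    if value.upper() == "NEW":
        # NEW records also map to the standard element, flagged by attribute.
        return ("YearBuilt", {"new": "true"}, "")
    # Everything else ("60s", "OLD", "TBB", "04/2", ...) is retained
    # verbatim in the extension element.
    return ("YearBuiltDescription", {}, value)


for sample in ["1987", "NEW", "04/2", "60s", ""]:
    print(repr(sample), "->", map_year_built(sample))
```

Note that a four-digit check alone still lets "1379" and "1430" through
as years; a plausible-range rule would be a separate validation step.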
The point about area ranges was brought up during the meeting and is
being addressed.
Where the data is to be used in a transaction, having such weakly typed
information will result in an unusable system. Where the data is to be
used to market a property, having such weakly typed information will
result in unsearchable records and will also be unsuited to the
described purpose. Where the data is to be used in other normal MLS type
functions, CMA or AVM functions, the records will not be usable for that
purpose (assuming that the elements were used in the AVM or CMA
function). If I am trying to compare like properties, I need to find
like properties.
I think that there is a pretty compelling case to be made for better
data quality if we are to keep the MLS relevant and to best serve the
Agents.
As I see it, we should be able to help those who have legacy data that
they deem important by allowing them to extend the schema while laying
down a clear path to better information by having the type information
available to assist in building systems.
P
Matt Lavallee wrote:
> Following on Steve's well-put response (and Michael and Sergio's affirmations), my point was that RETS is still a transport layer and not a data standardization effort (as far as I know). Any normalizing rules, while well-intended, will potentially eliminate *meaningful* data _because_ of RETS.
>
> Regarding my specific example...
> Our MLS defines "Year Built" as varchar(4) in their database, and the Realtors have taken liberty with expressing the property's age in that field. Of the 116545 active or contingent residential properties, 7835 do not have four-digit years in the field and are not blank, and of those 4525 have "NEW". Also of the 116545 listings, there are 300 distinct values in that field. 36991 are blank, which should be valid through some nullability rule. Some of the remainder are meaningful expressions of the _intention_ of the field ("60s", "OLD", "TBB", "04/2", etc.). The "04/2" (and so on) to my eye means February 2004 -- in this case the Realtor being more specific than the field intended. Some, a spare few, do have data that makes no sense to me -- "1379", "1430", etc.
>
> Unfortunately, this simple thread has led me to a fairly dramatic conclusion: My MLS couldn't implement RETS 2.0 with strong-typing in the schema. I know for a fact that the above example is merely one of many "numeric in concept" fields that would not survive our payload specification... another fine example being "Building Area" (labeled as "Approximate Square Footage"), which includes 157 distinct, non-numeric values ("HUGE","3000+", etc.).
>
> -Matt
>
>
>> -----Original Message-----
>> From: Paul Stusiak [mailto:pstusiak at falcontechnologies.com]
>> Sent: Wednesday, April 25, 2007 5:20 PM
>> To: Matt Lavallee
>> Cc: rets-dev at rets.org
>> Subject: Re: [Rets-dev] Email validation regular expression
>>
>> in-line
>>
>> Matt Lavallee wrote:
>>
>>> Indeed. And, as a matter of practicality, we (business front-ends) don't
>>> want an address like "me at here" being allowed to touch our databases, even
>>> though it is perfectly valid. However, we (the transport standard) should be
>>> /very/ forgiving about establishing rules around the transfer of data...
>>> unless we can unequivocally state that _every_ valid value for the field can
>>> be transmitted, we should not risk denying *meaningful* data through the
>>> system.
>>
>> me at here is not valid under RFC 2822. Your point is valid, however: we
>> should be permissive where appropriate.
>>
>> I'm not certain that we have limited ourselves to only transport. Other
>> standards bodies (MISMO, OSCRE) appear to be working towards data
>> standards and completely ignoring transport. While RETS1 was mostly
>> about transport (metadata is not a transport concern, standard names are
>> an attempt to regularize semantic meaning in the industry), RETS2 is not
>> just about transport.
>>
>> Based on feedback from the business side on RETS1, scope has been
>> increased in RETS2 to address the feedback, taking us beyond transport.
>>
>>> For example, *I* have no problem saying that "year built" should be a fairly
>>> strict numeric field (say, 1700+), however, my MLS has it as a string... with
>>> results such as "100+", "1430", "01/2", "04/2" coloring the 300 distinct
>>> values in today's data. Now, are those 4500 "NEW" values valid? No. Would I
>>> still want them passed to my clients? Absolutely.
>>
>> I'm not sure what you are getting at here. From the number of questions
>> that I have about this, I wonder if it should be a separate thread.
>>
>> If the element name is year built, are you saying that there are houses
>> for sale built in 100 AD/CE, 1430 AD? What does 04/2 mean? Are you
>> describing house age?
>>
>> Also what are you referring to about the 300 distinct values? Are there
>> 300 different years or are there 300 different representations of data
>> in the year built element? Can you provide examples?
>>
>> Finally, what do you mean by 4500 'NEW' values? Are you saying that
>> there are 4500 different representations/combinations of the NEW
>> construction? How can anyone search on that? Is this addressed by the
>> requested addition to the year built element that suggested that it was
>> necessary to add an attribute of new="true" to the element?
>>
>> It is also possible that it will be necessary to add an attribute to
>> indicate that the year built is unknown - "old timer" is often used to
>> represent an unknown year of construction.
>>
>> --
>> Paul Stusiak
>> Falcon Technologies Corp.
>>
--
Paul Stusiak
Falcon Technologies Corp.