[Rets-dev] Email validation regular expression

Paul Stusiak pstusiak at falcontechnologies.com
Wed Apr 25 20:17:43 CDT 2007


Many valid observations were made at the meeting, and many of them 
identified precisely what we needed to hear: what is missing. I think 
you overstate the case that it is not possible to improve the type 
information.

Please explain to me how someone would search on the records you 
describe below that do not contain meaningful information. How would I 
obtain a match for a property that has an age of 04/2 when I see an 
element named Year Built?

Specifically, I disagree with your definition of meaningful and do not 
see any meaningful data in the case of 3310 records (7835 - 4525: those 
records that are neither four digits nor NEW; the NEW question was 
discussed and a solution was proposed to fix it). There is the very valid 
adage of garbage in, garbage out, and if you want the data to have some 
value and meaning it had better be something that other people can use.

Basically, it sounds like your MLS does not have any validation rules on 
input or has a minimal set of rules that do not enforce any meaningful 
limit on what can be entered.

I do not agree with your assessment that your MLS cannot implement RETS2 
using stronger data typing. If they wish to retain the information, it 
would be meaningful for the MLS to extend the schema to provide a 
YearBuiltDescription element to which the non-numeric, non-NEW records 
would map their data, while the numeric and NEW records would map to the 
standard YearBuilt element.
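
To make that split concrete, here is a minimal sketch in Python. It is 
only an illustration: the function name map_year_built is hypothetical, 
the returned pairs stand in for whatever the real payload would carry, 
and passing NEW through as a literal value is an assumption (the 
proposal discussed at the meeting was an attribute on the element).

```python
import re

# Assumption: a "numeric" year is exactly four ASCII digits.
FOUR_DIGIT_YEAR = re.compile(r"[0-9]{4}")


def map_year_built(raw):
    """Route one raw "Year Built" string to a target element.

    Returns an (element_name, value) pair, or None for a blank field,
    which is left to a nullability rule.
    """
    value = raw.strip()
    if not value:
        return None
    if value.upper() == "NEW":
        # Assumption: NEW is carried as a literal here for simplicity.
        return ("YearBuilt", "NEW")
    if FOUR_DIGIT_YEAR.fullmatch(value):
        return ("YearBuilt", value)
    # Everything else is legacy free text kept in the extension element.
    return ("YearBuiltDescription", value)


if __name__ == "__main__":
    for sample in ["1987", "NEW", "04/2", "60s", ""]:
        print(repr(sample), "->", map_year_built(sample))
```

Note that a four-digit check alone would still pass values such as 1379 
into YearBuilt; a range rule (for example the 1700+ lower bound 
suggested earlier in the thread) would be a separate validation step.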

The point about area ranges was brought up during the meeting and is 
being addressed.

Where the data is to be used in a transaction, having such weakly typed 
information will result in an unusable system. Where the data is to be 
used to market a property, having such weakly typed information will 
result in unsearchable records and will also be unsuited to the 
described purpose. Where the data is to be used in other normal MLS-type 
functions, such as CMA or AVM, the records will not be usable for that 
purpose (assuming that the elements are used in the AVM or CMA 
function). If I am trying to compare like properties, I need to be able 
to find like properties.

I think that there is a pretty compelling case to be made for better 
data quality if we are to keep the MLS relevant and to best serve the 
Agents.

As I see it, we should be able to help those who have legacy data that 
they deem important by allowing them to extend the schema while laying 
down a clear path to better information by having the type information 
available to assist in building systems.

P

Matt Lavallee wrote:
> Following on Steve's well-put response (and Michael and Sergio's affirmations), my point was that RETS is still a transport layer and not a data standardization effort (as far as I know).  Any normalizing rules, while well-intended, will potentially eliminate *meaningful* data _because_ of RETS.
>
> Regarding my specific example...
> Our MLS defines "Year Built" as varchar(4) in their database, and the Realtors have taken liberty with expressing the property's age in that field.  Of the 116545 active or contingent residential properties, 7835 do not have four-digit years in the field and are not blank, and of those 4525 have "NEW".  Also of the 116545 listings, there are 300 distinct values in that field.  36991 are blank, which should be valid through some nullability rule.  Some of the remainder are meaningful expressions of the _intention_ of the field ("60s", "OLD", "TBB", "04/2", etc.).  The "04/2" (and so on) to my eye means February 2004 -- in this case the Realtor being more specific than the field intended.  Some, a spare few, do have data that makes no sense to me -- "1379", "1430", etc.
>
> Unfortunately, this simple thread has led me to a fairly dramatic conclusion: My MLS couldn't implement RETS 2.0 with strong-typing in the schema.  I know for a fact that the above example is merely one of many "numeric in concept" fields that would not survive our payload specification... another fine example being "Building Area" (labeled as "Approximate Square Footage"), which includes 157 distinct, non-numeric values ("HUGE","3000+", etc.).
>
> -Matt
>
>   
>> -----Original Message-----
>> From: Paul Stusiak [mailto:pstusiak at falcontechnologies.com]
>> Sent: Wednesday, April 25, 2007 5:20 PM
>> To: Matt Lavallee
>> Cc: rets-dev at rets.org
>> Subject: Re: [Rets-dev] Email validation regular expression
>>
>> in-line
>>
>> Matt Lavallee wrote:
>>     
>>> Indeed.  And, as a matter of practicality, we (business front-ends) don't
>>> want an address like "me at here" being allowed to touch our databases, even
>>> though it is perfectly valid.  However, we (the transport standard) should be
>>> /very/ forgiving about establishing rules around the transfer of data...
>>> unless we can unequivocally state that _every_ valid value for the field can
>>> be transmitted, we should not risk denying *meaningful* data through the
>>> system.
>>
>> me at here is not valid under RFC 2822. Your point is valid, however: we
>> should be permissive where appropriate.
>>
>> I'm not certain that we have limited ourselves to only transport. Other
>> standards bodies (MISMO, OSCRE) appear to be working towards data
>> standards and completely ignoring transport. While RETS1 was mostly
>> about transport (metadata is not a transport concern, standard names are
>> an attempt to regularize semantic meaning in the industry), RETS2 is not
>> just about transport.
>>
>> Based on feedback from the business side on RETS1, scope has been
>> increased in RETS2 to address the feedback, taking us beyond transport.
>>     
>>> For example, *I* have no problem saying that "year built" should be a fairly
>>> strict numeric field (say, 1700+), however, my MLS has it as a string... with
>>> results such as "100+", "1430", "01/2", "04/2" coloring the 300 distinct
>>> values in today's data.  Now, are those 4500 "NEW" values valid?  No.  Would I
>>> still want them passed to my clients?  Absolutely.
>>
>> I'm not sure what you are getting at here. From the number of questions
>> that I have about this, I wonder if it should be a separate thread.
>>
>> If the element name is year built, are you saying that there are houses
>> for sale built in 100 AD/CE, 1430 AD? What does 04/2 mean? Are you
>> describing house age?
>>
>> Also what are you referring to about the 300 distinct values? Are there
>> 300 different years or are there 300 different representations of data
>> in the year built element? Can you provide examples?
>>
>> Finally, what do you mean by 4500 'NEW' values? Are you saying that
>> there are 4500 different representations/combinations of the NEW
>> construction? How can anyone search on that? Is this addressed by the
>> requested addition to the year built element that suggested that it was
>> necessary to add an attribute of new="true" to the element?
>>
>> It is also possible that it will be necessary to add an attribute to
>> indicate that the year built is unknown - "old timer" is often used to
>> represent an unknown year of construction.
>>
>> --
>> Paul Stusiak
>> Falcon Technologies Corp.
>>     

-- 
Paul Stusiak
Falcon Technologies Corp.


