[Rets-dev] Email validation regular expression

Matt Lavallee matt at pmptechnology.com
Wed Apr 25 19:34:55 CDT 2007


Following on Steve's well-put response (and Michael and Sergio's affirmations), my point was that RETS is still a transport layer and not a data standardization effort (as far as I know).  Any normalizing rules, while well-intended, will potentially eliminate *meaningful* data _because_ of RETS.

Regarding my specific example...
Our MLS defines "Year Built" as varchar(4) in its database, and the Realtors have taken liberties with how they express a property's age in that field.  Of the 116545 active or contingent residential properties, 7835 are not blank and do not hold a four-digit year, and of those 4525 are "NEW".  Across the same 116545 listings there are 300 distinct values in the field.  36991 are blank, which should be valid under some nullability rule.  Some of the remainder are meaningful expressions of the _intention_ of the field ("60s", "OLD", "TBB", "04/2", etc.).  To my eye, "04/2" (and others like it) means February 2004 -- in this case the Realtor being more specific than the field intended.  A spare few do hold data that makes no sense to me -- "1379", "1430", etc.
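
For anyone who wants to run the same kind of tally against their own data, here is a rough Python sketch of the bucketing described above.  The function names and the sample list are made up for illustration; only the categories (blank, four-digit, "NEW", everything else) come from my counts.  Note that a digits-only check happily accepts "1379", so shape alone does not catch the implausible years.

```python
import re
from collections import Counter

# Four ASCII digits and nothing else; an assumption about what "valid" means here.
FOUR_DIGIT_YEAR = re.compile(r"[0-9]{4}")

def classify_year_built(value):
    """Bucket one raw Year Built string (hypothetical helper)."""
    text = (value or "").strip()
    if not text:
        return "blank"
    if FOUR_DIGIT_YEAR.fullmatch(text):
        return "four_digit"
    if text.upper() == "NEW":
        return "new"
    return "other"

def tally_year_built(values):
    """Count how many values fall into each bucket."""
    return Counter(classify_year_built(v) for v in values)

# Illustrative sample only, not real MLS rows.
sample = ["1987", "", "NEW", "60s", "04/2", "1379", None, "OLD"]
counts = tally_year_built(sample)
```

The "other" bucket is where values like "60s" and "04/2" land, which is exactly the data a strict numeric type would throw away.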

Unfortunately, this simple thread has led me to a fairly dramatic conclusion: my MLS couldn't implement RETS 2.0 with strong typing in the schema.  I know for a fact that the above example is merely one of many "numeric in concept" fields that would not survive our payload specification... another fine example being "Building Area" (labeled as "Approximate Square Footage"), which includes 157 distinct, non-numeric values ("HUGE", "3000+", etc.).
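
To make the alternative concrete, here is a minimal Python sketch of a forgiving parse: it carries the original string through untouched and attaches a number only when the text is plainly numeric.  This is not anything from the RETS specification; the `LenientNumber` type and `parse_lenient` function are hypothetical names of my own.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LenientNumber:
    raw: str               # exactly what the listing agent typed
    number: Optional[int]  # set only when raw is plain ASCII digits

def parse_lenient(raw: str) -> LenientNumber:
    """Keep the raw text; add a numeric value only if one is unambiguous."""
    text = raw.strip()
    number = int(text) if re.fullmatch(r"[0-9]+", text) else None
    return LenientNumber(raw=raw, number=number)

area = parse_lenient("2400")
vague = parse_lenient("3000+")
```

A front-end that wants strict numbers can filter on `number`, while "HUGE" and "3000+" still reach the client in `raw`.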

-Matt

> -----Original Message-----
> From: Paul Stusiak [mailto:pstusiak at falcontechnologies.com]
> Sent: Wednesday, April 25, 2007 5:20 PM
> To: Matt Lavallee
> Cc: rets-dev at rets.org
> Subject: Re: [Rets-dev] Email validation regular expression
> 
> in-line
> 
> Matt Lavallee wrote:
> > Indeed.  And, as a matter of practicality, we (business front-ends) don't
> > want an address like "me at here" being allowed to touch our databases, even
> > though it is perfectly valid.  However, we (the transport standard) should be
> > /very/ forgiving about establishing rules around the transfer of data...
> > unless we can unequivocally state that _every_ valid value for the field can
> > be transmitted, we should not risk denying *meaningful* data through the
> > system.
> >
> >
> "me at here" is not valid under RFC 2822. Your point is valid, however: we
> should be permissive where appropriate.
> 
> I'm not certain that we have limited ourselves to only transport. Other
> standards bodies (MISMO, OSCRE) appear to be working towards data
> standards and completely ignoring transport. While RETS1 was mostly
> about transport (metadata is not a transport concern, standard names are
> an attempt to regularize semantic meaning in the industry), RETS2 is not
> just about transport.
> 
> Based on feedback from the business side on RETS1, scope has been
> increased in RETS2 to address the feedback, taking us beyond transport.
> > For example, *I* have no problem saying that "year built" should be a fairly
> > strict numeric field (say, 1700+), however, my MLS has it as a string... with
> > results such as "100+", "1430", "01/2", "04/2" coloring the 300 distinct
> > values in today's data.  Now, are those 4500 "NEW" values valid?  No.  Would I
> > still want them passed to my clients?  Absolutely.
> >
> >
> I'm not sure what you are getting at here. From the number of questions
> that I have about this, I wonder if it should be a separate thread.
> 
> If the element name is year built, are you saying that there are houses
> for sale built in 100 AD/CE or 1430 AD? What does 04/2 mean? Are you
> describing house age?
> 
> Also what are you referring to about the 300 distinct values? Are there
> 300 different years or are there 300 different representations of data
> in the year built element? Can you provide examples?
> 
> Finally, what do you mean by 4500 'NEW' values? Are you saying that
> there are 4500 different representations/combinations of the NEW
> construction? How can anyone search on that? Is this addressed by the
> requested addition to the year built element that suggested that it was
> necessary to add an attribute of new="true" to the element?
> 
> It is also possible that it will be necessary to add an attribute to
> indicate that the year built is unknown - "old timer" is often used to
> represent an unknown year of construction.
> 
> --
> Paul Stusiak
> Falcon Technologies Corp.




