Politics : Moderate Forum


To: TigerPaw who wrote (14311) 12/3/2004 2:43:30 AM
From: cosmicforce
 
Here's the rationale I was applying to the data... see if it makes sense to you.

Raw Data is at the bottom of the post: Message 20819029

If a given topic involves an unusual word with more than one spelling, and those spellings occur at a nominal ratio of 1:1 in "the wild" (webpages represent the baseline - see Data Part I), then as the number of story sources decreases, the ratio will naturally polarize toward one spelling or the other (small sample effect). The extreme case would be a single source, which would use only one spelling: Falluja or Fallujah. As more sources are involved, the ratio would gravitate toward the "wild" ratio of 1:1 (assuming everyone who writes on the topic was queried).
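To make the small sample effect concrete, here is a rough simulation sketch in Python. It is not the method I used on the raw data. It assumes each source picks one spelling by a fair coin flip (the 1:1 "wild" ratio) and sticks with it, and that every source files the same number of stories. The function names are made up for illustration.

```python
import random

def fallujah_share(num_sources, rng, p_h=0.5):
    """Share of stories spelled 'Fallujah' when every source sticks to one spelling.

    Assumption: each source files the same number of stories, so the share of
    stories equals the share of sources that chose the 'h' spelling.
    """
    picks = [rng.random() < p_h for _ in range(num_sources)]
    return sum(picks) / num_sources

def mean_polarization(num_sources, trials=2000, seed=0):
    """Average distance of the observed share from the 'wild' 50% share."""
    rng = random.Random(seed)
    total = sum(abs(fallujah_share(num_sources, rng) - 0.5) for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    # Fewer sources -> the observed ratio sits further from 1:1 on average.
    for n in (1, 2, 5, 20, 100):
        print(n, round(mean_polarization(n), 3))
```

With a single source the share is always 0% or 100%, so the average distance from 50% is the full 0.5. It shrinks as more sources are added.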

The "negative" news story "falluja(h) ruins" shows a ratio of roughly 2:1 (closer to the "wild" ratio), while the "positive" news story "falluja(h) reconstruction" shows a ratio of 5:1. Since this is relatively consistent across engines, it would seem to indicate a polarization due to agreement between reporters and the prevailing spelling of their sources.

Why would "Falluja" be at a lower frequency (much lower) than "Fallujah" on the same topic? My interpretation is that a smaller (biased) sample of sources is the reason, especially when the topic has a positive spin ("reconstruction", for instance, is a positive spin). When the story casts a negative light on operations (as is the case with "ruins"), there are more sources providing negative coverage, so the ratio is closer to the "wild" ratio of 1:1. It is still biased, though, because most stories are still coming from Central Command even when they are negative. I believe the "positive" stories are ALMOST EXCLUSIVELY coming from the occupying army and therefore have a tendency to polarize toward one of the spellings. In this case, the spelling they use seems to be "Fallujah".

The law of large numbers dictates that as more samples (reports) are taken from more random faces of the die (sources), spelling ratios typical of the "wild" ratio of 1:1 would start to prevail.

That is not what happens. As the political spin increases in a positive direction, so does the bias toward one of the spellings. I imagine that all stories from the Army are screened and spelling is checked by a small group of people. This screening produces a spelling bias and many of these press releases are used almost verbatim by reporters. This produces the spelling bias in the reported stories as well. I think that a type of social facilitation also occurs -- reporters spell it the way other reporters start spelling it -- so there is a tendency for the spellings of reporters to agree with the sources from the outset. If the Army says it's Fallujah, then most reporters will spell it that way too.
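As a back-of-the-envelope check on this, here is a small Python sketch of a toy model. It is an assumption for illustration, not something derived from the raw data. Suppose a share c of stories copies the dominant source's spelling ("Fallujah") and the rest split 1:1 as in the wild. Then the Fallujah:Falluja ratio is (1 + c) / (1 - c), which can be inverted to get c from an observed ratio. The function names are hypothetical.

```python
def expected_ratio(copy_share):
    """Fallujah:Falluja ratio if `copy_share` of stories copy the 'Fallujah'
    spelling from one dominant source and the rest split 1:1 (toy assumption)."""
    if not 0 <= copy_share < 1:
        raise ValueError("copy_share must be in [0, 1)")
    return (1 + copy_share) / (1 - copy_share)

def implied_copy_share(ratio):
    """Invert expected_ratio: share of copied stories implied by an observed ratio."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1 (Fallujah is the majority spelling)")
    return (ratio - 1) / (ratio + 1)

if __name__ == "__main__":
    for label, ratio in (("ruins", 2), ("reconstruction", 5)):
        print(label, ratio, round(implied_copy_share(ratio), 3))
```

Under that toy model, the roughly 2:1 ratio for "ruins" would correspond to about one third of stories following the dominant spelling, and the 5:1 ratio for "reconstruction" to about two thirds.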