<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-10158849</id><updated>2011-04-21T11:23:33.955-07:00</updated><title type='text'>Web Document Analysis 2005</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>15</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-10158849.post-112618610198336749</id><published>2005-09-08T06:25:00.000-07:00</published><updated>2005-09-08T06:28:21.990-07:00</updated><title type='text'>Slides for Dan Lopresti's Talk</title><content type='html'>Dan has made the slides for his talk available &lt;a href="http://www.cse.lehigh.edu/~lopresti/Talks/2005/WDA05.pdf"&gt;here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112618610198336749?l=wda2005.blogspot.com' 
alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112618610198336749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112618610198336749' title='12 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112618610198336749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112618610198336749'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/09/slides-for-dan-loprestis-talk.html' title='Slides for Dan Lopresti&apos;s Talk'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>12</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112542142920149532</id><published>2005-08-30T10:03:00.000-07:00</published><updated>2005-08-30T10:03:49.210-07:00</updated><title type='text'>What does document analysis give us?</title><content type='html'>The 3rd Web Document Analysis Workshop closed with an interesting discussion around the provocative question:&lt;br /&gt;&lt;br /&gt;    What does document analysis give us, how can we take advantage of it and how can we encourage it?&lt;br /&gt;&lt;br /&gt;The question was inspired largely by the content of Dan Lopresti's excellent invited talk ('The case of the missing dimension(s)'). Dan observed that traditional systems view web documents as linear sequences of tokens but that they were in fact encodings of two dimensional documents.&lt;br /&gt;&lt;br /&gt;Much of the discussion focused on search: how would document analysis affect search results? 
A number of responses to this were proposed, including:&lt;br /&gt;&lt;br /&gt;    * The interpretation of tabular material.&lt;br /&gt;&lt;br /&gt;For example, if you were interested in climatic information about cities in Korea, you might use the query 'average rainfall seoul pusan'. Thomas Breuel pointed out, quite correctly, that issuing this search would most likely produce a page with the desired tabular data. In later discussions I had with Robert Dale and Vanessa Long, we discussed the notion of search result quality. In other words, relevancy is not the same as quality. In the case of the search for climatic information, imagine a system that, given such a query, could produce a statistical summary of the results found in all tables (e.g. giving the mean and variance in a super table).&lt;br /&gt;&lt;br /&gt;    * Title and other block segmentation.&lt;br /&gt;&lt;br /&gt;Here the desire is to ensure that adjacency in the linear stream of tokens is not confused with token adjacency in the document. For example, an indexer should not treat the last word in a title or section heading as the first word of a phrase that continues into the initial tokens of the following paragraph.&lt;br /&gt;&lt;br /&gt;    * Accurate PDF search.&lt;br /&gt;&lt;br /&gt;PDF documents, and other layout-weak document encodings, are commonly returned in search results. These documents pose significant challenges at very low levels. Consequently, a reasonable number of standard document analysis processes need to be run against the document prior to indexing.&lt;br /&gt;&lt;br /&gt;    * Document zoning.&lt;br /&gt;&lt;br /&gt;This is something of particular interest to blog or message board search engines. Web pages are generally made up of a number of functional elements (including title, navigation, adverts, main content). Indexers have no recognition of the significance of these areas, which is why in some cases a result may take you to a page that does not contain the query that got you there. 
The blogosphere offers a good example with the inclusion of a list of recently updated blogs on typepad blogs. This list is changing constantly and is almost guaranteed to be different from how it appeared at index time.&lt;br /&gt;&lt;br /&gt;    * Sub-page Documents&lt;br /&gt;&lt;br /&gt;Similar to document zoning, the problem of sub-page documents is familiar to blog search engine implementers. It addresses the fact that the basic unit of content is not the web page, but some smaller unit (e.g. a blog post). In addition, the web page contains many such elements which all need to be indexed individually.&lt;br /&gt;&lt;br /&gt;There was recognition that discussion of search applications makes broad assumptions about use cases and user expectations that have been drilled into the consumers of such interfaces. The example of a search result returning a summary of tabular data illustrates this point and hints at the potential for new interfaces, new user experiences and new user expectations in the search space.&lt;br /&gt;&lt;br /&gt;Document analysis researchers often view the problem of analysing web pages as a very partitioned space - the web documents must be consumed as is. The second part of the discussion looked at what can be done to assist in the analysis of online documents. A big part of this problem is the inclusion of information in the markup which will help with various tasks. In the case of certain layout elements (e.g. titles) that information is already present. However, for many of the issues raised above, there is no clear standard. It was recognized that there are a number of ad-hoc inclusions (e.g. comments to indicate where ads appear, or where navigation appears). 
These inclusions may be taken advantage of opportunistically but do not represent a stable path to success.&lt;br /&gt;&lt;br /&gt;As with the inclusion of any novel information, adding in this data is going to be challenging from the human behaviour point of view, though it was recognized that structured blogging and microformats were a start.&lt;br /&gt;&lt;br /&gt;I was encouraged to write these notes sooner rather than later by Abdel Belaid (thanks), but do recognize that these are not minutes of the meetings and include my own personal bias and some subsequent conversations with others. This content will be posted both on the WDA2005 blog and on my own blog. Please comment on the WDA blog only.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112542142920149532?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112542142920149532/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112542142920149532' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112542142920149532'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112542142920149532'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/what-does-document-analysis-give-us.html' title='What does document analysis give us?'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112525464786044561</id><published>2005-08-28T11:42:00.000-07:00</published><updated>2005-08-28T11:44:07.866-07:00</updated><title type='text'>Wrap up</title><content type='html'>We would like to thank everyone who attended the workshop and made it a success. As mentioned, we will put the pdf versions of the presented papers online in the near future (subscribe to the RSS feed for this blog to receive the notification).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112525464786044561?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112525464786044561/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112525464786044561' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112525464786044561'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112525464786044561'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/wrap-up.html' title='Wrap up'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112497093701424005</id><published>2005-08-25T04:54:00.000-07:00</published><updated>2005-08-25T04:55:37.020-07:00</updated><title type='text'>Weather</title><content type='html'>Ethan tells me that the 
weather in Seoul is not all sunshine. Pack a light jacket and something warm to wear.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112497093701424005?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112497093701424005/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112497093701424005' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112497093701424005'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112497093701424005'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/weather.html' title='Weather'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112489209121838848</id><published>2005-08-24T06:51:00.000-07:00</published><updated>2005-08-24T10:14:11.830-07:00</updated><title type='text'>Schedule</title><content type='html'>The workshop is going to be held in room 'Lily' from 9-5 on the 28th of August. Here is a map of the location: &lt;a href="http://datamining.typepad.com/wda2005/lily2.pdf"&gt;map&lt;/a&gt;. Note that it is on the 3rd floor of the Olympic Parktel. The official ICDAR page states that it is the 4th floor, but the maps provided state the 3rd floor. 
Either way - see you at 'Lily.'&lt;br /&gt;&lt;br /&gt;The schedule for the day will be as follows:&lt;br /&gt;&lt;br /&gt;9:00 Welcome and Opening Remarks.&lt;br /&gt;9:15 Invited Speaker: Dan Lopresti&lt;br /&gt;10:30 Session 1, 3 talks&lt;br /&gt;&lt;br /&gt;* Using Computer Vision to Detect Web Browser Display Errors: Liu, Doermann&lt;br /&gt;* Link-Based Clustering for Finding Subrelevant Web Pages: Masada, Takasu, Adachi&lt;br /&gt;* Indexing the Blogosphere One Post at a Time: Glance&lt;br /&gt;&lt;br /&gt;12-1:30 lunch break&lt;br /&gt;1:30 Session 2, 3 talks&lt;br /&gt;&lt;br /&gt;* Mining Tables on the Web for Finding Attributes of a Specified Topic: Kise, Ohmae&lt;br /&gt;* PACE: an Experimental Web-based Audiovisual Application using FDL: Caillet, Carrive, Brunie, Roisin&lt;br /&gt;* EMD based Visual Similarity for Detection of Phishing Webpages: Fu, Wenyin, Deng&lt;br /&gt;&lt;br /&gt;3-3:20 break&lt;br /&gt;3:20-4:45: Discussion&lt;br /&gt;4:45-5 Wrap up&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112489209121838848?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112489209121838848/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112489209121838848' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112489209121838848'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112489209121838848'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/schedule.html' title='Schedule'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112377029660836569</id><published>2005-08-11T07:23:00.000-07:00</published><updated>2005-08-11T07:24:56.613-07:00</updated><title type='text'>Invited Talk: Title and Abstract</title><content type='html'>Our invited speaker is Dan Lopresti, Lehigh University.&lt;br /&gt;&lt;br /&gt;Web Document Analysis:  the Case of the Missing&lt;br /&gt;Dimension(s)&lt;br /&gt;&lt;br /&gt;Web documents are inherently multidimensional; yet,&lt;br /&gt;they are frequently processed as though they were a&lt;br /&gt;one-dimensional stream of data.  This&lt;br /&gt;over-simplification has proven remarkably effective in&lt;br /&gt;what can only be termed the infancy of the Web.  Will&lt;br /&gt;this continue to hold true for much longer?  We as&lt;br /&gt;document analysis researchers know better.&lt;br /&gt;&lt;br /&gt;In this talk, I will discuss some of the opportunities&lt;br /&gt;I see for applying and adapting techniques from the&lt;br /&gt;field of document image analysis to Web documents.  
I&lt;br /&gt;will also present a proposal for melding vexing&lt;br /&gt;problems from document analysis research with a certain&lt;br /&gt;key need in Web-based security in a way that could&lt;br /&gt;prove immensely beneficial to both communities.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112377029660836569?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112377029660836569/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112377029660836569' title='369 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112377029660836569'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112377029660836569'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/invited-talk-title-and-abstract.html' title='Invited Talk: Title and Abstract'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>369</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112370019781327851</id><published>2005-08-10T11:54:00.000-07:00</published><updated>2005-08-10T11:56:37.816-07:00</updated><title type='text'>Intelliseek Sponsors WDA2005</title><content type='html'>Intelliseek was proud to sponsor the previous workshop and once again has stepped up with sponsorship for this meeting. 
&lt;br /&gt;&lt;br /&gt;Intelliseek's business revolves around the application of text mining to online and internal data to deliver insights into a number of product and brand related areas. Their technology relies heavily on a number of document analysis sub-systems, in particular those involving web content.&lt;br /&gt;&lt;br /&gt;(Matt Hurst, co-chair, is employed by Intelliseek.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112370019781327851?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112370019781327851/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112370019781327851' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112370019781327851'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112370019781327851'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/intelliseek-sponsors-wda2005.html' title='Intelliseek Sponsors WDA2005'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112290674299181085</id><published>2005-08-01T07:20:00.000-07:00</published><updated>2005-08-09T07:28:26.136-07:00</updated><title type='text'>Registration</title><content type='html'>Registration is now available for WDA2005! 
We are following the same format as CBDAR (&lt;a href="http://www.aso.ecei.tohoku.ac.jp/cbdar/regist.shtml"&gt;here&lt;/a&gt;).&lt;br /&gt;&lt;ol&gt;   &lt;li&gt;Download this &lt;a href="http://datamining.typepad.com/wda2005/Workshop_Registration_Form1-v2.doc"&gt;form&lt;/a&gt;.&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt;&lt;/li&gt;   &lt;li&gt;Fill out the form and&lt;/li&gt;   &lt;li&gt;Fax it to the ICDAR secretariat at +82 42 472 7459.&lt;/li&gt; &lt;/ol&gt; You can expect a confirmation letter within a week. Please mail me (mhurst at intelliseek dot com) if you have any questions!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112290674299181085?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112290674299181085/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112290674299181085' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112290674299181085'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112290674299181085'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/08/registration.html' title='Registration'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-112206559193552191</id><published>2005-07-22T13:38:00.000-07:00</published><updated>2005-07-26T10:55:35.033-07:00</updated><title type='text'>Workshop 
Outline</title><content type='html'>Now that we have reviewed the submitted papers, we can provide some description of how the workshop is going to be structured. We will be posting a detailed schedule on this blog in the near future. The workshop will include:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;   &lt;li&gt;Introduction by Ethan and Matt,&lt;/li&gt;   &lt;li&gt;Invited talk by Dan Lopresti,&lt;/li&gt;   &lt;li&gt;6 papers on Web Document Analysis,&lt;/li&gt;   &lt;li&gt;Discussion session.&lt;/li&gt; &lt;/ul&gt;&lt;br /&gt;We also hope to include a social event, either lunch or dinner.&lt;br /&gt;&lt;br /&gt;Our intention was to have the registration information up today. However, we are currently talking with ICDAR and the other workshop chairs to see if we can centralize this process.&lt;br /&gt;&lt;br /&gt;For the accepted papers, we require the camera-ready version by August the 12th. Please email it to mhurst at intelliseek dot com.&lt;br /&gt;&lt;br /&gt;The papers that have been selected are as follows:&lt;br /&gt;&lt;br /&gt;Indexing the Blogosphere One Post at a Time&lt;br /&gt;&lt;br /&gt;Natalie Glance (Intelliseek Applied Research Center)&lt;br /&gt;&lt;br /&gt;In order to perform analysis over weblogs, we must first identify the appropriate unit of a weblog that corresponds to a document. We argue in the paper that, for weblogs, the correct unit is the weblog post. A weblog post is a structured document with the following fields: date, timestamp, title, content, permalink and author. 
We present our approach for segmenting weblogs into posts, which breaks down into several steps: (1) automatic feed discovery; (2) feed-guided segmentation, using the weblog feed and HTML; and (3) model-based weblog segmentation.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Link-Based Clustering for Finding Subrelevant Web Pages&lt;br /&gt;&lt;br /&gt;Tomonari Masada (National Institute of Informatics),&lt;br /&gt;Atsuhiro Takasu (National Institute of Informatics),&lt;br /&gt;Jun Adachi (National Institute of Informatics)&lt;br /&gt;&lt;br /&gt;We propose a new Web page clustering method. Typical search engines only provide relevant pages, i.e., the pages matching users' needs. However, we design our clustering method to provide non-relevant pages as search results when they refer to relevant pages and help users anticipate the contents of those relevant pages. We call such pages subrelevant. As it is difficult to improve Web search performance, we use subrelevancy to relax the criterion as to what kind of pages should appear in search results with the least drawback, i.e., one click away from a relevant page. Our clustering method is based on three concepts: THP, out-degree path length, and threshold parameter. We use clustering results to modify the feature vectors of Web pages. Hence, each clustering result induces a reranking of search results. We expect the reranking to raise the ranks of subrelevant pages. In the experiments with NTCIR-3 Web task test collection, our clustering largely improved the average precision by 13 percent in comparison with the baseline.&lt;br /&gt;&lt;br /&gt;Using Computer Vision to Detect Web Browser Display Errors&lt;br /&gt;&lt;br /&gt;Xu Liu (University of Maryland, College Park),&lt;br /&gt;David Doermann (University of Maryland, College Park)&lt;br /&gt;&lt;br /&gt;As the functionality and complexity of the WWW continues to grow so does the need for WWW quality assurance and testing. 
Although there have been numerous approaches to automated Web testing, existing techniques mainly analyze textual information, and the final judgment on correctness of layout is via human observation. The motivation of this paper is to employ computer vision techniques to detect Web display errors. To do this, we analyze images of the rendered pages rather than the HTML and attempt to discover errors. Our approach includes page segmentation, dynamic matching and outlier identification. We show that the approach successfully detects layout errors in the Opera browser on Microsoft Websites, while minimizing false alarms.&lt;br /&gt;&lt;br /&gt;Mining Tables on the Web for Finding Attributes of a Specified Topic&lt;br /&gt;&lt;br /&gt;Koichi Kise (Osaka Prefecture University),&lt;br /&gt;Nobuhiro Ohmae (Osaka Prefecture University)&lt;br /&gt;&lt;br /&gt;Finding attribute-value pairs from a huge collection of HTML pages is a fundamental task for information extraction from the Web. This paper presents an unsupervised method of mining Web tables for finding attributes of a topic specified by the user. The proposed method is based on the assumption that the occurrence of text strings representing attributes is biased to the first rows and columns in tables. The $\chi^2$-test is employed to find attribute candidates based on the assumption. Identification of attribute rows and columns using the candidates enables us to improve the accuracy of extraction. 
The experimental results using 2,700 pages show that precision of extraction is 80%.&lt;br /&gt;&lt;br /&gt;PACE: an Experimental Web-Based Audiovisual Application using FDL&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Marc Caillet (INRIA Rhône-Alpes, INA),&lt;br /&gt;Jean Carrive (INA),&lt;br /&gt;Vincent Brunie (INA),&lt;br /&gt;Cécile Roisin (INRIA)&lt;br /&gt;&lt;br /&gt;This paper describes the PACE experimental multimedia application that aims at providing automatic tools for television show collections web browsing; experimentations are currently in progress with a fifty-four Le Grand Echiquier show collection. PACE is being built within the FERIA framework and relies on multiple automatic analysis tools. It is thus flexible enough to easily adapt to other collections. Emphasis is then being made on the brand new audiovisual documents description language FDL as it is the core part of FERIA, with a particular attention paid on how it operates in PACE.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;EMD based Visual Similarity for Detection of Phishing Webpages&lt;br /&gt;&lt;br /&gt;Yingjie Fu (City University of Hong Kong),&lt;br /&gt;Liu Wenyin (City University of Hong Kong),&lt;br /&gt;Xiaotie Deng (City University of Hong Kong)&lt;br /&gt;&lt;br /&gt;Phishing has become a severe problem in the Internet society. We propose an effective phishing webpage detection approach using EMD (Earth Mover's Distance) based visual similarity of webpages. Both suspected webpage and protected webpage are first preprocessed into low resolution images respectively. The image level colors and coordinate features are used to represent the image signatures. We then use the EMD method to calculate the signature distances of the two images as their visual similarity. When the visual similarity value is higher than a threshold, we classify the suspected webpage as a phishing webpage to the protected one. 
As our approach is based on image level color and coordinate features rather than HTML, webpage obfuscation scams are cracked. Large scale experiments with 10,279 suspected webpages are carried out to show high classification precision, phishing recall and applicable time performance for online enterprise solution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-112206559193552191?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/112206559193552191/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=112206559193552191' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112206559193552191'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/112206559193552191'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/07/workshop-outline.html' title='Workshop Outline'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-111722763951411229</id><published>2005-05-27T13:59:00.000-07:00</published><updated>2005-05-27T14:00:39.520-07:00</updated><title type='text'>EXTENSION</title><content type='html'>The deadline for submissions has been extended to June 5th - there is still time to &lt;br /&gt;get those papers written!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/10158849-111722763951411229?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/111722763951411229/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=111722763951411229' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111722763951411229'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111722763951411229'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/05/extension.html' title='EXTENSION'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-111342137824438475</id><published>2005-04-13T12:38:00.000-07:00</published><updated>2005-04-13T12:42:58.243-07:00</updated><title type='text'>Submission Date Change</title><content type='html'>It has been brought to our attention that the submission date published here&lt;br /&gt;and distributed in the CFP is incorrect. Specifically, it conflicts with the&lt;br /&gt;ICDAR requirement that workshop submission dates fall *after* the April 30th&lt;br /&gt;deadline for ICDAR. Consequently, we are moving the deadline to May 15th and&lt;br /&gt;apologise for having missed this issue until so late in the day. 
We look&lt;br /&gt;forward to your papers!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-111342137824438475?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/111342137824438475/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=111342137824438475' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111342137824438475'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111342137824438475'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/04/submission-date-change.html' title='Submission Date Change'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-111333142522416798</id><published>2005-04-12T11:42:00.000-07:00</published><updated>2005-04-12T12:13:56.796-07:00</updated><title type='text'>Submission Information</title><content type='html'>For paper submission and review, WDA 2005 is using the JEMS system (formerly EDAS) supported by the Brazilian Computer Society. 
The address for direct access to WDA 2005 is:&lt;br /&gt;&lt;quote&gt;&lt;br /&gt;  &lt;a class="moz-txt-link-freetext" href="https://submissoes.sbc.org.br/home.cgi?c=180"&gt;https://submissoes.sbc.org.br/home.cgi?c=180&lt;/a&gt;&lt;br /&gt;&lt;/quote&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;JEMS has a multi-step submission process:&lt;br /&gt;&lt;OL&gt;&lt;br /&gt;&lt;LI&gt;First, if you have never used JEMS/EDAS before, you will need to create an account. There is a link to do this below the login fields on the entry page.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt;Second, you "register" your paper. After you log in to JEMS/EDAS, you will see a button labeled "Submit paper". Click this button to complete a form that lists the authors, title and abstract of the paper. The abstract should be short (150-300 words). If any author does not have a JEMS/EDAS account, you will be asked for the information needed to create an account for that author. You can register your paper before you are ready to upload the final version for review.&lt;br /&gt;&lt;br /&gt;&lt;LI&gt; Third, you upload the final paper. You can do this right away or upload the paper later. 
There are a number of other options possible, including changing the author list and title.&lt;br /&gt;&lt;/OL&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-111333142522416798?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/111333142522416798/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=111333142522416798' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111333142522416798'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111333142522416798'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/04/submission-information.html' title='Submission Information'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-111271539187917773</id><published>2005-04-05T08:31:00.000-07:00</published><updated>2005-04-05T08:36:31.880-07:00</updated><title type='text'>Submissions</title><content type='html'>We will be announcing the submission procedure for WDA 2005 soon.&lt;br /&gt;Please check back here soon for details.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-111271539187917773?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/111271539187917773/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=111271539187917773' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111271539187917773'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/111271539187917773'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/04/submissions.html' title='Submissions'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-110619095084784989</id><published>2005-01-19T19:09:00.000-08:00</published><updated>2005-05-27T18:24:44.213-07:00</updated><title type='text'>Call For Participation</title><content type='html'>&lt;pre&gt;WDA2005 Call for Participation&lt;br /&gt;Third International Workshop on Web Document Analysis&lt;br /&gt;&lt;br /&gt;August 28, 2005&lt;br /&gt;Seoul, Korea&lt;br /&gt;(co-located with ICDAR2005)&lt;br /&gt;&lt;br /&gt;&lt;a class="moz-txt-link-freetext" href="http://wda2005.blogspot.com/"&gt;http://wda2005.blogspot.com/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;** Short Papers Due May 15, 2005 ** CHANGED FROM APRIL 15th.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;CALL FOR SUBMISSIONS&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;h2&gt;Background&lt;/h2&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;With the ever-increasing use of the Web, a growing number of documents  are&lt;br /&gt;published and accessed on-line. The emerging issues pose new challenges  for&lt;br /&gt;Document Analysis. 
The need is evident for further discussion to  identify&lt;br /&gt;the role of Document Analysis in Web applications.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;While there has been active research on Web Content Extraction using&lt;br /&gt;text-based techniques, documents are in fact 2-dimensional entities, and&lt;br /&gt;often include multimedia content. Hence, techniques that have been  developed&lt;br /&gt;for image-based documents could prove valuable in the realm of Web&lt;br /&gt;documents, and new methods for the analysis of multimedia content will  be&lt;br /&gt;required.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Following the success of the first two WDA workshops in Seattle, USA  (WDA2001)&lt;br /&gt;and in Edinburgh, UK (WDA2003) the series continues with WDA2005 in  Seoul,&lt;br /&gt;Korea. The aim of the workshop is to bring together researchers from the&lt;br /&gt;Document Analysis and Web communities such as Web Content Extraction,  Web&lt;br /&gt;Publishing, Digital Libraries and e-Commerce Security, to share  experiences&lt;br /&gt;and discuss possible avenues for future collaboration.&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;h2&gt;Technical Focus&lt;/h2&gt;&lt;br /&gt;&lt;p&gt;This workshop is intended as a forum for discussing emerging issues in&lt;br /&gt;document analysis in Web environments. Special attention will be given  to&lt;br /&gt;new applications and requirements created by opportunities on the  Internet&lt;br /&gt;(in the area of multimedia document analysis and management). 
We invite&lt;br /&gt;contributions in areas including but not limited to the following:&lt;br /&gt;&lt;/p&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Layout analysis of Web documents and its applications to content&lt;br /&gt;extraction, multimodal access and Web mining.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Document understanding and semantic tagging for web services.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Digital document models: homogeneous representation of structured&lt;br /&gt;documents, hypertext and multimedia components.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Structural feature extraction for concept learning, extraction and&lt;br /&gt;retrieval.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Automated and semi-automated wrapping methods for information  extraction.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Knowledge integration from heterogeneous collections of documents.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Theme extraction and document clustering/visualization.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Access of textual information embedded in Internet images.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Document image processing for Internet: data compression, color  analysis,&lt;br /&gt;representation and coding for multiresolution or  resolution-independent&lt;br /&gt;images.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Collaborative annotation and manipulation of documents on the Web.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Authoring, editing and presentation systems for complex multimedia&lt;br /&gt;documents.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Enterprise applications: intranet and workflow management.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Document analysis and reformatting for multimodal interfaces.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Web content summarization and repurposing for mobile access.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Cross-language multi-web-document summarization / 
knowledge integration.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Web Security and Image Understanding (CAPTCHAs).&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; Data collection and evaluation methods.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h2&gt;Workshop Format&lt;/h2&gt;&lt;br /&gt;&lt;p&gt;WDA 2005 is planned to be a one-day single-track event. Participants are&lt;br /&gt;expected to give a short description of their work (submitted for review in the&lt;br /&gt;form of a short paper) and participate in the discussions.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;The workshop will consist of short presentations grouped in thematic&lt;br /&gt;sessions. In addition, there will be discussion sessions focused on&lt;br /&gt;a number of specific topics yet to be determined.&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;h2&gt;Publications&lt;/h2&gt;&lt;br /&gt;&lt;p&gt;As in the previous editions of the workshop, accepted short papers will be published&lt;br /&gt;in print for distribution at the workshop. Other formats also being considered&lt;br /&gt;include the conference Web site (for digests of discussions) and publication&lt;br /&gt;of expanded versions of papers in a book.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;h2&gt;Submission Information&lt;/h2&gt;&lt;br /&gt;&lt;p&gt;We invite the submission of original, previously unpublished work. Papers&lt;br /&gt;should identify current and future needs and open problems, and discuss the authors'&lt;br /&gt;view of the subject and overall direction. Papers describing work in&lt;br /&gt;progress are also encouraged.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;We also welcome, with some restrictions, submissions that are closely&lt;br /&gt;related to work also submitted to ICDAR2005. 
Authors can use this  workshop&lt;br /&gt;as a forum to present work that differs materially from their ICDAR&lt;br /&gt;presentations, in any of several ways:&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;recent results too late for the ICDAR deadline;&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; methodological issues facing the WDA community;&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt; proposals for community-wide data sets, experiments, competitions,&lt;br /&gt;websites etc.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Papers should be submitted via the Web in camera-ready format and should&lt;br /&gt;not exceed 4 printed pages. The format adopted is that of the IEEE-CS&lt;br /&gt;Conference Publications and is the same as that of ICDAR2005.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;br /&gt;Full details of the formatting instructions, a sample document and  templates&lt;br /&gt;for LaTeX and MS-Word users can be found at the ICDAR2005 submissions  site&lt;br /&gt;(&lt;a class="moz-txt-link-freetext" href="http://image.korea.ac.kr/icdar2005/paper.html"&gt;http://image.korea.ac.kr/icdar2005/paper.html&lt;/a&gt;). 
PDF is strongly preferred as&lt;br /&gt;the submission format, though PostScript may also be used.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;h2&gt;Important Dates&lt;/h2&gt;&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;Paper-submission due   15 May 2005&lt;br /&gt;Author Notification    31 May 2005&lt;br /&gt;Camera-ready copy due  30 June 2005&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;p&gt;&lt;br /&gt;Please check the workshop web site at &lt;a class="moz-txt-link-freetext" href="http://wda2005.blogspot.com/"&gt;http://wda2005.blogspot.com/&lt;/a&gt;&lt;br /&gt;for more details and the latest updates.&lt;br /&gt;&lt;/p&gt; &lt;p&gt;&lt;br /&gt;&lt;/p&gt; &lt;pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;h2&gt;Programme Committee&lt;/h2&gt;&lt;br /&gt;&lt;PRE&gt;&lt;br /&gt;Apostolos ANTONACOPOULOS&lt;br /&gt;Henry BAIRD&lt;br /&gt;Thomas BREUEL&lt;br /&gt;Horst BUNKE&lt;br /&gt;Andreas DENGEL&lt;br /&gt;David DOERMANN&lt;br /&gt;Jianying HU&lt;br /&gt;Rolf INGOLD&lt;br /&gt;Peter KING&lt;br /&gt;Koichi KISE&lt;br /&gt;Nicholas KUSHMERICK&lt;br /&gt;Dan LOPRESTI&lt;br /&gt;Fuad RAHMAN&lt;br /&gt;Cecile ROISIN&lt;br /&gt;Larry SPITZ&lt;br /&gt;Ah-Hwee TAN&lt;br /&gt;Chew-Lim TAN&lt;br /&gt;Christine VANOIRBEEK &lt;br /&gt;&lt;br /&gt;&lt;/PRE&gt;&lt;br /&gt;Sincerely,&lt;br /&gt;&lt;br /&gt;Matthew Hurst, Intelliseek, USA&lt;br /&gt;Ethan V. 
Munson, University of Wisconsin-Milwaukee&lt;br /&gt;Co-Chairs&lt;br /&gt;&lt;br /&gt;Matthew Hurst: mhurst atsign intelliseek dot com&lt;br /&gt;Ethan Munson: munson atsign cs dot uwm dot edu&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-110619095084784989?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://wda2005.blogspot.com/feeds/110619095084784989/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=110619095084784989' title='120 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/110619095084784989'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/110619095084784989'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/01/call-for-participation.html' title='Call For Participation'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>120</thr:total></entry><entry><id>tag:blogger.com,1999:blog-10158849.post-110573306570558267</id><published>2005-01-14T13:03:00.000-08:00</published><updated>2005-01-14T12:04:25.706-08:00</updated><title type='text'>Welcome to WDA 2005</title><content type='html'>Welcome to WDA 2005&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/10158849-110573306570558267?l=wda2005.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://wda2005.blogspot.com/feeds/110573306570558267/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=10158849&amp;postID=110573306570558267' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/110573306570558267'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/10158849/posts/default/110573306570558267'/><link rel='alternate' type='text/html' href='http://wda2005.blogspot.com/2005/01/welcome-to-wda-2005.html' title='Welcome to WDA 2005'/><author><name>Web Document Analysis 2005</name><uri>http://www.blogger.com/profile/13615073355106440768</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
