<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>archive-crawler at Yahoo! Groups</title>
    <link>http://tech.groups.yahoo.com/group/archive-crawler/</link>
    <description>archive-crawler</description>

    <item>
      <title>about dns</title>
      <pubDate>Wed, 10 Feb 2010 03:12:31 GMT</pubDate>
      <dc:creator>郭芸</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6379</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6379</guid>
      <description>Hi, why does Heritrix stop getting any new URLs after running for some time? Is it because it has been running DNS lookups?</description>
    </item>
    <item>
      <title>Re: Updating crawler-beans.cxml via REST API</title>
      <pubDate>Mon, 08 Feb 2010 16:39:03 GMT</pubDate>
      <dc:creator>Daniel Truemper</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6378</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6378</guid>
      <description>Hi! ... This is very strange! I have again debugged through the code and as far as I can see, there should not be any reason for the rename to fail. The</description>
    </item>
    <item>
      <title>Re: Budget &amp; reporting on exhausted and retired queues</title>
      <pubDate>Fri, 05 Feb 2010 20:38:36 GMT</pubDate>
      <dc:creator>Gordon Mohr</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6377</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6377</guid>
      <description>... It appears you&#39;re using H1. Yes, queues with zero &#39;currentSize&#39; in the &#39;all&#39; report are empty (aka &#39;exhausted&#39;). Queues with &#39;totalSpend/totalBudget&#39; with</description>
    </item>
    <item>
      <title>Budget &amp; reporting on exhausted and retired queues</title>
      <pubDate>Fri, 05 Feb 2010 11:21:52 GMT</pubDate>
      <dc:creator>Nicolas Giraud</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6376</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6376</guid>
      <description>Hi, I have browsed the code in WorkQueueFrontier, and had a look at the JMX service method frontierReport. I would like to know how I could obtain a report</description>
    </item>
    <item>
      <title>Re: H3: Severe Problem or misconfiguration</title>
      <pubDate>Fri, 05 Feb 2010 08:57:47 GMT</pubDate>
      <dc:creator>t.schoellhorn</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6375</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6375</guid>
      <description>Dear Gordon, I just checked my disks and they seemed to be fine. Now I am running that crawl again and will report if this error can be reproduced. It might</description>
    </item>
    <item>
      <title>Re: Heritrix traffic pattern over time</title>
      <pubDate>Fri, 05 Feb 2010 05:15:07 GMT</pubDate>
      <dc:creator>Derek Pappas</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6374</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6374</guid>
      <description>$ uname -a Linux 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux $ java -version java version &quot;1.6.0&quot; OpenJDK  Runtime</description>
    </item>
    <item>
      <title>Re: Heritrix traffic pattern over time</title>
      <pubDate>Fri, 05 Feb 2010 01:06:48 GMT</pubDate>
      <dc:creator>Gordon Mohr</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6373</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6373</guid>
      <description>... Actually, they still aren&#39;t coming through (to me or in the archives - &lt;http://tech.groups.yahoo.com/group/archive-crawler/message/6371&gt;). But, let&#39;s just</description>
    </item>
    <item>
      <title>Re: H1 Checkpointing Re: [archive-crawler] Misc questions</title>
      <pubDate>Thu, 04 Feb 2010 23:45:18 GMT</pubDate>
      <dc:creator>Derek Pappas</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6372</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6372</guid>
      <description>... Will if the problem reoccurs. Thanks, Derek</description>
    </item>
    <item>
      <title>Re: Heritrix traffic pattern over time</title>
      <pubDate>Thu, 04 Feb 2010 23:43:31 GMT</pubDate>
      <dc:creator>Derek Pappas</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6371</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6371</guid>
      <description>Here are the images that you could not see in the other post. Below is the toe-thread report. There are quite a few threads with step: ABOUT_TO_BEGIN_PROCESSOR</description>
    </item>
    <item>
      <title>H1 Checkpointing Re: [archive-crawler] Misc questions</title>
      <pubDate>Thu, 04 Feb 2010 23:00:54 GMT</pubDate>
      <dc:creator>Gordon Mohr</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6370</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6370</guid>
      <description>From another thread: ... If you have a reproducible hang, or time to investigate an intermittent occurrence, it would be useful to capture: - general</description>
    </item>
    <item>
      <title>Re: how can I get the queued urls?</title>
      <pubDate>Thu, 04 Feb 2010 22:55:47 GMT</pubDate>
      <dc:creator>Gordon Mohr</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6369</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6369</guid>
      <description>... The exact techniques available depend on the version you&#39;re using. Some possibilities include: - a frontier &#39;dump-pending-at-close&#39;/dumpPendingAtClose</description>
    </item>
    <item>
      <title>Re: Heritrix traffic pattern over time</title>
      <pubDate>Thu, 04 Feb 2010 19:39:24 GMT</pubDate>
      <dc:creator>Gordon Mohr</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6368</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6368</guid>
      <description>FYI, the images did not display for me in this message; a paste of the text would be as useful. Swapping won&#39;t be an issue here, but it could still be: - a</description>
    </item>
    <item>
      <title>Re: Heritrix traffic pattern over time</title>
      <pubDate>Thu, 04 Feb 2010 19:00:00 GMT</pubDate>
      <dc:creator>Derek Pappas</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6367</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6367</guid>
      <description>Here is another example on a quad core that started out at 16 URIs/sec and is down to 8. heap used: 1.4 GB heap max/allocated: 1.8 GB RAM</description>
    </item>
    <item>
      <title>Re: H3: Severe Problem or misconfiguration</title>
      <pubDate>Thu, 04 Feb 2010 18:37:37 GMT</pubDate>
      <dc:creator>Gordon Mohr</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6366</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6366</guid>
      <description>The SEVERE errors you&#39;re seeing suggest some corruption in the on-disk queues such that they don&#39;t have the expected structure. Most helpful in understanding</description>
    </item>
    <item>
      <title>crawler-beans.cxml</title>
      <pubDate>Thu, 04 Feb 2010 17:27:58 GMT</pubDate>
      <dc:creator>t.schoellhorn</dc:creator>
      <link>http://tech.groups.yahoo.com/group/archive-crawler/message/6365</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/archive-crawler/message/6365</guid>
      <description>Here is my configuration which is a rather minor variation of the standard XML-File: &lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; &lt;!-- HERITRIX 3 CRAWL JOB</description>
    </item>

  </channel>
</rss>
