<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>bixo-dev at Yahoo! Groups</title>
    <link>http://tech.groups.yahoo.com/group/bixo-dev/</link>
    <description>Bixo</description>

    <item>
      <title>Latest updates in trunk/master</title>
      <pubDate>Tue, 05 Jan 2010 17:08:28 GMT</pubDate>
      <dc:creator>Ken Krugler</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/310</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/310</guid>
      <description>Hi all, I just pushed a set of major changes to trunk. The biggest change was new queue management during robots.txt processing and fetching, to avoid OOM</description>
    </item>
    <item>
      <title>Re: HTTP 204</title>
      <pubDate>Mon, 04 Jan 2010 03:06:19 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/309</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/309</guid>
      <description>int httpStatus = response.getStatusLine().getStatusCode(); if (httpStatus != HttpStatus.SC_OK) { throw new HttpFetchException(url, &quot;Error fetching &quot; + url, </description>
    </item>
    <item>
      <title>Re: HTTP 204</title>
      <pubDate>Mon, 04 Jan 2010 02:34:58 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/308</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/308</guid>
      <description>Hi Ken, Yes, I am using old (probably one-month-old) code from trunk; I made a lot of changes and can&#39;t easily move to the latest version. My changes include additional</description>
    </item>
    <item>
      <title>Re: HTTP 204</title>
      <pubDate>Sun, 03 Jan 2010 23:36:43 GMT</pubDate>
      <dc:creator>Ken Krugler</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/307</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/307</guid>
      <description>Hi Fuad, I&#39;ve updated the code to avoid throwing an exception, but this indicates a potential problem someplace else. The HttpFetchException.mapToUrlStatus()</description>
    </item>
    <item>
      <title>Re: Fetching 242 URLs from 0 domains (251646 URLs remaining)</title>
      <pubDate>Sat, 02 Jan 2010 22:46:32 GMT</pubDate>
      <dc:creator>Ken Krugler</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/306</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/306</guid>
      <description>Hi Fuad, I just tightened up some of the code that increments/decrements status counters, to catch some cases where exceptions would cause the counters to not</description>
    </item>
    <item>
      <title>HTTP 204</title>
      <pubDate>Sat, 02 Jan 2010 18:30:38 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/305</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/305</guid>
      <description>I was running fetch (3rd loop) for over 24 hours, and finally... RuntimeException, job failed: [HttpFetchException] default: if (_httpStatus &lt; 300) { throw new</description>
    </item>
    <item>
      <title>Re: OutOfMemoryError</title>
      <pubDate>Thu, 31 Dec 2009 17:25:13 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/304</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/304</guid>
      <description>Hi Ken, My use case: 3 reducers, over 30k hosts (including subdomains). So I was forced to set an in-memory limit of 10 for DiskQueue, and now to limit</description>
    </item>
    <item>
      <title>Re: OutOfMemoryError</title>
      <pubDate>Thu, 31 Dec 2009 16:02:30 GMT</pubDate>
      <dc:creator>Ken Krugler</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/303</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/303</guid>
      <description>Hi Fuad, I&#39;ve got a version I&#39;m testing now that returns false if all of the threads are active. Which means that in the case you mention (every site has some</description>
    </item>
    <item>
      <title>Re: OutOfMemoryError</title>
      <pubDate>Thu, 31 Dec 2009 15:00:41 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/302</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/302</guid>
      <description>OOM disappeared after I implemented this (draft; FetcherQueueMgr): public boolean offer(FetcherQueue newQueue) { if (_queues.size()&gt;100) return false; </description>
    </item>
    <item>
      <title>Re: OutOfMemoryError</title>
      <pubDate>Tue, 29 Dec 2009 15:30:10 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/301</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/301</guid>
      <description>I believe OOM still happens because FetcherQueueMgr.offer() is not fully implemented yet; I started a crawl with 10k domains and 3 reducers, using IP for</description>
    </item>
    <item>
      <title>OutOfMemoryError</title>
      <pubDate>Tue, 29 Dec 2009 15:05:41 GMT</pubDate>
      <dc:creator>Fuad Efendi</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/300</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/300</guid>
      <description>I am again hitting java.lang.OutOfMemoryError (Fetcher/Reducer). After implementing DiskQueue (in my environment), I ran the first 2 iterations without any problems,</description>
    </item>
    <item>
      <title>Re: New domain name</title>
      <pubDate>Wed, 23 Dec 2009 08:37:40 GMT</pubDate>
      <dc:creator>bruno_abitbol</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/299</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/299</guid>
      <description>Hi Ken, personally I would prefer bixominer.org because it sounds good and tells what it does. Bruno.</description>
    </item>
    <item>
      <title>New domain name</title>
      <pubDate>Wed, 23 Dec 2009 00:07:31 GMT</pubDate>
      <dc:creator>Ken Krugler</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/298</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/298</guid>
      <description>We&#39;re trying to pick a domain name to use for the Bixo project. Current options are: * bixo-project.org * openbixo.org * bixominer.org Any input and/or</description>
    </item>
    <item>
      <title>Re: Beginner Question</title>
      <pubDate>Tue, 22 Dec 2009 15:39:30 GMT</pubDate>
      <dc:creator>Ken Krugler</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/297</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/297</guid>
      <description>Hi Bruno, ... You could use the LoadUrlsFunction with an Each() operator to import the URLs from a text file. For example, here&#39;s some code from a white-list</description>
    </item>
    <item>
      <title>Re: Beginner Question</title>
      <pubDate>Tue, 22 Dec 2009 15:17:15 GMT</pubDate>
      <dc:creator>bruno_abitbol</dc:creator>
      <link>http://tech.groups.yahoo.com/group/bixo-dev/message/296</link>
      <guid isPermaLink="true">http://tech.groups.yahoo.com/group/bixo-dev/message/296</guid>
      <description>Hi Ken, thank you for your quick response. ... How can I inject the 50 URLs into the crawler? ... 10000 to 50000 URLs per domain so let&#39;s say a total of 1 500</description>
    </item>

  </channel>
</rss>
