arachne.git (master)
David ‘Bombe’ Roden [Mon, 16 Mar 2009 23:10:59 +0000 (00:10 +0100)]
Log hash after fetching.

David ‘Bombe’ Roden [Mon, 16 Mar 2009 23:10:44 +0000 (00:10 +0100)]
Don’t try to parse when no parser was found.

David ‘Bombe’ Roden [Mon, 16 Mar 2009 23:10:09 +0000 (00:10 +0100)]
Use different executor for URL fetcher.

David ‘Bombe’ Roden [Mon, 16 Mar 2009 23:09:47 +0000 (00:09 +0100)]
Remember crawled pages and don’t crawl them again.
Don’t add pages that are already scheduled for crawling.

David ‘Bombe’ Roden [Sun, 15 Mar 2009 23:44:09 +0000 (00:44 +0100)]
Don’t add a page if it’s not a valid Freenet page.

David ‘Bombe’ Roden [Sun, 15 Mar 2009 23:41:31 +0000 (00:41 +0100)]
Add collected pages after URL fetcher is done.

David ‘Bombe’ Roden [Sun, 15 Mar 2009 23:41:18 +0000 (00:41 +0100)]
Implemented adding a page by URL.

David ‘Bombe’ Roden [Thu, 12 Mar 2009 10:07:06 +0000 (11:07 +0100)]
Fix javadoc tags.

David ‘Bombe’ Roden [Tue, 10 Mar 2009 22:14:19 +0000 (23:14 +0100)]
Add SQL table creation scripts.

David ‘Bombe’ Roden [Tue, 10 Mar 2009 22:11:13 +0000 (23:11 +0100)]
Use a default parser factory in the core and pass it to every URL fetcher.

David ‘Bombe’ Roden [Tue, 10 Mar 2009 17:39:09 +0000 (18:39 +0100)]
Add parser factory.

David ‘Bombe’ Roden [Tue, 10 Mar 2009 08:07:41 +0000 (09:07 +0100)]
Use lower-case project name.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:27:24 +0000 (00:27 +0100)]
Remember title when parsing URL.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:26:43 +0000 (00:26 +0100)]
Remove core dependency.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:24:58 +0000 (00:24 +0100)]
Move methods that convert to and from URLs into Page.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:12:41 +0000 (00:12 +0100)]
Create message digest in core and hash fetched URLs.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:12:09 +0000 (00:12 +0100)]
Add new links to core.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:11:01 +0000 (00:11 +0100)]
Use new method names from Page and Edition.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:10:09 +0000 (00:10 +0100)]
Rename method “getSite” to “getEdition”.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 23:09:47 +0000 (00:09 +0100)]
Rename member “edition” to “number”.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:31:58 +0000 (21:31 +0100)]
Simplify link title extraction.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:24:39 +0000 (21:24 +0100)]
Basic implementation of HTML parser.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:06:51 +0000 (21:06 +0100)]
Fix construction of URL from page.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:06:41 +0000 (21:06 +0100)]
Fix validation of input parameter.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:06:15 +0000 (21:06 +0100)]
Add method to set the node’s hostname.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:05:47 +0000 (21:05 +0100)]
Use HTMLEditorKit parser in URL fetcher.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:04:44 +0000 (21:04 +0100)]
Enhance parser listener with title attributes for links.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 20:03:12 +0000 (21:03 +0100)]
Add charset parameter to parse method.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 17:01:56 +0000 (18:01 +0100)]
Stub implementation of page fetching.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 17:01:35 +0000 (18:01 +0100)]
Add HTML parser stub.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 17:01:25 +0000 (18:01 +0100)]
Add URL fetcher.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 16:57:24 +0000 (17:57 +0100)]
Add interfaces for content parser.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 14:44:42 +0000 (15:44 +0100)]
Ignore “.settings” directory.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 14:44:23 +0000 (15:44 +0100)]
Ignore “bin” directory.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 14:43:58 +0000 (15:43 +0100)]
First implementation of the Arachne core.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 14:35:05 +0000 (15:35 +0100)]
Override toString() to output sensible information.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 14:11:00 +0000 (15:11 +0100)]
Use an edition instead of a site.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 14:10:49 +0000 (15:10 +0100)]
Add container for an edition.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 13:58:22 +0000 (14:58 +0100)]
Add validation.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 13:40:38 +0000 (14:40 +0100)]
Add page container.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 13:38:33 +0000 (14:38 +0100)]
Add site container.

David ‘Bombe’ Roden [Mon, 9 Mar 2009 13:25:07 +0000 (14:25 +0100)]
Add Eclipse project files.