YaCy is a personal Web crawler and Web search engine. It's also a P2P-based Web index exchange network with no central server and no possibility of censorship. Web crawls can be done locally, or you can trigger a collaborative Web crawl with all other YaCy peers. YaCy is fun to use and shows interesting text, image, audio, and video search results with direct links to Ogg, MP3, and video files. It has a cooperative bookmark system and many Web publishing functions.
| Tags | Communications File Sharing Information Management Internet Proxy Servers Web HTTP Servers Indexing/Search DNS Dynamic Content Networking |
|---|---|
| Licenses | GPL |
| Operating Systems | OS Independent |
| Implementation | Java |
Recent releases


Changes: The full international character set and all UTF-8 characters are now supported for indexing and search. Support has been added for searching with the site:, inurl:, and filetype: operators. A public API for the search results, the indexing, and the link structure has been added, with output in XML and JSON syntax.


Changes: This is a quick release, with a lot of security fixes and bugfixes.


Changes: Automatic re-crawling and a combination of Crawls and Bookmarks have been added. It's now possible to customize a personal search portal with YaCy. The functional range for Windows users has been enhanced.


Changes: The IP and seed handling were improved. Crawl starting was slightly changed. The basic configuration is very easy now, as a result of changes to the authorization mechanism. The way to define and switch networks was improved. YaCy is now SRU compliant. There is ongoing work on the YaCy-UI rich client. In addition, some minor security vulnerabilities have been fixed and a lot of bugfixes have been made.


Changes: Some minor security vulnerabilities have been fixed. Some bugfixes have been made.
Recent comments
Re: YAcY is a badly behaved robot
Neither is true:
1) YaCy has respected robots.txt since mid-2005; it never ignored robots.txt on purpose. Support was simply first implemented at that time.
2) There is no referrer spam. YaCy shows that the page was indexed by a YaCy peer. Since the corresponding web page is then referenced not only by this peer but by all peers, there must be a central address where a referenced page can see that it was referenced by a non-centralized web crawler. This is a unique problem that centralized crawlers do not have. In this case YaCy is just honest and refers to the YaCy project page. This feature was removed in YaCy 0.43 because too many people had been confused by this referrer.
Re: YAcY is a badly behaved robot
> 1. YAcY doesn't ask for robots.txt, let
> alone follow it.
> 2. YAcY posts the yacy web address as
> the HTTP Refer[r]er header similar to
> spam bots.
These issues have been resolved for some time now.
YAcY is a badly behaved robot
1. YAcY doesn't ask for robots.txt, let alone follow it.
2. YAcY posts the yacy web address as the HTTP Refer[r]er header similar to spam bots. Well-behaved bots may put their URL into the User-Agent header.
I only came across this project whilst researching against HTTP Referrer spammers, nice idea - shame about the implementation.
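For readers following this thread, the two behaviors under discussion (consulting robots.txt before fetching a page, and identifying the crawler in the User-Agent header rather than in Referer) can be sketched roughly as below. This is a minimal illustration using Python's standard library, not YaCy's own code; the `USER_AGENT` string and the helper names `allowed` and `build_request` are hypothetical and chosen only for the example.

```python
from urllib import robotparser
from urllib.request import Request

# Hypothetical crawler identity for illustration; not YaCy's actual User-Agent.
USER_AGENT = "examplebot/0.1 (+http://example.org/bot.html)"


def allowed(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = robotparser.RobotFileParser()
    # Parse rules from text so the check works without a network round trip.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def build_request(url: str, agent: str = USER_AGENT) -> Request:
    """Build a request that names the crawler in User-Agent and sets no Referer."""
    return Request(url, headers={"User-Agent": agent})


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    for page in ("http://example.org/public/page.html",
                 "http://example.org/private/page.html"):
        print(page, "allowed" if allowed(rules, page) else "blocked")
```

In a real crawler the robots.txt text would be fetched from the target host and cached; the sketch takes it as a string so that the rule check can be tried in isolation.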