
Project Description

Yioop! is a PHP search engine. It can be configured either as a general-purpose search engine for the whole Web or to provide search results for a specific set of URLs or domains. Yioop! can crawl pages or directly index archives such as ARC and WARC. It supports indexing several file formats, including HTML, Atom, PDF, DOC, PPT, RTF, RSS, XML, SVG, PNG, JPG, BMP, GIF, and sitemaps. The Yioop! crawler can be deployed on one or many machines. It supports one or more crawl scheduler processes, as well as multiple fetchers and mirrors. Crawling respects robots.txt, including Crawl-delay. Yioop! crawls are stored in a Web archive format that is easy to move around, so crawling can be done on one machine and the results deployed elsewhere. Yioop! supports mixing of crawls. Yioop! comes with a search front end that can be localized as desired using a GUI, which supports RTL languages. Crawls can also be managed through this GUI. Yioop! can be configured in a straightforward manner to use file caching, or memcache if available.
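As an illustration of what respecting robots.txt and Crawl-delay involves, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not Yioop!'s own PHP code; the robots.txt content, the `ExampleBot` user agent, and the example.com URLs are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

AGENT = "ExampleBot"  # hypothetical crawler name


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# A polite crawler checks each URL against the rules before fetching it.
allowed = rules.can_fetch(AGENT, "http://example.com/index.html")
blocked = rules.can_fetch(AGENT, "http://example.com/private/data.html")

# Crawl-delay is the number of seconds to wait between requests to the host.
delay = rules.crawl_delay(AGENT)

print(allowed, blocked, delay)
```

A crawler built along these lines would skip any URL for which `can_fetch` returns false and wait at least `delay` seconds between successive requests to the same host.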

System Requirements

System requirements are not defined.
Information regarding Project Releases and Project Resources. Note that the information here is quoted from the Freecode.com page, and the downloads themselves may not be hosted on OSDN.

2011-12-08 08:57
0.80

Tags: Minor
This version supports starting, stopping, and viewing log files of the queue server and fetchers from a Web interface. One can now inject new URLs into an active crawl via the Web interface. This version of Yioop! supports re-crawling of pages after a fixed number of days. Also, the file extensions that are crawled, the number of bytes downloaded per page, and how Yioop! weighs different page components can now all be controlled through the Web interface rather than only through the config.php file. Improvements have also been made to how the HTML processor extracts text to index.
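The idea of re-crawling a page after a fixed number of days can be sketched as a simple age check. This is only an illustrative sketch, not Yioop!'s implementation; the function name `is_due_for_recrawl` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def is_due_for_recrawl(last_crawled: datetime, now: datetime,
                       recrawl_days: int) -> bool:
    """Return True once at least `recrawl_days` days have passed
    since the page was last crawled."""
    return now - last_crawled >= timedelta(days=recrawl_days)


# Example: a page crawled on 1 December, checked on 8 December.
last = datetime(2011, 12, 1)
today = datetime(2011, 12, 8)

due_after_7 = is_due_for_recrawl(last, today, recrawl_days=7)
due_after_30 = is_due_for_recrawl(last, today, recrawl_days=30)
```

With a seven-day setting the page is exactly old enough to be scheduled again; with a thirty-day setting it is left alone.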

Project Resources