Below are the differences between two revisions of the page.
en:temoignages:moteur_orange [2011/07/06 21:51] daamien |
en:temoignages:moteur_orange [2011/07/06 23:56] (current version) daamien |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== France Telecom' | ====== France Telecom' | ||
- | //In the Following | + | <note warning> |
+ | This translation is not finished yet! Please do not forward this text until it has been properly reviewed. | ||
+ | Please send your comments to: damien@dalibo.info | ||
+ | </ | ||
- | France Telecom has built a web search engine called "Le Moteur" | + | //In the following text, Séverine AUBRY and Robin POUGET, respectively Project Manager and Database Manager at France Telecom, describe their use of PostgreSQL for a major tool of their company.// | ||
- | This back office must be refreshed around the clock, 6 days out of 7. | ||
+ | [[http:// | ||
+ | This part of the search engine was built between 2001 and 2002, exclusively with PostgreSQL. At that time, the Postgres project was about to release its version 7.4. Today the project runs on PostgreSQL 8.2. Of course, the back office has seen some improvements and fixes over the years. | ||
+ | In detail, the engine is composed of a crawler, whose job is to browse the Internet starting from a list of URLs and thousands of key sites, following links automatically. The collected data is then processed by several scripts written in Tcl. Data is stored in a schema with a few thousand tables. The partitioning relies on a technology developed in-house, based on hash keys. | ||
- | In other words, we can tolerate | + | From the start, the goals of the project were to have a stable and robust solution that is easy to maintain and able to handle | ||
- | The "back office" | + | Here are some figures to describe | ||
- | In detail, the engine | + | The entire application |
- | The charge of " | ||
- | This data is then processed through several scripts written | + | Over the history of the project there have been a few minor issues, which were fixed by the community, whose support was effective. Among these issues: | ||
- | Data is stored in a schema | + | * Data fragmentation due to massive updates. This was fixed by new PostgreSQL features introduced over the years (remember that the project has been running PostgreSQL for 10 years!) | ||
+ | * Some concerns with memory management, which were corrected in version 8 | ||
+ | * VACUUM FULL runs are now almost ancient history. At first, the system needed 3 or 4 VACUUM FULL runs per year. Now a single one is enough. | ||
- | The guiding principles of the project have been, from the beginning, to have a solution that is: | + | In conclusion, PostgreSQL has given full satisfaction for over 10 years. | ||
- | stable | ||
- | robust | ||
- | very maintainable | ||
- | scalable, capable of absorbing a soaring data volume and therefore | ||
- | free from problems of disk space or of the number of PostgreSQL servers needed | ||
- | |||
- | Some figures: | ||
- | |||
- | 5 billion tuples are distributed over | ||
- | 160 machines hosting 800 PostgreSQL servers on Linux, | ||
- | for a total volume of 24 terabytes | ||
- | Note that PostgreSQL is not the only thing running on these machines; there are also applications | ||
- | |||
- | The machines are spread over three geographical sites, with a logical division, called " | ||
- | |||
- | Data from this set is also exported to other data servers for various uses. | ||
- | |||
- | This results in high flexibility of the entire application: | ||
- | |||
- | Over the history of the project there have been a few minor issues that were fixed by the community, whose support is deemed " | ||
- | |||
- | data fragmentation problems linked to massive UPDATEs. This was fixed by new PostgreSQL features introduced over the years (remember that the project has been running on PostgreSQL for over 10 years!) | ||
- | |||
- | Version 8 fixed some memory management issues; | ||
- | |||
- | VACUUM FULL is almost ancient history. Over the course of this project we went from 3 or 4 VACUUM FULL runs per year to only 1. | ||
- | |||
- | In conclusion, PostgreSQL has given full satisfaction here for over 10 years. The few problems encountered were all handled very effectively by the community. New versions of PostgreSQL have also brought either solutions to these problems, or improvements, or simplifications. | ||
- | |||
- | [Interview by Jean-Paul Argudo, March-June 2011] | ||
- | Séverine AUBRY and Robin POUGET give their account of the | ||
- | |||
- | It is the search engine “Le moteur”: http:// | ||
- | |||
- | The criticality of this application is high because it determines the quality of the documents indexed in the search engine. | ||
- | |||
- | This “back office” must be refreshed around the clock, but only 6 days out of 7. In other | ||
- | |||
- | The “back office” part of this search engine was built between 2001 and 2002, exclusively with PostgreSQL. At that time, the project started with version 7.4. Today | ||
- | |||
- | In detail, the engine is composed of a “crawler”, | ||
- | |||
- | The job of the “crawler” is therefore to retrieve URLs and their associated content. | ||
- | |||
- | This data is then processed by several scripts, written in TCL. | ||
- | |||
- | The data is stored in a schema comprising a few thousand tables. Partitioning relies on a technology developed in-house, based on hash-type keys. | ||
- | |||
- | The guiding principles of the project have been, from the beginning, to | ||
- | |||
- | stable | ||
- | robust | ||
- | very maintainable | ||
- | scalable, able to absorb a soaring data volume and therefore, | ||
- | free from problems of | ||
- | |||
- | Some figures: | ||
- | |||
- | 5 billion tuples are thus distributed over | ||
- | 160 machines hosting 800 PostgreSQL servers, all running Linux, | ||
- | for a total volume of 24 terabytes | ||
- | note that PostgreSQL is not the only thing running on these machines; they also host applications | ||
- | |||
- | The machines are spread over 3 geographical sites, | ||
- | |||
- | Data from this set is exported to other | ||
- | |||
- | This results in extreme flexibility of the | ||
- | |||
- | There have been, in the | ||
- | |||
- | data fragmentation problems, linked to massive UPDATEs. This was fixed thanks to new PostgreSQL features over the years (as a reminder, the project has been running on PostgreSQL for more than 10 years!); | ||
- | |||
- | version 8 fixed some memory management issues; | ||
- | |||
- | VACUUM FULL runs are almost | ||
- | |||
- | In conclusion, PostgreSQL has given full satisfaction here for more than 10 years. The few issues encountered were all handled with the greatest efficiency by the community. New versions of PostgreSQL have moreover brought either solutions to these problems, or improvements, | ||
- | |||
- | [Interview conducted by Jean-Paul Argudo, March to June 2011] | ||
+ | // |
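
The testimonial says that data is spread over a few thousand tables using an in-house partitioning scheme based on hash keys, but gives no further detail. As a purely illustrative sketch, and not France Telecom's actual implementation, the Python below shows one common way to map a URL to one of a fixed number of tables through a hash. The table count `NUM_TABLES`, the `documents_` prefix and the function name `table_for_url` are all assumptions made for this example.

```python
import hashlib

# Assumption: the text only says "a few thousand tables"; 4096 is illustrative.
NUM_TABLES = 4096


def table_for_url(url: str, num_tables: int = NUM_TABLES) -> str:
    """Return the (hypothetical) partition table name for a URL.

    The URL is hashed with MD5, the first 8 bytes of the digest are read as
    an unsigned integer, and that integer modulo the table count selects
    the bucket. The same URL therefore always maps to the same table.
    """
    digest = hashlib.md5(url.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_tables
    return f"documents_{bucket:04d}"


if __name__ == "__main__":
    for url in ("http://example.org/", "http://example.org/page2"):
        print(url, "->", table_for_url(url))
```

Because the mapping depends only on the key, any script that knows the table count can locate a row without a central lookup, which is one plausible reason to choose hash keys for a schema of this size.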