Project Details

An NLP Toolkit project with RSS Feed data

This project implements a semi-automatic text processing chain, from data collection to data presentation.

The project first defines the linguistic objectives to be achieved, then draws on the methods and IT tools needed to achieve them (corpus collection, text standardization, segmentation, labeling, extraction, structuring and presentation of results, etc.).

The corpus is a directory tree of RSS feeds from the newspaper Le Monde, collected every day of 2018 at 7 p.m. The tree contains a directory for each month of the year, subdivided into daily directories. These daily directories hold the files of interest: the RSS feeds for each section of the newspaper, in .xml format.
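As an illustration of how such a tree could be read, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function name `iter_rss_items` is hypothetical, and the sketch assumes the feeds follow the usual RSS 2.0 layout, with `item` elements that each carry a `title` and a `description`. It searches the tree recursively, so it does not depend on the exact nesting of the month and day directories.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def iter_rss_items(root_dir):
    """Yield (file path, title, description) for each <item> in the .xml files under root_dir.

    Hypothetical helper: assumes standard RSS 2.0 element names.
    """
    # rglob searches every subdirectory, whatever the month/day nesting is.
    for xml_path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(xml_path)
        except ET.ParseError:
            # Skip files that are not well-formed XML.
            continue
        for item in tree.iter("item"):
            # findtext returns None when the child is missing, hence the "or".
            title = (item.findtext("title") or "").strip()
            description = (item.findtext("description") or "").strip()
            yield xml_path, title, description
```

Calling `list(iter_rss_items("2018"))`, where `2018` stands in for the root directory of the corpus (an assumed name), would return one tuple per news item, ready for the standardization and segmentation steps.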

Here is the website created for this project:
