Real-time News Summarization and Classification with NLP and Google Colab
In this project we explore how Natural Language Processing (NLP) can be applied to real-time news. The goal is to generate automatic summaries and topic classifications using the BBC RSS feed and pre-trained models from Hugging Face.
Method
- Reading real-time news via RSS.
- Automatic summarization with the BART CNN model.
- Zero-shot classification into categories such as politics, sports, business, technology, and entertainment.
- Presenting results in a clear table with summary and category.
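The first step above, reading news via RSS, can be sketched without any external dependencies. In the notebook a library such as feedparser is the usual choice; the stdlib-only version below parses the raw feed XML directly, and the sample string stands in for the response of fetching the BBC feed over the network (the field names match standard RSS 2.0, not any BBC-specific schema).

```python
import xml.etree.ElementTree as ET

# Sample standing in for a fetched BBC RSS response.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>BBC News</title>
    <item>
      <title>Prince Harry meets King Charles</title>
      <description>First meeting since Feb 2024.</description>
    </item>
    <item>
      <title>Gary Lineker ends a winning streak</title>
      <description>A 23-year run comes to an end.</description>
    </item>
  </channel>
</rss>"""

def parse_items(rss_xml: str) -> list[dict]:
    """Return one dict per <item>, with its title and description."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]

for entry in parse_items(SAMPLE_RSS):
    print(entry["title"])
```

Each item's description (or, later, the full article body) is then what gets passed to the summarization and zero-shot classification pipelines.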
Results
The results show how it is possible to generate useful summaries and reasonably accurate categories in just a few seconds, with very little code.
| Title | Summary | Category |
|---|---|---|
| Prince Harry meets King Charles… | Prince Harry meets King Charles for the first time since Feb 2024. | World |
| Gary Lineker ends Ant and Dec’s… | Gary Lineker breaks their 23-year winning streak. | Sport |
Live Notebook
Want to try it yourself?
· Open in Google Colab
Full source code also available on GitHub:
· deGalaLab Gist
Conclusions
This experiment demonstrates how Natural Language Processing can be applied to real-time sources, opening the door to applications such as:
- News monitoring
- Trend detection
- Automatic summaries for newsletters
The next step is to extract the full body of the articles and refine the summaries to obtain even more valuable information.
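That next step, extracting the full article body, could be sketched as follows. This is a simplistic, stdlib-only approach that collects the text inside `<p>` tags; a real implementation would more likely use a library such as BeautifulSoup or newspaper3k, and would need per-site handling of navigation and boilerplate paragraphs.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data

def extract_body(html: str) -> str:
    """Return the article paragraphs joined by newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())

page = "<html><body><h1>Headline</h1><p>First paragraph.</p><p>Second one.</p></body></html>"
print(extract_body(page))
```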
MORE INFO:
What is Hugging Face?
It is a company and open community that offers:
- Model library (transformers)
- A Python library that lets you load pre-trained NLP (and other) models with just a few lines of code.
- Example:
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
- Hugging Face Hub
- A “GitHub” for AI models: NLP, computer vision, audio, multimodal…
- Thousands of pre-trained models (BERT, GPT-2, T5, Whisper, Stable Diffusion, etc.).
- Datasets
- Thousands of datasets to train and test models (news, translations, images…).
- Inference API
- You can use models in the cloud without installing anything locally.
- Community and Spaces
- Spaces: small web apps where you can test models with an interface (no coding required).
- An active community that shares models and demos.
Why did we use it in this exercise?
Because it provides quick and easy access to:
- Automatic summarization → with models like BART or T5.
- Zero-shot classification → with BART MNLI.
- No need to train from scratch: we just use what is already available.
Mini Hugging Face Guide
Hugging Face Hub
- Website: https://huggingface.co/models
- Here you can find thousands of models published by companies, researchers, and the community.
- You can filter by tasks: summarization, translation, sentiment analysis, text-generation, speech-to-text, image-classification…
Example: facebook/bart-large-cnn (summarization model we used).
Testing models in the browser (no code)
- When you open a model page, there is almost always an interactive demo box: paste text → the model returns the summary / classification / translation.
- Great for quick tests before moving to Colab or Python.
Using models with Python
First, install the library:
!pip install transformers torch
Example 1: Automatic summarization
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """The government announced a new economic reform to boost renewable energy investment across Europe..."""
summary = summarizer(text, max_length=50, min_length=20, do_sample=False)
print(summary[0]['summary_text'])
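One practical caveat: BART CNN only accepts inputs up to roughly 1,024 tokens, so a long article has to be split before summarization. Below is a minimal character-budget chunker (the 2,000-character default is an assumption, chosen to stay comfortably under the token limit for English text); each chunk would be summarized separately and the partial summaries concatenated, or summarized once more.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars characters,
    breaking naively on sentence boundaries ('. ') so that each
    chunk stays coherent."""
    sentences = text.split(". ")
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + ". " + sentence) if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Usage: summaries = [summarizer(c)[0]["summary_text"] for c in chunk_text(article)]
```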
Example 2: Zero-shot classification
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["politics", "sport", "economy", "technology", "entertainment"]
result = classifier("The government passed a new law on renewable energy.", candidate_labels=labels)
print(result["labels"][0], result["scores"][0])
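Zero-shot scores can be low when none of the candidate labels really fit, so it can help to add a confidence threshold and route uncertain items to a fallback bucket. The 0.5 threshold and the "other" label below are illustrative choices, not part of the pipeline's API; the function only relies on the dict shape the zero-shot pipeline returns (labels and scores sorted in descending order).

```python
def pick_category(result: dict, threshold: float = 0.5) -> str:
    """Return the top label if the model is confident enough,
    otherwise fall back to 'other'."""
    top_label, top_score = result["labels"][0], result["scores"][0]
    return top_label if top_score >= threshold else "other"

# Example with a mocked zero-shot pipeline output:
fake_result = {"labels": ["politics", "economy"], "scores": [0.82, 0.11]}
print(pick_category(fake_result))  # politics
```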
Datasets in Hugging Face
- There is also a dataset section: https://huggingface.co/datasets
- Example in Python:
from datasets import load_dataset
dataset = load_dataset("ag_news")
print(dataset["train"][0])
Hugging Face Spaces
- https://huggingface.co/spaces
- Small apps created with Gradio or Streamlit.
- You can test models with a graphical interface (e.g., speech recognition, live translation, etc.).
Inference API (Optional)
- Hugging Face provides a cloud API (requires a personal token).
- Example:
from huggingface_hub import InferenceClient
# Requires a Hugging Face token (set HF_TOKEN or pass token=...).
# Note: InferenceClient replaces the older, deprecated InferenceApi.
client = InferenceClient(model="facebook/bart-large-cnn")
result = client.summarization("Your long text here...")
print(result.summary_text)
With this you have the full cycle:
1. Find model → 2. Test on the web → 3. Run with Python/Colab → 4. Share results.