Real-time News Summarization and Classification with NLP and Google Colab
In this project we explore how Natural Language Processing (NLP) can be applied to real-time news. The goal is to generate automatic summaries and topic classifications using the BBC RSS feed and pre-trained models from Hugging Face.
Method
- Reading real-time news via RSS.
- Automatic summarization with the BART CNN model.
- Zero-shot classification into categories such as politics, sports, business, technology, and entertainment.
- Presenting results in a clear table with summary and category.
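The first step above, reading news via RSS, can be sketched without any external dependencies. In the notebook a library such as feedparser is the usual choice; the stdlib-only version below parses the raw feed XML directly, and the sample string stands in for the response of fetching the BBC feed over the network (the field names match standard RSS 2.0, not any BBC-specific schema).

```python
import xml.etree.ElementTree as ET

# Sample standing in for a fetched BBC RSS response.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>BBC News</title>
    <item>
      <title>Prince Harry meets King Charles</title>
      <description>First meeting since Feb 2024.</description>
    </item>
    <item>
      <title>Gary Lineker ends a winning streak</title>
      <description>A 23-year run comes to an end.</description>
    </item>
  </channel>
</rss>"""

def parse_items(rss_xml: str) -> list[dict]:
    """Return one dict per <item>, with its title and description."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]

for entry in parse_items(SAMPLE_RSS):
    print(entry["title"])
```

Each item's description (or, later, the full article body) is then what gets passed to the summarization and zero-shot classification pipelines.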
Results
The results show how it is possible to generate useful summaries and reasonably accurate categories in just a few seconds, with very little code.
| Title | Summary | Category |
|---|---|---|
| Prince Harry meets King Charles… | Prince Harry meets King Charles for the first time since Feb 2024. | World |
| Gary Lineker ends Ant and Dec’s… | Gary Lineker breaks their 23-year winning streak. | Sport |
Live Notebook
Want to try it yourself?
· Open in Google Colab
Full source code also available on GitHub:
· deGalaLab Gist
Conclusions
This experiment demonstrates how Natural Language Processing can be applied to real-time sources, opening the door to applications such as:
- News monitoring
- Trend detection
- Automatic summaries for newsletters
The next step is to extract the full body of the articles and refine the summaries to obtain even more valuable information.
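That next step, extracting the full article body, could be sketched as follows. This is a simplistic, stdlib-only approach that collects the text inside `<p>` tags; a real implementation would more likely use a library such as BeautifulSoup or newspaper3k, and would need per-site handling of navigation and boilerplate paragraphs.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data

def extract_body(html: str) -> str:
    """Return the article paragraphs joined by newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())

page = "<html><body><h1>Headline</h1><p>First paragraph.</p><p>Second one.</p></body></html>"
print(extract_body(page))
```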
MORE INFO:
What is Hugging Face?
It is a company and open community that offers:
- Model library (transformers)
- A Python library that lets you load pre-trained NLP (and other) models with just a few lines of code.
- Example:
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
- Hugging Face Hub
- A “GitHub” for AI models: NLP, computer vision, audio, multimodal…
- Thousands of pre-trained models (BERT, GPT-2, T5, Whisper, Stable Diffusion, etc.).
- Datasets
- Thousands of datasets to train and test models (news, translations, images…).
- Inference API
- You can use models in the cloud without installing anything locally.
- Community and Spaces
- Spaces: small web apps where you can test models with an interface (no coding required).
- An active community that shares models and demos.
Why did we use it in this exercise?
Because it provides quick and easy access to:
- Automatic summarization → with models like BART or T5.
- Zero-shot classification → with BART MNLI.
- No need to train from scratch: we just use what is already available.
Mini Hugging Face Guide
Hugging Face Hub
- Website: https://huggingface.co/models
- Here you can find thousands of models published by companies, researchers, and the community.
- You can filter by tasks: summarization, translation, sentiment analysis, text-generation, speech-to-text, image-classification…
Example: facebook/bart-large-cnn (summarization model we used).
Testing models in the browser (no code)
- When you open a model page, there is almost always an interactive demo box: paste text → the model returns the summary / classification / translation.
- Great for quick tests before moving to Colab or Python.
Using models with Python
First, install the library:
!pip install transformers torch
Example 1: Automatic summarization
from transformers import pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
text = """The government announced a new economic reform to boost renewable energy investment across Europe..."""
summary = summarizer(text, max_length=50, min_length=20, do_sample=False)
print(summary[0]['summary_text'])
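One practical caveat: BART CNN only accepts inputs up to roughly 1,024 tokens, so a long article has to be split before summarization. Below is a minimal character-budget chunker (the 2,000-character default is an assumption, chosen to stay comfortably under the token limit for English text); each chunk would be summarized separately and the partial summaries concatenated, or summarized once more.

```python
def chunk_text(text: str, max_chars: int = 2000) -> list[str]:
    """Split text into chunks of at most max_chars characters,
    breaking naively on sentence boundaries ('. ') so that each
    chunk stays coherent."""
    sentences = text.split(". ")
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + ". " + sentence) if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Usage: summaries = [summarizer(c)[0]["summary_text"] for c in chunk_text(article)]
```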
Example 2: Zero-shot classification
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["politics", "sport", "economy", "technology", "entertainment"]
result = classifier("The government passed a new law on renewable energy.", candidate_labels=labels)
print(result["labels"][0], result["scores"][0])
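Zero-shot scores can be low when none of the candidate labels really fit, so it can help to add a confidence threshold and route uncertain items to a fallback bucket. The 0.5 threshold and the "other" label below are illustrative choices, not part of the pipeline's API; the function only relies on the dict shape the zero-shot pipeline returns (labels and scores sorted in descending order).

```python
def pick_category(result: dict, threshold: float = 0.5) -> str:
    """Return the top label if the model is confident enough,
    otherwise fall back to 'other'."""
    top_label, top_score = result["labels"][0], result["scores"][0]
    return top_label if top_score >= threshold else "other"

# Example with a mocked zero-shot pipeline output:
fake_result = {"labels": ["politics", "economy"], "scores": [0.82, 0.11]}
print(pick_category(fake_result))  # politics
```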
Datasets in Hugging Face
- There is also a dataset section: https://huggingface.co/datasets
- Example in Python:
from datasets import load_dataset
dataset = load_dataset("ag_news")
print(dataset["train"][0])
Hugging Face Spaces
- https://huggingface.co/spaces
- Small apps created with Gradio or Streamlit.
- You can test models with a graphical interface (e.g., speech recognition, live translation, etc.).
Inference API (Optional)
- Hugging Face provides a cloud API (requires a personal token).
- Example:
from huggingface_hub import InferenceClient
# Requires a Hugging Face token (set HF_TOKEN or pass token=...).
# Note: InferenceClient replaces the older, deprecated InferenceApi.
client = InferenceClient(model="facebook/bart-large-cnn")
result = client.summarization("Your long text here...")
print(result.summary_text)
With this you have the full cycle:
1. Find model → 2. Test on the web → 3. Run with Python/Colab → 4. Share results.