The News Research Tool is a Streamlit-based web application designed to extract, process, and analyze news articles. Users can input URLs of news articles, and the application will create embeddings, build a vector store using FAISS, and enable retrieval-based question answering (QA) with sources.
- Extract content from news article URLs.
- Split content into manageable chunks for embedding.
- Generate embeddings using OpenAI models.
- Store and load embeddings with FAISS.
- Perform question answering with sources using LangChain.
- Display answers and reference sources interactively.
- Python 3.9 or higher
- OpenAI API key (for embeddings and LLM usage)
The following libraries are required:
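- streamlit
- langchain
- openai
- pickle (part of the Python standard library, so nothing to install)
- faiss (installed as `faiss-cpu`)
- python-dotenv
- unstructured (needed by `UnstructuredURLLoader`)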
Install dependencies using `pip install streamlit langchain openai faiss-cpu python-dotenv unstructured`.
Clone the Repository
Run `git clone https://github.com/your-username/news-research-tool.git`, then `cd news-research-tool`.
Set Up Environment Variables
Create a `.env` file in the project directory and add your OpenAI API key: `OPENAI_API_KEY=your_openai_api_key`
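A minimal sketch of how a script can pick up that key, assuming python-dotenv is used to read the `.env` file (as the dependency list suggests). `get_openai_api_key` is a hypothetical helper, not necessarily what `main.py` does; it reads `.env` from the current working directory, which is the project directory if you start the app from there.

```python
import os


def get_openai_api_key() -> str:
    """Return OPENAI_API_KEY, loading .env first if python-dotenv is installed."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        # Copy entries from ./.env into os.environ. A missing file is not an
        # error, and variables that are already set are not overwritten.
        load_dotenv(".env")
    except ImportError:
        pass  # fall back to whatever is already in the environment
    key = os.getenv("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message is friendlier than letting the first OpenAI call fail with an authentication error.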
Run the Application
Start the Streamlit app with `streamlit run main.py`.
Interact with the Tool
- Enter up to three news article URLs in the sidebar.
- Click the "Process URLs" button to load and process the articles.
- Enter a question in the text box to retrieve answers and sources.
Load and Process URLs
- The `UnstructuredURLLoader` fetches article content from the provided URLs.
- The content is split into chunks using the `RecursiveCharacterTextSplitter`.
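The recursive splitting strategy can be sketched in plain Python. This is a simplified illustration of the idea behind `RecursiveCharacterTextSplitter`, not LangChain's code: `recursive_split` is a hypothetical helper, the separator order and the `chunk_size=200` default are assumptions, and chunk overlap is left out for brevity.

```python
from typing import List, Tuple

DEFAULT_SEPARATORS: Tuple[str, ...] = ("\n\n", "\n", ". ", " ")


def recursive_split(text: str, chunk_size: int = 200,
                    separators: Tuple[str, ...] = DEFAULT_SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first (paragraphs) and only falls back to
    finer ones (lines, sentences, words) for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # Nothing left to split on: cut into fixed-size windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full; emit it
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks
```

In the tool itself this job is done by LangChain's splitter, which is configured through arguments such as `chunk_size`, `chunk_overlap`, and `separators`; the values `main.py` uses are not stated in this README.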
Generate Embeddings
- OpenAI embeddings are created using the `OpenAIEmbeddings` class.
- FAISS stores the embeddings in a local vector index (`faiss_index`).
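To show what the embedding and indexing step does without calling the OpenAI API, here is an offline sketch. `toy_embed` (a unit-length hashed bag of words) is a hypothetical stand-in for `OpenAIEmbeddings`, and `FlatL2Index` is a hypothetical stand-in for a flat FAISS index: it stores each vector with its metadata and answers a query by comparing it against every stored vector (exhaustive squared-L2 search).

```python
import math
import re
import zlib
from typing import Dict, List, Tuple

DIM = 64  # toy size; real embedding models produce much longer vectors


def toy_embed(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for an embedding model: a unit-length hashed bag of words."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0  # deterministic bucket per word
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FlatL2Index:
    """Exhaustive nearest-neighbour search: compare the query with every stored vector."""

    def __init__(self) -> None:
        self.vectors: List[List[float]] = []
        self.metadata: List[Dict[str, str]] = []

    def add(self, vector: List[float], meta: Dict[str, str]) -> None:
        self.vectors.append(vector)
        self.metadata.append(meta)

    def search(self, query: List[float], k: int = 2) -> List[Tuple[float, Dict[str, str]]]:
        """Return up to k (squared L2 distance, metadata) pairs, closest first."""
        scored = [
            (sum((a - b) ** 2 for a, b in zip(query, vec)), meta)
            for vec, meta in zip(self.vectors, self.metadata)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]
```

In the real tool, LangChain builds the index from the split documents (for example via `FAISS.from_documents(docs, OpenAIEmbeddings())`), and each chunk typically keeps its source URL in its metadata, which is what later lets answers cite sources.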
Question Answering
- The FAISS index is used as a retriever via the `vectorstore_openai.as_retriever()` method.
- The `RetrievalQAWithSourcesChain` uses an OpenAI LLM to generate answers and cite sources.
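The "with sources" part comes down to two steps: put the retrieved chunks, each tagged with its source URL, into the prompt, and split the model's reply into an answer and a list of sources. The sketch below illustrates those steps with hypothetical helpers `build_prompt` and `split_answer_and_sources`; the prompt wording is an assumption, `RetrievalQAWithSourcesChain` uses its own templates and parsing internally, and no LLM is called here.

```python
import re
from typing import Dict, List, Tuple


def build_prompt(question: str, chunks: List[Dict[str, str]]) -> str:
    """Stuff retrieved chunks (each with 'text' and 'source' keys) into one prompt."""
    context = "\n\n".join(f"Content: {c['text']}\nSource: {c['source']}" for c in chunks)
    return (
        "Answer the question using only the extracts below. "
        "End with a line starting with 'SOURCES:' listing the sources you used.\n\n"
        f"{context}\n\nQUESTION: {question}\nFINAL ANSWER:"
    )


def split_answer_and_sources(completion: str) -> Tuple[str, str]:
    """Split a reply of the form '<answer> SOURCES: <urls>' into its two parts."""
    parts = re.split(r"SOURCES?:", completion, maxsplit=1, flags=re.IGNORECASE)
    answer = parts[0].strip()
    sources = parts[1].strip() if len(parts) > 1 else ""
    return answer, sources
```

With LangChain, the equivalent wiring is roughly `RetrievalQAWithSourcesChain.from_llm(llm=llm, retriever=vectorstore_openai.as_retriever())`, and the chain's result dictionary carries `answer` and `sources` keys.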
Display Results
- The tool displays the answer and sources in an interactive Streamlit interface.
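One way to render the chain's result in Streamlit, assuming the result dictionary has `answer` and `sources` keys and that sources come back as a comma- or newline-separated string. `source_list`, `result_to_blocks`, and `render` are hypothetical helpers; keeping the layout logic apart from Streamlit makes it testable without a running app.

```python
import re
from typing import Any, Dict, List, Tuple


def source_list(sources: str) -> List[str]:
    """Split a sources string on commas or newlines (assumed format), dropping blanks."""
    return [s.strip() for s in re.split(r"[,\n]", sources) if s.strip()]


def result_to_blocks(result: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a chain result into (streamlit_function_name, text) pairs."""
    blocks = [("header", "Answer"), ("write", result["answer"])]
    sources = source_list(result.get("sources", ""))
    if sources:
        blocks.append(("subheader", "Sources:"))
        blocks.extend(("write", src) for src in sources)
    return blocks


def render(blocks: List[Tuple[str, str]], ui: Any) -> None:
    """Call ui.header / ui.subheader / ui.write for each block."""
    for func_name, text in blocks:
        getattr(ui, func_name)(text)
```

Inside the app you would pass the Streamlit module itself: `import streamlit as st` followed by `render(result_to_blocks(result), st)`.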
- Document Loader: Extracts article content using `UnstructuredURLLoader`.
- Text Splitter: Splits the text into smaller, overlapping chunks for embedding.
- Embeddings: Generates vector embeddings using OpenAI's models.
- Vector Store: Stores embeddings using FAISS for fast retrieval.
- Retrieval Chain: Uses LangChain's `RetrievalQAWithSourcesChain` for answering questions.
The tool relies on pickle-based deserialization to load the FAISS index. This is a security risk: unpickling a maliciously crafted or tampered file can execute arbitrary code. To mitigate this:
- Only load pickle files you trust, ideally ones you created yourself.
- Deserialization has to be enabled explicitly with `allow_dangerous_deserialization=True`; only set this flag for index files from a trusted source.
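One way to act on "only load pickle files you trust" is to record a SHA-256 digest when the data is written and refuse to unpickle bytes that no longer match. Below is a minimal standard-library sketch; `dumps_with_digest` and `loads_if_trusted` are hypothetical helpers. The check only detects changes to data you produced: it does not make an untrusted pickle safe, and it helps only if the digest is stored somewhere an attacker who can modify the pickle cannot also modify.

```python
import hashlib
import hmac
import pickle
from typing import Any, Tuple


def dumps_with_digest(obj: Any) -> Tuple[bytes, str]:
    """Pickle obj and return the bytes together with their SHA-256 hex digest."""
    data = pickle.dumps(obj)
    return data, hashlib.sha256(data).hexdigest()


def loads_if_trusted(data: bytes, expected_digest: str) -> Any:
    """Unpickle data only if its digest matches the one recorded at save time."""
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison so the check does not leak how many characters matched.
    if not hmac.compare_digest(actual, expected_digest):
        raise ValueError("pickle digest mismatch; refusing to deserialize")
    return pickle.loads(data)
```

The same idea applies to the files of a saved index: hash them when the index is written, and verify them before handing the folder to a loader that unpickles it.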
- Support for additional languages.
- Integration with more advanced LLMs or embedding models.
- Automated summarization of articles.
- Enhanced UI/UX for better user interaction.
Contributions are welcome! Feel free to open issues or submit pull requests to improve the tool.
This project is licensed under the MIT License. See the LICENSE file for details.
Enjoy researching your favorite news articles with the News Research Tool! 🎉