masete · reiosantos · Oct 22, 2019 · Oct 23, 2019 · Oct 23, 2019 · Oct 23, 2019
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,10 @@
+/__pycache__/
+venv
+../.env
+.DS_Store
+.vscode
+dump.rdb
+/.idea/
+/myenv
+myenv/
+/../.idea/
diff --git a/README.md b/README.md
@@ -1,2 +1,167 @@
-# Machine-Learning-
-Machine Learning Question and Answer on pollicy
+# Policy Question Answering System
+
+A web application that takes a user's question, surveys open access research articles about a given policy, and returns an answer to the user.
+
+# Project Overview
+
+Every year, millions of research articles on different policies and their consequences are published. Each of these is a rich source of information that can help policy advisors in determining the appropriate policies to implement. This application carries out a number of functions.
+
+1. Provide a chatbot that the user interacts with.
+2. Based on the user's answers, generate a question about a given policy consequence.
+3. Process the question to obtain keywords and then use these keywords to fetch open access research articles about the policy.
+4. Perform claim detection on the abstract of each article to determine whether the article found evidence for or against a given policy consequence.
+5. Count the total number of articles that are for or against a given policy consequence.
+6. Display this data in a visualization and also provide short summaries of the policy and its consequence.
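Steps 4 and 5 above can be sketched roughly as follows. This is only an illustration: `detect_claim` is a hypothetical stand-in that uses a naive phrase check, not the project's actual claim detection algorithm, and the cue phrases and the label names `for`, `against`, and `unclear` are assumptions.

```python
from collections import Counter

# Hypothetical cue phrases; the real claim detector is not shown in this README.
SUPPORT_CUES = ("reduced", "decreased", "lowered")
OPPOSE_CUES = ("no significant", "did not", "failed to")


def detect_claim(abstract):
    """Return 'for', 'against', or 'unclear' using a naive phrase check."""
    text = abstract.lower()
    # Negative cues are checked first so "did not ... reduced" counts as against.
    if any(cue in text for cue in OPPOSE_CUES):
        return "against"
    if any(cue in text for cue in SUPPORT_CUES):
        return "for"
    return "unclear"


def count_claims(abstracts):
    """Tally how many abstracts fall under each label."""
    return Counter(detect_claim(a) for a in abstracts)
```

The resulting counts are the kind of data that step 6 would pass to the visualization.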
+
+# Running locally
+
+Make sure that you have Redis, Python 3.6+, pip, and virtualenv installed on your computer.
+
+Clone the git repository and change into the top directory.
+
+```
+git clone https://github.com/Prosper21/Policy-Question-Answering-System
+cd Policy-Question-Answering-System
+```
+
+Create a virtual environment and activate it.
+
+```
+virtualenv venv
+. venv/bin/activate
+```
+
+Install all the required packages by running the command:
+
+```
+pip install -r requirements.txt
+```
+
+Create a .env file to store your environment variables. In our case, this would have the following values.
+
+```
+SECRET_KEY = 'xxx' # Use your own secret key here
+REDIS_URL = 'redis://localhost:6379'
+```
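A minimal sketch of how these two values could be read at startup, assuming they have already been exported into the process environment. The README does not say how the .env file is loaded, and `load_settings` is a hypothetical helper, not a function from this repository.

```python
import os


def load_settings(environ=os.environ):
    """Read the two settings named above, failing early if SECRET_KEY is missing."""
    secret_key = environ.get("SECRET_KEY")
    if not secret_key:
        raise RuntimeError("SECRET_KEY is not set")
    # Fall back to the local Redis URL from the example .env file.
    redis_url = environ.get("REDIS_URL", "redis://localhost:6379")
    return {"SECRET_KEY": secret_key, "REDIS_URL": redis_url}
```

Failing early on a missing secret key makes a misconfigured environment obvious before the web process or the workers start.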
+
+We now have all the files and packages that we need to run the application on our local host.
+In one terminal, start the Redis server by running:
+
+```
+redis-server
+```
+
+In a second terminal, change into the Policy-Question-Answering-System directory, activate the virtual environment, and start the celery workers by running:
+
+```
+celery worker -A answer_policy_question.celery -O fair
+```
+
+In a third terminal, change into the Policy-Question-Answering-System directory, activate the virtual environment, and start your application by running:
+
+```
+python app.py
+```
+
+You should see the application running and you can access it on localhost:5000.
+
+# Deploying on Heroku
+
+Open an account on Heroku at <https://signup.heroku.com/>.
+
+Install the Heroku CLI on your computer and carry out any other required configurations. The details can be found here <https://devcenter.heroku.com/articles/heroku-cli>.
+
+Clone the git repository and change into the top directory.
+
+```
+git clone https://github.com/Prosper21/Policy-Question-Answering-System
+cd Policy-Question-Answering-System
+```
+
+Create an app on Heroku, which prepares Heroku to receive your source code.
+
+```
+heroku create
+```
+
+In this case, Heroku generates a random name for your app. You can also provide your own name by running:
+
+```
+heroku create <app-name>
+```
+
+Because this application uses Heroku addons, we need to create those first. In our case, we need the redis addon. We do this by running:
+
+```
+heroku addons:create heroku-redis:hobby-dev
+```
+
+This will automatically set the REDIS_URL configuration variable for our application but we also need to set up our SECRET_KEY configuration variable. We do this by running:
+
+```
+heroku config:set SECRET_KEY='xxx' # Use your own secret key here
+```
+
+We can now deploy to Heroku by running:
+
+```
+git push heroku master
+```
+
+Go to the Heroku dashboard and make sure that your app's web and worker dynos are both switched on.
+
+You can now visit the app at the URL generated from its name, i.e. app-name.herokuapp.com, or simply open the app by running:
+
+```
+heroku open
+```
+
+# Skills Needed
+
+* Python 3.6+
+* JavaScript/HTML/CSS
+* Heroku
+
+# Notes
+
+If you make any changes, for example by installing new packages and running
+```
+pip freeze > requirements.txt
+```
+then you will have to add
+```
+https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm
+```
+to your requirements.txt. This ensures that the required spaCy model is installed. If this causes a 'Double requirement given' error upon deployment, look for
+
+```
+en-core-web-sm==x.x.x
+```
+
+in your requirements.txt file and delete it.
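The manual fix-up described in these notes could be automated along the following lines. This is a sketch under assumptions: `fix_requirements` is a hypothetical helper that is not part of the repository, and it works on the text of requirements.txt rather than on the file itself.

```python
SPACY_MODEL_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm"
)


def fix_requirements(text):
    """Drop any pinned en-core-web-sm line and append the model URL once."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        name = stripped.lower().replace("_", "-")
        if name.startswith("en-core-web-sm=="):
            continue  # the pinned entry that triggers 'Double requirement given'
        if stripped == SPACY_MODEL_URL:
            continue  # skip an existing copy so the URL is not added twice
        kept.append(line)
    kept.append(SPACY_MODEL_URL)
    return "\n".join(kept) + "\n"
```

Running the function on its own output leaves the text unchanged, so it should be safe to apply after every `pip freeze`.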
+
+# References 
+
+Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
+
+# TODO
+
+## Front-End
+
+### Chatbot UI
+
+At the moment, the chatbot assumes that the user provides very specific replies rather than full sentences. For example, when the bot asks 'What policy would you like to research today?', it expects a straight answer like 'cap and trade' rather than 'I would like to research cap and trade'. The same applies to the question 'And what effect of cap and trade on carbon emissions are you interested in?' where we expect an answer like 'carbon emissions' rather than 'I would like to know its effect on carbon emissions'. The reason for this is that the user's replies are being handled using JavaScript in the HTML files that render the page and thus there is no way to apply natural language processing techniques to automatically identify the policy or phenomenon of interest from full sentence replies. I have not yet figured out a way to do the processing of full sentence replies from the backend while maintaining a conversational flow but I believe this can be done.
+
+Another possible improvement is moving all the JavaScript code that is in the HTML files to the JavaScript folder in the static directory. The recommended practice is that JavaScript and HTML code should not be mixed.
+
+## Back-End
+
+### Duplicates
+
+Since we are fetching abstracts from three APIs, there is a chance of the same abstract appearing more than once in our results. It would be a good idea to remove these duplicates, but sometimes an extra character or space in the duplicate essentially makes the two versions of the abstract different. I think one could use regex to clean up the abstracts and then be able to identify the duplicates.
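One way the regex clean-up suggested here might look is sketched below. The function names are hypothetical and the normalization rules (lowercasing, dropping everything except ASCII letters, digits, and whitespace) are assumptions; they would likely need tuning against real abstracts, especially ones containing non-ASCII characters.

```python
import re


def normalize(abstract):
    """Lowercase, drop non-alphanumeric characters, and collapse whitespace."""
    text = abstract.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def drop_duplicates(abstracts):
    """Keep the first occurrence of each abstract, compared after normalization."""
    seen = set()
    unique = []
    for abstract in abstracts:
        key = normalize(abstract)
        if key not in seen:
            seen.add(key)
            unique.append(abstract)
    return unique
```

Comparing normalized keys while returning the original strings keeps the displayed abstracts untouched and only uses the cleaned text to decide what counts as a duplicate.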
+
+
+### Claim detection
+
+At the moment, the claim detection algorithm is still a work in progress and could be improved further for better results.
+
+
diff --git a/dar/.gitignore b/dar/.gitignore
@@ -0,0 +1,10 @@
+../__init__.py
+/__pycache__/
+__init__.pyc
+run.pyc
+config/__init__.pyc
+config/config.pyc
+.env
+/.idea/
+config/__pycache__/
+views/__pycache__/
diff --git a/dar/Procfile b/dar/Procfile
@@ -0,0 +1,2 @@
+web: python clear_redis.py && python app.py
+worker: celery worker -A answer_policy_question.celery -O fair --loglevel=INFO --concurrency=2 --max-tasks-per-child=1
diff --git a/dar/__init__.py b/dar/__init__.py