The goal is to leverage NLP to "watch" the Fed's commentary and understand key parameters in its policy decision making. We already read and listen to follow the Fed, but we may miss part of the information, especially how things change over time. We can leverage NLP to augment our tracking and "quantify" any important changes that should be flagged.
You can use any public data you wish. Below is one of the sources to consider:
The Fed's site contains some of the key communicated materials. In this case, communications are stored under "Statement", "Press conference", and "Minutes" (PDF/HTML). Again, you don't have to limit your data sources to this. For example, other regional Feds' websites may have useful information. You can certainly utilize other platforms such as Twitter.
Perform an analysis for 2019 to the present. You can be creative about the analysis. The goal is to identify key topics, mentions, and sentiment over time with an NLP-based approach. The process should be as systematic as possible, starting from the data-scraping step.
It includes two parts: coding scripts and a 1–3-page summary report.
For the coding and analysis part, Python is preferred. Please zip the files or save them on GitHub.
The summary report should contain, but is not limited to, key changes identified in the Fed's mentions/tone/key parameters in its policy reaction function, and anything else you think is relevant to your conclusion. Your conclusion could indicate key changes over time that we should pay attention to, whether the Fed is becoming more or less hawkish, or whether the Fed's policy is pivoting toward the tighter or looser side (e.g., likely to hike or cut policy rates going forward).
Also consider including key visualizations to support your conclusion.
We take the URLs of all the PDFs on the Fed's website starting from 2017. We use the requests library (a third-party package, not part of Python's standard library) to fetch and save all the PDFs to a local directory. To read each PDF and store its contents as a string, we use the PyPDF2 library, which is really helpful. We then have the text of each PDF along with the year, month, and day it was released (the release date is encoded in the PDF's filename). We do some preprocessing: removing newlines and tabs, removing accented characters, and lower-casing the text. Finally we come to our final step: sentiment analysis.
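The download/extract/preprocess pipeline described above might look like the following sketch. Function names and preprocessing details are illustrative assumptions, not the author's exact code; the real statement URLs come from the Fed's website.

```python
import re
import unicodedata


def download_pdf(url: str, path: str) -> None:
    """Fetch one PDF and save it to a local directory."""
    import requests  # third-party: pip install requests
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    with open(path, "wb") as f:
        f.write(resp.content)


def pdf_to_text(path: str) -> str:
    """Read every page of a saved PDF into one string."""
    from PyPDF2 import PdfReader  # third-party: pip install PyPDF2
    reader = PdfReader(path)
    return " ".join(page.extract_text() or "" for page in reader.pages)


def preprocess(text: str) -> str:
    """Remove newlines/tabs, strip accented characters, lower-case."""
    text = re.sub(r"[\n\t]+", " ", text)
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return text.lower()
```

Keeping the release date in the filename (e.g. `20200315.pdf`) makes it easy to reassemble the texts into a dated time series later.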
For this, we use an unsupervised, lexicon-based approach built on the VADER lexicon. The lexicon is a set of words scored as positive (beautiful, great, etc.) or negative (bad, etc.).
The VADER sentiment algorithm provides this lexicon, and we use it to reduce each date's text to only the words that appear in the lexicon. We do this to get sharper sentiment results: since these are official Fed statements, a long PDF of four to five pages consists mostly of neutral words, which dilutes the analysis. We then call VADER's scoring function to determine whether a statement (a PDF, now stored as text in a Python variable) is, on net, negative or positive. Next, we compute the average negative score and the average positive score across the whole sample, and for each month we measure how much that month's statement deviates, in percent, from those averages. We can understand this better by plotting the graph, as shown below:
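The scoring and deviation-from-average steps can be sketched as follows. The helper names (`score_statement`, `pct_change_from_mean`) are illustrative; `polarity_scores` is the real entry point of the `vaderSentiment` package and returns `neg`, `neu`, `pos`, and `compound` values.

```python
def score_statement(text: str) -> dict:
    """Return VADER polarity scores for one statement's text."""
    # third-party: pip install vaderSentiment
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer().polarity_scores(text)


def pct_change_from_mean(scores: list) -> list:
    """Percentage deviation of each monthly score from the series average."""
    mean = sum(scores) / len(scores)
    return [100 * (s - mean) / mean for s in scores]
```

Applying `pct_change_from_mean` separately to the monthly series of positive scores and negative scores gives the two curves plotted in the graph.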
Sharp increases or decreases of the positive and negative scores relative to their averages indicate that something important was happening at that time. For example, across the full sample from 2017 to 2022, we can see something important happening between June 2019 and March 2021.
We also know that Covid-19 arrived during this period. We can clearly see that the percentage change in the positive score turns negative while the percentage change in the negative score jumps sharply (up to 50%). If we look at the most frequently repeated negative lexicon words during this period, they are:
Debt, pressures, lower, crisis, risks, demand, weaker, low, unemployment, virus, etc.
These things were points of high concern during this period.
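Extracting the most repeated negative words can be done by counting only the tokens whose VADER lexicon valence is negative. This is a minimal sketch; the function name is hypothetical, and in practice `lexicon` would be `SentimentIntensityAnalyzer().lexicon` from the `vaderSentiment` package.

```python
from collections import Counter


def top_negative_words(text: str, lexicon: dict, n: int = 10) -> list:
    """Return the n most frequent words in `text` with a negative valence.

    `lexicon` maps word -> valence score; negative valences mark
    negative words (as in VADER's word list).
    """
    negative = {word for word, valence in lexicon.items() if valence < 0}
    hits = [word for word in text.split() if word in negative]
    return [word for word, _ in Counter(hits).most_common(n)]
```

Running this over the June 2019 to March 2021 texts surfaces the word list above.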
In this way we have used NLP to identify important changes at certain points in time just by looking at a graph.