Week 7: charRNN

Haneen Fathy
3 min read · Oct 25, 2020

Reflection:

  • Who do we entrust to do this work?

Everyone. Involve people. Ask them. Get them to understand how machine learning is affecting the world around them. There is a lot to be done at every level, and it's very important to involve the people who will be impacted by machine learning firsthand. I think all improvements come from breaking down barriers and demystifying the whole process so that people can take charge of the things that affect them, for example by participating in data collection or joining focus groups where they can discuss their data and its uses.

  • How do we deal with credit and attribution of our new creations?

I thought about this question a lot. Who owns the creations and the labor that is carried out by AI? Is it the artist/programmer? The creator of the platform or model? The owners of the data used to build the model? This matters not only for attribution and copyright but because whoever owns the creation is accountable for it. So many things could go wrong, and it is imperative to recognize who is accountable when they do. Since many parties are involved in creating with machine learning, there needs to be a common ethical framework under which everyone operates, which raises an even bigger question: how do you define that framework? I think the first step is to be clear about who this work can be credited to, and hence who will be accountable for it.

Coding Exercise:

I trained the charRNN model on different texts. The process ran smoothly except for an error importing TensorFlow. I eventually realized my Python version was too new, so I downgraded it. After getting everything set up, I played around with a bunch of different texts. The only obstacle was how long the model took to train, so I had to reduce the amount of input text.
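That version mismatch can be caught before the import fails. A minimal sketch of such a check, under the assumption that the TensorFlow 1.x wheels used for charRNN training were only published for Python 3.5 through 3.7:

```python
import sys

def tf1_compatible(version_info=sys.version_info):
    """Return True if this Python version likely has a TensorFlow 1.x wheel.

    Assumption: TF 1.x releases supported Python 3.5 - 3.7 only.
    """
    major, minor = version_info[0], version_info[1]
    return (3, 5) <= (major, minor) <= (3, 7)

if not tf1_compatible():
    # Warn early instead of failing on `import tensorflow`
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} may be too new "
          "for TensorFlow 1.x; consider a 3.6/3.7 virtualenv.")
```

Running this at the top of the training script turns a cryptic import error into an actionable message.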

In my first iteration, I trained the model on exported text conversations between me and my friends. The output was pretty nonsensical, which I attribute to a few things. First and most importantly, I didn't have enough data. Second, the texts mixed English with Arabic transliterated into English, which may have confused the model; however, I'm not sure whether the algorithm makes any inferences about the content itself, and if not, that probably had no bearing on the outcome. Finally, the export contained a timestamp on every line that I didn't have time to clean up, and that noise also played a role.
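Stripping those timestamps is a one-regex cleanup. A sketch, assuming a bracketed export format like `[25/10/2020, 9:41:02 PM] Name: message` (the exact format varies by messaging app, so the pattern below is an assumption to adapt):

```python
import re

# Assumed export format: "[25/10/2020, 9:41:02 PM] Haneen: message text"
TIMESTAMP = re.compile(
    r"^\[\d{1,2}/\d{1,2}/\d{2,4},? \d{1,2}:\d{2}(:\d{2})? ?(AM|PM)?\] ?"
)

def strip_timestamps(lines):
    """Remove a leading bracketed timestamp from each exported chat line."""
    return [TIMESTAMP.sub("", line) for line in lines]
```

Feeding the model `Name: message` lines instead of timestamped ones keeps it from spending capacity memorizing date formats.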

In my second iteration, I trained the model on a collection of poems by the poet Mahmoud Darwish. Once again the output was useless, probably because I had very little data.

In my last iteration, I found a dataset of Donald Trump's rally speeches and trained the model on that. On my first run I realized training would take hours, so I used only about 20% of the dataset. It worked relatively well and got me excited about how accurate the model could be if I used the entire dataset.
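Taking the first 20% of a corpus is easy to get slightly wrong if the cut lands mid-line. A small helper (hypothetical, not from the repo) that cuts at a line boundary instead:

```python
def take_fraction(text, fraction=0.2):
    """Keep roughly the first `fraction` of the corpus, cutting at a line
    boundary so the model never trains on a truncated half-line."""
    cut = int(len(text) * fraction)
    boundary = text.rfind("\n", 0, cut)
    return text[: boundary + 1] if boundary != -1 else text[:cut]
```

A truncated corpus trades output quality for training time; rerunning on the full text later only requires changing `fraction` to 1.0.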

Is this Donald Trump or my computer?
Is this Donald Trump or my computer? pt.2

Github repo with model: https://github.com/haneenfathy/charRNN-Trump-Speeches
