Ashmita Bohara

Think about your audience

I can’t believe it’s already been a month and a half since my internship started. The past few months have been quite different and an enriching experience for me. There was a time when I was very confused about my career, but as the days of my internship pass I can see my attitude towards programming changing. I can see myself growing and learning new terms every day. Overall, I am learning and enjoying my journey.

I am interning with the Jaeger project under CNCF tracing. Let’s talk briefly about Jaeger: it is open source software for tracing transactions between distributed services, used for monitoring and troubleshooting microservices-based distributed systems. Jaeger helps point out where failures are occurring and what is causing poor performance, which supports performance and latency optimization and root cause analysis. As an intern, I am taking the traces generated by these services and trying to find insights in them. I am building an anomaly detection model that takes streaming data as input and detects whether anything unusual is happening in the system. Basically, my system will help find the traces that are outliers and identify trends such as an increase in traffic or an increase in a service’s latency over time.
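
To give a rough idea of what "detecting an outlier in a stream" can mean, here is a minimal Python sketch. It is only an illustration, not the model I am building: it uses a simple rolling z-score over recent latencies, and the names RollingLatencyDetector and find_anomalies, the window size, the threshold and the sample numbers are all made up for this example.

```python
from collections import deque
from statistics import mean, pstdev


class RollingLatencyDetector:
    """Flags a latency as anomalous when it is far from the recent mean.

    Hypothetical helper for illustration only; window, threshold and
    min_samples are assumed values, not tuned on real Jaeger data.
    """

    def __init__(self, window=50, threshold=3.0, min_samples=10):
        self.values = deque(maxlen=window)  # most recent latencies only
        self.threshold = threshold          # z-score above which we flag
        self.min_samples = min_samples      # wait for some history first

    def observe(self, latency_ms):
        """Return True if latency_ms looks unusual, then record it."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            # If sigma is 0 (all recent values identical) nothing is flagged.
            if sigma > 0 and abs(latency_ms - mu) / sigma > self.threshold:
                is_anomaly = True
        # The value is recorded even when flagged, so a large outlier
        # widens the spread for the next few observations.
        self.values.append(latency_ms)
        return is_anomaly


def find_anomalies(latencies, **kwargs):
    """Run the detector over an iterable and return (index, value) pairs."""
    detector = RollingLatencyDetector(**kwargs)
    return [(i, v) for i, v in enumerate(latencies) if detector.observe(v)]


if __name__ == "__main__":
    # Made-up span durations in milliseconds with one obvious spike.
    stream = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 99, 950, 100, 102]
    print(find_anomalies(stream))
```

On the made-up stream above, only the 950 ms value at index 12 is flagged. A real system would read span durations from the tracing backend instead of a hard-coded list, and would likely need something more robust than a z-score.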

Talking about my internship, at first it was difficult to find an appropriate dataset that resembles a production dataset. Instead of searching for one, I tried to create my own by running Jaeger locally. Jaeger has a sample app, HotRod, which consists of several microservices, and I used it to generate sample data. But this data was nowhere near production data, as I was creating it manually. Then, at my mentor’s suggestion, I used another demo application, BookInfo….

After getting the data, I started looking at unsupervised learning algorithms, since we are going to work on streaming data. After researching and applying multiple clustering algorithms, I came across the Amazon SageMaker Random Cut Forest (RCF) algorithm, which is designed to detect anomalous data points within streaming data. At this point I am trying to run this algorithm on my dataset. I will be setting up a repository to hold Jupyter notebooks containing the different models created with different machine learning algorithms.

The last couple of weeks have been the most exciting and new experience for me. I have learnt how to code better, how to research better, how to read others’ code and implement it, and how to make connections. Last week I was contacted by a non-profit organization that encourages newcomers into the tech world, and they asked me to give a speaker session as an Outreachy intern. Being a part of Outreachy has helped me develop not only professionally but also personally.