Personal

These are all projects that I pursued out of personal interest. The source code for each of them should be completely public : )

blursed.py

December 11, 2019

Just because you can do something doesn’t mean you should! What started off as a (mostly) innocent attempt to mess around with code that modifies code ended the way all such things tend to: simultaneously blessed and cursed. Read more about what I unfortunately managed to accomplish with Python codec abuse here.
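For the curious, here's a minimal sketch of the mechanism that makes this kind of thing possible. It is not blursed.py's actual code. Python looks up codecs through a registry that anyone can add to with `codecs.register`, so a custom decoder can rewrite source text on its way in. The codec name `lambdafy` and its `λ` → `lambda` rewrite are made up for illustration. The sketch also calls the codec directly. Hooking it into a file's `# coding:` declaration takes extra plumbing that is left out here: registering the codec before the file is compiled, plus an incremental decoder.

```python
import codecs

def _decode(data, errors="strict"):
    # Decode the raw bytes as UTF-8, then rewrite the text before anyone sees it.
    text = bytes(data).decode("utf-8", errors)
    return text.replace("λ", "lambda"), len(data)

def _encode(text, errors="strict"):
    # Encoding is a plain UTF-8 pass-through.
    return text.encode("utf-8", errors), len(text)

def _search(name):
    # codecs hands search functions a normalized, lower-case name.
    if name == "lambdafy":  # hypothetical codec name
        return codecs.CodecInfo(_encode, _decode, name="lambdafy")
    return None

codecs.register(_search)

source = "square = λ x: x * x\n".encode("utf-8")
namespace = {}
exec(codecs.decode(source, "lambdafy"), namespace)
print(namespace["square"](7))  # 49
```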

Source
Tags: Python, Open Source, Fun, Meta

Blockcard

April 1, 2019

An overcomplicated way to send people birthday cards through the power of blockchain. The repo is an implementation of the accompanying paper here, which is probably more interesting than the tooling itself :)

Source
Tags: Python, Open Source, Fun, Tools

Showdown.py

January 10, 2019

A Pokemon Showdown! client for Python 3.4 and 3.5. This was written to make it easier to write bots, interact with users, moderate chat rooms, and collect data. It takes advantage of Python 3’s async features to allow multiple connections over websockets, maintain IO channels, and integrate with existing IRC bots.
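The core trick behind juggling several connections at once is asyncio. Each connection runs as its own coroutine and hands messages to a shared queue. Below is a minimal sketch under that assumption. `fake_connection` is a hypothetical stand-in for a real websocket reader, and the room names and messages are made up, so this isn't Showdown.py's actual API.

```python
import asyncio

async def fake_connection(name, messages, inbox):
    # Hypothetical stand-in for a websocket: push each incoming message onto a shared queue.
    for msg in messages:
        await asyncio.sleep(0)  # yield control, as a real socket read would
        await inbox.put((name, msg))

async def handler(inbox, expected, log):
    # A single consumer reacts to messages from every connection.
    for _ in range(expected):
        name, msg = await inbox.get()
        log.append(f"{name}: {msg}")

async def main():
    inbox = asyncio.Queue()
    log = []
    feeds = {"battle-1": ["turn 1", "turn 2"], "lobby": ["hello"]}  # made-up rooms
    total = sum(len(m) for m in feeds.values())
    await asyncio.gather(
        handler(inbox, total, log),
        *(fake_connection(n, m, inbox) for n, m in feeds.items()),
    )
    return log

log = asyncio.run(main())
print(sorted(log))  # ['battle-1: turn 1', 'battle-1: turn 2', 'lobby: hello']
```

Swapping `fake_connection` for a coroutine that reads from a real websocket should leave the rest of the structure unchanged.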

Source
Tags: Python, Data Collection, Web, Open Source

Cloud Vision App

April 15, 2018

Designed and built an app that recognizes objects and text in the user’s field of view and displays relevant language translations. Built on top of Android OpenCV and Google’s Vision API. Designed and released overnight for SodaHacks 2018, earning a finalist position among 100+ competitors.

Source
Tags: Java, Android, Open Source, Hackathon

Coursework

Stuff I've done for classes. Sadly most of these aren't publicly available to prevent future students from stealing code, but you can request access from me via email if there's sufficient evidence that you're not currently enrolled in the respective class.

For the sake of thoroughness I included all of the projects from each class I've taken, but some were definitely more interesting than others, both in implementation and in concept. So for each distinct class I put a nice ⭐ next to my favorite project!

If you are currently enrolled (or plan to be) and want advice on something course related, feel free to drop by my UPE office hours or toss me an email.
In addition to being part of my portfolio, this is also secretly an advertisement for prospective Berkeley CS students. Go bears!

⭐ PintOS

Course: CS162 (Operating Systems)
Term: Fall 2019

Built a handful of handy features on top of a bare-bones (single-process, almost no syscalls) operating system.

  • Implemented userspace
    • Processes, threads, page faults, and appropriate syscalls
  • Implemented thread priority
    • Scheduling, timers, deadlock avoidance in synchronization primitives
  • Implemented a file system
    • Inodes, path resolution, caching, and appropriate syscalls
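A classic piece of the thread priority work is priority donation. When a high-priority thread blocks on a lock held by a low-priority thread, it lends its priority to the holder. That way a medium-priority thread can't starve both of them (priority inversion). Here's a simplified Python sketch of the idea, not the project's C code. The class and function names are hypothetical, and `release` assumes a thread holds at most one lock.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Thread:
    name: str
    base_priority: int
    donations: List[int] = field(default_factory=list)

    @property
    def priority(self) -> int:
        # Effective priority: own priority or the highest donation, whichever is larger.
        return max([self.base_priority, *self.donations])

@dataclass
class Lock:
    holder: Optional[Thread] = None

def try_acquire(lock: Lock, thread: Thread) -> bool:
    # Free lock: take it. Held lock: donate our priority to the holder and block.
    if lock.holder is None:
        lock.holder = thread
        return True
    lock.holder.donations.append(thread.priority)
    return False

def release(lock: Lock) -> None:
    # Simplification: drop all donations (assumes the holder holds only this lock).
    lock.holder.donations.clear()
    lock.holder = None

def next_to_run(ready: List[Thread]) -> Thread:
    # Strict priority scheduling: always pick the highest effective priority.
    return max(ready, key=lambda t: t.priority)

low, mid, high = Thread("low", 1), Thread("mid", 5), Thread("high", 10)
lock = Lock()
try_acquire(lock, low)                 # low grabs the lock first
blocked = not try_acquire(lock, high)  # high blocks and donates 10 to low
print(blocked, next_to_run([low, mid]).name)  # True low
release(lock)
print(next_to_run([low, mid]).name)  # mid
```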

For prospective students: this class is fun, but fun and difficult aren’t mutually exclusive.

Source | Highly Recommended Reading

Tags: C, Design, Berkeley

⭐ Guavabot Rescue

Course: CS170 (Algorithms)
Term: Spring 2019

CS170 (Algorithms) at Berkeley has a long tradition of handing students an NP-complete problem and holding a competition to see who can produce the best computationally tractable approximation. Guavabot Rescue posed a two-layered problem:

  • Faulty sensing
  • Steiner-tree approximation
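For the Steiner-tree layer, the standard textbook heuristic is a 2-approximation. It computes shortest paths between terminals, takes a minimum spanning tree over that "metric closure", and expands each MST edge back into its path. This is not my team's approach (that stays private). It's just a minimal sketch of the well-known baseline, assuming a connected, undirected graph stored as nested dicts of non-negative edge weights.

```python
import heapq
from itertools import combinations

def dijkstra(graph, src):
    # Shortest-path distances and predecessor links from src (non-negative weights).
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev

def steiner_tree_approx(graph, terminals):
    """Metric-closure MST heuristic: total weight is at most 2x the optimal Steiner tree."""
    sp = {t: dijkstra(graph, t) for t in terminals}
    # Kruskal over the complete graph on terminals, weighted by shortest-path distance.
    closure = sorted((sp[a][0][b], a, b) for a, b in combinations(terminals, 2))
    root = {t: t for t in terminals}

    def find(x):
        while root[x] != x:
            x = root[x]
        return x

    edges = set()
    for _, a, b in closure:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        root[ra] = rb
        # Expand the closure edge a-b back into its underlying shortest path.
        prev, node = sp[a][1], b
        while node != a:
            edges.add(frozenset((node, prev[node])))
            node = prev[node]
    return edges

graph = {
    "a": {"s": 1, "b": 3},
    "b": {"s": 1, "a": 3, "c": 3},
    "c": {"s": 1, "b": 3},
    "s": {"a": 1, "b": 1, "c": 1},  # "s" is a non-terminal (Steiner) node
}
tree = steiner_tree_approx(graph, ["a", "b", "c"])
print(sorted(sorted(e) for e in tree))  # [['a', 's'], ['b', 's'], ['c', 's']]
```

The union of expanded paths can occasionally contain a redundant cycle. Running an MST over the result and pruning non-terminal leaves can only lower its weight.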

Team “go-bears'; UPDATE Teams SET score = 0 WHERE name != 'go-bears';” got rank 23 out of 320 groups. The important thing, of course, was that countless stranded Guavabots around the world were rescued.

As with other projects I can’t really reveal my group’s approach publicly since this project may resurface in future semesters, but if you’re 1) curious and 2) not planning to enroll then let me know!

Source | Specification

Tags: Python, Berkeley

⭐ End-to-End Encrypted File Sharing System

Course: CS161 (Computer Security)
Term: Spring 2019

The title is a mouthful, but that’s what it was officially labeled as! In addition to being a programming assignment, this was a major exercise in secure design, so the source also includes a design document detailing the possible attack vectors and appropriate defenses. In particular, the system was designed to allow trusted users to share files with each other using a client and an insecure data server. The goal was to allow clients to share arbitrarily large files, update and revoke access privileges, and modify files in such a way that it would be computationally infeasible for an attacker to recover information, modify a file silently, or impersonate another user, even with full control over the storage server.

Implementation relies on traditional RSA, Argon2 for key derivation + password hashing, and generous applications of HMAC. Sadly still vulnerable to Rubber-hose Cryptanalysis.
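To give a flavor of the HMAC side of things, here's a minimal sketch of detecting tampering by an untrusted server. It is not the project's Go code. Python's standard library has no Argon2, so `hashlib.pbkdf2_hmac` stands in for it. There's also no actual encryption step, and the helper names and password are hypothetical.

```python
import hashlib
import hmac
import os

def derive_master_key(password: str, salt: bytes) -> bytes:
    # Stand-in for Argon2 (not in the stdlib): PBKDF2-HMAC-SHA256 with many iterations.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)

def subkey(master: bytes, purpose: str) -> bytes:
    # Separate keys per purpose so an encryption key is never reused as a MAC key.
    return hmac.new(master, purpose.encode(), hashlib.sha256).digest()

def seal(mac_key: bytes, blob: bytes) -> bytes:
    # Append an HMAC-SHA256 tag (32 bytes) so the server can't modify the blob silently.
    return blob + hmac.new(mac_key, blob, hashlib.sha256).digest()

def unseal(mac_key: bytes, stored: bytes) -> bytes:
    blob, tag = stored[:-32], stored[-32:]
    expected = hmac.new(mac_key, blob, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("integrity check failed: blob was tampered with")
    return blob

salt = os.urandom(16)
mac_key = subkey(derive_master_key("hunter2", salt), "mac")
stored = seal(mac_key, b"ciphertext bytes would go here")
print(unseal(mac_key, stored))

tampered = bytes([stored[0] ^ 1]) + stored[1:]  # flip one bit, as a malicious server might
try:
    unseal(mac_key, tampered)
    detected = False
except ValueError:
    detected = True
print(detected)  # True
```

A real design also needs to bind each tag to the file's identity, for example by MACing a file ID along with the blob. Otherwise a malicious server could swap in another valid blob.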

Source | Relevant XKCD | Specification

Tags: Go, Design, Berkeley

Penetration Testing

Course: CS161 (Computer Security)
Term: Spring 2019

A close contender for my favorite security project. Students were handed an insecure webserver and tasked with finding 7 exploits. It turns out breaking stuff is orders of magnitude more fun than making stuff!

Source | Specification

Tags: Python, Web, Berkeley

⭐ RISC-V CPU

Course: CS61C (Computer Architecture)
Term: Fall 2018

A two-stage pipelined CPU that runs the RISC-V instruction set (thanks David Patterson!). This was a real doozy that would take a while to list out, but in summary:

  • Full Arithmetic Logic Unit built from ground-up with logic gates
    • Including support for word, half-word, and byte level instructions
  • Memory, branching, and instruction decoding
  • 2-stage (instruction-fetch, execute) pipelining
    • And cases for all the wonderful side-effects that crop up in conjunction with branching :’ )
  • Lifetime supply of unit and integration tests
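The CPU itself was built out of logic gates, but what two of its pieces compute is easy to model behaviorally. This Python sketch is mine, not the project's circuit. It pulls the RV32I fields out of a 32-bit instruction and shows the sign and zero extension behind the byte and half-word loads. The function names are hypothetical.

```python
def bits(word: int, hi: int, lo: int) -> int:
    # Extract bits [hi:lo] (inclusive) from a 32-bit instruction word.
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend(value: int, width: int) -> int:
    # Interpret the low `width` bits of value as a two's-complement number.
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)

def decode(word: int) -> dict:
    # Field layout shared by RV32I R-type and I-type instructions.
    return {
        "opcode": bits(word, 6, 0),
        "rd": bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),                     # meaningful for R-type only
        "imm_i": sign_extend(bits(word, 31, 20), 12),  # meaningful for I-type only
    }

def load(memory_word: int, funct3: int, byte_offset: int = 0) -> int:
    # lb/lh/lw (funct3 0/1/2) sign-extend; lbu/lhu (funct3 4/5) zero-extend.
    width = {0: 8, 1: 16, 2: 32, 4: 8, 5: 16}[funct3]
    raw = (memory_word >> (8 * byte_offset)) & ((1 << width) - 1)
    return sign_extend(raw, width) if funct3 < 4 else raw

print(decode(0xFFF00093)["imm_i"])  # -1
print(load(0x000080FF, 0), load(0x000080FF, 4))  # -1 255
```

For example, `0xFFF00093` is `addi x1, x0, -1`, whose 12-bit immediate `0xFFF` sign-extends to -1.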

Source | Specification

Tags: Berkeley

Bear Maps

Course: CS61B (Data Structures)
Term: Spring 2018

This project consisted of three main parts:

  • Rasterize map tiles on the backend and serve them to the front end
  • Modify an XML parser to convert OpenStreetMap data for the city of Berkeley into appropriate data structures on the backend
  • Implement A* search to provide navigation directions
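The navigation piece boils down to A* with a straight-line heuristic. Here's a minimal sketch, assuming nodes carry planar (x, y) coordinates and edge length is Euclidean distance. Real OpenStreetMap nodes use latitude and longitude, so an actual router would swap in a great-circle distance. The tiny graph is made up.

```python
import heapq
import math

def a_star(coords, graph, start, goal):
    """A* over a road graph: coords maps node -> (x, y), graph maps node -> neighbor list."""
    def h(n):
        # Straight-line distance never overestimates road distance, so it's admissible.
        return math.dist(coords[n], coords[goal])

    best = {start: 0.0}  # cheapest known distance from start
    prev = {}
    frontier = [(h(start), start)]
    while frontier:
        _, node = heapq.heappop(frontier)
        if node == goal:
            path = [goal]
            while path[-1] != start:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt in graph[node]:
            g = best[node] + math.dist(coords[node], coords[nxt])
            if g < best.get(nxt, math.inf):
                best[nxt], prev[nxt] = g, node
                heapq.heappush(frontier, (g + h(nxt), nxt))
    return None  # goal unreachable

coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 2)}
graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
print(a_star(coords, graph, "A", "C"))  # ['A', 'B', 'C']
```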

Source | Specification

Tags: Java, Berkeley, Visualization, Web

Professional

Stuff I've done for paid positions. Obviously most of this isn't open source, so this section is more or less just a quick overview of my role with each organization.

Data Engineer Intern @BOLD

June 2019 - August 2019

I interned at the career service company BOLD over the summer of 2019. This job was a really nice extension of some of the embedding projects I was doing earlier in the year for research, and it was great to see the technique cropping up in an industry application!

Summary

  • Built backend of semantic search and recommendation system for LiveCareer’s job matching service, incorporated into data pipeline
  • Worked closely with data science team to build set of offline evaluation measures for information retrieval
  • Benchmarked new system against existing one, verified improved relevance and ~80% reduction in preprocessing time
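In broad strokes, embedding-based semantic search ranks items by vector similarity to the query. Offline evaluation then scores those rankings against relevance judgments. The toy sketch below assumes precomputed embeddings (the vectors and job titles are made up) and uses precision@k as one example measure. It isn't the production system or its actual metric set.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def rank(query_vec, docs):
    # docs maps doc_id -> embedding; most similar documents come first.
    return sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)

def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that were judged relevant.
    return sum(1 for d in ranked[:k] if d in relevant) / k

query = [1.0, 0.0]  # made-up embedding for a "nurse" search
docs = {"registered_nurse": [0.9, 0.1], "nurse_practitioner": [0.8, 0.3], "welder": [0.1, 0.9]}
ranked = rank(query, docs)
print(ranked)  # ['registered_nurse', 'nurse_practitioner', 'welder']
print(precision_at_k(ranked, {"registered_nurse", "nurse_practitioner"}, 2))  # 1.0
```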
Tags: Web, NLP

SWE Intern @California PATH

April 2019 - June 2019

I interned with Berkeley’s advanced transportation group PATH on their integrated dynamic transit operations project, which is currently partnering with BART and Alameda County to support enhanced transit operations including dynamic routing, ridesharing, and strategic dispatch. I worked on the interface for drivers to receive and manage requests, as well as the associated backend components that log responses and synchronize with dispatchers.

Summary

  • Designed and built flexible user interface for bus drivers to view and manage requests from potential passengers and affiliated transit agencies for dynamic routing
  • Integrated IoT input devices with onboard display device and server to create physical interface for operators
Tags: Transportation

Backend Developer @ Berkeley Institute of Transportation Studies

November 2018 - May 2019

Worked in the Transportation Sustainability Research Center on data collection and APIs for two main projects:

  • American Truck Parking is a real-time parking database for long haul truckers along major U.S. highways. I automated some of the data entry tasks and added dynamic parking updates for ≈100 new parking locations. If you’re interested in policy this story explains the motivation for this project.

  • The Mobility Data Specification is trying to standardize an API for “dockless” bikes and scooters as an extension of the General Bikeshare Feed Specification. I worked on archiving data from these public feeds, which track the movements of over 10,000 shared vehicles in the U.S. and Europe.
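Archiving these feeds mostly means polling a JSON endpoint on an interval and flattening each snapshot into rows. Here's a minimal sketch based on the GBFS `free_bike_status.json` layout, which has a `last_updated` timestamp and a `data.bikes` list. The sample payload is made up, and the field names are worth checking against the spec version a given feed implements.

```python
import csv
import io
import json

FIELDS = ["last_updated", "bike_id", "lat", "lon", "is_reserved", "is_disabled"]

def snapshot_rows(feed_json: str):
    # Flatten one free_bike_status snapshot into rows stamped with its update time.
    feed = json.loads(feed_json)
    for bike in feed["data"]["bikes"]:
        row = {name: bike.get(name) for name in FIELDS[1:]}
        row["last_updated"] = feed["last_updated"]
        yield row

def archive(feed_json: str, out) -> int:
    # Append every row of the snapshot to an open CSV stream; return the row count.
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    count = 0
    for row in snapshot_rows(feed_json):
        writer.writerow(row)
        count += 1
    return count

buffer = io.StringIO()
csv.DictWriter(buffer, fieldnames=FIELDS).writeheader()
sample = ('{"last_updated": 1574000000, "ttl": 60, "data": {"bikes": [{"bike_id": "abc", '
          '"lat": 37.87, "lon": -122.26, "is_reserved": 0, "is_disabled": 0}]}}')
print(archive(sample, buffer))  # 1
```

In practice the feed's `ttl` field says how many seconds until the data is refreshed, which makes a natural polling interval.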

Tags: Transportation

Math Tutor @Mathnasium

August 2016 - August 2017

Tutored K-12 kids in math and spent a lot of time trying to ascertain the motives behind the Common Core curriculum.

Tags: Teaching

Data

Some datasets, dataviz, and data collection tools I've worked on, mostly during my time with the Berkeley Neuroecon lab.

Word Embedding Visualizer

November 25, 2019

Website that takes in user-input tokens, looks up their corresponding embeddings, performs PCA, and renders the main components and their closest descriptions back to the front end. Intended as a tool to introduce natural language embeddings to MBA students in MBA 261 (Graduate Marketing Research).

See the README of the GitHub repo for more details.
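The PCA step itself is short. Here's a minimal NumPy sketch, assuming the embeddings are already stacked into a matrix (random vectors stand in for real ones here). `closest_words` is a hypothetical helper showing one way to find the vocabulary entries nearest a component by cosine similarity. It is not necessarily how the site does it.

```python
import numpy as np

def pca_2d(vectors: np.ndarray):
    """Project row vectors onto their top two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]
    explained = (s[:2] ** 2) / np.sum(s ** 2)
    return centered @ components.T, components, explained

def closest_words(direction, vocab_vectors, words, k=3):
    # Hypothetical helper: cosine similarity between a direction and every vocabulary vector.
    sims = vocab_vectors @ direction / (
        np.linalg.norm(vocab_vectors, axis=1) * np.linalg.norm(direction))
    return [words[i] for i in np.argsort(-sims)[:k]]

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 50))  # made-up stand-ins for 5 token embeddings
coords, components, explained = pca_2d(embeddings)
print(coords.shape, components.shape)  # (5, 2) (2, 50)
```

The sign of each principal component is arbitrary, so it's worth checking the words closest to both `direction` and `-direction`.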

Source
Tags: NLP, Visualization

CMU Dict Grapheme to Phoneme Alignments

November 24, 2019

Grapheme to phoneme alignments for the Carnegie Mellon University Pronouncing Dictionary (CMUdict) data set. Alignments were produced using Phonetisaurus. I couldn’t find an open version of the most recent set of alignments anywhere, so I produced this to save other people some time.

A grapheme to phoneme alignment maps letters (graphemes) to their sounds when spoken (phonemes). For example, the graphemes H, E, LL, O get aligned to the sounds HH, EH0, L, OW1, where the sounds are represented using ARPAbet notation.
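One simple way to represent such an alignment in code is a list of (grapheme chunk, phoneme) pairs, with an empty string for silent letters. This representation and the `check_alignment` helper are my own illustration, not Phonetisaurus's output format.

```python
# Each pair maps a grapheme chunk to the phoneme(s) it produces ("" = silent).
hello = [("H", "HH"), ("E", "EH0"), ("LL", "L"), ("O", "OW1")]
know = [("K", ""), ("N", "N"), ("OW", "OW1")]

def check_alignment(word, phonemes, pairs):
    """True if the chunks spell the word and the sounds spell its ARPAbet pronunciation."""
    spelled = "".join(g for g, _ in pairs)
    sounds = " ".join(p for _, p in pairs if p)
    return spelled == word and sounds == phonemes

print(check_alignment("HELLO", "HH EH0 L OW1", hello))  # True
print(check_alignment("KNOW", "N OW1", know))  # True
```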

Source
Tags: NLP, Dataset

Stormfront Posts

November 23, 2019

Stormfront is a white nationalist, white supremacist internet forum. For hopefully obvious reasons, the views expressed by the forum posts in this dataset are not representative of my own.

Compressed size: 1.9GB
Uncompressed size: 5.0GB

Source
Tags: NLP, Dataset

News Scrapers

November 23, 2019

A handful of site-specific scraping scripts for news and social media websites. Done mainly to collect data for word embedding models used in the Neuroeconomics lab.
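Each real scraper is tailored to its site, but a common first pass for pulling article text looks like the stdlib sketch below, which collects the text inside `<p>` tags. It's a generic illustration rather than the repo's code. Fetching pages, rate limiting, and respecting robots.txt are left out.

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags, a common first pass for article bodies."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._depth = 0  # greater than 0 while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Text from nested inline tags (<b>, <a>, ...) is folded into the current paragraph.
        if self._depth:
            self.paragraphs[-1] += data

def extract_paragraphs(html: str):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]

page = "<html><body><h1>Title</h1><p>First <b>bold</b> line.</p><p>Second.</p></body></html>"
print(extract_paragraphs(page))  # ['First bold line.', 'Second.']
```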

Source
Tags: NLP, Data Collection, Open Source

Fox News Articles

November 23, 2019

Fox News is an American pay television news channel. This dataset consists of the text content and metadata of all public web articles published between roughly 2008 and January 2019 on the Fox News and Fox Business websites in CSV format.

Compressed size: 1.6GB
Uncompressed size: 4.9GB

Source
Tags: NLP, Dataset