Personal
These are all projects that I pursued out of personal interest. Each of these projects should have completely public source code : )
Just because you can do something doesn’t mean you should! What started off as a (mostly) innocent attempt to mess around with code that modifies code ended as all such things tend to: simultaneously blessed and cursed. Read more about what unfortunately managed to be accomplished with Python codec abuse here.
Tags: Python, Open Source, Fun, Meta
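For readers unfamiliar with the mechanism being abused, here is a minimal sketch of registering a custom codec with the standard library's `codecs` module. The codec name `shouty` and its behavior (upper-casing text on decode) are made up for illustration and are not taken from the project itself.

```python
import codecs


def _decode(data, errors="strict"):
    # Decode as UTF-8 first, then transform the text. A codec's decode
    # function returns (decoded_text, number_of_bytes_consumed).
    text = bytes(data).decode("utf-8", errors)
    return text.upper(), len(data)


def _encode(text, errors="strict"):
    # Encoding is plain UTF-8; returns (encoded_bytes, characters_consumed).
    return text.encode("utf-8", errors), len(text)


def _search(name):
    # Python calls every registered search function with the normalized
    # codec name; return None for names this function does not handle.
    if name == "shouty":  # hypothetical codec name
        return codecs.CodecInfo(encode=_encode, decode=_decode, name="shouty")
    return None


codecs.register(_search)

decoded = b"hello".decode("shouty")
print(decoded)
```

Because the decode step is arbitrary Python, a codec can rewrite whatever text passes through it, which is what makes this corner of the language so easy to misuse.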
An overcomplicated way to send people birthday cards through the power of blockchain. The repo is an implementation of the accompanying paper here which is probably more interesting than the tooling itself :)
Tags: Python, Open Source, Fun, Tools
A Pokemon Showdown! client for Python 3.4 and 3.5. This was written to make it easier to write bots, interact with users, moderate chat rooms, and collect data. Takes advantage of Py3’s async features to allow multiple connections over websockets, maintain IO channels, and integrate with existing IRC bots.
Tags: Python, Data Collection, Web, Open Source
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the MIT License. This is a port of the original module, which was written in Python. If you’d like to make a contribution, please check out the original author’s work here.
Tags: Rust, Python, NLP, Open Source
For more details, visit the website, navigate to the portfolio section, look for the “This Website” entry under the personal tab, and follow the directions listed there.
Tags: Web, Meta, Recursion, Open Source
Designed and built an app to recognize objects and text in the user’s field of view and display relevant language translations. Built on top of Android OpenCV and Google’s Vision API. Designed and released overnight for SodaHacks 2018, earning a finalist position out of 100+ competitors.
Tags: Java, Android, Open Source, Hackathon
An email bot that notifies the user whenever new items have been added to the UC Berkeley Lost and Found. Up-to-date data is synced with a Firebase database and displayed on the interactive webpage. Currently being run off of OCF infrastructure. In loving memory of the pair of glasses that I lost during orientation week :’(
Tags: Python, Firebase, Data Collection, Web, Open Source
Tools for developing optimal strategies for the game of Hog when complete information is available, and for approximating solutions when there is incomplete information. Includes visualization tools to watch strategies evolve between iterations. More details in this post.
Tags: Python, Visualization, Open Source, Berkeley
Coursework
Stuff I've done for classes. Sadly, most of these aren't publicly available to prevent future students from stealing code, but you can request access from me via email if there's sufficient evidence that you're not currently enrolled in the respective class.
For the sake of thoroughness I included all of the projects from each class I've taken, but some were definitely more interesting in implementation and concept than others. So for each distinct class I put a nice ⭐ next to my favorite project!
If you are currently enrolled (or plan to be) and want advice on something course related, feel free to drop by my UPE office hours or toss me an email.
In addition to being part of my portfolio, this is also secretly an advertisement for prospective Berkeley CS students. Go bears!
Term: Fall 2019
Built up a handful of handy features from a bare-bones (single-process, almost no syscalls) operating system.
- Implemented userspace
- Processes, threads, page faults, and appropriate syscalls
- Implemented thread priority
- Scheduling, timers, deadlock avoidance in synchronization primitives
- Implemented a file system
- Inodes, path resolution, caching, and appropriate syscalls
For prospective students: this class is fun, but fun and difficult aren’t mutually exclusive.
Term: Spring 2019
CS170 (Algorithms) at Berkeley has a long tradition of handing students an NP-complete problem and holding a competition to see who can produce the best computationally tractable approximation. Guavabot Rescue posed a two-layered problem:
- Faulty sensing
- Steiner-tree approximation
My group, “go-bears'; UPDATE Teams SET score = 0 WHERE name != 'go-bears';”, got rank 23 out of 320 groups. The important thing, of course, was that countless stranded Guavabots around the world were rescued.
As with other projects I can’t really reveal my group’s approach publicly since this project may resurface in future semesters, but if you’re 1) curious and 2) not planning to enroll then let me know!
Term: Spring 2019
Title is a mouthful but that’s what it was officially labeled as! In addition to being a programming assignment this was a major exercise in secure design, so the source also includes a design document detailing the possible attack vectors and appropriate defenses. In particular, the system was designed to allow trusted users to share files with each other using a client and an insecure data server. The goal was to allow clients to share arbitrarily large files, update and revoke access privileges, and modify files in such a way that it would be computationally infeasible for an attacker to recover information, modify a file silently, or impersonate another user, even with full control over the storage server.
Implementation relies on traditional RSA, Argon2 for key derivation + password hashing, and generous applications of HMAC. Sadly still vulnerable to Rubber-hose Cryptanalysis.
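To illustrate why HMAC helps against silent modification on an untrusted server, here is a minimal sketch using Python's standard library. It is not the project's actual design or code (which stays private); the helper names `tag` and `verify` and the sample data are hypothetical.

```python
import hashlib
import hmac
import secrets


def tag(key: bytes, data: bytes) -> bytes:
    # Compute an HMAC-SHA256 tag over the data with a secret key.
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, expected_tag: bytes) -> bool:
    # Recompute the tag and compare in constant time.
    return hmac.compare_digest(tag(key, data), expected_tag)


# The key stays with the client; only data and tag go to the server.
key = secrets.token_bytes(32)
stored = b"file contents"
stored_tag = tag(key, stored)

# An attacker who controls storage changes the data but cannot forge the tag.
tampered = b"file c0ntents"

intact_ok = verify(key, stored, stored_tag)
tamper_detected = not verify(key, tampered, stored_tag)
print(intact_ok, tamper_detected)
```

Without the key, the server cannot produce a valid tag for modified data, so the client notices the change when it re-verifies on download.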
Term: Spring 2019
A close contender for my favorite security project, but since it mainly consisted of breaking things instead of building them, I hesitate to list it at the top. I was handed an insecure webserver and tasked with finding 7 exploits. To chain each exploit more effectively I ended up writing a few tools in Python to automate generating and sending malicious payloads.
I can’t really list what the exploits were publicly since this project is still in use by the course, but they’re listed out in the private repo along with the tools.
Term: Fall 2018
- Full Arithmetic Logic Unit built from ground-up with logic gates
- Including support for word, half-word, and byte level instructions
- Memory, branching, and instruction decoding
- 2-stage (instruction-fetch, execute) pipelining
- And cases for all the wonderful side-effects that crop up in conjunction with branching :’ )
- Lifetime supply of unit and integration tests
Term: Spring 2018
A fully functioning “Roguelike” game including:
- Procedural dungeon generation
- Tile engine, lighting, and special game mechanics (further details in video)
- Gameplay recording and playback
Term: Spring 2018
This project consisted of three main parts:
- Rasterize map tiles on the backend and serve them to the front end
- Modify an XML parser to convert OpenStreetMap data for the city of Berkeley into appropriate data structures on the backend
- Implement A* search to provide navigation directions
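As a rough illustration of the last step, here is a minimal A* sketch over a tiny hand-made graph. It is not the course solution: the graph, the `coords` table, and the `straight_line` heuristic are made-up stand-ins for real OpenStreetMap data.

```python
import heapq
import math


def a_star(graph, start, goal, heuristic):
    """graph maps each node to a list of (neighbor, edge_cost) pairs."""
    # Heap entries are (estimated_total_cost, cost_so_far, node).
    frontier = [(heuristic(start, goal), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            # Walk parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], cost
        if cost > best_cost[node]:
            continue  # stale heap entry; a cheaper route was found already
        for neighbor, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                parent[neighbor] = node
                estimate = new_cost + heuristic(neighbor, goal)
                heapq.heappush(frontier, (estimate, new_cost, neighbor))
    return None, math.inf


# Hypothetical intersections with planar coordinates.
coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}


def straight_line(u, v):
    # Straight-line distance never overestimates here, so it is admissible.
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x1 - x2, y1 - y2)


graph = {
    "A": [("B", 1.0), ("C", 2.5)],
    "B": [("C", 1.0), ("D", 3.0)],
    "C": [("D", 1.0)],
    "D": [],
}

path, cost = a_star(graph, "A", "D", straight_line)
print(path, cost)
```

The heuristic only needs to be a lower bound on the remaining distance; on real map data, great-circle distance between coordinates is a common choice.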
Term: Fall 2017
A full-featured interpreter for the Scheme programming language, written in Python. Supports all the beautiful features of Scheme including:
- Tail-call optimization
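One common way to get tail-call behavior in a Python-hosted interpreter is a trampoline, sketched below. This is a generic illustration of the technique, not the interpreter's actual code; `TailCall`, `trampoline`, and `count_down` are hypothetical names.

```python
class TailCall:
    """A deferred call: 'call func(*args) next' rather than calling it now."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def trampoline(func, *args):
    # Keep running deferred calls in a loop so the Python stack never grows.
    result = func(*args)
    while isinstance(result, TailCall):
        result = result.func(*result.args)
    return result


def count_down(n, acc=0):
    # The recursive call is in tail position, so return a TailCall
    # instead of recursing directly.
    if n == 0:
        return acc
    return TailCall(count_down, n - 1, acc + n)


# Plain recursion this deep would exceed Python's default recursion limit.
total = trampoline(count_down, 100_000)
print(total)
```

Each tail call returns to the loop before the next one starts, so the depth of the Python call stack stays constant regardless of `n`.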
Professional
Stuff I've done for paid positions. Obviously most of this isn't open sourced, so this section is more or less just a quick overview of what my role with each organization was.
I interned at the career service company BOLD over the summer of 2019. This job was a really nice extension of some of the embedding projects I was doing earlier in the year for research, and it was great to see the technique cropping up in an industry application!
- Built backend of semantic search and recommendation system for LiveCareer’s job matching service, incorporated into data pipeline
- Worked closely with data science team to build set of offline evaluation measures for information retrieval
- Benchmarked new system against existing one, verified improved relevance and ~80% reduction in preprocessing time
I interned with Berkeley’s advanced transportation group PATH on their integrated dynamic transit operations project, which is currently partnering with BART and Alameda County to support enhanced transit operations including dynamic routing, ridesharing, and strategic dispatch. I worked on the interface for drivers to receive and manage requests and the associated backend components to log responses and synchronize with dispatchers.
- Designed and built flexible user interface for bus drivers to view and manage requests from potential passengers and affiliated transit agencies for dynamic routing
- Integrated IoT input devices with onboard display device and server to create physical interface for operators
Worked in the Transportation Sustainability Research Center on data collection and APIs for two main projects:
American Truck Parking is a real-time parking database for long haul truckers along major U.S. highways. I automated some of the data entry tasks and added dynamic parking updates for ≈100 new parking locations. If you’re interested in policy this story explains the motivation for this project.
The Mobility Data Specification is trying to standardize an API for “dockless” bikes and scooters as an extension of the General Bikeshare Feed Specification. I worked on archiving data from these public feeds, which track the movements of over 10,000 shared vehicles in the U.S. and Europe.
Data
Some datasets, dataviz, and data collection tools I've worked on, mostly during my time with the Berkeley Neuroecon lab.
Website that takes in user inputted tokens, performs a lookup for their corresponding embedding, performs PCA, and renders back the main components and their closest descriptions to the front-end. Intended to be used as a tool to introduce natural language embeddings to MBA students in MBA 261 (Graduate Marketing Research).
See the README of the Github repo for more details.
Tags: NLP, Visualization
Grapheme to phoneme alignments for the Carnegie Mellon Pronouncing Dictionary data set. Alignments were produced using Phonetisaurus. I couldn’t find an open version of the most recent set of alignments anywhere, so I produced this to save some other people some time.
A grapheme to phoneme alignment maps letters (graphemes) to their sounds when spoken (phonemes). For example, the graphemes H,E,LL,O get aligned to the sounds HH,EH0,L,OW1, where the sounds are represented using ARPAbet notation.
Tags: NLP, Dataset
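As a small illustration of what one alignment looks like as data, the sketch below pairs up the graphemes and phonemes from the example above. The list-of-pairs representation is an assumption made for illustration and is not necessarily the file format of the dataset.

```python
# Grapheme and phoneme sequences from the "hello" example.
graphemes = ["H", "E", "LL", "O"]
phonemes = ["HH", "EH0", "L", "OW1"]

# An alignment pairs each grapheme chunk with the phoneme it produces.
alignment = list(zip(graphemes, phonemes))

# Joining the grapheme side recovers the spelling; the phoneme side
# gives the ARPAbet pronunciation.
word = "".join(g for g, _ in alignment)
pronunciation = " ".join(p for _, p in alignment)

for g, p in alignment:
    print(f"{g:>2} -> {p}")
```

Note that the double letter LL maps to a single phoneme L, which is why an alignment is needed rather than a one-to-one letter mapping.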
Stormfront is a white nationalist, white supremacist internet forum. For hopefully obvious reasons the views expressed by the forum posts in this dataset are not representative of my own.
Compressed size: 1.9GB
Uncompressed size: 5.0GB
Tags: NLP, Dataset
A handful of site-specific scrapers for news and social media websites. Done mainly to collect data for word embedding models for use in the Neuroeconomics lab.
Tags: NLP, Data Collection, Open Source
Fox News is an American pay television news channel. This dataset consists of the text content and metadata of all public web articles published between roughly 2008 and January 2019 on the Fox News and Fox Business websites in CSV format.
Compressed size: 1.6GB
Uncompressed size: 4.9GB
Tags: NLP, Dataset