Sourcegraph

This is a post about my favourite engineering productivity tool and how to set it up. I feel bad writing about it, because it feels like stealing. I cannot be the only one using this tool the way I describe here, but it is so great that I just cannot keep quiet about it. I hope I don’t get a cease and desist, because, honestly, more engineers with large codebases should know about it and use it.

The tool is Sourcegraph. My previous company could not purchase a licence for the whole team, so I have been running it as a single-user Docker container on my laptop to search through hundreds of internal repositories. It has been a massive productivity boost. Before explaining the setup, I would like to say a few words about the product and the quiet impact the company’s culture has had on me, even though I have never worked there. You can skip to the instructions if you want.

Product

Sourcegraph is a code intelligence platform with many great features, but at its core it is a code search engine that is criminally easy to try out. For public and open source repositories, you can just run a query on the public Sourcegraph instance. For example, this one finds instances of the max_length parameter being passed in a models.py file in a specific repo. I won’t go into detail, but structural search is orders of magnitude better than grep or regex search (both of which Sourcegraph supports too).

Sourcegraph also provides a quick-start single Docker container for trying it out in single-user mode, either with the full feature set for a limited time or with code search only. Either I was savvy enough or no one is talking about it, but I ran it on my laptop and, with a little extra effort, plugged in the browser and IDE extensions as well as the command line tool.

What’s so cool about it? Here are a couple of experiences from last year.

6pm on a weekday evening, someone sends a desperate message on Slack: a part of the website returns a 500, can someone help? I know nothing about that part of the codebase, but generally I know how to debug stuff. Plus, I had recently indexed all of the repositories on my local Sourcegraph container, so this was an opportunity to test drive it. I jump on a huddle, and it takes us less than a minute to find the exception string in the logs. I paste it into my local Sourcegraph and, voila, there are two places that raise the exception. Two more clicks through references later, I have narrowed it down to the one most likely to be the culprit. From that result I can navigate through the history of the file, see who changed it, and walk through the references of its methods in other files. At that point someone familiar with the codebase joins the call, I tell them about my findings, and they take it from there and fix it. But the experience stuck with me: something was broken, I came in with zero knowledge of that part of the codebase, and it took me about a minute to identify a lead for a fix.

In another instance, we were planning to migrate a dependency from one client library implementation to another. One of the first steps was to identify every place the old client methods were called across a few hundred repositories. I wrote a structural query, ran it with the command line tool, and in five minutes had a list of the repositories, files, and line numbers we would have to change, and by proxy could also tell which teams it would impact.
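Going from a list of matches to the impacted teams is simple bookkeeping once the results are exported. Below is a minimal Python sketch, assuming each match has already been reduced to a (repository, path, line) record and that you have some repository-to-team mapping. The Match type, the impact_by_team helper, and the example repositories are all hypothetical; this is not src-cli’s output schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One search hit. A hypothetical shape, not src-cli's JSON schema."""
    repository: str
    path: str
    line: int


def impact_by_team(matches, owners):
    """Group matches as team -> repository -> ["path:line", ...].

    `owners` is a hypothetical repository -> team mapping; repositories
    missing from it are reported under "unowned".
    """
    report = defaultdict(lambda: defaultdict(list))
    for match in matches:
        team = owners.get(match.repository, "unowned")
        report[team][match.repository].append(f"{match.path}:{match.line}")
    # Convert the nested defaultdicts into plain dicts for printing.
    return {team: dict(repos) for team, repos in report.items()}


if __name__ == "__main__":
    # Made-up example data for illustration only.
    hits = [
        Match("github.com/acme/billing", "billing/client.py", 42),
        Match("github.com/acme/billing", "billing/jobs.py", 7),
        Match("github.com/acme/search", "search/api.py", 13),
    ]
    owners = {"github.com/acme/billing": "payments"}
    for team, repos in impact_by_team(hits, owners).items():
        print(team, repos)
```

Keeping "unowned" as an explicit bucket makes repositories without a known owner visible instead of silently dropping them from the migration plan.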

Beyond code search, Sourcegraph also has an automated change management feature. You write a small configuration file describing the change to make and the repositories to apply it to, and Sourcegraph clones the repositories, applies the change, creates pull requests, and tracks their progress in a neat little dashboard. I am a fan of automated refactoring, and while I don’t get many opportunities to do it, I had one while this feature was still in beta and available for free, even on the single-user setup. Again, it was a simple change, but it probably saved me an hour.

As much as I enjoy the product, the single Docker container is there for trying the product out, not for a highly available setup. At one point the in-memory search index consumed more than 50% of my RAM, so I could not keep the service running in the background all the time. On the other hand, it takes less than a minute to start. Trade-offs. Not a day went by when I did not wish we had a real instance, but unfortunately I was not the one making the decisions. I hope this post convinces someone else of what a great product Sourcegraph is. GitHub’s code search feature is quite new, and I think Sourcegraph is the absolute leader in this space; with all the other features they have added in recent years, they are well ahead.

Culture

Interestingly, Sourcegraph is not only an open source product: their internal processes and culture documentation are also publicly available.

I was casually browsing the pages about engineer onboarding and manager resources. It turns out that, as part of onboarding, you have to read two books and discuss them. I like reading books, so I took the recommendation, and the same books are now on my recommended reading list too.

The first one is Turn The Ship Around, the story of how the worst-performing US Navy nuclear submarine crew became the best (by a vast margin) within a year. The author walks the reader through that transformation and breaks it down so that it can be applied in a business environment.

I also like to experiment with ideas, so I tried applying some of the ideas from the book within my own team. Judging by the feedback I got and the results we achieved, it was an objective success: the team is more engaged and owns more decisions, and we shipped more impactful work. It also helped me redefine my own role in the team.

The other book, Orbiting the Giant Hairball, is hard to describe. Written tongue in cheek but in a non-pretentious way (if you can get a hardback version, it’s absolutely worth it for the doodles), it talks about fostering creativity in a bureaucratic environment (think medium and big companies). Personally, my favourite story was the one about a conference organiser who assembled a great team that did everything, so she didn’t need to lift a finger. It gave me food for thought about what it means to be a productive leader: is it more important to be seen leading, or for the results to be visible?

It still baffles me that books I found by browsing the website of a product I really like had such a big impact on my tech lead and managerial philosophy, when the product itself has nothing to do with people management. It is just that their culture documents are open. I think there should be more of this.

As my good friend says: “now, let’s get to the gravy.” Hope it helps you navigate your codebases!

Instructions

Limitations:

  • Single machine, single user - everything runs on your laptop.
  • Not all extensions work (e.g. Sentry doesn’t support custom domains).
  • The container can terminate occasionally (although in practice that happened maybe twice in two weeks for me, so it’s not annoying - you just restart the container).
  • This has been tested on macOS, but should not be much different on Linux.

Prerequisites: docker and docker compose.

Steps

Adjusted from the official quick-start guide:

  1. Clone this repository
    • If you want the most recent version of the Docker image, update the image tag in the docker compose file to the most recent tag here.
  2. Run docker compose up. This creates some defaults in the ~/.sourcegraph/ directory; we’ll adjust them later.
  3. Go to http://localhost:7080 and follow the instructions to go through the setup process, create the admin account, and import repos.

Now you have the basics set up, and that’s a good starting point. However, a lot of the power comes from the various browser and code editor extensions. They require SSL, so I recommend setting that up.

Set up SSL

Setting up SSL is not necessary, unless you want to use the browser extension or the command line tool. The extension can give additional information while browsing code, whereas the src command can help query your Sourcegraph instance from the command line.

The instructions are adapted from the official Sourcegraph self-signed SSL documentation page. They add a fake root CA so that the self-signed certificate can be validated. This is required for the command line tool to work, as it validates the sourcegraph.local certificate chain.

  1. docker compose stop - if it was running.

  2. brew install mkcert - install mkcert, a tool for creating locally trusted development certificates, written by Filippo Valsorda, a cryptographer working at Google on the Go team.

  3. sudo CAROOT=$HOME/.sourcegraph/config mkcert -install - create a root CA.

  4. Create the certificate:

    sudo CAROOT=$HOME/.sourcegraph/config mkcert \
      -cert-file ~/.sourcegraph/config/localhost.crt \
      -key-file ~/.sourcegraph/config/localhost.key \
      sourcegraph.local
    
  5. Allow your user (and sourcegraph container) to read the certificates:

    sudo chown $USER ~/.sourcegraph/config/root* ~/.sourcegraph/config/localhost.*
    
  6. Update the ~/.sourcegraph/config/nginx.conf file to look similar to the one here

  7. Add the following line to the /etc/hosts file:

    127.0.0.1    sourcegraph.local
    
  8. Run docker compose up

  9. Open https://sourcegraph.local:7443/site-admin/configuration

  10. Update the site-wide configuration to look like the site-configuration.js one in the repository.
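Once the container is back up, you can confirm that the hosts entry and the certificate chain are in place by making the same check a TLS client would. Below is a minimal Python sketch, assuming the default paths used above (mkcert stores its root certificate as rootCA.pem inside CAROOT) and that the instance listens on port 7443. The helper names has_hosts_entry and check_certificate are hypothetical.

```python
import socket
import ssl
from pathlib import Path

HOST = "sourcegraph.local"
PORT = 7443
# Assumption: mkcert was run with CAROOT=$HOME/.sourcegraph/config,
# so its root certificate lives at rootCA.pem in that directory.
CA_FILE = Path.home() / ".sourcegraph" / "config" / "rootCA.pem"


def has_hosts_entry(hosts_text: str, host: str = HOST, ip: str = "127.0.0.1") -> bool:
    """Return True if the hosts file content maps `host` to `ip`."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        if address == ip and host in names:
            return True
    return False


def check_certificate(host: str = HOST, port: int = PORT, cafile: Path = CA_FILE) -> dict:
    """Open a TLS connection, validating the chain against the mkcert root CA.

    Raises ssl.SSLCertVerificationError if the chain or hostname does not
    validate, and OSError if the host is unreachable.
    """
    context = ssl.create_default_context(cafile=str(cafile))
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```

From a Python shell, has_hosts_entry(Path("/etc/hosts").read_text()) checks step 7, and check_certificate()["subject"] should show the sourcegraph.local certificate if steps 3 to 8 worked.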

Troubleshooting

There may be some errors in the logs, but as long as the system seems to be working, they can be ignored. However, if it does not start or is not accessible, try the following.

Database Connection Errors (e.g. Setting up postgres failed, database “sourcegraph” does not exist)

Run commands:

  1. docker compose rm
  2. rm -rf ~/.sourcegraph
  3. docker compose up

This deletes all the local files Sourcegraph has created in ~/.sourcegraph/. If the database files were corrupted, this starts things from scratch.

Socket timeout errors (especially from Docker)

On my machine, Docker for Mac sometimes loses track of containers: it is unable to run them, or they become unresponsive. This happens only when I run Sourcegraph alongside multiple other containers. Restarting Docker for Mac and then starting the Sourcegraph container again seems to help.

The Good Stuff

Link GitHub.com

By explicitly listing public github.com repositories to clone, your local Sourcegraph can index various dependencies, so that a search for HttpRequest does not stop at from django.http import HttpRequest and you can investigate deeper.

  1. Go to https://sourcegraph.local:7443/site-admin/external-services/new
  2. Choose GitHub and follow the instructions to generate a token.
  3. Update and adjust the configuration to look similar to the github-repositories.js example here.

Sourcegraph Extensions

Extensions are enabled by visiting https://sourcegraph.local:7443/extensions and toggling them on. Here are the ones I tried:

  1. codecov - I could not get its AJAX requests past Cloudflare Access.
  2. sentry - currently supports the ‘sentry.io’ domain only.
  3. git-extras - can show blame inline (or on hover), which is a bit more convenient than just a sidebar.
  4. vscode-extras - opens the file you’re viewing in Sourcegraph in VSCode. Very useful! Requires setup, see user settings.

VSCode Extension

  1. Find extension sourcegraph.sourcegraph and install it.
  2. Add "sourcegraph.url": "https://sourcegraph.local:7443" to your VSCode settings file.

This extension does not replace code intelligence. Instead, it lets you select an identifier in VSCode and run the “Sourcegraph: Search” action from the command palette, which opens the Sourcegraph search results for that identifier in a new browser tab. Combined with the vscode-extras Sourcegraph extension this is fantastic, as you can seamlessly move between the browser and the editor while exploring the codebase.

Chrome Shortcut

Following the instructions lets you set up a search shortcut in the browser (I use sg). For example, to search for something, open a new tab and type in the address bar: sg file:requirements.txt django== and it will find all requirements files with a django dependency, where you can see the list of versions used by different repositories.
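Under the hood, the shortcut is just a URL template. Assuming it is configured as a Chrome site search with a URL like https://sourcegraph.local:7443/search?q=%s (where %s is Chrome’s placeholder for what you type), the Python sketch below builds the same kind of URL, which can be handy for sharing searches from scripts or chat. The search_url helper is hypothetical; check the exact URL shape against your own instance.

```python
from urllib.parse import urlencode

# Assumption: the local instance from the SSL setup above.
BASE_URL = "https://sourcegraph.local:7443"


def search_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a Sourcegraph search URL, percent-encoding the query.

    urlencode uses quote_plus, so spaces become "+" and characters
    such as ":" and "=" are percent-encoded.
    """
    return f"{base_url}/search?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(search_url("file:requirements.txt django=="))
```

Encoding the query matters because Sourcegraph filters such as file: and version pins such as django== contain characters that are special in URLs.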

Command Line Tool

src-cli lets you run search queries from the terminal, plus some extra stuff.
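src-cli reads the instance URL from the SRC_ENDPOINT environment variable (and an access token from SRC_ACCESS_TOKEN, if needed), and src search -json prints results as JSON. Below is a minimal Python sketch for scripting it, assuming src is on your PATH and the local instance from the SSL setup. The helper names are hypothetical, and the parsed JSON is returned as-is rather than assuming a particular result schema.

```python
import json
import os
import subprocess

# Assumption: the local instance configured in the SSL section.
ENDPOINT = "https://sourcegraph.local:7443"


def build_src_search(query, endpoint=ENDPOINT, token=None):
    """Return the argv list and environment for a `src search -json` call."""
    env = dict(os.environ, SRC_ENDPOINT=endpoint)
    if token:
        env["SRC_ACCESS_TOKEN"] = token
    return ["src", "search", "-json", query], env


def run_src_search(query, endpoint=ENDPOINT, token=None):
    """Run the search and parse stdout as JSON, leaving the schema untouched.

    Raises subprocess.CalledProcessError if src exits with a non-zero status.
    """
    argv, env = build_src_search(query, endpoint, token)
    completed = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

Passing the query as a single argv element avoids shell quoting issues with structural search patterns, which often contain spaces, brackets, and colons.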

