Farewell Redis/MySQL: Implementing a Persistent Set in 100 Lines of Go
Why use a sledgehammer to crack a nut?
The Problem
While updating the word library for Cipei (词焙), I encountered an issue: if a certain word is invalid, it needs to be marked so it can be skipped directly when encountered again.
To implement this, one might first think of using Redis’s Set, or adding a table in the database with one row per invalid word.
However, Cipei itself doesn’t use Redis, and using it would require configuring memory eviction policies. Putting such a simple requirement into a database feels like using a sledgehammer to crack a nut.
So, I chose to use memory directly + periodic persistence to a file. The whole technical solution isn’t difficult, totaling about a hundred lines of code.
Basic Features
Although the technical solution isn’t complex, we’ll break down the requirements and improve it step by step.
Methods the business side cares about: Add (add new) and Exist (check if exists). As for the details of how to persist, the business side doesn’t care much. So let’s first implement the simplest set:
type PersistentList struct {
    records map[string]struct{} // Data set in memory
    mu      sync.RWMutex        // RWMutex for concurrency control
}
// Add adds a value to the cache.
func (c *PersistentList) Add(value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if _, found := c.records[value]; !found {
        c.records[value] = struct{}{}
    }
}
// Exist checks if a value exists in the cache.
func (c *PersistentList) Exist(value string) bool {
    c.mu.RLock()
    _, found := c.records[value]
    c.mu.RUnlock()
    return found
}
Now the simplest memory set is implemented. Next, let’s add persistence!
Persistence
We can save the set line by line to a text file. Doing this has two benefits: one is that the logic for loading the set when the process starts is simple, and the other is that even if we write to the file every time we Add an element, the performance is still acceptable because it’s an append-only write.
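For reference, the on-disk format is nothing more than one element per line; a data file holding three invalid words would simply look like this (hypothetical contents):

```
asdfgh
qwerty
zxcvbn
```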
Modify the struct definition to add pending, file, and a logger, plus a chan for sending an exit signal:
type PersistentList struct {
    records map[string]struct{} // Data set in memory
    pending []string            // New data waiting to be persisted
    mu      sync.RWMutex        // RWMutex for concurrency control
    file    *os.File            // Persistence file handle
    close   chan struct{}       // Signal for closing background tasks
    log     contract.Logger     // Logger (interface defined at the end)
}
We will implement the logic for reading and writing data separately:
Loading Data from File
As mentioned earlier, loading data is just reading line by line and putting it into records. We put this logic into the instantiation:
// The definition of contract.Logger is at the end of the article
func NewPersistentList(dataFile string, log contract.Logger) (*PersistentList, error) {
    file, err := os.OpenFile(dataFile, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0644)
    if err != nil {
        return nil, err
    }
    cache := &PersistentList{
        records: make(map[string]struct{}),
        file:    file,
        close:   make(chan struct{}),
        log:     log,
    }
    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        word := scanner.Text()
        if word != "" {
            cache.records[word] = struct{}{}
        }
    }
    // Scanner.Err never returns io.EOF, so a plain nil check is enough
    if err := scanner.Err(); err != nil {
        file.Close()
        return nil, err
    }
    log.Infof("Loaded %d records into persistent cache from %s", len(cache.records), dataFile)
    return cache, nil
}
Persisting Data to File
We could append to the file on every Add, but here we’ll do a little over-engineering: batch flushing. Each Add appends to the pending slice, and a background task flushes it to disk once a minute:
// Modify the Add method:
func (c *PersistentList) Add(value string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    if _, found := c.records[value]; !found {
        c.records[value] = struct{}{}
        c.pending = append(c.pending, value) // Added this line
    }
}
// Add a flush method:
func (c *PersistentList) syncToFile() error {
    c.mu.Lock()
    defer c.mu.Unlock()
    if len(c.pending) == 0 {
        return nil
    }
    count := len(c.pending)
    writer := bufio.NewWriter(c.file)
    for _, word := range c.pending {
        if _, err := writer.WriteString(word + "\n"); err != nil {
            // pending is kept on error; a retry may write duplicate lines,
            // but duplicates are harmless since loading dedupes via the map
            return err
        }
    }
    if err := writer.Flush(); err != nil {
        return err
    }
    // Clear data waiting to be flushed, ready for next write
    c.pending = nil
    c.log.Infof("Successfully synced %d new keys to file.", count)
    return nil
}
// Add a background task method (run it as a goroutine when the process starts):
func (c *PersistentList) Serve(ctx context.Context) error {
    const syncInterval = 1 * time.Minute
    ticker := time.NewTicker(syncInterval)
    defer ticker.Stop()
    defer c.file.Close()
    for {
        select {
        case <-ctx.Done():
            c.log.Warn("Context canceled. Performing final sync...")
            return c.syncToFile()
        case <-ticker.C:
            if err := c.syncToFile(); err != nil {
                return err
            }
        case <-c.close:
            c.log.Warn("Received close signal. Performing final sync...")
            return c.syncToFile()
        }
    }
}
By now, a Set with persistence features is implemented. It can be used like this:
func main() {
    // mylog: any contract.Logger implementation you have at hand
    myset, err := NewPersistentList("/path/to/setfile", mylog)
    if err != nil {
        panic(err)
    }
    ctx, cancel := context.WithCancel(context.Background())
    go myset.Serve(ctx)
    myset.Add("asdfgh")
    fmt.Println(myset.Exist("asdfgh")) // true
    fmt.Println(myset.Exist("qwerty")) // false
    cancel() // Notify to flush when exiting the process
}
But the context.WithCancel approach above is still a bit clumsy. Remember the close chan we defined earlier? We can add a Shutdown method:
func (c *PersistentList) Shutdown() {
    c.close <- struct{}{}
}
This way, you can directly use myset.Shutdown() to close it.
Definition
contract.Logger is defined as follows:
package contract

type Logger interface {
    Debug(msg string)
    Debugf(msg string, args ...interface{})
    Info(msg string)
    Infof(msg string, args ...interface{})
    Warn(msg string)
    Warnf(msg string, args ...interface{})
    Error(msg string)
    Errorf(msg string, args ...interface{})
    Close() error
}