How to Become a Data Provider Miner on Desearch (A Developer Guide)

    Want to join Desearch as a data provider miner? Here's what you need to know.

    Mining on Subnet 22 is not about passive income - it's a hands-on engineering challenge. You'll compete to deliver high-quality search results by scraping and summarizing data from platforms like Twitter, Reddit, and Google. Validators score your performance, and rewards are distributed based on how well you outperform others.

    What a Data Provider Miner Does

    The Miner's Core Responsibilities

    Mining on Subnet 22 is all about delivering top-notch results while constantly improving your system. As a miner, your job revolves around handling real-time search queries. When a validator sends out a query, your system steps in to gather, process, and return structured data. This involves scraping and indexing platforms like X (formerly Twitter), Reddit, Google, YouTube, Arxiv, and general web content.

    Your role breaks down into three main tasks:

    • Retrieve the top 10 tweets relevant to the query.
    • Summarize the raw data with language models, turning it into clear, actionable insights rather than a dump of links.
    • Analyze metadata - such as timestamps and engagement metrics - to provide additional context.

    The goal is to build a system that not only handles these tasks well but also consistently outshines competitors.
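
    As a rough sketch, that three-step flow might look like the following Python, where the `Tweet` shape, the engagement-based ranking, and the `summarize` hook are illustrative stand-ins rather than part of the subnet code:

```python
from dataclasses import dataclass

@dataclass
class Tweet:
    url: str
    text: str
    timestamp: str   # ISO 8601
    likes: int = 0
    retweets: int = 0

def build_response(query: str, candidates: list[Tweet], summarize) -> dict:
    """Assemble a structured answer: top-10 links, LLM summary, metadata."""
    # Engagement-based ranking is a stand-in for a real relevance model.
    ranked = sorted(candidates, key=lambda t: t.likes + t.retweets, reverse=True)
    top = ranked[:10]
    return {
        "query": query,
        "links": [t.url for t in top],
        "summary": summarize(query, [t.text for t in top]),
        "metadata": [{"timestamp": t.timestamp, "likes": t.likes} for t in top],
    }
```

    In a competitive miner, the ranking step is exactly where you would swap in an LLM-based relevance score.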

    The Goal: Better Results Than Other Miners

    At its core, your mission is straightforward: outperform other miners. Validators evaluate your results against others, and the better your output - whether through more relevant data, clearer summaries, or a stronger structure - the more rewards you earn.

    How Competition Improves the System

    Every query sparks a direct competition between miners. Validators compare multiple responses side by side, meaning your performance is always judged relative to others. This competitive environment encourages constant innovation - if you stop improving, someone else will surpass you.

    The network enforces this through a process called deregistration. Subnet 22 can host up to 192 miners at a time. When the slots are full, the miner with the lowest performance is replaced by a new registrant after their immunity period ends.

    This isn’t a flaw in the system - it’s a feature. The constant turnover ensures the network remains sharp and efficient. Being a miner on Subnet 22 means stepping into a continuous engineering challenge where only those who maintain high-quality work can succeed. These competitive dynamics tie directly into the technical skills needed for success, which will be covered next.

    Skills and Knowledge You Need

    Required Technical Skills

    To successfully operate a competitive miner on Subnet 22, you'll need a strong command of Python 3.10+, expertise in web scraping and API integration, effective use of large language models (LLMs), and solid infrastructure management skills. Python 3.10+ is non-negotiable for working with the Bittensor SDK.

    Mastering web scraping and API integration is crucial for handling data from platforms like Twitter, Reddit, Google, and YouTube. You'll need to manage rate limits, process unstructured data, and juggle multiple APIs. Twitter API performance, in particular, plays a big role in your overall score.
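
    A common pattern for surviving rate limits is exponential backoff with jitter. This sketch is generic - the `RuntimeError` stands in for whatever rate-limit error (e.g. an HTTP 429) your client raises:

```python
import random
import time

def fetch_with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a rate-limited API call with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for your client's rate-limit error
            if attempt == max_retries - 1:
                raise
            # Sleep base * 2^attempt, plus jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```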

    LLM integration is another key area. You'll use these models to craft precise API queries and convert raw data into clear, actionable summaries. This requires skills in prompt engineering, managing latency, and balancing costs while maintaining quality output.

    Learning Bittensor Basics

    Understanding the Bittensor network is not optional - it’s essential.

    Start by learning the miner lifecycle. Registration requires a non-refundable TAO fee, which fluctuates dynamically. Once registered, you’re assigned a UID slot, but Subnet 22 has only 192 slots available. If your performance falls to the bottom after your immunity period (approximately 13.7 hours or 4,096 blocks), you’ll be deregistered to make room for others.

    Earnings depend on the incentive distribution system. Validators score your work based on the subnet’s specific mechanism, and these scores feed into the Yuma Consensus algorithm, which determines your share of emissions. Understanding the metagraph - a live view of the subnet showing UIDs, rankings, trust scores, and consensus - is crucial for managing registration, avoiding deregistration, and maximizing rewards.
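
    As an illustration of how the metagraph's per-UID vectors can be used, the helper below ranks a plain list of incentive values - a stand-in for the incentive vector you would fetch with the Bittensor SDK - to flag the next deregistration candidate:

```python
def deregistration_risk(incentives: list[float], my_uid: int) -> bool:
    """True if my_uid holds the lowest incentive on the subnet, i.e. it is
    first in line for deregistration once all miner slots are taken."""
    my_score = incentives[my_uid]
    # At risk when no other miner scores strictly below us.
    return not any(s < my_score for s in incentives)

print(deregistration_risk([0.12, 0.03, 0.44, 0.03, 0.27], my_uid=1))  # → True
```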

    Thinking Like a System Builder

    Once you’ve got the basics down, the real work begins. A competitive miner isn’t static - it evolves. The baseline miner.py from the Subnet 22 repository is just a starting point and won’t keep you competitive on its own.

    Use your immunity period wisely to test and refine your response logic before facing deregistration. Tools like wandb can help you monitor real-time metrics, including forward pass times, hardware usage, and overall model performance. Validator logs are another valuable resource - they provide insights into how your responses compare to others and highlight areas for improvement.

    To stay ahead, focus on building custom data pipelines, incorporating advanced AI models, and diversifying your data sources across various platforms. Your ability to consistently deliver quality performance will define your position in the network.

    Setting Up Your Miner on Subnet 22

    How to Register on Subnet 22

    To get started with Subnet 22, you'll need Python 3.10 or higher and a system running Linux or macOS - Windows isn’t supported. Begin by creating a Bittensor wallet, which consists of two keys: a coldkey for secure storage and funds, and a hotkey for active mining operations.

    Next, clone the official Subnet 22 repository from GitHub:

    git clone https://github.com/Desearch-ai/subnet-22
    

    Install the required dependencies with the following commands:

    python -m pip install -r requirements.txt
    python -m pip install -e .
    

    Now, register your hotkey on-chain using the Bittensor CLI:

    btcli subnet register --netuid 22 --wallet.name <coldkey_name> --wallet.hotkey <hotkey_name>
    

    Keep in mind, this registration requires a TAO token fee, which fluctuates based on network demand and is non-refundable. Upon successful registration, you’ll receive a UID slot. Subnet 22 has a total of 256 nodes, but only 192 slots are allocated for miners.

    Once registered, configure your environment variables as outlined in the subnet’s guide. To launch your miner, use the following command:

    python neurons/miners/miner.py --netuid 22 --subtensor.network finney --wallet.name <wallet_name> --wallet.hotkey <hotkey_name> --axon.port <port>
    

    To stay competitive, you’ll also need access to APIs for Twitter, Reddit, Google, and YouTube, as these provide the real-time metadata required for mining. New miners benefit from an immunity period of 4,096 blocks (approximately 13.7 hours), during which they cannot be deregistered for underperformance. Use this time to fine-tune your setup and ensure everything is running smoothly.
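
    The 13.7-hour figure follows from Bittensor's roughly 12-second block time:

```python
IMMUNITY_BLOCKS = 4096
SECONDS_PER_BLOCK = 12  # Bittensor's approximate block time

hours = IMMUNITY_BLOCKS * SECONDS_PER_BLOCK / 3600
print(round(hours, 1))  # → 13.7
```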

    Reading and Understanding the Subnet Code

    Treating the subnet as a mysterious black box is a recipe for failure. To succeed, you must understand how validators assess your miner’s output. Start by reviewing the source code at https://github.com/Desearch-ai/subnet-22.

    Pay special attention to the query format. Validators send natural language questions like "What is augmented reality's daily impact in 2024?" Your miner needs to respond with structured data - up to 10 relevant links accompanied by an AI-generated summary. The neurons/miners/miner.py file provides a baseline implementation, while the neurons/validators/ directory reveals how responses are scored.

    Scoring is heavily influenced by Twitter data, which accounts for 50% of your total score. Summary Scoring contributes 40%, and Search Scoring makes up the remaining 10%. Validators use large language models (LLMs) to evaluate the relevance and depth of the tweets your miner provides, so understanding this process is critical. You can also check the requirements.txt file for a list of the specific LLMs used in reward calculations.
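
    The weighting can be expressed as a simple weighted sum - the component scores below are made-up inputs, while the 50/40/10 split is the subnet's:

```python
WEIGHTS = {"twitter": 0.50, "summary": 0.40, "search": 0.10}

def total_score(components: dict[str, float]) -> float:
    """Combine per-component scores (each in [0, 1]) with the 50/40/10 split."""
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

print(round(total_score({"twitter": 0.9, "summary": 0.7, "search": 0.5}), 2))  # → 0.78
```

    Note how a strong Twitter component dominates: even a perfect search score moves the total by at most 0.10.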

    Building and Running a Competitive Miner

    Building Custom Data Pipelines

    Top-performing miners don't settle for off-the-shelf tools - they craft custom data pipelines. These pipelines typically include three stages: ingestion (gathering raw data), transformation (processing and filtering), and storage (organizing data for quick access). The goal? To ensure the data is accurate, complete, valid, consistent, up-to-date, and unique.

    For a competitive edge, use streaming ETL to process and enrich data in real time, rather than relying on slower batch methods. Streaming slashes latency - processing in milliseconds compared to the hours-long delays of batch systems. Automated quality checks are essential to avoid duplicate results and ensure critical fields, like IDs and timestamps, are always present. Add recency tests to detect and alert you to slowdowns or interruptions in data flow, so your miner doesn’t serve outdated information to validators.
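
    A minimal version of those quality checks - deduplication, required-field validation, and a recency cutoff - might look like this, with the field names and the 24-hour window as illustrative choices:

```python
from datetime import datetime, timedelta, timezone

REQUIRED = {"id", "timestamp", "url"}

def quality_filter(records: list[dict], max_age_hours: int = 24) -> list[dict]:
    """Drop duplicates, records missing critical fields, and stale records."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    seen, clean = set(), []
    for r in records:
        if not REQUIRED <= r.keys():
            continue  # missing a critical field
        if r["id"] in seen:
            continue  # duplicate
        if datetime.fromisoformat(r["timestamp"]) < cutoff:
            continue  # stale: would serve outdated data to validators
        seen.add(r["id"])
        clean.append(r)
    return clean
```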

    Ensure smooth data flow from platforms like Twitter, Reddit, Google, and YouTube by implementing robust API handling. Tools like Kafka or RabbitMQ can help manage sudden spikes in query volume by decoupling data ingestion from processing. Also, make use of idempotency - ensuring that the same input consistently produces the same output. This approach is crucial for maintaining stable validator scores.
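
    Idempotency can be sketched as a content-addressed cache: hashing the input record guarantees that a message re-delivered by the queue maps to the same cached output. The helper name and record shape here are hypothetical:

```python
import hashlib
import json

_cache: dict[str, dict] = {}

def process_once(record: dict, transform) -> dict:
    """Idempotent wrapper: identical input always yields the cached output,
    so queue re-deliveries never change validator-facing results."""
    key = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = transform(record)
    return _cache[key]
```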

    Once your pipeline is solid, the next step is leveraging advanced language models to refine the miner's outputs.

    Using LLMs in Your Miner

    To meet the high data quality standards mentioned earlier, Large Language Models (LLMs) play a key role in tasks like data cleaning, summarization, relevance scoring, and structuring outputs. Since Summary Scoring makes up 40% of your total score, integrating LLMs effectively can make or break your performance.

    Shift your focus from simple keyword matching to semantic understanding. Use Natural Language Processing (NLP) to grasp the meaning and context of words. For better alignment with user intent, implement intent classification to categorize queries as Informational, Commercial, Transactional, or Navigational. Validators penalize inaccuracies, so ground your outputs in reliable, verified sources to avoid hallucinations.
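
    Here is a toy version of that four-way intent split, using keyword cues purely as a placeholder for a real classifier or LLM call:

```python
INTENT_CUES = {
    "Transactional": ("buy", "price", "order", "subscribe"),
    "Commercial": ("best", "review", "compare"),
    "Navigational": ("login", "homepage", "official site"),
}

def classify_intent(query: str) -> str:
    """Naive keyword-based intent classifier; a production miner would use
    an LLM or a trained model, but the four-way split is the same."""
    q = query.lower()
    for intent, cues in INTENT_CUES.items():
        if any(cue in q for cue in cues):
            return intent
    return "Informational"  # default bucket

print(classify_intent("best AR headsets to compare in 2024"))  # → Commercial
```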

    Optimize for low latency by using caching and early filtering to reduce the load on your system. Limit data retrieval to indexed fields or explicitly mentioned terms to avoid unnecessary processing. Apply filters early - using WHERE clauses or specific index names - to cut down the data volume your LLM needs to handle. Keep an eye on system resources to quickly identify when new LLM features cause unexpected performance issues.

    How Validators and Incentives Work

    How Validators Score Your Results

    Validators evaluate your miner's outputs using Large Language Models (LLMs), focusing on context, relevance, and accuracy. They continuously send synthetic queries to assess performance and update the metagraph. Your performance score is divided into three key areas: Twitter Scoring (50%), Summary Scoring (40%), and Search Scoring (10%).

    For Twitter Scoring, validators expect you to provide 10 Twitter links that directly address the given prompt. Summary Scoring measures how effectively your LLM converts raw data into a clear, concise, and well-rounded summary. Search Scoring looks at the relevance of general web links and their alignment with the keywords in the prompt. Validators also consider additional factors like data freshness, structural integrity, freedom from hallucinations, and response speed.

    All performance metrics are publicly accessible through Weights and Biases (wandb). This allows you to monitor metrics like gating model loss and forward pass time, as well as compare your scores to top-performing miners. These scores play a direct role in determining how rewards are distributed, as explained below.

    How Rewards Are Calculated

    Once validator scores are determined, they are processed through the on-chain Yuma Consensus algorithm to allocate rewards. A validator's score influences your reward based on their stake, which includes a mix of TAO and the subnet's alpha token. To qualify as a validator, a node must maintain a minimum stake weight of 1,000 and usually rank among the top 64 nodes by emissions.

    Rewards are distributed relatively, meaning your share depends on how your scores compare to other active miners in the subnet. The reward structure strongly favors high-performing miners, with a steep curve ensuring top scorers earn significantly more than those with slightly lower scores.
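
    To see how a steep curve rewards small leads, here is an illustrative power-curve normalization - the exponent is made up and is not the subnet's actual formula:

```python
def relative_rewards(scores: list[float], steepness: float = 4.0) -> list[float]:
    """Turn raw scores into reward shares; the power amplifies small leads."""
    powered = [s ** steepness for s in scores]
    total = sum(powered)
    return [p / total for p in powered]

shares = relative_rewards([0.9, 0.8, 0.7])
# a 12.5% score lead over the runner-up becomes a ~60% larger reward share
```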

    After the immunity period, miners with the lowest emissions may be replaced as new registrants join. For a deeper dive into the reward calculation, you can review the Subnet 22 reward.py file.

    Why Performance Determines Earnings

    The earlier scoring breakdown highlights the clear priority: improving the quality of your Twitter API calls is the most effective way to boost earnings. Performance quality is the key driver of rewards.

    Your earnings are a direct reflection of measurable quality and performance. To maximize results, study the validator logic and fine-tune your system to meet the criteria validators prioritize.

    Final Thoughts

    As a miner, you’re building the backbone of real-time search for AI agents and developers. By indexing key platforms and creating clear, concise summaries, you’re addressing AI’s limitations with live data. Your work ensures the search layer remains unbiased, transparent, and unrestricted. The competitive nature of mining not only drives you to innovate but also elevates the overall quality of decentralized search. This creates a viable alternative to centralized APIs, which are often costly and rate-limited. Success in this space demands a proactive, engineering-focused mindset - one that thrives on building and refining systems.

    Armed with this understanding, you’re ready to dive into your project and start contributing to the future of decentralized search.

    Getting Started

    To begin, explore the Subnet 22 GitHub repository. Pay close attention to the validator logic and reward calculation code to understand how your results will be evaluated. While the baseline miner implementation provides a solid starting point, think of it as just that - a starting point. To stand out, you’ll need to build a competitive solution and continuously refine your pipelines based on validator feedback. Mining success hinges on your engineering skills and your ability to adapt and improve.
