Ruby Troubleshooting

Debug Knapsack Pro on your development environment/machine

Regular Mode

To reproduce what Knapsack Pro executed on a specific CI node, check out the same branch and run:

KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC=MY_TOKEN \
KNAPSACK_PRO_CI_NODE_INDEX=MY_INDEX \
KNAPSACK_PRO_CI_NODE_TOTAL=MY_TOTAL \
KNAPSACK_PRO_BRANCH=MY_BRANCH \
KNAPSACK_PRO_COMMIT_HASH=MY_COMMIT \
KNAPSACK_PRO_CI_NODE_BUILD_ID=MY_BUILD_ID \
bundle exec rake "knapsack_pro:rspec[--seed MY_SEED]"

KNAPSACK_PRO_CI_NODE_BUILD_ID must be the same as the CI build you are trying to reproduce (if it helps, take a look at what Knapsack Pro uses as node_build_id for your CI provider).

You can also run the same subset of tests without Knapsack Pro: in the logs, find the command that Knapsack Pro used to invoke the test runner:

rspec --default-path spec "spec/models/user_spec.rb" "spec/models/comment_spec.rb"

Queue Mode

To reproduce what Knapsack Pro executed on a specific CI node, check out the same branch and run:

KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC=MY_TOKEN \
KNAPSACK_PRO_CI_NODE_INDEX=MY_INDEX \
KNAPSACK_PRO_CI_NODE_TOTAL=MY_TOTAL \
KNAPSACK_PRO_BRANCH=MY_BRANCH \
KNAPSACK_PRO_COMMIT_HASH=MY_COMMIT \
KNAPSACK_PRO_CI_NODE_BUILD_ID=MY_BUILD_ID \
KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true \
bundle exec rake "knapsack_pro:queue:rspec[--seed MY_SEED]"

KNAPSACK_PRO_CI_NODE_BUILD_ID must be the same as the CI build you are trying to reproduce (if it helps, take a look at what Knapsack Pro uses as node_build_id for your CI provider).

You can also run the same subset of tests without Knapsack Pro: in the logs, find the command that Knapsack Pro used to invoke the test runner.

You will find one command for each batch:

[knapsack_pro] To retry the last batch of tests fetched from the API Queue, please run the following command on your machine:
[knapsack_pro] bundle exec rspec --format documentation --format RspecJunitFormatter --out tmp/rspec.xml --default-path spec --seed 24098 "spec/features/dashboard/billing/subscription_error_path_1_spec.rb[1:1:2:1:1:1]"

Or a single command at the end to execute all the batches the CI node executed:

[knapsack_pro] To retry all the tests assigned to this CI node, please run the following command on your machine:
[knapsack_pro] bundle exec rspec --format documentation --format RspecJunitFormatter --out tmp/rspec.xml --default-path spec --seed 24098 "spec/features/dashboard/billing/subscription_error_path_1_spec.rb[1:1:2:1:1:1]" "spec/features/dashboard/builds/build_distribution_for_build_spec.rb[1:1:1:10:2:1]" "spec/features/dashboard/builds/build_distribution_for_build_spec.rb[1:1:2:1]" "spec/features/dashboard/admin_statistics_spec.rb[1:6:1:1]" "spec/features/dashboard/identity_providers_spec.rb[1:6:1]" "spec/features/dashboard/onboarding_spec.rb[1:2:1:1:1]"
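If the CI log is long, a small script can pull the retry command out of it. The following is a minimal sketch, assuming the log contains the two-line message shown above (marker line, then the command). The helper name `retry_all_command` and the sample log are hypothetical and only serve as illustration.

```ruby
# Extract the "retry all the tests" command that Knapsack Pro printed in a CI log.
PREFIX = "[knapsack_pro] "
RETRY_ALL_MARKER = "To retry all the tests assigned to this CI node"

# Returns the command logged on the line after the marker, or nil if it is absent.
def retry_all_command(log_text)
  lines = log_text.lines.map(&:chomp)
  marker_index = lines.index { |line| line.include?(RETRY_ALL_MARKER) }
  return nil if marker_index.nil?

  command_line = lines[marker_index + 1]
  return nil if command_line.nil?

  # Drop everything up to and including the "[knapsack_pro] " prefix,
  # so timestamps or logger prefixes before it are removed as well.
  command_line.split(PREFIX, 2).last.strip
end

# Hypothetical, shortened log excerpt.
sample_log = <<~LOG
  [knapsack_pro] To retry all the tests assigned to this CI node, please run the following command on your machine:
  [knapsack_pro] bundle exec rspec --default-path spec --seed 24098 "spec/models/user_spec.rb" "spec/models/comment_spec.rb"
LOG

puts retry_all_command(sample_log)
```

Paste the printed command into your terminal on the same branch and commit to rerun the tests without Knapsack Pro.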

Knapsack Pro hangs

Some users reported frozen CI nodes with:

Test runners producing too much output (e.g., Codeship and Queue Mode): reduce it with KNAPSACK_PRO_LOG_LEVEL=warn
Test runners producing no output: make sure to use a formatter (like RSpec's --format progress)

Timecop (e.g., Travis): ensure Timecop.return is executed after all examples

# spec/spec_helper.rb

RSpec.configure do |c|
  c.after(:all) do
    Timecop.return
  end
end

Chrome 83+ prevents downloads in sandboxed iframes: add an allow-downloads keyword in the sandbox attribute list
simplecov: ensure you use the latest version of the gem, which fixes the bug
Too much memory consumption, leading to a timeout on the CI:
- A heap size too big in Elasticsearch could lead to saturated memory. Add ES_JAVA_OPTS: "-Xms1024m -Xmx1024m" to force the JVM to use 1GB for the heap size.
Recursive loop in the code base: check whether you have hidden loops that could be triggered when an exception is raised. For example, investigate nested rescue blocks. You can use the following backtrace debugging suggestions to find it.

Why does the CI node hang? (backtrace debugging)

Knapsack Pro prints the backtraces of all Ruby threads to the output when tests freeze and the CI node times out (the Ruby process is killed by the CI provider). The backtraces show which line of code causes the hang. They are visible only if the Ruby process is killed with one of the OS signals HUP, INT, TERM, ABRT, QUIT, USR1, or USR2.

Tests output
================================================================================
Start logging 2 detected threads.
Use the following backtrace(s) to find the line of code that got stuck if the CI node hung and terminated your tests.
How to read the backtrace: https://knapsackpro.com/perma/ruby/backtrace-debugging

Running specs in the main thread:
knapsack_pro-ruby/spec_integration/b_spec.rb:7:in `kill'
knapsack_pro-ruby/spec_integration/b_spec.rb:7:in `block (3 levels) in <top (required)>'


Non-main thread inspect: #<Thread:0x000000011e18f6e0 spec_integration/a_spec.rb:5 sleep>
Running specs in non-main thread:
knapsack_pro-ruby/spec_integration/a_spec.rb:6:in `sleep'
knapsack_pro-ruby/spec_integration/a_spec.rb:6:in `block (3 levels) in <top (required)>'


Main thread backtrace:
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:83:in `backtrace'
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:83:in `block in log_threads'
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:75:in `each'
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:75:in `log_threads'
knapsack_pro-ruby/lib/knapsack_pro/runners/queue/base_runner.rb:62:in `block (2 levels) in trap_signals'
knapsack_pro-ruby/spec_integration/b_spec.rb:7:in `kill'
knapsack_pro-ruby/spec_integration/b_spec.rb:7:in `block (3 levels) in <top (required)>'
.rvm/gems/ruby-3.3.4/gems/rspec-core-3.13.0/lib/rspec/core/example.rb:263:in `instance_exec'
...


Non-main thread inspect: #<Thread:0x000000011e18f6e0 spec_integration/a_spec.rb:5 sleep>
Non-main thread backtrace:
knapsack_pro-ruby/spec_integration/a_spec.rb:6:in `sleep'
knapsack_pro-ruby/spec_integration/a_spec.rb:6:in `block (3 levels) in <top (required)>'


End logging threads.
================================================================================

The spec lines in the backtraces above (b_spec.rb:7 and a_spec.rb:6) are where execution got stuck. See them in the following spec files.

The main thread was killed at line 7 of b_spec.rb:

b_spec.rb
describe 'B1_describe' do
  describe 'B1.1_describe' do
    xit 'B1.1.1 test example' do
      expect(1).to eq 1
    end
    it 'B1.1.2 test example' do
      Process.kill("INT", Process.pid)
    end
  end
end

The non-main thread was stuck at that time in the sleep call at line 6 of a_spec.rb:

a_spec.rb
describe 'A_describe' do
  it 'A1 test example' do
    expect(1).to eq 1

    Thread.new do
      sleep 10
    end
  end
end

Use the above approach to discover the root issue of hanging tests in your test suite. See the usual suspects.

tip

If the CI provider does not kill the frozen process, you can use the timeout program to send the USR1 signal when Knapsack Pro runs too long.

timeout --signal=USR1 600 bundle exec rake "knapsack_pro:queue:rspec[--format d]"

Notice that 600 is in seconds. It sends the signal after 10 minutes of running the command. You may want to adjust that number to ensure the USR1 signal is sent after the process is stuck and not before. For example, you could use 600 seconds (10 minutes) if you know that your tests take no more than 7 minutes to run.

tip

Check your CI provider documentation to configure a timeout if you worry about wasting resources on the hanging CI node. For example, GitHub Actions uses:

jobs.<job_id>.timeout-minutes - The maximum number of minutes to let a job run before GitHub automatically cancels it.
jobs.<job_id>.steps[*].timeout-minutes - The maximum number of minutes to run the step before killing the process.

You might still need to use the timeout program to terminate tests and capture the backtrace before the CI kills the node completely. Ensure the duration configured for the timeout program is lower than the one configured for the CI provider.
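The ordering of the three durations can be written down as a quick sanity check. This is a minimal sketch; the helper name `timeout_plan_ok?` and the 15-minute CI timeout are hypothetical values chosen for illustration.

```ruby
# Check that the durations are ordered so the backtrace can be captured:
# longest expected test run < `timeout` program duration < CI provider timeout.
def timeout_plan_ok?(max_test_seconds:, timeout_program_seconds:, ci_timeout_seconds:)
  max_test_seconds < timeout_program_seconds &&
    timeout_program_seconds < ci_timeout_seconds
end

# Tests take at most 7 minutes, `timeout` sends USR1 after 600 seconds,
# and the CI job is cancelled after 15 minutes (hypothetical value).
puts timeout_plan_ok?(
  max_test_seconds: 7 * 60,
  timeout_program_seconds: 600,
  ci_timeout_seconds: 15 * 60
)
```

If the check is false because the CI timeout is the smallest value, the CI provider kills the node before the timeout program can send the signal, and no backtrace is printed.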

For legacy versions of knapsack_pro older than 7.7.0:

When tests freeze and the CI node times out (the Ruby process is killed by the CI provider), add the following script to spec_helper.rb. It prints the backtrace showing which line of code causes the hang. The backtrace is visible only if the Ruby process is killed with the USR1 signal. Learn more from this GitHub issue.

# source https://github.com/rspec/rspec-rails/issues/1353#issuecomment-93173691
puts "Process pid: #{Process.pid}"

trap 'USR1' do
  threads = Thread.list

  puts
  puts "=" * 80
  puts "Received USR1 signal; printing all #{threads.count} thread backtraces."

  threads.each do |thr|
    description = thr == Thread.main ? "Main thread" : thr.inspect
    puts
    puts "#{description} backtrace: "
    puts thr.backtrace.join("\n")
  end

  puts "=" * 80
end

If the CI provider does not kill the frozen process, you can send the signal yourself by running the command below in the terminal (when you are SSH-ed into the CI node):

kill -USR1 <process pid>

Alternatively, you can use the timeout program to send the USR1 signal when Knapsack Pro runs too long.

timeout --signal=USR1 600 bundle exec rake "knapsack_pro:queue:rspec[--format d]"

Notice that 600 is in seconds. It sends the signal after 10 minutes of running the command. You may want to adjust that number to ensure the USR1 signal is sent after the process is stuck and not before.

Knapsack Pro does not work on a forked repository

caution

Make sure your Knapsack Pro API token is set up as a secret in your CI (not hardcoded in the repository) to avoid leaking it.

Since the token won't be available on forked CI builds, you can use a script to run:

Knapsack Pro in Fallback Mode (static split by file names) on forked builds
Knapsack Pro in Queue Mode or Regular Mode on internal builds

Create bin/knapsack_pro_tests, make it executable with chmod u+x bin/knapsack_pro_tests, and use it on CI to run your tests:

#!/bin/bash

if [ "$KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC" = "" ]; then
  KNAPSACK_PRO_ENDPOINT=https://api-disabled-for-fork.knapsackpro.com \
  KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC=disabled-for-fork \
  KNAPSACK_PRO_MAX_REQUEST_RETRIES=0 \
  KNAPSACK_PRO_CI_NODE_RETRY_COUNT=0 \
  bundle exec rake knapsack_pro:queue:rspec
else
  bundle exec rake knapsack_pro:queue:rspec
fi
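If you prefer a Ruby wrapper over Bash, the same branching can be sketched as follows. The helper name `knapsack_env_overrides` is hypothetical; the variable names and values are copied from the script above.

```ruby
# Extra environment variables for builds where the Knapsack Pro token is missing
# (e.g., forked builds). Values mirror the Bash script above.
FORK_OVERRIDES = {
  "KNAPSACK_PRO_ENDPOINT" => "https://api-disabled-for-fork.knapsackpro.com",
  "KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC" => "disabled-for-fork",
  "KNAPSACK_PRO_MAX_REQUEST_RETRIES" => "0",
  "KNAPSACK_PRO_CI_NODE_RETRY_COUNT" => "0",
}.freeze

# Hypothetical helper: returns the overrides when the token is empty or unset,
# and an empty hash on internal builds where the token is present.
def knapsack_env_overrides(env)
  token = env["KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC"].to_s
  token.empty? ? FORK_OVERRIDES : {}
end

overrides = knapsack_env_overrides(ENV)
puts "Extra environment: #{overrides.inspect}"

# To actually run the tests with these overrides, replace the line above with:
#   exec(overrides, "bundle", "exec", "rake", "knapsack_pro:queue:rspec")
```

Kernel#exec accepts a hash of environment variables as its first argument, so the overrides apply only to the spawned test run.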

Error `commit_hash` parameter is required

Knapsack Pro takes KNAPSACK_PRO_COMMIT_HASH and KNAPSACK_PRO_BRANCH from the CI environment (see supported CIs). If your CI is not supported, you may see the following error:

ERROR -- : [knapsack_pro] {"errors"=>[{"commit_hash"=>["parameter is required"]}]}

In such a case, you have two options:

If you have git installed on CI, set KNAPSACK_PRO_REPOSITORY_ADAPTER=git and KNAPSACK_PRO_PROJECT_DIR
Otherwise, set KNAPSACK_PRO_COMMIT_HASH and KNAPSACK_PRO_BRANCH yourself
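For the second option, the two values have to be computed somewhere. Below is a minimal sketch, assuming a git checkout is available at the point where you compute them (for example, in an earlier build step). The helper name `knapsack_git_env` and the injectable `run` parameter are hypothetical; the demonstration uses a stubbed runner so it does not depend on a real repository.

```ruby
require "open3"

# Hypothetical helper: builds the two variables Knapsack Pro expects.
# `run` executes a command and returns its standard output; by default it shells out.
def knapsack_git_env(run: ->(*cmd) { Open3.capture2(*cmd).first })
  {
    "KNAPSACK_PRO_COMMIT_HASH" => run.call("git", "rev-parse", "HEAD").strip,
    "KNAPSACK_PRO_BRANCH" => run.call("git", "rev-parse", "--abbrev-ref", "HEAD").strip,
  }
end

# Stubbed runner with made-up values, used here instead of a real git checkout.
fake_git = lambda do |*cmd|
  cmd.include?("--abbrev-ref") ? "main\n" : "abc123\n"
end

p knapsack_git_env(run: fake_git)
```

Note that `git rev-parse --abbrev-ref HEAD` returns `HEAD` on a detached checkout, which is common on CI. In that case, take the branch name from whatever variable your CI exposes instead.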

`LoadError: cannot load such file -- spec_helper`

If you are using a complex KNAPSACK_PRO_TEST_FILE_PATTERN, Knapsack Pro could have problems finding the directory containing the spec_helper.rb file. Please set KNAPSACK_PRO_TEST_DIR.

CI builds fail with `Test::Unit` but all tests passed

Please ensure you are using Test::Unit only and not loading minitest too.

Tests hitting external APIs fail

Most likely, you have global shared state between your tests.

For example, you may have a hook that deletes Stripe data while a parallel CI node is testing the Stripe integration:

before(:each) do
  Stripe::Subscription.all.each { |subscription| subscription.delete }
  Stripe::Customer.all.each { |customer| customer.delete }
end

You can fix it by either:

Using a gem like VCR to record/replay your HTTP interactions
Writing your tests so that they can run in parallel (e.g., each test creates/deletes its own Stripe data)
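The difference between a global cleanup and a scoped one can be sketched with an in-memory stand-in for the shared sandbox. `FakeSandbox` and its methods are hypothetical and are not part of the Stripe API; the owner labels stand for two parallel CI nodes.

```ruby
# In-memory stand-in for a shared external sandbox (hypothetical; not the Stripe API).
class FakeSandbox
  def initialize
    @customers = {}
  end

  # Records which CI node created the customer.
  def create(id, owner:)
    @customers[id] = owner
  end

  # Global cleanup: removes everything, including other nodes' data.
  def delete_all
    @customers.clear
  end

  # Scoped cleanup: removes only the data created by the given node.
  def delete_owned_by(owner)
    @customers.delete_if { |_id, customer_owner| customer_owner == owner }
  end

  def ids
    @customers.keys
  end
end

sandbox = FakeSandbox.new
sandbox.create("cus_a", owner: "node-0")
sandbox.create("cus_b", owner: "node-1")

# Scoped cleanup on node-0 leaves node-1's customer in place.
sandbox.delete_owned_by("node-0")
puts sandbox.ids.inspect

# Global cleanup on node-0 wipes node-1's customer while node-1 may still be using it.
sandbox.delete_all
puts sandbox.ids.inspect
```

The hook shown earlier behaves like `delete_all`: it is safe on a single node but breaks tests on other nodes that share the same sandbox.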

WebMock/VCR

WebMock or VCR may trigger the following errors when Knapsack Pro interacts with the Knapsack Pro API:

"Real HTTP connections are disabled"
WebMock::NetConnectNotAllowedError
VCR::Errors::UnhandledHTTPRequestError
OpenSSL::SSL::SSLError: SSL_connect returned=1 errno=0 state=error: tlsv1 alert internal error

Update the knapsack_pro gem to the latest version to fix those issues.

For legacy versions of knapsack_pro older than 7.2:

Add the Knapsack Pro API's subdomain to the ignored hosts in VCR and to the allow list of WebMock.disable_net_connect! in spec/spec_helper.rb (or wherever your WebMock/VCR configuration is located):

require 'vcr'

VCR.configure do |config|
  config.hook_into :webmock
  config.ignore_hosts('localhost', '127.0.0.1', '0.0.0.0', 'api.knapsackpro.com')
end

require 'webmock/rspec'
WebMock.disable_net_connect!(allow_localhost: true, allow: ['api.knapsackpro.com'])

You may need to add require: false in your Gemfile:

group :test do
  gem 'vcr'
  gem 'webmock', require: false
end

If some tests are failing due to requests to the Knapsack Pro API, you may have some code that reconfigures WebMock. The usual suspects are calls to WebMock.reset! or an issue with webmock/rspec. In the latter case, you can solve the problem with:

RSpec.configure do |config|
  config.after(:suite) do
    WebMock.disable_net_connect!(
      allow_localhost: true,
      allow: [
        'api.knapsackpro.com',
      ],
    )
  end
end

`ActiveRecord::SubclassNotFound`

If files change during a test run, you may get the following error:

ActiveRecord::SubclassNotFound:
Invalid single-table inheritance type: AuthenticationProviders::AnAuthenticationProvider is not a subclass of AuthenticationProvider

Please make sure classes are loaded once in the test environment:

# environments/test.rb

config.eager_load = true

Adding parallel CI nodes/jobs makes tests slower

You can verify if the tests are getting slower on your Knapsack Pro dashboard:

Execution times of your CI builds are increasing: Recorded CI builds > Show (build) > Test Files > Total execution time
Individual test stats are trending up: Statistics of test files history > Stats (test file) > History of the test file (chart)

Here are the most common reasons:

CI nodes share resources (CPU/RAM/IO)
Tests are accessing the same resource (e.g., Stripe sandbox, database)
Your CI gives you a limited pool of parallel CI nodes (and runs the rest serially)
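The last reason can be illustrated with rough arithmetic. This is a simplified model, assuming the suite splits evenly across nodes, every node pays the same fixed setup cost, and nodes beyond the pool size wait for a free slot. The function name and all numbers are hypothetical.

```ruby
# Rough wall-clock estimate under the assumptions stated above.
def estimated_wall_clock_minutes(suite_minutes:, nodes:, pool_size:, setup_minutes:)
  waves = (nodes.to_f / pool_size).ceil                  # groups of nodes that run one after another
  per_node = setup_minutes + suite_minutes.to_f / nodes  # setup plus this node's share of the tests
  waves * per_node
end

# A 60-minute suite with 3 minutes of setup per node and a pool of 4 concurrent nodes.
puts estimated_wall_clock_minutes(suite_minutes: 60, nodes: 4, pool_size: 4, setup_minutes: 3) # prints 18.0
puts estimated_wall_clock_minutes(suite_minutes: 60, nodes: 8, pool_size: 4, setup_minutes: 3) # prints 21.0
```

In this model, doubling the nodes from 4 to 8 makes the build slower, because the extra nodes run in a second wave and each one pays the setup cost again.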

Some tests were skipped in Regular Mode

In an unlikely scenario, some CI nodes may start in Fallback Mode while others start in Regular Mode, resulting in some tests being skipped. We recommend one of the following:

Switch to Queue Mode and enjoy faster CI builds too
Disable Fallback Mode with KNAPSACK_PRO_FALLBACK_MODE_ENABLED=false
Increase the request attempts with KNAPSACK_PRO_MAX_REQUEST_RETRIES

FactoryBot/FactoryGirl raises in Queue Mode

Use the knapsack_pro binary:
```
bundle exec knapsack_pro queue:rspec
```

Avoid implicit associations:

# ⛔️ Bad
FactoryBot.define do
  factory :assignment do
    task
  end
end

# ✅ Good
FactoryBot.define do
  factory :assignment do
    association :task
  end
end

One CI node runs the test suite again in Queue Mode

Most likely, that CI node started when all the others finished running and initialized a new queue.

You have a couple of options:

Make sure KNAPSACK_PRO_CI_NODE_BUILD_ID is set (recommended)
Set KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true

Rake tasks under tests are run more than once in Queue Mode

Update the knapsack_pro gem to the latest version to fix this issue.

For legacy versions of knapsack_pro older than 8.1.1:

Make sure the Rake task is loaded once for each test example in the spec file:

The first block below is the Rake task, and the second is the spec that tests it.

class DummyOutput
  class << self
    attr_accessor :count
  end
end

namespace :dummy do
  task do_something_once: :environment do
    DummyOutput.count ||= 0
    DummyOutput.count += 1
    puts "Count: #{DummyOutput.count}"
  end
end

require 'rake'

describe 'Dummy rake' do
  describe "dummy:do_something_once" do
    let(:task_name) { "dummy:do_something_once" }
    let(:task) { Rake::Task[task_name] }

    before(:each) do # make sure the rake task is defined *once*
      Rake::Task[task_name].clear if Rake::Task.task_defined?(task_name)
      Rake.load_rakefile("tasks/dummy.rake")
      Rake::Task.define_task(:environment)
    end

    after(:each) do
      Rake::Task[task_name].reenable
      # Reset the state
      DummyOutput.count = 0
    end

    it "calls the rake task once (increases counter by one)" do
      expect { task.invoke }.to_not raise_error
      expect(DummyOutput.count).to eq(1)
    end

    it "calls the rake task once again (increases counter by one)" do
      expect { task.invoke }.to_not raise_error
      expect(DummyOutput.count).to eq(1)
    end
  end
end

Tests distribution is unbalanced in Queue Mode

Please consider:

Splitting by test examples if you have a bottleneck file that is packed with test examples
If it's a retry, remember that KNAPSACK_PRO_FIXED_QUEUE_SPLIT=true uses a cached split

Execution times are all 0 seconds in Minitest

When using minitest-spec-rails, avoid the unsupported outer describe. Otherwise, Knapsack Pro won't be able to track the execution time of each test.

# ⛔️ Bad
require 'test_helper'
describe User do
end

# ✅ Good
require 'test_helper'
class UserTest < ActiveSupport::TestCase
  let(:user_ken)   { User.create!(email: 'ken@example.com') }

  it 'works' do
    expect(user_ken).must_be_instance_of User
  end
end
