The “How do I digitize/transfer/capture video tapes” quick info thread (reddit backup)

FYI this post is a backup of my reddit post. Now that reddit is pulling a tumblr I figured self hosting is in order.

Why a “How do I digitize/transfer/capture video tapes” thread?

I see this question asked on DataHoarder a couple of times a week, mostly with the same good and bad answers. Hopefully this thread will provide both basic and in-depth info and options for digitizing common video tapes. I’ll be asking the mods to sticky this, and I’ll make updates if anyone has strong opinions or things to add.

Who are you?

I frequent this sub and spend a lot of time transferring tapes for various communities, here’s my setup: https://www.reddit.com/r/DataHoarder/comments/g2l5ow/video_archival_rack_build_one_year_update_more/

Basics

The way in which you digitize your tapes is going to depend on the tape format and the quality you wish to achieve. As with most things, you get what you pay for, and the higher the quality you desire the more you’re going to go down a rabbit hole of information.

This post will cover the common formats you’re likely to deal with: VHS, VHS-C, Hi8, Video8, Digital8, and MiniDV.

ECS cloudwatch task error: “The specified log group does not exist”

This quick one is just for the googlers.

Saw an error bringing up tasks on a fresh ECS cluster and task run. The task definition included configuration to send logs to a cloudwatch log group.

"logConfiguration": {
    "logDriver": "awslogs",
    "options": {
        "awslogs-group": "web",
        "awslogs-region": "us-west-2",
        "awslogs-stream-prefix": "ecs"
    }
}

Unfortunately this is missing a poorly documented option:

"awslogs-create-group": "true",

Adding this option, along with the permission to create log groups, allows the task to create the group and send logs. The following permission is needed on the task execution role:

logs:CreateLogGroup
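
If you patch task definitions from a script, the same change can be sketched in Python against a container definition held as a plain dict. The helper name enable_log_group_creation is hypothetical and not part of any AWS SDK; it only edits the dictionary and never calls AWS.

```python
import copy


def enable_log_group_creation(container_def):
    """Return a copy of a container definition with awslogs-create-group set.

    Hypothetical helper: it only edits the dict, it does not talk to AWS.
    """
    updated = copy.deepcopy(container_def)
    log_config = updated.get("logConfiguration", {})
    # Only the awslogs driver understands this option.
    if log_config.get("logDriver") == "awslogs":
        # Option values are strings, so "true" rather than True.
        log_config.setdefault("options", {})["awslogs-create-group"] = "true"
    return updated


container = {
    "name": "web",
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "web",
            "awslogs-region": "us-west-2",
            "awslogs-stream-prefix": "ecs",
        },
    },
}

patched = enable_log_group_creation(container)
print(patched["logConfiguration"]["options"])
```

The original dict is left untouched, which makes it easy to diff the before and after versions before registering a new task definition revision.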

Full docs here:

https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_awslogs.html

Boxee Box “Can’t connect to internet” fix, cloned Boxee services

The Boxee Box was a short-lived but powerful set-top box by D-Link, released in 2010 and discontinued in 2012.

All Boxee Boxes relied on an application server hosted by D-link at boxee.tv for periodic phone-home calls and service endpoints.

In June 2019 these application servers went down, resulting in all Boxee Boxes still in operation throwing “Can’t connect to internet” errors and all user profiles and apps going offline.

In August 2019 I released a small Python Flask app, boxee-server-light, to replace the downed boxee.tv servers. This code was created by referencing an existing project by Jimmy Conner (cigamit, boxeed.in forums).

To use it, you’ll need to add DNS entries for all Boxee application URLs, pointing to the boxee-server-light application.

For example:

18.211.111.89  app.boxee.tv
18.211.111.89  api.boxee.tv
18.211.111.89  dir.boxee.tv
18.211.111.89  s3.boxee.tv
18.211.111.89  t.boxee.tv
18.211.111.89  res.boxee.tv
18.211.111.89  0.ping.boxee.tv
18.211.111.89  1.ping.boxee.tv
18.211.111.89  2.ping.boxee.tv
18.211.111.89  3.ping.boxee.tv
18.211.111.89  4.ping.boxee.tv
18.211.111.89  5.ping.boxee.tv
18.211.111.89  6.ping.boxee.tv
18.211.111.89  7.ping.boxee.tv
18.211.111.89  8.ping.boxee.tv
18.211.111.89  9.ping.boxee.tv
18.211.111.89  dl.boxee.tv

… where the IP is the address of the Flask application.
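
If you would rather generate those entries than type them, here is a minimal Python sketch that builds the same hosts-style lines for any IP. The function name boxee_dns_entries is my own, and 192.0.2.10 is only a placeholder address.

```python
def boxee_dns_entries(server_ip):
    """Build hosts-style lines mapping the boxee.tv names above to one IP."""
    names = ["app", "api", "dir", "s3", "t", "res"]
    # The ten ping hosts are numbered 0 through 9.
    names += [f"{n}.ping" for n in range(10)]
    names.append("dl")
    return [f"{server_ip}  {name}.boxee.tv" for name in names]


# 192.0.2.10 is a placeholder; use the address of your own Flask app.
for line in boxee_dns_entries("192.0.2.10"):
    print(line)
```

The output can be pasted into a hosts file or adapted to whatever record format your local DNS server expects.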

For those who are unable to run their own DNS or this application, I am hosting a public version of this code. You can add my public DNS server to your router config, or set it as custom DNS on your Boxee Box in network settings. You can also point directly to my public application server using your own DNS.

My public DNS server is by whitelist only, so please email me (I don’t check comments often) if you would like access.

Public DNS server address: 18.211.111.89
Public application server address: ~~18.213.38.199~~ 18.211.111.89 (also)

For more up to date info and discussion, check out my Reddit post:
https://www.reddit.com/r/boxee/comments/ci4ugj/boxee_cloned_server_updates_working_server/

FAQ:

Do I need a static IP address from my ISP to use the public DNS?
Yes. I’ll need to whitelist your IP address. If you get a new one every day this won’t work.

I run my own local DNS. Do I need to be whitelisted to use your application server?
No. Map the boxee domains to my public app server as shown above. No whitelist required.

I’m logged out of my boxee box. How do I log back in while using this app?
Any username and password combo will work to log you back in.

I reset my boxee box. What firmware do I need to be using to use your public servers?
1.5.1 (latest) seems to work best. If you can’t find this firmware, email me.

Do apps work with this project?
I don’t have any apps connected yet. PRs welcome. I’m not 100% sure if app downloading will work without some additional code.

HAProxy dynamic backend selection with Lua script

HAProxy is a popular load balancer with extensive configuration options, including the ability to influence balancing and other options via Lua scripts.

In this post I’ll show how to influence HAProxy backend selection via a Lua script. The use case was the need to choose a backend server based on the responses from each possible backend.

First, Lua needs to be installed and HAProxy needs to be built with Lua support. This involves passing USE_LUA=1 to make when building HAProxy.

Here’s a stripped down config example, with relevant lines commented. Not all required config attributes are included for brevity.

global
    # Load custom lua script. I usually put this alongside the haproxy.conf.
    lua-load /etc/haproxy/pick_backend.lua 

# Frontend config, rtmp traffic.
frontend frontendrtmp
    bind *:1935
    mode tcp

    # inspect-delay was required or else was seeing timeouts during lua script run
    tcp-request inspect-delay 1m

    # This line intercepts the incoming tcp request and pipes it through lua function, called "pick backend"
    tcp-request content lua.pick_backend

    # use_backend based off of the "streambackend" response variable we inject via lua script
    use_backend %[var(req.streambackend)]


# Example backends. One server per backend. The Lua script will iterate through all backends
# with "backendrtmp" prefix. 
# HAProxy use_server attribute does not yet support lua scripts, so backends necessary.
backend backendrtmp1
    mode tcp
    server rtmp 192.0.2.10:1935 check

Requests to the “frontendrtmp” frontend are routed through the Lua script, which checks each listed backend and chooses one based off its response.

Here’s the Lua script:

local function pick_backend(txn)
    local winner_backend = 'backendrtmp1' -- Fallback; needs to match an available backend.
    local winner_count = -1 -- Initial count flag.

    for backend_name, backend in pairs(core.backends) do
      -- Only consider backends with the "backendrtmp" prefix. This also
      -- filters out built in backend names such as MASTER.
      if backend_name:find('^backendrtmp') then
        -- Iterate backend servers dict, assuming one server per backend.
        for server_name, server in pairs(backend.servers) do
          -- Skip any server that is down.
          if server:get_stats()['status'] ~= 'DOWN' then
            -- get_addr() returns "ip:port"; keep only the IPv4 part.
            local address = string.match(server:get_addr(), '%d+%.%d+%.%d+%.%d+')
            local tcp = core.tcp()
            tcp:settimeout(1)

            -- Connect to rtmp server to get stats counts.
            if tcp:connect(address, 80) then
              if tcp:send('GET /statistics\r\n') then
                local line = tcp:receive('*a')

                -- Do whatever checks you want here with the response.
                -- In this case, I'll just check the number returned
                -- from the statistics endpoint (nil if nothing usable came back).
                local streamers = line and tonumber(string.match(line, '(%d+)'))

                -- Check and set winner: first valid count, or a lower count.
                if streamers and (winner_count == -1 or streamers < winner_count) then
                  print('New winner', backend_name)
                  winner_count = streamers
                  winner_backend = backend_name
                end
              end
              tcp:close()
            else
              print('Socket connection failed')
            end
          end
        end
      end
    end
    print('Winner is:', winner_backend)

    -- Set winner backend name to variable on the request.
    txn:set_var('req.streambackend', winner_backend)
end

core.register_action('pick_backend', {'tcp-req', 'http-req'}, pick_backend)

The Lua script:

Iterates over each backend with the required prefix
Hits an endpoint on the listed server if it's up
Checks count from endpoint
Compares with count of previous lowest count server
Sets a response variable with the name of the backend with the lowest count

This enables us to route traffic dynamically to the server with the lowest number of users.
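
The selection rule itself is easy to test outside HAProxy. Below is a minimal Python sketch of the same "lowest count wins" logic, assuming the counts have already been collected into a dict. The function and its arguments are illustrative only and are not part of the HAProxy Lua API.

```python
def pick_backend(counts, default="backendrtmp1"):
    """Return the backend name with the lowest count.

    counts maps backend name -> user count, or None when the backend
    was down or returned nothing usable. Falls back to default.
    """
    winner, winner_count = default, None
    for name, count in counts.items():
        if count is None:
            continue  # Skip backends we could not query.
        if winner_count is None or count < winner_count:
            winner, winner_count = name, count
    return winner


print(pick_backend({"backendrtmp1": 12, "backendrtmp2": 3, "backendrtmp3": None}))
```

Keeping a fallback name matters: if no backend answers, use_backend still receives a valid backend name instead of an empty value.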

Regression testing releases with Depicted (Dpxdt), Travis, & Saucelabs

Depicted is a release testing tool that compares before and after screenshots of your webpage, highlighting differences between the two.

Depicted supplements your release testing by allowing you to approve any visual changes a new release may cause.

I wrote a script during my time at Sprintly that would take a TravisCI build ID, pull related screenshots from our Saucelabs selenium tests, and upload them to a Depicted API server for comparison.

Before a new release would be deployed, we would manually run our Depicted release script and check and approve any changes.

This script was integrated as a Django management command for ease of use. Check out the full script below with comments.

Django, Redis & AWS ElastiCache primary/replica cluster

AWS’s ElastiCache service is a convenient way to launch a Redis cluster. If you’re using Django, both the django-redis-cache and django-redis packages support an ElastiCache Redis instance. If you are launching ElastiCache Redis with any number of replicas, some additional master-slave configuration is needed in your Django settings.

The following is an example of the correct settings for an ElastiCache Redis cluster with a primary instance and two replicas, if you’re using the django-redis-cache backend:

CACHES = {
    'default': {
        'BACKEND': 'redis_cache.RedisCache',
        'LOCATION': [
            "test-001.730tfw.0001.use1.cache.amazonaws.com:6379",
            "test-002.730tfw.0001.use1.cache.amazonaws.com:6379",
            "test-003.730tfw.0001.use1.cache.amazonaws.com:6379"
        ],
        'OPTIONS': {
            'DB': 0,
            'MASTER_CACHE': "test-001.730tfw.0001.use1.cache.amazonaws.com:6379"
        },
    }
}

https://django-redis-cache.readthedocs.io/en/latest/advanced_configuration.html#master-slave-setup
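
If your endpoints come from environment variables or infrastructure tooling, the same structure can be built with a small function. This is a minimal sketch: redis_cache_settings is a hypothetical helper of mine, and the hostnames are placeholders rather than real ElastiCache endpoints.

```python
def redis_cache_settings(primary, replicas, db=0):
    """Build a django-redis-cache CACHES dict for one primary plus replicas."""
    return {
        "default": {
            "BACKEND": "redis_cache.RedisCache",
            # LOCATION lists every node, primary first.
            "LOCATION": [primary, *replicas],
            "OPTIONS": {
                "DB": db,
                # MASTER_CACHE must name the primary node from LOCATION.
                "MASTER_CACHE": primary,
            },
        }
    }


# Placeholder endpoints; substitute your own ElastiCache node addresses.
CACHES = redis_cache_settings(
    "primary.example.com:6379",
    ["replica-1.example.com:6379", "replica-2.example.com:6379"],
)
print(CACHES["default"]["LOCATION"])
```

Deriving MASTER_CACHE from the same value as the first LOCATION entry avoids the two drifting apart when endpoints change.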

Apache Kafka plaintext authentication and kafka-python configuration reference

This is a reference for the Apache Kafka config settings and kafka-python arguments needed to set up plaintext (SASL/PLAIN) authentication on Kafka.

You’ll need to follow these instructions for creating the authentication details file and Java options.

I exposed the auth endpoint to port 9095. All other ports were closed via AWS security groups.

Kafka config settings:

security.inter.broker.protocol=PLAINTEXT
sasl.mechanism.inter.broker.protocol=PLAIN
sasl.enabled.mechanisms=PLAIN
advertised.listeners=SASL_PLAINTEXT://example.com:9095,PLAINTEXT://example.com:9092
listeners=SASL_PLAINTEXT://0.0.0.0:9095,PLAINTEXT://0.0.0.0:9092

kafka-python client command for connecting:

from kafka import KafkaProducer
producer = KafkaProducer(bootstrap_servers='example.com:9095', security_protocol="SASL_PLAINTEXT", sasl_mechanism='PLAIN', sasl_plain_username='username', sasl_plain_password='password')
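
Since kafka-python's KafkaProducer and KafkaConsumer take the same security arguments, it can help to build them once and reuse them. Here is a minimal sketch; sasl_plain_kwargs is a hypothetical helper, and the topic name in the comments is a placeholder.

```python
def sasl_plain_kwargs(host, username, password, port=9095):
    """Connection keyword arguments for kafka-python over SASL_PLAINTEXT."""
    return {
        "bootstrap_servers": f"{host}:{port}",
        "security_protocol": "SASL_PLAINTEXT",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": username,
        "sasl_plain_password": password,
    }


kwargs = sasl_plain_kwargs("example.com", "username", "password")
# With kafka-python installed and a broker reachable, these would be used as:
#   KafkaProducer(**kwargs)
#   KafkaConsumer("some-topic", **kwargs)   # "some-topic" is a placeholder
print(kwargs["bootstrap_servers"])
```

Keep in mind that SASL_PLAINTEXT sends the credentials unencrypted, which is why the port above was locked down with security groups.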

GitHub Desktop & GPG issues “gpg failed to sign the data”

I had some issues getting GPG signing to work with GitHub Desktop. While their docs say the application doesn’t support GPG, a number of users seemed to have it working.

I ran into a few errors before getting it working correctly. The “gpg failed to sign the data” error is the one that took a while to find a fix for.

Assuming you followed all the instructions in GitHub’s docs, also make sure your global git settings are pointing to the gpg command and signing is set to true:

user.signingkey=EEDDA4EE375C6D12
gpg.program=/usr/local/bin/gpg
commit.gpgsign=true

And what ultimately fixed my issue was disabling GPG terminal output via:

echo "no-tty" >> ~/.gnupg/gpg.conf
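
To double check the three git settings listed earlier, a small Python sketch can scan the text printed by git config --global --list and report any that are missing. The function name missing_signing_settings is my own, and the sample text simply reuses the values from this post.

```python
REQUIRED_KEYS = ("user.signingkey", "gpg.program", "commit.gpgsign")


def missing_signing_settings(config_text):
    """Return required keys that are absent from `git config --list` output."""
    present = set()
    for line in config_text.splitlines():
        key, sep, _value = line.partition("=")
        if sep:  # Ignore lines that are not key=value pairs.
            present.add(key.strip().lower())
    return [key for key in REQUIRED_KEYS if key not in present]


sample = """user.signingkey=EEDDA4EE375C6D12
gpg.program=/usr/local/bin/gpg
commit.gpgsign=true"""

print(missing_signing_settings(sample))
```

This only checks that the keys exist; it does not verify that the key ID is valid or that the gpg path points at a working binary.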

Firefox & python Selenium: Stopping auto update on browser test runs

Found a minor annoyance when running headless Selenium browser tests on Ubuntu Server 16. Automated tests would start failing when opening Firefox. Apparently the configuration I’m running allows Firefox to auto update when opened.

To stop Firefox auto updates during your python Selenium test run, load a custom profile:

from selenium import webdriver

profile = webdriver.FirefoxProfile()
profile.set_preference('app.update.auto', False)
profile.set_preference('app.update.enabled', False)

driver = webdriver.Firefox(profile)

If this doesn’t do the trick, verify that apt unattended-upgrades is not causing this behavior. In one case, I found the Firefox update recorded in the /var/log/unattended-upgrades/unattended-upgrades-dpkg.log log file.

I disabled auto updates via apt globally with the command:

dpkg-reconfigure -plow unattended-upgrades