Как создать матрицу с количеством вхождений слова в предложение?

Question

Kapasim988 @Kapasim988

Python

Как создать матрицу с количеством вхождений слова в предложение?

Всем привет, в общем я создал словарь, в котором ключом является цифра от 0 до 253, а его значением уникальное слово, которых 254, а всего слов в тексте 468. Как создать матрицу из словаря, чтобы ее элемент с индексом(i, j) был равен количеству вхождений j - го слова в i - ое предложение?

Вот задание, если что:
Составьте список всех слов, встречающихся в предложениях. Сопоставьте каждому слову индекс от нуля до (d - 1), где d — число различных слов в предложениях. Для этого удобно воспользоваться структурой dict.
Создайте матрицу размера n * d, где n — число предложений. Заполните ее: элемент с индексом (i, j) в этой матрице должен быть равен количеству вхождений j-го слова в i-е предложение. У вас должна получиться матрица размера 22 * 254.

Вопрос задан более трёх лет назад
1203 просмотра

Комментировать

Подписаться 1 Простой Комментировать

Помогут разобраться в теме Все курсы

Нетология

Python-разработчик: расширенный курс + нейросети

12 месяцев

Далее
Академия Эдюсон

Python-разработчик + ИИ

9 месяцев

Далее
ProductStar × РБК

Профессия: Python-разработчик + ИИ

8 месяцев

Далее

Решения вопроса 1

7 комментариев

Kapasim988 @Kapasim988 Автор вопроса

Спасибо, в общем я создал список списков такого вида [['in', 'comparison', 'to', 'dogs', 'cats', 'have', 'not', 'undergone', 'major', 'changes', 'during', 'the', 'domestication', 'process'], ['as', 'cat', 'simply', 'catenates', 'streams', 'of', 'bytes'] и список из уникальных слов ['in', 'comparison', 'to', 'dogs', 'cats', 'have', 'not', 'undergone', 'major', 'changes', 'during', 'the', 'domestication', 'process', 'as', 'cat', 'simply', 'catenates', 'streams', 'of', 'bytes'], я написал эту строку, которую вы сказали [[sentence.count(word) for word in unique_words] for sentence in sentences], но у меня вывелись все нули, что делать?

Написано более трёх лет назад

o5a @o5a

Kapasim988,

sentences = [['in', 'comparison', 'to', 'dogs', 'cats', 'have', 'not', 'undergone', 'major', 'changes', 'during', 'the', 'domestication', 'process'], ['as', 'cat', 'simply', 'catenates', 'streams', 'of', 'bytes']]
unique_words = ['in', 'comparison', 'to', 'dogs', 'cats', 'have', 'not', 'undergone', 'major', 'changes', 'during', 'the', 'domestication', 'process', 'as', 'cat', 'simply', 'catenates', 'streams', 'of', 'bytes']
result = [[sentence.count(word) for word in unique_words] for sentence in sentences]
print(result)
# [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]]

Как видно результат как раз показывает на каждой позиции кол-во употреблений слова по каждому предложению.

Написано более трёх лет назад

Kapasim988 @Kapasim988 Автор вопроса

o5a, спасибо большое, не знаю почему в 1 раз такое не сработало, 1 в 1 было, ты мне очень помог

Написано более трёх лет назад
Kapasim988 @Kapasim988 Автор вопроса

o5a, привет, можешь пожалуйста ответить еще на 1 вопрос? Вот мой список списков. Также у меня есть список уникальных слов

output = [['in', 'comparison', 'to', 'dogs', 'cats', 'have', 'not', 'undergone', 'major', 'changes', 'during', 'the', 'domestication', 'process'], ['as', 'cat', 'simply', 'catenates', 'streams', 'of', 'bytes', 'it', 'can', 'be', 'also', 'used', 'to', 'concatenate', 'binary', 'files', 'where', 'it', 'will', 'just', 'concatenate', 'sequence', 'of', 'bytes'], ['a', 'common', 'interactive', 'use', 'of', 'cat', 'for', 'a', 'single', 'file', 'is', 'to', 'output', 'the', 'content', 'of', 'a', 'file', 'to', 'standard', 'output'], ['cats', 'can', 'hear', 'sounds', 'too', 'faint', 'or', 'too', 'high', 'in', 'frequency', 'for', 'human', 'ears', 'such', 'as', 'those', 'made', 'by', 'mice', 'and', 'other', 'small', 'animals'], ['in', 'one', 'people', 'deliberately', 'tamed', 'cats', 'in', 'a', 'process', 'of', 'artificial', 'selection', 'as', 'they', 'were', 'useful', 'predators', 'of', 'vermin'], ['the', 'domesticated', 'cat', 'and', 'its', 'closest', 'wild', 'ancestor', 'are', 'both', 'diploid', 'organisms', 'that', 'possess', 'chromosomes', 'and', 'roughly', 'genes'], ['domestic', 'cats', 'are', 'similar', 'in', 'size', 'to', 'the', 'other', 'members', 'of', 'the', 'genus', 'felis', 'typically', 'weighing', 'between', 'and', 'kg', 'and', 'lb'], ['however', 'if', 'the', 'output', 'is', 'piped', 'or', 'redirected', 'cat', 'is', 'unnecessary'], ['cat', 'with', 'one', 'named', 'file', 'is', 'safer', 'where', 'human', 'error', 'is', 'a', 'concern', 'one', 'wrong', 'use', 'of', 'the', 'default', 'redirection', 'symbol', 'instead', 'of', 'often', 'adjacent', 'on', 'keyboards', 'may', 'permanently', 'delete', 'the', 'file', 'you', 'were', 'just', 'needing', 'to', 'read'], ['in', 'terms', 'of', 'legibility', 'a', 'sequence', 'of', 'commands', 'starting', 'with', 'cat', 'and', 'connected', 'by', 'pipes', 'has', 'a', 'clear', 'left', 'to', 'right', 'flow', 'of', 'information'], ['cat', 'command', 'is', 'one', 'of', 'the', 'basic', 'commands', 'that', 'you', 'learned', 'when', 'you', 'started', 'in', 'the', 'unix', 'linux', 'world'], ['using', 'cat', 'command', 'the', 'lines', 'received', 'from', 'stdin', 'can', 'be', 'redirected', 'to', 'a', 'new', 'file', 'using', 'redirection', 'symbols'], ['when', 'you', 'type', 'simply', 'cat', 'command', 'without', 'any', 'arguments', 'it', 'just', 'receives', 'the', 'stdin', 'content', 'and', 'displays', 'it', 'in', 'the', 'stdout'], ['leopard', 'was', 'released', 'on', 'october', 'as', 'the', 'successor', 'of', 'tiger', 'version', 'and', 'is', 'available', 'in', 'two', 'editions'], ['according', 'to', 'apple', 'leopard', 'contains', 'over', 'changes', 'and', 'enhancements', 'over', 'its', 'predecessor', 'mac', 'os', 'x', 'tiger'], ['as', 'of', 'mid', 'some', 'apple', 'computers', 'have', 'firmware', 'factory', 'installed', 'which', 'will', 'no', 'longer', 'allow', 'installation', 'of', 'mac', 'os', 'x', 'leopard'], ['since', 'apple', 'moved', 'to', 'using', 'intel', 'processors', 'in', 'their', 'computers', 'the', 'osx', 'community', 'has', 'developed', 'and', 'now', 'also', 'allows', 'mac', 'os', 'x', 'tiger', 'and', 'later', 'releases', 'to', 'be', 'installed', 'on', 'non', 'apple', 'x', 'based', 'computers'], ['os', 'x', 'mountain', 'lion', 'was', 'released', 'on', 'july', 'for', 'purchase', 'and', 'download', 'through', 'apple', 's', 'mac', 'app', 'store', 'as', 'part', 'of', 'a', 'switch', 'to', 'releasing', 'os', 'x', 'versions', 'online', 'and', 'every', 'year'], ['apple', 'has', 'released', 'a', 'small', 'patch', 'for', 'the', 'three', 'most', 'recent', 'versions', 'of', 'safari', 'running', 'on', 'os', 'x', 'yosemite', 'mavericks', 'and', 'mountain', 'lion'], ['the', 'mountain', 'lion', 'release', 'marks', 'the', 'second', 'time', 'apple', 'has', 'offered', 'an', 'incremental', 'upgrade', 'rather', 'than', 'releasing', 'a', 'new', 'cat', 'entirely'], ['mac', 'os', 'x', 'mountain', 'lion', 'installs', 'in', 'place', 'so', 'you', 'won', 't', 'need', 'to', 'create', 'a', 'separate', 'disk', 'or', 'run', 'the', 'installation', 'off', 'an', 'external', 'drive'], ['the', 'fifth', 'major', 'update', 'to', 'mac', 'os', 'x', 'leopard', 'contains', 'such', 'a', 'mountain', 'of', 'features', 'more', 'than', 'by', 'apple', 's', 'count']]

new_list = ['in', 'comparison', 'to', 'dogs', 'cats', 'have', 'not', 'undergone', 'major', 'changes', 'during', 'the', 'domestication', 'process', 'as', 'cat', 'simply', 'catenates', 'streams', 'of', 'bytes', 'it', 'can', 'be', 'also', 'used', 'concatenate', 'binary', 'files', 'where', 'will', 'just', 'sequence', 'a', 'common', 'interactive', 'use', 'for', 'single', 'file', 'is', 'output', 'content', 'standard', 'hear', 'sounds', 'too', 'faint', 'or', 'high', 'frequency', 'human', 'ears', 'such', 'those', 'made', 'by', 'mice', 'and', 'other', 'small', 'animals', 'one', 'people', 'deliberately', 'tamed', 'artificial', 'selection', 'they', 'were', 'useful', 'predators', 'vermin', 'domesticated', 'its', 'closest', 'wild', 'ancestor', 'are', 'both', 'diploid', 'organisms', 'that', 'possess', 'chromosomes', 'roughly', 'genes', 'domestic', 'similar', 'size', 'members', 'genus', 'felis', 'typically', 'weighing', 'between', 'kg', 'lb', 'however', 'if', 'piped', 'redirected', 'unnecessary', 'with', 'named', 'safer', 'error', 'concern', 'wrong', 'default', 'redirection', 'symbol', 'instead', 'often', 'adjacent', 'on', 'keyboards', 'may', 'permanently', 'delete', 'you', 'needing', 'read', 'terms', 'legibility', 'commands', 'starting', 'connected', 'pipes', 'has', 'clear', 'left', 'right', 'flow', 'information', 'command', 'basic', 'learned', 'when', 'started', 'unix', 'linux', 'world', 'using', 'lines', 'received', 'from', 'stdin', 'new', 'symbols', 'type', 'without', 'any', 'arguments', 'receives', 'displays', 'stdout', 'leopard', 'was', 'released', 'october', 'successor', 'tiger', 'version', 'available', 'two', 'editions', 'according', 'apple', 'contains', 'over', 'enhancements', 'predecessor', 'mac', 'os', 'x', 'mid', 'some', 'computers', 'firmware', 'factory', 'installed', 'which', 'no', 'longer', 'allow', 'installation', 'since', 'moved', 'intel', 'processors', 'their', 'osx', 'community', 'developed', 'now', 'allows', 'later', 'releases', 'non', 'based', 'mountain', 'lion', 'july', 'purchase', 'download', 'through', 's', 'app', 'store', 'part', 'switch', 'releasing', 'versions', 'online', 'every', 'year', 'patch', 'three', 'most', 'recent', 'safari', 'running', 'yosemite', 'mavericks', 'release', 'marks', 'second', 'time', 'offered', 'an', 'incremental', 'upgrade', 'rather', 'than', 'entirely', 'installs', 'place', 'so', 'won', 't', 'need', 'create', 'separate', 'disk', 'run', 'off', 'external', 'drive', 'fifth', 'update', 'features', 'more', 'count']

Написано более трёх лет назад
Kapasim988 @Kapasim988 Автор вопроса

o5a я применяю такую команду result = [[sentence.count(word) for word in new_list] for sentence in output]

в результате мне выдает список, в котором чисел больше, чем слов. В чем ошибка?

Написано более трёх лет назад
Kapasim988 @Kapasim988 Автор вопроса

Kapasim988, у меня количество элементов в result = 5588, то есть выдает 22 предложения в каждом из которых 254 числа. Почему так получилось?

Написано более трёх лет назад
o5a @o5a

Kapasim988, и что не так? Ведь именно это и требовалось в исходном задании. Цитата прямо из него

У вас должна получиться матрица размера 22 * 254.

Предложений 22, уникальных слов (new_list) 254.

Написано более трёх лет назад

Пригласить эксперта

Ваш ответ на вопрос

Войдите, чтобы написать ответ

Похожие вопросы

Python

+1 ещё

Средний
Почему не работает пример quickstart из документации GLiNKER?
- 1 подписчик
- 15 часов назад
- 43 просмотра
1

ответ
Python

Средний
Как правильно определять изменяющиеся типы полей при наследовании классов в python?
- 1 подписчик
- 17 июл.
- 80 просмотров
1

ответ
Python

+2 ещё

Простой
Можно ли полностью отказаться от vkhost в пользу VK ID для серверного приложения?
- 3 подписчика
- 14 июл.
- 269 просмотров
0

ответов
Python

+1 ещё

Простой
Почему разрывается подключение к бд на сервере?
- 1 подписчик
- 18 июн.
- 233 просмотра
1

ответ
Python

+2 ещё

Средний
Как новичку найти первые заказы на парсинг данных (Python)?
- 1 подписчик
- 17 июн.
- 641 просмотр
2

ответа
Python

+1 ещё

Сложный
Как на Python реализовать алгоритм, чтобы персонаж шёл по определенному маршруту в Genshin Impact?
- 3 подписчика
- 15 июн.
- 548 просмотров
2

ответа
Python

+1 ещё

Средний
Может кто помочь исправить код LTSM нейросети?
- 1 подписчик
- 12 июн.
- 306 просмотров
2

ответа
Python

+1 ещё

Средний
Telethon отказывается соединятся с серверами Telegram, как это обойти?
- 1 подписчик
- 10 июн.
- 560 просмотров
1

ответ
Python

+1 ещё

Простой
Почему копируется атрибут при создании нового экземпляра?
- 1 подписчик
- 08 июн.
- 226 просмотров
2

ответа
Python

+2 ещё

Простой
Как правильно настроить статические и медиафайлы на хостинге?
- 1 подписчик
- 04 июн.
- 140 просмотров
1

ответ
Показать ещё Загружается…

Answer 1 · 2020-11-11 19:07:22

Разбить весь текст на предложения, а внутри на слова. Затем 2 цикла один по предложениям в этом вложенном словаре, другой - по списку уникальных слов. И для каждого уникального слова считать кол-во его вхождений в текущее предложение (через count).
Через list_comprehension будет выглядеть так

[[sentence.count(word) for word in unique_words] for sentence in sentences]

где unique_words - это собственно список уникальных слов
sentences - вложенный список слов по каждому предложению.

Разбить на предложения и слова наверное проще регулярными выражениями

# по предложениям
re.split(r'[\.!\?]', text)
# по словам
re.split(r'[\n \.,!\?]', text)

Удачи.

Как создать матрицу с количеством вхождений слова в предложение?

Войдите, чтобы написать ответ

Минуточку внимания

Войдите на сайт